HTML is the new HTML5

A few days ago Ian Hickson wrote a blog post, HTML is the new HTML5, in which he announced that "we moved to a new development model", which comes with two major changes:

  1. The HTML specification will henceforth just be known as “HTML”, with the URL http://whatwg.org/html.
  2. The WHATWG HTML spec is now a "living standard", which "is more mature than any version of the HTML specification". I took a screenshot below:

HTML Living Standard

So, in a single sentence: HTML is moving to an unversioned model. This raises an obvious concern: a living standard? Does that mean the standard could be changed, updated, or revised at any time? I saw someone post exactly this question, and Ian Hickson replied, emphasizing that the WHATWG works hard on backward compatibility and works closely with browser vendors to make sure they do not change things that most people depend on. This is good, and really important!

So, things are getting more and more interesting with HTML5 (or should I just call it HTML?). A few days earlier (18 January 2011), the W3C had just unveiled the HTML5 logo: http://www.w3.org/News/2011#entry-8992. I bet there will be some confusion:)

Unique URL in Ajax Web Application

Background

A few days ago one of my friends asked me how Gmail changes its URL as the user works inside it, without any page refresh. I had no idea, so he shared a link, Ajax Pattern – Unique URLs, which dives deep into this topic. As the article mentions, unique URLs make your website's links "Bookmarkable, Linkable, Type-In-Able", plus Sharable IMHO: easy to share on social networks, which is extremely important nowadays.

Implementation

The key technique for achieving the "Unique URL" goal can be summarized in two points:

  1. Whenever Ajax updates the page content significantly enough, update the URL (location.hash) as well:

    // Ajax renders all blog entries for page number 5
    location.hash = 'Blogs&Page5';

  2. Every time the page loads, the JavaScript should interpret the URL and render the related content:

    <body onload="restoreAjaxContent()">
    <script type="text/javascript">
    function restoreAjaxContent(){
        // location.hash includes the leading '#', e.g. '#Blogs&Page5'
        var urlHash = location.hash;
        var curPageNo = urlHash.replace('#Blogs&Page', '');
        // Safely parse curPageNo into a number and handle bad input (omitted here).
        // Optionally display a loading text/image on the page (better UX).
        // Ajax-render all blog entries for page number curPageNo.
    }
    </script>
    </body>

What I want to emphasize is the hash value, i.e. the content after the #. It was originally intended to reference an HTML element's name (or id) attribute for in-page navigation, and it is purely a contract between the browser, the HTML content and the JavaScript. The server side cannot get this information directly unless we explicitly pass the value to it (hidden post, URL query string, Ajax, etc.). Therefore, if a user accesses the unique URL, your website's client-side JS has to parse the hash and retrieve the relevant data from the server side.
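
To make that concrete, the Ajax call itself has to carry the page number explicitly (for example as a query-string parameter), since the fragment never leaves the browser. Below is a minimal, hypothetical sketch of such a server-side paging endpoint in ASP.NET; the handler name, URL and markup are illustrative, not part of my actual implementation:

// Hypothetical handler behind the Ajax paging call, e.g. /BlogEntries.ashx?page=5.
// The "#Blogs&Page5" fragment never reaches the server, so the client-side JS extracts
// the page number from location.hash and passes it explicitly via the query string.
using System.Web;

public class BlogEntriesHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        int pageNo;
        if (!int.TryParse(context.Request.QueryString["page"], out pageNo) || pageNo < 1)
            pageNo = 1; // Fall back to the first page on a missing/malformed parameter.

        // Render the requested page of blog entries (data access omitted in this sketch).
        context.Response.ContentType = "text/html";
        context.Response.Write("<div>Blog entries for page " + pageNo + "</div>");
    }

    public bool IsReusable
    {
        get { return true; }
    }
}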

Pros & Cons

Advantages

  1. Better user experience.
    Every time a user accesses the unique-URL Ajax page, the fixed part of the page loads first and the main content then loads asynchronously. If the main content is large, for example if it contains images or rich media, this "async loading" is much better than having the whole page load blocked while those images/media download.
    For instance, suppose loading a specific page originally takes 2 seconds in total; after applying this "async loading", the fixed part takes 0.4 seconds to load and the main content takes 1.8 seconds. From the user's point of view the latter is usually better, because the user sees your page partially loaded within a short period (0.4 seconds), which already feels good enough; add a graceful loading/splash screen and the UX improves significantly!
  2. Better SEO support (performance aspect)
    Page rendering speed is an important factor for a search engine's crawler; by applying "async loading", the crawler will deem the page it is crawling to have a good loading speed – 0.4 seconds.
  3. Easy to comply with W3C web standards
    This one is sort of a joke:) Since your main content is Ajax-loaded, the W3C validator (and likewise a search engine) won't validate the main content, which quite possibly does not strictly adhere to all the standard rules.

Disadvantages

  1. Main content cannot be indexed by search engines
    All the main content is loaded by JavaScript, so search engines won't crawl that content, which is a serious problem! However, it is easy to work around: build a traditional page without Ajax, add it to sitemap.xml and submit that to the search engine's webmaster tools.
  2. Harder to develop and maintain
    Client-side JavaScript/Ajax development is more complex and less convenient compared with server-side technologies like ASP.NET, Java EE or PHP. Although there are jQuery ("write less, do more"), Prototype.js (develop JS in a more OO way), Dojo and so on, a developer might still not be very happy struggling with mixed HTML/CSS/JavaScript:)

P.S. I spent two days updating my blog (http://WayneYe.com), revising the paging from the traditional style to the Ajax Unique URL pattern above: it is now doing Ajax paging and updating the URL like "http://wayneye.com/#Blogs&Page5", which is definitely "Bookmarkable, Linkable, Type-In-Able and Sharable". Add a loading panel and a content fade-in effect, and I believe its UX is much better than before.

IP address to geolocation

Background

A few months ago I found an interesting website: http://ipinfodb.com/. It provides an API that can "translate" any IP address into a geographic location, including city/region/country as well as latitude/longitude and time zone information; to invoke the API, a (free) registered API key is required. Since I already store visitors' IP addresses in my own database, I decided to use the IPInfoDB API to store visitors' geolocations as well.

Just a few days ago, a casual idea came to me: summarize those geolocation records and display them on Google Maps. Hmm, that's feasible:)

So, the process is: track visitors' IP addresses -> "translate" them into geographic locations -> show them on Google Maps!

(PS: I've been using Google Analytics for my geek place – http://WayneYe.com – for more than two years. It is no doubt extremely powerful, and it already contains a "Map Overlay" feature; however, due to its privacy policy, Google Analytics does NOT display visitors' IP addresses, as explained at: http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=86214.)

Implementation

The first task is to track visitors' IP addresses. Most of the time, a user visits a website through a browser that submits an HTTP GET request (an HTTP packet) over TCP (though not always). After DNS resolution and a series of network "wall passes", the original request is delivered to its destination, the web server. During this process the request may travel through a number of routers/proxies and other intermediaries, and its headers may be updated: Via (a standard HTTP request header) or X-Forwarded-For (a non-standard but widely used header) may end up containing the original ISP's information/IP address OR possibly the IP address of one of the proxies.

So, when the server receives the request and sees the Via/X-Forwarded-For header information, it learns the visitor's IP address (NOT always; sometimes it is an ISP's or a proxy's IP address). In ASP.NET this is as simple as calling Request.UserHostAddress; however, we can never simply trust this value, for two major reasons:

  1. A malicious application can forge the HTTP request; if you are unlucky enough to trust the value and insert it into the database, the resulting SQL injection hole can be exploited.
  2. Not all visitors are human beings; some of them (sometimes the majority of them, for example, my website has very few visitors per week but quite a number of "robot visitors"^_^) could be search engine crawlers. I must distinguish human visitors from crawlers, otherwise I would be "happy" to see a lot of "visitors" coming from "Mountain View, CA" ^_^.

For #1, I use a regular expression to validate the string I get from Request.UserHostAddress:

public static Boolean IsValidIP(string ip)
{
    // Anchor the pattern so the WHOLE string must be a dotted quad, not just a substring of it.
    if (!System.Text.RegularExpressions.Regex.IsMatch(ip, "^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$"))
        return false;

    string[] ips = ip.Split('.');
    if (ips.Length != 4)
        return false;

    // Every octet must be in the 0-255 range.
    return System.Int32.Parse(ips[0]) < 256 && System.Int32.Parse(ips[1]) < 256
        && System.Int32.Parse(ips[2]) < 256 && System.Int32.Parse(ips[3]) < 256;
}

If the result is “0.0.0.0”, I will ignore it.

For #2, so far I haven't found a "perfect" way to solve this issue (and I guess there may be no perfect solution for identifying every search engine in the world; please correct me if I am wrong). However, I have defined two rules that do my best to identify them in common situations:

Rule #1:

A request that contains a "Cookie" header with "ASP.NET_SessionId" AND whose value matches the server side should be a normal user who is browsing my website within a session.

Note: there are two possible exceptions to rule #1:

  1. If the user's browser has cookies disabled, this rule will NOT work, since the client request will never contain a Cookie header.
  2. If a crawler crawls my website and accepts cookies, rule #1 will not work either. However, I don't think a crawler will first request a SessionID and then request again using the same SessionID.

Rule #2:

Define a crawler list and check whether the "User-Agent" header contains one of its entries; the list should be configurable. See more crawler examples at: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers

Talk is cheap, show me the code: I wrote a method that identifies crawlers by applying the two rules above.

public static Boolean IsCrawlerRequest()
{
    // Rule 1: Request which contains "Cookie" Header with "ASP.NET_SessionId" and its value is equal with server side, 
    // then it should be a normal user (except maliciously forging, I don't think a crawler will firstly request a sessionID and then request again with the SessionID).
    //if (HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null
    //    && HttpContext.Current.Request.Cookies["ASP.NET_SessionId"].Value == HttpContext.Current.Session.SessionID)
    if (HttpContext.Current.Request.Headers["Cookie"] != null
        && HttpContext.Current.Request.Headers["Cookie"].Contains("ASP.NET_SessionId"))
        return false;  // Should be a normal user browsing my website using a browser.

    // Rule 2: define a crawler list and analyses whether "User-Agent" header contains one of them, this should be configurable
    // Refer more Crawler example at: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers
    var crawlerList = new String[] { "google", "bing", "msn", "yahoo", "baidu", "soso", "sogou", "youdao" };

    if (!String.IsNullOrEmpty(HttpContext.Current.Request.UserAgent))
        foreach (String bot in crawlerList)
            if (HttpContext.Current.Request.UserAgent.ToLower(CultureInfo.InvariantCulture).Contains(bot))
                return true; // It is a crawler

    return false;
}

Please be aware that I commented out HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null, since I found that Request.Cookies ALWAYS contains "ASP.NET_SessionId" EVEN IF the browser has cookie storage disabled; I will do further investigation and double-check later!
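
Putting the two checks together, a visit-logging call might be guarded like this (LogVisit is a hypothetical persistence method, not shown in this post):

// Sketch: only record the visit when the address is a valid, non-empty IPv4 address
// and the request does not look like a crawler. LogVisit is hypothetical.
string ip = HttpContext.Current.Request.UserHostAddress;

if (IsValidIP(ip) && ip != "0.0.0.0" && !IsCrawlerRequest())
{
    LogVisit(ip); // e.g. a parameterized INSERT into the visitor table
}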

OK, now we have normal users' IP addresses and have filtered out search engine crawlers; the next step is invoking the IPInfoDB API to "translate" an IP address into a geolocation. You need to register an API key here, and then submit an HTTP GET request to:

http://api.ipinfodb.com/v2/ip_query.php?key=[API KEY]&ip=[IP Address]&timezone=false

It returns the XML below; I take IP="117.136.8.14" as an example:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Status>OK</Status>
  <CountryCode>CN</CountryCode>
  <CountryName>China</CountryName>
  <RegionCode>23</RegionCode>
  <RegionName>Shanghai</RegionName>
  <City>Shanghai</City>
  <ZipPostalCode></ZipPostalCode>
  <Latitude>31.005</Latitude>
  <Longitude>121.409</Longitude>
  <Timezone>0</Timezone>
  <Gmtoffset>0</Gmtoffset>
  <Dstoffset>0</Dstoffset>
  <TimezoneName></TimezoneName>
  <Isdst></Isdst>
  <Ip>117.136.8.14</Ip>
</Response>

Wow, that looks precise:) I am going to show visitors' geolocations on Google Maps (I know this compromises visitors' privacy, but my personal blog http://WayneYe.com is not a company and I will NEVER earn a cent by doing this:)).
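
Before moving on to the map, here is a minimal sketch of what the server-side lookup could look like in C#, assuming the v2 endpoint above and using WebClient with LINQ to XML (error handling kept to a minimum):

using System;
using System.Net;
using System.Xml.Linq;

public static class GeoLookup
{
    // Returns "City, Region, Country" for the given IP, or null if the lookup fails.
    public static string ResolveLocation(string apiKey, string ip)
    {
        string url = String.Format(
            "http://api.ipinfodb.com/v2/ip_query.php?key={0}&ip={1}&timezone=false",
            apiKey, ip);

        using (var client = new WebClient())
        {
            XElement response = XElement.Parse(client.DownloadString(url));

            if ((string)response.Element("Status") != "OK")
                return null; // Lookup failed; let the caller decide what to do.

            return String.Format("{0}, {1}, {2}",
                (string)response.Element("City"),
                (string)response.Element("RegionName"),
                (string)response.Element("CountryName"));
        }
    }
}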

Anyway, I use the latest Google Maps JavaScript API V3, and there are two major pieces of functionality:

1. Display the visitor's geolocation as long as the browser supports "navigator.geolocation" (Google Chrome and Mozilla Firefox support it; IE does not). The default location is set to New York City if the browser does not support this W3C-recommended standard. A sample is below (I used Google Chrome):

VisitorInfo

I am now living in Shanghai

2. Display the geolocations of a specific blog post's visitors on Google Maps. The screenshot below shows the geolocations of visitors who viewed my post My new Dev box – HP Z800 Workstation; clicking each geolocation shows it on the map.

Visitors of <My new Dev box - HP Z800 Workstation>


The JavaScript code is shown below:

<script type="text/javascript">
    var initialLocation;
    var newyork = new google.maps.LatLng(40.69847032728747, -73.9514422416687);
    var browserSupportFlag = new Boolean();
    var map;
    var myOptions;
    var infowindow = new google.maps.InfoWindow();

    function initialize() {
        myOptions = {
            zoom: 6,
            mapTypeId: google.maps.MapTypeId.ROADMAP
        };
        map = new google.maps.Map(document.getElementById("googleMapContainer"), myOptions);

        // Try W3C Geolocation (Preferred)
        if (navigator.geolocation) {
            browserSupportFlag = true;
            navigator.geolocation.getCurrentPosition(function (position) {
                map.setCenter(new google.maps.LatLng(position.coords.latitude, position.coords.longitude));
                infowindow.setContent('Hi, dear WayneYe.com visitor! You are here:)');
                infowindow.setPosition(new google.maps.LatLng(position.coords.latitude, position.coords.longitude));
                infowindow.open(map);
            }, function () {
                handleNoGeolocation(browserSupportFlag);
            });
        } else {
            browserSupportFlag = false;
            handleNoGeolocation(browserSupportFlag);
        }

        function handleNoGeolocation(errorFlag) {
            //contentString = 'Cannot track your location, default to New York City.';

            map.setCenter(newyork);
            //infowindow.setContent(contentString);
            //infowindow.setPosition(newyork);
            infowindow.open(map);
        }
    }

    function setGoogleMapLocation(geoLocation, latitude, longitude) {
        var contentString = geoLocation;

        var visitorLocation = new google.maps.LatLng(latitude, longitude);

        map.setCenter(visitorLocation);
        infowindow.setContent(contentString);
        infowindow.setPosition(visitorLocation);
        infowindow.open(map);
    }
</script>

My visitor record page is: http://wayneye.com/VisitRecord.

Improve ASP.NET website performance by enabling compression in IIS

The GZIP format was developed by the GNU Project and standardized by the IETF in RFC 1952, and it really should be considered by web developers who want to improve their websites' performance. There are several quintessential articles on using gzip compression:

10 Tips for Writing High-Performance Web Applications
Best Practices for Speeding Up Your Web Site
How To Optimize Your Site With GZIP Compression
IIS 7 Compression. Good? Bad? How much?

A gzip-compressed HTTP response can save significant bandwidth and thus speed up browser rendering after the user hits Enter, so the user experience ultimately improves. Nowadays most popular browsers such as IE, Firefox, Chrome and Opera support gzip-encoded content (please refer to: http://en.wikipedia.org/wiki/HTTP_compression).

PS: the other compression encoding is deflate, “but it’s less effective and less popular” (refer: http://developer.yahoo.com/performance/rules.html).

Yahoo uses gzip compression and suggests developers do the same:

Compression in IIS

For ASP.NET developers who, like me, host their website on IIS, achieving this is fairly easy: open IIS Manager, select your website, then go to the Compression module. (Apache admins, refer here.)

IISCompressionModule

Double click and you will see:

IIS supports two kinds of compression:

  • Static Compression
    IIS compresses a specific file the first time (and only the first time) it is requested; afterwards, every time IIS receives a request for that file it returns the already compressed data. This is usually used for files that do not change frequently, such as a static HTML file, a rarely changed XML file, a Word document, or any other file that doesn't change often.
  • Dynamic Compression
    IIS compresses the response EVERY time a client requests the specific file. This is usually used for content that changes often; for example, if a large CSV file is generated by an engine on the server back end and has to be transferred to the client side, we can use dynamic compression.

Scott Forsyth pointed out: "Compression is a trade-off of CPU for Bandwidth." One of the new features in IIS 7 is that web masters can customize the IIS compression strategy based on actual need; you can modify applicationHost.config under "%windir%\System32\inetsrv\config". Below is my sample:

<httpCompression staticCompressionEnableCpuUsage="80" dynamicCompressionDisableCpuUsage="80" directory="%SystemDrive%\inetpub\temp\IIS Temporary Compressed Files">
	<scheme name="gzip" dll="%Windir%\system32\inetsrv\gzip.dll" />
	<staticTypes>
		<add mimeType="text/*" enabled="true" />
		<add mimeType="message/*" enabled="true" />
		<add mimeType="application/x-javascript" enabled="true" />
		<add mimeType="application/atom+xml" enabled="true" />
		<add mimeType="application/xaml+xml" enabled="true" />
		<add mimeType="*/*" enabled="false" />
	</staticTypes>
	<dynamicTypes>
		<add mimeType="text/*" enabled="true" />
		<add mimeType="message/*" enabled="true" />
		<add mimeType="application/x-javascript" enabled="true" />
		<add mimeType="*/*" enabled="false" />
	</dynamicTypes>
</httpCompression>

More details are described here: http://www.iis.net/ConfigReference/system.webServer/httpCompression

PS: Dynamic Compression is not installed by default; we need to install it by turning on the corresponding Windows feature:

My hosting provider uses IIS 6.0 and does NOT enable compression. I checked the compression status of http://wayneye.com using the free tool provided by port80software, and the result is:

WOW, I must convince them to enable gzip compression!!

Programmatic compression using C#

We can also compress the ASP.NET HTTP response programmatically. For example, say I want to transfer a large CSV file to the client; here is a simple ASPX page named ReturnGzipPackage.aspx:

protected void Page_Load(object sender, EventArgs e)
{
    // AppendHeader works in both classic and integrated pipeline modes.
    Response.AppendHeader("Content-Disposition", "attachment; filename=IAmLarge.csv");
    Response.ContentType = "text/csv";
    Response.TransmitFile("D:\\IAmLarge.csv");
    Response.End();
}
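
Note that the page above just transmits the file and leaves the compression itself to IIS. If you want to do the compression in code, one option is to stream the file through a GZipStream when the client advertises gzip support. Below is a minimal sketch (it needs using System.IO and System.IO.Compression, and reuses the example file path from above):

protected void Page_Load(object sender, EventArgs e)
{
    Response.AppendHeader("Content-Disposition", "attachment; filename=IAmLarge.csv");
    Response.ContentType = "text/csv";

    string acceptEncoding = Request.Headers["Accept-Encoding"] ?? String.Empty;
    if (acceptEncoding.Contains("gzip"))
    {
        // Tell the client the body is gzip-encoded, then stream the file through GZipStream.
        Response.AppendHeader("Content-Encoding", "gzip");
        using (var source = File.OpenRead("D:\\IAmLarge.csv"))
        using (var gzip = new GZipStream(Response.OutputStream, CompressionMode.Compress))
        {
            source.CopyTo(gzip); // Stream.CopyTo requires .NET 4.0+
        }
    }
    else
    {
        // The client does not accept gzip: send the file as-is.
        Response.TransmitFile("D:\\IAmLarge.csv");
    }

    Response.End();
}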

If the request is submitted by a browser, the browser automatically decompresses the HTTP response. But in a Windows client application or a Windows service we can also take advantage of gzip compression to save bandwidth: once a gzip HTTP response is received from the server, we can decompress it programmatically. I wrote a client console application that submits the HTTP request and receives/decompresses the HTTP response:

/* Submit the HTTP request to the server with Accept-Encoding: gzip */
// Requires: using System.Net; using System.IO; using System.IO.Compression;
WebClient client = new WebClient();
// This header is mandatory on the client side: if it is not sent, the server returns the
// original content type, in my case text/csv.
client.Headers.Add("Accept-Encoding", "gzip");

using(Stream gzipData = client.OpenRead("http://localhost/StudyASPNET/gzipHttp/ReturnGzipPackage.aspx"))
{
    WebHeaderCollection responseHeaders = client.ResponseHeaders;

    using(GZipStream gzip = new GZipStream(gzipData, CompressionMode.Decompress))
    {
        using(StreamReader reader= new StreamReader(gzip))
        {
            String content = reader.ReadToEnd();
            File.WriteAllText("D:\\Downloaded.csv", content);
        }
    }
}

Please be aware of one thing: browsers send "Accept-Encoding: gzip" by default and automatically decompress compressed HTTP responses, but in our own client code we MUST explicitly specify "Accept-Encoding: gzip". Below is what I observed:

The first time, I explicitly set "Accept-Encoding: gzip"; the HTTP response headers contained "Content-Encoding: gzip", and the decompression/file-saving operations completed without any issues.

The second time, I commented that line out. The received HTTP headers then did NOT contain any content-encoding information: since the server deemed that the client does not accept gzip-encoded content, it returned the original file instead of a compressed one, in my case the CSV file itself.

Conclusion & Hints

Using HTTP compression is one of the best practices for speeding up a web site; it is usually worth compressing files that:

  1. Don't change frequently.
  2. Have a significant compression rate, such as HTML, XML, CSV, etc.
  3. Are dynamically generated and the server CPU has capacity to spare.

Please be aware that you should NOT compress JPG, PNG, FLV, XAP and similar image/media files: they are already compressed, so compressing them again wastes CPU resources and only saves a few KB:)

Investigation on XUL

Due to a work requirement I spent several hours on Mozilla XUL. As I have many years of experience with HTML, XML, CSS and JavaScript, the learning process was not very painful. I record my effort in this post, FMFI (for my future information~~Smile).

Definition

XUL (pronounced /ˈzuːl/ "zool"), the XML User Interface Language, is an XML user interface markup language developed by the Mozilla project. XUL operates in Mozilla cross-platform applications such as Firefox and Flock. The Mozilla Gecko layout engine provides an implementation of XUL used in the Firefox browser.[1]

I learned mainly from the Mozilla MDC, and I have summarized useful links below:

Wiki page: http://en.wikipedia.org/wiki/XUL
XULRunner MDC https://developer.mozilla.org/en/XULRunner
Getting started with XULRunner https://developer.mozilla.org/en/Getting_started_with_XULRunner
XUL References https://developer.mozilla.org/en/XUL_Reference
Debugging a XULRunner Application https://developer.mozilla.org/en/Debugging_a_XULRunner_Application

Hello World step by step

I am using Windows 7 Ultimate 64-bit, so I take Windows as the example; XUL itself is definitely cross-platform (Mac, Linux).

  1. Download XULRunner for Windows from here: http://releases.mozilla.org/pub/mozilla.org/xulrunner/releases/
  2. Unzip the package anywhere you want; I use "%UserProfile%\Desktop\XUL\XULRunner" as an example.
  3. Follow this Guide and download "myapp" under %UserProfile%\Desktop\XUL, screenshot below:
    XulMyapp
  4. Use any text editor to open (and edit) "%UserProfile%\Desktop\XUL\myapp\chrome\content\main.xul" and "%UserProfile%\Desktop\XUL\myapp\chrome\content\main.js".
  5. Run XULRunner with a parameter pointing to your application.ini to launch this XUL instance; from the %UserProfile%\Desktop\XUL folder the command should look something like "XULRunner\xulrunner.exe myapp\application.ini".

    Invoke XULRunner from command line

All done. My simple demo code is shown below:

Main.xul:

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<?xml-stylesheet href="style.css" type="text/css"?>

<window id="main" title="Login Demo" width="400" height="300" xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <script type="application/javascript" src="chrome://myapp/content/main.js"/>

  <caption label="Login Demo"/>
  <vbox>
    <lable value="User Name: "/>
    <textbox id="txtUid" maxwidth="400" maxlength="10" />
    <separator/>
    <lable value="Password: "/>
    <textbox type="password" id="txtPwd" maxlength="10" />
    <separator/>
    <button id="btnLogin" label="Login" oncommand="doLogin();" width="300" />
    <separator/>
    <label id="lbl" value=""  />
  </vbox>

  <separator />
</window>

main.js

function $(id) {
    return document.getElementById(id);
}

function doLogin() {
    var lbl = $("lbl");
    lbl.value = "Your user name: " + $("txtUid").value + ", your password: " + $("txtPwd").value;
}

My XUL Demo


Happy coding:)