IP address to geolocation

Background

Few months ago I found an interesting website: http://ipinfodb.com/, it provided API which could “translate” any IP Address into a geography location including City/Region/Country as well as latitude/longitude and time zone information, to invoke its API, a registered API key is required (which is free).  Since beforehand I stored visitor’s IP Addresses into my own database, I decided to utilize InfoDB API to store visitor’s GEO locations.

Just few days ago, I casually emitted an idea: summarize those GEO location records and display them on Google Map, hum, it is feasible:)

So, the process is: Track visitor’s IP addresses -> “Translate” them to Geography location -> Show them on Google Map!

(PS, I’ve been using Google Analytics for my Geek Place – http://WayneYe.com for more than two years, it is no double extremely powerful, and it already contains a feature “Map Overlay“, however, due to privacy policy, Google Analytics does NOT display visitor’s IP address, they explained this at: http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=86214).

Implementation

The first task I need to do is track visitor’s IP Address, most of the time, user visits a website in browser submits an HTTP GET request (an HTTP data package) based on Transmission Control Protocol (not all the time) , browser passed the ball to DNS server(s), after several times “wall passes”, the original request delivered to the designation – the web server, during the process, the original Http request was possibly transferred through a number of routers/proxies and many other stuff, the request’s header information might have been updated: Via (Standard HTTP request header) or X-Forwarded-For (non-standard header but widely used), could be the original ISP’s information/IP Address OR possibly one of the proxy’s IP Address.

So, usually the server received the request and saw Via/X-Forwarded-For header information, it got to know visitor’s IP address (NOT all the time, some times ISP’s IP address), in ASP.NET, it is simply to call Request.UserHostAddress, however, we can never simply trust this because of two major reasons:

  1. Malicious application can forge HTTP request, if you are unlucky to trust it and have it inserted into Database, then SQL Injection hole will be utilized by Malicious application.
  2. Not all the visitors are human being, part of them (sometimes majority of them, for example, my website has very few visitors per week while a number of “robot visitors”^_^) could be search engine crawlers, I must distinguish human visitors and crawlers, otherwise I would be happy to see a lot of “visitors” came from “Mountain View, CA” ^_^.

For #1: I use regular expression to validate the string I got from Request.UserHostAddress:

public static Boolean IsValidIP(string ip)
{
    if (System.Text.RegularExpressions.Regex.IsMatch(ip, "[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}"))
    {
        string[] ips = ip.Split('.');
        if (ips.Length == 4 || ips.Length == 6)
        {
            if (System.Int32.Parse(ips[0]) < 256 && System.Int32.Parse(ips[1]) < 256
                & System.Int32.Parse(ips[2]) < 256 & System.Int32.Parse(ips[3]) < 256)
                return true;
            else
                return false;
        }
        else
            return false;
    }
    else
        return false;
}

If the result is “0.0.0.0”, I will ignore it.

For #2, so far I haven’t found a “perfect way” to solve this issue (and I guess there might be no perfect solution to identify all the search engines in the world, please correct me if I am wrong); However, I’ve defined two rules to try my best to identify them for general and normal situations:

Rule #1:

Request which contains “Cookie” Header with “ASP.NET_SessionIdAND its value is equal with server side, then it should be a normal user who has just visited my website within the one session.

Notes: there might be two exceptions for rule #1,

  1. If user’s browser has disabled Cookie then this rule will NOT be effective since the client request will never contain a Cookie header since the browser disabled it.
  2. Assume there is a crawler who crawls my website and accept storing cookie, then #1 will not be effective. However, I don’t think a crawler will firstly request a SessionID and then request again using the same SessionID).

Rule #2:

Define a crawler list and analyses whether “User-Agent” header contains one of them, the list should be configurable. Refer more Crawler example at: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers

Talk is cheap, show me the code, I wrote a method to identify crawlers by applying two rules above.

public static Boolean IsCrawlerRequest()
{
    // Rule 1: Request which contains "Cookie" Header with "ASP.NET_SessionId" and its value is equal with server side, 
    // then it should be a normal user (except maliciously forging, I don't think a crawler will firstly request a sessionID and then request again with the SessionID).
    //if (HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null
    //    && HttpContext.Current.Request.Cookies["ASP.NET_SessionId"].Value == HttpContext.Current.Session.SessionID)
    if (HttpContext.Current.Request.Headers["Cookie"] != null
        && HttpContext.Current.Request.Headers["Cookie"].Contains("ASP.NET_SessionId"))
        return false;  // Should be a normal user browsing my website using a browser.

    // Rule 2: define a crawler list and analyses whether "User-Agent" header contains one of them, this should be configurable
    // Refer more Crawler example at: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers
    var crawlerList = new String[] { "google", "bing", "msn", "yahoo", "baidu", "soso", "sogou", "youdao" };

    if (!String.IsNullOrEmpty(HttpContext.Current.Request.UserAgent))
        foreach (String bot in crawlerList)
            if (HttpContext.Current.Request.UserAgent.ToLower(CultureInfo.InvariantCulture).Contains(bot))
                return true; // It is a crawler

    return false;
}

Please be aware that I commented out HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null, since I found that Request.Cookie will ALWAYS contain “ASP.NET_SessionId” EVENT IF the browser disabled Cookie storing, I will do further investigation and double check later!

Ok, now we get normal users’ IP Addresses and filtered search engine crawlers, the next step is invoking InfoDB API to “translate” IP Address to Geolocation, you need register an API KEY here, and then submit an HTTP GET request to:

http://api.ipinfodb.com/v2/ip_query.php?key=%5BAPI KEY]&ip=[IP Address]&timezone=false

It returns XML below, I take IP=”117.136.8.14″ for example:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Status>OK</Status>
  <CountryCode>CN</CountryCode>
  <CountryName>China</CountryName>
  <RegionCode>23</RegionCode>
  <RegionName>Shanghai</RegionName>
  <City>Shanghai</City>
  <ZipPostalCode></ZipPostalCode>
  <Latitude>31.005</Latitude>
  <Longitude>121.409</Longitude>
  <Timezone>0</Timezone>
  <Gmtoffset>0</Gmtoffset>
  <Dstoffset>0</Dstoffset>
  <TimezoneName></TimezoneName>
  <Isdst></Isdst>
  <Ip>117.136.8.14</Ip>
</Response>

Wow, looks precise:), I am going to show visitor’s geolocation on Google Map (I know this compromises visitor’s privacy but my personal blog http://WayneYe.com is not a company and I will NEVER earn a cent by doing this:)).

Anyway, I use the latest Google Map JavaScript API V3, and there are two major functionalities:

1. Display visitor’s Geolocation as long as user’s browser support “navigator.geolocation” property (Google Chrome, Mozilla Filefox support it, IE not support), default location will be set to New York City if the browser does not support this W3 recommended standard, a sample below (I used Google Chrome):

VisitorInfo

I am now living in Shanghai

2. Display a specified blog’s visitors’ geolocations on Google Map, screenshot below shows the visitors’ geolocations who visited my blog: My new Dev box – HP Z800 Workstation, by clicking each geolocation, it will show on Google Map.

Visitors of <My new Dev box - HP Z800 Workstation>

Visitors of <My new Dev box - HP Z800 Workstation>

The JavaScript code showing below:

<script type="text/javascript">
    var initialLocation;
    var newyork = new google.maps.LatLng(40.69847032728747, -73.9514422416687);
    var browserSupportFlag = new Boolean();
    var map;
    var myOptions
    var infowindow = new google.maps.InfoWindow();

    function initialize() {
        myOptions = {
            zoom: 6,
            mapTypeId: google.maps.MapTypeId.ROADMAP
        };
        map = new google.maps.Map(document.getElementById("googleMapContainer"), myOptions);

        // Try W3C Geolocation (Preferred)
        if (navigator.geolocation) {
            browserSupportFlag = true;
            navigator.geolocation.getCurrentPosition(function (position) {
                map.setCenter(new google.maps.LatLng(position.coords.latitude, position.coords.longitude));
                infowindow.setContent('Hi, dear WayneYe.com visitor! You are here:)');
                infowindow.setPosition(new google.maps.LatLng(position.coords.latitude, position.coords.longitude));
                infowindow.open(map);
            }, function () {
                handleNoGeolocation(browserSupportFlag);
            });
        } else {
            browserSupportFlag = false;
            handleNoGeolocation(browserSupportFlag);
        }

        function handleNoGeolocation(errorFlag) {
            //contentString = 'Cannot track your location, default to New York City.';

            map.setCenter(newyork);
            //infowindow.setContent(contentString);
            //infowindow.setPosition(newyork);
            infowindow.open(map);
        }
    }

    function setGoogleMapLocation(geoLocation, latitude, longitude) {
        contentString = geoLocation;

        var visitorLocation = new google.maps.LatLng(latitude, longitude);

        map.setCenter(visitorLocation);
        infowindow.setContent(contentString);
        infowindow.setPosition(visitorLocation);
        infowindow.open(map);
    }
</script>

My visitor record page is: http://wayneye.com/VisitRecord.

Advertisements

Improve ASP.NET website performance by enabling compression in IIS

GZIP format is developed by GNU Project and standardized by IETF in RFC 1952, which MUST be considered by web developers to improve their websites’ performance, there are several Quintessential articles documented using gzip compression, they are:

10 Tips for Writing High-Performance Web Applications
Best Practices for Speeding Up Your Web Site
How To Optimize Your Site With GZIP Compression
IIS 7 Compression. Good? Bad? How much?

A gzip compressed HTTP package can significantly save bandwidth thus speed up browser rendering after use hitting enter, and user experience got improved finally, nowadays most of the popular browsers such as IE, Firefox, Chrome, Opera support gzip encoded content (please refer: http://en.wikipedia.org/wiki/HTTP_compression).

PS: the other compression encoding is deflate, “but it’s less effective and less popular” (refer: http://developer.yahoo.com/performance/rules.html).

Yahoo uses gzip compression and suggest developers do that:

Compression in IIS

For ASP.NET developer who host website on IIS like me, to achieve this is fairly easy, open IIS manager and select your website, then go to Compression Module. (Apache admins refer here)

IISCompressionModule

Double click and you will see:

IIS supports two kinds of compression:

  • Static Compression
    IIS compress a specific file at first time and only the first time, afterward every time IIS receive request on this file it will return the compressed data.  This is usually used on the files not frequently changing such as a static html file, an rarely changed XML, a Word document or any file doesn’t change frequently.
  • Dynamic Compress
    IIS will do compression EVERY time a client’s request on one specific file, this usually used on some content that often changes, for example, there is a large CSV file generating engine located on the server back end, and suppose to transfer to the client side, we can use dynamica compress.

Scott Forsyth’s pointed out:”Compression is a trade-off of CPU for Bandwidth.”, one of the new feature in IIS 7 is web masters can customize IIS compression strategy based on the actual need, you can modify the applicationHost.config under “%windir%\System32\inetsrv\config”, below is my sample:

<httpCompression staticCompressionEnableCpuUsage="80" dynamicCompressionDisableCpuUsage="80" directory="%SystemDrive%\inetpub\temp\IIS Temporary Compressed Files">
	<scheme name="gzip" dll="%Windir%\system32\inetsrv\gzip.dll" />
	<staticTypes>
		<add mimeType="text/*" enabled="true" />
		<add mimeType="message/*" enabled="true" />
		<add mimeType="application/x-javascript" enabled="true" />
		<add mimeType="application/atom+xml" enabled="true" />
		<add mimeType="application/xaml+xml" enabled="true" />
		<add mimeType="*/*" enabled="false" />
	</staticTypes>
	<dynamicTypes>
		<add mimeType="text/*" enabled="true" />
		<add mimeType="message/*" enabled="true" />
		<add mimeType="application/x-javascript" enabled="true" />
		<add mimeType="*/*" enabled="false" />
	</dynamicTypes>
</httpCompression>

More detailed is described here: http://www.iis.net/ConfigReference/system.webServer/httpCompression

PS, Dynamic Compress is not installed by default, we need install it by turning on Windows Features:

My server host provider uses IIS 6.0 and does NOT enabling compression, I checked compression status for http://wayneye.com by using the free tool provided by port80software and the result is:

WOW, I must convince them to enable gzip compression!!

Programmatically compression using C#

We can also programmatically compress the ASP.NET http response, for example I want to transfer a large CSVfile to the client, a simple ASPX page named ReturnGzipPackage.aspx:

protected void Page_Load(object sender, EventArgs e)
{
    Response.Headers.Add("Content-Disposition", "attachment; filename=IAmLarge.csv");
    Response.ContentType = "text/csv";
    Response.TransmitFile("D:\\IAmLarge.csv");
    Response.End();
}

If the request was submitted by a client browser, browser will automatically decompress the Http package,  but in the Windows client application or Windows Service, we developers can also adopt gzip compression to save bandwidth, once received gzip Http package from the server, we can programmatically decompressing, I wrote a client console application to submit the Http request and receive/decompress the Http response package.

/* Submit Http request to server with Accept-Encoding: gzip */
WebClient client = new WebClient();
// It is mandatory by the client, if this header information is not specified, server will return the original content type, in my case, it is: text/csv
//client.Headers.Add("Accept-Encoding", "gzip");

using(Stream gzipData = client.OpenRead("http://localhost/StudyASPNET/gzipHttp/ReturnGzipPackage.aspx"))
{
    WebHeaderCollection responseHeaders = client.ResponseHeaders;

    using(GZipStream gzip = new GZipStream(gzipData, CompressionMode.Decompress))
    {
        using(StreamReader reader= new StreamReader(gzip))
        {
            String content = reader.ReadToEnd();
            File.WriteAllText("D:\\Downloaded.csv", content);
        }
    }
}

Please be aware of one thing: “Accept-Encoding: gzip” is supported by all browsers by default, i.e. browser will automatically decompresses compressed Http package, so in the code we MUST explicitly specify “Accept-Encoding: gzip“, below is what I investigated:

First time, I explicitly set “Accept-Encoding: gzip”, the Http response header contains “Content-Encoding: gzip“, and the depression/file saving operations complete without any issues.

Second time, I commented out the code, the result is, received Http headers does NOT contain content encoding information, since the server deemed you doesn’t accept gzip encoded content, it won’t return you compressed file, instead, it returned the original file, in my case, the csv file itself.

Conclusion & Hints

Using Http Compression is one of the best practice to speed up the web sites, usually consider compress files like below:

  1. Doesn’t change frequently.
  2. Has a significant compress-rate such as Html, XML, CSV, etc.
  3. Dynamically generated and the server CPU has availability.

Please be aware that do NOT compress JPG, PNG, FLV, XAP and those kind of image/media files, since they are already compressed, compress them will waste CPU resources and you got a compressed copy with few KBs reduced:)