IP address to geolocation

Background

A few months ago I found an interesting website, http://ipinfodb.com/. It provides an API that can “translate” any IP address into a geographic location, including city/region/country as well as latitude/longitude and time zone information. Invoking the API requires a registered API key (which is free). Since I was already storing visitors’ IP addresses in my own database, I decided to use the IPInfoDB API to store visitors’ geolocations as well.

Just a few days ago an idea came to me: summarize those geolocation records and display them on Google Maps. Hmm, it is feasible :)

So, the process is: track visitors’ IP addresses -> “translate” them into geographic locations -> show them on Google Maps!

(PS: I’ve been using Google Analytics for my Geek Place, http://WayneYe.com, for more than two years. It is no doubt extremely powerful, and it already contains a “Map Overlay” feature; however, due to its privacy policy, Google Analytics does NOT display visitors’ IP addresses, as explained at: http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=86214.)

Implementation

The first task is to track each visitor’s IP address. Most of the time, a user visiting a website in a browser submits an HTTP GET request (an HTTP data package) over the Transmission Control Protocol. The browser first asks DNS server(s) to resolve the host name, and after several “wall passes” the request is delivered to its destination, the web server. Along the way the original HTTP request may pass through a number of routers, proxies and other intermediaries, and its header information may be updated: Via (a standard HTTP request header) or X-Forwarded-For (a non-standard but widely used header). The address these headers carry could be the original client’s IP address, the ISP’s, OR possibly that of one of the proxies.
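To make the header discussion concrete, here is a minimal JavaScript sketch of how a script might split an X-Forwarded-For value into the addresses it lists. The function name parseForwardedFor is made up for illustration, and the sketch assumes the common comma-separated "client, proxy1, proxy2" layout. Any client can forge this header, so the result is only a hint and still needs validating.

```javascript
// Hypothetical helper (not part of any library): split an X-Forwarded-For
// header value into the list of addresses it claims the request passed through.
// Assumes the usual "client, proxy1, proxy2" comma-separated layout.
function parseForwardedFor(headerValue) {
    if (typeof headerValue !== "string") {
        return [];
    }
    return headerValue
        .split(",")
        .map(function (part) { return part.trim(); })
        .filter(function (part) { return part.length > 0; });
}

// The leftmost entry is the address the first proxy reported for the client.
// It is untrusted input and still has to be validated before use.
var hops = parseForwardedFor("117.136.8.14, 10.0.0.1");
console.log(hops[0]);
```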

So when the server receives the request and looks at the connection and the Via/X-Forwarded-For header information, it can usually work out the visitor’s IP address (NOT all the time; sometimes it is the ISP’s or a proxy’s address). In ASP.NET, it is as simple as calling Request.UserHostAddress, which returns the address of the remote host that connected to the server. However, we can never simply trust this value, for two major reasons:

  1. A malicious application can forge an HTTP request. If you are unlucky enough to trust the value and insert it into the database as-is, a SQL injection hole can be exploited by that application.
  2. Not all visitors are human beings. Part of them (sometimes the majority; my website, for example, has very few human visitors per week but a number of “robot visitors” ^_^) could be search engine crawlers. I must distinguish human visitors from crawlers, otherwise I would be happy to see a lot of “visitors” coming from “Mountain View, CA” ^_^.

For #1: I use a regular expression to validate the string I get from Request.UserHostAddress:

public static Boolean IsValidIP(string ip)
{
    // Anchor the pattern with \A and \z so the WHOLE string must be a dotted quad;
    // an unanchored pattern would also accept "1.2.3.4" followed by arbitrary text.
    if (ip == null || !System.Text.RegularExpressions.Regex.IsMatch(ip, @"\A[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\z"))
        return false;

    // Each of the four octets must be in the range 0-255.
    foreach (string octet in ip.Split('.'))
        if (System.Int32.Parse(octet) > 255)
            return false;

    return true;
}

If the result is “0.0.0.0”, I will ignore it.

For #2, so far I haven’t found a “perfect way” to solve this issue (and I guess there may be no perfect solution for identifying all the search engines in the world; please correct me if I am wrong). However, I’ve defined two rules to do my best to identify them in general, normal situations:

Rule #1:

A request which contains a “Cookie” header with “ASP.NET_SessionId” AND whose value is equal to the one on the server side should come from a normal user who has just visited my website within the same session.

Notes: there might be two exceptions to rule #1:

  1. If the user’s browser has cookies disabled, this rule will NOT be effective, since the client request will never contain a Cookie header.
  2. If a crawler that crawls my website accepts and stores cookies, rule #1 will not be effective either. (However, I don’t think a crawler will first request a SessionID and then request again using the same SessionID.)

Rule #2:

Define a crawler list and analyze whether the “User-Agent” header contains one of its entries; the list should be configurable. For more crawler examples, refer to: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers

Talk is cheap, show me the code: I wrote a method to identify crawlers by applying the two rules above.

public static Boolean IsCrawlerRequest()
{
    // Rule 1: a request which contains a "Cookie" header with "ASP.NET_SessionId" whose value equals the server-side one
    // should be a normal user (except malicious forging; I don't think a crawler will first request a SessionID and then request again with it).
    // The value comparison is commented out for now, so only the presence of the cookie name in the raw header is checked.
    //if (HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null
    //    && HttpContext.Current.Request.Cookies["ASP.NET_SessionId"].Value == HttpContext.Current.Session.SessionID)
    if (HttpContext.Current.Request.Headers["Cookie"] != null
        && HttpContext.Current.Request.Headers["Cookie"].Contains("ASP.NET_SessionId"))
        return false;  // Should be a normal user browsing my website using a browser.

    // Rule 2: define a crawler list and check whether the "User-Agent" header contains one of its entries; this should be configurable.
    // For more crawler examples, refer to: http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers
    var crawlerList = new String[] { "google", "bing", "msn", "yahoo", "baidu", "soso", "sogou", "youdao" };

    if (!String.IsNullOrEmpty(HttpContext.Current.Request.UserAgent))
        foreach (String bot in crawlerList)
            if (HttpContext.Current.Request.UserAgent.ToLower(CultureInfo.InvariantCulture).Contains(bot))
                return true; // It is a crawler

    return false;
}

Please be aware that I commented out the HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null check, since I found that Request.Cookies will ALWAYS contain “ASP.NET_SessionId” EVEN IF the browser has disabled cookie storing. I will investigate further and double check later!

OK, now that we have normal users’ IP addresses and have filtered out search engine crawlers, the next step is invoking the IPInfoDB API to “translate” an IP address into a geolocation. You need to register for an API key on the IPInfoDB website, and then submit an HTTP GET request to:

http://api.ipinfodb.com/v2/ip_query.php?key=[API KEY]&ip=[IP Address]&timezone=false
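As a small illustration, the request URL can be assembled like this in JavaScript. The helper name buildIpQueryUrl and the placeholder key "YOUR_API_KEY" are made up for this sketch; only the endpoint and the key, ip and timezone parameters come from the URL above.

```javascript
// Hypothetical helper: build the ip_query.php request URL shown above.
// apiKey and ip are supplied by the caller; both are URL-encoded defensively.
function buildIpQueryUrl(apiKey, ip) {
    return "http://api.ipinfodb.com/v2/ip_query.php" +
        "?key=" + encodeURIComponent(apiKey) +
        "&ip=" + encodeURIComponent(ip) +
        "&timezone=false";
}

console.log(buildIpQueryUrl("YOUR_API_KEY", "117.136.8.14"));
```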

It returns the XML below; I take IP = “117.136.8.14” as an example:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Status>OK</Status>
  <CountryCode>CN</CountryCode>
  <CountryName>China</CountryName>
  <RegionCode>23</RegionCode>
  <RegionName>Shanghai</RegionName>
  <City>Shanghai</City>
  <ZipPostalCode></ZipPostalCode>
  <Latitude>31.005</Latitude>
  <Longitude>121.409</Longitude>
  <Timezone>0</Timezone>
  <Gmtoffset>0</Gmtoffset>
  <Dstoffset>0</Dstoffset>
  <TimezoneName></TimezoneName>
  <Isdst></Isdst>
  <Ip>117.136.8.14</Ip>
</Response>
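A response like the one above can be turned into a plain object with a few lines of JavaScript. This is only a sketch: the helper names readXmlField and parseIpQueryResponse are invented here, the regular-expression approach works only because the response is flat (no nesting, no attributes), and treating any Status other than “OK” as a failure is my assumption based on the sample.

```javascript
// Hypothetical helper: read one element's text out of the flat XML response.
// A regular expression is enough here only because the response has no nesting
// or attributes; a real XML parser is the safer choice in general.
function readXmlField(xml, tagName) {
    var match = new RegExp("<" + tagName + ">([^<]*)</" + tagName + ">").exec(xml);
    return match ? match[1] : null;
}

// Assumption: a Status other than "OK" means the lookup failed.
function parseIpQueryResponse(xml) {
    if (readXmlField(xml, "Status") !== "OK") {
        return null;
    }
    return {
        ip: readXmlField(xml, "Ip"),
        country: readXmlField(xml, "CountryName"),
        region: readXmlField(xml, "RegionName"),
        city: readXmlField(xml, "City"),
        latitude: parseFloat(readXmlField(xml, "Latitude")),
        longitude: parseFloat(readXmlField(xml, "Longitude"))
    };
}

var sample = "<Response><Status>OK</Status><CountryName>China</CountryName>" +
    "<RegionName>Shanghai</RegionName><City>Shanghai</City>" +
    "<Latitude>31.005</Latitude><Longitude>121.409</Longitude>" +
    "<Ip>117.136.8.14</Ip></Response>";
console.log(parseIpQueryResponse(sample));
```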

Wow, looks precise :) I am going to show visitors’ geolocations on Google Maps (I know this compromises visitors’ privacy, but my personal blog http://WayneYe.com is not a company and I will NEVER earn a cent by doing this :)).

Anyway, I use the latest Google Maps JavaScript API V3, and there are two major functionalities:

1. Display the visitor’s geolocation as long as the user’s browser supports the “navigator.geolocation” property (Google Chrome and Mozilla Firefox support it; IE does not). The default location is set to New York City if the browser does not support this W3C-recommended standard. A sample is below (I used Google Chrome):

(Screenshot “VisitorInfo”: I am now living in Shanghai)

2. Display a specified blog post’s visitors’ geolocations on Google Maps. The screenshot below shows the geolocations of the visitors who visited my post “My new Dev box – HP Z800 Workstation”; clicking a geolocation shows it on Google Maps.

Visitors of <My new Dev box - HP Z800 Workstation>

The JavaScript code is shown below:

<script type="text/javascript">
    var initialLocation;
    var newyork = new google.maps.LatLng(40.69847032728747, -73.9514422416687);
    var browserSupportFlag = false; // a plain boolean; a "new Boolean()" wrapper object is always truthy
    var map;
    var myOptions;
    var infowindow = new google.maps.InfoWindow();

    function initialize() {
        myOptions = {
            zoom: 6,
            mapTypeId: google.maps.MapTypeId.ROADMAP
        };
        map = new google.maps.Map(document.getElementById("googleMapContainer"), myOptions);

        // Try W3C Geolocation (Preferred)
        if (navigator.geolocation) {
            browserSupportFlag = true;
            navigator.geolocation.getCurrentPosition(function (position) {
                map.setCenter(new google.maps.LatLng(position.coords.latitude, position.coords.longitude));
                infowindow.setContent('Hi, dear WayneYe.com visitor! You are here:)');
                infowindow.setPosition(new google.maps.LatLng(position.coords.latitude, position.coords.longitude));
                infowindow.open(map);
            }, function () {
                handleNoGeolocation(browserSupportFlag);
            });
        } else {
            browserSupportFlag = false;
            handleNoGeolocation(browserSupportFlag);
        }

        function handleNoGeolocation(errorFlag) {
            //contentString = 'Cannot track your location, default to New York City.';

            map.setCenter(newyork);
            //infowindow.setContent(contentString);
            //infowindow.setPosition(newyork);
            infowindow.open(map);
        }
    }

    function setGoogleMapLocation(geoLocation, latitude, longitude) {
        var contentString = geoLocation; // declared locally to avoid creating an implicit global

        var visitorLocation = new google.maps.LatLng(latitude, longitude);

        map.setCenter(visitorLocation);
        infowindow.setContent(contentString);
        infowindow.setPosition(visitorLocation);
        infowindow.open(map);
    }
</script>
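The “summarize” step mentioned in the background could look roughly like the sketch below, which groups visit records by coordinates and counts the visits per location. The record shape ({ geoLocation, latitude, longitude }) and the function name summarizeByLocation are assumptions made for illustration; they are not taken from my actual database schema.

```javascript
// Hypothetical record shape:
//   { geoLocation: "Shanghai, China", latitude: 31.005, longitude: 121.409 }
// Groups records that share the same coordinates and counts the visits.
function summarizeByLocation(records) {
    var byKey = {};
    var summary = [];
    records.forEach(function (record) {
        var key = record.latitude + "," + record.longitude;
        if (!byKey[key]) {
            byKey[key] = {
                geoLocation: record.geoLocation,
                latitude: record.latitude,
                longitude: record.longitude,
                visits: 0
            };
            summary.push(byKey[key]);
        }
        byKey[key].visits += 1;
    });
    return summary;
}

var visits = [
    { geoLocation: "Shanghai, China", latitude: 31.005, longitude: 121.409 },
    { geoLocation: "New York, United States", latitude: 40.698, longitude: -73.951 },
    { geoLocation: "Shanghai, China", latitude: 31.005, longitude: 121.409 }
];
console.log(summarizeByLocation(visits));
```

Each summary entry can then be handed to setGoogleMapLocation, for example with entry.geoLocation + " (" + entry.visits + " visits)" as the info window text.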

My visitor record page is: http://wayneye.com/VisitRecord.