Monday, 2 June 2014

Determine What Country A Website User Is In

One of my other concerns is a website which sells MP3s of original children's music.

These sell all over the world, so it is important that prices are displayed in local currency if possible. The easiest way to do that is to work out the user's country from their IP address. When the site was originally created, it was built using HTML and JavaScript, with only a very small amount of PHP to access some tables of IP ranges and the appropriate country. When the website was rebuilt around a MySQL database, this method of identifying the user's country remained, because it worked.

But these things are ever evolving, and the IP ranges I was using quickly went out of date. I have been aware for some time that I need to find a way of automatically making sure that the IP ranges are up to date.

After a bit of looking around, I have found that an up-to-date CSV file of IP ranges can be downloaded from http://software77.net/geo-ip/. The plan is to download the CSV file automatically on a regular basis, then rebuild the existing PHP arrays.

This can all be accomplished with a simple bash script, run from cron once a week. The IP file is donationware, so I have arranged a regular $5 monthly payment for the privilege, which seems fair.

The first step is to download the file. This can be accomplished with a simple wget:
wget "http://software77.net/geo-ip/?DL=1" -O ./IpToCountry.csv.gz


The file is compressed, so it needs to be uncompressed. I also extract only the fields that are required. The fields on the incoming file are IP From, IP To, Registry, Assigned, 2-Letter Country Code, 3-Letter Country Code and Country. I only need IP From, IP To and the 2-Letter Country Code.

Both of these tasks can be accomplished in a single command line:
gunzip -c IpToCountry.csv.gz  | awk -F, '!/^#/ {gsub(/"/, "", $0);print $1, $2, $5}'  > ./IpToCountry.csv


This uncompresses the file, strips off all of the comment lines, removes any inverted commas, extracts the necessary fields and creates a stripped down, custom built file that can be used as required.
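To see what the filter does without downloading anything, the awk step can be wrapped in a small function and fed a hand-written line. This is only a sketch: the function name strip_csv is made up for illustration, and the sample line in the comment uses the placeholders REGISTRY and ASSIGNED in place of real field values.

```shell
#!/bin/bash
# strip_csv (hypothetical name): reads CSV lines in the seven-field layout
# described above on stdin, and writes "IP From", "IP To" and the
# 2-letter country code, separated by spaces, to stdout.
strip_csv() {
    # !/^#/ skips comment lines; gsub removes every double quote from the
    # line, which also makes awk re-split it into fields on the commas.
    awk -F, '!/^#/ { gsub(/"/, "", $0); print $1, $2, $5 }'
}

# Example (REGISTRY and ASSIGNED are placeholders, not real data):
#   printf '%s\n' '# a comment line' \
#       '"754974720","755105791","REGISTRY","ASSIGNED","US","USA","United States"' \
#       | strip_csv
```

Only fields 1, 2 and 5 are printed, so a comma inside the country name in the last field does not affect the output.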

Next comes the rebuilding of the PHP IP arrays. These consist of 256 numbered files, starting at 0.php through to 255.php. Each of these contains a PHP script defining an array of ranges of IP addresses and the country that they belong to. The appropriate PHP script is included at run time, dependent on the first segment of the IP address.

The first step is to write the header for each script. This opens the php tag and starts to declare the array:
<?php
//-
$ranges=Array(


This header is identical for all 256 files.

Next, the arrays themselves must be created. These can be converted directly from each line in the CSV file in the format:
"IP From" => array("IP To","2-Letter Country Code")


IP From and IP To are both formatted as 32-bit integers, rather than the 4 segments traditionally recognised as an IP address. To convert a 4-segment IP address to the 32-bit integer it represents, multiply each segment by a decreasing power of 256 (256*256*256 for the first segment, down to 1 for the last) and add the results.

e.g. for IP address "1.2.3.4"
     (1*256*256*256)+(2*256*256)+(3*256)+4 = 16909060

Similarly, the IP segments can be determined by reversing the process. In this instance, we only need the first of the 4 segments, to determine which file we are writing to. This can be achieved by dividing IP From by (256*256*256), and using only the integer returned.
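The arithmetic in both directions can be sketched in bash as follows. The function names are hypothetical and not part of the finished script, and the sketch assumes a well-formed IPv4 address whose segments have no leading zeros (bash arithmetic would treat those as octal).

```shell
#!/bin/bash
# Convert a dotted IPv4 address to its 32-bit integer form.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a * 256 * 256 * 256) + (b * 256 * 256) + (c * 256) + d ))
}

# Return only the first segment of an integer address, i.e. the number
# of the .php file that the range belongs in.
first_segment() {
    echo $(( $1 / 16777216 ))    # 16777216 = 256*256*256, integer division
}

# Reverse the conversion completely: integer back to dotted form.
int_to_ip() {
    local n=$1
    echo "$(( n / 16777216 )).$(( (n / 65536) % 256 )).$(( (n / 256) % 256 )).$(( n % 256 ))"
}

# Example calls:
#   ip_to_int 1.2.3.4        # prints 16909060
#   first_segment 754974720  # prints 45
#   int_to_ip 16909060       # prints 1.2.3.4
```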

Finally, a footer is added to the end of all of the files. Like the header, this is identical for all 256 files, simply closing off the array and closing the php tag:
);
?>


The finished script looks like:
#!/bin/bash
#######################################################
#
# csv2IP.sh
#
# Douglas Milne 2 June 2014
#
# Download a csv file of IP address ranges for countries
# and convert to php arrays
#
#######################################################

# create a temporary directory to build the files
# Files are built here, then moved to the correct location on completion
# This minimizes the amount of time the files are unavailable as recreating them can take several minutes
mkdir -p ~/iptemp
cd ~/iptemp || exit 1

# Download the csv file
wget "http://software77.net/geo-ip/?DL=1" -O ./IpToCountry.csv.gz >/dev/null 2>&1
status=$?
if (( status != 0 ))
then
   echo "Error downloading csv file"
   exit 2
fi
# Uncompress the CSV file and select only the To, From and Country columns
gunzip -c IpToCountry.csv.gz  | awk -F, '!/^#/ {gsub(/"/, "", $0);print $1, $2, $5}'  > ./IpToCountry.csv

# Create a new .php file for each possible first segment of an ip address, ie 0-255, and write a header to it
for ((i=0; i<=255; i++))
do
   echo -e "<?php\n//-\n\$ranges=Array(" > "$i.php"
done

# Add the IP ranges as specified in the CSV file to the php files.
# The first segment of each ip address, and therefore the file to write to,
# is determined by the integer result of dividing the address by 16777216
while read -r ipFrom ipTo Country
do
   (( ipmsb = ipFrom / 16777216 ))
   printf '"%s" => array("%s","%s"),\n' "$ipFrom" "$ipTo" "$Country" >> "$ipmsb.php"
done < IpToCountry.csv

# Add a footer to each of the php files
# and move the files to the correct location.
for ((i=0; i<=255; i++))
do
   echo -e ");\n?>" >> "$i.php"
   mv "$i.php" ~/ip_files/
done
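Most of the run time goes on the bash loop that appends one line at a time. As a possible alternative, not what the script above does, the same entries could be written in a single awk pass. The sketch below assumes the stripped, space-separated file produced earlier; the function name write_entries and its two arguments are made up for illustration.

```shell
#!/bin/bash
# write_entries (hypothetical): $1 is the stripped file with
# "ipFrom ipTo country" on each line, $2 is the directory holding
# the numbered .php files.
write_entries() {
    awk -v dir="$2" '{
        # int() keeps only the whole part, i.e. the first segment
        file = dir "/" int($1 / 16777216) ".php"
        printf "\"%s\" => array(\"%s\",\"%s\"),\n", $1, $2, $3 >> file
        # close after each write so awk never holds 256 files open
        close(file)
    }' "$1"
}

# Example:
#   write_entries IpToCountry.csv ~/iptemp
```

The headers and footers would still be written by the two for loops, exactly as before.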


By way of example of the output, the file for all IP addresses from 45.0.0.0 to 45.255.255.255 is
$ cat 45.php
<?php
//-
$ranges=Array(
"754974720" => array("755105791","US"),
"757071872" => array("759169023","ZZ"),
"765460480" => array("767557631","UY"),
);
?>

This tells us that some of these are in the US, some are reserved (country code ZZ) and some are in Uruguay. Most of the output files are considerably larger than this, and some are smaller.
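The layout of a single file (header, one entry per range, footer) can be reproduced on its own, which is handy for checking the format by eye. This is a minimal sketch with hypothetical function names; it writes to standard output rather than to the numbered files.

```shell
#!/bin/bash
# Header: opens the PHP tag and starts the array declaration.
php_header() { printf '<?php\n//-\n$ranges=Array(\n'; }

# One array entry: "IP From" => array("IP To","country code"),
php_entry() { printf '"%s" => array("%s","%s"),\n' "$1" "$2" "$3"; }

# Footer: closes the array and the PHP tag.
php_footer() { printf ');\n?>\n'; }

# Read "ipFrom ipTo country" lines on stdin and emit one complete file.
build_php_file() {
    local ipFrom ipTo country
    php_header
    while read -r ipFrom ipTo country
    do
        php_entry "$ipFrom" "$ipTo" "$country"
    done
    php_footer
}

# Example:
#   printf '754974720 755105791 US\n' | build_php_file
```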

The script is run from cron a couple of times a week. Software77 request that a time other than right on the hour is chosen for download, so that everybody isn't trying to download at once. It's worth reading the comments in the CSV file and on their website, because breaking the rules can result in a barring.

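So how does a webpage make use of this information? The following PHP function takes the IP address of the client and returns the two-letter country code:
function iptocountry($ip) {
    // Split the dotted address into its four segments
    $numbers = explode(".", $ip);
    // Load the $ranges array for the first segment, e.g. ip_files/45.php
    include("ip_files/".$numbers[0].".php");
    // Convert the address to its 32-bit integer form
    $code = ($numbers[0] * 16777216) + ($numbers[1] * 65536) + ($numbers[2] * 256) + ($numbers[3]);
    // Stays null if no range contains the address
    $two_letter_country_code = null;
    foreach ($ranges as $key => $value) {
        // $key is IP From, $value[0] is IP To, $value[1] is the country code
        if ($key <= $code && $value[0] >= $code) {
            $two_letter_country_code = $value[1];
            break;
        }
    }
    return $two_letter_country_code;
}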

This function includes the appropriate array file, converts the IP address into a 32-bit integer, then searches through the array until it finds the range that contains that integer.
It can be called using the "REMOTE_ADDR" entry in the $_SERVER array:
$two_letter_country_code=iptocountry($_SERVER['REMOTE_ADDR']);
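For a quick check from the command line, the same range search can be run against the stripped CSV file instead of the PHP arrays. This is only a sketch for testing the data: lookup_country is a hypothetical helper, not part of the website, and it assumes a well-formed IPv4 address with no leading zeros in its segments.

```shell
#!/bin/bash
# lookup_country (hypothetical): $1 is a dotted IPv4 address; the stripped
# "ipFrom ipTo country" lines are read from stdin. Prints the country code
# of the first range containing the address, or returns non-zero if no
# range matches.
lookup_country() {
    local a b c d code
    IFS=. read -r a b c d <<< "$1"
    code=$(( a * 16777216 + b * 65536 + c * 256 + d ))
    awk -v code="$code" '
        $1 <= code && code <= $2 { print $3; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Example:
#   lookup_country 45.0.0.1 < IpToCountry.csv
```

With the three ranges from 45.php above as input, 45.0.0.1 falls in the first range and 45.160.0.1 in the last.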