Wednesday, April 6, 2016

How To Map User Location with GeoIP and ELK (Elasticsearch, Logstash, and Kibana)

Introduction

IP geolocation, the process of determining the physical location of an IP address, can be used for a variety of purposes, such as content personalization and traffic analysis. Traffic analysis by geolocation can provide valuable insight into your user base because it lets you see where your users are coming from, which can help you make informed decisions about the ideal geographical location(s) of your application servers and about who your current audience is. In this tutorial, we will show you how to create a visual geo-mapping of the IP addresses of your application's users by using a GeoIP database with Elasticsearch, Logstash, and Kibana.

Here's a short explanation of how it all works. Logstash uses a GeoIP database to convert IP addresses into latitude and longitude coordinate pairs, i.e. the approximate physical location of each IP address. The coordinate data is stored in Elasticsearch in geo_point fields, and also converted into a geohash string. Kibana can then read the geohash strings and draw them as points on a map of the Earth, known in Kibana 4 as a Tile Map visualization.
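To make the geohash step less abstract, here is a minimal sketch of the standard geohash encoding, written as a shell function around awk. Elasticsearch performs this conversion itself, so you never need to run this; the function name geohash and its argument order are illustrative assumptions, not part of the ELK stack.

```shell
# Illustrative only: encode a latitude/longitude pair as a geohash string.
# Usage: geohash LAT LON [LENGTH]   (LENGTH defaults to 7 characters)
geohash() {
  awk -v lat="$1" -v lon="$2" -v len="${3:-7}" 'BEGIN {
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # standard geohash alphabet
    latlo = -90; lathi = 90; lonlo = -180; lonhi = 180
    even = 1; bits = 0; ch = 0; out = ""
    while (length(out) < len) {
      if (even) {
        # Even-numbered bits halve the longitude range
        mid = (lonlo + lonhi) / 2
        if (lon >= mid) { ch = ch * 2 + 1; lonlo = mid } else { ch = ch * 2; lonhi = mid }
      } else {
        # Odd-numbered bits halve the latitude range
        mid = (latlo + lathi) / 2
        if (lat >= mid) { ch = ch * 2 + 1; latlo = mid } else { ch = ch * 2; lathi = mid }
      }
      even = !even
      # Every 5 bits become one base32 character
      if (++bits == 5) { out = out substr(base32, ch + 1, 1); bits = 0; ch = 0 }
    }
    print out
  }'
}
```

Each additional character narrows the cell that contains the point, which is why a longer geohash describes a more precise location.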
Let's take a look at the prerequisites now.

Prerequisites

To follow this tutorial, you must have a working ELK stack. Additionally, you must have logs that contain IP addresses that can be filtered into a field, like web server access logs. If you don't already have these two things, you can follow the first two tutorials in this series. The first tutorial will set up an ELK stack, and the second one will show you how to gather and filter Nginx or Apache access logs.

Add geo_point Mapping to Filebeat Index

Assuming you followed the prerequisite tutorials, you have already done this. However, we are including this step again in case you skipped it, because the Tile Map visualization requires that your GeoIP coordinates are stored in Elasticsearch as a geo_point type.
On the server where Elasticsearch is installed, download the Filebeat index template to your home directory:
  • cd ~
  • curl -O https://gist.githubusercontent.com/thisismitch/3429023e8438cc25b86c/raw/d8c479e2a1adcea8b1fe86570e42abab0f10f364/filebeat-index-template.json
Then load the template with this command:
  • curl -XPUT 'http://localhost:9200/_template/filebeat?pretty' -d@filebeat-index-template.json
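If you want to confirm that the template was loaded with the geo_point mapping, one possible check is sketched below. The helper name has_geo_point is hypothetical; it only scans whatever JSON it receives on standard input, so it assumes you pipe the template into it.

```shell
# Hypothetical helper: reads an index template (JSON) on stdin and
# succeeds only if it declares a field of type geo_point.
has_geo_point() {
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"geo_point"'
}

# Usage (assumes Elasticsearch listens on localhost:9200, as in this tutorial):
#   curl -s 'http://localhost:9200/_template/filebeat?pretty' | has_geo_point \
#     && echo "geo_point mapping present"
```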

Download Latest GeoIP Database

MaxMind provides free and paid GeoIP databases—the paid versions are more accurate. Logstash also ships with a copy of the free GeoIP City database, GeoLite City. In this tutorial, we will download the latest GeoLite City database, but feel free to use a different GeoIP database if you wish.
Let's download the latest GeoLite City database gzip archive into the /etc/logstash directory. Do so by running these commands:
  • cd /etc/logstash
  • sudo curl -O "http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz"
Now let's unarchive it:
  • sudo gunzip GeoLiteCity.dat.gz
This will extract the GeoLite City database to /etc/logstash/GeoLiteCity.dat, which we will specify in our Logstash configuration.
Note that the GeoLite databases are updated by MaxMind on the first Tuesday of each month. Therefore, if you want to always have the latest database, you should set up a cron job that will download the database once a month.
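As a rough sketch of such a monthly job, the function below downloads and decompresses the database into a temporary file and only replaces the existing copy if that succeeds. The function name update_geoip_db, the script path in the cron comment, and the schedule are all assumptions for illustration.

```shell
# Hypothetical helper: fetch a gzipped GeoIP database and install it safely.
# Usage: update_geoip_db URL DEST_FILE
update_geoip_db() {
  url=$1
  dest=$2
  # Work in a temp file next to the destination so a failed download
  # never clobbers the database that Logstash is currently using.
  tmp=$(mktemp "${dest}.XXXXXX") || return 1
  if curl -fsSL "$url" | gunzip -c > "$tmp" && [ -s "$tmp" ]; then
    chmod 644 "$tmp"    # mktemp creates the file as 0600; Logstash must be able to read it
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example call (run as root, using the URL and path from this tutorial):
#   update_geoip_db "http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz" \
#     /etc/logstash/GeoLiteCity.dat
#
# Example crontab entry, assuming the call above is saved in a script at the
# hypothetical path /usr/local/bin/update-geoip.sh. The first Tuesday always
# falls on or before the 7th, so the 8th is a safe day to run:
#   0 3 8 * * /usr/local/bin/update-geoip.sh
```

Logstash may need to be restarted before it picks up a replaced database file.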
Now we're ready to configure Logstash to use the GeoIP database.

Configure Logstash to use GeoIP

To get Logstash to store GeoIP coordinates, you need to identify an application that generates logs that contain a public IP address that you can filter as a discrete field. A fairly ubiquitous application that generates logs with this information is a web server, such as Nginx or Apache, so we will use Nginx access logs as the example. If you're using different logs, just make the necessary adjustments to the example.
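Because only public addresses can be geolocated, it can help to check what kind of client addresses your logs actually contain. The sketch below is one way to flag private, loopback, and link-local IPv4 addresses; the function name is_private_ipv4 is hypothetical, and the usage comment assumes the client IP is the first field of each line in /var/log/nginx/access.log.

```shell
# Hypothetical helper: succeeds if the IPv4 address is in a private,
# loopback, or link-local range that a GeoIP database cannot locate.
is_private_ipv4() {
  case "$1" in
    10.*|127.*|192.168.*|169.254.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes the client IP is the first field of each access log line):
#   awk '{print $1}' /var/log/nginx/access.log | sort -u | while read -r ip; do
#     is_private_ipv4 "$ip" && echo "$ip is not a public address"
#   done
```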
In the Adding Filters to Logstash tutorial, the Nginx filter is stored in a file called 11-nginx-filter.conf. If your filter is located elsewhere, just edit that file instead.
Let's edit the Nginx filter now:
  • sudo vi /etc/logstash/conf.d/11-nginx-filter.conf
Under the grok section (in the if [type]... block), add these lines:
11-nginx-filter.conf excerpt
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float"]
    }
This configures the geoip filter to look up the IP address stored in the clientip field (specified in source) in the GeoLite City database that we downloaded earlier, and to store the results under the geoip field (specified in target). The two add_field lines build a [geoip][coordinates] array in longitude, latitude order, and the mutate block converts those values to floats. We are specifying the source as "clientip" because that is the name of the field that the Nginx user IP address is being stored in, so be sure to change this value if you are storing the IP address information in a different field.
Just to be clear about what the filter should look like after you add the GeoIP lines, here are the contents of the complete 11-nginx-filter.conf file:
11-nginx-filter.conf — updated
filter {
  if [type] == "nginx-access" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float"]
    }
  }
}
Save and exit.
To put the changes into effect, let's restart Logstash:
  • sudo service logstash restart
If everything was configured correctly, Logstash should now be storing the GeoIP coordinates with your Nginx access logs (or whichever application is generating the logs). Note that this change is not retroactive, so your previously gathered logs will not have GeoIP information added.
Let's verify that the GeoIP functionality is working properly in Kibana.

Connect to Kibana

The easiest way to verify if Logstash was configured correctly, with GeoIP enabled, is to open Kibana in a web browser. Do that now.
Find a log message that your application generated since you enabled the GeoIP module in Logstash. Following the Nginx example, we can search Kibana for type: "nginx-access" to narrow the log selection.
Then expand one of the messages to look at the table of fields. You should see some new geoip fields that contain information about how the IP address was mapped to a real geographical location. For example:
[Screenshot: Example GeoIP Fields]
Note: If you don't see any logs, generate some by accessing your application, and ensure that your time filter is set to a recent time. If you don't see any GeoIP information (or if it's incorrect), you probably did not configure Logstash properly.
If you see proper GeoIP information in this view, you are ready to create your map visualization.
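The same check can also be made from the command line by asking Elasticsearch how many documents carry a geoip.location field. This is a sketch under a few assumptions: that your indices match filebeat-*, that Elasticsearch listens on localhost:9200, and that the helper names parse_count and geoip_doc_count (both hypothetical) suit your setup.

```shell
# Assumed Elasticsearch address; override by exporting ES_URL.
ES_URL=${ES_URL:-http://localhost:9200}

# Hypothetical helper: pull the number out of a _count response such as
# {"count":42,"_shards":{...}} read from stdin.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: count documents that have a geoip.location field.
geoip_doc_count() {
  curl -s "$ES_URL/filebeat-*/_count?q=_exists_:geoip.location" | parse_count
}

# Usage:
#   geoip_doc_count
```

A count of zero after new requests have been logged suggests that the geoip filter is not being applied.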

Create Tile Map Visualization

Note: If you haven't used Kibana visualizations yet, check out the Kibana Dashboards and Visualizations Tutorial.
To map out the IP addresses in Kibana, let's create a Tile Map visualization.
Click Visualize in the main menu.
Under Create a new visualization, select Tile map.
Under Select a search source, you may select either option. If you have a saved search that will find the log messages that you want to map, feel free to select that search.
Under Select buckets type, select Geo Coordinates.
In the Aggregation drop-down, select Geohash.
In the Field drop-down, select geoip.location.
Now click the green Apply button.
[Screenshot: Example GeoMap]
If any logs from your selection (your search and time filter) contain GeoIP information, they will be drawn on the map, as in the screenshot above.
Be sure to play with the Precision slider, and the items under view options to adjust the visualization to your liking. The Precision slider can be adjusted to modify the length of the Geohash string that is being used to map the location.
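To get a feel for what a given geohash length means, the sketch below computes the approximate size, in degrees, of the cell that a geohash of that length covers. It relies only on the standard geohash layout (5 bits per character, alternating between longitude and latitude, starting with longitude); the function name geohash_cell_deg is hypothetical.

```shell
# Illustrative only: print the width and height, in degrees, of the cell
# covered by a geohash of the given length.
# Usage: geohash_cell_deg LENGTH
geohash_cell_deg() {
  awk -v len="$1" 'BEGIN {
    bits = 5 * len
    lonbits = int((bits + 1) / 2)   # longitude gets the extra bit when bits is odd
    latbits = int(bits / 2)
    w = 360; for (i = 0; i < lonbits; i++) w /= 2
    h = 180; for (i = 0; i < latbits; i++) h /= 2
    printf "%g %g\n", w, h
  }'
}

# Usage: compare a few geohash lengths
#   for p in 1 2 3 4 5; do geohash_cell_deg "$p"; done
```

Shorter geohashes group nearby points into larger cells, while longer ones separate them.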
When you are satisfied with your visualization, be sure to save it.

Conclusion

Now that you have your GeoIP information mapped out in Kibana, you should be set. By itself, it should give you a rough idea of the geographical location of your users. It can be even more useful if you correlate it with your other logs by adding it to a dashboard.
Good luck!


