Written by Nguyễn Văn Quí (NVQ), a senior, inquisitive developer from the Safewhere team.
If you have a production system that stores its logs as flat files on multiple machines in a web farm environment, the ability to visualize that log data is very helpful. One option is to build a centralized database, change the application to write all logs to it, and then build a reporting system on top to visualize the data. In most cases, that approach requires programming work and enough time to develop it. In this post, I will share my experience with another approach in which you just need a few tools and a few hours to get what you want.
Quick start
Let’s begin with modeling your web farm:
On each web server, we already have logs stored as flat files in JSON format; each line is a JSON document.
{"LogLevel":"INFO","Type":"SYS","LogMessage":"User request 001.","EventId":"001","LogId":"9f447a60-52ef-414b-973f-84ad47b5d2fb","Timestamp":"2017-12-22T15:13:04.6041704Z"}
{"LogLevel":"INFO","Type":"SYS","LogMessage":"User request 002.","EventId":"002","LogId":"a7a838d7-7861-4c0d-ae24-80e89a19b894","Timestamp":"2017-12-23T09:21:16.0756974Z"}
{"LogLevel":"INFO","Type":"SYS","LogMessage":"User request 002.","EventId":"002","LogId":"0f565d46-884b-4c5e-9e39-38a99e94f469","Timestamp":"2017-12-23T14:51:59.4560385Z"}
What we want is to build a pie chart that shows the percentage of ‘User request 001’ versus ‘User request 002’ over all time. Of course, the requests must be counted across all web servers.
Here is what we are going to do. (We will not do this directly on production; we will test and verify everything in a local environment first.)
- Install a self-hosted Elasticsearch instance with Docker.
- Install and configure Logstash to feed logs to Elasticsearch.
- Query aggregated data via the query DSL and show it on a Google Chart.
Please remember that the versions of Logstash and Elasticsearch should match to avoid compatibility issues.
1. Install an Elasticsearch instance
You can install Elasticsearch by downloading the installer and installing it on a dedicated server; that is the traditional way. In this post, I will run the Elasticsearch instance in a Docker container, which is the easiest way to get better scalability and faster deployment. For a list of Docker hosting services, you can take a look here.
If you have not used Docker before, don’t worry: using Docker is a piece of cake. Download Docker and follow the steps to install it.
When your Docker environment is ready, pull and run the official Elasticsearch image by following the steps in this document.
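For reference, the steps boil down to something like the following. This is a sketch assuming an Elasticsearch 5.x image (pick the tag that matches your Logstash version); the http.host and transport.host settings are taken from the single-node development setup in the official Docker documentation for that release line:

docker pull docker.elastic.co/elasticsearch/elasticsearch:5.6.4
docker run -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" docker.elastic.co/elasticsearch/elasticsearch:5.6.4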
To verify that your Docker run command successfully launched an Elasticsearch instance, navigate to http://localhost:9200/_cluster/health?pretty=true and sign in with the default username/password credentials “elastic/changeme”. It should give you a “green” status.
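If you prefer the command line, the same check can be done with curl; the -u flag passes the same default credentials:

curl -u elastic:changeme "http://localhost:9200/_cluster/health?pretty=true"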
Your Elasticsearch instance is now ready to receive logs.
2. Install and configure Logstash
Download and install Logstash from https://www.elastic.co/downloads/logstash.
Note that Logstash is built on top of Java, so you need to install a Java runtime first.
After installing Logstash on your local machine, create a config file named logstash.conf with the content below.
input {
  file {
    path => "C:/web-app-001/*.log"
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    user => "elastic"
    password => "changeme"
    index => "web-app-logs"
  }
  stdout { codec => rubydebug }
}
Here web-app-001 is the directory that contains your log files. Run the command below to start feeding your logs.
bin/logstash -f logstash.conf
For a real environment, you may want to create a cron job or Windows scheduled task to run Logstash automatically in the background.
Logstash will monitor for new log lines and feed them to the target Elasticsearch. You should see something like the output below.
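One caveat: by default, the file input tails each file and only picks up lines written after Logstash starts. If you also want to ingest the log lines that already exist, the file input supports a start_position option. Here is a sketch; the sincedb_path => "NUL" line is a Windows-specific trick to disable position tracking between runs, useful only while testing:

input {
  file {
    path => "C:/web-app-001/*.log"
    codec => "json"
    # read existing files from the beginning instead of only tailing new lines
    start_position => "beginning"
    # Windows equivalent of /dev/null; forgets file positions between runs (testing only)
    sincedb_path => "NUL"
  }
}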
3. Query aggregation
Elasticsearch provides a very good REST API for querying indexed content. You can use the Postman tool to send the requests.
Now back to our log format.
{"LogLevel":"INFO","Type":"SYS","LogMessage":"User request 001.","EventId":"001","LogId":"9f447a60-52ef-414b-973f-84ad47b5d2fb","Timestamp":"2017-12-22T15:13:04.6041704Z"}
We will query via the REST API. First, configure the Authorization header (Basic authentication with the same elastic/changeme credentials).
Now, to group all INFO log items by LogMessage, use the query below.
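The original request is shown in a Postman screenshot; as a sketch, an equivalent request with curl could look like this. The aggregation name messages is an arbitrary choice, the .keyword sub-fields assume Elasticsearch’s default dynamic mapping for strings, and "size": 0 suppresses the matching documents so that only the aggregation buckets are returned:

curl -u elastic:changeme -H "Content-Type: application/json" "http://localhost:9200/web-app-logs/_search?pretty" -d '
{
  "size": 0,
  "query": {
    "term": { "LogLevel.keyword": "INFO" }
  },
  "aggs": {
    "messages": {
      "terms": { "field": "LogMessage.keyword" }
    }
  }
}'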
Everything we need is in the buckets array under the aggregations field of the output JSON result. Let’s copy it and find an online Google Chart example to visualize it. Here is what I found: http://jsfiddle.net/kimmobrunfeldt/79ffvayr/. We just need to go to the JAVASCRIPT panel and revise some lines in the drawChart function as below.
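The exact lines depend on the fiddle you start from, but a minimal sketch of the revision could look like this; the buckets array is pasted from the aggregation response, and the counts match the three sample log lines above:

function drawChart() {
  // buckets pasted from the aggregations.messages.buckets array of the query result
  var buckets = [
    { "key": "User request 001.", "doc_count": 1 },
    { "key": "User request 002.", "doc_count": 2 }
  ];
  // turn each bucket into a [label, value] row for the data table
  var rows = buckets.map(function (b) { return [b.key, b.doc_count]; });
  var data = google.visualization.arrayToDataTable(
    [['LogMessage', 'Count']].concat(rows)
  );
  var chart = new google.visualization.PieChart(document.getElementById('chart_div'));
  chart.draw(data, { title: 'User requests' });
}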
This is the final result: http://jsfiddle.net/kjn3hhfo/
Behind the scenes
We have tried a very simple use case in which almost everything uses default settings. In a real environment, we need more work to configure and control our log flow.
Logstash is a very powerful tool for monitoring inputs and for filtering, parsing, transforming, and outputting data. There are many input, filter, and codec plugins that cover most use cases; Elasticsearch itself is just one of the output plugins.
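As a small sketch of what a filter stage can do, the standard date filter could parse the Timestamp field from our log format into the event’s @timestamp:

filter {
  # parse the ISO-8601 Timestamp from the log document into @timestamp
  date {
    match => ["Timestamp", "ISO8601"]
  }
}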
You can read more in the Logstash documentation: https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html.
Elasticsearch is considered a document-oriented NoSQL database that supports full-text search. Its top features are handling high volumes of data, scalability, and auto-discovery. You can read more about Elasticsearch’s advantages here: https://www.brainvire.com/elastic-search-usage-benefits/.
Docker containers also have many advantages compared with classic hosting or virtual machines: https://blog.kumina.nl/2017/04/the-benefits-of-containers-and-container-technology/. Many FaaS platforms are built on top of container technology.
-Nguyen Van Qui-