Skip to main content

Logging NGINX to a Database

When I recently rebuilt my personal website, I wanted to avoid using Google Analytics. The main reason was for visitor privacy and not wanting to collect users personal data. The other reason was to avoid using JavaScript when it wasn't required.

It'd still be nice to have an idea how many page views my site receives. In order to do this, I used a technique as old as time, reading server logs.

To make my life easier, I decided to store my NGINX logs in a Database. This allowed me to easily write SQL queries for page views between date ranges or on particular URLS. I chose PostgreSQL as my database as I was the most familiar with it.

I only collect the following pieces of data for each request

Below is my NGINX config

# /etc/nginx/sites-enabled/default
# Configure the log format
log_format json_combined escape=json
'{'
  '"time_local":"$time_local",'
  '"host":"$host",'
  '"status":"$status",'
  '"request":"$request_uri"'
'}';

# Determines whether the request should be logged or not.
# Default is to log the request by setting $loggable to true
# Except in cases where the URL ends with .jpg, .css, .js etc 
map $request_uri $loggable {
  default 1;
  ~*\.(jpg|css|js)$ 0; # Exclude
}

# Your server block
server {
listen 80;
...
# Configure the location of the log file and only log if $loggable is true.
access_log /var/log/nginx/[YOUR_DOMAIN]/access.log json_combined if=$loggable;
...
}

If a request meets the "loggable" criteria it will be written to the access log in a JSON format. I chose JSON as this gave me flexibility to change the schema or add keys at a later date without having to change the database table.

Then I use rsyslog to store it in my PostgreSQL database.

# /etc/rsyslog.d/50-your-domain.conf
# Load the imfile input module
module(load="imfile") # Load the imfile input module

input(type="imfile"
      File="/var/log/nginx/[YOUR_DOMAIN]/access.log"
      Tag="[YOUR_DOMAIN]:")

# Load the ompgsql output module
module(load="ompgsql")

# Define a template for row insertion of your data.
# The template below assumes you have a table called
# [YOUR_TABLE] and are inserting columns named
# "log_line" (with the log payload) and "created_at" (with the timestamp).
template(name="sql-syslog" type="list" option.sql="on") {
  constant(value="INSERT INTO [YOUR_TABLE] (log_line, created_at) values ('")
  property(name="msg")
  constant(value="','")
  property(name="timereported" dateformat="pgsql" date.inUTC="on")
  constant(value="')")
}

# The output "action". This line instructs rsyslog
# to check if the log line is tagged "[YOUR_DOMAIN]:" (a tag
# which we set with the imfile module configuration above)
# and if so to use the sql-syslog template we defined
# above to insert it into Postgres.
if( $syslogtag == '[YOUR_DOMAIN]:')  then {
  action(type="ompgsql" server="localhost"
    user="DB_USER"
    pass="YOUR_PASSWORD"
    db="YOUR_DATABASE"
    template="sql-syslog"
    queue.type="linkedList")
}

The database table is very simple. It has an integer ID sequence, a log_line jsonb field for storing the NGINX request and a created_at timestamp field.

CREATE TABLE access_log(
  id serial PRIMARY KEY,
  log_line jsonb,
  created_at timestamp
);

To find out how many page views each URL has received you can use the following query.

Note the ->> operator to fetch the 'request' key from the JSONB field log_line

SELECT
  log_line ->> 'request' AS request,
  count(log_line)
FROM 
  access_log
WHERE
  log_line ->> 'status' != '404'
GROUP BY
  log_line ->> 'request'
ORDER BY
  count DESC;

Result

Request                          | Count
---------------------------------+-------
 /                               |  1162
 /blog/flight-visualization/     |    75
 /about/                         |    74