Cleaning Up Rails 4 Production Logging

Making your logs useful and easy to parse

7/30/2013

•

13 min read

This article originally appeared about ten years ago on my Ruby (and Rails) focused blog. I've learned a lot since then (including how poorly named that blog was) and thought the posts wouldn't be relevant anymore. However, after re-reading those posts there's still a lot of what I wrote that I like. I will be revisiting these in future posts examining how my thinking has changed. Watch here for updates.

I've seen some very effective use of application logging to debug difficult problems. The foundation of good logging is being sure that your application is logging the right things. However, it's just as important for your log files to be easy to parse - both for a human reading the file and for any scripts that might be written to extract information from them.

While Rails' conventions allow you to easily write powerful web applications the default logging available comes up short on both capturing critical information and providing it in a consumable fashion. This can be expected since Rails itself is just a framework and thus the logging available out of the box is really only about the framework itself. Left unmodified, the default logging can be a bit surprising the first time you go to look at a production problem.

I'll be describing how to start taming your production logs to make them useful and ready to scale along with you application. I'll show you how to turn several lines of rambling output per request into one concise line with the essential information that needed to debugging production problems. This will be the basis for adding logging specific to your business logic and future debugging requirements.

In this article I'm assuming a relatively simple Rails application at the beginning of its lifecycle that runs as a single instance. The application might have a handful of standard RESTful resources with their initial generated routing.

Many of the ideas here have been borrowed from the lograge gem which does a good job at cleaning things up but isn't as extensible as creating your own logging.

Invalid URLs

One of the first thing you're likely to notice in your default production logs for applications deployed to the wild is a lot of routing errors:

I, [2013-08-08T20:30:17.677248 #14985]  INFO -- : Started GET "/manager/html" for 220.165.88.21 at 2013-08-08 20:30:17 -0700
F, [2013-08-08T20:30:17.688957 #14985] FATAL -- :
ActionController::RoutingError (No route matches [GET] "/manager/html"):
  actionpack (4.0.0) lib/action_dispatch/middleware/debug_exceptions.rb:21:in `call'
  actionpack (4.0.0) lib/action_dispatch/middleware/show_exceptions.rb:30:in `call'
  railties (4.0.0) lib/rails/rack/logger.rb:38:in `call_app'
... plus 17 more lines

These errors represent lame attempts by hackers to exploit various known vulnerabilities in other applications. They're generally harmless to your Rails application as long as you pay attention to a few security rules.

From a best-practices standpoint, I feel it is a bit severe to classify these errors as fatal since most logging conventions consider fatal errors as catastrophic and generally unrecoverable. For these routing errors your server will continue happily working, send out a 404 response and go on to process future requests.

Even if you don't go on to implement any more sophisticated logging schemes like are outlined in the remainder of this article, it is probably a good idea to solve this problem by adding a catch-all route to eliminate routing errors. This has the additional benefit of allowing you to put your own information on a 404 error page and provide ways to direct users to somewhere useful, like maybe a search of your site. You also get rid of the stock Rails error page which is thankfully looking better in Rails 4.

To create the catch-all route, add this line to the end of your config/routes.rb file:

match '*path', via: :all, to: 'pages#error_404'

It's important that this be the last rule in your routes file as it will route all requests that aren't matched by a previously defined route and anything you put after this rule will be ignored. The globbed match *path says match everything, and the via: :all option is needed to match all http actions because in Rails 4 you are no longer allowed to use a generic match rule. Instead you are encouraged to use matches for specific http actions (like get, post, and put). You'll also need to add a PagesController with an error_404 method and view to render your own 404 page:

class PagesController < ApplicationController
  def error_404
    render status: 404
  end
end

Now when the would-be hackers start sending bogus URLs to your service you will see logs like this:

I, [2013-08-08T20:30:25.090069 #15002]  INFO -- : Started GET "/manager/html" for 220.165.88.21 at 2013-08-08 20:30:25 -0700
I, [2013-08-08T20:30:25.109424 #15002]  INFO -- : Processing by PagesController#error_404 as */*
I, [2013-08-08T20:30:25.109579 #15002]  INFO -- :   Parameters: {"path"=>"manager/html"}
I, [2013-08-08T20:30:25.146743 #15002]  INFO -- :   Rendered pages/error_404.html.erb within layouts/application (1.9ms)
I, [2013-08-08T20:30:25.149406 #15002]  INFO -- : Completed 200 OK in 40ms (Views: 7.3ms | ActiveRecord: 0.0ms)

To avoid bugs in your routing scheme you will want to make sure that your acceptance/integration/system tests are thorough enough to catch real errors. There may come a day when you have a bug in your routing you need to track down but having the stack trace of what you routed to is highly unlikely to shed much light on the real problem which probably occurred when you generated the errant URL in the first place.

Single request logging

Our next goal is to turn the information logged for a single request into something that is parsable by both humans and computers. For the most part the information you get in a Rails log is centered on a request and what the framework knows about that request at various points in time. Rather than log all information about a particular request on multiple lines it will be much easier for later processing if we have a one line format. For the humans it works best to keep things lined up neatly in columns so that they can be quickly scanned for anomalous entries.

The five lines in the above example for a single 404 error are fairly typical of what you'll see for all requests in a production log. There are a number of problems that are immediately apparent:

The "I" at the beginning of the line indicates that this is an info level log. The word INFO appears later in each line resulting in unnecessary duplication.
The number inside the brackets represents the process id of the Rails server. Assuming that it's a small application there may only be one Rails process, thus, making this unnecessary duplication. Since we'll be logging a single request per line there won't be a need to try and sort things out and even if you do need to correlate a request's data across several lines of logging you'll probably want something more unique to the request than the id of the process of a server.
In the first line that starts a request the date/time is repeated.
The second and fourth lines are statements about what happens internally in the application. For most routes the controller/action and files used for rendering are well-defined and repeating this information in the log doesn't serve much purpose. Ultimately it gets in the way of other important information.
I don't often find execution times like those in the fourth and fifth lines to be useful for general debugging. If there are performance problems you'll probably need to roll out heavier tools like metrics collection anyway.
Finally, when you have five lines of output for each request it makes it difficult to visually scan the log file to see problems. It's also more difficult to build scripts to associate data from a single request and use it in other ways (such as metrics reporting).

Here are the things that we want to keep in our log:

Log level. The single letter is a good choice since it doesn't take up much real estate and makes it easy to keep columns aligned.
Date/time, in this case we'll be using a consistent format like the ISO 8601 standard including time zone information. Since it is possible that a service may end up running in different timezones it is a good idea to log in UTC so that you can coordinate logs from different servers.
HTTP method (GET, POST, etc.).
Response status code
Request origin IP address
URL path
Request parameters
Response location (for redirect requests).

Rails' logging is done through Rails.logger and it is easy to add your own logging to your application. The downside is that Rails doesn't provide convenient hooks to control the default logging originating from the different parts of the framework. To keep our logging clean we'll be creating our own log file and so won't be using the default log/production.log file. Since our logging solution captures most of the same information as the default loggers you may want to turn off default logging entirely which you can do by directing it to /dev/null by adding this to config/environments/production.rb:

config.logger = Logger.new('/dev/null')

Basic request logging

Rails provides a notification mechanism which is used to provide information to its loggers. ActiveSupport::Notifications is used to issue events at various points in time as the framework processes requests. Rails creates subclasses of the ActiveSupport::LogSubscriber for different parts of its framework. A search on the API documentation shows there are four different LogSubscribers:

The documentation for each of these classes enumerates most of the events available but you might find it necessary to read the source code of the class to figure out both the information available to the event and possibly some other undocumented events that may be issued.

As it turns out, almost all of the information we want to log comes from a single event: the process_action event from the ActionController. So we'll create our own LogSubscriber and look at the information in event.payload which is where all the interesting pieces are. To figure out the data available we'll create our LogSubscriber class and attach it to the ActionController with this code which we'll put in app/lib/request_summary_logging.rb:

module RequestSummaryLogging
  class LogSubscriber < ActiveSupport::LogSubscriber
    def process_action event
      pp event.payload
    end
  end
end

RequestSummaryLogging::LogSubscriber.attach_to :action_controller

Running our server with rails server -e production and navigating to a page (a resource update makes a good example) gives us the following output:

{:controller=>"PostsController",
 :action=>"update",
 :params=>
  {"utf8"=>"✓",
   "_method"=>"patch",
   "authenticity_token"=>"QlEN1XvD/5Ny57GhC8WT6r9o0tP//QAa2gARm2Umqlw=",
   "post"=>{"title"=>"Rails Logging", "body"=>"Could be improved."},
   "commit"=>"Update Post",
   "action"=>"update",
   "controller"=>"posts",
   "id"=>"1"},
 :format=>:html,
 :method=>"POST",
 :path=>"/posts/1",
 :status=>302,
 :view_runtime=>nil,
 :db_runtime=>71.38505}

We can now see that this event provides all of the information we set out to log except for the log level (we'll be setting our own anyway), the date/time, IP address, and redirect location. We'll see a bit later how to get all these pieces into our log message but first we need to have a way to get all of this into our log file.

Setting up our logger

The ActiveRecord::LogSubscriber class provides a logger method but unfortunately it's the same as the default Rails logger and we don't want our logging information in that same stream. We'll need to create our own logger and point it to the file we want our logging to go to. We can add this to our RequestSummaryLogging::LogSubscriber class and while we're at it set up a Formatter for our logger.

require 'active_support/buffered_logger'

module RequestSummaryLogging
  class LogSubscriber < ActiveSupport::LogSubscriber
    def logger
      @logger ||= create_logger
    end

    def create_logger
      logger = ActiveSupport::BufferedLogger.new(File.expand_path('../../../log/request_summary.log', __FILE__))
      logger.formatter = Formatter.new
      logger
    end

    class Formatter
      def call severity, time, progname, msg
        "#{severity[0]} #{time.utc.strftime('%FT%T.%6NZ')} #{msg}\n"
      end
    end
    ...

  end
end

Some things to note in this implementation:

The logger method memoizes an instance variable so that opening the logfile and configuring it is only done once. This also could have been done in some kind of initializer if we were developing a more general-purpose solution.
We let the BufferedLogger.new method open our log file and put it in log/request_summary.log. It is a good idea to do this as an absolute path since sometimes Rails doesn't run in the correct directory to be able to use a relative path. BufferedLogger.new also sets the correct file attributes for things like automatically flushing buffers.
We set up a Formatter. This is where the severity and time are added to the log entry. Later when we format the rest of the entry it will be passed in as msg to the formatter.

We can now format the payload in our process_action method:

module RequestSummaryLogging
  class LogSubscriber < ActiveSupport::LogSubscriber
    ...

    INTERNAL_PARAMS = %w(controller action format _method only_path)

    def process_action event
      payload = event.payload
      param_method = payload[:params]['_method']
      method = param_method ? param_method.upcase : payload[:method]
      status = compute_status(payload)
      path = payload[:path]
      params = payload[:params].except(*INTERNAL_PARAMS)

      message = "%-6s #{status} #{path}" % method
      message << " parameters=#{params}" unless params.empty?

      logger.info message
    end

    def compute_status payload
      status = payload[:status]
      if status.nil? && payload[:exception].present?
        exception_class_name = payload[:exception].first
        status = ActionDispatch::ExceptionWrapper.status_code_for_exception(exception_class_name)
      end
      status
    end
  end
end

Note here that we're removing certain parameters that could be inferred from the requests path, again to keep the clutter out of our log file. Also we format the http method to a fixed width so that the remaining information will line up in columns and we preface the parameters with parameters= which makes it easier to parse log lines where optional data is involved (not all requests will have parameters).

The logic in the compute_status method is necessary because we aren't given a status in the event payload if an exception has occurred during the processing of the request. Rails doesn't provide a handy way to get the status in and we have to compute it ourselves with code borrowed directly from ActionController::LogSubscriber.

Our log now looks like:

I 2013-08-09T03:30:33.234173Z GET    200 /posts
I 2013-08-09T03:30:33.297396Z GET    200 /posts/new
I 2013-08-09T03:30:33.441780Z POST   302 /posts parameters={"utf8"=>"✓", "post"=>{"title"=>"Rails ...
I 2013-08-09T03:30:33.510095Z GET    200 /posts/1 parameters={"id"=>"1"}
I 2013-08-09T03:30:33.589517Z GET    200 /posts/1/edit parameters={"id"=>"1"}
I 2013-08-09T03:30:33.655532Z PATCH  302 /posts/1 parameters={"utf8"=>"✓", "post"=>{"title"=>"Rai ...

Getting the IP address for our log

Looking back at our list of things we wanted to keep in our logs we see that we only have two things left to get, the IP address and the location a redirect is sent to. Both of these are a bit tricky as they aren't part of the payload for the event we're using. The IP address is the tougher of the two as it comes from a notification triggered by the rack middleware. We could register for that notification too and store the IP address for later logging but to demonstrate a different technique we'll actually register our own middleware to pick off the IP address:

module RequestSummaryLogging
  class LogMiddleware
    def initialize app
      @app = app
    end

    def call env
      request = ActionDispatch::Request.new(env)
      Thread.current[:logged_ip] = request.ip
      @app.call(env)
    ensure
      Thread.current[:logged_ip] = nil
    end
  end
  ...

end

An important thing to note here is that we need some place to stash our IP address since our middleware code runs before our chosen logging event. We need to make sure to store it somewhere that will be associated with the later logging for the same request so we use the fiber-local variable storage mechanism of the current thread: Thread.current. We use this rather than a global variable which may get overwritten if multiple requests can be processed concurrently which depends on the web application server you use. You can read more about how this middleware works in this RailsGuide.

To add the IP address to our one-line logger we need to add a line to retrieve the IP address from the current thread and replace the message formatter in our process_action method, again padding the IP address to its maximum potential width to keep everything nicely aligned:

def process_action event

  ...
  ip = Thread.current[:logged_ip]

  message = "%-6s #{status} %-15s #{path}" % [method, ip]
  ...

end

Finally, we need to install our middleware by adding this to config/environments/production.rb:

App::Application.configure do
  config.middleware.use RequestSummaryLogging::LogMiddleware
  ...

end

Now our logging looks like:

I 2013-08-09T03:30:38.270914Z GET    200 205.251.242.244 /posts
I 2013-08-09T03:30:38.311104Z GET    200 205.251.242.244 /posts/new
I 2013-08-09T03:30:38.472275Z POST   302 205.251.242.244 /posts parameters={"utf8"=>"✓", "post"=> ...
I 2013-08-09T03:30:38.553411Z GET    200 205.251.242.244 /posts/1 parameters={"id"=>"1"}
I 2013-08-09T03:30:38.631428Z GET    200 205.251.242.244 /posts/1/edit parameters={"id"=>"1"}
I 2013-08-09T03:30:38.730665Z PATCH  302 205.251.242.244 /posts/1 parameters={"utf8"=>"✓", "post" ...

Adding redirect location

The final piece of information we set out to add to our log is the location for a request that has a redirect response. Since this is an optional piece of information and not in all log entries we'll add it much like we did the parameters, introducing it with redirect_to=. Like the IP address, the location is not available to our process_action method but instead comes from another event from the ActionController, redirect_to. We'll use the same thread storage mechanism that we did for the IP address to save the location. Here's our new redirect_to event handler along with our completed logging in process_action

module RequestSummaryLogging
  class LogSubscriber < ActiveSupport::LogSubscriber

    ...
    def redirect_to event
      Thread.current[:logged_location] = event.payload[:location]
    end

    INTERNAL_PARAMS = %w(controller action format _method only_path)

    def process_action event
      payload = event.payload
      param_method = payload[:params]['_method']
      method = param_method ? param_method.upcase : payload[:method]
      status = compute_status(payload)
      path = payload[:path]
      params = payload[:params].except(*INTERNAL_PARAMS)

      ip = Thread.current[:logged_ip]
      location = Thread.current[:logged_location]
      Thread.current[:logged_location] = nil

      message = "%-6s #{status} %-15s #{path}" % [method, ip]
      message << " redirect_to=#{location}" if location
      message << " parameters=#{params}" unless params.empty?

      logger.info message
    end
    ...

  end
end

Now our logging for a redirect looks like:

I 2013-08-09T03:30:43.445226Z POST   302 205.251.242.244 /posts redirect_to=http://rubyjunky.com/posts/3 parameters={"utf8"=>"✓ ...

Wrapping it up

We now have a logger that collects critical information about each request we get and presents it on an easily read single line. There are other critical things that still need to be added like logging for real application exceptions which would indicate some sort of problem. Also, as your application grows in complexity there are additional events that can be issued and additional information from them that you'll want logged. Finally, you'll want to start logging information meaningful to your application domain, like each time a new user is created.