Log tracing with Correlation-id

In almost every enterprise system we log information which helps us in debugging in case of any issues. However sometimes it becomes challenging to follow and infer anything meaningful from these logs.

There can be many reasons for this problem

  • Too little logging
  • Too much logging
  • Not logging any context information so as to identify the requester for which the logs were written

The first two problems are still easy to solve. We need to identify critical points in the user journey and log only those. All the other information can be logged at more verbose level like DEBUG.

However the third problem requires a some clever methodology which we will discuss in detail in this post.

Need to log some context information with each log statement

Say suppose we are processing a payment. A payment can fail for many reasons like invalid source account, invalid target account etc. In case the target account is not correct we log something like below

log.error("Target account is invalid. Payment failed!");

Nice! now anyone who is reading the log will understand that the reason for payment was an invalid account number. But wait! How would we identify the user who initiated the payment. At any point in time many simultaneous users use our systems. In case a customer raises an issue that his payment failed we need to first find the logs for that customer and then from only his logs we can find the root cause.

I hope at this point we all can agree that in addition to just logging the statements we need a way to link these logs to a particular user request which we do by logging some context information.

So now my logs become something like

log.error("Target account is invalid. Payment failed! for {}", user);

Identifying some unique and universally present context id

Now since we agree we need some context information to log the second important step is to identify the context information. A good context data should have the below properties

  • Unique for the domain in which we are logging, accounts/user in banking, Files in CMS systems etc
  • Should be present for the entire life cycle of the request

The first point is self explanatory. For the second one lets take the same payment example:

We will get a HTTP request to initiate a payment. The first step will be to check if the accounts are valid. In case they are valid we will check if the debit amount is valid. Once that check also passes we will then create a payment instruction. In this case the context data which we want to pick should be available across all these steps.

A more standardised approach is to create a unique context id just for the sake of log tracing. In this way we are not mixing the domain data with the data we need for linking multiple log statement. This will also help us in situations where we are dealing with say a system which does not need say user id but we are still using it just to make our logs work.

Finally, So what is Correlation-id?

Correlation_id is nothing but a dedicated unique id which we generate and then log with each log statement to link related logs from a single request/user. It depends on you when you want to generate this unique id. This can be done at the start of the user session to link all user activities in a single session. Generally most enterprise systems generate a unique id per user request so as to uniquely identify a single request end to end in the logs.

Reference Architecture

In this we are using log4j MDC system which helps us in setting a context id which the log system then logs automatically with each log statement. This is just one of the several ways of ensuring that log statements automatically include correlation id with each log. We can also write some interceptors or can have our own custom implementation of the log statements which can then add correlation id with each log statement.

  1. User sends a request to transfer some money
  2. We intercepts this request in a servlet filter and generates a unique uuid for this request
  3. We set this unique id in our logging system’s MDC so that we have this universally present and also we will not have to log this explicitly.
  4. All the subsequent logs in this request will automatically log this id with the log statement
  5. Finally before sending the response, we set this id in the X-Correlation-ID header
  6. In case of failure we show a pop-up to the user with this correlation-id also listed.
  7. User can then raise a support issue with the bank and can mention the id of the error which he got when he tried to transfer the money.
  8. Support engineer can easily get all the logs of the exact transfer request which failed with the provided id.

Thus in this post we can clearly see that logging correlation id can be really helpful and will actually help us in effectively using the application logs. I hope I was able to highlight the importance of this correlation id design without going into the implementation details. Please comment and share in case you find this helpful and please let me know in case you have any questions on the same. Thanks!

Leave a comment