Dreaming for errors – integrations, error handling & logging

Don’t think that all of your data is going to import every time, if you’re lucky some of it might some of the time.

When designing or writing your integration application it is essential that you expect errors and, as a result, design your program with the possibility for errors at the forefront of what you’re trying to achieve. In fact errors are probably just as critical as the data itself, if not more so.

This blog describes our best practices for designing error handling and logging within an integration.

Design Pattern – The Layered Approach to Error Handling

In a previous position I had the unenviable task of having to maintain several of a colleague’s poorly written import programs. Each time I made a modification I would re-factor a certain part of the program just for my own sanity and in conjunction I maintained several integrations I’d written myself. Over time each import converged to a similar structure, all based upon the way the integration handled errors.

The resulting pattern was a 3, sometimes 4, layer structure where each layer had its own error handling responsibilities.

Atomic transaction: This is the inner most layer of your integration and is responsible for managing a single ‘atomic’ transaction. On failure it is this layer which cancels or rolls-back any half written transaction.
Transaction handler: The transaction handler is essentially a loop which calls into the ‘atomic’ transaction handler. When the atomic handler raises an error, this layer needs to decide how to proceed: whether to move to the next transaction or abort.
Task handler (optional): When your integration interfaces with anything more than a single end-point the task handler manages the handling of exceptional and non-exceptional data. Where a transaction succeeds, it needs to be submitted to the next end-point, whereas a failure usually means that it should not be processed at the next.
Entry point: The outermost layer to your application. It is responsible for:
[inner-list list=”1″] [inner-list-el double=”1″]Generation of the end user log or processing report.[/inner-list-el] [inner-list-el double=”1″]Catching and logging any unexpected or catastrophic errors.[/inner-list-el] [/inner-list]

Atomic transactions

It goes without saying that whenever an error occurs in your automated integration it shouldn’t leave half written, partial transactions, database records, or files.

Sometimes however the concept of a transaction needs some thought, and requires input from business requirements, the technology of the platform and the ability to re-process data.

For example, should a batch of cash transactions into an accounting system be imported in an all-or-nothing approach, or should the integration import the ones it can, omitting the erroneous transactions? This scenario has implications on where the batch is posted; without the erroneous transactions, it may affect the bank reconciliation function at a later point.

Logging types

In our experience there are two audiences for the logs of any integration: end-users and technical staff. Users will want something they can understand, and technical people want something they can use to debug or understand a problem.

End-user reports

Are sent to the people responsible for monitoring any integration/data feed.
Should be in readable, easily digested format.
Should allow the user to be able to quickly reconcile data passed between applications.
Should provide the user enough information to allow them to identify and/or resolve the issue, or at least identify any erroneous data.

Catch all & Catastrophic error logging

Used to catch unexpected errors and exceptions, useful for technical staff to help identify isolated issues.
Catastrophic errors are simply those you didn’t plan; for example, a loss of network connection; a corrupted file; a sudden disconnection from a database; an untimely permissions change; or out of memory exception.
In contrast, an invalid item code on a sales order import is not catastrophic; it should be handled by your standard logging mechanism and reported on your end-user reports.
These errors should be logged locally to either a local file or local database, windows event log. Don’t rely on remote resources such as remote databases or email as network availability could be limited or the cause.

Runtime logging

Errors, warnings & messages should be persisted as early as possible, which means writing to a database or file as and when they occur. Should a catastrophic error occur you have a process log, a trace, which would otherwise be lost if they were held in memory and persisted at the end of the process.

Log libraries

It is worth mentioning that there are a number of open source logging and tracing libraries available such as nlog, log4net, the .net trace libraries and Microsoft’s Enterprise framework which can provide you with a robust and feature rich way for persisting errors and warnings to a number of mediums. If you decide to use these, as opposed to rolling-your-own log scheme, the libraries can require some understanding to configure & setup and their outputs are not suited for end users.

Generating the end user process report

At the conclusion of processing it is necessary to generate the process report for the end users. If you’ve followed our recommendations, this should mean re-querying the medium to which the errors were persisted during processing and re-formatting them into an easy to read format.

The best and easiest format to send the end users a process report is email for the following reasons:

Everyone has email.
Portable and amendable: Annotations can be made and forwarded to others if remedial action needs to be made by another person.
Logs can be easily formatted with html and presented nicely.

Summary

Error handling and logging is a key component of any application, more so with integration! Without proper error logging and reporting, your organisation will at best spend needless time resolving and reconciling data between applications and at worst lose customers because of poor service and lost transactions.

We hope this has been of use to some of you and if you ask us nicely, we may just provide you with some samples and template code illustrating the best practices we’ve described.