In this final part of our multi-part post (see part 1 and part 2), I will discuss the various transports which can be used to synchronise data between applications.

As opposed to the previous two parts, the transports described in this post are not limited to synchronisation and apply equally to integration.

File transfer

The old-school method of data exchange, file transfer has existed for as long as computers have interchanged data. It involves one application creating a file and putting it in an agreed location, whilst the receiving application picks the file up, reads it and processes it.
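As a rough illustration, here is a minimal sketch of the pattern in Python. The drop directory, file name and field names are all hypothetical, and the write-then-rename step is one common way to stop the receiver picking up a half-written file.

    import csv
    import os
    from pathlib import Path

    DROP_DIR = Path("/data/interchange/orders")     # hypothetical agreed location

    def write_orders(orders):
        """Sender: write to a temporary name, then rename, so the receiver
        never sees a partially written file."""
        tmp, final = DROP_DIR / "orders.csv.tmp", DROP_DIR / "orders.csv"
        with open(tmp, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["order_no", "customer", "total"])
            writer.writeheader()
            writer.writerows(orders)
        os.replace(tmp, final)                       # atomic within one filesystem

    def read_orders():
        """Receiver: read the file, then move it aside so it isn't re-processed."""
        final = DROP_DIR / "orders.csv"
        if not final.exists():
            return []
        with open(final, newline="") as f:
            rows = list(csv.DictReader(f))
        final.rename(DROP_DIR / "processed" / final.name)
        return rows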

Database & staging tables

Our preferred means for both integration and synchronisation, databases & staging tables provide a much more robust transport than file exchange.

Staging tables are an intermediate set of database tables where the sender records pending data for the recipient(s) to later process.
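The following is a minimal sketch of the pattern, using SQLite purely so it runs self-contained; in practice the staging table would live in a shared database both applications can reach, and the table and column names here are hypothetical.

    import sqlite3

    def process(payload):
        """Placeholder for the recipient's application-specific import logic."""
        print("importing", payload)

    conn = sqlite3.connect("interchange.db")         # stand-in for the shared database
    conn.execute("""
        CREATE TABLE IF NOT EXISTS stg_customers (
            id         INTEGER PRIMARY KEY,
            payload    TEXT NOT NULL,                -- e.g. JSON for the customer record
            status     TEXT NOT NULL DEFAULT 'PENDING',
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )""")

    # Sender: record pending data for the recipient.
    conn.execute("INSERT INTO stg_customers (payload) VALUES (?)",
                 ('{"code": "CUST001", "name": "Acme Ltd"}',))
    conn.commit()

    # Recipient: pick up pending rows, process them, then mark them as done,
    # so a failed run can simply be re-executed.
    pending = conn.execute(
        "SELECT id, payload FROM stg_customers WHERE status = 'PENDING'").fetchall()
    for row_id, payload in pending:
        process(payload)
        conn.execute("UPDATE stg_customers SET status = 'DONE' WHERE id = ?", (row_id,))
    conn.commit()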

Message queue

A message queue is a transport-level asynchronous protocol where a sender places a message into a queue until the receiver removes the message to process it. One of the key attributes of a message queue, especially for data synchronisation, is that messages are guaranteed to be delivered in FIFO order.

Message queues have traditionally been limited to larger-scale integration and synchronisation, where both sender and receiver exist on the same network. However, with the explosion of cloud computing and an agreed standard (AMQP), there is an increasing prevalence of low-cost hosted services.
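As a rough sketch of the pattern, the following assumes a RabbitMQ broker and the pika AMQP client; the queue name, payload and apply_to_destination function are hypothetical.

    import pika  # AMQP client, assuming a RabbitMQ broker running on localhost

    def apply_to_destination(body):
        """Placeholder for the receiver's application-specific import logic."""
        print("applying", body)

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="customer_sync", durable=True)   # hypothetical queue

    # Sender: publish one message per changed record.
    channel.basic_publish(
        exchange="",
        routing_key="customer_sync",
        body='{"code": "CUST001", "name": "Acme Ltd"}',
        properties=pika.BasicProperties(delivery_mode=2),        # persist to disk
    )

    # Receiver: messages arrive in the order they were queued (FIFO);
    # acknowledge only after the record has been applied successfully.
    def on_message(ch, method, properties, body):
        apply_to_destination(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="customer_sync", on_message_callback=on_message)
    channel.start_consuming()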

No interchange

In this scenario there is no intermediate data form; instead, each application holds its pending data until the recipient application retrieves it.

Unless an application specifically integrates with another, this style of interchange requires some form of middleware to mediate the interaction.
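In its simplest form the middleware just polls one application and pushes into the other. The sketch below assumes hypothetical REST endpoints on each application and uses the Python requests library.

    import requests

    SOURCE_URL = "https://source.example.com/api/customers"      # hypothetical endpoints
    DEST_URL = "https://destination.example.com/api/customers"

    def poll_and_push(last_sync_time):
        """Middleware: pull records changed since the last run from the source
        application and push them into the destination application."""
        resp = requests.get(SOURCE_URL,
                            params={"modified_since": last_sync_time}, timeout=30)
        resp.raise_for_status()
        for record in resp.json():
            requests.post(DEST_URL, json=record, timeout=30).raise_for_status()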

Summary

If a choice can be made, use databases/staging tables, as they are by far the easiest and most robust transport.

We expect an increased use of message queues, especially with hosted infrastructure environments such as Azure and Amazon.

CSV: Dear oh dear! It may be easy to produce and simple to read, interpret and load into Excel. But it’s 2013! There are far more sophisticated means of exchanging data, so why are we still using this archaic format?

In Part I of this multi-part blog, some of the most common issues encountered when synchronising data were discussed. This second post will describe the strategies available for synchronising data between applications, along with their advantages and pitfalls. The strategy is the means of determining when a record should be inserted, updated or deleted.

Data Structures

It is first necessary to understand the issues surrounding more complex data structures. Simple, single-level data structures, such as customer records, can be readily synchronised. However, as data structures become more complex, so does the strategy.

(Figure: illustration of a hierarchical structure)

The above example is illustrative: both the order and its order lines must be synchronised; an order without lines is not an order. However, the third level of data – the serial numbers – adds several complications:
  • Is it necessary to synchronise them? If they are used for transactional purposes, yes, but a view-only application may not require them.
  • What do the serial numbers mean when an order line is updated? Are they a fresh set of serials, or just additions? While this question also applies to the lines, the deeper the structure, the more pronounced the issue becomes, because the way an update is applied at one level feeds into how every level beneath it must be handled. A minimal illustration of such a structure follows this list.
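A minimal sketch of the three-level structure described above, with purely hypothetical field names:

    # Hypothetical three-level structure: order -> order lines -> serial numbers.
    order = {
        "order_no": "SO-1001",
        "customer": "CUST001",
        "lines": [
            {
                "line_no": 1,
                "item": "WIDGET-A",
                "quantity": 2,
                # Third level: on an update, do these replace the existing
                # serials or are they additions to them?
                "serial_numbers": ["SN-0001", "SN-0002"],
            },
            {"line_no": 2, "item": "WIDGET-B", "quantity": 1,
             "serial_numbers": ["SN-0003"]},
        ],
    }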

Means of Identifying Updates

Inserting & Updating Records

Inserts and updates follow a simple pattern: the destination application is queried to identify whether it already contains the record. An update is performed where a match is found; otherwise, when no match is found, an insert is performed (a sketch of this pattern appears at the end of this section).

Deleted Records

Handling deletions is a much harder problem, primarily because the delete first needs to be identified. There are three approaches:

Tombstoning

Tombstoning is a technique which, instead of actually deleting the record, marks it as deleted via a field within the record’s table. What makes this the preferred method for handling deletes is that the tombstone makes the delete operation explicit, which in turn reduces the complexity of the synchronisation.

(Figure: SageCRM tombstone)

Tombstoning, however, requires the source application to provide such functionality (usually only a feature when data synchronisation/integration is integral to the application, e.g. CRM). For applications that don’t support tombstoning, it is necessary to record the deletion(s) in a separate interface table or file, usually by way of a trigger (to be discussed in part 3 of this blog).

Record Comparison

Record comparison involves comparing a set of data from the source application to the set of data in the destination application. A delete is performed when a record is found in the destination but not in the source.

(Figure: delete by comparison)

In contrast to tombstoning, this approach suffers from three primary problems:
  • Complex: Record comparison is implicit; it requires the ‘data synchroniser’ to determine if a record has been deleted.
  • Auditing: There is a lack of an audit trail where records just disappear, with no facility to check why/when a deletion occurred.
  • OK for child records, not for the primary entity: Using comparison for child records in a hierarchical structure, e.g. order lines within an order, can be appropriate where there are limited records and the detection of deleted records is cheap and relatively straightforward.
Using comparison for a primary entity (e.g. the order), by contrast, requires the examination of a large set of data to determine whether a delete is required.

Re-create Everything

A third alternative is to delete and re-create all records on each ‘sync’. This strategy is the easiest to implement, covering inserts, updates and deletes, since it doesn’t require any logic to detect and handle changes. However, its use is limited to the following scenarios:
  • Data is loaded at a database level, e.g. reporting, data warehouses and CRMs. Applications with a sophisticated data structure and/or API usually frown on this approach, as business logic and data validation are side-stepped, which can lead to integrity issues.
  • The destination application is view only and non-transactional.
  • There are no (or very limited) foreign-key constraints: Foreign keys enforce relationships between records. When a record is deleted, all of its associated records must be deleted too. For example, deleting a customer record also requires deleting all the customer’s associated order & invoice records.
  • Data size: Deleting and re-inserting all records in a large dataset may take a long time to complete and can be expensive in terms of computing power, particularly in hosted environments where CPU cycles are charged for. Pruning the dataset, usually by date, is an effective strategy to limit the load size.
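To tie the above together, here is a minimal sketch of the insert/update lookup pattern alongside delete-by-comparison. It assumes the source and destination data can be read into dictionaries keyed by each record’s natural key; the insert/update/delete functions are placeholders for the destination application’s API.

    def insert_record(key, record): print("insert", key)     # placeholders for the
    def update_record(key, record): print("update", key)     # destination application's
    def delete_record(key):         print("delete", key)     # API or database calls

    def synchronise(source, destination):
        """source and destination are dicts keyed by the record's natural key,
        e.g. {"CUST001": {...}, "CUST002": {...}}."""
        for key, record in source.items():
            if key in destination:
                if destination[key] != record:
                    update_record(key, record)   # match found and changed: update
            else:
                insert_record(key, record)       # no match found: insert

        # Delete by comparison: present in the destination but not in the source.
        for key in set(destination) - set(source):
            delete_record(key)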

Handling Application Logic

When synchronising data between applications, it is necessary to negotiate and handle the restrictions or logic the API applies to the data.
  • Business logic is much stricter than any database enforcement.
  • Transactional data moves through a process, where the status of the transaction (or line) dictates whether modifications can be made.
  • Deletes are never easy – Particularly with ERP applications, deletes are handled in a number of ways: master items may need to be marked as inactive (a form of tombstoning); transactions can either be deleted or cancelled.
Strategies for handling application logic
    • Atomicity: An update should be all or nothing; never have a situation where some updates to a single transaction or master item fail and some succeed. Whilst this should apply whether synchronising at data level or application level, it’s easier to fall into a situation where a partial update is made due to lack of understanding of the API.
    • Is your synchronisation dumb or smart: You will need to make a choice as to how much logic you embed into your synchronisation, and often synchronisations start dumb and evolve to become smart. What this means is the amount of logic required to cover certain scenarios or use cases. At the dumb end, there is no logic and a simple try/catch block is placed around your processing to catch any exceptions. Conversely, at the smart end you may check for an entity’s existence, raise workflow events, or handle specific edge cases. A sketch contrasting the two follows this list.
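The following is a minimal sketch of the two ends of that spectrum. The destination API object, its methods and the ‘on hold’ edge case are all hypothetical, stubbed out so the example runs self-contained.

    class DestinationAPI:
        """Stand-in for a destination application's API client."""
        _store = {}
        def find(self, code): return self._store.get(code)
        def insert(self, record): self._store[record["code"]] = record
        def update(self, record): self._store[record["code"]] = record
        def raise_workflow_event(self, name, record): print("event:", name, record["code"])

    destination_api = DestinationAPI()

    def dumb_sync(records):
        """Dumb: no logic, just a try/catch around the processing."""
        for record in records:
            try:
                destination_api.insert(record)
            except Exception as exc:               # any failure is simply logged
                print("failed:", record, exc)

    def smart_sync(records):
        """Smart: check for existence, decide insert vs update, and raise a
        workflow event for a specific edge case."""
        for record in records:
            existing = destination_api.find(record["code"])
            if existing is None:
                destination_api.insert(record)
            elif existing != record:
                destination_api.update(record)
            if record.get("on_hold"):              # hypothetical edge case
                destination_api.raise_workflow_event("review_required", record)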

Summary

Hopefully this post has helped to explain the most common synchronisation strategies and their pitfalls. It is by no means complete and we would like to hear your opinions. The next post will discuss the various means (file transfer, database, etc.) available for data synchronisation.

In this first of a series of posts, I will lay out some of the common issues encountered specifically when synchronising data between applications. Synchronisation is the replication of data between two or more applications. It is usually more than a one-way push: synchronisation is the continued updating of data to keep all applications reflecting the same view. Firstly, there are two broad categories, each with its own set of properties, that data falls into:

Master Data

Master Data is conceptualised as the key entities used within an organisation: Customers, Employees, Items & Suppliers are all considered master data. At the simple end, synchronising Master Data involves a single record of data per entity. However, Master Data often has a number of child attributes, such as addresses for customers, bank details for employees, and a range of attributes such as locations, suppliers and units of measure for items. It is increasingly expected that Master Data be updatable from multiple applications, where each application holds its own copy of a common ‘truth’.

Transactional Data

Whilst sometimes more complex due to their header/detail structure, transactions are usually easier to synchronise due to the following characteristics:
  • Simple direction/flow – One application is the master and the other(s) are slaves, thus eliminating issues such as conflicts.
  • Transactions are less likely to be updated – Once a transaction is posted it has reached its final state and requires only a single push to the target application. There are, however, a couple of notable exceptions: Sales & Purchase Orders often need to be synchronised to reflect their current ‘state’ and, because of their more complex structure and heavy business logic, require careful attention. Further, Customer & Supplier Payments can have a complex many-to-many interaction with their corresponding invoice documents.
There is a third category: Master Data with transactional properties. An example of this is a project structure, where the project goes through several stages throughout its life and where each stage has different business rules affecting its behaviour.

Issues Surrounding Synchronisation

Data Structure Complexity – More complex data involving headers, details, sub-details, etc. needs strategies which address changes at each ‘level’ of data.

Synchronisation & Conflict Resolution
  • Synchronisation needs to be considered at each level of data.
  • Deciding which application, if any, is the master or whether bi-directional integration will be supported.
  • Conflict Resolution – What rules should be followed where there is a conflict and is there sufficient data available to resolve a conflict and/or data storage to mediate conflicts?
Data Impedance & Mismatch – The higher the resistance, or the more transformation required, the harder the data is to synchronise.
  • Field Level Transformation – Destructive transformation, where a derived or calculated field cannot be applied in reverse or accurately reversed, can prevent bi-directional synchronisation.
  • Field Lengths and Type Differences – Two common issues arise: truncation of textual fields such as names, addresses & comments, and the mapping of enumerated fields, i.e. drop-down fields, where the choice is identified as an integer value within the source or target application (a small example follows this list).
  • Data Structure Mapping – Cases where the structure of the source data does not match the target and where transformation is required to massage the data into the required shape.
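As a small illustration of the field-level issues above, the sketch below maps a hypothetical source record onto a target with a 30-character name field and integer-coded statuses; the lengths and codes are invented.

    # Hypothetical target constraints: a 30-character name field and an
    # integer-coded status, where the source stores the status as text.
    TARGET_NAME_LENGTH = 30
    STATUS_MAP = {"Open": 1, "On Hold": 2, "Closed": 3}

    def map_customer(source):
        return {
            # Truncation: textual fields are shortened to fit the target field length.
            "name": source["name"][:TARGET_NAME_LENGTH],
            # Enumerated mapping: the drop-down value becomes an integer in the target.
            "status": STATUS_MAP[source["status"]],
        }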
Data Interchange – As the sophistication, complexity and/or frequency of the synchronisation increases, so does the requirement for more sophisticated means of transferring data.

Data Take On – Finally, there are issues around the amount of historical data, whether the load time for data is significant, and whether data cleansing of old or historical data needs to take place.

In subsequent posts I will describe the various strategies available to resolve these issues. Part II will describe the strategies available for synchronising data between applications, their advantages and pitfalls.
