Data centralisation and quality assurance reporting

20 Sep ’16

If you want to improve something, you first need to centralise your data. Otherwise, your reporting will be incomplete and your triggers will have little to work with.

Data quality

Our clients trust us with their data and tend to have two questions when we centralise it:

  1. Were the data processed correctly? What about duplicates?
  2. How do we know that everything is OK?

What we’ve done

  1. Selected a processing workflow
    • ensured security
    • examined about 50 complex cases
  2. Developed data quality reporting
    • we report on the whole database and each import separately
    • we show duplicate and similar data
    • we find typos in names, invalid contact info, name and gender mismatches
  3. Started storing a revision history for each customer’s personal data

What makes centralisation difficult?

Here are several examples:

  • Someone logs in to the site with one email address, then places an order for their mother, enters her contact details, and clicks «subscribe» in the order form. Without deleting the original email address, you need to send the order SMS and the opt-in request to the mother.
  • You’d think that a phone number could be used as a unique ID, but there are complications. For example, several different people gave the fictitious number 900-111-11-11. You can’t consider them all the same customer.
  • You have a customer named John, born in 1984. Now you receive information that the correct name is Peter Johnson. Should you change only the name and keep the year of birth?
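The phone-number case above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual processing workflow: each record is assumed to be a dict with `name` and `phone` keys, and `MAX_NAMES_PER_PHONE` is a hypothetical threshold. A phone number shared by several different names is treated as unreliable, so it is not used to merge customers.

```python
import re
from collections import defaultdict

# Hypothetical threshold: a phone shared by this many distinct names is
# treated as fictitious rather than as one customer's identifier.
MAX_NAMES_PER_PHONE = 3


def normalize_phone(raw: str) -> str:
    """Keep digits only, so '900-111-11-11' and '9001111111' compare equal."""
    return re.sub(r"\D", "", raw)


def unreliable_phones(records: list[dict]) -> set[str]:
    """Return normalized phones linked to too many distinct names to be a safe merge key."""
    names_by_phone: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        phone = normalize_phone(rec.get("phone", ""))
        if phone:
            names_by_phone[phone].add(rec["name"].strip().lower())
    return {p for p, names in names_by_phone.items() if len(names) >= MAX_NAMES_PER_PHONE}


records = [
    {"name": "Anna", "phone": "900-111-11-11"},
    {"name": "Boris", "phone": "9001111111"},
    {"name": "Olga", "phone": "900 111 11 11"},
    {"name": "Ivan", "phone": "+7 912 345-67-89"},
]
print(unreliable_phones(records))  # {'9001111111'}
```

Three different people gave the same fictitious number, so it is flagged; Ivan’s number belongs to one person and remains usable for matching.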

We’ve worked through many cases like these and defined how the system should behave in each one. The process was long and painful, but we are very pleased with the result.

How can I check everything?

We’ve built three reports:


The «Overview: external systems, discount cards, unusual events» report helps you view:

  • whether all customers have been uploaded from every system
  • whether the data cover the correct period
  • information about discount cards
  • the number of duplicates and similar accounts
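To show the difference between duplicates and similar accounts, here is a hedged Python sketch. It assumes customers are dicts with `name` and `email` keys, counts pairs sharing a normalized email as duplicates, and counts pairs with different emails but near-identical names as similar. The similarity measure (`difflib.SequenceMatcher`) and the `SIMILARITY_THRESHOLD` value are illustrative choices, not the method used in the reports.

```python
from difflib import SequenceMatcher
from itertools import combinations

SIMILARITY_THRESHOLD = 0.85  # hypothetical cut-off for "similar" names


def normalize_email(raw: str) -> str:
    return raw.strip().lower()


def duplicate_and_similar_counts(customers: list[dict]) -> tuple[int, int]:
    """Count pairs sharing an email (duplicates) and pairs with
    different emails but near-identical names (similar accounts)."""
    duplicates = similar = 0
    for a, b in combinations(customers, 2):
        if a["email"] and normalize_email(a["email"]) == normalize_email(b["email"]):
            duplicates += 1
        elif SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio() >= SIMILARITY_THRESHOLD:
            similar += 1
    return duplicates, similar


customers = [
    {"name": "Anna Petrova", "email": "anna@example.com"},
    {"name": "Anna Petrova", "email": "ANNA@example.com "},
    {"name": "Ana Petrova", "email": "ana.p@example.com"},
    {"name": "Ivan Sidorov", "email": "ivan@example.com"},
]
print(duplicate_and_similar_counts(customers))  # (1, 2)
```

Comparing every pair is quadratic, which is fine for an illustration; a production system would typically compare only records that already share some attribute.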

You should look at the «Customer and order distribution» report if you have doubts that:

  • all orders were uploaded
  • all orders are associated with the correct customers
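Both doubts can be expressed as simple checks. The Python sketch below is hypothetical: it assumes each order has `id`, `source` and `customer_id` fields, and that the expected order count per source system is known, for example from the source system itself. It reports sources whose uploaded count falls short and orders that point at unknown customers.

```python
from collections import Counter


def order_checks(orders: list[dict], customer_ids: set[str],
                 expected_per_source: dict[str, int]) -> tuple[dict[str, int], list[int]]:
    """Return (shortfall per source where counts differ, IDs of orders with unknown customers)."""
    uploaded = Counter(o["source"] for o in orders)
    # Counter returns 0 for sources with no uploaded orders at all.
    shortfall = {src: exp - uploaded[src]
                 for src, exp in expected_per_source.items() if uploaded[src] != exp}
    orphans = [o["id"] for o in orders if o["customer_id"] not in customer_ids]
    return shortfall, orphans


orders = [
    {"id": 1, "source": "web", "customer_id": "c1"},
    {"id": 2, "source": "web", "customer_id": "c9"},
    {"id": 3, "source": "pos", "customer_id": "c2"},
]
print(order_checks(orders, {"c1", "c2"}, {"web": 2, "pos": 3}))  # ({'pos': 2}, [2])
```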

The «Name, gender, contact info: revisions and typos» report shows whether everything is OK with particularly important data for personalised communications:

  • how many phone numbers, email addresses and names are empty or have been edited
  • how many invalid contacts and typos there are
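Counting empty and invalid contacts might look like the Python sketch below. The rules are assumptions for illustration only: the email pattern is deliberately simplified (it is not a full RFC 5322 validator), and `ACCEPTED_PHONE_LENGTHS` is a hypothetical set of valid digit counts.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified check, not RFC 5322
ACCEPTED_PHONE_LENGTHS = (10, 11)  # hypothetical: digit counts treated as valid


def contact_summary(customers: list[dict]) -> dict[str, int]:
    """Count empty and invalid emails and phone numbers."""
    stats = {"empty_email": 0, "invalid_email": 0, "empty_phone": 0, "invalid_phone": 0}
    for c in customers:
        email = (c.get("email") or "").strip()
        phone = re.sub(r"\D", "", c.get("phone") or "")  # digits only
        if not email:
            stats["empty_email"] += 1
        elif not EMAIL_RE.match(email):
            stats["invalid_email"] += 1
        if not phone:
            stats["empty_phone"] += 1
        elif len(phone) not in ACCEPTED_PHONE_LENGTHS:
            stats["invalid_phone"] += 1
    return stats


sample = [
    {"email": "a@example.com", "phone": "+7 912 345-67-89"},
    {"email": "not-an-email", "phone": ""},
    {"email": None, "phone": "123"},
]
print(contact_summary(sample))
```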

Every report lets you drill down to example customers. For example, we can see what the data look like for customers with typos in their names:


Or we can see what «incorrect gender» refers to:
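A name and gender mismatch check can be sketched as a lookup against a reference table. In this Python example, `NAME_GENDER` is a tiny hypothetical table and the `id`, `first_name` and `gender` fields are assumed; names absent from the table are skipped rather than flagged, so unusual or ambiguous names do not produce false alarms.

```python
# Hypothetical reference table mapping first names to their usual gender.
NAME_GENDER = {"john": "m", "peter": "m", "anna": "f", "olga": "f"}


def gender_mismatches(customers: list[dict]) -> list[int]:
    """Return IDs of customers whose recorded gender contradicts the table.
    Names missing from the table, or records without a gender, are skipped."""
    flagged = []
    for c in customers:
        expected = NAME_GENDER.get(c["first_name"].strip().lower())
        if expected and c.get("gender") and c["gender"] != expected:
            flagged.append(c["id"])
    return flagged


people = [
    {"id": 1, "first_name": "Anna", "gender": "m"},
    {"id": 2, "first_name": "Peter", "gender": "m"},
    {"id": 3, "first_name": "Sasha", "gender": "f"},
]
print(gender_mismatches(people))  # [1]
```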


You can view each report for any customer segment or specific import.


You can view the full revision history for each customer. This goes for any customer in the database, not just the examples in the report.

For example, say we want to see what «there was an email address, but it was revised» refers to. We can see the source of each revision, the import number and the system user. In other words, if anything odd happens, it’s clear where to look and whom to ask for more detail.
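A revision record holding the details mentioned above (source, import number, system user) could be modelled as in this Python sketch. The field names and the `field_history` helper are hypothetical, chosen for illustration; they are not the product’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Revision:
    customer_id: int
    field_name: str             # e.g. "email" or "phone"
    old_value: Optional[str]    # None when the field was previously empty
    new_value: Optional[str]
    source: str                 # external system the change came from
    import_id: Optional[int]    # None for manual edits
    user: Optional[str]         # None for automated imports
    changed_at: datetime


def field_history(revisions: list[Revision], customer_id: int, field_name: str) -> list[Revision]:
    """Return one customer's revisions of one field, oldest first."""
    return sorted(
        (r for r in revisions if r.customer_id == customer_id and r.field_name == field_name),
        key=lambda r: r.changed_at,
    )


revs = [
    Revision(7, "email", "old@example.com", "new@example.com", "crm", 42, None, datetime(2016, 9, 2)),
    Revision(7, "email", None, "old@example.com", "web", 40, None, datetime(2016, 9, 1)),
    Revision(7, "phone", None, "9123456789", "pos", 41, None, datetime(2016, 9, 1)),
]
for r in field_history(revs, 7, "email"):
    print(r.import_id, r.source, r.old_value, "->", r.new_value)
```

Reading the history in order answers «there was an email address, but it was revised»: the address arrived in import 40 from one system and was replaced in import 42 from another.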


If you click «click to view», you can see revisions for all additional fields, as well as the reason the records were merged.



A well-designed workflow for processing, reporting and revision history for each customer gives you confidence in your data quality. If you have doubts or don’t understand something, you know where to go and what to do.

Integrations don’t always run smoothly: reporting and revision histories help you identify problems, assess their impact, and understand what needs to be fixed and where. Our customers find this very valuable.

All of this functionality and knowledge is available for free to our subscribers on any plan.

Responsible for the feature:

  • Nikolai Andreichuk, Architect
  • Leonid Gureev, Developer
  • Sergei Korobkov, Developer
  • Svetlana Selmeneva, Product Manager

