ETL Tools: How to get clean marketing data-pipelines

data harmonization blog

What are ETL tools?

ETL tools are used to extract, transform and load data from different sources in different formats. Marketing data by default is siloed, which means you have a different data set in Google Ads, Facebook Ads, Google Analytics and Salesforce as they are not built on the same technologies. The different sources tend to also have different terminologies. The more complex the technology landscape you are operating in gets, the harder it is to harmonise your data. Larger organisations will have different campaigns running in different countries with different currencies using different time-zones and naming structures. Depending on the organisation the data ownership lies with different teams and people with different KPI’s.

Why ETL tools are important

While everyone talks about data modelling, machine learning and AI, data scientists still spend too much of their time on cleaning and organising data. A survey conducted by Crowdflower shows that

Data preparation accounts for about 80% of the work of data scientists.

datai preparation

With cleaning and organising data taking up 60% of the time and collecting data sets taking up 19%, there is only 21% time left for non-repetitive growth tasks.

In performance marketing, we often see the pattern that marketers jump into conclusions based on incomplete data sets and then optimise on the wrong set of outcomes. This often leads to negative outcomes.

The top 3 mistakes we see in this area are:

  • Optimising based on a last-click attribution model
  • Double counting of conversion by looking at platforms in isolation (e.g. Facebook and Google Ads conversion both take credit for conversions)
  • Not including order cancellations or bad leads into the optimisations

That’s why data harmonisation needs to happen before optimisations can be made.

How to get started with an ETL tool

In this section, we’re going to explain the basics for data harmonisation for marketing. We basically follow an ETL process. ETL stands for Extract, Transform and Load.

  1. Get all the data in one place (Extract)

    As the first step, you want to get all data into a centralised place in the raw format. Historically data was often processed on retrieval but as storage is getting cheaper and cheaper, we recommend storing unprocessed raw data. It helps in case you want to reload raw data or transform data differently in the future.
    ETL Tools Step 1

    Note: Some data sources only allow data for certain historical date-ranges to be loaded. Better be safe and store the data

  2. Transform the data

    Now we want to transform the data to match a standard set of dimensions. By looking at the raw data it becomes clear quickly that naming conventions between the different sources are totally different. Date formats can be in various formats (e.g. 2020-11-02 or 2/11/2020), currencies might be different from source to source, campaign namings might be different.ETL Tools Step 2

  3. Load the data

    Now that you have all the data in the right format you should be able to combine them using simple joins or lookups. This will allow you to gain powerful insights into your customer journeys and media performance.ETL Tools Step 3

Wrapping it up: Get the data right first

If your data is not normalised yet, start with this as step one. Getting this right is extremely critical and will lead to positive outcomes. Needless to say, data, if analysed correctly is the key to understanding what your next move should be to stay ahead in this rapidly changing environment.

If you are interested to know more about ETL tools and technologies which can help you extract, transform and load your data please contact us for a demo.