Exciting times: Big Data Warehousing in the Azure stack

For our back-end team, the past months have been exciting, to say the least. Not only did COVID-19 force us into a new way of working, we also got the opportunity to do a project in a technology stack that is new to us! So despite everyone working from home, and with lots of Teams calls, we managed to deliver a completely new E-commerce and Digital Marketing Data Warehouse in Microsoft Azure. In a series of blog posts we would like to introduce you to the different pieces of this project, which resulted in the recent successful go-live.

The E-commerce and Digital Marketing teams asked us to create a new data warehouse. The goal was to combine data from existing and new sources into a superset of dashboards with various reporting angles, and to give data scientists structured data at their fingertips for their analyses. In addition, the customer expected reduced operational cost and complete data ownership in a system that is fit for the future. Ultimately, all of this should enable the E-commerce and Digital Marketing teams to improve their Performance Marketing results by making decisions based on real data. Together with the customer, we decided to go with a Microsoft Azure stack, using Azure Synapse as well as Event Hubs, Databricks, Data Factory, Logic Apps, Storage Accounts, Power BI and more.

So what kind of data are we talking about? First of all, there are webshops and brand websites. This means hit data is available via Google Analytics, giving us insight into the events that happened when someone visited a website, from traffic source all the way to checkout. Next to that, the order management system, Magento, was connected to provide SKU-level revenue and profit figures. Customer profiles are stored in Salesforce Sales Cloud and email campaigns in Salesforce Marketing Cloud. BlueConic is another system in which customer profile data is kept. Finally, annual targets, Performance Marketing spend, product master data and the like can be extracted from flat files or the SAP ERP system. The second blog post will go deeper into the source system setup, such as Google Analytics using Google BigQuery and PolyBase loading into Synapse, as well as BlueConic with Event Hubs, Databricks and Data Factory.

Azure Synapse (formerly known as SQL Data Warehouse) is a massively parallel processing (MPP) data warehouse solution for big data applications. In the past months we learned how to create optimized data models that make use of the parallel architecture. We also experienced how it integrates with Azure Blob Storage for loading via PolyBase. The third blog post will be dedicated entirely to Azure Synapse.
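
To give a feel for why the data model has to respect the parallel architecture, here is a minimal Python sketch of hash distribution. It is not how Synapse works internally: the hash function, the helper names (`distribution_for`, `rows_per_distribution`, `skew`) and the sample keys are all made up for illustration. The one thing taken from the platform is that a dedicated SQL pool spreads each table over 60 distributions. The sketch compares a high-cardinality key with a low-cardinality one.

```python
import hashlib
from collections import Counter

# A Synapse dedicated SQL pool spreads every table over 60 distributions.
DISTRIBUTIONS = 60


def distribution_for(key: str, n: int = DISTRIBUTIONS) -> int:
    """Map a distribution key to a bucket.

    Illustrative only: Synapse uses its own internal hash, not SHA-256.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def rows_per_distribution(keys) -> Counter:
    """Count how many rows would land on each distribution."""
    return Counter(distribution_for(k) for k in keys)


def skew(counts: Counter, n: int = DISTRIBUTIONS) -> float:
    """Busiest distribution relative to a perfectly even share (1.0 = even)."""
    total = sum(counts.values())
    return max(counts.values()) / (total / n)


# Hypothetical sample data: 60,000 order lines.
order_ids = [f"order-{i}" for i in range(60_000)]
country_codes = [("NL", "DE", "BE")[i % 3] for i in range(60_000)]

by_order = rows_per_distribution(order_ids)        # many distinct values
by_country = rows_per_distribution(country_codes)  # only three distinct values

print(f"order id key: {len(by_order)} distributions used, skew {skew(by_order):.2f}")
print(f"country key:  {len(by_country)} distributions used, skew {skew(by_country):.2f}")
```

With a key such as an order ID, the rows spread over all distributions and every node gets a similar share of the work. With a key that has only a handful of distinct values, most distributions stay empty and a few do all the work, which is the kind of skew an optimized data model tries to avoid.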

The DevOps organization is now in charge of supporting and enhancing the solution we built. Spoiled as we are by SAP tools, we were looking for similar source code control and tooling to move objects through the landscape. For Synapse, we decided to implement Azure DevOps, which hosts a Git code base and pipelines to deploy changes to the Quality Assurance and Production environments. This will be covered in the concluding blog post of this series.

We sincerely hope you enjoy reading about our journey into e-commerce and digital marketing on the Azure stack! Feel free to leave comments.


This article belongs to
  • azure
  • digital marketing
  • e-commerce
  • Paul Schoondermark