With every data analytics & reporting project, there is always the requirement to load data from external applications and systems into the analytics platform. We call this Data Integration or in a more modern parlance Data Engineering.
There is a vibrant ecosystem of tools available for ETL & ELT, Microsoft SSIS (SQL Server Integration Services), Talend, Matillion, IBM Data Stage, Informatica and others have been available for many years. The more modern ones are Azure Data Factory & Matillion that are cloud-only or cloud-first solutions.
Our knowledge of both ETL and ELT helps us design the best solution for your business. Your organisation must pick the best tools and employ the correct methodology for project success.
Bringing your business data into a data warehouse usually involves one of two different methods. ETL and ELT. While on the face of it, these two topologies look pretty similar but depending on volumes of data, they could perform very differently.
Both use the same three steps. Below we briefly look at these stages and how their order can impact your business.
Extract – getting the raw data from your different data sources, for example, financial or manufacturing databases.
Transform – converting the data into a standard format the target database (Data Warehouse) and BI tools will use. Transforming data may include cleaning and fixing data issues or enriching the dataset.
ETL is the more traditional model for data warehousing. It has been the standard, having been around for well over two decades. It is a well tested and trusted method to extract data from your business systems to power your reports.
The main advantage of ETL is that it’s relatively easy to use. Once you’ve decided on the format of your data, the graphical front ends of most ETL tools, make it pretty straight forward map out.
However, it does come with some issues. By far the biggest is the time taken for the data to be available to the end-user and coupled with relative inflexibility once the data warehouse is live may mean that ETL is a potentially poor choice for a company with big data.
ELT is the newer version of the model which has the last two steps switched. That doesn’t sound like much of a difference, but in terms of large scale businesses, it can be huge.
ELT means “Extract” and “Load” the raw data and loading it to a storage system for later processing (Transform). You do not have to worry at that stage about business rules, data cleansing, take the data and save it.
There are many systems geared up to store massive amounts of quickly arriving unstructured data sets. Amazon S3, Azure Storage, Hadoop etc. can be used as Data Lakes and we will talk more about those below.