ETL – Extract, Transform & Load
ETL is the more traditional model for data warehousing. It has been the standard for well over two decades and is a well-tested, trusted method for extracting data from your business systems to power your reports.
The main advantage of ETL is that it’s relatively easy to use. Once you’ve decided on the format of your data, the graphical front ends of most ETL tools make it pretty straightforward to map out.
However, it does come with some issues. By far the biggest is the time taken for the data to become available to the end user. Coupled with the relative inflexibility of the data warehouse once it is live, this may make ETL a poor choice for a company with big data.
The best use case for ETL is a company ingesting relational data from existing database systems, where data volumes are relatively low and the data changes slowly.
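To make the order of the steps concrete, here is a minimal sketch of an ETL run in Python. It is an illustration only: the source rows, the `orders` table, and the cleansing rule (drop rows with no customer name) are all made up for the example, and an in-memory SQLite database stands in for the data warehouse. The point to notice is that the data is cleaned before it is stored.

```python
import sqlite3

# Hypothetical rows, standing in for what a source business system might return.
SOURCE_ROWS = [
    {"order_id": 1, "customer": " alice ", "amount": "19.99"},
    {"order_id": 2, "customer": "BOB", "amount": "5.00"},
    {"order_id": 3, "customer": "", "amount": "12.50"},  # rejected by the rule below
]


def extract():
    """Extract: pull the rows out of the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: apply cleansing and business rules before anything is stored."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:  # assumed business rule: an order must have a customer
            continue
        amount_cents = round(float(row["amount"]) * 100)
        cleaned.append((row["order_id"], name, amount_cents))
    return cleaned


def load(conn, rows):
    """Load: write the already-cleaned rows into a fixed warehouse schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "amount_cents INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in ETL order and return what reached the warehouse."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT order_id, customer, amount_cents FROM orders ORDER BY order_id"
    ).fetchall()
```

Because the schema and rules are fixed before loading, a change in reporting needs means changing the transform step and reloading, which is the inflexibility described above.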
ELT – Extract, Load & Transform
ELT is the newer version of the model, with the last two steps switched. That doesn’t sound like much of a difference, but for large-scale businesses it can be huge.
ELT means you “Extract” the raw data and “Load” it into a storage system for later processing (“Transform”). At that stage you do not have to worry about business rules or data cleansing; you just take the data and save it.
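The same idea can be sketched in Python with the steps reordered. Again, this is only an illustration under assumptions: a local directory stands in for the storage system, the file name `orders_raw.jsonl` and the record fields are hypothetical, and the transform shown (total spend per customer) is just one example of processing you might decide on later.

```python
import json
import tempfile
from pathlib import Path


def extract_and_load(records, lake_dir):
    """Extract + Load: save every record exactly as it arrived, one JSON object per line.

    No cleansing and no business rules are applied at this stage.
    """
    raw_path = Path(lake_dir) / "orders_raw.jsonl"  # hypothetical file name
    with raw_path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return raw_path


def transform(raw_path):
    """Transform: run later, once the reporting needs are known.

    Here the assumed need is total spend per customer, in cents.
    """
    totals = {}
    with raw_path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            name = str(record.get("customer", "")).strip().title()
            if not name:  # cleansing happens here, not at load time
                continue
            cents = round(float(record["amount"]) * 100)
            totals[name] = totals.get(name, 0) + cents
    return totals


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lake:
        path = extract_and_load(
            [
                {"customer": " alice ", "amount": "19.99"},
                {"customer": "", "amount": "1.00"},
                {"customer": "ALICE", "amount": "5.00", "note": "extra field"},
            ],
            lake,
        )
        print(transform(path))
```

Note that the raw file keeps every record, including the one with no customer and the one with an extra field. If the rules change, you rerun a different transform over the same stored data rather than extracting it again.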
Many systems are geared up to store massive amounts of quickly arriving unstructured data. Amazon S3, Azure Storage, Hadoop and similar systems can be used as Data Lakes, and we will talk more about those below.
By taking advantage of cloud technologies, data volume becomes less of an issue, meaning you can scale easily and quickly. However, as the datasets get more substantial, the cost of the cloud infrastructure increases due to additional storage and processing requirements.