Azure Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data.
You can create data integration solutions using the data factory service that can ingest data from various stores, transfer/process the data, and publish the result data to the data stores.
It enables enterprises to easily ingest data from multiple on-premises and cloud sources and move it to where it needs to go. You can prepare and partition your data as you ingest it, or apply pre-processing steps.
Create data pipelines that move and transform data, and then run the pipelines on a specified schedule (hourly, daily, weekly etc.).
It provides rich visualisations to display the lineage and dependencies between your data pipelines, and lets you monitor all your pipelines from a single unified view.
What can we use Data Factory for?
Use it to ingest data from multiple on-premises and cloud sources.
Schedule, orchestrate, and manage the data transformation and analysis process.
Transform raw data into finished or shaped data that’s ready for consumption by BI tools or by your on-premises or cloud applications and services.
Manage your entire network of data pipelines at a glance to identify issues and take action.
What is a pipeline?
A pipeline is a logical grouping of activities, used to group activities into a unit that together performs a task. To understand pipelines, you first need to understand activities.
What are activities?
Activities define the actions to perform on your data. For example, you may use a Copy activity to copy data from one data store to another. Similarly, you may use a Hive activity, which runs a Hive query on an Azure HDInsight cluster to transform or analyse your data. You may also choose to create a custom .NET activity to run your own code.
Data movement activities
Copy Activity in Data Factory copies data from a source data store to a sink data store. Data from any supported source can be written to any supported sink.
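As a sketch, a Data Factory (v1) pipeline containing a single Copy activity is defined in JSON along these lines. The pipeline, activity, and dataset names here (CopyBlobToSqlPipeline, AzureBlobInput, AzureSqlOutput) are illustrative placeholders, not names the service requires:

```json
{
  "name": "CopyBlobToSqlPipeline",
  "properties": {
    "description": "Copy data from Azure Blob Storage to Azure SQL Database",
    "activities": [
      {
        "name": "BlobToSqlCopy",
        "type": "Copy",
        "inputs": [ { "name": "AzureBlobInput" } ],
        "outputs": [ { "name": "AzureSqlOutput" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        },
        "scheduler": { "frequency": "Hour", "interval": 1 }
      }
    ],
    "start": "2016-01-01T00:00:00Z",
    "end": "2016-01-02T00:00:00Z"
  }
}
```

The start and end properties define the active period of the pipeline, which lines up with the scheduling described earlier (hourly, daily, weekly and so on).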
Data transformation activities
A data transformation activity transforms data into the desired format and shape. Transformation activities can be added to pipelines either individually or chained with other activities.
What are linked services?
Linked services define the information needed for Data Factory to connect to external resources (examples: Azure Storage, on-premises SQL Server, Azure HDInsight).
Linked services are used for two purposes in Data Factory:
- To represent a data store including, but not limited to, an on-premises SQL Server, Oracle database, file share, or an Azure Blob Storage account.
- To represent a compute resource that can host the execution of an activity. For example, the HDInsight Hive activity runs on an HDInsight Hadoop cluster.
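For instance, a linked service for an Azure Storage data store is a small JSON definition holding the connection information. This is an illustrative sketch in the v1 JSON format; the name and the placeholder account values are assumptions:

```json
{
  "name": "AzureStorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
    }
  }
}
```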
Linked services link data stores to an Azure data factory. Datasets represent data structures within the data stores. For example, an Azure Storage linked service provides the connection information for Data Factory to connect to an Azure Storage account.
An Azure Blob dataset specifies the blob container and folder in Azure Blob Storage from which the pipeline should read the data. Similarly, an Azure SQL linked service provides connection information for an Azure SQL database, and an Azure SQL dataset specifies the table that contains the data.
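A blob dataset of this kind could be sketched in v1 JSON as follows. The container and folder path, the referenced linked service name (AzureStorageLinkedService), and the hourly availability window are illustrative assumptions:

```json
{
  "name": "AzureBlobInput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "AzureStorageLinkedService",
    "typeProperties": {
      "folderPath": "mycontainer/input/",
      "format": { "type": "TextFormat", "columnDelimiter": "," }
    },
    "availability": { "frequency": "Hour", "interval": 1 },
    "external": true
  }
}
```

Here external is set to true to indicate the data is produced outside the data factory rather than by another pipeline, and availability describes how often a slice of the data becomes available for processing.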