Azure Data Factory
by Mohsen, 02-22-2022 at 10:12 AM (63429 views)
Purpose of and reasons for using Azure Data Factory for businesses and customers:
1. Azure Data Factory is a fully managed, serverless data integration service that enables you to integrate data sources visually and to construct ETL and ELT processes either code-free or by writing your own code. (Put simply, it lets you move data from one service to another through automated pipelines. You can easily and automatically receive data from or send data to your customers, or move it between your own services.)
ETL:
E: extraction: pull data from all your data sources. This extraction can be either from structured relational databases or unstructured data sources such as images and emails.
T: transformation: clean, process, and convert data, fitting it into the existing format in your data storage.
L: load: load data into the storage destination.
ETL vs ELT: ETL transforms your data before loading it, while ELT transforms the data only after it has been loaded into your warehouse.
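To make the ETL vs ELT difference concrete, here is a minimal sketch in plain Python. It has nothing to do with Azure itself: it uses the standard-library sqlite3 module as a stand-in for the warehouse, and the sample data, table names, and function names are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for a source system (one row has no amount).
RAW_CSV = "id,amount\n1, 10.5 \n2,\n3, 7 \n"


def extract(raw: str) -> list[dict]:
    """E: pull rows out of the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))


def etl(raw: str) -> list[tuple]:
    """ETL: clean the rows in application code, then load only the clean rows."""
    rows = extract(raw)
    # T: drop rows with a blank amount and convert types before loading.
    clean = [(int(r["id"]), float(r["amount"])) for r in rows if r["amount"].strip()]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", clean)  # L
    return conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()


def elt(raw: str) -> list[tuple]:
    """ELT: load the raw text as-is, then transform it with SQL inside the store."""
    rows = extract(raw)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (id TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?)",
        [(r["id"], r["amount"]) for r in rows],
    )  # L
    # T: the cleaning happens after the load, inside the database.
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT CAST(id AS INTEGER) AS id, CAST(TRIM(amount) AS REAL) AS amount "
        "FROM staging WHERE TRIM(amount) <> ''"
    )
    return conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()


if __name__ == "__main__":
    print("ETL result:", etl(RAW_CSV))
    print("ELT result:", elt(RAW_CSV))
```

Both functions end up with the same clean table. The only difference is where the cleaning runs: in your own code before the load (ETL), or inside the storage engine after the load (ELT).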
The scenario below is just one simple example of applying Azure Data Factory. You can have multiple pipelines with complex setups for various scenarios.
Simple Example: One of our customers has some data inside CSV files in his company's blob storage. He has different applications that generate and consume the data inside the CSV files.
The customer wants the data inside those CSV files to be sent to our SQL Server table every hour, so that our application can then process it.
In this scenario, the customer uses Azure Data Factory to push data into our SQL Server tables automatically. We just need to give the customer the SQL connection details and allow the Azure Data Factory IP addresses through our firewall. (We use the “service tag” option on our Azure Network Security Group (NSG), so every time Microsoft changes the Data Factory IP addresses, the rule is updated automatically.)
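As a rough illustration of the firewall part, the sketch below builds the body of an NSG inbound rule as a Python dictionary. The property names follow the NSG security rule format as I understand it, and "DataFactory" is the service tag name as I know it (regional variants also exist), so please check both against the Azure documentation. The rule name, the priority, and port 1433 (the SQL Server default) are assumptions for this example, not values from our real setup.

```python
import json


def build_adf_inbound_rule(sql_port: int = 1433, priority: int = 300) -> dict:
    """Build an NSG inbound rule body that allows Data Factory traffic to SQL Server.

    The rule name, priority, and port are illustrative assumptions.
    """
    return {
        "name": "Allow-DataFactory-To-Sql",  # hypothetical rule name
        "properties": {
            "direction": "Inbound",
            "access": "Allow",
            "protocol": "Tcp",
            "priority": priority,
            # A service tag instead of fixed IPs: Azure maintains the address list.
            "sourceAddressPrefix": "DataFactory",
            "sourcePortRange": "*",
            "destinationAddressPrefix": "*",
            "destinationPortRange": str(sql_port),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_adf_inbound_rule(), indent=2))
```

The point of using the service tag as the source is that no IP address appears in the rule at all, so nothing has to be edited when the Data Factory address ranges change.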
The scenario above can be set up as follows:
You can have multiple Copy activities, but each one always needs two datasets (a source and a sink).
1. Create a pipeline inside your Azure Data Factory.
2. Create a linked service for the source. (Note: pick the correct source type. In the example above, we use Azure Blob Storage.)
3. Create a source dataset (Azure Blob Storage in the example above) and choose the correct format (we use CSV).
4. Create a linked service for the destination. (Note: pick the correct type. In the example above, we use SQL Server on a virtual machine.)
5. Create a sink dataset.
6. Add the desired activity from the pipeline tab. (In the example above, we used the “Copy” activity.)
You can check the Mapping tab. (The data structure of the CSV source and the destination SQL table should be identical, and the table should already exist.)
You can write your own code in the “Pre-copy script” field on the “Sink” tab, which lets you run a query before the data is copied. For example, you can delete the previous/old data.
7. Set up triggers to suit your scenario.
8. Click Publish.
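If you open the code view in the Data Factory UI, what you built in the steps above is stored as JSON. The sketch below rebuilds roughly that shape in Python for the Copy activity (with a pre-copy script) and an hourly schedule trigger. The property names follow the pipeline and trigger JSON format as I understand it, so compare them with what your own factory exports. The pipeline, dataset, trigger, and table names are all hypothetical.

```python
import json

PIPELINE_NAME = "CopyCsvToSqlPipeline"  # hypothetical pipeline name


def build_pipeline(source_dataset: str, sink_dataset: str, pre_copy_sql: str) -> dict:
    """Pipeline with one Copy activity: one source dataset in, one sink dataset out."""
    return {
        "name": PIPELINE_NAME,
        "properties": {
            "activities": [
                {
                    "name": "CopyCsvToSql",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        # The pre-copy script runs on the sink before rows are copied.
                        "sink": {"type": "SqlServerSink", "preCopyScript": pre_copy_sql},
                    },
                }
            ]
        },
    }


def build_hourly_trigger(start_time_utc: str) -> dict:
    """Schedule trigger that runs the pipeline once every hour."""
    return {
        "name": "HourlyTrigger",
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Hour",
                    "interval": 1,
                    "startTime": start_time_utc,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {"pipelineReference": {"referenceName": PIPELINE_NAME, "type": "PipelineReference"}}
            ],
        },
    }


if __name__ == "__main__":
    pipeline = build_pipeline(
        "SourceCsvDataset",              # hypothetical dataset from step 3
        "SinkSqlDataset",                # hypothetical dataset from step 5
        "DELETE FROM dbo.CustomerData",  # hypothetical table; clears the old data
    )
    print(json.dumps(pipeline, indent=2))
    print(json.dumps(build_hourly_trigger("2022-02-22T00:00:00Z"), indent=2))
```

Notice that the Copy activity refers to exactly two datasets, one in "inputs" and one in "outputs", which matches the rule that every Copy activity needs a source and a sink.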
This was a simple scenario to give you an idea of how Azure Data Factory can be used.
References:
https://www.youtube.com/watch?v=EpDkxTHAhOs
https://blog.panoply.io/etl-vs-elt-the-difference-is-in-the-how
Written by: Mohsen Pourghorbani Dinachali
Copyright: This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright Notice: You can use or share this article for free, but you must mention the writer's name (Written by: Mohsen Pourghorbani Dinachali) and share the link: https://forum.golzarion.com/entry.php?b=29&langid=1