Data Pipelines

A data pipeline, sometimes referred to as an ETL pipeline, is a sequence of ETL jobs that work together to transform data into a form consumable by one or more data products.

Generally, each ETL job in a data pipeline extracts data from one or more data sources, transforms it for some particular purpose, then loads it into a new data store. Subsequent ETL jobs consume data from that store, transform it, and load the results into their own data stores, and so on.
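The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the stage names, records, and transforms are all hypothetical, and each "data store" is just an in-memory list standing in for a real database or file system.

```python
# Illustrative two-stage pipeline: each stage extracts from a source store,
# transforms the records, and loads them into its own store; the next
# stage consumes that store. All names and data here are made up.

def etl_stage(source, transform):
    """Extract every record from `source`, apply `transform`, and
    return the transformed records as this stage's output store."""
    return [transform(record) for record in source]

# Stage 1: extract raw events and transform them into cleaned records
# (here, parsing the amount field into an integer).
raw_events = [{"user": "a", "amount": "10"},
              {"user": "b", "amount": "25"},
              {"user": "a", "amount": "5"}]
cleaned = etl_stage(
    raw_events,
    lambda r: {"user": r["user"], "amount": int(r["amount"])},
)

# Stage 2: consume stage 1's store and aggregate per user, producing
# a summary a downstream data product could serve.
totals = {}
for record in cleaned:
    totals[record["user"]] = totals.get(record["user"], 0) + record["amount"]

print(totals)  # → {'a': 15, 'b': 25}
```

In a real pipeline each stage would typically run as a separate scheduled job and persist its output to durable storage, but the structure, one store feeding the next stage's extract, is the same.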

Deeper Knowledge on Data Pipelines

Apache Kafka

A distributed event streaming platform for data pipelines and analytics

Apache Spark

A data processing engine for batch processing, stream processing, and machine learning

Data Products

Ways of making data available

Extract Transform Load (ETL)

Ways to extract, transform, and load data

Broader Topics Related to Data Pipelines

Data Engineering

Engineering approaches to data management

Data Products

Ways of making data available

Data Pipelines Knowledge Graph