Data Pipelines

A data pipeline, sometimes referred to as an ETL pipeline, is a sequence of ETL jobs that work together to transform data into a form consumable by one or more data products.

Generally, each ETL job in a data pipeline will extract data from one or more data sources, transform it for some particular purpose, then load it into a new data store. Subsequent ETL jobs consume data from that store, then transform and load it into their own data stores, and so on.
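The chaining described above can be sketched as two small ETL jobs, where the second job's source is the first job's target. The stores here are hypothetical in-memory dicts standing in for real databases or files.

```python
# Hypothetical data stores: plain dicts standing in for real storage.
raw_store = {"events": [{"user": "a", "amount": "10"}, {"user": "b", "amount": "5"}]}
staging_store = {}
reporting_store = {}

def etl_clean(source, target):
    # Extract raw events, transform string amounts into integers, load to staging.
    target["events"] = [
        {"user": e["user"], "amount": int(e["amount"])}
        for e in source["events"]
    ]

def etl_aggregate(source, target):
    # Extract cleaned events, transform into per-user totals, load for reporting.
    totals = {}
    for e in source["events"]:
        totals[e["user"]] = totals.get(e["user"], 0) + e["amount"]
    target["totals"] = totals

# Each job reads from the store the previous job loaded.
etl_clean(raw_store, staging_store)
etl_aggregate(staging_store, reporting_store)
print(reporting_store["totals"])  # {'a': 10, 'b': 5}
```

A real pipeline would replace the in-memory stores with databases, object storage, or message queues, and an orchestrator would run the jobs in dependency order.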

Deeper Knowledge on Data Pipelines

Apache Kafka

A distributed event streaming platform for data pipelines and analytics

Data Products

Ways of making data available

Extract Transform Load (ETL)

Ways to extract, transform, and load data

Apache Spark

A data processing engine for batch processing, stream processing, and machine learning

Broader Topics Related to Data Pipelines

Data Products

Ways of making data available

Data Engineering

Engineering approaches to data management

Data Pipelines Knowledge Graph