Data Pipelines

A data pipeline, sometimes referred to as an ETL pipeline, is a process that ingests raw data and transforms it into information that an organization can consume through one or more data products. Data pipelines may be chained together, with the output of one pipeline passed as input to the next.
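To make the stages concrete, here is a minimal sketch in Python. The sample records, stage functions, and the per-currency aggregation in the second pipeline are hypothetical; they only illustrate extract, transform, and load, and how the output of one pipeline can feed another.

```python
import csv
import io

# Hypothetical raw input: CSV text as it might arrive from an upstream system.
RAW_CSV = """order_id,amount,currency
1001,19.99,usd
1002,5.00,USD
1003,42.50,eur
"""

def extract(raw_text):
    """Ingest raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Normalize currency codes and cast types so downstream consumers get clean records."""
    return [
        {
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        }
        for row in rows
    ]

def load(rows):
    """Hand the cleaned records to a data product; here we just print them."""
    for row in rows:
        print(row)
    return rows

# One pipeline: extract -> transform -> load.
orders = load(transform(extract(RAW_CSV)))

# Chaining: the output of the first pipeline becomes the input of a second,
# which aggregates revenue per currency for a reporting product.
def aggregate_by_currency(rows):
    totals = {}
    for row in rows:
        totals[row["currency"]] = totals.get(row["currency"], 0.0) + row["amount"]
    return totals

print(aggregate_by_currency(orders))
```

In practice the extract step would read from a source system and the load step would write to a warehouse, object store, or message queue rather than printing, but the shape of the pipeline stays the same.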

Deeper Knowledge on Data Pipelines

Apache Kafka

A distributed event streaming platform for data pipelines and analytics
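As a rough illustration of where Kafka fits between pipeline stages, the sketch below uses the kafka-python client to publish raw events from one stage and consume them in another. The broker address, topic name, and event fields are assumptions chosen for the example, not part of any particular deployment.

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # kafka-python client

# Assumed broker address and topic name, for illustration only.
BROKER = "localhost:9092"
TOPIC = "raw-orders"

# Producer side: an upstream pipeline stage publishes raw events.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
)
producer.send(TOPIC, {"order_id": 1001, "amount": 19.99, "currency": "usd"})
producer.flush()

# Consumer side: a downstream stage reads and transforms the events.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda data: json.loads(data.decode("utf-8")),
)
# This loop blocks, waiting for new messages.
for message in consumer:
    event = message.value
    event["currency"] = event["currency"].upper()
    print(event)
```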

Apache Spark

A data processing engine for batch processing, stream processing, and machine learning
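A brief sketch of a batch transformation with PySpark is shown below; the input file, column names, and output path are assumptions chosen only to illustrate reading, transforming, and writing data in a pipeline step.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-batch").getOrCreate()

# Extract: read a (hypothetical) CSV file of orders.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# Transform: normalize currency codes and aggregate revenue per currency.
revenue = (
    orders
    .withColumn("currency", F.upper(F.col("currency")))
    .groupBy("currency")
    .agg(F.sum("amount").alias("total_amount"))
)

# Load: write the result where a data product (e.g. a dashboard) can read it.
revenue.write.mode("overwrite").parquet("revenue_by_currency")

spark.stop()
```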

Data Products

Ways of making data available

Data warehouses

Central repositories of integrated data used for reporting and analysis

Extract Transform Load (ETL)

Ways to extract, transform, and load data

Broader Topics Related to Data Pipelines

Data Engineering

Engineering approaches to data management

Data Products

Ways of making data available

Data Pipelines Knowledge Graph