Apache Kafka

Apache Kafka is an open-source, real-time event streaming platform from the Apache Software Foundation, typically used in microservice architectures and data pipelines.

Kafka servers run in clusters that can consist of a single server or of multiple servers that span data centers. Kafka clients are applications that read, write, and process events from Kafka. Kafka Connect imports and exports data as event streams, both between external data sources and the cluster and between clusters.

Conceptually, Kafka is centered on events: records or messages that consist of a key, a value, a timestamp, and optional metadata. Producers are client applications that publish events to Kafka, whereas consumers are client applications that subscribe to and process events. Kafka is designed for scalability and parallel processing, and it can provide guarantees such as ensuring that an event is processed exactly once.
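
As a rough illustration of the event structure described above, the following Python sketch models an event as a plain record with a key, a value, a timestamp, and optional metadata. The class name `Event`, its field names, and the `make_event` helper are hypothetical and chosen only for illustration; they are not part of any Kafka client library.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    """Illustrative model of an event (hypothetical, not a Kafka client API)."""

    key: Optional[bytes]   # identifies what the event is about
    value: bytes           # the payload
    timestamp_ms: int      # milliseconds since the Unix epoch
    headers: Dict[str, bytes] = field(default_factory=dict)  # optional metadata


def make_event(key: str, value: str,
               headers: Optional[Dict[str, bytes]] = None) -> Event:
    """Build an event from strings and stamp it with the current time."""
    return Event(
        key=key.encode("utf-8"),
        value=value.encode("utf-8"),
        timestamp_ms=int(time.time() * 1000),
        headers=dict(headers or {}),
    )


# Example: an event a producer might publish.
event = make_event("alice", "made a payment of $200 to bob")
print(event.key, event.value, event.headers)
```

In this sketch the key and value are raw bytes, which reflects the assumption that serialization happens in the client before an event is sent.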

Events are organized into topics, each of which is a persisted, ordered series of related events. Persistence means that consumers can process these events immediately or retroactively. Topics are further divided into partitions, and an event's key determines which partition it is written to. Partitioning plays a key role in how Kafka scales and in the functionality behind its guarantees.
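
The relationship between keys, partitions, and persistence can be sketched with a small in-memory model. This is a toy under stated assumptions, not how Kafka is implemented: `ToyTopic`, `choose_partition`, and `NUM_PARTITIONS` are hypothetical names, and CRC32 is a stand-in hash picked for the example rather than the hash a real Kafka client uses.

```python
import zlib
from typing import List, Tuple

NUM_PARTITIONS = 3  # assumed partition count for this example


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition; equal keys always map to the same partition.

    CRC32 is a stand-in hash for this sketch, not Kafka's own partitioner.
    """
    return zlib.crc32(key) % num_partitions


class ToyTopic:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.num_partitions = num_partitions
        self.partitions: List[List[Tuple[bytes, bytes]]] = [
            [] for _ in range(num_partitions)
        ]

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Store an event and return its (partition, offset)."""
        partition = choose_partition(key, self.num_partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int = 0) -> List[Tuple[bytes, bytes]]:
        """Return events from the given offset onward; nothing is deleted."""
        return self.partitions[partition][offset:]


topic = ToyTopic()
first = topic.append(b"alice", b"order placed")
topic.append(b"bob", b"order placed")
second = topic.append(b"alice", b"order shipped")

# Both "alice" events share a partition, and the second has a higher offset.
alice_partition = first[0]
alice_events = [v for k, v in topic.read(alice_partition) if k == b"alice"]
print(alice_events)
```

Because events stay in the list after being read, a consumer in this model can start from offset 0 at any later time, which mirrors the retroactive processing described above. Events that share a key land in the same partition and so keep their relative order.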

Broader Topics Related to Apache Kafka

Publish/Subscribe Pattern (Pub-sub)

Data Pipelines

Open-Source Software

Apache Software Foundation (ASF)

Microservices
