Apache Kafka

Apache Kafka is open-source software from the Apache Software Foundation that provides a real-time event streaming platform, typically used in microservice architectures and data pipelines.

Kafka servers are run in clusters that can consist of a single server, or multiple servers that span data centers. Kafka clients are applications that read, write, and process events from Kafka. Kafka Connect imports and exports data as event streams from various data sources to the cluster and between clusters.

Conceptually, Kafka is centered on events: records or messages that consist of a key, a value, a timestamp, and optional metadata. Producers are client applications that publish events to Kafka, whereas consumers are client applications that subscribe to and process events. Producers and consumers are decoupled from one another, which enables scalability and parallel processing. Kafka can also provide guarantees, such as ensuring that an event is processed exactly once.
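The event, producer, and consumer roles described above can be sketched with a small in-memory model. This is only an illustration, not the Kafka client API: the names Event, ToyLog, ToyProducer, and ToyConsumer are hypothetical, and the log lives in a Python list rather than on a cluster.

```python
from dataclasses import dataclass, field
import time
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: key, value, timestamp, optional metadata."""
    key: Optional[str]
    value: str
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    headers: Tuple[Tuple[str, str], ...] = ()  # optional metadata


class ToyLog:
    """In-memory stand-in for Kafka (an assumption): an append-only list."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # position of the newly stored event

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


class ToyProducer:
    """Publishes events; knows nothing about who will read them."""

    def __init__(self, log: ToyLog) -> None:
        self.log = log

    def publish(self, key: Optional[str], value: str) -> int:
        return self.log.append(Event(key=key, value=value))


class ToyConsumer:
    """Reads events at its own pace by tracking its own position."""

    def __init__(self, log: ToyLog) -> None:
        self.log = log
        self.next_offset = 0

    def poll(self) -> List[Event]:
        batch = self.log.read_from(self.next_offset)
        self.next_offset += len(batch)  # remember what has been handed out
        return batch
```

Because each consumer in this sketch tracks its own position, two consumers can read the same events independently, and the producer never needs to know about either of them. Real exactly-once processing in Kafka involves considerably more machinery than advancing a counter.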

Events are organized into topics, each of which is a persisted, ordered series of related events. Because events are persisted, consumers can process them immediately or retroactively. Topics are further divided into partitions, and an event's key typically determines which partition the event is written to. Partitioning plays a key role in how Kafka scales and in the functionality behind its guarantees, such as preserving the order of events that share a key.
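Key-based partitioning and retroactive reads can be sketched as follows. This is a minimal model under stated assumptions: choose_partition and PartitionedTopic are hypothetical names, and zlib.crc32 stands in for the hash function that Kafka's own partitioner uses, so the partition numbers it produces will not match a real cluster.

```python
import zlib
from typing import List, Tuple


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash the key, then take it modulo the count.

    crc32 is a stand-in hash chosen for this sketch (an assumption).
    """
    return zlib.crc32(key) % num_partitions


class PartitionedTopic:
    """Toy topic: one append-only list of (key, value) pairs per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def publish(self, key: str, value: str) -> Tuple[int, int]:
        """Append an event and return (partition, offset within partition)."""
        p = choose_partition(key.encode("utf-8"), len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int = 0) -> List[Tuple[str, str]]:
        """Return stored events from `offset` on; reading removes nothing."""
        return list(self.partitions[partition][offset:])


if __name__ == "__main__":
    topic = PartitionedTopic(num_partitions=3)
    first = topic.publish("user-1", "added_to_cart")
    second = topic.publish("user-1", "checked_out")
    # Same key, same partition, so the two events keep their relative order.
    print(first[0] == second[0])
    # A late-arriving consumer can still read the partition from offset 0.
    print(topic.read(first[0], offset=0))
```

Since events with the same key always land in the same partition in this sketch, their relative order is preserved, while different partitions can be read in parallel by different consumers.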

Video: What is Apache Kafka

Kafka resources

Broader Topics Related to Apache Kafka

Apache Software Foundation (ASF)

Overview of the Apache Software Foundation (ASF)

Data Pipelines

Ways of making data available

Microservices

A software architecture in which applications are made up of loosely coupled services

Open-Source Software

Useful open source software projects

Publish/Subscribe Pattern (Pub-sub)

A messaging pattern in which publishers send messages to topics without knowing which subscribers will receive them

Apache Kafka Knowledge Graph