Overview
In OctoMesh, data pipelines are integral to the Extract, Transform, Load (ETL) processes that ensure efficient data handling across distributed environments. Pipelines are executed by Adapters, which can be deployed either at the edge (close to data sources) or centrally in the cloud.
Pipelines
Pipelines are executed by Adapters. Adapters deployed at the edge manage the initial stages of the data lifecycle, including data capture, preprocessing, and preliminary transformations. Adapters deployed centrally in cloud environments handle extensive data transformation, integration, and aggregation tasks.
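The stage split described above can be sketched as two plain functions: one standing in for the edge Adapter's capture and preprocessing work, one for the cloud Adapter's heavier aggregation. The function names and record layout are illustrative assumptions, not part of the OctoMesh API.

```python
def edge_stage(raw_records):
    """Edge Adapter work: capture, preprocessing, preliminary transforms."""
    cleaned = []
    for rec in raw_records:
        if rec.get("value") is None:  # drop incomplete readings close to the source
            continue
        cleaned.append({"sensor": rec["sensor"], "value": float(rec["value"])})
    return cleaned

def cloud_stage(records):
    """Cloud Adapter work: heavier transformation, integration, aggregation."""
    totals = {}
    for rec in records:
        totals.setdefault(rec["sensor"], []).append(rec["value"])
    return {sensor: sum(vals) / len(vals) for sensor, vals in totals.items()}

raw = [
    {"sensor": "temp-1", "value": "21.5"},
    {"sensor": "temp-1", "value": None},
    {"sensor": "temp-2", "value": "19.0"},
]
averages = cloud_stage(edge_stage(raw))
print(averages)  # {'temp-1': 21.5, 'temp-2': 19.0}
```

Filtering and normalizing at the edge keeps the volume of data shipped to the cloud stage small, which is the latency and scalability benefit the feature list below refers to.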
Key Features of Pipelines:
- Flexible deployment: Pipelines can run on edge or cloud Adapters depending on requirements.
- Low latency: Edge-deployed Adapters ensure fast data handling close to data sources.
- Scalable architecture: Cloud-deployed Adapters are capable of managing increased data flows and complex processing requirements.
- Advanced data management: Ensures data consistency and facilitates comprehensive analytics.
The diagram below shows the execution roles of Adapters in the ETL process:
DataFlow
A DataFlow is a logical grouping of related Pipelines that work together as part of a single data processing workflow. In the Construction Kit model, the DataFlow entity (previously called DataPipeline in Communication-2) serves as the parent container for one or more Pipeline instances. A PipelineTrigger is a child of DataFlow and triggers Pipeline execution on a schedule (via cron expressions).
DataFlows also establish a shared topic exchange in the event hub, enabling inter-pipeline communication. Pipelines within the same DataFlow can send data to each other using the ToPipelineDataEvent and FromPipelineDataEvent nodes with routing keys based on the target pipeline's runtime entity ID.
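The routing behavior can be illustrated with a minimal in-memory exchange: a sending pipeline publishes under the target pipeline's runtime entity ID (the ToPipelineDataEvent side), and the target drains messages bound to its own ID (the FromPipelineDataEvent side). The exchange mechanics and the entity ID shown here are illustrative, not the OctoMesh event hub API.

```python
from collections import defaultdict

class TopicExchange:
    """Toy stand-in for the DataFlow's shared topic exchange."""

    def __init__(self):
        self.queues = defaultdict(list)  # routing key -> pending messages

    def publish(self, routing_key, message):
        # ToPipelineDataEvent side: address a message to the target pipeline.
        self.queues[routing_key].append(message)

    def consume(self, routing_key):
        # FromPipelineDataEvent side: drain messages addressed to this pipeline.
        messages, self.queues[routing_key] = self.queues[routing_key], []
        return messages

exchange = TopicExchange()                  # one shared exchange per DataFlow
target_id = "pipeline-7f3a"                 # hypothetical runtime entity ID
exchange.publish(target_id, {"rows": 128})  # sending pipeline
received = exchange.consume(target_id)      # receiving pipeline
print(received)  # [{'rows': 128}]
```

Because the routing key is the target's runtime entity ID, only pipelines in the same DataFlow (bound to the same exchange) can address each other.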
Construction Kit Model
The construction kit of the model database manages the data of Adapters and Pipelines, including their descriptions, state, and relationships.
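As a rough illustration of what such a record might hold, the sketch below models an Adapter and a Pipeline with a description, a state, and relationship edges between them. The record layout and field names are assumptions for illustration, not the actual construction kit schema.

```python
# Hypothetical model-database records: each entity carries a description,
# a state, and named relationships to other entities.
model_db = {
    "adapter:edge-01": {
        "description": "Edge adapter close to the data source",
        "state": "Running",
        "relationships": {"executes": ["pipeline:capture"]},
    },
    "pipeline:capture": {
        "description": "Captures and preprocesses raw readings",
        "state": "Enabled",
        "relationships": {"executedBy": ["adapter:edge-01"]},
    },
}

def related(entity_id, relation):
    """Follow a relationship edge stored on an entity record."""
    return model_db[entity_id]["relationships"].get(relation, [])

print(related("adapter:edge-01", "executes"))  # ['pipeline:capture']
```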