Concepts
Before getting started with BuildFlow, it’s important to understand the core concepts and terminology used throughout the documentation.
buildflow.yaml | a configuration file that defines your application |
Flow | a container type for user-defined Processors. |
Processor | a user-defined, individually scalable component of a flow |
Endpoint | processor pattern that receives and responds to HTTP requests |
Service | a collection of endpoints that are deployed and scaled together |
Collector | processor pattern that receives HTTP requests and writes to a Sink |
Consumer | processor pattern that reads from a Source and writes to a Sink |
Replica | an individual instance of a processor that is reading from a source and writing to a sink. More replicas are added and remove with autoscaling. |
Primitives | a resource that a processor may read from, write to, or manage |
Dependencies | a dependency that is required to call your processor |
Buildflow Yaml
The buildflow.yaml file is a configuration file that defines your application. This can be used to configure what primitives and resources your application manages and connects to.
Flows
The Flow class is the entrypoint into the BuildFlow Framework. They act as a container type for user-defined Processors and are responsible for orchestrating the Processors across the Runtime and Infrastructure submodules.
TLDR; Flows act as a container type for a user’s application:
from buildflow import Flow
app = Flow()
...
Processors
At a high-level, Processors are a user-defined function that are individually scalable. BuildFlow offers different patterns of processors to handle different use-cases (endpoints, collectors, consumers).
Endpoints
An endpoint is a processor pattern intended for serving data over HTTP. Endpoints can be grouped under a service that allows multiple endpoints to be scaled and deployed together.
service = app.service("my-service")
@service.endpoint("/", method="post")
def my_processor(request: MyRequestType):
# TODO(developer): Add processing logic
return payload
Collectors
A collector is a processor pattern intended for receiving data over HTTP and dumping it to a sink.
@app.collector("/", method="post", sink=...)
def my_processor(request: MyRequestType):
# TODO(developer): Add processing logic
return payload
Consumers
A consumer is a processor pattern intended for event driven data processing. Consumers read from an unbounded source such as (Kafka, AWS SQS, or GCP Pub/Sub) and outputs to a Sink. They are attached to a Flow by using the @app.consumer
decorator.
@app.consumer(source=..., sink=...)
def my_processor(payload: MyPayloadType):
# TODO(developer): Add processing logic
return payload
Processor Groups
Processor groups allow you to group like processor patterns together and scale them together. We currently only support grouping endpoints together into a service, but will soon be adding support for grouping consumers and collectors.
Services
A service is a group of endpoints that are deployed and scaled together. Services are defined by calling the app.service()
decorator.
Then endpoints can be attached with the @service.endpoint()
decorator.
app = Flow()
service = app.service("my-service")
@service.endpoint("/", method="post")
def my_processor(request: MyRequestType):
# TODO(developer): Add processing logic
return payload
Replicas
A replica is an individual instance of a processor. As the load of your source changes, BuildFlow will automatically scale the number of replicas to match the load. Replicas can be configured to have their own CPU and GPU requirements. For example you can have a service that serves a model that requires a GPU and a service for auth that requires a CPU. If your auth service is receiving more traffic than your model service, BuildFlow will automatically scale the auth service to handle the load while keeping the model service at a minimum number of replicas. Which means you won’t need more GPUs! Resource requirements are defined when you initialize a processor.
app = Flow()
service = app.service(num_cpus=1)
@service.endpoint("/", method="post")
def my_endpoint(request: MyRequestType):
...
@app.collector("/", method="post", sink=..., num_cpus=3)
def my_collector(request: MyRequestType):
...
@app.consumer(source=..., sink=..., num_cpus=2)
def my_consumer(payload: MyPayloadType):
...
Primitives
Primitives represent an individual resource that a processor may read, write to, or simply reference. For example, a BigQueryTable
is a primitive that represents a single BigQuery table. Primitives define how to read from a source, write to a sink, and also how to manage the primitive with pulumi.
Primitives can be used as inputs (Sources) or outputs (Sinks) for consumers and collectors. A full list of our support primitives can be found in our primitive docs. You can also define your own primitives for reading, writing, and managing custom resources.
Dependencies
Dependencies are any thing that is needed to call your processor. For example maybe you need to open a connection to your database, or maybe you need to download a model from a remote bucket. BuildFlow allows you define any dependencies, and also provides many dependencies out of the box to make your integrations easier.