In this example we will create a chatbot that uses the Llama 2 7B chat model. All the code for this example can be found on GitHub.

Our application will:

  1. Download our model from remote storage in a dependency, ensuring the model is loaded before we receive traffic and can be shared across requests.
  2. Expose an endpoint that allows chatting with the model.
  3. When our endpoint is called, send the request through the model and return the response.
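The load-once, share-across-requests pattern from step 1 can be sketched in plain Python. This is an illustrative sketch of the idea, not BuildFlow's actual dependency API; `SharedModel` and `fake_loader` are hypothetical names:

```python
import threading

class SharedModel:
    """Loads an expensive resource once and shares it across requests.

    A minimal sketch of the load-once pattern; in the real example,
    BuildFlow's dependency system plays this role.
    """

    def __init__(self, loader):
        self._loader = loader  # function that downloads/loads the model
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: only the first caller pays the load cost;
        # every later request reuses the same in-memory model.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model

# Usage: every "request" handler calls get() and receives the same object.
load_count = 0

def fake_loader():
    global load_count
    load_count += 1
    return {"weights": "llama-2-7b-chat"}  # stand-in for the real model

dependency = SharedModel(fake_loader)
a = dependency.get()
b = dependency.get()
assert a is b and load_count == 1  # loaded once, shared across calls
```

Loading in a dependency rather than inside the request handler is what keeps the 7B model from being re-downloaded on every chat message.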

If all goes well, you should have a working chatbot that looks like this:

Before starting the example, ensure you have installed BuildFlow with all extra dependencies.

Our basic model is hosted on a public bucket, but you can also host your own model on a private bucket.
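Fetching the model from a bucket usually comes with a local cache so restarts don't re-download the weights. A minimal sketch of that caching step, where `downloader` stands in for a real GCS/S3 client call and the bucket URL is a placeholder:

```python
import os
import tempfile

def fetch_model(bucket_url: str, cache_dir: str, downloader) -> str:
    """Download the model file to a local cache, skipping if already present.

    `downloader` is a stand-in for a real GCS/S3 client call; the example
    repo wires up the actual storage client for you.
    """
    filename = bucket_url.rsplit("/", 1)[-1]
    local_path = os.path.join(cache_dir, filename)
    if not os.path.exists(local_path):  # only download on the first run
        downloader(bucket_url, local_path)
    return local_path

# Usage with a fake downloader that just writes a placeholder file.
calls = []

def fake_download(url, dest):
    calls.append(url)
    with open(dest, "wb") as f:
        f.write(b"model-bytes")

cache = tempfile.mkdtemp()
p1 = fetch_model("gs://public-bucket/llama-2-7b-chat.bin", cache, fake_download)
p2 = fetch_model("gs://public-bucket/llama-2-7b-chat.bin", cache, fake_download)
assert p1 == p2 and len(calls) == 1  # cached after the first download
```

The same function works for a private bucket; only the credentials passed to the real storage client change.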


Clone the GitHub Repo

    git clone
    cd launchflow-model-serving

Install your requirements

    pip install -r requirements.txt

Run your project

Run your project with:

    buildflow run

If you want to experiment with loading the model from S3 instead of GCS, set USE_GCP=false in the .env file.
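The USE_GCP toggle roughly amounts to picking a bucket scheme from an environment variable. A minimal sketch, assuming the default is GCS; the bucket name here is a placeholder, the real values live in the repo's .env file:

```python
import os

def model_bucket_url(model_file: str) -> str:
    """Pick GCS or S3 based on the USE_GCP environment variable."""
    # Defaults to "true", matching the example's GCS-first setup.
    use_gcp = os.environ.get("USE_GCP", "true").lower() != "false"
    # Placeholder bucket name; the real one comes from the .env file.
    if use_gcp:
        return f"gs://example-model-bucket/{model_file}"
    return f"s3://example-model-bucket/{model_file}"

os.environ["USE_GCP"] = "false"
print(model_bucket_url("llama-2-7b-chat.bin"))
# s3://example-model-bucket/llama-2-7b-chat.bin
```

Reading the flag once at startup (rather than per request) keeps the storage choice consistent for the lifetime of the process.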

Once running, you can visit http://localhost:8000 to begin chatting with the AI!
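Behind the chat UI, Llama 2 chat models expect prompts in a specific `[INST]`/`<<SYS>>` template. A sketch of formatting a single turn (the system prompt text is just an example):

```python
def llama2_chat_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt in the Llama 2 chat template."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = llama2_chat_prompt("You are a helpful assistant.", "Hello!")
assert prompt.startswith("<s>[INST] <<SYS>>")
assert prompt.endswith("Hello! [/INST]")
```

Sending raw text without this template still works, but responses tend to be noticeably worse, since the chat variant was fine-tuned on this exact format.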


What's next?

Now that you have a working chatbot, you can start customizing it to your needs: add Google auth for user authentication, a Postgres database for permanent storage, or even host your own model on a private bucket.