In this example we will create a chatbot that uses the Llama 2 7B chat model. All the code for this example can be found on GitHub.
Our application will:
- Download the model from remote storage in a dependency, ensuring it is ready before we receive traffic and can be shared across requests.
- Expose an endpoint that lets users chat with the model.
- When the endpoint is called, send the request through the model and return the response.
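The three steps above can be sketched with a cached-loader pattern. Everything below (`DummyChatModel`, `get_model`, `chat`) is an illustrative stand-in, not the repo's actual API; in the real app the loader would download the Llama 2 weights from GCS or S3:

```python
from functools import lru_cache

# Hypothetical stand-in for the real Llama 2 model; in the actual app this
# is where the weights downloaded from remote storage would be loaded.
class DummyChatModel:
    def generate(self, prompt: str) -> str:
        return f"model reply to: {prompt}"

@lru_cache(maxsize=1)
def get_model() -> DummyChatModel:
    # Runs once before serving traffic: download weights, load the model,
    # then cache it so every request shares the same instance.
    return DummyChatModel()

def chat(prompt: str) -> str:
    # The endpoint handler: pass the request through the model and
    # return the response.
    return get_model().generate(prompt)
```

The `lru_cache(maxsize=1)` decorator is what makes the model a shared dependency: the first request (or a startup hook) pays the load cost, and every later call reuses the same instance.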
If all goes well, you should end up with a working chatbot.
Our basic model is hosted on a public bucket, but you can also host your own model on a private bucket.
Clone the GitHub Repo
```
git clone git@github.com:launchflow/launchflow-model-serving.git
cd launchflow-model-serving
```
Install your requirements
```
pip install -r requirements.txt
```
Run your project
Start the server with:
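The exact command depends on the repo; assuming the FastAPI app object is named `app` and lives in `main.py` (an assumption, not confirmed by the repo), a typical invocation is:

```shell
uvicorn main:app --host 0.0.0.0 --port 8000
```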
If you want to experiment with loading the model from S3 instead of GCS, set the environment variable `USE_GCP=false`.
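One common way such a flag is read is shown below; the bucket URIs are hypothetical placeholders for illustration, not the repo's real paths:

```python
import os

def model_uri(env=os.environ) -> str:
    # USE_GCP selects the storage backend; defaults to GCS (assumption).
    use_gcp = env.get("USE_GCP", "true").lower() == "true"
    # Hypothetical bucket URIs for illustration only.
    return ("gs://example-bucket/llama2-7b" if use_gcp
            else "s3://example-bucket/llama2-7b")
```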
Once it is running, you can visit http://localhost:8000 to begin chatting with the AI!