Running a Local Large Language Model with Docker
This guide will walk you through running a local large language model (LLM) using Docker. We'll use the Koboldcpp repository, a lightweight and efficient LLM runner.
Prerequisites
Requirements
- Docker: Ensure Docker is installed and running on your system.
- Git: Required for cloning repositories.
- Internet access: Required to download the model file during the image build.
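Before moving on, you can confirm the tooling is available from a terminal. These are standard version checks; docker info will fail if the Docker daemon is not running.

docker --version
git --version
docker info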
Dockerfile Explanation
The Dockerfile below sets up an environment to run Koboldcpp and the Erosumika-7B model.
FROM ubuntu:latest # (1)!
# (2)!
RUN apt-get update && apt-get install -y \
sudo \
python3-pip \
gfortran \
gcc \
git \
curl \
wget
# (3)!
WORKDIR /
RUN mkdir ./home/koboldcpp_dir
COPY ./koboldcpp_dir ./home/koboldcpp_dir
# (4)!
WORKDIR /home/koboldcpp_dir/models
RUN wget --no-check-certificate -O Erosumika-7B-v3-0.2-Q4_K_M-imat.gguf "https://huggingface.co/Lewdiculous/Erosumika-7B-v3-0.2-GGUF-IQ-Imatrix/resolve/main/Erosumika-7B-v3-0.2-Q4_K_M-imat.gguf?download=true"
# (5)!
WORKDIR /
WORKDIR /home/koboldcpp_dir
RUN make
# (6)!
WORKDIR /
COPY ./start_program.sh /home/koboldcpp_dir
WORKDIR /home/koboldcpp_dir
RUN chmod 555 start_program.sh
# (7)!
EXPOSE 5001
CMD "./start_program.sh"
- Use the latest Ubuntu base image.
- Install the core dependencies for the Ubuntu environment.
- Copy the koboldcpp repository files into the image.
- Set the working directory to koboldcpp's models directory and download a model from Hugging Face.
- Set the working directory to the koboldcpp directory and run make to compile the project.
- Copy the start_program.sh bash script into the koboldcpp directory and make it executable.
- Expose port 5001, on which the web application and API are served.
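For the COPY instructions above to resolve, the Docker build context (your project directory) should contain the cloned repository and the startup script created in the steps below, roughly like this:

.
├── Dockerfile
├── start_program.sh
└── koboldcpp_dir/   (the cloned koboldcpp repository)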
Workflow Diagram
Below is a high-level workflow of how the container is set up and executed:
graph TD
A[Start] --> B[Download or Clone Koboldcpp Repository]
B --> C[Set Up Environment with Dockerfile]
C --> D[Configure start_program.sh]
D --> E[Build Image]
E --> F[Run Container]
F --> G[Access LLM Locally via web application and API]
Steps to Execute
Clone the koboldcpp Repository
Within your project directory, run this command to clone the repository into a folder called koboldcpp_dir. Alternatively, you can download the latest code release and extract it into a folder of the same name.
git clone https://github.com/LostRuins/koboldcpp koboldcpp_dir
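You can quickly confirm the clone contains the files the rest of this guide relies on, namely the Makefile used by the RUN make step and the koboldcpp.py entry point called from start_program.sh:

ls koboldcpp_dir/Makefile koboldcpp_dir/koboldcpp.py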
Prepare the start_program.sh bash script
Within your project directory, create a file called start_program.sh, which will contain your runtime commands for running koboldcpp within your container. You may customize this with various options as specified in the koboldcpp docs.
Example
#!/bin/bash
python3 koboldcpp.py \
--contextsize 4096 \
--threads 8 \
--model ./models/Erosumika-7B-v3-0.2-Q4_K_M-imat.gguf
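koboldcpp accepts many other options. For example, pinning the port keeps the script aligned with the EXPOSE 5001 line in the Dockerfile; the --port flag used here is an assumption based on koboldcpp's documented options, so double-check it against the version you cloned:

#!/bin/bash
python3 koboldcpp.py \
--contextsize 4096 \
--threads 8 \
--port 5001 \
--model ./models/Erosumika-7B-v3-0.2-Q4_K_M-imat.gguf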
Build the Docker Image
Build the Docker image using the Dockerfile provided:
docker build -t koboldcpp-image .
This step can take a few minutes depending on your internet speed and machine performance, since installing the project dependencies, downloading the GGUF model, and compiling koboldcpp all happen here. When you rebuild the same image, Docker caches the layers for unchanged instructions, so the long wait should only occur on the first build for most use cases.
Run the Container
Execute the following command to start the container:
docker run --name koboldcpp-container -p 5001:5001 koboldcpp-image
A successful run of the Docker container
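If you prefer to keep your terminal free, the same container can be started in detached mode and inspected with standard Docker commands:

docker run -d --name koboldcpp-container -p 5001:5001 koboldcpp-image
docker logs -f koboldcpp-container   # follow the startup output
docker stop koboldcpp-container      # stop it when you are done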
Access the LLM
Open your browser and navigate to http://localhost:5001 to interact with the LLM, or visit http://localhost:5001/api to browse the Swagger docs for the API backend.
The KoboldCPP UI
The KoboldCPP API Docs (Swagger)
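You can also query the API directly from the command line. The endpoint and JSON fields below follow the KoboldAI-style generate API that koboldcpp exposes; treat the exact payload as an assumption and confirm it against the Swagger docs at /api:

curl -X POST http://localhost:5001/api/v1/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Once upon a time", "max_length": 64}'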
Tips and Notes
Optimizations
- Use a GPU-enabled environment for faster inference (requires additional setup; see the sketch below).
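As a rough sketch of that additional setup, offered purely as an assumption to verify against the koboldcpp and NVIDIA Container Toolkit documentation: compile koboldcpp with cuBLAS support (for example, make LLAMA_CUBLAS=1 in place of the plain make, on a CUDA-capable base image), add GPU flags such as --usecublas and --gpulayers to start_program.sh, and give the container access to the GPU at runtime:

docker run --gpus all --name koboldcpp-container -p 5001:5001 koboldcpp-image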
Model Format
The Erosumika-7B-v3-0.2-Q4_K_M model is distributed in GGUF format, the quantized model format used by llama.cpp-based runners such as Koboldcpp.
With this setup, you should have a functional environment to run a large language model locally using Docker and Koboldcpp. Happy experimenting!