# Ollama

Ollama runs open-weight models directly on your hardware. No API key, no cloud dependency, no per-token charges. Models run on your CPU or GPU and never send data off your machine.

  • Zero cost — no API fees, ever. Your only cost is electricity.
  • Full privacy — prompts and completions never leave your machine.
  • Offline capable — works without an internet connection after model download.
  • Fast iteration — no rate limits, no quotas, no waiting.
On Linux, install with the official script:

```sh
curl -fsSL https://ollama.ai/install.sh | sh
```

On macOS, install with Homebrew:

```sh
brew install ollama
```

Or download the installer from ollama.ai.

Verify the installation:

```sh
ollama --version
```
Pull at least one model:

```sh
# Recommended starter model: good balance of quality and speed
ollama pull llama3.2
# Lightweight option for lower-spec machines
ollama pull phi3
# Larger model for better quality (needs 48 GB+ RAM)
ollama pull llama3.1:70b
```
Point ModelReins at Ollama with environment variables:

```sh
export MODELREINS_PROVIDER=ollama
export MODELREINS_OLLAMA_HOST=http://localhost:11434
```

Or in modelreins.config.json:

```json
{
  "provider": "ollama",
  "ollama": {
    "host": "http://localhost:11434",
    "model": "llama3.2"
  }
}
```

Start the worker:

```sh
modelreins worker start --provider ollama
```
| Model | RAM Required | Speed | Quality | Best For |
| --- | --- | --- | --- | --- |
| llama3.2 | 8 GB | Fast | Good | General tasks, summaries, extraction |
| phi3 | 4 GB | Very fast | Moderate | Quick completions, high throughput |
| codellama | 8 GB | Fast | Good (code) | Code generation, review |
| mistral | 8 GB | Fast | Good | Multilingual, instruction following |
| llama3.1:70b | 48 GB | Slow | Excellent | Complex reasoning (GPU recommended) |
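Picking a model from this table can be scripted. A minimal sketch: `pick_model` is a hypothetical helper (not part of Ollama or ModelReins) whose thresholds mirror the RAM Required column above:

```shell
# pick_model <ram_gb>: suggest a model from the table based on available RAM.
# Hypothetical helper; thresholds mirror the "RAM Required" column.
pick_model() {
  if [ "$1" -ge 48 ]; then
    echo "llama3.1:70b"   # excellent quality, 48 GB class
  elif [ "$1" -ge 8 ]; then
    echo "llama3.2"       # good general-purpose default, 8 GB class
  else
    echo "phi3"           # lightweight fallback, 4 GB class
  fi
}
```

For example, `pick_model 16` prints `llama3.2`.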

Override the default model for specific jobs:

```sh
modelreins job dispatch \
  --provider ollama \
  --model codellama \
  --prompt "Review this function for bugs" \
  --input ./handler.ts
```
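The same override works in a loop for batch jobs. A sketch that prints one dispatch command per TypeScript file under a hypothetical `./src` directory; the leading `echo` makes it a dry run, so remove it to actually dispatch:

```shell
# Dry-run batch review: print one dispatch command per .ts file.
# Remove the leading `echo` to actually submit the jobs.
for f in ./src/*.ts; do
  echo modelreins job dispatch \
    --provider ollama \
    --model codellama \
    --prompt "Review this function for bugs" \
    --input "$f"
done
```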

Ollama automatically detects and uses NVIDIA GPUs via CUDA. For Apple Silicon, Metal acceleration is enabled by default.

Check GPU status:

```sh
ollama ps
```

If your model is running on CPU and you have a GPU available, ensure your drivers are up to date and restart Ollama.
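Recent Ollama builds report a PROCESSOR column in `ollama ps` output (values like `100% GPU` or `100% CPU`). A small sketch that inspects that text; the column format is an assumption about your Ollama version:

```shell
# check_gpu <ps_output>: warn when `ollama ps` shows a model on CPU.
# Assumes the PROCESSOR column contains "CPU" or "GPU".
check_gpu() {
  if printf '%s' "$1" | grep -q "CPU"; then
    echo "running on CPU: check GPU drivers"
  else
    echo "running on GPU"
  fi
}
# Usage: check_gpu "$(ollama ps)"
```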

**“Could not connect to Ollama”** — Make sure the Ollama service is running:

```sh
# Start the service
ollama serve
# Or on macOS, launch the app; it starts the server automatically
```
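You can also probe the server directly: Ollama answers `GET /api/tags` with the list of installed models when it is up. A sketch, assuming the default host and that `curl` is installed:

```shell
# ollama_up [host]: succeed (exit 0) if an Ollama server answers at host.
# Defaults to the standard local address used throughout this page.
ollama_up() {
  curl -fsS --max-time 2 "${1:-http://localhost:11434}/api/tags" >/dev/null 2>&1
}
# Usage: ollama_up && echo reachable || echo "not reachable"
```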

**Model not found** — Pull the model first: `ollama pull <model-name>`. Model names are case-sensitive.

**Slow responses** — Check whether the model fits in RAM. If the system is swapping, use a smaller model or add more memory.