# Ollama
Ollama runs open-weight models directly on your hardware. No API key, no cloud dependency, no per-token charges. Models run on your CPU or GPU and never send data off your machine.
## Why Ollama

- Zero cost — no API fees, ever. Your only cost is electricity.
- Full privacy — prompts and completions never leave your machine.
- Offline capable — works without an internet connection after model download.
- Fast iteration — no rate limits, no quotas, no waiting.
## Install Ollama

### macOS / Linux

```shell
curl -fsSL https://ollama.ai/install.sh | sh
```

### macOS (Homebrew)

```shell
brew install ollama
```

### Windows

Download the installer from ollama.ai.
## Verify installation

```shell
ollama --version
```

## Pull a model

```shell
# Recommended starter model — good balance of quality and speed
ollama pull llama3.2

# Lightweight option for fast completions
ollama pull phi3

# Larger model for better quality (needs 16GB+ RAM)
ollama pull llama3.1:70b
```
## Configure ModelReins

```shell
export MODELREINS_PROVIDER=ollama
export MODELREINS_OLLAMA_HOST=http://localhost:11434
```

Or in modelreins.config.json:

```json
{
  "provider": "ollama",
  "ollama": {
    "host": "http://localhost:11434",
    "model": "llama3.2"
  }
}
```

Start the worker:

```shell
modelreins worker start --provider ollama
```
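The `MODELREINS_OLLAMA_HOST` variable follows standard shell fallback semantics: when it is unset, the default local address should be used. A minimal sketch of that resolution (the function name is illustrative, not part of ModelReins):

```shell
# Resolve the Ollama host: prefer MODELREINS_OLLAMA_HOST, fall back
# to Ollama's default local address. Sketch only; the worker's exact
# resolution logic may differ.
resolve_ollama_host() {
  printf '%s\n' "${MODELREINS_OLLAMA_HOST:-http://localhost:11434}"
}

resolve_ollama_host
```

With the variable unset this prints `http://localhost:11434`; export it to point the worker at a remote Ollama instance instead.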
## Starter models

| Model | RAM Required | Speed | Quality | Best For |
|---|---|---|---|---|
| llama3.2 | 8 GB | Fast | Good | General tasks, summaries, extraction |
| phi3 | 4 GB | Very fast | Moderate | Quick completions, high throughput |
| codellama | 8 GB | Fast | Good (code) | Code generation, review |
| mistral | 8 GB | Fast | Good | Multilingual, instruction following |
| llama3.1:70b | 48 GB | Slow | Excellent | Complex reasoning (GPU recommended) |
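As a rough rule of thumb, the RAM thresholds in the table can be folded into a tiny helper that suggests a starter model for a given amount of memory. This is a hypothetical convenience script, not part of the ModelReins or Ollama CLIs:

```shell
# Suggest a starter model for a given RAM size in GB; the thresholds
# mirror the table above. Hypothetical helper, for illustration only.
pick_starter_model() {
  if [ "$1" -ge 48 ]; then
    echo "llama3.1:70b"     # excellent quality, needs 48 GB
  elif [ "$1" -ge 8 ]; then
    echo "llama3.2"         # good general-purpose default at 8 GB
  elif [ "$1" -ge 4 ]; then
    echo "phi3"             # lightweight option at 4 GB
  else
    echo "none"             # below 4 GB, no starter model fits
  fi
}

pick_starter_model 16
# → llama3.2
```

A 16 GB machine lands in the 8 GB tier; at that tier you could equally swap in codellama or mistral depending on the workload.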
## Selecting a model per job

Override the default model for specific jobs:

```shell
modelreins job dispatch \
  --provider ollama \
  --model codellama \
  --prompt "Review this function for bugs" \
  --input ./handler.ts
```
## GPU acceleration

Ollama automatically detects and uses NVIDIA GPUs via CUDA. For Apple Silicon, Metal acceleration is enabled by default.

Check GPU status:

```shell
ollama ps
```

If your model is running on CPU and you have a GPU available, ensure your drivers are up to date and restart Ollama.
## Troubleshooting

**"Could not connect to Ollama"** — make sure the Ollama service is running:

```shell
# Start the service
ollama serve

# Or on macOS, launch the app — it starts the server automatically
```

**Model not found** — pull the model first: `ollama pull <model-name>`. Model names are case-sensitive.

**Slow responses** — check if the model fits in RAM. If the system is swapping, use a smaller model or add more memory.
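For the connection error above, a quick way to confirm the server is listening is to probe `/api/tags`, Ollama's model-listing endpoint, which responds on any healthy instance. A small diagnostic sketch (the function name is illustrative):

```shell
# Probe an Ollama server; /api/tags lists installed models and
# answers whenever the server is up. Prints reachable/unreachable.
check_ollama() {
  host="${1:-http://localhost:11434}"
  if curl -fsS --max-time 2 "$host/api/tags" > /dev/null 2>&1; then
    echo "ollama reachable"
  else
    echo "ollama unreachable"
  fi
}

check_ollama
```

If this prints `ollama unreachable`, start the server with `ollama serve` (or launch the macOS app) and run the check again.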