
# Troubleshooting

**Symptom:** Workers were running fine, then suddenly all jobs fail with `401 Unauthorized` or `Authentication failed`.

**Cause:** You rotated your API key (at the provider or at the ModelReins coordinator), but not all workers picked up the new key.

**Fix:**

1. Update the key in your environment or config file:

   ```shell
   export MODELREINS_API_KEY=new-key-here
   ```

2. Restart all workers:

   ```shell
   modelreins worker restart --all
   ```

3. If using the Companion App, open the tray menu → Settings → update the API key and click Save. The worker restarts automatically.

4. If using the MCP channel, update the key in your `.mcp.json` or VS Code settings and restart the MCP client.

**Prevention:** Use a shared config file or secrets manager so all workers read the key from the same source. The `MODELREINS_CONFIG_URL` environment variable can point workers at a remote config endpoint.
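One concrete shape for the shared-source approach is a launch script that sources the key from a single env file, so a rotation only edits that file. This is an illustrative sketch, not ModelReins behavior; the file path and contents are stand-ins (the sketch creates a temp file so it runs as-is):

```shell
# Illustrative sketch: every worker sources its API key from one shared
# env file, so a key rotation only needs to touch that file.
# The path would normally be fixed (e.g. under /etc); a temp file stands in here.
SHARED_ENV="$(mktemp)"
echo 'MODELREINS_API_KEY=new-key-here' > "$SHARED_ENV"

set -a            # auto-export every variable the file defines
. "$SHARED_ENV"
set +a

echo "using key: $MODELREINS_API_KEY"
```

After a rotation, updating the one file and running `modelreins worker restart --all` brings every worker onto the new key.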


**Symptom:** `modelreins worker start --provider ollama` fails with `Could not connect to Ollama at http://localhost:11434`.

**Fix:**

1. Check if Ollama is running:

   ```shell
   curl http://localhost:11434/api/tags
   ```

   If this fails, start the Ollama service:

   ```shell
   # Linux
   sudo systemctl start ollama
   # macOS — open the Ollama app, or:
   ollama serve
   ```

2. If Ollama is on a different host or port, point the worker at it:

   ```shell
   export MODELREINS_OLLAMA_HOST=http://192.168.1.50:11434
   ```

3. If running in Docker, make sure the container can reach the host network:

   ```shell
   docker run --network host mediagato/modelreins-worker --provider ollama
   ```

   Note that `--network host` only works on Linux. On macOS and Windows, Docker Desktop runs containers inside a VM, so set `MODELREINS_OLLAMA_HOST=http://host.docker.internal:11434` instead.

**Symptom:** Jobs dispatched to LM Studio fail with `Model not found` or `No model loaded`.

**Fix:**

1. Open LM Studio and check the Local Server tab.

2. Make sure a model is selected and loaded in the model dropdown. The server can run without a model loaded, but it won't process requests.

3. Verify the server is serving the expected model:

   ```shell
   curl http://localhost:1234/v1/models
   ```

4. If the model name in the response doesn't match your ModelReins config, update the config:

   ```json
   {
     "lmstudio": {
       "model": "TheBloke/Llama-3.2-GGUF"
     }
   }
   ```

   Use the exact model name from the `/v1/models` response.
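To compare the config against the `/v1/models` response without eyeballing JSON, a quick grep is enough. The response below is a canned example of the OpenAI-compatible list shape (the second model name is made up); in practice you would capture it with `curl -s http://localhost:1234/v1/models`:

```shell
# Canned /v1/models-style response; replace with:
#   response=$(curl -s http://localhost:1234/v1/models)
response='{"object":"list","data":[{"id":"TheBloke/Llama-3.2-GGUF"},{"id":"qwen2.5-7b-instruct"}]}'
want='TheBloke/Llama-3.2-GGUF'   # the model name from your ModelReins config

if printf '%s' "$response" | grep -q "\"id\":[[:space:]]*\"$want\""; then
  echo "config matches a served model"
else
  echo "no match: update lmstudio.model in your config"
fi
```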


**Symptom:** A job shows `status: running` indefinitely. The worker processing it may have crashed or disconnected.

**Fix:**

1. Check which worker has the job:

   ```shell
   modelreins job info <job-id>
   ```

2. Check whether that worker is still alive:

   ```shell
   modelreins worker list
   ```

3. If the worker is dead, release the job back to the queue:

   ```shell
   modelreins job retry <job-id>
   ```

4. If this happens frequently, increase the job timeout and enable automatic reaping:

   ```json
   {
     "jobs": {
       "timeout_seconds": 300,
       "reap_stale_after_seconds": 600
     }
   }
   ```

The coordinator will automatically requeue jobs that have been running longer than `reap_stale_after_seconds` without a heartbeat.
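The reaping rule reduces to arithmetic on the last heartbeat. This sketch assumes the coordinator compares heartbeat age against `reap_stale_after_seconds` (the timestamps are made up; 600 mirrors the config above):

```shell
# Sketch of the staleness check: a running job whose last heartbeat is
# older than reap_stale_after_seconds gets requeued.
reap_after=600
now=$(date +%s)
last_heartbeat=$(( now - 700 ))   # example: last heartbeat was 700s ago

age=$(( now - last_heartbeat ))
if [ "$age" -gt "$reap_after" ]; then
  echo "stale after ${age}s: requeue"
else
  echo "healthy (${age}s since heartbeat)"
fi
```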


**Symptom:** Cloud provider jobs fail with `429 Too Many Requests` or `Rate limit exceeded`.

**Fix:**

1. **Immediate:** Reduce the concurrency on the affected worker:

   ```shell
   modelreins worker update <worker-id> --concurrency 1
   ```

2. **Short-term:** Enable built-in rate limit handling. ModelReins will automatically back off and retry:

   ```json
   {
     "providers": {
       "claude": {
         "rate_limit": {
           "max_concurrent": 3,
           "retry_after_seconds": 10,
           "max_retries": 5
         }
       }
     }
   }
   ```

3. **Long-term:** Spread load across providers using routing rules. Add OpenRouter as a fallback; it handles rate limiting across multiple upstream providers:

   ```json
   {
     "routing": {
       "strategy": "fallback",
       "chain": ["claude", "openrouter"]
     }
   }
   ```
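For intuition, here is the retry schedule a client might follow under those rate-limit settings. `retry_after_seconds` (10) and `max_retries` (5) come from the config above; the doubling per attempt is an assumption about the backoff shape, not documented ModelReins behavior:

```shell
# Hypothetical backoff schedule: start at retry_after_seconds and
# double the wait on each subsequent retry, up to max_retries attempts.
base=10
max_retries=5
attempt=1
while [ "$attempt" -le "$max_retries" ]; do
  delay=$(( base << (attempt - 1) ))   # 10, 20, 40, 80, 160
  echo "retry $attempt: wait ${delay}s"
  attempt=$(( attempt + 1 ))
done
```

With a cap of 5 retries, a job that still sees 429s after roughly five minutes of waiting fails permanently, which is when the routing fallback takes over.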

**Symptom:** `modelreins worker list` shows the worker as connected, but it never processes any jobs.

**Fix:**

1. Check that the worker's provider matches the jobs in the queue:

   ```shell
   modelreins job list --status pending
   modelreins worker info <worker-id>
   ```

   If jobs are queued for `claude` but the worker only supports `ollama`, it won't pick them up.

2. Check the routing rules. If the routing config specifies a worker name or tag that doesn't match any worker, jobs sit in the queue indefinitely:

   ```shell
   modelreins config show routing
   ```

3. Verify the worker has available capacity:

   ```shell
   modelreins worker info <worker-id> --verbose
   ```

   Look for `concurrency: 0` or `paused: true`.


**Symptom:** The dashboard URL returns a blank page or connection refused.

**Fix:**

1. Check that the coordinator is running:

   ```shell
   modelreins status
   ```

2. The dashboard is served by the coordinator on the same port (default 7420). Verify it responds:

   ```shell
   curl http://localhost:7420/health
   ```

3. If accessing remotely, check that firewall rules allow traffic on port 7420.

4. If using a reverse proxy, ensure WebSocket connections are proxied (the dashboard uses WebSockets for live updates):

   ```nginx
   location / {
       proxy_pass http://localhost:7420;
       proxy_http_version 1.1;
       proxy_set_header Upgrade $http_upgrade;
       proxy_set_header Connection "upgrade";
   }
   ```