
LM Studio

LM Studio provides a desktop application for downloading, managing, and serving open-weight models. It exposes an OpenAI-compatible local API that ModelReins connects to directly.

  • Free forever — no API fees, no subscriptions, no usage limits.
  • Full privacy — all inference runs locally. Nothing leaves your machine.
  • GUI model management — browse, download, and configure models through a visual interface.
  • High-volume friendly — no rate limits. Run as many jobs as your hardware can handle.
  • OpenAI-compatible API — works with any tool expecting the OpenAI API format.
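Because the server speaks the OpenAI API format, any HTTP client can target its standard endpoints. A minimal sketch using only Python's standard library (the model name and prompt are placeholders):

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("http://localhost:1234", "llama-3.2", "Say hello")
# Once the server is running, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```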
  1. Download from lmstudio.ai for macOS, Windows, or Linux.
  2. Run the installer and launch LM Studio.
|      | Minimum          | Recommended                 |
| ---- | ---------------- | --------------------------- |
| RAM  | 8 GB             | 16 GB+                      |
| Disk | 10 GB free       | 50 GB+ (models are large)   |
| GPU  | None (CPU works) | 6 GB+ VRAM for acceleration |
  1. Open LM Studio and go to the Discover tab.
  2. Search for a model — small instruction-tuned models such as Llama 3.2 or Phi-3 Mini are good starting points.
  3. Click Download and wait for it to complete.
  4. Go to the Chat tab, select the model from the dropdown, and verify it responds.
  1. Click the Local Server tab (left sidebar, <-> icon).
  2. Select your loaded model from the dropdown.
  3. Click Start Server.
  4. The server runs on http://localhost:1234 by default.

Verify the server is running:

curl http://localhost:1234/v1/models

You should see your loaded model in the response.
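The response follows the standard OpenAI models-list shape, so pulling out the loaded model ids is straightforward. A sketch (the sample body below is illustrative; yours will list whichever model you loaded):

```python
import json

def loaded_models(body: str) -> list:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in json.loads(body)["data"]]

sample = '{"object": "list", "data": [{"id": "llama-3.2", "object": "model"}]}'
print(loaded_models(sample))  # → ['llama-3.2']
```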

export MODELREINS_PROVIDER=lmstudio
export MODELREINS_LMSTUDIO_HOST=http://localhost:1234

Or in modelreins.config.json:

{
  "provider": "lmstudio",
  "lmstudio": {
    "host": "http://localhost:1234",
    "model": "llama-3.2"
  }
}
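How ModelReins reconciles the environment variable with the config file is not specified here; one plausible resolution order — environment variable first, then the config file, then LM Studio's default address — could be sketched as:

```python
import json
import os

def resolve_lmstudio_host(config_path: str = "modelreins.config.json") -> str:
    """Pick the LM Studio host. The precedence (env var over config file)
    is an assumption; check the ModelReins docs for the actual rules."""
    env = os.environ.get("MODELREINS_LMSTUDIO_HOST")
    if env:
        return env
    try:
        with open(config_path) as f:
            return json.load(f)["lmstudio"]["host"]
    except (FileNotFoundError, KeyError):
        return "http://localhost:1234"  # LM Studio's default server address
```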

Start the worker:

modelreins worker start --provider lmstudio
| Model         | Download Size | Quality     | Best For                                  |
| ------------- | ------------- | ----------- | ----------------------------------------- |
| Llama 3.2 7B  | 4 GB          | Good        | General tasks, balanced speed/quality     |
| Phi-3 Mini    | 2.3 GB        | Moderate    | Fast completions, lower resource machines |
| Mistral 7B    | 4 GB          | Good        | Instruction following, multilingual       |
| CodeLlama 7B  | 4 GB          | Good (code) | Code generation, refactoring              |
| Llama 3.1 70B | 40 GB         | Excellent   | Complex tasks (needs powerful GPU)        |
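A rough way to pre-check whether a model from the table will fit on your machine: quantized weights occupy about their download size in memory, plus runtime overhead for the KV cache and buffers. The 1.5× factor below is a conservative assumption, not an LM Studio figure:

```python
def fits_in_memory(download_gb: float, available_gb: float, overhead: float = 1.5) -> bool:
    """Rough check: weights at roughly their download size, times an
    overhead factor for KV cache and runtime buffers (assumed 1.5x)."""
    return download_gb * overhead <= available_gb

print(fits_in_memory(4, 16))   # a 4 GB model on a 16 GB machine → True
print(fits_in_memory(40, 16))  # a 40 GB model on the same machine → False
```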

LM Studio can load one model at a time per server instance. To serve multiple models, run multiple instances on different ports:

{
  "provider": "lmstudio",
  "lmstudio": {
    "host": "http://localhost:1235",
    "model": "codellama-7b"
  }
}

For multi-model routing, consider using Ollama instead — it handles model switching automatically.
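If you do run several LM Studio instances, the client side needs to know which port serves which model. A minimal static routing map (ports and model names are illustrative):

```python
# Hypothetical model-to-instance map; ports and model names are illustrative.
ROUTES = {
    "llama-3.2": "http://localhost:1234",
    "codellama-7b": "http://localhost:1235",
}

def host_for(model: str) -> str:
    """Return the LM Studio instance serving the given model."""
    try:
        return ROUTES[model]
    except KeyError:
        raise ValueError(f"no LM Studio instance configured for {model!r}") from None

print(host_for("codellama-7b"))  # → http://localhost:1235
```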

| Platform              | Status    | Notes                               |
| --------------------- | --------- | ----------------------------------- |
| macOS (Apple Silicon) | Supported | Metal acceleration, best experience |
| macOS (Intel)         | Supported | CPU only, slower                    |
| Windows               | Supported | NVIDIA CUDA acceleration available  |
| Linux                 | Supported | NVIDIA CUDA, some AMD ROCm support  |

“Connection refused” — The LM Studio server is not running. Open LM Studio, go to Local Server, and click Start Server.

“Model not found” — Make sure a model is selected and loaded in the Local Server tab. The model dropdown must show an active model.

Slow responses — Use a smaller quantization (Q4 instead of Q8) or a smaller model. Check that GPU offloading is enabled in LM Studio settings if you have a compatible GPU.
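The first two failure modes above can be told apart programmatically. A diagnostic sketch using only the standard library (the message strings are my own, not LM Studio output):

```python
import json
import urllib.request

def check_server(base_url: str = "http://localhost:1234") -> str:
    """Classify the two common failures: server down vs. no model loaded."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            data = json.load(resp)
    except OSError:  # covers URLError, connection refused, timeouts
        return "connection refused: start the server from LM Studio's Local Server tab"
    if not data.get("data"):
        return "server up but no model loaded: select one in the model dropdown"
    return "ok: " + ", ".join(entry["id"] for entry in data["data"])

print(check_server())
```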