LM Studio
LM Studio provides a desktop application for downloading, managing, and serving open-weight models. It exposes an OpenAI-compatible local API that ModelReins connects to directly.
Why LM Studio
- Free forever — no API fees, no subscriptions, no usage limits.
- Full privacy — all inference runs locally. Nothing leaves your machine.
- GUI model management — browse, download, and configure models through a visual interface.
- High-volume friendly — no rate limits. Run as many jobs as your hardware can handle.
- OpenAI-compatible API — works with any tool expecting the OpenAI API format.
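Because the server speaks the OpenAI wire format, a request can be built with nothing but the standard library. A minimal sketch (the model name and prompt are placeholders, not ModelReins specifics):

```python
import json
from urllib import request

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio's default endpoint

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat-completions request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        LMSTUDIO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("llama-3.2", "Say hello in one word.")
print(req.get_full_url())  # http://localhost:1234/v1/chat/completions
```

Sending it with `urllib.request.urlopen(req)` while the server is running returns the usual OpenAI-shaped response (`choices[0].message.content`).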
Install LM Studio
- Download from lmstudio.ai for macOS, Windows, or Linux.
- Run the installer and launch LM Studio.
System requirements
|  | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| Disk | 10 GB free | 50 GB+ (models are large) |
| GPU | None (CPU works) | 6 GB+ VRAM for acceleration |
Load a model
- Open LM Studio and go to the Discover tab.
- Search for a model — start with `TheBloke/Llama-3.2` or `microsoft/phi-3`.
- Click Download and wait for it to complete.
- Go to the Chat tab, select the model from the dropdown, and verify it responds.
Enable the local server
- Click the Local Server tab (left sidebar, `<->` icon).
- Select your loaded model from the dropdown.
- Click Start Server.
- The server runs on `http://localhost:1234` by default.
Verify the server is running:
```shell
curl http://localhost:1234/v1/models
```

You should see your loaded model in the response.
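The same check can be scripted. A sketch that parses a `/v1/models` response body (shape per the OpenAI API) and lists the loaded model ids — the JSON below is a canned example, not live server output:

```python
import json

def loaded_model_ids(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in json.loads(body).get("data", [])]

# Canned example of the response shape; a live server returns whatever model is loaded.
sample = '{"object": "list", "data": [{"id": "llama-3.2", "object": "model"}]}'
print(loaded_model_ids(sample))  # ['llama-3.2']
```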
Configure ModelReins
```shell
export MODELREINS_PROVIDER=lmstudio
export MODELREINS_LMSTUDIO_HOST=http://localhost:1234
```

Or in `modelreins.config.json`:

```json
{
  "provider": "lmstudio",
  "lmstudio": {
    "host": "http://localhost:1234",
    "model": "llama-3.2"
  }
}
```

Start the worker:

```shell
modelreins worker start --provider lmstudio
```

Model recommendations
| Model | Download Size | Quality | Best For |
|---|---|---|---|
| Llama 3.2 7B | 4 GB | Good | General tasks, balanced speed/quality |
| Phi-3 Mini | 2.3 GB | Moderate | Fast completions, lower resource machines |
| Mistral 7B | 4 GB | Good | Instruction following, multilingual |
| CodeLlama 7B | 4 GB | Good (code) | Code generation, refactoring |
| Llama 3.1 70B | 40 GB | Excellent | Complex tasks (needs powerful GPU) |
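The download sizes in the table are consistent with simple quantization arithmetic: a Q4 file stores roughly 4 bits per weight, before format overhead. A quick sanity-check estimator:

```python
def approx_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough file size: parameter count times bits per weight, in gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(round(approx_size_gb(7, 4), 1))   # 3.5  -> close to the ~4 GB listed for 7B models
print(round(approx_size_gb(70, 4), 1))  # 35.0 -> in line with the ~40 GB for Llama 3.1 70B
```

The same function explains why Q8 files are about twice the size of Q4 for the same model, which is the trade-off behind the "use a smaller quantization" advice in Troubleshooting.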
Running multiple models
LM Studio can load one model at a time per server instance. To serve multiple models, run multiple instances on different ports:
Each worker then points at its own instance, for example (the second instance here is assumed to run on port 1235):

```json
{
  "provider": "lmstudio",
  "lmstudio": {
    "host": "http://localhost:1235",
    "model": "phi-3"
  }
}
```

For multi-model routing, consider using Ollama instead — it handles model switching automatically.
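A client-side sketch of that manual routing, with a hypothetical model-to-instance table (the mapping and helper below are illustrative, not part of ModelReins):

```python
# Hypothetical routing table: one LM Studio instance per model, each on its own port.
INSTANCES = {
    "llama-3.2": "http://localhost:1234",
    "phi-3": "http://localhost:1235",
}

def host_for(model: str) -> str:
    """Pick the LM Studio instance serving the requested model."""
    try:
        return INSTANCES[model]
    except KeyError:
        raise ValueError(f"no LM Studio instance configured for {model!r}")

print(host_for("phi-3"))  # http://localhost:1235
```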
Platform support
| Platform | Status | Notes |
|---|---|---|
| macOS (Apple Silicon) | Supported | Metal acceleration, best experience |
| macOS (Intel) | Supported | CPU only, slower |
| Windows | Supported | NVIDIA CUDA acceleration available |
| Linux | Supported | NVIDIA CUDA, some AMD ROCm support |
Troubleshooting
“Connection refused” — The LM Studio server is not running. Open LM Studio, go to Local Server, and click Start Server.
“Model not found” — Make sure a model is selected and loaded in the Local Server tab. The model dropdown must show an active model.
Slow responses — Use a smaller quantization (Q4 instead of Q8) or a smaller model. Check that GPU offloading is enabled in LM Studio settings if you have a compatible GPU.
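The first two failure modes can also be told apart in code. A sketch that maps a caught error to the likely fix (the string matching is an assumption about typical error text, not ModelReins behavior):

```python
def diagnose(exc: Exception) -> str:
    """Map a request failure to the likely fix from the troubleshooting list above."""
    msg = str(exc).lower()
    if "connection refused" in msg:
        return "Server not running: open LM Studio, go to Local Server, click Start Server."
    if "model not found" in msg or "404" in msg:
        return "No model loaded: select and load a model in the Local Server tab."
    return f"Unrecognized error: {exc}"

print(diagnose(ConnectionRefusedError("Connection refused")))
```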