Run Your Own Coding Agent on Your Laptop (for Free)
In this guide, you will learn how to download a coding LLM to your laptop using Ollama and wire it into Claude Code or Codex with a single command.
Required tools
Ollama App
Updated Apr 20, 2026
The Rundown
Download a coding LLM to your laptop with Ollama and wire it into Claude Code or Codex with a single command. You will be able to run your coding agent for free on simple tasks and keep proprietary code entirely on your machine.
Who This Is Useful For
- Solo devs and indie founders paying $100-200/mo for Claude Code Max or Codex Pro on work that does not need frontier reasoning
- Students, hobbyists, and new coders who want a working agentic setup to practice on for free
- Anyone working on proprietary or NDA code who wants the whole loop to stay on their laptop
What You Will Build

A local Claude Code (or Codex, or OpenCode) session pointed at a free Ollama model running on your own hardware. Same agent, same interface, zero per-token cost, nothing leaving your machine.
What You Need to Get Started
- Ollama installed. If you do not have it, follow our Ollama install guide first
- Claude Code signed in and working, or OpenCode (installer in Step 5)
- A terminal and a project folder you actually want to work in
- 16 GB of RAM is a comfortable baseline. 8 GB works with smaller models, and 32 GB+ opens up bigger ones.
Step 1 Check Your Hardware and Ask an LLM What to Run
Before you pull anything, know what your machine can handle. On Mac: Apple menu > About This Mac > screenshot the specs panel. On Windows: Settings > System > About > screenshot.
Drop that screenshot into Claude, ChatGPT, or any LLM, and ask: "Which Ollama coder models can I realistically run on this machine with Claude Code or Codex?" It will read the RAM and chip off your screenshot and give you a short list.
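If you would rather skip the screenshot, you can read the key number straight from the terminal. A minimal sketch that prints total RAM (assumes macOS's `sysctl` or Linux's `/proc/meminfo`; it tells you nothing about the GPU):

```shell
# Print total system RAM in GB: sysctl on macOS, /proc/meminfo on Linux
if [ "$(uname)" = "Darwin" ]; then
  echo "RAM: $(($(sysctl -n hw.memsize) / 1073741824)) GB"
else
  awk '/MemTotal/ {printf "RAM: %d GB\n", $2 / 1048576}' /proc/meminfo
fi
```

Paste that number into your LLM along with your chip/GPU details and you get the same shortlist.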
Rough guide to what fits where:
| Your RAM | Recommended size | Example |
|---|---|---|
| 8 GB | 3B params or smaller | |
| 12 GB | 4-7B params | |
| 16 GB | 7-12B params | |
| 32 GB+ or GPU | 20B+ | |
Pro tip: Download the biggest version you can reasonably fit. Bigger means more reliable tool calls, which is the thing that actually breaks small models inside Claude Code.
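You can also sanity-check the table with back-of-envelope arithmetic: at the 4-bit quantizations Ollama typically ships, a model needs very roughly 0.6-0.7 GB per billion parameters, plus a couple of GB for context. A sketch (the 0.65 GB/B factor and +2 GB overhead are ballpark assumptions, not Ollama specs):

```shell
# Rough fit check: estimated GB needed for a Q4-quantized model of N billion
# params. 0.65 GB per billion params + 2 GB for KV cache/overhead is a
# ballpark assumption, not a spec.
params_b=12
est_gb=$(awk -v p="$params_b" 'BEGIN {printf "%.1f", p * 0.65 + 2}')
echo "~${est_gb} GB needed for a ${params_b}B model"
```

By that estimate a 12B model wants roughly 10 GB, which is why it sits at the top of the 16 GB row.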
Step 2 Pick a Coder Model That Supports Agentic Tools
Browse ollama.com/search?q=coder and open the page for a model your LLM recommended. Critical check: scroll to the Applications section on the model page and confirm it lists Claude Code, Codex, OpenCode, or OpenClaw. If it lists none of them, the model does not support the tool calls agentic coding requires. Skip it.
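Once a model is pulled, you can double-check tool support locally too: `ollama show` prints a Capabilities section, and agent-ready models list `tools` there. A sketch (the model tag is an example, not a recommendation):

```shell
# Does this model advertise tool calling? `ollama show` lists capabilities.
model="qwen3-coder"   # example tag; substitute the one you pulled
if ollama show "$model" | grep -qi 'tools'; then
  echo "$model: tool calling listed"
else
  echo "$model: no tool support listed; skip it for agentic work"
fi
```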
The three that held up best in our testing:
- qwen3-coder is the purpose-built coder pick. Strong raw code generation at its size.
- gemma4 has explicit tool-use and thinking training. Handles multi-step tool chains more reliably.
- gpt-oss is OpenAI's open-weights MoE model with strong agentic support.
Pro tip: If you cannot choose between two candidates, pull both. Disk is cheap, and you can swap between them with the `--model` flag. Keep the one that actually holds up on your workflow.
Step 3 Launch Your Coding Agent With the Local Model
From the model's Ollama page, copy the launch command. It looks like:
```shell
ollama launch claude --model gemma4:e4b
```

Open a terminal in your actual project folder, paste the command, and hit enter. Confirm the download when it prompts you, then wait for the weights to pull the first time. After that, Ollama drops you into Claude Code pointed at the local model instead of Anthropic's API.
Inside the session, type /model to confirm which model is wired in. Every response from here costs zero tokens.
Pro tip: Run `ollama ps` in a second terminal to see what is actually running. It shows the active model, RAM in use, and GPU utilization. 100% GPU means you are fully accelerated; anything lower means part of the model is spilling to CPU, and responses will be slower.
Step 4 Bump the Context Window Before You Do Anything Serious
This is the single most important setting in the whole setup. By default, Ollama allocates only 4K of context per model, which is way too small for agentic coding. Claude Code will read one file, fill the buffer, and immediately start forgetting the rest of the conversation.
Fix it once and never think about it again:
- Open the Ollama app
- Ollama menu > Settings
- Find the Context slider
- Bump it to 32K to start, or higher if your specs allow
Pro tip: Ask your LLM what context size your specs can safely handle. Maxing it out can push the model past your GPU's limits and crash things. Start at 32K, verify with `ollama ps`, then raise if there is headroom.
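If you run the Ollama server headless instead of through the app, the same setting is exposed as an environment variable. This assumes `OLLAMA_CONTEXT_LENGTH`, which recent Ollama releases use for the default context size; check your version's docs:

```shell
# Start the Ollama server with a 32K default context instead of the 4K default
# (OLLAMA_CONTEXT_LENGTH is honored by recent Ollama releases)
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
```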
Step 5 Try OpenCode If Claude Code Feels Too Heavy
Claude Code is a sophisticated harness with a lot of tools exposed at once. Small local models sometimes get confused by the volume of choices. OpenCode is a lighter-weight coding agent built for this case, and it uses the same ollama launch pattern.
Install it on Mac with one line:
```shell
curl -fsSL https://opencode.ai/install | bash
```
Then launch it the same way:
```shell
ollama launch opencode --model gemma4:e4b
```
Pro tip: Smaller models work better when you ask them to think less. Hit Shift+Tab inside Claude Code or OpenCode to toggle plan mode (the agent writes its approach before touching files). Lower the reasoning effort in Settings if the agent keeps over-thinking simple tasks.
Going Further
The single highest-leverage move once the basics are working is a hybrid setup. Keep paying for Claude Code or Codex as your main agent, but configure your local model as a cheap subagent it orchestrates. The frontier model handles architecture and planning. The local model grinds through the boilerplate. You keep the smart work and offload the cheap work, and the token meter slows way down.
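Claude Code's subagents are markdown files with YAML frontmatter under `.claude/agents/`. A minimal sketch of a boilerplate-grinder definition (the file name and description are made up for illustration, and routing it to your local model depends on how you launched the session):

```shell
# Create a hypothetical "boilerplate" subagent definition for Claude Code
mkdir -p .claude/agents
cat > .claude/agents/boilerplate.md <<'EOF'
---
name: boilerplate
description: Handles mechanical edits, scaffolding, and repetitive refactors
---
You write straightforward, mechanical code changes. Keep diffs small and
leave architecture decisions to the orchestrating agent.
EOF
echo "wrote .claude/agents/boilerplate.md"
```

With a definition like this in place, the frontier agent can delegate grunt work by name while you watch the token meter slow down.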
Two more places to take this:
- Remote-control your local session from your phone. Run `claude remote-control` inside the session and scan the QR code with the Claude mobile app. You can kick off a task at your desk and supervise it from anywhere.
- Put the model on a dedicated machine. An old Mac mini or repurposed PC makes a great always-on local inference box. Hit it from your laptop over your network and stop fighting your daily driver for RAM.
Local models are not yet a full replacement for frontier cloud coding agents. Treat this as a practice environment, a privacy layer for sensitive projects, and a budget hedge on the tasks that do not need frontier reasoning. The gap closes every few months.
