The Local-First AI Stack — No API Keys Required
Build a full AI coding workflow without touching OpenAI. Ollama + OpenCode + n8n = complete autonomy on your own hardware. Here's the setup that actually works.
Last Updated: May 2026
You don't need OpenAI to build a capable AI development workflow. With three open-source tools and a decent GPU, you can run a fully autonomous coding assistant, an AI-powered automation layer, and a local LLM — without a single API key or subscription.
Here's the stack we actually run.
The Stack
- Ollama — Local LLM runtime (DeepSeek V4 or Llama 4)
- OpenCode — Autonomous coding agent with project awareness
- n8n — Workflow automation with AI nodes
- LM Studio (optional) — Local model management UI
Total monthly cost: $0 in software. Hardware: one RTX 4090 (~$160/mo amortized over 3 years, or free if you already have one).
Step 1: Ollama (Local LLM Runtime)
Ollama is the easiest way to run open-weight models locally. Install it and pull your models:

```bash
brew install ollama
ollama pull deepseek-v4
ollama pull llama-4-scout
```
DeepSeek V4 at 8B parameters runs at 45 tokens/second on an RTX 4090. Fast enough for autocomplete and agentic task execution. Llama 4 Scout is better for reasoning-heavy tasks but slower (18 tok/s).
Start the Ollama API server:
```bash
ollama serve
```
It exposes an OpenAI-compatible API at `http://localhost:11434`. OpenCode and n8n both work with this endpoint natively.
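Any OpenAI-style client can hit that endpoint. Here's a minimal stdlib-only sketch of what a request looks like — the model name and prompt are just examples, and the response shape assumes standard OpenAI chat-completion JSON:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(prompt, model="deepseek-v4"):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt, model="deepseek-v4"):
    """POST the prompt to the local Ollama server; return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

No API key header anywhere — that's the whole point.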
Step 2: OpenCode (AI Coding Agent)
OpenCode connects to any OpenAI-compatible endpoint, so pointing it at Ollama takes one config change:
```yaml
# ~/.opencode/config.yaml
provider: openai
api_key: local  # Ollama doesn't need a key
base_url: http://localhost:11434/v1
model: deepseek-v4
```
Then run:
```bash
opencode --init
opencode "refactor the auth module to use JWT RS256"
```
OpenCode will plan the refactor, show you the changes, and ask for confirmation before applying. All code stays on your machine.
Step 3: n8n (AI-Powered Automation)
n8n connects Ollama to your existing tools. Example workflow: when a GitHub PR is opened, use a local LLM to automatically classify the PR type, check if it needs security review, and post a comment.
```bash
docker run --name n8n -p 5678:5678 n8nio/n8n
```
Then build a workflow with:
1. GitHub trigger (PR opened)
2. Ollama node (classify PR type)
3. Conditional router (security review needed?)
4. GitHub node (post comment or label)
No external AI API calls. Everything runs through Ollama on localhost.
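The classification step in that workflow is just prompt-in, label-out. A sketch of the logic the Ollama node would run — the label set and keyword list here are hypothetical, not n8n built-ins:

```python
LABELS = {"feature", "bugfix", "refactor", "docs", "security"}

def classification_prompt(pr_title, pr_body):
    """Prompt asking the local model for a single-word PR label."""
    return (
        "Classify this pull request as one of: "
        + ", ".join(sorted(LABELS)) + ".\n"
        f"Title: {pr_title}\nBody: {pr_body}\n"
        "Answer with a single word."
    )

def parse_label(model_reply):
    """Normalize the model's reply; fall back to 'feature' if unrecognized."""
    word = model_reply.strip().lower().split()[0].strip(".,")
    return word if word in LABELS else "feature"

def needs_security_review(label, pr_title):
    """Conditional-router logic: flag security-sensitive PRs."""
    risky = ("auth", "token", "crypto", "password")
    return label == "security" or any(k in pr_title.lower() for k in risky)
```

Local models drift on output format, so the parser's fallback matters more here than it would with a hosted model.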
What You Can't Do Locally (Yet)
Local LLMs are genuinely capable but have real limitations:
- Code completion: DeepSeek V4 8B is competitive with GPT-4o on most tasks. Works.
- Complex refactoring: OpenCode on local models handles 70% of what you'd ask GPT-4o. The hard 30% still trips it up.
- Vision tasks: Ollama's vision support is early. Screenshot analysis doesn't work well locally yet.
- Real-time web search: Ollama can't browse. You need an external tool for that.
- Long contexts: Running 128k context on local hardware requires an A100. On an RTX 4090, 8k is the practical limit.
When to Add a Hosted Model
The hybrid approach that works in practice:
- Local Ollama for: code completions, batch tasks, anything with sensitive data
- Groq for: interactive agentic tasks where latency matters, complex reasoning
This gives you privacy for the stuff that matters and speed for the stuff that's time-sensitive. Groq's API is cheap enough ($15/month for moderate usage) that combining it with local doesn't change the cost profile meaningfully.
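That routing decision is simple enough to encode directly. A hypothetical sketch — the backend names and task types are made up for illustration, and privacy always wins over latency:

```python
def pick_backend(task_type, contains_sensitive_data, latency_sensitive):
    """Route a request: privacy first, then latency, local by default."""
    if contains_sensitive_data:
        return "ollama-local"   # sensitive data never leaves the machine
    if latency_sensitive or task_type == "complex_reasoning":
        return "groq-hosted"    # hosted inference for speed-critical work
    return "ollama-local"       # batch/background work stays local
```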
Setup Time
- Ollama: 10 minutes
- OpenCode: 5 minutes
- n8n: 15 minutes
Total: 30 minutes to a fully functional local-first AI development environment. The barrier to entry is genuinely low now.