Running Local Deep Researcher with Ollama on Ubuntu (2026 Guide)
Last updated April 2026 — refreshed for current model/tool versions.
Running a local deep research assistant on Ubuntu gives you iterative, citation-backed reports without sending your data to any cloud. This guide covers two production-ready options — the lightweight langchain-ai/local-deep-researcher (Python script, minimal dependencies) and the feature-rich LearningCircuit/local-deep-research (v1.6.6, web UI, encrypted databases, 20+ search sources) — both powered by Ollama and tested on Ubuntu 24.04 LTS.
What changed in 2026: read this first if you followed a 2025 guide
- Two distinct projects now dominate this space: the original langchain-ai/local-deep-researcher (lightweight, LangGraph-based) and the community fork LearningCircuit/local-deep-research (v1.6.6 as of April 29, 2026, with a full web UI, REST API, and AES-256 encrypted databases). Know which one you are installing.
- Recommended model changed: Gemma 3 12B → Gemma 4 E4B or 12B. Google released Gemma 4 on April 2, 2026. The E4B (4.5B effective parameters, 9.6 GB download) outperforms the old 12B on most reasoning tasks and runs comfortably in 12 GB VRAM. The 26B MoE variant (18 GB download) is now the high-end local option.
- Ollama v0.22.0 (April 28, 2026) is the current stable release. It adds model-batching support, Gemma 4 tool-calling, and MLX runner improvements for Apple Silicon. Linux GPU users are unaffected by MLX changes but benefit from the batching patch.
- Python 3.12 is the new minimum for local-deep-research (pip package). Ubuntu 24.04 ships Python 3.12 by default, so no workarounds are needed. The original langchain-ai repo still requires Python 3.11+.
- Docker Compose is now the recommended install path for local-deep-research, pulling Ollama, SearXNG, and the application in a single command. Bare-metal pip install is still supported but requires SQLCipher system libraries.
- DuckDuckGo is now the default search backend in both projects (no API key required). SearXNG self-hosting is still recommended for high-volume or privacy-critical workloads.
TL;DR: Which project should you run?
| Factor | langchain-ai/local-deep-researcher | LearningCircuit/local-deep-research |
|---|---|---|
| Install complexity | Low (git clone + pip) | Low (Docker Compose) / Medium (pip) |
| Web UI | LangGraph Studio (separate install) | Built-in at localhost:5000 |
| Search sources | DuckDuckGo, SearXNG, Tavily, Perplexity | 20+ (arXiv, PubMed, Wikipedia, Semantic Scholar, web, private docs) |
| Multi-user | No | Yes (RBAC, per-user encrypted DB) |
| SimpleQA benchmark | Not published | ~95% (GPT-4.1-mini + SearXNG, focused-iteration) |
| Best for | Developers, scripting, integration | Power users, teams, research workflows |
Prerequisites
- OS: Ubuntu 24.04 LTS (Noble Numbat) — ships Python 3.12, tested configuration
- RAM: 16 GB minimum; 32 GB recommended for 26B+ models
- Storage: 25 GB free (models range from 9.6 GB to 20 GB)
- GPU: NVIDIA GPU with 12 GB+ VRAM strongly recommended (RTX 3060 12 GB handles Gemma 4 E4B well); CPU-only is possible but expect 3–8 tokens/second
- Software: Git, Python 3.12 (pre-installed on Ubuntu 24.04), Docker + Docker Compose (for the LearningCircuit variant)
Optional: NVIDIA CUDA setup (for GPU acceleration)
Ollama auto-detects your GPU on installation. If you want to verify CUDA is available:
nvidia-smi
# Should show your GPU and CUDA version (12.x or higher)
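If nvidia-smi is not found or shows no GPU, install the driver and toolkit. The cuda-toolkit-12-8 package is served from NVIDIA's CUDA APT repository rather than Ubuntu's own archive, so add that repository first, then run: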
sudo apt update
sudo apt install -y nvidia-driver-570 cuda-toolkit-12-8
# Reboot after driver install
sudo reboot
Ollama handles the rest — you do not need to manually configure CUDA paths for inference.
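If you would rather check the GPU from a script than read the nvidia-smi table by eye, the following is a minimal sketch. It relies on nvidia-smi's standard --query-gpu and --format options; the function names are illustrative and not part of Ollama or either research project.

```python
import shutil
import subprocess

# Standard nvidia-smi query flags: one CSV line per GPU, memory reported in MiB.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_lines(output: str) -> list:
    """Turn lines like 'NVIDIA GeForce RTX 3060, 12288' into (name, GiB) pairs."""
    gpus = []
    for line in output.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact.
        name, _, mem_mib = line.rpartition(",")
        gpus.append((name.strip(), int(mem_mib.strip()) / 1024))
    return gpus


def detect_gpus() -> list:
    """Return detected GPUs, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for name, gib in detect_gpus():
        print(f"{name}: {gib:.1f} GiB VRAM")
```

An empty result means either no NVIDIA driver is installed or the driver cannot see the card, which is the case where Ollama falls back to CPU.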
Step 1 — Install Ollama (v0.22.0)
Ollama manages model downloads, versioning, and serves a local OpenAI-compatible API at localhost:11434.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
ollama --version
# Expected: ollama version 0.22.0 (or newer)
Verify the service is running:
curl http://localhost:11434/api/version
# Returns: {"version":"0.22.0"}
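The same check can be scripted, for example as a pre-flight step before starting a long research job. This is a sketch only: it assumes the /api/version response shape shown above and the default localhost:11434 endpoint, and the helper names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default endpoint used throughout this guide
MINIMUM = "0.22.0"                     # version this guide was written against


def version_tuple(version: str) -> tuple:
    """Convert '0.22.0' into (0, 22, 0); a '-rc1' style suffix is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(version: str, minimum: str = MINIMUM) -> bool:
    """Compare numerically, so that '0.9.5' is correctly older than '0.22.0'."""
    return version_tuple(version) >= version_tuple(minimum)


def fetch_version(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> str:
    """Read the 'version' field from GET /api/version on a running Ollama service."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
        return json.load(resp)["version"]
```

With the service running, meets_minimum(fetch_version()) returns True on v0.22.0 or newer. The comparison is done on integer tuples because a plain string comparison would rank "0.9.5" above "0.22.0".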
Step 2 — Pull a language model
Gemma 4 is now the recommended model family for this workflow. Choose based on your VRAM:
| Model | Pull command | Download size | Min VRAM | Use case |
|---|---|---|---|---|
| Gemma 4 E4B (4.5B eff.) | ollama pull gemma4:e4b | 9.6 GB | 10 GB | Laptops, 12 GB GPUs |
| Gemma 4 26B MoE | ollama pull gemma4:26b | 18 GB | 20 GB | Workstations, 24 GB GPUs |
| Gemma 4 31B Dense | ollama pull gemma4:31b | 20 GB | 24 GB | High-end workstations |
| Qwen 3 8B (alt.) | ollama pull qwen3:8b | ~5 GB | 8 GB | Low-VRAM alternative |
For most Ubuntu workstations, start with Gemma 4 E4B:
ollama pull gemma4:e4b
# Or for higher-end machines:
ollama pull gemma4:26b
Gemma 4 supports native function-calling and a 128K–256K context window, which significantly improves the multi-hop research loops these tools rely on.
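The VRAM-based choice in this step can be written down as a small lookup. The sketch below copies the "Min VRAM" figures from the table in this step; picking the largest model that fits is one reasonable policy, not a rule from either project, and pick_model is a hypothetical helper.

```python
from typing import Optional

# (Ollama tag, minimum VRAM in GB) from the table in this step, smallest first.
MODELS = [
    ("qwen3:8b", 8),
    ("gemma4:e4b", 10),
    ("gemma4:26b", 20),
    ("gemma4:31b", 24),
]


def pick_model(vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose minimum VRAM fits, or None if none fit."""
    fitting = [tag for tag, min_vram in MODELS if vram_gb >= min_vram]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for vram in (6, 8, 12, 24):
        print(f"{vram} GB VRAM -> {pick_model(vram)}")
```

A None result corresponds to the CPU-only path described in the prerequisites. If you want headroom for long contexts, pass a value a few GB below your card's actual VRAM.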
Option A — langchain-ai/local-deep-researcher (lightweight)
This is the original LangGraph-based project. No database, no user accounts — just a Python process that loops through search, summarize, and gap-fill cycles.
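To make the search, summarize, and gap-fill cycle concrete, here is a simplified sketch of such a loop. It is not the project's actual LangGraph graph: the three callables stand in for the web search and LLM calls, and all names are hypothetical.

```python
from typing import Callable, List, Optional


def research(
    topic: str,
    search: Callable[[str], List[str]],
    summarize: Callable[[str, List[str]], str],
    find_gap: Callable[[str, str], Optional[str]],
    max_loops: int = 3,
) -> str:
    """Iteratively search, fold results into a running summary, then chase gaps."""
    summary = ""
    query = topic
    for _ in range(max_loops):
        sources = search(query)                # 1. fetch material for the current query
        summary = summarize(summary, sources)  # 2. merge it into the running summary
        gap = find_gap(topic, summary)         # 3. ask what is still missing
        if gap is None:                        # nothing missing: stop early
            break
        query = gap                            # next cycle targets the gap
    return summary


if __name__ == "__main__":
    notes = research(
        "nuclear fusion",
        search=lambda q: [f"(stub result for '{q}')"],
        summarize=lambda prev, docs: (prev + " " + " ".join(docs)).strip(),
        find_gap=lambda topic, summary: None,  # stub: always report no gaps
    )
    print(notes)
```

The max_loops argument plays the role that MAX_WEB_RESEARCH_LOOPS plays in the real tool: it bounds how many times the loop may go back for more sources.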
Install
sudo apt update && sudo apt install -y python3 python3-pip python3-venv git
git clone https://github.com/langchain-ai/local-deep-researcher.git
cd local-deep-researcher
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Configure
Copy the example environment file and edit it:
cp .env.example .env
Key variables in .env:
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
LOCAL_LLM=gemma4:e4b
# Search backend — DuckDuckGo requires no API key
SEARCH_API=duckduckgo
# Or self-hosted SearXNG (recommended for privacy):
# SEARCH_API=searxng
# SEARXNG_INSTANCE=http://localhost:8080
# Number of iterative research cycles (3–10)
MAX_WEB_RESEARCH_LOOPS=5
FETCH_FULL_PAGE=true
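A typo in .env usually only surfaces once a run is under way, so a quick sanity check can save time. The sketch below parses .env-style text and checks the variables listed above. The allowed SEARCH_API values are an assumption (lowercased names of the backends this guide lists for the project), and the 3 to 10 range is the guide's suggested range rather than a limit enforced by the tool.

```python
# Assumed spellings of the search backends this guide lists for the langchain-ai project.
ALLOWED_SEARCH = {"duckduckgo", "searxng", "tavily", "perplexity"}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_env(values: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks sane."""
    problems = []
    if values.get("SEARCH_API", "").lower() not in ALLOWED_SEARCH:
        problems.append("SEARCH_API should be one of: " + ", ".join(sorted(ALLOWED_SEARCH)))
    loops = values.get("MAX_WEB_RESEARCH_LOOPS", "")
    if not loops.isdigit() or not 3 <= int(loops) <= 10:
        problems.append("MAX_WEB_RESEARCH_LOOPS should be an integer from 3 to 10")
    if not values.get("OLLAMA_BASE_URL", "").startswith(("http://", "https://")):
        problems.append("OLLAMA_BASE_URL should be an http(s) URL")
    return problems


if __name__ == "__main__":
    sample = "OLLAMA_BASE_URL=http://localhost:11434\nSEARCH_API=duckduckgo\nMAX_WEB_RESEARCH_LOOPS=5\n"
    print(check_env(parse_env(sample)) or "looks fine")
```

To check a real file, pass its contents instead of the sample string, for example parse_env(open(".env").read()).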
Run
Run a research job directly from the command line:
python -m local_deep_researcher.main --topic "Current state of nuclear fusion energy 2026" --cycles 5
Or launch the web UI through LangGraph Studio (requires LangGraph CLI):
pip install langgraph-cli
langgraph dev
# Opens studio UI in your browser
Option B — LearningCircuit/local-deep-research (full-featured)
This community fork has evolved significantly. Version 1.6.6 (April 29, 2026) adds journal quality scoring, a 212K+ indexed academic source database, and predatory journal auto-removal. It is the better choice for teams or serious research workflows. It's also what many r/LocalLLaMA practitioners reach for when they need academic source coverage alongside web results.
Install via Docker Compose (recommended)
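# Ubuntu's archive packages the Compose plugin as docker-compose-v2
# (docker-compose-plugin is only available from Docker's own APT repository)
sudo apt install -y docker.io docker-compose-v2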
sudo usermod -aG docker $USER
# Log out and back in, then:
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
docker compose up -d
This starts three containers: the application server, Ollama (pre-configured to pull Gemma 4 E4B on first run), and SearXNG for private web search. The web UI is available at http://localhost:5000.
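On first run the stack may need a while before the web UI answers, since the model download happens in the background. A small polling helper avoids guessing; this is a sketch under the assumption that the UI is served at http://localhost:5000 as stated above, with illustrative function names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if something answers at the URL, even with a 4xx such as a login redirect."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        return err.code < 500  # a 4xx still means the server is up
    except OSError:            # covers URLError, refused connections, and timeouts
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be wait_until_ready(lambda: http_ok("http://localhost:5000")), which gives up after about a minute with the defaults. Keeping the probe as a parameter lets you reuse the same loop for the Ollama or SearXNG containers.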
Enable GPU acceleration in Docker
# Install NVIDIA Container Toolkit first
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Run with GPU override:
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d
Install via pip (bare-metal)
Ubuntu 24.04 requires SQLCipher system libraries before the pip install:
sudo apt-get update
sudo apt-get install -y sqlcipher libsqlcipher0 libsqlcipher-dev
python3 -m venv .venv
source .venv/bin/activate
pip install local-deep-research
# Or with MCP server support for Claude Desktop:
pip install "local-deep-research[mcp]"
local-deep-research
# Opens at http://localhost:5000
Configure Ollama endpoint
When running via Docker Compose, reference Ollama by container hostname, not localhost:
OLLAMA_BASE_URL=http://ollama:11434
LOCAL_LLM=gemma4:e4b
SEARXNG_INSTANCE=http://searxng:8080
For bare-metal installs, use localhost:11434 as normal.
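Whichever base URL applies, it helps to confirm that the endpoint actually has the model you configured. The sketch below uses Ollama's /api/tags endpoint, which lists locally available models under a "models" key; the helper names are illustrative.

```python
import json
import urllib.request


def base_url(in_compose: bool) -> str:
    """Container hostname inside the Compose network, localhost on bare metal."""
    return "http://ollama:11434" if in_compose else "http://localhost:11434"


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Check a parsed /api/tags response for a model tag such as 'gemma4:e4b'."""
    names = {entry["name"] for entry in tags_payload.get("models", [])}
    # Ollama stores untagged pulls as '<name>:latest', so accept that form too.
    return wanted in names or f"{wanted}:latest" in names


def fetch_tags(url: str, timeout: float = 3.0) -> dict:
    """GET /api/tags from a running Ollama service and return the parsed JSON."""
    with urllib.request.urlopen(f"{url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

For example, has_model(fetch_tags(base_url(in_compose=False)), "gemma4:e4b") returns False until ollama pull gemma4:e4b has completed. Note that the ollama hostname only resolves from inside the Compose network, so run the check from the application container in that case.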
Search backend options
| Backend | Setup effort | Privacy | API key required | Best for |
|---|---|---|---|---|
| DuckDuckGo | None | Medium | No | Quick start |
| SearXNG (self-hosted) | Low (Docker) | High | No | Privacy-critical workloads |
| Brave Search | Low | High | Yes (free tier) | High-quality web results |
| Tavily | Low | Low (cloud) | Yes (paid) | Best raw search quality |
| arXiv / PubMed | None (built-in) | High | No | Academic research (LDR only) |
Self-hosting SearXNG alongside local-deep-research is the recommended combination for data sovereignty. The Docker Compose stack in the LearningCircuit repo bundles SearXNG automatically.
Performance and benchmarks
Model inference speed on Ubuntu (single user)
| Model | Hardware | Tokens/sec (decode) |
|---|---|---|
| Gemma 4 E4B (Q4_K_M) | RTX 4070 12 GB | 55–75 |
| Gemma 4 26B MoE (Q4_K_M) | RTX 4090 24 GB | 35–50 |
| Gemma 4 E4B (CPU only) | Ryzen 9 7950X, 64 GB RAM | 3–8 |
| Qwen 3 8B (Q4_K_M) | RTX 3060 12 GB | 60–80 |
For a 5-cycle research job, expect roughly 3–8 minutes on a GPU and 20–45 minutes CPU-only at the above rates. On a GPU, the bottleneck is usually web search latency rather than the LLM; on CPU, generation time dominates.
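A back-of-the-envelope model shows how decode speed and search latency combine into those job times. The figures of 2,000 generated tokens and 30 seconds of search and fetch time per cycle are assumptions chosen for illustration, not measurements from either project.

```python
def job_minutes(cycles: int, tokens_per_sec: float,
                tokens_per_cycle: int = 2000,      # assumed LLM output per cycle
                search_seconds: float = 30.0) -> float:  # assumed search + fetch time
    """Estimate wall time: each cycle costs generation time plus search latency."""
    per_cycle = tokens_per_cycle / tokens_per_sec + search_seconds
    return cycles * per_cycle / 60


gpu = job_minutes(5, tokens_per_sec=65)  # mid-range of the RTX 4070 row
cpu = job_minutes(5, tokens_per_sec=5)   # mid-range of the CPU-only row
print(f"GPU ~{gpu:.1f} min, CPU ~{cpu:.1f} min")
# prints: GPU ~5.1 min, CPU ~35.8 min
```

Under these assumptions a GPU cycle splits roughly evenly between generation (about 31 s) and search (30 s), while a CPU cycle spends 400 s generating. Substitute your own measured tokens per second to get an estimate for your hardware.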
Accuracy benchmark
The LearningCircuit project publishes community benchmark results. As of April 2026, the focused-iteration strategy with GPT-4.1-mini + SearXNG achieves ~95% on the SimpleQA benchmark. For fully local runs (Gemma 4 E4B + SearXNG), community results cluster around 72–78% — significantly better than zero-shot prompting but below cloud-model baselines. Submit your own benchmarks at LearningCircuit/ldr-benchmarks.
Gemma 4 model benchmarks (Google, April 2026)
The flagship Gemma 4 31B achieves MMLU-Pro 85.2%, AIME 2026 89.2%, GPQA Diamond 84.3%, and LiveCodeBench v6 80.0%, ranking third among open models on the LMSys Arena leaderboard as of its release date. The E4B and 26B MoE variants trade some accuracy for substantially lower VRAM requirements.
How to choose: decision guide
- Solo developer, scripting-first: Use langchain-ai/local-deep-researcher. Clone, configure, run. No database overhead.
- Research team or multi-user setup: Use LearningCircuit/local-deep-research via Docker Compose. RBAC, encrypted per-user databases, and analytics dashboards are built in.
- Academic research focus: Use local-deep-research, which integrates arXiv, PubMed, Semantic Scholar, and applies journal quality scoring to filter predatory journals.
- Fully offline, maximum privacy: Either tool works; pair with a self-hosted SearXNG instance and avoid Tavily/Brave.
- Low VRAM (8–12 GB): Use Gemma 4 E4B or Qwen 3 8B. Both support tool calling in Ollama v0.22+.
- Need Claude Desktop integration: Install local-deep-research[mcp]. The MCP server connects directly to the Claude Desktop app.
Advanced configuration
Custom academic search engines (langchain-ai variant)
Edit config/search_engines.yaml to add or weight sources:
arxiv:
  name: arXiv
  url: https://arxiv.org/search/?query={query}&searchtype=all
  parser: academic
  weight: 0.9
semantic_scholar:
  name: Semantic Scholar
  url: https://api.semanticscholar.org/graph/v1/paper/search?query={query}
  parser: academic
  weight: 0.8
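The {query} placeholder in each url entry has to be filled with a URL-encoded search string. How the tool itself consumes this file is not shown here, so the following is only a sketch of the substitution step, with the two entries above mirrored as a Python dict; reading weight as "higher is queried first" is an assumption.

```python
from urllib.parse import quote_plus

# Mirror of the two YAML entries above (only the fields this sketch uses).
ENGINES = {
    "arxiv": {
        "url": "https://arxiv.org/search/?query={query}&searchtype=all",
        "weight": 0.9,
    },
    "semantic_scholar": {
        "url": "https://api.semanticscholar.org/graph/v1/paper/search?query={query}",
        "weight": 0.8,
    },
}


def build_search_url(template: str, query: str) -> str:
    """Fill the {query} placeholder with a URL-encoded query string."""
    return template.replace("{query}", quote_plus(query))


def engines_by_weight(engines: dict) -> list:
    """Engine keys ordered from highest to lowest weight (assumed meaning of 'weight')."""
    return sorted(engines, key=lambda key: engines[key]["weight"], reverse=True)


if __name__ == "__main__":
    for key in engines_by_weight(ENGINES):
        print(key, build_search_url(ENGINES[key]["url"], "nuclear fusion"))
```

Encoding matters because an unescaped space or ampersand in the query would otherwise break the URL or inject an extra parameter.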
Research modes in local-deep-research v1.6.x
The LearningCircuit variant exposes four modes via the web UI and REST API:
- Quick Summary (30 sec – 3 min): 1–2 search cycles, brief synthesis
- Detailed Research (3–10 min): 3–7 cycles with gap analysis
- Report Generation: Full markdown report with citations and source scoring
- Document Analysis: Interrogate private PDFs and local files
Troubleshooting common issues
| Problem | Cause | Solution |
|---|---|---|
| OOM / model crashes | Model too large for VRAM | Switch to gemma4:e4b or qwen3:8b |
| Ollama not reachable from Docker | Container networking | Use http://ollama:11434, not localhost |
| Port 5000 already in use | Flask default port conflict | Set -e LDR_WEB_PORT=8001 in Docker run |
| SQLCipher install fails (pip) | Missing system libraries | sudo apt install libsqlcipher0 libsqlcipher-dev |
| Slow generation (CPU-only) | No GPU available | Reduce MAX_WEB_RESEARCH_LOOPS to 3; or install CUDA |
| Citation errors / missing sources | Config flag not set | Set cite_sources: true in config.yaml |
| DuckDuckGo rate limiting | Too many requests | Switch to SearXNG self-hosted or add Brave API key |
| Python version mismatch | System Python < 3.12 | Ubuntu 24.04 ships 3.12; on older Ubuntu use pyenv |
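For the port-conflict row, you can check whether something is already listening before starting the web UI. This is a minimal sketch using only the standard library; the candidate ports 5000 and 8001 come from the table above, and the function names are illustrative.

```python
import socket
from typing import Iterable, Optional


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if a TCP connection to host:port succeeds, i.e. something is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port with no listener, or None if all are taken."""
    for port in candidates:
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    print("Use port:", first_free_port([5000, 8001]))
```

If 5000 is taken, the printed alternative is the value to use for the port override mentioned in the table.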
What was removed from earlier guides and why
Earlier 2025 guides recommended building llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" for GPU acceleration. This is no longer needed — Ollama handles GPU acceleration natively and ships its own optimized llama.cpp backend. Do not run a separate llama-cpp-python build alongside Ollama; it will conflict.
The python -m local_deep_research.web.app launch command from pre-1.5 versions is deprecated. Use local-deep-research (the entry point installed by pip) or Docker Compose.
FAQ
Can I run local deep researcher without a GPU on Ubuntu?
Yes. Ollama runs on CPU when no GPU is detected. With a modern 12-core CPU and 32 GB RAM, Gemma 4 E4B generates 3–8 tokens/second — slow but functional for occasional research jobs. For daily use, a GPU is strongly recommended.
What is the difference between local-deep-researcher and local-deep-research?
langchain-ai/local-deep-researcher is the original LangChain team project: a minimal Python script using LangGraph for workflow management. LearningCircuit/local-deep-research is an independent community project that has grown substantially — it adds a full web UI, multi-user support, 20+ search sources, encrypted databases, and a REST API. Both use Ollama for local inference.
Which model should I use with Ollama for research tasks in 2026?
Gemma 4 E4B is the recommended starting point: 9.6 GB download, 128K context, native function-calling, and strong reasoning for its size class. If you have a 24 GB GPU, Gemma 4 26B MoE delivers meaningfully better accuracy. Qwen 3 8B is a solid alternative if you are already familiar with that model family.
Does local deep research work offline after setup?
The LLM inference runs fully offline once models are downloaded. The research workflow itself requires internet access for web search (DuckDuckGo, Brave, etc.). A self-hosted SearXNG instance keeps your queries away from third-party search APIs, but it is a metasearch engine and still needs outbound access to its upstream engines, so it does not make web research work offline. Purely local document analysis (the Document Analysis mode in LearningCircuit's app) works offline.
How do I point local-deep-researcher at Gemma 4 instead of Gemma 3?
In your .env file (langchain-ai variant), set LOCAL_LLM=gemma4:e4b and ensure you have run ollama pull gemma4:e4b. In the LearningCircuit web UI, go to Settings → LLM Provider → Ollama and update the model name field. Restart the service after saving.
Can I connect local-deep-research to Claude Desktop?
Yes. Install with pip install "local-deep-research[mcp]" and configure Claude Desktop to connect to the local MCP server. This lets you trigger research jobs from within Claude Desktop while all processing stays local (the LLM inference hits your Ollama instance, not Anthropic's API).
How many research cycles should I configure?
The default of 3 cycles is a good starting point for quick answers. For thorough research, 5–7 cycles produce substantially better coverage with diminishing returns past 8. Each cycle adds 1–3 minutes of wall time on a GPU machine. Set MAX_WEB_RESEARCH_LOOPS=5 in your .env for a good balance.
Is SearXNG mandatory?
No. Both projects default to DuckDuckGo (no API key, no self-hosted instance needed). SearXNG self-hosting is recommended when you need complete search privacy, higher rate limits, or custom source weighting — but it is optional for getting started.
References and further reading
- langchain-ai/local-deep-researcher — GitHub repository
- LearningCircuit/local-deep-research — GitHub repository (v1.6.6)
- local-deep-research on PyPI — latest version and install instructions
- Gemma 4 model family on Ollama library
- Gemma 4 — Google DeepMind official release page
- Ollama v0.22.0 release notes — GitHub
- SearXNG setup guide for local-deep-research — GitHub docs
- LearningCircuit local-deep-research installation guide — DeepWiki