Running Local Deep Researcher with Ollama on Ubuntu (2026 Guide)

Last updated April 2026 — refreshed for current model/tool versions.

Running a local deep research assistant on Ubuntu gives you iterative, citation-backed reports without sending your data to any cloud. This guide covers two production-ready options — the lightweight langchain-ai/local-deep-researcher (Python script, minimal dependencies) and the feature-rich LearningCircuit/local-deep-research (v1.6.6, web UI, encrypted databases, 20+ search sources) — both powered by Ollama and tested on Ubuntu 24.04 LTS.

What changed in 2026: read this first if you followed a 2025 guide

  • Two distinct projects now dominate this space: the original langchain-ai/local-deep-researcher (lightweight, LangGraph-based) and the community project LearningCircuit/local-deep-research (v1.6.6 as of April 29, 2026, with a full web UI, REST API, and AES-256 encrypted databases). Know which one you are installing.
  • Recommended model changed: Gemma 3 12B → Gemma 4 E4B or 26B MoE. Google released Gemma 4 on April 2, 2026. The E4B (4.5B effective parameters, 9.6 GB download) outperforms the old Gemma 3 12B on most reasoning tasks and runs comfortably in 12 GB VRAM. The 26B MoE variant (18 GB download) is now the high-end local option.
  • Ollama v0.22.0 (April 28, 2026) is the current stable release. It adds model-batching support, Gemma 4 tool-calling, and MLX runner improvements for Apple Silicon. Linux GPU users are unaffected by the MLX changes but benefit from the batching patch.
  • Python 3.12 is the new minimum for local-deep-research (pip package). Ubuntu 24.04 ships Python 3.12 by default, so no workarounds are needed. The original langchain-ai repo still requires Python 3.11+.
  • Docker Compose is now the recommended install path for local-deep-research, pulling Ollama, SearXNG, and the application in a single command. Bare-metal pip install is still supported but requires SQLCipher system libraries.
  • DuckDuckGo is now the default search backend in both projects (no API key required). SearXNG self-hosting is still recommended for high-volume or privacy-critical workloads.

TL;DR: Which project should you run?

| Factor | langchain-ai/local-deep-researcher | LearningCircuit/local-deep-research |
| --- | --- | --- |
| Install complexity | Low (git clone + pip) | Low (Docker Compose) / Medium (pip) |
| Web UI | LangGraph Studio (separate install) | Built-in at localhost:5000 |
| Search sources | DuckDuckGo, SearXNG, Tavily, Perplexity | 20+ (arXiv, PubMed, Wikipedia, Semantic Scholar, web, private docs) |
| Multi-user | No | Yes (RBAC, per-user encrypted DB) |
| SimpleQA benchmark | Not published | ~95% (GPT-4.1-mini + SearXNG, focused-iteration) |
| Best for | Developers, scripting, integration | Power users, teams, research workflows |

Prerequisites

  • OS: Ubuntu 24.04 LTS (Noble Numbat) — ships Python 3.12, tested configuration
  • RAM: 16 GB minimum; 32 GB recommended for 26B+ models
  • Storage: 25 GB free (model downloads in this guide range from about 5 GB to 20 GB)
  • GPU: NVIDIA GPU with 12 GB+ VRAM strongly recommended (RTX 3060 12 GB handles Gemma 4 E4B well); CPU-only is possible but expect 3–8 tokens/second
  • Software: Git, Python 3.12 (pre-installed on Ubuntu 24.04), Docker + Docker Compose (for the LearningCircuit variant)

Optional: NVIDIA CUDA setup (for GPU acceleration)

Ollama auto-detects your GPU on installation. If you want to verify CUDA is available:

nvidia-smi
# Should show your GPU and CUDA version (12.x or higher)

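If the driver is missing, install it from the Ubuntu repositories. The cuda-toolkit-12-8 package only becomes available after you add NVIDIA's CUDA APT repository, and it is optional here because Ollama bundles the CUDA runtime libraries it needs:

sudo apt update
sudo apt install -y nvidia-driver-570
# Optional, and only after adding NVIDIA's CUDA APT repository:
# sudo apt install -y cuda-toolkit-12-8
# Reboot after driver install
sudo reboot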

Ollama handles the rest — you do not need to manually configure CUDA paths for inference.
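
If you would rather check available VRAM from a script than read the nvidia-smi table by eye, the following is a minimal sketch. It assumes nvidia-smi is on your PATH and relies on its CSV query mode; the function names parse_gpu_memory and query_gpus are illustrative and not part of either project.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[str, int]]:
    """Parse lines such as 'NVIDIA GeForce RTX 3060, 12288' into (name, MiB) pairs."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so GPU names containing commas stay intact.
        name, _, mem = line.rpartition(",")
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Return an empty list when nvidia-smi is missing or exits with an error."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_memory(result.stdout)
```

Calling query_gpus() on a machine without a working driver returns an empty list, which is a quick signal that Ollama will fall back to CPU.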

Step 1 — Install Ollama (v0.22.0)

Ollama manages model downloads and versioning, and serves a local REST API at localhost:11434 (with OpenAI-compatible endpoints under /v1).

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
ollama --version
# Expected: ollama version 0.22.0 (or newer)

Verify the service is running:

curl http://localhost:11434/api/version
# Returns: {"version":"0.22.0"}
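
The same check can be scripted. The sketch below assumes the /api/version endpoint returns JSON shaped like the example above, and it compares versions as integer tuples, because a plain string comparison would rank "0.9.6" above "0.22.0". MIN_VERSION and the function names are illustrative.

```python
import json
import urllib.request

MIN_VERSION = (0, 22, 0)  # the release this guide was written against


def parse_version(payload: str) -> tuple[int, ...]:
    """Turn '{"version":"0.22.0"}' into (0, 22, 0)."""
    version = json.loads(payload)["version"]
    # Drop a pre-release suffix such as '-rc1' before splitting on dots.
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def ollama_is_current(base_url: str = "http://localhost:11434") -> bool:
    """Query a running Ollama server and compare its version to MIN_VERSION."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return parse_version(resp.read().decode()) >= MIN_VERSION
```

ollama_is_current() raises urllib.error.URLError if the service is not running, which doubles as a reachability test.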

Step 2 — Pull a language model

Gemma 4 is now the recommended model family for this workflow. Choose based on your VRAM:

| Model | Pull command | Download size | Min VRAM | Use case |
| --- | --- | --- | --- | --- |
| Gemma 4 E4B (4.5B eff.) | ollama pull gemma4:e4b | 9.6 GB | 10 GB | Laptops, 12 GB GPUs |
| Gemma 4 26B MoE | ollama pull gemma4:26b | 18 GB | 20 GB | Workstations, 24 GB GPUs |
| Gemma 4 31B Dense | ollama pull gemma4:31b | 20 GB | 24 GB | High-end workstations |
| Qwen 3 8B (alt.) | ollama pull qwen3:8b | ~5 GB | 8 GB | Low-VRAM alternative |

For most Ubuntu workstations, start with Gemma 4 E4B:

ollama pull gemma4:e4b
# Or for higher-end machines:
ollama pull gemma4:26b

Gemma 4 supports native function-calling and a 128K–256K context window, which significantly improves the multi-hop research loops these tools rely on.
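
To make the VRAM-based choice explicit, here is a small sketch that encodes the minimum-VRAM column from the table in this step. The MODELS list and pick_model are hypothetical helpers for illustration; the thresholds are the guide's figures, not values reported by Ollama.

```python
from typing import Optional

# (minimum VRAM in GB, Ollama tag), copied from the table in this step, largest first.
MODELS = [
    (24, "gemma4:31b"),
    (20, "gemma4:26b"),
    (10, "gemma4:e4b"),
    (8, "qwen3:8b"),
]


def pick_model(vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose minimum VRAM fits, or None."""
    for min_vram, tag in MODELS:
        if vram_gb >= min_vram:
            return tag
    return None
```

With a 12 GB card, pick_model(12) returns "gemma4:e4b"; below 8 GB it returns None, which is the cue to plan for CPU-only inference.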

Option A — langchain-ai/local-deep-researcher (lightweight)

This is the original LangGraph-based project. No database, no user accounts — just a Python process that loops through search, summarize, and gap-fill cycles.
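
The search, summarize, and gap-fill cycle can be pictured as a plain loop. The sketch below is not the project's code (the real implementation is a LangGraph graph); it only illustrates the control flow, with the three steps passed in as hypothetical callables.

```python
from typing import Callable, Optional


def research(
    topic: str,
    search: Callable[[str], list[str]],
    summarize: Callable[[str, list[str]], str],
    find_gap: Callable[[str], Optional[str]],
    max_loops: int = 3,
) -> str:
    """Run up to max_loops cycles, stopping early when no gap remains."""
    summary = ""
    query = topic
    for _ in range(max_loops):
        results = search(query)                # 1. fetch sources for the current query
        summary = summarize(summary, results)  # 2. fold them into the running summary
        follow_up = find_gap(summary)          # 3. ask what is still missing
        if follow_up is None:
            break
        query = follow_up                      # the next cycle targets the gap
    return summary
```

In the real tool each callable would wrap a web search or an Ollama call; passing them in as arguments keeps the loop itself easy to test with stubs.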

Install

sudo apt update && sudo apt install -y python3 python3-pip python3-venv git

git clone https://github.com/langchain-ai/local-deep-researcher.git
cd local-deep-researcher

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configure

Copy the example environment file and edit it:

cp .env.example .env

Key variables in .env:

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
LOCAL_LLM=gemma4:e4b

# Search backend — DuckDuckGo requires no API key
SEARCH_API=duckduckgo

# Or self-hosted SearXNG (recommended for privacy):
# SEARCH_API=searxng
# SEARXNG_INSTANCE=http://localhost:8080

# Number of iterative research cycles (3–10)
MAX_WEB_RESEARCH_LOOPS=5
FETCH_FULL_PAGE=true
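
A typo in .env tends to surface only as a confusing runtime error, so it can help to sanity-check the file first. This is a minimal sketch: the parser handles only simple KEY=value lines, the required-key list mirrors the variables shown above, and the 3 to 10 range comes from the comment in the example. Neither function is part of the project.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_settings(settings: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the file looks sane."""
    problems = []
    for key in ("LLM_PROVIDER", "OLLAMA_BASE_URL", "LOCAL_LLM", "SEARCH_API"):
        if not settings.get(key):
            problems.append(f"{key} is missing")
    loops = settings.get("MAX_WEB_RESEARCH_LOOPS")
    if loops is not None and (not loops.isdigit() or not 3 <= int(loops) <= 10):
        problems.append("MAX_WEB_RESEARCH_LOOPS should be an integer from 3 to 10")
    return problems
```

Running check_settings(parse_env(open(".env").read())) before launching prints nothing to fix when the list comes back empty.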

Run

Run a single research job from the command line:

python -m local_deep_researcher.main --topic "Current state of nuclear fusion energy 2026" --cycles 5

Or launch the web UI through LangGraph Studio (requires LangGraph CLI):

# The "inmem" extra provides the in-memory server that "langgraph dev" needs
pip install -U "langgraph-cli[inmem]"
langgraph dev
# Opens studio UI in your browser

Option B: LearningCircuit/local-deep-research (feature-rich)

This community project has evolved significantly. Version 1.6.6 (April 29, 2026) adds journal quality scoring, a 212K+ indexed academic source database, and predatory journal auto-removal. It is the better choice for teams or serious research workflows. It is also what many r/LocalLLaMA practitioners reach for when they need academic source coverage alongside web results.

If you want a broader local AI agent setup alongside this tool, see the OpenClaw + Ollama setup guide for running local AI agents — it covers model routing and multi-agent orchestration that pairs well with local research tools.

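Install via Docker Compose (recommended)

# docker-compose-v2 is the Compose package in Ubuntu's own repositories;
# docker-compose-plugin exists only in Docker's upstream APT repository.
sudo apt install -y docker.io docker-compose-v2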
sudo usermod -aG docker $USER
# Log out and back in, then:

git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
docker compose up -d

This starts three containers: the application server, Ollama (pre-configured to pull Gemma 4 E4B on first run), and SearXNG for private web search. The web UI is available at http://localhost:5000.

Enable GPU acceleration in Docker

# Install NVIDIA Container Toolkit first
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Run with GPU override:
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d

Install via pip (bare-metal)

Ubuntu 24.04 requires SQLCipher system libraries before the pip install:

sudo apt-get update
sudo apt-get install -y sqlcipher libsqlcipher0 libsqlcipher-dev

python3 -m venv .venv
source .venv/bin/activate
pip install local-deep-research
# Or with MCP server support for Claude Desktop:
pip install "local-deep-research[mcp]"

local-deep-research
# Opens at http://localhost:5000

Configure Ollama endpoint

When running via Docker Compose, reference Ollama by container hostname, not localhost:

OLLAMA_BASE_URL=http://ollama:11434
LOCAL_LLM=gemma4:e4b
SEARXNG_INSTANCE=http://searxng:8080

For bare-metal installs, use localhost:11434 as normal.
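
The hostname rule is easy to get wrong when the same script runs in both environments. A minimal sketch of one way to resolve it follows; it assumes an explicit OLLAMA_BASE_URL always wins and uses the presence of /.dockerenv as a heuristic for "inside a Docker container". The function names are illustrative.

```python
import os


def running_in_docker() -> bool:
    """Heuristic: Docker creates /.dockerenv inside containers."""
    return os.path.exists("/.dockerenv")


def resolve_ollama_url(env: dict[str, str], in_docker: bool) -> str:
    """Prefer an explicit setting, else pick the host by environment."""
    explicit = env.get("OLLAMA_BASE_URL")
    if explicit:
        return explicit
    # Inside the Compose network the service name resolves; localhost does not.
    host = "ollama" if in_docker else "localhost"
    return f"http://{host}:11434"
```

A typical call is resolve_ollama_url(dict(os.environ), running_in_docker()).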

Search backend options

| Backend | Setup effort | Privacy | API key required | Best for |
| --- | --- | --- | --- | --- |
| DuckDuckGo | None | Medium | No | Quick start |
| SearXNG (self-hosted) | Low (Docker) | High | No | Privacy-critical workloads |
| Brave Search | Low | High | Yes (free tier) | High-quality web results |
| Tavily | Low | Low (cloud) | Yes (paid) | Best raw search quality |
| arXiv / PubMed | None (built-in) | High | No | Academic research (LDR only) |

Self-hosting SearXNG alongside local-deep-research is the recommended combination for data sovereignty. The Docker Compose stack in the LearningCircuit repo bundles SearXNG automatically.

Performance and benchmarks

Model inference speed on Ubuntu (single user)

| Model | Hardware | Tokens/sec (decode) |
| --- | --- | --- |
| Gemma 4 E4B (Q4_K_M) | RTX 4070 12 GB | 55–75 |
| Gemma 4 26B MoE (Q4_K_M) | RTX 4090 24 GB | 35–50 |
| Gemma 4 E4B (CPU only) | Ryzen 9 7950X, 64 GB RAM | 3–8 |
| Qwen 3 8B (Q4_K_M) | RTX 3060 12 GB | 60–80 |

For a 5-cycle research job, expect 3–8 minutes on a GPU and 20–45 minutes CPU-only at the above rates. On a GPU the bottleneck is usually web search latency rather than the LLM; on CPU, token generation dominates.
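
To see how decode speed translates into wall time, the arithmetic is simply tokens divided by tokens per second. The sketch below assumes a hypothetical 2,000 generated tokens per cycle (10,000 for five cycles); that figure is an illustration, not a measurement from either project.

```python
def decode_minutes(total_tokens: int, rate_low: float, rate_high: float) -> tuple[float, float]:
    """Return (fastest, slowest) generation time in minutes for a tokens/sec range."""
    fastest = total_tokens / rate_high / 60
    slowest = total_tokens / rate_low / 60
    return fastest, slowest


# Assumed workload: 5 cycles x 2,000 generated tokens each.
TOTAL_TOKENS = 5 * 2_000

gpu = decode_minutes(TOTAL_TOKENS, 55, 75)  # Gemma 4 E4B on an RTX 4070
cpu = decode_minutes(TOTAL_TOKENS, 3, 8)    # Gemma 4 E4B, CPU only
print(f"GPU: {gpu[0]:.1f} to {gpu[1]:.1f} min, CPU: {cpu[0]:.1f} to {cpu[1]:.1f} min")
```

Under that assumption, generation alone takes roughly 2 to 3 minutes on the GPU and 21 to 56 minutes on the CPU, which is in the same range as the totals quoted above once search time is added.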

Accuracy benchmark

The LearningCircuit project publishes community benchmark results. As of April 2026, the focused-iteration strategy with GPT-4.1-mini + SearXNG achieves ~95% on the SimpleQA benchmark. For fully local runs (Gemma 4 E4B + SearXNG), community results cluster around 72–78% — significantly better than zero-shot prompting but below cloud-model baselines. Submit your own benchmarks at LearningCircuit/ldr-benchmarks.

Gemma 4 model benchmarks (Google, April 2026)

The flagship Gemma 4 31B achieves MMLU-Pro 85.2%, AIME 2026 89.2%, GPQA Diamond 84.3%, and LiveCodeBench v6 80.0%, ranking third among open models on the LMSys Arena leaderboard as of its release date. The E4B and 26B MoE variants trade some accuracy for substantially lower VRAM requirements.

How to choose: decision guide

  • Solo developer, scripting-first: Use langchain-ai/local-deep-researcher. Clone, configure, run. No database overhead.
  • Research team or multi-user setup: Use LearningCircuit/local-deep-research via Docker Compose. RBAC, encrypted per-user databases, and analytics dashboards are built in.
  • Academic research focus: Use local-deep-research — it integrates arXiv, PubMed, Semantic Scholar, and applies journal quality scoring to filter predatory journals.
  • Fully offline, maximum privacy: Either tool works; pair with a self-hosted SearXNG instance and avoid Tavily/Brave.
  • Low VRAM (8–12 GB): Use Gemma 4 E4B or Qwen 3 8B. Both support tool calling in Ollama v0.22+.
  • Need Claude Desktop integration: Install local-deep-research[mcp] — the MCP server connects directly to the Claude Desktop app.

Advanced configuration

Custom academic search engines (langchain-ai variant)

Edit config/search_engines.yaml to add or weight sources:

arxiv:
  name: arXiv
  url: https://arxiv.org/search/?query={query}&searchtype=all
  parser: academic
  weight: 0.9

semantic_scholar:
  name: Semantic Scholar
  url: https://api.semanticscholar.org/graph/v1/paper/search?query={query}
  parser: academic
  weight: 0.8
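
Each url entry is a template with a {query} placeholder. How the tool fills it in is not documented here, so the following is only a sketch of the likely idea: URL-encode the query and substitute it into the template. build_search_url is a hypothetical helper, and the handling of the weight field is not shown.

```python
from urllib.parse import quote_plus


def build_search_url(template: str, query: str) -> str:
    """Substitute a URL-encoded query into a '{query}' template."""
    return template.replace("{query}", quote_plus(query))


ARXIV = "https://arxiv.org/search/?query={query}&searchtype=all"
print(build_search_url(ARXIV, "nuclear fusion"))
# https://arxiv.org/search/?query=nuclear+fusion&searchtype=all
```

Encoding matters because a raw "&" or space in the query would otherwise be read as part of the URL structure.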

Research modes in local-deep-research v1.6.x

The LearningCircuit variant exposes four modes via the web UI and REST API:

  • Quick Summary (30 sec – 3 min): 1–2 search cycles, brief synthesis
  • Detailed Research (3–10 min): 3–7 cycles with gap analysis
  • Report Generation: Full markdown report with citations and source scoring
  • Document Analysis: Interrogate private PDFs and local files

Troubleshooting common issues

| Problem | Cause | Solution |
| --- | --- | --- |
| OOM / model crashes | Model too large for VRAM | Switch to gemma4:e4b or qwen3:8b |
| Ollama not reachable from Docker | Container networking | Use http://ollama:11434, not localhost |
| Port 5000 already in use | Flask default port conflict | Set -e LDR_WEB_PORT=8001 in Docker run |
| SQLCipher install fails (pip) | Missing system libraries | sudo apt install libsqlcipher0 libsqlcipher-dev |
| Slow generation (CPU-only) | No GPU available | Reduce MAX_WEB_RESEARCH_LOOPS to 3; or install CUDA |
| Citation errors / missing sources | Config flag not set | Set cite_sources: true in config.yaml |
| DuckDuckGo rate limiting | Too many requests | Switch to SearXNG self-hosted or add Brave API key |
| Python version mismatch | System Python < 3.12 | Ubuntu 24.04 ships 3.12; on older Ubuntu use pyenv |
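
Two of the problems above (an unreachable Ollama and a busy port 5000) come down to whether something is listening on a TCP port. A minimal sketch of that check, using only the standard library; port_in_use is an illustrative name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

port_in_use(5000) returning True before you start the web UI means another process already holds the port; port_in_use(11434) returning False means Ollama is not listening locally.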

What was removed from earlier guides and why

Earlier 2025 guides recommended building llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" for GPU acceleration. This is no longer needed — Ollama handles GPU acceleration natively and ships its own optimized llama.cpp backend. Do not run a separate llama-cpp-python build alongside Ollama; it will conflict.

The python -m local_deep_research.web.app launch command from pre-1.5 versions is deprecated. Use local-deep-research (the entry point installed by pip) or Docker Compose.

FAQ

Can I run local deep researcher without a GPU on Ubuntu?

Yes. Ollama runs on CPU when no GPU is detected. With a modern 12-core CPU and 32 GB RAM, Gemma 4 E4B generates 3–8 tokens/second — slow but functional for occasional research jobs. For daily use, a GPU is strongly recommended.

What is the difference between local-deep-researcher and local-deep-research?

langchain-ai/local-deep-researcher is the original LangChain team project: a minimal Python script using LangGraph for workflow management. LearningCircuit/local-deep-research is an independent community project that has grown substantially — it adds a full web UI, multi-user support, 20+ search sources, encrypted databases, and a REST API. Both use Ollama for local inference.

Which model should I use with Ollama for research tasks in 2026?

Gemma 4 E4B is the recommended starting point: 9.6 GB download, 128K context, native function-calling, and strong reasoning for its size class. If you have a 24 GB GPU, Gemma 4 26B MoE delivers meaningfully better accuracy. Qwen 3 8B is a solid alternative if you are already familiar with that model family.

Does local deep research work offline after setup?

The LLM inference runs fully offline once models are downloaded. The research workflow itself requires internet access for web search (DuckDuckGo, Brave, etc.). That includes a self-hosted SearXNG instance: it keeps your queries away from a commercial search API, but it is a metasearch engine and still forwards them to upstream search engines. Purely local document analysis (the Document Analysis mode in LearningCircuit's app) works offline.

How do I point local-deep-researcher at Gemma 4 instead of Gemma 3?

In your .env file (langchain-ai variant), set LOCAL_LLM=gemma4:e4b and ensure you have run ollama pull gemma4:e4b. In the LearningCircuit web UI, go to Settings → LLM Provider → Ollama and update the model name field. Restart the service after saving.
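
Before restarting, it is worth confirming that the model named in your config is actually present locally. The sketch below reads Ollama's /api/tags endpoint, which lists pulled models; the helper names are illustrative, and the sample payload in the comment is trimmed to the one field the code uses.

```python
import json
import urllib.request


def model_names(tags_payload: str) -> set[str]:
    """Extract model tags from a payload like {"models": [{"name": "gemma4:e4b"}]}."""
    return {entry["name"] for entry in json.loads(tags_payload).get("models", [])}


def is_pulled(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Ask a running Ollama server whether the given tag has been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model in model_names(resp.read().decode())
```

If is_pulled("gemma4:e4b") returns False, run ollama pull gemma4:e4b first; otherwise the first research request will fail or stall on a download.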

Can I connect local-deep-research to Claude Desktop?

Yes. Install with pip install "local-deep-research[mcp]" and configure Claude Desktop to connect to the local MCP server. This lets you trigger research jobs from within Claude Desktop while all processing stays local (the LLM inference hits your Ollama instance, not Anthropic's API).

How many research cycles should I configure?

The default of 3 cycles is a good starting point for quick answers. For thorough research, 5–7 cycles produce substantially better coverage with diminishing returns past 8. Each cycle adds 1–3 minutes of wall time on a GPU machine. Set MAX_WEB_RESEARCH_LOOPS=5 in your .env for a good balance.

Is SearXNG mandatory?

No. Both projects default to DuckDuckGo (no API key, no self-hosted instance needed). SearXNG self-hosting is recommended when you need complete search privacy, higher rate limits, or custom source weighting — but it is optional for getting started.
