Running Local Deep Researcher with Ollama on Ubuntu (2026 Guide)

Published 03 Apr 2025 • Updated 20 Jul 2026 • 10 min read

Quick answer. To run a local deep researcher on Ubuntu, install Ollama v0.22, pull Gemma 4 E4B, then choose langchain-ai/local-deep-researcher (lightweight, scriptable) or LearningCircuit/local-deep-research v1.6.6 (Docker Compose, web UI, 20+ search sources). Both run LLM inference entirely on your own machine on Ubuntu 24.04; web search goes through DuckDuckGo or a self-hosted SearXNG instance.

Last updated April 2026 — refreshed for current model/tool versions.

Running a local deep research assistant on Ubuntu gives you iterative, citation-backed reports without sending your prompts or documents to a cloud LLM provider. This guide covers two production-ready options: the lightweight langchain-ai/local-deep-researcher (Python script, minimal dependencies) and the feature-rich LearningCircuit/local-deep-research (v1.6.6, web UI, encrypted databases, 20+ search sources). Both are powered by Ollama and were tested on Ubuntu 24.04 LTS.

What changed in 2026 (read this first if you followed a 2025 guide)

Two distinct projects now dominate this space: the original langchain-ai/local-deep-researcher (lightweight, LangGraph-based) and the independent community project LearningCircuit/local-deep-research (v1.6.6 as of April 29, 2026, with a full web UI, REST API, and AES-256 encrypted databases). Know which one you are installing.
Recommended model changed: Gemma 3 12B → Gemma 4 E4B or 26B MoE. Google released Gemma 4 on April 2, 2026. The E4B (4.5B effective parameters, 9.6 GB download) outperforms the old 12B on most reasoning tasks and runs comfortably in 12 GB VRAM. The 26B MoE variant (18 GB download) is now the high-end local option.
Ollama v0.22.0 (April 28, 2026) is the current stable release. It adds model-batching support, Gemma 4 tool-calling, and MLX runner improvements for Apple Silicon. Linux GPU users are unaffected by the MLX changes but benefit from the batching patch.
Python 3.12 is the new minimum for local-deep-research (pip package). Ubuntu 24.04 ships Python 3.12 by default, so no workarounds are needed. The original langchain-ai repo still requires Python 3.11+.
Docker Compose is now the recommended install path for local-deep-research, pulling Ollama, SearXNG, and the application in a single command. Bare-metal pip install is still supported but requires SQLCipher system libraries.
DuckDuckGo is now the default search backend in both projects (no API key required). Self-hosting SearXNG is still recommended for high-volume or privacy-critical workloads.

TL;DR: Which project should you run?

Factor	langchain-ai/local-deep-researcher	LearningCircuit/local-deep-research
Install complexity	Low (git clone + pip)	Low (Docker Compose) / Medium (pip)
Web UI	LangGraph Studio (separate install)	Built-in at localhost:5000
Search sources	DuckDuckGo, SearXNG, Tavily, Perplexity	20+ (arXiv, PubMed, Wikipedia, Semantic Scholar, web, private docs)
Multi-user	No	Yes (RBAC, per-user encrypted DB)
SimpleQA benchmark	Not published	~95% (GPT-4.1-mini + SearXNG, focused-iteration)
Best for	Developers, scripting, integration	Power users, teams, research workflows

Prerequisites

OS: Ubuntu 24.04 LTS (Noble Numbat) — ships Python 3.12, tested configuration
RAM: 16 GB minimum; 32 GB recommended for 26B+ models
Storage: 25 GB free (the Gemma 4 models in Step 2 range from 9.6 GB to 20 GB)
GPU: NVIDIA GPU with 12 GB+ VRAM strongly recommended (RTX 3060 12 GB handles Gemma 4 E4B well); CPU-only is possible but expect 3–8 tokens/second
Software: Git, Python 3.12 (pre-installed on Ubuntu 24.04), Docker + Docker Compose (for the LearningCircuit variant)

Optional: NVIDIA CUDA setup (for GPU acceleration)

Ollama auto-detects your GPU on installation. If you want to verify CUDA is available:

nvidia-smi
# Should show your GPU and CUDA version (12.x or higher)
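
If you want the same check from a script (for example, to read the available VRAM before choosing a model), nvidia-smi can emit CSV. The following is a minimal Python sketch; the helper names parse_gpu_csv and query_gpus are illustrative and not part of any tool in this guide.

```python
import subprocess


def parse_gpu_csv(text: str) -> list[tuple[str, float]]:
    """Parse the output of
    `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`.

    Returns (gpu_name, vram_gb) pairs; nvidia-smi reports memory.total in MiB.
    """
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # rsplit keeps any commas inside the GPU name intact.
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), round(int(mem_mib) / 1024, 1)))
    return gpus


def query_gpus() -> list[tuple[str, float]]:
    """Run nvidia-smi and parse its output; returns [] if it is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(out)
```

An empty list from query_gpus() means the driver is not installed or no GPU was found, which is the case the next step addresses.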

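If nvidia-smi is missing or shows no GPU, install the driver. The cuda-toolkit package comes from NVIDIA's APT repository, so add that repository first:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y nvidia-driver-570 cuda-toolkit-12-8
# Reboot after driver install
sudo reboot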

Ollama handles the rest — you do not need to manually configure CUDA paths for inference.

Step 1 — Install Ollama (v0.22.0)

Ollama manages model downloads and versioning, and serves a local REST API at localhost:11434 (with OpenAI-compatible endpoints under /v1).

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
ollama --version
# Expected: ollama version 0.22.0 (or newer)

Verify the service is running:

curl http://localhost:11434/api/version
# Returns: {"version":"0.22.0"}
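
A setup script can make the same check and compare versions numerically, which avoids the string-comparison trap where "0.9.5" sorts after "0.22.0". This is a minimal sketch using only the standard library; the function names are illustrative, and the (0, 22, 0) default simply mirrors the version this guide targets.

```python
import json
import urllib.request


def parse_version(payload: str) -> tuple[int, ...]:
    """Turn '{"version":"0.22.0"}' into (0, 22, 0)."""
    version = json.loads(payload)["version"]
    # Drop any pre-release suffix such as "-rc1" before splitting.
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(payload: str, minimum: tuple[int, ...] = (0, 22, 0)) -> bool:
    """Compare versions as integer tuples, not as strings."""
    return parse_version(payload) >= minimum


def fetch_version(base_url: str = "http://localhost:11434") -> str:
    """Fetch the raw JSON body from Ollama's /api/version endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

With Ollama running, meets_minimum(fetch_version()) returns True on v0.22.0 or newer.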

Step 2 — Pull a language model

Gemma 4 is now the recommended model family for this workflow. Choose based on your VRAM:

Model	Pull command	Download size	Min VRAM	Use case
Gemma 4 E4B (4.5B eff.)	`ollama pull gemma4:e4b`	9.6 GB	10 GB	Laptops, 12 GB GPUs
Gemma 4 26B MoE	`ollama pull gemma4:26b`	18 GB	20 GB	Workstations, 24 GB GPUs
Gemma 4 31B Dense	`ollama pull gemma4:31b`	20 GB	24 GB	High-end workstations
Qwen 3 8B (alt.)	`ollama pull qwen3:8b`	~5 GB	8 GB	Low-VRAM alternative

For most Ubuntu workstations, start with Gemma 4 E4B:

ollama pull gemma4:e4b
# Or for higher-end machines:
ollama pull gemma4:26b

Gemma 4 supports native function-calling and a 128K–256K context window, which significantly improves the multi-hop research loops these tools rely on.
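
The selection rule in the table above can be written as a small lookup. This is a sketch only: the tags and minimum-VRAM figures are copied from the table, and pick_model is a hypothetical helper, not something Ollama provides.

```python
from typing import Optional

# (Ollama tag, minimum VRAM in GB) from the table above, largest model first.
MODELS = [
    ("gemma4:31b", 24),
    ("gemma4:26b", 20),
    ("gemma4:e4b", 10),
    ("qwen3:8b", 8),
]


def pick_model(vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose minimum VRAM fits, else None."""
    for tag, min_vram in MODELS:
        if vram_gb >= min_vram:
            return tag
    return None  # below 8 GB: fall back to CPU-only inference
```

For a 12 GB card this returns gemma4:e4b, matching the recommendation above. In practice you may prefer a smaller model than the largest one that fits, since the context window also consumes VRAM.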

Option A — langchain-ai/local-deep-researcher (lightweight)

This is the original LangGraph-based project. No database, no user accounts — just a Python process that loops through search, summarize, and gap-fill cycles.
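
To make the search, summarize, and gap-fill cycle concrete, here is a deliberately simplified Python sketch of that loop. It is not the project's actual code (which is built on LangGraph); the function name research and the three callables passed into it are placeholders you would back with a search client and an LLM call.

```python
from typing import Callable, Optional


def research(
    topic: str,
    search: Callable[[str], list[str]],
    summarize: Callable[[str, list[str]], str],
    find_gap: Callable[[str, str], Optional[str]],
    max_loops: int = 3,
) -> tuple[str, list[str]]:
    """Iteratively search, fold results into a summary, and query the gaps."""
    summary: str = ""
    sources: list[str] = []
    query = topic
    for _ in range(max_loops):
        results = search(query)                # 1. run a web search
        sources.extend(results)
        summary = summarize(summary, results)  # 2. update the running summary
        gap = find_gap(topic, summary)         # 3. ask what is still missing
        if gap is None:
            break                              # nothing left to fill
        query = gap                            # next cycle targets the gap
    return summary, sources
```

The max_loops argument plays the same role as the MAX_WEB_RESEARCH_LOOPS setting described under Configure: more cycles give the loop more chances to close gaps, at the cost of more search and generation time.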

Install

sudo apt update && sudo apt install -y python3 python3-pip python3-venv git

git clone https://github.com/langchain-ai/local-deep-researcher.git
cd local-deep-researcher

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configure

Copy the example environment file and edit it:

cp .env.example .env

Key variables in .env:

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
LOCAL_LLM=gemma4:e4b

# Search backend — DuckDuckGo requires no API key
SEARCH_API=duckduckgo

# Or self-hosted SearXNG (recommended for privacy):
# SEARCH_API=searxng
# SEARXNG_INSTANCE=http://localhost:8080

# Number of iterative research cycles (3–10)
MAX_WEB_RESEARCH_LOOPS=5
FETCH_FULL_PAGE=true
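
A typo in .env usually surfaces only after the first research run fails, so it can help to validate the file up front. The sketch below parses KEY=VALUE lines and checks the variables shown above; parse_env and validate are illustrative helpers, and the 3 to 10 range is taken from the comment in the example file rather than from the project's own validation.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the file looks sane."""
    problems = []
    for key in ("LLM_PROVIDER", "OLLAMA_BASE_URL", "LOCAL_LLM", "SEARCH_API"):
        if not env.get(key):
            problems.append(f"{key} is not set")
    if env.get("SEARCH_API") == "searxng" and not env.get("SEARXNG_INSTANCE"):
        problems.append("SEARCH_API=searxng needs SEARXNG_INSTANCE")
    loops = env.get("MAX_WEB_RESEARCH_LOOPS", "")
    if loops and not (loops.isdigit() and 3 <= int(loops) <= 10):
        problems.append("MAX_WEB_RESEARCH_LOOPS should be an integer from 3 to 10")
    return problems
```

Typical use is validate(parse_env(open(".env").read())), printing any problems before starting a run.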

Run

Run a single research job from the command line:

python -m local_deep_researcher.main --topic "Current state of nuclear fusion energy 2026" --cycles 5

Or start the LangGraph development server, which opens the LangGraph Studio web UI (requires the LangGraph CLI with its in-memory server extra):

pip install -U "langgraph-cli[inmem]"
langgraph dev
# Opens studio UI in your browser

Option B — LearningCircuit/local-deep-research (full-featured)

This independent community project has evolved significantly. Version 1.6.6 (April 29, 2026) adds journal quality scoring, a 212K+ indexed academic source database, and predatory journal auto-removal. It is the better choice for teams or serious research workflows. It's also what many r/LocalLLaMA practitioners reach for when they need academic source coverage alongside web results.

If you want a broader local AI agent setup alongside this tool, see the OpenClaw + Ollama setup guide for running local AI agents — it covers model routing and multi-agent orchestration that pairs well with local research tools.

Install via Docker Compose (recommended)

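# docker-compose-v2 is the Compose package in Ubuntu's own repositories;
# docker-compose-plugin is only available from Docker's official APT repo.
sudo apt update && sudo apt install -y docker.io docker-compose-v2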
sudo usermod -aG docker $USER
# Log out and back in, then:

git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
docker compose up -d

This starts three containers: the application server, Ollama (pre-configured to pull Gemma 4 E4B on first run), and SearXNG for private web search. The web UI is available at http://localhost:5000.

Enable GPU acceleration in Docker

# Install NVIDIA Container Toolkit first
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Run with GPU override:
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d

Install via pip (bare-metal)

Ubuntu 24.04 requires SQLCipher system libraries before the pip install:

sudo apt-get update
sudo apt-get install -y sqlcipher libsqlcipher0 libsqlcipher-dev

python3 -m venv .venv
source .venv/bin/activate
pip install local-deep-research
# Or with MCP server support for Claude Desktop:
pip install "local-deep-research[mcp]"

local-deep-research
# Opens at http://localhost:5000

Configure Ollama endpoint

When running via Docker Compose, reference Ollama by container hostname, not localhost:

OLLAMA_BASE_URL=http://ollama:11434
LOCAL_LLM=gemma4:e4b
SEARXNG_INSTANCE=http://searxng:8080

For bare-metal installs, use localhost:11434 as normal.
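
If the same script has to run both inside the Compose stack and on bare metal, the endpoint choice above can be automated. This is a minimal sketch: it assumes the Compose service is named ollama (as in the settings above) and relies on the /.dockerenv marker file that Docker creates inside containers. Both function names are illustrative.

```python
import os
from pathlib import Path


def default_ollama_url(in_docker: bool) -> str:
    """Container hostname inside Compose, localhost on bare metal."""
    return "http://ollama:11434" if in_docker else "http://localhost:11434"


def resolve_ollama_url(environ=os.environ) -> str:
    """Prefer an explicit OLLAMA_BASE_URL; otherwise guess from the runtime."""
    explicit = environ.get("OLLAMA_BASE_URL")
    if explicit:
        return explicit.rstrip("/")
    # Docker creates /.dockerenv inside containers it starts.
    return default_ollama_url(Path("/.dockerenv").exists())
```

An explicit OLLAMA_BASE_URL always wins, so the detection only acts as a fallback.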

Search backend options

Backend	Setup effort	Privacy	API key required	Best for
DuckDuckGo	None	Medium	No	Quick start
SearXNG (self-hosted)	Low (Docker)	High	No	Privacy-critical workloads
Brave Search	Low	High	Yes (free tier)	High-quality web results
Tavily	Low	Low (cloud)	Yes (paid)	Best raw search quality
arXiv / PubMed	None (built-in)	High	No	Academic research (LearningCircuit only)

Self-hosting SearXNG alongside local-deep-research is the recommended combination for data sovereignty. The Docker Compose stack in the LearningCircuit repo bundles SearXNG automatically.

Performance and benchmarks

Model inference speed on Ubuntu (single user)

Model	Hardware	Tokens/sec (decode)
Gemma 4 E4B (Q4_K_M)	RTX 4070 12 GB	55–75
Gemma 4 26B MoE (Q4_K_M)	RTX 4090 24 GB	35–50
Gemma 4 E4B (CPU only)	Ryzen 9 7950X, 64 GB RAM	3–8
Qwen 3 8B (Q4_K_M)	RTX 3060 12 GB	60–80

For a 5-cycle research job, expect roughly 3–8 minutes on a GPU and 20–45 minutes CPU-only at the above rates. On a GPU, the bottleneck is usually web search latency rather than the LLM; on CPU, token generation dominates.
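
A back-of-the-envelope model shows how decode speed and search latency combine into those job times. The inputs in this sketch are assumptions chosen for illustration (about 1,500 generated tokens and 30 seconds of search and page fetching per cycle), not measurements from either project, and the model ignores prompt-processing time.

```python
def estimate_job_seconds(
    cycles: int,
    tokens_per_cycle: float,
    tokens_per_sec: float,
    search_seconds_per_cycle: float,
) -> float:
    """Wall time = cycles * (generation time + search latency)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    generation = tokens_per_cycle / tokens_per_sec
    return cycles * (generation + search_seconds_per_cycle)


# Hypothetical inputs: 1,500 tokens and 30 s of search per cycle.
gpu = estimate_job_seconds(5, 1500, 60, 30)  # 275.0 s, about 4.6 minutes
cpu = estimate_job_seconds(5, 1500, 5, 30)   # 1650.0 s, about 27.5 minutes
```

Under these assumptions, search accounts for more than half of the GPU run (150 of 275 seconds) but under a tenth of the CPU run, which is why a faster search backend helps GPU users most.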

Accuracy benchmark

The LearningCircuit project publishes community benchmark results. As of April 2026, the focused-iteration strategy with GPT-4.1-mini + SearXNG achieves ~95% on the SimpleQA benchmark. For fully local runs (Gemma 4 E4B + SearXNG), community results cluster around 72–78% — significantly better than zero-shot prompting but below cloud-model baselines. Submit your own benchmarks at LearningCircuit/ldr-benchmarks.

Gemma 4 model benchmarks (Google, April 2026)

The flagship Gemma 4 31B achieves MMLU-Pro 85.2%, AIME 2026 89.2%, GPQA Diamond 84.3%, and LiveCodeBench v6 80.0%, ranking third among open models on the LMSys Arena leaderboard as of its release date. The E4B and 26B MoE variants trade some accuracy for substantially lower VRAM requirements.

How to choose: decision guide

Solo developer, scripting-first: Use langchain-ai/local-deep-researcher. Clone, configure, run. No database overhead.
Research team or multi-user setup: Use LearningCircuit/local-deep-research via Docker Compose. RBAC, encrypted per-user databases, and analytics dashboards are built in.
Academic research focus: Use local-deep-research — it integrates arXiv, PubMed, Semantic Scholar, and applies journal quality scoring to filter predatory journals.
Fully offline, maximum privacy: Either tool works; pair with a self-hosted SearXNG instance and avoid Tavily/Brave.
Low VRAM (8–12 GB): Use Gemma 4 E4B or Qwen 3 8B. Both support tool calling in Ollama v0.22+.
Need Claude Desktop integration: Install local-deep-research[mcp] — the MCP server connects directly to the Claude Desktop app.

Advanced configuration

Custom academic search engines (langchain-ai variant)

Edit config/search_engines.yaml to add or weight sources:

arxiv:
  name: arXiv
  url: https://arxiv.org/search/?query={query}&searchtype=all
  parser: academic
  weight: 0.9

semantic_scholar:
  name: Semantic Scholar
  url: https://api.semanticscholar.org/graph/v1/paper/search?query={query}
  parser: academic
  weight: 0.8
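
Each url entry above is a template with a {query} placeholder. How the tool expands it is not documented here, but a reasonable sketch is URL-encoding the query before substitution so that spaces and characters such as & do not break the request. build_search_url is a hypothetical helper, not part of either project.

```python
from urllib.parse import quote_plus


def build_search_url(template: str, query: str) -> str:
    """Substitute a URL-encoded query into a template containing {query}."""
    # str.replace (rather than str.format) leaves any other braces untouched.
    return template.replace("{query}", quote_plus(query))


url = build_search_url(
    "https://arxiv.org/search/?query={query}&searchtype=all",
    "nuclear fusion",
)
# url == "https://arxiv.org/search/?query=nuclear+fusion&searchtype=all"
```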

Research modes in local-deep-research v1.6.x

The LearningCircuit variant exposes four modes via the web UI and REST API:

Quick Summary (30 sec – 3 min): 1–2 search cycles, brief synthesis
Detailed Research (3–10 min): 3–7 cycles with gap analysis
Report Generation: Full markdown report with citations and source scoring
Document Analysis: Interrogate private PDFs and local files

Troubleshooting common issues

Problem	Cause	Solution
OOM / model crashes	Model too large for VRAM	Switch to `gemma4:e4b` or `qwen3:8b`
Ollama not reachable from Docker	Container networking	Use `http://ollama:11434`, not `localhost`
Port 5000 already in use	Flask default port conflict	Set `-e LDR_WEB_PORT=8001` in Docker run
SQLCipher install fails (pip)	Missing system libraries	`sudo apt install libsqlcipher0 libsqlcipher-dev`
Slow generation (CPU-only)	No GPU detected	Reduce `MAX_WEB_RESEARCH_LOOPS` to 3, or install the NVIDIA driver if a GPU is present
Citation errors / missing sources	Config flag not set	Set `cite_sources: true` in config.yaml
DuckDuckGo rate limiting	Too many requests	Switch to SearXNG self-hosted or add Brave API key
Python version mismatch	System Python < 3.12	Ubuntu 24.04 ships 3.12; on older Ubuntu use `pyenv`
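
Two rows in the table (an unreachable Ollama and a port 5000 conflict) come down to whether something is listening on a given port. A minimal Python sketch for checking that before starting the stack follows; port_in_use is an illustrative helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Before launching, port_in_use(5000) should be False (the web UI port is free) and port_in_use(11434) should be True (Ollama is listening).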

What was removed from earlier guides and why

Earlier 2025 guides recommended building llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" for GPU acceleration. This is no longer needed: Ollama handles GPU acceleration natively and ships its own optimized llama.cpp backend. Do not run a separate llama-cpp-python build alongside Ollama, because the two will compete for the same GPU memory.

The python -m local_deep_research.web.app launch command from pre-1.5 versions is deprecated. Use local-deep-research (the entry point installed by pip) or Docker Compose.

FAQ

Can I run local deep researcher without a GPU on Ubuntu?

Yes. Ollama runs on CPU when no GPU is detected. With a modern 12-core CPU and 32 GB RAM, Gemma 4 E4B generates 3–8 tokens/second — slow but functional for occasional research jobs. For daily use, a GPU is strongly recommended.

What is the difference between local-deep-researcher and local-deep-research?

langchain-ai/local-deep-researcher is the original LangChain team project: a minimal Python script using LangGraph for workflow management. LearningCircuit/local-deep-research is an independent community project that has grown substantially — it adds a full web UI, multi-user support, 20+ search sources, encrypted databases, and a REST API. Both use Ollama for local inference.

Which model should I use with Ollama for research tasks in 2026?

Gemma 4 E4B is the recommended starting point: 9.6 GB download, 128K context, native function-calling, and strong reasoning for its size class. If you have a 24 GB GPU, Gemma 4 26B MoE delivers meaningfully better accuracy. Qwen 3 8B is a solid alternative if you are already familiar with that model family.

Does local deep research work offline after setup?

The LLM inference runs fully offline once models are downloaded. The research workflow itself requires internet access for web search (DuckDuckGo, Brave, etc.) unless you configure it to query only local documents or a self-hosted SearXNG instance that is itself air-gapped. Purely local document analysis (the Document Analysis mode in LearningCircuit's app) works offline.

How do I point local-deep-researcher at Gemma 4 instead of Gemma 3?

In your .env file (langchain-ai variant), set LOCAL_LLM=gemma4:e4b and ensure you have run ollama pull gemma4:e4b. In the LearningCircuit web UI, go to Settings → LLM Provider → Ollama and update the model name field. Restart the service after saving.
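
To confirm the model is actually present before switching, you can inspect the JSON that Ollama's /api/tags endpoint returns (for example via curl http://localhost:11434/api/tags). The sketch below parses that payload; installed_models and has_model are illustrative names.

```python
import json


def installed_models(tags_payload: str) -> set[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    data = json.loads(tags_payload)
    return {model["name"] for model in data.get("models", [])}


def has_model(tags_payload: str, tag: str) -> bool:
    """Check for a model, treating a bare name as '<name>:latest'."""
    wanted = tag if ":" in tag else f"{tag}:latest"
    return wanted in installed_models(tags_payload)
```

If has_model(payload, "gemma4:e4b") is False, run ollama pull gemma4:e4b before changing LOCAL_LLM.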

Can I connect local-deep-research to Claude Desktop?

Yes. Install with pip install "local-deep-research[mcp]" and configure Claude Desktop to connect to the local MCP server. This lets you trigger research jobs from within Claude Desktop while the research pipeline stays local: its LLM inference hits your Ollama instance, not Anthropic's API. The Claude Desktop conversation itself still goes through Anthropic.

How many research cycles should I configure?

The default of 3 cycles is a good starting point for quick answers. For thorough research, 5–7 cycles produce substantially better coverage with diminishing returns past 8. Each cycle adds 1–3 minutes of wall time on a GPU machine. Set MAX_WEB_RESEARCH_LOOPS=5 in your .env for a good balance.

Is SearXNG mandatory?

No. Both projects default to DuckDuckGo (no API key, no self-hosted instance needed). SearXNG self-hosting is recommended when you need complete search privacy, higher rate limits, or custom source weighting — but it is optional for getting started.