Running Ollama VIC-20 on Ubuntu: 2026 Comprehensive Guide
Last updated April 2026 — refreshed for current Ollama versions, Ubuntu 24.04, and 2026 model ecosystem.
Ollama VIC-20 is a sub-20 KB, zero-install JavaScript frontend for chatting with locally hosted large language models. Paired with Ollama v0.22 — the inference layer that now ships with native GPU support, image generation, and coding-agent integrations — it gives you a private, fast, no-cloud AI stack on any Ubuntu 22.04 or 24.04 machine in under ten minutes.
This guide walks through every step: installing Ollama on Ubuntu, enabling GPU acceleration, pulling current 2026-era models (Llama 4, Gemma 4, Qwen 3), and wiring up the VIC-20 frontend. It also covers the broader frontend landscape so you can decide whether VIC-20 is the right tool or whether you need something like Open WebUI or the OpenClaw + Ollama setup guide for running local AI agents instead.
What changed in 2026: key updates for returning readers
- Ollama v0.22.0 (April 28, 2026) added support for NVIDIA Nemotron 3 Omni, Poolside Laguna XS.2, and a fix for a desktop-app session-killing regression. Minimum NVIDIA driver: 531; minimum ROCm for AMD: v7.
- New models dominate the library. Llama 4 Scout (17B active params), Gemma 4 (26B MoE activating ~4B per token), and Qwen 3 replace the Llama 3 and Gemma 3 recommendations in the original post. Pull commands have changed accordingly.
- Ubuntu 24.04 LTS is now the recommended OS. Ubuntu 22.04 still works, but 24.04 ships the kernel 6.x series with better NVIDIA and AMD driver integration out of the box. Ubuntu 20.04 has reached end-of-standard-support and should not be used for new AI workloads.
- ollama launch is new. A single command bootstraps coding agents (Claude Code, GitHub Copilot CLI, VS Code Copilot) against your local models. This didn't exist in 2025.
- CORS configuration for VIC-20 now requires an explicit environment variable. Ollama no longer whitelists file:// origins by default; you must set OLLAMA_ORIGINS when opening index.html directly from disk.
- Image generation landed on Linux. Set OLLAMA_ENABLE_IMAGE_GENERATION=1 and pull a diffusion model; Ollama manages it like any other model.
TL;DR: Ollama + VIC-20 on Ubuntu
| Question | Answer (April 2026) |
|---|---|
| Ollama version | v0.22.0 (released April 28, 2026) |
| Recommended Ubuntu version | 24.04 LTS (22.04 still supported) |
| Minimum RAM (CPU-only, 7B model) | 16 GB system RAM |
| GPU VRAM for 7B models | 8 GB VRAM (RTX 3060 or better) |
| VIC-20 frontend size | <20 KB, no install, single HTML file |
| Best model for 8 GB VRAM | Gemma 4 E4B (Llama 4 Scout needs ~10 GB, so it suits 12 GB cards) |
| Install time (on 100 Mbps internet) | ~5 min Ollama + ~5–20 min model pull |
| CORS needed for VIC-20? | Yes — set OLLAMA_ORIGINS (see below) |
Prerequisites
- Ubuntu 22.04 LTS or 24.04 LTS (fresh or existing install; server or desktop)
- sudo privileges
- 16 GB RAM minimum for CPU-only inference on 7B models; 8 GB RAM can run 3B models
- NVIDIA GPU (optional but recommended): driver 531 or newer, compute capability 5.0+. Check with nvidia-smi.
- AMD GPU (optional): ROCm v7 installed. AMD acceleration is Linux-only in 2026; Windows support is on the Ollama roadmap.
- Internet connection for the initial install and model pulls
- A modern web browser (Firefox, Chrome, Chromium) for the VIC-20 frontend
Step 1: Install Ollama on Ubuntu
The canonical install path is the official shell script, which detects your architecture (x86_64 or ARM64), installs the ollama binary under /usr/local/bin, and registers a systemd service automatically.
1.1 Update system packages
sudo apt update && sudo apt upgrade -y
1.2 Run the Ollama install script
curl -fsSL https://ollama.com/install.sh | sh
The script installs Ollama, enables the systemd unit ollama.service, and starts the service. On Ubuntu 24.04, it also detects NVIDIA and AMD GPUs and installs the appropriate acceleration libraries if your drivers are already present.
1.3 Verify the installation
ollama --version
# Expected: ollama version 0.22.0 (or newer)
systemctl status ollama
# Should show: Active: active (running)
If the service is not running:
sudo systemctl enable --now ollama
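For scripted setups, the version check in 1.3 can be automated. The following is a minimal sketch, assuming GNU sort (whose -V option orders dotted version strings, standard on Ubuntu). The helper name version_ge is hypothetical and not part of Ollama.

```shell
# version_ge A B: succeeds (exit 0) when version A is at least version B.
# Relies on GNU sort -V, which orders dotted version strings numerically.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Only query the binary when it is actually installed.
if command -v ollama >/dev/null 2>&1; then
  # Extract the first x.y.z token from the version line.
  installed=$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
  if version_ge "$installed" "0.22.0"; then
    echo "Ollama $installed is new enough"
  else
    echo "Ollama $installed is older than 0.22.0"
  fi
fi
```

A plain string comparison would rank 0.9.0 above 0.22.0, which is why the sketch leans on version-aware sorting instead.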
1.4 Confirm GPU detection
ollama ps
# With a model loaded, output shows "100% GPU" if acceleration is active.
# Without a loaded model, just check the Ollama logs:
journalctl -u ollama --since "5 minutes ago" | grep -i gpu
Step 2: GPU Acceleration Setup
GPU acceleration reduces inference time from 20–60 seconds per response (CPU) to 1–5 seconds on 7B–8B models. If you are running CPU-only, skip to Step 3.
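To see where those ranges come from, divide the length of a reply by the generation speed. The sketch below assumes a 240-token reply, which is an illustrative figure rather than a measurement, and the helper name response_seconds is hypothetical.

```shell
# response_seconds TOKENS TOKENS_PER_SEC: prints the generation time in seconds.
response_seconds() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'
}

response_seconds 240 4    # CPU-only at 4 t/s  -> prints 60.0
response_seconds 240 80   # GPU at 80 t/s      -> prints 3.0
```

The estimate covers generation only; prompt processing (prefill) adds to the total on long inputs.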
2.1 NVIDIA GPUs
Ollama requires NVIDIA driver version 531 or newer and compute capability 5.0+ (GeForce 900-series and newer). To install drivers on Ubuntu 24.04:
sudo ubuntu-drivers autoinstall
sudo reboot
# After reboot:
nvidia-smi
# Confirm driver version >= 531
Once the correct drivers are installed, re-run curl -fsSL https://ollama.com/install.sh | sh if you installed Ollama before the drivers. The script will detect CUDA and configure the GPU backend. Alternatively, just restart the Ollama service:
sudo systemctl restart ollama
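The driver requirement can also be checked from a script. This is a sketch under the assumption that comparing the major driver number is enough; the helper name driver_ok is hypothetical, while --query-gpu=driver_version and --format=csv,noheader are standard nvidia-smi options.

```shell
# driver_ok VERSION [MINIMUM]: succeeds when the major driver number meets
# the minimum (531 by default, the figure quoted in this guide).
driver_ok() {
  major=${1%%.*}
  [ "$major" -ge "${2:-531}" ]
}

# Only query the GPU when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if driver_ok "$version"; then
    echo "Driver $version is new enough"
  else
    echo "Driver $version is too old; upgrade before restarting Ollama"
  fi
fi
```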
Expected throughput benchmarks (as of April 2026, community-measured):
| GPU | VRAM | Model | Tokens/sec (generation) |
|---|---|---|---|
| RTX 3060 | 12 GB | Llama 4 Scout 17B (Q4) | ~60–80 t/s |
| RTX 4060 | 8 GB | Gemma 4 E4B | ~40–50 t/s |
| RTX 4090 | 24 GB | Qwen 3 14B (Q4) | ~120–150 t/s |
| AMD Radeon RX 760M (iGPU) | shared | Gemma 4 26B (Q4_K_M) | ~21 t/s generation, ~239 t/s prefill |
| CPU-only (16-core, 32 GB RAM) | — | 7B model (Q4) | 3–8 t/s |
2.2 AMD GPUs
Ollama uses the ROCm v7 library for AMD acceleration on Linux. Install ROCm using the AMD ROCm quick-start guide, then restart the Ollama service. Supported cards include RX 5000-series and newer (RX 9000-series, Radeon PRO, Instinct MI). ROCm is Linux-only as of April 2026.
2.3 Intel and other GPUs (Vulkan)
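Experimental Vulkan support is available for Intel and other GPUs on Linux. The systemd service does not inherit variables exported in your shell, so set the variable in the service override rather than with export:
sudo systemctl edit ollama.service
# In the override file, add under [Service]:
# Environment="OLLAMA_VULKAN=1"
sudo systemctl restart ollama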
Step 3: Pull Current 2026-Era Models
Ollama's library hosts 100+ models. The Llama 3 recommendation from the original 2025 post is superseded; here are the current best choices for different hardware configurations.
3.1 Model selection guide
| Hardware | Recommended model | Pull command | VRAM / RAM needed |
|---|---|---|---|
| 8 GB VRAM (RTX 4060) | Gemma 4 E4B | ollama pull gemma4:4b | ~5 GB VRAM |
| 12 GB VRAM (RTX 3060) | Llama 4 Scout | ollama pull llama4:scout | ~10 GB VRAM |
| 24 GB VRAM (RTX 3090/4090) | Qwen 3 14B or Gemma 4 26B | ollama pull qwen3:14b | ~10–16 GB VRAM |
| 16 GB RAM (CPU-only) | Gemma 4 E4B or Qwen 3 4B | ollama pull gemma4:4b | ~5 GB RAM |
| 32 GB RAM (CPU-only) | Llama 4 Scout (Q4) | ollama pull llama4:scout | ~12 GB RAM |
3.2 Pull and run a model
# Pull Llama 4 Scout (best overall for 12 GB VRAM in 2026)
ollama pull llama4:scout
# Pull Gemma 4 E4B (best for 8 GB VRAM; ~40–50 t/s on an RTX 4060)
ollama pull gemma4:4b
# Pull Qwen 3 8B (best for coding tasks)
ollama pull qwen3:8b
# Test any model immediately in the terminal:
ollama run llama4:scout "Explain how transformers work in two sentences."
# List downloaded models:
ollama list
3.3 Quantization and memory trade-offs
By default, Ollama pulls Q4_K_M quantization. Moving from Q8 to Q4 cuts VRAM usage by 40–50% with a perplexity increase of roughly 1–3% — acceptable for most chat and coding tasks. Specify a different quantization with a tag, e.g. ollama pull qwen3:8b-q8_0.
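As a rough worked example of that trade-off, weight memory is approximately the parameter count times the bits per weight, divided by eight. The sketch below assumes about 8.5 bits per weight for Q8_0 and about 4.8 for Q4_K_M. Both figures are approximations, and the estimate ignores the KV cache and runtime overhead. The helper name weights_gb is hypothetical.

```shell
# weights_gb PARAMS_BILLION BITS_PER_WEIGHT: approximate size of the weights in GB.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 8 8.5   # 8B model at roughly Q8_0    -> prints 8.5
weights_gb 8 4.8   # 8B model at roughly Q4_K_M  -> prints 4.8
```

Under these assumptions the Q4 weights are roughly 44% smaller, which is in line with the 40–50% range quoted above.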
Step 4: Configure Ollama for CORS (Required for VIC-20)
Ollama binds to http://localhost:11434 by default and allows cross-origin requests from 127.0.0.1 and 0.0.0.0. When the VIC-20 index.html is opened as a file:// URL in a browser, the browser sends requests with a null origin, which Ollama blocks by default. You must explicitly allow it.
4.1 Configure CORS via systemd (persistent)
sudo systemctl edit ollama.service
Add the following under the [Service] section in the override file:
[Service]
Environment="OLLAMA_ORIGINS=*"
Save, then reload:
sudo systemctl daemon-reload
sudo systemctl restart ollama
For tighter security, replace * with null (file:// origins) or your specific browser extension ID.
4.2 Configure CORS temporarily (for testing)
OLLAMA_ORIGINS="*" ollama serve
Note: if the systemd service is already running, it already holds port 11434, so this second process will fail to bind. Stop the service first with sudo systemctl stop ollama.
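One way to confirm the setting took effect without opening a browser is to send a request with an Origin header and look for a CORS header in the response. The sketch below assumes that Ollama answers an allowed origin with an Access-Control-Allow-Origin header and omits it otherwise. The helper name allows_origin is hypothetical.

```shell
# allows_origin: reads raw HTTP response headers on stdin and succeeds when
# an Access-Control-Allow-Origin header is present (case-insensitive match).
allows_origin() {
  grep -qi '^access-control-allow-origin:'
}

# Example against a running server (left commented so the block has no side effects):
# curl -s -i -H "Origin: null" http://localhost:11434/api/tags | allows_origin \
#   && echo "CORS allows file:// pages" || echo "CORS still blocked"
```

The request sends Origin: null because that is what a browser sends for a page opened from a file:// URL.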
Step 5: Install and Run the VIC-20 Frontend
The VIC-20 frontend is a single-page HTML/JavaScript application maintained by shokuninstudio. The entire application weighs under 20 KB. There is no build step, no package manager, and no server process — you open a file in a browser.
5.1 Clone the repository
git clone https://github.com/shokuninstudio/Ollama-VIC-20.git
cd Ollama-VIC-20
5.2 Open the frontend
# Open index.html in your default browser:
xdg-open index.html
# Or specify a browser:
firefox index.html
chromium-browser index.html
If Ollama is running and CORS is configured (Step 4), the VIC-20 UI will load and immediately query http://localhost:11434/api/tags to populate the model dropdown.
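To preview what the dropdown should contain, you can pull the model names out of the same endpoint from a terminal. This is a crude sketch that matches "name" fields as text. It assumes the response has the shape {"models":[{"name": ...}]}, and the helper name model_names is hypothetical. A real JSON tool such as jq would be more robust.

```shell
# model_names: reads an /api/tags JSON document on stdin and prints one model
# name per line. This is a crude text match, not a real JSON parser.
model_names() {
  grep -oE '"name"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/^"name"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Sample payload shaped like the response; the values are made up.
sample='{"models":[{"name":"llama4:scout","size":1},{"name":"gemma4:4b","size":2}]}'
printf '%s\n' "$sample" | model_names
```

Against a live server, the equivalent call would be curl -s http://localhost:11434/api/tags | model_names.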
5.3 Select a model and start chatting
- In the VIC-20 interface, select your downloaded model from the dropdown menu.
- Type your message in the input field and press Enter or click Send.
- To save a conversation, click the save button; a single click downloads the conversation as a Markdown file.
5.4 VIC-20 vs. other frontends: how to choose
| Frontend | Size / install | Features | Best for |
|---|---|---|---|
| VIC-20 | <20 KB, no install | Chat, save-as-Markdown | Minimalists, air-gapped machines, privacy-first users |
| Ollamadore-64 | <64 KB, no install | Chat, slightly richer UI | Same use case as VIC-20, slightly more polish |
| Open WebUI | Docker, ~500 MB image | Multi-user, RAG, image upload, tools | Teams, power users, RAG pipelines |
| Hollama | Node.js app | Web UI, customizable | Developers who want to self-host a web interface |
| LM Studio | Desktop app | GUI model manager, chat | Non-technical users on desktop |
For teams shipping real AI workflows (RAG pipelines, coding agents, and multi-model routing), see the OpenClaw + Ollama setup guide for running local AI agents, which covers the full agentic stack.
Step 6: Using Ollama Launch for Coding Agents (New in 2026)
Ollama v0.22 ships the ollama launch command, which bootstraps coding agents against your local models in one step. This is entirely new since the original post was published.
6.1 Launch Claude Code locally
# Launch Claude Code backed by a local Qwen 3.5 model:
ollama launch claude --model qwen3.5
# Or with Kimi-K2.5 (cloud API via Ollama proxy):
ollama launch claude --model kimi-k2.5:cloud
Claude Code requires a minimum 64k-token context window. Verify your model supports it before using it for large codebases.
6.2 Launch GitHub Copilot CLI locally
ollama launch copilot --model llama4:scout --yes -- -p "how does this repository work?"
6.3 Launch VS Code Copilot agent mode
ollama launch vscode
This wires your local Ollama instance into VS Code's Copilot Chat extension. The local model can run terminal commands, edit files, and fix its own mistakes — with no cloud dependency.
Troubleshooting Common Issues
Ollama service not starting
journalctl -u ollama -n 50
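Common causes: port 11434 is already in use, or CUDA libraries are missing after a driver upgrade. Find the conflicting process with sudo lsof -i :11434 and stop it, or move Ollama to another port by setting OLLAMA_HOST=127.0.0.1:11435 in the systemd override. Binding to 0.0.0.0 instead would expose the API on every network interface. Note that VIC-20 queries http://localhost:11434 (Step 5), so freeing the default port is the simpler fix.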
GPU not detected
nvidia-smi # Must show your GPU
ollama ps # Shows "CPU" if GPU not in use
If nvidia-smi works but Ollama still uses the CPU, the likely causes are a driver version below 531 or an Ollama install that predates the GPU drivers. Run the install script again after driver installation, or set CUDA_VISIBLE_DEVICES=0 in the systemd override.
VIC-20 shows no models / blank dropdown
- Confirm Ollama is running: curl http://localhost:11434/api/tags should return JSON.
- Check CORS: curl is not subject to CORS, so if curl works but the browser console shows a blocked request, re-verify that OLLAMA_ORIGINS=* is set in the systemd service and that the service has been restarted.
- Confirm at least one model is pulled: ollama list.
Model download is slow or stalls
Ollama downloads models in parallel chunks. If a download stalls, kill the process and re-run ollama pull <model> — it resumes from where it left off. Ensure you have at least 2× the model size in free disk space (the model is stored in compressed GGUF format under /usr/share/ollama/.ollama/models).
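The 2× rule of thumb is easy to check before a large pull. The sketch below assumes a 10 GB model as an example and uses df -Pk for portable output. The helper name enough_space is hypothetical.

```shell
# enough_space FREE_GB MODEL_GB: succeeds when free space is at least twice
# the model size, the rule of thumb given above.
enough_space() {
  awk -v free="$1" -v model="$2" 'BEGIN { exit !(free >= 2 * model) }'
}

# Free space (whole GB) on the filesystem that holds the model directory.
# Falls back to / when the default model path does not exist yet.
dir=/usr/share/ollama/.ollama/models
[ -d "$dir" ] || dir=/
free_gb=$(df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024 / 1024) }')

if enough_space "$free_gb" 10; then
  echo "Enough room for a 10 GB model"
else
  echo "Less than 20 GB free; clear space before pulling"
fi
```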
Changing model storage location
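Models default to /usr/share/ollama/.ollama/models on Linux. To redirect to a larger disk, make sure the ollama user can read and write the new directory (for example, sudo chown -R ollama:ollama /mnt/bigdisk/ollama-models), then add to the systemd override: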
[Service]
Environment="OLLAMA_MODELS=/mnt/bigdisk/ollama-models"
Model keeps getting unloaded
By default, Ollama unloads models from memory after 5 minutes of inactivity. Increase the keep-alive duration:
[Service]
Environment="OLLAMA_KEEP_ALIVE=30m"
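The CORS, storage, and keep-alive settings all live in the same override file. As a sketch, a helper like the hypothetical render_override below prints a combined [Service] block from NAME=value pairs. It only writes to standard output, so you can review the result before pasting it into sudo systemctl edit ollama.service.

```shell
# render_override NAME=value...: prints a systemd drop-in body with one
# Environment= line per argument.
render_override() {
  printf '[Service]\n'
  for pair in "$@"; do
    printf 'Environment="%s"\n' "$pair"
  done
}

render_override \
  "OLLAMA_ORIGINS=*" \
  "OLLAMA_MODELS=/mnt/bigdisk/ollama-models" \
  "OLLAMA_KEEP_ALIVE=30m"
```

Keeping every variable under a single [Service] header avoids accidentally splitting settings across several overrides.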
Performance Benchmarks (April 2026)
The following are real-world community-measured figures, not synthetic projections. Sources are linked in the References section.
| Configuration | Model | Generation speed | Source |
|---|---|---|---|
| AMD Ryzen AI MAX+ 128 GB RAM | Gemma 4 26B MoE | ~85 t/s | Google / community benchmarks |
| AMD Radeon 760M iGPU (Vulkan), Ubuntu 24.04 | Gemma 4 26B Q4_K_M | ~21 t/s gen, ~239 t/s prefill | DEV Community |
| Ubuntu 24.04 VM, 4 vCPU, 16 GB RAM (CPU-only) | Gemma 4 E4B | Not measured (~4.2 GB RAM used; usable for testing) | Community VRAM/RAM reports |
| r/LocalLLaMA community, various hardware | Llama 3.1 8B (for reference) | ~55 t/s GPU average | r/LocalLLaMA |
Quantization impact: Moving from Q8 to Q4_K_M reduces VRAM by 40–50% with a 1–3% perplexity increase — generally acceptable for chat and code tasks.
What Was Removed from the Original Guide — and Why
The original 2025 guide recommended pulling llama3 (now superseded by Llama 4 Scout) and used an outdated Ubuntu 22.04-only framing. The manual systemd file instructions have been replaced with the standard systemctl edit ollama.service override pattern, which is safer and survives package upgrades. The original post also did not mention CORS — a common failure point when opening VIC-20 from a file:// URL — or GPU setup, which is now the primary reason users install Ollama at all.
Decision Tree: Is This the Right Setup for You?
- You want the simplest possible local AI chat with zero install beyond Ollama → VIC-20 is correct. Follow this guide.
- You want multi-user access, RAG, or document upload → Use Open WebUI (docker run -d -p 3000:8080 ghcr.io/open-webui/open-webui:main).
- You want to run coding agents (Claude Code, Copilot) on local models → Use ollama launch (Step 6 of this guide) or the OpenClaw setup guide.
- You want a polished desktop app with a GUI model manager → Use LM Studio (Windows/macOS) or GPT4All.
- You need maximum inference throughput for production workloads → Use vLLM or llama.cpp directly; Ollama's abstraction adds latency under concurrent load.
FAQ
Which Ubuntu version should I use for Ollama in 2026?
Ubuntu 24.04 LTS is recommended. It ships with kernel 6.8, which has better NVIDIA and AMD driver integration than 22.04. Ubuntu 22.04 still works and Ollama officially supports it, but new installs should use 24.04. Ubuntu 20.04 is end-of-standard-support and should not be used for new AI workloads.
Can I run Ollama VIC-20 without a GPU?
Yes. Ollama runs on CPU only. With 16 GB RAM you can run 7B–8B models at 3–8 tokens/second — slow but functional. For daily use, a GPU is strongly recommended. The Gemma 4 E4B model is particularly efficient on CPU, fitting in ~4.2 GB RAM.
Which model should I start with in 2026?
On an 8 GB VRAM GPU: ollama pull gemma4:4b. On a 12 GB VRAM GPU: ollama pull llama4:scout. On CPU-only with 16 GB RAM: ollama pull gemma4:4b. These replace the 2025 recommendation of llama3, which is now outdated.
Does Ollama send my data anywhere?
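No, as long as you use locally pulled models. When you run ollama serve locally, all inference happens on your machine. The only external network requests are the model downloads from ollama.com/library during ollama pull; after that, the model runs fully offline. The exception is a model with a :cloud tag (such as the kimi-k2.5:cloud example in Step 6), which by design forwards prompts to a hosted API. VIC-20 also has no telemetry or external dependencies.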
VIC-20 shows a blank page or no models. What do I do?
First, confirm Ollama is running: curl http://localhost:11434/api/tags. If that returns JSON but VIC-20 is blank, the issue is CORS. Open the browser console (F12 → Console), look for a CORS error, and follow Step 4 to set OLLAMA_ORIGINS=*. Also confirm you have at least one model downloaded with ollama list.
Can I run multiple models at the same time?
Yes, if your system has enough VRAM or RAM. Ollama loads models into memory on demand. Set OLLAMA_MAX_LOADED_MODELS=2 in your systemd override to allow two models simultaneously. With a single 12 GB VRAM GPU, two 7B models will likely exceed VRAM and fall back to CPU.
Is there a larger VIC-20 alternative from the same developer?
Yes. The same author (shokuninstudio) maintains Ollamadore 64 — a sub-64 KB frontend with a slightly richer interface. The install and usage pattern is identical. Clone from https://github.com/shokuninstudio/Ollamadore-64.
Can I use Ollama as a backend for AI coding tools in 2026?
Yes — this is one of the biggest 2026 additions. The ollama launch command supports Claude Code, GitHub Copilot CLI, and VS Code Copilot agent mode. Models that work well for coding include Qwen 3 8B and Llama 4 Scout. See Step 6 of this guide and the OpenClaw + Ollama setup guide for the full agentic workflow.
References & Further Reading
- Ollama GitHub Releases — official release notes and changelogs
- Ollama GPU Documentation — supported hardware, VRAM requirements, environment variables
- Ollama FAQ — CORS configuration, model storage, concurrency settings
- Ollama VIC-20 GitHub Repository — source code and usage notes
- Ollama Model Library — full index of available models
- Gemma 4 on AMD Ryzen mini PC: real-world benchmark (DEV Community, 2026)
- AMD ROCm Quick Start Guide — required for AMD GPU acceleration
- Ollama Benchmark Tool (GitHub) — throughput testing for local LLMs