Ollama VIC-20

Running Ollama VIC-20 on Ubuntu: 2026 Comprehensive Guide

Published 03 Mar 2025 • Updated 31 May 2026 • 12 min read

Last updated April 2026 — refreshed for current Ollama versions, Ubuntu 24.04, and 2026 model ecosystem.

Ollama VIC-20 is a sub-20 KB, zero-install JavaScript frontend for chatting with locally hosted large language models. Paired with Ollama v0.22 — the inference layer that now ships with native GPU support, image generation, and coding-agent integrations — it gives you a private, fast, no-cloud AI stack on any Ubuntu 22.04 or 24.04 machine in under ten minutes.

This guide walks through every step: installing Ollama on Ubuntu, enabling GPU acceleration, pulling current 2026-era models (Llama 4, Gemma 4, Qwen 3), and wiring up the VIC-20 frontend. It also covers the broader frontend landscape so you can decide whether VIC-20 is the right tool or whether you need something like Open WebUI or the OpenClaw + Ollama setup guide for running local AI agents instead.

What changed in 2026 — key updates for returning readersOllama v0.22.0 (April 28, 2026) added support for NVIDIA Nemotron 3 Omni, Poolside Laguna XS.2, and a fix for a desktop-app session-killing regression. Minimum NVIDIA driver: 531; minimum ROCm for AMD: v7.New models dominate the library. Llama 4 Scout (17B active params), Gemma 4 (26B MoE activating ~4B per token), and Qwen 3 replace the Llama 3 and Gemma 3 recommendations in the original post. Pull commands have changed accordingly.Ubuntu 24.04 LTS is now the recommended OS. Ubuntu 22.04 still works, but 24.04 ships the kernel 6.x series with better NVIDIA and AMD driver integration out of the box. Ubuntu 20.04 has reached end-of-standard-support and should not be used for new AI workloads.ollama launch is new. A single command bootstraps coding agents (Claude Code, GitHub Copilot CLI, VS Code Copilot) against your local models. This didn't exist in 2025.CORS configuration for VIC-20 now requires an explicit environment variable. Ollama no longer whitelists file:// origins by default; you must set OLLAMA_ORIGINS when opening index.html directly from disk.Image generation landed on Linux. Set OLLAMA_ENABLE_IMAGE_GENERATION=1 and pull a diffusion model — Ollama manages it like any other model.

Want the full picture? Read our continuously-updated Llama 4 Complete Guide (2026) — Scout and Maverick variants, MoE architecture, and deployment patterns.

TL;DR: Ollama + VIC-20 on Ubuntu

Question	Answer (April 2026)
Ollama version	v0.22.0 (released April 28, 2026)
Recommended Ubuntu version	24.04 LTS (22.04 still supported)
Minimum RAM (CPU-only, 7B model)	16 GB system RAM
GPU VRAM for 7B models	8 GB VRAM (RTX 3060 or better)
VIC-20 frontend size	<20 KB, no install, single HTML file
Best model for 8 GB VRAM	Llama 4 Scout or Gemma 4 (26B MoE)
Install time (on 100 Mbps internet)	~5 min Ollama + ~5–20 min model pull
CORS needed for VIC-20?	Yes — set `OLLAMA_ORIGINS` (see below)

Prerequisites

Ubuntu 22.04 LTS or 24.04 LTS (fresh or existing install; server or desktop)
sudo privileges
16 GB RAM minimum for CPU-only inference on 7B models; 8 GB RAM can run 3B models
NVIDIA GPU (optional but recommended): driver 531 or newer, compute capability 5.0+. Check with nvidia-smi.
AMD GPU (optional): ROCm v7 installed. AMD acceleration is Linux-only in 2026; Windows support is on the Ollama roadmap.
Internet connection for the initial install and model pulls
A modern web browser (Firefox, Chrome, Chromium) for the VIC-20 frontend

Step 1: Install Ollama on Ubuntu

The canonical install path is the official shell script, which detects your architecture (x86_64 or ARM64), installs the ollama binary under /usr/local/bin, and registers a systemd service automatically.

1.1 Update system packages

sudo apt update && sudo apt upgrade -y

1.2 Run the Ollama install script

curl -fsSL https://ollama.com/install.sh | sh

The script installs Ollama, enables the systemd unit ollama.service, and starts the service. On Ubuntu 24.04, it also detects NVIDIA and AMD GPUs and installs the appropriate acceleration libraries if your drivers are already present.

1.3 Verify the installation

ollama --version
# Expected: ollama version 0.22.0 (or newer)

systemctl status ollama
# Should show: Active: active (running)

If the service is not running:

sudo systemctl enable --now ollama

1.4 Confirm GPU detection

ollama ps
# With a model loaded, output shows "100% GPU" if acceleration is active.
# Without a loaded model, just check the Ollama logs:
journalctl -u ollama --since "5 minutes ago" | grep -i gpu

Step 2: GPU Acceleration Setup

GPU acceleration reduces inference time from 20–60 seconds per response (CPU) to 1–5 seconds on 7B–8B models. If you are running CPU-only, skip to Step 3.

2.1 NVIDIA GPUs

Ollama requires NVIDIA driver version 531 or newer and compute capability 5.0+ (GeForce 900-series and newer). To install drivers on Ubuntu 24.04:

sudo ubuntu-drivers autoinstall
sudo reboot
# After reboot:
nvidia-smi
# Confirm driver version >= 531

Once the correct drivers are installed, re-run curl -fsSL https://ollama.com/install.sh | sh if you installed Ollama before the drivers. The script will detect CUDA and configure the GPU backend. Alternatively, just restart the Ollama service:

sudo systemctl restart ollama

Expected throughput benchmarks (as of April 2026, community-measured):

GPU	VRAM	Model	Tokens/sec (generation)
RTX 3060	12 GB	Llama 4 Scout 17B (Q4)	~60–80 t/s
RTX 4060	8 GB	Gemma 4 E4B	~40–50 t/s
RTX 4090	24 GB	Qwen 3 14B (Q4)	~120–150 t/s
AMD Radeon RX 760M (iGPU)	shared	Gemma 4 26B (Q4_K_M)	~21 t/s generation, ~239 t/s prefill
CPU-only (16-core, 32 GB RAM)	—	7B model (Q4)	3–8 t/s

2.2 AMD GPUs

Ollama uses the ROCm v7 library for AMD acceleration on Linux. Install ROCm using the AMD ROCm quick-start guide, then restart the Ollama service. Supported cards include RX 5000-series and newer (RX 9000-series, Radeon PRO, Instinct MI). ROCm is Linux-only as of April 2026.

2.3 Intel and other GPUs (Vulkan)

Experimental Vulkan support is available for Intel and other GPUs on Linux. Enable it with:

export OLLAMA_VULKAN=1
sudo systemctl restart ollama

Step 3: Pull Current 2026-Era Models

Ollama's library hosts 100+ models. The Llama 3 recommendation from the original 2025 post is superseded; here are the current best choices for different hardware configurations.

3.1 Model selection guide

Hardware	Recommended model	Pull command	VRAM / RAM needed
8 GB VRAM (RTX 4060)	Gemma 4 E4B	`ollama pull gemma4:4b`	~5 GB VRAM
12 GB VRAM (RTX 3060)	Llama 4 Scout	`ollama pull llama4:scout`	~10 GB VRAM
24 GB VRAM (RTX 3090/4090)	Qwen 3 14B or Gemma 4 26B	`ollama pull qwen3:14b`	~10–16 GB VRAM
16 GB RAM (CPU-only)	Gemma 4 E4B or Qwen 3 4B	`ollama pull gemma4:4b`	~5 GB RAM
32 GB RAM (CPU-only)	Llama 4 Scout (Q4)	`ollama pull llama4:scout`	~12 GB RAM

3.2 Pull and run a model

# Pull Llama 4 Scout (best overall for 12 GB VRAM in 2026)
ollama pull llama4:scout

# Pull Gemma 4 E4B (best for 8 GB VRAM — 85 t/s on consumer hardware)
ollama pull gemma4:4b

# Pull Qwen 3 8B (best for coding tasks)
ollama pull qwen3:8b

# Test any model immediately in the terminal:
ollama run llama4:scout "Explain how transformers work in two sentences."

# List downloaded models:
ollama list

3.3 Quantization and memory trade-offs

By default, Ollama pulls Q4_K_M quantization. Moving from Q8 to Q4 cuts VRAM usage by 40–50% with a perplexity increase of roughly 1–3% — acceptable for most chat and coding tasks. Specify a different quantization with a tag, e.g. ollama pull qwen3:8b-q8_0.

Step 4: Configure Ollama for CORS (Required for VIC-20)

Ollama binds to http://localhost:11434 by default and allows cross-origin requests from 127.0.0.1 and 0.0.0.0. When the VIC-20 index.html is opened as a file:// URL in a browser, the browser sends requests with a null origin, which Ollama blocks by default. You must explicitly allow it.

4.1 Configure CORS via systemd (persistent)

sudo systemctl edit ollama.service

Add the following under the [Service] section in the override file:

[Service]
Environment="OLLAMA_ORIGINS=*"

Save, then reload:

sudo systemctl daemon-reload
sudo systemctl restart ollama

For tighter security, replace * with null (file:// origins) or your specific browser extension ID.

4.2 Configure CORS temporarily (for testing)

OLLAMA_ORIGINS="*" ollama serve

Note: this starts a second Ollama process if the systemd service is already running. Stop the service first with sudo systemctl stop ollama.

Step 5: Install and Run the VIC-20 Frontend

The VIC-20 frontend is a single-page HTML/JavaScript application maintained by shokuninstudio. The entire application weighs under 20 KB. There is no build step, no package manager, and no server process — you open a file in a browser.

5.1 Clone the repository

git clone https://github.com/shokuninstudio/Ollama-VIC-20.git
cd Ollama-VIC-20

5.2 Open the frontend

# Open index.html in your default browser:
xdg-open index.html

# Or specify a browser:
firefox index.html
chromium-browser index.html

If Ollama is running and CORS is configured (Step 4), the VIC-20 UI will load and immediately query http://localhost:11434/api/tags to populate the model dropdown.

5.3 Select a model and start chatting

In the VIC-20 interface, select your downloaded model from the dropdown menu.
Type your message in the input field and press Enter or click Send.
To save a conversation, click the single-click save button — it downloads the conversation as a Markdown file.

5.4 VIC-20 vs. other frontends: how to choose

Frontend	Size / install	Features	Best for
VIC-20	<20 KB, no install	Chat, save-as-Markdown	Minimalists, air-gapped machines, privacy-first users
Ollamadore-64	<64 KB, no install	Chat, slightly richer UI	Same use case as VIC-20, slightly more polish
Open WebUI	Docker, ~500 MB image	Multi-user, RAG, image upload, tools	Teams, power users, RAG pipelines
Hollama	Node.js app	Web UI, customizable	Developers who want to self-host a web interface
LM Studio	Desktop app	GUI model manager, chat	Non-technical users on desktop

For teams shipping real AI workflows — RAG pipelines, coding agents, and multi-model routing — see the OpenClaw + Ollama setup guide for running local AI agents, which covers the full agentic stack. If you are looking to extend your engineering team with developers who specialize in local AI infrastructure, Codersera vets remote AI engineers who have shipped production Ollama deployments.

Step 6: Using Ollama Launch for Coding Agents (New in 2026)

Ollama v0.22 ships the ollama launch command, which bootstraps coding agents against your local models in one step. This is entirely new since the original post was published.

6.1 Launch Claude Code locally

# Launch Claude Code backed by a local Qwen 3.5 model:
ollama launch claude --model qwen3.5

# Or with Kimi-K2.5 (cloud API via Ollama proxy):
ollama launch claude --model kimi-k2.5:cloud

Claude Code requires a minimum 64k-token context window. Verify your model supports it before using it for large codebases.

6.2 Launch GitHub Copilot CLI locally

ollama launch copilot --model llama4:scout --yes -- -p "how does this repository work?"

6.3 Launch VS Code Copilot agent mode

ollama launch vscode

This wires your local Ollama instance into VS Code's Copilot Chat extension. The local model can run terminal commands, edit files, and fix its own mistakes — with no cloud dependency.

Troubleshooting Common Issues

Ollama service not starting

journalctl -u ollama -n 50

Common causes: port 11434 already in use; missing CUDA libraries after a driver upgrade. Free the port with sudo lsof -i :11434 and kill the conflicting process, or change Ollama's port with OLLAMA_HOST=0.0.0.0:11435.

GPU not detected

nvidia-smi          # Must show your GPU
ollama ps           # Shows "CPU" if GPU not in use

If nvidia-smi works but Ollama uses CPU: the driver version is below 531, or Ollama was installed before the GPU drivers. Run the install script again after driver installation, or set CUDA_VISIBLE_DEVICES=0 in the systemd override.

Confirm Ollama is running: curl http://localhost:11434/api/tags should return JSON.
Check CORS: if the response is blocked, re-verify OLLAMA_ORIGINS=* is set in the systemd service and the service has been restarted.
Confirm at least one model is pulled: ollama list.

Model download is slow or stalls

Ollama downloads models in parallel chunks. If a download stalls, kill the process and re-run ollama pull <model> — it resumes from where it left off. Ensure you have at least 2× the model size in free disk space (the model is stored in compressed GGUF format under /usr/share/ollama/.ollama/models).

Changing model storage location

Models default to /usr/share/ollama/.ollama/models on Linux. To redirect to a larger disk, add to the systemd override:

[Service]
Environment="OLLAMA_MODELS=/mnt/bigdisk/ollama-models"

Model keeps getting unloaded

By default, Ollama unloads models from memory after 5 minutes of inactivity. Increase the keep-alive duration:

[Service]
Environment="OLLAMA_KEEP_ALIVE=30m"

Performance Benchmarks (April 2026)

The following are real-world community-measured figures, not synthetic projections. Sources are linked in the References section.

Configuration	Model	Generation speed	Source
AMD Ryzen AI MAX+ 128 GB RAM	Gemma 4 26B MoE	~85 t/s	Google / community benchmarks
AMD Radeon 760M iGPU (Vulkan), Ubuntu 24.04	Gemma 4 26B Q4_K_M	~21 t/s gen, ~239 t/s prefill	DEV Community
Ubuntu 24.04 VM, 4 vCPU, 16 GB RAM (CPU-only)	Gemma 4 E4B	~4.2 GB RAM, usable for testing	Community VRAM/RAM reports
r/LocalLLaMA community, various hardware	Llama 3.1 8B (for reference)	~55 t/s GPU average	r/LocalLLaMA

Quantization impact: Moving from Q8 to Q4_K_M reduces VRAM by 40–50% with a 1–3% perplexity increase — generally acceptable for chat and code tasks.

What Was Removed from the Original Guide — and Why

The original 2025 guide recommended pulling llama3 (now superseded by Llama 4 Scout) and used an outdated Ubuntu 22.04-only framing. The manual systemd file instructions have been replaced with the standard systemctl edit ollama.service override pattern, which is safer and survives package upgrades. The original post also did not mention CORS — a common failure point when opening VIC-20 from a file:// URL — or GPU setup, which is now the primary reason users install Ollama at all.

Decision Tree: Is This the Right Setup for You?

You want the simplest possible local AI chat with zero install beyond Ollama → VIC-20 is correct. Follow this guide.
You want multi-user access, RAG, or document upload → Use Open WebUI (docker run -d -p 3000:8080 ghcr.io/open-webui/open-webui:main).
You want to run coding agents (Claude Code, Copilot) on local models → Use ollama launch (Step 6 of this guide) or the OpenClaw setup guide.
You want a polished desktop app with a GUI model manager → Use LM Studio (Windows/macOS) or GPT4All.
You need maximum inference throughput for production workloads → Use vLLM or llama.cpp directly; Ollama's abstraction adds latency under concurrent load.

FAQ

Which Ubuntu version should I use for Ollama in 2026?

Ubuntu 24.04 LTS is recommended. It ships with kernel 6.8, which has better NVIDIA and AMD driver integration than 22.04. Ubuntu 22.04 still works and Ollama officially supports it, but new installs should use 24.04. Ubuntu 20.04 is end-of-standard-support and should not be used for new AI workloads.

Can I run Ollama VIC-20 without a GPU?

Yes. Ollama runs on CPU only. With 16 GB RAM you can run 7B–8B models at 3–8 tokens/second — slow but functional. For daily use, a GPU is strongly recommended. The Gemma 4 E4B model is particularly efficient on CPU, fitting in ~4.2 GB RAM.

Which model should I start with in 2026?

On an 8 GB VRAM GPU: ollama pull gemma4:4b. On a 12 GB VRAM GPU: ollama pull llama4:scout. On CPU-only with 16 GB RAM: ollama pull gemma4:4b. These replace the 2025 recommendation of llama3, which is now outdated.

Does Ollama send my data anywhere?

No. When you run ollama serve locally, all inference happens on your machine. The only external network requests are the model downloads from ollama.com/library during ollama pull. After that, the model runs fully offline. VIC-20 also has no telemetry or external dependencies.

VIC-20 shows a blank page or no models. What do I do?

First, confirm Ollama is running: curl http://localhost:11434/api/tags. If that returns JSON but VIC-20 is blank, the issue is CORS. Open the browser console (F12 → Console), look for a CORS error, and follow Step 4 to set OLLAMA_ORIGINS=*. Also confirm you have at least one model downloaded with ollama list.

Can I run multiple models at the same time?

Yes, if your system has enough VRAM or RAM. Ollama loads models into memory on demand. Set OLLAMA_MAX_LOADED_MODELS=2 in your systemd override to allow two models simultaneously. With a single 12 GB VRAM GPU, two 7B models will likely exceed VRAM and fall back to CPU.

Is there a larger VIC-20 alternative from the same developer?

Yes. The same author (shokuninstudio) maintains Ollamadore 64 — a sub-64 KB frontend with a slightly richer interface. The install and usage pattern is identical. Clone from https://github.com/shokuninstudio/Ollamadore-64.

Can I use Ollama as a backend for AI coding tools in 2026?

Yes — this is one of the biggest 2026 additions. The ollama launch command supports Claude Code, GitHub Copilot CLI, and VS Code Copilot agent mode. Models that work well for coding include Qwen 3 8B and Llama 4 Scout. See Step 6 of this guide and the OpenClaw + Ollama setup guide for the full agentic workflow.