Codersera

Run Qwen3.5‑0.8B with OpenClaw + Ollama on CPU Locally (Free Step‑by‑Step Guide)

Learn how to install, run, benchmark, and compare Qwen3.5‑0.8B with OpenClaw and Ollama on your CPU for free. Private, local AI with practical demos.

This report explains how to build a modern, completely local AI stack that runs entirely on your CPU using three components:

  • Qwen3.5‑0.8B – Alibaba’s tiny but surprisingly capable multimodal model from the Qwen 3.5 "Small" series (the successor line to Qwen 2.5 and Qwen 3).
  • Ollama – a popular local LLM runtime that makes pulling and running models as simple as a single command.
  • OpenClaw – an open‑source local AI gateway and agent framework that orchestrates tools, files, and workflows on your machine.

The goal is to:

  • Install all tools on Windows, macOS, or Linux.
  • Run Qwen3.5‑0.8B fully offline on CPU via Ollama.
  • Connect it to OpenClaw for agent‑style workflows.
  • Benchmark speed and quality and compare with other runtimes and models.
  • Understand pricing, licensing, and how this stack differs from competitors.

What is Qwen3.5‑0.8B?

Alibaba’s Qwen 3.5 series is a family of open‑source multimodal language models designed to bring "flagship level" intelligence to smaller sizes, from 0.8B to 9B parameters in the compact tier and much larger models in the medium and flagship tiers.

Key characteristics

  • Model size: 0.8 billion parameters – one of the smallest Qwen 3.5 variants, targeted at phones, laptops, and edge devices.
  • Multimodal: Native support for both text and images across the whole series (including 0.8B), instead of bolting vision components on top.
  • Architecture: Uses innovations like Gated Delta Networks and sparse Mixture‑of‑Experts (MoE) to pack more capability into fewer active parameters.
  • Training: Enhanced reinforcement learning (RL) on reasoning‑heavy tasks improves instruction following and multi‑step problem solving, even in small models.
  • License: Open‑weight and released under Apache 2.0, which allows commercial use and redistribution with minimal restrictions.

At the small end, Alibaba explicitly markets Qwen3.5‑0.8B and 2B as optimized for phones, laptops, and edge devices, highlighting faster speed and smaller memory footprint compared to the 4B and 9B variants.

Why 0.8B is interesting for CPU users

For a CPU‑only setup, smaller parameter counts mean:

  • Lower RAM requirements – the model file is around ~500 MB in Ollama’s quantized format, so it fits easily even on a modest laptop.
  • Faster inference – fewer parameters to process at each token leads to much better latency on CPUs compared to 4B/9B models.
  • Energy and heat savings – smaller models generate less CPU load, which is important for laptops and small desktops.

You will not get the same deep reasoning quality as the 4B or 9B versions, but for everyday chat, basic coding help, and lightweight analysis, Qwen3.5‑0.8B performs far above what earlier sub‑1B models could achieve.


Ollama in a nutshell

Ollama is a local LLM runtime that wraps the performant llama.cpp backend in a friendly CLI and HTTP API, with an integrated model library.

Core features

  • One‑line install on macOS, Linux, and Windows using official packages or an install script.
  • Model registry with ready‑to‑run builds of popular models like Qwen, Llama, Mistral, Gemma, and more.
  • Docker‑like workflow: ollama pull to download, ollama run to start chatting with a model.
  • Built‑in API server on http://localhost:11434, making it easy to integrate with tools such as OpenClaw or Open WebUI.
  • Automatic quantization and hardware detection – it chooses formats and GPU/CPU backends for you while still allowing manual tuning for advanced users.

Guides and community tests often describe Ollama as the fastest path from zero to a working local LLM in 2026, especially for developers who are comfortable with the terminal.

CPU‑only considerations

While much Ollama marketing focuses on GPU acceleration, it also runs entirely on CPU.

  • On Linux, a single shell script installs everything, and Ollama runs as a background server.
  • On Windows and macOS, installers provide a native app and CLI; the same binary supports CPU‑only inference when no GPU is available or when GPU flags are disabled.
  • Community reports show that Ollama’s CPU performance is good for small and medium models, although raw speed can sometimes lag a tuned llama.cpp build.

For this guide, CPU‑only mode is sufficient for Qwen3.5‑0.8B on most laptops and desktops.


OpenClaw in a nutshell

OpenClaw is an open‑source local AI gateway and agent framework. It sits between LLMs and local or cloud tools, allowing workflows that can read and write files, run scripts, call APIs, and maintain long‑term memory.

Key features

  • Local execution layer: OpenClaw gives AI models controlled access to the local filesystem, scripts, and browsers through a sandboxed environment.
  • Persistent Markdown memory: It stores user preferences and long‑term context as Markdown documents, which users can inspect and edit.
  • Extensible skills and tools: Workflows are defined as "skills" that chain commands like shell scripts, Python programs, and local LLM calls.
  • Multiple deployment modes: Installer script, npm global install, Docker, Podman, Nix, and more.
  • Cross‑platform: Supports macOS, Linux, and Windows (commonly via WSL2 on Windows for best compatibility).

OpenClaw is often positioned as a more security‑aware, locally focused alternative to cloud‑hosted agent platforms, aiming to keep data and execution on your own machine.


Why combine Qwen3.5‑0.8B + Ollama + OpenClaw?

Putting these three components together gives a powerful local stack:

  1. Qwen3.5‑0.8B brings modern multimodal intelligence at a size that runs comfortably on CPUs.
  2. Ollama makes installing and serving the model trivial, exposing a simple HTTP API and CLI.
  3. OpenClaw orchestrates tools, scripts, and documents around the LLM, building agent‑like workflows such as local data analysis, PDF summarization, and file automation.

Benefits of this stack

  • 100% local and free: All three components are open source or open‑weight, with no per‑token cloud fees. Apache‑2.0 licensing for Qwen 3.5 allows commercial use.
  • CPU‑friendly: The 0.8B model is explicitly tuned for edge devices and laptops with as little as 2–4 GB of RAM available for the model.
  • Privacy by design: Data, prompts, and files never leave your machine, which is especially important for business documents or personal data.
  • Extendable: You can later swap Qwen3.5‑0.8B for larger Qwen3.5 models or other families (Llama, Mistral, Gemma) without redesigning everything.

This makes the stack particularly attractive for:

  • Developers who want a local assistant for code, docs, and terminal tasks.
  • Data analysts or students who want a local data analysis agent without sending files to the cloud.
  • Privacy‑focused users exploring AI on older or GPU‑less hardware.

System requirements and hardware recommendations

Based on official docs, community guides, and Qwen 3.5 installation examples via Ollama:

| Component | Minimum for Qwen3.5‑0.8B (CPU‑only) | Recommended for smoother experience |
|---|---|---|
| OS | Windows 10+, macOS, or modern Linux | Same, ideally recent macOS/Linux kernel |
| CPU | Any 4‑core CPU from the last 5–7 years | 8+ cores (e.g., Ryzen 5/7, i5/i7 10th gen+) |
| RAM | 8 GB total (2–3 GB free) | 16 GB+ |
| Storage | ~4 GB free for tools + model | 10+ GB to try more models |
| GPU | Not required | Optional; CPU‑only works fine for 0.8B |

Qwen3.5‑0.8B specifically is advertised as a model that can "run on almost anything," with the Ollama guide citing ~500 MB of storage and minimal hardware.

Hardware requirements by model size (Qwen 3.5 Small series)

The same guide provides a concise hardware table for the small models when run with Ollama:

| Model | Parameters | Approx. model size (quantized) | Minimum RAM/VRAM | Typical use case |
|---|---|---|---|---|
| Qwen3.5‑0.8B | 0.8B | ~500 MB | 2 GB | Phones, basic laptops, edge devices |
| Qwen3.5‑2B | 2B | ~1.5 GB | 4 GB | Lightweight agents, enhanced reasoning |
| Qwen3.5‑4B | 4B | ~2.5 GB | 6 GB | General‑purpose laptop assistant |
| Qwen3.5‑9B | 9B | ~5 GB | 8 GB | Higher quality reasoning on stronger machines |

For this article, the focus is on 0.8B because it guarantees good CPU‑only performance on mainstream hardware.


Step 1 – Install Ollama (Windows, macOS, Linux)

Windows

  1. Open the official Ollama download page and choose Windows.
  2. Download OllamaSetup.exe and run it.
  3. Follow the installer; it installs under your user account (no admin rights needed) and sets up the background service and CLI.
  4. Open PowerShell and verify:

```powershell
ollama -v
ollama list
```

The API will be available at http://localhost:11434 once the service is running.
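As a quick sanity check that the API is up, you can query Ollama's model-listing endpoint from any language. A minimal Python sketch (assuming the default localhost:11434 address and Ollama's documented `/api/tags` route):

```python
import json
from urllib import request, error

OLLAMA_URL = "http://localhost:11434"

def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the local Ollama server which models are installed."""
    try:
        with request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            return model_names(json.load(resp))
    except (error.URLError, OSError):
        return []  # server not running or not reachable

if __name__ == "__main__":
    print(list_local_models() or "Ollama does not appear to be running")
```

If the list comes back empty even though the service is running, you simply have not pulled any models yet.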

macOS

  1. Visit the Ollama website, download the latest macOS package, and move Ollama.app to Applications.
  2. Launch Ollama.app once; it starts the background service.
  3. Confirm installation in Terminal:

```bash
ollama -v
ollama list
```

Linux

On Linux, the official install script is the easiest path:

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama -v
ollama list
```

Alternatively, download the standalone ollama-linux-amd64 binary from the download page and run it directly without root:

```bash
./ollama-linux-amd64 serve &
./ollama-linux-amd64 run qwen3.5:0.8b
```

This approach is commonly used on clusters or locked‑down servers where sudo is not available.


Step 2 – Pull and test Qwen3.5‑0.8B in Ollama

Once Ollama is installed, pulling Qwen3.5‑0.8B is a single command.

Pull the model

Ollama exposes Qwen 3.5 0.8B as a named library entry:

```bash
ollama pull qwen3.5:0.8b
```

The official library page describes Qwen 3.5 as a family of open‑source multimodal models and includes the 0.8B variant as a ready‑to‑run model.

Run an interactive chat

Start a chat session directly in the terminal:

```bash
ollama run qwen3.5:0.8b
```

Type a simple prompt such as:

Explain what Qwen3.5‑0.8B is in two sentences.

Exit with /bye (or Ctrl+D) when done.
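Beyond the interactive CLI, the same model can be called programmatically over Ollama's HTTP API. A minimal non-streaming sketch using the documented `/api/generate` endpoint (the prompt text is just an example):

```python
import json
from urllib import request

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "qwen3.5:0.8b",
             base_url: str = "http://localhost:11434") -> str:
    """POST the prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    req = request.Request(f"{base_url}/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]

# Usage (requires a running Ollama server with the model pulled):
# print(generate("Explain what Qwen3.5-0.8B is in two sentences."))
```

This is the same call that downstream tools such as OpenClaw make under the hood.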

Quick functional tests

To confirm that the model is usable for basic assistant tasks on CPU:

  1. General knowledge: Ask for a short explanation of a concept (e.g., "What is overfitting in machine learning?").
  2. Reasoning: Ask a multi‑step question (e.g., "If a train travels 120 km in 2 hours, what is its speed, and how long to go 300 km?").
  3. Text transformation: Request rewriting text in simpler English.

Users report that even the smallest Qwen 3.5 models show strong instruction following and multilingual capability compared to older tiny models.


Step 3 – Install and verify OpenClaw

The official OpenClaw installer script works on macOS, Linux, and Windows (PowerShell, typically via WSL2):

```bash
curl -fsSL https://openclaw.ai/install.sh | bash
```

This installs the CLI globally (via npm under the hood when needed), checks for Node.js 22+, and may run an onboarding wizard.

Alternative: npm global install

If Node 22+ is already installed and npm is configured:

```bash
npm install -g openclaw@latest
openclaw onboard --install-daemon
```

The --install-daemon flag registers OpenClaw as a background service (systemd on Linux, launchd on macOS), ensuring the gateway keeps running across reboots.

Post‑install checks

After installation, run the basic diagnostics:

```bash
openclaw doctor     # check for config issues
openclaw status     # gateway status
openclaw dashboard  # open browser UI
```

If the Control UI opens and shows a healthy gateway, OpenClaw is ready.


Step 4 – Connect OpenClaw to Ollama and Qwen3.5‑0.8B

OpenClaw communicates with local LLMs via HTTP APIs. Ollama exposes such an API at http://localhost:11434, which OpenClaw can call as a tool.

While exact configuration files vary by version, the high‑level pattern (based on a public tutorial for using OpenClaw with Ollama as a local data analyst) is:

  1. Define a workspace skill that includes tools for running local commands and calling Ollama.
  2. Point the HTTP client to the Ollama endpoint.
  3. Reference that tool in an agent configuration.

Example: simple skill snippet (conceptual)

A simplified pseudo‑configuration might look like this (YAML‑ish for illustration):

```text
# skills/local-ollama-qwen35-08b.skill.md
---
name: local-qwen35-08b
summary: "Use local Qwen3.5-0.8B via Ollama to answer questions."

steps:
  - id: ask-model
    tool: http
    args:
      method: POST
      url: "http://localhost:11434/api/chat"
      body:
        model: "qwen3.5:0.8b"
        messages:
          - role: system
            content: "You are a helpful local assistant."
          - role: user
            content: "{{ input }}"
```

In real OpenClaw setups, this is written as a Markdown front matter block plus narrative instructions, but the idea is the same: a step that posts user input to the Ollama chat API and returns the model’s reply.
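For reference, the HTTP step in that skill corresponds to roughly this call against Ollama's documented `/api/chat` endpoint (a sketch of the request shape, not OpenClaw's actual config schema):

```python
import json
from urllib import request

def build_chat_payload(user_input: str, model: str = "qwen3.5:0.8b") -> dict:
    """Mirror of the skill's HTTP step: system + user messages, non-streaming."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful local assistant."},
            {"role": "user", "content": user_input},
        ],
        "stream": False,
    }

def ask(user_input: str, base_url: str = "http://localhost:11434") -> str:
    """POST a chat turn to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(user_input)).encode("utf-8")
    req = request.Request(f"{base_url}/api/chat", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["message"]["content"]
```

Swapping the `model` field here is all it takes to point the same workflow at a different Ollama model later.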

Testing the connection

  1. Ensure Ollama is running (ollama serve if required) and Qwen3.5‑0.8B is pulled.
  2. Start/Open the OpenClaw gateway if it is not already running.
  3. Trigger the skill (e.g., via a slash command or the web UI, depending on configuration).
  4. Ask a simple question and confirm a response appears.

If the response is slow but steady, your CPU is handling the 0.8B model as expected.


Demo: local data analyst workflow with OpenClaw + Ollama + Qwen3.5‑0.8B

A well‑documented example of OpenClaw + Ollama usage is a Local Data Analyst demo that:

  • Accepts a CSV dataset and optional context documents.
  • Runs an agent workflow that generates charts and a textual report.
  • Stores outputs as trend_chart.png, analysis_report.md, and tool_trace.json on disk.

How the workflow is structured

According to the tutorial:

  • A web interface (web_assistant.py) handles file uploads and sends a slash command to OpenClaw.
  • An OpenClaw agent loads the workspace skill, executes tools (shell commands, Python scripts), and calls the local LLM via Ollama for reasoning and summarization.
  • An analysis engine (main.py) reads the dataset, infers relevant columns, generates charts, and writes outputs to disk.

Adapting this to Qwen3.5‑0.8B simply means using qwen3.5:0.8b as the Ollama model in the skill configuration.

Example usage flow

  1. Start Ollama and ensure Qwen3.5‑0.8B is available.
  2. Launch OpenClaw and verify the dashboard.
  3. Run the Local Data Analyst web app from the tutorial repo.
  4. Upload a small CSV (e.g., monthly sales data).
  5. Trigger the analysis workflow.

Within a short time, you should see:

  • A line chart of trends (trend_chart.png).
  • A Markdown summary capturing key insights (analysis_report.md).
  • A JSON trace of which tools and commands ran (tool_trace.json).

Even with a tiny 0.8B model, this demonstrates how tool‑augmented reasoning can compensate for limited raw model capacity when the tools and workflow are well‑designed.


Benchmarks and expected performance

Global Qwen 3.5 benchmark context

Public benchmarks for the wider Qwen 3.5 lineup show impressive results, especially for the 9B and medium‑sized models:

  • Qwen3.5‑9B scores 82.5 on MMLU‑Pro and 81.7 on GPQA Diamond, approaching larger proprietary systems.
  • On multimodal benchmarks like MMMU‑Pro and MathVision, Qwen 3.5 small and medium models outperform competing small models such as GPT‑5‑Nano in several tests.
  • Reviewers note that the architecture delivers "remarkable efficiency," with smaller models competing well against earlier Qwen 3 and other open models with many more parameters.

While these headline numbers are mostly for 4B and 9B variants, they indicate that the underlying architecture is strong even when scaled down to 0.8B.

CPU performance expectations for 0.8B

Exact tokens‑per‑second for Qwen3.5‑0.8B on CPU vary by hardware and quantization, but some reasonable expectations can be drawn from local LLM speed tests and small‑model behavior:

  • Model size: At ~500 MB and 0.8B parameters, it is significantly lighter than typical 7B models (often 4–8 GB quantized) and should decode tokens much faster on CPU.
  • Runtime: Community tests with comparable small models (1–2B) using llama.cpp and Ollama report tens to hundreds of tokens per second on modern CPUs, depending on quantization and threads.
  • Practical feel: On a mid‑range 8‑core CPU with 16 GB RAM, users can expect near‑instant responses for short prompts and acceptable latency for longer outputs.
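Rather than relying only on published numbers, you can measure your own machine. Ollama's non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), from which decode speed follows directly; a sketch:

```python
import json
from urllib import request

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Decode speed implied by Ollama's response metrics."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark(prompt: str, model: str = "qwen3.5:0.8b",
              base_url: str = "http://localhost:11434") -> float:
    """Run one generation against the local server and report tokens/s."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode("utf-8")
    req = request.Request(f"{base_url}/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=300) as resp:
        data = json.load(resp)
    return tokens_per_second(data["eval_count"], data["eval_duration"])

# Usage: print(f"{benchmark('Write a haiku about CPUs.'):.1f} tokens/s")
```

Run it a few times with prompts of different lengths, since short generations overweight fixed startup costs.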

For context, independent speed tests have shown that:

  • In some scenarios, llama.cpp reaches around 161 tokens/s, versus 89 tokens/s for Ollama with the same model.
  • In other tests, Ollama has outperformed LM Studio by 10–34% in inference speed, showing that runtime performance is highly workload‑dependent.

Since Qwen3.5‑0.8B is optimized for edge and low‑resource devices, CPU‑only speed is one of its design targets.

Latency vs. model size (qualitative)

A qualitative comparison for Qwen 3.5 models on CPU‑only setups:

| Model | Latency on mid‑range CPU (subjective) | Quality vs. 0.8B |
|---|---|---|
| 0.8B | Very fast, almost instant for short replies | Baseline |
| 2B | Fast, minor delays on longer generations | Noticeable improvement on reasoning |
| 4B | Moderate; acceptable but slower on long outputs | Much stronger reasoning and knowledge |
| 9B | Slower on CPU‑only; better with GPU | Approaches older large models and some smaller proprietary models |

For pure CPU usage, 0.8B and 2B offer the best balance between speed and usability, while 4B and 9B become more comfortable if at least partial GPU offload is available.


How this stack compares to other local LLM options

Comparing: Ollama vs LM Studio vs llama.cpp vs GPT4All vs Jan

A number of 2025–2026 comparisons and guides evaluate the main local LLM runtimes.

| Runtime | Interface | Best for | Performance notes | Ease of setup |
|---|---|---|---|---|
| Ollama | CLI + REST API | Developers, scripts, integrations | Often 10–20% faster than LM Studio in some tests; in others slightly slower than hand‑tuned llama.cpp. | Very easy; one‑line install and ollama run |
| LM Studio | Desktop GUI | Non‑technical users, quick experiments | Sometimes slower than Ollama; in some Mac tests, LM Studio outperformed Ollama (e.g., 237 t/s vs 149 t/s on Gemma 3 1B). | Extremely easy; full GUI, no terminal needed |
| llama.cpp | CLI / library | Power users, maximum speed | Can be up to 1.8× faster than Ollama in certain benchmarks; exposes low‑level tuning. | Harder; requires manual builds and model management |
| GPT4All | Desktop GUI | Beginners wanting a ChatGPT‑like app | Emphasizes local RAG and document chat; simple configuration. | Easy; installer + built‑in model download |
| Jan | Desktop GUI + local backend | Privacy‑focused local chat | 100% offline by default; supports multiple runtimes under the hood. | Easy; but less focused on dev workflows |

USP of Ollama in this stack: it strikes a middle ground between raw performance and usability, offers a consistent curated model library, and exposes a simple HTTP API that OpenClaw can call without extra adapters.

Comparing agent frameworks: OpenClaw vs others

OpenClaw competes with other open‑source or commercial agent frameworks and gateways.

| Tool | Focus | Self‑hosted | Notable USP |
|---|---|---|---|
| OpenClaw | Local gateway + tools + memory | Yes | Strong emphasis on local execution, Markdown memory, and security controls. |
| Anything LLM | Knowledge base + document chat | Yes | Multi‑LLM hub focused on RAG for teams. |
| SuperAGI | Multi‑agent workflows | Yes | Extensive automation for GTM, sales, and support. |
| NanoClaw / NullClaw | Lightweight or edge deployments | Yes | Tiny footprints (single binaries, Zig code) aimed at minimal environments. |
| eesel AI | Business support automation | Enterprise | AI teammate model for customer support. |

USP of OpenClaw: combines a general‑purpose local gateway, rich tool orchestration, and persistent Markdown memory in one coherent platform, making it particularly suited for personal machines and privacy‑sensitive use cases.

Comparing models: Qwen3.5‑0.8B vs other small models

Direct head‑to‑head benchmarks for the 0.8B model are still emerging, but several trends are clear from coverage of the Qwen 3.5 lineup and community comparisons of small models.

| Model | Size category | Multimodal | License | Noted strengths |
|---|---|---|---|---|
| Qwen3.5‑0.8B | Tiny (sub‑1B) | Yes | Apache‑2.0 | Native multimodality, strong small‑model efficiency, optimized for edge devices. |
| Qwen3.5‑4B | Small | Yes | Apache‑2.0 | Much stronger reasoning, often rivaling older 30B‑scale models. |
| Qwen3.5‑9B | Small/medium | Yes | Apache‑2.0 | Approaches performance of much larger models, top scores on MMLU‑Pro and GPQA. |
| Gemma 3 1B | Small | Often text‑only | Open‑weight | Very fast on Apple Silicon; strong coding in some tests. |
| Llama 4 (small variants) | Small | Usually text‑only | Open‑weight | General‑purpose assistants with strong reasoning and broad tool support. |

USP of Qwen3.5‑0.8B: unlike many tiny models that strip out modalities or compromise instruction following, it remains natively multimodal and trained with scaled RL, giving it better instruction compliance and image‑aware capabilities in a sub‑1B footprint.


Pricing and licensing

Model: Qwen 3.5

Alibaba has released Qwen 3.5 models, including the small series, as open‑source, open‑weight models under Apache‑2.0, meaning:

  • Free to download and use.
  • Permits commercial usage and redistribution.
  • Requires preserving the license and notices but no royalties.

Ollama

Ollama is distributed as free software for personal and commercial use. The runtime itself is open source, and the primary "cost" is local compute and storage.

Some users may later choose paid cloud infrastructure (e.g., remote GPUs or hosting providers), but the tool itself and local usage are free.

OpenClaw

OpenClaw is an open‑source gateway, and self‑hosting on your own machine incurs no license fees.

The main cost levers are:

  • CPU time and electricity to run workflows.
  • Optional cloud integrations or infrastructure if used.

Summary: run everything free on CPU

For a pure local CPU‑only deployment on a single machine:

  • No per‑token or subscription costs for Qwen3.5‑0.8B, Ollama, or OpenClaw.
  • Hardware requirements are modest, especially for 0.8B.
  • You retain full control over data and can run offline.

Practical testing checklist

To systematically evaluate your Qwen3.5‑0.8B + OpenClaw + Ollama stack on CPU, consider the following tests.

1. Responsiveness and speed

  • Measure rough response time for a 3–4 sentence answer.
  • Try longer prompts (e.g., 1–2 pages of text) and note if latency is still acceptable.
  • Compare with a 2B or 4B model if your hardware allows, to feel the trade‑off between speed and quality.

2. Instruction following

  • Ask the model to follow strict formats (bullet lists, JSON, tables).
  • Test multi‑step instructions, such as "analyze then summarize then give recommendations."
  • Evaluate if Qwen3.5‑0.8B stays on task; small models can drift more easily than 9B+.
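One way to make the format-compliance checks above mechanical is to ask for strict JSON and validate the reply in code. A sketch of the checking side (the model call is omitted; `extract_json` is an illustrative helper for the code fences small models often wrap around JSON):

```python
import json

def extract_json(reply: str) -> str:
    """Strip optional ``` fences that small models often add around JSON."""
    text = reply.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        if len(lines) > 2:
            text = "\n".join(lines[1:-1])
    return text

def follows_json_format(reply: str, required_keys: set[str]) -> bool:
    """True if the reply parses as a JSON object containing the requested keys."""
    try:
        data = json.loads(extract_json(reply))
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()
```

Scoring a batch of prompts with a checker like this gives a rough but repeatable instruction-following number for comparing 0.8B against larger variants.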

3. Tool use via OpenClaw

  • Create a simple OpenClaw skill that:
    • Reads a text file.
    • Asks the model to summarize it via Ollama.
    • Writes the summary back to disk.
  • Check that the workflow executes correctly and that OpenClaw’s Markdown memory and tool trace are updated.
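Before wiring those three steps into an OpenClaw skill, you can prototype them directly against Ollama to confirm the model side works (a sketch; the file paths and prompt wording are illustrative):

```python
import json
from pathlib import Path
from urllib import request

def summarize_prompt(text: str) -> str:
    """Prompt for the summarization step (wording is illustrative)."""
    return f"Summarize the following text in 3 bullet points:\n\n{text}"

def summarize_file(src: str, dst: str, model: str = "qwen3.5:0.8b",
                   base_url: str = "http://localhost:11434") -> None:
    text = Path(src).read_text(encoding="utf-8")          # 1. read the text file
    body = json.dumps({"model": model,
                       "prompt": summarize_prompt(text),
                       "stream": False}).encode("utf-8")
    req = request.Request(f"{base_url}/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=300) as resp:       # 2. summarize via Ollama
        summary = json.load(resp)["response"]
    Path(dst).write_text(summary, encoding="utf-8")       # 3. write summary to disk

# Usage: summarize_file("notes.txt", "notes_summary.md")
```

Once this works standalone, the equivalent OpenClaw skill only adds orchestration and memory on top.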

4. Multimodal behavior (optional)

If and when Qwen3.5‑0.8B’s vision features are exposed via Ollama and your configuration, test:

  • Image captioning on small images.
  • Extracting text‑level descriptions of charts or diagrams.

Coverage confirms that even the small Qwen 3.5 models are natively multimodal, but specific runtime support may lag behind.

5. Stability and resource use

  • Monitor CPU and RAM usage while running long sessions.
  • Ensure that running Qwen3.5‑0.8B plus OpenClaw does not cause swapping or overheating on your device.
  • If necessary, lower context length or concurrency in Ollama’s configuration for smoother CPU‑only operation.
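Ollama accepts per-request runtime options, and `num_ctx` (context window) and `num_thread` (CPU threads) are the two most relevant knobs for constrained machines. A sketch of a reduced-footprint request body (the values are illustrative starting points, not recommendations):

```python
def low_resource_payload(prompt: str, model: str = "qwen3.5:0.8b") -> dict:
    """Request body for /api/generate with a smaller context window and
    an explicit CPU thread count (both supported Ollama options)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": 2048,    # shrink the context window to cut RAM use
            "num_thread": 4,    # cap CPU threads to reduce load and heat
        },
    }
```

The same parameters can also be baked into a custom model via a Modelfile if you prefer not to set them on every request.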

Quick comparison table: this stack vs cloud APIs

| Aspect | Qwen3.5‑0.8B + OpenClaw + Ollama (CPU‑only) | Typical cloud LLM API |
|---|---|---|
| Cost | Free beyond hardware and electricity | Per‑token or monthly fees |
| Privacy | Data stays on local machine | Data sent to external servers |
| Latency | Local; may be slower on weak CPUs | Often fast, backed by large GPU clusters |
| Control | Full control over models, tools, and workflows | Limited to provider’s models and features |
| Setup difficulty | Medium – requires installing three tools | Low – call a REST API |
| Offline use | Yes, once models are downloaded | Usually no |

For many developers and privacy‑sensitive users, the trade‑off of slightly more setup and possibly lower raw model quality is acceptable given the privacy, cost, and control benefits.


FAQs

  1. Can I really run Qwen3.5‑0.8B on an old laptop CPU?
    Yes, as long as you have at least 8 GB RAM and a reasonably modern 64‑bit CPU, the ~500 MB model should run, though responses may be slower on very old hardware.
  2. How does Qwen3.5‑0.8B compare to using a 4B or 9B model?
    0.8B is much faster and lighter but weaker in complex reasoning; 4B and 9B provide far better benchmark scores and quality but are harder to run on CPU‑only machines.
  3. Is there any cost or license restriction for commercial projects?
    Qwen 3.5 uses Apache‑2.0, Ollama and OpenClaw are free to self‑host, so commercial use is allowed as long as you respect the licenses and notices.
  4. Can I swap in other models without changing OpenClaw?
    Yes, usually you just change the model field in the Ollama API call (for example, from qwen3.5:0.8b to a Llama or Gemma model) and keep the rest of the workflow identical.
  5. Is CPU‑only enough, or do I need a GPU later?
    CPU‑only is fine for 0.8B and 2B models; if you later want 9B or larger models with lower latency, adding a GPU or using a remote GPU host becomes more attractive.

🚀 Try Codersera Free for 7 Days

Connect with top remote developers instantly. No commitment, no risk.

✓ 7-day free trial ✓ No credit card required ✓ Cancel anytime