AutoGLM-Phone-9B represents a paradigm shift in mobile automation. Unlike traditional scripts that rely on rigid XML hierarchies, this 9-billion-parameter Visual Language Model (VLM) "sees" your phone screen as a human does and plans its actions from what it sees.
With a success rate of 36.2% on the rigorous AndroidLab benchmark (outperforming GPT-4o) and 89.7% on common tasks, it is currently the most advanced open-source solution for turning a standard Android device into an autonomous agent.
For over a decade, "smart" assistants like Siri and Google Assistant have been frustratingly limited. They can set timers or tell you the weather, but ask them to "Order my usual from Starbucks and pay with the card ending in 1234," and they fail. They lack agency: the ability to interact directly with third-party app interfaces (GUIs).
AutoGLM-Phone-9B is an open-source model developed by Zhipu AI (and the Open-AutoGLM community). It is a specialized version of the GLM (General Language Model) family, fine-tuned specifically for Graphical User Interface (GUI) interaction.
The "9B" refers to its 9 billion parameters. In the world of Large Language Models (LLMs), 9B is considered "mid-sized"—large enough to possess sophisticated reasoning and planning capabilities, but small enough to run on consumer-grade hardware (like a decent gaming PC) with relatively low latency.
Unlike a standard text-only model, AutoGLM is a Visual Language Model (VLM). It takes a screenshot as input, reasons about what is on screen, and outputs the next action, repeating these steps in a loop.
If you ask standard GPT-4 to "click the button," it can't. It has no concept of your screen's coordinate system. AutoGLM, however, has been trained on datasets of GUI interactions. It understands that a "magnifying glass" icon usually means "search" and that a "hamburger menu" contains settings. It outputs precise screen coordinates, allowing it to interact with any app, even ones it has never seen before.
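To make the idea of "outputting screen coordinates" concrete, here is a minimal sketch of how a model-space point could be mapped to device pixels. It assumes, purely for illustration, that the model reports positions on a 1000 x 1000 grid; the article does not describe AutoGLM's actual output format, and `to_pixels` and `grid` are hypothetical names.

```python
def to_pixels(rel_x, rel_y, screen_w, screen_h, grid=1000):
    """Map a point on a hypothetical grid x grid coordinate space to device pixels.

    The grid size is an assumption for illustration, not AutoGLM's documented format.
    """
    if not (0 <= rel_x < grid and 0 <= rel_y < grid):
        raise ValueError("coordinate outside the model grid")
    # Integer arithmetic avoids floating-point rounding surprises.
    px = rel_x * screen_w // grid
    py = rel_y * screen_h // grid
    return px, py


# A point in the upper middle of a 1080 x 2400 screen.
print(to_pixels(500, 250, 1080, 2400))  # (540, 600)
```

A mapping like this is also where a mismatch between the assumed and the real screen resolution would turn into taps that land in the wrong place.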
In the rapidly evolving field of AI agents, reliability is the most critical metric. How often does the agent actually complete the task without getting stuck?
We compared AutoGLM-Phone-9B against the industry-leading generalist models, GPT-4o and Claude 3.5 Sonnet, on the AndroidLab (VAB-Mobile) benchmark. This is a rigorous test suite designed to measure an agent's ability to navigate complex apps.

Key Takeaways:
While the 36.2% score on difficult benchmarks might seem low, it represents complex, multi-step problem solving on unfamiliar apps. On common tasks within popular Chinese apps (like WeChat, Meituan, and Taobao) where the model has had more exposure, the success rate jumps to an impressive 89.7%. This means for daily routines—ordering food, booking rides, checking messages—it is highly reliable.
To run AutoGLM-Phone-9B, you need to distinguish between the Host (where the AI thinks) and the Client (the phone that acts).
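The host is where the 9B model resides: a PC with an NVIDIA GPU (16–24GB VRAM recommended), 32GB of RAM, and around 50GB of storage. The client is an Android phone with USB debugging enabled.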
Note: If you lack this hardware, you can use the API mode (connecting to Zhipu's cloud), but this article focuses on the open-source, local execution method.
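To get a feel for why a 9-billion-parameter model needs a capable GPU, the back-of-the-envelope sketch below estimates the memory taken by the weights alone at different precisions. Which precision the released weights use is an assumption here, and real usage is higher once activations, caches, and the vision components are included.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 9e9  # the "9B" in the model name

# Precisions listed for comparison only; not a statement about the official release.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

At 16 bits per parameter the weights come to about 18 GB, which is consistent with a GPU in the 16–24GB range being the practical target.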
Follow these steps to deploy AutoGLM. We assume you are using a Linux/WSL environment with Python installed.
First, ensure you have Anaconda or Miniconda installed to manage dependencies.
```bash
# Create a fresh environment for AutoGLM
conda create -n autoglm python=3.10
conda activate autoglm

# Install PyTorch (ensure CUDA version matches your driver)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
Your computer needs to talk to your phone.
```bash
sudo apt-get install android-tools-adb
```
Verify connection:
```bash
adb devices
# Output should show: "List of devices attached"
# ZR222... device
```
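If you want to check the connection from a script rather than by eye, the sketch below parses the text printed by `adb devices` into a dictionary of serial numbers and states. The helper names and the sample serials are made up for illustration; only the `adb devices` command itself comes from the steps above.

```python
import subprocess


def parse_adb_devices(output):
    """Return {serial: state} from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def connected_devices():
    """Run `adb devices` and parse its output (requires adb on PATH)."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)


# Hypothetical sample output, so the parser can be tried without a phone attached.
sample = "List of devices attached\nABC123\tdevice\nDEF456\tunauthorized\n\n"
print(parse_adb_devices(sample))
```

A state of `device` means the phone is ready; `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet, and `offline` means adb can see the phone but cannot talk to it.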
Download the AutoGLM code from the official Zhipu AI / Open-AutoGLM repository.
```bash
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM

# Install required python packages
pip install -r requirements.txt
pip install -e .
```
Once installed, running the agent is straightforward. You will use a Python script to initiate the "Loop": Snapshot -> Inference -> Action.
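The Snapshot -> Inference -> Action loop can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual code: `plan` is a hypothetical stand-in for the model call, and the action tuples `("tap", x, y)` and `("done",)` are invented for the example. The `adb` invocations (`exec-out screencap -p` and `shell input tap`) are standard adb commands.

```python
import subprocess


def capture_screen(serial):
    """Snapshot: return the current screen as PNG bytes via adb."""
    result = subprocess.run(
        ["adb", "-s", serial, "exec-out", "screencap", "-p"],
        capture_output=True, check=True,
    )
    return result.stdout


def send_tap(serial, x, y):
    """Action: tap the screen at pixel (x, y) via adb."""
    subprocess.run(
        ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)],
        check=True,
    )


def run_loop(task, plan, capture, tap, max_steps=20):
    """Repeat Snapshot -> Inference -> Action until the planner reports "done".

    `plan(task, png_bytes, step)` is a hypothetical placeholder for the model;
    it returns ("tap", x, y) or ("done",). Returns the number of actions taken.
    """
    for step in range(max_steps):
        screenshot = capture()                  # Snapshot
        action = plan(task, screenshot, step)   # Inference
        if action[0] == "done":
            return step
        if action[0] == "tap":
            tap(action[1], action[2])           # Action
    raise RuntimeError("step limit reached before the task finished")


# Dry run with fake components, so no phone or model is needed.
taps = []
fake_plan = lambda task, png, step: ("tap", 540, 600) if step == 0 else ("done",)
steps = run_loop("open app", fake_plan, capture=lambda: b"",
                 tap=lambda x, y: taps.append((x, y)))
print(steps, taps)  # 1 [(540, 600)]
```

On a real device you would pass `lambda: capture_screen(serial)` and `lambda x, y: send_tap(serial, x, y)` instead of the fakes. The `max_steps` cap is a simple guard against an agent that keeps repeating the same action.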
The following command tells the agent to use the 9B model to perform a specific task.
```bash
python main.py \
  --device-id YOUR_DEVICE_SERIAL_HERE \
  --model "autoglm-phone-9b" \
  --prompt "Open YouTube and search for 'SEO optimization tips for 2025' and play the first video"
```
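If you plan to launch tasks from a Python script, a minimal sketch like the one below builds the same command as an argument list. It reuses only the flags shown above (`--device-id`, `--model`, `--prompt`); the helper name `build_command` is hypothetical, and the script is assumed to run from the repository directory.

```python
import shlex


def build_command(device_id, prompt, model="autoglm-phone-9b"):
    """Build the argument list for the single-task invocation shown above."""
    return [
        "python", "main.py",
        "--device-id", device_id,
        "--model", model,
        "--prompt", prompt,
    ]


prompt = "Open YouTube and search for 'SEO optimization tips for 2025'"
cmd = build_command("YOUR_DEVICE_SERIAL_HERE", prompt)

# shlex.join renders a shell-safe string, which is handy for logging.
print(shlex.join(cmd))

# To actually run it: subprocess.run(cmd, check=True)
```

Passing the prompt as a single list element sidesteps shell quoting problems when a prompt contains its own quotation marks.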
For continuous usage, you can run an interactive session where the agent stays alive, waiting for new commands.
```bash
python main.py --interactive --device-id YOUR_DEVICE_SERIAL
```
What happens next?
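1. Snapshot: the agent captures the current screen (adb shell screencap).
2. Inference: the model analyzes the screenshot and plans the next step.
3. Action: the agent sends a tap (adb shell input tap x y) to open the app.

How does AutoGLM stack up against other "Action Agents"?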
| Feature | AutoGLM-Phone-9B | AppAgent (Tencent) | Ferret-UI (Apple) | GPT-4o (OpenAI) |
|---|---|---|---|---|
| Primary Method | Visual Language Model (VLM) | VLM + XML Exploration | Multimodal Understanding | Generalist VLM |
| Android Success Rate | 36.2% (Highest) | ~25-30% | N/A (UI Focus only) | 31.2% |
| Open Source? | Yes (Code & Weights) | Yes | Yes (Weights only) | No (API only) |
| Interaction Speed | Moderate (Local Inference) | Slow (Exploration Phase) | Fast (Efficiency focus) | Variable (Network Latency) |
| Deployment | Local GPU or API | Local GPU | Research / Local | Cloud API |
| Best For... | End-to-end Automation | App Testing & Exploration | Screen Understanding | Chat & General Queries |
Why choose AutoGLM-Phone-9B?
Its prediction of precise (x, y) pixel coordinates is superior to that of generalist models, significantly reducing "missed clicks."

Common Issue 1: "ADB Device Offline"
Common Issue 2: Agent Clicks Wrong Location
Ensure the --resolution flag in the configuration matches your device.

Common Issue 3: Hallucinations (Stuck in a Loop)
Best Practice: The "Human-in-the-Loop"
For sensitive tasks (money transfer, deleting data), always run in "Interactive Mode" where the agent asks for confirmation before the final "Commit" tap.
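The confirmation step can be pictured as a small gate in front of the action executor. The sketch below is a hypothetical illustration, not the project's Interactive Mode: the keyword list, `needs_confirmation`, and `guarded_execute` are all invented for the example, and a real agent would need a more careful notion of what counts as sensitive.

```python
import re

# Hypothetical keyword list; tune it to the apps and tasks you automate.
SENSITIVE_WORDS = {"pay", "transfer", "delete", "purchase"}


def needs_confirmation(action_description):
    """True if the description contains any sensitive keyword as a whole word."""
    words = set(re.findall(r"[a-z]+", action_description.lower()))
    return bool(words & SENSITIVE_WORDS)


def guarded_execute(action_description, execute, ask=input):
    """Run `execute()` only after the user approves a sensitive action.

    Returns True if the action ran, False if the user declined.
    """
    if needs_confirmation(action_description):
        answer = ask(f"About to: {action_description}. Proceed? [y/N] ")
        if answer.strip().lower() != "y":
            return False
    execute()
    return True


# Dry run: the "user" declines, so the action is skipped.
ran = []
print(guarded_execute("Tap 'Pay now'", lambda: ran.append("paid"), ask=lambda _: "n"))
```

Matching whole words rather than substrings keeps a harmless description such as "open display settings" from tripping the gate on "pay".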
Currently, AutoGLM-Phone-9B relies on a PC to do the heavy lifting. However, 2025/2026 flagship phones (Snapdragon 8 Gen 5, Dimensity 9500) are incorporating powerful NPUs (Neural Processing Units) capable of running 7B-10B models directly on the device.
We expect an "AutoGLM-Mobile-Quantized" version soon, which would allow this agent to run entirely on your phone without a PC connection, draining battery but offering true, portable autonomy.
AutoGLM-Phone-9B is an AI agent that visually understands your Android phone’s screen and performs actions like tapping, typing, searching, navigating apps, and completing multi-step tasks automatically.
You need a PC with an NVIDIA GPU (16–24GB VRAM recommended), 32GB RAM, and around 50GB storage. Your Android phone must have USB debugging enabled.
AutoGLM-Phone-9B outperforms GPT-4o and Claude on AndroidLab, achieving a 36.2% success rate, because it is fine-tuned specifically for mobile UI interaction.
It cannot yet run on the phone alone. Today it requires a PC host, but future quantized versions may run directly on-device as NPUs grow more powerful.
The software and model weights are open-source. You pay only for your hardware or optional API usage.
AutoGLM-Phone-9B is not just a tech demo; it is a glimpse into the future of computing. By moving beyond text generation to actual action execution, it transforms the smartphone from a tool that demands your attention into an agent that saves it. Whether you are an SEO expert automating keyword research on mobile apps or a developer testing UI flows, AutoGLM offers the most advanced, open, and reliable toolkit available today.