___      ___  __    ________
\  \    /  / |  |  /        |
 \  \  /  /  |  | |   (-----'
  \  \/  /   |  |  \   \
   \    /    |  | .----)   |
    \__/     |__| |________/
Visual Tester
VIS: Autonomous Android Testing Agent

VIS is an autonomous testing agent for Android that combines UIAutomator accessibility trees with multi-modal vision models. It sees the screen, understands context, and takes action, with no brittle XPath selectors required.
Core Value: Test Android apps like a human would.
- Semantic understanding: finds elements by meaning, not static IDs
- Self-healing: falls back to visual analysis when standard selectors fail
- Local sovereignty: runs models via Ollama, so your data stays on your machine
- Fast: optimized Go core with async capture and streaming
Quick Start
# Install
git clone https://github.com/uelkerd/vis.git
cd vis && make build
# Ensure prerequisites are running
ollama pull llama3.2-vision:11b # Vision model
adb devices # Verify device connected
# Run your first task
./bin/vis --task "open the Settings app"
Installation
From Source (recommended)
git clone https://github.com/uelkerd/vis.git
cd vis
make build # Binary at ./bin/vis
make install # Installs to $GOPATH/bin
From GitHub Releases
Download pre-built binaries for your platform from Releases:
# macOS (Apple Silicon)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Darwin_arm64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/
# Linux (x86_64)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Linux_x86_64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/
Prerequisites
| Dependency | Purpose | Install |
| --- | --- | --- |
| Go 1.24+ | Build from source | golang.org/dl |
| ADB | Android device control | brew install android-platform-tools or Android SDK |
| Ollama | Local vision model inference | ollama.com |
| Android device | Physical or emulated | USB debugging enabled |
Usage
Natural Language Tasks (--task)
Describe what you want in plain English. VIS parses the intent, resolves app names, and executes on the device.
# Launch apps (human-readable names resolved automatically)
vis --task "open the Settings app"
vis --task "open Calculator"
vis --task "open Chrome"
# Navigation
vis --task "scroll down"
vis --task "press back"
vis --task "go home"
# Interact with elements
vis --task "tap on the search button"
vis --task "type 'hello world' into the search field"
# With verbose logging
vis --task "open Settings" -v # DEBUG level
vis --task "open Settings" -vv # TRACE level (most detailed)
vis --task "open Settings" -q # Quiet (warnings/errors only)
Dry Run Mode (--dry-run)
Parse and plan without touching the device. This is useful for validating NLP parsing.
vis --task "open Calculator and type 123" --dry-run -v
# Logs: "dry-run: would execute action" with parsed intent details
Vision Streaming (--stream)
Continuous screen analysis: VIS captures the screen and describes what it sees in real time.
vis --stream # Run indefinitely (Ctrl+C to stop)
vis --stream -v # With debug output
Maestro Flows (--maestro)
Run structured test flows defined in YAML.
vis --maestro flows/login-test.yaml
vis --maestro flows/checkout.yaml -v
Hybrid Vision-Flows (--hybrid)
Combine structured flows with vision-based fallbacks.
vis --hybrid flows/search-flow.yaml
Test Cycles (--test-cycle)
Run a fixed number of repeated test iterations for stress testing.
vis --test-cycle 10 # Run 10 iterations
vis --test-cycle 50 -v # 50 iterations with debug logging
MCP Server Mode (--server)
Start VIS as an MCP (Model Context Protocol) server for integration with other tools.
vis --server # Start MCP server on stdin/stdout
vis --mcp # Alias for --server
Environment Setup (setup)
Check prerequisites and download required models.
vis setup
Device Targeting (--device)
Target a specific device when multiple are connected.
vis --task "open Settings" --device 29021FDH2009DQ
vis --task "open Settings" --device emulator-5554
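The serial passed to --device is the first column of `adb devices` output. As a rough sketch, the Go helper below pulls out the serials of devices that are ready to use. The `readyDevices` name is hypothetical and the sample output is hard-coded so the example runs without a device attached.

```go
package main

import (
	"fmt"
	"strings"
)

// readyDevices returns the serials listed in `adb devices` output whose
// state column is "device" (as opposed to "offline" or "unauthorized").
func readyDevices(output string) []string {
	var serials []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		// The header line and blank lines never have "device" as the second field.
		if len(fields) >= 2 && fields[1] == "device" {
			serials = append(serials, fields[0])
		}
	}
	return serials
}

func main() {
	// Sample output; on a real machine this would come from running `adb devices`.
	sample := "List of devices attached\n29021FDH2009DQ\tdevice\nemulator-5554\toffline\n"
	for _, serial := range readyDevices(sample) {
		fmt.Printf("vis --task %q --device %s\n", "open Settings", serial)
	}
}
```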
Report Control (--report)
Reports are written to reports/ by default. Old reports are cleaned up automatically, keeping the 10 most recent.
vis --task "open Settings" --report=false # Disable report generation
Environment Variables
| Variable | Default | Description |
| --- | --- | --- |
| VIS_MODEL | moondream:latest | Vision model for screen analysis |
| VIS_NLU_MODEL | llama3.1:latest | NLU model for natural language parsing |
| VIS_OLLAMA_URL | http://localhost:11434/api/generate | Ollama API endpoint |
| VIS_TIMEOUT | 120 | Model timeout in seconds |
| TEST_DEVICE_ID | (none) | Specific ADB device for tests |
# Example: configure for production use
export VIS_MODEL="llama3.2-vision:11b"
export VIS_NLU_MODEL="qwen-agentic:latest"
export VIS_TIMEOUT=180
Known Apps
VIS resolves human-readable app names to Android package IDs automatically:
| Name | Package |
| --- | --- |
| Settings | com.android.settings |
| Calculator | com.google.android.calculator |
| Chrome | com.android.chrome |
| Gmail | com.google.android.gm |
| Maps | com.google.android.apps.maps |
| Camera | com.google.android.GoogleCamera |
| Calendar | com.google.android.calendar |
| Phone | com.google.android.dialer |
| Files | com.google.android.apps.nbu.files |
| Clock | com.google.android.deskclock |
| Photos | com.google.android.apps.photos |
| Expo Go | host.exp.exponent |
Any unrecognized name is passed through as a raw package ID.
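The lookup-with-passthrough behavior can be pictured with a small Go sketch. The `knownApps` map copies a few rows from the table above, while the `resolveApp` helper and its case-insensitive matching are assumptions made for illustration. The actual resolver in VIS may normalize names differently.

```go
package main

import (
	"fmt"
	"strings"
)

// knownApps mirrors a few rows of the Known Apps table.
var knownApps = map[string]string{
	"settings":   "com.android.settings",
	"calculator": "com.google.android.calculator",
	"chrome":     "com.android.chrome",
	"expo go":    "host.exp.exponent",
}

// resolveApp (hypothetical helper) returns the package ID for a known app
// name. Unknown names are returned unchanged so raw package IDs pass through.
func resolveApp(name string) string {
	// Assumption: matching ignores case and surrounding whitespace.
	key := strings.ToLower(strings.TrimSpace(name))
	if pkg, ok := knownApps[key]; ok {
		return pkg
	}
	return name
}

func main() {
	fmt.Println(resolveApp("Settings"))          // known name
	fmt.Println(resolveApp("com.example.myapp")) // raw package ID passes through
}
```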
Architecture
VIS follows a Capture-Analyze-Decide-Act (CADA) autonomous agent loop:
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ CAPTURE │───▶│ ANALYZE │───▶│ DECIDE  │───▶│   ACT   │
│   ADB   │    │ Ollama  │    │  Agent  │    │   ADB   │
│screencap│    │ Vision  │    │   NLP   │    │  tap/   │
│ uidump  │    │  Model  │    │ Parser  │    │  swipe  │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
     ▲                                            │
     └────────────────────────────────────────────┘
                  (continuous loop)
- Capture: Screenshots via ADB with JPEG compression and caching
- Analyze: Vision models interpret screen content semantically
- Decide: NLP parser + agent logic determines the next action
- Act: ADB executes taps, swipes, inputs, key events
Project Structure
cmd/vis/            CLI entry point
internal/
├── adb/            ADB device control (taps, swipes, inputs, key events)
├── agent/          Core CADA loop orchestration
├── capture/        Screenshot acquisition and caching
├── config/         Environment-based configuration
├── hybrid/         Hybrid selector engine
├── livefeed/       Scrcpy live feed integration
├── mcp/            Model Context Protocol server
├── nlp/            Natural language task parsing
├── reporting/      HTML and JUnit report generation
├── resilience/     Circuit breaker and retry patterns
├── selector/       Self-healing element location engine
├── setup/          Ollama environment setup
├── types/          Shared domain types
└── vis/            Vision model client (Ollama API)
scripts/            Build & test automation
e2e/                End-to-end tests (requires device + Ollama)
Development
make build # Build binary
make test # Run unit tests
make test-cover # Run tests with coverage
make lint # Run linter
make clean # Clean build artifacts
# Physical device test suite (requires connected Android device + Ollama)
./scripts/device-test.sh
License
Distributed under the MIT License. See LICENSE for details.