vis

module

v1.0.0 (latest) | Published: Apr 14, 2026 | License: MIT

Repository

github.com/uelkerd/vis

README

  ___      ___  __    ________  
  \  \    /  / |  |  /        | 
   \  \  /  /  |  | |   (-----' 
    \  \/  /   |  |  \   \      
     \    /    |  | .----)   |  
      \__/     |__| |________/  

     Visual Tester

VIS — Autonomous Android Testing Agent

VIS is an autonomous testing agent for Android that combines UIAutomator accessibility trees with multi-modal vision models. It sees the screen, understands context, and takes action — no brittle XPath selectors required.

Core Value: Test Android apps like a human would.

🧠 Semantic understanding — finds elements by meaning, not static IDs
🔄 Self-healing — falls back to visual analysis when standard selectors fail
🔒 Local sovereignty — runs models via Ollama, your data stays on your machine
⚡ Fast — optimized Go core with async capture and streaming

Quick Start

# Install
git clone https://github.com/uelkerd/vis.git
cd vis && make build

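# Ensure prerequisites are in place
ollama pull llama3.2-vision:11b          # Download the vision model
export VIS_MODEL="llama3.2-vision:11b"   # Use it instead of the default (moondream:latest)
adb devices                              # Verify device connected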

# Run your first task
./bin/vis --task "open the Settings app"

Installation

From Source (recommended)

git clone https://github.com/uelkerd/vis.git
cd vis
make build          # Binary at ./bin/vis
make install        # Installs to $GOPATH/bin

From GitHub Releases

Download pre-built binaries for your platform from Releases:

# macOS (Apple Silicon)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Darwin_arm64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Linux_x86_64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/

Prerequisites

Dependency	Purpose	Install
Go 1.24+	Build from source	golang.org/dl
ADB	Android device control	`brew install android-platform-tools` or Android SDK
Ollama	Local vision model inference	ollama.com
Android device	Physical or emulated	USB debugging enabled

Usage

Natural Language Tasks (`--task`)

Describe what you want in plain English. VIS parses the intent, resolves app names, and executes on the device.

# Launch apps (human-readable names resolved automatically)
vis --task "open the Settings app"
vis --task "open Calculator"
vis --task "open Chrome"

# Navigation
vis --task "scroll down"
vis --task "press back"
vis --task "go home"

# Interact with elements
vis --task "tap on the search button"
vis --task "type 'hello world' into the search field"

# With verbose logging
vis --task "open Settings" -v      # DEBUG level
vis --task "open Settings" -vv     # TRACE level (most detailed)
vis --task "open Settings" -q      # Quiet (warnings/errors only)
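
The flag-to-level mapping above could be wired up roughly as follows. This is a minimal sketch using Go's standard log/slog package, not VIS's actual logging code: `levelFor` and `LevelTrace` are hypothetical names, the INFO default is an assumption, and so is letting -q take priority over -v.

```go
package main

import (
	"log/slog"
	"os"
)

// LevelTrace is a hypothetical level below slog.LevelDebug; log/slog has no
// built-in TRACE level, so one is defined here for illustration.
const LevelTrace = slog.Level(-8)

// levelFor maps the number of -v flags and the -q flag to a log level.
// Assumptions: INFO is the default and -q wins over -v.
func levelFor(verbosity int, quiet bool) slog.Level {
	switch {
	case quiet:
		return slog.LevelWarn
	case verbosity >= 2:
		return LevelTrace
	case verbosity == 1:
		return slog.LevelDebug
	default:
		return slog.LevelInfo
	}
}

func main() {
	// Equivalent of running with a single -v.
	opts := &slog.HandlerOptions{Level: levelFor(1, false)}
	logger := slog.New(slog.NewTextHandler(os.Stderr, opts))
	logger.Debug("parsed intent", "action", "open", "target", "Settings")
}
```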

Dry Run Mode (`--dry-run`)

Parse and plan without touching the device — useful for validating NLP parsing.

vis --task "open Calculator and type 123" --dry-run -v
# Logs: "dry-run: would execute action" with parsed intent details

Vision Streaming (`--stream`)

Continuous screen analysis: VIS captures the screen repeatedly and describes what it sees in real time.

vis --stream                  # Run indefinitely (Ctrl+C to stop)
vis --stream -v               # With debug output
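
Each analysis step presumably sends a captured frame to the Ollama endpoint listed under Environment Variables. The sketch below only builds the JSON body that Ollama's `/api/generate` endpoint accepts for a vision prompt (`model`, `prompt`, base64 `images`, `stream`). The `generateRequest` type, the `buildRequest` helper, and the prompt text are illustrative and not taken from the VIS source.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// generateRequest holds the subset of Ollama /api/generate fields that a
// single vision call needs.
type generateRequest struct {
	Model  string   `json:"model"`
	Prompt string   `json:"prompt"`
	Images []string `json:"images,omitempty"` // base64-encoded image bytes
	Stream bool     `json:"stream"`
}

// buildRequest encodes one captured frame into a request body.
// Hypothetical helper; VIS may structure its client differently.
func buildRequest(model, prompt string, frame []byte) ([]byte, error) {
	req := generateRequest{
		Model:  model,
		Prompt: prompt,
		Images: []string{base64.StdEncoding.EncodeToString(frame)},
		Stream: false, // ask for one complete JSON response
	}
	return json.Marshal(req)
}

func main() {
	// Two placeholder bytes stand in for a real JPEG screenshot.
	body, err := buildRequest("moondream:latest", "Describe the screen.", []byte{0xFF, 0xD8})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(body))
}
```

The body would then be sent with an HTTP POST to the URL in `VIS_OLLAMA_URL`.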

Maestro Flows (`--maestro`)

Run structured test flows defined in YAML.

vis --maestro flows/login-test.yaml
vis --maestro flows/checkout.yaml -v

Hybrid Vision-Flows (`--hybrid`)

Combine structured flows with vision-based fallbacks.

vis --hybrid flows/search-flow.yaml
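
The fallback idea can be sketched as a chain of locators that are tried in order: a deterministic selector first, then a vision-based one. This is an illustration under assumed names (`Locator`, `withFallback`, `ErrNotFound`, and the stub locators are all hypothetical), not the hybrid package's real API.

```go
package main

import (
	"errors"
	"fmt"
)

// Point is a screen coordinate in pixels.
type Point struct{ X, Y int }

// Locator finds the on-screen position of the element described by target.
type Locator func(target string) (Point, error)

// ErrNotFound reports that no locator in the chain found the element.
var ErrNotFound = errors.New("element not found")

// withFallback returns a Locator that tries each locator in order and
// returns the first successful result.
func withFallback(locators ...Locator) Locator {
	return func(target string) (Point, error) {
		errs := []error{ErrNotFound}
		for _, locate := range locators {
			p, err := locate(target)
			if err == nil {
				return p, nil
			}
			errs = append(errs, err)
		}
		return Point{}, errors.Join(errs...)
	}
}

func main() {
	// Stub: the structured selector fails, e.g. because a resource ID changed.
	bySelector := func(target string) (Point, error) {
		return Point{}, fmt.Errorf("selector for %q did not match", target)
	}
	// Stub: the vision model reports where the element appears on screen.
	byVision := func(target string) (Point, error) {
		return Point{X: 540, Y: 1200}, nil
	}

	locate := withFallback(bySelector, byVision)
	p, err := locate("search button")
	fmt.Println(p, err)
}
```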

Test Cycles (`--test-cycle`)

Run continuous iteration cycles for stress testing.

vis --test-cycle 10            # Run 10 iterations
vis --test-cycle 50 -v         # 50 iterations with debug logging

MCP Server Mode (`--server`)

Start VIS as an MCP (Model Context Protocol) server for integration with other tools.

vis --server                  # Start MCP server on stdin/stdout
vis --mcp                     # Alias for --server

Environment Setup (`setup`)

Check prerequisites and download required models.

vis setup

Device Targeting (`--device`)

Target a specific device when multiple are connected.

vis --task "open Settings" --device 29021FDH2009DQ
vis --task "open Settings" --device emulator-5554
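
Device selection can be illustrated by parsing the output of `adb devices`. The sketch below is not VIS's implementation: `parseDevices` and `pickDevice` are hypothetical helpers, and the selection policy (an explicit serial must be attached and ready; without one, exactly one device must be attached) is an assumption.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// parseDevices extracts the serials of ready devices from `adb devices`
// output. Device lines look like "emulator-5554\tdevice"; the header line
// and devices in other states (offline, unauthorized) are skipped.
func parseDevices(out string) []string {
	var serials []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[1] == "device" {
			serials = append(serials, fields[0])
		}
	}
	return serials
}

// pickDevice chooses a serial. Assumed policy, not documented VIS behavior.
func pickDevice(serials []string, want string) (string, error) {
	if want != "" {
		for _, s := range serials {
			if s == want {
				return s, nil
			}
		}
		return "", fmt.Errorf("device %q is not attached or not ready", want)
	}
	switch len(serials) {
	case 0:
		return "", errors.New("no ready devices")
	case 1:
		return serials[0], nil
	default:
		return "", fmt.Errorf("%d devices attached; pass --device", len(serials))
	}
}

func main() {
	out := "List of devices attached\n29021FDH2009DQ\tdevice\nemulator-5554\toffline\n\n"
	serial, err := pickDevice(parseDevices(out), "")
	fmt.Println(serial, err)
}
```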

Report Control (`--report`)

Reports are written to reports/ by default. Older reports are cleaned up automatically so that only the 10 most recent remain.

vis --task "open Settings" --report=false   # Disable report generation
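
A keep-the-newest-N cleanup like the one described for reports/ might look like the sketch below. It only decides which entries are stale and leaves file deletion to the caller. The `report` type, the `stale` and `demoReports` helpers, and the file names are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

// report describes one generated report file.
type report struct {
	Name    string
	ModTime time.Time
}

// stale returns the reports that fall outside the keep most recent ones,
// ordered newest first. The input slice is not modified.
func stale(reports []report, keep int) []report {
	if keep < 0 {
		keep = 0
	}
	sorted := slices.Clone(reports)
	slices.SortFunc(sorted, func(a, b report) int {
		return b.ModTime.Compare(a.ModTime) // newest first
	})
	if len(sorted) <= keep {
		return nil
	}
	return sorted[keep:]
}

// demoReports builds n sample reports, each one minute newer than the last.
func demoReports(n int) []report {
	base := time.Date(2026, time.April, 14, 9, 0, 0, 0, time.UTC)
	reports := make([]report, n)
	for i := range reports {
		reports[i] = report{
			Name:    fmt.Sprintf("report-%02d.html", i),
			ModTime: base.Add(time.Duration(i) * time.Minute),
		}
	}
	return reports
}

func main() {
	for _, r := range stale(demoReports(12), 10) {
		fmt.Println("would remove", r.Name)
	}
}
```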

Environment Variables

Variable	Default	Description
`VIS_MODEL`	`moondream:latest`	Vision model for screen analysis
`VIS_NLU_MODEL`	`llama3.1:latest`	NLU model for natural language parsing
`VIS_OLLAMA_URL`	`http://localhost:11434/api/generate`	Ollama API endpoint
`VIS_TIMEOUT`	`120`	Model timeout in seconds
`TEST_DEVICE_ID`	(none)	Specific ADB device for tests

# Example: configure for production use
export VIS_MODEL="llama3.2-vision:11b"
export VIS_NLU_MODEL="qwen-agentic:latest"
export VIS_TIMEOUT=180
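
For readers curious how environment-based configuration with these defaults could be loaded, here is a minimal sketch. The variable names and default values come from the table above; the `Config` type, the `load` and `orDefault` helpers, and the rejection of non-positive timeouts are assumptions rather than the config package's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config mirrors the documented environment variables.
type Config struct {
	Model     string
	NLUModel  string
	OllamaURL string
	Timeout   time.Duration
}

// orDefault returns value, or fallback when value is empty.
func orDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}

// load reads settings through getenv so that tests can inject values.
func load(getenv func(string) string) (Config, error) {
	raw := orDefault(getenv("VIS_TIMEOUT"), "120")
	secs, err := strconv.Atoi(raw)
	if err != nil || secs <= 0 {
		return Config{}, fmt.Errorf("VIS_TIMEOUT must be a positive number of seconds, got %q", raw)
	}
	return Config{
		Model:     orDefault(getenv("VIS_MODEL"), "moondream:latest"),
		NLUModel:  orDefault(getenv("VIS_NLU_MODEL"), "llama3.1:latest"),
		OllamaURL: orDefault(getenv("VIS_OLLAMA_URL"), "http://localhost:11434/api/generate"),
		Timeout:   time.Duration(secs) * time.Second,
	}, nil
}

func main() {
	cfg, err := load(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```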

Known Apps

VIS resolves human-readable app names to Android package IDs automatically:

Name	Package
Settings	`com.android.settings`
Calculator	`com.google.android.calculator`
Chrome	`com.android.chrome`
Gmail	`com.google.android.gm`
Maps	`com.google.android.apps.maps`
Camera	`com.google.android.GoogleCamera`
Calendar	`com.google.android.calendar`
Phone	`com.google.android.dialer`
Files	`com.google.android.apps.nbu.files`
Clock	`com.google.android.deskclock`
Photos	`com.google.android.apps.photos`
Expo Go	`host.exp.exponent`

Any unrecognized name is passed through as a raw package ID.
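
The lookup with pass-through can be sketched as a map with a fallback. The entries are copied from the table above; the `resolvePackage` name and the case-insensitive matching are assumptions about how the resolution might work.

```go
package main

import (
	"fmt"
	"strings"
)

// knownApps maps lower-cased display names to Android package IDs.
var knownApps = map[string]string{
	"settings":   "com.android.settings",
	"calculator": "com.google.android.calculator",
	"chrome":     "com.android.chrome",
	"gmail":      "com.google.android.gm",
	"maps":       "com.google.android.apps.maps",
	"camera":     "com.google.android.GoogleCamera",
	"calendar":   "com.google.android.calendar",
	"phone":      "com.google.android.dialer",
	"files":      "com.google.android.apps.nbu.files",
	"clock":      "com.google.android.deskclock",
	"photos":     "com.google.android.apps.photos",
	"expo go":    "host.exp.exponent",
}

// resolvePackage returns the package ID for a known app name. Any other
// input is returned unchanged (apart from trimmed whitespace) and treated
// as a raw package ID.
func resolvePackage(name string) string {
	trimmed := strings.TrimSpace(name)
	if pkg, ok := knownApps[strings.ToLower(trimmed)]; ok {
		return pkg
	}
	return trimmed
}

func main() {
	for _, name := range []string{"Settings", "Expo Go", "com.example.app"} {
		fmt.Printf("%s -> %s\n", name, resolvePackage(name))
	}
}
```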

Architecture

VIS follows a Capture-Analyze-Decide-Act (CADA) autonomous agent loop:

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌─────────┐
│ CAPTURE  │───▶│ ANALYZE  │───▶│  DECIDE  │───▶│   ACT   │
│ ADB      │    │ Ollama   │    │ Agent    │    │ ADB     │
│ screencap│    │ Vision   │    │ NLP      │    │ tap/    │
│ uidump   │    │ Model    │    │ Parser   │    │ swipe   │
└──────────┘    └──────────┘    └──────────┘    └─────────┘
     ▲                                               │
     └───────────────────────────────────────────────┘
                    (continuous loop)

Capture — Screenshots via ADB with JPEG compression and caching
Analyze — Vision models interpret screen content semantically
Decide — NLP parser + agent logic determines the next action
Act — ADB executes taps, swipes, inputs, key events

Project Structure

cmd/vis/              CLI entry point
internal/
├── adb/              ADB device control (taps, swipes, inputs, key events)
├── agent/            Core CADA loop orchestration
├── capture/          Screenshot acquisition and caching
├── config/           Environment-based configuration
├── hybrid/           Hybrid selector engine
├── livefeed/         Scrcpy live feed integration
├── mcp/              Model Context Protocol server
├── nlp/              Natural language task parsing
├── reporting/        HTML and JUnit report generation
├── resilience/       Circuit breaker and retry patterns
├── selector/         Self-healing element location engine
├── setup/            Ollama environment setup
├── types/            Shared domain types
└── vis/              Vision model client (Ollama API)
scripts/              Build & test automation
e2e/                  End-to-end tests (requires device + Ollama)

Development

make build          # Build binary
make test           # Run unit tests
make test-cover     # Run tests with coverage
make lint           # Run linter
make clean          # Clean build artifacts

# Physical device test suite (requires connected Android device + Ollama)
./scripts/device-test.sh

License

Distributed under the MIT License. See LICENSE for details.

Directories

Path	Synopsis
cmd/vis	(command)
internal/adb
internal/agent	Package agent provides hybrid executor adapter for Maestro flows.
internal/capture
internal/config
internal/hybrid	Package hybrid provides hybrid execution mode that combines deterministic Maestro selectors with vision-based healing for self-healing tests.
internal/livefeed
internal/maestro	Package maestro provides Maestro-style YAML flow parsing and execution.
internal/mcp
internal/nlp
internal/reporting
internal/resilience	Package resilience provides circuit breaker and retry patterns for external service calls.
internal/selector	Package selector provides self-healing element location strategies for UI automation.
internal/setup	Package setup provides first-run setup utilities for the Vision agent.
internal/storage
internal/tasks	Package tasks provides built-in task implementations for common Android testing operations.
internal/textutil	Package textutil provides text processing utilities for internationalization.
internal/types	Package types defines shared domain types for the Vision Debugging Agent.
internal/vis
