vis

module
v1.0.0 Latest
Published: Apr 14, 2026 License: MIT

README

  ___      ___  __    ________  
  \  \    /  / |  |  /        | 
   \  \  /  /  |  | |   (-----' 
    \  \/  /   |  |  \   \      
     \    /    |  | .----)   |  
      \__/     |__| |________/  

     Visual Tester

VIS: Autonomous Android Testing Agent

VIS is an autonomous testing agent for Android that combines UIAutomator accessibility trees with multi-modal vision models. It sees the screen, understands context, and takes action, with no brittle XPath selectors required.

Core Value: Test Android apps like a human would.

  • 🧠 Semantic understanding: finds elements by meaning, not static IDs
  • 🔄 Self-healing: falls back to visual analysis when standard selectors fail
  • 🔒 Local sovereignty: runs models via Ollama, so your data stays on your machine
  • ⚡ Fast: optimized Go core with async capture and streaming

Quick Start

# Install
git clone https://github.com/uelkerd/vis.git
cd vis && make build

# Ensure prerequisites are running
ollama pull llama3.2-vision:11b   # Vision model
adb devices                       # Verify device connected

# Run your first task
./bin/vis --task "open the Settings app"

Installation

From Source

git clone https://github.com/uelkerd/vis.git
cd vis
make build          # Binary at ./bin/vis
make install        # Installs to $GOPATH/bin

From GitHub Releases

Download pre-built binaries for your platform from Releases:

# macOS (Apple Silicon)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Darwin_arm64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Linux_x86_64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/

Prerequisites

Dependency      Purpose                       Install
Go 1.24+        Build from source             golang.org/dl
ADB             Android device control        brew install android-platform-tools or Android SDK
Ollama          Local vision model inference  ollama.com
Android device  Physical or emulated          USB debugging enabled

Usage

Natural Language Tasks (--task)

Describe what you want in plain English. VIS parses the intent, resolves app names, and executes on the device.

# Launch apps (human-readable names resolved automatically)
vis --task "open the Settings app"
vis --task "open Calculator"
vis --task "open Chrome"

# Navigation
vis --task "scroll down"
vis --task "press back"
vis --task "go home"

# Interact with elements
vis --task "tap on the search button"
vis --task "type 'hello world' into the search field"

# With verbose logging
vis --task "open Settings" -v      # DEBUG level
vis --task "open Settings" -vv     # TRACE level (most detailed)
vis --task "open Settings" -q      # Quiet (warnings/errors only)

Dry Run Mode (--dry-run)

Parse and plan without touching the device; useful for validating NLP parsing.

vis --task "open Calculator and type 123" --dry-run -v
# Logs: "dry-run: would execute action" with parsed intent details

Vision Streaming (--stream)

Continuous screen analysis: VIS captures and describes what it sees in real time.

vis --stream                  # Run indefinitely (Ctrl+C to stop)
vis --stream -v               # With debug output

Maestro Flows (--maestro)

Run structured test flows defined in YAML.

vis --maestro flows/login-test.yaml
vis --maestro flows/checkout.yaml -v

Hybrid Vision-Flows (--hybrid)

Combine structured flows with vision-based fallbacks.

vis --hybrid flows/search-flow.yaml
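The fallback idea can be pictured as an ordered list of location strategies, where the vision path is consulted only after the deterministic selector reports a miss. The Go sketch below is illustrative only: `Point`, `strategy`, `locate`, and `errNotFound` are hypothetical names, and the stubbed strategies stand in for whatever VIS does internally.

```go
package main

import (
	"errors"
	"fmt"
)

// Point is a hypothetical screen coordinate.
type Point struct{ X, Y int }

// errNotFound signals an ordinary miss that should trigger the next strategy.
var errNotFound = errors.New("element not found")

// strategy pairs a label with a lookup function (both names are assumptions).
type strategy struct {
	Name string
	Find func(target string) (Point, error)
}

// locate tries each strategy in order and returns the first hit together with
// the name of the strategy that produced it.
func locate(target string, strategies ...strategy) (Point, string, error) {
	for _, s := range strategies {
		p, err := s.Find(target)
		if err == nil {
			return p, s.Name, nil
		}
		if !errors.Is(err, errNotFound) {
			// Unexpected failure: stop instead of masking it with a fallback.
			return Point{}, s.Name, err
		}
	}
	return Point{}, "", errNotFound
}

func main() {
	// Stub strategies: the selector misses, the vision lookup succeeds.
	selector := strategy{"selector", func(string) (Point, error) { return Point{}, errNotFound }}
	vision := strategy{"vision", func(string) (Point, error) { return Point{X: 540, Y: 120}, nil }}

	p, via, err := locate("search button", selector, vision)
	fmt.Println(p, via, err)
}
```

Ordering the strategies from cheapest to most expensive keeps the fast selector path as the common case and pays for a model call only on a miss.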

Test Cycles (--test-cycle)

Run continuous iteration cycles for stress testing.

vis --test-cycle 10            # Run 10 iterations
vis --test-cycle 50 -v         # 50 iterations with debug logging

MCP Server Mode (--server)

Start VIS as an MCP (Model Context Protocol) server for integration with other tools.

vis --server                  # Start MCP server on stdin/stdout
vis --mcp                     # Alias for --server

Environment Setup (setup)

Check prerequisites and download required models.

vis setup

Device Targeting (--device)

Target a specific device when multiple are connected.

vis --task "open Settings" --device 29021FDH2009DQ
vis --task "open Settings" --device emulator-5554
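The value passed to --device is the serial that `adb devices` prints in its first column. As a rough sketch (not part of VIS), the Go snippet below parses that output and keeps only entries whose state is `device`, meaning online and authorized. `parseDevices` is a hypothetical helper, and the sample output is hard-coded so the example runs without a device attached.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDevices extracts the serials of ready devices from plain `adb devices`
// output. Each device line has the form "<serial>\t<state>"; the header,
// blank lines, and daemon notices do not match that shape and are skipped.
func parseDevices(output string) []string {
	var serials []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 || fields[1] != "device" {
			continue // header, blank line, or a device that is offline/unauthorized
		}
		serials = append(serials, fields[0])
	}
	return serials
}

func main() {
	// Sample output; in practice this would come from running `adb devices`.
	sample := "List of devices attached\n" +
		"29021FDH2009DQ\tdevice\n" +
		"emulator-5554\toffline\n\n"

	for _, serial := range parseDevices(sample) {
		fmt.Printf("vis --task \"open Settings\" --device %s\n", serial)
	}
}
```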

Report Control (--report)

Reports are written to reports/ by default. Old reports are cleaned up automatically, keeping the 10 most recent.

vis --task "open Settings" --report=false   # Disable report generation
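A "keep the N most recent" retention policy can be sketched as a pure function that decides which reports to delete. Everything here is an assumption for illustration: the `report` type, the `toPrune` helper, the file names, and the choice of modification time as the ordering key are not taken from the VIS source.

```go
package main

import (
	"fmt"
	"sort"
)

// report is a hypothetical record of one generated report file.
type report struct {
	Name    string
	ModUnix int64 // modification time, seconds since the Unix epoch
}

// toPrune returns the names of the reports that should be deleted so that
// only the `keep` newest ones remain.
func toPrune(reports []report, keep int) []string {
	if keep < 0 {
		keep = 0
	}
	sorted := append([]report(nil), reports...) // copy; leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ModUnix > sorted[j].ModUnix })
	if len(sorted) <= keep {
		return nil
	}
	names := make([]string, 0, len(sorted)-keep)
	for _, r := range sorted[keep:] {
		names = append(names, r.Name)
	}
	return names
}

func main() {
	var reports []report
	for i := 1; i <= 12; i++ {
		reports = append(reports, report{Name: fmt.Sprintf("report-%02d.html", i), ModUnix: int64(i)})
	}
	// With 12 reports and keep=10, the two oldest are selected for deletion.
	fmt.Println(toPrune(reports, 10))
}
```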

Environment Variables

Variable        Default                              Description
VIS_MODEL       moondream:latest                     Vision model for screen analysis
VIS_NLU_MODEL   llama3.1:latest                      NLU model for natural language parsing
VIS_OLLAMA_URL  http://localhost:11434/api/generate  Ollama API endpoint
VIS_TIMEOUT     120                                  Model timeout in seconds
TEST_DEVICE_ID  (none)                               Specific ADB device for tests

# Example: configure for production use
export VIS_MODEL="llama3.2-vision:11b"
export VIS_NLU_MODEL="qwen-agentic:latest"
export VIS_TIMEOUT=180
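For readers curious how environment-based configuration with defaults typically looks in Go, here is a minimal sketch using the variable names and defaults from the table above. The `Config` struct, `loadConfig`, and the rule of falling back to 120 seconds on an invalid VIS_TIMEOUT are assumptions, not the actual contents of internal/config.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config is a hypothetical holder for the documented settings.
type Config struct {
	Model, NLUModel, OllamaURL string
	Timeout                    time.Duration
}

// lookupFunc matches the signature of os.LookupEnv, which makes the loader
// easy to exercise without touching the real environment.
type lookupFunc func(key string) (string, bool)

// loadConfig reads each variable and applies the documented default when it
// is unset or empty.
func loadConfig(lookup lookupFunc) Config {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	secs, err := strconv.Atoi(get("VIS_TIMEOUT", "120"))
	if err != nil || secs <= 0 {
		secs = 120 // assumption: invalid values fall back to the default
	}
	return Config{
		Model:     get("VIS_MODEL", "moondream:latest"),
		NLUModel:  get("VIS_NLU_MODEL", "llama3.1:latest"),
		OllamaURL: get("VIS_OLLAMA_URL", "http://localhost:11434/api/generate"),
		Timeout:   time.Duration(secs) * time.Second,
	}
}

func main() {
	cfg := loadConfig(os.LookupEnv)
	fmt.Printf("model=%s nlu=%s url=%s timeout=%s\n",
		cfg.Model, cfg.NLUModel, cfg.OllamaURL, cfg.Timeout)
}
```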

Known Apps

VIS resolves human-readable app names to Android package IDs automatically:

Name        Package
Settings    com.android.settings
Calculator  com.google.android.calculator
Chrome      com.android.chrome
Gmail       com.google.android.gm
Maps        com.google.android.apps.maps
Camera      com.google.android.GoogleCamera
Calendar    com.google.android.calendar
Phone       com.google.android.dialer
Files       com.google.android.apps.nbu.files
Clock       com.google.android.deskclock
Photos      com.google.android.apps.photos
Expo Go     host.exp.exponent

Any unrecognized name is passed through as a raw package ID.
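The lookup-with-passthrough behavior described above amounts to a map lookup with a default. The sketch below copies a few rows of the table; `resolveApp` is a hypothetical helper, and the case-insensitive, whitespace-trimmed matching is an assumption rather than confirmed VIS behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// knownApps mirrors a subset of the table above, keyed by lowercase name.
var knownApps = map[string]string{
	"settings":   "com.android.settings",
	"calculator": "com.google.android.calculator",
	"chrome":     "com.android.chrome",
	"expo go":    "host.exp.exponent",
}

// resolveApp returns the package ID for a known app name. Any other input is
// returned unchanged, so it can be used as a raw package ID.
func resolveApp(name string) string {
	key := strings.ToLower(strings.TrimSpace(name))
	if pkg, ok := knownApps[key]; ok {
		return pkg
	}
	return name
}

func main() {
	fmt.Println(resolveApp("Settings"))          // known name
	fmt.Println(resolveApp("com.example.myapp")) // passed through as-is
}
```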


Architecture

VIS follows a Capture-Analyze-Decide-Act (CADA) autonomous agent loop:

┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐
│  CAPTURE  │───▶│  ANALYZE  │───▶│  DECIDE   │───▶│    ACT    │
│ ADB       │    │ Ollama    │    │ Agent     │    │ ADB       │
│ screencap │    │ Vision    │    │ NLP       │    │ tap/      │
│ uidump    │    │ Model     │    │ Parser    │    │ swipe     │
└───────────┘    └───────────┘    └───────────┘    └───────────┘
      ▲                                                  │
      └──────────────────────────────────────────────────┘
                      (continuous loop)

  1. Capture: Screenshots via ADB with JPEG compression and caching
  2. Analyze: Vision models interpret screen content semantically
  3. Decide: NLP parser + agent logic determines the next action
  4. Act: ADB executes taps, swipes, inputs, key events
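The four stages can be read as a simple loop over pluggable functions. The following Go sketch shows that shape under stated assumptions: the function types, the `Action` struct, the "done" sentinel, and the step limit are all hypothetical, and the stubs in `main` replace real ADB and Ollama calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is a hypothetical description of one device interaction.
type Action struct {
	Kind string // e.g. "tap", "swipe", or "done" when the task is complete
	X, Y int
}

// One function type per stage (names are illustrative, not the VIS API).
type (
	CaptureFunc func() ([]byte, error)
	AnalyzeFunc func(screenshot []byte) (string, error)
	DecideFunc  func(task, screen string) (Action, error)
	ActFunc     func(Action) error
)

var errStepLimit = errors.New("step limit reached")

// runLoop repeats capture, analyze, decide, act until the decider reports
// "done" or maxSteps is exhausted. It returns the number of steps taken.
func runLoop(task string, maxSteps int, capture CaptureFunc, analyze AnalyzeFunc, decide DecideFunc, act ActFunc) (int, error) {
	for step := 1; step <= maxSteps; step++ {
		shot, err := capture()
		if err != nil {
			return step, fmt.Errorf("capture: %w", err)
		}
		screen, err := analyze(shot)
		if err != nil {
			return step, fmt.Errorf("analyze: %w", err)
		}
		action, err := decide(task, screen)
		if err != nil {
			return step, fmt.Errorf("decide: %w", err)
		}
		if action.Kind == "done" {
			return step, nil
		}
		if err := act(action); err != nil {
			return step, fmt.Errorf("act: %w", err)
		}
	}
	return maxSteps, errStepLimit
}

func main() {
	opened := false // stub device state: flips once the tap has been "executed"
	steps, err := runLoop("open Settings", 5,
		func() ([]byte, error) { return []byte("fake-screenshot"), nil },
		func([]byte) (string, error) {
			if opened {
				return "settings", nil
			}
			return "home", nil
		},
		func(task, screen string) (Action, error) {
			if screen == "settings" {
				return Action{Kind: "done"}, nil
			}
			return Action{Kind: "tap", X: 100, Y: 200}, nil
		},
		func(Action) error { opened = true; return nil },
	)
	fmt.Println(steps, err)
}
```

Because every iteration starts with a fresh capture, each decision is based on what is currently on screen rather than on an assumed outcome of the previous action.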

Project Structure

cmd/vis/              CLI entry point
internal/
├── adb/              ADB device control (taps, swipes, inputs, key events)
├── agent/            Core CADA loop orchestration
├── capture/          Screenshot acquisition and caching
├── config/           Environment-based configuration
├── hybrid/           Hybrid selector engine
├── livefeed/         Scrcpy live feed integration
├── mcp/              Model Context Protocol server
├── nlp/              Natural language task parsing
├── reporting/        HTML and JUnit report generation
├── resilience/       Circuit breaker and retry patterns
├── selector/         Self-healing element location engine
├── setup/            Ollama environment setup
├── types/            Shared domain types
└── vis/              Vision model client (Ollama API)
scripts/              Build & test automation
e2e/                  End-to-end tests (requires device + Ollama)

Development

make build          # Build binary
make test           # Run unit tests
make test-cover     # Run tests with coverage
make lint           # Run linter
make clean          # Clean build artifacts

# Physical device test suite (requires connected Android device + Ollama)
./scripts/device-test.sh

License

Distributed under the MIT License. See LICENSE for details.

Directories

Path                 Synopsis
cmd/vis              (command)
internal/adb
internal/agent       Package agent provides hybrid executor adapter for Maestro flows.
internal/hybrid      Package hybrid provides hybrid execution mode that combines deterministic Maestro selectors with vision-based healing for self-healing tests.
internal/maestro     Package maestro provides Maestro-style YAML flow parsing and execution.
internal/mcp
internal/nlp
internal/resilience  Package resilience provides circuit breaker and retry patterns for external service calls.
internal/selector    Package selector provides self-healing element location strategies for UI automation.
internal/setup       Package setup provides first-run setup utilities for the Vision agent.
internal/tasks       Package tasks provides built-in task implementations for common Android testing operations.
internal/textutil    Package textutil provides text processing utilities for internationalization.
internal/types       Package types defines shared domain types for the Vision Debugging Agent.
internal/vis
