git-dual-context

Version: v0.0.0-...-b103b01 | Published: Jan 24, 2026 | License: Apache-2.0

Repository

github.com/kerneldump/git-dual-context

Git Dual-Context Analysis

Automated Bug Diagnosis via Dual-Context Diff Analysis & LLMs

git-dual-context is a proof-of-concept tool that implements the Dual-Context Diff Analysis theoretical framework. It leverages Large Language Models (LLMs) to diagnose complex software bugs by analyzing commits through two distinct lenses:

- Standard Diff (Micro): The immediate changes introduced by a commit.
- Full Comparison Diff (Macro): The evolutionary changes of those files from the commit time to the current HEAD.

By synthesizing these two signals, the tool can identify "sleeper bugs"—issues that arise not from the immediate change, but from how that change interacts with future code evolution (refactors, new feature interactions, etc.).
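As a rough illustration of the two lenses, the sketch below builds the arguments for comparable `git diff` invocations. The helper names `standardDiffArgs` and `fullComparisonArgs` and the file name `filter.go` are hypothetical, and this is an assumption about equivalent git commands rather than the tool's actual implementation (the library example below imports go-git instead of shelling out to git).

```go
package main

import (
	"fmt"
	"strings"
)

// standardDiffArgs returns git arguments for the micro view:
// what the commit changed relative to its first parent.
// (A root commit has no parent, so this form would not apply to it.)
func standardDiffArgs(commit string) []string {
	return []string{"diff", commit + "^", commit}
}

// fullComparisonArgs returns git arguments for the macro view:
// how the given files evolved from the commit to the current HEAD.
func fullComparisonArgs(commit string, files []string) []string {
	args := []string{"diff", commit, "HEAD", "--"}
	return append(args, files...)
}

func main() {
	commit := "be8f779e"           // hash borrowed from the example output
	files := []string{"filter.go"} // hypothetical file touched by the commit
	fmt.Println("git " + strings.Join(standardDiffArgs(commit), " "))
	fmt.Println("git " + strings.Join(fullComparisonArgs(commit, files), " "))
}
```

Restricting the macro diff to the files the commit touched keeps the second view focused on how that specific change has aged.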

The Theory

This tool is the reference implementation for the paper:
Enhanced Bug Diagnosis via Dual-Context Diff Analysis

Traditionally, debugging focuses on "What did this commit change?". Use this tool when you need to answer: "How does this commit interact with the current state of the world?"

Features

- Automated Hypothesis Testing: Scans the last N commits and estimates the probability ($P(H_k|E)$) that each commit caused a given bug, reported as HIGH, MEDIUM, or LOW.
- Dual-Context Analysis:
  - Generates Standard Diffs (with context lines) to understand developer intent.
  - Generates Full Comparison Diffs to understand evolutionary context.
- LLM Integration: Uses Google's Gemini models with configurable model selection.
- Smart Filtering: Automatically excludes lock files, vendor directories, test files, CI/CD configs, and build artifacts to focus on logic changes and conserve tokens.
- Ordered Streaming Output: Results stream in commit order as they become available, so there is no waiting for all analyses to complete.
- Retry Logic: Automatic exponential backoff for rate limits and transient failures.
- Graceful Shutdown: Clean handling of Ctrl+C with proper cleanup.
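One way to get ordered streaming from concurrent workers is a small reorder buffer: each result carries its position in the scan, and a result is released only once every earlier position has been released. The sketch below is a minimal illustration of that idea; the `result` and `reorderBuffer` types are hypothetical and are not taken from the tool's source.

```go
package main

import "fmt"

// result pairs a commit's position in the scan with its analysis output.
type result struct {
	index int    // 0 = first commit in the requested order
	text  string // placeholder for the analysis payload
}

// reorderBuffer releases results in index order even if they arrive out of order.
type reorderBuffer struct {
	next    int            // index of the next result allowed out
	pending map[int]result // results that arrived early
}

func newReorderBuffer() *reorderBuffer {
	return &reorderBuffer{pending: make(map[int]result)}
}

// push stores r and returns every result that can now be emitted in order.
func (b *reorderBuffer) push(r result) []result {
	b.pending[r.index] = r
	var ready []result
	for {
		nextResult, ok := b.pending[b.next]
		if !ok {
			return ready
		}
		ready = append(ready, nextResult)
		delete(b.pending, b.next)
		b.next++
	}
}

func main() {
	buf := newReorderBuffer()
	// Simulate workers finishing out of order: 2, then 0, then 1.
	arrivals := []result{{index: 2, text: "third"}, {index: 0, text: "first"}, {index: 1, text: "second"}}
	for _, r := range arrivals {
		for _, out := range buf.push(r) {
			fmt.Println(out.index, out.text)
		}
	}
}
```

With this arrangement the first result can be printed as soon as it is ready, while later results wait only for their predecessors rather than for the whole batch.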

Usage

Prerequisites

- Go 1.21+ installed.
- A Google Gemini API key.

Installation

# Clone the repository
git clone https://github.com/kerneldump/git-dual-context.git
cd git-dual-context

# Build the binary
go build -o git-commit-analysis ./cmd/git-commit-analysis

Running the Tool

Set your API Key:

export GEMINI_API_KEY="your_api_key_here"

Run the Analysis:

./git-commit-analysis \
  -repo="https://github.com/kerneldump/signal-sentry.git" \
  -error="interval must be greater than 0, got -2" \
  -n 5

Command-Line Options

Flag	Default	Description
`-repo`	`.`	Path to git repository or remote URL
`-branch`	current HEAD	Branch to analyze
`-error`	(required)	The error message or bug description to analyze
`-n`	`5`	Number of commits to analyze
`-j`	`3`	Number of concurrent workers
`-model`	`models/gemini-flash-latest`	Gemini model to use
`-timeout`	`10m`	Timeout per commit analysis
`-o`	stdout	Output file path
`-apikey`	env `GEMINI_API_KEY`	Google Gemini API Key
`-v`	`false`	Verbose output (debug info)

Examples

# Analyze local repository
./git-commit-analysis -error="panic: index out of bounds" -n 10

# Analyze remote repository with custom model
./git-commit-analysis \
  -repo="https://github.com/user/repo.git" \
  -error="connection timeout" \
  -model="models/gemini-1.5-flash"

# Save output to file
./git-commit-analysis -error="nil pointer" -o results.json

# Analyze specific branch with verbose output
./git-commit-analysis \
  -branch="feature/auth" \
  -error="401 unauthorized" \
  -v

# Use fewer workers to avoid rate limits
./git-commit-analysis -error="timeout" -j 1 -n 20

Model Context Protocol (MCP)

This tool is available as an MCP Server, allowing you to use it directly within AI agents (like Gemini-CLI, Claude Desktop, or Cursor) to diagnose bugs in your local repositories.

The server exposes the analyze_root_cause tool, which wraps the core dual-context analysis logic.

For installation and usage instructions, see cmd/mcp-server/README.md.

Library Usage

git-dual-context can be used as a library in your Go projects.

Installation

go get github.com/kerneldump/git-dual-context

Basic Example

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/kerneldump/git-dual-context/pkg/analyzer"
	"github.com/go-git/go-git/v5"
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	apiKey := os.Getenv("GEMINI_API_KEY")

	client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

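	model := client.GenerativeModel("models/gemini-1.5-pro")

	// Open the repository in the current directory and resolve HEAD,
	// checking each error instead of discarding it.
	repo, err := git.PlainOpen(".")
	if err != nil {
		log.Fatal(err)
	}
	headRef, err := repo.Head()
	if err != nil {
		log.Fatal(err)
	}
	headCommit, err := repo.CommitObject(headRef.Hash())
	if err != nil {
		log.Fatal(err)
	}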

	errorMsg := "The system is returning a 500 error on the /login endpoint"
	
	// Perform the dual-context analysis
	result, err := analyzer.AnalyzeCommit(ctx, repo, headCommit, headCommit, errorMsg, model)
	if err != nil {
		log.Fatal(err)
	}

	if !result.Skipped {
		fmt.Printf("Probability: %s\n", result.Probability)
		fmt.Printf("Reasoning: %s\n", result.Reasoning)
	}
}

For more details, see examples/basic_usage/main.go.

Core Packages

- `pkg/analyzer`: The reasoning engine. Handles prompt construction, LLM interaction, and response parsing.
- `pkg/gitdiff`: Diff extraction and filtering logic. Handles standard and evolutionary diff generation.

Output Format (NDJSON)

The tool outputs results in Newline Delimited JSON (NDJSON) format. Results stream in commit order as they become available. Output types are distinguished by the type field:

Type	Description
`"result"`	Analysis findings with `hash`, `message`, `probability`, `reasoning`
`"log"`	Progress and status updates with `level`, `msg`, `timestamp`
`"summary"`	Final summary with `total`, `high`, `medium`, `low`, `skipped`, `errors`
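For consumers that prefer Go over `jq`, the stream can be decoded line by line and dispatched on the `type` field. The sketch below is an assumption-laden example: the `record` struct and the function names are hypothetical, only the field names come from the table above, and the sample input in `main` is made up for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// record holds the "result" fields listed in the table above.
// Fields that are absent from a given line simply stay empty.
type record struct {
	Type        string `json:"type"`
	Hash        string `json:"hash"`
	Message     string `json:"message"`
	Probability string `json:"probability"`
	Reasoning   string `json:"reasoning"`
}

// highProbability reads NDJSON from r and returns the "result" records rated HIGH.
func highProbability(r io.Reader) ([]record, error) {
	var out []record
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long reasoning lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var rec record
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, fmt.Errorf("decode line: %w", err)
		}
		if rec.Type == "result" && rec.Probability == "HIGH" {
			out = append(out, rec)
		}
	}
	return out, sc.Err()
}

// highProbabilityString is a convenience wrapper for in-memory input.
func highProbabilityString(s string) ([]record, error) {
	return highProbability(strings.NewReader(s))
}

func main() {
	// Made-up sample input in the documented shape.
	input := `{"type":"log","level":"INFO","msg":"starting"}
{"type":"result","hash":"be8f779e","probability":"HIGH","reasoning":"..."}
{"type":"result","hash":"26cb336c","probability":"LOW","reasoning":"..."}`
	recs, err := highProbabilityString(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range recs {
		fmt.Println(rec.Hash, rec.Probability)
	}
}
```

To consume a live run, pass `os.Stdin` (or the reader of a pipe) to `highProbability` instead of a string.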

Pro-tip: Filter with `jq`

# Show only high-probability commits
./git-commit-analysis -error="..." | jq 'select(.type=="result" and .probability=="HIGH")'

# Get clean result stream (no logs)
./git-commit-analysis -error="..." | jq 'select(.type=="result")'

# Show just the summary
./git-commit-analysis -error="..." | jq 'select(.type=="summary")'

Example Output

{"type":"log","level":"INFO","msg":"Cloning https://github.com/... into temporary directory...","timestamp":"2026-01-18T10:15:00Z"}
{"type":"log","level":"INFO","msg":"Analyzing last 5 commits for error: \"interval must be greater than 0, got -2\"","timestamp":"2026-01-18T10:15:05Z"}
{"type":"result","hash":"be8f779e","message":"Allow negative durations in TimeFilter","probability":"HIGH","reasoning":"The commit modifies NewTimeFilter to accept negative durations instead of ignoring them, which eventually reaches a ticker validation check."}
{"type":"result","hash":"1c932131","message":"Refactor axis bounds calculation","probability":"MEDIUM","reasoning":"The commit modifies axis bounds calculation, which could potentially result in negative intervals in edge cases."}
{"type":"result","hash":"26cb336c","message":"Update README documentation","probability":"LOW","reasoning":"Documentation only change."}
{"type":"summary","total":5,"high":1,"medium":1,"low":1,"skipped":2,"errors":0}

File Filtering

The tool automatically skips files that rarely cause logic bugs:

Category	Examples
Lock files	`go.sum`, `package-lock.json`, `yarn.lock`, `Cargo.lock`, `poetry.lock`
Test files	`_test.go`, `.test.js`, `.spec.ts`, `test_.py`
Vendor/deps	`vendor/`, `node_modules/`
Build output	`dist/`, `build/`, `out/`
CI/CD	`.github/workflows/`, `.gitlab-ci.yml`, `.travis.yml`
IDE config	`.idea/`, `.vscode/`
Cache	`__pycache__/`, `.pytest_cache/`
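A filter along these lines could be sketched as a single predicate over repository paths. The function name `shouldSkip`, the matching strategy, and the sample paths in `main` are assumptions for illustration; the actual rules live in `pkg/gitdiff` and may differ. In particular, the sketch reads `test_.py` as "Python files named `test_<something>.py`" and matches directory names at any depth.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

var (
	// Exact file names (lock files and single-file CI configs).
	skipNames = []string{"go.sum", "package-lock.json", "yarn.lock", "Cargo.lock", "poetry.lock", ".gitlab-ci.yml", ".travis.yml"}
	// File-name suffixes that mark test files.
	skipSuffixes = []string{"_test.go", ".test.js", ".spec.ts"}
	// Directory prefixes from the table, each ending in a slash.
	skipDirs = []string{"vendor/", "node_modules/", "dist/", "build/", "out/", ".github/workflows/", ".idea/", ".vscode/", "__pycache__/", ".pytest_cache/"}
)

// shouldSkip reports whether a slash-separated repository path matches
// one of the categories in the table above.
func shouldSkip(p string) bool {
	base := path.Base(p)
	for _, n := range skipNames {
		if base == n {
			return true
		}
	}
	for _, s := range skipSuffixes {
		if strings.HasSuffix(base, s) {
			return true
		}
	}
	// Assumption: `test_.py` means files named test_<something>.py.
	if strings.HasPrefix(base, "test_") && strings.HasSuffix(base, ".py") {
		return true
	}
	// Anchor on a leading slash so "out/" matches a directory named "out"
	// at any depth but not one named "outline".
	anchored := "/" + p
	for _, d := range skipDirs {
		if strings.Contains(anchored, "/"+d) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical paths, chosen to hit a few categories.
	paths := []string{"go.sum", "pkg/gitdiff/filter.go", "pkg/gitdiff/filter_test.go", "node_modules/left-pad/index.js"}
	for _, p := range paths {
		fmt.Printf("%-35s skip=%v\n", p, shouldSkip(p))
	}
}
```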

Limitations & Notes

- Token Usage: Analyzing large commits or many files consumes significant context. The tool filters irrelevant files and truncates large diffs (>50KB) automatically.
- Rate Limits: The tool includes automatic retry with exponential backoff for rate limit errors (429) and transient failures. Reduce `-j` workers if you still hit limits.
- API Key Security: Prefer the `GEMINI_API_KEY` environment variable over the `-apikey` flag (command-line args are visible in process lists).
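The retry behaviour described under Rate Limits follows a common pattern: wait, retry, and double the wait after each transient failure. The sketch below shows that pattern in isolation. The names `retry`, `backoffDelay`, and `errRateLimited` are hypothetical, and the tool's real delays, attempt limits, and error classification are not documented here, so treat the numbers as placeholders.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRateLimited stands in for an HTTP 429 or similar transient failure.
var errRateLimited = errors.New("rate limited")

// backoffDelay returns the wait before retry number attempt (0-based):
// base, 2*base, 4*base, and so on.
func backoffDelay(base time.Duration, attempt int) time.Duration {
	return base << attempt
}

// retry calls fn up to attempts times, backing off between transient
// failures. Other errors, and success, are returned immediately.
func retry(ctx context.Context, attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if err == nil || !errors.Is(err, errRateLimited) {
			return err
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(backoffDelay(base, i)):
		case <-ctx.Done(): // e.g. Ctrl+C cancelled the run
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(context.Background(), 4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errRateLimited // fail twice, then succeed
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Selecting on `ctx.Done()` during the wait lets a cancelled run stop promptly instead of sleeping through the remaining backoff.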

Development

Running Tests

go test ./... -v

Building

go build -o git-commit-analysis ./cmd/git-commit-analysis

License

MIT

Directories

Path	Synopsis
cmd/git-commit-analysis	command
cmd/mcp-server	command
cmd/mcp-server/internal/tools	
examples/basic_usage	command
pkg/analyzer	Package analyzer constants for git-dual-context
pkg/config	Package config provides configuration file support for git-dual-context
pkg/gitdiff	Package gitdiff provides utilities for extracting and filtering git diffs.
pkg/validator	Package validator provides input validation utilities for security and safety
