Bulwarkai


An AI safety proxy that screens every request and response between client applications and Google Vertex AI. Written in Go, deployed on Cloud Run.
bulwarkai.cloud -- landing page, documentation, and API reference.
Why This Exists
This project should not need to exist. The day Google ships built-in prompt/response screening as a first-class Vertex AI feature, this repository gets archived.
Google is catching up. Agent Gateway (Private Preview) is the closest thing to Bulwarkai in Google's portfolio. It routes agent traffic through a governed gateway with Model Armor integration. But it is limited to the Gemini Enterprise Agent Platform; it does not proxy general Vertex AI traffic from arbitrary clients like opencode or Claude Code, does not translate API formats, and does not add structured data screening. The table below shows where each gap stands.
Without a network-level proxy, screening is a policy rather than a control. Library-based screening requires every client to integrate correctly; a single misconfigured client bypasses all of it. Bulwarkai enforces at the infrastructure layer. Clients cannot opt out.
Bulwarkai fills the gaps that remain today:
| Gap |
Status without Bulwarkai |
With Bulwarkai |
| Streaming content screening |
❌ Model Armor does not enforce on streamGenerateContent |
✅ via standalone Model Armor API |
| Structured data detection (SSN, keys) |
❌ Model Armor only covers RAI categories |
✅ regex + DLP |
| Per-user audit identity |
❌ service account only |
✅ forwarded user tokens |
| Model Armor only works on Gemini |
❌ only generateContent on Gemini models; standalone API does not cover third-party |
✅ regex + DLP work on any model |
| No cross-API support |
❌ Vertex AI speaks Gemini format only |
✅ Anthropic, OpenAI, Gemini |
| Client-to-agent proxying |
🔶 Agent Gateway (Preview) -- Agent Platform only |
✅ any Vertex AI client |
What it protects against
Structured data leakage. A developer pastes a customer record with an SSN. A private key from .env gets copied as context. An API key leaks into a code review the model is asked to summarise. The regex inspector catches known patterns in microseconds. The DLP inspector catches statistical matches ("this looks like a phone number") with configurable thresholds.
Content policy violations. Harmful content, prompt injection, jailbreaks, malicious URLs. The Model Armor inspector handles these using Google's content safety models. In strict mode, Vertex AI's built-in enforcement provides a second independent layer.
Credential exposure. AWS access keys, sk- prefixed API keys, email-and-password pairs in the same text. Detected before data leaves the network.
Why user tokens pass through
The service forwards the user's OAuth access token instead of using a service account. This means Vertex AI audit logs show which human made each request. If three developers route through the same proxy, the audit trail distinguishes between them.
The two-token design exists because Cloud Run IAM requires an OIDC identity token for invocation, while Vertex AI requires an OAuth access token. A single OAuth request with scope cloud-platform openid email returns both.
What Bulwarkai adds vs native Vertex AI
This table shows how Bulwarkai combines multiple Google services into a single enforcement layer. Model Armor covers Gemini non-streaming traffic, but skips streaming, third-party models, and structured data detection. Bulwarkai fills those gaps with regex, DLP, and the standalone Model Armor API where it works.
Vertex AI's built-in safety settings vary by model and endpoint. What Gemini blocks, Llama might allow. What generateContent catches, streamGenerateContent skips. There is no single audit trail that covers every request with the same rules, making it difficult to measure compliance consistently. Bulwarkai applies the same inspector chain to every request regardless of model, format, or streaming mode, and logs every screening decision in a structured format.
| Capability |
Vertex AI |
+ Model Armor |
+ Bulwarkai |
| Prompt screening for structured data (SSN, credit cards, private keys) |
❌ |
❌ RAI categories only |
✅ regex + DLP + Model Armor |
| Response screening for structured data |
❌ |
❌ RAI categories only |
✅ regex + DLP + Model Armor |
Streaming content screening (streamGenerateContent) |
❌ |
❌ not enforced on streaming |
✅ via standalone API |
| Content safety on third-party models (Anthropic, Llama) |
❌ |
❌ Model Armor does not work with third-party models |
✅ regex + DLP inspectors |
| Consistent safety enforcement across all models |
❌ varies by model |
❌ Gemini-only inline, skips streaming |
✅ same chain for every model |
| Unified audit trail for compliance |
❌ fragmented by model/endpoint |
❌ no standalone audit log |
✅ structured log for every request |
Content safety on Gemini generateContent |
✅ |
✅ inline integration |
✅ |
| Per-user audit identity in Vertex AI logs |
❌ service account |
❌ same |
✅ forwarded user tokens |
| User-Agent filtering |
❌ |
❌ |
✅ |
| Email domain allowlist |
❌ |
❌ |
✅ |
| API key authentication |
❌ |
❌ |
✅ |
| Prompt redaction in logs |
❌ |
❌ |
✅ |
| Post-response audit (audit mode) |
❌ |
❌ |
✅ |
| Pluggable inspector chain |
❌ |
❌ |
✅ |
| Cross-API format support (Anthropic, OpenAI, Gemini) |
❌ |
❌ |
✅ |
| Works with opencode |
❌ |
❌ |
✅ |
| Works with Claude Code |
❌ |
❌ |
✅ |
| Works with any OpenAI-compatible tool |
❌ |
❌ |
✅ |
| Works with curl / Gemini native |
✅ |
✅ |
✅ |
| First-party Gemini models |
✅ |
✅ |
✅ |
| Third-party models (Anthropic, Llama, etc.) |
✅ |
❌ |
✅ |
| Streaming support for code tools |
✅ |
✅ |
✅ |
| Agent-to-agent traffic governance |
❌ |
🔶 Agent Gateway (Preview) -- Agent Platform only |
❌ (not in scope) |
| Fail-closed when screening unavailable |
❌ |
❌ "skips sanitization and continues" |
✅ strict mode blocks on error |
Legend: ❌ not available 🔶 preview / partial ✅ available
Model Armor current state
| Feature |
Status |
Detail |
Gemini generateContent screening |
✅ |
Inline integration |
Gemini streamGenerateContent screening |
❌ |
Not enforced. Google is aware. |
| Third-party model screening (Anthropic, Llama) |
❌ |
Inline integration is Gemini-only |
| Structured data detection (SSN, keys, credentials) |
❌ |
Only RAI categories (hate, harassment, sexually explicit, dangerous) |
| Fail-open when unavailable |
❌ (a gap) |
"Skips sanitization and continues processing" |
Standalone sanitizeUserPrompt / sanitizeModelResponse API |
✅ |
Gemini only in practice. Bulwarkai uses regex + DLP for third-party models. |
| Agent Gateway integration (Preview) |
🔶 |
Routes agent traffic through Model Armor, but Agent Platform only |
Response modes
|
strict |
fast (alias: input_only) |
audit (alias: buffer) |
| Prompt screened |
✅ |
✅ |
✅ |
| Response screened |
✅ |
❌ |
audit only |
| Streaming |
❌ fake (single chunk) |
✅ real |
✅ real |
| Model Armor platform enforcement |
✅ (via generateContent) |
❌ (streaming bypass) |
❌ (streaming bypass) |
| Added latency |
~500ms (prompt + response) |
~200ms (prompt only) |
~200ms (prompt only) |
| Use case |
Maximum safety |
Lowest latency |
Audit trail |
| Gemini models |
✅ |
✅ |
✅ |
| Anthropic models (if enabled in Vertex AI) |
✅ |
✅ |
✅ |
Deployment impact on controls
| Control |
Cloud Run (production) |
Local (LOCAL_MODE) |
Behind VPC-SC |
| Authentication required |
✅ OIDC or API key |
❌ skipped |
✅ OIDC or API key |
| Domain allowlist |
✅ enforced |
❌ no email to check |
✅ enforced |
| User-Agent filter |
✅ enforced |
✅ enforced |
✅ enforced |
| Vertex AI uses user token |
✅ forwarded |
✅ ADC |
✅ forwarded |
| Inspector chain active |
✅ |
✅ |
✅ |
| Structured audit logs |
✅ Cloud Logging |
✅ stdout |
✅ Cloud Logging |
| Cannot bypass proxy |
❌ users can call Vertex AI directly |
❌ same machine has ADC creds |
✅ perimeter blocks direct access |
Client support
Bulwarkai translates three API formats so existing tools work without modification:
| Client |
Format |
Native Vertex AI |
With Bulwarkai |
| opencode |
OpenAI Chat Completions |
❌ |
✅ |
| Claude Code |
Anthropic Messages |
❌ |
✅ |
| curl / SDK |
Gemini native |
✅ |
✅ |
| Any OpenAI-compatible tool |
OpenAI Chat Completions |
❌ |
✅ |
Quick Start
Local bridge (no Cloud Run needed)
Run on your laptop as a persistent safety proxy. All prompts and responses from your AI tools get screened before reaching Vertex AI.
cp .env.example .env
# set GOOGLE_CLOUD_PROJECT, LOCAL_MODE=true, RESPONSE_MODE=fast
make dev
Point your tools at http://localhost:8080:
| Tool |
Config |
| opencode |
floorServiceUrl: http://localhost:8080 |
| Claude Code |
ANTHROPIC_BASE_URL=http://localhost:8080 |
| curl |
http://localhost:8080/v1/chat/completions |
Uses your gcloud ADC credentials. No OIDC tokens, no IAM, no deployment. All inspectors run normally.
Cloud Run deployment
make dev
See docs/deployment.md for Docker, Terraform, and production setup.
Testing with EICAR-style strings
Bulwarkai provides safe test strings that trigger each inspector without using real sensitive data. Inspired by the EICAR test file used to verify antivirus software.
curl https://bulwarkai-XXXXX.run.app/test-strings
Returns:
{
"ssn": "BULWARKAI-TEST-SSN-000-00-0000",
"credit_card": "BULWARKAI-TEST-CC-0000000000000000",
"private_key": "BULWARKAI-TEST-KEY-BEGIN RSA PRIVATE KEY-END",
"aws_key": "BULWARKAI-TEST-AWS-AKIA0000000000000000",
"api_key": "BULWARKAI-TEST-API-sk-00000000000000000000",
"credentials": "BULWARKAI-TEST-CRED-test@example.com password"
}
Send any of these as a prompt to verify the proxy blocks them:
curl -X POST https://bulwarkai-XXXXX.run.app/v1/chat/completions \
-H "X-Api-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{"model":"gemini-2.5-flash","max_tokens":1024,"messages":[{"role":"user","content":"BULWARKAI-TEST-SSN-000-00-0000"}]}'
All test strings are prefixed with BULWARKAI-TEST so they are clearly identifiable in logs.
Documentation
| Document |
Contents |
| docs/configuration.md |
Every environment variable, defaults, how to disable each feature |
| docs/swagger.yaml |
OpenAPI spec (generated from handler annotations) |
| docs/deployment.md |
Deployment, local dev, LOCAL_MODE, Terraform, Docker |
| docs/design.md |
Architecture (Mermaid diagrams), config, security approach |
| docs/inspectors.md |
Inspector interface, adding new ones, testing |
| docs/operations.md |
Monitoring, alerting, scaling, troubleshooting |
| docs/client-config.md |
opencode plugin, Claude Code, curl examples, geographic restrictions |
| docs/adr/ |
Architecture Decision Records (8 ADRs) |
| CONTRIBUTING.md |
How to contribute, code style, project structure |
GCP Prerequisites
APIs to enable
gcloud services enable run.googleapis.com \
artifactregistry.googleapis.com \
aiplatform.googleapis.com \
modelarmor.googleapis.com \
dlp.googleapis.com \
--project=YOUR_PROJECT_ID
IAM roles
Service account needs: roles/aiplatform.user, roles/modelarmor.user, roles/dlp.reader (if DLP enabled), roles/run.invoker.
Users need: roles/run.invoker on the Cloud Run service.
Model Armor template
gcloud model-armor templates create test-template \
--project=YOUR_PROJECT_ID \
--location=europe-west2 \
--rai-settings-filters='[
{"filterType":"HATE_SPEECH","confidenceLevel":"HIGH"},
{"filterType":"DANGEROUS","confidenceLevel":"MEDIUM_AND_ABOVE"},
{"filterType":"HARASSMENT","confidenceLevel":"HIGH"},
{"filterType":"SEXUALLY_EXPLICIT","confidenceLevel":"HIGH"}
]' \
--pi-and-jailbreak-filter-settings-enforcement=enabled \
--malicious-uri-filter-settings-enforcement=enabled
Or provision via terraform/model_armor.tf.
Org policies
constraints/gcp.resourceLocations may restrict resources to EU regions. All resources default to europe-west2. Cloud Build is not used because it creates US-based temporary resources.
constraints/run.allowedIngress and restrictions on allUsers/allAuthenticatedUsers may require IAM principal-based authentication.