Cache Plugin Configuration Guide
Overview
The Cache Plugin is a high-performance caching solution for AI API requests that helps reduce latency and costs by storing and reusing responses for identical requests. It supports both in-memory caching and Redis, making it suitable for distributed deployments.
Features
- Dual Storage: Supports both in-memory cache and Redis for flexible deployment options
- Automatic Fallback: Automatically falls back to in-memory cache when Redis is unavailable
- Content-Based Caching: Uses SHA256 hash of request body to generate cache keys
- Configurable TTL: Set custom time-to-live for cached items
- Size Limits: Configurable maximum item size to prevent memory issues
- Cache Headers: Optional headers to indicate cache hits
- Zero-Copy Design: Efficient memory usage through buffer pooling
Configuration Example
```json
{
  "model": "gpt-4",
  "type": 1,
  "plugin": {
    "cache": {
      "enable": true,
      "ttl": 300,
      "item_max_size": 1048576,
      "add_cache_hit_header": true,
      "cache_hit_header": "X-Cache-Status"
    }
  }
}
```
Configuration Fields
Plugin Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| enable | bool | Yes | false | Whether to enable the Cache plugin |
| ttl | int | No | 300 | Time-to-live for cached items (in seconds) |
| item_max_size | int | No | 1048576 (1 MB) | Maximum size of a single cached item (in bytes) |
| add_cache_hit_header | bool | No | false | Whether to add a header indicating a cache hit |
| cache_hit_header | string | No | "X-Aiproxy-Cache" | Name of the cache hit header |
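For instance, a minimal configuration only needs to set enable; every other field falls back to the defaults listed above:

```json
{
  "plugin": {
    "cache": {
      "enable": true
    }
  }
}
```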
How It Works
Cache Key Generation
The plugin generates cache keys based on:
- Request pattern (e.g., chat completions)
- SHA256 hash of the request body
This ensures identical requests hit the cache while different requests don't interfere with each other.
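As a rough sketch of this scheme, a key can be built from the request pattern plus a hex-encoded SHA256 digest of the raw request body. The "cache:" prefix, the separator, and the pattern string below are illustrative assumptions, not the plugin's exact key layout:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey combines the request pattern with a SHA256 digest of the body.
// The "cache:" prefix and ":" separator are assumptions for illustration.
func cacheKey(pattern string, body []byte) string {
	sum := sha256.Sum256(body)
	return "cache:" + pattern + ":" + hex.EncodeToString(sum[:])
}

func main() {
	body := []byte(`{"model":"gpt-4","messages":[{"role":"user","content":"hi"}]}`)
	fmt.Println(cacheKey("chat-completions", body))
}
```

Because the digest covers the whole body, any change to the model, messages, or parameters produces a different key and therefore a cache miss.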
Cache Storage
The plugin uses a two-tier caching strategy:
- Redis (if available): Primary storage for distributed caching
- Memory: Fallback storage or primary when Redis is not configured
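The lookup order can be pictured as a small helper that consults Redis first when it is configured and otherwise reads the in-memory tier. The Store interface, memStore type, and function names below are assumptions for illustration; this in-memory tier skips TTL enforcement and eviction to stay short:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Store is a minimal cache-tier interface used for this sketch;
// the plugin's real types and method names may differ.
type Store interface {
	Get(key string) ([]byte, bool)
	Set(key string, val []byte, ttl time.Duration)
}

// memStore is a trivial in-memory tier (no TTL enforcement or eviction here).
type memStore struct {
	mu    sync.Mutex
	items map[string][]byte
}

func newMemStore() *memStore { return &memStore{items: map[string][]byte{}} }

func (m *memStore) Get(key string) ([]byte, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.items[key]
	return v, ok
}

func (m *memStore) Set(key string, val []byte, _ time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.items[key] = val
}

// lookup consults Redis first when it is configured, then the memory tier.
func lookup(redis, memory Store, key string) ([]byte, bool) {
	if redis != nil {
		if v, ok := redis.Get(key); ok {
			return v, true
		}
	}
	return memory.Get(key)
}

func main() {
	mem := newMemStore()
	mem.Set("cache:chat-completions:abc", []byte(`{"cached":true}`), 300*time.Second)

	// No Redis configured: the lookup falls back to the memory tier.
	if v, ok := lookup(nil, mem, "cache:chat-completions:abc"); ok {
		fmt.Printf("hit: %s\n", v)
	}
}
```

A write path would typically mirror this order, storing the response to whichever tiers are available.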
Request Flow
- Request Phase:
  - Plugin checks if caching is enabled
  - Generates cache key from request body
  - Looks up cache (Redis first, then memory)
  - If hit, immediately returns cached response
  - If miss, continues to upstream API
- Response Phase:
  - Captures response body and headers
  - If response is successful, stores it in the cache
  - Respects size limits to prevent memory issues
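Put together, the two phases resemble an HTTP middleware that short-circuits on a hit and records the upstream response on a miss. The sketch below uses Go's net/http purely for illustration; cacheGet, cachePut, the path-based key, and the hard-coded header name are assumptions rather than the plugin's actual code (a real key would hash the request body as described earlier):

```go
package main

import (
	"bytes"
	"net/http"
)

// recorder captures the upstream response so it can be cached afterwards.
type recorder struct {
	http.ResponseWriter
	status int
	body   bytes.Buffer
}

func (r *recorder) WriteHeader(code int) { r.status = code; r.ResponseWriter.WriteHeader(code) }
func (r *recorder) Write(p []byte) (int, error) {
	r.body.Write(p)
	return r.ResponseWriter.Write(p)
}

// cacheMiddleware sketches the request and response phases described above.
// cacheGet/cachePut stand in for the two-tier store; maxItemSize mirrors item_max_size.
func cacheMiddleware(next http.Handler, cacheGet func(string) ([]byte, bool),
	cachePut func(string, []byte), maxItemSize int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		key := req.URL.Path // simplified; real keys also hash the request body

		// Request phase: return the cached response immediately on a hit.
		if body, ok := cacheGet(key); ok {
			w.Header().Set("X-Aiproxy-Cache", "hit")
			w.Write(body)
			return
		}
		w.Header().Set("X-Aiproxy-Cache", "miss")

		// Miss: forward to the upstream handler while recording its output.
		rec := &recorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)

		// Response phase: store successful responses that fit the size limit.
		if rec.status == http.StatusOK && rec.body.Len() <= maxItemSize {
			cachePut(key, rec.body.Bytes())
		}
	})
}

func main() {
	store := map[string][]byte{}
	upstream := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte(`{"ok":true}`))
	})
	handler := cacheMiddleware(upstream,
		func(k string) ([]byte, bool) { v, ok := store[k]; return v, ok },
		func(k string, v []byte) { store[k] = v },
		1<<20)
	http.ListenAndServe(":8080", handler)
}
```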
Usage Example
```json
{
  "plugin": {
    "cache": {
      "enable": true,
      "ttl": 60,
      "item_max_size": 524288,
      "add_cache_hit_header": true
    }
  }
}
```
When add_cache_hit_header is enabled, the response carries the configured header indicating whether it was served from the cache:

Cache hit:

```
X-Aiproxy-Cache: hit
```

Cache miss:

```
X-Aiproxy-Cache: miss
```
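A quick way to confirm the behavior is to send the same request twice and compare the header values: the first response should report a miss and the identical second one a hit. The endpoint path, port, and missing authentication below are placeholder assumptions for whatever deployment you are testing against, and the config must have add_cache_hit_header enabled with the default header name:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Identical bodies produce identical cache keys, so the second call should hit.
	body := []byte(`{"model":"gpt-4","messages":[{"role":"user","content":"hello"}]}`)

	for i := 0; i < 2; i++ {
		// URL is a placeholder; adjust the host, port, and path to your deployment.
		req, err := http.NewRequest("POST", "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		req.Header.Set("Content-Type", "application/json")

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()

		// Expected: "miss" on the first request, "hit" on the second.
		fmt.Println("X-Aiproxy-Cache:", resp.Header.Get("X-Aiproxy-Cache"))
	}
}
```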