processor

package
v0.0.62
Published: Jul 20, 2025 License: MIT Imports: 23 Imported by: 0

Documentation

Index

Constants

const EmbeddedLLMGuide = `# Comanda YAML DSL Guide (for LLM Consumption)

This guide specifies the YAML-based Domain Specific Language (DSL) for Comanda workflows, enabling LLMs to generate valid workflow files.

## Overview

Comanda workflows consist of one or more named steps. Each step performs an operation. There are three main types of steps:
1.  **Standard Processing Step:** Involves LLMs, file processing, data operations.
2.  **Generate Step:** Uses an LLM to dynamically create a new Comanda workflow YAML file.
3.  **Process Step:** Executes another Comanda workflow file (static or dynamically generated).

## Core Workflow Structure

A Comanda workflow is a YAML map where each key is a ` + "`step_name`" + ` (string, user-defined), mapping to a dictionary defining the step.

` + "```yaml" + `
# Example of a workflow structure
workflow_step_1:
  # ... step definition ...
another_step_name:
  # ... step definition ...
` + "```" + `

## 1. Standard Processing Step Definition

This is the most common step type.

**Basic Structure:**
` + "```yaml" + `
step_name:
  input: [input source]
  model: [model name]
  action: [action to perform / prompt provided]
  output: [output destination]
  type: [optional, e.g., "openai-responses"] # Specifies specialized handling
  batch_mode: [individual|combined] # Optional, for multi-file inputs
  skip_errors: [true|false] # Optional, for multi-file inputs
  # ... other type-specific fields for "openai-responses" like 'instructions', 'tools', etc.
` + "```" + `

**Key Elements:**
- ` + "`input`" + `: (Required for most, can be ` + "`NA`" + `) Source of data. See "Input Types".
- ` + "`model`" + `: (Required, can be ` + "`NA`" + `) LLM model to use. See "Models".
- ` + "`action`" + `: (Required for most) Instructions or operations. See "Actions".
- ` + "`output`" + `: (Required) Destination for results. See "Outputs".
- ` + "`type`" + `: (Optional) Specifies a specialized handler for the step, e.g., ` + "`openai-responses`" + `. If omitted, it's a general-purpose LLM or NA step.
- ` + "`batch_mode`" + `: (Optional, default: ` + "`combined`" + `) For steps with multiple file inputs, defines if files are processed ` + "`combined`" + ` into one LLM call or ` + "`individual`" + `ly.
- ` + "`skip_errors`" + `: (Optional, default: ` + "`false`" + `) If ` + "`batch_mode: individual`" + `, determines if processing continues if one file fails.

**OpenAI Responses API Specific Fields (used when ` + "`type: openai-responses`" + `):**
- ` + "`instructions`" + `: (string) System message for the LLM.
- ` + "`tools`" + `: (list of maps) Configuration for tools/functions the LLM can call.
- ` + "`previous_response_id`" + `: (string) ID of a previous response for maintaining conversation state.
- ` + "`max_output_tokens`" + `: (int) Token limit for the LLM response.
- ` + "`temperature`" + `: (float) Sampling temperature.
- ` + "`top_p`" + `: (float) Nucleus sampling (top-p).
- ` + "`stream`" + `: (bool) Whether to stream the response.
- ` + "`response_format`" + `: (map) Specifies response format, e.g., ` + "`{ type: \"json_object\" }`" + `.


## 2. Generate Step Definition (` + "`generate`" + `)

This step uses an LLM to dynamically create a new Comanda workflow YAML file.

**Structure:**
` + "```yaml" + `
step_name_for_generation:
  input: [optional_input_source_for_context, or NA] # e.g., STDIN, a file with requirements
  generate:
    model: [llm_model_for_generation, optional] # e.g., gpt-4o-mini. Uses default if omitted.
    action: [prompt_for_workflow_generation] # Natural language instruction for the LLM.
    output: [filename_for_generated_yaml] # e.g., new_workflow.yaml
    context_files: [list_of_files_for_additional_context, optional] # e.g., [schema.txt, examples.yaml]
` + "```" + `
**` + "`generate`" + ` Block Attributes:**
- ` + "`model`" + `: (string, optional) Specifies the LLM to use for generation. If omitted, uses the ` + "`default_generation_model`" + ` configured in Comanda. You can set or update this default model by running ` + "`comanda configure`" + ` and following the prompts for setting a default generation model.
- ` + "`action`" + `: (string, required) The natural language instruction given to the LLM to guide the workflow generation.
- ` + "`output`" + `: (string, required) The filename where the generated Comanda workflow YAML file will be saved.
- ` + "`context_files`" + `: (list of strings, optional) A list of file paths to provide as additional context to the LLM, beyond the standard Comanda DSL guide (which is implicitly included).
- **Note:** The ` + "`input`" + ` field for a ` + "`generate`" + ` step is optional. If provided (e.g., ` + "`STDIN`" + ` or a file path), its content will be added to the context for the LLM generating the workflow. If not needed, use ` + "`input: NA`" + `.
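
For example, a self-contained generate step (file names and prompt text are illustrative):

` + "```yaml" + `
create_summary_workflow:
  input: NA
  generate:
    model: gpt-4o-mini
    action: "Generate a Comanda workflow that reads notes.txt, summarizes it with an LLM, and writes summary.txt."
    output: summary_workflow.yaml
    context_files: [examples.yaml]
` + "```" + `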

## 3. Process Step Definition (` + "`process`" + `)

This step executes another Comanda workflow file.

**Structure:**
` + "```yaml" + `
step_name_for_processing:
  input: [optional_input_source_for_sub_workflow, or NA] # e.g., STDIN to pass data to the sub-workflow
  process:
    workflow_file: [path_to_comanda_yaml_to_execute] # e.g., generated_workflow.yaml or existing_flow.yaml
    inputs: {key1: value1, key2: value2} # Optional. Map of inputs to pass to the sub-workflow.
    # capture_outputs: [list_of_outputs_to_capture, optional] # Future: Define how to capture specific outputs.
` + "```" + `
**` + "`process`" + ` Block Attributes:**
- ` + "`workflow_file`" + `: (string, required) The path to the Comanda workflow YAML file to be executed. This can be a statically defined path or the output of a ` + "`generate`" + ` step.
- ` + "`inputs`" + `: (map, optional) A map of key-value pairs to pass as initial variables to the sub-workflow. These can be accessed within the sub-workflow (e.g., as ` + "`$parent.key1`" + `).
- **Note:** The ` + "`input`" + ` field for a ` + "`process`" + ` step is optional. If ` + "`input: STDIN`" + ` is used, the output of the previous step in the parent workflow will be available as the initial ` + "`STDIN`" + ` for the *first* step of the sub-workflow if that first step expects ` + "`STDIN`" + `.
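
For example, executing the workflow generated above and passing it a value (paths and keys are illustrative):

` + "```yaml" + `
run_summary_workflow:
  input: NA
  process:
    workflow_file: summary_workflow.yaml
    inputs: { source_file: "notes.txt" }
` + "```" + `

Inside the sub-workflow, the value would be available as ` + "`$parent.source_file`" + `.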

## Common Elements (for Standard Steps)

### Input Types
- File path: ` + "`input: path/to/file.txt`" + `
- Previous step output: ` + "`input: STDIN`" + `
- Multiple file paths: ` + "`input: [file1.txt, file2.txt]`" + `
- Web scraping: ` + "`input: { url: \"https://example.com\" }`" + ` (further scraping options can be set in a ` + "`scrape_config`" + ` map if needed)
- Database query: ` + "`input: { database: { type: \"postgres\", query: \"SELECT * FROM users\" } }`" + `
- No input: ` + "`input: NA`" + `
- Input with alias for variable: ` + "`input: path/to/file.txt as $my_var`" + `
- List with aliases: ` + "`input: [file1.txt as $file1_content, file2.txt as $file2_content]`" + `
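
Aliases can be combined with variable references in the action. A small sketch (file names are illustrative):

` + "```yaml" + `
compare_drafts:
  input: [draft_v1.txt as $v1, draft_v2.txt as $v2]
  model: gpt-4o-mini
  action: "Compare $v1 with $v2 and list the substantive differences."
  output: STDOUT
` + "```" + `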

### Chunking
For processing large files, you can use the ` + "`chunk`" + ` configuration to split the input into manageable pieces:

**Basic Structure:**
` + "```yaml" + `
step_name:
  input: "large_file.txt"
  chunk:
    by: lines  # or "tokens"
    size: 1000  # number of lines or tokens per chunk
    overlap: 50  # optional: number of lines or tokens to overlap between chunks
    max_chunks: 10  # optional: limit the total number of chunks processed
  batch_mode: individual  # required for chunking to process each chunk separately
  model: gpt-4o-mini
  action: "Process this chunk of text: {{ current_chunk }}"
  output: "chunk_{{ chunk_index }}_result.txt"  # can use chunk_index in output path
` + "```" + `

**Key Elements:**
- ` + "`chunk`" + `: (Optional) Configuration block for chunking a large input file.
  - ` + "`by`" + `: (Required) Chunking method - either ` + "`lines`" + ` or ` + "`tokens`" + `.
  - ` + "`size`" + `: (Required) Number of lines or tokens per chunk.
  - ` + "`overlap`" + `: (Optional) Number of lines or tokens to include from the previous chunk, providing context continuity.
  - ` + "`max_chunks`" + `: (Optional) Maximum number of chunks to process, useful for testing or limiting processing.
- ` + "`batch_mode: individual`" + `: Required when using chunking to process each chunk as a separate LLM call.
- ` + "`{{ current_chunk }}`" + `: Template variable that gets replaced with the current chunk content in the action.
- ` + "`{{ chunk_index }}`" + `: Template variable for the current chunk number (0-based), useful in output paths.

**Consolidation Pattern:**
A common pattern is to process chunks individually and then consolidate the results:

` + "```yaml" + `
# Step 1: Process chunks
process_chunks:
  input: "large_document.txt"
  chunk:
    by: lines
    size: 1000
  batch_mode: individual
  model: gpt-4o-mini
  action: "Extract key points from: {{ current_chunk }}"
  output: "chunk_{{ chunk_index }}_summary.txt"

# Step 2: Consolidate results
consolidate_results:
  input: "chunk_*.txt"  # Use wildcard to collect all chunk outputs
  model: gpt-4o-mini
  action: "Combine these summaries into one coherent document."
  output: "final_summary.txt"
` + "```" + `

### Models
- Single model: ` + "`model: gpt-4o-mini`" + `
- No model (for non-LLM operations): ` + "`model: NA`" + `
- Multiple models (for comparison): ` + "`model: [gpt-4o-mini, claude-3-opus-20240229]`" + `

### Actions
- Single instruction: ` + "`action: \"Summarize this text.\"`" + `
- Multiple sequential instructions: ` + "`action: [\"Action 1\", \"Action 2\"]`" + `
- Reference variable: ` + "`action: \"Compare with $previous_data.\"`" + `
- Reference markdown file: ` + "`action: path/to/prompt.md`" + `
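
For instance, a step can apply several instructions in sequence to the same input (prompt text is illustrative):

` + "```yaml" + `
refine_text:
  input: draft.txt
  model: gpt-4o-mini
  action: ["Fix grammar and spelling.", "Tighten the prose without changing its meaning."]
  output: refined.txt
` + "```" + `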

### Outputs
- Console: ` + "`output: STDOUT`" + `
- File: ` + "`output: results.txt`" + `
- Database: ` + "`output: { database: { type: \"postgres\", table: \"results_table\" } }`" + `
- Output with alias (if supported for variable creation from output): ` + "`output: STDOUT as $step_output_var`" + `

## Variables
- Definition: ` + "`input: data.txt as $initial_data`" + `
- Reference: ` + "`action: \"Compare this analysis with $initial_data\"`" + `
- Scope: Variables are typically scoped to the workflow. For ` + "`process`" + ` steps, parent variables are not directly accessible by default; use the ` + "`process.inputs`" + ` map to pass data.
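
A minimal sketch of passing a value from a parent workflow into a sub-workflow (names are illustrative):

` + "```yaml" + `
# In the parent workflow
run_child:
  input: NA
  process:
    workflow_file: child.yaml
    inputs: { region: "EMEA" }

# Inside child.yaml, a step can then reference the value:
#   action: "Filter the data to the $parent.region region."
` + "```" + `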

## Validation Rules Summary (for LLM)

1.  A step definition must clearly be one of: Standard, Generate, or Process.
    *   A step cannot mix top-level keys from different types (e.g., a ` + "`generate`" + ` step should not have a top-level ` + "`model`" + ` or ` + "`output`" + ` key; these belong inside the ` + "`generate`" + ` block).
2.  **Standard Step:**
    *   Must contain ` + "`input`" + `, ` + "`model`" + `, ` + "`action`" + `, ` + "`output`" + ` (unless ` + "`type: openai-responses`" + `, where ` + "`action`" + ` might be replaced by ` + "`instructions`" + `).
    *   ` + "`input`" + ` can be ` + "`NA`" + `. ` + "`model`" + ` can be ` + "`NA`" + `.
3.  **Generate Step:**
    *   Must contain a ` + "`generate`" + ` block.
    *   ` + "`generate`" + ` block must contain ` + "`action`" + ` (string prompt) and ` + "`output`" + ` (string filename).
    *   ` + "`generate.model`" + ` is optional (uses default if omitted).
    *   Top-level ` + "`input`" + ` for the step is optional (can be ` + "`NA`" + ` or provide context).
4.  **Process Step:**
    *   Must contain a ` + "`process`" + ` block.
    *   ` + "`process`" + ` block must contain ` + "`workflow_file`" + ` (string path).
    *   ` + "`process.inputs`" + ` is optional.
    *   Top-level ` + "`input`" + ` for the step is optional (can be ` + "`NA`" + ` or ` + "`STDIN`" + ` to pipe to sub-workflow).
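
To illustrate rule 1, the first step below is invalid because ` + "`model`" + ` and ` + "`output`" + ` sit at the top level instead of inside the ` + "`generate`" + ` block; the second is the valid equivalent:

` + "```yaml" + `
# Invalid: top-level keys mixed with a generate block
bad_generate:
  model: gpt-4o-mini       # belongs inside the generate block
  output: workflow.yaml    # belongs inside the generate block
  generate:
    action: "Generate a workflow."

# Valid
good_generate:
  input: NA
  generate:
    model: gpt-4o-mini
    action: "Generate a workflow."
    output: workflow.yaml
` + "```" + `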

## Chaining and Examples

Steps can be "chained together" by either passing STDOUT from one step to STDIN of the next step or by writing to a file and then having subsequent steps take this file as input.

**Meta-Processing Example:**
` + "```yaml" + `
gather_requirements:
  input: requirements_document.txt
  model: claude-3-opus-20240229
  action: "Based on the input document, define the core tasks for a data processing workflow. Output as a concise list."
  output: STDOUT

generate_data_workflow:
  input: STDIN # Using output from previous step as context
  generate:
    model: gpt-4o-mini # LLM to generate the workflow
    action: "Generate a Comanda workflow YAML to perform the tasks described in the input. The workflow should read 'raw_data.csv', perform transformations, and save to 'processed_data.csv'."
    output: dynamic_data_processor.yaml # Filename for the generated workflow

execute_data_workflow:
  input: NA # Or potentially STDIN if dynamic_data_processor.yaml's first step expects it
  process:
    workflow_file: dynamic_data_processor.yaml # Execute the generated workflow
    # inputs: { source_file: "override_data.csv" } # Optional: override inputs for the sub-workflow
  output: STDOUT # Log output of the process step itself (e.g., success/failure)
` + "```" + `

### Advanced Chaining: Enabling Independent Analysis with Files

The standard ` + "`STDIN`" + `/` + "`STDOUT`" + ` chain is designed for sequential processing, where each step receives the output of the one immediately before it. However, many workflows require a downstream step to **independently analyze outputs from multiple, potentially non-sequential, upstream steps.**

To enable this, you must use files to store intermediate results. This pattern ensures that each output is preserved and can be accessed directly by any subsequent step, rather than being lost in a pipeline.

**The recommended pattern is:**
1.  Each upstream step saves its result to a distinct file (e.g., ` + "`step1_output.txt`" + `, ` + "`step2_output.txt`" + `).
2.  The downstream step that needs to perform the independent analysis lists these files as its ` + "`input`" + `.

**Example: A 3-Step Workflow with a Final Review**

In this scenario, the third step needs to review the outputs of both the first and second steps independently.

` + "```yaml" + `
# Step 1: Initial analysis
analyze_introductions:
  input: introductions.md
  model: gpt-4o-mini
  action: "Perform a detailed analysis of the introductions document. Focus on key themes, writing style, and effectiveness."
  output: step1_analysis.txt

# Step 2: Quality assessment of the original document
quality_assessment:
  input: introductions.md
  model: gpt-4o-mini
  action: "Perform a quality assessment on the original document. Identify strengths and potential gaps."
  output: step2_qa.txt

# Step 3: Final summary based on both outputs
final_summary:
  input: [step1_analysis.txt, step2_qa.txt]
  model: gpt-4o-mini
  action: "Review the results from the analysis (step1_analysis.txt) and the QA (step2_qa.txt). Provide a comprehensive summary that synthesizes the findings from both."
  output: final_summary.md
` + "```" + `

This file-based approach is the correct way to handle any workflow where a step's logic depends on having discrete access to multiple prior outputs.

This guide covers the core concepts and syntax of Comanda's YAML DSL, including meta-processing capabilities. LLMs should use this structure to generate valid workflow files.`

EmbeddedLLMGuide contains the Comanda YAML DSL Guide for LLM consumption. It is embedded directly in the binary to avoid file path issues, and the constant is kept for backward compatibility.

Variables

This section is empty.

Functions

func GetEmbeddedLLMGuide added in v0.0.58

func GetEmbeddedLLMGuide() string

GetEmbeddedLLMGuide returns the Comanda YAML DSL Guide for LLM consumption, with the currently supported models injected.

Types

type ChunkConfig added in v0.0.60

type ChunkConfig struct {
	By        string `yaml:"by"`         // How to split the file: "lines", "bytes", or "tokens"
	Size      int    `yaml:"size"`       // Chunk size (e.g., 10000 lines)
	Overlap   int    `yaml:"overlap"`    // Lines/bytes/tokens to overlap between chunks for context
	MaxChunks int    `yaml:"max_chunks"` // Limit total chunks to prevent overload
}

ChunkConfig represents the configuration for chunking a large file

type DSLConfig

type DSLConfig struct {
	Steps         []Step
	ParallelSteps map[string][]Step     // Steps that can be executed in parallel
	Defer         map[string]StepConfig `yaml:"defer,omitempty"`
}

DSLConfig represents the structure of the DSL configuration

func (*DSLConfig) UnmarshalYAML added in v0.0.55

func (c *DSLConfig) UnmarshalYAML(node *yaml.Node) error

UnmarshalYAML is a custom unmarshaler for DSLConfig to handle mixed types at the root level

type GenerateStepConfig added in v0.0.35

type GenerateStepConfig struct {
	Model        interface{} `yaml:"model"`
	Action       interface{} `yaml:"action"`
	Output       string      `yaml:"output"`
	ContextFiles []string    `yaml:"context_files"`
}

GenerateStepConfig defines the configuration for a generate step

type NormalizeOptions

type NormalizeOptions struct {
	AllowEmpty bool // Whether to allow empty strings in the result
}

NormalizeOptions represents options for string slice normalization

type OllamaModelTag added in v0.0.25

type OllamaModelTag struct {
	Name string `json:"name"`
}

OllamaModelTag represents the details of a single model tag from /api/tags

type OllamaTagsResponse added in v0.0.25

type OllamaTagsResponse struct {
	Models []OllamaModelTag `json:"models"`
}

OllamaTagsResponse represents the top-level structure of Ollama's /api/tags response

type PerformanceMetrics added in v0.0.20

type PerformanceMetrics struct {
	InputProcessingTime  int64 // Time in milliseconds to process inputs
	ModelProcessingTime  int64 // Time in milliseconds for model processing
	ActionProcessingTime int64 // Time in milliseconds for action processing
	OutputProcessingTime int64 // Time in milliseconds for output processing
	TotalProcessingTime  int64 // Total time in milliseconds for the step
}

PerformanceMetrics tracks timing information for processing steps

type ProcessStepConfig added in v0.0.35

type ProcessStepConfig struct {
	WorkflowFile   string                 `yaml:"workflow_file"`
	Inputs         map[string]interface{} `yaml:"inputs"`
	CaptureOutputs []string               `yaml:"capture_outputs"`
}

ProcessStepConfig defines the configuration for a process step

type Processor

type Processor struct {
	// contains filtered or unexported fields
}

Processor handles the DSL processing pipeline

func NewProcessor

func NewProcessor(config *DSLConfig, envConfig *config.EnvConfig, serverConfig *config.ServerConfig, verbose bool, runtimeDir ...string) *Processor

NewProcessor creates a new DSL processor

func (*Processor) GetModelProvider

func (p *Processor) GetModelProvider(modelName string) models.Provider

GetModelProvider returns the provider for the specified model

func (*Processor) GetProcessedInputs

func (p *Processor) GetProcessedInputs() []*input.Input

GetProcessedInputs returns all processed input contents

func (*Processor) LastOutput

func (p *Processor) LastOutput() string

LastOutput returns the last output value

func (*Processor) NormalizeStringSlice

func (p *Processor) NormalizeStringSlice(val interface{}) []string

NormalizeStringSlice converts interface{} to []string

func (*Processor) Process

func (p *Processor) Process() error

Process executes the DSL processing pipeline

func (*Processor) SetLastOutput

func (p *Processor) SetLastOutput(output string)

SetLastOutput sets the last output value, useful for initializing with STDIN data

func (*Processor) SetProgressWriter added in v0.0.14

func (p *Processor) SetProgressWriter(w ProgressWriter)

SetProgressWriter sets the progress writer for streaming updates

type ProgressType added in v0.0.14

type ProgressType int

ProgressType represents different types of progress updates

const (
	ProgressSpinner ProgressType = iota
	ProgressStep
	ProgressComplete
	ProgressError
	ProgressOutput       // Output event carrying content from STDOUT
	ProgressParallelStep // Update from a parallel step
)

type ProgressUpdate added in v0.0.14

type ProgressUpdate struct {
	Type               ProgressType
	Message            string
	Error              error
	Step               *StepInfo           // Optional step information
	Stdout             string              // Content from STDOUT when Type is ProgressOutput
	IsParallel         bool                // Whether this update is from a parallel step
	ParallelID         string              // Identifier for the parallel step group
	PerformanceMetrics *PerformanceMetrics // Performance metrics for the step
}

ProgressUpdate represents a progress update from the processor

type ProgressWriter added in v0.0.14

type ProgressWriter interface {
	WriteProgress(update ProgressUpdate) error
}

ProgressWriter is an interface for handling progress updates

func NewChannelProgressWriter added in v0.0.14

func NewChannelProgressWriter(ch chan<- ProgressUpdate) ProgressWriter
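
NewChannelProgressWriter creates a ProgressWriter that sends progress updates to the provided channel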

type Spinner

type Spinner struct {
	// contains filtered or unexported fields
}

func NewSpinner

func NewSpinner() *Spinner

func (*Spinner) Disable

func (s *Spinner) Disable()

Disable prevents the spinner from showing any output

func (*Spinner) SetProgressWriter added in v0.0.14

func (s *Spinner) SetProgressWriter(w ProgressWriter)

func (*Spinner) Start

func (s *Spinner) Start(message string)

func (*Spinner) Stop

func (s *Spinner) Stop()

type Step

type Step struct {
	Name   string
	Config StepConfig
}

Step represents a named step in the DSL

type StepConfig

type StepConfig struct {
	Type       string       `yaml:"type"`            // Step type (default is standard LLM step)
	Input      interface{}  `yaml:"input"`           // Can be string or map[string]interface{}
	Model      interface{}  `yaml:"model"`           // Can be string or []string
	Action     interface{}  `yaml:"action"`          // Can be string or []string
	Output     interface{}  `yaml:"output"`          // Can be string or []string
	NextAction interface{}  `yaml:"next-action"`     // Can be string or []string
	BatchMode  string       `yaml:"batch_mode"`      // How to process multiple files: "combined" (default) or "individual"
	SkipErrors bool         `yaml:"skip_errors"`     // Whether to continue processing if some files fail
	Chunk      *ChunkConfig `yaml:"chunk,omitempty"` // Configuration for chunking large files

	// OpenAI Responses API specific fields
	Instructions       string                   `yaml:"instructions"`         // System message
	Tools              []map[string]interface{} `yaml:"tools"`                // Tools configuration
	PreviousResponseID string                   `yaml:"previous_response_id"` // For conversation state
	MaxOutputTokens    int                      `yaml:"max_output_tokens"`    // Token limit
	Temperature        float64                  `yaml:"temperature"`          // Temperature setting
	TopP               float64                  `yaml:"top_p"`                // Top-p sampling
	Stream             bool                     `yaml:"stream"`               // Whether to stream the response
	ResponseFormat     map[string]interface{}   `yaml:"response_format"`      // Format specification (e.g., JSON)

	// Meta-processing fields
	Generate *GenerateStepConfig `yaml:"generate,omitempty"` // Configuration for generating a workflow
	Process  *ProcessStepConfig  `yaml:"process,omitempty"`  // Configuration for processing a sub-workflow
}

StepConfig represents the configuration for a single step

type StepDependency added in v0.0.20

type StepDependency struct {
	Name      string
	DependsOn []string
}

StepDependency represents a dependency between steps

type StepInfo added in v0.0.14

type StepInfo struct {
	Name         string
	Model        string
	Action       string
	Instructions string // For openai-responses steps
}

StepInfo contains detailed information about a processing step
