Design by Committee™ except it's just you and LLMs
# Committee
## Overview
Committee is a framework for building chained LLM workflows: you assemble precisely targeted context, then run templated prompts over it, optionally iterating the same prompts across a set of items.
## Core Example: Service Analysis
Let's illustrate the core workflow with an example designed to analyze different microservices based on their specific documentation and source code.
**1. `workflow.yaml`:**
Defines file collections, global context, and the structured `services` object intended for iteration.
```yaml
name: "service-analysis-workflow"
description: "Analyze multiple services using their specific docs and code"
outputPath: "_output/service-analysis"
# Define file collections
files:
  # General Docs
  architectureDoc: "docs/ARCHITECTURE.md"
  # Auth Service Files
  authConfigDoc: "docs/AUTH-CONFIG.md"
  authCode: ["src/auth/**/*.js", "!src/auth/legacy/**"]
  # Data Service Files
  dataModelsDoc: "docs/DATA-MODELS.md"
  dataCode: "src/data/**/*.js"
# Define universally accessible global variables
global_variables:
  # General context available to all tasks
  overallArchitecture: "{{ files.architectureDoc }}"
# Define data structures for set iteration
iterable_objects:
  # Structured object containing service-specific context
  services: # Target for 'for_each: services' in a set
    auth: # Key becomes 'item.key' during iteration
      # Value becomes 'item.value'
      description: "Authentication and Authorization Service"
      contact: "auth-team@example.com"
      # Embed CONTENT of auth-specific files
      configDocContent: "{{ files.authConfigDoc }}"
      codeContent: "{{ files.authCode }}"
    data: # Key becomes 'item.key'
      # Value becomes 'item.value'
      description: "Data Processing and Storage Service"
      contact: "data-team@example.com"
      # Embed CONTENT of data-specific files
      modelsDocContent: "{{ files.dataModelsDoc }}"
      codeContent: "{{ files.dataCode }}"
# Define the sequence of sets
sets:
  - useSet: analyze-service # Iterate over 'services' defined in iterable_objects
    for_each: services
```
*(**Note:** The `{{ files.collectionName }}` syntax within `global_variables` or `iterable_objects` embeds the **formatted content** of the files.)*
**2. `sets/analyze-service.set.yaml`:**
Defines a set that iterates over the `services` object defined in the workflow's `iterable_objects`.
```yaml
name: "analyze-service"
description: "Run analysis tasks for each service defined in the context"
# Iterates over the 'services' object from workflow.yaml's iterable_objects
# Each item will be { key: serviceName, value: serviceObject }
for_each: services
tasks:
  # These tasks run in parallel for each service
  - useTask: identify-service-patterns
    # Task context automatically includes 'item', 'item.key', 'item.value'
    # and variables from 'global_variables' like 'overallArchitecture'
  - useTask: suggest-service-improvements
```
**3. `tasks/analyze-service.md`:**
A task template showing how to access the context provided by the iteration and global variables.
````markdown
Analyze the service: **{{ item.key }}**
**Service Description:** {{ item.value.description }}
**Contact:** {{ item.value.contact }}
**Overall Architecture Context:**
```
{{ overallArchitecture }} # Accessing a global_variable
```
**Service-Specific Configuration Documentation:**
```
{{ item.value.configDocContent }} # Accessing data from item.value
```
**Service-Specific Code:**
```
{{ item.value.codeContent }} # Accessing data from item.value
```
**Analysis Request:**
Based on the overall architecture and the specific documentation and code for the `{{ item.key }}` service, please perform the analysis requested by the calling task (e.g., identify patterns, suggest improvements).
````
This example demonstrates how to:
- Define multiple file sources.
- Define `global_variables` accessible everywhere.
- Structure data for iteration under `iterable_objects`.
- Iterate over this structured data using `for_each`.
- Access the iteration key (`item.key`), iteration value (`item.value.*`), and global variables within a task template.
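The iteration model above can be pictured with a short Python sketch. This is a conceptual model only, not Committee's implementation: `expand_for_each` is a hypothetical helper, and the behavior for arrays (each element becomes `item` directly) is an assumption based on how `item` is described elsewhere in this document.

```python
def expand_for_each(target):
    """Yield one `item` per iteration of a for_each target (hypothetical helper).

    Objects yield {"key": ..., "value": ...} pairs; arrays yield each element.
    """
    if isinstance(target, dict):
        for key, value in target.items():
            yield {"key": key, "value": value}
    elif isinstance(target, list):
        yield from target
    else:
        raise TypeError("for_each target must be an object or an array")


# Trimmed-down version of the `services` object from workflow.yaml.
services = {
    "auth": {"description": "Authentication and Authorization Service"},
    "data": {"description": "Data Processing and Storage Service"},
}

items = list(expand_for_each(services))
for item in items:
    # Mirrors {{ item.key }} and {{ item.value.description }} in a template.
    print(f"{item['key']}: {item['value']['description']}")
```

Each dictionary in `items` corresponds to one run of the set's tasks, which is why a template can use `item.key` for the service name and `item.value.*` for its fields.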
## Key Concepts
Now let's dive deeper into the core components illustrated above.
### Workflows
A workflow is the top-level container defined in `workflow.yaml`, as seen in the Core Example. It specifies:
- `global_variables` accessible throughout the workflow. These form the base context.
- `iterable_objects` defining data structures (arrays/objects) intended for set iteration via `for_each`.
- Named file collections (`files:`) to gather context using glob patterns. File *content* is typically embedded into `global_variables` or `iterable_objects`.
- An ordered sequence of `sets` to be executed.
### Sets
Sets group related tasks. Sets defined in the workflow's `sets:` list are executed **sequentially** in the order they appear.
Within a single set, the tasks listed are executed **in parallel**. Sets can optionally iterate over arrays or objects defined in the workflow's `iterable_objects` using `for_each`.
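As a rough mental model of this ordering, the sketch below runs sets one after another and the tasks inside each set concurrently. It is illustrative Python, not the framework's code; `run_task` is a stand-in for an LLM call, and the thread pool is an assumption chosen only to show the scheduling shape.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(set_name, task_name):
    """Stand-in for an LLM call; returns a label instead of a real response."""
    return f"{set_name}.{task_name}: done"


def run_workflow(sets):
    results = []
    for current in sets:  # sets run one after another, in listed order
        with ThreadPoolExecutor() as pool:  # tasks inside a set run concurrently
            futures = [
                pool.submit(run_task, current["name"], task)
                for task in current["tasks"]
            ]
        # Leaving the `with` block waits for every task in this set to finish,
        # so the next set only starts once all of these outputs exist.
        results.append([future.result() for future in futures])
    return results


output = run_workflow([
    {"name": "set1", "tasks": ["taskA", "taskB"]},
    {"name": "set2", "tasks": ["process-output"]},
])
```

Because a set finishes completely before the next one starts, a task in `set2` can depend on anything produced in `set1`, but `taskA` and `taskB` cannot depend on each other.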
### Tasks
Tasks are templated prompts (stored as `.md` files) that perform a specific action using an LLM, like the `analyze-service.md` template in the Core Example. Each task runs with a context including:
- `global_variables` (from `workflow.yaml`).
- Iteration variables (`item`, `item.key`, `item.value` if the set uses `for_each`).
- Outputs from tasks in *previous* sets, accessed via `prior_outputs` defined in the set file.
**Important:** Due to parallel execution within a set, a task cannot access the output of *another task running in the same set*. Input/output dependencies must be managed by sequencing tasks across different sets.
**Escaping Template Syntax:** If you need to include literal `{{` or `}}` characters in your template without them being interpreted as variables, you can escape them with a backslash: `\{{` will render as `{{`, and `\}}` will render as `}}`.
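To make the escaping rule concrete, here is a minimal Python sketch of a renderer that treats `\{{` and `\}}` as literals. It is a simplified illustration, not the framework's template engine: the `render` function and its regular expression are assumptions, and it only handles dotted variable names such as `item.key`.

```python
import re

# Matches either an escaped delimiter (\{{ or \}}) or a {{ name }} placeholder.
TOKEN = re.compile(r"\\(\{\{|\}\})|\{\{\s*([\w.]+)\s*\}\}")


def render(template, context):
    def replace(match):
        if match.group(1) is not None:
            # Escaped delimiter: emit the braces and drop the backslash.
            return match.group(1)
        value = context
        for part in match.group(2).split("."):  # walk dotted paths like item.key
            value = value[part]
        return str(value)

    return TOKEN.sub(replace, template)


rendered = render(
    r"Use \{{ name \}} to insert a variable, e.g. {{ item.key }}",
    {"item": {"key": "auth"}},
)
print(rendered)
```

The escaped pair survives as literal braces, while the unescaped placeholder is substituted from the context.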
### Referencing Output from Iterated Sets
When dealing with outputs from previous iterated sets, there are two main scenarios:
1. **Accessing Corresponding Iteration Output:** When both the previous set (e.g., `set1`) and the current set (e.g., `set2`) iterate over the **same `for_each` target**, you often need to access the output from the previous set's task corresponding to the *current item* being processed in the current set.
* **Syntax:** `setName.taskName[this].output`
* **Use Case:** An iterated set needs the *specific* output from the *same iteration* of a previous iterated set.
* **Result:** Resolves to the single output value for the current iteration.
**Example (`set2` iterated, needs corresponding output from iterated `set1`):**
```yaml
# In sets/set2.set.yaml (for_each: services)
prior_outputs:
  # Get the analyze-service output for the current service
  analysis_result: "{{ set1.analyze-service[this].output }}"
```
2. **Collecting All Iteration Outputs:** When a subsequent set (often a non-iterated set, e.g., `setB`) needs to gather **all** the individual outputs generated by a task within a previous *iterated* set (e.g., `setA`).
* **Syntax:** `setName.taskName[*].output`
* **Use Case:** A later set needs to aggregate or process the results from *all* iterations of a previous iterated task.
* **Result:** Resolves to an **array** containing all the output values generated across all iterations of the specified task.
**Example (`setB` non-iterated, needs all outputs from iterated `setA`):**
```yaml
# In sets/setB.set.yaml (NOT iterated)
prior_outputs:
  # Gather all results from setA's analyze-item task into an array
  all_analysis_results: "{{ setA.analyze-item[*].output }}"
```
*(**Note:** The task template using `{{ all_analysis_results }}` will receive these outputs as a newline-separated string by default. Handle accordingly in your prompt.)*
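The difference between `[this]` and `[*]` can be sketched as a small lookup. The Python below is a hypothetical model: the `outputs` store, the `resolve` function, and the reference pattern are assumptions made for illustration and do not describe how Committee stores or parses outputs internally.

```python
import re

# Hypothetical store: outputs[set][task] maps iteration key -> output text.
outputs = {
    "set1": {
        "analyze-service": {"auth": "auth findings", "data": "data findings"},
    },
}

REFERENCE = re.compile(r"^([\w-]+)\.([\w-]+)\[(this|\*)\]\.output$")


def resolve(reference, current_key=None):
    match = REFERENCE.match(reference)
    if match is None:
        raise ValueError(f"unsupported reference: {reference}")
    set_name, task_name, selector = match.groups()
    by_iteration = outputs[set_name][task_name]
    if selector == "this":
        # Single output belonging to the iteration currently being processed.
        return by_iteration[current_key]
    # [*]: every iteration's output, in iteration order.
    return list(by_iteration.values())


single = resolve("set1.analyze-service[this].output", current_key="data")
everything = resolve("set1.analyze-service[*].output")
as_prompt_text = "\n".join(everything)  # how a template would see the array
```

Joining the array with newlines mirrors the note above: a template that receives all outputs sees one string and has to tell the individual results apart itself.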
### Referencing Output from Non-Iterated Sets
If the previous set was **not** iterated, you simply reference its task output directly:
- `setName.taskName.output`: Output from a **non-iterated** task in a previous set. The `taskName` used here **must** match the `useTask` value from the task definition in the previous set's YAML file.
**Example (`set2` non-iterated, needs output from non-iterated `set1`):**
```yaml
# In sets/set2.set.yaml (NOT iterated)
prior_outputs:
  taskA_result: "{{ set1.taskA.output }}"
```
**Why other syntaxes fail** (when you need the current iteration's output from an iterated set):
- `"{{ set1.analyze-service.output }}"`: Refers to the *entire array* of outputs from the iterated task, not the specific one needed.
- `"{{ set1.analyze-service[item.key].output }}"`: The `prior_outputs` resolver doesn't evaluate `{{item.key}}` within the reference string; it looks for a literal key `item.key`.
**Important Convention:** Task outputs are *always* stored and referenced using the exact name specified in the `useTask` field. There is no option to rename outputs.
**Example Set Configuration (`*.set.yml`):**
If `set1` (non-iterated) contains a task `useTask: taskA`, and `set2` (non-iterated) needs its output:
```yaml
name: set2
tasks:
  - useTask: process-output
    prior_outputs:
      # Map the reference to a local variable name for use in the task template
      taskA_result: "{{ set1.taskA.output }}" # Reference uses the original task name 'taskA'
```
**Example Task Template (`tasks/process-output.md`):**
```markdown
Process the result below.
Result from Task A in Set 1:
{{ taskA_result }} # Access the output via the name defined in prior_outputs
```
**Note:** Tasks in the same set run in parallel, so a task cannot rely on the output of another task in that set. Place dependent tasks in a later set so the dependency is resolved sequentially.
## Where Data Comes From: Defining Your Context
Knowing where each kind of data is defined, and how tasks reach it, is essential to using Committee. Context comes from four sources:
1. **Global Variables:** Defined in the top-level `global_variables:` block of your `workflow.yaml`. These are accessible to all sets and tasks throughout the workflow execution.
2. **File Collections & Content:** File sources are defined in the `files:` block of `workflow.yaml`. To make file *content* available for LLM analysis, embed it into variables within the `workflow.yaml` `global_variables:` or `iterable_objects:` blocks using `{{ files.collectionName }}`. Task templates (`.md`) can reference `{{ files.collectionName }}` to get a list of *paths*.
3. **Iteration Data (`item`):** Data structures (arrays or objects) intended for iteration using `for_each` are defined in the `iterable_objects:` block of `workflow.yaml`. The `for_each: objectName` directive within a `*.set.yml` file targets one of these workflow iterable objects. Tasks within that set then access the current iteration's data via the `item` object (or `item.key` / `item.value` for object iteration).
4. **Task Outputs (via Prior Outputs):** Outputs from previous tasks are made available to a subsequent task via the `prior_outputs:` block defined under that task in its `*.set.yml` file. This block maps a local name (used in the task template) to an output reference string (e.g., `setName.taskName.output`, `setName.taskName[this].output`, or `setName.taskName[*].output`).
Essentially, **`workflow.yaml` is the primary location for defining the initial context (`global_variables`), data sources (`files`), and data for iteration (`iterable_objects`)**, while `*.set.yml` files orchestrate the execution flow and manage dependencies on previously generated task outputs via `prior_outputs`.
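One way to picture how these sources combine is a single dictionary handed to the template. The sketch below is an assumption-laden illustration in Python: `build_task_context` is a hypothetical helper, and the precedence it applies when names collide (later sources win) is not something this document specifies.

```python
def build_task_context(global_variables, item=None, prior_outputs=None):
    """Merge the context sources into one dict (hypothetical helper)."""
    context = dict(global_variables)  # base context from workflow.yaml
    if item is not None:
        context["item"] = item  # only present when the set uses for_each
    context.update(prior_outputs or {})  # local names mapped in the set file
    return context


context = build_task_context(
    global_variables={"overallArchitecture": "# docs/ARCHITECTURE.md\n..."},
    item={"key": "auth", "value": {"contact": "auth-team@example.com"}},
    prior_outputs={"analysis_result": "auth findings"},
)
print(sorted(context))
```

A template rendered with this dictionary could then use `{{ overallArchitecture }}`, `{{ item.key }}`, and `{{ analysis_result }}` side by side.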
## File Collection Handling
You define named file collections in `workflow.yaml` using file paths or glob patterns (`include`/`exclude`):
```yaml
# workflow.yaml
name: "code-review-workflow"
files:
  sourceCode:
    include: ["src/**/*.js"]
    exclude: ["src/vendor/**"]
  testFiles: "test/**/*.test.js"
  docs: ["README.md", "CONTRIBUTING.md"]
# ... global_variables, iterable_objects, and sets follow ...
```
These collections are primarily used to inject context into your workflow. The way you reference a collection using `{{ files.collectionName }}` has **two behaviors depending on where it is used**:
1. **In `workflow.yaml` (`global_variables:` or `iterable_objects:`):**
* **Behavior:** Embeds the **full content** of each file within the collection directly into the variable's string value. Each file's content is automatically prefixed with a Markdown header indicating its path (e.g., `# path/to/file.js`).
* **Purpose:** This is the primary mechanism for **injecting substantial file content** (like source code, documentation) into the context, making it available to subsequent sets and tasks for direct LLM analysis.
* **Example (`workflow.yaml`):**
```yaml
global_variables:
  # Embeds the content of all files matching src/**/*.js,
  # each block prefixed with '# filepath'
  sourceContext: "{{ files.sourceCode }}"
  # Embeds content of README.md and CONTRIBUTING.md
  docsContext: "{{ files.docs }}"
```
2. **In Task Templates (`*.md` files):**
* **Behavior:** Renders a **newline-separated list of the file paths** belonging to that collection. It does **not** embed the file content here.
* **Purpose:** Useful for providing informational context within a task prompt, such as listing related files for the LLM's reference, *without* including their potentially large content directly in that specific prompt.
* **Example (`tasks/review-code.md`):**
````markdown
Review the following source code file `{{ item.path }}`:
```javascript
{{ item.content }} # Assuming iteration over a file collection
```
Consider related test files (paths listed below):
{{ files.testFiles }} # Lists paths from the 'testFiles' collection
````
**Key Distinction:** Use `{{ files.collectionName }}` in `workflow.yaml` (`global_variables` or `iterable_objects`) to provide the *content* needed for LLM analysis. Use it in task templates (`.md`) when you only need to reference the *paths* of the files.
*(Note: Advanced pattern filtering within the template tag like `{{ files.collectionName:*.js }}` is not currently implemented.)*
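The two behaviors can be contrasted in a few lines of Python. This is a sketch under stated assumptions, not the framework's file handling: `collect`, `embed_content`, and `list_paths` are hypothetical names, the glob matching uses Python's `pathlib` and `fnmatch` rather than whatever library Committee uses, and the blank line between embedded files is a guess.

```python
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def collect(root, include, exclude=()):
    """Return sorted relative paths matching include globs minus exclude globs."""
    found = set()
    for pattern in include:
        for path in root.glob(pattern):
            rel = path.relative_to(root).as_posix()
            if path.is_file() and not any(fnmatchcase(rel, ex) for ex in exclude):
                found.add(rel)
    return sorted(found)


def embed_content(root, paths):
    """workflow.yaml behavior: each file's content under a '# path' header."""
    return "\n\n".join(f"# {p}\n{(root / p).read_text()}" for p in paths)


def list_paths(paths):
    """Task-template behavior: newline-separated paths only."""
    return "\n".join(paths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "vendor").mkdir(parents=True)
    (root / "src" / "app.js").write_text("console.log('app');")
    (root / "src" / "vendor" / "lib.js").write_text("// third-party")

    paths = collect(root, ["src/**/*.js"], ["src/vendor/**"])
    embedded = embed_content(root, paths)  # what a global variable would hold
    listing = list_paths(paths)            # what a task template would render
```

The same collection produces a large content string in one place and a short path list in the other, which is why content should be routed through `global_variables` or `iterable_objects`.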
## Two-Phase Thinking
Tasks can optionally perform a preliminary "thinking" step before generating the final response. This is useful for complex analysis or reasoning tasks. Configure this using YAML frontmatter at the top of your task's `.md` file:
```yaml
---
name: "complex-analysis-task" # Optional: Task name for clarity
thinking: true # REQUIRED: Enables the thinking phase
thinking_prompt: "path/to/thinking-prompt.md" # Optional: Use a separate prompt file for the thinking phase
thinking_instruction: "Analyze the input step-by-step..." # Optional: Specific instruction for the thinking phase
thinking_params:
  temperature: 0.2 # Optional: LLM parameters specifically for the thinking phase
---
# Main Task Prompt
Based on the preceding analysis, provide the final answer.
Context:
{{ context }}
- If `thinking: true`, the framework first runs the thinking phase (using the main prompt or `thinking_prompt` if provided, potentially guided by `thinking_instruction`).
- The output of the thinking phase is then automatically prepended to the context provided to the main task prompt for generating the final response.
- You can control LLM parameters specifically for the thinking step using `thinking_params`.
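The flow described above can be sketched as two calls to the model. The Python below is an illustration only: `run_two_phase` and the `llm` callable are hypothetical, and the exact way prompt, instruction, and context are concatenated is an assumption, since this document only states that the thinking output is prepended to the context.

```python
def run_two_phase(llm, main_prompt, context, thinking_instruction="", thinking_params=None):
    """Run a thinking pass, then the final pass with the thoughts prepended."""
    # Phase 1: thinking pass, using its own LLM parameters.
    thinking_parts = (thinking_instruction, main_prompt, context)
    thinking_input = "\n\n".join(part for part in thinking_parts if part)
    thoughts = llm(thinking_input, thinking_params or {})

    # Phase 2: the thinking output is placed ahead of the original context.
    final_input = f"{main_prompt}\n\n{thoughts}\n\n{context}"
    return llm(final_input, {})


calls = []


def fake_llm(prompt, params):
    """Records each call and returns a numbered reply instead of calling an API."""
    calls.append((prompt, params))
    return f"reply {len(calls)}"


result = run_two_phase(
    fake_llm,
    main_prompt="Provide the final answer.",
    context="CONTEXT",
    thinking_instruction="Analyze the input step-by-step...",
    thinking_params={"temperature": 0.2},
)
```

Only the first call receives `thinking_params`, which matches the idea that the thinking step can run at a different temperature than the final response.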
## Using the Framework
### Installation
```bash
# Navigate to the project root directory
# Install globally (recommended for CLI use)
npm install -g .
# Or install locally
npm install .
```
### Basic Usage
1. Create a workflow directory (e.g., `my-workflow/`) containing:
- `workflow.yaml` (workflow definition)
- `sets/` directory (with `.set.yaml` or `.set.yml` set definitions)
- `tasks/` directory (with `.md` task prompt files)
2. Configure your environment variables (e.g., in a `.env` file in your project or system):
```dotenv
# Required for using Anthropic API (if not using --local)
ANTHROPIC_API_KEY=your_api_key_here
# Optional: Specify default model (defaults exist, e.g., Claude 3 Haiku for --lite, Sonnet otherwise)
# DEFAULT_MODEL=claude-3-sonnet-20240229
# Optional: Set maximum tokens for LLM responses (default: 10000)
# MAX_TOKENS=100000
# Optional: Set maximum number of concurrent API requests (default: 10)
# MAX_PARALLEL_REQUESTS=15
# Optional: Set minimum delay between starting parallel API requests (in seconds, default: 0.1)
# Useful for proactively avoiding rate limits based on request frequency.
# REQUEST_DELAY_SECONDS=0.5
# Optional: Set maximum number of retries for failed API calls (default: 20)
# LLM_MAX_RETRIES=10
# Optional: For using a local LLM (requires --local flag)
# Needs a running server compatible with OpenAI API spec (e.g., Ollama, LM Studio)
# Default Ollama URL example
LOCAL_LLM_URL=http://localhost:11434
# Optional: Specify model served by local URL (required if server hosts multiple)
# LOCAL_LLM_MODEL=llama3
```
3. Run the workflow from your terminal:
```