Design by Committee™ except it's just you and LLMs

# Committee

## Overview

This framework lets you assemble surgical context and build chained LLM workflows from templated, iterative prompts.

## Core Example: Service Analysis

Let's illustrate the core workflow with an example designed to analyze different microservices based on their specific documentation and source code.

**1. `workflow.yaml`:** Defines file collections, global context, and the structured `services` object intended for iteration.

```yaml
name: "service-analysis-workflow"
description: "Analyze multiple services using their specific docs and code"
outputPath: "_output/service-analysis"

# Define file collections
files:
  # General Docs
  architectureDoc: "docs/ARCHITECTURE.md"
  # Auth Service Files
  authConfigDoc: "docs/AUTH-CONFIG.md"
  authCode: ["src/auth/**/*.js", "!src/auth/legacy/**"]
  # Data Service Files
  dataModelsDoc: "docs/DATA-MODELS.md"
  dataCode: "src/data/**/*.js"

# Define universally accessible global variables
global_variables:
  # General context available to all tasks
  overallArchitecture: "{{ files.architectureDoc }}"

# Define data structures for set iteration
iterable_objects:
  # Structured object containing service-specific context
  services: # Target for 'for_each: services' in a set
    auth: # Key becomes 'item.key' during iteration
      # Value becomes 'item.value'
      description: "Authentication and Authorization Service"
      contact: "auth-team@example.com"
      # Embed CONTENT of auth-specific files
      configDocContent: "{{ files.authConfigDoc }}"
      codeContent: "{{ files.authCode }}"
    data: # Key becomes 'item.key'
      # Value becomes 'item.value'
      description: "Data Processing and Storage Service"
      contact: "data-team@example.com"
      # Embed CONTENT of data-specific files
      modelsDocContent: "{{ files.dataModelsDoc }}"
      codeContent: "{{ files.dataCode }}"

# Define the sequence of sets
sets:
  - useSet: analyze-service
    # Iterate over 'services' defined in iterable_objects
    for_each: services
```

*(**Note:** The `{{ files.collectionName }}` syntax within `global_variables` or `iterable_objects` embeds the **formatted content** of the files.)*

**2. `sets/analyze-service.set.yaml`:** Defines a set that iterates over the `services` object defined in the workflow's `iterable_objects`.

```yaml
name: "analyze-service"
description: "Run analysis tasks for each service defined in the context"

# Iterates over the 'services' object from workflow.yaml's iterable_objects
# Each item will be { key: serviceName, value: serviceObject }
for_each: services

tasks:
  # These tasks run in parallel for each service
  - useTask: identify-service-patterns
    # Task context automatically includes 'item', 'item.key', 'item.value'
    # and variables from 'global_variables' like 'overallArchitecture'
  - useTask: suggest-service-improvements
```

**3. `tasks/analyze-service.md`:** A task template showing how to access the context provided by the iteration and global variables.

````markdown
Analyze the service: **{{ item.key }}**

**Service Description:** {{ item.value.description }}
**Contact:** {{ item.value.contact }}

**Overall Architecture Context:**
```
{{ overallArchitecture }} # Accessing a global_variable
```

**Service-Specific Configuration Documentation:**
```
{{ item.value.configDocContent }} # Accessing data from item.value
```

**Service-Specific Code:**
```
{{ item.value.codeContent }} # Accessing data from item.value
```

**Analysis Request:**
Based on the overall architecture and the specific documentation and code for the `{{ item.key }}` service, please perform the analysis requested by the calling task (e.g., identify patterns, suggest improvements).
````

This example demonstrates how to:

- Define multiple file sources.
- Define `global_variables` accessible everywhere.
- Structure data for iteration under `iterable_objects`.
- Iterate over this structured data using `for_each`.
- Access the iteration key (`item.key`), iteration value (`item.value.*`), and global variables within a task template.
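
To make the iteration mechanics concrete, here is a minimal JavaScript sketch of how a `for_each` over an object could yield `{ key, value }` items, and how `{{ ... }}` placeholders might be resolved against `global_variables` plus `item`. The helper names (`expandForEach`, `lookup`, `renderTemplate`) are hypothetical and this is not Committee's actual implementation. Treating array elements as the `item` itself and leaving unknown placeholders untouched are also assumptions.

```javascript
// Hypothetical sketch only: illustrates the documented behavior, not Committee's real code.

// Turn an entry from `iterable_objects` into the items a `for_each` set would see.
// Objects become { key, value } pairs; arrays pass their elements through (assumption).
function expandForEach(iterable) {
  if (Array.isArray(iterable)) {
    return iterable.slice();
  }
  return Object.entries(iterable).map(([key, value]) => ({ key, value }));
}

// Resolve a dotted path such as "item.value.description" inside a context object.
function lookup(context, path) {
  return path
    .split(".")
    .reduce((node, part) => (node == null ? undefined : node[part]), context);
}

// Replace each {{ dotted.path }} placeholder with its value from the context.
// Placeholders that cannot be resolved are left as-is (assumption).
function renderTemplate(template, context) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match, path) => {
    const value = lookup(context, path);
    return value === undefined ? match : String(value);
  });
}

// Stand-ins for the data defined in workflow.yaml.
const globalVariables = { overallArchitecture: "(contents of docs/ARCHITECTURE.md)" };
const services = {
  auth: { description: "Authentication and Authorization Service" },
  data: { description: "Data Processing and Storage Service" },
};

const template =
  "Analyze the service: {{ item.key }}\n" +
  "Description: {{ item.value.description }}\n" +
  "Architecture: {{ overallArchitecture }}";

// One rendered prompt per service, each with global variables plus its own `item`.
const prompts = expandForEach(services).map((item) =>
  renderTemplate(template, { ...globalVariables, item })
);

console.log(prompts.join("\n---\n"));
```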
---

## Key Concepts

Now let's dive deeper into the core components illustrated above.

### Workflows

A workflow is the top-level container defined in `workflow.yaml`, as seen in the Core Example. It specifies:

- `global_variables` accessible throughout the workflow. These form the base context.
- `iterable_objects` defining data structures (arrays/objects) intended for set iteration via `for_each`.
- Named file collections (`files:`) to gather context using glob patterns. File *content* is typically embedded into `global_variables` or `iterable_objects`.
- An ordered sequence of `sets` to be executed.

### Sets

Sets group related tasks. Sets defined in the workflow's `sets:` list are executed **sequentially** in the order they appear. Within a single set, the tasks listed are executed **in parallel**. Sets can optionally iterate over arrays or objects defined in the workflow's `iterable_objects` using `for_each`.

### Tasks

Tasks are templated prompts (stored as `.md` files) that perform a specific action using an LLM, like the `analyze-service.md` template in the Core Example. Each task runs with a context including:

- `global_variables` (from `workflow.yaml`).
- Iteration variables (`item`, `item.key`, `item.value` if the set uses `for_each`).
- Outputs from tasks in *previous* sets, accessed via `prior_outputs` defined in the set file.

**Important:** Due to parallel execution within a set, a task cannot access the output of *another task running in the same set*. Input/output dependencies must be managed by sequencing tasks across different sets.

**Escaping Template Syntax:** If you need to include literal `{{` or `}}` characters in your template without them being interpreted as variables, you can escape them with a backslash: `\{{` will render as `{{`, and `\}}` will render as `}}`.

### Referencing Output from Iterated Sets

When dealing with outputs from previous iterated sets, there are two main scenarios:

1. **Accessing Corresponding Iteration Output:** When both the previous set (e.g., `set1`) and the current set (e.g., `set2`) iterate over the **same `for_each` target**, you often need to access the output from the previous set's task corresponding to the *current item* being processed in the current set.

   * **Syntax:** `setName.taskName[this].output`
   * **Use Case:** An iterated set needs the *specific* output from the *same iteration* of a previous iterated set.
   * **Result:** Resolves to the single output value for the current iteration.

   **Example (`set2` iterated, needs corresponding output from iterated `set1`):**

   ```yaml
   # In sets/set2.set.yaml (for_each: services)
   prior_outputs:
     # Get the analyze-service output for the current service
     analysis_result: "{{ set1.analyze-service[this].output }}"
   ```

2. **Collecting All Iteration Outputs:** Use this when a subsequent set (often a non-iterated set, e.g., `setB`) needs to gather **all** the individual outputs generated by a task within a previous *iterated* set (e.g., `setA`).

   * **Syntax:** `setName.taskName[*].output`
   * **Use Case:** A later set needs to aggregate or process the results from *all* iterations of a previous iterated task.
   * **Result:** Resolves to an **array** containing all the output values generated across all iterations of the specified task.

   **Example (`setB` non-iterated, needs all outputs from iterated `setA`):**

   ```yaml
   # In sets/setB.set.yaml (NOT iterated)
   prior_outputs:
     # Gather all results from setA's analyze-item task into an array
     all_analysis_results: "{{ setA.analyze-item[*].output }}"
   ```

   *(**Note:** The task template using `{{ all_analysis_results }}` will receive these outputs as a newline-separated string by default. Handle accordingly in your prompt.)*

### Referencing Output from Non-Iterated Sets

If the previous set was **not** iterated, you simply reference its task output directly:

- `setName.taskName.output`: Output from a **non-iterated** task in a previous set.
  The `taskName` used here **must** match the `useTask` value from the task definition in the previous set's YAML file.

**Example (`set2` non-iterated, needs output from non-iterated `set1`):**

```yaml
# In sets/set2.set.yaml (NOT iterated)
prior_outputs:
  taskA_result: "{{ set1.taskA.output }}"
```

**Why other syntaxes fail (when the previous set is iterated):**

- `"{{ set1.analyze-service.output }}"`: Refers to the *entire array* of outputs from the iterated task, not the specific one needed.
- `"{{ set1.analyze-service[item.key].output }}"`: The `prior_outputs` resolver doesn't evaluate `{{item.key}}` within the reference string; it looks for a literal key `item.key`.

**Important Convention:** Task outputs are *always* stored and referenced using the exact name specified in the `useTask` field. There is no option to rename outputs.

**Example Set Configuration (`*.set.yml`):**

If `set1` (non-iterated) contains a task `useTask: taskA`, and `set2` (non-iterated) needs its output:

```yaml
name: set2
tasks:
  - useTask: process-output
    prior_outputs:
      # Map the reference to a local variable name for use in the task template
      taskA_result: "{{ set1.taskA.output }}" # Reference uses the original task name 'taskA'
```

**Example Task Template (`tasks/process-output.md`):**

```markdown
Processing output for file {{ item.path }}.

Result from Task A in Set 1:
{{ taskA_result }} # Access the output via the name defined in prior_outputs
```

**Note:** Referencing outputs from tasks within the *same* parallel set execution is unreliable and should be avoided. Structure your workflow with sequential sets for dependencies.

## Where Data Comes From: Defining Your Context

Understanding where different types of data are defined and accessed is important for using Committee. The framework uses the following structure:

1. **Global Variables:** Defined in the top-level `global_variables:` block of your `workflow.yaml`. These are accessible to all sets and tasks throughout the workflow execution.
2. **File Collections & Content:** File sources are defined in the `files:` block of `workflow.yaml`. To make file *content* available for LLM analysis, embed it into variables within the `workflow.yaml` `global_variables:` or `iterable_objects:` blocks using `{{ files.collectionName }}`. Task templates (`.md`) can reference `{{ files.collectionName }}` to get a list of *paths*.
3. **Iteration Data (`item`):** Data structures (arrays or objects) intended for iteration using `for_each` are defined in the `iterable_objects:` block of `workflow.yaml`. The `for_each: objectName` directive within a `*.set.yml` file targets one of these workflow iterable objects. Tasks within that set then access the current iteration's data via the `item` object (or `item.key` / `item.value` for object iteration).
4. **Task Outputs (via Prior Outputs):** Outputs from previous tasks are made available to a subsequent task via the `prior_outputs:` block defined under that task in its `*.set.yml` file. This block maps a local name (used in the task template) to the structured output reference string (e.g., `setName.taskName[iterationKey].output`).

Essentially, **`workflow.yaml` is the primary location for defining the initial context (`global_variables`), data sources (`files`), and data for iteration (`iterable_objects`)**, while `*.set.yml` files orchestrate the execution flow and manage dependencies on previously generated task outputs via `prior_outputs`.

## File Collection Handling

You define named file collections in `workflow.yaml` using file paths or glob patterns (`include`/`exclude`):

```yaml
# workflow.yaml
name: "code-review-workflow"
files:
  sourceCode:
    include: ["src/**/*.js"]
    exclude: ["src/vendor/**"]
  testFiles: "test/**/*.test.js"
  docs: ["README.md", "CONTRIBUTING.md"]
# ... global_variables, iterable_objects, and sets follow ...
```

These collections are primarily used to inject context into your workflow.
The way you reference a collection using `{{ files.collectionName }}` has **two behaviors depending on where it is used**:

1. **In `workflow.yaml` (`global_variables:` or `iterable_objects:`):**

   * **Behavior:** Embeds the **full content** of each file within the collection directly into the variable's string value. Each file's content is automatically prefixed with a Markdown header indicating its path (e.g., `# path/to/file.js`).
   * **Purpose:** This is the primary mechanism for **injecting substantial file content** (like source code, documentation) into the context, making it available to subsequent sets and tasks for direct LLM analysis.
   * **Example (`workflow.yaml`):**

   ```yaml
   global_variables:
     # Embeds the content of all files matching src/**/*.js,
     # each block prefixed with '# filepath'
     sourceContext: "{{ files.sourceCode }}"
     # Embeds content of README.md and CONTRIBUTING.md
     docsContext: "{{ files.docs }}"
   ```

2. **In Task Templates (`*.md` files):**

   * **Behavior:** Renders a **newline-separated list of the file paths** belonging to that collection. It does **not** embed the file content here.
   * **Purpose:** Useful for providing informational context within a task prompt, such as listing related files for the LLM's reference, *without* including their potentially large content directly in that specific prompt.
   * **Example (`tasks/review-code.md`):**

   ````markdown
   Review the following source code file `{{ item.path }}`:

   ```javascript
   {{ item.content }} # Assuming iteration over a file collection
   ```

   Consider related test files (paths listed below):
   {{ files.testFiles }} # Lists paths from the 'testFiles' collection
   ````

**Key Distinction:** Use `{{ files.collectionName }}` in `workflow.yaml` (`global_variables` or `iterable_objects`) to provide the *content* needed for LLM analysis. Use it in task templates (`.md`) when you only need to reference the *paths* of the files.
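
The two rendering modes described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function names (`renderAsContent`, `renderAsPathList`), the `{ path, content }` file shape, and the blank line between embedded files are hypothetical and not taken from Committee's source. Only the `# path` header and the newline-separated path list come from the description above.

```javascript
// Hypothetical sketch only: names, file shape, and separators are assumptions.

// Workflow-level mode (global_variables / iterable_objects):
// embed each file's full content, prefixed with a "# <path>" header.
function renderAsContent(files) {
  return files.map((file) => `# ${file.path}\n${file.content}`).join("\n\n");
}

// Task-template mode: render only the paths, one per line.
function renderAsPathList(files) {
  return files.map((file) => file.path).join("\n");
}

// Stand-in for the resolved 'docs' collection.
const docs = [
  { path: "README.md", content: "Project overview." },
  { path: "CONTRIBUTING.md", content: "How to contribute." },
];

const embedded = renderAsContent(docs); // what a workflow variable would hold
const listed = renderAsPathList(docs); // what a task template would render

console.log(embedded);
console.log(listed);
```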
*(Note: Advanced pattern filtering within the template tag like `{{ files.collectionName:*.js }}` is not currently implemented.)*

## Two-Phase Thinking

Tasks can optionally perform a preliminary "thinking" step before generating the final response. This is useful for complex analysis or reasoning tasks.

Configure this using YAML frontmatter at the top of your task's `.md` file:

```yaml
---
name: "complex-analysis-task" # Optional: Task name for clarity
thinking: true # REQUIRED: Enables the thinking phase
thinking_prompt: "path/to/thinking-prompt.md" # Optional: Use a separate prompt file for the thinking phase
thinking_instruction: "Analyze the input step-by-step..." # Optional: Specific instruction for the thinking phase
thinking_params:
  temperature: 0.2 # Optional: LLM parameters specifically for the thinking phase
---
# Main Task Prompt

Based on the preceding analysis, provide the final answer.

Context: {{ context }}
```

- If `thinking: true`, the framework first runs the thinking phase (using the main prompt or `thinking_prompt` if provided, potentially guided by `thinking_instruction`).
- The output of the thinking phase is then automatically prepended to the context provided to the main task prompt for generating the final response.
- You can control LLM parameters specifically for the thinking step using `thinking_params`.

## Using the Framework

### Installation

```bash
# Navigate to the project root directory

# Install globally (recommended for CLI use)
npm install -g .

# Or install locally
npm install .
```

### Basic Usage

1. Create a workflow directory (e.g., `my-workflow/`) containing:
   - `workflow.yaml` (workflow definition)
   - `sets/` directory (with `.set.yaml` or `.set.yml` set definitions)
   - `tasks/` directory (with `.md` task prompt files)

2. Configure your environment variables (e.g., in a `.env` file in your project or system):

   ```dotenv
   # Required for using Anthropic API (if not using --local)
   ANTHROPIC_API_KEY=your_api_key_here

   # Optional: Specify default model (defaults exist, e.g., Claude 3 Haiku for --lite, Sonnet otherwise)
   # DEFAULT_MODEL=claude-3-sonnet-20240229

   # Optional: Set maximum tokens for LLM responses (default: 10000)
   # MAX_TOKENS=100000

   # Optional: Set maximum number of concurrent API requests (default: 10)
   # MAX_PARALLEL_REQUESTS=15

   # Optional: Set minimum delay between starting parallel API requests (in seconds, default: 0.1)
   # Useful for proactively avoiding rate limits based on request frequency.
   # REQUEST_DELAY_SECONDS=0.5

   # Optional: Set maximum number of retries for failed API calls (default: 20)
   # LLM_MAX_RETRIES=10

   # Optional: For using a local LLM (requires --local flag)
   # Needs a running server compatible with OpenAI API spec (e.g., Ollama, LM Studio)
   LOCAL_LLM_URL=http://localhost:11434 # Default Ollama URL example

   # Optional: Specify model served by local URL (required if server hosts multiple)
   # LOCAL_LLM_MODEL=llama3
   ```

3. Run the workflow from your terminal:

   ```