meld

Below is a practical strategy for structuring an LLM-driven audit in small, focused steps—without manually poring over giant spreadsheets or line-by-line code. The approach treats your codebase (and any logs/test outputs) as a set of “bite-sized” knowledge units that Meld (or a similar directive-based system) can automate into prompts for an LLM. ──────────────────────────────────────────────────────────────────────── 1) Why Break the Audit Into Chunks & How It Helps ──────────────────────────────────────────────────────────────────────── • LLMs have token/length limits. If you bundle your entire codebase or entire test logs in one massive prompt, you risk getting truncated or “context drunk” outputs. • By chunking your code and your test logs, you can keep prompts well under the token limit—this preserves clarity and ensures the LLM can analyze the snippet carefully. • Small chunks also let you direct the LLM to gather exactly the data you need (e.g., “list out all methods that appear to implement IStateService” or “search for any usage of clone() in this batch of files”). It keeps each prompt’s instructions extremely specific. ──────────────────────────────────────────────────────────────────────── 2) Automating the “Chunk & Compare” Workflow ──────────────────────────────────────────────────────────────────────── Below is a step-by-step outline for a semi-automated approach, using Meld’s directive-based orchestration as you described (where you can do something like @cmd[...] to gather code, and then feed each snippet to an LLM for analysis). 2.1) Identify Your “Batch” or “Chunk” Boundaries • For each major directory in your repo—e.g., services/, tests/, mocks/, or perhaps each service subfolder—create small sets of .ts files (or relevant source/test chunks). • You might also separate tests into smaller groups by file (e.g., each .test.ts file is its own chunk) or by categories (“DirectiveService tests,” “OutputService tests,” etc.). 2.2) Write a Meld (or script-based) “Extraction” Step • Example: @cmd[cpai services/StateService --stdout] This might dump the entire “StateService” subdir to stdout. You then wrap that output in a “begin code / end code” block that you feed to the LLM. • Another example: @cmd[npm test -- --json] This might give you a JSON summary of test results, which you can feed as a chunk of “test output.” 2.3) Keep Each Prompt Targeted & Consistent • Instead of a giant prompt that says “Here’s 5,000 lines—please do an audit,” your Meld script can do: 1) “@import[services/StateService/*.ts]” → Then ask: “LLM, please list all public methods found in this code. For each method, summarize its signature and note if it appears in IStateService.” 2) “@import[tests/services/StateService/*.test.ts]” → Then ask: “LLM, please identify any usage of clone(), createChildState(), or transformation methods in these tests. Summarize how they are tested.” • By giving the LLM smaller tasks, you get more reliable, smaller-granularity results. 2.4) Store Intermediate Results & Cross-Reference • For example, once you ask the LLM for “all methods that appear in StateService.ts” (and get a structured list in JSON or a table), keep that data in a local file or key-value store. • Next, do the same for “all methods that appear in IStateService.ts.” • Then have a final prompt that merges those two data sets: “Compare the lists of methods from (StateService) vs (IStateService). Highlight any methods that are missing from one or the other.” 2.5) Pattern for Repeating This Approach 1. Gather a chunk of code (or logs) with @cmd[...] or whichever tool you use to dump them. 2. Provide a short, precise instruction to the LLM: “Analyze this snippet and produce exactly these data points: …” 3. Save the LLM’s response (in JSON, CSV, or a Meld directive-based snippet). 4. Rinse and repeat for the next chunk. 5. Do a final pass prompt that compiles the partial data from each chunk and checks for mismatches or cross-links. ──────────────────────────────────────────────────────────────────────── 3) Example Micro-Prompt Patterns ──────────────────────────────────────────────────────────────────────── Below are some micro-prompt patterns you can use in your Meld script or a similar approach. Each is designed to keep your instructions to the LLM short and unambiguous: 3.1) “Method Inventory” Prompt -------------------------------------------------------------------------------- @import[services/StateService/*.ts] @import[services/StateService/IStateService.ts] Your Task: 1. Extract all public methods (by name, parameters, return type) from IStateService.ts. 2. Extract all public methods from StateService.ts. 3. Produce a side-by-side table showing which methods exist in the interface vs. the implementation. -------------------------------------------------------------------------------- 3.2) “Test Coverage” Prompt -------------------------------------------------------------------------------- @import[tests/services/StateService/StateService.test.ts] Your Task: 1. Identify calls to state.clone(), state.enableTransformation(), or createChildState(). 2. Summarize in a bullet list: each test name + the relevant calls in that test. 3. Note whether any mocks are used or if real StateService is imported. -------------------------------------------------------------------------------- 3.3) “Find References” Prompt for Templated Searching -------------------------------------------------------------------------------- @import[services/DirectiveService/*.ts] Your Task: 1. Search all imported code for references to the method name “transformNode”. 2. For each reference, show the file, line number, and a short excerpt. 3. If no references appear, say “No references found.” -------------------------------------------------------------------------------- 3.4) “Log Test Failures” Prompt -------------------------------------------------------------------------------- @cmd[npm test -- --json] Your Task: 1. Summarize the failing tests. For each failure, capture: test file, test name, error message. 2. Skip any passing test. 3. Provide the results as an array of objects with fields { file, testName, error }. -------------------------------------------------------------------------------- ──────────────────────────────────────────────────────────────────────── 4) Tying It All Back to Your Multi-Phase Plan ──────────────────────────────────────────────────────────────────────── By chaining these small “gathering” tasks: • Phase 1 (Audit & Alignment) can easily be done with “Method Inventory” prompts vs. “Find References” prompts, then one final “LLM, produce a mismatch table” prompt. • Phase 2 (Evidence Collection) can rely on “Test Coverage” prompts and “Log Test Failures”. Then you cross-reference the test failures with the code methods. • Phase 3 (Instrument Failing Tests) might combine “Log Test Failures” with additional instrumentation commands. You might add console logs, then re-run tests and prompt the LLM with updated logs. • And so on. You keep your Meld script orchestrating it all, step by step. ──────────────────────────────────────────────────────────────────────── 5) Additional Suggestions ──────────────────────────────────────────────────────────────────────── 5.1) Keep Each LLM Task Purposeful & Minimal Try to avoid mixing too many tasks into one prompt. If the question is “list all public methods in X,” keep it at that. Then do a separate step for “compare to interface Y.” This helps you (and the LLM) stay focused. 5.2) Confirm the LLM’s Summaries Against Actual Files Sometimes an LLM can “hallucinate” lines/methods even if you feed it the code correctly. If you see suspicious details, do a quick follow-up prompt or run a script-based grep to confirm the method actually exists. 5.3) Automate Re-Running Whenever Code Changes When you fix a mismatch or change code to address an audit concern, re-run the same chunk-based prompts. This ensures your updated code remains in alignment. Meld can help keep the “@import” references stable, so it’s always analyzing the up-to-date version of that snippet. 5.4) Leverage JSON Output or Tagging If feasible, ask the LLM to produce purely structured output (like JSON). Then you can parse or re-meld that data in subsequent steps more systematically. E.g., -------------------------------------------------------------------------------- Your Task: “List all public methods as an array of objects: [ { name: string, parameters: string[], returns: string } … ]” -------------------------------------------------------------------------------- Then, your next prompt can read that JSON to build a side-by-side comparison table automatically. ──────────────────────────────────────────────────────────────────────── 6) Overall Conclusion: “Yes, You Can LLM-Drive the Audit” ──────────────────────────────────────────────────────────────────────── • Rather than manually doing a big spreadsheet, chunking your code + logs and using minimal, highly-focused LLM prompts is exactly the “small, iterative query” style that ensures you gather specific facts accurately. • Meld’s directive-based approach is a perfect fit: each @import or @cmd can produce new “snippets” or “test logs” for the LLM to parse. • The main watch-out is verifying any final claims the LLM makes about your code. If you see results that don’t match your mental model, just do a quick check with a smaller prompt, or with a direct code search. By weaving these smaller prompts into your existing multi-phase blueprint, you effectively create a fully “LLM developer–friendly” pipeline for auditing. It chunkifies the tasks, keeps each prompt’s context tight, and automatically accumulates data across multiple runs—exactly how an engineering team might handle it, but with LLM-driven help instead of manual scouring.