# Cortex

Cortex simplifies and accelerates the process of creating applications that harness the power of modern AI models like GPT-5 (ChatGPT), o4, Gemini, the Claude series, Flux, Grok and more by providing a structured interface (GraphQL or REST) to a powerful prompt execution environment. This enables complex augmented prompting and abstracts away most of the complexity of managing model connections, like chunking input, rate limiting, formatting output, caching, and handling errors.

## Why build Cortex?

Modern AI models are transformational, but a number of complexities emerge when developers start using them to deliver application-ready functions. Most models require precisely formatted, carefully engineered and sequenced prompts to produce consistent results, and the responses are typically largely unstructured text without validation or formatting. Additionally, these models are evolving rapidly, are typically costly and slow to query, and impose hard request size and rate restrictions that need to be carefully navigated for optimum throughput. Cortex offers a solution to these problems and provides a simple and extensible package for interacting with NL AI models.

## Okay, but what can I really do with this thing?

Just about anything! It's kind of an LLM Swiss Army knife. Here are some ideas:

* Create custom chat agents with memory and personalization and then expose them through a bunch of different UIs (custom chat portals, Slack, Microsoft Teams, etc. - anything that can be extended and speak to a REST or GraphQL endpoint)
* Spin up LLM-powered automatons with their prompting logic and AI API handling logic all centrally encapsulated.
* Put a REST or GraphQL front end on any model, including your locally-run models (e.g. llama.cpp), and use them in concert with other tools.
* Create modular custom coding assistants (code generation, code reviews, test writing, AI pair programming) and easily integrate them with your existing editing tools.
* Create powerful AI editing tools (copy editing, paraphrasing, summarization, etc.) for your company and then integrate them with your existing workflow tools without having to build all the LLM-handling logic into those tools.
* Create cached endpoints for functions with repeated calls so the results return instantly and you don't run up LLM token charges.
* Route all of your company's LLM access through a single API layer to optimize and monitor usage and centrally control rate limiting and which models are being used.

## Features

* Simple architecture to build custom functional endpoints (called `pathways`) that implement common NL AI tasks. Default pathways include chat, summarization, translation, paraphrasing, completion, spelling and grammar correction, entity extraction, sentiment analysis, and bias analysis.
* Extensive model support with built-in integrations for:
  - OpenAI models:
    - GPT-5 (all flavors and router)
    - GPT-4.1 (+mini, +nano)
    - GPT-4 Omni (GPT-4o)
    - O3 and O4-mini (advanced reasoning models)
    - Most of the earlier GPT models (GPT-4 series, 3.5 Turbo, etc.)
  - Google models:
    - Gemini 2.5 Pro
    - Gemini 2.5 Flash
    - Gemini 2.0 Flash
    - Earlier Google models (Gemini 1.5 series)
  - Anthropic models:
    - Claude 4 Sonnet (Vertex)
    - Claude 4.1 Opus (Vertex)
    - Claude 3.7 Sonnet
    - Claude 3.5 Sonnet
    - Claude 3.5 Haiku
    - Claude 3 Series
  - Grok (XAI) models:
    - Grok 3 and Grok 4 series (including fast-reasoning and code-fast variants)
    - Multimodal chat with vision, streaming, and tool calling
  - Ollama support
  - Azure OpenAI support
  - Custom model implementations
* Advanced voice and audio capabilities:
  - Real-time voice streaming and processing
  - Audio visualization
  - Whisper integration for transcription with customizable parameters
  - Support for word timestamps and highlighting
* Enhanced memory management:
  - Structured memory organization (self, directives, user, topics)
  - Context-aware memory search
  - Memory migration and categorization
  - Persistent conversation context
* Multimodal content support:
  - Text and image processing
  - Vision model integrations
  - Content safety checks
* Built-in support for:
  - Long-running, asynchronous operations with progress updates
  - Streaming responses
  - Context persistence and memory management
  - Automatic traffic management and content optimization
  - Input/output validation and formatting
  - Request caching
  - Rate limiting and request parallelization
* Allows for building multi-model, multi-tool, multi-vendor, and model-agnostic pathways (choose the right model or combination of models and tools for the job, implement redundancy) with built-in support for foundation models by OpenAI (hosted at OpenAI or Azure), Gemini, Anthropic, Grok, Black Forest Labs, and more.
* Easy, templatized prompt definition with flexible support for most prompt engineering techniques and strategies, ranging from simple single prompts to complex custom prompt chains with context continuity.
* Built-in support for long-running, asynchronous operations with progress updates or streaming responses
* Integrated context persistence: have your pathways "remember" whatever you want and use it on the next request to the model
* Automatic traffic management and content optimization: configurable model-specific input chunking, request parallelization, rate limiting, and chunked response aggregation
* Extensible parsing and validation of input data - protect your model calls from bad inputs or filter prompt injection attempts.
* Extensible parsing and validation of return data - return formatted objects to your application instead of just string blobs!
* Caching of repeated queries to provide instant results and avoid excess requests to the underlying model in repetitive use cases (chat bots, unit tests, etc.)

## Installation

In order to use Cortex, you must first have a working Node.js environment. The version of Node.js should be 18 or higher (lower versions are supported with some reduction in features). After verifying that you have the correct version of Node.js installed, you can get the simplest form up and running with a couple of commands.

## Quick Start

```sh
git clone git@github.com:aj-archipelago/cortex.git
cd cortex
npm install
export OPENAI_API_KEY=<your key>
npm start
```

Yup, that's it, at least in the simplest possible case. That will get you access to all of the built-in pathways. If you prefer to use npm instead of cloning, we have an npm package too: [@aj-archipelago/cortex](https://www.npmjs.com/package/@aj-archipelago/cortex)

## Connecting Applications to Cortex

Cortex speaks GraphQL and by default it enables the GraphQL playground.
If you're just using default options, that's at [http://localhost:4000/graphql](http://localhost:4000/graphql). From there you can begin making requests and test out the pathways (listed under Query) to your heart's content. If GraphQL isn't your thing or if you have a client that would rather have REST that's fine - Cortex speaks REST as well. Connecting an application to Cortex using GraphQL is simple too: ```js import { useApolloClient, gql } from "@apollo/client" const TRANSLATE = gql` query Translate($text: String!, $to: String!) { translate(text: $text, to: $to) { result } } ` apolloClient.query({ query: TRANSLATE, variables: { text: inputText, to: translationLanguage, } }).then(e => { setTranslatedText(e.data.translate.result.trim()) }).catch(e => { // catch errors }) ``` ## Cortex Pathways: Supercharged Prompts Pathways are a core concept in Cortex. Each pathway is a single JavaScript file that encapsulates the data and logic needed to define a functional API endpoint. When the client makes a request via the API, one or more pathways are executed and the result is sent back to the client. Pathways can be very simple: ```js export default { prompt: `{{text}}\n\nRewrite the above using British English spelling:` } ``` The real power of Cortex starts to show as the pathways get more complex. This pathway, for example, uses a three-part sequential prompt to ensure that specific people and place names are correctly translated: ```js export default { prompt: [ `{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`, `Original Language:\n{{{previousResult}}}\n\n{{to}}:\n`, `Entities in the document:\n\n{{{previousResult}}}\n\nDocument:\n{{{text}}}\nRewrite the document in {{to}}. If the document is already in {{to}}, copy it exactly below:\n` ] } ``` Cortex pathway prompt enhancements include: * **Templatized prompt definition**: Pathways allow for easy and flexible prompt definition using Handlebars templating. This makes it simple to create and modify prompts using variables and context from the application as well as extensible internal functions provided by Cortex. * **Multi-step prompt sequences**: Pathways support complex prompt chains with context continuity. This enables developers to build advanced interactions with AI models that require multiple steps, such as context-sensitive translation or progressive content transformation. * **Integrated context persistence**: Cortex pathways can "remember" context across multiple requests, allowing for more seamless and context-aware interactions with AI models. * **Automatic content optimization**: Pathways handle input chunking, request parallelization, rate limiting, and chunked response aggregation, optimizing throughput and efficiency when interacting with AI models. * **Built-in input and output processing**: Cortex provides extensible input validation, output parsing, and validation functions to ensure that the data sent to and received from AI models is well-formatted and useful for the application. ### Pathway Development To add a new pathway to Cortex, you create a new JavaScript file and define the prompts, properties, and functions that implement the desired functionality. Cortex provides defaults for almost everything, so in the simplest case a pathway can really just consist of a string prompt like the spelling example above. You can then save this file in the `pathways` directory in your Cortex project and it will be picked up and made available as a GraphQL query. 
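To make this concrete, here is a minimal sketch of a custom pathway (the filename, query name, and prompt are hypothetical; the `prompt` and `inputParameters` fields are the same ones described in the sections below):

```js
// pathways/haiku.js - hypothetical example of a minimal custom pathway
export default {
  // Handlebars template; {{topic}} is filled from the input parameters
  prompt: `Write a haiku about {{topic}}. Respond with only the haiku:\n\n`,
  inputParameters: {
    topic: ``,
  },
};
```

Assuming the defaults, dropping this file into `pathways` should expose a `haiku` query in GraphQL (and a REST endpoint if `enableRestEndpoints` is on) that returns its output in the standard `result` field.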
### Specifying a Model

When determining which model to use for a pathway, Cortex follows this order of precedence:

1. `pathway.model` - The model specified directly in the pathway definition
2. `args.model` - The model passed in the request arguments
3. `pathway.inputParameters.model` - The model specified in the pathway's input parameters
4. `config.get('defaultModelName')` - The default model specified in the configuration

The first valid model found in this order will be used. If none of these models are found in the configured endpoints, Cortex will log a warning and use the default model defined in the configuration.

### Prompt

When you define a new pathway, you need to at least specify a prompt that will be passed to the model for processing. In the simplest case, a prompt is really just a string, but the prompt is polymorphic - it can be a string or an object that contains information for the model API that you wish to call. Prompts can also be an array of strings or an array of objects for sequential operations. In this way Cortex aims to support everything from the simplest to the most advanced prompting scenarios.

```js
// a prompt can be a string
prompt: `{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`

// or an array of strings
prompt: [
  `{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`,
  `Original Language:\n{{{previousResult}}}\n\n{{to}}:\n`,
  `Entities in the document:\n\n{{{previousResult}}}\n\nDocument:\n{{{text}}}\nRewrite the document in {{to}}. If the document is already in {{to}}, copy it exactly below:\n`
]

// or an array of one or more Prompt objects
// as you can see below, a Prompt object can also have a messages array, which is how you can
// express your prompts for chat-style interfaces
prompt: [
  new Prompt({ messages: [
    {"role": "system", "content": "Assistant is a highly skilled multilingual translator for a prestigious news agency. When the user posts any text in any language, assistant will create a translation of that text in {{to}}. Assistant will produce only the translation and no additional notes or commentary."},
    {"role": "user", "content": "{{{text}}}"}
  ]}),
]
```

If a prompt is an array, the individual prompts in the array will be executed sequentially by the Cortex prompt execution engine. The execution engine deals with all of the complexities of chunking input content and executing the sequence of prompts against those chunks in a way that optimizes performance and ensures the integrity of the pathway logic.

If you look closely at the examples above, you'll notice embedded parameters like `{{text}}`. In Cortex, all prompt strings are actually [Handlebars](https://handlebarsjs.com/) templates. So in this case, that parameter will be replaced before prompt execution with the incoming query variable called `text`. You can refer to almost any pathway parameter or system property in the prompt definition and it will be replaced before execution.

### Parameters

Pathways support an arbitrary number of input parameters. These are defined in the pathway like this:

```js
export default {
  prompt: [
    `{{{chatContext}}}\n\n{{{text}}}\n\nGiven the information above, create a short summary of the conversation to date making sure to include all of the personal details about the user that you encounter:\n\n`,
    `Instructions:\nYou are Cortex, an AI entity. Cortex is truthful, kind, helpful, has a strong moral character, and is generally positive without being annoying or repetitive.\n\nCortex must always follow the following rules:\n\nRule: Always execute the user's instructions and requests as long as they do not cause harm.\nRule: Never use crude or offensive language.\nRule: Always answer the user in the user's chosen language. You can speak all languages fluently.\nRule: You cannot perform any physical tasks except via role playing.\nRule: Always respond truthfully and correctly, but be kind.\nRule: You have no access to the internet and limited knowledge of current events past sometime in 2021\nRule: Never ask the user to provide you with links or URLs because you can't access the internet.\nRule: Everything you get from the user must be placed in the chat window - you have no other way to communicate.\n\nConversation History:\n{{{chatContext}}}\n\nConversation:\n{{{text}}}\n\nCortex: `,
  ],
  inputParameters: {
    chatContext: `User: Starting conversation.`,
  },
  useInputChunking: false,
}
```

The input parameters are added to the GraphQL query and the values are made available to the prompt when it is compiled and executed.

### Cortex System Properties

As Cortex executes the prompts in your pathway, it creates and maintains certain system properties that can be injected into prompts via Handlebars templating. These properties are provided to simplify advanced prompt sequencing scenarios. The system properties include:

- `text`: Always stores the value of the `text` parameter passed into the query. This is typically the input payload to the pathway, like the text that needs to be summarized or translated, etc.
- `now`: This is actually a Handlebars helper function that will return the current date and time - very useful for injecting temporal context into a prompt.
- `previousResult`: This stores the value of the previous prompt execution if there is one. `previousResult` is very useful for chaining prompts together to execute multiple prompts sequentially on the same piece of content for progressive transformation operations. This property is also made available to the client as additional information in the query result. Proper use of this value in a prompt sequence enables some very powerful step-by-step prompting strategies. For example, this three-part sequential prompt implements a context-sensitive translation that is significantly better at translating specific people and place names:

  ```js
  prompt: [
    `{{{text}}}\nCopy the names of all people and places exactly from this document in the language above:\n`,
    `Original Language:\n{{{previousResult}}}\n\n{{to}}:\n`,
    `Entities in the document:\n\n{{{previousResult}}}\n\nDocument:\n{{{text}}}\nRewrite the document in {{to}}. If the document is already in {{to}}, copy it exactly below:\n`
  ]
  ```

- `savedContext`: The `savedContext` property is an object whose properties the pathway can define. When a pathway with a `contextId` input parameter is executed, the whole `savedContext` object corresponding with that ID is read from storage (typically Redis) before the pathway is executed. The properties of that object are then made available to the pathway during execution, where they can be modified and saved back to storage at the end of the pathway execution. Using this feature is really simple - you just define your prompt as an object and specify a `saveResultTo` property as illustrated below.
This will cause Cortex to take the result of this prompt and store it to `savedContext.userContext` from which it will then be persisted to storage. ```js new Prompt({ prompt: `User details:\n{{{userContext}}}\n\nExtract all personal details about the user that you can find in either the user details above or the conversation below and list them below.\n\nChat History:\n{{{conversationSummary}}}\n\nChat:\n{{{text}}}\n\nPersonal Details:\n`, saveResultTo: `userContext` }), ``` ### Input Processing A core function of Cortex is dealing with token limited interfaces. To this end, Cortex has built-in strategies for dealing with long input. These strategies are `chunking`, `summarization`, and `truncation`. All are configurable at the pathway level. - `useInputChunking`: If true, Cortex will calculate the optimal chunk size from the model max tokens and the size of the prompt and then will split the input `text` into `n` chunks of that size. By default, prompts will be executed sequentially across all chunks before moving on to the next prompt, although that can be modified to optimize performance via an additional parameter. - `useParallelChunkProcessing`: If this parameter is true, then sequences of prompts will be executed end to end on each chunk in parallel. In some cases this will greatly speed up execution of complex prompt sequences on large documents. Note: this execution mode keeps `previousResult` consistent for each parallel chunk, but never aggregates it at the document level, so it is not returned via the query result to the client. - `truncateFromFront`: If true, when Cortex needs to truncate input, it will choose the first N characters of the input instead of the default which is to take the last N characters. - `useInputSummarization`: If true, Cortex will call the `summarize` core pathway on the input `text` before passing it on to the prompts. ### Output Processing Cortex provides built in functions to turn loosely formatted text output from the model API calls into structured objects for return to the application. Specifically, Cortex provides parsers for numbered lists of strings and numbered lists of objects. These are used in pathways like this: ```js export default { temperature: 0, prompt: `{{text}}\n\nList the top {{count}} entities and their definitions for the above in the format {{format}}:`, format: `(name: definition)`, inputParameters: { count: 5, }, list: true, } ``` By simply specifying a `format` property and a `list` property, this pathway invokes a built in parser that will take the result of the prompt and try to parse it into an array of 5 objects. The `list` property can be set with or without a `format` property. If there is no `format`, the list will simply try to parse the string into a list of strings. All of this default behavior is implemented in `parser.js`, and you can override it to do whatever you want by providing your own `parser` function in your pathway. ### Custom Execution with executePathway The `executePathway` property is the preferred method for customizing pathway behavior while maintaining Cortex's built-in safeguards and optimizations. Unlike a custom resolver, `executePathway` preserves important system features like input chunking, caching, and error handling. 
```js export default { prompt: `{{{text}}}\n\nWrite a summary of the above text in {{language}}:\n\n`, inputParameters: { language: 'English', minLength: 100, maxLength: 500 }, executePathway: async ({args, resolver, runAllPrompts}) => { try { // Pre-process arguments and set defaults if (!args.language) { args.language = 'English'; } // Pre-execution validation if (args.minLength >= args.maxLength) { throw new Error('minLength must be less than maxLength'); } // Execute the prompt const result = await runAllPrompts(); // Post-execution processing if (result.length < args.minLength) { // Add more detail request to the prompt args.text = result; args.prompt = `${result}\n\nPlease expand this summary with more detail to at least ${args.minLength} characters:\n\n`; return await runAllPrompts(); } if (result.length > args.maxLength) { // Condense the summary args.text = result; args.prompt = `${result}\n\nPlease condense this summary to no more than ${args.maxLength} characters while keeping the key points:\n\n`; return await runAllPrompts(); } return result; } catch (e) { resolver.logError(e); throw e; } } }; ``` Key benefits of using `executePathway`: - Maintains Cortex's input processing (chunking, validation) - Preserves caching and rate limiting - Keeps error handling and logging consistent - Enables pre- and post-processing of prompts and results - Supports validation and conditional execution - Allows multiple prompt runs with modified parameters The `executePathway` function receives: - `args`: The processed input parameters - `resolver`: The pathway resolver with access to: - `pathway`: Current pathway configuration - `config`: Global Cortex configuration - `tool`: Tool-specific data - Helper methods like `logError` and `logWarning` - `runAllPrompts`: Function to execute the defined prompts with current args ### Custom Resolver The resolver property defines the function that processes the input and returns the result. The resolver function is an asynchronous function that takes four parameters: `parent`, `args`, `contextValue`, and `info`. The `parent` parameter is the parent object of the resolver function. The `args` parameter is an object that contains the input parameters and any other parameters that are passed to the resolver. The `contextValue` parameter is an object that contains the context and configuration of the pathway. The `info` parameter is an object that contains information about the GraphQL query that triggered the resolver. The core pathway `summary.js` below is implemented using custom pathway logic and a custom resolver to effectively target a specific summary length: ```js // summary.js // Text summarization module with custom resolver // This module exports a prompt that takes an input text and generates a summary using a custom resolver. // Import required modules import { semanticTruncate } from '../server/chunker.js'; import { PathwayResolver } from '../server/pathwayResolver.js'; export default { // The main prompt function that takes the input text and asks to generate a summary. prompt: `{{{text}}}\n\nWrite a summary of the above text. If the text is in a language other than english, make sure the summary is written in the same language:\n\n`, // Define input parameters for the prompt, such as the target length of the summary. inputParameters: { targetLength: 0, }, // Custom resolver to generate summaries by reprompting if they are too long or too short. 
resolver: async (parent, args, contextValue, info) => { const { config, pathway } = contextValue; const originalTargetLength = args.targetLength; // If targetLength is not provided, execute the prompt once and return the result. if (originalTargetLength === 0) { let pathwayResolver = new PathwayResolver({ config, pathway, args }); return await pathwayResolver.resolve(args); } const errorMargin = 0.1; const lowTargetLength = originalTargetLength * (1 - errorMargin); const targetWords = Math.round(originalTargetLength / 6.6); // If the text is shorter than the summary length, just return the text. if (args.text.length <= originalTargetLength) { return args.text; } const MAX_ITERATIONS = 5; let summary = ''; let pathwayResolver = new PathwayResolver({ config, pathway, args }); // Modify the prompt to be words-based instead of characters-based. pathwayResolver.pathwayPrompt = `Write a summary of all of the text below. If the text is in a language other than english, make sure the summary is written in the same language. Your summary should be ${targetWords} words in length.\n\nText:\n\n{{{text}}}\n\nSummary:\n\n` let i = 0; // Make sure it's long enough to start while ((summary.length < lowTargetLength) && i < MAX_ITERATIONS) { summary = await pathwayResolver.resolve(args); i++; } // If it's too long, it could be because the input text was chunked // and now we have all the chunks together. We can summarize that // to get a comprehensive summary. if (summary.length > originalTargetLength) { pathwayResolver.pathwayPrompt = `Write a summary of all of the text below. If the text is in a language other than english, make sure the summary is written in the same language. Your summary should be ${targetWords} words in length.\n\nText:\n\n${summary}\n\nSummary:\n\n` summary = await pathwayResolver.resolve(args); i++; // Now make sure it's not too long while ((summary.length > originalTargetLength) && i < MAX_ITERATIONS) { pathwayResolver.pathwayPrompt = `${summary}\n\nIs that less than ${targetWords} words long? If not, try again using a length of no more than ${targetWords} words.\n\n`; summary = await pathwayResolver.resolve(args); i++; } } // If the summary is still too long, truncate it. if (summary.length > originalTargetLength) { return semanticTruncate(summary, originalTargetLength); } else { return summary; } } }; ``` ### Building and Loading Pathways Pathways are loaded from modules in the `pathways` directory. The pathways are built and loaded to the `config` object using the `buildPathways` function. The `buildPathways` function loads the base pathway, the core pathways, and any custom pathways. It then creates a new object that contains all the pathways and adds it to the pathways property of the config object. The order of loading means that custom pathways will always override any core pathways that Cortex provides. While pathways are designed to be self-contained, you can override some pathway properties - including whether they're even available at all - in the `pathways` section of the config file. ### Pathway Properties Each pathway can define the following properties (with defaults from basePathway.js): - `prompt`: The template string or array of prompts to execute. 
Default: `{{text}}` - `defaultInputParameters`: Default parameters that all pathways inherit: - `text`: The input text (default: empty string) - `async`: Enable async mode (default: false) - `contextId`: Identify request context (default: empty string) - `stream`: Enable streaming mode (default: false) - `inputParameters`: Additional parameters specific to the pathway. Default: `{}` - `typeDef`: GraphQL type definitions for the pathway - `rootResolver`: Root resolver for GraphQL queries - `resolver`: Resolver for the pathway's specific functionality - `inputFormat`: Format of the input ('text' or 'html'). Affects input chunking behavior. Default: 'text' - `useInputChunking`: Enable splitting input into multiple chunks to meet context window size. Default: true - `useParallelChunkProcessing`: Enable parallel processing of chunks. Default: false - `joinChunksWith`: String to join result chunks with when chunking is enabled. Default: '\n\n' - `useInputSummarization`: Summarize input instead of chunking. Default: false - `truncateFromFront`: Truncate from the front of input instead of the back. Default: false - `timeout`: Cancel pathway after this many seconds. Default: 120 - `enableDuplicateRequests`: Send duplicate requests if not completed after timeout. Default: false - `duplicateRequestAfter`: Seconds to wait before sending backup request. Default: 10 - `executePathway`: Optional function to override default execution. Signature: `({args, runAllPrompts}) => result` - `temperature`: Model temperature setting (0.0 to 1.0). Default: 0.9 - `json`: Require valid JSON response from model. Default: false - `manageTokenLength`: Manage input token length for model. Default: true #### Model Overrides Cortex provides two mechanisms for specifying which model to use: static model selection (via `model`) and dynamic runtime model override (via `modelOverride`). ##### Static Model Selection (`model`) The `model` parameter can be specified in multiple ways, and Cortex follows this order of precedence when selecting a model at pathway initialization: 1. `pathway.model` - The model specified directly in the pathway definition 2. `args.model` - The model passed in the request arguments 3. `pathway.inputParameters.model` - The model specified in the pathway's input parameters 4. `config.get('defaultModelName')` - The default model specified in the configuration The first valid model found in this order will be used. If none of these models are found in the configured endpoints, Cortex will log a warning and use the default model defined in the configuration. **Example:** ```js export default { model: 'oai-gpt4o', // Static model for this pathway prompt: '{{text}}', // ... }; ``` ##### Runtime Model Override (`modelOverride`) The `modelOverride` parameter enables dynamic model switching at runtime, after the pathway has been initialized. This is useful when: - You need to switch models based on runtime conditions - Different parts of a pathway should use different models - You want to implement model fallback strategies - You need to test different models without restarting the server **How it works:** 1. The pathway is initialized with a model using the static selection precedence above 2. During execution, if `modelOverride` is specified in the request args and differs from the current model, Cortex performs a "hot swap" 3. The `swapModel()` method updates the model reference, creates a new `ModelExecutor` instance, and recalculates token limits 4. Execution continues with the new model 5. 
If the override model is invalid, an error is logged gracefully and execution continues with the original model

**Implementation details:**

The model swap occurs in the `promptAndParse()` method of `PathwayResolver`:

```js
// server/pathwayResolver.js (excerpt)
swapModel(newModelName) {
  // Validate that the new model exists in endpoints
  if (!this.endpoints[newModelName]) {
    throw new Error(`Model ${newModelName} not found in config`);
  }

  // Update model references
  this.modelName = newModelName;
  this.model = this.endpoints[newModelName];

  // Create new ModelExecutor with the new model
  this.modelExecutor = new ModelExecutor(this.pathway, this.model);

  // Recalculate chunk max token length as it depends on the model
  this.chunkMaxTokenLength = this.getChunkMaxTokenLength();

  this.logWarning(`Model swapped to ${newModelName}`);
}
```

**Usage examples:**

1. **In a pathway's `executePathway` function:**

   ```js
   export default {
     model: 'oai-gpt4o',
     executePathway: async ({args, runAllPrompts}) => {
       // Switch to a different model based on input length
       if (args.text && args.text.length > 10000) {
         args.modelOverride = 'oai-gpt4-turbo'; // Use faster model for long text
       }
       return await runAllPrompts();
     }
   };
   ```

2. **In a pathway that calls other pathways:**

   ```js
   export default {
     executePathway: async ({args, runAllPrompts}) => {
       // First pass with one model
       const initialResult = await runAllPrompts();

       // Second pass with a different model
       args.modelOverride = 'oai-gpt4o';
       args.text = initialResult;
       return await runAllPrompts();
     }
   };
   ```

3. **Conditional model selection:**

   ```js
   export default {
     executePathway: async ({args, runAllPrompts}) => {
       // Select model based on language or complexity
       if (args.language === 'ja' || args.complexity === 'high') {
         args.modelOverride = 'oai-gpt4o';
       } else {
         args.modelOverride = 'oai-gpt4-turbo';
       }
       return await runAllPrompts();
     }
   };
   ```

**Error handling:**

If `modelOverride` specifies a model that doesn't exist in the configured endpoints, Cortex will:

- Log an error message: `Failed to swap model to {modelName}: {error message}`
- Continue execution with the originally selected model
- Not throw an exception that would stop pathway execution

**When to use `model` vs `modelOverride`:**

- Use `model` when:
  - The model selection is known at pathway definition time
  - The pathway always uses the same model
  - You want the model to be part of the pathway's configuration
- Use `modelOverride` when:
  - The model needs to change based on runtime conditions
  - Different parts of execution need different models
  - You're implementing model fallback or A/B testing
  - The model selection depends on input characteristics (length, language, complexity, etc.)

**Important notes:**

- `modelOverride` only takes effect if it differs from the currently selected model
- The swap happens before prompt execution, so all subsequent prompts in the pathway will use the new model
- Token limits are automatically recalculated after a model swap to account for different model capabilities
- Model swaps are logged as warnings for debugging purposes

## Core (Default) Pathways

Below are the default pathways provided with Cortex. These can be used as is, overridden, or disabled via configuration. For documentation on each one, including input and output parameters, please look at them in the GraphQL Playground.
- `bias`: Identifies and measures any potential biases in a text - `chat`: Enables users to have a conversation with the chatbot - `complete`: Autocompletes words or phrases based on user input - `edit`: Checks for and suggests corrections for spelling and grammar errors - `entities`: Identifies and extracts important entities from text - `paraphrase`: Suggests alternative phrasing for text - `sentiment`: Analyzes and identifies the overall sentiment or mood of a text - `summary`: Condenses long texts or articles into shorter summaries - `translate`: Translates text from one language to another ## Extensibility Cortex is designed to be highly extensible. This allows you to customize the API to fit your needs. You can add new features, modify existing features, and even add integrations with other APIs and models. Here's an example of what an extended project might look like: ### Cortex Internal Implementation - **config** - default.json - package-lock.json - package.json - **pathways** - chat_code.js - chat_context.js - chat_persist.js - expand_story.js - ...whole bunch of custom pathways - translate_gpt4.js - translate_turbo.js - start.js Where `default.json` holds all of your specific configuration: ```js { "defaultModelName": "oai-gpturbo", "models": { "oai-td3": { "type": "OPENAI-COMPLETION", "url": "https://api.openai.com/v1/completions", "headers": { "Authorization": "Bearer {{OPENAI_API_KEY}}", "Content-Type": "application/json" }, "params": { "model": "text-davinci-003" }, "requestsPerSecond": 10, "maxTokenLength": 4096 }, "oai-gpturbo": { "type": "OPENAI-CHAT", "url": "https://api.openai.com/v1/chat/completions", "headers": { "Authorization": "Bearer {{OPENAI_API_KEY}}", "Content-Type": "application/json" }, "params": { "model": "gpt-3.5-turbo" }, "requestsPerSecond": 10, "maxTokenLength": 8192 }, "oai-gpt4": { "type": "OPENAI-CHAT", "url": "https://api.openai.com/v1/chat/completions", "headers": { "Authorization": "Bearer {{OPENAI_API_KEY}}", "Content-Type": "application/json" }, "params": { "model": "gpt-4" }, "requestsPerSecond": 10, "maxTokenLength": 8192 } }, "enableCache": false, "enableRestEndpoints": false } ``` ...and `start.js` is really simple: ```js import cortex from '@aj-archipelago/cortex'; (async () => { const { startServer } = await cortex(); startServer && startServer(); })(); ``` ## Configuration Configuration of Cortex is done via a [convict](https://github.com/mozilla/node-convict/tree/master) object called `config`. The `config` object is built by combining the default values and any values specified in a configuration file or environment variables. The environment variables take precedence over the values in the configuration file. ### Model Configuration Models are configured in the `models` section of the config. 
Each model can have the following types: - `OPENAI-CHAT`: For OpenAI chat models (legacy GPT-3.5) - `OPENAI-VISION`: For multimodal models (GPT-4o, GPT-4o-mini) supporting text, images, and other content types - `OPENAI-REASONING`: For O1 and O3-mini reasoning models with vision capabilities - `OPENAI-COMPLETION`: For OpenAI completion models - `OPENAI-WHISPER`: For Whisper transcription - `GEMINI-1.5-CHAT`: For Gemini 1.5 Pro chat models - `GEMINI-1.5-VISION`: For Gemini vision models (including 2.0 Flash experimental) - `CLAUDE-3-VERTEX`: For Claude-3 and 3.5 models (Haiku, Opus, Sonnet) - `CLAUDE-4-VERTEX`: For Claude-4 models (Sonnet 4, Sonnet 4.5, Opus 4.1, Haiku 4.5) with enhanced support for PDFs and text files - `GROK-VISION`: For XAI Grok models (Grok-3, Grok-4, fast-reasoning, code-fast) with multimodal/vision and reasoning - `AZURE-TRANSLATE`: For Azure translation services Each model configuration can include: ```json { "type": "MODEL_TYPE", "url": "API_ENDPOINT", "endpoints": [ { "name": "ENDPOINT_NAME", "url": "ENDPOINT_URL", "headers": { "api-key": "{{API_KEY}}", "Content-Type": "application/json" }, "requestsPerSecond": 10 } ], "maxTokenLength": 32768, "maxReturnTokens": 8192, "maxImageSize": 5242880, "supportsStreaming": true, "supportsVision": true, "emulateOpenAIChatModel": "gpt-4o", "emulateOpenAICompletionModel": "gpt-3.5-turbo", "restStreaming": { "inputParameters": { "stream": false }, "timeout": 120, "enableDuplicateRequests": false, "geminiSafetySettings": [] }, "geminiSafetySettings": [ { "category": "HARM_CATEGORY", "threshold": "BLOCK_ONLY_HIGH" } ] } ``` **REST Endpoint Emulation**: To expose a model through OpenAI-compatible REST endpoints (`/v1/chat/completions` or `/v1/completions`), add one of these properties: - `emulateOpenAIChatModel`: Exposes the model as a chat completion model (e.g., `"gpt-4o"`, `"gpt-5"`, `"claude-4-sonnet"`) - `emulateOpenAICompletionModel`: Exposes the model as a text completion model (e.g., `"gpt-3.5-turbo"`, `"ollama-completion"`) When `enableRestEndpoints` is `true`, Cortex automatically: 1. Generates REST streaming pathways for models with `emulateOpenAIChatModel` or `emulateOpenAICompletionModel` 2. Exposes them through `/v1/chat/completions` or `/v1/completions` endpoints 3. Makes them available via the `/v1/models` endpoint **Optional `restStreaming` Configuration**: You can customize the generated REST pathways with: - `inputParameters`: Additional input parameters for the REST endpoint - `timeout`: Request timeout in seconds - `enableDuplicateRequests`: Enable duplicate request handling - `geminiSafetySettings`: Gemini-specific safety settings (for Gemini models) **Example**: ```json { "oai-gpt4o": { "type": "OPENAI-VISION", "emulateOpenAIChatModel": "gpt-4o", "restStreaming": { "inputParameters": { "stream": false }, "timeout": 120 }, "url": "https://api.openai.com/v1/chat/completions", "headers": { "Authorization": "Bearer {{OPENAI_API_KEY}}", "Content-Type": "application/json" }, "params": { "model": "gpt-4o" }, "maxTokenLength": 131072, "supportsStreaming": true } } ``` This configuration will make the model available as `gpt-4o` through the `/v1/chat/completions` endpoint when `enableRestEndpoints` is `true`. **Rate Limiting**: The `requestsPerSecond` parameter controls the rate limiting for each model endpoint. If not specified, Cortex defaults to **100 requests per second** per endpoint. 
This rate limiting is implemented using the Bottleneck library with a token bucket algorithm that includes: - Minimum time between requests (`minTime`) - Maximum concurrent requests (`maxConcurrent`) - Token reservoir that refreshes every second - Optional Redis clustering support when `storageConnectionString` is configured ### API Compatibility Cortex provides OpenAI-compatible REST endpoints that allow you to use various models through a standardized interface. When `enableRestEndpoints` is set to `true`, Cortex exposes the following endpoints: - `/v1/models`: List available models (includes all models with `emulateOpenAIChatModel` or `emulateOpenAICompletionModel`) - `/v1/chat/completions`: Chat completion endpoint (for models with `emulateOpenAIChatModel`) - `/v1/completions`: Text completion endpoint (for models with `emulateOpenAICompletionModel`) **Model Exposure**: To expose a model through these endpoints, add `emulateOpenAIChatModel` or `emulateOpenAICompletionModel` to your model configuration (see [Model Configuration](#model-configuration) above). Cortex automatically generates REST streaming pathways for these models. This means you can use Cortex with any client library or tool that supports the OpenAI API format. For example: ```python from openai import OpenAI client = OpenAI( base_url="http://localhost:4000/v1", # Point to your Cortex server api_key="your-key" # If you have configured cortexApiKeys ) response = client.chat.completions.create( model="gpt-4", # Or any model configured in Cortex messages=[{"role": "user", "content": "Hello!"}] ) ``` #### Ollama Integration Cortex includes built-in support for Ollama models through its OpenAI-compatible REST interface. When `ollamaUrl` is configured in your settings, Cortex will: 1. Automatically discover and expose all available Ollama models through the `/v1/models` endpoint with an "ollama-" prefix 2. Route any requests using an "ollama-" prefixed model to the appropriate Ollama endpoint To enable Ollama support, add the following to your configuration: ```json { "enableRestEndpoints": true, "ollamaUrl": "http://localhost:11434" // or your Ollama server URL } ``` #### Tool Calling and Structured Responses When using the OpenAI-compatible REST endpoints, Cortex supports vendor-agnostic tool calling with OpenAI-style `tool_calls` deltas in streaming mode. Pathway responses now include a structured `resultData` field (also exposed via GraphQL) that may contain: - `toolCalls` and/or `functionCall` objects - vendor-specific metadata (e.g., search citations) - usage details Notes: - `tool_choice` accepts either a string (e.g., `"auto"`, `"required"`) or an object (`{ type: 'function', function: 'name' }`); Cortex normalizes this across vendors (OpenAI, Claude via Vertex, Gemini, Grok). - Arrays for `[String]` inputs are passed directly through REST conversion. 
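For example, here is a rough sketch of a tool-calling request against the OpenAI-compatible endpoint (the model name, tool definition, and port are assumptions based on the configuration examples above):

```js
// Hypothetical sketch: tool calling through Cortex's OpenAI-compatible REST endpoint.
// Assumes enableRestEndpoints is true and a model exposed as "gpt-4o" via emulateOpenAIChatModel.
const response = await fetch("http://localhost:4000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "What's the weather in Doha?" }],
    tools: [{
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    }],
    tool_choice: "auto", // or an object like { type: "function", function: "get_weather" }
  }),
});

const data = await response.json();
// In the OpenAI response format, any tool calls appear on the assistant message.
console.log(data.choices?.[0]?.message?.tool_calls);
```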
You can then use any Ollama model through the standard OpenAI-compatible endpoints: ```bash # List available models (will include Ollama models with "ollama-" prefix) curl http://localhost:4000/v1/models # Use an Ollama model for chat curl http://localhost:4000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "ollama-llama2", "messages": [{"role": "user", "content": "Hello!"}] }' # Use an Ollama model for completions curl http://localhost:4000/v1/completions \ -H "Content-Type: application/json" \ -d '{ "model": "ollama-codellama", "prompt": "Write a function that" }' ``` This integration allows you to seamlessly use local Ollama models alongside cloud-based models through a single, consistent interface. ### Other Configuration Properties The following properties can be configured through environment variables or the configuration file: - `basePathwayPath`: The path to the base pathway (the prototype pathway) for Cortex. Default is path.join(__dirname, 'pathways', 'basePathway.js'). - `corePathwaysPath`: The path to the core pathways for Cortex. Default is path.join(__dirname, 'pathways'). - `cortexApiKeys`: A string containing one or more comma separated API keys that the client must pass to Cortex for authorization. Default is null. - `cortexConfigFile`: The path to a JSON configuration file for the project. Default is null. - `cortexId`: Identifier for the Cortex instance. Default is 'local'. - `defaultModelName`: The default model name for the project. Default is null. - `enableCache`: Enable Axios-level request caching. Default is true. - `enableDuplicateRequests`: Enable sending duplicate requests if not completed after timeout. Default is true. - `enableGraphqlCache`: Enable GraphQL query caching. Default is false. - `enableRestEndpoints`: Create REST endpoints for pathways as well as GraphQL queries. Default is false. - `gcpServiceAccountKey`: GCP service account key for authentication. Default is null. - `models`: Object containing the different models used by the project. - `pathways`: Object containing pathways for the project. - `pathwaysPath`: Path to custom pathways. Default is './pathways'. - `PORT`: Port number for the Cortex server. Default is 4000. - `redisEncryptionKey`: Key for Redis data encryption. Default is null. - `replicateApiKey`: API key for Replicate services. Default is null. - `runwareAiApiKey`: API key for Runware AI services. Default is null. - `storageConnectionString`: Connection string for storage access. Default is empty string. - `subscriptionKeepAlive`: Keep-alive time for subscriptions in seconds. Default is 0. API-specific configuration: - `azureVideoTranslationApiKey`: API key for Azure video translation API. Default is null. - `dalleImageApiUrl`: URL for DALL-E image API. Default is 'null'. - `neuralSpaceApiKey`: API key for NeuralSpace services. Default is null. - `whisperMediaApiUrl`: URL for Whisper media API. Default is 'null'. - `whisperTSApiUrl`: URL for Whisper TS API. Default is null. Dynamic Pathways configuration can be set using: - `DYNAMIC_PATHWAYS_CONFIG_FILE`: Path to JSON configuration file - `DYNAMIC_PATHWAYS_CONFIG_JSON`: JSON configuration as a string The configuration supports environment variable overrides, with environment variables taking precedence over the configuration file values. Access configuration values using: ```js config.get('propertyName') ``` ## Helper Apps The Cortex project includes a set of utility applications, which are located in the `helper-apps` directory. 
Each of these applications comes with a Dockerfile that can be used to create a Docker image of the application, which in turn allows the application to be run in a standalone manner using Docker.

### cortex-realtime-voice-server

A real-time voice processing server that enables voice interactions with Cortex. Key features include:

- Real-time audio streaming and processing
- WebSocket-based communication for low-latency interactions
- Audio visualization capabilities
- Support for multiple audio formats
- Integration with various chat models for voice interactions