@hpbyte/h-codex-core
Core indexing and search functionality for h-codex
Core package for h-codex semantic code indexing and search.
- **AST-Based Chunking**: Parse code using tree-sitter for intelligent chunk boundaries
- **Semantic Embeddings**: Generate embeddings using OpenAI text-embedding models
- **File Discovery**: Explore codebases with configurable ignore patterns
- **Vector Search**: Store and search embeddings in PostgreSQL with pgvector
```bash
pnpm add @hpbyte/h-codex-core
```
Create a `.env` file with:
```
LLM_API_KEY=your_llm_api_key_here
# Optional: defaults to the OpenAI base URL
LLM_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
DB_CONNECTION_STRING=postgresql://postgres:password@localhost:5432/h-codex
```
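As a rough illustration of how these variables could be read with a fallback for the base URL, here is a minimal sketch. `loadConfig` and `CodexConfig` are hypothetical names and not part of the package; treating `LLM_API_KEY` and `DB_CONNECTION_STRING` as required, and falling back to `text-embedding-3-small` when `EMBEDDING_MODEL` is unset, are assumptions made for the example.

```typescript
// Hypothetical helper for illustration; not part of @hpbyte/h-codex-core.
interface CodexConfig {
  apiKey: string
  baseUrl: string
  embeddingModel: string
  dbConnectionString: string
}

const DEFAULT_BASE_URL = 'https://api.openai.com/v1'
// Assumption: the example value from the .env file is used as the fallback.
const DEFAULT_EMBEDDING_MODEL = 'text-embedding-3-small'

function loadConfig(env: Record<string, string | undefined>): CodexConfig {
  const apiKey = env.LLM_API_KEY
  const dbConnectionString = env.DB_CONNECTION_STRING

  // Assumption: both values are mandatory, so fail early when they are missing.
  if (!apiKey) throw new Error('LLM_API_KEY is required')
  if (!dbConnectionString) throw new Error('DB_CONNECTION_STRING is required')

  return {
    apiKey,
    // `||` treats an empty string the same as an unset variable.
    baseUrl: env.LLM_BASE_URL || DEFAULT_BASE_URL,
    embeddingModel: env.EMBEDDING_MODEL || DEFAULT_EMBEDDING_MODEL,
    dbConnectionString,
  }
}
```

In a Node.js process this would be called as `loadConfig(process.env)` after the `.env` file has been loaded.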
```typescript
import { indexer, semanticSearch } from '@hpbyte/h-codex-core'
// Index a codebase
const indexResult = await indexer.index('./path/to/codebase')
console.log(`Indexed ${indexResult.indexedFiles} files and ${indexResult.totalChunks} code chunks`)
// Search for code
const searchResults = await semanticSearch.search('database connection implementation')
console.log(searchResults)
```
Indexes code repositories by exploring files, chunking code, and generating embeddings.
```typescript
const stats = await indexer.index(
  path: string, // Path to the codebase
  options?: {
    ignorePatterns?: string[], // Additional glob patterns to ignore
    maxChunkSize?: number // Override default chunk size
  }
): Promise<{
  indexedFiles: number, // Number of indexed files
  totalChunks: number // Total code chunks created
}>
```
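`maxChunkSize` caps how large a chunk may grow. To illustrate the idea of a size cap, the sketch below splits text greedily on line boundaries. It is a simple line-based stand-in, not the tree-sitter AST chunking the package performs. `splitByMaxSize` and `LineChunk` are hypothetical names, and measuring size in characters is an assumption, since the unit of `maxChunkSize` is not stated here.

```typescript
// Hypothetical illustration of a size cap; the package itself chunks via AST.
interface LineChunk {
  content: string
  startLine: number // 1-based, inclusive
  endLine: number // 1-based, inclusive
}

function splitByMaxSize(source: string, maxChunkSize: number): LineChunk[] {
  if (maxChunkSize <= 0) throw new Error('maxChunkSize must be positive')

  const lines = source.split('\n')
  const chunks: LineChunk[] = []
  let buffer: string[] = []
  let size = 0 // character count of buffer once joined with '\n'
  let startLine = 1

  // Emit the buffered lines as one chunk ending at `endLine`.
  const flush = (endLine: number): void => {
    if (buffer.length === 0) return
    chunks.push({ content: buffer.join('\n'), startLine, endLine })
    buffer = []
    size = 0
  }

  lines.forEach((line, i) => {
    const lineNo = i + 1
    // Appending to a non-empty buffer costs one extra character for the newline.
    const added = line.length + (buffer.length > 0 ? 1 : 0)
    if (buffer.length > 0 && size + added > maxChunkSize) {
      flush(lineNo - 1)
      startLine = lineNo
    }
    buffer.push(line)
    size += buffer.length > 1 ? line.length + 1 : line.length
  })
  flush(lines.length)

  // Note: a single line longer than maxChunkSize still becomes its own chunk.
  return chunks
}
```

Tracking `startLine` and `endLine` per chunk mirrors the line ranges that search results report.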
Searches indexed code using natural-language queries.
```typescript
const results = await semanticSearch.search(
  query: string, // Natural language search query
  options?: {
    limit?: number, // Max results to return (default: 10)
    threshold?: number // Minimum similarity score (default: 0.5)
  }
): Promise<Array<{
  id: string, // Chunk identifier
  content: string, // Code content
  relativePath: string, // File path relative to indexed root
  absolutePath: string, // Absolute file path
  language: string, // Programming language
  startLine: number, // Starting line in file
  endLine: number, // Ending line in file
  score: number // Similarity score (0-1)
}>>
```
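The `limit` and `threshold` options can be pictured as a post-filter over scored results. The sketch below is illustrative only: `applySearchOptions` and `ScoredChunk` are hypothetical names, the package may well apply these constraints inside the database query instead, and treating the threshold as inclusive is an assumption.

```typescript
// Hypothetical illustration; not the package's implementation.
interface ScoredChunk {
  id: string
  score: number // Similarity score, higher is more similar
}

function applySearchOptions<T extends ScoredChunk>(
  results: T[],
  options: { limit?: number; threshold?: number } = {},
): T[] {
  // Defaults taken from the documented signature: limit 10, threshold 0.5.
  const { limit = 10, threshold = 0.5 } = options
  return results
    .filter(r => r.score >= threshold) // drop weak matches (inclusive: assumption)
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, limit) // cap the number of results
}
```

Because `filter` returns a new array, the input list is left unmodified.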
- **Explorer** (`ingestion/explorer/`) - Discover files in repositories
- **Chunker** (`ingestion/chunker/`) - Parse and chunk code using AST
- **Embedder** (`ingestion/embedder/`) - Generate semantic embeddings
- **Indexer** (`ingestion/indexer/`) - Orchestrate the full ingestion pipeline
- **Repository** (`storage/repository/`) - Database operations for chunks and embeddings
- **Schema** (`storage/schema/`) - Drizzle ORM schema definitions
- **Migrations** - Managed with Drizzle ORM
- **Semantic Search** (`search/`) - Vector similarity search with filtering
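
For intuition about what vector similarity search compares, here is a minimal cosine similarity sketch. pgvector exposes cosine distance through its `<=>` operator, but whether h-codex uses that metric, and how it maps the result onto the 0-1 `score`, are assumptions. `cosineSimilarity` is a hypothetical helper, not a package export.

```typescript
// Hypothetical illustration of the comparison behind vector search.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same dimension')

  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }

  // A zero vector has no direction; report no similarity instead of NaN.
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

In practice the database computes this over stored embeddings, so application code rarely needs to evaluate it directly.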
```bash
pnpm install
pnpm run db:migrate
pnpm build
pnpm dev
```
This project is licensed under the MIT License.