# @beshkenadze/eyecite

[![npm version](https://badge.fury.io/js/%40beshkenadze%2Feyecite.svg)](https://badge.fury.io/js/%40beshkenadze%2Feyecite)
[![License: BSD-2-Clause](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)

A TypeScript library for extracting legal citations from text strings. This is a complete port of the Python [eyecite](https://github.com/freelawproject/eyecite) library, providing fast and accurate legal citation extraction for JavaScript and TypeScript applications.

## Features

- **Comprehensive citation extraction**: Identifies case citations, short form citations, supra citations, and id citations
- **Rich metadata extraction**: Extracts case names, pin cites, years, courts, and parentheticals
- **Robust parsing**: Handles complex citation formats, nested parentheticals, and parallel citations
- **High performance**: Built with modern TypeScript and optimized for speed
- **TypeScript-first**: Full type definitions and excellent IDE support
- **Well-tested**: Comprehensive test suite with 100+ passing tests

## Installation

```bash
npm install @beshkenadze/eyecite
```

```bash
yarn add @beshkenadze/eyecite
```

```bash
pnpm add @beshkenadze/eyecite
```

```bash
bun add @beshkenadze/eyecite
```

## Quick Start

```typescript
import { getCitations } from '@beshkenadze/eyecite'

const text = 'See Lissner v. Test, 1 U.S. 1, 5 (1982).'
const citations = getCitations(text)

console.log(citations[0].toString()) // "1 U.S. 1"
console.log(citations[0].metadata.plaintiff) // "Lissner"
console.log(citations[0].metadata.defendant) // "Test"
console.log(citations[0].metadata.pinCite) // "5"
console.log(citations[0].year) // 1982
```

## Supported Citation Types

### Full Case Citations

Extracts complete case citations with case names, reporters, and metadata:

```typescript
const citations = getCitations('Lissner v. Test, 1 U.S. 1, 5 (1982)')
// Extracts: volume=1, reporter=U.S., page=1, pinCite=5, year=1982
```

### Short Form Citations

Handles abbreviated citations referring to previously cited cases:

```typescript
const citations = getCitations('1 U.S. at 5')
// Extracts short form with pin cite
```

### Supra Citations

Identifies supra references:

```typescript
const citations = getCitations('Lissner, supra, at 5')
// Extracts supra citation with antecedent and pin cite
```

### Id Citations

Recognizes id. citations:

```typescript
const citations = getCitations('Id. at 5')
// Extracts id citation with pin cite
```

## Advanced Usage

### Text Cleaning

Clean text before citation extraction:

```typescript
import { getCitations, cleanText } from '@beshkenadze/eyecite'

const dirtyText = 'See Lissner v. Test, 1 U.S. 1'
// Apply the 'all_whitespace' cleaning step before extracting
const cleanedText = cleanText(dirtyText, ['all_whitespace'])
const citations = getCitations(cleanedText)
```

### Custom Tokenization

Use custom tokenizers for specialized needs:

```typescript
import { getCitations, DefaultTokenizer, REPORTERS } from '@beshkenadze/eyecite'

const text = 'See Lissner v. Test, 1 U.S. 1, 5 (1982).'
const customTokenizer = new DefaultTokenizer(REPORTERS)
// Arguments: text, removeAmbiguous, tokenizer
const citations = getCitations(text, false, customTokenizer)
```

### Parallel Citations

Handle parallel citations with metadata sharing:

```typescript
const text = 'Lissner v. Test, 1 U.S. 1, 1 S. Ct. 2 (1982)'
const citations = getCitations(text)
// Both citations share metadata (year, case names, etc.)
```

## API Reference

### `getCitations(text, removeAmbiguous?, tokenizer?, markupText?, cleanSteps?)`

Main function to extract citations from text.

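
To make the shape of the extracted data more concrete, here is a toy, self-contained sketch of regex-based citation matching. It is not the library's implementation or API: `ToyCitation`, `TOY_REPORTERS`, and `findToyCitations` are hypothetical names invented for illustration, and the three-entry reporter list stands in for the full reporter database that the library ships. The sketch only recognizes the `volume reporter page[, pin cite] [(year)]` pattern and does not handle case names, short forms, supra, or id. citations.

```typescript
// Toy illustration only: NOT the library's implementation or API.
// All names below are hypothetical.
interface ToyCitation {
  volume: string
  reporter: string
  page: string
  pinCite?: string
  year?: number
}

// A tiny hard-coded reporter list (assumption for this sketch).
const TOY_REPORTERS = ['U.S.', 'S. Ct.', 'F.2d']

// Escape regex metacharacters so reporter abbreviations match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

function findToyCitations(text: string): ToyCitation[] {
  const reporters = TOY_REPORTERS.map(escapeRegExp).join('|')
  // volume reporter page, then an optional ", pin" that is not itself the
  // volume of a following parallel citation, then an optional " (year)".
  const pattern = new RegExp(
    `(\\d+) (${reporters}) (\\d+)` +
      `(?:, (\\d+)\\b(?! (?:${reporters})))?` +
      `(?: \\((\\d{4})\\))?`,
    'g',
  )
  const results: ToyCitation[] = []
  let m: RegExpExecArray | null
  while ((m = pattern.exec(text)) !== null) {
    results.push({
      volume: m[1],
      reporter: m[2],
      page: m[3],
      pinCite: m[4],
      year: m[5] ? Number(m[5]) : undefined,
    })
  }
  return results
}

const toy = findToyCitations('See Lissner v. Test, 1 U.S. 1, 5 (1982).')
console.log(toy[0])
```

Unlike the real library, this sketch does not share metadata between parallel citations: for `'Lissner v. Test, 1 U.S. 1, 1 S. Ct. 2 (1982)'` it finds two matches, but only the second carries the year. The parameters of the actual `getCitations` function follow.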

**Parameters:**

- `text` (string): The text to extract citations from
- `removeAmbiguous` (boolean, optional): Whether to remove ambiguous citations
- `tokenizer` (Tokenizer, optional): Custom tokenizer instance
- `markupText` (string, optional): HTML markup version of the text
- `cleanSteps` (string[], optional): Text cleaning steps to apply

**Returns:** `CitationBase[]` - Array of extracted citations

### Citation Types

- `FullCaseCitation`: Complete case citations with full metadata
- `ShortCaseCitation`: Abbreviated citations (e.g., "1 U.S. at 5")
- `SupraCitation`: Supra references
- `IdCitation`: Id. citations
- `UnknownCitation`: Unrecognized citation patterns

### Utility Functions

- `cleanText(text, steps)`: Clean text using specified cleaning steps
- `resolveCitations(citations)`: Resolve supra and id citations
- `filterCitations(citations)`: Remove overlapping citations

## Implementation Status

eyecite-ts provides comprehensive support for case citations with the following features:

### ✅ Fully Implemented

- **Full case citations** with complete metadata extraction
- **Short form citations** including "at" format
- **Supra and Id citations** with antecedent resolution
- **Parallel citations** with metadata sharing
- **Nested parentheticals** with balanced parsing
- **Citation filtering** and overlap detection
- **Text cleaning** utilities
- **Custom tokenization** support
- **Reference citation extraction** using resolved case names

### ⚠️ Planned Features

- **Law citations**: Statutory citations (e.g., "42 U.S.C. § 1983") - requires LAWS data
- **Journal citations**: Law review citations - requires JOURNALS data
- **HTML annotation**: Citation markup in HTML - annotation system planned

## Performance

eyecite-ts is optimized for performance:

- **Fast tokenization**: Efficient regex-based tokenization
- **Minimal allocations**: Optimized for low memory usage
- **Batch processing**: Handles large documents efficiently
- **TypeScript optimizations**: Built with modern TypeScript features

## Testing

The library includes a comprehensive test suite with 100+ tests covering all citation types and edge cases:

```bash
npm test
```

## Development

This project uses [Bun](https://bun.sh) as the primary runtime and package manager.

### Setup

```bash
# Install dependencies
bun install

# Run tests
bun test

# Type check
bun run typecheck

# Format and lint with Biome
bun run check

# Build for production
bun run build
```

### Project Structure

```
eyecite-ts/
├── src/
│   ├── models/          # Citation and token type definitions
│   ├── tokenizers/      # Text tokenization logic
│   ├── find.ts          # Citation extraction
│   ├── resolve.ts       # Citation resolution
│   ├── clean.ts         # Text cleaning utilities
│   ├── helpers.ts       # Helper functions
│   └── data/            # Reporter and court databases
├── tests/               # Comprehensive test suite
└── dist/                # Built output (ESM + CJS)
```

## Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to the [GitHub repository](https://github.com/freelawproject/eyecite-ts).

## License

This project is licensed under the BSD-2-Clause License - see the [LICENSE](LICENSE) file for details.

## Credits

This is a TypeScript port of the Python [eyecite](https://github.com/freelawproject/eyecite) library created by the [Free Law Project](https://free.law/). The original library is used to process millions of legal documents for [CourtListener](https://www.courtlistener.com/) and Harvard's [Caselaw Access Project](https://case.law/).


## Support

- 📚 [Documentation](https://github.com/freelawproject/eyecite-ts#readme)
- 🐛 [Issue Tracker](https://github.com/freelawproject/eyecite-ts/issues)
- 💬 [Discussions](https://github.com/freelawproject/eyecite-ts/discussions)