# BODS Data Extractor
A TypeScript library and CLI tool for converting BODS (Bus Open Data Service) UK dataset bus line data from XML to structured JSON format.
## Features
- Converts BODS XML files to structured JSON
- Extracts stop points, vehicle journeys, and location data
- Type-safe TypeScript implementation
- Command-line interface for batch processing
- Can be used as a library in other projects
- Comprehensive test coverage with snapshot testing
- Built with Bun for fast performance
## Installation
### As a CLI tool
```bash
# Clone the repository
git clone https://github.com/DRFR0ST/bods-data-extractor-js.git
cd bods-data-extractor-js
# Install dependencies
bun install
# Make CLI globally available (optional)
bun link
```
### As a library
```bash
bun add bods-data-extractor
# or
npm install bods-data-extractor
```
## Usage
### Command Line Interface
```bash
# Convert a single XML file
bun run cli input/file.xml
# Convert to specific output directory
bun run cli input/file.xml output/
# Process multiple files
for file in input/*.xml; do
  bun run cli "$file" output/
done
```
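
The shell loop above can also be written as a script against the library API. The following is a minimal sketch, not part of the package: `convertAll` and its `convert` parameter are hypothetical names, and the output naming (`file.xml` becomes `file.json`) is an assumption rather than documented CLI behavior. The converter is passed in as an argument so the sketch stays self-contained.

```typescript
import { mkdirSync, readdirSync, writeFileSync } from "node:fs";
import { basename, extname, join } from "node:path";

// Hypothetical helper: converts every .xml file in inputDir and writes
// one .json file per input into outputDir. Returns the written paths.
function convertAll(
  inputDir: string,
  outputDir: string,
  convert: (xmlPath: string) => unknown, // e.g. convertBodsXmlToJson
): string[] {
  mkdirSync(outputDir, { recursive: true });
  const written: string[] = [];

  for (const entry of readdirSync(inputDir).sort()) {
    // Skip anything that is not an XML file.
    if (extname(entry).toLowerCase() !== ".xml") continue;

    const result = convert(join(inputDir, entry));
    // Assumed naming scheme: input/file.xml -> output/file.json
    const target = join(outputDir, `${basename(entry, extname(entry))}.json`);
    writeFileSync(target, JSON.stringify(result, null, 2));
    written.push(target);
  }
  return written;
}
```

In real use you would call something like `convertAll("input", "output", convertBodsXmlToJson)` after importing the converter from the package.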
### As a Library
```typescript
import { convertBodsXmlToJson } from 'bods-data-extractor';
// Convert XML file to structured JSON
const result = convertBodsXmlToJson('./path/to/bods-file.xml');
console.log(result);
// {
//   stopPoints: [...],
//   location: [...],
//   startTime: [...]
// }
```
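
Once you have a result, a common next step is looking up stops by ID. The sketch below assumes each `stopPoints` entry carries `id` and `name` string fields, as described under Output Structure. `indexStopNames` is a hypothetical helper, and the sample data is illustrative rather than taken from a real BODS file.

```typescript
// Only the fields used here are declared.
interface StopPoint {
  id: string;
  name: string;
}

// Hypothetical helper: builds an id -> name lookup from result.stopPoints.
function indexStopNames(stopPoints: StopPoint[]): Map<string, string> {
  const names = new Map<string, string>();
  for (const stop of stopPoints) {
    names.set(stop.id, stop.name);
  }
  return names;
}

// Illustrative sample data standing in for result.stopPoints.
const sampleStops: StopPoint[] = [
  { id: "STOP_A", name: "High Street" },
  { id: "STOP_B", name: "Market Square" },
];

const stopNames = indexStopNames(sampleStops);
console.log(stopNames.get("STOP_B")); // "Market Square"
```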
### Output Structure
The converter produces a structured JSON object with the following format:
```typescript
interface BodsOutput {
  stopPoints: StopPoint[];     // Bus stops with IDs and names
  location: Location[];        // Geographic coordinates for route segments
  startTime: VehicleJourney[]; // Journey schedules and timing
}

interface StopPoint {
  id: string;
  name: string;
}

interface Location {
  from: string;
  to: string;
  longitude: string;
  latitude: string;
}

interface VehicleJourney {
  time: string;
  VehicleJourneyCode: string;
  routeSegments: RouteSegment[];
}
```
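
Because `longitude` and `latitude` are strings, consumers will usually want to parse them before doing any geometry. The following is a rough sketch under two assumptions that the format above does not confirm: that `from` and `to` hold stop IDs matching `StopPoint.id`, and that the coordinate strings are plain decimal numbers. `describeSegments` and `NamedSegment` are hypothetical names, and the sample values are made up.

```typescript
interface StopPoint { id: string; name: string; }
interface Location { from: string; to: string; longitude: string; latitude: string; }

// Hypothetical shape: a segment with resolved stop names and numeric coordinates.
interface NamedSegment {
  fromName: string;
  toName: string;
  longitude: number;
  latitude: number;
}

function describeSegments(stopPoints: StopPoint[], locations: Location[]): NamedSegment[] {
  const names = new Map<string, string>(
    stopPoints.map((s): [string, string] => [s.id, s.name]),
  );
  const segments: NamedSegment[] = [];

  for (const loc of locations) {
    const longitude = Number.parseFloat(loc.longitude);
    const latitude = Number.parseFloat(loc.latitude);
    // Skip entries whose coordinates cannot be parsed.
    if (Number.isNaN(longitude) || Number.isNaN(latitude)) continue;

    segments.push({
      // Fall back to the raw ID when no matching stop is found.
      fromName: names.get(loc.from) ?? loc.from,
      toName: names.get(loc.to) ?? loc.to,
      longitude,
      latitude,
    });
  }
  return segments;
}

// Illustrative sample data, not taken from a real BODS file.
const segments = describeSegments(
  [
    { id: "STOP_A", name: "High Street" },
    { id: "STOP_B", name: "Market Square" },
  ],
  [
    { from: "STOP_A", to: "STOP_B", longitude: "-1.5491", latitude: "53.8008" },
    { from: "STOP_B", to: "STOP_C", longitude: "", latitude: "53.8" }, // skipped
  ],
);
console.log(segments.length); // 1
```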
## Development
### Prerequisites
- [Bun](https://bun.sh/) runtime
- TypeScript 5+
### Setup
```bash
# Clone and install dependencies
git clone https://github.com/DRFR0ST/bods-data-extractor-js.git
cd bods-data-extractor-js
bun install
```
### Running Tests
```bash
# Run all tests
bun test
# Run tests in watch mode
bun test --watch
# Run with coverage
bun test --coverage
```
### Project Structure
```
src/
├── types/bods.ts                        # TypeScript type definitions
├── utils/xml-parser.ts                  # XML parsing utilities
├── extractors/
│   ├── stop-points.ts                   # Stop point extraction
│   ├── vehicle-journeys.ts              # Vehicle journey extraction
│   ├── locations.ts                     # Location data extraction
│   └── journey-pattern-timing-links.ts
├── converter.ts                         # Main conversion logic
├── cli.ts                               # Command line interface
└── index.ts                             # Library exports
test/
├── converter.test.ts                    # Unit tests for converter
├── cli.test.ts                          # CLI integration tests
├── fixtures/                            # Test XML files
└── snapshots/                           # Expected output snapshots
```
### Scripts
```bash
bun run start # Run the main script
bun run cli # Run CLI tool
bun run test # Run tests
bun run test:watch # Run tests in watch mode
bun run test:coverage # Run tests with lcov coverage report
bun run test:coverage-text # Run tests with text coverage output
bun run dev # Run CLI in development mode with watch
bun run build # Build for distribution
bun run typecheck # Type checking only
```
## Testing
The test suite covers three kinds of tests:
- **Unit Tests**: Test individual components and functions
- **Integration Tests**: Test the CLI and end-to-end conversion
- **Snapshot Tests**: Ensure output format consistency across changes
### Snapshot Testing
The project uses snapshot tests to ensure the output structure remains consistent. When the expected output changes legitimately, delete the affected snapshot files and run the tests again to regenerate them.
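
To illustrate why deleting a snapshot file causes it to be regenerated, here is a minimal sketch of file-based snapshot comparison. It is not the project's test code: `matchesSnapshot` is a hypothetical helper, and the actual tests may rely on Bun's built-in snapshot support or a different file layout.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical helper: compares a value against a stored JSON snapshot.
// If the snapshot file is missing, it is created from the current value.
function matchesSnapshot(snapshotPath: string, actual: unknown): boolean {
  const serialized = JSON.stringify(actual, null, 2);

  if (!existsSync(snapshotPath)) {
    // First run (or snapshot deleted): record the current output.
    mkdirSync(dirname(snapshotPath), { recursive: true });
    writeFileSync(snapshotPath, serialized);
    return true;
  }

  // Later runs: any difference from the recorded output is a failure.
  return readFileSync(snapshotPath, "utf8") === serialized;
}
```

With this scheme, an intentional change to the output format fails the comparison until the stale snapshot file is removed and the test is rerun.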
## CI/CD
The project includes GitHub Actions workflows that:
- Run tests on multiple operating systems (Ubuntu, Windows, macOS)
- Test with different Bun versions
- Generate test coverage reports
- Validate CLI functionality with real files
- Test package builds
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for your changes
5. Ensure all tests pass (`bun test`)
6. Commit your changes (`git commit -m 'Add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## About BODS
The Bus Open Data Service (BODS) is a UK government initiative that provides access to bus data across England. This tool helps convert the XML format used by BODS into a more developer-friendly JSON structure.
For more information about BODS, visit: https://www.bus-data.dft.gov.uk/