# @dataset.sh/client

TypeScript client library for dataset.sh, a dataset management system that supports both local and remote storage with seamless transfer between them.

## Installation

```bash
npm install @dataset.sh/client
# or
pnpm add @dataset.sh/client
# or
yarn add @dataset.sh/client
```

## Features

- 💾 **Local Storage**: Manage datasets on your local filesystem
- ☁️ **Remote Storage**: Connect to dataset.sh servers
- 🔄 **Seamless Transfer**: Upload/download with resume support
- 📦 **Version Control**: Track dataset versions with checksums
- 🏷️ **Tagging System**: Tag versions for easy reference
- 📊 **Progress Tracking**: Monitor transfer progress in real time
- 🔐 **Authentication**: API key authentication for remote servers

## Quick Start

### Local Storage

```typescript
import { LocalStorage } from '@dataset.sh/client';

// Initialize local storage
const storage = new LocalStorage({ location: './my-datasets' });

// Create a dataset
const dataset = storage.dataset('my-namespace/my-dataset');

// Import data from collections
await dataset.importData(
  {
    users: {
      name: 'users',
      data: [
        { id: 1, name: 'Alice', age: 30 },
        { id: 2, name: 'Bob', age: 25 }
      ],
      typeAnnotation: 'Array<{id: number, name: string, age: number}>'
    }
  },
  null,               // Type dictionary (optional)
  ['v1.0', 'latest'], // Tags
  'My user dataset'   // Description
);

// Access the latest version
const latest = dataset.latest();
if (latest) {
  const reader = latest.open();
  const users = reader.collection('users');
  for (const user of users) {
    console.log(user);
  }
}
```

### Remote Storage

```typescript
import { RemoteClient } from '@dataset.sh/client';

// Initialize remote client
const client = new RemoteClient({
  host: 'https://api.dataset.sh',
  accessKey: 'your-api-key'
});

// Access a remote dataset
const dataset = client.dataset('my-namespace/my-dataset');

// List all versions
const versions = await dataset.versions();
console.log('Available versions:', versions.map(v => v.getVersion()));

// Get latest version
const latest = await dataset.latest();
if (latest) {
  // Read README
  const readme = await latest.getReadme();
  console.log(readme);
}
```

## Usage Guide

### Local Client API

#### LocalStorage

`LocalStorage` is the main entry point for local dataset operations.

```typescript
import { LocalStorage } from '@dataset.sh/client';

// Initialize with custom location
const storage = new LocalStorage({ location: '/path/to/datasets' });

// List all namespaces
const namespaces = storage.namespaces();

// Access a specific namespace
const namespace = storage.namespace('my-namespace');

// List all datasets
const datasets = storage.datasets();

// Access a specific dataset
const dataset = storage.dataset('namespace/dataset-name');
```

#### LocalDataset

Manage dataset versions and tags.

```typescript
// `dataset` is a LocalDataset, e.g. from storage.dataset('namespace/dataset-name')

// Import from file
const fileVersion = dataset.importFile('./data.dataset', {
  replace: false,      // Don't replace if version exists
  removeSource: false, // Keep source file
  tags: ['v1.0'],      // Apply tags
  asLatest: true       // Mark as latest
});

// Import from data
const dataVersion = await dataset.importData(
  {
    collection1: { name: 'collection1', data: [/* records */] },
    collection2: { name: 'collection2', data: [/* records */] }
  },
  null,               // Type dictionary (optional)
  ['v2.0', 'latest'], // Tags
  'Dataset description'
);

// Manage versions
const versions = dataset.versions();     // List all versions
const v1 = dataset.version('abc123...'); // Get specific version
const latest = dataset.latest();         // Get latest tagged version

// Manage tags
dataset.setTag('stable', 'abc123...');
dataset.removeTag('beta');
const tags = dataset.tags();                   // Get all tags
const stableId = dataset.resolveTag('stable'); // Resolve tag to version ID
```

### Transfer Between Local and Remote

The transfer module moves dataset versions between local and remote storage and can report progress as it goes.
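How resume support and progress reporting work inside the transfer module is not documented here. The sketch below is for illustration only and shows one common approach. It plans HTTP `Range` requests for the bytes still missing after an interrupted transfer, then reports cumulative progress after each chunk. `planResumeRanges`, `toRangeHeader`, `simulateTransfer`, `ByteRange`, and the `ProgressReporter` shape are hypothetical names, not part of `@dataset.sh/client`. The premise that the server honors `Range` requests is also an assumption.

```typescript
// Hypothetical sketch, NOT the @dataset.sh/client implementation.
// Assumes the server honors HTTP Range requests so a transfer can resume.

/** Inclusive byte range, as used by the HTTP Range header. */
interface ByteRange {
  start: number;
  end: number; // inclusive
}

/** Hypothetical progress callback shape, for illustration only. */
interface ProgressReporter {
  onProgress(transferred: number, total: number): void;
}

/**
 * Split the bytes still missing from a file into fixed-size ranges,
 * skipping the `alreadyHave` bytes an interrupted transfer already wrote.
 */
function planResumeRanges(
  totalBytes: number,
  alreadyHave: number,
  chunkSize: number,
): ByteRange[] {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  const ranges: ByteRange[] = [];
  for (let start = alreadyHave; start < totalBytes; start += chunkSize) {
    // The last byte of the file is totalBytes - 1.
    ranges.push({ start, end: Math.min(start + chunkSize, totalBytes) - 1 });
  }
  return ranges;
}

/** Format a range as an HTTP Range header value, e.g. "bytes=0-99". */
function toRangeHeader(range: ByteRange): string {
  return `bytes=${range.start}-${range.end}`;
}

/** Walk the plan without any I/O, reporting cumulative progress per chunk. */
function simulateTransfer(
  totalBytes: number,
  alreadyHave: number,
  chunkSize: number,
  reporter: ProgressReporter,
): number {
  let transferred = alreadyHave;
  for (const range of planResumeRanges(totalBytes, alreadyHave, chunkSize)) {
    // A real client would fetch this range and append it to the partial file.
    transferred += range.end - range.start + 1;
    reporter.onProgress(transferred, totalBytes);
  }
  return transferred;
}

// Usage: a 10-byte file with 4 bytes already on disk, fetched in 3-byte chunks.
const progressLog: string[] = [];
const finalBytes = simulateTransfer(10, 4, 3, {
  onProgress: (done, total) => progressLog.push(`${done}/${total}`),
});
console.log(planResumeRanges(10, 4, 3).map(toRangeHeader).join(', ')); // bytes=4-6, bytes=7-9
console.log(progressLog.join(', ')); // 7/10, 10/10
console.log(finalBytes); // 10
```

The plan is kept as plain byte offsets so the resume logic can be tested without a network. A real client would replace the loop body with a ranged fetch that appends to the partial file.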
#### Download from Remote to Local

```typescript
import {
  LocalStorage,
  RemoteClient,
  download,
  downloadToFile,
  ConsoleDownloadProgressReporter
} from '@dataset.sh/client';

// Setup
const remote = new RemoteClient({ host: 'https://api.dataset.sh', accessKey: 'key' });
const local = new LocalStorage({ location: './datasets' });

// Get source and target
const remoteDataset = remote.dataset('namespace/dataset');
const localDataset = local.dataset('namespace/dataset');
const remoteVersion = await remoteDataset.latest();
if (!remoteVersion) throw new Error('Remote dataset has no latest version');
const localVersion = localDataset.version('abc123'); // Target version

// Download with progress tracking
await download(
  remoteVersion,
  localVersion,
  new ConsoleDownloadProgressReporter()
);

// Or download to a specific file
await downloadToFile(
  remoteVersion,
  './downloads/dataset.zip',
  new ConsoleDownloadProgressReporter()
);
```

#### Upload from Local to Remote

```typescript
import {
  upload,
  uploadFromFile,
  ConsoleUploadProgressReporter
} from '@dataset.sh/client';

// Reuses `localDataset` and `remoteDataset` from the download example.
// Get source and target
const localVersion = localDataset.latest();
if (!localVersion) throw new Error('Local dataset has no latest version');
const remoteVersion = remoteDataset.version('new-version-id');

// Upload with progress tracking
await upload(
  localVersion,
  remoteVersion,
  new ConsoleUploadProgressReporter()
);

// Or upload from a file
await uploadFromFile(
  './my-dataset.zip',
  remoteVersion,
  new ConsoleUploadProgressReporter()
);
```

## Storage Structure

### Local Storage Layout

```
storage-base/
├── namespace1/
│   ├── dataset1/
│   │   ├── version/
│   │   │   ├── abc123.../
│   │   │   │   └── abc123.../
│   │   │   │       ├── file.dataset
│   │   │   │       ├── readme
│   │   │   │       └── cache/
│   │   │   │           ├── meta.json
│   │   │   │           ├── data_sample_*.jsonl
│   │   │   │           └── typing_*.tl
│   │   │   └── def456.../
│   │   └── tag/
│   │       ├── latest
│   │       ├── v1.0
│   │       └── stable
│   └── dataset2/
└── namespace2/
```

## License

MIT