# @dataset.sh/client
TypeScript client library for dataset.sh, a dataset management system that supports both local and remote storage and transfers datasets between them.
Install the package with your package manager of choice:

```bash
# npm
npm install @dataset.sh/client

# pnpm
pnpm add @dataset.sh/client

# yarn
yarn add @dataset.sh/client
```
- **Local Storage**: Manage datasets on your local filesystem
- **Remote Storage**: Connect to dataset.sh servers
- **Seamless Transfer**: Upload and download with resume support
- **Version Control**: Track dataset versions with checksums
- **Tagging System**: Tag versions for easy reference
- **Progress Tracking**: Monitor transfer progress in real time
- **Authentication**: Secure API key authentication
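The feature list says versions are tracked with checksums. Below is a minimal sketch of that idea. It assumes, and this README does not say, that a version ID is the hex SHA-256 digest of the stored file. `contentVersionId` is a hypothetical helper, not part of `@dataset.sh/client`:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: derive a version ID from a dataset file's bytes.
// ASSUMPTION: IDs are hex SHA-256 digests of the content; the real library
// may hash different input or use another algorithm.
function contentVersionId(bytes: Uint8Array): string {
  return createHash('sha256').update(bytes).digest('hex');
}

const encoder = new TextEncoder();
const a = contentVersionId(encoder.encode('id,name\n1,Alice\n'));
const b = contentVersionId(encoder.encode('id,name\n1,Alice\n'));
const c = contentVersionId(encoder.encode('id,name\n1,Bob\n'));

// Identical content maps to the same ID; any change yields a new ID.
console.log(a === b, a === c); // true false
console.log(a.length); // 64
```

Content-derived IDs make re-importing an unchanged file detectable by comparing IDs alone.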
Quick start with local storage:

```typescript
import { LocalStorage } from '@dataset.sh/client';

// Top-level await requires running this file as an ES module.

// Initialize local storage
const storage = new LocalStorage({ location: './my-datasets' });

// Create a dataset
const dataset = storage.dataset('my-namespace/my-dataset');

// Import data from collections
await dataset.importData(
  {
    users: {
      name: 'users',
      data: [
        { id: 1, name: 'Alice', age: 30 },
        { id: 2, name: 'Bob', age: 25 },
      ],
      typeAnnotation: 'Array<{id: number, name: string, age: number}>',
    },
  },
  null,               // Type dictionary (optional)
  ['v1.0', 'latest'], // Tags
  'My user dataset'   // Description
);

// Access the latest version
const latest = dataset.latest();
if (latest) {
  const reader = latest.open();
  const users = reader.collection('users');
  for (const user of users) {
    console.log(user);
  }
}
```
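The first argument to `importData()` in the quick start is a map from collection key to `{ name, data, typeAnnotation }`. The sketch below models that shape with hypothetical TypeScript types inferred only from the example; the package's own type definitions are authoritative. The `collections` helper, also hypothetical, builds the map so that the key and the `name` field cannot drift apart:

```typescript
// Hypothetical types inferred from the quick-start example; the real
// definitions in @dataset.sh/client may differ.
interface CollectionInput<T> {
  name: string;
  data: T[];
  typeAnnotation?: string;
}

type CollectionMap = Record<string, CollectionInput<unknown>>;

// Key each collection by its own name and reject duplicates.
function collections(...inputs: CollectionInput<unknown>[]): CollectionMap {
  const out: CollectionMap = {};
  for (const input of inputs) {
    if (input.name in out) {
      throw new Error(`duplicate collection name: ${input.name}`);
    }
    out[input.name] = input;
  }
  return out;
}

interface User {
  id: number;
  name: string;
  age: number;
}

const users: CollectionInput<User> = {
  name: 'users',
  data: [
    { id: 1, name: 'Alice', age: 30 },
    { id: 2, name: 'Bob', age: 25 },
  ],
  typeAnnotation: 'Array<{id: number, name: string, age: number}>',
};

const input = collections(users);
console.log(Object.keys(input)); // [ 'users' ]
```

The resulting `input` has the same shape as the literal passed in the quick start, for example `await dataset.importData(input, null, ['v1.0'], 'My user dataset')`.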
To work with datasets on a dataset.sh server, use `RemoteClient`:

```typescript
import { RemoteClient } from '@dataset.sh/client';

// Initialize remote client
const client = new RemoteClient({
  host: 'https://api.dataset.sh',
  accessKey: 'your-api-key',
});

// Access a remote dataset
const dataset = client.dataset('my-namespace/my-dataset');

// List all versions
const versions = await dataset.versions();
console.log('Available versions:', versions.map(v => v.getVersion()));

// Get latest version
const latest = await dataset.latest();
if (latest) {
  // Read README
  const readme = await latest.getReadme();
  console.log(readme);
}
```
`LocalStorage` is the main entry point for local dataset operations.
```typescript
import { LocalStorage } from '@dataset.sh/client';
// Initialize with custom location
const storage = new LocalStorage({ location: '/path/to/datasets' });
// List all namespaces
const namespaces = storage.namespaces();
// Access a specific namespace
const namespace = storage.namespace('my-namespace');
// List all datasets
const datasets = storage.datasets();
// Access a specific dataset
const dataset = storage.dataset('namespace/dataset-name');
```
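`storage.dataset()` takes a combined `namespace/dataset-name` string. A minimal sketch of splitting such a name into its parts follows; `parseDatasetName` is a hypothetical helper, and the validation rule (exactly one `/`, non-empty parts) is an assumption, since this README does not document the naming rules the library enforces:

```typescript
// Hypothetical helper, not part of @dataset.sh/client.
// ASSUMPTION: a valid name has exactly one '/' with non-empty parts.
interface DatasetName {
  namespace: string;
  dataset: string;
}

function parseDatasetName(fullName: string): DatasetName {
  const parts = fullName.split('/');
  if (parts.length !== 2 || parts[0] === '' || parts[1] === '') {
    throw new Error(`expected 'namespace/dataset', got '${fullName}'`);
  }
  const [namespace, dataset] = parts;
  return { namespace, dataset };
}

console.log(parseDatasetName('my-namespace/my-dataset'));
// { namespace: 'my-namespace', dataset: 'my-dataset' }
```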
Each dataset manages its own versions and tags:
```typescript
import { LocalStorage } from '@dataset.sh/client';

const storage = new LocalStorage({ location: './my-datasets' });
const dataset = storage.dataset('my-namespace/my-dataset');

// Import from file
const fileVersion = dataset.importFile('./data.dataset', {
  replace: false,      // Don't replace if version exists
  removeSource: false, // Keep source file
  tags: ['v1.0'],      // Apply tags
  asLatest: true,      // Mark as latest
});

// Import from data (the rows here are placeholders)
const dataVersion = await dataset.importData(
  {
    collection1: { name: 'collection1', data: [{ id: 1 }] },
    collection2: { name: 'collection2', data: [{ id: 2 }] },
  },
  null,               // Type dictionary (optional)
  ['v2.0', 'latest'], // Tags
  'Dataset description'
);

// Manage versions
const versions = dataset.versions();     // List all versions
const v1 = dataset.version('abc123...'); // Get specific version
const latest = dataset.latest();         // Get latest tagged version

// Manage tags
dataset.setTag('stable', 'abc123...');
dataset.removeTag('beta');
const tags = dataset.tags();                   // Get all tags
const stableId = dataset.resolveTag('stable'); // Resolve tag to version ID
```
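A tag is a movable name that points at one version ID. The in-memory `TagIndex` class below is a hypothetical model of the tag calls above (`setTag`, `removeTag`, `tags`, `resolveTag`), written only to illustrate those semantics. The storage layout suggests the real library keeps tags as files under `tag/`, and its `resolveTag` may return `null` rather than `undefined`:

```typescript
// Hypothetical in-memory model; not the library's implementation.
class TagIndex {
  private readonly tagToVersion = new Map<string, string>();

  // Setting an existing tag moves it to the new version.
  setTag(tag: string, versionId: string): void {
    this.tagToVersion.set(tag, versionId);
  }

  removeTag(tag: string): void {
    this.tagToVersion.delete(tag);
  }

  // Tag names in sorted order.
  tags(): string[] {
    return Array.from(this.tagToVersion.keys()).sort();
  }

  // Version ID for a tag, or undefined if the tag does not exist.
  resolveTag(tag: string): string | undefined {
    return this.tagToVersion.get(tag);
  }
}

const index = new TagIndex();
index.setTag('stable', 'abc123');
index.setTag('beta', 'def456');
index.setTag('stable', 'def456'); // promote the newer version
index.removeTag('beta');
console.log(index.tags()); // [ 'stable' ]
console.log(index.resolveTag('stable')); // def456
```

Because versions are never rewritten, moving a tag like `stable` is a cheap pointer update rather than a data copy.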
The transfer functions move dataset versions between local and remote storage, optionally reporting progress as they go.
```typescript
import {
  LocalStorage,
  RemoteClient,
  download,
  downloadToFile,
  ConsoleDownloadProgressReporter,
} from '@dataset.sh/client';

// Setup
const remote = new RemoteClient({ host: 'https://api.dataset.sh', accessKey: 'key' });
const local = new LocalStorage({ location: './datasets' });

// Get source and target
const remoteDataset = remote.dataset('namespace/dataset');
const localDataset = local.dataset('namespace/dataset');
const remoteVersion = await remoteDataset.latest();
if (!remoteVersion) {
  throw new Error('No latest version found on the remote server');
}
const localVersion = localDataset.version('abc123'); // Target version

// Download with progress tracking
await download(
  remoteVersion,
  localVersion,
  new ConsoleDownloadProgressReporter()
);

// Or download to a specific file
await downloadToFile(
  remoteVersion,
  './downloads/dataset.zip',
  new ConsoleDownloadProgressReporter()
);
```
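To give a sense of what a console progress reporter prints, here is a hypothetical formatter. It is not the interface of `ConsoleDownloadProgressReporter`; the function names and output format are assumptions, so check the package's type definitions before writing a custom reporter:

```typescript
// Hypothetical formatting helpers, not part of @dataset.sh/client.
function formatProgress(done: number, total: number): string {
  if (total <= 0) {
    // Unknown total size: report bytes only.
    return `${formatBytes(done)} transferred`;
  }
  const pct = Math.min(100, (done / total) * 100);
  return `${pct.toFixed(1)}% (${formatBytes(done)} / ${formatBytes(total)})`;
}

// Binary units; whole bytes, otherwise one decimal place.
function formatBytes(n: number): string {
  const units = ['B', 'KiB', 'MiB', 'GiB'];
  let value = n;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${value.toFixed(i === 0 ? 0 : 1)} ${units[i]}`;
}

console.log(formatProgress(512 * 1024, 2 * 1024 * 1024));
// 25.0% (512.0 KiB / 2.0 MiB)
```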
Uploads work the same way in the other direction:

```typescript
import {
  upload,
  uploadFromFile,
  ConsoleUploadProgressReporter,
} from '@dataset.sh/client';

// `localDataset` and `remoteDataset` are set up as in the download example.

// Get source and target
const localVersion = localDataset.latest();
if (!localVersion) {
  throw new Error('No latest local version to upload');
}
const remoteVersion = remoteDataset.version('new-version-id');

// Upload with progress tracking
await upload(
  localVersion,
  remoteVersion,
  new ConsoleUploadProgressReporter()
);

// Or upload from a file
await uploadFromFile(
  './my-dataset.zip',
  remoteVersion,
  new ConsoleUploadProgressReporter()
);
```
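The feature list mentions resume support for transfers. One common way to resume is to split the file into fixed-size byte ranges and skip those already completed. The sketch below shows that idea; the chunk size, the notion of a completed-byte offset, and `remainingChunks` itself are all hypothetical, since this README does not describe the library's actual resume protocol:

```typescript
// Hypothetical sketch of chunked, resumable transfer planning.
interface ByteRange {
  start: number; // inclusive
  end: number;   // exclusive
}

// Ranges still to transfer, starting at the first byte not yet completed.
function remainingChunks(totalSize: number, chunkSize: number, completedBytes: number): ByteRange[] {
  if (chunkSize <= 0) {
    throw new Error('chunkSize must be positive');
  }
  const ranges: ByteRange[] = [];
  for (let start = Math.max(0, completedBytes); start < totalSize; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize, totalSize) });
  }
  return ranges;
}

// A 10-byte file in 4-byte chunks, interrupted after the first chunk:
console.log(JSON.stringify(remainingChunks(10, 4, 4)));
// [{"start":4,"end":8},{"start":8,"end":10}]
```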
On disk, local storage is organized like this:

```
storage-base/
├── namespace1/
│   ├── dataset1/
│   │   ├── version/
│   │   │   ├── abc123.../
│   │   │   │   ├── abc123.../
│   │   │   │   ├── file.dataset
│   │   │   │   ├── readme
│   │   │   │   └── cache/
│   │   │   │       ├── meta.json
│   │   │   │       ├── data_sample_*.jsonl
│   │   │   │       └── typing_*.tl
│   │   │   └── def456.../
│   │   └── tag/
│   │       ├── latest
│   │       ├── v1.0
│   │       └── stable
│   └── dataset2/
└── namespace2/
```
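The layout above nests each dataset under its namespace, with versions under `version/` and tags under `tag/`. The hypothetical helpers below (not part of `@dataset.sh/client`) build those two kinds of paths with Node's `path` module. They only cover the `version/<id>` directory and the `tag/<name>` entry, which are the parts of the tree that are unambiguous:

```typescript
import { join, sep } from 'node:path';

// Hypothetical helpers following the directory tree:
//   <base>/<namespace>/<dataset>/version/<versionId>
//   <base>/<namespace>/<dataset>/tag/<tag>
function datasetDir(base: string, namespace: string, dataset: string): string {
  return join(base, namespace, dataset);
}

function versionDir(base: string, namespace: string, dataset: string, versionId: string): string {
  return join(datasetDir(base, namespace, dataset), 'version', versionId);
}

function tagFile(base: string, namespace: string, dataset: string, tag: string): string {
  return join(datasetDir(base, namespace, dataset), 'tag', tag);
}

// join() uses the platform separator, so print the path segments.
console.log(versionDir('storage-base', 'namespace1', 'dataset1', 'abc123').split(sep));
console.log(tagFile('storage-base', 'namespace1', 'dataset1', 'latest').split(sep));
```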
License: MIT