url-to-json-markdown
Version:
A TypeScript library that fetches URLs and converts them to structured JSON and Markdown format.
120 lines (89 loc) • 3.53 kB
Markdown
[](https://www.npmjs.com/package/url-to-json-markdown)
A TypeScript library that fetches URLs and converts them to structured JSON and Markdown format.
Built by [16x Writer](https://writer.16x.engineer/) and [16x Eval](https://eval.16x.engineer/) team.
```bash
npm install url-to-json-markdown
```
```typescript
import { urlToJsonMarkdown } from 'url-to-json-markdown';
// Reddit post (using fallback without credentials)
const post = await urlToJsonMarkdown(
'https://www.reddit.com/r/example/comments/12345/title/'
);
console.log(post.title); // "Post Title"
console.log(post.content); // "# Post Title\n\nPost content...\n\nby _username_ (↑ 123) 12/25/2024"
console.log(post.type); // "reddit"
// Reddit post with credentials (more reliable)
const postWithCreds = await urlToJsonMarkdown(
'https://www.reddit.com/r/example/comments/12345/title/',
{
clientId: 'your_client_id',
clientSecret: 'your_client_secret',
}
);
// Reddit post with comments included
const postWithComments = await urlToJsonMarkdown(
'https://www.reddit.com/r/example/comments/12345/title/',
{
clientId: 'your_client_id',
clientSecret: 'your_client_secret',
includeComments: true,
}
);
// Will include "## Comments" section with tree-structured comments
// Reddit comment
const comment = await urlToJsonMarkdown(
'https://www.reddit.com/r/example/comments/12345/comment/abc123/'
);
console.log(comment.title); // "First line of comment..."
console.log(comment.content); // "# Comment by username\n\nComment text...\n\nby _username_ (↑ 45)"
console.log(comment.type); // "reddit"
// Reddit comment with child comments/replies
const commentWithReplies = await urlToJsonMarkdown(
'https://www.reddit.com/r/example/comments/12345/comment/abc123/',
{ includeComments: true }
);
// Will include "## Replies" section with tree-structured child comments
// Generic web page
const webpage = await urlToJsonMarkdown('https://example.com/article');
console.log(webpage.title); // "Article Title"
console.log(webpage.content); // "# Article Title\n\nMain content as markdown..."
console.log(webpage.type); // "generic"
```
**Parameters:**
- `url` - The URL to fetch and convert
- `options` - Optional Reddit configuration
**Reddit Options:**
```typescript
interface RedditOptions {
clientId?: string;
clientSecret?: string;
includeComments?: boolean;
}
```
- `clientId` & `clientSecret` - Reddit API credentials for more reliable access
- `includeComments` - Include comments in a tree structure (Reddit posts) or child comments/replies (Reddit comments)
**Return Type:**
```typescript
interface UrlToJsonResult {
title: string;
content: string;
type: 'reddit' | 'generic';
}
```
For Reddit URLs, the library supports two modes:
1. **Fallback mode (no credentials)**: Uses browser user agent to access Reddit's public JSON API. May be subject to rate limiting.
2. **Authenticated mode (with credentials)**: Uses Reddit OAuth API for more reliable access. Requires Reddit app credentials.
To get Reddit credentials:
1. Go to https://www.reddit.com/prefs/apps
2. Create a new app (script type)
3. Use the client ID and secret
## Supported URLs
- **Reddit**: Posts and comments from reddit.com
- **Generic**: Any website with automatic content extraction