[![Members](https://img.shields.io/badge/dynamic/json?style=for-the-badge&label=&logo=discord&logoColor=white&labelColor=black&color=%23f3f3f3&query=$.approximate_member_count&url=https%3A%2F%2Fdiscord.com%2Fapi%2Finvites%2FENB7RbxVZE%3Fwith_counts%3Dtrue)](https://discord.gg/ENB7RbxVZE)&nbsp;[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg?style=for-the-badge&logo=5865F2&logoColor=black&labelColor=black&color=%23f3f3f3)](https://github.com/AndrewShedov/mongoChecker/blob/main/LICENSE)

# mongoChecker

CLI tool for finding duplicate values in a MongoDB collection by a chosen field.

### Features

- Finds duplicates by any user-defined field (**"createdAt"**, **"text"**, **"price"**, etc.).
- Dates are formatted as ISO strings. Arrays and objects are output via JSON.stringify.
- Works with a configuration file containing: <code>uri</code>, <code>db</code>, <code>collection</code>, <code>field</code>, <code>allowDiskUse</code>, <code>maxDuplicatesToShow</code>.
- Informative logs:

<img src="https://raw.githubusercontent.com/AndrewShedov/mongoChecker/refs/heads/main/assets/screenshot_1.png" width="450" />

The screenshot shows a check of the **"posts"** collection (10,000,000 documents) by the **"createdAt"** field. The documents were created with [turboMaker](https://www.npmjs.com/package/turbo-maker) using <code>timeStepMs: 0</code>.

### How it works

**Aggregation pipeline:**

```js
const duplicates = await coll.aggregate(
  [
    // Group documents by the value of the chosen field and count each group
    { $group: { _id: `$${field}`, count: { $sum: 1 } } },
    // Keep only values that occur more than once
    { $match: { count: { $gt: 1 } } },
    // Most frequent duplicates first
    { $sort: { count: -1 } }
  ],
  { allowDiskUse }
).toArray();
```

<code>_id</code> in the group stage → the field value (date/string/number/object/array).

**Output formatting:** <br/>
Date → <code>ISO string</code><br/>
Array/Object → <code>JSON.stringify</code><br/>
Other → <code>String(value)</code><br/>

### Installation & Usage

1. Install the package:

```bash
npm i mongo-checker
```

2. Add a script in your **package.json**:

```json
"scripts": {
  "mongoChecker": "mongo-checker"
}
```

3. In the root of the project, create a file named [mongo-checker.config.js](https://github.com/AndrewShedov/mongoChecker/blob/main/config%20example/mongo-checker.config.js).

Example of file contents:

```js
export default {
  uri: "mongodb://127.0.0.1:27017",
  db: "crystalTest",
  collection: "posts",
  field: "createdAt",
  allowDiskUse: true,
  maxDuplicatesToShow: 5
};
```

**⚠️ All parameters are required.**

4. Run from the project root:

```bash
npm run mongoChecker
```

## Config parameters explained

<code>allowDiskUse: true</code>

**MongoDB is allowed to use temporary disk space for intermediate data.**

**When to enable:**

- With a small amount of RAM.
- For large collections (tens of millions of documents or more), to avoid out-of-memory errors.

**Drawbacks:**

- Disk is slower than RAM → query execution can be significantly slower.
- If the disk is heavily used, other operations may slow down as well.

<code>allowDiskUse: false</code>

**MongoDB processes data only in RAM.**

- For small collections (up to ~1M documents), this is usually faster.
- For huge collections, the operation may fail with an out-of-memory error.

In-memory operations are often much faster than the disk-based ones used with **allowDiskUse: true**.

<code>maxDuplicatesToShow</code>

Limits the maximum number of duplicate values displayed in the output.
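
Since all six parameters are required, it can help to check a config object before running the tool. The snippet below is a minimal sketch of such a check, not part of mongo-checker itself: the helper name `validateConfig` and the `REQUIRED_FIELDS` table are hypothetical, and the expected types are an assumption inferred from the example config above.

```js
// Hypothetical helper, NOT part of mongo-checker.
// Expected types are assumed from the example config in this README.
const REQUIRED_FIELDS = {
  uri: "string",
  db: "string",
  collection: "string",
  field: "string",
  allowDiskUse: "boolean",
  maxDuplicatesToShow: "number",
};

// Returns a list of human-readable problems; an empty list means the config looks complete.
function validateConfig(config) {
  const errors = [];
  for (const [key, expectedType] of Object.entries(REQUIRED_FIELDS)) {
    if (!(key in config)) {
      errors.push(`missing required parameter "${key}"`);
    } else if (typeof config[key] !== expectedType) {
      errors.push(`"${key}" should be a ${expectedType}, got ${typeof config[key]}`);
    }
  }
  return errors;
}

// Usage: the example config passes, a config without "field" is reported.
const exampleConfig = {
  uri: "mongodb://127.0.0.1:27017",
  db: "crystalTest",
  collection: "posts",
  field: "createdAt",
  allowDiskUse: true,
  maxDuplicatesToShow: 5,
};

console.log(validateConfig(exampleConfig)); // []

const { field, ...withoutField } = exampleConfig;
console.log(validateConfig(withoutField)); // [ 'missing required parameter "field"' ]
```

Returning a list of problems instead of throwing on the first one lets a caller report every missing or mistyped parameter in a single pass.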
An example of mongoChecker in operation:

<br>

<p align="center">
<a href="https://youtu.be/5V4otU4KZaA?t=82">
<img src="https://raw.githubusercontent.com/AndrewShedov/mongoChecker/refs/heads/main/assets/screenshot_2.png" style="width: 100%; max-width: 100%;" alt="CRYSTAL v1.0 features"/>
</a>
</p>

[![SHEDOV.TOP](https://img.shields.io/badge/SHEDOV.TOP-black?style=for-the-badge)](https://shedov.top/)
[![CRYSTAL](https://img.shields.io/badge/CRYSTAL-black?style=for-the-badge)](https://crysty.ru/AndrewShedov)
[![Discord](https://img.shields.io/badge/Discord-black?style=for-the-badge&logo=discord&color=black&logoColor=white)](https://discord.gg/ENB7RbxVZE)
[![Telegram](https://img.shields.io/badge/Telegram-black?style=for-the-badge&logo=telegram&color=black&logoColor=white)](https://t.me/ShedovTop)
[![X](https://img.shields.io/badge/%20-black?style=for-the-badge&logo=x&logoColor=white)](https://x.com/AndrewShedov)
[![VK](https://img.shields.io/badge/VK-black?style=for-the-badge&logo=vk)](https://vk.com/ShedovTop)
[![VK Video](https://img.shields.io/badge/VK%20Video-black?style=for-the-badge&logo=vk)](https://vkvideo.ru/@ShedovTop)
[![YouTube](https://img.shields.io/badge/YouTube-black?style=for-the-badge&logo=youtube)](https://www.youtube.com/@AndrewShedov)