@fractal-solutions/xgboost-js

A pure JavaScript implementation of XGBoost for both Node.js and browser environments

# XGBoost.js Documentation

## Introduction

XGBoost.js is my JavaScript version of the XGBoost algorithm. It handles classification and regression tasks, letting you train models and make predictions directly in your JavaScript projects, whether on the server or in the browser.

## Table of Contents

- [Installation](#installation)
- [Getting Started](#getting-started)
  - [Basic Classification](#basic-classification)
  - [Model Serialization](#model-serialization)
  - [Feature Importance](#feature-importance)
- [Advanced Usage](#advanced-usage)
  - [Handling Multiclass Classification](#handling-multiclass-classification)
  - [Integrating with Web Applications](#integrating-with-web-applications)
- [Real-Life Examples](#real-life-examples)
  - [Predicting Housing Prices](#predicting-housing-prices)
  - [Customer Churn Prediction](#customer-churn-prediction)
- [Testing](#testing)
- [Conclusion](#conclusion)

## Installation

Ensure you have [Node.js](https://nodejs.org/) installed. Clone or download the repository and include the `xgboost.js` module in your project:

```javascript
const { XGBoost } = require('./xgboost.js');
```

## Getting Started

### Basic Classification

Learn how to train a simple binary classification model using XGBoost.js.

#### Step 1: Prepare Your Data

Organize your training data into feature matrices and label vectors.
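
Before training, it can help to confirm that the data really has that shape. The following is a minimal sketch, not part of XGBoost.js: `validateDataset` is a hypothetical helper, and it assumes features are plain finite numbers with one label per row.

```javascript
// Hypothetical helper (not part of XGBoost.js): checks that X is a
// rectangular numeric matrix and that y has exactly one label per row.
function validateDataset(X, y) {
  if (!Array.isArray(X) || X.length === 0) {
    throw new Error('X must be a non-empty array of feature rows');
  }
  if (!Array.isArray(y) || y.length !== X.length) {
    throw new Error(`Expected ${X.length} labels, one per row of X`);
  }
  const numFeatures = X[0].length;
  X.forEach((row, i) => {
    if (!Array.isArray(row) || row.length !== numFeatures) {
      throw new Error(`Row ${i} does not have ${numFeatures} features`);
    }
    if (!row.every(Number.isFinite)) {
      throw new Error(`Row ${i} contains a non-numeric value`);
    }
  });
  return { numSamples: X.length, numFeatures };
}

const shape = validateDataset([[25, 50000], [30, 60000]], [0, 1]);
console.log(shape); // { numSamples: 2, numFeatures: 2 }
```

The sample data that follows passes this check: 20 rows of two features each, with 20 labels.
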
```javascript
const { XGBoost } = require('./xgboost.js');

// Realistic sample training data
// Each sample consists of two features:
// - Age (in years)
// - Annual Income (in USD)
const X_train = [
  [25, 50000], [30, 60000], [45, 80000], [35, 70000], [50, 90000],
  [23, 48000], [40, 75000], [29, 62000], [33, 68000], [38, 72000],
  [27, 53000], [42, 77000], [31, 61000], [36, 69000], [48, 85000],
  [22, 47000], [39, 73000], [34, 66000], [28, 59000], [46, 82000],
];

// Labels corresponding to the training data
// 0: Did Not Purchase
// 1: Purchased
const y_train = [0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1];
```

#### Step 2: Initialize and Train the Model

Configure the model parameters and train using the `fit` method. Understanding the hyperparameters is crucial for optimizing the model's performance:

- **learningRate**: Controls the contribution of each tree to the final model. A smaller value makes the learning process more robust but requires more trees.
- **maxDepth**: Sets the maximum depth of each tree. Deeper trees can capture more complex patterns but may lead to overfitting.
- **minChildWeight**: Specifies the minimum sum of instance weights (hessian) needed in a child. It helps prevent overfitting by controlling the complexity of the trees.
- **numRounds**: Determines the number of boosting rounds or the number of trees to be built. More rounds can improve performance but increase computational cost.

```javascript
// Initialize the model with parameters and explain each hyperparameter
const model = new XGBoost({
  learningRate: 0.3,  // Determines the step size at each iteration
  maxDepth: 4,        // Maximum depth of a tree
  minChildWeight: 1,  // Minimum sum of instance weight (hessian) needed in a child
  numRounds: 100      // Number of boosting rounds
});

// Train the model with the training data
model.fit(X_train, y_train);
```

#### Step 3: Make Predictions

Use the trained model to predict outcomes on new data.
Ensure that the test data follows the same feature structure as the training data.

```javascript
// Sample test data following the same feature structure as the training data
const X_test = [
  [20, 45000],
  [35, 75000],
  [40, 82000],
];

// Predict probabilities for the test data using predictBatch
const predictionsBatch = model.predictBatch(X_test);

// Predict probabilities for the test data using predictSingle
const predictionsSingle = X_test.map(x => model.predictSingle(x));

console.log('Batch Predictions:', predictionsBatch);
// Outputs an array of probabilities indicating the likelihood of purchase
console.log('Single Predictions:', predictionsSingle);
// Outputs an array of probabilities indicating the likelihood of purchase
```

### Model Serialization

Save your trained model and load it later without retraining.

```javascript
// Serialize the model
const serialized = model.toJSON();

// Save 'serialized' to a file or database as needed

// Later, deserialize the model
const deserializedModel = XGBoost.fromJSON(serialized);

// Use the deserialized model for predictions
const newPredictions = deserializedModel.predictBatch(X_test);
console.log(newPredictions);
```

### Feature Importance

Understanding which features contribute most to your model's predictions is essential for interpreting the results and making informed decisions. XGBoost.js provides a method to retrieve feature importance scores, allowing you to identify and focus on the most influential features in your dataset.
```javascript
// Retrieve feature importance scores
const importance = model.getFeatureImportance();

// Feature names in the same order as the columns used during training
// (illustrative names for a four-feature dataset, matching the example output)
const featureNames = ['age', 'income', 'education', 'gender'];

// Combine feature names with their importance scores
const featureImportance = featureNames.map((name, index) => ({
  feature: name,
  importance: importance[index]
}));

// Sort features by importance in descending order
featureImportance.sort((a, b) => b.importance - a.importance);

// Display the feature importances
console.log('Feature Importances:');
featureImportance.forEach(({ feature, importance }) => {
  console.log(`${feature}: ${importance}`);
});
```

**Explanation:**

1. **Retrieving Importance Scores:**
   - The `model.getFeatureImportance()` method returns an array where each element represents the importance score of a corresponding feature. The importance is typically based on how frequently a feature is used to split the data across all trees in the model.
2. **Mapping Feature Names:**
   - To make the importance scores more interpretable, especially when dealing with multiple features, it's helpful to map these scores to their respective feature names. This assumes you have an array `featureNames` that lists all feature names in the same order as they were used during training.
3. **Sorting Features:**
   - Sorting the features in descending order of their importance scores allows you to quickly identify which features have the most significant impact on the model's predictions.
4. **Displaying the Results:**
   - The final console log presents a clear and organized view of feature importances, making it easier to interpret and analyze the model's behavior.

**Example Output:**

```
Feature Importances:
age: 25
income: 18
education: 10
gender: 5
```

In this example, the `age` feature is the most influential, followed by `income`, `education`, and `gender`.
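
Raw scores like these are often easier to compare as shares of the total. A minimal sketch, assuming the scores are non-negative numbers listed in the same order as the feature names; `toImportanceShares` is a hypothetical helper and not part of XGBoost.js.

```javascript
// Hypothetical helper (not part of XGBoost.js): converts raw importance
// scores into each feature's share of the total, sorted descending.
function toImportanceShares(featureNames, scores) {
  const total = scores.reduce((sum, s) => sum + s, 0);
  return featureNames
    .map((feature, i) => ({
      feature,
      // Guard against an all-zero score array to avoid dividing by zero
      share: total > 0 ? scores[i] / total : 0,
    }))
    .sort((a, b) => b.share - a.share);
}

// Scores copied from the example output above
const shares = toImportanceShares(
  ['age', 'income', 'education', 'gender'],
  [25, 18, 10, 5]
);
shares.forEach(({ feature, share }) => {
  console.log(`${feature}: ${(share * 100).toFixed(1)}%`);
});
// age: 43.1%
// income: 31.0%
// education: 17.2%
// gender: 8.6%
```
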
Such insights can guide feature selection and data collection priorities, and help explain model decisions.

## Advanced Usage

### Handling Multiclass Classification

Extend XGBoost.js to handle multiclass classification tasks by adjusting label encoding and prediction interpretation.

```javascript
// Example setup for multiclass classification with 3 classes
// X_train must contain one feature row per label (six rows here),
// and X_test must use the same feature layout
const y_train = [0, 1, 2, 1, 0, 2];

// Initialize the model
const model = new XGBoost({
  learningRate: 0.1,
  maxDepth: 6,
  minChildWeight: 1,
  numRounds: 200
});

// Train the model
model.fit(X_train, y_train);

// Predict class probabilities
const predictions = model.predictBatch(X_test);
console.log(predictions);
```

### Integrating with Web Applications

Leverage XGBoost.js in frontend applications for real-time predictions.

```html
<!DOCTYPE html>
<html>
<head>
  <title>XGBoost.js Integration</title>
  <script src="xgboost.js"></script>
  <script>
    document.addEventListener('DOMContentLoaded', () => {
      // Initialize the model. Train it with fit() or restore it with
      // XGBoost.fromJSON() before asking for predictions.
      const model = new XGBoost({
        learningRate: 0.3,
        maxDepth: 4,
        minChildWeight: 1,
        numRounds: 100
      });

      // Example prediction on user input
      const userInput = [2.5, 3.5];
      const prediction = model.predictSingle(userInput);
      console.log('Prediction:', prediction);
    });
  </script>
</head>
<body>
  <h1>XGBoost.js in Web App</h1>
</body>
</html>
```

## Real-Life Examples

### Predicting Housing Prices

Use XGBoost.js to predict housing prices based on features such as size and number of bedrooms.
```javascript
const { XGBoost } = require('./xgboost.js');

// Sample training data
// Each entry in X_train represents a house with two features:
// - Size in square feet
// - Number of bedrooms
const X_train = [
  [1500, 3],
  [1600, 3],
  [1700, 4],
  [1875, 3],
  [1100, 2],
  [1550, 4],
  [2350, 4],
  [2450, 5],
];

// Target prices corresponding to each house in X_train
const y_train = [245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000];

// Initialize and train the XGBoost model with specified hyperparameters
const model = new XGBoost({
  learningRate: 0.05, // Step size shrinkage to prevent overfitting
  maxDepth: 5,        // Maximum depth of a tree
  minChildWeight: 1,  // Minimum sum of instance weight (hessian) needed in a child
  numRounds: 500      // Number of boosting rounds
});
model.fit(X_train, y_train);

// New housing data for prediction
// Each entry in X_new has the same features as the training data:
// - Size in square feet
// - Number of bedrooms
const X_new = [
  [2000, 3],
  [1600, 2],
];

// Predict housing prices for the new data
const predictedPrices = model.predictBatch(X_new);
console.log('Predicted Prices:', predictedPrices);
```

### Customer Churn Prediction

Predict whether customers will churn based on their usage patterns and demographics.

```javascript
const { XGBoost } = require('./xgboost.js');

// Sample data
const X_train = [
  [1, 34, 50000],
  [0, 45, 60000],
  [1, 23, 40000],
  [0, 35, 65000],
  [1, 52, 70000],
  [0, 46, 55000],
];
const y_train = [1, 0, 1, 0, 0, 0];

// Initialize and train the model
const model = new XGBoost({
  learningRate: 0.2,
  maxDepth: 3,
  minChildWeight: 1,
  numRounds: 150
});
model.fit(X_train, y_train);

// Predict churn probabilities
const X_new = [
  [1, 30, 48000],
  [0, 50, 62000],
];
const churnProbabilities = model.predictBatch(X_new);
console.log('Churn Probabilities:', churnProbabilities);
```

## Testing

Utilize the provided `xgboosttest.js` to run comprehensive tests ensuring the reliability and accuracy of your models.
```javascript
// Run tests
const tester = new XGBoostTester();
tester.runTests().catch(console.error);
```

## Conclusion

XGBoost.js offers a robust and flexible solution for integrating gradient boosting algorithms into JavaScript applications. Whether you're building predictive models for web applications, data analysis, or real-time decision-making systems, XGBoost.js provides the tools necessary to implement efficient and accurate machine learning solutions. For more advanced features and customization, refer to the test suite in `xgboosttest.js` and explore additional methods within the `XGBoost` class.
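
One follow-up to the classification examples in this guide, whose comments describe the predictions as probabilities: a cutoff turns those probabilities into class labels, which can then be compared against known outcomes. This is a minimal sketch rather than part of XGBoost.js; `toLabels` and `accuracy` are hypothetical helpers, the 0.5 default is a common convention rather than something the library prescribes, and the probabilities are made-up stand-ins for `predictBatch` output.

```javascript
// Hypothetical helper (not part of XGBoost.js): maps each probability to
// 1 when it is at or above the threshold, otherwise 0.
function toLabels(probabilities, threshold = 0.5) {
  return probabilities.map(p => (p >= threshold ? 1 : 0));
}

// Hypothetical helper: fraction of predicted labels that match the actual ones.
function accuracy(predicted, actual) {
  if (predicted.length !== actual.length || actual.length === 0) {
    throw new Error('predicted and actual must be non-empty and the same length');
  }
  const correct = predicted.filter((label, i) => label === actual[i]).length;
  return correct / actual.length;
}

// Made-up probabilities standing in for predictBatch output
const labels = toLabels([0.12, 0.81, 0.5, 0.33]);
console.log(labels); // [ 0, 1, 1, 0 ]
console.log(accuracy(labels, [0, 1, 0, 0])); // 0.75
```
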