# CDP Forge Plugin Pipeline SDK
SDK for easily implementing pipeline plugins for the CDP Forge platform.
This project serves as an SDK for building plugins that can be integrated into the data processing pipeline of the CDP Forge platform. It is designed to simplify the development of custom data transformation and processing logic within the platform ecosystem.
## 📦 Installation as NPM Library
You can install this library as a dependency in other projects:
```bash
npm install @cdp-forge/plugin-pipeline-sdk
```
### Usage as Library
```typescript
import {
  PipelinePluginI,
  PipelineStage,
  ConfigListener,
  ConfigReader,
  Log,
  start
} from '@cdp-forge/plugin-pipeline-sdk';

// Create a custom plugin
class MyCustomPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    // Implement your processing logic
    console.log('Processing log:', log);
    return log;
  }

  async init(): Promise<void> {
    console.log('Plugin initialization');
  }
}

// Load configuration
const config = ConfigReader.getInstance('./config/config.yml', './config/plugin.yml').config;

// Create plugin instance and start the server
const customPlugin = new MyCustomPlugin();
start(customPlugin, config).then(({ stage, configListener }) => {
  console.log('Server started successfully');
}).catch(error => {
  console.error('Error during startup:', error);
});
```
## 🚀 Features
- **Pipeline Plugin:** Provides a structure for creating plugins that fit into a sequential or parallel processing pipeline
- **Kafka Integration:** Uses Kafka for asynchronous communication and data streaming between pipeline stages
- **TypeScript:** Written in TypeScript to improve code maintainability, type safety, and developer productivity
- **Docker Support:** Includes Docker configuration for deployment
- **Testing:** Jest configuration for unit tests
- **Configuration Management:** Automatic merging of cluster and plugin configurations
## 📋 Prerequisites
- Node.js 20.11.1 or higher
- npm or yarn
- Docker (optional, for deployment)
- Access to a Kafka cluster
## 🛠️ Installation
1. **Clone the repository:**
```bash
git clone <repository-url>
cd plugin-pipeline-sdk
```
2. **Install dependencies:**
```bash
npm install
```
3. **Configure the environment:**
- Copy and modify configuration files in `config/`
- Ensure Kafka brokers are accessible
## ⚙️ Configuration
The SDK uses two separate configuration files to manage different aspects of the plugin system:
### Configuration File Structure
#### `config/config.yml` - Cluster Configuration
This file contains the **cluster-level configuration** that is **shared across all plugins** in the CDP Forge platform.
```yaml
kafkaConfig:
brokers:
- 'localhost:36715'
manager:
url: 'https://plugin_template_url'
config_topic: 'config'
mysql:
uri: 'mysql://user:password@my-server-ip:3306'
```
**Important**: If you're using the **Helm installer** provided by the CDP Forge platform, this file is **automatically generated**, and you should use the generated file in your plugin.
#### `config/plugin.yml` - Plugin-Specific Configuration
This file contains **plugin-specific settings** that define how your individual plugin behaves within the pipeline.
```yaml
plugin:
name: 'myPlugin'
priority: 1 # 1 to 100 (not required if parallel)
type: 'blocking' # or 'parallel'
```
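For a `parallel` plugin, the `priority` field can be omitted entirely; a hypothetical example:

```yaml
plugin:
  name: 'myParallelPlugin' # hypothetical name
  type: 'parallel'         # no priority needed for parallel plugins
```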
### Field Descriptions
#### Cluster Configuration (`config.yml`)
- **`kafkaConfig.brokers`**
List of Kafka broker addresses to which the plugin will connect. This is configured at the cluster level and shared by all plugins.
- **`manager.url`**
URL used to register or communicate with the plugin manager service.
- **`manager.config_topic`**
Kafka topic used for plugin configuration management across the cluster.
- **`mysql.uri`**
MySQL connection string for database operations.
#### Plugin Configuration (`plugin.yml`)
- **`plugin.name`**
Unique identifier for your plugin instance within the pipeline.
- **`plugin.priority`**
(Required only for `blocking` plugins)
An integer from **1 to 100** that defines the plugin's execution order within the pipeline. A lower number means higher priority: a plugin with priority 1 executes before plugins with priority 2, 3, 4, and so on.
- **`plugin.type`**
Defines the plugin execution mode:
- `blocking`: The plugin processes data and returns a `Promise<Log>` for the next stage.
- `parallel`: The plugin runs independently and returns a `Promise<void>` (see the sketch after this list).
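As a sketch, a parallel plugin performs its side effects and passes nothing on. This assumes a parallel plugin can satisfy the shared `elaborate` signature by resolving to `null` once its work is done; the `MyAuditPlugin` class below is illustrative, not part of the SDK:

```typescript
import { PipelinePluginI, Log } from '@cdp-forge/plugin-pipeline-sdk';

// Illustrative parallel plugin: consumes each log without forwarding it.
class MyAuditPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    // Side effect only, e.g. write an audit trail entry
    console.log(`[audit] client=${log.client} event=${log.event}`);
    return null; // parallel plugins do not feed a next stage
  }

  async init(): Promise<void> {
    // Open connections to external services here if needed
  }
}
```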
### Configuration Management
- **Cluster Config (`config.yml`)**: Managed by the platform, automatically generated by Helm installer
- **Plugin Config (`plugin.yml`)**: Managed by you, defines your plugin's behavior
- **Environment Variables**: Can override both configurations if needed
- **Runtime Updates**: Plugin configuration can be updated without restarting the cluster
### Using ConfigReader for Convenience
The SDK provides a `ConfigReader` utility that automatically merges both configuration files into a single `config` object, making it easier to access all settings in your plugin code.
```typescript
import { ConfigReader } from '@cdp-forge/plugin-pipeline-sdk';

// The ConfigReader automatically loads and merges:
// - config/config.yml (cluster configuration)
// - config/plugin.yml (plugin configuration)
const config = ConfigReader.getInstance('./config/config.yml', './config/plugin.yml').config;

// Access cluster configuration
console.log(config.kafkaConfig.brokers);
console.log(config.manager.url);

// Access plugin configuration
console.log(config.plugin.name);
console.log(config.plugin.priority);

// All sections live on the same merged object
console.log(config.mysql.uri);
```
### Starting the Server with Configuration
The `start()` function requires the merged configuration to initialize the server:
```typescript
import { start, PipelinePluginI, Log, ConfigReader } from '@cdp-forge/plugin-pipeline-sdk';

const config = ConfigReader.getInstance('./config/config.yml', './config/plugin.yml').config;

class MyPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    // Your plugin logic here
    return log;
  }

  async init(): Promise<void> {
    // Plugin initialization
  }
}

// Start the server with the merged configuration
start(new MyPlugin(), config).then(({ stage, configListener }) => {
  console.log('Server started with merged configuration');
}).catch(error => {
  console.error('Error starting server:', error);
});
```
The server will:
1. **Load** both configuration files using the specified paths
2. **Merge** them into a single config object
3. **Validate** the configuration
4. **Start** the plugin with the merged settings
## 🔧 Plugin Development
To create a new plugin, follow these steps:
1. **Configure the `config.yml` and `plugin.yml` files correctly**
2. **Implement the `elaborate` function in your plugin class**
### Plugin Implementation
The plugin must implement the `PipelinePluginI` interface:
```typescript
import { PipelinePluginI, Log } from '@cdp-forge/plugin-pipeline-sdk';

export default class MyPlugin implements PipelinePluginI {
  elaborate(log: Log): Promise<Log | null> {
    // Implement your processing logic here
    // For blocking plugins: return Promise<Log>
    // For parallel plugins: return Promise<void>
    return Promise.resolve(log);
  }

  init(): Promise<void> {
    // Plugin initialization
    return Promise.resolve();
  }
}
```
### Plugin Types
Depending on the plugin type:
- **`blocking` plugins**: The `elaborate` function must return a `Promise<Log>` (see the example after this list).
- **`parallel` plugins**: The `elaborate` function must return a `Promise<void>`.
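For example, a blocking plugin can transform the log before handing it to the next stage and, since `elaborate` is typed as `Promise<Log | null>`, presumably drop a log by returning `null`. The filter below is an illustrative sketch, not part of the SDK:

```typescript
import { PipelinePluginI, Log } from '@cdp-forge/plugin-pipeline-sdk';

// Illustrative blocking plugin: discards heartbeat events and
// normalizes the page title on everything else.
export default class FilterPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    if (log.event === 'heartbeat') {
      return null; // presumably removes the log from the pipeline
    }
    log.page.title = log.page.title.trim();
    return log; // forwarded to the next stage in priority order
  }

  async init(): Promise<void> {}
}
```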
## 📁 Project Structure
```
plugin-pipeline-template/
├── config/                    # Configuration files
│   ├── config.yml             # Cluster configuration
│   └── plugin.yml             # Plugin-specific configuration
├── src/                       # TypeScript source code
│   ├── plugin/                # Plugin implementation
│   │   ├── Plugin.ts          # Main plugin class
│   │   └── PipelinePluginI.ts # Plugin interface
│   ├── types.ts               # Type definitions
│   ├── config.ts              # Configuration management
│   ├── index.ts               # Library entry point
│   └── ...                    # Other utility files
├── __tests__/                 # Unit tests
├── Dockerfile                 # Docker configuration
├── package.json               # Dependencies and scripts
└── tsconfig.json              # TypeScript configuration
```
## 📝 Available Scripts
- **`npm run build`**: Compiles TypeScript code
- **`npm test`**: Runs unit tests (see the example test after this list)
- **`npm run clean`**: Cleans the dist folder
- **`npm run prepublishOnly`**: Builds before publishing
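The Jest setup makes it straightforward to unit-test a plugin's `elaborate` method in isolation. A minimal sketch, assuming a plugin class exported from `src/plugin/Plugin.ts` as in the project structure above:

```typescript
import { Log } from '@cdp-forge/plugin-pipeline-sdk';
import MyPlugin from '../src/plugin/Plugin'; // hypothetical export

describe('MyPlugin', () => {
  it('passes the log through to the next stage', async () => {
    const plugin = new MyPlugin();
    await plugin.init();

    // A partial log is enough for this test; cast for brevity.
    const log = { event: 'page_view', session: 'abc-123' } as Log;

    await expect(plugin.elaborate(log)).resolves.toEqual(log);
  });
});
```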
## 🐳 Docker Deployment
1. **Build the image:**
```bash
docker build -t plugin-pipeline-sdk .
```
2. **Run the container:**
```bash
docker run -p 3000:3000 plugin-pipeline-sdk
```
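The plugin reads its settings from the `config/` directory, so in practice you will likely mount your configuration files into the container. The `/app/config` target path below is an assumption about the image layout, not documented behavior:

```bash
docker run -p 3000:3000 \
  -v "$(pwd)/config:/app/config" \
  plugin-pipeline-sdk
```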
## 📊 Data Structure
The plugin processes `Log` objects that contain:
```typescript
interface Log {
  client: number;
  date: string;
  device: {
    browser?: string;
    id: string;
    ip?: string;
    os?: string;
    type?: string;
    userAgent?: string;
  };
  event: string;
  geo?: {
    city?: string;
    country?: string;
    point?: {
      type: string;
      coordinates: number[];
    };
    region?: string;
  };
  googleTopics?: GoogleTopic[];
  instance: number;
  page: {
    description?: string;
    href?: string;
    image?: string;
    title: string;
    type?: string;
  };
  product?: Product[];
  referrer?: string;
  session: string;
  target?: string;
  order?: string;
  [key: string]: any; // Allows additional properties
}
```
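Because of the index signature, a plugin may attach extra properties to a log as it flows through the pipeline. A minimal sketch, where `processedBy` and `processedAt` are illustrative fields rather than part of the schema:

```typescript
import { PipelinePluginI, Log } from '@cdp-forge/plugin-pipeline-sdk';

// Illustrative enrichment plugin: stamps each log with extra metadata.
class EnrichPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    log.processedBy = 'enrich-plugin';          // hypothetical field
    log.processedAt = new Date().toISOString(); // hypothetical field
    return log;
  }

  async init(): Promise<void> {}
}
```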
## 📦 Publishing to NPM
To publish this library to npm, see the [Publishing Guide](PUBLISHING.md).
## 🤝 Contributing
Contributions are welcome! To contribute:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## 📄 License
This project is distributed under the GPL-3.0 license. See the `LICENSE` file for more details.
## 📞 Support
For support and questions, please open an issue on the GitHub repository.