@aj-archipelago/cortex
Version:
Cortex is a GraphQL API for AI. It provides a simple, extensible interface for using AI services from OpenAI, Azure and others.
94 lines (74 loc) • 3.5 kB
Markdown
# Markitdown Azure Function Converter
This Azure Function App provides an HTTP endpoint to convert various file formats (specified by a URI) to Markdown using the `microsoft/markitdown` Python library.
## Function: MarkitdownConverterFunction
* **Trigger**: HTTP (GET, POST)
* **Route**: `/api/convert` (or as configured by your Azure Function host settings)
* **Authentication**: Function (requires a function key for access)
### Input
The function expects a `uri` parameter, either in the query string (for GET requests) or in the JSON body (for POST requests).
**Example GET Request:**
```
GET /api/convert?uri=https://www.example.com/somefile.pdf
```
**Example POST Request:**
```
POST /api/convert
Content-Type: application/json
{
"uri": "https://www.example.com/somefile.docx"
}
```
### Output
* **Success (200 OK):** Returns a JSON object containing the original URI and the converted Markdown content.
```json
{
"uri": "https://www.example.com/somefile.pdf",
"markdown": "# Converted Markdown Content\n..."
}
```
* **Bad Request (400):** If the `uri` parameter is missing.
* **Internal Server Error (500):** If an error occurs during the conversion process. The response will contain an error message and details.
```json
{
"error": "Failed to convert URI to Markdown.",
"details": "<specific error message from the library>"
}
```
## Project Structure
```
cortex-markitdown/
├── MarkitdownConverterFunction/
│ ├── __init__.py # The Python code for the Azure Function
│ └── function.json # Configuration file for the Azure Function (bindings, triggers)
├── .gitignore # Standard Python .gitignore
├── host.json # Configuration for the Azure Functions host
├── requirements.txt # Python package dependencies
└── README.md # This file
```
## Prerequisites
* Azure Functions Core Tools
* Python 3.8+ (check Azure Functions Python version compatibility)
* An Azure account (for deployment)
## Setup and Local Development
1. **Clone the repository (if applicable).**
2. **Create and activate a virtual environment:**
```bash
python -m venv .venv
source .venv/bin/activate # On Windows use `.venv\Scripts\activate`
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
4. **Run the Azure Function locally:**
```bash
func start
```
The function should be available at `http://localhost:7071/api/convert` (the port might vary).
## Dependencies
* `azure-functions`: For creating Azure Functions.
* `markitdown[all]`: The core library used for file conversion to Markdown. The `[all]` option installs all optional dependencies for handling various file types.
## Notes
* The `MarkItDown` instance in `__init__.py` is initialized with `enable_plugins=True` to allow for extended file format support through plugins.
* For handling images that require OCR or descriptions, the `markitdown` library might need an LLM client (e.g., OpenAI) configured. This is not included in the basic setup provided but can be added by modifying the `MarkItDown()` instantiation in `__init__.py` and ensuring the necessary environment variables (like API keys) are available to the function.
* Ensure that the URIs provided to the function are publicly accessible or accessible from the environment where the Azure Function is running.