---
title: Anthropic Claude Series Tools Calling Evaluation
description: >-
  Use LobeChat to test the Tools Calling (Function Calling) capabilities of the Anthropic Claude series models (Claude 3.5 Sonnet / Claude 3 Opus / Claude 3 Haiku) and present the evaluation results
tags:
  - Tools Calling
  - Benchmark
  - Function Calling Evaluation
  - Tool Calling
  - Plugins
---

# Anthropic Claude Series Tools Calling

Overview of Anthropic Claude Series model Tools Calling capabilities:

| Model             | Support Tools Calling | Stream | Parallel | Simple Instruction Score | Complex Instruction Score |
| ----------------- | --------------------- | ------ | -------- | ------------------------ | ------------------------- |
| Claude 3.5 Sonnet | ✅                    | ✅     | ✅       | 🌟🌟🌟                   | 🌟🌟                      |
| Claude 3 Opus     | ✅                    | ✅     | ❌       | 🌟                       | ⛔️                        |
| Claude 3 Sonnet   | ✅                    | ✅     | ❌       | 🌟🌟                     | ⛔️                        |
| Claude 3 Haiku    | ✅                    | ✅     | ❌       | 🌟🌟                     | ⛔️                        |

## Claude 3.5 Sonnet

### Simple Instruction Call: Weather Query

Test Instruction: Instruction ①

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/42a6980c-ea2a-44fd-b61f-a7989827f5a5" />

<Image alt="Claude 3.5 Sonnet Tools Calling for Simple Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/71146b75-2c73-48c3-9688-1d8814d2a791" />

<details>
<summary>Tools Calling Raw Output:</summary>

```yml
```

</details>

### Complex Instruction Call: Literary Map

Test Instruction: Instruction ②

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/a9a40899-d5f3-4ef2-aa08-922751b05ca6" />

From the above video:

1. Claude 3.5 Sonnet supports Stream Tools Calling and Parallel Tools Calling;
2. In Stream Tools Calling, generating long sentences causes a delay (in the Tools Calling raw output, `[chunk 40]` and `[chunk 41]` are 6s apart). As a result, there is a relatively long wait at the beginning of Tools Calling.
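
A delay like the one between `[chunk 40]` and `[chunk 41]` can be located by recording when each chunk arrives and looking for the largest gap. The following is a minimal Python sketch under that assumption; `largest_gap` and the timestamps are hypothetical illustrations, not LobeChat code or measured data.

```python
from typing import Sequence, Tuple


def largest_gap(arrival_times: Sequence[float]) -> Tuple[int, float]:
    """Return (index, gap): the chunk that arrived after the longest wait, and that wait."""
    if len(arrival_times) < 2:
        raise ValueError("need at least two chunks to measure a gap")
    # Gap i is the wait between chunk i and chunk i + 1.
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    worst = max(range(len(gaps)), key=gaps.__getitem__)
    return worst + 1, gaps[worst]


# Hypothetical arrival times in seconds, for illustration only.
times = [0.0, 0.1, 0.2, 6.2, 6.3]
index, gap = largest_gap(times)
print(f"longest wait: {gap:.1f}s before chunk {index}")
```

Applied to real stream timestamps, the same check shows whether the wait is concentrated at the start of Tools Calling or spread across the stream.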

<Image alt="Claude 3.5 Sonnet Tools Calling for Complex Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/23e2d7e5-a6f3-4f4c-9c6a-5651f35a5910" />

<details>
<summary>Tools Calling Raw Output:</summary>

```yml
```

</details>

## Claude 3 Opus

### Simple Instruction Call: Weather Query

Test Instruction: Instruction ①

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/0e120fa2-8410-4552-a947-5ab7a91d994d" />

From the above video:

1. Claude 3 Opus outputs a `<thinking>` tag at the beginning of Tools Calling, which is of little use to users and consumes extra tokens;
2. Opus triggers Tools Calling twice, indicating that it does not support Parallel Tools Calling;
3. The raw output of Tools Calling shows that Opus also supports Stream Tools Calling.

<Image alt="Claude 3 Opus Tools Calling for Simple Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/fa2f89bc-b9d5-43e3-a15e-1e79174d002c" />

<details>
<summary>Tools Calling Raw Output:</summary>

</details>

### Complex Instruction Call: Literary Map

Test Instruction: Instruction ②

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/b2dc8cd9-2582-43fe-9121-29c20a1cdc7b" />

From the above video:

1. As in the simple task, Opus always outputs a `<thinking>` tag, which significantly degrades the user experience;
2. Opus outputs the `prompts` field as a string instead of an array, which causes an error and prevents the plugin from being called correctly.
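
The string-versus-array failure can be reproduced with a small argument check. This is a minimal Python sketch, assuming the tool-call arguments arrive as a JSON string with a `prompts` field; `check_prompts` and the sample payloads are hypothetical and do not reflect how LobeChat validates plugin arguments.

```python
import json
from typing import Any, List


def check_prompts(arguments_json: str) -> List[str]:
    """Parse tool-call arguments and require `prompts` to be an array of strings."""
    arguments: Any = json.loads(arguments_json)
    if not isinstance(arguments, dict):
        raise TypeError("tool-call arguments must be a JSON object")
    prompts = arguments.get("prompts")
    if not isinstance(prompts, list) or not all(isinstance(p, str) for p in prompts):
        raise TypeError(
            f"`prompts` must be an array of strings, got {type(prompts).__name__}"
        )
    return prompts


# Hypothetical payloads: the first matches the expected shape, the second
# mirrors the failure described above (a single string instead of an array).
valid_args = '{"prompts": ["a quiet harbor at dawn", "a mountain road in fog"]}'
invalid_args = '{"prompts": "a quiet harbor at dawn, a mountain road in fog"}'

print(check_prompts(valid_args))
try:
    check_prompts(invalid_args)
except TypeError as error:
    print(f"rejected: {error}")
```

Rejecting the call outright keeps the failure visible in an evaluation; splitting the string into a list would hide the model's mistake.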

<Image alt="Claude 3 Opus Tools Calling for Complex Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/1eee785d-932f-4320-845e-eed0bee4b1ae" />

<details>
<summary>Tools Calling Raw Output:</summary>

</details>

## Claude 3 Sonnet

### Simple Instruction Call: Weather Query

Test Instruction: Instruction ①

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/600becd5-7f12-4a9a-86c7-e5cca0db6b1b" />

From the above video, Claude 3 Sonnet triggers Tools Calling twice, indicating that it does not support Parallel Tools Calling.

<Image alt="Claude 3 Sonnet Tools Calling for Simple Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/e82f5c69-7607-488f-8c10-0482fb380c6c" />

<details>
<summary>Tools Calling Raw Output:</summary>

</details>

### Complex Instruction Call: Literary Map

Test Instruction: Instruction ②

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/c150aa5f-36bc-40f2-a779-9c4fdcf2cd4c" />

From the above video, Claude 3 Sonnet fails the complex instruction call. The error occurs because `prompts` is expected to be an array but is generated as a string.

<Image alt="Claude 3 Sonnet Tools Calling for Complex Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/b7d84e26-920d-4a82-8798-1b1060ebb341" />

<details>
<summary>Tools Calling Raw Output:</summary>

</details>

## Claude 3 Haiku

### Simple Instruction Call: Weather Query

Test Instruction: Instruction ①

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/02b3e872-735a-4928-8245-a90786acea8b" />

From the above video:

1. Claude 3 Haiku triggers Tools Calling twice, indicating that it also does not support Parallel Tools Calling;
2. Haiku does not give a good response and calls the tool directly.

<Image alt="Claude 3 Haiku Tools Calling for Simple Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/9081b586-cf43-440f-8ef8-1de5d8658694" />

### Complex Instruction Call: Literary Map

Test Instruction: Instruction ②

<Video src="https://github.com/lobehub/lobe-chat/assets/28616219/d1e3f804-0b89-4b90-9d78-69aee0db1c4d" />

From the above video, Claude 3 Haiku also fails the complex instruction call. The error is the same: `prompts` is generated as a string instead of an array.

<Image alt="Claude 3 Haiku Tools Calling for Complex Instruction" src="https://github.com/lobehub/lobe-chat/assets/28616219/cde80220-4615-43bb-934f-35fe0de88754" />

<details>
<summary>Tools Calling Raw Output:</summary>

</details>
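
The Parallel column in this evaluation rests on one observation: a model that needs two separate rounds of Tools Calling for a single instruction is treated as not supporting parallel calls. A minimal Python sketch of that rule follows, assuming each assistant turn is logged as a list of tool names; `supports_parallel` and the tool names are hypothetical and not part of LobeChat.

```python
from typing import List, Sequence


def supports_parallel(turns: Sequence[List[str]]) -> bool:
    """Return True if any single assistant turn issued more than one tool call."""
    return any(len(tool_calls) > 1 for tool_calls in turns)


# Hypothetical logs: two calls in one turn versus one call in each of two turns.
parallel_log = [["get_weather", "get_weather"]]
sequential_log = [["get_weather"], ["get_weather"]]

print(supports_parallel(parallel_log))
print(supports_parallel(sequential_log))
```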