@visactor/vmind

www.visactor.io/vmind

const dataTableExplanation = '# Data Table Explanation\n1. The value type of a \'numerical\', \'ratio\', or \'count\' field MUST be \'number\' or \'number[]\'.\n2. ALWAYS generate a flattened data table rather than an unflattened data table.\n## Flattened Data Table Example\n```\ndataTable: [{ date: "Monday", class: "class No.1", score: 20 },{ date: "Monday", class: "class No.2", score: 30 },{ date: "Tuesday", class: "class No.1", score: 25 },{ date: "Tuesday", class: "class No.2", score: 28 }]\n```\n## Unflattened Data Table Example\n```\ndataTable: [{date: "Monday", "class No.1": 20, "class No.2": 30},{date: "Tuesday", "class No.1": 25, "class No.2": 28}]\n```', baseExamples = '# Examples1\ntext:今年6月各大厂商发布了过去1个月的财报数据，其中阿里在V月份利润额达到了1000亿，经调整后的利润额为100亿，而字节跳动V月份的利润额为800亿，经调整后利润额为120亿。\n\nResponse:\n```\n{"fieldInfo":[{"fieldName":"公司","description":"公司名称","type":"string"},{"fieldName":"月份","description":"具体月份","type":"string"},{"fieldName":"利润调整","description":"是否经过利润调整","type":"string"},{"fieldName":"利润额","description":"利润总额","type":"numerical"}],"dataTable":[{"公司":"阿里","月份":"5月","利润调整":"调整前","利润额":100000000000},{"公司":"阿里","月份":"5月","利润调整":"调整后","利润额":10000000000},{"公司":"字节跳动","月份":"5月","利润调整":"调整前","利润额":80000000000},{"公司":"字节跳动","月份":"5月","利润调整":"调整后","利润额":12000000000}]}\n```\n# Examples2\ntext: John Smith was very tall, ranking in the 90th percentile for his age group. He knew Jane Doe, who ranked in the 75th percentile for her age group.\n\nResponse:\n```\n{"fieldInfo":[{"fieldName":"name","description":"The name of a person","type":"string"},{"fieldName":"ranking","description":"The ranking of height in age group","type":"ratio"}],"dataTable":[{"name":"John Smith","ranking":90},{"name":"Jane Doe","ranking":75}]}\n```\n# Examples3\ntext: 现在有大约60%-70%的年轻人有入睡困难，而在两年前，入睡困难的年轻人占比才只有30%。\n\nResponse:\n```\n{"fieldInfo":[{"fieldName":"年份","description":"数据对应时间","type":"date","dateGranularity":"year"},{"fieldName":"入睡困难占比","description":"年轻人入睡困难占总人数的比例","type":"ratio"}],"dataTable":[{"年份":"2024","入睡困难占比":[0.6,0.7]},{"年份":"2022","入睡困难占比":0.3}]}\n```\n', getCommonInfomation = language => `# Common Information\n${"chinese" === language ? `1. 今年是${(new Date).getFullYear()}年\n2. 8.5折和85折含义相同，都代表85%的折扣` : `1. This year is ${(new Date).getFullYear()}`}\n`, getFieldTypeExplanation = language => `Field type explanation: date data refers to data that can be specified down to the year, quarter, month, week, or day. 'ratio' means a ratio value or percentage (%), such as ${"english" === language ? "YoY or MoM" : "同比、环比、增长率、占比等"}. Ratio data usually takes the form of a percentage (%), such as 60%. 'count' means count data.`; export const getBasePrompt = (language, showThoughs = !1) => `You are an expert extraction algorithm, especially sensitive to data, dates, categories, data comparisons, and similar content. Your task is to extract high-quality data tables and field information from the text for further analysis, such as visualization charts.\n# Field Information Explanation\n1. ALWAYS generate field information, which describes each column field in the data table.\n2. ALWAYS generate a field description.\n3. 
ALWAYS generate a field type, chosen from 'date' | 'time' | 'string' | 'region' | 'numerical' | 'ratio' | 'count'; ${getFieldTypeExplanation(language)}\n${dataTableExplanation}\n${getCommonInfomation(language)}\n# Steps\nYou should think step-by-step as follows:\n\n0. Answer language MUST: ${language}\n1. Determine whether the current task is related to data extraction.\n2. If not, return isDataExtraction as false in JSON mode; if yes, continue with the following steps.\n3. Read the entire text and identify the fields with a numerical, ratio, or count field type first.\n4. Read all the text again and generate field information associated with the fields found in Step 3. The newly generated fields should all be simple.\n5. Read all the text and extract the corresponding data table based on the field information. The data corresponding to a field should always be concise, and a field should express only one meaning.\n6. Format date data according to the date granularity, such as the following: yyyy-mm-dd | mm-dd | mm | yyyy-mm | yyyy-qq.\n7. When a date field contains data with multiple date granularities, convert the type of the field to string.\n8. Extract interval/range data in the form of an array.\n9. Do not perform any calculations or numerical conversions, such as currency conversion.\n10. Assume the data is incomplete, then reconsider and execute the task again.\n\nRespond in the following format:\n\`\`\`\n{\nisDataExtraction: boolean; // current task is data extraction or not\n${showThoughs ? "thoughts: string, // your thought process" : ""}\nfieldInfo: {\nfieldName: string; //name of the field.\ndescription?: string; //description of the field. 
\ntype?: 'date' | 'time' | 'string' | 'region' | 'numerical' | 'ratio' | 'count'; // type of field\ndateGranularity?: 'year' | 'quarter' | 'month' | 'week' | 'day'; // generate when the field type is 'date'; represents the granularity of the date\n}[],\ndataTable: Record<string,string|number|number[]>[]; // Extracted data set; each key of dataTable is a fieldName in fieldInfo. The type is number[] if and only if the current data is range data.\n}\n\`\`\`\n${baseExamples}\n---\n\nYou only need to return the JSON in your response directly to the user. Finish your task in one step.\n# Constraints:\n1. Strictly follow the return format: reply in JSON and do not include any extra content.\n2. Dataset numbers are unit-free, e.g., '10万' becomes '100000', '1k' becomes '1000'.\n3. For ratio-type fields, extract only the value, such as '95%' --\x3e '95'; 'reduce 30%' --\x3e '-30'\n4. If you do not know the value of a field, return null for the field's value.\n5. Changes in values should be reflected in the sign (positive or negative) of the data, not in the field names.`; export const getFieldInfoPrompt = (language, showThoughs = !1, reGenerateFieldInfo = !1) => `You are an expert extraction algorithm and are highly sensitive to comparative data, trend data, date data, and similar information. Only extract relevant information from the text. Your goal is to extract structured information from the user's input that matches the form described below. When extracting information, please make sure it matches the type information exactly.\nThe definition of the field information is as follows.\n\`\`\`\nfieldInfo: {\nfieldName: string; //name of the field.\ndescription?: string; //description of the field. 
\ntype?: 'date' | 'time' | 'string' | 'region' | 'numerical' | 'ratio' | 'count'; // type of field; ${getFieldTypeExplanation(language)}\ndataExample?: (string | number)[] // data example of this field\n}[]\n\`\`\`\n\n${getCommonInfomation(language)}\nYou should think step-by-step as follows:\n# Steps\n0. Answer language MUST: ${language}\n1. Determine whether the current task is related to data extraction.\n2. If not, return isDataExtraction as false in JSON mode; if yes, continue with the following steps.\n3. Read all the text and extract the corresponding data table based on the user's field information. The data corresponding to a field should always be concise.\n4. Format date data according to the date granularity, such as the following: yyyy-mm-dd | mm-dd | mm | yyyy-mm | yyyy-qq.\n5. When a date field contains data with multiple date granularities, convert the type of the field to string.\n6. Extract interval/range data in the form of an array.\n7. Do not perform any calculations or numerical conversions, such as currency conversion.\n8. Assume the data is incomplete, then reconsider and execute the task again.\n\n# Response\nRespond in the following format:\n\`\`\`\n{\nisDataExtraction: boolean; // current task is data extraction or not\n${showThoughs ? "thoughts: string, // your thought process" : ""}\n${reGenerateFieldInfo ? "fieldInfo: {\n fieldName: string; //name of the field.\n description?: string; //description of the field.\n type?: 'date' | 'time' | 'string' | 'region' | 'numerical' | 'ratio' | 'count'; // type of field\n dateGranularity?: 'year' | 'quarter' | 'month' | 'week' | 'day'; // generate when the field type is 'date'; represents the granularity of the date\n }[]" : ""}\ndataTable: Record<string,string|number|number[]>[]; // Extracted data set; each key of dataTable is a fieldName in the user's fieldInfo\n}\n\`\`\`\n\n# Examples1:\ntext:今年6月各大厂商发布了过去1个月的财报数据，其中阿里在V月份利润额达到了1000亿，经调整后的利润额为100亿，而字节跳动V月份的利润额为800亿，经调整后利润额为120亿。\n\`\`\`\n{"fieldInfo":[{"fieldName":"公司","description":"公司名称","type":"string"},{"fieldName":"月份","description":"具体月份","type":"string"},{"fieldName":"利润调整","description":"是否经过利润调整","type":"string"},{"fieldName":"利润额","description":"利润总额","type":"numerical"}]}\n\`\`\`\nResponse:\n\`\`\`\n{"dataTable":[{"公司":"阿里","月份":"5月","利润调整":"调整前","利润额":100000000000},{"公司":"阿里","月份":"5月","利润调整":"调整后","利润额":10000000000},{"公司":"字节跳动","月份":"5月","利润调整":"调整前","利润额":80000000000},{"公司":"字节跳动","月份":"5月","利润调整":"调整后","利润额":12000000000}]}\n\`\`\`\n# Examples2:\n\ntext: John Smith was very tall, ranked in the 90th percentile for his age group. He knew Jane Doe, who ranked in the 75th percentile for her age group.\n\`\`\`\n{"fieldInfo":[{"fieldName":"name","description":"The name of a person","type":"string","dataExample":["Roy","Stephen Curry","张三","李四"]},{"fieldName":"rank","description":"The rank of height in age group","type":"ratio","dataExample":[10,80]}]}\n\`\`\`\nResponse:\n\`\`\`\n{"dataTable":[{"name":"John Smith","rank":90},{"name":"Jane Doe","rank":75}]}\n\`\`\`\n# Examples3:\ntext: 现在有大约60%-70%的年轻人有入睡困难，而在两年前，入睡困难的年轻人占比才只有30%。\n\`\`\`\n{"fieldInfo":[{"fieldName":"年份","description":"数据对应时间","type":"date","dateGranularity":"year"},{"fieldName":"入睡困难占比","description":"年轻人入睡困难占总人数的比例","type":"ratio"}]}\n\`\`\`\nResponse:\n\`\`\`\n{"dataTable":[{"年份":"2024","入睡困难占比":[0.6,0.7]},{"年份":"2022","入睡困难占比":0.3}]}\n\`\`\`\n----------------------------------\n\nYou only need to return the JSON in your response directly to the user.\nFinish your task in one step.\n# Constraints:\n1. Strictly follow the return format: reply in JSON and do not include any extra content.\n2. Dataset numbers are unit-free, e.g., '10万' becomes '100000', '1k' becomes '1000'.\n3. For ratio-type fields, extract only the value, such as '95%' --\x3e '95'; 'reduce 30%' --\x3e '-30'\n4. If you do not know the value of a field, return null for the field's value.\n5. Changes in values should be reflected in the sign (positive or negative) of the data, not in the field names.`; //# sourceMappingURL=gptPrompt.js.map
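
The "Data Table Explanation" in the prompt asks the model for a flattened table rather than an unflattened one. The difference can be sketched as a small conversion. `flattenDataTable` and its parameter names are hypothetical, written only for illustration; they are not part of `@visactor/vmind`.

```javascript
// Hypothetical helper (not a VMind API): turn an unflattened (wide) table,
// where each series is its own column, into the flattened (long) shape
// used in the prompt's "Flatten Data Table Example".
function flattenDataTable(rows, keyField, seriesField, valueField) {
  const flattened = [];
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      if (column === keyField) continue; // the key column is copied, not pivoted
      flattened.push({
        [keyField]: row[keyField],
        [seriesField]: column,
        [valueField]: value,
      });
    }
  }
  return flattened;
}

// Same data as the prompt's unflattened example.
const unflattened = [
  { date: "Monday", "class No.1": 20, "class No.2": 30 },
  { date: "Tuesday", "class No.1": 25, "class No.2": 28 },
];

const flattened = flattenDataTable(unflattened, "date", "class", "score");
console.log(flattened);
```

Each output row carries exactly one measure value, so every field expresses a single meaning, as step 5 of the base prompt requires.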
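
Both prompts ask the model for a JSON reply in which `numerical`, `ratio`, and `count` fields hold a `number` or `number[]`, with `null` for unknown values. A caller might check a reply against that rule before using it. The sketch below is one way to do so, under two assumptions: the reply may or may not be wrapped in a code fence, and the payload is strict JSON. All function names here are hypothetical and not part of `@visactor/vmind`.

```javascript
// Hypothetical validation sketch; none of these names come from VMind.
const FENCE = "`".repeat(3);
const NUMERIC_TYPES = new Set(["numerical", "ratio", "count"]);

// Remove an optional code fence wrapped around the JSON payload.
function stripFence(raw) {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    text = text.slice(text.indexOf("\n") + 1); // drop the opening fence line
  }
  if (text.endsWith(FENCE)) {
    text = text.slice(0, -FENCE.length); // drop the closing fence
  }
  return text.trim();
}

// A numeric field may hold a number, a number[] range, or null (unknown).
function isNumericValue(value) {
  if (value === null) return true;
  if (Array.isArray(value)) return value.every((v) => typeof v === "number");
  return typeof value === "number";
}

// Parse the reply and list every cell that breaks the numeric-type rule.
function checkExtraction(raw) {
  const result = JSON.parse(stripFence(raw));
  const problems = [];
  const fieldInfo = result.fieldInfo ?? [];
  const dataTable = result.dataTable ?? [];
  for (const field of fieldInfo) {
    if (!NUMERIC_TYPES.has(field.type)) continue;
    dataTable.forEach((row, index) => {
      if (field.fieldName in row && !isNumericValue(row[field.fieldName])) {
        problems.push(`row ${index}: "${field.fieldName}" should be number or number[]`);
      }
    });
  }
  return { result, problems };
}

// A made-up reply in which the second row wrongly stores the ratio as a string.
const reply = [
  FENCE,
  '{"isDataExtraction":true,"fieldInfo":[{"fieldName":"name","type":"string"},{"fieldName":"ranking","type":"ratio"}],"dataTable":[{"name":"John Smith","ranking":90},{"name":"Jane Doe","ranking":"75"}]}',
  FENCE,
].join("\n");

const { problems } = checkExtraction(reply);
console.log(problems);
```

`JSON.parse` rejects trailing commas and unquoted keys, so a reply that contains them would throw here instead of producing a problem list.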