Responsibilities
1. Evaluate the effectiveness and quality of general and vertical applications built on large language models. Working closely with the product R&D team, formulate evaluation standards and systems, analyze evaluation results accurately, and support the optimization of applications to improve their performance, user experience and market competitiveness.
2. Application effectiveness and quality evaluation:
   1) Evaluate the effectiveness and quality of general and vertical applications built on large language models, including but not limited to language understanding accuracy, reasonableness of generated content, logical coherence and knowledge accuracy.
   2) Participate in test case design, covering different scenarios, user needs and input conditions, to ensure the application is stable and reliable across situations.
   3) Participate in collecting and analyzing user feedback data, and comprehensively evaluate the application's performance and user experience based on actual usage.
3. Evaluation standards and system construction:
   1) Understand business needs and product goals, collaborate closely with the product R&D team, and formulate detailed, clear evaluation standards and metric systems.
   2) Understand how evaluation data is collected, organized and analyzed, ensure its integrity and reliability, and provide data support for the continuous improvement of the evaluation system.
4. Evaluation analysis and reporting: Write evaluation reports that clearly explain the evaluation methods, processes, results and conclusions, and give the product R&D team a basis for decisions and a direction for improvement.
5. Team collaboration and communication.
Qualifications
1. Bachelor's degree or above, with good reading comprehension and text editing skills.
2. Experience in large model evaluation.
3. Independent judgment, strong communication and teamwork skills, and the ability to drive work forward.
4. Basic Python skills and the ability to process data with scripts (a plus).
5. Some experience with prompt engineering (a plus).
6. Good communication and expression skills, able to communicate and collaborate smoothly with relevant parties inside and outside the team.