Responsibilities
1. Write efficient, maintainable code and develop the LLM evaluation software library.
2. Participate in the evaluation of large language models across capability dimensions including, but not limited to, basic capabilities, function calling, instruction following, and safety.
3. Track and study the industry's latest large language model techniques and evaluation methods, introduce valuable benchmarks into the Doubao evaluation system, and bring cutting-edge knowledge and insights to the team.
4. Design and execute evaluation experiments, collect and analyze experimental data, and use data to drive model iteration and product development. For example (two illustrative sketches follow this list):
   1) Design evaluation methods to measure an LLM's capabilities in new areas.
   2) Analyze the bias and variance of evaluation metrics, and reduce their impact through dataset construction and statistical methods.
   3) Design online experiments to verify whether offline evaluation metrics reflect real user experience.
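As a hedged illustration of example 2) above, the following Python sketch estimates the statistical variability of a benchmark score by bootstrap resampling over evaluation items; the function name, data format, and scores are hypothetical and not part of any actual Doubao library.

    import random
    import statistics

    def bootstrap_ci(per_item_scores, n_resamples=10000, alpha=0.05, seed=0):
        # Estimate a (1 - alpha) confidence interval for the mean benchmark
        # score by resampling evaluation items with replacement.
        rng = random.Random(seed)
        n = len(per_item_scores)
        means = sorted(
            statistics.fmean(rng.choices(per_item_scores, k=n))
            for _ in range(n_resamples)
        )
        lower = means[int(n_resamples * alpha / 2)]
        upper = means[int(n_resamples * (1 - alpha / 2))]
        return statistics.fmean(per_item_scores), (lower, upper)

    # Hypothetical per-item pass/fail results from one benchmark run.
    scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
    mean, (low, high) = bootstrap_ci(scores)
    print(f"score = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
    # A wide interval suggests adding items (dataset construction) or
    # applying variance-reduction techniques (statistical methods).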
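A similarly hedged sketch for example 3): one simple offline/online consistency check is the rank correlation between the two metrics across model versions. The model names and numbers below are invented for illustration, and scipy is assumed to be available.

    from scipy.stats import spearmanr

    # Hypothetical offline benchmark scores and online user-satisfaction
    # rates for three model versions.
    offline = {"model-v1": 0.61, "model-v2": 0.68, "model-v3": 0.74}
    online = {"model-v1": 0.55, "model-v2": 0.66, "model-v3": 0.63}

    versions = sorted(offline)
    rho, p_value = spearmanr([offline[v] for v in versions],
                             [online[v] for v in versions])
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
    # High rank correlation suggests the offline metric is a usable proxy
    # for user experience; low correlation means the two have diverged.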
Qualifications
1. Bachelor's degree or above; majors in computer science, natural language processing, artificial intelligence, or related fields preferred.
2. Strong interest in and enthusiasm for large language models, with excellent programming skills and strong execution ability.
3. Familiar with the processes of building, validating, and applying large-scale datasets, with a deep understanding of LLM evaluation methods and metrics.
4. Careful and patient work style, with a strong sense of responsibility and team spirit.
5. LLM evaluation demands both forward-looking vision and the discipline to turn vague definitions into quantifiable work; we hope you bring both.