Responsibilities
1. Based on how heterogeneous AI accelerators (GPUs) are used on cloud platforms, develop a deep understanding of the upper-level application roadmap and keep it up to date; produce clear GPU hardware product roadmaps that keep the hardware solutions in a leading position.
2. Profile the resource demands of heterogeneous GPU product business scenarios, and test and verify performance gains across generations and platforms. Design performance test plans, execute the tests, analyze system performance bottlenecks, make tuning recommendations, assist with debugging and verification, and deliver performance test reports. Coordinate and resolve technical problems encountered when rolling out new hardware and new technologies.
3. Compare and evaluate the benefits of heterogeneous GPU products across business scenarios, determine the selection plan, and produce a standard computing power/resource conversion plan for solution iteration.
4. Combine the latest technical capabilities of the industry chain with the characteristics of hardware product architectures to deliver heterogeneous hardware solutions with leading overall competitiveness, and produce product information for self-developed server hardware.
5. Monitor and analyze the quality and performance of heterogeneous cloud hardware in production use, and provide systematic technical support to drive the identification and implementation of improvements.
Qualifications
1. Bachelor's degree or above in electronic engineering or computer science, with 5 or more years of experience in hardware development, testing, or performance tuning in heterogeneous computing.
2. Solid foundation in Unix/Linux operating systems and proficiency with common commands; able to independently analyze, locate, and solve problems.
3. Familiar with common heterogeneous platforms, such as hardware platforms for large model training and GPU inference scenarios. Familiar with server hardware components from mainstream manufacturers, such as processors, hard disks, network cards, and SAS cards, and with their testing methods. Familiar with tools such as SPEC CPU/fio/iperf/STREAM/MLC/lmbench/MLPerf and the related tuning methods. In-depth understanding of operating system kernels, virtualization, GPU architecture, DPDK, and related technologies.
4. Direct experience developing performance tests for Internet business components/scenarios, along with proficiency in stress and load testing, is preferred.
5. Strong communication, collaboration, organization, and project-driving skills, with a big-picture perspective; strong learning ability and logical thinking; keeps up with cutting-edge technology; strong sense of responsibility and execution, and a positive work attitude.
6. Systematic understanding of the end-to-end delivery process and product logic of ToB (B2B) products.