Responsibilities
1. Work closely with business teams to clarify their needs and provide solutions from a multimodal perspective.
2. Follow cutting-edge multimodal algorithms; understand common multimodal tasks, datasets, and evaluation methods; and be able to use internal and external multimodal tools.
3. Process and analyze multimodal data, including effectively cleaning, organizing, and visualizing it.
4. Adapt multimodal LLMs and fine-tune them on business data.
5. Focus on exploring video content understanding based on multimodal LLMs to support various businesses.
6. Work closely with various teams to ensure that algorithm implementations meet business needs.
Qualifications
1. Excellent programming and algorithm skills; familiar with Python/C++; solid grasp of deep learning fundamentals; familiar with at least one deep learning framework such as PyTorch or TensorFlow.
2. Familiar with deep learning architectures such as Transformer; some multimodal background, strong algorithm implementation ability, and familiarity with common multimodal algorithms.
3. Preferred: experience with deep pre-trained models; experience in multimodal, NLP, CV, or video/audio algorithms; in-depth understanding and practice of LLMs and multimodal learning, along with experience in pre-training and controllable content generation.
4. Bonus points: engineering project experience with generative models such as GAN, VAE, or Diffusion; AIGC-related experience; experience at top NLP/CV/ML conferences (ACL/EMNLP/CVPR/ICCV/NeurIPS, etc.).
5. Good logical thinking, communication, collaboration, and self-learning abilities; curiosity, a positive attitude, and a sense of responsibility.