Responsibilities
1. Build a distributed control platform that uniformly manages a large number of online hosts and the storage and database services running on them.
2. Build distributed middleware that provides shared public services and a unified container base for storage services, in support of business development.
3. Build a monitoring and alerting system that tracks the operating status of each business in real time, raises alerts within seconds, and supports security auditing.
4. Develop supporting systems such as the operations platform, the operation and maintenance platform, fault diagnosis, and DevOps.
Qualifications
1. Computer-related major; familiar with one of Go, Java, or Python and with common frameworks; more than 1 year of hands-on engineering experience; committed to code quality.
2. Experience in the architecture design, development, and operation of large-scale, high-concurrency, high-availability applications, with a deep understanding of reliability, performance, availability, and related concerns.
3. Proficient in operating and maintaining container-related components such as K8S, Etcd, Nginx, and Prometheus, with extensive experience in optimization and troubleshooting. Source-code-level understanding is a plus.
4. Familiar with at least one distributed system (database, table storage, cache, message queue, object storage, block storage, etc.).