Responsibilities
1. Responsible for building the observability and AIOps platform for ByteDance Volcano Engine's cloud-native K8S container products. This includes strengthening intelligent insight, diagnosis, and troubleshooting capabilities across the PaaS product portfolio. It also includes building productized ToB services that give enterprise customers a best-in-class self-service operations experience and complete cloud-native observability and stability solutions.
2. Responsible for designing the full-stack Metrics/Tracing/Logging/Profiling observability architecture for ByteDance Volcano Engine's cloud-native K8S container products. This includes building foundational capabilities across instrumentation, collection, processing, and analysis, and using eBPF to deepen data collection at the container and application-protocol layers.
3. Responsible for building ByteDance Volcano Engine's cloud-native AIOps infrastructure, mainly including inspection and diagnosis for cloud-native containers, fault self-healing, AI time-series forecasting and causal analysis, and a large-model Agent for intelligent operations.
4. Track industry developments in observability and AIOps, and contribute to the open source community ecosystem.
Qualifications
1. In-depth understanding of Linux kernel architecture and internals, and familiarity with common kernel performance analysis and tuning approaches. Proficient with tools such as Perf, eBPF, and SystemTap. In-depth understanding of the TCP/IP protocol stack, and familiarity with network observability practices.
2. Excellent coding skills, with proficiency in at least one language such as Golang, C++, or Java.
3. Familiar with the cloud-native technology stack, including but not limited to Kubernetes, OpenTelemetry, Prometheus, Cilium, Pixie, and Istio. Experience contributing to related open source communities is preferred.
4. Familiar with cloud-native operations system design and stability engineering practice, including but not limited to observability systems, fault diagnosis and self-healing systems, and SRE methodology.
5. Goal-driven and highly self-motivated. Experience with direct or dotted-line management of technical teams, or as a technical project lead, is preferred. Strong learning ability and good analytical thinking, with the ability to identify and deliver valuable directions within existing scenarios.