Responsibilities
- Build and lead a team of Site Reliability Engineers focused on building and running complex back-end infrastructure.
- Help improve the deployment process to make it as reliable and boring as possible.
- Ensure the team has effective monitoring coverage of our infrastructure and that alerting is meaningful and actionable.
- Implement and support automated environment provisioning and container solutions using AWS cloud services.
- Design and implement automated infrastructure solutions for our product platform.
- Design and implement application monitoring and alerting solutions that get issues to the right people at the right time.
Requirements & Qualifications
- Have demonstrated capabilities with Platform as a Service (PaaS), CI/CD, and Infrastructure as Code (IaC).
- Have experience working on a fast-growing SaaS platform, either as an SRE or in a lead position.
- Have experience as an SRE or DevOps engineer and still enjoy hands-on technical work.
- Have an urge to build automation and tooling so that you never have to do the same work twice.
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
- Have experience with Nginx, Docker, Kubernetes, Terraform, or similar technologies.
- Have experience with cloud providers such as AWS, GCP, Azure, or DigitalOcean.
- Be able to troubleshoot complex systems and environments.
- Have knowledge of full-stack monitoring concepts and tooling, from application code to system resources.
- Have worked with CDN technologies (Akamai, Imperva, CloudFront, Cloudflare, Fastly, etc.).