The Senior SRE will be responsible for leading initiatives to improve system reliability, automate
operational processes, and ensure the scalability and security of our systems. The ideal candidate will
have a strong background in Linux systems, cloud technologies, containerization, and automation,
along with a proactive approach to problem-solving and a commitment to continuous improvement.
Key Skill Sets:
● Linux Administration
● DevOps
● Docker
● Kubernetes
● AWS
● Python
● Ansible
● Jenkins
● Observability tools like New Relic
Roles and Responsibilities:
● Design and implement automation solutions for infrastructure provisioning, configuration,
and management using Ansible, promoting consistency and reliability across environments.
● Lead the development and maintenance of CI/CD pipelines using Jenkins, ensuring efficient
deployment processes and integrating quality checks.
● Manage and optimize containerized applications using Docker and Kubernetes, focusing on
scalability, efficiency, and security.
● Architect and maintain secure, scalable, and resilient cloud infrastructure on AWS, including
performance tuning and cost optimization.
● Conduct comprehensive Linux system administration, including performance tuning, security
hardening, and troubleshooting.
● Develop and maintain Python scripts to automate tasks and integrate systems, enhancing
operational efficiency.
● Collaborate with development and operations teams to implement SRE principles, fostering a
culture of reliability and performance.
● Monitor system performance, identify bottlenecks, and implement solutions to ensure high
availability and optimal user experience.
● Lead incident response efforts, minimizing impact and conducting post-mortem analyses to
prevent future occurrences.
● Mentor junior team members and contribute to the development of best practices and
standards within the SRE team.
Must Have Skills:
● Minimum of 8 years of overall experience, including 5+ years in a senior SRE or similar
role, with a proven track record of improving system reliability and performance.
● Strong understanding of SRE principles and concepts.
● Proven expertise in Linux administration, performance optimization, and security
practices.
● Strong experience with observability tools such as New Relic, Prometheus, Grafana,
and Datadog.
● Strong experience with container technologies (Docker) and orchestration systems
(Kubernetes).
● Hands-on experience with AWS.
● Proficiency in Python/Bash for scripting and automation.
● Solid understanding of Ansible and Terraform for infrastructure automation and Jenkins
for continuous integration and delivery.
● Excellent problem-solving skills, with the ability to troubleshoot complex system issues
effectively.
● Strong communication and collaboration abilities, capable of leading projects and working
across teams to achieve objectives.
Qualifications:
● Bachelor's degree in Computer Science, Information Technology, or related field preferred.