Site Reliability Engineer

Company:

Spot Your Leaders & Consulting

Location:

Bengaluru, Karnataka, India

Posted:

May 16, 2024

Description:

The Senior SRE will be responsible for leading initiatives to improve system reliability, automate operational processes, and ensure the scalability and security of our systems. The ideal candidate will have a strong background in Linux systems, cloud technologies, containerization, and automation, along with a proactive approach to problem-solving and a commitment to continuous improvement.

Key Skill Sets:

● Linux Administration

● DevOps

● Docker

● Kubernetes

● AWS

● Python

● Ansible

● Jenkins

● Observability tools like New Relic

Roles and Responsibilities:

● Design and implement automation solutions for infrastructure provisioning, configuration, and management using Ansible, promoting consistency and reliability across environments.

● Lead the development and maintenance of CI/CD pipelines using Jenkins, ensuring efficient deployment processes and integrating quality checks.

● Manage and optimize containerized applications using Docker and Kubernetes, focusing on scalability, efficiency, and security.

● Architect and maintain secure, scalable, and resilient cloud infrastructure on AWS, including performance tuning and cost optimization.

● Conduct comprehensive Linux system administration, including performance tuning, security hardening, and troubleshooting.

● Develop and maintain Python scripts to automate tasks and integrate systems, enhancing operational efficiency.

● Collaborate with development and operations teams to implement SRE principles, fostering a culture of reliability and performance.

● Monitor system performance, identify bottlenecks, and implement solutions to ensure high availability and optimal user experience.

● Lead incident response efforts, minimizing impact and conducting post-mortem analyses to prevent future occurrences.

● Mentor junior team members and contribute to the development of best practices and standards within the SRE team.

Must Have Skills:

● Minimum of 8 years of overall experience, including 5+ years of relevant experience in a senior SRE or similar role, with a proven track record of improving system reliability and performance.

● Strong grasp of SRE principles and concepts.

● Expertise and proven knowledge in Linux administration, performance optimization, and security practices.

● Strong experience with observability tools such as New Relic, Prometheus, Grafana, and Datadog.

● Strong experience with container technologies (Docker) and orchestration systems (Kubernetes).

● AWS experience.

● Proficiency in Python/Bash for scripting and automation.

● Solid understanding of Ansible and Terraform for infrastructure automation and Jenkins for continuous integration and delivery.

● Excellent problem-solving skills, with the ability to troubleshoot complex system issues effectively.

● Strong communication and collaboration abilities, capable of leading projects and working across teams to achieve objectives.

Qualification:

● Bachelor's degree in Computer Science, Information Technology, or a related field preferred.
