Senior Site Reliability Engineer - Docker/Kubernetes

Company:

Career Stone Consultant Private Limited

Location:

Mumbai, Maharashtra, India

Posted:

April 23, 2024


Description:

Key Responsibilities:

AWS Infrastructure Design:

- Lead the design and implementation of scalable, reliable, and secure AWS infrastructure.

- Provide expertise in architecting solutions that maximize the benefits of AWS services.

- Lead the upgrade of Apache web servers for improved performance and security.

- Oversee the database (DB) upgrade process, ensuring minimal downtime and data integrity.

- Manage the upgrade of application servers to enhance overall system efficiency.

Automation and AWS Tooling:

- Develop and maintain automation tools for deployment, monitoring, and operations on AWS.

- Implement and enhance infrastructure as code (IaC) using AWS CloudFormation or similar tools.

Service Availability Monitoring and Incident Response:

- Set up and maintain monitoring solutions on AWS to proactively identify and address system issues.

- Respond to and resolve incidents, ensuring minimal downtime and impact on users.

- Get involved during major incidents.

- Leverage the available monitoring tools to debug and identify the issue, and engage the right team to resolve it.

- Prepare a proper root cause analysis (RCA) for each incident.

- Get the right team to work on preventive steps.

- Keep track of minor incidents and look for trends to ensure they do not lead to a major incident.

AWS Best Practices:

- Enforce AWS best practices for security, performance, and cost optimization.

- Stay current with AWS advancements and integrate relevant technologies into our infrastructure.

Collaboration and Communication:

- Work closely with development, operations, and QA teams to foster a DevOps culture.

- Effectively communicate AWS-related insights, recommendations, and project status.

- Facilitate the upgrade of Kafka and other essential tools within the solution engineering framework.

- Engage in change planning with the cloud team for seamless upgrades and troubleshoot any issues that arise.

Cloud Security:

- Implement and maintain Akamai Edge Security and WAF measures for optimal protection.

- Oversee monitoring activities to proactively identify and address security vulnerabilities.

- Collaborate with the solution team to conduct cloud security checks and upgrade planning.

- Work closely with the solution engineering team & Security team to resolve security issues promptly.

- Manage DDoS protection, WAF, edge firewall, and network security tasks, including continuous monitoring.

- Coordinate corrective actions with the cloud team/AWS to ensure a secure cloud environment.

High Traffic Events:

- Evaluate infrastructure needs for high-traffic events, ensuring appropriate sizing and scaling.

- Monitor traffic patterns and collaborate with basic cloud architects to optimize performance.

FinOps Cost Management:

- Monitor storage utilization and implement strategies to optimize costs.

- Oversee infrastructure utilization, controlling costs through effective monitoring.

- Monitor CPU, memory, and other parameters to optimize resource consumption.

- Conduct regular checks on data storage to ensure efficient use.

Requirements:

- Bachelor's degree in Computer Science, Engineering, or a related field.

- 6-10 years of hands-on experience as a Site Reliability Engineer, with a focus on AWS.

- Hands-on experience with AWS, cloud infrastructure, AWS cloud security, high-traffic events, and FinOps cost management.

- Proficiency in scripting languages (e.g., Python, Bash) and experience with AWS SDKs.

- In-depth knowledge of AWS services and a proven track record of implementing solutions on AWS.

- Experience with container orchestration tools (e.g., Kubernetes, Docker Swarm) on AWS.

- Understanding of web, middleware, and database technologies such as Apache, WildFly, MySQL, and Kafka.

- Familiarity with cloud security measures and high-traffic event management.

- Knowledge of FinOps principles and cost management in cloud environments.

- Strong problem-solving and troubleshooting skills.

- Excellent communication and collaboration skills.

(ref:hirist.tech)
