Key Responsibilities:
1. AWS Infrastructure Design:
- Lead the design and implementation of scalable, reliable, and secure AWS infrastructure.
- Provide expertise in architecting solutions that maximize the benefits of AWS services.
- Lead the upgrade of Apache web servers for improved performance and security.
- Oversee the database (DB) upgrade process, ensuring minimal downtime and data integrity.
- Manage the upgrade of application servers to enhance overall system efficiency.
2. Automation and AWS Tooling:
- Develop and maintain automation tools for deployment, monitoring, and operations on AWS.
- Implement and enhance infrastructure as code (IaC) using AWS CloudFormation or similar tools.
3. Service Availability Monitoring and Incident Response:
- Set up and maintain monitoring solutions on AWS to proactively identify and address system issues.
- Respond to and resolve incidents, ensuring minimal downtime and impact on users.
- Get involved during major incidents.
- Leverage the available monitors to debug and identify the issue, and get the right team to resolve it.
- Prepare a proper root cause analysis (RCA) of each incident.
- Get the right team to work on preventive steps.
- Keep tabs on minor incidents and look for trends to ensure they do not lead to a major incident.
4. AWS Best Practices:
- Enforce AWS best practices for security, performance, and cost optimization.
- Stay current with AWS advancements and integrate relevant technologies into our infrastructure.
5. Collaboration and Communication:
- Work closely with development, operations, and QA teams to foster a DevOps culture.
- Effectively communicate AWS-related insights, recommendations, and project status.
- Facilitate the upgrade of Kafka and other essential tools within the solution engineering framework.
- Engage in change planning with the cloud team for seamless upgrades, and troubleshoot any issues that arise.
6. Cloud Security:
- Implement and maintain Akamai edge security and WAF measures for optimal protection.
- Oversee monitoring activities to proactively identify and address security vulnerabilities.
- Collaborate with the solution team to conduct cloud security checks and upgrade planning.
- Work closely with the solution engineering and security teams to resolve security issues promptly.
- Manage DDoS protection, WAF, edge firewall, and network security tasks, including continuous monitoring.
- Coordinate corrective actions with the cloud team/AWS to ensure a secure cloud environment.
7. High Traffic Events:
- Evaluate infrastructure needs for high-traffic events, ensuring appropriate sizing and scaling.
- Monitor traffic patterns and collaborate with basic cloud architects to optimize performance.
8. FinOps Cost Management:
- Monitor storage utilization and implement strategies to optimize costs.
- Oversee infrastructure utilization, controlling costs through effective monitoring.
- Monitor CPU, memory, and other resource parameters, optimizing resource consumption.
- Conduct regular checks on data storage to ensure efficient
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 6-10 years of hands-on experience as a Site Reliability Engineer, with a focus on AWS.
- Hands-on experience with AWS, cloud infrastructure, AWS cloud security, high-traffic events, and FinOps cost management.
- Proficiency in scripting languages (e.g., Python, Bash) and experience with AWS SDKs.
- In-depth knowledge of AWS services and a proven track record of implementing solutions on AWS.
- Experience with container orchestration tools (e.g., Kubernetes, Docker Swarm) on AWS.
- Understanding of web, middleware, and database technologies such as Apache, WildFly, MySQL, and Kafka.
- Familiarity with cloud security measures and high-traffic event management.
- Knowledge of FinOps principles and cost management in cloud environments.
- Strong problem-solving and troubleshooting skills.
- Excellent communication and collaboration skills.
(ref:hirist.tech)