Ishaaq Sameer Mohammed
AWS Cloud Production Support Engineer (L3) | AWS Operations & Incident Response
E-mail: ***********@*****.*** Phone: +1-217-***-****
Professional Summary
AWS Cloud Production Support Engineer (L3) with 8+ years of experience supporting mission-critical, cloud-based production systems. Specializes in real-time incident response, outage triage, root cause analysis (RCA), and stabilizing distributed applications in high-availability environments. Strong in diagnosing infrastructure, application performance, and networking/dependency failures, with a focus on improving reliability, reducing repeat incidents, and restoring service within SLAs.
Technical Skills
Cloud Platforms:
AWS EC2, VPC, IAM, S3, RDS/Aurora, DynamoDB, ELB/ALB, Route 53, Lambda, CloudTrail, Multi-AZ architectures, distributed system support
Incident Response & Reliability:
L2/L3 production support, incident management, outage triage, root cause analysis (RCA), post-incident reviews (PIR), SLA/MTTR improvement, escalation handling, runbooks
Monitoring & Observability:
Datadog Incident Management, AWS CloudWatch, Splunk, New Relic, Kibana, log and metric analysis, alerting and diagnostics
Automation & Scripting:
Python, Bash, AWS CLI, JSON, SQL, automated diagnostics, recovery scripts, health checks
Containers & Orchestration:
Kubernetes (EKS), pod and container troubleshooting, resource utilization analysis, application log inspection
Data Platforms:
Databricks, Snowflake, AWS EMR, batch and analytics workload support
CI/CD & ITSM:
GitHub Actions, Jenkins, AWS CodePipeline, ServiceNow, JIRA / JSM, PagerDuty, Opsgenie, Confluence
Work Experience
Senior AWS Cloud Production Support Engineer Feb 2024 – Present
Capital One Bank – Richmond, VA
●Owned L3/L4 production incident response for AWS-hosted applications, restoring service within defined SLAs during high-severity outages.
●Led real-time outage bridges, coordinating cross-functional recovery efforts and stabilizing distributed cloud systems under incident conditions.
●Diagnosed complex failures across compute, networking, identity, load balancing, database, and application dependency layers to identify root causes.
●Performed root cause analysis (RCA) and post-incident reviews (PIR), driving corrective actions that reduced recurrence and improved platform reliability.
●Analyzed logs, metrics, and alerts using enterprise monitoring tools to accelerate detection and resolution of production issues.
●Supported Kubernetes (EKS) workloads by troubleshooting pod failures, resource constraints, and service connectivity issues during incidents.
●Investigated data pipeline and batch processing failures involving Databricks, Snowflake, and EMR, ensuring timely recovery of downstream reporting systems.
●Automated diagnostic checks and recovery tasks using scripting and cloud tooling, reducing manual effort and improving MTTR.
●Monitored deployments and coordinated rollback activities during production releases and incident scenarios.
●Authored and maintained incident runbooks and operational documentation to improve on-call readiness and knowledge sharing.
AWS Cloud Infrastructure Engineer Oct 2020 – Jan 2024
Capital One Bank – Richmond, VA
●Supported high-availability AWS production environments across EC2, S3, EFS, RDS, and VPC, diagnosing infrastructure and application issues as part of cloud escalation and recovery efforts.
●Used CloudFormation and Terraform to support infrastructure changes, and assisted with deployment validation and recovery during high-impact production releases.
●Built Lambda-based automation for proactive alerting, scheduled recovery jobs, and data validation tasks used during production incident investigation.
●Supported IAM, CloudTrail, and S3 policy configurations during production incidents, investigating access, audit, and encryption-related issues impacting cloud workloads.
●Investigated EKS/Kubernetes failures, analyzing pod logs, resource constraints, and service dependencies to stabilize workloads during infrastructure-related outages.
●Leveraged GitHub Actions, Jenkins, and AWS CodePipeline to monitor deployments and coordinate rapid rollback during production incidents and recovery efforts.
●Analyzed logs, metrics, and alerts using AWS CloudWatch and Splunk to identify performance anomalies and early indicators of production service degradation.
AWS Platform Support Engineer Sept 2019 – Sept 2020
Capital One Bank – Richmond, VA
●Provided L2/L3 production support for AWS environments including EC2, EMR, S3, and IAM, triaging and resolving incidents impacting big-data pipelines and analytics workloads.
●Investigated EMR-based ETL failures using Python and Lambda automation, contributing to root cause analysis and recovery during data platform production incidents.
●Monitored large-scale distributed systems using logs, metrics, and health checks to detect anomalies and support timely incident response.
●Created SOPs and operational runbooks to standardize technical response during incident escalations and reduce recovery time.
Software Engineer – AWS May 2019 – June 2019
Magtech Solutions – Jersey City, NJ
●Executed deployments of AWS services including IAM, EC2, S3, Lambda, RDS, VPC, and SNS using CloudFormation and Terraform, supporting consistent and repeatable infrastructure setup.
●Automated deployment and configuration tasks using Ansible to ensure consistency and reduce manual errors during environment setup.
●Performed deployment validation and integration testing to verify infrastructure readiness and prevent post-deployment issues.
Software Analyst – Salesforce Aug 2018 – May 2019
Veridic Solutions LLC – Jersey City, NJ
●Developed and customized Salesforce Lightning and Visualforce components, implementing application-level enhancements to support business workflows.
●Built Apex triggers and classes to automate workflows and ensure consistent data updates across Salesforce applications.
●Executed bulk data migrations involving 10,000+ records using Data Loader and REST APIs, validating data integrity and successful integration.
●Created reports and dashboards to monitor application data consistency and support operational visibility.
Software Systems Analyst – Salesforce May 2017 – July 2018
ITDEA Technologies LLC – Tampa, FL
●Designed and implemented Salesforce data models using custom objects and fields, supporting application logic and structured data relationships.
●Automated business processes using workflow rules, validation rules, and Apex triggers to ensure data consistency and process reliability.
●Developed Salesforce Lightning components to support application functionality and improve system usability.
Business Analyst Aug 2016 – Apr 2017
ASTA CRS – Greenbelt, MD
●Documented system workflows and requirements using UML and BPMN, translating business needs into technical specifications for development teams.
●Supported UAT and regression testing cycles, validating system behavior and assisting QA teams during release readiness activities.
Project Engineer June 2012 – Jan 2015
Wipro Technologies – Hyderabad, India
●Performed manual and automated testing for enterprise applications (Siebel CRM and JDE ERP), validating functionality, integrations, and data accuracy prior to production releases.
●Prepared technical documentation and supported project delivery activities by tracking defects, test outcomes, and release readiness metrics.
Education
●M.S. in Management Information Systems May 2016
University of Illinois at Springfield
●B.E. in Electrical & Electronic Engineering April 2012
GITAM University, India
Certifications & Professional Development
●Salesforce Platform Developer I (PD-401)
●Salesforce Administrator (ADM-201)
●Certified IT Project Management
●Certified Business Process Management