MORRISVILLE, US, ***** • ************@*****.*** • 475-***-****
AMNA ASIF
Site Reliability Engineer
PROFESSIONAL SUMMARY
Versatile IT Professional with 9+ years of experience in enterprise infrastructure and cloud technologies. Proven track record of architecting, implementing, and optimizing scalable solutions across AWS, Azure, and GCP platforms. Expert in containerization, automation, and DevOps practices, leveraging tools such as Kubernetes, Docker, Ansible, and Terraform. Skilled in designing high-availability systems, enhancing security postures, and streamlining IT operations. Adept at leading cross-functional teams and aligning technical solutions with business goals. Committed to driving innovation and efficiency in complex IT environments while ensuring robust, reliable, and cost-effective infrastructures.
EMPLOYMENT HISTORY
DIGITAL SITE RELIABILITY ENGINEER • Jun 2023 - Present
HILTON • Memphis, TN
Manage Kubernetes deployments, ensuring optimal pod performance and streamlined processes.
Implement comprehensive monitoring with Datadog, Dynatrace, Splunk, and AWS CloudWatch.
Enhance CI/CD pipelines with automated testing, integrating GitLab and Bitbucket.
Design secure AWS environments with IAM, encryption, and robust disaster recovery plans.
Optimized Kubernetes cluster performance through advanced pod management and automated deployments using Argo CD, enhancing system reliability and deployment efficiency.
Architected multi-region AWS infrastructure with comprehensive disaster recovery strategies, implementing geographic redundancy for enhanced service resilience.
Strengthened system observability by integrating Datadog and Dynatrace monitoring, creating custom service detections for improved application performance tracking.
Enhanced security posture through GraphQL resolver validation, IAM configurations, and network security protocols in AWS environments.
Streamlined CI/CD workflows by implementing automated Cypress testing and integrating version control systems, shortening deployment cycles.
Implemented advanced monitoring solutions across AWS infrastructure, integrating multiple observability tools for enhanced system performance tracking.
Tuned Akamai cache configurations to enhance response times for www.hilton.com and decrease origin traffic. Analyzed cache hit ratio (CHR) metrics and optimized caching rules to minimize latency and backend load.
Developed and optimized Java-based monitoring integrations for Dynatrace, enhancing application observability and reliability.
Wrote Java scripts and API connectors to extract and analyze telemetry data, improving SLI tracking for client-side interactions.
Implemented deep linking strategies using Akamai query parameters to enhance Hilton’s search and shop functionality.
Optimized cache offloading using Akamai parameters to improve user experience on www.hilton.com/en/search.
Use scripting languages (Python, Bash) to automate tasks and reduce errors.
Managed code repositories using Git, Bitbucket, and GitLab, ensuring effective version control and collaboration.
Integrated Bamboo into CI/CD pipelines to automate builds, tests, and deployments.
SITE RELIABILITY ENGINEER • Mar 2022 - Apr 2023
Prometheus • Raleigh, NC
Led high-availability infrastructure design across AWS, Azure, and GCP
Spearheaded CI/CD pipeline development for rapid feature delivery
Pioneered custom Dynatrace dashboards for real-time performance monitoring
Conducted systematic performance analysis for cost-effective resource allocation
Orchestrated seamless major infrastructure upgrades and migrations
Led disaster recovery initiatives and implemented automated backup strategies across cloud platforms, strengthening system resilience and data protection
Optimized cloud resource utilization through strategic capacity planning and performance analysis, driving cost efficiency while maintaining service quality
Orchestrated cross-functional infrastructure upgrades, ensuring seamless transitions between cloud environments while maintaining operational continuity
Developed comprehensive monitoring dashboards using Dynatrace and Pingdom, enabling real-time system performance tracking and proactive issue resolution
Engineered multi-cloud infrastructure solutions with automated failover mechanisms, enhancing system resilience and minimizing downtime across distributed environments
NETWORK/LINUX ENGINEER • Aug 2019 - Jan 2022
Verizon Business • Cary, NC
Resolved server outages and restored servers quickly using the iLO console for HP servers and iDRAC for Dell servers.
Used Nagios to monitor IT infrastructure, including host resources (processor load, disk usage, system logs), applications, services, and network protocols.
Managed users using LDAP Active Directory to maintain user data and security; managed network connections and server-based security using SELinux and iptables.
Monitored TCP/UDP network traffic using tcpdump. Performed vulnerability management of operating systems with scheduled patches and security hardening through Ansible playbooks.
Configured and maintained GitLab/GitHub repository servers for code releases and application configurations.
Created AWS EC2 Instances, set up VPC, created load balancers (ELB), and used Route53 with failover and latency options for high availability and fault tolerance.
Worked closely with application and product teams in understanding their specific automation requirements and implementing them in an optimal CI/CD pipeline (Jenkins).
LINUX SYSTEM ADMINISTRATOR • Oct 2017 - Aug 2019
Dupont • Delaware, DE
Built, patched, and maintained Linux systems in mission-critical bare-metal and virtualized (VMware) environments using Red Hat Satellite.
Installed and configured operating systems (RHEL 6 & 7, CentOS 6 & 7, VMware Workstation); supported virtualization on VMware ESX and ESXi hypervisors, vSphere, vMotion, and vCenter servers.
Configured and managed volume groups and logical volumes using LVM for disk management, and troubleshot LVM failures.
Troubleshot system sluggishness using tools such as top, vmstat, and sar.
Configured NIC bonding in active-backup and load-balancing modes, based on traffic speed, to avoid network latency issues.
Worked with Splunk universal forwarder to get reliable, secure data collection from various sources and delivered the data to Splunk Enterprise or Splunk Cloud for indexing and log analysis.
LINUX SUPPORT OPERATOR • Sep 2016 - Sep 2017
Dataprise • Jersey City, NJ
Led strategic planning for future hardware needs
Delivered Tier 2 and Tier 3 technical support remotely
Ensured optimal hardware health and managed repair needs
Administered ACL and OpenLDAP for file system management
Streamlined complex tasks with Bash and shell scripting
EDUCATION
BACHELOR OF ARTS • University of the Punjab • 2015
COURSES
CERTIFIED KUBERNETES ADMINISTRATOR (CKA) • 2023
Kubernetes
CCIE (R&S) WRITTEN CSCO13514713 • 2020
Cisco
PROJECT MANAGEMENT PROFESSIONAL (PMP)
PMI
SKILLS
Incident Management, System Scalability, Performance Tuning, Risk Mitigation, Strategic Planning, Problem Solving, Kubernetes, Terraform, Automation, Linux, Team Collaboration.