Google Cloud Site Reliability

Location:
Oakland, CA
Salary:
150,000 / Negotiable
Posted:
January 19, 2024

Resume:

Steven Pardee

SRE & Cloud Infrastructure Engineer

ad2wmr@r.postjobfree.com • 510-***-****

Oakland, CA, 94612

Technically astute and dynamic professional with a comprehensive background in site reliability and cloud infrastructure engineering. Expert in orchestrating complex server environments, particularly in Red Hat Linux, across AWS, Google Cloud, and Azure platforms. Demonstrates proficiency in automating and optimizing large-scale operations. Adept at enhancing system reliability and performance, with a keen focus on incident management and strategic problem-solving. Proven track record of collaborating with cross-functional teams and vendors, ensuring seamless integration and robust infrastructure solutions. Skilled in developing and standardizing operational procedures, combining technical expertise with a commitment to operational excellence and innovation.

Technical Proficiencies

Cloud Technology: AWS-CLI, CloudFormation, Terraform, Google Cloud Shell, Oracle Cloud Services, Certified Google Cloud Associate Engineer (Certification ID: OQsveF)

DevOps Tools: Ansible, Puppet, Jenkins, Oracle Orchestration, KickStart

Scripting: Shell, Perl, Python, Golang

Operating Systems: Linux, AIX, Solaris, Windows, OS X

Databases: MySQL, RDS, Oracle, Exadata, ExaCC, PostgreSQL, Cassandra

Monitoring: New Relic, Nagios, Splunk, Grafana, NetCool, AppDynamics

Areas of Expertise

Site Reliability Engineering

Cloud Services Management

Server Deployment & Management

Linux Systems Administration

Compliance & Quality Assurance

System Troubleshooting & Remediation

Process Automation

Vendor Collaboration

Team Leadership & Training

Professional Experience

Wells Fargo, San Francisco, CA 2020 to 2023

Senior Systems Operations Engineer

Deployed and maintained large, complex, and critical applications across many data centers and via cloud technologies, including AWS, Google Cloud, and Azure. Performed daily maintenance tasks and proactive reviews of production environment. Developed automation scripts in Shell and Python for server discovery across global cloud platforms. Carried out discussions with development and business teams, gaining deep understanding of applications and business functionality. Led remediation activities, ran technical incident bridges, and provided updates on active issues.

Managed incident, problem, change, and release processes for 20,000 Red Hat Linux Release 9 servers.

Collaborated with vendors to troubleshoot issues regarding the products.

Identified and addressed gaps and single points of failure in current technology processes and designs.

Updated and maintained application-specific troubleshooting documentation and runbooks, transferring knowledge to other members.

University of California - Office of the President/Randstad, Oakland, CA 2019 to 2020

UCPath Cloud Services Tech Lead (Contract)

Leveraged CloudFormation to migrate UCOP's PeopleSoft from Oracle OMCS to AWS cloud-based services. Crafted and delivered executive and peer-level presentations and briefings to communicate upcoming changes, solutions, projects, and improvements. Devised and streamlined processes and procedures, ensuring operational efficiency. Standardized processes and deployments across various platforms, including scripting, group policy, and change management, particularly for Google Cloud and Azure systems.

Spearheaded collaboration with AWS and Oracle for robust installation and deployment of thousands of Red Hat Linux systems, including troubleshooting infrastructure services.

Drove strategic planning, implementation, and evaluation of automated server-based technologies on Red Hat Linux using AWS native services and architectures.

Influenced system initiatives and design modifications, advocating for policy updates and procedural enhancements in relation to AWS Systems architecture.

Resolved a wide array of complex technical issues by providing expert technical assistance and innovative solutions.

Enhanced technical processes for Oracle and AWS-hosted environments through dedicated support.

Oracle/Intelliswift, Redwood Shores, CA 2018 to 2019

Cloud at Customer Senior Systems Engineer (Contract)

Deployed and integrated Oracle Cloud on customer premises. Conducted service checks, monitoring and triaging system/application alerts, emails, and phone calls for appropriate prioritization and response. Troubleshot service-impacting events efficiently using various channels, including phone, email, service telemetry, and alerting systems. Developed and maintained standard operating procedures and troubleshooting guides in collaboration with Service Operations and Development teams to enhance mitigation efficiency.

Enhanced reliability, performance, and operability of Red Hat Linux-based Client Cloud Services.

Collaborated with engineering teams to identify and implement automation opportunities, signal noise reduction, and solutions for recurring issues to expedite mitigation of service-impacting events and improve cloud operations.

Managed critical incidents, ensuring rapid resolution and comprehensive communication with customers and key stakeholders.

Engaged in project delivery to expand monitoring, configuration, and deployment capabilities within Client Cloud Platform.

Supported training and development of junior team members, fostering skill growth and knowledge sharing.

Oracle/Intelliswift, Redwood Shores, CA 2017 to 2018

Cloud Operations, Compute Classic - Senior Linux Admin (Contract)

Processed automated service requests, creating Jira work orders, and allocated Systems Administrators and Field Engineers across various groups and data centers. Managed work execution using communication tools, such as Telephone, Zoom, Slack, and Pidgin. Deployed and remediated technologies, including Oracle Exadata Autonomous Database Cloud, Puppet, Oracle 11g, Xen-based Nimbula Director, and various RAID technologies, alongside firmware and hardware.

Gained proficiency in Oracle's internally developed packages, notably Oracle Enterprise Manager and Oracle Identity Manager.

Authored automation scripts for server discovery across global cloud network.

Contributed to 24/7 team, maintaining availability and response readiness by carrying a pager.

Integrated, deployed, and administered vast cloud-based environment with over 500,000 Red Hat Linux servers worldwide, encompassing development, testing, and production stages.

Verisk Analytics, San Francisco, CA 2016

Senior Linux Administrator (Contract)

Orchestrated integration, deployment, and administration of 600-server Red Hat Linux environment, covering stages from development to production. Implemented SSL-enabled Kickstart and Puppet for streamlined deployment processes. Provided support to software development teams in California and New Jersey on various projects, including Pentaho Data Analysis.

Utilized VMware vSphere for VM deployments and Cisco UCM/KVM for bare-metal installations across 600 Red Hat servers.

Managed mirroring and administration of 300-server Cassandra site in New Jersey and Utah, serving as hot fail-over for GM OnStar's data collection and analysis project.

Established and maintained RabbitMQ messaging framework within the WSO2 environment.

Administered NGINX load balancing HTTP reverse-proxies, ensuring efficient operation across all stages.

Integrated and customized third-party New Relic monitoring environment, tailored for Cassandra, RabbitMQ, and NGINX systems.

Environments: Red Hat Enterprise Linux, Java, Apache, NGINX, WSO2, MySQL, Solr, Spark, Cassandra, Hadoop, Pentaho, RabbitMQ, Puppet, Chef, Kickstart, VMware, vSphere, Cisco UCM/KVM, Splunk, New Relic, and Jira.

SmartZip Analytics, Pleasanton, CA 2015

Linux Systems Administrator (Contract)

Led migration and administration of legacy CentOS-based data center to Amazon Web Services (AWS), involving 200 MySQL database servers, Hadoop nodes, LAMP, and file servers. Documented and standardized procedures for building and refreshing servers. Created Munin-based monitoring system with custom plugins for proactive component diagnostics, optimizing reliability and performance. Conducted security audit of Data Center Operations, identifying vulnerabilities and implementing effective monitoring and remediation strategies. Established and documented procedures for PKI infrastructure to secure database transfers to and from Data Center and AWS.

Designed and managed backup policies and procedures using Amazon S3, complemented by AWS-CLI scripting.

Developed and executed disaster recovery procedures for RAID arrays (0, 1, 5, 6, 10).

Migrated and maintained up to 30 MySQL database and LAMP servers from RackSpace to AWS.

Collaborated as DevOps Analyst with software development and data science teams, deploying and troubleshooting code and databases from development to production using tools such as Capistrano and Elastic Beanstalk.

Environments: CentOS 6.5, MySQL 5.5, Apache 2.4, Apache Tomcat 7.0, OpenJDK 1.7, Ruby 1.9, Rails 4.04, Python 2.7 & 3.4, Hadoop 2.2, MDADM 2.6, AWS-CLI 1.7, vSphere 5.0, and Munin 2.0.

Additional Experience

Site Reliability Engineer (Contract), Yahoo, Inc, Sunnyvale, CA

Education

Bachelor of Business Administration, James Madison University, Harrisonburg, VA


