Google Cloud Site Reliability

Location:

Oakland, CA

Salary:

150,000 / Negotiable

Posted:

July 23, 2025

Resume:

Steven Pardee

510-***-**** ***********@*******.*** Oakland, CA 94612

CAREER OVERVIEW AS SITE RELIABILITY ENGINEER (SRE) AND CLOUD INFRASTRUCTURE ENGINEER

Technically astute and dynamic professional with over 10 years of comprehensive background in site reliability and cloud infrastructure engineering. Expert in orchestrating complex server environments, particularly in Red Hat Linux, across AWS, Google Cloud, and Azure platforms. Demonstrates proficiency in automating and optimizing large-scale operations. Adept at enhancing system reliability and performance, with keen focus on incident management and strategic problem-solving. Proven track record of collaborating with cross-functional teams and vendors, ensuring seamless integration and robust infrastructure solutions. Skilled in developing and standardizing operational procedures, combining technical expertise with commitment to operational excellence and innovation.

TECHNICAL PROFICIENCIES

Cloud Technology: AWS-CLI, CloudFormation, Terraform, Google Cloud Shell, Oracle Cloud Services, Certified Google Cloud Associate Engineer (Certification ID: OqsveF, https://www.credential.net/07535945-3830-43c5-bc53-6a73391b2e36)

DevOps Tools: Ansible, Puppet, Jenkins, Oracle Orchestration, Kickstart

Scripting: Shell, Perl, Python, Golang

Operating Systems: Linux, AIX, Solaris, Windows, OS X

Databases: MySQL, RDS, Oracle, Exadata, ExaCC, PostgreSQL, Cassandra

Monitoring: New Relic, Nagios, Splunk, Grafana, NetCool, AppDynamics

AREAS OF EXPERTISE

Site Reliability Engineering

Cloud Services Management

Server Deployment & Management

Linux Systems Administration

Compliance & Quality Assurance

System Troubleshooting & Remediation

Process Automation

Vendor Collaboration

Team Leadership & Training

PROFESSIONAL EXPERIENCE

Samsung Semiconductor Labs/Crystal Equations

San Jose, CA

Contract Sr. Linux Administrator July 2024 – June 2025

Engineered systems administration solutions for various project and operational needs. Installed new servers, rebuilt existing ones, and configured hardware, peripherals, services, settings, directories, and storage in accordance with standards and project/operational requirements, including: virtualization platforms (VMware, KVM, and others), storage management (NetApp, Dell/EMC, and others), CI tools (Jenkins), infrastructure automation tool (Ansible) support, a system inventory system (GLPI), and network equipment management (Cisco, Palo Alto Networks firewall monitoring, and others).

Installed and configured systems that support infrastructure and/or R&D activities. Developed and maintained installation and configuration procedures. Configured and maintained storage equipment and troubleshot with vendors. Configured and installed EDA tools and FlexLM licenses, and troubleshot software tool issues. Contributed to and maintained system standards and the security posture of the systems. Researched and recommended innovative and, where possible, automated approaches to system administration tasks. Identified approaches that leveraged resources.

Operations and support activities included: installing and maintaining security patches on operational and development systems, including but not limited to Red Hat Linux (CentOS, Rocky, Ubuntu, etc.), Windows Server 2008/2008 R2/2012, VMware, and Apache web services; reporting security patch compliance; performing daily system monitoring; verifying the integrity and availability of all hardware, server resources, systems, and key processes; reviewing system and application logs; and verifying completion of scheduled jobs such as backups.

Performed regular security monitoring to identify possible intrusions. Performed daily backup operations, ensuring all required file systems and system data were successfully backed up to the appropriate media, recovery tapes or disks were created, and media was recycled and sent off site as necessary. Performed regular file archival and purges as necessary. Created, changed, and deleted user accounts on request. Provided Tier III and other support on request from various constituencies. Investigated and troubleshot issues, repaired and recovered from hardware or software failures, and coordinated and communicated with impacted constituencies.

Maintenance duties included: applying OS patches and upgrades on a regular basis, upgrading administrative tools and utilities, and configuring or adding new services as necessary.

Managed existing and deployed new instances of Jira, Git, LDAP, Jenkins, and VMware environments. Upgraded and configured system software that supports infrastructure applications or Asset Management applications per project or operational needs. Performed ongoing performance tuning, hardware upgrades, and resource optimization as required. Configured CPU, memory, and disk partitions as required.

Wells Fargo San Francisco, CA

Senior Systems Operations Engineer 2020 to 2023

Deployed and maintained large, complex, and critical applications across many data centers and via cloud technologies, including AWS, Google Cloud, and Azure. Performed daily maintenance tasks and proactive reviews of production environment. Developed automation scripts in Shell and Python for moving and removing files across global cloud platforms. Carried out discussions with development and business teams, gaining deep understanding of applications and business functionality. Led remediation activities, ran technical incident bridges, and provided updates on active issues.

Managed incident, problem, change, and release processes for 20,000 Red Hat Linux Release 9 servers. Collaborated with vendors to troubleshoot issues with their products. Identified and addressed gaps and single points of failure in current technology processes and designs.

Updated and maintained application-specific troubleshooting documentation and runbooks, transferring knowledge to other team members.

University of California - Office of the President/Randstad Oakland, CA

UCPath Cloud Services Tech Lead (Contract) 2019 to 2020

Leveraged CloudFormation to migrate UCOP's PeopleSoft from Oracle OMCS to AWS cloud-based services.

Crafted and presented executive and peer-level presentations and briefings to communicate upcoming changes, solutions, projects, and improvements. Devised and streamlined processes and procedures, ensuring operational efficiency. Standardized processes and deployments across various platforms, including scripting, group policy, and change management, particularly for Google Cloud and Azure systems. Spearheaded collaboration with AWS and Oracle for robust installation and deployment of thousands of Red Hat Linux systems, including troubleshooting infrastructure services. Drove strategic planning, implementation, and evaluation of automated server-based technologies on Red Hat Linux using AWS native services and architectures. Influenced system initiatives and design modifications, advocating for policy updates and procedural enhancements in relation to AWS systems architecture. Resolved a wide array of complex technical issues by providing expert technical assistance and innovative solutions.

Oracle/Intelliswift Redwood Shores, CA

Cloud at Customer Senior Systems Engineer (Contract) 2018 to 2019

Deployed and integrated Oracle Cloud on customer premises. Conducted service checks, monitoring and triaging system/application alerts, emails, and phone calls for appropriate prioritization and response. Troubleshot service-impacting events efficiently using various channels, including phone, email, service telemetry, and alerting systems. Developed and maintained standard operating procedures and troubleshooting guides in collaboration with Service Operations and Development teams to enhance mitigation efficiency. Enhanced reliability, performance, and operability of Red Hat Linux-based Client Cloud Services. Collaborated with engineering teams to identify and implement automation opportunities, signal noise reduction, and solutions for recurring issues to expedite mitigation of service-impacting events and improve cloud operations.

Managed critical incidents, ensuring rapid resolution and comprehensive communication with customers and key stakeholders.

Engaged in project delivery to expand monitoring, configuration, and deployment capabilities within Client Cloud Platform.

Supported training and development of junior team members, fostering skill growth and knowledge sharing.

Cloud Operations, Compute Classic - Senior Linux Admin (Contract) – Redwood Shores, CA 2017 to 2018

Processed automated service requests, created Jira work orders, and allocated Systems Administrators and Field Engineers across various groups and data centers. Managed work execution using communication tools such as telephone, Zoom, Slack, and Pidgin. Deployed and remediated technologies, including Oracle Exadata Autonomous Database Cloud, Puppet, Oracle 11g, Xen-based Nimbula Director, and various RAID technologies, alongside firmware and hardware.

Gained proficiency in Oracle's internally developed packages, notably Oracle Enterprise Manager and Oracle Identity Manager.

Authored automation scripts for server discovery across global cloud network. Contributed to 24/7 team, maintaining availability and response readiness by carrying a pager. Integrated, deployed, and administered vast cloud-based environment with over 500,000 Red Hat Linux servers worldwide, encompassing development, testing, and production stages.

Verisk Analytics San Francisco, CA

Senior Linux Administrator (Contract) 2016

Orchestrated integration, deployment, and administration of 600-server Red Hat Linux environment, covering stages from development to production. Implemented SSL-enabled Kickstart and Puppet for streamlined deployment processes. Provided support to software development teams in California and New Jersey on various projects, including Pentaho Data Analysis.

Utilized VMware vSphere for VM deployments and Cisco UCM/KVM for on-metal installations across 600 Red Hat servers.

Managed mirroring and administration of 300-server Cassandra site in New Jersey and Utah, serving as hot fail-over for GM OnStar's data collection and analysis project. Established and maintained RabbitMQ messaging framework within the WSO2 environment. Administered NGINX load-balancing HTTP reverse proxies, ensuring efficient operation across all stages. Integrated and customized third-party New Relic monitoring environment, tailored for Cassandra, RabbitMQ, and NGINX systems.

Environments: Red Hat Enterprise Linux, Java, Apache, NGINX, WSO2, MySQL, Solr, Spark, Cassandra, Hadoop, Pentaho, RabbitMQ, Puppet, Chef, Kickstart, VMware, vSphere, Cisco UCM/KVM, Splunk, New Relic, and Jira.

SmartZip Analytics Pleasanton, CA

Linux Systems Administrator (Contract) 2015

Led migration and administration of legacy CentOS-based data center to Amazon Web Services (AWS), involving 200 MySQL database servers, Hadoop nodes, LAMP, and file servers. Documented and standardized procedures for building and refreshing servers. Created Munin-based monitoring system with custom plugins for proactive component diagnostics, optimizing reliability and performance. Conducted security audit of Data Center Operations, identifying vulnerabilities and implementing effective monitoring and remediation strategies. Established and documented procedures for PKI infrastructure to secure database transfers to and from Data Center and AWS.

Designed and managed backup policies and procedures using Amazon S3, complemented by AWS-CLI scripting.

Developed and executed disaster recovery procedures for RAID arrays (0, 1, 5, 6, 10). Migrated and maintained up to 30 MySQL database and LAMP servers from Rackspace to AWS. Collaborated as DevOps Analyst with software development and data science teams, deploying and troubleshooting code and databases from development to production using tools such as Capistrano and Elastic Beanstalk.

Environments: CentOS 6.5, MySQL 5.5, Apache 2.4, Apache Tomcat 7.0, OpenJDK 1.7, Ruby 1.9, Rails 4.04, Python 2.7 & 3.4, Hadoop 2.2, MDADM 2.6, AWS-CLI 1.7, vSphere 5.0, and Munin 2.0.

Additional Experience: Site Reliability Engineer (Contract), Yahoo, Inc., Sunnyvale, CA

EDUCATION

James Madison University Harrisonburg, VA

Bachelor of Business Administration (BBA)
