Manager Senior systems engineering, Networking and SRE Team

Location:

Godley, TX

Salary:

125000

Posted:

October 29, 2024


Resume:

RANDALL C. BAXTER

Manager Senior systems engineering, Networking and SRE Team

**** ****** *****, ******, ***** 76044

U.S.A.

E-mail *******.*.******@*****.***

Cell 281-***-****

https://www.linkedin.com/in/randybaxter/

Skype: live:randall.c.baxter

Summary of Qualifications:

Results-oriented manager with a successful record of managing teams and ensuring quality through training, communication, monitoring, and consistent, metrics-driven goal expectations. Built a Site Reliability Engineering team from the ground up, managing outages and incidents through weekly RCA meetings to prevent recurrence. Worked with engineering teams, developers, and vendors on automation and Terraform / IaC. Built a Microsoft Copilot that uses AI to pull all JIRA-related incidents to further enhance the postmortem of each. Worked with auditors on all reporting and validations for ESX and hardware infrastructure, showing HSA accounts and access routes. Created Confluence sites for each team to visualize KPIs and open queues, along with metrics showing the value of each. Built Power BI reports into JIRA to show KPIs and open vs. closure percentage rates.

Work Experience:

American Homes 4 Rent - 05/10/2021 – 10/07/2024

Manager Senior Systems Engineering Team

• Leadership & Team Management: Oversaw multiple engineering teams, aligning individual and team priorities to enhance productivity and efficiency.

• Confluence Development: Designed and implemented a 5x5 Excel site in Confluence to showcase each team's priorities, aligning them with organizational goals.

• KPI & Metrics Dashboard: Created a comprehensive Confluence site to display KPIs and metrics related to Reliability and Uptime, Project Delivery, Performance, Incident Management, Customer Satisfaction, Compliance, Innovation, and Cost Efficiency.

• Executive Reporting: Developed PowerPoint presentations for CTO reviews, outlining KPIs and presenting a strategic 3-year plan for transitioning from Systems Engineers to a full SRE team, including a detailed training path for each engineer.

• Azure Engineer: Managed all aspects of Azure, including Entra B2B, Azure AD B2C, Front Door, Gateway, Kubernetes, Service Bus, and virtual machine maintenance with backup and restore processes.

• Cloud Migration Project: Led a critical project to migrate all on-premises servers to Azure, successfully transitioning application servers and 200TB SQL clusters to Azure PaaS environments using Microsoft Migration tools.

• Site Reliability Engineering (SRE) Team Development: Established and managed an SRE team, focusing on hiring and developing KPIs such as Mean Time to Repair (MTTR), Mean Time Between Failures (MTBF), and Customer Satisfaction Score (CSAT) to enhance system usability.

• Physical to Virtual Migration: Managed the migration of 90 servers from physical infrastructure to Azure VMs, including Talend, 200TB SQL databases, gateways, and various applications.

• Oracle Infrastructure Rollout: Rolled out the JDE Oracle infrastructure, successfully launching the JDE 4AMH environment for go-live on January 1, 2022.

• Standard Operating Procedures (SOPs): Authored all SOPs for maintaining the Oracle environment, including compliance documentation for monthly audits.

• Lead Engineering Projects: Assisted in projects involving Cosmos DBs, Data Lake, and geo-redundancy between Azure East and West for failover solutions.

• Azure Services Knowledge: Proficiency in Azure core services (Compute, Storage, Networking)

• Azure Networking: Understanding of Azure Virtual Networks, VPNs, ExpressRoute, and Network Security Groups.

• Operating systems: Windows, Linux (Ubuntu, Red Hat), SQL, SQL Reporting servers, patch management, and recommended updates (for example, TLS version).

• Automation: Skills in PowerShell, Azure CLI, Azure Automation, and use of Policies.

• Monitoring and Performance: Familiarity with Azure Monitor, Application Insights, and Log Analytics.

• Backup and Disaster Recovery: Understanding Azure Backup and Azure Site Recovery.

• Database Management: Knowledge of Azure SQL Database, Cosmos DB, and other Azure data services.

Work Experience:

JP Morgan Chase Bank - 3/19/2009 – 10/17/2019

Vice President / SRE Delivery Manager / Analyst

• Worked with the engineering team to implement the ScaleIO/VxFlex/PowerFlex environment for over 100,000 VMs and 6 petabytes of data. Handled daily administration and support of the ESX infrastructure and trained all ESX personnel on updated processes for automation and maintenance of the ScaleIO environment, using the ScaleIO AMS / VxFlex OS to update, maintain, and add or remove resources as needed.

• Managed a large project to repair the mismanagement of blades assigned to ESX clusters. Known as the 1:1:1 Tetris project, it set all ESX clusters to have only one blade per HP Apollo chassis. This required moving blades from one chassis to another across different ESX layer clusters, all without customer impact or downtime.

• Managed and supervised the global ESX team in Argentina, India, and the USA, providing comprehensive incident reports broken down by availability metrics and action items. Lowered ticket count by over 50%, from over 1,000 tickets a month to under 500 in one year. Led the team in managing multiple projects to upgrade environments consisting of multiple hardware layers, including HP standalone, UCS, and Dell ScaleIO. Upgrades included application updates of VMware 6.x, ESXi, and Cisco UCS for both PSI and VSI environments.

• Trained all new teams created in offshore sites on how to be an engineer, what to look for and how to respond to emergencies. Created contact emergency list for sites to reach out during off hours for both Team Leads and Management.

• Responsible for the daily change control meeting covering all changes impacting an environment of more than 10,000 ESX hosts and over two hundred Virtual Center environments with over one hundred and fifty thousand virtual machines.

• Created process documentation for ticket escalation, closure, metrics and SCIM data showing SLA level. Met with customers on Metrics for customer satisfaction levels, or areas to pinpoint for enhancement.

• Utilized vROps to monitor VMs and report to LOBs, preventing outages due to OS failure. Completed a server consolidation project that removed 80% of active servers globally without jeopardizing speed or connectivity, managing both physical and virtual server resources and using vROps to show cluster resource availability.

• Met with vendors to resolve hardware issues and issues with technical teams' response to regional DCOs globally. Worked directly with vendors on ongoing issues with the VMware, HP 3000, and Cisco UCS environments, and maintained Dell, IBM, Cisco, and HP hardware.

• Worked as a lead technician for customer escalations, drawing on years of experience with multiple technologies.

Employment:

Dril-Quip Inc. 2/01/2006 – 3/16/2009

Systems Administrator

Tetra Pak, 03/01/2000 – 6/15/2006

Operations Specialist

Education:

Multiple Pluralsight training classes in Azure and Oracle Sept. 2024

Using Python to Access Web Data Apr. 2020

Python Data Structures Mar. 2020

Programming for Everybody (Getting started with Python) Feb. 2020

Google Cloud Platform Fundamentals: Core Infrastructure Feb. 2020

VMware ESX 6x VCP May 2018

VMware ESX 5x VCP Nov. 2013

Microsoft VB Scripting for Windows Microsoft SQL Dec. 2001

United States Navy 1986 - 1991

Art Institute of Pittsburgh, Pittsburgh, Pennsylvania 1985 - 1986

Youngstown State University, Youngstown, Ohio 1984 - 1985

United States Navy 1980 - 1985

Skills:

Kubernetes – Oracle – JD Edwards – Azure – AWS – VMware – ESXi – Exchange – Teams – Office365 – JIRA – Strong Excel – Word – PowerPoint – ScaleIO – VxFlex – Cisco UCS Environments – Change Control – Linux (Red Hat, Ubuntu) – Hyper-V – PowerShell/Visual Studio
