Gregory Fletcher Jr.
+1-302-***-**** *********@*****.*** linkedin.com/in/gregeng1 github.com/mmxxtdmk
Site Reliability Engineering Leader
Site Reliability Engineering leader with nearly a decade of experience owning reliability for mission-critical systems at JP Morgan Chase & Co. Drove measurable improvements in observability, MTTR reduction, toil automation, and platform availability (up to 99.9%) across RPA and distributed Java environments. Completed JPMC’s accelerated Google SRE training pilot program. Excel at mentoring cross-functional teams, standardizing SRE practices, and delivering zero-downtime outcomes through automation and collaboration.
Areas of Expertise
Site Reliability Engineering (SRE), Observability & Monitoring, Incident Response & Blameless Post-Mortems, Toil Reduction and Automation, High-Availability & Zero-Downtime Systems, Error Budgets & SLOs, Technical Mentorship & Knowledge Sharing, Cloud & Distributed Systems, DevOps & CI/CD, Agile Practices
Professional Experience
JP Morgan Chase & Co.
Remote/Wilmington, DE
April 2018 – June 2025
Site Reliability Engineer: Robotics, UiPath/C#
May 2023 – June 2025
Reliability owner for 60+ production robotic process automations supporting critical operations. Drove SRE best practices across stability, observability, incident response, and toil reduction after completing JPMC’s accelerated Google SRE training pilot program with Google SRE trainers.
Owned observability and monitoring strategy for the RPA platform; collaborated with SRE teams to optimize tools and workflows, reducing MTTR by 20%, improving service resiliency, and minimizing operational risk for business-critical automations.
Developed and maintained runbooks and a Confluence knowledge base that accelerated team onboarding by 30%, fostered blameless postmortems, and strengthened knowledge sharing to enhance incident response and reduce repeat issues.
Mentored 5 cross-functional engineering and operations teams on SRE principles, including error budgets, toil automation, and operational excellence. Reduced toil by 35%, improved the resiliency of mission-critical RPA systems, and enabled faster, more reliable delivery in dynamic production environments.
Spearheaded integration of enterprise observability tooling, standardized reliability patterns across business units, and promoted reusable services that improved platform availability to 99.9%, reduced downtime by 25%, and supported scalable adoption of RPA capabilities.
Site Reliability Engineer: Distributed Computing, Java
April 2019 – April 2023
Reliability owner for business-critical distributed applications. Focused on high-reliability system design, automation of operations, and enterprise-level observability to achieve zero-downtime goals. Completed JPMC’s accelerated Google SRE training pilot program and partnered with product teams to prioritize foundational reliability capabilities.
Standardized operating procedures for 25% of business-critical products, boosting operational efficiency by 20% through Agile processes and cross-functional collaboration.
Developed technical playbooks and centralized monitoring tools with incident response frameworks that unified practices across teams, accelerated SRE adoption by 40%, and strengthened platform availability, reducing repeat incidents by 25%.
Executed zero-downtime data center migrations, delivering 160+ hours of annual operational savings via automation and enhanced monitoring while validating technology feasibility for product roadmaps.
Site Reliability Engineer / DevOps Engineer: Java
April 2018 – April 2019
Designed and supported CI/CD pipelines, observability solutions, and high-availability environments for Java-based microservices serving global user bases. Collaborated with engineers and product owners to modernize legacy systems.
Pivoted operations from a service model to a product model, scaling and securing global business and technical processes while prioritizing platform backlogs and managing dependencies to enable 30% faster delivery cycles.
Designed and deployed Java microservices to cloud infrastructure using Spring, REST, and cloud services, replacing legacy systems and saving $100k+ annually.
Automated DevOps workflows in Linux environments using Jenkins, Bitbucket, and CI/CD tools, saving 10+ developer hours weekly.
Reduced incident detection time by 15% through Splunk and Kibana dashboards, delivering reusable observability capabilities across teams.
Education
Bachelor’s Degree in Engineering
2010 – 2011
Virginia Polytechnic Institute and State University
Java Full Stack Development - Zip Code Wilmington
Built and delivered a mobile full-stack application in a team of five using Java, Spring, AngularJS, Sass, and Agile practices.
Technical Skills
Languages & Frameworks:
Java, Python, C#, Spring, REST, Node.js, JavaScript
DevOps & Infrastructure:
Jenkins, Bitbucket, CI/CD, Terraform, Docker, Kubernetes, Ansible, Git, YAML, Infrastructure-as-Code
Observability & Monitoring:
Splunk, Grafana, Kibana, Logstash, Fluentd
Cloud & Databases:
AWS, Apache Kafka, Tomcat, MongoDB, Cassandra, MS SQL Server, Oracle DB
Other:
Jira, Confluence, ServiceNow, Agile/Scrum, JUnit
Awards
Leadership of the Internal Big Data Community – JP Morgan Chase & Co., June 2022
Led cross-functional collaboration and best practice sharing to drive adoption of scalable data solutions across engineering teams.
Leadership of the Internal Machine Learning Community, July 2019
Founded the community and led knowledge sharing sessions and technical workshops to foster innovation and platform alignment.