Site Reliability Engineer

Location:

Greenville, TX, 75401

Salary:

$40-$60/HR

Posted:

February 24, 2024

Resume:

Tommy Teinert

Monitoring Engineer / SRE

Mobile – (1-717-***-****

Email - ********@*****.***

Summary

•Monitoring Engineer / SRE with experience in performance testing, performance monitoring, performance troubleshooting, application development, full-stack development, automation, database management, and observability platforms.

•Experience in performance testing, monitoring, and analysis of Apache2 and Tomcat web server environments running on Windows platforms.

•Created applications in Python and GoLang that connect to the Meraki and Extrahop APIs to collect and transfer data.

•Deployed applications in Docker containers to inject chaos fault scenarios into infrastructure and applications running on WebLogic VMs.

•Utilized Kubernetes for pod/node creation, deployment, deletion, status checks, and flag creation, among other commands, to interact with applications on OpenShift Container Platform, AKS, and Loki.

•Created dashboards in Grafana for observability and alerting in production and non-production environments.

•Utilized Helm charts and Spinnaker to deploy code to Kubernetes clusters.

•Engineered test scenarios using JMeter and LoadRunner.

•Experience using Kafka to send logs to dashboards for data convergence and analysis.

•Worked on performance engineering for J2EE-based OLTP applications.

•Monitored system performance using VisualVM, jmap, and jstat.

•Leveraged VMware Mangle Chaos Framework to inject chaos faults into application and infrastructure systems to degrade performance.

•Monitored system performance while introducing chaos fault scenarios to purposefully degrade infrastructure and application performance.

•Performed monitoring assessment on system applications and provided recommendations on addressing gaps and managing critical incidents.

•Automated file transfers across environments using Selenium.

Technical / Domain Expertise

Operating System

Windows

Back-End Development

Java • PL/SQL • JDBC

Front-End Development

HTML/CSS • JavaScript • AJAX

J2EE Frameworks

Servlets

UI Framework

jQuery • Angular • Bootstrap • Kibana • Grafana

Databases

MySQL Server • Oracle

Web/Application Servers

Apache • Tomcat • PCF • Docker • Kubernetes

IDE/Source Control

Eclipse • Spring Tool Suite • Git • IntelliJ IDEA • VSCode

Monitoring Tools

Dynatrace • Lighthouse • Datadog • Meraki • Extrahop

Performance Testing Tools

JMeter • LoadRunner

Performance Analysis Tools

VisualVM • GC Viewer • Perfmon

Chaos Testing Frameworks

VMware Mangle Chaos Framework

Cloud Platform

AWS • PWS • OCP • Azure

Engagement Details

Client • Location • Duration

Project Description • Roles and Responsibilities

Albertsons

WFH

(10/2021 – 11/2023)

Monitoring Engineer

Problem Statement: Observability data from multiple applications needed to be visible in a single-pane-of-glass view for easier assessment and analysis of network traffic and hardware status and availability.

Solution: Route data from application APIs into an observability platform using applications written in GoLang and Python, and design dashboards and alerts to notify users of issues. Monitor dashboards and alerts for metrics.

•Reviewed and determined the data available from the Meraki and Extrahop APIs.

•Assisted team in development of exporters in Python and GoLang to interface with the APIs.

•Consumed from and published to Kafka topics for data analysis and creation of dashboards.

•Retrieved data from the Meraki and Extrahop APIs with the exporters created.

•Routed data through Loki and Prometheus for use in Grafana.

•Led team in creation of dashboards and visualizations in Grafana to show network traffic data and device status.

•Led team in creation of dashboards in Grafana to show device status.

•Created alerts, and assisted team in creating alerts, in Grafana and Prometheus to notify users of offline devices and above-threshold network traffic.

•Installed Grafana plugins for use in various dashboards.

•Utilized Helm charts to deploy exporter code to Kubernetes.

•Set up and led meetings with the client and stakeholders for demonstrations and Q&A sessions, weekly progress meetings, and end-of-sprint progress reports.

CVS

WFH

(05/2021 – 08/2021)

Site Reliability Engineer (SRE)

Problem Statement: Identify monitoring and logging gaps in client systems and help reduce client pain-points with respect to availability.

Solution: Gather information on current system architecture, monitoring and alerting thresholds, application and web server logs, NFRs, and critical incidents affecting the client. Determine which servers, service URL endpoints, and third-party software were being monitored, and by which means. Focus on how key critical incidents were managed. Recommendations were given on addressing gaps in monitoring and logging, as well as on incident management analysis.

•Gathered information on system servers, monitoring tools, logging details, NFRs and critical incidents.

•Determined which system server architecture and software were being monitored.

•Analyzed monitoring tools being used and output of messages and alerts.

•Studied application logs and logging standards being used.

•Determined which service and third-party URLs open to system are being monitored.

•Gained understanding of critical incidents and possible root causes.

•Planned meetings for information sharing and gathering.

•Shared recommendations to address monitoring and alerting gaps found.

•Provided recommendations to address issues with logging outputs and standards in application code.

•Supplied recommendations on addressing management of critical incidents.

•Consolidated all information gathered and recommendations made into a knowledge repository for the client to view.

Fannie Mae

WFH

(10/2020 – 04/2021)

Site Reliability Engineer (SRE)

Problem Statement: Find the best Chaos Testing Framework that meets the needs of the client in as many aspects as possible. This includes finding a framework that can connect to multiple systems such as VMware, Kubernetes, WebLogic, and OpenShift, as well as having the chaos testing scenarios needed for complete coverage of infrastructure and application faults.

Solution: A paper-based study of multiple chaos frameworks was done before the framework of choice was selected, and the framework was then deployed on client-side systems. As an extra step, load tests were designed and run during chaos scenario execution to better judge the impact of the chaos on the systems, and system performance was monitored through command-line commands and JMeter graphs and data.

•Designed paper-based study on multiple chaos frameworks, including analysis and ranking sheet.

•Determined framework of choice using paper-based study analysis and client input.

•Deployed chosen framework on client systems and determined hosts and application to run chaos on.

•Designed chaos scenarios to be run on application and infrastructure systems.

•Customized and refined JMeter scripts for load tests to be run on systems during chaos testing.

•Implemented chaos scenarios on various systems and monitored system performance.

•Detailed findings and observations using JMeter data and graphs.

•Recorded all data pertaining to the POC in Confluence pages for client use.

•Tracked user stories and story points using Jira.

Discover

Texas

(04/2019 – 04/2020)

Site Reliability Engineer (SRE)

Problem Statement: Collect data from daily logs and use that data to help alleviate the difficulties in identifying bottlenecks or trouble spots in the pipeline used to ingest and export the data being obtained.

Solution: Multiple dashboards were created to show the data collected throughout the day and to perform aggregations, comparisons, and arithmetic operations on it.

•Designed dashboards to fit customer needs and expectations.

•Utilized Kibana to output data using canvas and visualizations for graphs and dashboards.

•Customized logs with client to allow for broader sources of data to be shown in dashboards.

•Worked with client to improve and expand functionality of dashboards.

•Implemented dashboard to show HTTP error codes with counts and services with errors.

•Used Elasticsearch Watcher to scan logs for keywords and send emails when errors were found.

•Leveraged Logstash grok filters to parse logs for select data and send to Kibana.

•Utilized Selenium WebDriver and Java applications to automate file transfers across environments.

Cognizant

Virginia

(03/2019 – 03/2019)

Performance Engineer (SRE)

Problem Statement: Identify bottlenecks and resolve issues with slow load times and low load capacity.

Solution: Perform performance testing, monitoring, and engineering in order to pinpoint and resolve bottlenecks in addition to optimizing code and allowing for user load growth.

•Executed load, endurance, scalability, and stress tests using JMeter and BlazeMeter.

•Employed Dynatrace to monitor CPU, memory, disk, and network usage.

•Used Dynatrace to identify poorly performing code in need of optimization and to track long-running SQL queries.

•Located memory and thread issues using VisualVM.

Revature

Virginia

(02/2019 – 02/2019)

Performance Engineer (SRE)

Trivia Hero is a responsive full-stack Greek mythology trivia game. The goal is to answer as many questions as you can before you reach zero health. Trivia Hero tracks your high score, compares it to those of other registered players, and gives you a Rank Badge based on your high score bracket. The rate at which you gain XP is proportional to how often you play the game.

•Created a full-stack Single Page Application using the Angular framework.

•Customized the layout and style using Bootstrap.

•Utilized HTTP requests to access data from an API.

•Automated builds with Jenkins through the use of pipelines.

•Managed client to server calls through JDBC.

•Implemented Front Controller and DAO design patterns.

•Manipulated data in a database with SQL.

•Leveraged an Amazon Web Services RDS database for persistent data storage.

•Developed project with teammates using a DevOps mindset.

•Integrated Maven to manage back-end application dependencies and enable build automation with the Jenkins automation server.

Revature

Virginia

(02/2019 – 03/2019)

Performance Engineer (SRE)

TuningForce is a per-batch project designed to collect data, diagnose bottlenecks, and recommend optimizations to increase overall application performance. Benchmark metrics are gathered for each key scenario in Project 2 at the maximum load, the target load, and a 'peak hours' spike. With this data, every key scenario is analyzed to discover why the applications perform as they do and how they should be written or configured to improve performance. One project stands out as needing the most improvement from a performance perspective. Through deliberation, the worst-performing application is selected and recreated with the recommended optimizations implemented. The reimplemented project then undergoes the same set of initial performance tests, and a deep investigation is conducted to determine the effectiveness of the approach and the nature of the optimizations.

•Developed simulations of use cases to be tested to collect performance data using a workload model.

•Constructed test scripts using recorders in JMeter and LoadRunner.

•Identified tests to be performed such as load tests, spike tests, and endurance tests.

•Ran performance tests using various performance testing tools including JMeter, Dynatrace, and LoadRunner to monitor memory and CPU usage.

•Created scripts to perform tests and retrieve performance data on database interactions.

•Introduced indexes to a database to allow for faster, more efficient retrieval of data.

•Collected data on throughput, response time, latency, and transactions to evaluate performance.

•Employed Lighthouse to pinpoint client-side bottlenecks.

•Constructed a Continuous Integration pipeline on an EC2 instance using Jenkins, Maven, and Git.

Revature

Virginia

(02/2019 – 02/2019)

Performance Engineer (SRE)

The Expense Reimbursement System (ERS) will manage the process of reimbursing employees for expenses incurred while on company time. All employees in the company can login and submit requests for reimbursement and view their past tickets and pending requests. Finance managers can log in and view all reimbursement requests and past history for all employees in the company. Finance managers are authorized to approve and deny requests for expense reimbursement.

•Designed an Employee Reimbursement System with HTML to allow employees to submit and review reimbursements.

•Utilized JavaScript to allow for better interactions on the webpage.

•Managed the styling process with CSS and Bootstrap elements.

•Directed flow of information with the use of Java servlets.

•Implemented JDBC for connections to the database.

•Acquired and stored data using callable and prepared statements, functions, and procedures in SQL.

•Separated data from users using the Data Access Object design pattern.

•Communicated asynchronously between the front end and back end using AJAX calls.

•Stored data securely using Oracle database.

•Managed dependencies with Maven.
