Muhammad Saad
Observability Engineer - Dynatrace Professional
****************@*****.***
https://www.linkedin.com/in/muhammad-saad-68a70135a
About Me
Dedicated Observability Engineer with 3 years of experience specializing in Dynatrace, proficient in designing monitoring solutions that improve system performance and reliability. Expert in Dynatrace features such as Multi-Dimensional Analysis, Management Zones, custom metrics, alerting, and SLOs, applied to optimize application performance and ensure SLA compliance. Experienced in Dynatrace cluster management, administration, and performance optimization. Passionate about delivering data-driven, scalable solutions that support business goals.
Education
BS Software Engineering
University of the Punjab - FCIT
2019 – 2023
Professional Experience
Dynatrace Consultant
i2c Inc.
2023 – present
Key Responsibilities
Cluster Administration and Management
•Configured and managed Dynatrace Managed clusters, ensuring high availability, data integrity, and secure access across multiple environments
•Managed Dynatrace OneAgent deployment, configuration, and upgrades, troubleshooting connectivity and performance issues with backend diagnostics and the OneAgent Health Overview
•Managed and allocated Host Units, DDUs, AppSec Units (ASUs), and DEM units using the CMC and Dynatrace Account Management Portal, ensuring resource scaling aligned with the dynamic needs and performance requirements of applications
•Performed monthly Dynatrace Managed cluster upgrades, including version validation, impact analysis, and executing updates across all cluster nodes using the CMC
•Managed user access and authentication via LDAP and role-based access control (RBAC) to ensure secure and compliant cluster usage
•Monitored cluster health and resource utilization (CPU, memory, disk) using the CMC and Dynatrace Self-Monitoring features to maintain optimal performance of Dynatrace cluster components
•Configured Environment and Cluster ActiveGates for secure data routing between OneAgents, the cluster, and external integrations (e.g., Mission Control, DEM, AppSec modules)
Reporting & Analytics
•Configured Dynatrace monitoring for Apache web servers to track key metrics like request throughput, response times, error rates, and traffic patterns
•Set up monitoring and alerting on JVM metrics such as GC count, throughput, and heap usage to optimize memory usage and performance for Java-based applications
•Instrumented and monitored custom services in Dynatrace by configuring entry points and request attributes for code-level tracing and performance analysis
•Implemented VMware infrastructure monitoring by integrating vCenter with Dynatrace to track VM performance, ESXi host health, and resource bottlenecks
•Built custom dashboards tailored for both executive-level overviews and deep-dive operational monitoring to support data-driven decision-making
•Defined and maintained Service-Level Objectives (SLOs) with integrated alerting for proactive incident detection
•Configured alerting profiles tailored to scenarios such as service degradation, log-based errors, synthetic test failures, and security events
•Developed and applied auto-tagging rules and management zones to segment environments, enforce team-based access, and enable scoped reporting
•Utilized Dynatrace AppSec for identifying, prioritizing, and reporting code-level and third-party vulnerabilities to relevant stakeholders
•Implemented Dynatrace Log Management, collecting logs across services, enabling log processing rules, and correlating logs with distributed traces for root cause analysis
Internal Training & Dynatrace Platform Support
•Conducted in-depth APM module training for Service Delivery and Incident Management teams, empowering them to quickly identify root causes and resolve issues impacting mission-critical services
•Delivered hands-on Log Management module training to Product Operations teams, enabling effective issue tracking and faster correlation of client-reported problems at runtime
•Facilitated targeted AppSec module sessions for Engineering Security and SOC teams, focusing on the detection, mitigation, and remediation of third-party, code-level, and runtime vulnerabilities
•Collaborated with Dynatrace Support through the Dynatrace ONE portal, health-check calls, and troubleshooting sessions to resolve technical issues and system performance challenges promptly
Projects
Dynatrace Cluster OS Migration
•Led the end-to-end migration of Dynatrace Managed cluster nodes from RHEL 7 to RHEL 9, ensuring zero data loss and zero downtime
•Planned and executed a rolling upgrade and node replacement strategy to maintain high availability and ensure cluster stability throughout the OS migration
•Collaborated closely with internal stakeholders and Dynatrace support to mitigate migration risks and align with vendor-recommended best practices
•Documented key migration steps, troubleshooting scenarios, and lessons learned to enhance the internal knowledge base and support future upgrade efforts
Phase-wise Implementation of Dynatrace AppSec Modules
•In the first phase, enabled Runtime Vulnerability Analytics (RVA) for third-party vulnerability detection on selected hosts and process groups, prioritizing based on criticality and exposure. Upon successful validation, enabled RVA third-party detection globally to ensure broader coverage
•In the second phase, enabled RVA code-level vulnerability detection, starting with low-criticality applications and gradually advancing to business-critical services to ensure controlled adoption
•Enabled Runtime Application Protection (RAP) in both detection and blocking modes across the application infrastructure to block, in real time, attacks targeting critical code-level and third-party vulnerabilities
•Collaborated with Cloud Applications and Service Delivery teams to ensure zero downtime, no degradation in response time or throughput, and zero transaction declines throughout the rollout
Alert Noise Reduction
•Analyzed historical problem patterns and alert frequency across more than 100 monitored services to identify sources of alert fatigue
•Tuned Dynatrace anomaly detection settings and thresholds to minimize false positives without missing critical issues
•Configured problem suppression rules and maintenance windows for non-critical environments to reduce irrelevant alerts during deployments
•Reduced alert volume by more than 50% while maintaining SLA compliance and improving mean time to detect (MTTD) by 20%