Application Production Support Engineer Hadoop Expert

Location:

Plano, TX

Posted:

April 04, 2026

Contact this candidate

Resume:

Roshini Chalarapu

Plano, TX +1-928-***-**** **********@*****.***

Professional Summary

Results-driven Application Production Support Engineer with 3+ years of enterprise experience monitoring and maintaining large-scale distributed systems leveraging Hadoop, Linux, Autosys, Spectrum Grid, ITRS, and Dynatrace. Proven expertise in incident management, root cause analysis, and 24x7 production support operations – maintaining 95% SLA compliance across 2,000+ monthly tickets supporting 8,000+ global users. Strong track record of automating recurring operational tasks, improving monitoring and alerting frameworks, collaborating with development teams on application deployments, and delivering thorough documentation aligned with enterprise risk controls and audit requirements. Technical Skills

Big Data & Distributed Systems: Hadoop (HDFS, YARN, MapReduce), Hive, Spark, Distributed Cluster Monitoring, Data Pipeline Support

Job Scheduling & Automation: Autosys, Spectrum Grid, Automated Job Monitoring, Recurring Task Automation, Workflow Optimization, PowerShell, Python

Monitoring & Observability: Dynatrace, ITRS (Geneos), Application Performance Monitoring (APM), Alerting Framework Design, Log Analysis, Real-Time Incident Detection Operating Systems: Linux (RHEL/CentOS – shell scripting, cron jobs, process management, log tailing), Windows 10/11

Incident Management: Root Cause Analysis (RCA), P1/P2 Incident Response, ITIL Change/Incident/Problem Management, 24x7 On-Call Support, SLA Compliance

Application Support: Production Deployment Support, Configuration Management, Application Troubleshooting, Post-Deployment Validation

Compliance & Documentation: Enterprise Risk Controls, Audit Requirements, SOP Authoring, KB Article Development, Runbook Creation

ITSM & Collaboration: ServiceNow, BMC Remedy, Jira, Azure AD / Entra ID, Active Directory, Microsoft 365 Work Experience

Elevance Health Application Production Support Engineer May 2024 – Present

– Monitored and maintained enterprise production systems leveraging Hadoop, Linux, Autosys, Dynatrace, and ITRS

– performed real-time health checks, job scheduling oversight, and distributed system performance monitoring to ensure 99.9% application availability across a 500+ user enterprise environment.

– Executed end-to-end incident management for production application failures – performed structured root cause analysis on Hadoop cluster issues, Autosys job failures, and Linux process anomalies; resolved 2,000+ monthly tickets maintaining 95% SLA compliance and 90% first-call resolution rate.

– Automated recurring operational tasks using PowerShell and Linux shell scripts – eliminated manual monitoring overhead, improved Autosys job alerting frameworks, and reduced per-incident response time by 40% through proactive automation of routine production support workflows.

– Collaborated with development and infrastructure teams to support application deployments and configuration changes – performed pre- and post-deployment validation, coordinated Spectrum Grid job scheduling updates, and ensured zero-impact production releases aligned with enterprise change management standards.

– Ensured strict compliance with enterprise risk controls, audit requirements, and regulatory standards – maintained detailed incident logs, authored runbooks and SOPs for Hadoop and Linux production environments, and delivered audit-ready documentation aligned with ITIL governance frameworks.

– Led critical production outage response impacting 500+ concurrent users – applied structured RCA across Hadoop, Dynatrace, and ITRS monitoring data; coordinated cross-functional vendor and stakeholder communication and restored full application operations within 45 minutes; delivered post-incident remediation documentation to prevent recurrence.

Tata Consultancy Services Application Production Support Analyst June 2021 – July 2022

– Provided 24x7 production support for distributed enterprise applications across Linux and Windows environments – monitored Hadoop cluster health, Autosys scheduled job execution, and application performance via Dynatrace and ITRS dashboards for 8,000+ global users; maintained strict SLA compliance across high-volume support queues.

– Performed root cause analysis on recurring production incidents including Hadoop job failures, Linux process crashes, Autosys scheduling conflicts, and application connectivity issues – utilized ServiceNow and Jira ticket analytics to identify systemic failure patterns and implement preventive monitoring controls.

– Developed Linux shell scripts and PowerShell automation to streamline recurring production support tasks – automated log parsing, job status alerting, and system health reporting; reduced manual operational effort and improved monitoring framework coverage across distributed application environments.

– Collaborated with application development teams to support deployment pipelines and configuration changes – validated Spectrum Grid job configurations, executed post-deployment smoke testing, and ensured application stability across production environments following release activities.

– Maintained comprehensive production support documentation including troubleshooting runbooks, incident post-mortems, and SOPs – ensured knowledge sharing across the support organization and drove 20% improvement in first-contact resolution for top-volume production incident categories. Projects

Production Monitoring & Alerting Framework Enhancement Dynatrace, ITRS, Autosys, Linux, ServiceNow

– Redesigned production monitoring and alerting framework using Dynatrace and ITRS – configured custom application performance dashboards, automated threshold-based alerting for Hadoop cluster metrics and Autosys job failures, and reduced mean time to detect (MTTD) production incidents by 35%.

– Developed Linux shell scripts and PowerShell automation to eliminate recurring manual monitoring tasks – integrated automated log tailing, job health checks, and incident ticket creation in ServiceNow; improved operational efficiency and reduced alert fatigue across the 24x7 production support team. Hadoop Production Support & Operational Runbook Program Hadoop, Linux, Autosys, Spectrum Grid, ITIL

– Established comprehensive Hadoop and Linux production support runbook library – documented HDFS, YARN, and MapReduce troubleshooting procedures, Autosys job failure resolution steps, and Spectrum Grid scheduling best practices; reduced onboarding time for new support engineers and improved incident resolution consistency.

– Led continuous improvement initiative to strengthen enterprise risk controls and audit compliance – implemented structured post-incident review process, authored audit-ready documentation aligned with enterprise governance standards, and drove 25% reduction in repeat production incidents across distributed application environments. Certifications

SC-Completed: 900: Microsoft CompTIA Security, A+ Compliance CompTIA & Security+ Identity Fundamentals Microsoft 365 Fundamentals (MS-900) ITIL 4 Foundation In Progress: AZ-104: Microsoft Azure Administrator CompTIA Network+ Cloudera Hadoop Administrator (CCA) Education

Northern Arizona University MS, Computer Science August 2022 – May 2024 JNTU Hyderabad BTech, Information Technology August 2018 – May 2022

Contact this candidate