Post Job Free
Sign in

Hadoop Production Support Engineer (4+ yrs)

Location:
Arlington, TX
Posted:
March 01, 2026

Contact this candidate

Resume:

LAXMI NANDITHA BHASHINI POTHULA

Arlington, TX +1-312-***-**** ***********@*****.*** https://www.linkedin.com/in/nandithaa/ PROFESSIONAL SUMMARY

Hadoop Application Production Support Engineer with 4 years of experience supporting large-scale Big Data platforms in production environments. Strong hands-on expertise in Linux administration, Cloudera Hadoop ecosystem operations, AutoSys scheduling, and incident management including root cause analysis and remediation. Experienced improving monitoring and alerting, automating recurring operational tasks using Bash and Python, and partnering with application and infrastructure teams to support deployments, configuration changes, and 24x7 on-call coverage. Proven track record supporting enterprise risk controls, audit readiness, and knowledge documentation.

TECHNICAL SKILLS

• Hadoop and Big Data: Cloudera CDH, HDFS, YARN, Hive, Impala, Spark, MapReduce, HBase, Zookeeper, Kafka, Kerberos, HA DR

• Linux and Systems: Linux/Unix(RHEL,CentOS), Windows Server, patching, upgrades, performance tuning, disk mounting, OS rebuild support

• Scheduling and Automation: AutoSys, PUMA, Bash, Python, PowerShell, job monitoring, reruns, replication workflows, batch scheduling

• Monitoring and Observability: Dynatrace, ITRS, Splunk, Grafana, AppDynamics, alert triage, dashboard monitoring

• Incident and Change: Incident management, RCA, MIM calls, SLA adherence, change validation, runbooks, SOPs

• Networking: TCP IP, DNS, VLAN, VPN, connectivity troubleshooting

• Data and SQL: SQL, MySQL, Hue, Anaconda, validation queries

• Collaboration: Cross-functional support, knowledge transfer, documentation, stakeholder updates PROFESSIONAL EXPERIENCE

Kersia USA

IT Support Engineer Nov 2025 - Present

• Administer Azure, Microsoft 365, and enterprise systems supporting identity and endpoint compliance requirements

• Troubleshoot Linux and Windows issues including authentication, connectivity, and performance incidents

• Engineered secure SFTP integrations in Oracle Cloud Infrastructure using SSH key authentication and encryption standards for third-party file exchange

• Coordinated infrastructure changes and documented operational procedures for support readiness

• Managed Veeam backup and replication processes and supported recovery validation

• Installed switches and configured VLAN setup for new acquisition network onboarding

• Migrated on-prem file shares to SharePoint and OneDrive with post-migration validation, access testing, and data integrity checks Basey Insurance

Systems and Analytics Engineer Jun 2025 - Nov 2025

• Supported Hortonworks (HDP) Hadoop clusters in production by monitoring service health using Dynatrace dashboards and alerts, triaging incidents, and coordinating escalations

• Executed patching and maintenance activities across Linux nodes and Hadoop services with structured validation testing and post-change verification

• Developed SQL and Hive-based validation queries to support downstream operational workflows and ensure data accuracy

• Automated recurring cluster monitoring and health-check tasks using PowerShell, Bash, and Python to reduce manual intervention

• Leveraged Dynatrace for JVM monitoring, application performance analysis, and root cause investigation of service slowdowns and job failures within Hadoop environments

• Supported release readiness activities including configuration updates, smoke testing, and production documentation updates Infosys Ltd

Systems Engineer (Client: Citi Bank) Nov 2021 - Aug 2023

• Supported enterprise Hadoop production platforms across 50+ clusters and 16K+ Linux servers, ensuring high availability (HA/DR) and continuous operations in a regulated banking environment.

• Provided 24x7 production support with on-call rotation, triaging alerts, restoring services, and driving incident resolution for P1/P2 outages impacting critical applications and downstream data processing.

• Monitored platform health using Dynatrace and enterprise observability tools, tracking service metrics, node performance, JVM behavior, and resource bottlenecks; initiated remediation for anomalies before business impact.

• Managed and monitored Hadoop ecosystem services including HDFS, YARN, Hive, Impala, Spark, MapReduce, HBase, ZooKeeper, Kafka; performed service restarts, config tuning, and post-change validation.

• Performed Linux administration on production nodes including log analysis, disk and mount troubleshooting, CPU and memory diagnostics, network connectivity checks, patching support, and coordination of OS rebuilds with data center teams.

• Led structured incident management and root cause analysis (RCA) by correlating Dynatrace traces, logs, cluster metrics, and application patterns; documented findings and preventive actions to reduce repeat incidents.

• Diagnosed and resolved common production issues including YARN resource contention, HDFS replication, NameNode/JournalNode health issues, Spark job failures, MapReduce errors, and Impala admission control and pool contention.

• Configured and optimized YARN queues, scheduler policies, and Impala pools, aligning resource allocation with workload priority to stabilize peak-hour processing and improve cluster utilization.

• Implemented and maintained HA configurations for NameNodes, ResourceManagers, and HBase Masters; managed metadata backups, snapshots, and recovery runbooks to support audit and resiliency requirements.

• Executed weekend maintenance and upgrade activities including CDH upgrades (6.x.x to 7.1.8), JDK upgrades, Cloudera Manager upgrades, node patching, and cluster tuning with rollback planning and validation testing.

• Administered and supported AutoSys scheduling, troubleshooting and repairing failed jobs, managing reruns, dependencies, and replication workflows for batch pipelines and downstream feeds in production.

• Automated recurring operational tasks using Bash and Cron scripts, including log parsing, service health checks, disk utilization monitoring, job status reporting, and standardized validation checks.

• Supported application deployments and configuration changes by partnering with development teams, reviewing runbooks, validating configs, coordinating release windows, and performing smoke tests after deployments.

• Ensured compliance with enterprise standards by following controlled change processes, maintaining accurate documentation, and supporting risk controls, audit evidence, and operational readiness.

• Created and maintained detailed SOPs, troubleshooting guides, and knowledge base articles for recurring incidents, improving team consistency and reducing escalation dependency.

• Delivered knowledge transfer (KT) sessions for new team members on Hadoop operations, incident troubleshooting, AutoSys scheduling, and Dynatrace-based monitoring practices. Junior Systems Engineer Aug 2021 - Oct 2021

• Assisted with Linux system monitoring, log review, disk checks, and initial incident triage

• Supported patching and performed pre and post change validation checks

• Documented troubleshooting steps and supported ticket-based incident workflows

• Completed training in SQL, Python, DB2, Unix Linux, networking, SDLC, Agile, and ITIL processes EDUCATION

University of Texas at Arlington Aug 2023 - May 2025 Master of Science, Data Science

• GPA: 3.7

• Coursework: Cloud Computing, Database Management, Software Engineering, Data Mining



Contact this candidate