Suman Madhavaram
Mobile: 762-***-**** Email: ************@*****.***
LinkedIn: https://www.linkedin.com/in/suman-madhavaram-b77674148/

Summary
Big Data Engineer with strong expertise in Hadoop ecosystems, distributed data processing platforms, and Linux systems. Adept at supporting mission-critical, highly available data environments and collaborating with cross-functional teams to deliver complex Big Data solutions. Skilled in troubleshooting, programming, automation, and communicating technical concepts clearly to stakeholders at all levels.

Technical Skills
Platforms & Big Data: Hadoop (HDFS, Hive, HBase, Impala, Kafka, Spark, Sqoop, Oozie, Flume, Kudu, Hue, Ranger, ZooKeeper, NiFi, KMS), MinIO, Cloudera, Hortonworks, Ambari, AWS, Azure, Amazon EMR, open-source Hadoop
Cloud & Virtualization: AWS (EMR, Glue, Athena, Lake Formation, MSK, DataSync, Lambda, EKS), Docker, Kubernetes, VMware
Automation & Monitoring: Terraform, Ansible, Chef, Grafana, Azure Pipelines, ServiceNow, JIRA, Java, Shell/Bash, Python, ELK Stack
Security: Kerberos, Ranger, Sentry, SSL/TLS, LDAP, Ranger KMS
CI/CD & Orchestration: Airflow, Azure DevOps Pipelines
Databases: MySQL, IBM DB2
OS & Infra: RedHat Linux (7/8/9), SAN/NAS, RAID, LVM, DNS, NTP, DHCP, FTP, Kerberos KDC
Version Control: Git, GitLab
Professional Experience
Barclays (Contract) – Platform Engineer
Location: Whippany, NJ Duration: Jan 2026 – Present
- Reduced Mean Time to Market (MTTM) for new features by 75% by building a self-service Internal Developer Platform (IDP) on top of Kubernetes.
- Managed 10+ production-grade clusters across AWS (EKS), Azure (AKS), and on-prem (kubeadm), ensuring 99.99% availability for 500+ microservices.
- Implemented multi-master (Control Plane) redundancy and cross-zone node groups to eliminate single points of failure.
- Strengthened cluster security by implementing strict Role-Based Access Control (RBAC) and integrating AWS IAM roles for service accounts (IRSA).
- Built clusters and installed Hadoop services on the open-source distribution.
- Performed Hadoop cluster capacity planning and built new clusters for Kafka and HDP.
- Automated the open-source Hadoop platform using Ansible playbooks.
- Set up CI/CD pipelines for Airflow DAG code updates.
- Initiated Databricks adoption at Barclays.
- Secured clusters with HA configuration, Kerberos, Ranger, Ranger KMS, and SSL.
- Managed performance tuning, cluster resource management, job schedulers, and resource pool allocations; troubleshot and analyzed job failures.
- Worked with the Linux team on OS patching, package installation, and upgrades.
- Deployed Amazon EMR long-running and transient clusters using Terraform scripts.
Amazon Web Services – Technical Account Manager
Location: Cupertino, CA Duration: April 2025 – Jan 2026
- Built and managed multiple AWS EMR & EKS clusters using Terraform.
- Built pipelines using DataSync and Airflow to copy data and metadata from on-prem to the cloud.
- Created data pipelines using Glue and Lambda.
- Orchestrated the migration of legacy microservices to Kubernetes.
- Deployed and managed 10+ Kubernetes EKS clusters across hybrid-cloud environments.
Morgan Stanley (Contract) – Platform Engineer
Location: Alpharetta, GA Duration: Oct 2024 – Mar 2025
- Maintained, monitored, and troubleshot Zookeeper, HDFS, YARN, Spark, Impala, Kafka clusters, ensuring 99.9% uptime.
- Built & managed AWS EMR/EKS clusters with Terraform, supporting both transient and long-running workloads.
- Optimized application performance by fine-tuning Resource Requests & Limits, significantly reducing OOMKilled errors and CPU throttling.
- Designed Airflow-based pipelines for orchestration, replacing legacy schedulers.
- Automated incident detection and recovery workflows, reducing MTTR by 55%.
- Centralized cluster-wide logging with the ELK Stack (Elasticsearch, Logstash, Kibana) and Fluentd for real-time debugging.
- Partnered with developers to enhance observability and job reliability (Spark, Hive, Kafka).
- Managed data replication across DCs, ensuring disaster recovery readiness.
- Integrated Kerberos, LDAP, and TLS for secure cluster operations.
- Built automated infrastructure management scripts using Python and Ansible.
- Implemented distributed data processing workflows on Hadoop/Spark.
Infosys – Lead Consultant (Hadoop SRE)
Location: Pune, India Duration: Sept 2023 – Jul 2024
- Managed 50+ node clusters, ensuring performance and stability.
- Designed capacity planning strategies for Kafka & HDP clusters.
- Automated Hadoop provisioning with Ansible playbooks & Terraform.
- Implemented Airflow-based orchestration & CI/CD pipelines for DAGs.
- Tuned cluster performance and implemented secure HA environments with Kerberos & Ranger.
- Collaborated with the Linux team on OS patching and infra upgrades.
Danske Bank – Chief Platform Engineer
Location: Bangalore, India Duration: Aug 2021 – Aug 2023
- Led cluster maintenance, hardware configuration, and node lifecycle management.
- Migrated & upgraded Cloudera clusters using Cloudera Manager tools.
- Designed real-time data pipelines with Spark, PySpark, and Kafka.
- Strengthened resilience and reliability with monitoring, scaling, and incident playbooks.
- Enhanced Airflow orchestration for ETL pipelines, improving reliability and reducing failures.
Barclays Bank – Hadoop Data Warehouse Engineer
Location: Pune, India Duration: Apr 2018 – Jul 2021
- Installed, secured, and tuned Cloudera clusters, ensuring HA.
- Automated upgrades with Chef, reducing downtime and manual errors.
- Implemented Kerberized Kafka for data security.
- Developed internal SRE tooling with Django/Python for cluster management.
- Led BDR (Backup & Disaster Recovery) automation for Hadoop.
- Mentored a team of 4, enforcing SRE best practices (change management, incident response, SLAs).
Deutsche Bank – Software Associate
Location: Pune, India Duration: Oct 2016 – Mar 2018
- Executed cluster upgrades & patching with minimal downtime.
- Configured Oozie workflows, Spark Streaming, Kafka pipelines.
- Ensured data resilience with BDR, snapshots, and replication strategies.
- Applied SRE automation to job recovery, reducing failure impact.
Barclays Bank – IT Analyst
Location: Pune, India Duration: May 2008 – Sep 2016
- Delivered 40+ projects, installing, upgrading, and securing clusters.
- Automated infra setup with Chef & Shell scripting.
- Developed Environment Portal (Perl, HTML, JavaScript, CSS, CGI) for automation & reporting.
- Ensured BCM disaster recovery readiness as BCM representative.
Artech Info Systems – Unix Administrator
Location: Pune, India Duration: Aug 2007 – Apr 2008
- Managed Linux/Unix infra reliability, automation, backups, and monitoring.
- Delivered end-to-end infra support, ensuring system availability & security.

Education
Master of Computer Applications – Anna University, Chennai (2006)
Bachelor of Computer Applications – SV University, Tirupati (2003)