
Senior Data Customer Experience

Location:
United States Naval Academy, MD, 21402
Posted:
August 25, 2025

Resume:

Manaswini Kakumanu

Senior Data & AI/ML Engineer | Data Engineering & Cloud Solutions | Pentaho • PySpark • AWS

+1-903-***-**** | *************@*****.***

PROFESSIONAL SUMMARY

• Senior Data & AI/ML Engineer with 9+ years of progressive experience in data engineering, ETL/ELT pipelines, and AI-driven analytics across consulting, healthcare, government, and financial domains.

• Proven expertise in ETL/ELT pipeline design using Pentaho Data Integration (PDI), PySpark, and Databricks, processing structured/unstructured datasets at enterprise scale.

• Strong command of AWS data services (S3, Redshift, EMR, EC2, Lambda, Glue) for cloud-native data storage, computation, and orchestration.

• Hands-on development of churn prediction, personalization, and network performance analysis use cases, leveraging data engineering pipelines for customer experience insights.

• Designed and optimized Pentaho workflows for ingestion, cleansing, and transformation, speeding up processing of large-scale telecom and healthcare datasets by 40%.

• Expert-level programming in Python and SQL, building reusable libraries for data wrangling, data quality checks, and workflow automation.

• Delivered real-time streaming data solutions with PySpark, Kafka, and AWS Kinesis for customer usage and billing events, enabling proactive insights.

• Collaborated with business stakeholders, marketing, and network engineering teams to deliver customer experience analytics, reducing churn rates through predictive modeling pipelines.

• Skilled in Tableau and Power BI, creating executive dashboards on churn, billing clarity, and customer experience KPIs, widely adopted by leadership teams.

• Implemented data quality frameworks in Pentaho and Databricks, ensuring completeness, accuracy, and lineage tracking across data pipelines.

• Established data governance practices aligned with industry standards (GDPR, HIPAA), ensuring secure handling of sensitive customer and billing data.

• Strong exposure to telecom customer experience use cases, including billing inquiry analytics, root cause analysis for network issues, and personalization of support interactions.

• Built scalable data lakes and marts integrating AWS S3 + Redshift + Pentaho ETL, reducing query latency for customer analytics by 35%.

• Mentored junior engineers on Pentaho, PySpark, and AWS best practices, fostering a culture of data-driven problem solving and technical excellence.

• Advocated Agile development methodologies, improving velocity of ETL enhancements and enabling continuous delivery of customer insights.

• Recognized for bridging technical and business needs by translating raw data into actionable insights that directly improved customer engagement and reduced churn.

TECHNICAL SKILLS

Programming & Scripting:

Python, SQL, PL/SQL, PySpark, Java, Bash

ETL / Data Integration Tools:

Pentaho Data Integration (PDI/Kettle), Apache Spark, Databricks, Apache Kafka, AWS Glue, Azure Data Factory

Cloud Platforms & Services:

AWS (S3, Redshift, EMR, EC2, Lambda, API Gateway, DynamoDB, SNS/SQS, CloudWatch), Azure Data Lake, GCP Vertex AI (exposure)

Databases & Storage:

PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, Snowflake, Redis

Data Engineering & Analytics:

ETL/ELT Pipeline Design, Data Warehousing, Data Lakes, Data Quality & Governance, Workflow Automation, Churn Analysis, Customer Experience Analytics

Big Data & Streaming:

PySpark, Databricks, Apache Spark, AWS Kinesis, Kafka, Hadoop Ecosystem

Visualization & Reporting:

Tableau, Power BI, AWS QuickSight, Looker, Kibana

DevOps & Infrastructure:

Terraform, Docker, Kubernetes, Jenkins, GitHub Actions, GitLab CI/CD

Security & Compliance:

IAM Policies, RBAC, Data Privacy & Compliance (GDPR, HIPAA, PCI-DSS), Risk Assessments

Software Development Methodologies:

Agile, Scrum, Continuous Integration & Deployment (CI/CD), Test-Driven Development (TDD)

WORK EXPERIENCE

Senior Data & AI/ML Engineer – Enterprise Data Engineering Leadership

Morgan Stanley – New York, NY Jan 2025 – Present

• Leading enterprise-scale data engineering initiatives across financial products, consolidating multiple legacy ETL platforms (including Pentaho) into cloud-native Databricks + PySpark pipelines.

• Architected hybrid ETL workflows combining Pentaho Data Integration with AWS-native services (S3, Redshift, Glue), ensuring seamless modernization without disrupting legacy processes.

• Built high-volume PySpark pipelines on Databricks to process trading, billing, and customer data, reducing analytics turnaround times by 40%.

• Partnering with business leaders to deliver customer experience analytics use cases including churn modeling, personalization, and service inquiry insights for wealth management clients.

• Established data quality and governance frameworks, embedding validation, lineage, and audit checkpoints into Pentaho and Databricks workflows.

• Created executive-level dashboards in Tableau and Power BI, providing insights on customer satisfaction, service usage, and billing performance for senior leadership.

• Oversaw multi-region AWS Redshift clusters and S3-based data lakes, ensuring high availability and compliance with PCI-DSS, GDPR, and SOX standards.

• Collaborated with data science teams to operationalize predictive analytics models for churn and fraud detection, integrating outputs into customer-facing workflows.

• Directed infrastructure automation efforts with Terraform and Kubernetes for scalable data engineering environments across development and production.

• Spearheaded Pentaho performance tuning initiatives, improving job execution reliability and reducing SLA violations by 20%.

• Provided technical mentorship to 10+ engineers on Databricks, Pentaho orchestration, and AWS cloud-native data engineering practices.

• Advocated for Agile and CI/CD best practices, implementing continuous testing, deployment automation, and faster release cycles for ETL enhancements.

• Partnered with cross-functional stakeholders (compliance, risk, product teams) to align data pipelines with enterprise governance and audit requirements.

• Regularly presented data engineering strategies to senior executives, aligning modernization roadmaps with Morgan Stanley’s digital transformation goals.

• Driving adoption of data-driven decision-making by delivering timely, high-quality datasets that empower analytics, reporting, and AI/ML initiatives.

AI/ML Engineer – Advanced Data Pipelines & Cloud Analytics

ManTech, Herndon, Virginia, US Mar 2023 – Dec 2024

• Architected enterprise-scale ETL/ELT pipelines using PySpark and Databricks to process petabyte-scale datasets for government and healthcare customer experience platforms.

• Modernized legacy Pentaho pipelines into Databricks notebooks with reusable Python modules, reducing operational maintenance by 30%.

• Designed cloud-native data lakes on AWS (S3 + Redshift + EMR), supporting advanced analytics use cases such as churn prediction, billing clarity, and service personalization.

• Partnered with analysts and data scientists to deliver feature-ready datasets for churn models, personalization strategies, and network anomaly detection.

• Automated ETL job orchestration with AWS Step Functions and Airflow, ensuring reliable scheduling and dependency management across 500+ daily workflows.

• Developed real-time streaming pipelines with Kinesis and Spark Streaming, enabling proactive monitoring of billing inquiries and call center interactions.

• Implemented data governance and lineage frameworks, embedding validation, anomaly detection, and audit logging into pipelines for regulatory compliance.

• Built executive dashboards in Tableau and Power BI, visualizing customer churn trends, billing pain points, and customer support KPIs for decision-makers.

• Partnered with network engineers to analyze network telemetry data, correlating outages and poor QoS with churn likelihood, driving root cause fixes.

• Enhanced data quality processes with Python-based validation libraries, improving trust in analytics and reducing rework by 25%.

• Implemented security and compliance standards (HIPAA, GDPR, PCI-DSS) across AWS-hosted pipelines, ensuring safe handling of sensitive customer and billing data.

• Mentored junior engineers on Databricks performance tuning, PySpark optimization, and Pentaho orchestration best practices.

• Contributed to MLOps enablement, creating data prep pipelines for predictive churn models and deploying inference services via AWS Lambda.

• Collaborated with product teams in an Agile/Scrum environment, delivering increments of customer analytics pipelines aligned to business roadmaps.

• Improved operational resilience by deploying multi-region data replication and automated recovery strategies for high-availability data systems.

Data Engineer – Applied AI & Customer Analytics

The Data Sherpas, San Francisco, US Sep 2021 – Feb 2023

• Built and optimized Databricks + PySpark pipelines processing terabytes of customer interaction and billing data, supporting churn prediction and network performance analytics.

• Designed ETL/ELT workflows in Pentaho and PySpark, integrating customer usage data from CRM, billing, and network monitoring systems into AWS S3 and Redshift.

• Partnered with product and marketing teams to develop data pipelines for personalization, enabling tailored offers and proactive customer support recommendations.

• Implemented real-time streaming ingestion using AWS Kinesis and PySpark for customer inquiry events, reducing response latency by 40%.

• Integrated speech-to-text and sentiment datasets into customer analytics pipelines, enhancing call center insights and churn detection accuracy.

• Built feature engineering frameworks in Databricks, enabling data scientists to train churn models and predictive billing classifiers at scale.

• Developed Tableau and Power BI dashboards to visualize churn trends, billing inquiry hotspots, and network outage impacts on customer experience.

• Enhanced data quality controls by embedding validation checks (completeness, anomaly detection, lineage tracking) directly into Pentaho pipelines.

• Partnered with network engineers to ingest network telemetry data, correlating outages with customer satisfaction declines, driving faster root-cause resolution.

• Migrated historical ETL workloads to cloud-native orchestration on AWS EMR + Step Functions, improving scalability and reducing operational cost by 25%.

• Built SQL-based semantic layers in Redshift/Snowflake to serve business-ready datasets for reporting and analytics.

• Automated CI/CD pipelines for Pentaho jobs and PySpark notebooks, ensuring faster deployments and higher reliability across dev/test/prod.

• Delivered customer churn insights that directly influenced retention strategies, reducing churn in pilot regions by 8% through targeted offers.

• Mentored junior data engineers on Databricks, Pentaho PDI, and AWS cloud data engineering practices.

• Advocated for Agile and data governance best practices, ensuring transparency, auditability, and alignment with business needs.

Senior Software Engineer – Big Data & Cloud ETL

Persistent Systems, Pune (Client: Silver Spring, USA) Jan 2019 – Aug 2021

• Designed and optimized Pentaho Data Integration (PDI) pipelines to process millions of daily utility and customer billing records, improving data throughput by 35%.

• Migrated ETL workloads to AWS (S3, Redshift, EMR), leveraging PySpark for distributed transformations and reducing processing time for billing reconciliation jobs.

• Built Databricks-based data engineering frameworks, enabling advanced customer usage analytics and churn detection workflows.

• Engineered data quality validation layers in Pentaho and PySpark, embedding reconciliation, null checks, and audit logging into all pipelines.

• Partnered with business stakeholders to deliver customer experience use cases such as churn prediction, billing inquiry analytics, and personalized customer outreach.

• Implemented real-time streaming ingestion with Kafka + Spark Streaming for smart meter data, reducing latency in reporting from 2 hours to under 15 minutes.

• Developed SQL/Redshift models to enable regulatory reporting and usage trend analysis, improving compliance accuracy by 20%.

• Collaborated with data scientists to provision feature-ready datasets for churn models and customer segmentation initiatives.

• Automated ETL deployments using Jenkins CI/CD and AWS CloudFormation, ensuring consistent environments across dev, QA, and production.

• Integrated customer and billing pipelines with Tableau dashboards, providing leadership with real-time insights into billing disputes and customer churn risks.

• Defined IAM roles and S3 bucket policies to secure customer usage data and align with US energy data compliance standards.

• Acted as a bridge between network engineering and data teams, analyzing network performance data and highlighting areas impacting customer satisfaction.

• Conducted root cause analysis on failed ETL jobs, reducing recurring failures by 40% through improved error handling.

• Mentored junior developers on Pentaho transformations, PySpark performance tuning, and AWS best practices.

• Actively contributed to Agile sprint ceremonies, delivering increments of ETL enhancements in close collaboration with client stakeholders.

Software Developer – ETL & Databases

Cognizant, Bengaluru, India Aug 2016 – Dec 2018

• Designed and implemented ETL workflows for healthcare and financial clients, integrating multiple source systems into centralized Oracle and SQL Server warehouses.

• Developed stored procedures, triggers, and SQL optimization scripts, reducing query execution times by up to 30% for reporting workloads.

• Built Python- and Java-based automation scripts to validate and reconcile ETL outputs, ensuring accuracy across large-volume data transfers.

• Collaborated with data architects to model OLTP and OLAP schemas, enabling downstream analytics for business intelligence teams.

• Supported early adoption of Pentaho/Kettle-based ETL jobs, gaining experience in data extraction, transformation, and scheduling.

• Enhanced data quality validation frameworks, implementing checks for completeness, consistency, and referential integrity across ETL pipelines.

• Partnered with QA teams to build unit and regression test harnesses for ETL processes, reducing production defects.

• Developed REST API integrations for feeding processed data into client reporting applications.

• Contributed to data governance initiatives, documenting lineage, workflows, and transformation logic for regulatory compliance.

• Participated in Agile/Scrum ceremonies, assisting with sprint planning and backlog refinement for ETL enhancement projects.

• Delivered ad hoc reporting solutions in SQL and Power BI for internal stakeholders.

• Provided production support (L2/L3) for ETL jobs, ensuring SLA adherence and minimizing downtime.

• Collaborated with senior engineers to plan incremental migration of workloads to cloud storage (AWS S3) for archival and reporting use cases.

• Actively engaged in knowledge-sharing sessions, mentoring peers on SQL optimization and ETL best practices.

EDUCATION

• Shri Vishnu Engineering College for Women

BTech, Computer Science and Engineering

Andhra Pradesh, India


