Manaswini Kakumanu
Senior Data & AI/ML Engineer – Data Engineering & Cloud Solutions | Pentaho • PySpark • AWS
+1-903-***-**** *************@*****.***
PROFESSIONAL SUMMARY
• Senior Data & AI/ML Engineer with 9+ years of progressive experience in data engineering, ETL/ELT pipelines, and AI-driven analytics across consulting, healthcare, government, and financial domains.
• Proven expertise in ETL/ELT pipeline design using Pentaho Data Integration (PDI), PySpark, and Databricks, processing structured/unstructured datasets at enterprise scale.
• Strong command of AWS data services (S3, Redshift, EMR, EC2, Lambda, Glue) for cloud-native data storage, computation, and orchestration.
• Hands-on development of churn prediction, personalization, and network performance analysis use cases, leveraging data engineering pipelines for customer experience insights.
• Designed and optimized Pentaho workflows for ingestion, cleansing, and transformation, cutting processing time for large-scale telecom and healthcare datasets by 40%.
• Expert-level programming in Python and SQL, building reusable libraries for data wrangling, data quality checks, and workflow automation (a minimal sketch follows this summary).
• Delivered real-time streaming data solutions with PySpark, Kafka, and AWS Kinesis for customer usage and billing events, enabling proactive insights.
• Collaborated with business stakeholders, marketing, and network engineering teams to deliver customer experience analytics, reducing churn rates through predictive modeling pipelines.
• Skilled in Tableau and Power BI, creating executive dashboards on churn, billing clarity, and customer experience KPIs, widely adopted by leadership teams.
• Implemented data quality frameworks in Pentaho and Databricks, ensuring completeness, accuracy, and lineage tracking across data pipelines.
• Established data governance practices aligned with industry standards (GDPR, HIPAA), ensuring secure handling of sensitive customer and billing data.
• Strong exposure to telecom customer experience use cases, including billing inquiry analytics, root cause analysis for network issues, and personalization of support interactions.
• Built scalable data lakes and marts integrating AWS S3 + Redshift + Pentaho ETL, reducing query latency for customer analytics by 35%.
• Mentored junior engineers on Pentaho, PySpark, and AWS best practices, fostering a culture of data-driven problem solving and technical excellence.
• Advocated Agile development methodologies, improving velocity of ETL enhancements and enabling continuous delivery of customer insights.
• Recognized for bridging technical and business needs by translating raw data into actionable insights that directly improved customer engagement and reduced churn.
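Illustrative sketch of the kind of reusable data-quality helper referenced above (a minimal example assuming an active PySpark session; the function and column handling are hypothetical, not production code):

from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def completeness_report(df: DataFrame, required_cols):
    """Return the null fraction for each required column in a single pass."""
    total = df.count()
    # One aggregation pass: count nulls per column instead of scanning per column.
    row = df.agg(*[F.sum(F.col(c).isNull().cast("int")).alias(c)
                   for c in required_cols]).collect()[0]
    return {c: (row[c] or 0) / total if total else 0.0 for c in required_cols}

Helpers like this are typically packaged in a shared library and invoked as a validation step inside each pipeline.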
TECHNICAL SKILLS
Programming & Scripting:
Python, SQL, PL/SQL, PySpark, Java, Bash
ETL / Data Integration Tools:
Pentaho Data Integration (PDI/Kettle), Apache Spark, Databricks, Apache Kafka, AWS Glue, Azure Data Factory
Cloud Platforms & Services:
AWS (S3, Redshift, EMR, EC2, Lambda, API Gateway, DynamoDB, SNS/SQS, CloudWatch), Azure Data Lake, GCP Vertex AI (exposure)
Databases & Storage:
PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, Snowflake, Redis
Data Engineering & Analytics:
ETL/ELT Pipeline Design, Data Warehousing, Data Lakes, Data Quality & Governance, Workflow Automation, Churn Analysis, Customer Experience Analytics
Big Data & Streaming:
PySpark, Databricks, Apache Spark, AWS Kinesis, Kafka, Hadoop Ecosystem
Visualization & Reporting:
Tableau, Power BI, AWS QuickSight, Looker, Kibana
DevOps & Infrastructure:
Terraform, Docker, Kubernetes, Jenkins, GitHub Actions, GitLab CI/CD
Security & Compliance:
IAM Policies, RBAC, Data Privacy & Compliance (GDPR, HIPAA, PCI-DSS), Risk Assessments
Software Development Methodologies:
Agile, Scrum, Continuous Integration & Deployment (CI/CD), Test-Driven Development (TDD)
WORK EXPERIENCE
Senior Data & AI/ML Engineer – Enterprise Data Engineering Leadership
Morgan Stanley – New York, NY Jan 2025 – Present
• Leading enterprise-scale data engineering initiatives across financial products, consolidating multiple legacy ETL platforms (including Pentaho) into cloud-native Databricks + PySpark pipelines.
• Architected hybrid ETL workflows combining Pentaho Data Integration with AWS-native services (S3, Redshift, Glue), ensuring seamless modernization without disrupting legacy processes.
• Built high-volume PySpark pipelines on Databricks to process trading, billing, and customer data, reducing analytics turnaround times by 40% (see the sketch at the end of this role).
• Partnering with business leaders to deliver customer experience analytics use cases including churn modeling, personalization, and service inquiry insights for wealth management clients.
• Established data quality and governance frameworks, embedding validation, lineage, and audit checkpoints into Pentaho and Databricks workflows.
• Created executive-level dashboards in Tableau and Power BI, providing insights on customer satisfaction, service usage, and billing performance for senior leadership.
• Oversaw multi-region AWS Redshift clusters and S3-based data lakes, ensuring high availability and compliance with PCI-DSS, GDPR, and SOX standards.
• Collaborated with data science teams to operationalize predictive analytics models for churn and fraud detection, integrating outputs into customer-facing workflows.
• Directed infrastructure automation efforts with Terraform and Kubernetes for scalable data engineering environments across development and production.
• Spearheaded Pentaho performance tuning initiatives, improving job execution reliability and reducing SLA violations by 20%.
• Provided technical mentorship to 10+ engineers on Databricks, Pentaho orchestration, and AWS cloud-native data engineering practices.
• Advocated for Agile and CI/CD best practices, implementing continuous testing, deployment automation, and faster release cycles for ETL enhancements.
• Partnered with cross-functional stakeholders (compliance, risk, product teams) to align data pipelines with enterprise governance and audit requirements.
• Regularly presented data engineering strategies to senior executives, aligning modernization roadmaps with Morgan Stanley’s digital transformation goals.
• Driving adoption of data-driven decision-making by delivering timely, high-quality datasets that empower analytics, reporting, and AI/ML initiatives.
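Illustrative sketch of a batch PySpark rollup of the kind described in this role (bucket paths, schema, and the daily grain are assumptions for illustration only):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("billing_daily_rollup").getOrCreate()

# Hypothetical raw zone; the real sources were trading, billing, and customer feeds.
raw = spark.read.parquet("s3://example-bucket/raw/billing/")

daily = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("account_id", "event_date")
       .agg(F.sum("amount").alias("daily_amount"),
            F.count("*").alias("event_count"))
)

# Hypothetical curated zone, partitioned for cheap date-scoped reads.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/billing_daily/")

Partitioning the curated output by date keeps downstream Redshift and Tableau queries scoped to small slices of the data.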
AI/ML Engineer – Advanced Data Pipelines & Cloud Analytics
ManTech, Herndon, Virginia, US Mar 2023 – Dec 2024
• Architected enterprise-scale ETL/ELT pipelines using PySpark and Databricks to process petabyte-scale datasets for government and healthcare customer experience platforms.
• Modernized legacy Pentaho pipelines into Databricks notebooks with reusable Python modules, reducing operational maintenance by 30%.
• Designed cloud-native data lakes on AWS (S3 + Redshift + EMR), supporting advanced analytics use cases such as churn prediction, billing clarity, and service personalization.
• Partnered with analysts and data scientists to deliver feature-ready datasets for churn models, personalization strategies, and network anomaly detection.
• Automated ETL job orchestration with AWS Step Functions and Airflow, ensuring reliable scheduling and dependency management across 500+ daily workflows (see the DAG sketch at the end of this role).
• Developed real-time streaming pipelines with Kinesis and Spark Streaming, enabling proactive monitoring of billing inquiries and call center interactions.
• Implemented data governance and lineage frameworks, embedding validation, anomaly detection, and audit logging into pipelines for regulatory compliance.
• Built executive dashboards in Tableau and Power BI, visualizing customer churn trends, billing pain points, and customer support KPIs for decision-makers.
• Partnered with network engineers to analyze network telemetry data, correlating outages and poor QoS with churn likelihood, driving root cause fixes.
• Enhanced data quality processes with Python-based validation libraries, improving trust in analytics and reducing rework by 25%.
• Implemented security and compliance standards (HIPAA, GDPR, PCI-DSS) across AWS-hosted pipelines, ensuring safe handling of sensitive customer and billing data.
• Mentored junior engineers on Databricks performance tuning, PySpark optimization, and Pentaho orchestration best practices.
• Contributed to MLOps enablement, creating data prep pipelines for predictive churn models and deploying inference services via AWS Lambda.
• Collaborated with product teams in an Agile/Scrum environment, delivering increments of customer analytics pipelines aligned to business roadmaps.
• Improved operational resilience by deploying multi-region data replication and automated recovery strategies for high-availability data systems.
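Illustrative Airflow DAG sketch of the orchestration pattern described above (DAG id, schedule, and the placeholder callables are hypothetical):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder for a real extraction step (e.g., landing files in S3).
    pass

def transform():
    # Placeholder for a Spark/Databricks job submission step.
    pass

with DAG(
    dag_id="example_customer_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task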
Data Engineer – Applied AI & Customer Analytics
The Data Sherpas, San Francisco, US Sep 2021 – Feb 2023
• Built and optimized Databricks + PySpark pipelines processing terabytes of customer interaction and billing data, supporting churn prediction and network performance analytics.
• Designed ETL/ELT workflows in Pentaho and PySpark, integrating customer usage data from CRM, billing, and network monitoring systems into AWS S3 and Redshift.
• Partnered with product and marketing teams to develop data pipelines for personalization, enabling tailored offers and proactive customer support recommendations.
• Implemented real-time streaming ingestion using AWS Kinesis and PySpark for customer inquiry events, reducing response latency by 40%.
• Integrated speech-to-text and sentiment datasets into customer analytics pipelines, enhancing call center insights and churn detection accuracy.
• Built feature engineering frameworks in Databricks, enabling data scientists to train churn models and predictive billing classifiers at scale (see the sketch at the end of this role).
• Developed Tableau and Power BI dashboards to visualize churn trends, billing inquiry hotspots, and network outage impacts on customer experience.
• Enhanced data quality controls by embedding validation checks (completeness, anomaly detection, lineage tracking) directly into Pentaho pipelines.
• Partnered with network engineers to ingest network telemetry data, correlating outages with customer satisfaction declines, driving faster root-cause resolution.
• Migrated historical ETL workloads to cloud-native orchestration on AWS EMR + Step Functions, improving scalability and reducing operational cost by 25%.
• Built SQL-based semantic layers in Redshift/Snowflake to serve business-ready datasets for reporting and analytics.
• Automated CI/CD pipelines for Pentaho jobs and PySpark notebooks, ensuring faster deployments and higher reliability across dev/test/prod.
• Delivered customer churn insights that directly influenced retention strategies, reducing churn in pilot regions by 8% through targeted offers.
• Mentored junior data engineers on Databricks, Pentaho PDI, and AWS cloud data engineering practices.
• Advocated for Agile and data governance best practices, ensuring transparency, auditability, and alignment with business needs.
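Illustrative sketch of churn feature engineering of the kind described in this role (input path, schema, and feature names are assumptions, not the actual design):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical interaction log; the real inputs were CRM, billing, and telemetry feeds.
interactions = spark.read.parquet("s3://example-bucket/interactions/")

features = (
    interactions.groupBy("customer_id")
    .agg(F.count("*").alias("contact_count"),
         F.avg("handle_time_sec").alias("avg_handle_time"),
         F.max("billing_dispute_flag").alias("any_billing_dispute"))
)

features.write.mode("overwrite").parquet("s3://example-bucket/features/churn/")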
Senior Software Engineer – Big Data & Cloud ETL
Persistent Systems, Pune (Client: Silver Spring, USA) Jan 2019 – Aug 2021
• Designed and optimized Pentaho Data Integration (PDI) pipelines to process millions of daily utility and customer billing records, improving data throughput by 35%.
• Migrated ETL workloads to AWS (S3, Redshift, EMR), leveraging PySpark for distributed transformations and reducing processing time for billing reconciliation jobs.
• Built Databricks-based data engineering frameworks, enabling advanced customer usage analytics and churn detection workflows.
• Engineered data quality validation layers in Pentaho and PySpark, embedding reconciliation, null checks, and audit logging into all pipelines.
• Partnered with business stakeholders to deliver customer experience use cases such as churn prediction, billing inquiry analytics, and personalized customer outreach.
• Implemented real-time streaming ingestion with Kafka + Spark Streaming for smart meter data, reducing reporting latency from 2 hours to under 15 minutes (see the streaming sketch at the end of this role).
• Developed SQL/Redshift models to enable regulatory reporting and usage trend analysis, improving compliance accuracy by 20%.
• Collaborated with data scientists to provision feature-ready datasets for churn models and customer segmentation initiatives.
• Automated ETL deployments using Jenkins CI/CD and AWS CloudFormation, ensuring consistent environments across dev, QA, and production.
• Integrated customer and billing pipelines with Tableau dashboards, providing leadership with real-time insights into billing disputes and customer churn risks.
• Defined IAM roles and S3 bucket policies to secure customer usage data and align with US energy data compliance standards.
• Acted as a bridge between network engineering and data teams, analyzing network performance data and highlighting areas impacting customer satisfaction.
• Conducted root cause analysis on failed ETL jobs, reducing recurring failures by 40% through improved error handling.
• Mentored junior developers on Pentaho transformations, PySpark performance tuning, and AWS best practices.
• Actively contributed to Agile sprint ceremonies, delivering increments of ETL enhancements in close collaboration with client stakeholders.
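Illustrative sketch of the Kafka-to-Spark Structured Streaming pattern described above (broker, topic, schema, and sink paths are hypothetical; assumes the spark-sql-kafka package is on the classpath):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("meter_stream").getOrCreate()

schema = StructType([
    StructField("meter_id", StringType()),
    StructField("reading_kwh", DoubleType()),
    StructField("reading_ts", TimestampType()),
])

readings = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "meter-readings")             # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

query = (
    readings.writeStream.format("parquet")
    .option("path", "s3://example-bucket/stream/meter/")           # hypothetical sink
    .option("checkpointLocation", "s3://example-bucket/chk/meter/")
    .start()
)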
Software Developer – ETL & Databases
Cognizant, Bengaluru, India Aug 2016 – Dec 2018
• Designed and implemented ETL workflows for healthcare and financial clients, integrating multiple source systems into centralized Oracle and SQL Server warehouses.
• Developed stored procedures, triggers, and SQL optimization scripts, reducing query execution times by up to 30% for reporting workloads.
• Built Python- and Java-based automation scripts to validate and reconcile ETL outputs, ensuring accuracy across large-volume data transfers (see the reconciliation sketch at the end of this role).
• Collaborated with data architects to model OLTP and OLAP schemas, enabling downstream analytics for business intelligence teams.
• Supported early adoption of Pentaho/Kettle-based ETL jobs, gaining experience in data extraction, transformation, and scheduling.
• Enhanced data quality validation frameworks, implementing checks for completeness, consistency, and referential integrity across ETL pipelines.
• Partnered with QA teams to build unit and regression test harnesses for ETL processes, reducing production defects.
• Developed REST API integrations for feeding processed data into client reporting applications.
• Contributed to data governance initiatives, documenting lineage, workflows, and transformation logic for regulatory compliance.
• Participated in Agile/Scrum ceremonies, assisting with sprint planning and backlog refinement for ETL enhancement projects.
• Delivered ad hoc reporting solutions in SQL and Power BI for internal stakeholders.
• Provided production support (L2/L3) for ETL jobs, ensuring SLA adherence and minimizing downtime.
• Collaborated with senior engineers to plan incremental migration of workloads to cloud storage (AWS S3) for archival and reporting use cases.
• Actively engaged in knowledge-sharing sessions, mentoring peers on SQL optimization and ETL best practices.
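Illustrative sketch of the ETL reconciliation checks described above (sqlite3 stands in for the Oracle/SQL Server connections actually used; table names are hypothetical):

import sqlite3  # stand-in for the Oracle/SQL Server connections actually used

def row_count(conn, table):
    # Assumes the table name comes from trusted config, not user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def reconcile(src_conn, tgt_conn, table):
    """Flag a load as suspect when source and target row counts diverge."""
    src, tgt = row_count(src_conn, table), row_count(tgt_conn, table)
    if src != tgt:
        print(f"MISMATCH {table}: source={src} target={tgt}")
    return src == tgt

Checks like this were typically extended with column-level checksums before sign-off on large transfers.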
EDUCATION
• Shri Vishnu Engineering College for Women
BTech, Computer Science and Engineering
Andhra Pradesh, India