VAKUL MUVVALA
Dorchester, MA 02125 +1-857-***-**** ****************@*****.*** LinkedIn
Professional Summary
Results-driven Data Engineer with 5+ years of experience delivering enterprise-grade data platforms across finance, insurance, and healthcare domains. Proven expertise in building scalable ETL/ELT pipelines, real-time streaming solutions, and modern data lakehouse architectures using Python, PySpark, SQL, and Apache Spark. Skilled at modernizing legacy systems into cloud-native platforms (AWS, Azure, Databricks, Snowflake, Redshift, Synapse) to enable high-performance analytics and AI adoption. Experienced in data governance, compliance (HIPAA, SOX, PCI-DSS, GDPR), and optimizing pipelines for reliability, scalability, and cost efficiency. Recognized for reducing ETL runtimes by 40%+, enabling fraud detection in under 5 seconds, and cutting onboarding times by 50% through metadata-driven frameworks. Strong communicator with cross-functional teams, enabling real-time insights, regulatory readiness, and business transformation.
Skills
• Programming Languages: Python, SQL, Scala (basic), Shell Scripting
• Big Data & Processing: Apache Spark, PySpark, Delta Lake, Spark Structured Streaming, Hadoop
• Data Integration & Orchestration: Apache Airflow, AWS Glue, Azure Data Factory, Oozie, Informatica (basic)
• Cloud Platforms: AWS (S3, Redshift, Glue, EMR, KMS, Athena), Azure (ADLS, Synapse, Key Vault, ADF)
• Data Warehouses: Snowflake, Redshift, Azure Synapse Analytics, SQL Server, PostgreSQL, Oracle
• Streaming & Messaging: Apache Kafka, Kafka Streams, Flume, Kinesis
• Data Catalog/Lineage: AWS Glue Catalog, Apache Atlas, Azure Purview
• Workflow Orchestration: Apache Airflow, Oozie
• DevOps & CI/CD: Git, GitHub Actions, Bitbucket, Jenkins, Azure DevOps
• Monitoring & Logging: CloudWatch, Datadog, Prometheus, Azure Monitor, Log Analytics
• Testing & Quality: Great Expectations, PyTest, UnitTest, Data Validation Scripts
• Visualization Tools: Power BI, Tableau (for data outputs only), QuickSight
• Security & Compliance: IAM, RBAC, KMS Encryption, HIPAA, PCI-DSS, SOX, GDPR
• File Formats: Parquet, ORC, Avro, JSON, CSV, XML
• Documentation & Tools: Confluence, JIRA, Postman, Swagger, MS Excel
Professional Experience
BNY Mellon – AWS Data Lakehouse Jul 2023 - Present
Data Engineer Remote, USA
• Partnered with Accenture to modernize enterprise data management and analytics platforms, migrating legacy systems into a cloud-native, AI-enabled architecture.
• Built ETL/ELT pipelines in Airflow, Glue, and ADF to ingest billions of records from trading, market feeds, and compliance sources into Snowflake and Redshift.
• Designed curated data models (Star/Snowflake schemas) to support wealth management and private markets analytics, reducing reporting latency by 60%.
• Implemented data governance with Apache Atlas and Purview, enabling full lineage, role-based access, and SOX/PCI-DSS compliance across 20+ domains.
• Optimized Delta Lake pipelines with schema evolution and rollback features, improving resilience and reducing manual fixes by 40%.
• Automated deployment with CI/CD (Jenkins, GitHub Actions), cutting release cycles from weeks to days.
• Developed real-time Kafka + Spark Structured Streaming pipelines to detect anomalies in financial transactions, enabling fraud detection in under 5 seconds and delivering ML-ready datasets for downstream analytics.
• Leveraged AWS Glue Crawlers and Athena to automate schema discovery and query access, reducing new data source onboarding time by 50% and improving overall engineering productivity.
Cognizant – Insurance Data Lake Modernization May 2021 - Jul 2022
Data Engineer Bangalore, India
• Led development of a cloud-based insurance data lake on AWS and Databricks, consolidating policy, claims, and customer behavior data from 15+ source systems.
• Designed and orchestrated PySpark ETL pipelines in Airflow, ingesting millions of daily records into curated Delta Lake tables with schema enforcement and time-travel for regulatory audits.
• Built real-time Kafka streaming pipelines with Spark Structured Streaming, enabling sub-second fraud detection and faster claims settlement.
• Partnered with actuarial teams to implement Star-schema models in Snowflake, powering analytics on premium forecasts, loss ratios, and claim rate trends.
• Developed metadata-driven ingestion frameworks with YAML-based configs, reducing new data source onboarding time by 40%.
• Implemented data quality validation (Great Expectations) and unit-tested PySpark logic, improving downstream analytics reliability by 35%.
• Integrated external REST APIs (e.g., weather, credit scores, accident reports) into customer profiles, enriching risk scoring and underwriting decisions.
InterMountain HealthCare – Azure Healthcare Data Lakehouse Modernization Jun 2019 - May 2021
Data Engineer India
• Designed and implemented a Delta Lakehouse architecture on Azure Data Lake Storage (ADLS), consolidating EHR, lab, billing, and pharmacy data from Cerner and Epic systems.
• Built PySpark pipelines to cleanse, transform, and anonymize PHI/PII using hashing, masking, and tokenization, ensuring compliance with HIPAA and enterprise security standards.
• Orchestrated Airflow DAGs to automate nightly batch jobs aggregating vitals, medications, and clinical notes, powering executive dashboards for 10+ clinical departments.
• Developed complex SQL transformations in Azure Synapse to derive KPIs like readmission rates, treatment costs, and length of stay (LOS), improving operational insights by 30%.
• Integrated HL7 and FHIR APIs to ingest lab and wearable device data into structured patient records, enabling <10s latency reporting in Power BI.
• Implemented Great Expectations validation across hundreds of data fields, reducing clinical data errors by 40%.
• Deployed pipelines through Azure DevOps CI/CD, linking bug fixes and enhancements directly to work items for traceability.
Education
University of Massachusetts, Boston
Master of Science, Information Technology
Coursework: Advanced Statistics with Data Science, Data Visualization, Big Data Analytics, Predictive Analytics, Prescriptive Analytics, Data Warehousing, Modeling for Business Analytics
Certifications
• Microsoft Certified: Azure Data Engineer Associate
• AWS Certified Data Engineer – Associate (DEA-C01)