Data Engineer Real-Time

Location:

Hyderabad, Telangana, India

Salary:

90000

Posted:

September 10, 2025


Resume:

VAKUL MUVVALA

Dorchester, MA 02125 +1-857-***-**** ****************@*****.*** LinkedIn

Professional Summary

Results-driven Data Engineer with 5+ years of experience delivering enterprise-grade data platforms across finance, insurance, and healthcare domains. Proven expertise in building scalable ETL/ELT pipelines, real-time streaming solutions, and modern data lakehouse architectures using Python, PySpark, SQL, and Apache Spark. Skilled at modernizing legacy systems into cloud-native platforms (AWS, Azure, Databricks, Snowflake, Redshift, Synapse) to enable high-performance analytics and AI adoption. Experienced in data governance, compliance (HIPAA, SOX, PCI-DSS, GDPR), and optimizing pipelines for reliability, scalability, and cost efficiency. Recognized for reducing ETL runtimes by 40%+, enabling fraud detection in under 5 seconds, and cutting onboarding times by 50% through metadata-driven frameworks. Strong communicator with cross-functional teams, enabling real-time insights, regulatory readiness, and business transformation.

Skills

• Programming Languages: Python, SQL, Scala (basic), Shell Scripting

• Big Data & Processing: Apache Spark, PySpark, Delta Lake, Spark Structured Streaming, Hadoop

• Data Integration & Orchestration: Apache Airflow, AWS Glue, Azure Data Factory, Oozie, Informatica (basic)

• Cloud Platforms: AWS (S3, Redshift, Glue, EMR, KMS, Athena), Azure (ADLS, Synapse, Key Vault, ADF)

• Data Warehouses: Snowflake, Redshift, Azure Synapse Analytics, SQL Server, PostgreSQL, Oracle

• Streaming & Messaging: Apache Kafka, Kafka Streams, Flume, Kinesis

• Data Catalog/Lineage: AWS Glue Catalog, Apache Atlas, Azure Purview

• Workflow Orchestration: Apache Airflow, Oozie

• DevOps & CI/CD: Git, GitHub Actions, Bitbucket, Jenkins, Azure DevOps

• Monitoring & Logging: CloudWatch, Datadog, Prometheus, Azure Monitor, Log Analytics

• Testing & Quality: Great Expectations, PyTest, UnitTest, Data Validation Scripts

• Visualization Tools: Power BI, Tableau (for data outputs only), QuickSight

• Security & Compliance: IAM, RBAC, KMS Encryption, HIPAA, PCI-DSS, SOX, GDPR

• File Formats: Parquet, ORC, Avro, JSON, CSV, XML

• Documentation & Tools: Confluence, JIRA, Postman, Swagger, MS Excel

Professional Experience

BNY Mellon – AWS Data Lakehouse Jul 2023 - Present

Data Engineer Remote, USA

Partnered with Accenture to modernize enterprise data management and analytics platforms, migrating legacy systems into a cloud-native, AI-enabled architecture.

Built ETL/ELT pipelines in Airflow, Glue, and ADF to ingest billions of records from trading, market feeds, and compliance sources into Snowflake and Redshift.

Designed curated data models (Star/Snowflake schemas) to support wealth management and private markets analytics, reducing reporting latency by 60%.

Implemented data governance with Apache Atlas and Purview, enabling full lineage, role-based access, and SOX/PCI-DSS compliance across 20+ domains.

Optimized Delta Lake pipelines with schema evolution and rollback features, improving resilience and reducing manual fixes by 40%.

Automated deployment with CI/CD (Jenkins, GitHub Actions), cutting release cycles from weeks to days.

Developed real-time Kafka + Spark Structured Streaming pipelines to detect anomalies in financial transactions, enabling fraud detection in under 5 seconds and delivering ML-ready datasets for downstream analytics.

Leveraged AWS Glue Crawlers and Athena to automate schema discovery and query access, reducing new data source onboarding time by 50% and improving overall engineering productivity.

Cognizant – Insurance Data Lake Modernization May 2021 - Jul 2022

Data Engineer Bangalore, India

Led development of a cloud-based insurance data lake on AWS and Databricks, consolidating policy, claims, and customer behavior data from 15+ source systems.

Designed and orchestrated PySpark ETL pipelines in Airflow, ingesting millions of daily records into curated Delta Lake tables with schema enforcement and time-travel for regulatory audits.

Built real-time Kafka streaming pipelines with Spark Structured Streaming, enabling sub-second fraud detection and faster claims settlement.

Partnered with actuarial teams to implement Star-schema models in Snowflake, powering analytics on premium forecasts, loss ratios, and claim rate trends.

Developed metadata-driven ingestion frameworks with YAML-based configs, reducing new data source onboarding time by 40%.

Implemented data quality validation (Great Expectations) and unit-tested PySpark logic, improving downstream analytics reliability by 35%.

Integrated external REST APIs (e.g., weather, credit scores, accident reports) into customer profiles, enriching risk scoring and underwriting decisions.

InterMountain HealthCare – Azure Healthcare Data Lakehouse Modernization Jun 2019 - May 2021

Data Engineer India

• Designed and implemented a Delta Lakehouse architecture on Azure Data Lake Storage (ADLS), consolidating EHR, lab, billing, and pharmacy data from Cerner and Epic systems.

• Built PySpark pipelines to cleanse, transform, and anonymize PHI/PII using hashing, masking, and tokenization, ensuring compliance with HIPAA and enterprise security standards.

• Orchestrated Airflow DAGs to automate nightly batch jobs aggregating vitals, medications, and clinical notes, powering executive dashboards for 10+ clinical departments.

• Developed complex SQL transformations in Azure Synapse to derive KPIs like readmission rates, treatment costs, and length of stay (LOS), improving operational insights by 30%.

• Integrated HL7 and FHIR APIs to ingest lab and wearable device data into structured patient records, enabling <10s latency reporting in Power BI.

• Implemented Great Expectations validation across hundreds of data fields, reducing clinical data errors by 40%.

• Deployed pipelines through Azure DevOps CI/CD, linking bug fixes and enhancements directly to work items for traceability.

Education

University of Massachusetts, Boston

Master of Science, Information Technology

Coursework: Advanced Statistics with Data Science, Data Visualization, Big Data Analytics, Predictive Analytics, Prescriptive Analytics, Data Warehousing, Modelling for Business Analytics

Certifications

• Microsoft Certified – Azure Data Engineer Associate

• AWS Certified Data Engineer – Associate (DEA-C01)
