PRAVEENA MUDIYALA
Data Engineer with Power BI expertise in cloud and big data solutions
Personal Information
********.****@*****.***
www.linkedin.com/in/praveenam7
Plano, TX
Education
University of Central Missouri
Master's in Big Data Analytics
Warrensburg, USA, Aug 2022 - May 2024
Jawaharlal Nehru Technological University, Anantapur
Master of Business Administration
Mar 2017 - Jan 2019
Vikrama Simhapuri University
Bachelor of Commerce in Computer Applications
Apr 2014 - May 2017
Skills
Programming & Scripting Languages: Python • Scala • SQL • Unix/Shell Scripting • SAS
Big Data & Distributed Computing: Apache Hadoop • Apache Spark • HDFS • MapReduce • Apache Hive • Apache Sqoop • Apache Oozie • Apache Zookeeper • Apache Kafka • Apache NiFi • Apache Flume • Snowpark
Cloud Platforms & Services: Amazon Web Services (Glue, EMR, Redshift, Lambda, S3, RDS, CloudWatch) • Microsoft Azure (ADF, Synapse, Databricks, Logic Apps) • Talend • Microsoft OneLake
Machine Learning & Data Science: Large Language Models (LLMs) • NumPy • Pandas • Scikit-learn • Seaborn • Matplotlib
Data Engineering & Data Warehousing: Data Modeling • ETL Development • Data Pipeline Design • Data Migration • Snowflake • Teradata • Oracle • MySQL • PostgreSQL • MongoDB • Cassandra
Business Analytics & Visualization: Power BI • Tableau • Oracle OBIEE 12c (12.2.1.4.0) • SSIS • SSRS • Microsoft Excel (Pivot Tables, Charts, Dashboards) • PowerPoint (Executive Storytelling) • Business Analytics • Data Storytelling
Version Control & DevOps: Git • GitHub • Jenkins • Docker • Confluence • Terraform • Kubernetes
Workflow Orchestration & Methodologies: Apache Airflow • Agile/Scrum
Core Professional Skills: Analytical Skills • Communication • Stakeholder Collaboration • Cross-Functional Coordination • Executive Reporting • Leadership Presentations • Problem Solving
Certifications
Azure Data Engineer, Microsoft
Fabric Analytics Engineer, Microsoft
Certified Solutions Architect - Associate, AWS
Certified Data Engineer - Associate, AWS
SnowPro Advanced: Data Engineer, Snowflake
Databricks Certified Data Engineer, Databricks
Databricks Certified Machine Learning Professional, Databricks
Summary
Data Engineer with 8+ years of experience designing and building data pipelines and streaming frameworks across Azure and AWS ecosystems. Proficient in setting up Change Data Capture (CDC) for multiple databases to hydrate data lakes on S3 and Delta Lake, and in developing both batch and streaming ETL jobs in Apache Spark (PySpark, Java). Hands-on with Airflow (MWAA) for workflow orchestration, AWS EMR / EMR Serverless, Glue Data Catalog, and Lambda for serverless transformation tasks. Experienced in optimizing Spark jobs using partitioning, broadcast joins, and caching strategies for high-volume CDC data. Strong collaborator, working with cross-functional teams to build governed, scalable data solutions for analytics and machine learning use cases.
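A minimal PySpark sketch of the broadcast-join and caching pattern named in the summary; the bucket paths, table names, and join key are hypothetical, not from any actual project.

    # Cache the high-volume CDC frame and broadcast the small dimension table.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("cdc-join-optimization").getOrCreate()

    # High-volume CDC events (large) and a small dimension table (illustrative paths).
    cdc_events = spark.read.format("delta").load("s3://example-bucket/bronze/orders_cdc")
    dim_customers = spark.read.format("delta").load("s3://example-bucket/silver/dim_customers")

    # Cache the CDC frame once when several downstream joins reuse it.
    cdc_events.cache()

    # Broadcasting the small side avoids a full shuffle of the large CDC frame.
    enriched = cdc_events.join(broadcast(dim_customers), on="customer_id", how="left")
    enriched.write.format("delta").mode("append").save("s3://example-bucket/silver/orders_enriched")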
Work Experience
Data Engineer II, Databricks
Plano, TX, USA, May 2025 - Present
Implemented Change Data Capture (CDC) pipelines to incrementally hydrate Delta Lake (Bronze–Silver–Gold) layers using Spark Structured Streaming for multiple source databases (upsert pattern sketched below).
Orchestrated CDC workflows in Apache Airflow (MWAA), integrating AWS S3, Glue Data Catalog, and EMR Serverless for scalable ETL execution.
Built Python and Java Spark UDFs for schema validation, data type standardization, and anomaly detection within streaming pipelines.
Utilized S3 CRUD operations for raw CDC landing zones, snapshot archives, and Silver-layer curation, ensuring cost-efficient storage management.
Designed Spark DataFrame transformations and Spark SQL joins to normalize semi-structured JSON CDC events into analytical tables.
Tuned job performance using partition pruning, caching, and broadcast joins, reducing CDC load time by 45%.
Integrated Step Functions and Lambda for asynchronous validation and alerting, with Slack notifications for error handling and retry logic.
Collaborated with cross-functional teams to convert YAML-based detection rules into parameterized CDC transformations for real-time security insights.
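A hedged sketch of the Bronze-to-Silver CDC upsert described above, using Spark Structured Streaming with a Delta Lake MERGE in foreachBatch; the paths, the order_id key, and the op change-type column ('I'/'U'/'D') are illustrative assumptions.

    # Stream CDC events from a Bronze Delta path and MERGE them into Silver.
    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.appName("cdc-upsert").getOrCreate()

    def upsert_batch(microbatch_df, batch_id):
        # Real pipelines would first dedupe to the latest event per key.
        silver = DeltaTable.forPath(spark, "s3://example-bucket/silver/orders")
        (silver.alias("t")
            .merge(microbatch_df.alias("s"), "t.order_id = s.order_id")
            .whenMatchedDelete(condition="s.op = 'D'")
            .whenMatchedUpdateAll(condition="s.op = 'U'")
            .whenNotMatchedInsertAll(condition="s.op IN ('I', 'U')")
            .execute())

    (spark.readStream.format("delta")
        .load("s3://example-bucket/bronze/orders_cdc")   # raw CDC landing zone
        .writeStream
        .foreachBatch(upsert_batch)
        .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders")
        .start())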
Data Engineer, Microsoft
Redmond, WA, USA, Jul 2024 - Apr 2025
Designed and deployed streaming and batch ETL pipelines in Azure Databricks (PySpark/Scala) to support real-time data refresh in the Fabric Lakehouse (Bronze–Silver–Gold).
Integrated Azure Event Hub streams for incremental, CDC-style data capture from operational sources and transformed them for analytics.
Automated workflow orchestration using Apache Airflow on Azure Container Apps, enhancing visibility and retries across multiple pipelines.
Improved Spark job performance through broadcast joins and partition tuning strategies, reducing processing time by 35%.
Embedded schema evolution logic and data quality validation within ETL flows to ensure consistency across incremental loads.
Developed parameterized ETL scripts in Python for metadata-driven job execution and dynamic table hydration (pattern sketched below).
Supported governance via Microsoft Purview, maintaining lineage and data cataloging for all ETL assets.
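A small sketch of the metadata-driven hydration pattern mentioned above; the control list, ABFSS paths, and target table names are invented for illustration.

    # Drive ingestion from a metadata list instead of hard-coding one job per table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("metadata-driven-etl").getOrCreate()

    # In practice this metadata would come from a control table or config store.
    TABLES = [
        {"source": "abfss://raw@account.dfs.core.windows.net/sales", "target": "bronze_sales", "format": "json"},
        {"source": "abfss://raw@account.dfs.core.windows.net/events", "target": "bronze_events", "format": "parquet"},
    ]

    for cfg in TABLES:
        df = spark.read.format(cfg["format"]).load(cfg["source"])
        # mergeSchema lets incremental loads absorb additive schema changes.
        (df.write.format("delta")
            .mode("append")
            .option("mergeSchema", "true")
            .saveAsTable(cfg["target"]))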
Data Engineer, MetLife Insurance
New York, USA, Dec 2023 - Jun 2024
Developed SQL and PySpark transformations in Azure Databricks and Fabric Lakehouse (Bronze–Silver–Gold) for ingestion from internal telemetry APIs and Cosmos DB.
Automated data ingestion and transformation workflows using Azure Data Factory, enabling reliable data refresh cycles across multiple datasets.
Optimized SQL queries and view definitions in Synapse Analytics, improving dashboard response times and reducing compute cost.
Collaborated with analytics teams to create dbt models for reproducible, version-controlled transformations within the Fabric environment.
Used Git CLI and Azure DevOps pipelines to maintain deployment consistency and implement branch-based testing workflows.
Supported Agile and Kanban project delivery, participating in sprint planning, backlog grooming, and production release reviews.
Enhanced data governance and lineage visibility using Microsoft Purview and applied validation rules to maintain data quality.
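An illustrative PySpark version of the validation-rule idea above; the table name, rule names, and predicates are assumptions, not MetLife specifics.

    # Evaluate named data-quality rules and fail the load on any violation.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-rules").getOrCreate()
    df = spark.read.table("silver_claims")  # hypothetical curated table

    rules = {
        "claim_id_not_null": F.col("claim_id").isNotNull(),
        "amount_positive": F.col("claim_amount") > 0,
    }

    for name, predicate in rules.items():
        failed = df.filter(~predicate).count()
        if failed > 0:
            # Fail the load rather than propagate bad rows downstream.
            raise ValueError(f"Data quality rule '{name}' failed for {failed} rows")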
Data Engineer, Wells Fargo Bank
Bangalore, India, Jan 2019 - Dec 2022
Migrated legacy Hadoop ETLs to AWS EMR and Spark (Scala/PySpark) and redesigned workflows for efficiency and maintainability.
Built data ingestion connectors for financial transaction feeds and log streams into Snowflake and Hive, improving data freshness by 40%.
Added unit testing and validation layers to ETL jobs to detect schema drift and null data issues before load (drift check sketched below).
Documented pipeline architecture, dependencies, and error-handling flows for internal reviews and audit support.
Participated in code reviews and scrum planning through Jira, addressing project risks and delivery timelines with team leads.
Worked with data governance teams to build data quality metrics dashboards in Power BI, helping business stakeholders track data accuracy and pipeline health.
Supported a culture of continuous improvement by applying lessons learned from previous ETL failures to subsequent projects.
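A hedged sketch of the pre-load schema-drift check referenced in this role's bullets; the expected schema and landing path are hypothetical.

    # Compare the incoming file schema against an expected contract before load.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("schema-drift-check").getOrCreate()

    EXPECTED = StructType([
        StructField("txn_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("currency", StringType()),
    ])

    incoming = spark.read.parquet("s3://example-bucket/landing/transactions")

    expected = {f.name: f.dataType for f in EXPECTED.fields}
    actual = {f.name: f.dataType for f in incoming.schema.fields}

    missing = set(expected) - set(actual)
    changed = {c for c in expected if c in actual and expected[c] != actual[c]}
    if missing or changed:
        # Stop before load so drift never reaches Snowflake or Hive tables.
        raise RuntimeError(f"Schema drift detected: missing={missing}, changed={changed}")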