
Data Engineer Big Data

Location:
Plano, TX
Posted:
October 15, 2025


Resume:

PRAVEENA MUDIYALA

Data Engineer with Power BI expertise in cloud and big data solutions.

Personal Information

518-***-****

********.****@*****.***

www.linkedin.com/in/praveenam7


Plano, TX

Education

University of Central Missouri

Master's in Big Data Analytics

Warrensburg, USA, Aug 2022 - May 2024

Jawaharlal Nehru Technological University, Anantapur

Master of Business Administration

Mar 2017 - Jan 2019

Vikrama Simhapuri University

Bachelor of Commerce in Computer Applications

Apr 2014 - May 2017

Skills

Programming & Scripting Languages: Python • Scala • SQL • Unix/Shell Scripting • SAS

Big Data & Distributed Computing: Apache Hadoop • Apache Spark • HDFS • MapReduce • Apache Hive • Apache Sqoop • Apache Oozie • Apache ZooKeeper • Apache Kafka • Apache NiFi • Flume • Snowpark

Cloud Platforms & Services: Amazon Web Services (Glue, EMR, Redshift, Lambda, S3, RDS, CloudWatch) • Microsoft Azure (ADF, Synapse, Databricks, Logic Apps) • Talend • Microsoft OneLake

Machine Learning & Data Science: Large Language Models (LLMs) • NumPy • Pandas • Scikit-learn • Seaborn • Matplotlib

Data Engineering & Data Warehousing: Data Engineering • Data Modeling • ETL Development • Data Pipeline Design • Data Migration • Snowflake • Teradata • Oracle • MySQL • PostgreSQL • MongoDB • Cassandra

Business Analytics & Visualization: Power BI • Tableau • Oracle OBIEE 12c (12.2.1.4.0) • SSIS • SSRS • Microsoft Excel (Pivot Tables, Charts, Dashboards) • PowerPoint (Executive Storytelling) • Business Analytics • Data Storytelling

Version Control & DevOps: Git • GitHub • Jenkins • Docker • Confluence • Terraform • Kubernetes

Workflow Orchestration & Methodologies: Apache Airflow • Agile/Scrum

Core Professional Skills: Analytical Skills • Communication • Stakeholder Collaboration • Cross-Functional Coordination • Executive Reporting • Leadership Presentations • Problem Solving

Certifications

Azure Data Engineer, Microsoft

Fabric Analytics Engineer, Microsoft

Certified Solutions Architect – Associate, AWS

Certified Data Engineer – Associate, AWS

SnowPro Advanced: Data Engineer, Snowflake

Databricks Certified Data Engineer, Databricks

Databricks Certified Machine Learning Professional, Databricks

Summary

Data Engineer with 8+ years of experience designing and building data pipelines and streaming frameworks across Azure and AWS ecosystems. Proficient in setting up Change Data Capture (CDC) for multiple databases to hydrate data lakes on S3 and Delta Lake, and in developing both batch and streaming ETL jobs in Apache Spark (PySpark, Java). Hands-on with Airflow (MWAA) for workflow orchestration, and with AWS EMR / EMR Serverless, Glue Data Catalog, and Lambda for serverless transformation tasks. Experienced in optimizing Spark jobs using partitioning, broadcast joins, and caching strategies for high-volume CDC data. Strong collaborator who works with cross-functional teams to build governed, scalable data solutions for analytics and machine learning use cases.
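The tuning techniques named above (partition pruning, broadcast joins, caching) can be shown in a short PySpark sketch. This is an illustration only; the bucket paths and column names (event_date, account_id, op_type) are hypothetical, not taken from any pipeline described in this resume.

# Minimal PySpark sketch of the tuning techniques named in the summary.
# All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-tuning-sketch").getOrCreate()

# Large CDC event data, assumed partitioned by event_date so the filter
# below prunes partitions instead of scanning the full dataset.
events = spark.read.parquet("s3://example-bucket/cdc/events/")
recent = events.where(F.col("event_date") >= "2025-01-01")

# Small dimension table: broadcasting it lets the join avoid a shuffle.
accounts = spark.read.parquet("s3://example-bucket/dims/accounts/")
joined = recent.join(F.broadcast(accounts), "account_id", "left")

# Cache the joined result because two aggregations below reuse it.
joined.cache()
daily_counts = joined.groupBy("event_date").count()
op_counts = joined.groupBy("op_type").count()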

Work Experience

Data Engineer II, Databricks

Plano, TX, USA, May 2025 - Present

Implemented Change Data Capture (CDC) pipelines to incrementally hydrate the Delta Lake (Bronze–Silver–Gold) layers using Spark Structured Streaming for multiple source databases.

Orchestrated CDC workflows in Apache Airflow (MWAA), integrating AWS S3, Glue Data Catalog, and EMR Serverless for scalable ETL execution.

Built Python and Java Spark UDFs for schema validation, data type standardization, and anomaly detection within streaming pipelines.

Utilized S3 CRUD operations for raw CDC landing zones, snapshot archives, and Silver-layer curation, ensuring cost-efficient storage management.

Designed Spark DataFrame transformations and Spark SQL joins to normalize semi-structured JSON CDC events into analytical tables.

Tuned job performance using partition pruning, caching, and broadcast joins, reducing CDC load time by 45%.

Integrated Step Functions and Lambda for asynchronous validation and alerting, with Slack notifications for error handling and retry logic.

Collaborated with cross-functional teams to convert YAML-based detection rules into parameterized CDC transformations for real-time security insights.
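As a hedged sketch of the CDC hydration pattern described in this role (not the actual Databricks pipeline), a Spark Structured Streaming job that appends raw CDC JSON from an S3 landing zone into a Bronze Delta table could look roughly like this; the schema fields, bucket names, and checkpoint paths are assumptions:

# Illustrative only: CDC landing-zone JSON streamed into a Bronze Delta
# table. Paths, schema fields, and checkpoint location are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("cdc-bronze-sketch").getOrCreate()

cdc_schema = StructType([
    StructField("op", StringType()),            # insert/update/delete flag
    StructField("source_table", StringType()),
    StructField("key", StringType()),
    StructField("payload", StringType()),       # semi-structured JSON body
    StructField("emitted_at", TimestampType()),
])

raw = (spark.readStream
       .schema(cdc_schema)
       .json("s3://example-bucket/cdc/landing/"))

query = (raw.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "s3://example-bucket/cdc/_checkpoints/bronze/")
         .start("s3://example-bucket/lake/bronze/cdc_events/"))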

Data Engineer, Microsoft

Redmond, WA, USA, Jul 2024 - Apr 2025

Designed and deployed streaming and batch ETL pipelines in Azure Databricks (PySpark / Scala) to support real-time data refresh in Fabric Lakehouse (Bronze–Silver–Gold).

Integrated Azure Event Hub streams for incremental CDC-like data capture from operational sources and transformed them for analytics.

Automated workflow orchestration using Apache Airflow on Azure Container Apps, enhancing visibility and retries across multiple pipelines.

Improved Spark job performance through broadcast joins and partition tuning strategies, reducing processing time by 35%.

Embedded schema evolution logic and data quality validation within ETL flows to ensure consistency across incremental loads.

Developed parameterized ETL scripts in Python for metadata-driven job execution and dynamic table hydration.

Supported governance via Microsoft Purview, maintaining lineage and data cataloging for all ETL assets.
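A compact way to picture the metadata-driven job execution mentioned above is a loop over per-table config records. This is a sketch under assumed names (the storage paths, table names, and key columns are invented), not the actual pipeline code:

# Sketch of metadata-driven ETL: each table is described by a config
# record instead of a bespoke script. All names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metadata-etl-sketch").getOrCreate()

# In practice this metadata might live in a control table or YAML file.
jobs = [
    {"source": "abfss://raw@example.dfs.core.windows.net/sales/",
     "target": "silver.sales", "keys": ["order_id"]},
    {"source": "abfss://raw@example.dfs.core.windows.net/customers/",
     "target": "silver.customers", "keys": ["customer_id"]},
]

for job in jobs:
    df = spark.read.format("delta").load(job["source"])
    deduped = df.dropDuplicates(job["keys"])  # simple idempotency guard
    deduped.write.mode("overwrite").saveAsTable(job["target"])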

Data Engineer, MetLife Insurance

New York, USA, Dec 2023 - Jun 2024


Developed SQL and PySpark transformations in Azure Databricks and Fabric Lakehouse (Bronze–Silver–Gold) for ingestion from internal telemetry APIs and Cosmos DB.

Automated data ingestion and transformation workflows using Azure Data Factory, enabling reliable data refresh cycles across multiple datasets.

Optimized SQL queries and view definitions in Synapse Analytics, improving dashboard response times and reducing compute cost.

Collaborated with analytics teams to create dbt models for reproducible and version-controlled transformations within the Fabric environment.

Used Git CLI and Azure DevOps pipelines to maintain deployment consistency and implement branch-based testing workflows.

Supported Agile and Kanban project delivery, participating in sprint planning, backlog grooming, and production release reviews.

Enhanced data governance and lineage visibility using Microsoft Purview and applied validation rules to maintain data quality.
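The validation rules mentioned in this role can be illustrated with a small PySpark quality gate that blocks publication when basic checks fail; the table path, column names, and rules here are assumptions for illustration:

# Hypothetical data-quality gate: evaluate simple rules before publishing.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-rules-sketch").getOrCreate()
df = spark.read.format("delta").load("/lakehouse/silver/claims")  # invented path

checks = {
    "claim_id_not_null": df.filter(F.col("claim_id").isNull()).count() == 0,
    "amount_non_negative": df.filter(F.col("amount") < 0).count() == 0,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")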

Data Engineer, Wells Fargo Bank

Bangalore, India, Jan 2019 - Dec 2022

Migrated legacy Hadoop ETLs to AWS EMR and Spark (Scala/PySpark) and redesigned workflows for efficiency and maintainability.

Built data ingestion connectors for financial transaction feeds and log streams into Snowflake and Hive, improving data freshness by 40%.

Added unit testing and validation layers to ETL jobs to detect schema drift and null data issues before load.

Documented pipeline architecture, dependencies, and error-handling flows for internal reviews and audit support.

Participated in code reviews and scrum planning through Jira, addressing project risks and delivery timelines with team leads.

Worked with data governance teams to build data quality metrics dashboards in Power BI, helping business stakeholders track data accuracy and pipeline health.

Supported a culture of continuous improvement by applying lessons learned from previous ETL failures to subsequent projects.
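The schema-drift detection added to these ETL jobs can be sketched as a pre-load contract check; the expected schema and feed path below are illustrative assumptions, not actual code from this role:

# Illustrative schema-drift guard: compare an incoming feed's columns
# against an expected contract before loading. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-drift-sketch").getOrCreate()

expected = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

incoming = spark.read.parquet("s3://example-bucket/feeds/transactions/")

missing = set(expected.fieldNames()) - set(incoming.columns)
extra = set(incoming.columns) - set(expected.fieldNames())
if missing or extra:
    raise ValueError(f"Schema drift detected: missing={missing}, extra={extra}")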


