
Data Engineer – Power BI

Location: Austin, TX, 78701
Posted: April 17, 2025


Resume:

Kavya Madaiah

Data Engineer

979-***-**** *****.*@********.*** Austin, TX LinkedIn

Professional Summary

Data Engineer with 5+ years of experience in ETL, data pipeline development (Airflow, Mage, Spark), cloud platforms (AWS, GCP), and data visualization (Power BI, Tableau, QuickSight).

Experienced in optimizing dbt (data build tool) projects in a Snowflake environment by implementing incremental models, leveraging partitioning, and tuning query performance to reduce query runtime and cloud data warehousing costs.

Expert in building Spark applications with Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, uncovering insights into customer usage patterns.
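
A rough PySpark sketch of this kind of multi-format extraction and Spark SQL aggregation follows; the paths, table names, and columns are hypothetical rather than taken from the actual project.

    from pyspark.sql import SparkSession

    # On Databricks a SparkSession is provided as `spark`; built here so the sketch is self-contained.
    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    # Hypothetical sources in different file formats.
    events = spark.read.json("/mnt/raw/usage_events/")
    accounts = spark.read.option("header", "true").csv("/mnt/raw/accounts.csv")
    plans = spark.read.parquet("/mnt/raw/plans/")

    events.createOrReplaceTempView("usage_events")
    accounts.createOrReplaceTempView("accounts")
    plans.createOrReplaceTempView("plans")

    # Spark SQL aggregation to surface customer usage patterns.
    usage_by_plan = spark.sql("""
        SELECT p.plan_name,
               a.region,
               COUNT(*)               AS event_count,
               AVG(e.session_minutes) AS avg_session_minutes
        FROM usage_events e
        JOIN accounts a ON e.account_id = a.account_id
        JOIN plans    p ON a.plan_id = p.plan_id
        GROUP BY p.plan_name, a.region
    """)

    usage_by_plan.write.mode("overwrite").parquet("/mnt/curated/usage_by_plan/")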

Proficient in designing and building robust data pipelines using tools like Apache Airflow and AWS Step Functions to automate data ingestion, transformation, and delivery.
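
A minimal Apache Airflow sketch of the ingest-transform-deliver pattern described above; the DAG id, schedule, and task callables are illustrative assumptions, not the production pipeline.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical callables standing in for real ingestion, transformation, and delivery logic.
    def extract():
        print("pull raw files from the source system")

    def transform():
        print("clean and normalize the extracted data")

    def load():
        print("write curated data to the warehouse")

    with DAG(
        dag_id="daily_ingest",            # illustrative name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_transform >> t_load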

Experience in ETL (Extract, Transform, Load) processes, including data cleaning, normalization, and enrichment to ensure data quality and consistency.

Technical Skills

Programming Languages: Python, R, SQL, SAS

Big Data Ecosystem: Apache Spark, Apache Kafka, Apache Nessie, Hadoop, Hive, HDFS

Cloud: AWS (Redshift, S3, AWS Glue, Athena, DynamoDB, Lambda, EMR, Firehose), BigQuery, Data Lake

Visualizations: Tableau, Power BI, Looker, Alteryx

Packages: NumPy, Pandas, Matplotlib, Seaborn, PySpark

ETL and Data Processing: SSIS, dbt (data build tool), Apache Airflow, Mage, Informatica, Airbyte, Pentaho, Talend

Version Control & Databases: GitHub, Git, GitHub Actions, SQL Server, PostgreSQL, Cassandra, MySQL, Snowflake, HBase

Work Experience

MetLife, TX Data Engineer Jan 2024 – Present

Implemented Apache Flink streaming applications to handle real-time data processing with low latency, achieving 20% faster results compared to Apache Spark Streaming.

Developed and maintained 10+ ETL pipelines using BigQuery's native tools and SQL to extract, transform, and load data from various sources into the data warehouse.
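
One way such a SQL-driven BigQuery load step could look from Python with the google-cloud-bigquery client; the dataset and table names are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials

    # Hypothetical transform-and-load step: materialize cleaned rows into a curated table.
    sql = """
        CREATE OR REPLACE TABLE curated.daily_orders AS
        SELECT order_id,
               customer_id,
               DATE(created_at) AS order_date,
               SAFE_CAST(amount AS NUMERIC) AS amount
        FROM raw.orders
        WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    """

    job = client.query(sql)  # submit the query job
    job.result()             # block until it finishes
    print(f"{job.total_bytes_processed} bytes processed")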

Designed and implemented a scalable data lake architecture to store and manage 100 petabytes of structured and unstructured data, achieving a 50% reduction in storage costs through efficient data tiering.

Utilized Dataflow with other GCP services (Pub/Sub, Cloud Storage) to create a robust data ingestion and processing pipeline.
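
A compact Apache Beam sketch of a Pub/Sub-to-Cloud Storage pipeline in that spirit; the topic, bucket, and parsing logic are assumptions, and a real Dataflow job would add runner and project options.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.window import FixedWindows

    TOPIC = "projects/my-project/topics/events"   # hypothetical Pub/Sub topic
    OUTPUT = "gs://my-bucket/curated/events"      # hypothetical Cloud Storage prefix

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
            | "Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "KeepValid" >> beam.Filter(lambda row: "event_id" in row)
            | "Window" >> beam.WindowInto(FixedWindows(60))
            | "Serialize" >> beam.Map(json.dumps)
            | "WriteToGCS" >> beam.io.WriteToText(OUTPUT, file_name_suffix=".json")
        )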

Built interactive Power BI reports leveraging advanced DAX and Power Query modeling techniques to unlock deeper data insights and facilitate informed decision-making.

Dixon Technology, India Data Engineer II Aug 2019 – Jul 2022

Developed and optimized SQL queries for data extraction, transformation, and loading (ETL) processes, achieving an average 15% reduction in query execution time.

Used Spark libraries like Spark SQL and MLlib to perform complex data analysis tasks, resulting in actionable insights for business stakeholders.

Enhanced 12 Databricks pipelines to ingest, process, and transform 500GB of data from 3 disparate sources into a data lake, improving data processing efficiency by 35%.

Orchestrated data pipelines with Apache Airflow, automating workflows and reducing manual intervention by 70%.

Established data integration workflows using Informatica PowerCenter to extract, transform, and load data between various systems.

Optimized SQL queries in AWS Athena to support complex analytical reporting, enabling data-driven decision-making.
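
One way to issue such an Athena query from Python with boto3; the database, table, and result bucket are hypothetical.

    import boto3

    athena = boto3.client("athena")

    # Hypothetical reporting query; filtering on partition columns keeps the scan small.
    response = athena.start_query_execution(
        QueryString="""
            SELECT region, SUM(revenue) AS total_revenue
            FROM sales_curated
            WHERE year = '2021' AND month = '07'
            GROUP BY region
        """,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print("Started query:", response["QueryExecutionId"])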

Designed and implemented serverless data pipelines using AWS Lambda to process real-time data streams, achieving a 10% reduction in processing latency compared to traditional batch processing methods.
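
A bare-bones sketch of the Lambda side of such a streaming pipeline, assuming a Kinesis trigger; the record fields and transformation are hypothetical.

    import base64
    import json

    def handler(event, context):
        """Hypothetical handler invoked by a Kinesis stream trigger."""
        processed = 0
        for record in event["Records"]:
            payload = base64.b64decode(record["kinesis"]["data"])
            row = json.loads(payload)
            # A real pipeline would enrich the row and forward it downstream.
            row["request_id"] = context.aws_request_id
            processed += 1
        return {"records": processed}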

Reduced data validation time by 50% by leveraging Nessie's lineage tracking functionality to identify and correct data inconsistencies efficiently.

Trion IT Solutions, India Data Engineer Jan 2018 – Jul 2019

Implemented DynamoDB streams to capture data changes and trigger downstream data processing pipelines for real-time analytics.
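
A minimal sketch of a downstream consumer of those stream records, assuming a Lambda trigger on the table's stream; the attribute names are hypothetical.

    def handler(event, context):
        """Hypothetical handler attached to a DynamoDB stream."""
        forwarded = 0
        for record in event["Records"]:
            if record["eventName"] in ("INSERT", "MODIFY"):
                # NewImage is present when the stream view type includes new images.
                new_image = record["dynamodb"].get("NewImage", {})
                item_id = new_image.get("id", {}).get("S")
                print(f"change captured for item {item_id}; forwarding to analytics")
                forwarded += 1
        return {"forwarded": forwarded}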

Created HiveQL queries that identified a 30% increase in customer churn rate, allowing for targeted customer retention campaigns.

Built a data warehouse on Redshift for complex data analysis, supporting a daily workload of 1K concurrent user queries.

Leveraged Snowflake's cloud-based architecture to facilitate collaboration between data analysts and data scientists, leading to a 30% increase in data-driven project completion rate.

Developed and deployed various interactive QuickSight dashboards to deliver actionable insights, boosting data-driven decision-making.

Education

Master of Science in Business Analytics, University of Texas at Dallas, TX (GPA – 3.8/4) May 2024

Certifications

AWS Certified Data Analytics – Specialty

AWS Certified Cloud Practitioner

Certified Associate Data Engineer in SQL – DataCamp

SAS Certified Specialist: Base Programming Using SAS 9.4

Alteryx Micro-Credentials

AI & NLP for Business – Kozminski University


