
Data Engineer Power BI

Location:
Manhattan, NY, 10176
Posted:
February 27, 2025


Resume:

Shree Varun Data Engineer

******@***********.*** 469-***-**** USA LinkedIn

Summary

Experienced Data Engineer with 4+ years designing and implementing scalable ETL pipelines, data models, and real-time analytics. Proficient with cloud platforms such as Google Cloud and Azure, and with data warehousing solutions such as Snowflake. Skilled in Python, SQL, Apache Kafka, AWS, and Power BI. Adept at collaborating with cross-functional teams to deliver actionable insights, optimize data workflows, and ensure high-quality, efficient data processing for business intelligence.

Technical Skills

• Cloud Platforms: Google Cloud Platform (GCP) (BigQuery, Dataflow, Cloud Storage, Cloud Pub/Sub), Azure (Synapse Analytics, Data Lake Storage, Data Factory), AWS (S3, Glue, MSK, Redshift)

• Data Warehousing & Databases: Snowflake, PostgreSQL, DynamoDB

• ETL Tools & Big Data Processing: Apache NiFi, Talend, Apache Spark, PySpark, Apache Kafka, Apache Flink, Google Cloud Dataproc

• Data Visualization & Reporting: Power BI, Tableau, Google Data Studio, AWS QuickSight

• Data Modeling & Architecture: Star Schema, Snowflake Schema, Dimensional Modeling, Dynamic Partitioning

• Programming Languages: SQL, Python (Pandas, NumPy, Scikit-learn, Dask, PyArrow), Java, Scala

• Data Transformation & Analysis: PySpark, DAX (Power BI), SQL-based transformations, Data wrangling, Data cleaning

• Version Control & Collaboration: Git, GitHub, Bitbucket, Jira, Confluence, Agile methodology

• Security & Compliance: Server-side encryption, HIPAA, GDPR compliance, Data governance, IAM (Identity and Access Management)

• Monitoring & Orchestration: Apache Airflow, AWS CloudWatch, Azure Monitor, Google Cloud Operations

• Containerization & Deployment: Docker, Kubernetes, AWS Lambda, GCP Cloud Functions

Professional Experience

Data Engineer, Wayfair 06/2022 – Present Remote, USA

• Designed scalable ETL pipelines in Google Cloud Dataflow, extracting and transforming semi-structured customer transaction data from GCS, applying schema inference and PySpark transformations to create structured datasets for advanced analysis.

• Built dynamic data models in Snowflake using SQL techniques such as CTEs, window functions, and dynamic partitioning, enabling fast and efficient querying for customer segmentation and trend analysis.

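The CTE and window-function pattern described above can be sketched as follows. This is a hypothetical illustration run against SQLite (the query syntax is the same in Snowflake); the table, column, and data values are invented for the example.

```python
import sqlite3

# Invented sample data standing in for customer transaction records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (customer_id INTEGER, amount REAL, ts TEXT);
    INSERT INTO transactions VALUES
        (1, 120.0, '2024-01-05'), (1,  80.0, '2024-02-10'),
        (2, 300.0, '2024-01-20'), (3,  40.0, '2024-03-01');
""")

# A CTE aggregates spend per customer; a window function then ranks
# customers by total spend for downstream segmentation.
rows = conn.execute("""
    WITH customer_spend AS (
        SELECT customer_id, SUM(amount) AS total_spend
        FROM transactions
        GROUP BY customer_id
    )
    SELECT customer_id,
           total_spend,
           RANK() OVER (ORDER BY total_spend DESC) AS spend_rank
    FROM customer_spend
    ORDER BY spend_rank
""").fetchall()

print(rows)  # [(2, 300.0, 1), (1, 200.0, 2), (3, 40.0, 3)]
```

In a warehouse like Snowflake the same query would run directly against the modeled tables; partitioning the window (`PARTITION BY segment`) extends the pattern to per-segment rankings.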
• Processed historical transaction data to segment customers by purchasing behavior, demographics, and interaction patterns, enabling targeted marketing campaigns and personalized product recommendations.

• Created interactive Power BI dashboards, visualizing key metrics such as product sales trends, customer demographics, and revenue forecasts, providing actionable insights for business stakeholders.

• Optimized data pipelines and storage workflows by leveraging Google Cloud Storage tiered storage (Standard and Coldline) and applying server-side encryption, reducing operational costs by 25% while ensuring data security.

• Utilized Python libraries such as Pandas, NumPy, and Scikit-learn to perform feature engineering, anomaly detection, and predictive modeling on customer purchase behavior, enhancing the accuracy of recommendation systems.

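The anomaly-detection work mentioned above used scikit-learn; a minimal standard-library sketch of the same idea (flagging purchase amounts far from the mean by z-score) looks like this. The data and threshold are invented for illustration.

```python
from statistics import mean, stdev

def zscore_anomalies(amounts, threshold=2.0):
    """Flag amounts more than `threshold` standard deviations from the
    mean -- a minimal stand-in for model-based anomaly detection."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# One wildly atypical purchase among routine ones. Note that on a tiny
# sample the outlier itself inflates the standard deviation, so a lower
# threshold is used here than would suit production data.
purchases = [25.0, 30.0, 27.0, 31.0, 26.0, 29.0, 28.0, 950.0]
print(zscore_anomalies(purchases))  # [950.0]
```

A production pipeline would typically swap this for a fitted model (e.g. an isolation forest) over engineered features rather than raw amounts.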
• Collaborated with cross-functional teams, including marketing and business analysts, to identify KPIs and align data deliverables with strategic goals.

Data Engineer, Tata Consultancy Services Limited 01/2020 – 12/2020 India

• Designed a scalable data warehouse using Azure Synapse Analytics, integrating transactional, customer, and inventory data to enable advanced business intelligence and deliver actionable insights through structured data storage and processing workflows.

• Built and maintained complex ETL pipelines with Azure Data Factory, transforming raw data from diverse sources into high-quality, consistent datasets for downstream analytics and reporting, ensuring data reliability and accuracy.

• Created optimized star and snowflake schemas for data modeling, improving query performance and streamlining reporting processes across sales, customer behavior, and inventory metrics to support informed decision-making.

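The star-schema modeling described above can be sketched as one fact table joined to dimension tables and rolled up along dimension attributes. All table, column, and data values below are invented for illustration; the example runs against SQLite for self-containment.

```python
import sqlite3

# A toy star schema: a sales fact table keyed to product and date
# dimensions (hypothetical names and values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, revenue REAL);

    INSERT INTO dim_product VALUES (1, 'Furniture'), (2, 'Lighting');
    INSERT INTO dim_date    VALUES (10, '2020-01'), (11, '2020-02');
    INSERT INTO fact_sales  VALUES (1, 10, 500.0), (1, 11, 700.0), (2, 10, 200.0);
""")

# Revenue by category and month: the classic star-schema rollup, with the
# fact table joined to each dimension on its surrogate key.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)
```

Because every query follows the same join paths, a BI tool pointed at this model (or its snowflaked variant, with normalized dimensions) can generate rollups across any combination of dimension attributes.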
• Implemented scalable data storage solutions using Azure Data Lake Storage and Azure Synapse Analytics, enabling seamless handling of large-scale structured and semi-structured datasets for analytics and reporting teams.

• Developed interactive Power BI dashboards, incorporating dynamic filters and drill-through features to allow users to explore sales trends, customer insights, and inventory performance in real-time.

• Integrated Power BI with Azure Synapse and Azure Data Lake, providing seamless data access for stakeholders and using Power BI’s data modeling and DAX capabilities to create advanced metrics and KPIs for monitoring critical business functions.

• Utilized Python’s Dask library to handle large-scale data transformations and parallel computing, significantly reducing processing time for complex data aggregation tasks within ETL workflows.

Data Engineer, Genpact 10/2018 – 12/2019 India

• Built and optimized hybrid real-time data pipelines using Apache Kafka, AWS MSK, and Apache Spark Streaming to process website clickstream data, enabling real-time monitoring of user behavior and sales trends.

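The real-time clickstream processing described above centers on windowed aggregation. A standard-library sketch of the tumbling-window counting that a Kafka + Spark Streaming job would perform (event timestamps and page names are invented for illustration):

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, page). Each event is an
    (epoch_seconds, page) pair; events are bucketed into fixed,
    non-overlapping windows of `window_seconds`."""
    counts = Counter()
    for ts, page in events:
        window_start = ts - (ts % window_seconds)
        counts[(window_start, page)] += 1
    return dict(counts)

# Hypothetical clickstream: three events in the first minute, one in the second.
clicks = [(0, '/home'), (30, '/home'), (45, '/cart'), (75, '/home')]
print(tumbling_window_counts(clicks))
# {(0, '/home'): 2, (0, '/cart'): 1, (60, '/home'): 1}
```

In the streaming setting the same bucketing runs incrementally over an unbounded Kafka topic, with watermarks handling late-arriving events; the batch sketch above only shows the aggregation logic.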
• Designed scalable storage solutions with Amazon S3 and DynamoDB, integrating real-time and batch processing capabilities to support periodic reporting and ensure high availability of business-critical metrics.

• Developed advanced data transformation workflows in AWS Glue and PySpark, ensuring data accuracy, efficient aggregation, and seamless integration of historical and real-time datasets for comprehensive analytics.

• Delivered actionable insights by creating interactive Tableau dashboards to visualize key metrics, such as customer behavior patterns, cart abandonment rates, and top-selling products, driving data-informed e-commerce strategies.

• Developed comprehensive visualizations using Tableau to monitor website performance, user engagement, and sales trends, leveraging Tableau’s advanced capabilities for dynamic, interactive dashboards.

• Leveraged Python’s PyArrow library to enable high-performance in-memory data processing and efficient serialization of structured data between AWS S3, DynamoDB, and Spark, improving pipeline efficiency and reducing latency.

Education

Master of Science, Texas A&M University-Kingsville Texas, USA

Computer & Information Sciences 01/2021 – 05/2022

Bachelor of Technology, JNTUH Hyderabad, India

Electronics and Communication Engineering 2015 – 2019


