Data Engineer Power BI

Location:
Dallas, TX
Posted:
October 15, 2025

Resume:

Suprathika V

Data Engineer

**********.***@*****.*** | 940-***-**** | http://www.linkedin.com/in/suprathikav | Denton, TX (Open to Relocate)

TECHNICAL SKILLS

Cloud Platforms: Azure (ADF, Event Hubs, ADLS, Databricks); AWS (S3, EMR, Glue, Lambda, Athena)

Programming & Scripting: Python, SQL, PySpark.

Streaming & Processing Frameworks: Apache Kafka, AWS Kinesis, Spark Streaming, Apache Spark, Hadoop, Databricks, Hive.

Warehousing & Analytics: Snowflake, Redshift, Azure Synapse, BigQuery.

ETL & Integration: AWS Glue, Informatica, Talend, Apache Airflow.

Visualization & Monitoring: Power BI, Tableau, Matplotlib/Seaborn (Python), Grafana.

Security & Governance: Informatica Data Quality, AWS IAM, Azure AD, Unity Catalog.

CI/CD & DevOps: Jenkins, Docker, Azure DevOps, Git, GitHub Actions, Jira.

Data Science & ML: Azure ML, IBM SPSS, SAS, RapidMiner, OpenRefine.

WORK EXPERIENCE

Alliant group, Houston, TX

Data Engineer Aug 2024 - Present

• Integrated multiple financial data sources (JSON, Parquet, CSV) into an Azure Data Lake using PySpark, processing 2TB/day with 40% faster ingestion and improved data accuracy.

• Built and optimized data pipelines using Azure Data Factory, Apache Airflow, and Spark, automating retries and alerts to reduce failure rates from 15% to 0.15%.
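A minimal Airflow sketch of the retry-and-alert pattern described in this bullet; the DAG name, task, and alert address are illustrative placeholders, not the production pipeline:

```python
# Sketch: an Airflow DAG with automatic retries and failure alerts.
# Names and the alert address are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_daily_files(**context):
    # Placeholder for the actual PySpark/ADF ingestion step.
    print("ingesting daily source files")


default_args = {
    "owner": "data-eng",
    "retries": 3,                          # retry transient failures automatically
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,              # alert only after retries are exhausted
    "email": ["data-alerts@example.com"],
}

with DAG(
    dag_id="finance_ingestion_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    PythonOperator(task_id="ingest_daily_files", python_callable=ingest_daily_files)
```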

• Developed real-time Kafka streaming pipelines on Azure Event Hubs with Spark Structured Streaming, enabling low-latency data flow for trade and risk analytics.
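As an illustration of this pattern, a Spark Structured Streaming sketch that consumes an Event Hubs topic through its Kafka-compatible endpoint; the namespace, topic, schema, and storage paths are placeholders, and the SASL connection string is assumed to come from a secret store:

```python
# Sketch: Kafka-protocol read from Azure Event Hubs with Spark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("trade-stream").getOrCreate()

schema = (StructType()
          .add("trade_id", StringType())
          .add("symbol", StringType())
          .add("price", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
       .option("kafka.security.protocol", "SASL_SSL")
       .option("kafka.sasl.mechanism", "PLAIN")
       # kafka.sasl.jaas.config (the Event Hubs connection string) is omitted here
       # and would be injected from a secret scope.
       .option("subscribe", "trades")
       .option("startingOffsets", "latest")
       .load())

trades = (raw.select(from_json(col("value").cast("string"), schema).alias("t"))
          .select("t.*"))

query = (trades.writeStream
         .format("parquet")
         .option("path", "abfss://curated@<storage>.dfs.core.windows.net/trades/")
         .option("checkpointLocation", "abfss://curated@<storage>.dfs.core.windows.net/_chk/trades/")
         .outputMode("append")
         .start())
```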

• Migrated on-prem SQL Server workloads to Snowflake on Azure using Python, SnowSQL, and parallel ETL, reducing costs by 35% and boosting query performance.
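A hedged sketch of one leg of such a migration: the Snowflake Python connector issuing a parallel COPY INTO from an external stage. Account, warehouse, stage, and table names are hypothetical, and credentials would come from a vault rather than literals:

```python
# Sketch: bulk load of staged Parquet files into Snowflake via the Python connector.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    warehouse="ETL_WH",
    database="FINANCE",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # COPY INTO loads the staged files in parallel on the warehouse.
    cur.execute("""
        COPY INTO STAGING.TRADES
        FROM @AZURE_TRADES_STAGE
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    print(cur.fetchall())  # per-file load results
finally:
    conn.close()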

• Automated CI/CD workflows in Jenkins for deploying PySpark and SQL scripts, increasing release reliability by 70% across dev, test, and prod environments.

• Designed predictive risk models using Azure ML and Python, achieving 92% precision and 90% recall for fraud detection on 1M+ records.

• Enhanced performance by 45% through query optimization, indexing, and schema improvements in Azure SQL and Synapse Analytics; automated workflows via DevOps and ADF triggers.

Fannie Mae, Plano, TX

Data Engineer Feb 2024 - Aug 2024

• Developed and managed ETL pipelines using AWS Glue, Lambda, and Redshift for large-scale data processing and reporting.

• Built and deployed Spark and PySpark jobs on AWS EMR, integrating with Kafka and Kinesis for real-time data ingestion and analytics.

• Created data lake solutions using AWS S3, Athena, and Glue Catalog, ensuring secure and efficient data storage and access.
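As a sketch of that access pattern, a boto3 call that runs an Athena query against a Glue Catalog database; the database, table, and results bucket are placeholders:

```python
# Sketch: querying data-lake tables registered in the Glue Catalog via Athena.
import time
import boto3

athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="SELECT loan_status, COUNT(*) AS n FROM loans GROUP BY loan_status",
    QueryExecutionContext={"Database": "mortgage_lake"},
    ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/"},
)["QueryExecutionId"]

# Poll until the query finishes, then fetch results.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    print(rows[:5])
```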

• Implemented CI/CD pipelines in Jenkins for automating data pipeline deployments and SQL job executions across environments.

• Worked with Hive, Parquet, and Avro file formats for optimized storage and faster data retrieval in AWS environments.

• Performed data validation, monitoring, and issue resolution to maintain data accuracy and support production workloads.

Value Labs, India

Data Engineer June 2021 - Dec 2022

• Developed a unified AWS S3 Data Lake leveraging Apache Spark to integrate over 10 heterogeneous data sources including flat files, OLTP systems, REST APIs, and Kafka streams, resulting in a 20% reduction in false positives and enhancing fraud detection accuracy.

• Optimized PySpark-based ETL pipelines by implementing partition pruning, broadcast joins, caching, and memory tuning using Spark UI and Datadog monitoring tools, achieving a 40% improvement in processing performance.
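A short PySpark sketch of the tuning techniques named in this bullet (partition pruning, broadcast join, caching); paths, tables, and column names are illustrative:

```python
# Sketch: common PySpark performance levers in an ETL job.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = SparkSession.builder.appName("etl-tuning").getOrCreate()

# Partition pruning: filtering on the partition column (txn_date) lets Spark
# skip whole partitions instead of scanning the full table.
txns = (spark.read.parquet("s3://datalake/transactions/")
        .where(col("txn_date") == "2022-06-01"))

# Small lookup table: broadcast it to avoid a shuffle-heavy join.
merchants = spark.read.parquet("s3://datalake/dim_merchant/")
enriched = txns.join(broadcast(merchants), "merchant_id", "left")

# Cache when the same intermediate result feeds several downstream outputs.
enriched.cache()
enriched.groupBy("merchant_category").count() \
    .write.mode("overwrite").parquet("s3://datalake/agg/by_category/")
enriched.groupBy("card_type").count() \
    .write.mode("overwrite").parquet("s3://datalake/agg/by_card/")
```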

• Migrated on-premises banking datasets to AWS Cloud using Talend, AWS Glue, and AWS DMS, ensuring secure data transfer and improving scalability and processing speed by 15%.

• Automated CI/CD workflows through Jenkins pipelines, integrating unit and integration testing for data validation and transformation logic. Reduced deployment errors by 75%, ensuring seamless and reliable data releases.

• Designed and implemented Redshift data marts to support fraud analytics, customer segmentation, and transaction monitoring, executing data validation and quality checks across 15+ complex SQL queries to maintain high data integrity.

• Implemented AWS IAM roles, KMS encryption, and S3 lifecycle policies to ensure data governance, security compliance, and cost optimization.
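For example, a boto3 sketch of an S3 lifecycle rule of the kind described; the bucket name, prefix, and retention periods are placeholders:

```python
# Sketch: S3 lifecycle policy that tiers aged objects and expires them later.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="bank-raw-zone",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-landing",
                "Status": "Enabled",
                "Filter": {"Prefix": "landing/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 730},
            }
        ]
    },
)
```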

• Conducted performance benchmarking between AWS Glue and Spark clusters to recommend optimal data processing strategies, reducing overall ETL costs by 25%.

• Created metadata-driven ETL frameworks to dynamically handle schema evolution and automate ingestion from new data sources.

Value Labs, India

Data Engineer Intern Jan 2021 - June 2021

• Configured Azure Data Factory (ADF) pipelines to automate extraction, transformation, and loading of data into ADLS Gen2, improving accuracy and consistency by 80%.

• Developed optimized data models and DAX measures in Power BI for billing, revenue forecasting, and KPI tracking, ensuring reliable and actionable business insights.

• Designed and scheduled scalable ETL workflows using ADF, SQL, and Databricks, including SCD Type 1 & 2 transformations for customer and account dimensions.
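A sketch of an SCD Type 2 upsert on a Delta customer dimension, roughly in the style this bullet describes; the table, columns, and storage path are hypothetical:

```python
# Sketch: SCD Type 2 on a Databricks Delta dimension (close old version, append new).
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, lit
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

updates = spark.read.parquet("abfss://staging@<storage>.dfs.core.windows.net/customers/")
current = spark.table("gold.dim_customer").where("is_current = true")

# Keep only new customers or rows whose tracked attributes changed.
changed_or_new = updates.join(
    current.select("customer_id", "address", "segment"),
    ["customer_id", "address", "segment"],
    "left_anti",
)

# Step 1: close out the currently active version of changed customers.
(DeltaTable.forName(spark, "gold.dim_customer").alias("d")
 .merge(changed_or_new.alias("u"),
        "d.customer_id = u.customer_id AND d.is_current = true")
 .whenMatchedUpdate(set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append the new versions as the current rows.
(changed_or_new
 .withColumn("is_current", lit(True))
 .withColumn("start_date", current_date())
 .withColumn("end_date", lit(None).cast("date"))
 .write.format("delta").mode("append").saveAsTable("gold.dim_customer"))
```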

• Created and managed ingestion pipelines from on-prem financial systems to Azure Blob Storage and Data Lake using Integration Runtime with full audit and data validation checks.

• Developed interactive Power BI dashboards integrated with Synapse datasets, enabling real-time reporting and improving stakeholder satisfaction by 80%.

EDUCATION

University of North Texas, Denton, TX

Master of Science, Data Science (GPA: 3.8/4.0)

Jyothishmathi Institute of Technology and Science, India

Bachelor of Technology, Computer Science & Engineering (GPA: 7.3/10)

CERTIFICATIONS

• AWS Certified Data Engineer - Associate (DEA-C01)

• IBM Data Engineering Professional Certificate (Coursera)

• Associate Data Engineer in SQL, Python & PySpark (DataCamp)

• Oracle Certified Cloud AI Infrastructure Fundamentals


