Harsha Vardhan
Email: *****************@*****.***
Mobile: 253-***-****
Senior Data Engineer
PROFESSIONAL SUMMARY
●Data Engineer with 6+ years of experience designing, building, and optimizing cloud-native data platforms and large-scale ETL/ELT pipelines across Azure, AWS, and GCP ecosystems.
●Specialized in Azure Data Factory, Microsoft Fabric, Synapse, and Databricks, enabling metadata-driven frameworks, delta lakehouse implementations, and bronze–silver–gold architectures for enterprise data management.
●Expertise in AWS (Glue, Redshift, S3, Lambda, Kinesis) and GCP (BigQuery, Dataflow, Pub/Sub, Composer) to deliver real-time streaming pipelines, analytics platforms, and machine learning–ready datasets.
●Strong proficiency in SQL, Python, and PySpark for advanced transformations, query optimization, and performance tuning on structured, semi-structured, and unstructured data.
●Experienced in data governance, security, and compliance frameworks (IAM, Key Vault, KMS, Purview, Cloud Monitoring) ensuring data quality, lineage, and adherence to GDPR, HIPAA, and industry standards.
●Adept at data analysis and BI solutions using Power BI, Tableau, and Looker Studio, collaborating with business teams to deliver actionable insights, KPIs, and predictive models that drive decision-making.
TECHNICAL SKILLS
●Cloud - Microsoft Fabric, Azure (ADF, ADLS, Databricks, Synapse, Event Hub, Logic Apps), AWS (Glue, Redshift, S3, Lambda, Kinesis), GCP (BigQuery, Dataflow, Pub/Sub, Composer, Looker Studio)
●Languages - Java, Python, PySpark, SQL, Scala, Shell Script
●Databases - SQL Server, Azure SQL DB, Teradata, Oracle, Cosmos DB
●Big Data Tools - Apache Spark, Hadoop, Event Hub, Kafka, Stream Analytics
●Visualization - Power BI, Tableau
●ETL Tools - SSIS, Informatica
●CI/CD - Azure DevOps, Git, Jenkins, Terraform
●Modeling - Star Schema, Snowflake Schema
PROFESSIONAL EXPERIENCE
March 2023 – Present
Senior Azure Data Engineer
●Engineered scalable ETL/ELT pipelines using Azure Data Factory (ADF) and Microsoft Fabric, integrating metadata-driven frameworks that improved data processing efficiency by 45%.
●Built lakehouse solutions on ADLS Gen2 with Delta Lake and Databricks, implementing bronze–silver–gold architectures for governance, lineage, and optimized analytical performance.
●Developed and optimized PySpark transformations in Azure Databricks, handling multi-terabyte datasets with advanced partitioning, caching, and performance tuning for faster processing.
●Integrated real-time streaming pipelines with Azure Event Hub, Databricks Structured Streaming, and Fabric Dataflows, enabling fraud detection, operational monitoring, and real-time analytics.
●Implemented data governance and security controls with Azure Purview, RBAC, and Key Vault, ensuring compliance with GDPR, HIPAA, and enterprise data privacy policies.
●Collaborated with business teams to deliver Power BI dashboards connected to Fabric and Synapse, providing predictive insights, KPIs, and self-service analytics that reduced reporting time by 35%.
AWS
January 2021 – March 2023
AWS Data Engineer
●Designed and developed scalable ETL/ELT pipelines using AWS Glue, EMR (Spark), and Step Functions, processing multi-terabyte structured and semi-structured datasets with high reliability.
●Built and optimized data lakes and warehouses on Amazon S3, Redshift, and Athena, leveraging partitioning, compression, and Spectrum to reduce query costs and improve performance by 30%.
●Implemented real-time streaming pipelines with Kinesis Data Streams, Firehose, and Lambda, enabling low-latency ingestion and analytics for fraud detection and transaction monitoring.
●Automated infrastructure provisioning and deployment using Terraform and CloudFormation, improving environment setup efficiency and ensuring consistency across accounts.
●Integrated machine learning workflows by connecting SageMaker, Redshift, and S3, streamlining model training, validation, and deployment into production-ready pipelines.
●Applied data governance, monitoring, and security frameworks using AWS IAM, CloudWatch, CloudTrail, and Lake Formation, ensuring compliance with HIPAA, GDPR, and enterprise security standards.
Oracle
April 2019 – January 2021
Data Engineer
●Designed and implemented scalable ETL/ELT pipelines using Cloud Dataflow and Dataproc (Spark) to process structured and semi-structured datasets for analytics and reporting.
●Built and optimized BigQuery data warehouses with clustering, partitioning, and materialized views, reducing query execution time by 40% and cutting storage costs by 25%.
●Developed real-time streaming pipelines leveraging Pub/Sub and Dataflow, enabling near real-time insights for customer behavior, fraud detection, and operational monitoring.
●Automated workflow orchestration with Cloud Composer (Airflow), streamlining compliance reporting and ensuring data pipelines met SLAs across multiple business domains.
●Partnered with business analysts to design interactive dashboards in Looker and Looker Studio, providing KPIs, forecasting, and predictive insights that improved decision-making speed by 30%.
●Applied data governance, monitoring, and security controls using IAM, Cloud Monitoring, Data Catalog, and KMS, ensuring adherence to GDPR and enterprise data privacy standards.
EDUCATION
●Master's in Computer Science – University of Cincinnati
●Bachelor's in Electrical Engineering – MVSR College of Engineering