Pooja Haridasyam
Data Engineer
Charlotte, NC | +1-980-***-**** | *****************@*****.*** | LinkedIn

SUMMARY
Cloud Data Engineer with over 5 years of experience designing scalable, secure, production-ready data pipelines on AWS and Azure. Skilled in building batch and real-time ETL processes, data modeling, and workflow automation with AWS Glue, Lambda, Redshift, Azure Data Factory, Databricks, and Spark. Strong background in distributed computing, CI/CD, and cross-functional collaboration. Proven ability to reduce pipeline latency, enable real-time analytics, and deliver cost-effective data solutions across the healthcare, finance, and insurance industries.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, SQL, Java, JavaScript, Scala, C++
Big Data Technologies: Hadoop, Apache Spark (Spark Core, Spark SQL, Spark Streaming), Hive, Pig, Sqoop, Flume, MapReduce, Apache Kafka, Apache Flink, YARN, ZooKeeper
Frameworks/Libraries: Django, FastAPI, NumPy, Pandas, PySpark, SQLAlchemy, Confluent Kafka
ETL: Azure Data Factory, AWS Glue, Apache Airflow, SSIS, Dataproc, Matillion
Azure Stack: Azure Databricks, Blob Storage, Azure Functions, HDInsight, Stream Analytics, Event Hubs, Logic Apps, Virtual Machines, Azure Service Bus, Power BI
AWS Stack: Athena, S3, Lambda, SNS, EC2, SageMaker, Kinesis, RDS, EMR, IAM, CodePipeline, Route 53, QuickSight, CloudWatch, ECS, Step Functions, Elasticsearch
Databases/Data Warehouses: SQL Server, MySQL, PostgreSQL, MongoDB, DB2, Oracle, DynamoDB, Azure SQL, Cosmos DB, Cassandra, Azure Synapse Analytics, Azure Data Lake Gen2, Snowflake, Amazon Redshift
CI/CD Tools: Docker, Azure DevOps, AWS CodePipeline, Azure Kubernetes Service, Jenkins, Terraform
Version Control/Collaboration Tools: Git, GitHub, GitLab, Jira
Operating Systems: Windows, Linux
EXPERIENCE
Cigna Healthcare, US | Oct 2023 – Present
Data Engineer
• Designed serverless ETL pipelines using AWS Glue, S3, and Redshift, processing over 100 million records per day while meeting sub-5-minute SLAs.
• Built real-time streaming workflows using Kinesis Firehose, Lambda, and S3 to ingest IoT telemetry data.
• Developed ETL pipelines in Azure Databricks and PySpark to transform raw datasets into Parquet format, optimizing storage and query performance.
• Integrated Apache Kafka with Azure Databricks pipelines to stream real-time healthcare data from multiple sources.
• Developed and optimized Oracle SQL queries, stored procedures, and views to support large-scale healthcare data processing and reporting.
• Automated alerting and post-processing using Lambda functions orchestrated with AWS Step Functions.
• Integrated Azure Logic Apps, Power Automate, and Cosmos DB for hybrid cloud claims processing workflows.
• Optimized Databricks job performance by 30% using PySpark caching and partitioning strategies.
• Developed data ingestion pipelines to parse and transform JSON and XML payloads from APIs and batch files into normalized formats for analytics.
• Designed data models and optimized Power BI reports using DAX and query folding to improve load times and reporting efficiency.
• Enabled federated query capabilities using Glue Data Catalog and Athena.
• Integrated GitHub with CI/CD pipelines to automate builds, testing, and deployments for microservices and ETL workflows.
• Set up production-ready CI/CD pipelines using Jenkins, Terraform, and AWS CodePipeline.
• Developed Python scripts for automating ETL workflows, including data extraction from APIs, transformation using Pandas, and loading into Snowflake.
• Integrated Dremio with Snowflake and Azure Data Lake to create virtual datasets, improving query performance and reducing data duplication.
• Created LookML models and explored data with custom dimensions, measures, and filters to deliver tailored insights for business stakeholders.
• Designed and implemented S3 lifecycle policies and Intelligent-Tiering for archival datasets, reducing storage costs by over 40% annually.
TCS, India | Aug 2021 – Dec 2022
Data Engineer
• Built and deployed 30+ Azure Data Factory workflows to ingest large files from on-prem SQL Server, DB2, and FTP into Azure Blob Storage, Synapse, and Data Lake for centralized cloud access.
• Automated schema mapping and data transformations using Data Flows, improving accuracy by 25%.
• Deployed Dockerized Spark jobs to Azure Kubernetes Service for scalable machine learning preprocessing.
• Developed batch data processing workflows in Apache Spark to transform and load structured and semi-structured data into Azure Synapse.
• Modeled fact and dimension tables using Type 2 slowly changing dimensions for historical data tracking.
• Built Azure Monitor dashboards to support reconciliation and performance diagnostics.
• Built Python-based data pipelines incorporating Spark to handle structured and semi-structured data from Azure Data Lake and Synapse.
• Automated reporting workflows using Python scripts integrated with Power BI datasets to generate scheduled business reports.
• Integrated Azure Synapse datasets into Looker for building KPI-driven dashboards, supporting churn analysis and customer retention initiatives.
• Migrated 15 legacy SSIS workflows to ADF pipelines, lowering maintenance effort by 35%.
• Designed and executed complex Oracle PL/SQL scripts for ETL workflows, including data cleansing, transformation, and aggregation.
• Built a metadata-driven pipeline framework in ADF to dynamically ingest multiple sources, reducing onboarding time for new data sources by 60%.
• Integrated Azure Key Vault for secure credential management across pipelines, improving security posture and audit readiness.
Aktrix, India | Jan 2020 – Jul 2021
Associate Data Engineer
• Developed streaming applications using Kafka and Spark Streaming on AWS EMR to handle over 200,000 events per minute from e-commerce platforms.
• Created Glue workflows with S3 data crawlers to apply business rules for financial reconciliation.
• Built RESTful APIs using API Gateway, Lambda, and DynamoDB, achieving sub-100 millisecond latency.
• Implemented pipeline health monitoring using CloudWatch alerts, reducing incident response time by 60%.
• Modeled semi-structured data in Snowflake using JSON and XML, with Snowpipe for continuous ingestion.
• Used Redshift Spectrum to query S3-based logs, saving an average of $5,000 per month in storage costs.
• Automated infrastructure deployment using reusable Terraform modules for Glue, Redshift, and S3.
• Built and maintained Jenkins pipelines with GitHub Actions for testing and production deployment.
• Developed Python-based validation scripts for post-load checks, reducing data issues by 40%.
• Built reusable JavaScript components to streamline visualization rendering in Tableau and custom reporting tools.
• Built Looker visualizations to track e-commerce KPIs, including sales performance, inventory turnover, and customer behavior analytics.
• Configured Dremio data sources to integrate on-premise and cloud data, ensuring a unified access layer for analysts and data scientists.
• Designed Tableau dashboards with calculated fields, KPIs, and filters for e-commerce performance monitoring.
• Triggered Lambda functions via S3 events to enable real-time invoice reconciliation.
• Developed automated data quality checks using PySpark and Great Expectations, improving anomaly detection and reducing bad data propagation by 35%.
• Designed a modular ETL framework using Glue and Lambda functions, enabling rapid onboarding of 10+ new data sources with minimal code changes.
EDUCATION
Master’s in Computer Science
University of North Carolina at Charlotte
Bachelor’s in Computer Science
Malla Reddy Engineering College for Women
CERTIFICATIONS
• AWS Cloud Practitioner (AWS)
• CLP: Advanced Programming in C (Cisco Networking Academy)
• Microsoft Azure Fundamentals (Microsoft)
• AWS Web Application Development Builder (AWS)
• Jenkins Essential Training (LinkedIn)
• Introduction to Monitoring Kubernetes (Datadog)
• Agile Project Management (Google)
• Data Analysis Using R Programming (Google)
• Configuration Management and the Cloud (Google)
• Advanced Google Analytics (Google)