Pooja Haridasyam
Data Engineer
Charlotte, NC | +1-980-***-**** | *****************@*****.*** | LinkedIn

SUMMARY
Cloud Data Engineer with over 5 years of experience designing scalable, secure, production-ready data pipelines on AWS and Azure. Skilled in building batch and real-time ETL processes, data modeling, and workflow automation with AWS Glue, Lambda, Redshift, Azure Data Factory, Databricks, and Spark. Strong background in distributed computing, CI/CD, and cross-functional collaboration. Proven ability to reduce pipeline latency, enable real-time analytics, and deliver cost-effective data solutions across the healthcare, finance, and insurance industries.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, SQL, Java, JavaScript, Scala, C++
Big Data Technologies: Hadoop, Apache Spark (Spark Core, Spark SQL, Spark Streaming), Hive, Pig, Sqoop, Flume, MapReduce, Apache Kafka, Apache Flink, YARN, ZooKeeper
Frameworks/Libraries: Django, FastAPI, NumPy, Pandas, PySpark, SQLAlchemy, Confluent Kafka
ETL: Azure Data Factory, AWS Glue, Apache Airflow, SSIS, Dataproc, Matillion
Azure Stack: Azure Databricks, Blob Storage, Azure Functions, HDInsight, Stream Analytics, Event Hubs, Logic Apps, Virtual Machines, Azure Service Bus, Power BI
AWS Stack: Athena, S3, Lambda, SNS, EC2, SageMaker, Kinesis, RDS, EMR, IAM, CodePipeline, Route 53, QuickSight, CloudWatch, ECS, Step Functions, Elasticsearch
Databases/Data Warehouses: SQL Server, MySQL, PostgreSQL, MongoDB, DB2, Oracle, DynamoDB, Azure SQL, Cosmos DB, Cassandra, Azure Synapse Analytics, Azure Data Lake Gen2, Snowflake, Amazon Redshift
CI/CD Tools: Docker, Azure DevOps, AWS CodePipeline, Azure Kubernetes Service, Jenkins, Terraform
Version Control/Collaboration Tools: Git, GitHub, GitLab, Jira
Operating Systems: Windows, Linux
EXPERIENCE
Cigna Healthcare, US | Oct 2023 – Present
Data Engineer
• Designed serverless ETL pipelines using AWS Glue, S3, and Redshift, processing over 100 million records per day while meeting sub-5-minute SLAs.
• Built real-time streaming workflows using Kinesis Firehose, Lambda, and S3 to ingest IoT telemetry data.
• Developed ETL pipelines in Azure Databricks and PySpark to transform raw datasets into Parquet format, optimizing storage and query performance.
• Integrated Apache Kafka with Azure Databricks pipelines to stream real-time healthcare data from multiple sources.
• Developed and optimized Oracle SQL queries, stored procedures, and views to support large-scale healthcare data processing and reporting.
• Automated alerting and post-processing using Lambda functions orchestrated with AWS Step Functions.
• Integrated Azure Logic Apps, Power Automate, and Cosmos DB for hybrid cloud claims processing workflows.
• Optimized Databricks job performance by 30% using PySpark caching and partitioning strategies.
• Developed data ingestion pipelines to parse and transform JSON and XML payloads from APIs and batch files into normalized formats for analytics.
• Designed data models and optimized Power BI reports using DAX and query folding to improve load times and reporting efficiency.
• Enabled federated query capabilities using Glue Data Catalog and Athena.
• Integrated GitHub with CI/CD pipelines to automate builds, testing, and deployments for microservices and ETL workflows.
• Set up production-ready CI/CD pipelines using Jenkins, Terraform, and AWS CodePipeline.
• Developed Python scripts for automating ETL workflows, including data extraction from APIs, transformation using Pandas, and loading into Snowflake.
• Integrated Dremio with Snowflake and Azure Data Lake to create virtual datasets, improving query performance and reducing data duplication.
• Created LookML models and explored data with custom dimensions, measures, and filters to deliver tailored insights for business stakeholders.
• Designed and implemented S3 lifecycle policies and Intelligent-Tiering for archival datasets, reducing storage costs by over 40% annually.
TCS, India | Aug 2021 – Dec 2022
Data Engineer
• Built and deployed 30+ Azure Data Factory workflows to ingest large files from on-prem SQL Server, DB2, and FTP into Azure Blob Storage, Synapse, and Data Lake for centralized cloud access.
• Automated schema mapping and data transformations using Data Flows, improving accuracy by 25%.
• Deployed Dockerized Spark jobs to Azure Kubernetes Service for scalable machine learning preprocessing.
• Developed batch data processing workflows in Apache Spark to transform and load structured and semi-structured data into Azure Synapse.
• Modeled fact and dimension tables using Type 2 slowly changing dimensions for historical data tracking.
• Built Azure Monitor dashboards to support reconciliation and performance diagnostics.
• Built Python-based data pipelines incorporating Spark to handle structured and semi-structured data from Azure Data Lake and Synapse.
• Automated reporting workflows using Python scripts integrated with Power BI datasets to generate scheduled business reports.
• Integrated Azure Synapse datasets into Looker for building KPI-driven dashboards, supporting churn analysis and customer retention initiatives.
• Migrated 15 legacy SSIS workflows to ADF pipelines, lowering maintenance effort by 35%.
• Designed and executed complex Oracle PL/SQL scripts for ETL workflows, including data cleansing, transformation, and aggregation.
• Built a metadata-driven pipeline framework in ADF to dynamically ingest multiple sources, reducing onboarding time for new data sources by 60%.
• Integrated Azure Key Vault for secure credential management across pipelines, improving security posture and audit readiness.
Aktrix, India | Jan 2020 – Jul 2021
Associate Data Engineer
• Developed streaming applications using Kafka and Spark Streaming on AWS EMR to handle over 200,000 events per minute from e-commerce platforms.
• Created Glue workflows with S3 data crawlers to apply business rules for financial reconciliation.
• Built RESTful APIs using API Gateway, Lambda, and DynamoDB, achieving sub-100 millisecond latency.
• Implemented pipeline health monitoring using CloudWatch alerts, reducing incident response time by 60%.
• Modeled semi-structured data in Snowflake using JSON and XML, with Snowpipe for continuous ingestion.
• Used Redshift Spectrum to query S3-based logs, saving an average of $5,000 per month in storage costs.
• Automated infrastructure deployment using reusable Terraform modules for Glue, Redshift, and S3.
• Built and maintained Jenkins pipelines with GitHub Actions for testing and production deployment.
• Developed Python-based validation scripts for post-load checks, reducing data issues by 40%.
• Built reusable JavaScript components to streamline visualization rendering in Tableau and custom reporting tools.
• Built Looker visualizations to track e-commerce KPIs, including sales performance, inventory turnover, and customer behavior analytics.
• Configured Dremio data sources to integrate on-premise and cloud data, ensuring a unified access layer for analysts and data scientists.
• Designed Tableau dashboards with calculated fields, KPIs, and filters for e-commerce performance monitoring.
• Triggered Lambda functions via S3 events to enable real-time invoice reconciliation.
• Developed automated data quality checks using PySpark and Great Expectations, improving anomaly detection and reducing bad data propagation by 35%.
• Designed a modular ETL framework using Glue and Lambda functions, enabling rapid onboarding of 10+ new data sources with minimal code changes.
EDUCATION
Master’s in Computer Science
University of North Carolina at Charlotte
Bachelor’s in Computer Science
Malla Reddy Engineering College for Women
CERTIFICATIONS
• AWS Cloud Practitioner (AWS)
• CLP: Advanced Programming in C (Cisco Networking Academy)
• Microsoft Azure Fundamentals (Microsoft)
• AWS Web Application Development Builder (AWS)
• Jenkins Essential Training (LinkedIn)
• Introduction to Monitoring Kubernetes (Datadog)
• Agile Project Management (Google)
• Data Analysis Using R Programming (Google)
• Configuration Management and the Cloud (Google)
• Advanced Google Analytics (Google)