
Data Engineer

Location: New Paltz, NY
Salary: 60
Posted: May 20, 2025


Mounika Sree

Data Engineer

Email: **************@*****.***

Phone: +1-413-***-**** | LinkedIn

Professional Summary

• 8 years of strong experience as a Data Engineer, delivering end-to-end data solutions across Azure, AWS, and GCP cloud platforms.

• Expertise in designing and building scalable ETL pipelines, real-time data streaming applications, and cloud migration projects using Azure Data Factory, AWS Glue, and GCP Dataflow.

• Proficient in handling big data technologies such as Apache Spark, Spark Streaming, Hive, Kafka, and Hadoop for large-scale data processing and analytics.

• Skilled in working with cloud data warehouses including Azure Synapse Analytics, Snowflake, AWS Redshift, and GCP BigQuery for high-performance querying and reporting.

• Hands-on experience with automation and orchestration tools such as PowerShell scripting, Cloud Shell, AWS Lambda, and Azure Monitor.

• Strong development skills in Python, Scala, PySpark, and SQL for building data integration, transformation, and analysis processes.

• Migrated on-premises systems to cloud platforms, including Azure Data Lake, Azure SQL Database, and Snowflake, ensuring seamless transitions and optimized performance.

• Built and managed secure data environments, implementing GCP IAM, VPC controls, and HIPAA-compliant protocols for handling sensitive healthcare and financial data.

• Developed dynamic data visualization solutions using Power BI and Tableau, delivering actionable business insights through dashboards and reports.

• Experience working in Agile/Scrum development environments, using JIRA for project tracking, sprint planning, and team coordination.

• Adept at ETL development, data ingestion, data transformation, data migration, and real-time streaming to support enterprise-wide analytics needs.

• Strong background in SQL Server, MySQL, and database management, including designing tables, views, stored procedures, and indexes for performance optimization.

Technical Skills

Azure Cloud Technologies: Azure Data Factory, Azure SQL Database, Azure Blob Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Data Lake Analytics, Azure Databricks, Azure Synapse Analytics, Azure Monitor
AWS Cloud Technologies: Amazon S3, AWS Lambda, Amazon EC2, Amazon Redshift, Amazon RDS, AWS Glue, AWS CloudWatch, AWS EMR, AWS SQS

GCP Cloud Technologies: GCP BigQuery, GCS Buckets, GCP Cloud Dataflow, GCP Dataproc, GCP Data Prep, GCP Cloud Functions, GCP Cloud Shell, GCP IAM, GCP VPC Controls
Big Data Technologies: Apache Spark, Spark Streaming, Spark SQL, Hadoop, Hive, Pig, Oozie, Kafka, Apache Beam, Sqoop
Databases/Warehouses: Azure SQL Database, Azure SQL Data Warehouse, Snowflake, MySQL, SQL Server 2012/2016
Orchestration/Automation: Azure Data Factory (ADF), AWS Glue, Apache Oozie, PowerShell Scripts, Jenkins
Programming Languages: Python, Scala, PySpark, SQL
Data Visualization: Power BI, Tableau, SFDC, MS Project, MS Excel, MS SharePoint
Data Storage: Azure Blob Storage, Azure Data Lake Storage, AWS S3, GCP GCS Buckets, HDFS
DevOps Tools: Jenkins, Cloud Shell, gsutil, bq command-line utilities
Agile Methodologies: Agile (Scrum), JIRA, SDLC

Other Tools: SSMS (SQL Server Management Studio), BIDS (Business Intelligence Development Studio), ETL Pipelines, Data Ingestion, Data Transformation, Data Migration, Real-time Data Streaming, GCP IAM, VPC Controls, Spark UI, Azure Monitor, AWS CloudWatch

Certifications

• Microsoft Certified: Azure Database Administrator Associate – DP-300

• Microsoft Certified: Power BI Data Analyst Associate – PL-300

Professional Experience

Optum, AZ, USA

Data Engineer Jan 2023 – Present

• Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse.

• Used Azure Data Factory extensively for ingesting data from disparate source systems.

• Used Azure Data Factory as an orchestration tool for integrating data from upstream to downstream systems.

• Migrated services from on-premises to Azure cloud environments; collaborated with development and QA teams to maintain high-quality deployments.

• Developed multiple POCs using Scala and compared the performance of Spark with Hive and SQL/Teradata.

• Developed analytical components using Scala, Spark, and Spark Streaming. Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.

• Hands-on experience developing PowerShell scripts for automation; used PowerShell to migrate pipelines, linked services, and datasets from Dev to QA and Prod.
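
The bullet above refers to PowerShell; a rough Python equivalent using the azure-mgmt-datafactory SDK is sketched below. The subscription, resource-group, factory, and pipeline names are hypothetical placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
client = DataFactoryManagementClient(credential, "<subscription-id>")

# Read the pipeline definition from the Dev factory...
pipeline = client.pipelines.get("rg-dev", "adf-dev", "CopySalesData")

# ...and recreate it unchanged in the QA factory.
client.pipelines.create_or_update("rg-qa", "adf-qa", "CopySalesData", pipeline)
```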

• Designed SSIS packages using Business Intelligence Development Studio (BIDS) to extract data from various data sources and load it into SQL Server 2012 and 2016 databases for further data analysis and reporting, using multiple transformations.

• Created and maintained SQL database objects such as tables, views, indexes, and stored procedures with SQL Server Management Studio (SSMS).

• Used the Kafka publish-subscribe messaging system, creating topics with producers and consumers to ingest data into the application for Spark to process, and created Kafka topics for application and system logs.
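
A minimal sketch of this producer/consumer pattern in Python; the broker address and topic name are hypothetical, and the Spark side assumes the spark-sql-kafka connector package is on the classpath.

```python
import json
from kafka import KafkaProducer  # kafka-python client

# Producer side: publish JSON-encoded log events to a Kafka topic.
producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("app-logs", {"level": "INFO", "msg": "pipeline started"})
producer.flush()

# Consumer side: Spark Structured Streaming subscribes to the same topic.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "app-logs")
    .load()
    .select(col("value").cast("string").alias("event"))
)
query = events.writeStream.format("console").start()
query.awaitTermination()
```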

• Performed migration of SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlled and granted database access, and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.

• Created various pipelines to load data from Azure Data Lake into a staging SQL DB and then into Azure SQL DB.

• Performed ETL using Azure Databricks. Migrated the on-premises Oracle ETL process to Azure Synapse Analytics.

• Experienced in building end-to-end ETL pipelines to extract, transform, and load data from various sources into Synapse Analytics using Azure Data Factory.

Environment: Azure Data Factory, Azure SQL Database, Azure Blob Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Data Lake Analytics, Azure Databricks, Azure Synapse Analytics, SQL Server 2012/2016, Hadoop, Kafka, Hive, Teradata, Oracle, PowerShell, BIDS, SSMS.

Cognizant, Bangalore, India

Data Engineer Aug 2019 – Sep 2022

• Designed and set up an enterprise data lake to support data analytics, rapidly changing data, and processing.

• Created a Lambda deployment function and configured it to receive events from an S3 bucket.
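
A minimal sketch of such a Lambda handler wired to S3 event notifications; the processing in the loop body is a placeholder.

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Each record in an S3 event notification identifies one new object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        print(f"received s3://{bucket}/{key} ({len(body)} bytes)")
```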

• Responsible for ingesting data from multiple source systems into the AWS cloud and making it available to business users through AWS Redshift.

• Developed multiple AWS Lambda functions and API Gateways, so that data submitted via API Gateway is accessible to a Lambda function.

• Developed data pipelines that loaded data from DynamoDB into an AWS S3 bucket and then into an HDFS location using Hive.
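
As a sketch of the DynamoDB-to-S3 leg, assuming hypothetical table and bucket names:

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

table = dynamodb.Table("orders")
items, resp = [], table.scan()
items.extend(resp["Items"])
while "LastEvaluatedKey" in resp:  # paginate through the whole table
    resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])
    items.extend(resp["Items"])

s3.put_object(
    Bucket="data-lake-raw",
    Key="orders/export.json",
    Body=json.dumps(items, default=str),  # default=str serializes Decimal values
)
```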

• Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to turn raw inputs into useful data.

• Responsible for handling streaming data from web server console logs, using Spark with SQS to stream the data in real time.

• Worked on data migration from AWS S3 and staging tables to Snowflake.

• Extensive experience migrating data from RDBMS to the Snowflake data warehouse.

• Developed ETL pipelines using a combination of Python and Snowflake SnowSQL, writing SQL queries against Snowflake.
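
A minimal sketch of this Python-to-Snowflake pattern using the snowflake-connector-python package; the connection parameters, stage, and table names are all hypothetical.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
)
cur = conn.cursor()

# Load files from an external S3 stage into a staging table.
cur.execute(
    "COPY INTO staging.orders FROM @s3_stage/orders/ FILE_FORMAT = (TYPE = CSV)"
)
cur.execute("SELECT COUNT(*) FROM staging.orders")
print(cur.fetchone()[0])
conn.close()
```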

• Worked on advanced Snowflake concepts, including setting up resource monitors, data sharing, role-based access control, Snowpipe, and zero-copy cloning.

• Experience deploying the project through Jenkins, working with the Git version control system.

• Worked with the Agile (Scrum) methodology, participated in daily scrum meetings, and was actively involved in sprint planning and product backlog creation.

• Utilized SDLC and Agile methodologies such as Scrum.

• Developed and configured AWS services such as Glue, EC2, and S3 using Python Boto3.
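
For illustration, a small Boto3 sketch that uploads a file to S3 and triggers a Glue job on it; the bucket and job names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Land a raw file in S3, then kick off the Glue job that processes it.
s3.upload_file("daily_extract.csv", "raw-zone-bucket", "input/daily_extract.csv")
run = glue.start_job_run(
    JobName="raw-to-persistence",
    Arguments={"--input_path": "s3://raw-zone-bucket/input/"},
)

# Poll the run state (a real pipeline would wait or use an event trigger).
status = glue.get_job_run(JobName="raw-to-persistence", RunId=run["JobRunId"])
print(status["JobRun"]["JobRunState"])
```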

• Implemented Python and PySpark in AWS Glue and Lambda functions to create on-demand files on S3.

• Processed S3 data from the initial stage through the persistence stage by creating Glue jobs.

• Deployed AWS CloudWatch to create alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2.
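
A minimal sketch of creating such a CloudWatch alarm with Boto3; the function name and SNS topic ARN are hypothetical.

```python
import boto3

cw = boto3.client("cloudwatch")

# Alarm whenever the Lambda function reports any errors in a 5-minute window.
cw.put_metric_alarm(
    AlarmName="json-converter-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "json-converter"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-eng-alerts"],
)
```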

• Wrote AWS Lambda functions in Python to sort, compare, and convert nested JSON files.

Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, S3, Lambda, EC2, Redshift, RDS, Glue, Python, PySpark, MySQL, Git, and Agile methodologies.

Virtusa, Hyderabad, India

Data Engineer Jan 2017 – Aug 2019

• Designed and deployed scalable data warehousing solutions using GCP BigQuery, optimizing storage and query execution to support healthcare analytics and reporting.
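
As an illustration of querying such a warehouse, a minimal google-cloud-bigquery sketch; the project, dataset, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client(project="healthcare-analytics")

# Date-filtered aggregate over a (hypothetical) partitioned claims table.
sql = """
    SELECT claim_date, COUNT(*) AS claims
    FROM `healthcare-analytics.claims.daily`
    WHERE claim_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY claim_date
    ORDER BY claim_date
"""
for row in client.query(sql).result():
    print(row.claim_date, row.claims)
```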

• Implemented secure data ingestion pipelines in GCP Cloud Dataflow and Dataproc, significantly improving data availability and processing time for real-time decision making.

• Configured and managed GCS Buckets for efficient data storage and access, ensuring data integrity and compliance with healthcare data protection regulations.

• Developed complex ETL processes using Apache Beam and GCP Data Prep, facilitating seamless data transformations and integrations across various sources including Salesforce.
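
A minimal Apache Beam sketch of this read-transform-write shape; the bucket paths and patient_id field are hypothetical, and apache-beam[gcp] is assumed for the gs:// I/O.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Runs locally on the DirectRunner; on GCP you would pass runner="DataflowRunner".
options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://raw-zone/events/*.json")
        | "Parse" >> beam.Map(json.loads)
        | "KeepValid" >> beam.Filter(lambda e: "patient_id" in e)
        | "Serialize" >> beam.Map(json.dumps)
        | "Write" >> beam.io.WriteToText("gs://curated-zone/events/out")
    )
```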

• Automated data operations using Cloud Functions and Cloud Shell scripts, enhancing system efficiency and reducing manual intervention by 50%.
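
A sketch of one such automation: a background Cloud Function that fires on a GCS object-finalize event and files the object under a processed/ prefix. The bucket layout is an assumption.

```python
from google.cloud import storage

def on_upload(event, context):
    # For GCS triggers, the event dict carries the bucket and object name.
    bucket_name, blob_name = event["bucket"], event["name"]
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    # Copy into processed/ and remove the original.
    bucket.copy_blob(blob, bucket, f"processed/{blob_name}")
    blob.delete()
```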

• Optimized SQL queries and database schemas in MySQL, improving performance and scalability for high-volume healthcare data.

• Integrated external data sources with Snowflake, creating a centralized data platform that supports advanced analytics and machine learning models.

• Utilized Power BI for dynamic data visualization, delivering actionable insights through custom dashboards and reports for healthcare stakeholders.

• Developed and maintained real-time analytics solutions using Spark and Spark-SQL, providing timely insights into patient care and operational metrics.

• Programmed scalable data processing jobs in Python and Scala, leveraging Apache Hive and Sqoop for complex data aggregation and summarization.

• Designed data security protocols using GCP IAM and VPC controls, safeguarding sensitive patient data and ensuring compliance with HIPAA regulations.

• Led cross-functional teams in agile development environments, using JIRA to track progress and coordinate between project stakeholders.

• Conducted data quality audits and performance optimizations, utilizing SQL Server analysis techniques to maintain high standards of data accuracy and reliability.

Environment: GCP, GCP BigQuery, GCS Buckets, Google Cloud Functions, Apache Beam, GCP Cloud Dataflow, GCP Dataproc, GCP Data Prep, Snowflake, Power BI, Cloud SQL, MySQL, Cloud Shell, gsutil, bq command-line utilities, SQL Server, Python, Scala, Spark, Hive, Spark SQL.

Education

• Master of Science in Computer Science – Northern Arizona University, Flagstaff, Arizona

• Bachelor of Technology in Computer Science – MLR Institute of Technology, Hyderabad, Telangana


