
Data Engineer Azure

Location:
Peoria, IL
Posted:
October 08, 2025


Abdul Aziz

Cloud Data Engineer

*********.*******@*****.***

+1-234-***-****

PROFESSIONAL SUMMARY

• Cloud Data Engineer with 8 years of experience across all stages of the software application development life cycle for data warehousing and data lake applications, using a variety of software tools.

• Worked in all phases of the Software Development Lifecycle (SDLC) - Analysis, Design and Modeling, Development, System Testing, System Implementation and Maintenance.

• Actively involved in ETL design, coding using Ab Initio ETL tool to meet requirements for extract, transformation, cleansing, and loading of data from source to target data structures.

• Practical understanding of the Data modeling concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.

• Worked on Cloud provisioning tools such as Terraform and CloudFormation.

• Implemented Slowly Changing Dimensions (Type I and Type II) in dimension tables as per requirements.

• Involved in various projects related to Data Modeling, System/Data Analysis, Design and Development for both OLTP and Data warehousing environments.

• Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, and other services in the AWS family.

• Familiar with Spark, Scala, and Kafka.

• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

• Implemented Azure Landing zone architectures as per the application cloud requirements.

• Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing, with experience optimizing ETL workflows.

• Designed and implemented scalable ETL pipelines using AWS Glue, Lambda, and Step Functions to process terabytes of data daily, improving data ingestion efficiency by 35%.

• Optimized data storage and retrieval by setting up Amazon Redshift clusters and S3 data lakes, reducing query execution time by 40%.

• Developed real-time data streaming solutions leveraging Kinesis Data Streams and Firehose, enabling near real-time analytics for business dashboards.

• Developed and optimized big data pipelines on Databricks, leveraging Apache Spark to process and analyze large-scale datasets, reducing processing time by 50%.

• Integrated Databricks with cloud platforms like AWS and Azure for seamless data ingestion, transformation, and machine learning workflows, ensuring high scalability and performance.

• Ensured data security and compliance by implementing IAM roles, VPC configurations, and encryption standards in AWS, adhering to GDPR and HIPAA guidelines.

• Good knowledge of Python, UNIX shell scripting, and Autosys JIL scripts.

• Experienced in creating PySpark notebooks for Databricks (AWS and Azure).

• Involved in data manipulation using Python and Spark scripts for faster data processing.

• Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
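To make the bullet above concrete, here is a minimal PySpark sketch of multi-format extraction and aggregation in Databricks; the paths, column names, and output table are hypothetical placeholders rather than the actual project code.

```python
# Minimal PySpark sketch: read the same logical feed from several file formats,
# conform the schemas, and aggregate usage per customer.
# Paths, columns, and the output table are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

def conform(df):
    """Project each source onto a common schema before the union."""
    return df.select(
        F.col("customer_id").cast("string"),
        F.col("amount").cast("double"),
    )

# The same feed can arrive as Parquet, CSV, or JSON.
orders = (
    conform(spark.read.parquet("/mnt/raw/orders/parquet/"))
    .unionByName(conform(spark.read.option("header", True).csv("/mnt/raw/orders/csv/")))
    .unionByName(conform(spark.read.json("/mnt/raw/orders/json/")))
)

# Aggregate usage per customer to surface consumption patterns.
usage = orders.groupBy("customer_id").agg(
    F.count("*").alias("order_count"),
    F.sum("amount").alias("total_amount"),
)

usage.write.mode("overwrite").saveAsTable("analytics.customer_usage")
```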

TECHNICAL SUMMARY:

Cloud Technologies: AWS, Azure
Databases: Teradata, Oracle 10g/11g, Exadata, MySQL, SSMS, DB2, Cosmos DB, Hadoop Hive, Snowflake
Big Data Technologies: Apache Hadoop, Spark, Kafka, EMR
Other Tools: Autosys, Control-M, HP ALM, Jenkins, IBM UDeploy, GitHub, WinSCP, MS Visio, ServiceNow, AUTOMIC (AROW)
Programming Languages: SQL, PL/SQL, C#, Python (NumPy, Pandas), PySpark, Scala
Operating Systems: Unix, Linux, Windows

ACADEMIC DETAILS:

Bachelor’s in Mechanical Engineering from JNTUA, India (2015).

Master’s in Computer Science from Youngstown State University, OH (2023).

PROFESSIONAL EXPERIENCE:

TIAA, Charlotte, NC

Azure Cloud Data Engineer

Jan 2024 – Present

Environment: Azure Data Factory, Azure Databricks, Azure Data Lake, Azure Landing Zone, MS Visual Studio, GitHub, Terraform, PySpark, Scala, Oracle, Qlik Replicate, SQL, MS Power BI, Apache Kafka, Snowflake.

Roles & Responsibilities:

• Designing ETL pipelines to move data from Oracle to Snowflake using Qlik Replicate.

• Creating STREAMS in Snowflake to capture CDC data.

• Creating TASKS in Snowflake to load data automatically to the Kafka topics (a sketch appears at the end of this section).

• Developed data quality check code using Azure services and PySpark.

• Engaged in data migration from Oracle to the Snowflake data warehouse using ETL pipelines and Qlik Replicate.

• Performing data analysis on Snowflake and Oracle tables using Python; involved in creating STREAMS and TASKS in Snowflake to capture CDC data.

• Created multiple TASKS in Snowflake to load Kafka topics from Snowflake tables and views.

• Worked on creating the JDBC connector for source & targets to communicate with KAFKA.

• Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.

• Implementing and supporting deployments to PROD and NON-PROD.

• Monitoring the jobs during pre-production phase and also jobs running on PROD.

• Worked on PySpark and Spark SQL logics to eliminate the bugs.

• Experience with different kinds of data platform support issues (cubes, data marts) to give end users a better experience when consuming data from these platforms.

• Worked on Creating scalable and modular Azure landing zone architecture to meet various deployment needs.

• Implemented repeatable Azure landing zone infrastructure that applies configurations and controls consistently to every subscription.

• Performed performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and appropriate memory configuration.

• Developed Terraform scripts for the infrastructure orchestration.

• Developed Notebooks and ETL Pipeline in Azure Data Factory (ADF) that process the data.

• Involved in Configuring virtual machines, storage accounts, resource groups.

• Worked on the feasibility of using SFTP with Azure.

• Designed the Sink connector for ADLS to copy the data from Apache Kafka Topics.

• Worked on creating the Operational Data Store on Snowflake.

• Involved in migrating data from the IBM DB2 ODS to the Snowflake ODS.

• Worked on gathering data modeling requirements and on the design and implementation of conceptual, logical, and physical models.
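The STREAM and TASK work described above follows a common Snowflake CDC pattern; the sketch below is illustrative only, with hypothetical connection details and object names, and it assumes a separate Kafka connector publishes the staged rows to the target topic.

```python
# Illustrative sketch of CDC capture with a Snowflake STREAM and a scheduled TASK.
# Account/credentials, warehouse, and object names are hypothetical placeholders;
# a Kafka connector is assumed to read the staging table and publish to the topic.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="SALES_DB", schema="ODS",
)
cur = conn.cursor()

# Stream that captures inserts/updates/deletes (CDC) on the replicated table.
cur.execute("CREATE OR REPLACE STREAM ORDERS_STREAM ON TABLE ORDERS")

# Task that periodically drains the stream into a staging table.
cur.execute("""
    CREATE OR REPLACE TASK LOAD_ORDERS_TO_KAFKA_STAGE
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS
      INSERT INTO ORDERS_KAFKA_STAGE
      SELECT *, METADATA$ACTION AS CDC_ACTION
      FROM ORDERS_STREAM
""")

# Tasks are created suspended; resume to start the schedule.
cur.execute("ALTER TASK LOAD_ORDERS_TO_KAFKA_STAGE RESUME")
cur.close()
conn.close()
```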

HCL /Wells Fargo, Chennai, IND

Azure Cloud Data Engineer

May 2020 – June 2022

Environment: Azure cloud services (Azure Data Factory, Azure Databricks, Azure Data Lake), Azure Landing Zone, MS Visual Studio, GitHub, PySpark, Scala, Terraform, SQL Server, SQL, MS Power BI.

Roles & Responsibilities:

• Analysis, design, and build of modern data solutions using Azure cloud services to support visualization of data.

• Understand the current production state of the application and determine the impact of the new implementation on existing business processes.

• Extract, transform, and load data from different source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, Python, and Azure Data Lake Analytics.

• Data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing of the data in Azure Databricks (a sketch appears at the end of this section).

• Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.

• Responsible for Azure Data Factory job monitoring and troubleshooting the failures and providing the resolution for the ADF jobs failures.

• Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.

• Worked on Terraform scripts to fulfill the cloud infrastructure design requirements.

• Worked on the Parquet file format and other file types.

• Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

• Designed a new Azure landing zone architecture taking into consideration the network, compute, and other Azure Cloud Adoption Framework principles, and deployed Azure resources in the new landing zone.

• Deployed network resources such as VNets, subnets, and VNet peering for connectivity within the Azure landing zone.

• Extensively used Pandas to create shared functions performing DATE, COUNT, and MERGE operations in numerous Azure Databricks notebooks.

• Managed large datasets using Pandas data frames and MySQL.

• Used Pandas Modules to Analyze data stored in Databases and files and to create the reports.

• Experienced in performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and appropriate memory configuration.

• Worked on production bugs, especially in Azure Databricks notebooks, and provided new PySpark and Spark SQL logic to eliminate them.

• Developed notebooks and ETL pipelines in Azure Data Factory (ADF) that process data according to the job trigger.

• Involved in creating Data freshness Dashboard by using Power BI to generate the application health reports.

• Hands-on experience on developing SQL Scripts for automation purpose.

• Worked on Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.

• Involved in database design and development with business intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS packages, SQL Server Analysis Services (SSAS), DAX, OLAP cubes, star schema, and snowflake schema.
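As a rough illustration of the ingestion pattern described in this section (an ADF pipeline triggering a Databricks notebook that lands curated data), here is a minimal PySpark sketch; the storage account, container, paths, and table names are hypothetical placeholders.

```python
# Illustrative Databricks (PySpark) notebook step: ingest a raw CSV from ADLS Gen2
# and write a curated Delta table. Storage account, paths, and table names are
# hypothetical placeholders; an ADF pipeline is assumed to trigger this notebook.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/transactions/2022/06/transactions.csv"

transactions = (spark.read
                .option("header", True)
                .option("inferSchema", True)
                .csv(raw_path))

# Basic cleansing/conformance before landing in the curated zone.
curated = (transactions
           .dropDuplicates(["transaction_id"])
           .withColumn("load_date", F.current_date()))

(curated.write
        .format("delta")
        .mode("append")
        .partitionBy("load_date")
        .saveAsTable("curated.transactions"))
```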

HCL/Cigna Healthcare, Chennai, IND

AWS Data Engineer

June 2018 – April 2020

Environment: Azure, Autosys, SQL, Teradata, Oracle, Unix, IBM UDeploy, Jenkins, Python, Hadoop Hive, MapReduce, Airflow, HDFS, Ab Initio, Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW), AWS services (Amazon S3, EC2, EBS, AWS Glue, Redshift, AWS Athena, EMR).

Roles & Responsibilities:

• Studied in-house requirements for the Data warehouse to be developed, conducted one-on-one sessions with business users to gather data warehouse requirements.

• Analyzed database requirements in detail with the project stakeholders by conducting Joint Requirements Development sessions.

• Developed a conceptual model using Erwin based on requirements analysis.

• Developed normalized Logical and Physical database models to design OLTP system for insurance applications.

• Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin r7.1.

• Worked on critical sandbox-integrations tasks as a part of EME migration.

• Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.

• Supported continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured snapshots for EC2 instances.

• Used the DataFrame API in Scala for working with distributed collections of data organized into named columns, developing predictive analytics using the Apache Spark Scala APIs.

• Worked in AWS Databricks Notebooks to perform Transformation on batch data.

• Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.

• Extracted the data from the flat files and other RDBMS databases into staging area and populated onto Data warehouse.

• Analyze, design, and build modern data solutions using Azure PaaS services to support visualization of data.

• Understand current Production state of application and determine the impact of new implementation on existing business processes.

• Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing of the data in Azure Databricks.

• Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.

• Installed and configured Apache Airflow for workflow management and created workflows in Python (a sketch appears at the end of this section).

• Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.

• Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

• Experienced in performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and appropriate memory configuration.

• Wrote UDFs in Scala and PySpark to meet specific business requirements.

• Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity.

• Hands-on experience on developing SQL Scripts for automation purpose.

• Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS).

• Well versed in SQL Server and T-SQL (DDL and DML), constructing tables and applying normalization/denormalization techniques on database tables.

• Involved in Creating and Updating Clustered and Non-Clustered Indexes to keep up the SQL Server Performance.

• Worked on creating and managing index fragmentation to achieve better query performance; experienced in performance tuning and query optimization.
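A minimal Airflow workflow of the kind referenced above might look like the following sketch; the DAG id, schedule, and task callables are hypothetical, and the operator import path assumes Airflow 2.x.

```python
# Minimal Airflow DAG sketch for a daily ETL workflow.
# DAG id, schedule, and the extract/load callables are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_claims(**context):
    # Placeholder: pull the daily extract from the source system.
    print("extracting claims for", context["ds"])


def load_to_warehouse(**context):
    # Placeholder: load the transformed extract into the warehouse.
    print("loading claims for", context["ds"])


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_claims_etl",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_claims", python_callable=extract_claims)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    extract >> load
```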

HCL/Mastercard, Chennai, IND

AWS Data Engineer

February 2017 - May 2018

Environment:

AWS (EC2, S3, RDS, DynamoDB, Glue, CloudFormation, EMR), Snowflake, Apache Spark (Scala, PySpark, Spark SQL), Hadoop, Hive, HBase, Sqoop, Python, Java, C#, SQL, Terraform, Datadog, SSIS, SnowSQL, AWS Lambda, Oracle, MySQL, SQL Server, Teradata, Redshift, SSRS.

Roles & Responsibilities:

• Involved in Project promotion from DEVELOPMENT to UAT, PRODUCTION environment after creating TAGS.

• Worked on critical sandbox-integrations tasks as a part of EME migration.

• Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.

• Supported continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured snapshots for EC2 instances.

• Used the DataFrame API in Scala for working with distributed collections of data organized into named columns, developing predictive analytics using the Apache Spark Scala APIs.

• Developed Scala scripts using both Data frames/SQL/Data sets and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into OLTP system through Sqoop.

• Developed Hive queries to pre-process the data required for running the business process.

• Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.

• Implemented a generalized solution model using AWS SageMaker.

• Extensive expertise using the core Spark APIs and processing data on an EMR cluster.

• Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena (a sketch appears at the end of this section).

• Worked on creating infrastructures through Terraform and CloudFormation.

• Developed Terraform scripts to form a Cloud infrastructure.

• Automated Datadog Dashboards with Stack through Terraform scripts.

• Created Terraform Scripts for cloud watch alerts.

• Programmed in Hive, Spark SQL, Java, C# and Python to streamline the incoming data and build the data pipelines to get the useful insights, and orchestrated pipelines.

• Worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the data mart (SQL Server) and the Credit Edge server, using the core Spark APIs to process data on an EMR cluster.

• Experience in using and tuning relational databases (e.g. Microsoft SQL Server, Oracle, MySQL) and columnar databases (e.g. Amazon Redshift, Microsoft SQL Data Warehouse)

• Engaged in Capital one Modernization concepts to achieve the ETL logics by using Python, Spark, Scala.

• Performed Data analytics on Data Lake of Capital one using Pyspark on Databricks platform.

• Performed the data validations for ECE cloud files against legacy DDE data files using Data Comparison Tool.

• Use Cloud EC2 instances to execute the ETL jobs and publish data to EFG S3 buckets for external vendors.

• Good working knowledge on Snowflake and Teradata databases.

• Extensively worked on Spark using Scala on the cluster for computational analytics; installed it on top of Hadoop and performed advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.

• Expertise in Snowflake for creating and maintaining tables and views.

• Worked on importing and exporting data from Snowflake, Oracle, and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.

• Architected and implemented very large scale data intelligence solutions around Snowflake Data Warehouse.

• Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.

• Played key role in Migrating Teradata objects into Snowflake environment.

• Heavily involved in testing Snowflake to understand best possible way to use the cloud resources.

• Leveraged Python’s graphics APIs for creating graphics and serialization libraries for encoding data in XML/JSON formats.

• Developed python scripts to parse XML documents and load the data in database.

• Used Snowflake Datamart’s to find the lineage of the data and how it is transformed in staging and distribution projects.

• Designed and developed various SSIS packages (ETL) to extract and transform data and involved in Scheduling SSIS Packages.

• Created ETL metadata reports using SSRS; reports include execution times for the SSIS packages and failure reports with error descriptions.

• Created OLAP applications with OLAP services in SQL Server and built cubes with many dimensions using both star and snowflake schemas.
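The serverless Lambda-to-Glue pattern mentioned above could be sketched roughly as follows; the bucket wiring, Glue job name, and argument keys are hypothetical placeholders.

```python
# Illustrative AWS Lambda handler: when a new object lands in the raw S3 bucket,
# start a Glue job that catalogs/transforms it so the result is queryable in Athena.
# Bucket name, Glue job name, and argument keys are hypothetical placeholders.
import json
import boto3

glue = boto3.client("glue")

GLUE_JOB_NAME = "raw-to-catalog-etl"  # hypothetical Glue job


def lambda_handler(event, context):
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Pass the newly arrived object to the Glue job as arguments.
        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        runs.append(response["JobRunId"])

    return {"statusCode": 200, "body": json.dumps({"job_runs": runs})}
```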

HCL/Airtel, Chennai, IND

Data Analyst

June 2015 – January 2017

Environment: Azure, Ab Initio GDE & Co>Op, Oracle Exadata, Hadoop Hive, Unix, HP ALM, Jenkins, SQL, Spark, Python IDLE, MS Visio, Airflow, Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW).

Roles & Responsibilities:

• Reviewed the data modelers' requirement specifications with the client and provided comments to the manager about the ETL logic.

• Involved in analyzing business needs and documenting functional and technical specifications based on user requirements, with extensive interaction with business users.

• Worked with the project manager to determine needs and to apply and customize existing technology to meet those needs.

• Involved in Designing the ETL process to Extract, transform and load data.

• Developed generic graphs and plans using Ab Initio, using active transformations such as Transform, Partition, De-partition, and Sort components along with different lookup functions.

• Developed PSET generation Unix Shell script by providing values through list file.

• Used checkpoint and phasing to avoid deadlocks and re-run the graph in case of failure.

• Performed transformations of source data with transform components such as Join, Match Sorted, Reformat, Dedup Sorted, and Filter by Expression.

• Made wide use of lookup files when getting data from multiple sources and the data size is limited.

• Keenly involved in creating the data warehouse using data warehouse concepts such as CDC and the other SCD types.

• Hands-on experience creating tables in the Hive database using DDL.

• Developed Python code for the tasks, dependencies, SLA watchers, and time sensors of each job, for workflow management and automation using Airflow.

• Analyze, design, and build modern data solutions using Azure PaaS services to support visualization of data; understand the current production state of the application and determine the impact of the new implementation on existing business processes.

• Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing of the data in Azure Databricks.

• Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.

• Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.

• Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

• Experienced in performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and appropriate memory configuration.

• Wrote UDFs in Scala and PySpark to meet specific business requirements (a sketch appears at the end of this section).

• Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the Sql Activity.

• Hands-on experience on developing SQL Scripts for automation purpose.

• Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS).
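As an illustration of the PySpark UDF work referenced earlier in this section, here is a small sketch; the column names and normalization rule are hypothetical.

```python
# Small PySpark UDF sketch: normalize phone numbers in a subscriber feed.
# Column names and the normalization rule are hypothetical placeholders.
import re

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()


@udf(returnType=StringType())
def normalize_msisdn(raw):
    """Keep digits only and standardize to a 10-digit national number."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)
    return digits[-10:] if len(digits) >= 10 else digits


subscribers = spark.createDataFrame(
    [("  +91-98765 43210 ",), ("98450-12345",)], ["msisdn_raw"]
)
subscribers.withColumn("msisdn", normalize_msisdn("msisdn_raw")).show()
```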


