Azure Data Engineer

Location:
Williamsville, NY, 14221
Salary:
$50/hr
Posted:
February 21, 2025

Resume:

Rishitha Reddy

Senior Data Engineer

Email: ****************@*****.*** | Phone: +1-716-***-****

www.linkedin.com/in/rishitha-reddy-733901349

Summary

9+ years of IT experience specializing in Data Engineering and the development of scalable data pipelines across sectors including Hospitality, Entertainment, Healthcare, Retail, and Finance.

Expert in Google Cloud Platform (GCP) technologies, including BigQuery, Cloud Storage (GCS Buckets), Cloud Functions, Cloud Dataflow, Pub/Sub, Dataproc, Cloud Shell, GSUTIL, and BQ Command Line Utilities, enabling efficient real-time data processing and engineering workflows.

Extensive experience with AWS services, such as Elastic MapReduce (EMR), Redshift, S3, EC2, Lambda, and Glue, including configuring auto-scaling servers and Elastic Load Balancing (ELB) for high availability and performance optimization.

Proficient in Azure services, including Azure Data Lake Storage (ADLS), Azure Data Factory (ADF), Stream Analytics, Azure SQL Data Warehouse, Data Lake Analytics, and Databricks, with a focus on building and managing robust data platforms.

Designed and implemented end-to-end GCP pipelines for data ingestion, transformation, and processing using Cloud Dataflow, BigQuery, and Pub/Sub, facilitating integration of structured and semi-structured data sources.

Migrated traditional SQL databases to Amazon Redshift, leveraging distribution and sort key strategies to enhance query performance and scalability.

Built ETL pipelines with Azure Data Factory (ADF), Azure SQL Data Warehouse, and Databricks, enabling advanced data transformation for large-scale analytics.

Developed high-performance Big Data solutions using GCP Dataproc, integrating seamlessly with BigQuery and Cloud Storage for warehousing and analytics.

Automated ETL processes with AWS Glue, streamlining integration with AWS services to optimize data workflows.

Designed robust data pipelines on Azure using ADLS, Data Factory, and Databricks, delivering insights-driven analytics with integrated reporting features.

Delivered scalable data processing systems utilizing AWS EMR, Redshift, and Lambda, supported by S3 for cost-effective storage solutions.

Analyzed and processed real-time data streams using GCP BigQuery and Cloud Dataflow, enabling business-driven insights.

Built advanced data processing frameworks using Azure Stream Analytics, Data Lake Analytics, and SQL Data Warehouse to manage large-scale workflows.

Automated workflows and data ingestion pipelines with AWS Lambda and GCP Cloud Functions, ensuring operational efficiency and error-free data transformations.

Engineered efficient ETL solutions using Azure Data Factory (ADF), GCP Dataflow, and AWS Glue, enabling hybrid cloud integration for data movement and transformation.

Developed serverless data workflows using GCP Cloud Functions, Pub/Sub, and BigQuery, enabling real-time analytics with minimal operational complexity.
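
For illustration, a minimal sketch of the serverless pattern described above, assuming a hypothetical project, dataset, and table: a Pub/Sub-triggered Cloud Function decodes each message and streams it into BigQuery.

    import base64
    import json
    from google.cloud import bigquery

    BQ_TABLE = "my-project.analytics.events"  # hypothetical fully qualified table

    bq_client = bigquery.Client()

    def ingest_event(event, context):
        """Pub/Sub-triggered Cloud Function: decode the message and stream it to BigQuery."""
        record = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        errors = bq_client.insert_rows_json(BQ_TABLE, [record])
        if errors:
            # Raise so the failure surfaces in Cloud Logging and the message can be retried.
            raise RuntimeError(f"BigQuery streaming insert failed: {errors}")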

Implemented comprehensive data security and compliance strategies across AWS, Azure, and GCP, utilizing encryption, IAM roles, and data masking to meet regulatory standards.

Built hybrid data pipelines integrating AWS S3, Azure Data Lake, and GCP Cloud Storage, ensuring smooth data transfer and seamless analytics across multi-cloud environments.

Technical Skills

Big Data Ecosystem

Apache Spark, HDFS, YARN, MapReduce, Kafka, Hive, Sqoop, Oozie

Google Cloud Platform

BigQuery, Cloud Storage, Cloud Functions, Cloud Run, Dataflow, Cloud SQL, Dataplex, Dataprep, Dataform, Cloud Pub/Sub

Operating Systems

Linux, Windows, Unix, macOS

Databases

Oracle 11g/10g, MySQL, MS-SQL Server, DB2, Teradata

Version Control Tools

Git

Tools Used

Eclipse, PuTTY, WinSCP, NetBeans, QlikView, IntelliJ

Methodologies

Agile/Scrum, Rational Unified Process and Waterfall

Monitoring Tools

Google Cloud Logging, Google Cloud Trace

Scripting Languages

Python, bash/Shell scripting

Programming Languages

Python, Scala, SQL, PL/SQL, Linux Shell Scripts

Data Process/ETL

Data Pipelines, Data Flow Ingestion, Qlik Replicate, Terraform, Tekton, Airflow

Professional Experience

Client: Old National Bank, Evansville, Indiana May 2023 to Present

Role: Sr. Data Engineer

Responsibilities:

Acquired extensive experience with Google Cloud Platform (GCP) tools, including BigQuery, Cloud Dataproc, Cloud Functions, Composer (Airflow), and GKE, to design and build scalable ETL pipelines for data processing and analytics.

Developed and implemented data engineering workflows using Azure Databricks, Azure Data Lake, and Blob Storage, efficiently handling large datasets for analytics and machine learning applications.

Constructed and maintained ETL pipelines on Azure Data Factory (ADF) and Azure Databricks, supporting both real-time and batch data transformations and migrations.

Designed and orchestrated ETL workflows on GCP Composer (Airflow) by developing custom operators, enabling seamless management of data flows across storage and compute layers.
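
As a sketch of that custom-operator approach (bucket, dataset, and DAG names here are hypothetical), a small operator that loads newline-delimited JSON from Cloud Storage into BigQuery, wired into a daily DAG:

    from datetime import datetime
    from airflow import DAG
    from airflow.models import BaseOperator
    from google.cloud import bigquery

    class GcsToBigQueryOperator(BaseOperator):
        """Custom operator: run a BigQuery load job from a GCS URI."""
        def __init__(self, source_uri, destination_table, **kwargs):
            super().__init__(**kwargs)
            self.source_uri = source_uri
            self.destination_table = destination_table

        def execute(self, context):
            client = bigquery.Client()
            job_config = bigquery.LoadJobConfig(
                source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
                write_disposition="WRITE_APPEND",
                autodetect=True,
            )
            load_job = client.load_table_from_uri(
                self.source_uri, self.destination_table, job_config=job_config
            )
            load_job.result()  # block until the load job finishes

    with DAG(
        dag_id="gcs_to_bq_daily",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_events = GcsToBigQueryOperator(
            task_id="load_events",
            source_uri="gs://example-bucket/events/*.json",   # hypothetical bucket
            destination_table="my-project.analytics.events",  # hypothetical table
        )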

Automated data workflows for large-scale datasets using GCP Dataproc and BigQuery, achieving optimized query performance for structured and semi-structured data.

Processed and transformed data with Python, utilizing libraries such as Pandas, NumPy, and PySpark to enable efficient data manipulation and advanced analytics.
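
A small illustrative example of this kind of transformation work, using PySpark for the full-scale aggregation and Pandas for quick profiling of a sample (paths and column names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("transactions_cleanup").getOrCreate()

    # Full-scale transformation in PySpark: parse timestamps, drop incomplete rows, aggregate.
    df = (
        spark.read.parquet("gs://example-bucket/raw/transactions/")  # hypothetical path
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .dropna(subset=["account_id", "amount"])
    )
    daily = df.groupBy(F.to_date("event_ts").alias("day")).agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("account_id").alias("active_accounts"),
    )

    # Pull a small sample back as a Pandas DataFrame for quick profiling.
    print(daily.limit(100).toPandas().describe())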

Created real-time analytics and predictive modeling pipelines using Azure Databricks, integrating sophisticated machine learning models to derive actionable insights.

Deployed and managed containerized data applications on Google Kubernetes Engine (GKE), ensuring high availability and effective resource utilization.

Built custom operators in GCP Airflow to integrate cloud storage, compute resources, and database services, enabling seamless orchestration and data integration.

Automated data ingestion, transformation, and validation pipelines using Python, Airflow, and Spark, ensuring reliability and accuracy in data workflows.

Engineered high-performance data pipelines leveraging GCP BigQuery and Cloud Functions, handling large-scale data ingestion and analytics tasks efficiently.

Orchestrated and processed data from varied sources through Azure Data Factory and Databricks, ensuring clean and high-quality outputs for downstream applications.

Managed and processed real-time streaming data using GCP Pub/Sub and Azure Stream Analytics, delivering actionable insights for business use cases.

Scaled data processing workloads effectively on Google Kubernetes Engine (GKE) to optimize resource utilization and maintain high system availability.

Integrated Azure Data Lake and Blob Storage with Databricks to create advanced analytics and machine learning pipelines, enabling data-driven decision-making.

Environment: Python, SQL, GCP BigQuery, GCP Cloud Dataproc, GCP Cloud Functions, GCP Composer (Airflow), GKE (Google Kubernetes Engine), Azure Databricks, Azure Data Lake, Blob Storage, Azure Data Factory (ADF), AWS EC2, AWS S3 Buckets, AWS Lambda, AWS Glue, Amazon Redshift, Hadoop, PySpark, Spark Streaming, Apache Kafka, Hive, HDFS, Sqoop, Power BI, SaaS, Pandas, NumPy, Terraform, Git, Apache Airflow, Talend, MySQL, Teradata, Spark, Scala, PowerShell.

Client: UPMC Health System, Pittsburgh Feb 2021 to May 2023

Role: Sr. Data Engineer

Responsibilities:

Collaborated with business and user groups to gather requirements and design data pipelines for optimized workflows.

Gained expertise in AWS databases, including RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis), for scalable data storage and retrieval.

Designed and developed ETL pipelines using AWS Glue and PySpark, integrating data from S3 and loading it into Redshift data marts.
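
For illustration, a condensed Glue job script of that shape, reading JSON from S3 and loading it into a Redshift data mart; the bucket, Glue connection, and table names are hypothetical:

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw JSON files from S3 into a DynamicFrame (path is hypothetical).
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/raw/orders/"]},
        format="json",
    )

    # Light cleanup in plain Spark before loading into the mart.
    cleaned = raw.toDF().dropDuplicates(["order_id"]).filter("amount IS NOT NULL")

    # Load into Redshift through a Glue catalog connection (names are hypothetical).
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=DynamicFrame.fromDF(cleaned, glue_context, "cleaned_orders"),
        catalog_connection="redshift-conn",
        connection_options={"dbtable": "mart.orders", "database": "analytics"},
        redshift_tmp_dir="s3://example-bucket/tmp/redshift/",
    )
    job.commit()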

Managed AWS security groups, implemented fault-tolerant and auto-scaling architectures with Terraform templates, and automated CI/CD pipelines using AWS Lambda and CodePipeline.

Configured AWS S3 buckets and Glacier for secure and highly available data storage, backup, and retrieval.

Worked with GCP tools like Dataproc, BigQuery, Cloud Functions, and Cloud Storage (GCS) to build scalable and efficient data engineering solutions.

Configured GCP Cloud Shell SDK to set up services such as Dataproc, Storage, and BigQuery, and migrated data from on-premises systems to Google Cloud.

Processed and analyzed streaming data with Spark Streaming and Apache Kafka, leveraging Delta Lakes on GCP Dataproc for unified data management and ACID transactions.
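
As an illustrative sketch of that streaming pattern (broker, topic, and storage paths are hypothetical, and the Delta Lake connector is assumed to be available on the cluster), a Structured Streaming job that reads from Kafka and appends to a Delta table on GCS:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("kafka_to_delta").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    # Read the Kafka topic as a stream (broker and topic names are hypothetical).
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "transactions")
        .option("startingOffsets", "latest")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Append to a Delta table with exactly-once semantics via checkpointing.
    query = (
        events.writeStream.format("delta")
        .option("checkpointLocation", "gs://example-bucket/checkpoints/transactions/")
        .outputMode("append")
        .start("gs://example-bucket/delta/transactions/")
    )
    query.awaitTermination()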

Automated ETL workflows using Apache Airflow on GCP, integrating structured and unstructured data into BigQuery for real-time analytics.

Built interactive dashboards and data visualizations using Tableau and Power BI, sourcing data from BigQuery, GCS, and AWS S3.

Migrated and optimized cross-platform data engineering workflows, integrating AWS Glue, S3, Redshift, and GCP BigQuery.

Designed and implemented data migration pipelines from on-premises systems to GCP Dataproc, BigQuery, and Cloud Storage, ensuring seamless integration and high data availability.

Developed and maintained real-time data processing workflows on GCP Dataproc and AWS EMR using Spark Streaming and Kafka.

Created and automated ETL pipelines with AWS Glue and GCP Composer (Airflow) to streamline data extraction, transformation, and loading across multi-cloud environments.

Configured and monitored data lakes on AWS S3 and GCP Cloud Storage, enabling scalable solutions for structured and unstructured data storage.

Deployed and managed containerized data processing applications on Google Kubernetes Engine (GKE), integrating with AWS Lambda for event-driven workflows.

Environment: AWS RDS (Aurora), AWS Redshift, AWS DynamoDB, AWS ElastiCache (Memcached & Redis), AWS Glue, AWS S3, AWS Glacier, AWS Lambda, AWS CodePipeline, AWS EMR, GCP BigQuery, GCP Dataproc, GCP Cloud Functions, GCP Cloud Storage, GCP Composer (Airflow), GCP Cloud Shell, GKE (Google Kubernetes Engine), Spark, Scala, PySpark, Spark Streaming, Apache Kafka, Delta Lake, Terraform, Tableau, Power BI.

Client: Direct Energy, Houston, TX September 2018 to February 2021

Role: Data Engineer

Roles & Responsibilities:

Designed and implemented ETL pipelines using AWS Glue and Lambda with PySpark, automating data extraction, transformation, and loading processes from diverse sources into Snowflake and AWS S3.
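
A minimal sketch of the event-driven trigger side of such a pipeline (the Glue job name and job argument are hypothetical): an S3-triggered Lambda function starts the Glue ETL job for each newly arrived object.

    import boto3

    glue = boto3.client("glue")
    GLUE_JOB_NAME = "load_sales_to_snowflake"  # hypothetical Glue job

    def handler(event, context):
        """S3-triggered Lambda: kick off the Glue ETL job for each new object."""
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            glue.start_job_run(
                JobName=GLUE_JOB_NAME,
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )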

Built scalable data pipelines on AWS EMR, processing data with Spark Streaming and Scala, and integrated Kafka messages to curate and deliver data to Redshift, Athena, and S3 buckets.

Architected and maintained data solutions using AWS S3, DynamoDB, and Snowflake, ensuring optimal data storage and retrieval with advanced security frameworks leveraging AWS Lambda and IAM roles.

Created and deployed AWS CloudFormation templates for provisioning VPCs, subnets, and NAT gateways, ensuring the secure and fault-tolerant deployment of web applications and databases.

Designed and deployed data integration pipelines on Azure Data Factory (ADF) and Azure Databricks, enabling seamless ETL workflows for structured and unstructured data.

Implemented advanced analytics models on Azure Databricks using PySpark and Python libraries such as NumPy, Pandas, and scikit-learn, driving data-driven insights for business decisions.

Migrated and optimized data pipelines from on-premises systems to Azure, configuring Azure Data Lake and Blob Storage for scalable data solutions.

Built automated workflows for real-time and batch processing using Azure Databricks and ADF, focusing on enhancing data transformation and operational efficiency.

Collaborated with Data Science, Marketing, and Sales teams to design and implement data pipelines on AWS and Azure, meeting diverse business requirements.

Monitored and ensured the quality and integrity of data using SQL Databases and automated ETL operationalization in AWS and Azure environments.

Developed and optimized data pipelines in Azure Synapse Analytics and Azure Databricks, integrating large datasets from Azure Data Lake and performing advanced transformations using PySpark and SQL.
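
An illustrative Databricks-style PySpark sketch of that pipeline shape, reading raw data from a hypothetical ADLS Gen2 container, transforming it with Spark SQL, and writing a curated layer back to the lake for Synapse to query:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("usage_curation").getOrCreate()

    # Read raw meter readings from ADLS Gen2 (storage account and container are hypothetical).
    raw = spark.read.parquet("abfss://raw@examplestorage.dfs.core.windows.net/meter_readings/")
    raw.createOrReplaceTempView("meter_readings_raw")

    # Advanced transformation expressed in Spark SQL.
    curated = spark.sql("""
        SELECT meter_id,
               DATE_TRUNC('month', reading_ts) AS usage_month,
               SUM(kwh)                        AS total_kwh
        FROM meter_readings_raw
        WHERE kwh IS NOT NULL
        GROUP BY meter_id, DATE_TRUNC('month', reading_ts)
    """)

    # Persist the curated layer for downstream Synapse and reporting workloads.
    curated.write.mode("overwrite").parquet(
        "abfss://curated@examplestorage.dfs.core.windows.net/monthly_usage/"
    )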

Configured AWS QuickSight for data visualization, enabling dynamic dashboards and insights by integrating data from AWS S3, Redshift, and Snowflake.

Implemented secure, scalable data workflows by leveraging Azure Blob Storage, Azure Data Factory, and Python, ensuring high availability and performance for ETL processes.

Automated deployment of ETL jobs using Terraform and managed continuous integration and delivery pipelines with Jenkins and AWS Lambda, improving operational efficiency.

Environment: Python, SQL, Hadoop, Pig scripts, HDFS, AWS S3, Lambda, SaaS, Azure SQL, Azure Data Lake, DynamoDB, Snowflake, Redshift, Athena, Kafka, QuickSight, EMR, RDS, ElastiCache, Jenkins

Client: AutoZone, Memphis, TN June 2015 to March 2017

Role: Data Analyst

Roles & Responsibilities:

Installed, configured, and maintained Apache Hadoop clusters for the development of applications in accordance with the specifications.

Created ETL data pipelines by combining technologies such as Hive, Spark SQL, and PySpark.
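
For illustration, a small PySpark job of that shape with Hive support enabled (the database and table names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder.appName("hive_etl")
        .enableHiveSupport()  # lets Spark SQL read and write Hive tables
        .getOrCreate()
    )

    # Extract from an existing Hive table.
    sales = spark.table("retail.sales_raw")

    # Transform: standardize types and aggregate to store/day grain.
    daily_sales = (
        sales.withColumn("sale_date", F.to_date("sale_ts"))
        .groupBy("store_id", "sale_date")
        .agg(F.sum("amount").alias("daily_revenue"))
    )

    # Load the result into a managed Hive table for downstream reporting.
    daily_sales.write.mode("overwrite").saveAsTable("retail.sales_daily")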

Developed Spark programs using Scala and batch processing using functional programming techniques.

Added data to Power BI from a range of sources, including SQL, Excel, and Oracle.

Utilized GCP machine learning and data analysis services, such as BigQuery ML and Data Studio, to perform advanced analytics and modeling on data cataloged in Alation, enabling data-driven decision-making.

Wrote Spark Core programs to process and clean data before loading it into Hive or HBase for further processing.

Utilized data transformation tools such as DataStage, SSIS, Informatica, and DTS.

Created UML diagrams, including Use Cases, Activity Diagrams, Sequence Diagrams, Data Flow Diagrams, Collaboration Diagrams, and Class Diagrams.

Built ETL pipelines with Pig and Hive to extract data from various sources and import it into the Hadoop Data Lake.

Processed various data types, including JSON and XML, and implemented machine learning algorithms using Python.

Created reusable components such as PL/SQL program units, libraries, database functions, procedures, and database triggers to meet business rules.

Gained experience with GCP Dataproc, GCS, Cloud Functions, Dataprep, Data Studio, and BigQuery.

Used SQL Server Integration Services (SSIS) to extract, manipulate, and load data from various sources into the target system.

Designed data mapping, transformation, and cleaning rules for OLTP and OLAP data management.

Utilized Tableau for data visualization during rapid model prototyping in Python, integrating with SAS and MS SQL Server databases for updates.

Created numerous DataFrames and datasets using the Spark-SQL context for model preprocessing.

Designed HBase row keys to store Text and JSON as key values in the database, ensuring sorted retrieval with Get/Scan operations.
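
A brief sketch of that row-key pattern, shown here with the happybase client purely for illustration (host, table, column family, and key layout are hypothetical). The composite key keeps one entity's rows contiguous and newest-first, so point Gets and prefix Scans stay cheap:

    import json
    import happybase

    connection = happybase.Connection("hbase-master")  # hypothetical host
    table = connection.table("events")                 # hypothetical table

    def make_row_key(entity_id, epoch_ms):
        # Composite key <entity_id>#<reverse_timestamp>: rows for one entity sort
        # together, and reversing the timestamp puts the newest row first.
        reverse_ts = 9_999_999_999_999 - epoch_ms
        return f"{entity_id}#{reverse_ts:013d}".encode("utf-8")

    payload = {"type": "page_view", "sku": "12345"}
    table.put(make_row_key("cust-42", 1695000000000),
              {b"d:json": json.dumps(payload).encode("utf-8")})

    # Point lookup with Get, then a prefix Scan over all rows for one customer.
    row = table.row(make_row_key("cust-42", 1695000000000))
    for key, data in table.scan(row_prefix=b"cust-42#"):
        print(key, data[b"d:json"])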

Led ETL design, including source system identification, source-to-target relationships, data cleansing, data quality checks, and creating ETL design documents.

Acquired strong knowledge and experience in creating Jenkins CI pipelines.

Implemented querying using Airflow and Presto and reporting with PySpark, Zeppelin, and Jupyter.

Installed and set up Airflow for managing workflows and built workflows in Python.

Environment: Hive, Spark SQL, PySpark, Oracle, HBase, DataStage, Power BI, SSIS, Informatica, Pig, Jenkins, Airflow, Presto, Zeppelin, Jupyter.


