
Azure Data Engineer

Location: Dayton, OH
Posted: November 21, 2023


Data Engineer

Srinath Reddy

ad1cft@r.postjobfree.com

937-***-****

PROFESSIONAL SUMMARY:

•5+ years of IT experience with expertise in the Big Data ecosystem: data acquisition, ingestion, modeling, storage, analysis, integration, and processing.

•Experience working with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, and Azure analytics services to ingest, transform, and consolidate structured and unstructured data for downstream use cases.

•Experience building data pipelines using Azure Data Factory and Azure Databricks, and loading data into Azure Data Lake and Azure SQL Database/Data Warehouse with user-level access control.

•Experienced in building data ingestion pipelines on Azure HDInsight Spark clusters using Azure Data Factory and Spark SQL.

•Experienced in building applications with AWS services such as EC2, S3, EMR, Amazon Redshift, Elastic Load Balancing, IAM, Auto Scaling, CloudWatch, CloudFront, SNS, SQS, SES, and Lambda.

•Experienced in using Power BI, Tableau, and Amazon QuickSight for data visualization and building dashboards/reports.

•Created Azure SQL databases and managed instances; migrated Microsoft SQL Server workloads to Azure SQL Database and Azure SQL Managed Instance, and monitored the Azure databases.

•Experience in creating, managing, analyzing, and reporting on internal business client data using AWS services like Athena, Redshift, EMR, and QuickSight.

•Responsible for landing data in S3 using Lambda functions and AWS Glue with PySpark.

•Experience working with Batch Ingestion into the platform for Snowflake consumption.

•Worked with distributed frameworks such as Apache Spark and Presto on Amazon EMR and Redshift, and interacted with data in other AWS storage services such as Amazon S3 and Amazon DynamoDB.

•Involved in the automation of daily and weekly ETL jobs using Apache Airflow.

•Experience in Microsoft SQL Server database programming and as an ETL developer using SSIS, SSRS, and SSAS.

•Experience working with Python libraries such as NumPy, Pandas, and Matplotlib.

•Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.

•Good knowledge of converting Hive/SQL queries into PySpark DataFrame transformations (see the sketch after this summary).

•Experience working with different file formats such as JSON, Avro, Parquet, and CSV.

•Experience in developing applications in Spark using Python/Scala to compare the performance of Spark with Hive.

•Good working experience with Hive and HBase/MapR-DB integration.

•Experienced in developing shell scripts and Python scripts to automate Spark jobs and Hive scripts.

•Experience with incident tracking and ticketing systems such as Jira, ServiceNow, and Remedy; used Git and SVN for version control.

•Working in a Sprint Methodology with Biweekly Sprints and showcasing work/goals achieved through both internal and external demos.

•Ability to manage multiple projects and eagerness to learn and adopt new technologies.
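
A minimal illustrative sketch (hypothetical table and column names, not project code) of the Hive/SQL-to-PySpark conversion referenced in the summary above: a simple aggregation query expressed as DataFrame transformations.

    # Sketch: expressing a Hive/SQL aggregation as PySpark DataFrame operations.
    # The "sales" table and its columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive_to_dataframe")
             .enableHiveSupport()
             .getOrCreate())

    # Equivalent of:
    #   SELECT customer_id, SUM(amount) AS total_amount
    #   FROM sales
    #   WHERE order_date >= '2023-01-01'
    #   GROUP BY customer_id
    sales = spark.table("sales")
    totals = (sales
              .filter(F.col("order_date") >= "2023-01-01")
              .groupBy("customer_id")
              .agg(F.sum("amount").alias("total_amount")))
    totals.show()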

TOOLS AND TECHNOLOGIES:

Hadoop/Big Data Technologies

Hadoop, MapReduce, HDFS, YARN, Oozie, Hive, Sqoop, Spark, NiFi, ZooKeeper, Cloudera Manager, Hortonworks.

Azure Cloud Services

Azure Data Factory, Azure Synapse Analytics, Data Lake, Blob Storage, HDInsight, Azure Databricks, Azure Data Analytics, Azure Functions

AWS Cloud Services

EC2, EMR, Redshift, S3, Databricks, Athena, Glue, AWS Kinesis, CloudWatch, SNS, SQS, SES

NoSQL Databases

HBase, DynamoDB, MongoDB

ETL/BI

Power BI, Tableau, Snowflake, Informatica, SSRS, SSAS, QlikView, Qlik Sense.

Hadoop Distribution

Hortonworks, Cloudera.

Programming & Scripting

Python, Scala, SQL, Shell Scripting, Kafka

Operating systems

Linux (Ubuntu, CentOS, Red Hat), Windows (XP/7/8/10).

Databases

Oracle, MySQL, Teradata, PostgreSQL, SQL Server.

Version Control

Bitbucket, GitLab, GitHub, Azure DevOps

Education:

Master’s in Information Technology - University of the Cumberlands

PROFESSIONAL EXPERIENCE:

PwC, Kansas City, MO Dec 2021 - Present

Data Engineer

Roles & Responsibilities:

•Ingested data from various on-premises and external sources into Azure Data Lake and made it available through Azure SQL Data Warehouse.

•Developed Databricks notebooks to extract data from source systems such as DB2 and Teradata, perform data cleansing, wrangling, and ETL processing, and load the results into Azure SQL DB.

•Built data pipelines using Azure Data Factory by connecting to different database sources using JDBC connectors.

•Processed ingested data using Azure Databricks and automated Databricks workflows with Python to run multiple data loads in parallel.

•Experience in data migration from Microsoft SQL Server to Azure SQL Database.

•Built ETL data pipelines to input data from Blob storage to Azure Data Lake Gen2 using Azure Data Factory (ADF).

•Performed RDD transformations on ingested data for streaming analytics in Databricks using Spark Streaming.

•Experienced in Spark SQL for data extraction, transformation, and aggregation across multiple file formats.

•Configured ingestion and orchestration pipelines using Azure Data Factory, and set up email notification alerts for trigger failures to support monitoring in both Dev and Prod (CI/CD).

•Automated jobs using Azure Data Factory and used the ingested data for analytics in Power BI.

•Implemented scalable data services using serverless Azure resources such as Data Factory, Synapse, Databricks, and Azure Functions, alongside traditional SQL.

•Implemented data warehousing solutions using Azure Data Analytics and Azure Synapse Analytics, using Apache Spark pools in Synapse to access and move data at scale.

•Developed Spark applications using Python libraries such as PySpark, NumPy, and Pandas for data transformations within Databricks and Azure Functions.

•Integrated Azure Key Vault with other Azure services such as Azure Data Factory, Azure Databricks, and Azure Functions to securely access and manage data in data pipelines and workflows.

•Developed a PySpark script to flatten deeply nested JSON files and ingest them into raw tables (see the sketch after this list).

•Worked on dimensional modeling with star and snowflake schemas and slowly changing dimensions.

•Developed stored procedures in Snowflake for loading dimension and fact tables, and created views based on business logic.

•Worked in an Agile development team, following Agile principles and methodologies in biweekly sprints.
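
A minimal illustrative sketch (hypothetical input path, schema, and target table) of the PySpark flattening approach referenced above: struct columns are expanded and array columns exploded until the DataFrame is flat, then the result is written to a raw table.

    # Sketch: flattening deeply nested JSON with PySpark before loading a raw table.
    # Paths and table names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StructType

    spark = SparkSession.builder.appName("flatten_json").enableHiveSupport().getOrCreate()
    df = spark.read.option("multiLine", True).json("/mnt/raw/input/")  # hypothetical path

    def flatten(df):
        # Repeatedly expand struct columns and explode array columns until none remain.
        while True:
            complex_cols = [(f.name, f.dataType) for f in df.schema.fields
                            if isinstance(f.dataType, (StructType, ArrayType))]
            if not complex_cols:
                return df
            name, dtype = complex_cols[0]
            if isinstance(dtype, StructType):
                expanded = [F.col(f"{name}.{c.name}").alias(f"{name}_{c.name}")
                            for c in dtype.fields]
                df = df.select([c for c in df.columns if c != name] + expanded)
            else:
                df = df.withColumn(name, F.explode_outer(name))

    flat_df = flatten(df)
    flat_df.write.mode("append").saveAsTable("raw.events")  # hypothetical raw table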

Environment: Azure Data Factory (ADF v2), Azure Databricks, Azure Data Lake (ADLS Gen2), MS Azure, Azure SQL Database, Azure Function Apps, Blob Storage, SQL Server, UNIX Shell Scripting, Azure Cosmos DB, Azure Event Hubs, Kafka, Spark Streaming, SQL, Agile Methodology, Snowflake.

American Equity, West Des Moines, IA Aug 2019 – Nov 2021

Data Engineer

Roles & Responsibilities:

•Designed and established an Enterprise Data Lake using AWS, incorporating a multitude of services such as EC2, S3, Redshift, Athena, Glue, EMR, DMS, Kinesis, SNS, and SQS.

•Performed data extraction from several sources including S3, Redshift, and RDS, and employed Glue Crawlers to form databases and tables in the Glue Catalog.

•Created Glue ETL tasks in Glue Studio for data processing and transformation, and subsequently loaded the results into Redshift, S3, and RDS.

•Used Glue DataBrew to design reusable transformation recipes within Glue ETL tasks.

•Implemented ETL operations in AWS Glue to transfer data from external sources like S3 and Parquet/Text Files into Redshift in AWS.

•Participated in the development, improvement, and maintenance of Snowflake database applications.

•Used the AWS Glue catalog and Athena for carrying out SQL operations, querying, and analyzing data housed in S3.

•Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.

•Created data sharing between two Snowflake accounts.

•Applied PySpark tasks in AWS Glue for data integration from various tables and updating the Glue Data Catalog with metadata table definitions using Crawlers.

•Developed stored procedures/views in Snowflake and used them in Talend for loading dimensions and facts.

•Integrated AWS Lambda with AWS Glue for process automation and used AWS EMR for effective data transformation and movement.

•Used CloudWatch for setting up logs, notifications, alarms, and monitors for Lambda functions and Glue Jobs.

•Converted Talend joblets to support Snowflake functionality.

•Performed complete architecture and implementation evaluations of Amazon EMR, Redshift, and S3 AWS services.

•Applied AWS EMR to transfer and transform large data quantities between AWS data storage, like S3 and DynamoDB.

•Used Athena for carrying out data analysis by running queries on data processed from Glue ETL tasks and QuickSight for creating business intelligence reports.

•Employed DMS to migrate tables from diverse homogeneous and heterogeneous databases from on-premises to the AWS Cloud.

•Created Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics to capture, process, and store streaming data in Redshift, S3, and DynamoDB.

•Designed Lambda functions to trigger AWS Glue jobs based on events occurring in AWS S3 (as sketched below).
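
A minimal illustrative sketch (hypothetical Glue job name and job argument) of the pattern in the last bullet: a Lambda handler that starts a Glue job run whenever an S3 object-created event arrives.

    # Sketch: Lambda handler that triggers an AWS Glue job on S3 object-created events.
    # The Glue job name and the --source_path argument are hypothetical.
    import urllib.parse

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            # Pass the new object's location to the Glue job as a job argument.
            response = glue.start_job_run(
                JobName="raw-to-curated-etl",
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
            print(f"Started Glue job run {response['JobRunId']} for s3://{bucket}/{key}")
        return {"status": "ok"}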

Environment: AWS Glue, AWS Lambda, AWS EMR, S3, Redshift, RDS, DynamoDB, EC2, IAM, Boto3, Apache Spark, AWS Kinesis, Athena, Hive, Sqoop, CloudWatch, SNS, SQS, UNIX Shell Scripting, AWS QuickSight, GitHub, Jenkins, Python, and SQL.

Varo Bank, San Francisco, CA Jun 2018 – Aug 2019

Data Engineer

Roles & Responsibilities:

•Performed data transformations in Hive, using partitions and buckets to improve performance.

•Experienced in handling HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, Spark, and MapReduce programming.

•Configured and monitored resources across the cluster using Cloudera Manager, Search, and Navigator.

•Created external Hive tables for consumption and stored data in HDFS using ORC, Parquet, Avro, and other file formats (see the sketch after this list).

•Created ETL pipelines using PySpark with the Spark SQL and DataFrame APIs.

•Analyzed Hadoop clusters and Big Data analytics tools such as Pig, Hive, HBase, Spark, and Sqoop.

•Used Sqoop to load data into the cluster from dynamically generated files and relational database management systems.

•Implemented partitioning, dynamic partitions, and bucketing in Hive.

•Developed HQL queries, Mappings, tables, and external tables in Hive for analysis across multiple banners, as well as worked on partitioning, optimization, compilation, and execution.

•Used Cloudera Manager to continuously monitor and manage the Hadoop cluster.

•Migrated data successfully from on premises to AWS EMR and S3 buckets by writing shell scripts.

•Invoked Python scripts for data transformations on large data sets in AWS Kinesis.

•Built mappings with reusable components such as worklets and mapplets, as well as other transformations.

•Automated data movement between different components by using Apache NiFi.

•Loaded data from multiple data sources (SQL, DB2, and Oracle) into HDFS using Sqoop and stored it in Hive tables.

•Migrated data from Teradata into HDFS using Sqoop.
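
A minimal illustrative sketch (hypothetical databases, tables, and HDFS path) of the external-table pattern referenced above: a partitioned external Hive table stored as ORC, loaded with dynamic partitioning through Spark SQL.

    # Sketch: partitioned external Hive table (ORC) loaded via Spark SQL.
    # Database, table, column, and path names are hypothetical; staging.transactions_raw
    # is assumed to already exist.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive_external_tables")
             .enableHiveSupport()
             .config("hive.exec.dynamic.partition", "true")
             .config("hive.exec.dynamic.partition.mode", "nonstrict")
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS curated")
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS curated.transactions (
            txn_id STRING,
            account_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS ORC
        LOCATION '/data/curated/transactions'
    """)

    # Load from a staging table; the partition column must come last in the SELECT.
    spark.sql("""
        INSERT OVERWRITE TABLE curated.transactions PARTITION (txn_date)
        SELECT txn_id, account_id, amount, txn_date
        FROM staging.transactions_raw
    """)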

Environment: Hive 2.3, Pig 0.17, Python, Hadoop 3.0, HDFS, AWS, NoSQL, Sqoop 1.4, Oozie, Power BI, Agile, OLAP, Cloudera Manager, ORC, Parquet, Avro, etc.
