Azure Data Engineer

Location:
New Haven, CT
Posted:
April 10, 2024

Sai Lokesh Reddy Vanga | New Haven, Connecticut | ad4wub@r.postjobfree.com | +1-475-***-****

PROFESSIONAL SUMMARY

Overall, 10.9 years of experience as a Data Engineer: the most recent 4.4 years focused on Azure cloud services, preceded by 6.5 years specializing in Big Data, SQL databases, and Data Warehousing, with expertise in designing and implementing scalable data ingestion pipelines.

Proficient in various Azure technologies, including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Databricks, Snowflake, Azure Synapse, Key Vault, PolyBase, Event Hubs, Logic Apps, Data Flow, Power Query, and Cosmos DB.

Demonstrated expertise in building batch and structured streaming pipelines using Kafka, ADF pipelines, Databricks, Spark, Delta tables, ADLS Gen2, Azure Key Vault, and Azure Event Hubs, contributing to comprehensive data integration solutions.
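
A minimal illustrative sketch of this pattern, assuming PySpark on Databricks with hypothetical broker, topic, schema, and ADLS paths:

```python
# Illustrative only: read JSON events from Kafka and append them to a Delta table
# on ADLS Gen2. Broker, topic, schema, and storage paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "orders")                       # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS value")
          .select(from_json(col("value"), event_schema).alias("e"))
          .select("e.*"))

(events.writeStream
 .format("delta")
 .outputMode("append")
 .option("checkpointLocation", "abfss://checkpoints@account.dfs.core.windows.net/orders")
 .start("abfss://curated@account.dfs.core.windows.net/delta/orders"))
```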

Proficient in Azure Function Apps for designing serverless data processing workflows, integrating seamlessly with various Azure services, and optimizing data pipelines for enhanced efficiency.

Hands-on experience with Spark-SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats.

Developed ETL pipelines using Python, Snowflake SnowSQL, and PySpark, demonstrating skills in building and optimizing scalable data pipelines and performing complex data transformations.
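
A hedged sketch of the load step of such a pipeline, assuming the snowflake-connector-python package; the account, stage, and table names are hypothetical placeholders:

```python
# Illustrative only: run SnowSQL-style COPY and MERGE statements from Python.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder account identifier
    user="etl_user",           # placeholder credentials
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # Load the staged files into a staging table
    cur.execute("COPY INTO STAGING.ORDERS FROM @ORDERS_STAGE "
                "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
    # Upsert the staging rows into the target table
    cur.execute("""
        MERGE INTO ANALYTICS.ORDERS t
        USING STAGING.ORDERS s ON t.ORDER_ID = s.ORDER_ID
        WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS
        WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS) VALUES (s.ORDER_ID, s.STATUS)
    """)
finally:
    conn.close()
```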

Extensive experience in Data Warehousing, particularly with Snowflake and Synapse, covering Snowpipe, Snowflake Tasks, and SnowSQL.

Skilled in multiple programming languages such as Python, Scala, PySpark, SQL, and PLSQL for comprehensive data processing.

Demonstrated expertise in Kafka, leveraging its real-time data streaming and processing capabilities, contributing to efficient data pipelines.

Worked on snowflake schema design, data modeling, and related elements, demonstrating expertise in implementing large-scale data intelligence solutions around the Snowflake Data Warehouse.

Proficient in various Big Data technologies, including Hive, Cassandra, HDFS, MapReduce, YARN, Sqoop, Apache Airflow, and other components of the Hadoop ecosystem, with particular depth in MapReduce.

Applied MapReduce jobs for distributed processing and parallel execution of data-intensive workloads, optimizing data workflows and delivering scalable solutions for analyzing large datasets promptly.
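
For illustration only, a word-count style mapper and reducer in the Hadoop Streaming style; the script name, paths, and invocation are assumptions, not the actual project jobs:

```python
# Hadoop Streaming-style sketch; run with something like:
#   hadoop jar hadoop-streaming.jar -mapper "python wordcount.py map" \
#       -reducer "python wordcount.py reduce" -input /data/in -output /data/out
import sys
from itertools import groupby

def mapper():
    # Emit one (word, 1) pair per token; Hadoop shuffles and sorts by key.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer():
    # Reducer input arrives sorted by key, so contiguous lines share the same word.
    for word, lines in groupby(sys.stdin, key=lambda l: l.split("\t", 1)[0]):
        total = sum(int(l.split("\t", 1)[1]) for l in lines)
        print(f"{word}\t{total}")

if __name__ == "__main__":
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()
```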

Proficient in Apache Sqoop for importing and exporting data between HDFS/Hive and various relational databases.

Extensive background in optimizing query performance in Hive using bucketing and partitioning techniques, along with hands-on experience in tuning Spark Jobs.
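
A brief PySpark sketch of this partitioning and bucketing approach; the table, column, and path names are hypothetical:

```python
# Illustrative only: lay out a table so that date filters prune partitions and
# joins/aggregations on user_id benefit from bucketing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().appName("hive-layout").getOrCreate()

clicks = spark.read.parquet("/data/raw/clicks")     # placeholder source path

(clicks.write
 .mode("overwrite")
 .partitionBy("event_date")        # partition pruning on date filters
 .bucketBy(32, "user_id")          # co-locate rows by user_id
 .sortBy("user_id")
 .saveAsTable("analytics.clicks_bucketed"))

# Queries filtering on the partition column read only the matching partitions
spark.sql("SELECT user_id, COUNT(*) AS clicks FROM analytics.clicks_bucketed "
          "WHERE event_date = '2024-01-01' GROUP BY user_id").show()
```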

Solid expertise in database architecture for OLAP and OLTP applications, data modeling, migration, and warehousing concepts, including performance tuning and optimization of OLAP and OLTP systems for efficient data retrieval and processing.

Strong experience in Apache Hadoop distributions (Hortonworks and Cloudera), focusing on building data pipelines and performing large-scale data transformations.

Hands-on experience with diverse file formats (Parquet, ORC, Avro, Binary, and CSV) for effective data storage and processing.

Experience in partitioning and repartitioning strategies for optimizing data storage and retrieval.

Proficient in scheduling workflows using Control-M and Apache Airflow, ensuring timely execution of data pipelines.

Scheduled production jobs using Control-M and Airflow, demonstrating expertise in orchestrating complex data pipelines.
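
A minimal Apache Airflow sketch of such a scheduled pipeline; the DAG id, schedule, and task commands are hypothetical placeholders:

```python
# Illustrative only: a two-task daily DAG where extraction must finish before loading.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",   # run daily at 02:00
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    load = BashOperator(task_id="load", bash_command="python load.py")
    extract >> load                  # load runs only after extract succeeds
```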

Experience with version control using Git, GitHub, CI/CD practices, and Azure DevOps for efficient collaboration.

In-depth hands-on experience applying Agile methodologies, defining user stories, and executing projects using JIRA.

TECHNICAL SKILLS

Azure Services

Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps, Key Vault, Azure Synapse, Event Hubs

Big Data Technologies

MapReduce, Hive, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, ZooKeeper

Languages

SQL, PL/SQL, Python, HiveQL, Scala.

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS

Build Automation tools

Ant, Maven

Version Control

GIT, GitHub

IDE & Build Tools, Design

Eclipse, Visual Studio.

Databases

MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB

WORK EXPERIENCE

Role: Azure Data Engineer July 2022 – Present

Client: L.L.Bean, Freeport, ME

Responsibilities:

Designed and implemented scalable data ingestion pipelines using Azure Data Factory, ensuring extraction from diverse sources.

Developed end-to-end data integration solutions with Azure Data Factory, orchestrating workflows and loading data into target systems.
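
As one hedged illustration (not the exact project code), a pipeline run can be triggered and monitored from Python with the azure-identity and azure-mgmt-datafactory packages; the subscription, resource group, factory, pipeline, and parameter names below are assumptions:

```python
# Illustrative only: start an ADF pipeline run and check its status.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="rg-data",        # placeholder resource group
    factory_name="adf-ingest",            # placeholder factory
    pipeline_name="pl_copy_sales",        # placeholder pipeline
    parameters={"load_date": "2024-04-01"},
)
status = adf_client.pipeline_runs.get("rg-data", "adf-ingest", run.run_id)
print(status.status)                      # e.g. Queued / InProgress / Succeeded
```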

Leveraged Azure Databricks for optimizing Spark jobs and integrating diverse data sources for efficient ingestion and transformation.

Engineered ETL pipelines for historical and incremental data transfer to the Snowflake database and utilized Azure Blob Storage and Azure Data Lake Storage Gen 2.
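
An illustrative sketch of the incremental write into Snowflake, assuming the Spark-Snowflake connector is available as the "snowflake" data source; connection options, paths, and table names are placeholders:

```python
# Illustrative only: append an incremental slice staged in ADLS Gen2 to Snowflake.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-load").getOrCreate()

sf_options = {
    "sfURL": "my_account.snowflakecomputing.com",   # placeholder account URL
    "sfUser": "etl_user",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

incremental = spark.read.parquet(
    "abfss://raw@account.dfs.core.windows.net/sales/2024/04/")   # placeholder path

(incremental.write
 .format("snowflake")          # "net.snowflake.spark.snowflake" outside Databricks
 .options(**sf_options)
 .option("dbtable", "SALES_FACT")
 .mode("append")               # mode("overwrite") for the one-time historical load
 .save())
```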

Orchestrated seamless data migration from on-prem SFTP server to Azure Cloud using Azure Data Factory.

Designed StreamSets pipelines for loading incremental data from a PostgreSQL database to Kafka, ensuring smooth data flow into Azure ADLS.

Configured Databricks notebooks to process data from Kafka into Delta tables, implementing logic in SQL, Python, and Scala.

Deployed an Azure PaaS stack, including a containerized App Service hosted in an App Service Environment, Key Vault with private endpoints, and Service Bus, to the development team for a Node-based application.

Managed Azure SQL Database for high availability and performance, implementing advanced T-SQL skills for querying and optimization.

Proficient in Azure DevOps for CI/CD workflows, automated pipeline deployments, and version control practices.

Designed and developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation.

Configured monitoring solutions for Azure Data Factory pipelines, including email alerts and notifications for pipeline status changes.

Implemented Kafka topics creation to organize data streams efficiently.
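
A minimal sketch of programmatic topic creation, assuming the confluent-kafka Python client; the broker address, topic names, and partition/replication settings are hypothetical:

```python
# Illustrative only: create CDC topics with explicit partition and replication settings.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "broker:9092"})   # placeholder broker

topics = [
    NewTopic("orders.cdc", num_partitions=6, replication_factor=3),
    NewTopic("inventory.cdc", num_partitions=3, replication_factor=3),
]

futures = admin.create_topics(topics)
for name, future in futures.items():
    try:
        future.result()                 # raises if creation failed
        print(f"created topic {name}")
    except Exception as exc:
        print(f"failed to create {name}: {exc}")
```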

Utilized Debezium connector for capturing Change Data Capture (CDC) logs from relevant sources.

Leveraged Confluent Hub connectors to seamlessly connect Kafka to various storage accounts such as ADLS and SQL.

Expanded capabilities with connectors like Confluent JDBC Sink Connector for integrating Kafka with relational databases, Confluent Elasticsearch Sink Connector for indexing Kafka data into Elasticsearch, Confluent Amazon S3 Sink Connector for archiving data to Amazon S3, and Confluent MQTT Source Connector for IoT data ingestion via MQTT protocol.

Ensured centralized management and monitoring using Confluent Control Center for insights into cluster health and performance metrics.

Enforced robust security measures to safeguard data assets and ensure compliance with industry standards.

Proficiently handled large volumes of data across various file formats such as Parquet, ORC, and Avro.

Optimized performance in managing Azure SQL Database for high availability, implementing advanced T-SQL skills for efficient querying.

Implemented different pipeline types, including incremental, full, and historical data loads.

Applied watermark columns for efficient data tracking and management in ETL pipelines.
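
A hedged PySpark sketch of the watermark-column pattern: pull only rows modified since the last successful run, then record the new high-water mark. The source connection, column, and path names are assumptions:

```python
# Illustrative only: incremental extract based on a last_modified watermark column.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

last_watermark = "2024-04-01 00:00:00"        # normally read from a control table

source = (spark.read
          .format("jdbc")                     # requires the JDBC driver on the classpath
          .option("url", "jdbc:postgresql://host:5432/sales")   # placeholder source
          .option("dbtable", "public.orders")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

delta_rows = source.filter(F.col("last_modified") > F.to_timestamp(F.lit(last_watermark)))
delta_rows.write.mode("append").format("delta").save("/mnt/curated/orders")

new_watermark = delta_rows.agg(F.max("last_modified")).first()[0]
print(f"next watermark: {new_watermark}")     # persist to the control table in practice
```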

Collaborated effectively with cross-functional teams to ensure the implementation and maintenance of best practices in Azure data engineering solutions.

Environment: Azure Databricks, Data Factory, Snowflake, Azure Stack, Azure Synapse, Logic Apps, Function Apps, MS SQL, PostgreSQL, SQL, Python, ETL, Scala, PySpark, Shell Scripting, Kafka, Confluent.

Role: Azure Data Engineer Dec 2019 – June 2022

Client: 84.51, Cincinnati, OH

Responsibilities:

Implemented comprehensive data migration strategies, transitioning from traditional systems, SSIS, SSRS, and ETL solutions to Azure, including Lift and Shift and Azure Migrate methodologies.

Demonstrated expertise in Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake for seamless integration and data processing.

Collaborated with Analytics and BI teams to design globally adopted metrics and reporting-on-demand solutions, significantly reducing manual data analysis efforts by over 55%.

Architected an automated environment using PowerShell and Azure Cloud Shell, leading to a 25% reduction in costs for deploying Azure data solutions.

Configured and deployed Azure automation scripts for a multitude of applications across the Azure stack, including Compute, App Services, Blobs, Resource Groups, Azure Data Lake, HDInsight clusters, Azure Data Factory, Azure SQL, and ARM services and utilities, with a focus on automation.

Developed optimized data models, enhancing productivity by 30% in Azure processing pipelines.

Successfully implemented Proof of Concepts for SOAP and REST APIs, ensuring efficient data integration.

Responsible for estimating cluster sizes, monitoring, and troubleshooting Spark Databricks clusters for streamlined operations.

Developed jobs in Talend Enterprise Edition covering source-to-stage, intermediate, conversion, and target layers.

Worked on Talend ETL to load data from various sources to Oracle DB.

Created job infrastructure using Talend Open Studio.

Scheduled and automated ETL processes using the Autosys and TAC scheduling tools.

Worked on the design, development, and testing of Talend mappings.

Scheduled workflows using shell scripts.

Wrote complex SQL queries to ingest data from various sources and integrated them with Talend.

Created WSDL data services using Talend ESB.

Designed and implemented migration strategies for traditional systems on Azure, involving the migration of databases between data centers.

Developed Spark applications using Spark-SQL in Databricks for data extraction, transformation, and aggregation from diverse file formats, providing insights into customer usage patterns.

Extensively worked on migrating databases residing on servers across different data centers, addressing challenges and ensuring successful transitions.

Engaged with business users to gather requirements, design visualizations, and provide training on self-service BI tools.

Led the deployment of Confluent Schema Registry for efficient schema management, ensuring data compatibility.

Conducted data transformations within Kafka to format and prepare data for downstream processing.

Utilized various sources, including SQL Server, Excel, Oracle, and SQL Azure, to pull data into Power BI, optimizing data pipelines for improved performance.

Optimized data pipelines and Spark jobs in Azure Databricks, employing tuning techniques such as Spark configurations, caching, and data partitioning.
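
A few illustrative examples of these tuning levers in PySpark; the specific values and paths are placeholders, not recommendations:

```python
# Illustrative only: common Spark tuning levers (shuffle partitions, AQE, caching,
# repartitioning on a join key).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()

spark.conf.set("spark.sql.shuffle.partitions", "200")   # size to the cluster
spark.conf.set("spark.sql.adaptive.enabled", "true")    # let AQE coalesce partitions

usage = spark.read.parquet("/mnt/raw/usage")             # placeholder path

# Cache a dataframe reused by several downstream aggregations
usage_cached = usage.cache()

# Repartition on the join key before a heavy join to reduce shuffle skew
by_customer = usage_cached.repartition(64, "customer_id")
```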

Environment: Azure Databricks, Data Factory, Azure Synapse, Azure Stack, Logic Apps, Snowflake, Function Apps, MS SQL, Talend ETL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, Confluent, SQL, Python, Scala, PySpark, Shell Scripting, Kafka, Power BI.

Role: Big Data Developer Jul 2017 – Nov 2019

Client: Charter Communications, Stamford, CT

Responsibilities:

Spearheaded the building of a multi-node Hadoop Cluster, managing Cloudera Manager, and analyzing Hadoop Log Files for optimal performance.

Configured Hive Metastore to establish multiple user connections to Hive tables, using an Oracle database.

Imported and ingested data into HDFS using Sqoop, with a focus on retrieving data from MySQL and Oracle databases into HBase.

Developed Hive queries for in-depth data analysis in HDFS, identifying issues and behavioral patterns.

Conducted server-to-server data querying into HDFS, importing/exporting data using Sqoop, and configuring Hadoop MapReduce and HDFS.

Participated in developing and implementing the MapReduce environment, running Hadoop jobs to process terabytes of data.

Implemented Scala for Spark Streaming over ongoing customer transactions, developing Scala applications in the Spring Tool Suite.

Utilized Shell scripting to analyze ERP source data and processed it for storage in HDFS, subsequently storing it in Hive tables for trend identification.

Extensively used Sqoop for importing data from RDBMS sources into HDFS, performing transformations and cleansing using Hive and MapReduce.

Installed, configured, and administered Hadoop clusters for major distributions like CDH4 and CDH5.

Designed and implemented APIs to fetch large datasets, ensuring continuous availability and reliability.

Improved database performance through indexing strategies, resulting in a 25% reduction in query response time.

Expertise in importing/exporting data between HDFS and RDBMS using Sqoop, installing and configuring Hive, and writing Hive UDFs.

Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
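
A short Spark SQL sketch of dynamic-partition inserts into a partitioned Hive table, with hypothetical database, table, and column names:

```python
# Illustrative only: enable dynamic partitioning and load a partitioned Hive table
# from a staging table (web.events_staging is assumed to exist).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-dynamic-partitions")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS web.events_part (
        user_id STRING,
        url     STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

spark.sql("""
    INSERT OVERWRITE TABLE web.events_part PARTITION (event_date)
    SELECT user_id, url, event_date FROM web.events_staging
""")
```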

Worked in an Agile environment, using the Rally tool for maintaining user stories and tasks.

Utilized Git as a version control tool for efficient code repository management and change tracking.

Environment: MySQL, ETL, HDFS, Apache Spark, Scala, PySpark, Spark SQL, Hive, Hadoop, Pig, Sqoop, Cloudera, HBase, Kafka, MapReduce, ZooKeeper, Oozie, Data Pipelines, RDBMS, Python, YAML, Ambari, JIRA.

Role: SQL Developer Sep 2015 – Jun 2017

Client: Chevin Fleet Solutions, Fitchburg, MA

Responsibilities:

Created, manipulated, and supported SQL Server databases.

Performed data modeling and physical and logical design of the database.

Contributed to the front end's integration with the SQL Server backend.

Created Stored Procedures, Triggers, Indexes, User-defined Functions, Constraints etc. on various database objects to obtain the required results.

Imported and exported data between servers using tools like Data Transformation Services (DTS).

Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries.
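
For illustration, a parameterized T-SQL retrieval of this kind issued from Python via pyodbc; the server, database, and table names are hypothetical:

```python
# Illustrative only: run a parameterized T-SQL query against SQL Server via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlprod01;DATABASE=FleetDB;Trusted_Connection=yes;"   # placeholder names
)
cursor = conn.cursor()

cursor.execute(
    """
    SELECT TOP (100) v.VehicleId, v.Plate, COUNT(w.WorkOrderId) AS OpenOrders
    FROM dbo.Vehicles v
    LEFT JOIN dbo.WorkOrders w
        ON w.VehicleId = v.VehicleId AND w.Status = ?
    GROUP BY v.VehicleId, v.Plate
    ORDER BY OpenOrders DESC
    """,
    "OPEN",                                  # parameter bound to the ? marker
)
for row in cursor.fetchall():
    print(row.VehicleId, row.Plate, row.OpenOrders)

conn.close()
```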

Transferred data from various data sources and business systems, including Oracle and flat files, to SQL Server using SSIS/DTS features such as data conversion; in addition, created derived columns from existing columns per the given requirements.

Supported the team in resolving T-SQL and SQL Server Reporting Services issues; proficient in designing and formatting a variety of reports, including crosstab, conditional, drill-down, Top N, summary, and sub-reports.

Performed routine maintenance procedures, such as backups, index rebuilds, and statistics updates, to maintain the data warehouse's health and performance.

Provided application support via phone; developed and tested Windows command files and SQL Server queries for production database monitoring in a 24/7 support environment.

Worked in an Agile environment, using the Rally tool for maintaining SQL user stories and tasks.

Utilized Git as a version control tool for efficient SQL code repository management and change tracking.

Environment: Windows Server, MS SQL Server 2016, SSIS, SSAS, SSRS, SharePoint, MS Access, MS Office, Power BI, Visual Studio 2015/2017, Git.

Role: Data Warehouse Developer Mar 2013 – Aug 2015

Client: Tire Hub - Dunwoody, GA

Responsibilities:

Acquired extensive experience in extracting, transforming, and loading data from heterogeneous source systems such as flat files, Excel, Oracle, and Microsoft SQL Server.

Collaborated with Data Modeler and DBAs to construct data models and table structures.

Actively took part in discussion sessions to contribute to the design of ETL job flows.

Compiled source-to-target mapping documents to facilitate the design of ETL jobs.

Leveraged various stages, including Transformer, Lookup, Merge, Join, Aggregator, Sort, Filter, and Remove duplicates, for comprehensive data cleansing and transformation into staging.

Scheduled Sessions and Batches on the Informatica Server using Informatica Server Manager/Workflow Manager.

Modified and tested PL/SQL stored procedures.

Demonstrated expertise in ETL tools, including Microsoft SQL Server Integration Services (SSIS), Data Transformation Services (DTS), DataStage, and ETL package design, focusing on RDBMS platforms like Microsoft SQL Server, Oracle, and DB2.

Applied technical and analytical skills with a clear understanding of ER modeling for OLTP and dimension modeling for OLAP.

Designed and developed SSIS packages, stored procedures, configuration files, tables, views, and functions, implementing best practices for optimal performance.

Migrated SSIS packages from SQL Server 2005 to SSIS 2008.

Worked extensively on SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and SQL Server Reporting Services (SSRS).

Proficient in Dimensional Data Modeling for Data Mart design, identifying Facts and Dimensions, and developing fact tables and dimension tables using Slowly Changing Dimensions (SCD).
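
The SCD Type 2 rule can be sketched as follows (written in Python purely for illustration; the actual implementation used SSIS, and the key and attribute names are hypothetical):

```python
# Illustrative only: close out the current dimension row when a tracked attribute
# changes and insert a new current row (SCD Type 2).
from datetime import date

def apply_scd2(dimension, incoming, today=None):
    """dimension/incoming: lists of dicts keyed by 'customer_id' with an 'address' attribute."""
    today = today or date.today()
    current = {r["customer_id"]: r for r in dimension if r["is_current"]}
    for row in incoming:
        existing = current.get(row["customer_id"])
        if existing is None:                          # brand-new dimension member
            dimension.append({**row, "valid_from": today, "valid_to": None, "is_current": True})
        elif existing["address"] != row["address"]:   # tracked attribute changed
            existing["valid_to"] = today
            existing["is_current"] = False
            dimension.append({**row, "valid_from": today, "valid_to": None, "is_current": True})
    return dimension
```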

Demonstrated expertise in constructing cubes and dimensions using various architectures and data sources for business intelligence purposes, adept in writing MDX scripting for advanced analytical capabilities.

Experienced in error and event handling: precedence constraints, breakpoints, checkpoints, and logging.

Created intricate reports, including Parameter-based Reports, Graphical Reports, Well-formatted Reports, Drill-Down Reports, Matrix Reports, Charts, and Tabular reports using SSRS.

Environment: Microsoft SQL Server 2012, SSIS, SSAS, SSRS, SharePoint, DB2, MS Access, MS Office, Team Foundation Server, Git.

EDUCATION

Master's in Computer Science, Bridgeport University.

B.Tech. Computer Science and Systems Engineering, Andhra University.


