Data Engineer Azure

Location: Dallas, TX
Posted: February 24, 2024

SHIRISHA BADDAM

Data Engineer

+1-816-***-**** | ad3v3r@r.postjobfree.com | Kansas City, MO | LinkedIn

SUMMARY

Experienced Data Engineer with over 5 years of expertise in big data technologies, data analytics, and cloud services.

Experience in building data pipelines using Azure Data Factory and Azure Databricks, and loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, including controlling and granting database access.

Experience in working with Azure Cloud and its components, including Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Stream Analytics, Logic Apps, Function Apps, Key Vault, and Azure DevOps services.

Experience in working with big data components such as HDFS, YARN, MapReduce, Spark, Sqoop, Oozie, Pig, ZooKeeper, Hive, HBase, Kafka, and Airflow.

Expertise in developing Confluent Kafka producers and consumers to meet business requirements, storing the streamed data in HDFS and processing it with Spark.
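
A minimal sketch of this pattern using the confluent-kafka Python client; the broker address and the "events" topic are hypothetical placeholders:

    from confluent_kafka import Producer, Consumer

    # Producer: publish a record to a (hypothetical) "events" topic.
    producer = Producer({"bootstrap.servers": "broker:9092"})
    producer.produce("events", value=b'{"id": 1, "status": "ok"}')
    producer.flush()  # block until queued messages are delivered

    # Consumer: read records back; in the pipeline described above,
    # payloads would be landed in HDFS for downstream Spark processing.
    consumer = Consumer({
        "bootstrap.servers": "broker:9092",
        "group.id": "hdfs-sink",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["events"])
    msg = consumer.poll(timeout=10.0)
    if msg is not None and msg.error() is None:
        print(msg.value())
    consumer.close()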

Extensive knowledge in all phases of data acquisition, data warehousing (gathering requirements, design, development, implementation, testing, and documentation), data modeling (analysis using star and snowflake schemas for fact and dimension tables), data processing, and data transformations (mapping, cleansing, monitoring, debugging, performance tuning, and troubleshooting Hadoop clusters).

Experienced in scripting with Python (PySpark) and Spark SQL to develop aggregations over various file formats such as XML, JSON, CSV, and Parquet.
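
As an illustration, a small PySpark sketch that reads JSON, CSV, and Parquet inputs and applies the same aggregation to each; the paths and the "category" column are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("multi-format-agg").getOrCreate()

    # Each reader yields a DataFrame, so the aggregation is format-agnostic.
    sources = {
        "json": spark.read.json("/data/events.json"),
        "csv": spark.read.option("header", True).csv("/data/events.csv"),
        "parquet": spark.read.parquet("/data/events.parquet"),
    }
    for fmt, df in sources.items():
        df.groupBy("category").agg(F.count("*").alias("rows")).show()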

Monitored SQL Server performance and implemented tuning techniques to optimize query execution.

Sound knowledge in developing highly scalable and resilient RESTful APIs, ETL solutions, and third-party integrations as part of an enterprise site platform using Informatica.

Experience in using bug tracking and ticketing systems such as Jira and Rally; used Git for version control.

Involved in migrating legacy applications to the cloud using DevOps tools such as GitHub, Jenkins, Jira, and Docker.

Highly involved in all facets of SDLC using Waterfall and Agile Scrum methodologies.

Collaborated regularly with business, production support, and engineering teams to dive deep into data, enable effective decision-making, and support analytics platforms.

TECHNICAL SKILLS

Programming Languages: Python, R, Java, C, C++, SQL, NoSQL, HiveQL, Data Structures

Databases/ Warehouses: MS-SQL Server, Oracle, PostgreSQL, DB2, MongoDB, Snowflake

Big Data Technologies: HDFS, Yarn, MapReduce, Pig, Hive, HBase, Hadoop, Oozie, Apache Spark, Kafka.

Azure Cloud Services: Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Azure Synapse Analytics, Azure Stream Analytics, Azure DevOps, Azure Event Hubs, Azure Cosmos DB, Azure Active Directory, Azure Key Vault

ETL Tools/Reporting Tools: Informatica, DBT, SSIS, SSRS, SSAS, Tableau, Power BI.

Development Tools: Eclipse, Visual Studio, NetBeans, IntelliJ, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

Containerization and Orchestration: Docker, Kubernetes

Version Control: Git

Project Management Tools: Jira, Rally

Continuous Integration/Continuous Deployment: Jenkins, GitLab CI

Methodologies: Agile/Scrum, Waterfall.

PROFESSIONAL EXPERIENCE

Client: United Health Group, Eden Prairie, Minnesota

Role: Data Engineer November 2022 – Present

Designed and implemented Extract, Transform, Load (ETL) pipelines leveraging Azure Databricks and Azure Data Factory.

Built robust and scalable data integration pipelines to extract, transform, and load data from various sources, including electronic health records (EHRs), claims data, and provider systems.

Performed data cleansing and applied transformations using Databricks and Spark for data analysis.

Involved in the development of automated workflows for daily incremental loads, moving data from traditional RDBMSs to data lakes.
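
A common shape for such an incremental load is a watermark query over JDBC; this sketch assumes a hypothetical orders table with a last_modified column:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("daily-incremental").getOrCreate()

    # Pull only rows changed since the last successful run (the watermark
    # would normally be read from a control table, not hard-coded).
    last_watermark = "2024-01-01 00:00:00"
    incremental = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://src-db:1433;databaseName=sales")
        .option("dbtable",
                f"(SELECT * FROM orders "
                f"WHERE last_modified > '{last_watermark}') AS delta")
        .option("user", "etl_user")
        .option("password", "***")
        .load()
    )
    # Append the day's delta to the raw zone of the lake.
    incremental.write.mode("append").parquet("/lake/raw/orders")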

Created database objects such as tables, views, stored procedures, triggers, packages, and functions using T-SQL to provide efficient data management and structure.
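
For illustration, one such object created through pyodbc from Python; the Claims table, procedure name, and connection string are hypothetical:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sql-host;DATABASE=claims;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # A simple T-SQL stored procedure returning one member's recent claims.
    cursor.execute("""
    CREATE OR ALTER PROCEDURE dbo.usp_GetRecentClaims
        @MemberId INT
    AS
    BEGIN
        SELECT ClaimId, MemberId, Amount, FiledOn
        FROM dbo.Claims
        WHERE MemberId = @MemberId
          AND FiledOn >= DATEADD(day, -30, GETDATE());
    END
    """)
    conn.commit()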

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and processed it in Azure Databricks.

Executed end-to-end data engineering tasks, incorporating dimensional modeling techniques for data modeling, ensuring data quality and integrity.

Collaborated with cross-functional teams to seamlessly integrate AWS and Azure cloud platforms, optimizing data storage.

Created pipelines in Azure Data Factory to extract, transform, and load data between sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back scenarios.

Facilitated data for interactive Power BI dashboards and reporting purposes.

Championed the adoption of CI/CD tools, streamlining the deployment process and ensuring smooth releases.

Utilized Python, Java, and Scala to engineer robust ETL processes, automating data workflows and reducing manual intervention.

Collaborated with cross-functional teams to integrate Snowflake as a central data warehouse solution, leading to improved query optimization and data analysis capabilities.

Client: Progressive, Mayfield, Ohio

Role: Data Engineer October 2019 – December 2021

Developed and maintained end-to-end operations of ETL data pipelines and worked with large data sets in Azure Data Factory.

Enhanced Azure Functions code to efficiently extract, transform, and load data from various sources such as databases, Restful APIs, and file systems, resulting in optimized data processing and integration.
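
A hedged sketch of the extract-and-transform logic such a function might run; the endpoint and field names are hypothetical:

    import requests

    # Extract: pull a page of records from a REST endpoint.
    resp = requests.get("https://api.example.com/v1/policies",
                        params={"page": 1}, timeout=30)
    resp.raise_for_status()
    records = resp.json()["items"]

    # Transform: keep only the fields downstream consumers need.
    rows = [{"id": r["id"], "premium": float(r["premium"])} for r in records]

    # Load: in the pipeline above this would be written to Blob storage
    # or Azure SQL; printing stands in for the sink here.
    print(rows[:3])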

Integrated on-premises (MySQL, SQL Server) and cloud-based (Blob storage, Azure SQL DB) data using Azure Data Factory, applying transformations, and loading data into Snowflake.
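
The Snowflake side of such a load might look like the following sketch using the snowflake-connector-python package; the account, stage, and table names are hypothetical:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345", user="etl_user", password="***",
        warehouse="LOAD_WH", database="ANALYTICS", schema="STAGING",
    )
    cur = conn.cursor()
    # Load files already staged by Azure Data Factory into a table.
    cur.execute("""
        COPY INTO staging.policies
        FROM @blob_stage/policies/
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    """)
    cur.close()
    conn.close()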

Extracted data from various sources such as HDFS and HBase into Spark RDD, leveraging PySpark for efficient data processing.
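
A minimal PySpark RDD sketch of the HDFS side of this extraction (HBase reads would additionally require an HBase-Spark connector); the path and column positions are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-to-rdd").getOrCreate()

    # Read raw lines from HDFS into an RDD and apply simple transformations.
    lines = spark.sparkContext.textFile("hdfs:///data/policies/*.csv")
    parsed = lines.map(lambda line: line.split(","))
    active = parsed.filter(lambda cols: cols[2] == "ACTIVE")
    print(active.take(5))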

Developed data ingestion pipelines to bring data from various sources into Azure Synapse.

Developed event-driven data pipelines using Azure Event Hubs to ingest and process large volumes of events and messages in real-time, facilitating real-time analytics, monitoring, and alerting.
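
A small consumer sketch using the azure-eventhub Python SDK; the connection string and hub name are placeholders:

    from azure.eventhub import EventHubConsumerClient

    def on_event(partition_context, event):
        # In the pipeline above, the payload would feed real-time
        # analytics and alerting; printing stands in for that here.
        print(partition_context.partition_id, event.body_as_str())
        partition_context.update_checkpoint(event)

    client = EventHubConsumerClient.from_connection_string(
        "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...",
        consumer_group="$Default",
        eventhub_name="telemetry",
    )
    with client:
        client.receive(on_event=on_event, starting_position="-1")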

Designed and implemented data storage solutions using Azure services such as Azure SQL Database, Azure Cosmos DB, and Azure Data Lake Storage.

Built ETL solutions in Databricks by executing notebook code against data in the data lake using Delta Lake, and loaded the results into Azure Synapse SQL pools.
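
A condensed Delta Lake sketch of that notebook flow; the mount paths, column names, and bronze/silver layout are hypothetical:

    from pyspark.sql import functions as F

    # On Databricks, `spark` is provided by the notebook session.
    raw = spark.read.format("delta").load("/mnt/lake/bronze/claims")

    # Standardize types and aggregate into a silver table.
    silver = (raw.withColumn("amount", F.col("amount").cast("double"))
                 .groupBy("member_id")
                 .agg(F.sum("amount").alias("total_amount")))
    silver.write.format("delta").mode("overwrite").save("/mnt/lake/silver/claims_agg")
    # From here the silver table would be loaded into a Synapse SQL pool,
    # e.g. via the dedicated Synapse connector or JDBC.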

Leveraged tools such as Azure Data Factory, Azure Databricks, and Synapse Pipelines for data transformations.

Involved in developing batch processing applications that require functional pipelining using Spark APIs.

Led the development of a robust data pipeline system using Python and Airflow, resulting in a significant reduction in data processing time and improved system efficiency.
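
A skeletal Airflow DAG of the kind described; the DAG id, schedule, and task bodies are illustrative:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull from sources")

    def transform():
        print("clean and aggregate")

    with DAG(dag_id="daily_pipeline",
             start_date=datetime(2024, 1, 1),
             schedule_interval="@daily",
             catchup=False) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t1 >> t2  # transform runs only after extract succeeds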

Introduced proactive monitoring and alerting mechanisms, ensuring the continuous availability of critical data processing workflows.

Client: Walgreens, Deerfield, Illinois

Role: Data Analyst June 2018 – September 2019

Analyzed and extracted data from various confidential databases and data marts using SQL/Oracle (Toad) queries, achieving an average of 90% successful extractions per month.

Developed intricate SQL queries for data analysis and extraction, resulting in the identification of critical hiring trends and a 10% improvement in retention rates.

Conducted quality assurance on tables in databases using Python, achieving a thorough check on 99% of tables and identifying 8 discrepancies, ensuring data integrity during migration.
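
A sketch of this style of table-level QA in Python with pandas and SQLAlchemy; the connection URLs and table names are hypothetical:

    import pandas as pd
    import sqlalchemy

    src = sqlalchemy.create_engine("oracle+cx_oracle://user:pw@src-db/orcl")
    dst = sqlalchemy.create_engine("postgresql://user:pw@dst-db/warehouse")

    for table in ["employees", "requisitions", "offers"]:
        q = f"SELECT COUNT(*) AS n FROM {table}"
        src_count = pd.read_sql(q, src).iloc[0, 0]
        dst_count = pd.read_sql(q, dst).iloc[0, 0]
        # Flag any table whose row counts diverge after migration.
        if src_count != dst_count:
            print(f"discrepancy in {table}: {src_count} vs {dst_count}")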

Collaborated in writing, testing, and implementing triggers, stored procedures, and functions at the database level using PL/SQL, enhancing the functionality and performance of the system.

Maintained accurate data flow documentation and performed object mapping using Power BI tools and Confluence, ensuring 100% accuracy in the representation of data flows.

Actively participated in daily Agile/Scrum meetings, attending 100% of them and providing valuable feedback and project updates.

Created process flow charts, presentations, and defect documentation using JIRA and PowerPoint, contributing 20% of documented processes and 10% of identified defects.

EDUCATION

Master of Science in Computer Science

University of Missouri - Kansas City, MO, US. January 2022 – May 2023

Bachelor of Technology in Civil Engineering

CVR College of Engineering, Hyderabad, India. July 2016 – May 2020

ACADEMIC PROJECTS

Covid Data Analytics, Azure Data Factory August 2022 - December 2022

Utilized Azure Data Factory to build pipelines that extract data from local storage, read data through HTTP requests, apply transformations using Data Flows and Azure Databricks, and move the data to ADLS Gen2.

CERTIFICATIONS

Python Certification

TechBuzz IT Solutions Issued in December 2020

Cyber Security Certification

Cisco Networking Academy Issued in January 2021

Data Science Certification

Internshala Trainings Issued in June 2021


