Varun Kodumuru
Senior Data Engineer
Mobile: +1-603-***-****
Email: *************@*****.***
LinkedIn: linkedin.com/in/varun-kodumuru-90668232b
PROFESSIONAL SUMMARY:
Senior Data Engineer with over 6 years of experience delivering high-impact data solutions across cloud platforms and big data ecosystems. Proficient in optimizing ETL processes to reduce processing time by up to 40%, and in designing data workflows that enhance business decision-making. Skilled in integrating Azure and AWS technologies to streamline large-scale data operations, increasing efficiency and reducing operational costs.
CORE COMPETENCIES: Data Warehousing, Data Lakes, Real-Time Data Integration, Big Data Analytics
Cloud Technologies: Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Event Hubs, Azure Event Grid, Azure Stream Analytics, Snowflake, AWS Glue, Kinesis, Redshift
Big Data & Real-Time Processing: Apache Hadoop, Kafka, Spark, Azure HDInsight, PySpark, Delta Lake, Delta Live Tables
ETL & Data Warehousing: Informatica PowerCenter, SSIS, SQL, Azure Data Lake Storage, Amazon S3, Redshift, SQL Server
Data Governance & Security: Azure Key Vault, Microsoft Purview, RBAC, IAM, AWS Lake Formation
Data Visualization: Power BI, Tableau, SSRS, Amazon QuickSight
Programming Languages: Python, R, SQL, Scala, T-SQL, Java
DevOps & CI/CD: Azure DevOps, Jenkins, GitHub, Airflow, Bitbucket, Maven
Databases: SQL Server, Oracle, PostgreSQL, MongoDB, Teradata, MySQL
PROFESSIONAL EXPERIENCE:
Sr. Azure Data Engineer
Johnson & Johnson, Arlington, TX June 2023 – Present
Designed and implemented scalable, automated ETL/ELT pipelines with Azure Data Factory, enhancing data processing speed by 25% and reducing workflow latency. Leveraged Azure SQL Database, Blob Storage, and Synapse Analytics for streamlined integrations, cutting data preparation time for real-time analytics.
Built event-driven workflows using Azure Event Grid, improving real-time data processing and reducing pipeline latencies.
Led the design and development of SSIS packages to automate data extraction, transformation, and loading from diverse sources into SQL Server and Azure Synapse Analytics, improving data processing efficiency by 25%.
Developed SQL Server stored procedures, views, and complex queries to optimize the loading of healthcare-related data into the data warehouse, ensuring high data accuracy and integrity.
Integrated Teradata with SQL Server through custom ETL processes built in SSIS and Azure Data Factory, supporting scalable data storage and ensuring seamless, timely ingestion from Teradata and other enterprise systems for business reporting and analytics.
Optimized data workflows and reduced processing times by 20% through performance tuning of SSIS packages and SQL Server queries.
Developed PySpark applications in Azure Databricks for large-scale data transformations, optimizing Spark jobs to improve processing efficiency by 25%.
Integrated IoT data using Azure IoT Hub and Stream Analytics for device-to-cloud and cloud-to-device communication.
Created interactive Power BI dashboards, delivering actionable insights for business leaders.
Technologies: Azure Data Factory, Databricks, Synapse Analytics, IoT Hub, PySpark, Power BI
Azure Data Engineer
HSBC Bank, New York, NY September 2020 – November 2021
Led the migration of on-premise SQL databases to Azure Data Lake and Azure SQL Database, optimizing data storage, reducing operational costs, and ensuring seamless integration of existing data flows using SSIS and Azure Data Factory. Integrated real-time data using Kafka and Databricks.
Developed complex SSIS packages for processing large datasets from SQL Server, Oracle, and other legacy systems, automating data ingestion and transformation.
Created and optimized SQL Server stored procedures, views, and functions for efficient data loading and transformation in the ETL process.
Implemented Informatica PowerCenter for managing high-volume data transformations, ensuring consistency and accuracy across data pipelines.
Designed and built SSIS pipelines to extract, transform, and load data from various sources into SQL Server, improving data accessibility and reporting efficiency by 30%.
Integrated Teradata systems into the organization's data architecture using SSIS, enabling advanced reporting and analytics through centralized data processing.
Collaborated with cross-functional teams to troubleshoot ETL issues, streamline processes, and optimize SQL Server and SSIS performance, reducing job run times by 15%.
Ensured data consistency, security, and compliance with industry standards by incorporating data governance practices throughout the ETL processes.
Designed end-to-end data pipelines using Azure Data Factory, ensuring seamless integration with SQL Server, Oracle, Kafka, and REST APIs.
Led the integration of machine learning models into Databricks pipelines using MLlib for predictive insights.
Implemented Delta Lake for data consistency and fault tolerance in complex data pipelines.
Technologies: Azure Data Factory, Databricks, Azure SQL, Delta Lake, Kafka, Power BI, HDInsight
AWS Data Engineer
Alaska Airlines, Seattle, WA August 2018 – September 2020
Managed ingestion and processing of large-scale datasets using AWS Glue, Amazon Redshift, and S3.
Deployed machine learning models in SageMaker, improving prediction accuracy by 30%, and orchestrated workflows using AWS Step Functions.
Developed ETL pipelines with AWS Glue for real-time and batch data integration, improving data processing efficiency by 30%.
Leveraged Kinesis for real-time data streaming, enabling actionable insights.
Technologies: AWS Glue, Redshift, SageMaker, Kinesis, Step Functions, Athena
ETL Developer
Cipla Pharmaceuticals, India June 2016 – August 2018
Developed and optimized ETL packages using SSIS and Informatica PowerCenter, reducing data loading time by 20%.
Created complex SQL queries for healthcare data processing, ensuring high data integrity and performance.
Designed and maintained a data warehouse, providing healthcare teams with ad-hoc reporting capabilities.
Technologies: Informatica PowerCenter, SSIS, SQL Server, Tableau, ERWIN, SSAS, SSRS
EDUCATION:
Master’s in Computer Science, Eastern Illinois University
Bachelor’s in Computer Science and Technology, Siddhartha Institute of Engineering and Technology
PROJECT HIGHLIGHTS:
IoT Data Integration: Led development of an Azure-based real-time data pipeline using Azure IoT Hub, Stream Analytics, and Databricks, integrating large-scale data from connected devices for cloud-based analysis.
Data Lake Migration: Led the migration of on-premise SQL and Oracle databases to Azure Data Lake, optimizing data storage for scalability and reducing query times by 40%.
AWS ETL Engineering: Engineered scalable AWS Glue ETL pipelines for Alaska Airlines, improving data flow, ensuring high availability, and reducing downtime in business-critical operations by 20%.
ETL Optimization: Optimized ETL pipelines using Azure Data Factory, reducing processing times by 40%.