Sudha Gadde
***********@*****.***
Data engineer with 5+ years of experience building scalable data pipelines, optimizing ETL workflows, and implementing data warehousing solutions using Apache Spark, Databricks, Snowflake, and cloud platforms.
Professional Summary:
5+ years of hands-on experience in Big Data Engineering, specializing in building scalable ETL pipelines, data warehousing, and real-time data processing across on-premises and cloud environments.
Proficient in technologies such as Apache Spark, Kafka, Snowflake, Airflow, and Databricks, delivering optimized and high-performance data solutions.
Expertise in designing and implementing distributed data processing solutions using PySpark, Azure Databricks, Python, SQL, and HDFS to handle petabyte-scale datasets.
Designed and implemented solutions on Azure by creating pipelines with Azure Data Factory (ADF), linked services, datasets, Azure Blob Storage, and Azure Databricks.
Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Sqoop, Spark, Spark SQL, Oozie, and Hue.
Experienced in migrating legacy systems to cloud platforms (AWS, Azure), streamlining workflows and improving data scalability and cost-efficiency.
Adept at CI/CD automation using Jenkins, Git, and GitHub, ensuring seamless deployment of data solutions.
Collaborative team player with a proven track record of delivering robust and efficient big data solutions while ensuring data quality and integrity.
Strong expertise in data validation, cleansing, and transformation processes across complex datasets.
Extensive experience in working with relational databases (MySQL, PostgreSQL) and NoSQL systems (HBase) to meet a variety of data storage and retrieval needs.
Hands-on experience in building end-to-end ETL workflows using Azure Data Factory (ADF), Synapse Analytics, and ADLS Gen2 to support real-time and batch data pipelines.
Technical skills:
Big Data Ecosystem: Apache Spark, Kafka, Hadoop (HDFS, Hive), Sqoop, Airflow, Oozie
Programming Languages: Python, SQL
DevOps/Automation: GitHub, Jenkins
Data Warehousing: MySQL, PostgreSQL, Snowflake
Cloud Platforms: Azure, AWS
Certifications:
Databricks Certified Associate Developer for Apache Spark.
Microsoft Azure Data Engineer Associate.
Professional Experience:
Client: Cloudwick Technologies Aug 2024 – Present
Role: Data Engineer
Responsibilities:
Engineered batch and real-time data pipelines using Apache Spark, Kafka, and Airflow for high-volume data ingestion.
Automated data transformation logic in Azure Databricks, integrating processed datasets into Snowflake for business reporting.
Managed SQL-based data solutions, including query optimization, stored procedures, and data modeling, to support analytics.
Built CI/CD pipelines using Jenkins and Git, improving deployment efficiency and reducing errors across data workflows.
Environment: Apache Spark, Kafka, Airflow, Databricks, Snowflake, Azure Data Factory, Python, SQL.
Client: Kensium Software Solutions Jan 2017 - Oct 2021
Role: Data Engineer
Responsibilities:
Designed and managed HDFS clusters, ensuring fault tolerance and high availability for large-scale data processing.
Refactored and optimized SQL queries in PostgreSQL, enhancing data extraction and transformation performance.
Implemented partitioning and bucketing strategies in HDFS and PostgreSQL, reducing storage costs and improving query efficiency.
Utilized GitHub for version control, streamlining team collaboration and minimizing code conflicts during deployments.
Environment: Apache Spark, Python, PostgreSQL, Airflow, GitHub.
Academic Projects
Breast Tumor Trend Analysis (ML Capstone Project)
Developed classification models (Logistic Regression, Decision Tree, Random Forest) for benign vs. malignant tumor prediction.
Achieved 88.61% accuracy using Random Forest, and validated model performance via precision, recall, and F1-score.
Implemented in Python with visualization libraries for data-driven healthcare insights.
Big Data Projects (University):
Processed large datasets using Apache Pig, Hive, and MapReduce for exploratory analysis and reporting.
Migrated legacy data systems with Sqoop, improving data accessibility and reporting accuracy.
Implemented partitioning and bucketing in Hive for efficient query performance.
Built real-time dashboards using Spark and SQL for flight delay prediction and revenue analysis.
Education Aug 2022 – Dec 2024
St. Cloud State University, School of Business, St. Cloud, MN
Master of Engineering Management (GPA: 3.83)