Data Engineer

Location:

Snoqualmie, WA

Posted:

March 17, 2024

Resume:

BHUVANENDRA JONNALAGADDA

Seattle, WA 716-***-**** ************@*****.*** LinkedIn GitHub

SUMMARY

Master’s graduate seeking a full-time Data Engineer/Data Analyst role, with hands-on experience in AWS and Azure cloud, Apache Spark/Kafka/Flink, Snowflake, Tableau/PowerBI, ETL, Airflow, Docker, Linux, and Git. Solid programming skills in Python, Java, and SQL.

SKILLS

Programming Languages: Python, Java, C, R, SQL, NoSQL, TypeScript, and Flask.

Distributed Computing & Version Control: Apache Spark/Kafka/Flink, Git, Linux, Airflow, and Docker.

Data Analytics & Visualization: DBT, Tableau/PowerBI, PySpark, MS Excel, Pandas, NumPy, Matplotlib, and Seaborn.

AWS Cloud: AWS Glue, Athena, Redshift, EMR, QuickSight, Lambda, EC2, RDS, S3, CloudFront, ECR, Kinesis, DynamoDB, SQS, SNS.

Azure Cloud: Azure ADF, Data Lake, Databricks, HDInsight, Azure SQL, and Azure Synapse.

Databases & Data Warehousing: Snowflake, Postgres, SQL Server, Cassandra, MongoDB.

Machine Learning: Statistical Data Mining/Analysis (Hypothesis Testing, Time Series, Experimental Design), Regression, Classification, Clustering, Neural Networks, and NLP (Natural Language Processing).

Software Development: Object-Oriented Programming, Agile, Scrum, Test-Driven Development.

Certifications: AWS Certified Solutions Architect Associate – SAA C03, Snowflake Masterclass, Data visualization using Tableau.

WORK EXPERIENCE

Software Developer, Vision Computers Inc, India May 2021 – Jul. 2022

• Implemented data transformation logic within the Apache Airflow data ingestion pipeline to process millions of customer transaction records using Apache Spark, contributing to an 18% reduction in processing time.

• Configured Airflow to run automatic data quality checks and monitoring on incoming raw invoice data using Spark Streaming.

• Integrated the pipeline with Snowflake and used dbt to define transformations and model the data within the Snowflake data warehouse for downstream analytics.

Undergraduate Research Assistant ESU University, Penn, USA (Remote) Jan. 2021 – Jul. 2021

• Contributed to a research thesis on optimizing time series methodologies, yielding a 30% improvement in data accuracy.

• Applied Python programming skills to analyze time series data, utilizing statistical techniques such as Simple Linear Regression, Decision Trees, and Random Forests, resulting in a 20% increase in data-driven insights.

EDUCATION

University at Buffalo – SUNY, Buffalo, NY Aug. 2022 – Dec. 2023

Master's, Data Science and Its Applications GPA: 3.7

Vel Tech University, India Aug. 2018 – May 2022

Bachelor of Technology, Computer Science and Engineering GPA: 9.18, Dean’s list

PROJECTS

Automated Snowflake Data Pipeline for Data-Driven Insights (deeply unstructured to structured)

• Developed an end-to-end data cleaning and transformation pipeline in Snowflake.

• Designed and implemented fact and dimension tables to facilitate data exploration.

• Automated the entire pipeline using Snowflake tasks and streams to run every 5 minutes, minimizing manual intervention in loading data from an external stage (AWS S3) into Snowflake and storing the cleaned data in a final staging area for visualization.

• Designed dynamic, interactive Tableau dashboards and automated the connection so that visualizations update immediately.

ETL pipeline for ECDC data using Azure Data Factory

• Implemented an automated ETL pipeline on COVID-19 data using Azure Data Lake Gen 2 with daily triggers and lookups, reducing operational workload by 4 hours per week. This enabled scheduled data updates, failure monitoring, and alerting.

• Streamlined data flow for Power BI reports through this cloud-based solution and its data transformations, enabling real-time insights and 10% faster report generation to support better decision-making.

Performance Analysis: A Comparison Study of Machine Learning and Deep Learning Algorithms

• Increased sales forecasting accuracy by 13 percentage points (from 77% to ~90%) by developing an XGBoost forecasting model, and cut training time by 15% through early stopping on validation loss, hyperparameter tuning, and subsampling.

• Leveraged a linear activation function in the hidden layers of a deep neural network model for regression, resulting in a 40% reduction in training epochs, leading to faster model development cycles.

Retail Sales Forecasting with XGBoost: Optimizing Inventory and Maximizing Revenue

• Developed a highly accurate (87%) retail sales forecasting model using XGBoost with advanced hyperparameter tuning (grid search) and ensemble methods (stacking, bagging).

• Engineered features for seasonality indicators, promotional effects, and year-on-year variability, which further improved the model by 9%.
