Data Engineer Science

Location:

Halethorpe, MD

Salary:

60k-70k

Posted:

April 24, 2025

Resume:

Bhagavath Sai Darapureddy

571-***-**** Arlington, VA 22202 ***********************@*****.*** LinkedIn GitHub

EDUCATION

George Washington University

MS in Data Science - GPA: 3.70 08/2023 – present

Vignan’s Foundation for Science, Technology & Research

Bachelor’s in Computer Science & Eng. - GPA: 8.26 05/2017 – 04/2021

EXPERIENCE

Tata Consultancy Services — Data Engineer, India 07/2021 – 06/2023

• Contributed to the modernization of Boeing’s data infrastructure and applications, migrating legacy systems to modern cloud-based platforms using AWS services.

• Developed Infrastructure as Code (IaC) scripts with AWS CloudFormation to automate resource provisioning and support large-scale data migration initiatives.

• Migrated and validated MS-SQL Server databases from on-premises to Boeing’s internal AWS Cloud environment using AWS Database Migration Service (DMS), ensuring secure and efficient data transfer.

• Optimized data workflows and re-architected data models to align with cloud-native standards, improving system scalability, performance, and maintainability.

• Collaborated with cross-functional engineering and DevOps teams to troubleshoot migration issues, resolve schema inconsistencies, and ensure seamless integration of modernized data systems.

PROJECTS

Exploratory Data Analysis of NYC Yellow Taxi Trip Data with PySpark Streaming — Data engineering Project 12/2024

• Conducted large-scale data analysis on 48M+ NYC Yellow Taxi trip records using PySpark Streaming, enabling real-time insights into fare trends, passenger behavior, and trip patterns.

• Designed and optimized streaming queries for data aggregation, geospatial mapping, and statistical trend analysis, while handling computational constraints efficiently.

• Developed interactive visualizations using Folium and Matplotlib, showcasing trip distributions, fare fluctuations, and passenger density, contributing to scalable real-time analytics.

MLOps Pipeline for Screentime Data Analysis using Apache Airflow — Data engineering Project 11/2024

• Designed and automated an MLOps pipeline using Apache Airflow DAGs to preprocess app usage data, extract temporal features, and perform feature scaling and encoding for model training.

• Developed and trained a Random Forest Regressor to predict daily app usage, optimizing performance and achieving a Mean Absolute Error (MAE) of 15.4 minutes.

• Implemented a scalable and reproducible workflow with automated data ingestion, preprocessing, model training, and evaluation, ensuring seamless execution and scheduling.

AWS-Integrated ETL Pipeline for Fisheries Data Processing — Data engineering Project 01/2025

• Built a serverless ETL pipeline using AWS Glue, S3, and Athena, transforming 561,675+ records from fisheries datasets into optimized Parquet format for efficient querying.

• Developed an AWS Glue Crawler to automate schema inference and metadata extraction, enabling seamless data cataloging and integration with Athena.

• Queried and analyzed fisheries data using Amazon Athena and SQL, creating Athena Views to extract insights on global fish catch trends from 1950 to 2018.

• Configured an AWS Cloud9 IDE for data preprocessing using Python (Pandas, PyArrow) and optimized query performance with Athena Query Federation and dataset partitioning.

SKILLS

Programming: Python, SQL, PySpark, Pandas, NumPy, Data Modeling

ETL & Workflow: Apache Airflow, DBT, Apache Flink, Data Pipelines, Delta Lake, Terraform, CI/CD, Docker, Kubernetes

Big Data Technologies: Apache Spark, Hadoop, Hive, Kafka, Flink, Parquet, ORC

Databases: PostgreSQL, MySQL, MongoDB, Snowflake, Google BigQuery

Cloud Platforms: AWS (S3, Redshift, Glue, EMR, Lambda, Athena, Kinesis, RDS, DynamoDB, Step Functions), Azure Data Factory, Databricks

CERTIFICATIONS

• Completed Google Advanced Data Analytics Certification
