Data Science Quality

Location:
Richmond, VA
Posted:
June 13, 2025

Resume:

Sai Prerana Mandalika

*************@*****.*** +1-240-***-**** Washington, DC https://www.linkedin.com/in/mandalikap/

WORK EXPERIENCE

Medequip, Inc Jun 2024 – Aug 2024

Data Science Intern Washington, USA

Designed and implemented modular data models & ETL pipelines using DBT and Snowflake, transforming raw Medisoft EMR billing data into clean, analytics-ready tables for Billing teams, reducing manual data prep by 50%.

Built automated data quality checks and anomaly alerts using DBT tests, orchestrated through Dagster, which improved pipeline reliability and reduced claim processing errors by 40%.

Developed dynamic Tableau dashboards and KPI reports to monitor patient collections, denial codes, and reimbursement trends, enabling real-time insights and reducing revenue cycle turnaround time by 35%.

Georgetown University Nov 2023 – Jan 2024

Graduate Research Assistant Washington, USA

Built a deep learning pipeline using structured EHR data to predict Type 2 diabetes onset, attaining an AUC-ROC score of 0.80 and improving early risk detection.

Developed a stratification framework using stepwise logistic regression (AUC-ROC: 0.83) and KMeans clustering (Silhouette Score: 0.62), segmenting EHR data into clinically distinct subgroups.

Micron Technologies, LLC June 2021 – Feb 2023

Software Developer Hyderabad, IN

Ran ad hoc SQL queries to clean, transform, and render transactions between MySQL and external systems.

Developed scalable ETL pipelines using DBT and Snowflake, orchestrated via Apache Airflow, to automate processing and data quality checks for 1M+ daily sensor records used in manufacturing analytics.

Designed Tableau dashboards for real-time monitoring of sensor data, enabling proactive issue detection.

RELEVANT PROJECTS

GenAI Powered Data Engineering Agent Jan 2025 – May 2025

Built a data engineering agent using OpenAI LLMs to automate ETL workflows, generating Python code for serialization/deserialization and enabling seamless interaction via Streamlit and FastAPI, boosting interoperability.

Built end-to-end monitoring using AWS CloudWatch, capturing metrics across Glue, S3, and OpenAI APIs, including job success rates, model latency (1.2s), and schema inference accuracy (90%), and reducing ETL failure diagnostics by 40%.

Smart City End-to-End Data Pipeline April 2025 – Present

Designed and implemented a real-time Smart City data streaming pipeline, integrating Apache Kafka, ZooKeeper, and Spark within Dockerized environments to enable scalable data ingestion and processing.

Leveraged AWS Glue for ETL, Athena for querying, and Redshift for centralized storage, and configured IAM roles for secure access control.

State Analysis Big Data Project Jul 2024 – Dec 2024

Developed a scalable sentiment analysis system across US states by processing 12M subreddit records with AWS S3, Glue, and Redshift, applying a PySpark pipeline and fine-tuning NLP models to boost sentiment classification accuracy by 25%.

Analyzed large-scale regional data in Redshift to uncover socio-political sentiment trends.

EDUCATION

Georgetown University Aug 2023 – May 2025

Master of Science in Data Science and Analytics Washington, USA

Current GPA: 3.95

Achievements: Returning Student Scholarship

Chaitanya Bharathi Institute of Technology Aug 2017 – Jun 2021

Bachelor of Science in Computer Science and Engineering Hyderabad, IN

Achievements: National Level Smart India Hackathon Winner

TECHNICAL SKILLS

Data Analysis & Visualization: Python, R, SQL, C, PostgreSQL, Tableau, Looker, JavaScript, Power BI, Airflow, Git, DBT

Cloud Technologies: AWS SageMaker, EC2, Bedrock, CloudWatch, Kafka, PySpark, Glue, Athena, Redshift, FastAPI

AI/Machine Learning & Statistical Modeling: Pandas, NumPy, Regression, TensorFlow, PyTorch, NLP, Hypothesis testing


