NAVYA SONTI
Boulder Colorado 720-***-**** **********.*@*****.***
Summary
I am a results-driven Data Analyst with 3+ years of experience translating complex data into actionable insights that drive business outcomes. Proficient in Python, R, SQL, and Excel (PivotTables, VBA), with hands-on experience in data cleaning, analysis, and visualization using Power BI. Skilled at delivering clear, strategic reports to both technical and non-technical audiences. Adept at working under pressure, meeting tight deadlines, and quickly adapting to new tools and technologies to support performance optimization and business growth.
Skills
• Programming Languages & Tools:
Python, R, SQL, PostgreSQL, SQL Server Management Studio, Power BI, Excel (PivotTables, VBA), SAP, Jupyter Notebook, VS Code
• Machine Learning & Analytics:
Scikit-learn, TensorFlow, Pandas, NumPy, XGBoost, Hypothesis Testing, Regression Analysis, Statistical Modeling, Revenue Forecasting
• Data Visualization:
Power BI, Excel, Spotfire, Tableau
• Model Deployment & MLOps:
Docker, MLflow, Streamlit, FastAPI
• Cloud & Automation:
AWS Glue, AWS Lambda, AWS S3, Azure Data Factory, Databricks
• Big Data & Processing:
PySpark, Apache Spark, Delta Lake, Data Transformations, Data Preprocessing, Optimization
• API & Data Formats:
REST APIs, GraphQL, JSON, XML
• Collaboration & Project Management:
Git, GitHub, Jira, Cross-functional Projects, Communication, Leadership, Managing Teams
Experience
Data Analyst 12/2024 to 03/2025
Occidental Petroleum Platteville Colorado
Queried SQL Server through SQL Server Management Studio (SSMS) to identify and extract incorrect or mismatched oil and gas equipment data for Quality Assurance/Quality Control (QA/QC), and used Spotfire to validate the data and generate actionable insights.
• Queried SQL Server to identify and extract QA/QC discrepancies in oil and gas asset data and implemented automated issue flagging using Spotfire for proactive resolution.
• Performed data cleaning using Excel functions (VLOOKUP, XLOOKUP), conditional formatting, and Power BI's Power Query to align SQL and SAP datasets for accurate equipment tracking.
• Mined historical QA/QC logs using SQL window functions and pattern matching to identify repeated error types and failure points in equipment data.
• Designed targeted visualizations to highlight mismatched records, improving transparency and operational data integrity.
• Used Power BI, Spotfire, and Excel (macros, VBA) to develop streamlined reporting pipelines, enhancing efficiency and reducing manual effort.
• Created dynamic Power BI dashboards to visualize discrepancies between SQL and SAP records, supporting rapid error detection, root cause analysis, and data correction.
Data Scientist 04/2024 to 12/2024
Virufy.org San Francisco California
Developed and deployed a CNN-based model in TensorFlow to detect COVID-19 from cough audio samples, resulting in a 10% improvement in input data quality and model performance.
• Automated real-time data ingestion and model serving pipelines using AWS Lambda and AWS S3, enabling scalable and efficient processing.
• Integrated RESTful APIs to facilitate dynamic data retrieval and seamless model updates across systems.
• Utilized SQL and Python (Pandas, NumPy) within Jupyter Notebook for data preprocessing, feature engineering, and transformation.
• Cleaned raw audio data using amplitude thresholding, silence removal, and spectrogram normalization for consistent input quality.
• Mined features from cough audio using MFCC (Mel-frequency cepstral coefficients) and extracted patterns using PCA and t-SNE for exploratory analysis.
• Ensured data integrity and reproducibility using Git for version control and Jira for model lifecycle tracking.
• Resolved class imbalance by applying SMOTE (Synthetic Minority Over-sampling Technique) and implemented GANs to generate synthetic audio samples.
Data Scientist Intern 01/2023 to 05/2023
Virufy.org San Francisco California
Led a cross-functional team of five in the design and development of the foundational architecture for a Convolutional Neural Network (CNN)-based COVID-19 detection system, utilizing audio cough samples for early diagnosis.
• Developed and maintained interactive dashboards in Power BI to monitor key ML metrics like accuracy, precision, recall, and F1-score, enabling rapid iteration cycles.
• Cleaned training data using Z-score based outlier removal, normalization, and manual audio tagging to reduce noise and bias.
• Mined patterns in labeled datasets using Chi-square feature selection and audio clustering with DBSCAN to refine input features.
• Collaborated with stakeholders to interpret analytical findings and provide data-driven recommendations, leading to a 10% improvement in model performance.
• Documented technical workflows and model development processes to ensure reproducibility and knowledge transfer across teams, and contributed to a comprehensive machine learning development guide for future projects.
Data Analyst 01/2022 to 08/2022
University of Colorado Boulder Boulder Colorado
Led a data-driven initiative to optimize event logistics and financial planning by building predictive models and automating reporting processes. The project supported decision-making for resource allocation, reduced operational inefficiencies, and improved overall planning accuracy across the events department.
• Built predictive forecasting models using Python and SQL to enhance logistics planning accuracy by 10%.
• Performed comprehensive data cleaning and joined logistics and attendance data using Pandas (dropna, fillna, merge, groupby) and SQL joins, preparing historical event and financial data for modeling and ensuring consistent, accurate forecasting.
• Mined historical event trends using time-series decomposition, moving averages, and seasonality patterns to forecast attendance, which led to a reduction in food waste.
• Automated reporting workflows in Power BI and Excel (macros, PivotTables, VBA), reducing processing time by 25%.
• Conducted operational audits with R (using dplyr, ggplot2) and Excel to uncover budget discrepancies.
• Utilized Jupyter Notebook and Git for documentation, collaboration, and reproducibility.
• Collaborated with the department manager and student leads to align planning outputs with real-time capacity and budget constraints.
Data Analyst 06/2020 to 07/2021
PhyCare Solutions, Inc. Mangalagiri Andhra Pradesh
Focused on financial optimization through forecasting, executive reporting, and advanced analytics. Leveraged large-scale healthcare datasets to identify cost-saving opportunities and support data-driven business strategies.
• Conducted financial analysis on healthcare supply chain and billing data using SQL Server and Excel (VBA) to identify and reduce revenue leakage and improve profitability.
• Cleaned billing records using SQL CASE statements, COALESCE, NULL checks, and automated Excel macros for structured inputs.
• Mined operational patterns with decision trees and the Apriori algorithm (in Python) to detect cost drivers and billing anomalies.
• Created forecasting models in Jupyter using Python (Pandas, NumPy) and SQL, resulting in a 5% cost reduction.
• Built executive-level Power BI dashboards to visualize financial and operational KPIs, improving strategic decision-making efficiency by 20%.
• Ensured data accuracy by implementing SQL data validation processes and integrating REST APIs for external data sources.
• Used GitHub for version control and Excel for preprocessing, automation, and reporting.
Data Analyst Intern 12/2019 to 05/2020
PhyCare Solutions, Inc. Mangalagiri Andhra Pradesh
Supported the analytics team in building dashboards, optimizing SQL queries, and conducting statistical analysis to enable financial and operational reporting. Focused on automating KPI tracking and uncovering data-driven insights through R and Power BI.
• Automated KPI dashboards using Excel (VBA, PivotTables, and dynamic formulas) and Power BI, reducing weekly reporting time by 30%.
• Cleaned patient and claim records using R functions (na.omit, mutate, select) and Excel VLOOKUP/XLOOKUP.
• Mined clinical and billing trends using K-means clustering and association rule mining (arules package in R).
• Assisted in optimizing SQL query execution plans and built parameterized reports for finance.
• Used Jupyter Notebook and REST APIs for custom JSON data pulls and reporting integration.
Education
Master of Science: Data Science 12/2023
UNIVERSITY OF COLORADO BOULDER COLORADO, UNITED STATES
• Coursework: Data Structures and Algorithms, Ethical Issues in Data Science, Cybersecurity for Data Science, Statistical Methods and Applications, Data Mining, Machine Learning, Information Visualization, Natural Language Processing (NLP).
• Certificates: NVIDIA - Building Transformer-Based Natural Language Processing Applications, Preparing Data for Analysis with Microsoft Excel, Harnessing the Power of Data with Power BI
• GPA: 3.65/4
Bachelor of Technology: Electronics and Computer Engineering 04/2020
K L UNIVERSITY ANDHRA PRADESH, INDIA
• GPA: 3.45/4
LinkedIn Profiles
• https://www.linkedin.com/in/navya-sonti