NIHARIKA YENNUM
Email: **********@*****.***
Contact No: 903-***-****
CAREER SUMMARY
Data Scientist with 9+ years of expertise in data analysis, predictive modeling, and statistical analytics using Python, SAS, and SQL. Strong background in exploratory data analysis, machine learning, and predictive analytics. Adept at designing and implementing statistical models and developing data-driven insights to support decision-making.
Proficient in advanced statistical and machine learning techniques, including Analysis of Variance (ANOVA), Item Response Theory, Hypothesis Testing, Regression Analysis, Design of Experiments, Logistic Regression, Decision Trees, Monte Carlo Techniques, Multivariate Analysis, Cluster Analysis, Time Series Analysis, Stochastic Modeling, and A/B Testing.
Hands-on experience with big data tools such as Hive, Sqoop, Spark RDD, and Spark SQL. Skilled in managing large-scale databases, optimizing SQL queries, and writing complex stored procedures across SQL Server and Oracle.
Expertise in data visualization using Tableau and Pivot Tables, transforming raw data into actionable insights through interactive and visually compelling dashboards. Strong ability to quickly adapt to new technologies and business domains.
EDUCATION
Master’s in Statistical Analytics, Computing, and Modeling, Texas A&M University (Jan 2016–May 2017)
Integrated Master’s in Mathematical Sciences, University of Hyderabad, India (Jul 2008–Jul 2013)
THESIS WORK/PUBLICATIONS
Published research work on Structured Survey Interviewing, developing a qualitative proportional randomized response model to reduce bias in survey responses. (Authors: Niharika Yennum, Dr Stefen A Sedory, Dr Sarjinder Singh)
https://www.academia.edu/94339301/Improved_strategy_to_collect_sensitive_data_by_using_geometric_distribution_as_a_randomization_device
CERTIFICATIONS
Python Programming Certification (Udemy)
SAS Certified Statistical Business Analyst Using SAS 9 (License No: SBARM003586v9)
SAS Certified Base Programmer for SAS 9 (License No: BP072794V9)
SQL Server Certification (MCSA)
WORK EXPERIENCE
Sr Data Scientist NMDP/CIBMTR research group Sep 2019–Mar 2025
Designed and deployed a machine learning model to predict cancer outcomes, enabling physicians to make data-driven clinical decisions.
Implemented regression and ensemble learning techniques, including bagging (Random Forest) and boosting (XGBoost), improving model accuracy by 13% and supporting more efficient clinical decision-making.
Developed and automated ETL pipelines using AWS Glue, integrating data from multiple sources into S3, analyzing it in Glue, and loading it into Oracle DB.
Applied custom distance functions for data imputation, improving the completeness and reliability of datasets.
Conducted model validation and hyperparameter tuning to optimize predictive performance.
Restructured interactive Tableau visualizations and automated the process for updating content in standardized dashboards.
Created mapping documents and SQL procedures to ensure accurate data processing and adherence to business rules.
Technologies: Python (Scikit-learn, Pandas, NumPy, Seaborn, Matplotlib), PySpark, Base SAS, SAS/STAT, AWS (S3, Lambda, Glue, SageMaker), Hadoop, SQL (PL/SQL, Oracle DB), Tableau, Git, Jira, Confluence.
Director of Psychometrics/Statistician Arizona Department of Education Apr 2018–Sep 2019
Engineered a classification model in Python to predict student performance from 50+ features; the model achieved 90% accuracy in identifying at-risk students.
Led data acquisition, preprocessing, and feature engineering efforts, collaborating with data scientists to optimize model performance.
Provided statistical guidance in test construction, item selection, and validation using advanced techniques such as Regression and Rasch models.
Quantified the impact of student accommodations via ANOVA and A/B testing on 8,000+ student records; identified 3 statistically significant biases affecting traditionally underserved student populations.
Reviewed stored procedures and wrote test queries to validate data accuracy across SQL Server and Oracle-based data marts.
Technologies: Python (Scikit-learn, NumPy, Pandas, Matplotlib), PySpark, Base SAS, SAS/STAT, Hive, Tableau, Microsoft SQL Server, Azure (Blob Storage, Data Factory), Git, Jira, Confluence.
Sr Statistical Analyst AMEX, AZ Aug 2017–Apr 2018
Developed, scored, and monitored custom ML models for credit scoring, marketing campaign analysis, and fair lending to help banking clients increase conversion rates.
Prepared modeling documents (MDDT), ongoing performance monitoring documents (OPM), and ad hoc analyses for model explainability.
Analyzed large datasets using SQL and SAS, identifying key trends and insights for the credit risk team.
Collaborated with a senior data scientist to improve model performance, achieving a 15% reduction in processing time.
Delivered presentations on data findings and model results to stakeholders, facilitating informed decision-making.
Technologies: Python (Scikit-learn, NumPy, Pandas, Matplotlib, Seaborn), Base SAS, SAS Macros, Hive, Alteryx, Microsoft SQL Server, Hadoop.
Data Scientist/Risk Analyst Igreen Systems, India Jun 2013–Dec 2015
Developed predictive models, scorecards, and segmentation schemes that predict customer behavior, such as delinquency, payment rate, and profitability, for regional banks.
Designed and implemented Python and SAS programs to clean, transform, and harmonize datasets for predictive modeling.
Created KPIs to assess yearly growth and inform financial risk policies.
Optimized SQL queries that aggregate and transform data for reports analyzing the drivers of loss estimates.
Generated ad hoc monthly and quarterly reports on the risk portfolio.
Technologies: Python (Scikit-learn, Pandas, NumPy, Matplotlib), Hadoop, Tableau, SAS (Base SAS, SAS/Access), SQL (Microsoft SQL Server), Excel Pivot Tables.