Statistician

Location:
New York, NY
Posted:
June 16, 2023

Resume:

Pranjal Srivastava

*** ****** ******, *, ****** City, NJ - 07306

908-***-**** adxqw2@r.postjobfree.com LinkedIn GitHub

EDUCATION

New York University, New York, USA Sept 2021 - May 2023
Master of Science in Biostatistics

Rama University, Kanpur, India Sept 2015 - Sept 2020
Bachelor of Dental Surgery

WORK EXPERIENCE

New York University, New York, USA May 2022 - Current
Research Statistical Analyst, mHealth Lab, Dr. Thomas Kirchner

• Performed ETL on raw clinical data in a relational database to investigate smoking and vaping trends and identify social determinants of smoking and related diseases, as part of the NIH All of Us study.

• Used SQL and R to wrangle and clean large clinical and EHR datasets (331,303 rows) in a secured Google Cloud Platform environment, using a relational database and R statistical packages.

• Used R to create data visualizations that communicated complex relationships among neighborhood factors, tobacco use patterns, and tobacco-related disease outcomes.

New York University - School of Global Public Health, New York, USA Jan 2023 - May 2023
Graduate Teaching Assistant: Regression Analysis

• Developed weekly review sessions for 80+ graduate students, providing targeted guidance on linear regression topics and facilitating practice problems; maintained student records in Google Sheets and Excel.

• Collaborated with the instructor and co-assistants to refine course materials, including 10 assignments, exams, and lecture notes.

• Provided feedback on student performance, contributing to high student satisfaction, improved performance, increased retention rates, and positive feedback from both students and faculty.

NYU Langone Health, New York, USA Jun 2022 - Dec 2022
Graduate Researcher - Biostatistical Consulting, Dr. Phuong Thao Le

• Collaborated with the research team to clean and wrangle longitudinal survey data from 300 cancer patients at cancer hospitals in Vietnam, within the framework of the Stronger Together Program (STEPPS).

• Led the team, conducted a data analysis comparing the intervention with usual care, used REDCap for data management, and calculated the internal consistency of the SF-12 scale in R (Cronbach’s alpha = 0.86).

• Created tables and figures for scales stratified by 3 stakeholders and communicated with the supervisor to interpret results.

PROJECTS

Oral Cancer Prediction Model (R, SQL, Jupyter Notebook) Feb 2022 - Apr 2022

• Conducted feature selection, then built and compared multiple supervised machine learning algorithms to predict oral cancer risk using a large dataset from the National Institutes of Health (0.2 million samples).

• Cross-validated and tuned the random forest model (mtry = 6), improving robustness and increasing prediction accuracy by 3%.

Hybrid Music Recommender System (Collaborative Filtering & Matrix Factorization) Feb 2023 - May 2023

• Engineered a collaborative filtering music recommender system using Python, Hadoop, and PySpark, leveraging the Alternating Least Squares (ALS) algorithm for matrix factorization and LightFM for hybrid recommendations.

• Optimized model performance through hyperparameter tuning, used the Annoy library for efficient recommendation retrieval, and evaluated the model using RMSE, MAP (9.23), NDCG (0.0012), and precision_at_k metrics.

COVID-19 US States’ Exploratory Data Analysis Project (R, SQL) Jan 2022 - Feb 2022

• Used R to clean the CDC's US counties COVID-19 data (2,105,876 rows) to make it suitable for analysis.

• Performed exploratory data analysis and data visualization, including US map plots and histograms, to gain insight into mortality rates and identify the 10 states with the highest mortality rates.

Correlations in Health and Daily Activities (SAS Studio) Feb 2023 - Apr 2023

• Conducted statistical analysis and visualization of a class survey dataset (64 samples) to identify potential variables (daily habits, health status, and location) correlated with age (p-value < 0.001).

SKILLS

• Programming: R, Python, SQL, Stata, SAS, Hadoop (HDFS and MapReduce), LaTeX

• Tools: Tableau, REDCap, Qualtrics, GitHub, Jupyter Notebook, SAS Studio, Microsoft Excel, Microsoft Office, Linux, Dataproc

• Certificates: Biomedical Research (CITI), Good Clinical Practice (CITI GCP), Social/Behavioral Research (CITI)

• Others: CDISC (SDTM) standards, Project Management, Data Mining, Data Science, Data Wrangling, Data Quality

RELEVANT COURSEWORK

Biostatistics, Machine Learning, Big Data, Regression and Modeling, Research Methods, Epidemiology, Applied Statistical Modelling, Missing Data, Bayesian Inference, Survival Analysis, Statistical Inference, Predictive Modeling, Clinical Data Management, Dashboards (R Shiny)


