Data Scientist

Location:

Richmond, VA

Posted:

May 04, 2025


Resume:

Krishna Priya Nemalikanti

513-***-**** (M)

*************@*****.***

Professional Summary:

Highly motivated, visa-independent Data Science professional with 7.5+ years of experience in the healthcare and retail domains, with expertise in data analysis, data integration, machine learning, and software programming. Also has a strong background in quantitative research.

Experienced in data cleaning and preparation, data wrangling (combining and merging datasets, reshaping and pivoting data), aggregation, grouping, multi-indexing, and data optimization using the Python pandas and NumPy libraries.

Expertise in advanced database management, including writing medium to complex SQL queries, performance tuning of PL/SQL objects, and data integration across multiple database systems.

Experienced in data transformation and visualization with ggplot2; running and interpreting simple linear, multiple, and logistic regression; and assessing multiple regression model fit in R.

Experienced in using Tableau to connect data from various sources and develop interactive dashboards that let users explore data and convey insights.

Experienced in building academic machine learning projects using the scikit-learn library.

Certified AWS Cloud Practitioner with a strong passion for data science and a drive to leverage data insights to inform business decisions.

Educational Qualifications:

Master’s (Graduate) degree, 2011, Ambedkar University Delhi, INDIA

Bachelor’s (Undergraduate) degree, 2009, Delhi University, INDIA

Certifications:

AWS Certified Cloud Practitioner (2024)

https://www.credly.com/badges/553ab166-0767-49df-9c75-5978d0f57901/public_url

Technical Skills:

Programming/Procedural Languages

Python, R, SQL, PL/SQL

Machine Learning/Big Data Technologies

Scikit-learn library

RDBMS

Postgres, Oracle, SQL Server

Operating Systems

Windows, UNIX, Linux

Version Control Tools

Git

Data Integration & Visualization Tools

IBM DataStage 11.x, Tableau

Project Mgmt. Software

Jira, Rally

Other tools/technologies

Jupyter Notebook, Databricks, Snowflake

Cloud Environment

AWS

Previous Work Experience:

(1) Data Scientist, Compass Youth & Family Services, November 2023 – Present

Tools: Python (Programming Language) · Pandas · NumPy · Scikit-learn · Matplotlib · Jupyter Notebook · Data Visualization · Databases · Data Analysis · SQL · PostgreSQL · Data Science

Description: A model for predicting students' final grades to improve academic intervention strategies. The project was implemented with NumPy 1.24.4, pandas 2.0.3, and scikit-learn 1.3.0. Its goal was to develop and evaluate predictive regression models (with and without previous-term grades) of students' final-term performance using Linear, Lasso, and Support Vector Machine regression. The work began with exploratory data analysis: studying the attributes, visualizing them with Matplotlib, and analyzing correlations between numeric attributes. The data was then preprocessed through pipelines and feature engineering: a custom transformer was built, along with numeric, categorical, and ordinal pipelines, which were combined through a ColumnTransformer. With the training data transformed, the regression models were initialized and scored with 3-fold cross-validation using RMSE. The best-performing model was fine-tuned with grid search, and the models were finally evaluated on the test set. Performance was measured with RMSE (root mean squared error) and R2. Models that included previous-term grades performed better on these metrics.
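The workflow described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names (`absences`, `prev_grade`, `school`, `study_time`, `final_grade`), the hyperparameter grids, and the 80/20 split are all assumptions, and the custom transformer step is omitted for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "absences": rng.integers(0, 30, n),
    "prev_grade": rng.uniform(0, 20, n),
    "school": rng.choice(["A", "B"], n),
    "study_time": rng.choice(["low", "medium", "high"], n),
})
df["final_grade"] = (
    0.8 * df["prev_grade"] - 0.1 * df["absences"] + rng.normal(0, 1, n)
)

X = df.drop(columns="final_grade")
y = df["final_grade"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Numeric, categorical, and ordinal branches combined in one ColumnTransformer.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["absences", "prev_grade"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["school"]),
    ("ord", OrdinalEncoder(categories=[["low", "medium", "high"]]), ["study_time"]),
])

models = {"linear": LinearRegression(), "lasso": Lasso(alpha=0.1), "svr": SVR()}

# 3-fold cross-validated RMSE for each candidate model.
cv_rmse = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores = cross_val_score(
        pipe, X_train, y_train, cv=3, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()

# Fine-tune whichever model had the lowest CV RMSE (grids are illustrative).
param_grids = {
    "linear": {"model__fit_intercept": [True, False]},
    "lasso": {"model__alpha": [0.01, 0.1, 1.0]},
    "svr": {"model__C": [0.1, 1.0, 10.0]},
}
best_name = min(cv_rmse, key=cv_rmse.get)
search = GridSearchCV(
    Pipeline([("prep", preprocess), ("model", models[best_name])]),
    param_grid=param_grids[best_name],
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Final check on the held-out test set with RMSE and R2.
pred = search.predict(X_test)
test_rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
test_r2 = float(r2_score(y_test, pred))
print(best_name, round(test_rmse, 3), round(test_r2, 3))
```

Dropping `prev_grade` from the numeric column list and rerunning would give the "without previous-term grades" variant for comparison.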

Roles and Responsibilities:

Developed, tested, and adapted Python code in line with business requirements.

Used pandas and NumPy features to optimize the code.

Cleaned and preprocessed large datasets to ensure data quality and integrity, handling missing values, duplicates and outliers.

Developed and evaluated predictive models using Linear, Lasso, and Support Vector Machine Regression to forecast student performance.

Designed and implemented data preprocessing pipelines, leveraging feature engineering and correlation analysis to improve model performance.

Optimized models for performance and scalability, enabling early identification of at-risk students and informing academic intervention strategies.

Worked on the AWS cloud platform using Databricks.

Tested the code to ensure that it met expectations.

Developed automated data audit and validation processes.

Proactively communicated innovative ideas, solutions, and capabilities beyond the specific task request.

Worked both collaboratively within a team and independently.
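The data-quality steps listed above (missing values, duplicates, outliers) might look something like the following sketch. The median fill, the 1.5 × IQR clipping rule, and the `student_id` / `absences` columns are illustrative assumptions rather than the rules actually used on the project.

```python
import numpy as np
import pandas as pd


def clean_numeric(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Drop exact duplicate rows, fill gaps with the median, clip IQR outliers."""
    out = df.drop_duplicates().copy()
    for col in cols:
        # Fill missing values with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Clip values lying more than 1.5 * IQR outside the quartiles.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


# Hypothetical sample with one duplicate row, one gap, and one extreme value.
raw = pd.DataFrame({
    "student_id": [1, 2, 2, 3, 4, 5],
    "absences": [2.0, 4.0, 4.0, np.nan, 3.0, 90.0],
})
clean = clean_numeric(raw, ["absences"])
print(clean)
```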

(2) Data Integration Developer at Ameritas, January 2022 – October 2023

Tools: DataStage 11.7 · Python · Pandas · NumPy · Oracle · Salesforce · 3rd party databases/file systems.

Description: DentalSelect is a subsidiary of Ameritas. This project processed Ameritas plan data of up to 6 characters in length and consolidated DentalSelect customer data into the Ameritas Salesforce environment. IBM DataStage was used to load Ameritas Group Network data into the Ameritas Salesforce environment.

Roles & Responsibilities:

• Designed and developed ETL solutions using data warehouse design best practices.

• Analyzed data requirements, complex source data, and the data model to determine the best methods for extracting, transforming, and loading data into staging, the warehouse, and other system integration projects.

• Fetched source data from heterogeneous sources such as Salesforce and third-party databases and file systems.

• Analyzed and wrote complex SQL queries, ensuring optimal performance and data accuracy.

• Analyzed business requirements and outlined solutions.

• Developed and deployed ETL job workflows with reliable error/exception handling and rollback.

• Monitored jobs, troubleshot job errors, identified issues with job windows, and tuned performance.

• Designed, developed, tested, and adapted ETL and Python code to accommodate changes in source data and new business requirements.

• Managed automation of file processing and of all ETL processes within a job workflow.

• Contributed and adhered to development standards and sound procedural practices.

• Communicated status and workloads effectively and offered to assist other areas.

• Worked both collaboratively within a team and independently, continuously striving for high-performing business solutions.

• Participated in design review sessions and ensured all solutions aligned with predefined architectural specifications.
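As a rough illustration of a load step with error handling and rollback, the sketch below uses Python's built-in `sqlite3` rather than DataStage or Oracle, so it is a stand-in for the idea only. The `plans` table, its columns, and the 6-character check on a plan code are hypothetical.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> bool:
    """Insert a batch in one transaction; undo the whole batch on any failure."""
    try:
        # As a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.executemany(
                "INSERT INTO plans (plan_code, name) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plans ("
    " plan_code TEXT PRIMARY KEY CHECK (length(plan_code) <= 6),"
    " name TEXT NOT NULL)"
)

# The first batch is valid; the second has a code longer than 6 characters.
ok = load_rows(conn, [("AB1234", "Basic"), ("CD5678", "Plus")])
failed = load_rows(conn, [("EF9012", "Premium"), ("TOOLONG7", "Invalid")])
count = conn.execute("SELECT COUNT(*) FROM plans").fetchone()[0]
print(ok, failed, count)
```

Because the second batch is rolled back as a unit, its valid first row is not left behind in the table.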

(3) Data Analyst Associate at Era Behavioral Solutions, September 2017 – October 2021

Tools: SQL · Data analysis using R (dplyr, ggplot2, lsr, readr) · SPSS · Data wrangling · Python · Pandas · NumPy · Data visualization using Tableau

Roles & Responsibilities:

Analyzed data using R and SPSS (Statistical Package for the Social Sciences) for intervention analysis and statistical modeling, examining the effectiveness of ABA therapy in improving behaviors and skills in individuals with autism spectrum disorder (ASD).

Evaluated the effectiveness of token economy systems in increasing desired behaviors and reducing problem behaviors.

Analyzed data on the effectiveness of social skills training programs in improving social skills and behaviors.

Performed error analysis to increase the efficiency of behavioral interventions.

Built Tableau visualizations connecting data from sources such as Excel, PostgreSQL, and cloud-based storage; analyzed the data and created dashboards per client requirements.

Implemented behavior interventions using ABA principles.

Coordinated with service providers and participated in clinical meetings.

Git Personal Code Samples:

https://github.com/Krishna569102/Project-Genetic-Marker-HeartAttacks

https://gitfront.io/r/krishna569102/Jj8EH9HsK1G6/Python-Code-Work-GitFront/

https://gitfront.io/r/krishna569102/XPX8z8VpLTYT/MachineLearningProject_PrivateShare_gitfront/

https://gitfront.io/r/krishna569102/SqNcShYHtK4A/Data-Analytics-using-R-PrivateShare-gitfront/
