Machine Learning Data Science

Location:

United States

Salary:

60-70

Posted:

December 16, 2023

Resume:

Nikhil Sinha

Dallas, TX ******@**************.*** +1-727-***-****

I have 10 years of industry experience in machine learning, data science, and big data technologies, and have worked in the insurance, finance, retail, and healthcare domains.

Data Management, Data Science, Machine Learning, Creating Dashboards, Data Analysis, Deep Learning

VBA, ETL, Advanced Excel, MongoDB, Python, R/SAS, Writing Algorithms, Statistical Analysis, PySpark, Big Data, MLOps, NLP, TensorFlow, PyTorch, A/B Testing, Keras, OpenCV, Image Processing, Tableau, Power BI, Kafka, Databricks, Azure, AWS, Computer Vision, Speech Recognition, Neural Networks, MLflow

Data Scientist – Gainwell Technologies Aug 2022 – Present

Developed and evaluated multiple classifier models using machine learning algorithms such as logistic regression, random forests, GLMs, tree-based models, and artificial neural networks. Applied these deep and non-deep models to build an automated machine learning solution for predicting the readmission risk of COPD patients, which helped reduce post-hospitalization healthcare utilization by almost 20%.

Built GLM and tree-based models (XGBoost, Random Forest, KNN, Decision Tree, Logistic Regression) and deployed a classification model (CatBoost) that predicts whether a claim falls under soft denials, to prevent unnecessary medical billing expenses. This improved the identification of recoverable claims by 5%, increasing revenue by 18%.

Built GLM and tree-based models (XGBoost, Random Forest, KNN, Decision Tree, Logistic Regression) and deployed a classification model (CatBoost) to predict whether a claim is a “Related Claim”, which health insurers use for subrogation. This helped the business reduce the time spent hiring more case workers by 50% and increased revenue by 25%.

Built GLM and tree-based models (XGBoost, Random Forest, KNN, Decision Tree, Logistic Regression) and deployed a classification model (CatBoost) to predict whether a claim is likely to be paid by commercial healthcare plans to Medicare, and to predict the likelihood of hospital readmission for patients with heart problems.

Used FlashText to search for and replace keywords in text documents transcribed from speech recordings by an ASR engine such as the Google offline speech SDK.

Performed speech recognition on speech data and segmented audio using Pydub to handle multiple components of speech.

Parsed HL7 documents (both XML C-CDA files and pipe-delimited electronic health record (EHR) files) into structured data using Databricks Smolder and the spark-xml package.

Used AWS SageMaker built-in algorithms to design a supervised machine learning model that is trained to read each claim and predict whether the claim is compliant.

Applied regular expressions to tokenize words and sentences for text parsing and data cleaning (split, substring, search, word tokenization) using the NLTK and regex libraries.

Developed and managed end-to-end observability for new and existing data pipelines including implementing alerting and near real-time data visualization using Tableau.

Built a pipeline to perform big data matching (entity resolution) in Databricks to match individuals’ information coming from Medical Data – INDV and Commercial Insurance – NED. This model generates $1 billion each month.

Evaluated the effectiveness of the Category Classifier model using classification metrics such as accuracy, precision, recall, and F1 score, and built the ROC curve and its AUC (area under the receiver operating characteristic curve) to capture model performance and send an alert if the model drifts significantly. This helped set the retraining threshold at an AUC below 75%.

Partnered with functional leaders within the company to identify, develop, and implement predictive modeling and analytic solutions based on understanding of business needs and opportunities.

Contributed to and consulted with the broader predictive analytics community on best practices for predictive modeling techniques, in addition to staying up to date on emerging trends.

Retrained the machine learning models every 3 months with the latest data available in the Enterprise Data Lake (EDL) as part of feature enhancement requests. This improved model performance by 3% with each iteration.

Built and maintained a robust library of reusable algorithms and supporting code so that research efforts are based on the highest-quality data, are conducted transparently, and are reproducible and ready for production. This helped reduce model training duration by 10%.

Created data objects in PostgreSQL 8.2, retrieved them in Python 3.7, and applied descriptive statistics and statistical tests (A/B test, chi-square test, t-test, z-test, etc.) to understand the behavior of the data. This helped enhance the model, which helped the business find opportunities and increase revenue by 12% to 15%.

Gathered reporting requirements from business stakeholders and converted them into technical solutions that give users access to the data in a format that highlights actionable insights.

Communicated complex information clearly and concisely to a variety of audiences through a variety of media, including dashboards built with H2O Wave and other data visualization tools such as Power BI and Tableau. This improved tracking of the total number of claims processed and the amount recoverable by 15%.

Translated sophisticated analytical and technical concepts for senior management and non-technical employees to enable understanding and drive informed business decisions.

Created databases for managing data using DML, DDL, and DCL, and used the Python glob library to extract data from millions of data files and load it into a Greenplum PostgreSQL database.

Established and automated post-deployment reporting and analysis based on the success metrics, providing business and product teams with insights into post-deployment performance.

Managed, maintained, and oversaw deployments of machine learning (ML) algorithms into production both on-premises and in the cloud (AWS) using data lakes, Spark, Python, and Flask APIs.

Developed and improved frameworks for deploying ML backing pipelines on-premises and in the cloud (AWS) using MLflow and Flask APIs.

Data Scientist – PNC Bank May 2021 – July 2022

Worked closely with business partners to scope a machine learning solution for the audit process (credit risk data) and automated the entire process of scoping the risk involved in any audit. This reduced manual intervention by 100% with a model accuracy of 96%, generating revenue of $1 million each month.

Performed NLP/text analytics using spaCy and scikit-learn to identify the risk involved in audit, control, and risk processes and in customer complaints, which helped reduce risk by 15% to 20%.

Built clustering models such as K-means, BIRCH, GMM, DBSCAN, Mean-Shift, and Agglomerative Hierarchical Clustering to create a machine learning solution that automates the audit process within PNC Bank, and deployed the model to the production environment.

Developed and adhered to technical best practices for data mapping, data transformations, data joining/blending, data quality, data cleansing, and other data movement related activities.

Designed and developed artificial intelligence (AI) software solutions for financial risk detection and management from risk data and audit results, using object-oriented software development technologies, relational database technologies including SQL, and predictive analytics tools including Azure ML, AutoML, and DataRobot in a DevOps environment.

Data Scientist – Client: State Farm July 2019 – May 2021

Extracted, interpreted, and analyzed insurance data across all lines of business (Auto, Fire, Life, Mutual Fund, Home) to identify key metrics and transform raw data into meaningful, actionable information.

Designed, built, and maintained a predictive multiclass neural network model for loss reporting in insurance claims using NLP (Keras, TensorFlow).

Performed statistical experimentation and hypothesis testing using A/B tests, chi-square tests, and ANOVA to determine the strength of association between the features and the target feature.

Worked on an optimization model to address the objectives of the insurance business and improve the workflow of the agent team members and associates who interact with the ECRM.

Implemented text analytics to create a word cloud that improves keyword search and recommends products to customers to drive potential sales.

Data Analyst – VETFED Feb 2019 – June 2019

Extracted, interpreted, and analyzed data to identify key metrics and transform raw data into meaningful, actionable information using Python.

Collected data on veterans and providers from regional offices of Veterans Affairs across 11 states in the western region of the USA to perform in-depth root cause analysis.

Performed requirements analysis and carried out source-to-target mapping, data modeling, data profiling, and data lineage activities by executing code in Python and SQL.

Data Scientist – IBM Technologies Mumbai, India Dec 2014 – May 2017

Developed segmentation models using K-means clustering to discover new user segments and helped target potential customers, which increased sales by 20%.

Led the development of a hotel performance assessment and pricing analysis platform including a scorecard dashboard created using k-NN Algorithm

Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, logistic regression, and gradient boosting machines, to build predictive models using the scikit-learn package in Python.

Data Scientist – Farmers Mumbai, India June 2013 – Nov 2014

Identified fraud in insurance claims using an unsupervised anomaly detection algorithm, which helped reduce false alarms by 20% and increased the efficiency of the claim management system.

Mined customer data for Farmers Insurance and built a logistic regression model in R/Python to predict the probability of a customer buying an auto insurance policy, achieving a misclassification rate of 0.06.
