
KARTHIK

Email: addk5t@r.postjobfree.com

Contact: 469-***-****

Professional Summary:

Around 6 years of experience in machine learning algorithms, data mining techniques, and natural language processing.

Experienced Scrum Master and Agile-certified professional.

Worked on an end-to-end basis, from gathering business requirements, pulling data from different sources, and data wrangling to implementing machine learning algorithms, deploying models, and presenting end results to clients.

Mined and analyzed large datasets using Python and R. Created an automated data-cleansing module using a supervised learning model in Python.

Worked with Python data and modeling packages such as Pandas, NumPy, SciPy, and Keras.

Implemented various statistical tests such as ANOVA, A/B testing, z-tests, and t-tests for various business cases.
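
For illustration only (the resume does not show code), a minimal Python sketch of a two-sample t-test of the kind that might back an A/B test; the sample values and significance threshold are hypothetical:

import numpy as np
from scipy import stats

# Hypothetical conversion metrics from control (A) and treatment (B) groups
group_a = np.array([0.12, 0.15, 0.11, 0.14, 0.13, 0.16])
group_b = np.array([0.16, 0.18, 0.15, 0.19, 0.17, 0.20])

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(group_b, group_a, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Reject the null hypothesis of equal means at the 5% level
if p_value < 0.05:
    print("Statistically significant difference between groups")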

Worked with word-embedding techniques such as Word2Vec and GloVe for text analytics.

Knowledge of Seq2Seq models, Bag of Words, Beam Search, and other natural language processing (NLP) concepts.

Experienced with hyperparameter tuning techniques such as grid search and random search.
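
As a hedged illustration of how such tuning is commonly done in scikit-learn (the estimator, parameter grid, and synthetic data below are assumptions, not the actual project setup):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data standing in for a real business dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Exhaustive grid search with 5-fold cross-validation over a small grid
param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_, search.best_score_)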

Performed outlier analysis with various methods such as z-score analysis, linear regression, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and Isolation Forest.
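
A minimal sketch of two of these methods, z-score filtering and Isolation Forest, on synthetic data; the threshold and contamination rate are illustrative:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 500), [8.0, -9.5, 12.0]])  # 3 planted outliers

# Z-score method: flag points more than 3 standard deviations from the mean
z = (x - x.mean()) / x.std()
print("z-score outliers:", x[np.abs(z) > 3])

# Isolation Forest: isolates anomalies via random splits; -1 marks outliers
labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(x.reshape(-1, 1))
print("isolation forest outliers:", x[labels == -1])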

Knowledge of PostgreSQL and Unix shell scripting. Designed and developed a wide variety of PostgreSQL modules and shell scripts with a focus on optimization.

Strong knowledge of SQL and relational databases (Oracle, SQL Server, Greenplum).

Worked with MicroStrategy visualizations to create business reports with key performance indicators (KPIs).

Experienced with DevOps tools such as Docker containers and Jenkins.

Worked with DevOps teams on deployments, writing Python code for custom logic to implement infrastructure-as-code concepts.

Technical Skills:

Database Management: MySQL, Entity Relationship Diagrams (ERD)

Languages: Python, R

Machine Learning Techniques:

Regression: Linear, Polynomial, Support Vector, Decision Trees

Classification: Logistic Regression, K-NN, Naïve Bayes, Decision Trees, Support Vector Machines

Clustering: K-means, Hierarchical

Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks

Dimensionality Reduction: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA)

Ensemble Learning: Random Forest, Bagging, Boosting, Stacking

Natural Language Processing (NLP): Tokenization, Part-of-Speech Tagging, Parsing, Stemming, Lemmatization, Named Entity Recognition, Semantic Analysis, Sentiment Analysis, Latent Dirichlet Allocation

Time-Series Forecasting: Multiplicative & Additive Decomposition, Exponential Smoothing, Holt-Winters Multiplicative Model, AR, MA, ARMA, ARIMA, SARIMA

Data Visualization: Tableau, Microsoft Power BI; R: ggplot2, Plotly; Python: Matplotlib, Seaborn

Professional Experience:

Client: Verizon, Columbus, Ohio Oct 2019 - Present

Role: Data Scientist – Machine Learning

Responsibilities:

Created an automated ticket-routing algorithm for the support team using natural language processing and other machine learning algorithms.
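
The resume does not detail the implementation; as an illustrative sketch, ticket routing of this kind is often built as a text-classification pipeline (the queues and ticket texts below are hypothetical):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical historical tickets labeled with the team that resolved them
tickets = ["cannot connect to VPN", "invoice amount is wrong",
           "password reset not working", "billing address update"]
teams = ["network", "billing", "network", "billing"]

# TF-IDF features feeding a linear classifier: a common routing baseline
router = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
router.fit(tickets, teams)
print(router.predict(["VPN keeps dropping"]))  # expected: ['network']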

Analyzed and significantly reduced customer churn by using machine learning to streamline risk-prediction and intervention models.

Worked with K-Means, K-Means++, and hierarchical clustering algorithms for customer segmentation.
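
A minimal scikit-learn sketch of K-Means++ segmentation; the customer features and cluster count are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [monthly spend, tenure in months]
X = np.array([[20, 3], [25, 5], [200, 40], [220, 36], [90, 18], [95, 20]])

# "k-means++" initialization spreads out the starting centroids
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment per customer
print(km.cluster_centers_)  # segment centroids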

Performed outlier analysis with methods such as z-score analysis, linear regression, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and Isolation Forest.

Used cross-validation to test models on different batches of data, optimizing them and preventing overfitting.

Worked with PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and other dimensionality-reduction techniques on various classification problems with linear models.
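
For example, PCA feeding a linear classifier might look like the following sketch (the dataset and component count are illustrative):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Project onto the top 2 principal components, then fit a linear model
clf = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.score(X, y))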

Built sales and campaign forecasting models such as ARIMA, Holt-Winters, Vector Autoregression (VAR), and autoregressive neural networks (NNAR).
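
A hedged statsmodels sketch of an ARIMA-style forecast; the series and model order are illustrative, not the actual production models:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly sales series standing in for real data
rng = np.random.default_rng(1)
sales = np.cumsum(rng.normal(10, 2, 48))  # trending series

# Fit ARIMA(1,1,1) and forecast the next 6 periods
model = ARIMA(sales, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))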

Experimented with predictive models including Logistic Regression, Support Vector Machines (SVM), and reinforcement learning to prevent retail fraud.

Worked with ETL developers to improve data-ingestion quality using various preprocessing methods.

Applied survival analysis to customer dormancy rates and periods, and to inventory management.

Created a customer-service upgrade: an automated chatbot that assists online customers using text classification and a knowledge base.

Responsible for design and development of advanced R/Python programs to transform and harmonize data sets in preparation for modeling.

Deep knowledge of scripting and statistical programming in Python. Advanced SQL skills for working efficiently with very large datasets. Able to work with non-standard machine learning datasets.

Built visualizations to facilitate research into Human Connectome Project data and identify the anatomical and functional connectivity within the healthy human brain, as well as brain disorders such as dyslexia, autism, Alzheimer's disease, and schizophrenia.

Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect outliers, and extract important variables.

Developed data-preprocessing pipelines using Python, R, and Linux scripts on an on-premise high-performance cluster and on AWS and GCP cloud VMs.

Developed and evaluated models using machine learning methods such as KNN and deep learning on MRI data.

Client: Vanguard, Charlotte, NC Aug 2018 – Sep 2019

Role: Data Scientist

Responsibilities:

Assisted business analysts and data scientists in data preprocessing: data preparation, data cleaning and masking, data analysis, and data profiling.

Created a classification model that reduced false alerts generated by the existing anti-money-laundering and fraud-detection system by 35%.

Successfully upgraded Informatica from version 9.6.1 HF2 to 10.1.2

Worked with claim-classification models to reduce workloads for the Core Operations team.

Implemented a CNN model to scan documents coming from downstream systems and identify relevant sets of images for the claims department.
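
The architecture is not described in the resume; a minimal Keras sketch of a small document-image CNN classifier might look like this (input size, class count, and layers are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

# Small CNN for, e.g., 64x64 grayscale document crops and 3 document classes
model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=5)  # with real labeled data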

Explored and created new data sets and implemented data science workflow platforms for future applications.

Designed and implemented workflow methodologies for the claim-prediction API and was involved in creating ETL components to pull the required data.

Created various models including SVM with RBF kernel, multilayer perceptron neural networks, KNN, and Lasso, Ridge, and Elastic Net regression.

Worked with K-fold cross-validation and other model-evaluation techniques throughout different projects.

Worked with text-extraction tools such as Tesseract to extract text from various documents, and processed the text with NLTK.
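
A short sketch of this OCR-plus-NLP flow using pytesseract and NLTK; the image path is a placeholder, and both packages plus the Tesseract binary must be installed:

import pytesseract
from PIL import Image
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models

# OCR: extract raw text from a scanned document (hypothetical path)
text = pytesseract.image_to_string(Image.open("claim_form.png"))

# NLP: tokenize and keep alphabetic tokens for downstream processing
tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
print(tokens[:20])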

Set up a data-archival process to free up 24 TB of space in the production environment and improve performance.

Refined the data-quality design, leading to 20% faster batches, and added new data-quality checks.

Worked on backlog management in JIRA as Scrum Master, and on capacity management for the EM team.

Worked with client management to define and design the department's cloud migration strategy.

Mentored 14 data engineers in Informatica and led the enterprise memory (EM) team.

Client: Data by Choice, India May 2014 – Jul 2018

Role: Data Engineer / Machine Learning

Responsibilities:

Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, including product recommendation and allocation planning.

Designed, built, and deployed a set of Python modeling APIs for customer analytics, integrating multiple machine learning techniques for predicting various user behaviors.

Applied various machine learning algorithms and statistical models, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, and regression models.

Segmented the customers based on demographics using K-means Clustering.

Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.

Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI. Also used R to generate regression models for statistical forecasting.

Used big-data tools in Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan defaults on AWS.
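
A minimal PySpark sketch of loan-default scoring with Spark SQL DataFrames and MLlib; the schema, values, and features are assumptions:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("loan-default").getOrCreate()

# Hypothetical loan records; in practice these would come from a real feed
df = spark.createDataFrame(
    [(5000.0, 0.32, 0), (22000.0, 0.71, 1), (9000.0, 0.45, 0)],
    ["amount", "debt_to_income", "default"])

# Assemble features and fit an MLlib logistic regression
features = VectorAssembler(inputCols=["amount", "debt_to_income"],
                           outputCol="features").transform(df)
model = LogisticRegression(labelCol="default").fit(features)
model.transform(features).select("default", "prediction").show()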

Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.

Created deep learning models using TensorFlow and Keras that combine all tests into a single normalized score to predict residency.

Collected data needs and requirements by interacting with other departments.

Environment:

- AWS and Azure cloud environments

- Python (model building)

- R

- SQL and Alteryx

- Spark

Education:

Bachelor of Engineering in ECE, Dr. MGR University, India

Master of Science in Analytics, Bowling Green State University, Ohio


