Data Python

Location:

Charlotte, NC

Posted:

March 29, 2020

Resume:

Akhil P

Data Scientist

*****.*****@*****.*** 314-***-**** LinkedIn

Professional Summary

Over 6 years of experience in data science, data analysis, machine learning, predictive model building, data visualization and statistical analysis

Experience in building intuitive products and experiences, while working alongside an excellent, cross-functional team across Engineering, Product and Design

Expert in transforming business requirements into analytical models and designing algorithms

Proficient in developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data to improve business performance in every aspect

Knowledge of development and deployment of Machine Learning algorithms & AI systems to drive real-time forecasting, personalization and recommendation using Amazon SageMaker and Spark

Experience working with ML supervised algorithms - Linear Regression, Logistic Regression, Linear Discriminant Analysis (LDA), Decision Tree, Random Forest, Support Vector Machines (SVM), Naïve Bayes, K-NN

Experience working with ML unsupervised algorithms - Hierarchical clustering, K-means clustering, Probability Clustering, Density-Based Clustering (DBSCAN)

Experience using Dimensionality Reduction Techniques like Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Independent Component Analysis, Random Component Analysis and t-SNE

Experienced in developing deep-learning models like Artificial Neural Networks - Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) using TensorFlow for pattern recognition, prediction analysis, machine translation, social network filtering, and image & video recognition

Worked on Long Short-Term Memory (LSTM) networks using Keras for automatic speech recognition and anomaly detection

Experience in using Artificial Neural Networks for recommendation systems

Proficient with Natural Language Processing (NLP) for Interactive Voice Response (IVR), Language Translation and Word processors for grammatical accuracy of texts

Experienced working on Natural Language Processing (NLP) techniques like Word2Vec, BOW (Bag of Words), TF-IDF, Avg-Word2Vec, TF-IDF Weighted Word2Vec

Used Sentiment Analysis to determine the emotional tone behind a series of words and understand the attitudes expressed, in order to analyse product markets, customer service and fraudulent activities

Expert-level mathematical knowledge of Linear Algebra, Probability, Statistics, Stochastic Theory, Information Theory and logarithms

Experience with Word Embeddings, Topic Modelling using Latent Dirichlet Allocation (LDA), Sentiment Analysis, Text Classification, Semantic Analysis and Part-of-Speech Tagging

Strong experience with Python and its libraries Pandas, NumPy, Scikit-learn, Seaborn and Matplotlib, and with R, for algorithm development, data manipulation, analysis and visualization

Proficient in writing complex SQL queries like stored procedures, triggers, joins and subqueries to access and manipulate database systems like MySQL, PostgreSQL, NoSQL

Experience in using Tableau for data visualization and designing dashboards for publishing and presenting storyline on web and desktop platforms

Proficient in the entire project life cycle and actively involved in all the phases including data acquisition, cleaning, engineering, feature scaling, feature engineering, statistical modelling and visualization

Experienced in integrating Hadoop into traditional ETL, accelerating the extraction, transformation and loading of massive structured and unstructured data

Tools & Technologies

Languages

Python, R, SQL, Java, C, C++

Libraries

Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn, NLTK, TensorFlow, Keras

Database/Analytics

MySQL, PostgreSQL, NoSQL, DynamoDB, Aurora, MongoDB, Cassandra, Hive, Teradata, Vertica, Hadoop Streaming, MapReduce, SPSS, SAS, Weka, SSIS

Technical

Amazon Web Services (AWS), Spark, GIT, Jenkins, Agile

Mathematical skills

Statistics, Linear Algebra, Probability

Machine Learning Algorithms

Linear Regression, Logistic Regression, Linear Discriminant Analysis (LDA), Decision Trees, Random Forests with AdaBoost and Gradient Boosting, Support Vector Machines (SVM), Naïve Bayes, K-Nearest Neighbors, Hierarchical clustering, K-means clustering, Density-based clustering (DBSCAN)

Machine Learning Techniques

Principal Component Analysis, Singular Value Decomposition, Data Standardization Techniques, L1 and L2 regularization, RMSprop, Hyperparameter tuning, KL Divergence, Resampling Techniques like SMOTE, Cluster Centroid Methods, Ensemble Methods, Feature Selection and Feature Engineering, Cross Validation Methods (K-fold), BLEU Score

Deep Learning

Convolutional Neural Networks, Recurrent Neural Networks, LSTMs, GRUs, Autoencoders, Generative Adversarial Networks, Policy-based and Value-based Boltzmann Machines

Professional Experience

Vanguard, Charlotte, NC Nov 2019 - Present

Data Scientist

Extracted large sets of structured and unstructured data using SQL queries and performed data mining tasks including handling missing data, data wrangling, feature scaling and outlier analysis in Python using pandas.

Participated in all phases of data mining – data collection, data cleaning, data manipulation, developing models, validation, visualization and performed gap analysis.

Worked with project teams, data architecture, data management, data stewardship, lines of business & the delivery/development group to align business needs with enterprise data management strategy & solutions.

Conducted root cause analysis and performance tuning for complex business processes and functionality, reducing job run times.

Installed the AWS CLI to control various AWS services through shell/Bash scripting.

Worked on Continuous Integration/Delivery tools like Jenkins to merge development through pipelines.

Involved in the development and deployment of machine learning algorithms & AI systems to drive real-time forecasting, personalization, and recommendation using Amazon SageMaker and Spark.

Studied feature distributions using the Probability Density Function, Cumulative Distribution Function, percentiles and quantiles to draw insights.

Used grid search and random search to identify the hyperparameters that improve model performance.

Used data investigation, discovery & mapping tools to scan every data record.

Built ML models in Python using the Scikit-learn, SciPy, NumPy and Pandas modules to analyze terabytes of data for customer lifetime value prediction.

Performed text analysis on product reviews using NLP techniques like Bag of Words, Term Frequency-Inverse Document Frequency, Word2Vec and Average Word2Vec with the help of the NLTK library and the Gensim package.

Applied principal component analysis to reduce the number of features, and regularization to avoid overfitting and help the models generalize.

Built a decision tree model from the training data using information entropy, choosing the attribute with the highest normalized information gain at each split, to make credit approval decisions.

Used ML algorithms, logistic regression, support vector machine, K nearest neighbours, Naïve Bayes, CART, bagging, boosting, ensemble learning, to analyze data based on the features selected for data-driven decisions.

Performed exponential smoothing on multivariate time series data for short-term forecasts.

AAA National, Orlando, FL July 2018 - Nov 2019

Data Scientist

Involved in the development of algorithms for fraud detection, lifetime value prediction, product development and prediction analysis based on company requirements and goals.

Performed Exploratory Data Analysis (EDA) to categorize and organize data based on caller information like identification, date, time, type of service (voice call, SMS, etc.), duration and network access point identifiers.

Used K-means clustering to partition the data into K groups based on feature similarity for behavioural segmentation.

Built a time-series model for complex pattern recognition in financial time series data and for forecasting returns.

Performed sentiment analysis on customer email feedback and reviews using a Natural Language Processing (NLP) model built on Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNN), to determine the emotional tone behind a series of words and understand the attitudes and emotions expressed.

Extracted text data related to fraud cases from different telecommunication companies to train the machine learning algorithm using word segmentation, part-of-speech tagging, keyword selection and frequency values.

Used the trained algorithms to test word distribution and correlation values by passing the text through the NLP (speech recognition and conversion) algorithm to detect possible fraudulent activity.

Developed and applied machine learning algorithms - linear regression, logistic regression, multiple regression, mean-variance, dummy variable, Poisson distribution, Naïve Bayes, fitting function.

Developed clustering algorithms (Hierarchical, K-means) with Scikit-learn and SciPy to group customers and made data-driven decisions on promotional offers & price strategies that reduced customer churn significantly.

Worked on Support Vector Machines (SVM), clustering models and Principal Component Analysis (PCA) with different structured & unstructured datasets for dimensionality reduction, and analysed the accuracy of the models.

Worked on Naïve Bayes and Random Forests, to find possible hidden patterns for forecast predictions.

Validated models using cross-validation and loss function to measure model performance. Created Confusion Matrix, ROC and CAP curves.

Addressed overfitting and underfitting by tuning hyperparameters using L1 and L2 Regularization.

OTIS, Farmington, CT Aug 2017 - June 2018

Data Scientist

Analysed the customer purchase data and product trends to recommend the types of products/services to customers based on their behaviour tracked through the customer accounts, purchase history and location.

Performed data analysis, data validation, data cleansing and data verification to identify data mismatch using Relational Data modelling (3NF) and Dimensional Data Modelling.

Tackled a highly imbalanced fraud dataset using under-sampling, over-sampling with SMOTE and cost-sensitive algorithms using Python Scikit-learn.

Developed automated model training, testing & deployment via machine learning continuous delivery pipelines.

Clustered customer actions using K-means Clustering and Hierarchical Clustering and segmented them into different groups which helped the marketing team to further analyse behavioural patterns of customers.

Used a Multiple Linear Regression algorithm to estimate Customer Lifetime Value (CLV) from data recorded through applications over a period of at least three months.

Built one-class Support Vector Machine (SVM) and Principal Component Analysis (PCA) algorithms for anomaly detection of fraud and other errors that signal dishonest behaviours.

Used classified instances, the Receiver Operating Characteristic (ROC) curve and the Confusion Matrix to assess the accuracy of the models built.

Acquired knowledge of designing, iterating on and fine-tuning neural network model architectures for runtime efficiency of the models built.

Loaded and transformed large sets of structured, semi-structured and unstructured data.

Forecasted sales and improved accuracy (MAPE & RMSE) by 30% by implementing advanced forecasting algorithms that were effective in detecting seasonality and trends in the patterns.

Evaluated the performance of different models using F-score, AUC/ROC, Confusion Matrix and RMSE/MSE and used Matplotlib extensively to generate human-readable data visualizations.

Visualized results in Python using the Matplotlib and Seaborn libraries and used Tableau to create interactive dashboards to present results to team members, management and clients.

S.C. Johnson, Racine, WI April 2016 - Aug 2017

Data Scientist

Analysed the data using various ML algorithms to determine credit limits for existing applicants and to predict whether approving a new credit line for a new applicant would likely result in profit or loss, based on various considerations including but not limited to credit history, utilization rate, income, age, location, hard enquiries & number of delinquencies.

Identified patterns, data quality issues, and opportunities and leveraged insights by communicating opportunities with business partners.

Co-ordinated with business users to gather business requirements and prepared the documentation for analysis.

Assisted in supporting the enterprise conceptual and logical data models for analytics, operational and data mart structures using an industry standard model

Assisted marketing team to devise the business strategy to target customers with discount coupons, deals and offers to improve customer purchases by identifying distinct patterns in which customers respond to offers.

Developed an ML system that predicted purchase probability through offers based on customers' real-time location data and past purchase behaviour, which is being used for mobile coupon pushes.

Developed a model that collects data across thousands of locations to automatically optimize product placement and advertising to catch the attention of shoppers who fit the right profile, and to select products to be removed entirely, reducing clutter and making in-demand items easier to find.

Performed statistical analysis to understand the data & produced forecast trends for various categories.

Identified gaps in different processes and implemented process improvement initiatives across the business improvement model.

Participated in developing a deep learning model for comparing items against each other, tracked their performance in various situations & made suggestions to support key business decisions.

Used ML algorithms to forecast the company’s short-term and long-term growth in terms of revenue, number of customers, various costs, stock changes, etc.

Closely monitored the operating and financial results against plans and budgets.

Millennium Software Solutions, India April 2014 - March 2016

Jr. Data Scientist

Worked with data scientists to support model building, scoring, monitoring and reporting.

Used SQL for creating and using Views, User Defined functions, Triggers, Indexes and Stored procedures involving joins and sub-queries from multiple tables.

Established relationships between the tables using primary and foreign key constraints and SQL triggers.

Performed ETL process to Extract, Transform & Load the data from OLTP tables into staging tables & data warehouse.

Merged datasets, cleaned constructed datasets, produced summary statistics, conducted difference-in-means tests and stored all accompanying files in an organized manner.

Loaded large sets of structured, semi-structured and unstructured data into the Hadoop Distributed File System (HDFS).

Prepared and analysed the data, including locating, profiling, cleansing, extracting, mapping, importing, transforming, validating and modelling.

Explored datasets using histograms, boxplots and skewness measures in RStudio.

Applied linear regression to understand the relationship between different attributes of the datasets and causal relationship between them using R and Python.

Performed statistical analysis to understand the data & produced forecast trends for various categories.

Designed and created tables, charts and graphs to visualize analysis for reports to clients using Excel & Tableau.

Performed User Acceptance Testing (UAT) for various system releases.

Produced reports on an ad hoc basis per requirements.

Education

Master of Science in Computer Networking at Wichita State University, Kansas

Bachelor of Technology in Computer Science at Gandhi Institute of Technology & Management, India
