Looking for Data Scientist roles

Location:

Vancouver, BC, Canada

Salary:

90000

Posted:

June 30, 2021

Resume:

MANINDER KAUR

Canadian PR Holder

Phone: +1-604-***-****

E-mail: ****************@*****.***

Data Scientist

PROFESSIONAL SUMMARY:

IT industry experience as a Data Scientist specializing in implementing advanced Machine Learning and Natural Language Processing algorithms on data from diverse domains, and in building efficient models that derive actionable business insights through exploratory data analysis, feature engineering, statistical modelling, and predictive analytics.

Worked across the entire data science project life cycle, with active involvement in every phase, including data extraction, data cleaning, statistical modelling, and data visualization on large sets of structured and unstructured data.

Strong background in Machine Learning, Predictive Modelling and Data Mining with a broad understanding of Supervised and Unsupervised learning techniques and algorithms (e.g., Regression, K-NN, SVM, Naïve Bayes, Decision Trees, Clustering).

Solid experience in Deep Learning techniques with Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), max pooling and normalization.

Experience with Statistical Analysis, Data Mining and Machine Learning Skills using R, Python and SQL.

Extensive experience in various phases of software development like analyzing, gathering and designing the data with expertise in documenting.

Hands-on experience with clustering algorithms such as K-means and K-medoids, and with predictive and descriptive algorithms.

Data-driven and highly analytical, with working knowledge of statistical modelling approaches and methodologies (Clustering, Regression analysis, Hypothesis testing, Decision trees, Machine learning), rules, and the ever-evolving regulatory environment.

Strong practical understanding of statistical modelling and supervised, unsupervised, and reinforcement machine learning techniques, with a keen interest in applying these techniques to predictive analytics.

Expert in entire Data Science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modelling, Evaluation, Optimization, Testing and Deployment.

Experience using machine learning models such as Random Forest, KNN, SVM, and logistic regression, with packages such as ggplot, dplyr, rpart, randomForest, nnet, PROC (pca, dtree, corr, princomp, gplot, logistic, cluster), NumPy, scikit-learn, and pandas in R, SAS and Python.

Experience in problem solving, data science, Machine learning, statistical inference, predictive analytics, descriptive analytics, prescriptive analytics, graph analysis, natural language processing, and computational linguistics with extensive experience in predictive analytics and recommendation.

Expertise in Model Development, Data Mining, Predictive Modeling, Descriptive Modeling, Data Visualization, Data Cleaning and Management, and Database Management.

Experience in Apache Hadoop technologies like Pig, Hive, Sqoop, Spark, Flume and HBase.

Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), Machine learning techniques, algorithms, data structures and data infrastructure.

Experienced in statistical analysis using R, MATLAB and Excel.

Extensive hands-on experience and high proficiency with structured, semi-structured and unstructured data, using a broad range of data science programming languages and big data tools.

Strong skills in statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA), and Correspondence Analysis.

Highly motivated team player with excellent Interpersonal and Customer Relational Skills, Proven Communication, Organizational, Analytical, Presentation Skills, and Leadership Quality.

Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering and Association Rules.

TECHNICAL SKILL SET

Statistical Tools

R, Python, Minitab, XL Miner, Dell Statistica

BigData Technologies

Hadoop, HDFS, MapReduce, Apache Hadoop ecosystem, Apache Spark, Apache Kafka, Hive, Pig, ETL, Storm, Sqoop

Languages

SAP- ABAP/4, C, C++, Matlab, SQL, R, Hadoop, Python

Database

MS-SQL Server, PL SQL, MS-Access

Testing Tools

WinRunner 8.0, LoadRunner, TestDirector 7.2, Quality Center, QuickTest Professional 8.2, Rational Robot

Business Tools

Microsoft Office (Word, Excel, PowerPoint), MS Visio, SharePoint, Outlook, MS Project

BI and Visualization

Tableau, Power BI, RShiny

Methodologies

SDLC, Agile, Waterfall

Algorithms

Machine Learning, Deep Learning, NLP, Bayesian Learning, Optimization, Prediction, Pattern Identification, Data / Text mining, Regression, Logistic Regression, Bayesian Belief, Clustering, Classification, Statistical modeling

Data Science/Data Analysis Tools & Techniques

Generalized Linear Models, Logistic Regressions, Boxplots, K-Means, Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), Teradata, Tableau

Techniques

Machine learning, Regression, Clustering, Data mining

Machine Learning

Naïve Bayes, Decision Trees, Regression models, Random Forests, Time-series, K-means

PROFESSIONAL EXPERIENCE:

Client: Royal Bank of Canada – Vancouver, BC, Canada

September 2020 – Till Date

Role: Data Scientist

Responsibilities:

Extensively worked in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.

Built machine learning models to identify fraudulent applications for loan pre-approvals and to identify fraudulent credit card transactions using the history of customer transactions with supervised learning methods.

Tackled a highly imbalanced fraud dataset using sampling techniques such as down-sampling, up-sampling and SMOTE (Synthetic Minority Over-sampling Technique) in Python with Scikit-learn.

Extracted data from the database into HDFS and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.

Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.

Used cross-validation to test the models on different batches of data, to optimize the models and prevent overfitting.

Used PCA and other feature engineering techniques to reduce high-dimensional data, along with feature normalization and label encoding from the Scikit-learn library in Python.

Worked with Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python to develop various machine learning models such as Logistic Regression, Gradient Boosting Decision Tree and Neural Network.

Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.

Implemented a Python-based distributed random forest via PySpark and MLlib.

Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.

Used PCA and other feature engineering techniques, together with Scikit-learn feature normalization and label encoding, to reduce the high-dimensional data (>150 features).

Created and maintained Tableau reports to display the status and performance of deployed models and algorithms.

In the preprocessing phase, used Pandas to handle missing data, cast data types, and merge or group tables for EDA.

In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights into the patient admission and discharge data.

Experimented with predictive models including Logistic Regression, Support Vector Machine (SVC) and Random Forest from Scikit-learn, XGBoost, LightGBM, and neural networks built with Keras to predict showing probability and visiting counts.

Implemented, tuned and tested the model on AWS Lambda with the best performing algorithm and parameters.

Designed and implemented cross-validation and statistical tests, including k-fold, stratified k-fold, and hold-out schemes, to test and verify the models' significance.

Environment: Oracle 11g, Hadoop 2.x, HDFS, Hive, Pig Latin, Spark/PySpark/MLlib, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), Jupyter Notebook, AWS, GitHub, Linux, Machine learning algorithms, Tableau.

Client: Blue Cross Blue Shield (Remote)

Aug 2018 – July 2020

Role: Data Scientist

Responsibilities:

Translated business questions into research objectives, designed and conducted analyses, developed findings, and synthesized recommendations to deliver valuable, relevant, and actionable insights

Strong track record of contributing to successful end-to-end analytic solutions (clarifying business objectives and hypotheses, communicating project deliverables and timelines, and informing action based on findings)

Expert in Pandas, NumPy, and Scikit-learn in Python for performing exploratory analysis and developing various machine learning models such as Random Forest

Handled missing data in the dataset using the Imputer in the Scikit-learn library

Performed categorical variable analysis using Python's LabelEncoder, fit_transform, and OneHotEncoder methods in the Scikit-learn library

Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets for modelling

Defined a generic classification function, which takes a model as input and determines the Accuracy and Cross-Validation scores

Advanced SQL ability to efficiently work with very large datasets. Ability to deal with non-standard machine learning datasets

Built forecasting models in Python using Gradient Boosted Regression Trees to forecast future revenue

Worked with applied statistics and applied mathematics tools for performance optimization

Worked with K-Means clustering and Hierarchical clustering algorithms to segment stores

Collected various store attributes and added them to the segmentation model to better distinguish segments using clustering algorithms

Used cross-validation to test the models on different batches of data, to optimize the models and prevent overfitting

Analyzed the SQL scripts, designed a solution to implement them in PySpark, and developed scripts as per the requirements

Worked with Tableau to represent the data visually and better describe the problems and their solutions.

Environment: Python 3, PyCharm, Jupyter Notebook, Spyder, R, Tableau, MySQL

Client: Adroit Business Solutions, Noida, UP, India

Sept 2016 – July 2018

Role: Data Scientist

Responsibilities:

Worked independently and collaboratively throughout the complete analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results.

Performed statistical analysis to determine peak and off-peak time periods for ratemaking purposes.

Implemented a Customer Churn method to draw insights from the data according to the requirements of the client.

Performed Customer Segmentation analysis focusing on customer demographics and customer information such as age and salary, in order to create various classes of customers.

Identified root causes of problems and facilitated the implementation of cost-effective solutions with all levels of management.

Implemented Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.

Performed K-means clustering, Regression and Decision Trees in Python and R.

Worked on NLP and text analytics, applying Naïve Bayes, creating word clouds, and retrieving data from social networking platforms.

Pro-actively analyzed data to uncover insights that increase business value and impact.

Prepared Data Visualization reports for the management using Matplotlib.

Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Hue, Sqoop, Flume, Oozie, and Hive for scalability, distributed computing and high-performance computing.

Optimized cost functions and hyperparameters using the following algorithms: greedy search, genetic algorithm, brute-force search.

Created a hybrid model to support recommendations for new users and for existing users with changing trends.

Environment: Python 3, R, SQL and SQL Script, Regression analysis, Decision Tree, Naïve Bayes, SVM, K-Means Clustering and KNN, NumPy and Pandas, Bayes' Law.

Client: Calance Software Pvt LTD – Gurgaon, Haryana, India

July 2014 – August 2016

Role: Data Scientist/ Analyst

Responsibilities:

Worked in a cross-functional team to establish the project requirements and develop action plans to incorporate them.

Conducted data preparation and EDA to identify the key trends in data.

Used multiple imputation methods for missing data, namely neutral ratings, dropping NAs, and iterative imputers in Python, to impute missing data from existing information.

Implemented various machine learning models such as Multiple Linear Regression, Support Vector Regressor, and Random Forest Regressor to predict the target variable.

Identified the top N features most important in predicting overall customer satisfaction.

Compared data across different competitors to produce comparison metrics.

Implemented word clouds in Python for both positive and negative reviews to understand the importance of various words.

Recommended marketing strategies to highlight strengths and documented improvement strategies for enhancing overall customer satisfaction.

The second half of the project revolved around customer segmentation and targeted marketing.

Performed customer segmentation based on purchasing power as High, Medium and Low.

Extracted customer information from the RDBMS for customers segmented as High spenders, to redirect the marketing effort toward this focus group.

Environment: Excel, Python (Pandas, Scikit-learn, NumPy, Seaborn, Matplotlib), Jupyter Notebook, MySQL

EDUCATION DETAILS:

Post Graduate Diploma (IT), (75%) (GPA 3) Aug-2012 to Nov-2014, Guru Nanak Dev Engineering College, Ludhiana, India.

B. Tech (CSE), (76%) (GPA 3.33) June-2008 to April-2012, Ludhiana College of Engineering and Technology, Ludhiana, India.

CERTIFICATIONS:

Machine Learning with Python – IBM

Data Analytics – IBM
