
Machine Learning Data Scientist

Location:
Toronto, ON, Canada
Salary:
80000
Posted:
October 04, 2023


Resume:

JAIMIN B

*******.**@*****.*** 647-***-****

PROFESSIONAL SUMMARY:

Highly experienced Data Scientist with over 4 years of experience in Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, Web Scraping, Statistical Modeling, and NLP.

Experience in Analytics, Big data, BPM, SOA, ETL, and Cloud technologies as an Integration Architect and Data Scientist.

Expertise with fundamental machine learning models and concepts, such as deep learning, regression, random forests, boosting, GBMs, neural networks, and HMMs.

Familiar with deep learning projects, including a movie recommender system (PyTorch), image captioning (CNN-RNN encoder-decoder architecture), and stock price prediction (CNN, RNN).

Exposure to AI and deep learning platforms and techniques, such as TensorFlow, RNNs, and LSTMs.

Built deep neural networks using LSTM layers combined with outputs from other features.

Skilled in using various packages in Python and R, including ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, NLP, reshape2, rjson, Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, and rpy2.

Expertise in applying advanced Machine learning techniques to solve complex business problems and derive actionable insights from data.

Experienced in machine learning techniques, encompassing supervised and unsupervised learning, regression, classification, clustering, and deep learning using TensorFlow and PyTorch.

Worked with NoSQL Database including HBase, Cassandra and MongoDB.

Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.

Expertise in creating informative and visually appealing data visualizations with Matplotlib and Seaborn, conveying complex insights effectively.

Mastery of various data analytics techniques, including exploratory data analysis (EDA), statistical hypothesis testing, Data Manipulation, Data Visualization, and A/B testing to derive actionable insights from data.

Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.

Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.

Adept in developing and debugging Stored Procedures, User-defined Functions (UDF), Triggers, Indexes, Constraints, Transactions, and Queries using Transact-SQL (T-SQL).

Experienced in Cloud Services such as Amazon Web Services (AWS) and Google Cloud Platform to assist with big data tools, solve storage issues and work on deployment solutions.

Working knowledge of the Hadoop ecosystem, including HDFS, Pig, Hive, Sqoop, HBase, and Oozie.

Spark experience working with PySpark and MLlib.

Utilized Agile and Scrum project management techniques, which promote the cooperative and iterative creation of data science solutions.

Expertise with Linux and Windows operating systems, allowing for flexibility in the creation and use of data science solutions.

Skilled at Git version control, ensuring effective code management and communication.

Skilled at deploying machine learning models for use in production using Flask and Docker.

An excellent understanding of business and the capacity to explain data insights to clients, both technical and non-technical.

Strong quantitative and analytical abilities, complemented by outstanding communication skills.

TECHNICAL SKILLS:

Languages

Python, R, SQL, HTML, CSS, JavaScript, C, C++, Java, PHP, JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Libraries

Pandas, NumPy, Matplotlib, Plotly, Seaborn, statsmodels, R Shiny, ggplot2, TensorFlow, PyTorch, scikit-learn

Databases

MS SQL Server, Firebase, Hive, Impala, Pig, Spark, MySQL, Hadoop, HDFS, HBase, Cassandra, Kafka, Oracle DB, IBM DB2

BI Tools

Tableau, Tableau server, Tableau Reader, PowerBI, QlikView, SAP Business Intelligence, Azure Data Warehouse, MS Excel, Cognos 7.0/6.0

Cloud Platforms

MS Azure, AWS, GCP

Data Analytics Techniques

Data warehousing, Data Manipulation, Data Visualization, Data Mining, Data Cleaning

ML Techniques

Regression, Polynomial Regression, Logistic Regression, Kernel SVM, Clustering, K-NN, Association, Deep Learning, Supervised & Unsupervised Learning, Reinforcement Learning, Classification, Random Forest Classifier, Decision Tree Classifier, ANN, RNN, CNN

Methodologies

Agile, Waterfall, SDLC and Bill Inmon data warehousing methodology

Operating System

Windows, Linux, Unix, macOS, Red Hat

Version Control/Deployment

Kubernetes, Git, Github, Kafka, Docker, OpenShift, Jira

EDUCATION:

BTech in Information Technology, BVM Engineering College, Anand, GJ, Aug 2016 - Jul 2020

PROFESSIONAL EXPERIENCE:

Client: WestJet, Toronto, ON Jan 2022 – Present

Role: Data Scientist

Responsibilities:

Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.

Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming

Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis

Automated CSV-to-chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20%.
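
A transformation like this can be sketched with Python's standard csv and json modules; the field names (question, answer, utterance, response) are illustrative stand-ins, not the project's actual schema:

```python
import csv
import io
import json

def csv_to_chatbot_json(csv_text):
    """Convert CSV rows into a chatbot-friendly list of JSON records.

    The schema here is hypothetical; a real pipeline would map the
    project's own CSV columns onto the chatbot's expected fields.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [
        {"utterance": row["question"].strip(), "response": row["answer"].strip()}
        for row in reader
    ]
    return json.dumps(records, indent=2)

sample = "question,answer\nWhat is checked baggage?,Up to 23 kg per bag.\n"
print(csv_to_chatbot_json(sample))
```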

Proficient in Big Data tools: Apache Hadoop, Hive, Spark 3.0 and Teradata 16.

Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Hadoop, and MongoDB.

Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.

Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.

Designed and implemented system architecture for AWS EC2, SageMaker based cloud-hosted solution for the client.

Applied machine learning, deep learning, and other advanced techniques, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), and combined models through ensemble modeling for better performance.

Strong skills in web tools (AngularJS, HTML5, CSS3, JavaScript) and web analytics, including Google Analytics and web scraping with Python.

Conducted Data blending, Data preparation using Alteryx and SQL for tableau consumption, and publishing data sources to the Tableau server.

Excellent understanding of SDLC (Systems Development Life Cycle), Agile and Scrum.

Worked with AWS services including Kinesis, S3, DynamoDB, Elastic MapReduce (EMR), Redshift, SageMaker, and Athena.

Exported data to AWS S3 and documented the steps and processes of the work.

Used Git for version control to keep programs and code up to date.

Environment: Python, SQL, Oracle 12c, SDLC, SSRS, EMR, Tableau, MLlib, NLP, Scala, Spark, Kafka, MongoDB, Hadoop, PySpark, OLAP, Azure, HDFS, NLTK, JSON, XML, AWS, AWS S3, Redshift, Git, Agile, Scrum, AngularJS, HTML5, CSS3, JavaScript, and web Analytics.

Client: Equity Credit Union, Durham, ON Jan 2021 - Dec 2021

Role: Machine Learning Engineer

Responsibilities:

Gathered, analyzed, documented, and translated application requirements into data models, and supported the standardization of documentation and the adoption of standards and practices related to data and applications.

Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and SpaceTime.

Coded R functions to interface with Caffe Deep Learning Framework.

Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms

Worked with different data formats such as JSON and XML and implemented machine learning algorithms in Python.

Configured, monitored, and automated Amazon Web Services and GCP, and was involved in deploying ML models on AWS using EC2, S3, EBS, Lambda, and SageMaker.

Utilized SQL/NoSQL databases, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.

Executed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster.

Performed a POC showing that data sourced in ADLS can be processed in HDInsight using a pipeline with a Hive activity.

Used Hive to create ORC-formatted tables and used ADF for data orchestration to an Azure database.

Copied data from ADLS to an Azure SQL database using ADF pipelines invoked via PowerShell scripting.

Built a mixed-effect stepwise logistic regression model to identify statistically significant risk factors in predicting maternal delivery-associated complications.

Used various metrics (RMSE, MAE, F-Score, ROC, and AUC) to evaluate the performance of each model.
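
For reference, the simpler regression metrics can be computed directly; this pure-Python sketch shows RMSE and MAE on toy data (real evaluation would typically use a library such as scikit-learn's metrics module):

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors quadratically.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(rmse(y_true, y_pred))  # 0.6123724356957945  (sqrt of 0.375)
print(mae(y_true, y_pred))   # 0.5
```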

Designed and delivered rich interactive data visualizations in Power BI and Tableau to turn analytical metrics into business cases.

Experienced in RDBMSs such as SQL Server and non-relational databases such as MongoDB 3.x.

Used data quality validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies.

Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and big data sources.

Deployed machine learning models on cloud services including AWS Lambda and EC2 with Linux/Unix.

Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
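
As an illustration of one such classifier, a minimal k-nearest-neighbours vote can be written in a few lines of plain Python; the data and labels below are invented for the example, and project code would use a library implementation such as scikit-learn's KNeighborsClassifier:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.

    `train` is a list of (features, label) pairs; Euclidean distance is used.
    """
    # Sort all training points by distance to the query.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
         ((8.0, 9.0), "high"), ((9.0, 8.5), "high"), ((8.5, 8.0), "high")]
print(knn_predict(train, (1.1, 0.9)))  # low
```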

Environment: R, Python, HDFS, MLlib, OLTP, Oracle 10g, Hive, OLAP, DB2, MS Excel, SQL, MongoDB, AWS, MS Azure, Power BI, Big Data, RDBMS, PowerShell, Tableau, Hadoop cluster, data analytics, RMSE, MAE, F-Score, ROC, AUC, Git, Linux, and Windows.

Client: Invenza Solutions, Ahmadabad, India Feb 2019 - Dec 2020

Role: Data Scientist Intern

Responsibilities:

Used various approaches to collect business requirements and worked with business users on ETL application enhancements by conducting Joint Requirements Development (JRD) sessions.

Performed exploratory data analysis, including calculation of descriptive statistics, detection of outliers, assumption testing, and factor analysis, in Python and R.

Built models based on domain knowledge and customer business objectives.

Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.

Extracted data from the database using Excel/Access and SQL procedures, and created Python and R datasets for statistical analysis, validation, and documentation.

Extensive understanding of BI and analytics, focusing on the consumer and customer space.

Innovated and leveraged machine learning and data quality checks using SQL across complex internal databases.

Improved sales and logistics data quality by cleaning data with NumPy, SciPy, and Pandas in Python.

Designed data profiles for processing, including running SQL and procedural SQL queries, and used Python and R for data acquisition and data integrity checks against the database.

Used R to generate regression models to provide statistical forecasting.

Used Python and SQL to create statistical algorithms involving multivariate regression, linear regression, logistic regression, PCA, random forests, decision trees, and SVMs for estimating the risks of welfare dependency.
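
The linear-regression building block can be illustrated with a closed-form ordinary least squares fit in plain Python; the data here are toy values, and production work would use a library such as statsmodels or scikit-learn:

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = a + b*x, computed in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    # Intercept follows from the fitted line passing through the means.
    a = mean_y - b * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x
print(fit_simple_ols(xs, ys))  # (1.0, 2.0)
```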

Performed report visualization using Python packages (Matplotlib, Seaborn).

Hands-on experience in using HIVE, Hadoop, HDFS and Big Data related topics.

Collaborated with the web development team, using JavaScript, to publish dashboards on NJ state official websites.

Trained neural networks via backpropagation and used 10-fold cross-validation to select the best parameters for the Ebola network.
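
The 10-fold cross-validation step can be sketched as a plain-Python index split; this is a simplified illustration (contiguous folds, no shuffling), and real code would typically use scikit-learn's KFold:

```python
def k_fold_indices(n, k=10):
    """Split range(n) into k folds and return (train_idx, test_idx) pairs.

    Each fold serves once as the held-out test set while the remaining
    folds form the training set.
    """
    # Distribute n samples as evenly as possible across k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return [
        ([i for f in folds[:j] + folds[j + 1:] for i in f], folds[j])
        for j in range(k)
    ]

splits = k_fold_indices(25, k=10)
print(len(splits), len(splits[0][1]))  # 10 3
```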

Environment: AWS, Python, HDFS, OLTP, Oracle 10g, Hive, Hadoop, OLAP, DB2, MS Excel, SQL, MongoDB, Big Data, JavaScript, Power BI, Matplotlib, Seaborn, machine learning, NumPy, Pandas, Excel, Kafka, Spark, Windows.
