JAIMIN B
*******.**@*****.*** 647-***-****
PROFESSIONAL SUMMARY:
Data Scientist with over 4 years of experience in Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, Web Scraping, Statistical Modeling, and NLP.
Experience in Analytics, Big data, BPM, SOA, ETL, and Cloud technologies as an Integration Architect and Data Scientist.
Expertise in fundamental machine learning models and concepts, such as deep learning, regression, random forests, boosting, GBMs, neural networks, and HMMs.
Familiar with deep learning projects including image identification, a movie recommender system (PyTorch), image captioning (CNN-RNN autoencoder architecture), and stock price prediction (CNN, RNN).
Exposure to AI and deep learning platforms and techniques, such as TensorFlow, RNNs, and LSTMs.
Created deep neural networks using LSTM layers together with other feature outputs.
Skilled in using various packages in Python and R such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, NLP, reshape2, rjson, Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, and rpy2.
Expertise in applying advanced Machine learning techniques to solve complex business problems and derive actionable insights from data.
Experienced in machine learning techniques, encompassing supervised and unsupervised learning, regression, classification, clustering, and deep learning using TensorFlow and PyTorch.
Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
Expertise in creating informative and visually appealing data visualizations with Matplotlib and Seaborn, conveying complex insights effectively.
Mastery of various data analytics techniques, including exploratory data analysis (EDA), statistical hypothesis testing, Data Manipulation, Data Visualization, and A/B testing to derive actionable insights from data.
Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
Adept in developing and debugging Stored Procedures, User-defined Functions (UDF), Triggers, Indexes, Constraints, Transactions, and Queries using Transact-SQL (T-SQL).
Experienced in Cloud Services such as Amazon Web Services (AWS) and Google Cloud Platform to assist with big data tools, solve storage issues and work on deployment solutions.
Working knowledge of the Hadoop ecosystem, including HDFS, Pig, Hive, Sqoop, HBase, and Oozie.
Experience with Spark, including PySpark and MLlib.
Utilized Agile and Scrum project management techniques, promoting cooperative and iterative development of data science solutions.
Expertise with Linux and Windows operating systems, allowing for flexibility in the creation and use of data science solutions.
Skilled at Git version control, ensuring effective code management and communication.
Skilled at deploying machine learning models to production using Flask and Docker (a minimal sketch follows this summary).
An excellent understanding of business and the capacity to explain data insights to clients, both technical and non-technical.
Strong quantitative and analytical abilities paired with outstanding communication skills.
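The Flask deployment pattern above can be illustrated with a minimal serving sketch. The model artifact name, payload shape, and port below are illustrative assumptions, not any client's production setup:

```python
# Minimal sketch: serve a pickled scikit-learn model over HTTP with Flask.
# "model.pkl", the /predict route, and the payload shape are hypothetical.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:   # artifact saved at training time (assumed)
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()     # e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    prediction = model.predict([payload["features"]])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # Binding to 0.0.0.0 lets a Docker container expose the port.
    app.run(host="0.0.0.0", port=5000)
```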
TECHNICAL SKILLS:
Languages
Python, R, SQL, HTML, CSS, JavaScript, C, C++, Java, PHP, JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL
Libraries
Pandas, NumPy, Matplotlib, Plotly, Seaborn, Statsmodels, R Shiny, ggplot2, TensorFlow, PyTorch, Scikit-Learn
Databases
MS SQL Server, Firebase, Hive, Impala, Pig, Spark, MySQL, Hadoop, HDFS, HBase, Cassandra, Kafka, Oracle DB, IBM DB2
BI Tools
Tableau, Tableau server, Tableau Reader, PowerBI, QlikView, SAP Business Intelligence, Azure Data Warehouse, MS Excel, Cognos 7.0/6.0
Cloud Platforms
MS Azure, AWS, GCP
Data Analytics Techniques
Data warehousing, Data Manipulation, Data Visualization, Data Mining, Data Cleaning
ML Techniques
Regression, Polynomial Regression, Logistic Regression, Kernel SVM, Clustering, K-NN, Association, Deep Learning, Supervised & Unsupervised Learning, Reinforcement Learning, Classification, Random Forest Classifier, Decision Tree Classifier, ANN, RNN, CNN
Methodologies
Agile, Waterfall, SDLC, and the Bill Inmon data warehousing methodology
Operating System
Windows, Linux, Unix, macOS, Red Hat
Version Control/Deployment
Kubernetes, Git, Github, Kafka, Docker, OpenShift, Jira
EDUCATION:
BTech in Information Technology, BVM Engineering College, Anand, GJ Aug 2016 – July 2020
PROFESSIONAL EXPERIENCE:
Client: WestJet, Toronto, ON Jan 2022 – Present
Role: Data Scientist
Responsibilities:
Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS; implemented a Python-based distributed random forest via Python streaming (see the first sketch after this section).
Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms, applying linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
Automated CSV to chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20% (see the second sketch after this section).
Proficient in Big Data tools: Apache Hadoop, Hive, Spark 3.0 and Teradata 16.
Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Hadoop, MongoDB
Analyzed large datasets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
Designed and implemented system architecture for an AWS EC2- and SageMaker-based cloud-hosted solution for the client.
Applied machine learning, deep learning, and other advanced techniques, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), and combined models through ensemble modeling for better performance.
Strong skills in web tools (AngularJS, HTML5, CSS3, JavaScript) and web analytics (Google Analytics), including web scraping with Python.
Conducted Data blending, Data preparation using Alteryx and SQL for tableau consumption, and publishing data sources to the Tableau server.
Excellent understanding of SDLC (Systems Development Life Cycle), Agile and Scrum.
Worked with AWS services including Kinesis, S3, DynamoDB, Elastic MapReduce (EMR), Redshift, SageMaker, and Athena.
Exported data to AWS S3 and documented the steps and progress of the work.
Used Git for version control to keep programs and code up to date.
Environment: Python, SQL, Oracle 12c, SDLC, SSRS, EMR, Tableau, MLlib, NLP, Scala, Spark, Kafka, MongoDB, Hadoop, PySpark, OLAP, Azure, HDFS, NLTK, JSON, XML, AWS, AWS S3, Redshift, Git, Agile, Scrum, AngularJS, HTML5, CSS3, JavaScript, and web Analytics.
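A hedged sketch of the distributed random forest flow mentioned above, written against PySpark's DataFrame ML API; the S3 paths, column names, and tree count are placeholder assumptions, not the client's actual schema:

```python
# Sketch: train a distributed random forest with PySpark ML.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf-sketch").getOrCreate()
df = spark.read.parquet("s3://bucket/training-data/")   # hypothetical input path

# Spark ML expects a single vector column, so assemble the numeric features.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df)

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
model = rf.fit(train)
model.write().overwrite().save("s3://bucket/models/rf")  # persist for scoring jobs
```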
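And an illustrative sketch of the CSV-to-chatbot-JSON automation; the column names and output schema are hypothetical stand-ins for the production contract:

```python
# Sketch: convert a CSV export into a chatbot-friendly JSON intents file.
import json

import pandas as pd

df = pd.read_csv("faq_export.csv")        # hypothetical source file

# Light text normalization before serializing (trim stray whitespace).
df["answer"] = df["answer"].str.strip()

records = [
    {"intent": row.intent, "utterance": row.question, "response": row.answer}
    for row in df.itertuples(index=False)
]

with open("chatbot_intents.json", "w") as f:
    json.dump({"intents": records}, f, indent=2)
```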
Client: Equity Credit Union, Durham, ON Jan 2021 - Dec 2021
Role: Machine Learning Engineer
Responsibilities:
Gathered, analyzed, documented, and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.
Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, and Space-Time.
Coded R functions to interface with Caffe Deep Learning Framework.
Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms
Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
Configured, monitored, and automated Amazon Web Services and GCP resources, and deployed ML models on AWS using EC2, S3, EBS, Lambda, and SageMaker.
Utilized SQL/NoSQL databases, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction, and used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
Executed ad-hoc data analysis for customer insights with SQL on an Amazon AWS Hadoop cluster.
Performed a POC demonstrating that data sourced in ADLS can be processed in HDInsight using a pipeline with a Hive activity.
Used Hive to create ORC-formatted tables and used ADF for data orchestration into the Azure database.
Copied data from ADLS to an Azure SQL database using ADF pipelines invoked via PowerShell scripting.
Built a Mixed-Effect Stepwise Logistic Regression model to identify statistically significant risk factors in predicting maternal delivery-associated complications.
Used various metrics (RMSE, MAE, F-score, ROC, and AUC) to evaluate the performance of each model (see the sketch after this section).
Designed and delivered rich interactive data visualizations in Power BI and Tableau to turn analytical metrics into business cases.
Experienced in RDBMSs such as SQL Server and non-relational databases such as MongoDB 3.x.
Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies
Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data sources.
Deployed machine learning models on cloud services including AWS Lambda and EC2 with Linux/Unix.
Implemented Classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, Naive Bayes.
Environment: R, Python, HDFS, MLlib, OLTP, Oracle 10g, Hive, OLAP, DB2, MS Excel, SQL, MongoDB, AWS, MS Azure, Power BI, Big Data, RDBMS, PowerShell, Tableau, Hadoop cluster, data analytics, RMSE, MAE, F-Score, ROC, AUC, Git, Linux, and Windows.
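A small sketch of the evaluation step named above, using scikit-learn's metric functions; the toy arrays stand in for real validation labels, predictions, and scores:

```python
# Sketch: compute RMSE, MAE, F-score, and ROC AUC with scikit-learn.
import numpy as np
from sklearn.metrics import (f1_score, mean_absolute_error,
                             mean_squared_error, roc_auc_score)

# Toy data (assumed): regression targets/predictions and binary labels/scores.
y_true_reg = np.array([3.2, 1.8, 2.5, 4.1])
y_pred_reg = np.array([3.0, 2.0, 2.4, 3.9])
y_true_cls = np.array([0, 1, 1, 0])
y_score = np.array([0.2, 0.8, 0.6, 0.4])   # predicted P(class == 1)

rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
mae = mean_absolute_error(y_true_reg, y_pred_reg)
f1 = f1_score(y_true_cls, (y_score >= 0.5).astype(int))   # threshold at 0.5
auc = roc_auc_score(y_true_cls, y_score)

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  F1={f1:.3f}  AUC={auc:.3f}")
```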
Client: Invenza Solutions, Ahmadabad, India Feb 2019 - Dec 2020
Role: Data Scientist Intern
Responsibilities:
Collected business requirements through various approaches and worked with business users on ETL application enhancements, conducting Joint Requirements Development (JRD) sessions to meet the job requirements.
Performed exploratory data analysis, including calculation of descriptive statistics, detection of outliers, assumption testing, and factor analysis, in Python and R (see the first sketch after this section).
Built models based on domain knowledge and customer business objectives.
Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversions for target categories.
Extracted data from the database using Excel/Access, SQL procedures and created Python and R datasets for statistical analysis, validation and documentation.
Extensive understanding of BI and analytics, focusing on the consumer and customer space.
Innovated with machine learning and improved data quality using SQL across complex internal databases.
Improved sales and logistics data quality through data cleaning using NumPy, SciPy, and Pandas in Python.
Designed data profiles for processing, including running SQL and procedural SQL queries, and used Python and R for data acquisition and data integrity checks against the database.
Used R to generate regression models to provide statistical forecasting.
Used Python and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest, Decision Trees, and SVMs for estimating the risks of welfare dependency.
Performed report visualization using Python packages (Matplotlib, Seaborn).
Hands-on experience with Hive, Hadoop, HDFS, and related Big Data topics.
Collaborated with the web development team, using JavaScript to publish dashboards on official NJ state websites.
Implemented neural networks trained via backpropagation and used 10-fold cross-validation to select the best parameters for the Ebola network (see the second sketch after this section).
Environment: AWS, Python, HDFS, OLTP, Oracle 10g, Hive, Hadoop, OLAP, DB2, MS Excel, SQL, MongoDB, Big Data, JavaScript, Power BI, Python packages (Matplotlib, Seaborn), machine learning, NumPy, Pandas, Excel, Kafka, Spark, Windows.
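A minimal sketch of the exploratory analysis step described above: descriptive statistics plus IQR-based outlier flagging. The file and column names are placeholders:

```python
# Sketch: descriptive statistics and 1.5*IQR outlier detection with Pandas.
import pandas as pd

df = pd.read_csv("transactions.csv")   # hypothetical extract
print(df.describe())                   # count, mean, std, quartiles per column

q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers flagged by the 1.5*IQR rule")
```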
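And a sketch of 10-fold cross-validated parameter selection for a backpropagation network, using scikit-learn's MLPClassifier on synthetic data as a stand-in for the actual dataset and architecture:

```python
# Sketch: pick network hyperparameters by 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; the real project used its own feature set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    "hidden_layer_sizes": [(32,), (64,), (32, 16)],  # illustrative candidates
    "alpha": [1e-4, 1e-3],                           # L2 regularization strength
}
search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                      param_grid, cv=10, scoring="roc_auc")
search.fit(X, y)
print("best params:", search.best_params_, "CV AUC:", round(search.best_score_, 3))
```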