
Data Scientist - Data Engineer

Toronto, ON, Canada
December 18, 2020



Jeet Shah

Email:

Mobile: +1-226-***-****

Summary

A passionate big data engineer with strong problem-solving skills and 2 years of experience building intuitive, data-driven solutions. Familiar with data management lifecycle strategies and tools, including data acquisition, storage, transformation and processing, using Machine Learning and Big Data technologies.

Technical Skills

Machine Learning / Big Data: ML Algorithms (Supervised/Unsupervised Learning), Python, Hadoop, MapReduce, YARN, Spark, Spark SQL, Spark Streaming, Kafka, Sqoop, NiFi, Airflow, GCP

Visualization: Tableau, Power BI, Jupyter Notebook, Excel Charts

Database technologies: MySQL, SQL, Pig, Hive, HBase, Cassandra

Research: Medical Diagnosis with Deep Learning and Time Series Analysis

Web Technologies: HTML5, CSS3, JavaScript

OS: Windows, Linux

Programming: C, C++, Python, Java, Scala

Soft Skills: Excellent verbal and written communication skills, presentation skills

Functional Skills: Data Analysis, Data Visualization, ETL Design, Data Integration, Data Modelling, Database Design, Data Pipelines

Work Experience

Data Scientist- I Ahmedabad, India

Pixel Analytics September 2017 - July 2018

Data Integration and Machine Learning: Captured client activities, contacts and engagement through a real-time platform integrated with Hadoop, and analysed the data using Machine Learning classification algorithms.

ETL Development: Designed and developed multiple ETL data pipelines.

Scripting and Data Ingestion: Built automation scripts using various Python libraries to run accuracy checks on data moving from multiple sources to target databases.

Analytics and Monetisation: Supported the company's dynamic costing decisions by identifying business insights and forecasting future trends using time series analysis with Python and Tableau. Increased overall sales by 24% after they had been almost flat for the previous four months.

Testing and Collaboration: Worked closely with the development team using Scrum methodology and performed unit testing.

Deployment: Deployed the ETL pipeline to production on Google Cloud Platform and assisted with application support.

Data Analyst Intern Bangalore, India

HP India May 2017 - August 2017

Data Modelling: Converted data into actionable insights by predicting and modelling future outcomes.

Data Processing and Transformation: Performed Data Analysis, Data Cleaning, Data Migration, Data Transformation and Data Import/Export using Python. Designed and architected several layers of data lakes.

Data cleansing and visualization: Worked closely with the data engineers to understand the structure of the data and wrote Python routines to collect it. Evaluated and analysed customer insurance datasets using Excel VLOOKUP and removed outliers using pandas. Built interactive Tableau dashboards and reports to visualize month-to-month sales.

Database Management and Analyst Intern Ahmedabad, India

Dolphin InfoTech December 2016 - January 2017

Database Management: Wrote SQL queries to collect data from the warehouse, created data dictionaries and modelled data relationships with E-R diagrams.

Data Mining: Worked with data mining concepts such as ETL, business intelligence and schemas (snowflake and star).

OKR Delivery: Prepared reports and presented them to the senior team to explain the flow of data.

Academic Projects

Github Repo:

Tableau: newProfile=&activeTab=0

London Hydro-Western University, Hadoop, Spark, Kafka, Sqoop, Hive, GCP, Python

Worked with the university's Big Data research team to provide data processing and analytics solutions, including streaming data ingestion, data transformation and data modelling.

Loaded large volumes of structured and unstructured data using ingestion tools such as Kafka and Sqoop.

Developed a Kafka consumer to receive real-time data from Kafka and store it in GCP.

Configured a Spark cluster and integrated it with the existing Hadoop cluster.

Worked closely with the analytics team to predict model outcomes.

Pneumonia Detection from Histopathological Images, Python, NumPy, pandas, scikit-learn, TensorFlow

Used Python libraries to store, prepare and clean data to analyse and predict the probability of pneumonia in the images.

Applied pre-trained models such as ResNet and Inception and compared them by accuracy.

Applied advanced CNN architectures such as U-Net and DeepMedic, achieving 69% accuracy. Used the Google Colab platform to run the code.

Stock Market Prediction, Python, ARIMA Model, Tableau, Excel

Cleaned and scaled the time series data. Visualized several features to identify trends and seasonality in the data.

Applied strategies such as feature selection and Principal Component Analysis (PCA) for dimensionality reduction.

Applied time series forecasting using the ARIMA model, and compared its accuracy against several ML algorithms such as Linear Regression, Naive Bayes, SVM and CNN to find the best model. Forecasted the stock price for the next 6 months, achieving 74% accuracy.

Education

Master of Engineering in Electrical and Computer Science

University of Western Ontario: London, Ontario

September 2018 - August 2019

Bachelor's in Engineering - Computer Science

Gujarat Technological University: Gujarat, India August 2014 – June 2018
