PREETPAL SINGH
***********@*****.*** 804-***-****
linkedin.com/in/preetpal725 github.com/preetpal725
Professional Summary
Data Scientist with 5 years of experience in Data Science using Python/R and Big Data.
Implemented complex ML and big data solutions for clients to improve business profitability and customer satisfaction.
Trained, tested and deployed machine learning models, selected evaluation metrics, and performed hyperparameter tuning to meet business requirements.
Experienced in implementing machine learning solutions using GCP (AI Platform, BigQuery), Azure (Databricks, HDInsight) and AWS SageMaker.
Contributed to developing automated products for Feature Engineering and Machine Learning.
Experienced in working with small and large distributed datasets with different frameworks such as Pandas, Spark, TensorFlow, Keras, Dask and Koalas.
Expert in building ETL pipelines with Python, PySpark and SQL.
Developed and hosted REST APIs using Docker and Kubernetes.
Technical Skills
Machine Learning
Regression, Classification, Ensemble Learning, Deep Learning, Clustering, Neural Networks
Cloud Platforms
GCP (AI Platform, BigQuery), AWS (S3, EC2, Kinesis, Glue, DMS, SageMaker, Redshift), Azure (Databricks, HDInsight), Docker and Kubernetes
Libraries/ Frameworks
Scikit-Learn, PySpark, TensorFlow, Keras, Pandas, Koalas, Dask, NumPy, Matplotlib
Big Data
Spark, MapReduce, Kafka, HDFS, Hive, HQL, Sqoop
Programming
Python, R, Java, C/C++
Database
SQL (Oracle DB/ MySQL), MongoDB
Analytics
Tableau, MS Power BI
Professional Experience
Data Scientist/ Engineer- Miracle Software Systems Aug 2019 - Present
Technologies: Python, Spark, Machine Learning, Big Data, GCP, Azure, AWS, SQL
Product: AI-PaaS (AI Platform-as-a-Service)
Developed AI-PaaS (AI Platform as a Service), an automated Machine Learning platform that lets developers with limited ML experience train high-quality models.
Contributed to developing the FEDA (Feature Engineering & Exploratory Data Analysis) product, which automates data pre-processing and feature engineering and reduces the time spent on data pre-processing and cleansing by 60-70%.
Contributed to developing an AutoML product that reduces ML complexity for developers by showing the top 5 best-fit models for the given data and performing hyperparameter tuning.
Implemented a Big Data/Hadoop solution for faster data extraction from any source using Kafka, storing features in HDFS and running Hive for data analytics.
Project: Predictive Analytics on AR with COVID-19 Public Dataset
Implemented Data Science solutions for Accounts Receivable and Google’s COVID-19 public data for customer analytics, and hosted webinars on them.
Implemented ML training & prediction jobs for batch and online data using GCP AI Platform.
Performed data analysis on GCP BigQuery and helped create real-time dashboards in Tableau.
Project: Invoice Outcome Prediction
Implemented an end-to-end Machine Learning solution on Azure Databricks and HDInsight using PySpark for ‘Invoice outcome prediction’ on Accounts Receivable data.
Predicted customers’ invoice payment times from customer behavior analytics using multi-class classification with ensemble learning in PySpark, achieving 95% accuracy.
Trained, tested and deployed ML models in MLflow and set up model versioning to track model performance.
Hosted REST APIs using Docker and Kubernetes to make them fault tolerant.
Project: ETL & Data Warehousing using AWS
Implemented data extraction from multiple sources such as on-premises systems, Oracle and SAP.
Created data pipelines and ETL workflows in AWS Glue using PySpark jobs to transform data per business requirements and store it in an S3 bucket.
Provided a data warehousing solution using a star schema in AWS Redshift with support for incremental loads.
Data Mining Assistant- George Mason University Aug 2017 – May 2019
Technologies: Python, Java, SQL, Machine Learning, Azure
Taught Data Mining & Machine Learning techniques for uncovering hidden patterns and trends, and managed labs to reinforce students’ understanding.
Developed, trained and tested machine learning algorithms (Regression, Ensemble models, Deep Neural Networks) in Python.
Assisted professors in developing big data solutions to store 5 million+ unstructured data points in preparation for text classification and sentiment analysis.
Taught database concepts using SQL on Oracle DB with Star/Snowflake schemas.
Software Engineer- Infosys Ltd. Oct 2015 - Aug 2017
Technologies: Java, Python, SQL
Provided key insights on ‘Customer Segmentation’ based on Recency, Frequency & Monetary value (RFM) using K-Means clustering in Python & PySpark on Databricks.
Performed ‘Sales Forecasting’ with an LSTM model on consolidated monthly and yearly sales to forecast trends more quickly and efficiently.
Researched and developed data pipelines on Azure Data Factory to load data from on-premises systems and MySQL Server into Azure Data Lake, with email notifications.
Contributed to automatic translation of documents into 18 different languages using Java and SQL, providing 23% faster transformations with reduced latency.
Associate Software Engineer- Honda Ltd. Jan 2015 – Sep 2015
Technologies: Python, SQL, Machine Learning
Performed log analysis and anomaly detection on manufacturing and engine assembly data using Python.
Created data pipelines for fetching data from SQL Server and loading it into Azure Data Lake.
Worked with data visualization tools such as Tableau to create data analytics dashboards.
Education
Master’s- George Mason University, Fairfax, VA May 2019
Bachelor’s- Chitkara University, Punjab, India May 2015