Soumyajit Paul
*** *********** ****, ******, ** *6205 *********.******@*****.*** +1-412-***-**** LinkedIn Kaggle
PROFILE
- 3 years of professional experience in developing & deploying Anomaly Detection, Text Classification, Geo-Spatial Analytics and Time-Series Prediction ML models for the Automobile Manufacturing and Energy industries
- Experience in large scale data manipulation and advanced feature engineering using statistical measures
- Applied distributed database architecture (Teradata, Snowflake, OracleDB) for data manipulation of Industry Standard Financial Data, Hardware Sensor Data & Manufacturing Parts Data
- Deployed subscription-based, industry-standard Tableau Dashboard for internal business stakeholders
SKILLS
Dev Ops & Tools: • MS Azure • AWS • Jira • Git • Snowflake • Teradata • Hadoop • Shell Scripting
Analytics Tools: • Python • SQL • R • MATLAB • SPSS • Tableau
Analytics Libraries: • Scikit-learn • NumPy • TensorFlow • NLTK • Pandas • GeoPandas • Gensim • Plotly
Analytics Techniques: • Time-series Model • Support Vector • Geo-Spatial Analysis • Text Analysis • XGBoost • ARIMA • Clustering • Decision Tree based Model • Natural Language Processing
EMPLOYMENT HISTORY
Peterbilt Motors Co. (PACCAR Inc.), Data Scientist – Advanced Analytics
Dallas, TX, USA, June’19 – Present
• Defined project charters after identifying the business needs
• Processed, cleansed and verified the integrity of the data used for analysis (ETL Pipeline)
• Generated business insights on Customer Analytics through Vehicle Performance metrics
• Conducted company-wide SQL training class to empower non data-savvy employees
Customer Profiling (PACCAR Financial & Leasing)
• Designed Similarity Score Ranking to predict loan repayment ability for vehicle leasing merchants
• Built Optimization Algorithm to identify hotspots across North America for new leasing business
• Implemented labeling function for text data using part-of-speech (POS) tagging & spell checker in NLP
• Developed algorithm to flag fraudulent transactions for Financial division based on ranking metric
Predictive Maintenance (Product Development Team)
• Developed Binary Classification model for Parts Failure using Decision Tree approach (on Naïve Bayes Base model) post to feature importance for each failure.
• Performed feature engineering to drive hypothesis testing and develop a score index for vehicle usage patterns (Big Data – 1 Hz frequency data for millions of vehicles in service).
Apple Inc., Data Collection and Testing Engineer (Apple Watches) - Contract
Cupertino, CA, USA, December’18 – June’19
• Integrated testing automation across the system through Python Scripting for anomaly detection
• Designed automation of analytical test cases using database frameworks and data visualization
• Identified health and fitness trends in test samples and performed comparative analysis on sensor-level health tracker data collected from those samples.
• Created an actionable matrix of statistical tests (t-test, ANOVA and normality test) for sensor data
Extensible Energy Inc., Associate Data Scientist (Time-Series Modeling)
Berkeley, CA, USA, January’18 – Dec’18
• Evaluated optimal power consumption by analyzing seasonality and trends for customers
• Developed Solar-power forecasting and power demand prediction ARIMA models
• Applied ML algorithm (Support Vector & Naïve Bayes approach) on multivariate features
EDUCATION
Architecture Sustainable Building Energy Efficiency
Carnegie Mellon University, Master of Science in Advanced Infrastructure Systems
Pittsburgh, PA, USA, Dec 2017
- Courses on Data Science & Visualization, Statistical Modeling and Data Management
FELLOWSHIP
IFF–Real Estate Solutions, Data Science Fellow (Home Loan Prediction)
San Francisco, CA, October’18 – July’19
• Automated data classification process with traditional measure of financial capability paired with equity-focused metrics (Interest Rate, Loan Amount, Demographic distribution)
• Developed classification model to predict client dropouts based on loan application data