
Data Scientist

Location:
Denton, TX
Posted:
June 09, 2020


Resume:

Soumyajit Paul

*** *********** ****, ******, ** *6205 *********.******@*****.*** +1-412-***-**** LinkedIn Kaggle

PROFILE

- 3 years of professional experience developing and deploying anomaly detection, text classification, geo-spatial analytics, and time-series prediction ML models for the automobile manufacturing and energy industries

- Experience in large-scale data manipulation and advanced feature engineering using statistical measures

- Applied distributed database architectures (Teradata, Snowflake, OracleDB) to manipulate industry-standard financial data, hardware sensor data, and manufacturing parts data

- Deployed a subscription-based, industry-standard Tableau dashboard for internal business stakeholders

SKILLS

Dev Ops & Tools: MS Azure • AWS • Jira • git • Snowflake • Teradata • Hadoop • Shell Scripting

Analytics Tools: Python • SQL • R • MATLAB • SPSS • Tableau

Analytics Libraries: SciKit • NumPy • TensorFlow • NLTK • Pandas • Geopandas • Gensim • Plotly

Analytics Techniques: Time-series Model • Support Vector • Geo-Spatial Analysis • Text Analysis • XGBoost • ARIMA • Clustering • Decision Tree based Model • Natural Language Processing

EMPLOYMENT HISTORY

Peterbilt Motors Co. (PACCAR Inc.), Dallas, TX, USA

Data Scientist – Advanced Analytics, June ’19 – Present

• Defined a couple of project charters after identifying business needs

• Processed, cleansed, and verified the integrity of data used for analysis (ETL pipeline)

• Generated business insights on customer analytics from vehicle performance metrics

• Conducted a company-wide SQL training class to empower employees who are not data-savvy


Customer Profiling (PACCAR Financial & Leasing)

• Designed a similarity score ranking to predict loan repayment ability for vehicle leasing merchants

• Built an optimization algorithm to identify hotspots across North America for new leasing business

• Implemented a labeling function for text data using POS tagging and a spell checker (NLP)

• Developed an algorithm to flag fraudulent transactions for the Financial division based on a ranking metric


Predictive Maintenance (Product Development Team)

• Developed a binary classification model for parts failure using a decision tree approach (on a Naïve Bayes base model), following a feature-importance analysis for each failure

• Performed feature engineering to drive hypothesis testing and develop a score index for vehicle usage patterns (big data: 1 Hz frequency data for millions of vehicles in service)


Apple Inc., Cupertino, CA, USA

Data Collection and Testing Engineer (Apple Watches) - Contract, December ’18 – June ’19

• Integrated test automation across the system through Python scripting for anomaly detection

• Designed automation of analytical test cases using database frameworks and data visualization

• Identified health and fitness trends and performed comparative analysis on sensor-level health tracker data collected from test samples

• Created an actionable matrix of statistical tests (t-test, ANOVA, and normality test) for sensor data


Extensible Energy Inc., Berkeley, CA, USA

Associate Data Scientist (Time-Series Modeling), January ’18 – Dec ’18

• Evaluated optimal power consumption by analyzing seasonality and trends for customers

• Developed solar-power forecasting and power demand prediction ARIMA models

• Applied ML algorithms (Support Vector and Naïve Bayes approaches) on multivariate features

EDUCATION

Carnegie Mellon University, Pittsburgh, PA, USA

Master of Science in Advanced Infrastructure Systems, Dec 2017

Architecture Sustainable Building Energy Efficiency

- Courses on Data Science & Visualization, Statistical Modeling and Data Management

FELLOWSHIP

IFF–Real Estate Solutions, San Francisco, CA

Data Science Fellow (Home Loan Prediction), October ’18 – July ’19

• Automated the data classification process, pairing traditional measures of financial capability with equity-focused metrics (interest rate, loan amount, demographic distribution)

• Developed a classification model to predict client dropouts based on loan application data


