Resume

Data Scientist Wind Energy

Location:

Portland, OR

Posted:

February 18, 2023



Profile Summary

• Aspiring data scientist with broad-based experience in building data-intensive end-to-end solutions and a Post Graduate Program in Data Science and Business Analytics (PGP DSBA).

• Proficient in predictive modeling, data processing, machine learning algorithms, and AutoML platforms, working primarily in Python.

• Capable of developing highly adaptive and diverse AI/ML solutions that translate business and functional requirements into substantial deliverables.

• Experienced in working with large datasets and performing data cleaning, visualization, analysis, and dashboard creation using Tableau.

EDUCATION:

Post Graduate Program in Data Science and Business Analytics, University of Texas at Austin. Year: 2022 – 2023

Bachelor of Engineering, Canara Engineering College, Karnataka, India. Year: 2007 – 2011

PROJECTS

Project 1:

Project Title: Stock Prediction Clustering Using Machine Learning (Trade & Ahead)

• Developed a clustering model to group stocks based on their historical price and volume data, in order to inform investment decisions.

• Collected historical stock data and preprocessed it for clustering. Developed a K-means clustering model using Python's scikit-learn library to group stocks into similar clusters based on their price and volume characteristics. Visualized the results using Matplotlib and Seaborn.

• Identified six clusters of stocks with distinct price and volume characteristics, including high-growth stocks, blue-chip stocks, and defensive stocks. The clustering analysis provided investors with valuable insights into the overall market trends and the characteristics of specific stocks.

• The model provided a more granular understanding of the stock market, which helped investors make more informed investment decisions. The analysis contributed to improved accuracy of stock predictions and resulted in better portfolio performance.

Project 2:

Project Title: Optimizing Employee Bandwidth with a Tableau Dashboard

• Developed a Tableau dashboard to track employee workloads and identify opportunities to optimize their time.

• Collected source data from a project management tool and timesheets and created a data model to support the analysis. Developed a Tableau dashboard with various visualizations, including charts, tables, and heat maps, to identify trends in employee workloads and to balance workloads across projects.

• The dashboard helped identify opportunities to optimize employee bandwidth, such as by reallocating resources to balance workloads and reduce overtime. This led to increased efficiency, better resource allocation, and cost savings.


Project 3:

Project Title: Loan Delinquency Analysis

• As an analyst, built a model that helps identify the criteria for approving loans to borrowers.

• The model also helps determine the factors that drive loan delinquency.

• Used logistic regression with oversampled data and performed cross-validation tests.

Project 4:

Project Title: ReneWind

• "ReneWind" is a company working on improving the machinery and processes involved in the production of wind energy using machine learning. It has collected sensor data on generator failures in wind turbines.

SANIYA SHAIKH

BEAVERTON, OR

Mobile no: 669-***-****
LinkedIn profile: https://www.linkedin.com/in/sania-shaikh/
Email: advfde@r.postjobfree.com
GitHub: https://github.com/saniashaikh89

• The objective is to build various classification models, tune them and find the best one that will help identify failures so that the generator could be repaired before failing/breaking and the overall maintenance cost of the generators can be brought down.

• The final tuned model (XGBoost) was chosen after building ~6 different machine learning models, addressing target class imbalance (few "failures" and many "no failures" in the dataset), and fine-tuning performance through hyperparameter tuning and cross-validation.

• The tuned XGBoost model generalizes well on the test data, with a Recall score of 0.887 and Accuracy of 0.886.

Project 5:

Project Title: EASY VISA

• Built a classification ML model that gives generalized predictions on the training and testing datasets (not prone to overfitting) and is able to explain over 80% of the information (accuracy of 75% and F1 score of 82% on the test dataset). F1 score was used as the evaluation metric in order to minimize both false positives and false negatives.

• Based on the EDA and the classification model, identified important factors, such as education for specialized occupations, unit of wage, and the employee's continent, that influence whether a visa is certified rather than denied.

Project 6:

Project Title: ReCell

Supervised Learning - Foundations

• Analyzed the data provided and developed a dynamic pricing strategy for used and refurbished phones and tablets using a linear regression model, and identified factors that significantly influence the price of refurbished phones.

• The model explains ~84% of the variation in the data, predicts the normalized used device price within ~4.5%, and can be used for predictive purposes.
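As a rough illustration of what the ~84% figure refers to, the sketch below fits a one-feature least-squares line and computes R² (the share of variance explained). The function names and the toy numbers are hypothetical; the ReCell model itself presumably used several features and a library implementation, neither of which is shown here.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """Share of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical data: device age in months vs. normalized used price.
    age_months = [3, 6, 12, 18, 24, 30]
    used_price = [4.9, 4.7, 4.4, 4.3, 3.9, 3.8]
    slope, intercept = fit_simple_ols(age_months, used_price)
    preds = [intercept + slope * x for x in age_months]
    print(f"slope={slope:.4f}, intercept={intercept:.4f}")
    print(f"R^2={r_squared(used_price, preds):.3f}")
```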

Project 7:

Project Title: E-NEWS EXPRESS:

Business Statistics

Used statistical analysis, A/B testing, and visualization to decide whether the new landing page of an online news portal (E-news Express) is effective enough to gather new subscribers. The simulated dataset includes key metrics, such as conversion status and time spent on the page, that help assess the effectiveness of the new landing page. The project also analyzed whether conversion depends on the preferred language.
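One common way to compare conversion rates between an old and a new landing page is a one-sided two-proportion z-test. The sketch below is a minimal version using only the standard library; the function name and the visitor counts are hypothetical and are not taken from the E-news Express dataset, and the project may well have used a different test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided z-test for H1: conversion rate of B exceeds that of A.

    conv_a, conv_b: number of converted users in each group
    n_a, n_b: total number of users in each group
    Returns (z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of equal rates.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Upper-tail probability of the standard normal: 1 - Phi(z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 50 visitors per page.
    z, p = two_proportion_z_test(conv_a=21, n_a=50, conv_b=33, n_b=50)
    print(f"z={z:.3f}, one-sided p-value={p:.4f}")
```

A p-value below the chosen significance level (for example 0.05) would support the claim that the new page converts better, under the usual large-sample assumptions of the test.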

SKILLS AND TOOLS

• Statistics and Data Visualization: Descriptive, Statistical, Predictive Analytics, Seaborn, Matplotlib

• Python Libraries: Scikit-learn, Pandas, NumPy, Seaborn, Matplotlib, Streamlit, Pandas Profiling

• Programming Languages: Python (PyCharm, Visual Studio, Anaconda, Jupyter, Google Colab), C, C++

• Database: MS SQL Server, Oracle SQL

• Data Visualization/Reporting tools: Tableau, Microsoft (Excel/Office/PowerPoint)

• Machine Learning: Classification and Regression Algorithms, Statistical Inference, Exploratory Data Analysis, Clustering Techniques, PCA, Recommendation Systems, Hyperparameter Tuning, Feature Selection, Scikit-learn, NumPy, Pandas

ACHIEVEMENTS

• Participated in a Data Science Hackathon project to predict the annual turnover of restaurants with the lowest RMSE.

• Collected data from various sources to create a dataset of restaurants in a specific region. Conducted exploratory data analysis and preprocessed the data for regression analysis. Developed a multiple linear regression model and used regularization to minimize the RMSE.

• Achieved an RMSE of 0.11, which outperformed the other regression models that were evaluated. Found that location, type of cuisine, and ratings were the most significant features affecting restaurant turnover.

• Achieved 2nd rank out of 28 participants in the bootcamp.
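For reference, the regularized regression and RMSE mentioned in the hackathon bullets can be sketched in a simplified one-feature form. This is an assumption-laden illustration: the hackathon model used multiple features, the type of regularization is not stated in this resume (ridge is assumed here), and the function names and data are hypothetical.

```python
import math


def fit_ridge_1d(xs, ys, lam):
    """One-feature ridge regression with an unpenalized intercept.

    Minimizes sum((y - a - b*x)^2) + lam * b^2; lam=0 gives plain OLS.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)  # the penalty shrinks the slope toward zero
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(ys, preds):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


if __name__ == "__main__":
    # Hypothetical data: restaurant rating vs. scaled annual turnover.
    rating = [3.1, 3.6, 4.0, 4.2, 4.5, 4.8]
    turnover = [0.42, 0.50, 0.61, 0.58, 0.70, 0.77]
    for lam in (0.0, 0.5, 2.0):
        slope, intercept = fit_ridge_1d(rating, turnover, lam)
        preds = [intercept + slope * x for x in rating]
        print(f"lam={lam}: slope={slope:.4f}, train RMSE={rmse(turnover, preds):.4f}")
```

In practice the penalty strength would be chosen by comparing RMSE on held-out data rather than on the training set shown here.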
