Machine Learning Data Analytics

Location:

Philadelphia, PA

Posted:

May 03, 2024

Resume:

Supriya Odnala

484-***-**** *********@*****.*** https://www.linkedin.com/in/supriya-odnala/

PROFESSIONAL SUMMARY:

●Certified Professional with 4+ years of experience in Machine Learning, Data Mining on large sets of structured and unstructured data, Data Validation, Acquisition, Visualization, and Predictive Modeling.

●Experienced in Data Analytics, Data/Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting.

●Experienced in creating cutting-edge data processing algorithms to meet project demands.

●Expert in performing Data preparation and exploration to build the appropriate machine learning model.

●Hands-on experience with ETL, Data Warehousing, Data Store concepts, and OLAP technologies.

●Worked on user interfaces using HTML/CSS, JavaScript, and jQuery.

●Expertise in Machine Learning models such as Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, clustering (K-means, Hierarchical), and Bayesian methods.

●Expertise in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.

●Working experience in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) cloud computing environments.

●Experienced in SQL programming and creation of relational database models.

●Proficient in Statistical Modeling and Machine Learning techniques for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, and Factor Analysis/PCA.

●Implemented Machine learning techniques on structured and unstructured data with equal proficiency.

●Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.

●Experienced in Regression Analysis (Linear, Lasso, Ridge, and Elastic Net Regression).

●Competent at researching, visualizing and analyzing raw data in order to identify recommendations for meeting organizational challenges.

●Non-Parametric Fast Learning ML Algorithms (Decision Trees, Random Forest, Gradient Boosting (XGBoost, LightGBM, CatBoost), SVM).

●Deep Learning Models (Neural Networks: ANN, CNN, RNN) and Deep Learning Frameworks (TensorFlow, Keras, H2O). Worked with Data Visualization tools such as Tableau and Power BI.

●Used dimensionality reduction techniques and regularization techniques.

●Expert in data flow between primary DB and various reporting tools. Expert in finding Trends and Patterns within Datasets and providing recommendations accordingly.

●Proficient in requirement gathering, writing, analysis, estimation, use case review, scenario preparation, test planning, strategy decision making, test execution, test results analysis, team management, and test result reporting. Extensive experience working with large data sets and data classification.

●Expert in analyzing and extracting relevant information from large amounts of data to help automate self-monitoring and self-diagnosis and to optimize key processes. Proficient in statistical and other tools/languages such as R, Python, C, C++, MATLAB, and LTspice.

●Proficient in data visualization tools such as Tableau and Python libraries (Matplotlib, Plotly) to create visually powerful and actionable interactive reports and dashboards.

●Excellent knowledge of numerical and scientific libraries such as SciPy and NumPy.

TECHNICAL SKILLS:

Data Analytics:

Python (NumPy, SciPy, pandas, Seaborn, Plotly, Matplotlib), Power BI, Pivot Tables, Charts, Data Connection, Data Validation

Data Visualization:

Tableau, Visualization packages, Microsoft Office, Interactive Dashboard, Storyline, Grouping, Bin

Software & Tools:

MS Projects, Excel, Spyder, PyCharm, Jupyter, MATLAB, PostgreSQL, Ambari Sandbox, HDFS, Hive, TensorFlow, Keras

Databases:

SQL, NoSQL, MS Access

Operating System:

Windows, UNIX, Linux, Mac OS

Languages:

Python, C, C++

Machine Learning:

Regression, Classification, Clustering, Simple Linear Regression, Multiple Linear Regression, Decision Trees, Random Forest, Logistic Regression, K-NN, SVM, Recommendation Systems, Association Rules, Apriori, PCA, Time Series Analysis, Unsupervised Learning, NLTK, CountVectorizer, TF-IDF, Fuzzy C-Means Clustering, Reinforcement Learning

Deep Learning:

Artificial Neural Network, spaCy, Keras, Transformers, Convolutional Neural Network, Recurrent Neural Network

PROFESSIONAL EXPERIENCE:

United Health Group – Data Scientist May 2023 – Present

Implemented churn prediction on an imbalanced customer dataset using classification algorithms such as Random Forest, XGBoost, and ANN. The ANN achieved the highest precision at 82%. Reduced churn by 4.5%.

Extracted large datasets (5M+ rows) from Teradata using SQL queries, API queries, JSON, and Excel/CSV files, and merged data from the different sources into a master dataset. These datasets included confidential customer information and KPIs such as 5G uptime, signal strength, and DL/UL speed.

The dataset was imbalanced, since churned customers are always far fewer than non-churned customers, so the imbalanced-learn (imblearn) library, which works alongside scikit-learn, was used to balance the churn and non-churn classes.

Plotted the ROC curve and computed AUC to find the optimal classification threshold, and used classification metrics such as precision and accuracy.

Version-controlled the code in Git from the Bash terminal, with clear commits and documentation.

Generalized feature extraction in the machine learning data pipeline which improved efficiency throughout the system.

Applied post-pruning techniques with scikit-learn to reduce the complexity of the final classifier, improving predictive performance by reducing overfitting.

Designed several high-performance prediction models using Python packages such as pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, pandas-datareader, and statsmodels.

Used reinforcement learning techniques to enhance model performance.

Connected a SQL engine to Python to extract data from MySQL, using inner/outer joins across 5-10 different databases to assemble the right data.

Built an interactive reporting dashboard that grouped different data types from Verizon’s customer data (5G, uptime, data, signal strength, etc.).

Detected anomalies in signal, SNR, 5G data usage, etc., and actively investigated them to mitigate customer issues as quickly as possible.

Built a production-ready front end for the model using the Flask Python web framework to take inputs from end users. Prepared the model for deployment on Heroku/AWS/GCP using CI/CD pipelines.

Used A/B testing to determine whether a new firmware push to devices had a significant impact on 5G uptime, DL speed, etc. Created control and experimental groups and ran one-sample and paired two-tailed Z-tests to compare mean values at a 95% confidence level.

Defined null and alternative hypotheses, used the Z-test to compute the p-value, and compared it with the significance level to reject or fail to reject the null hypothesis.

Similar A/B testing was done on state vs state population and state vs country population.

Collaborated with the Marketing and Finance team to report these findings to stakeholders in non-technical language.

Developed a time-series forecasting model to forecast the sales of Internet devices over the next quarter / 6 months with an MSE of 2000.

Used AR, MA, and ARIMA baseline models to forecast. Plotted ACF and PACF plots and used auto_arima to identify the AR and MA components that would yield the lowest MSE.

Used the Augmented Dickey-Fuller (ADF) test to check the series for stationarity and to decide how to remove non-stationarity before feeding it to the ARIMA model.

Maintained and led weekly JIRA meetings with the developer team to add new features and resolve existing issues in the dashboard.

Presented all findings to senior stakeholders and larger audiences on an ad-hoc basis in non-technical language using plots and charts.

ENVIRONMENT: Random Forest, XGBoost, ANN, Teradata, SQL, API, JSON, Excel/CSV, feature extraction, ROC/AUC curve, Git version control, Flask web-framework, A/B testing (Firmware push, population comparison), Time-Series forecasting (ARIMA), JIRA meetings, Reinforcement Learning, stakeholder reporting.
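
Illustrative sketch of the paired two-tailed Z-test mentioned in the A/B testing work above. This is not the production code: the function name `paired_z_test`, the default `alpha=0.05`, and the input layout (one before/after reading per device) are assumptions made for illustration, and the normal approximation assumes a reasonably large number of pairs.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_z_test(before, after, alpha=0.05):
    """Two-tailed paired Z-test on per-device differences (hypothetical helper).

    H0: the mean difference (after - before) is zero.
    Returns (z statistic, p-value, whether H0 is rejected at level alpha).
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")

    # Work on the per-device differences, as in any paired test.
    diffs = [a - b for a, b in zip(after, before)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance; Z is undefined")

    std_err = spread / sqrt(len(diffs))
    z = mean(diffs) / std_err

    # Two-tailed p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha
```

Calling `paired_z_test(uptime_before, uptime_after)` on two equal-length lists returns the Z statistic, the p-value, and a flag indicating whether the null hypothesis is rejected at the chosen significance level.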

Infosys Pvt LTD - Data Scientist April 2019 - August 2022

Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.

Worked with a team of developers on Python applications for risk management.

Used a distributed version control system (Git and GitHub) to commit and pull changes to the source code.

Developed Python application for Google Analytics aggregation and reporting.

Worked with Python OpenStack APIs and used Python scripts to update content in the database and manipulate files.

Implemented machine learning models using Python libraries Scikit-learn and SciPy.

Worked with MVC architecture using Django for web-based applications, applying OOP concepts.

Worked with several Python packages such as Matplotlib, Pillow, NumPy, and sockets.

Worked on data transformation, data sourcing and mapping, conversion, and loading.

Designed and deployed machine learning solutions in Python to classify millions of previously unclassified users into core data products.

Used NLTK for NLP text mining and OCR for contract-based PDF classification.

Used the pandas API to organize data in time-series and tabular form for easy timestamp-based manipulation and retrieval.

Used machine learning techniques such as unsupervised classification, optimization, and prediction.

ENVIRONMENT: Python application development (Google Analytics, RISK management), Git for version control, Django for MVC web-based applications, Scikit-learn, SciPy for ML models, NLTK for NLP, Pandas for time series data manipulation, OpenStack APIs.

PROJECTS EXECUTED:

Nutrition Analysis and Food Recommendation System

Led the development of a Nutrition Analysis and Food Recommendation System, curating and preprocessing diverse datasets from prominent sources.

Implemented Model-Based Collaborative Filtering (SVD) and Content-Based Filtering (Cosine Similarity), proposing an Ensemble Model for a robust recommendation system.

Empowered individuals to make health-conscious choices through cutting-edge technology, advanced modeling, and comprehensive nutritional analysis.
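
Illustrative sketch of the content-based filtering step (cosine similarity) named in this project. It is not the project's code: the function names, the three-value nutrient vectors, and the food entries in `EXAMPLE_FOODS` are hypothetical placeholders, not real nutritional data.

```python
from math import sqrt

# Hypothetical feature vectors (e.g. protein, carbs, fat); illustrative values only.
EXAMPLE_FOODS = {
    "food_a": [20.0, 5.0, 3.0],
    "food_b": [18.0, 6.0, 4.0],
    "food_c": [2.0, 60.0, 1.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, catalog, top_n=3):
    """Rank the other catalog items by cosine similarity to the target item."""
    scores = [
        (name, cosine_similarity(catalog[target], vector))
        for name, vector in catalog.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

With these placeholder vectors, `most_similar("food_a", EXAMPLE_FOODS, top_n=1)` ranks `food_b` first, since its nutrient profile points in nearly the same direction as `food_a`.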

House Price Prediction

Utilized advanced feature engineering techniques to identify and incorporate crucial predictors such as location, size, and amenities, optimizing the dataset for accurate house price predictions.

Applied advanced regression models, including linear regression, Random Forests, and Gradient Boosting, to analyze housing market trends. Notably, the implementation of Random Forests resulted in a 20% improvement in predictive accuracy, effectively capturing intricate non-linear relationships in the data and significantly refining the precision of house price predictions.

EDUCATION:

Master’s in Data Science, GPA: 3.8/4.00
