Data Scientist Analytics

Location:

Milwaukee, WI

Posted:

March 05, 2024


Resume:

Name: Obarai Vasanth Akkidasari Email ID: *********@*****.***

Phone: 414-***-**** LinkedIn: LinkedIn Profile Git: Link

Professional summary: Data Scientist with 4+ years of professional experience in the technology industry. Proven ability to use data to drive business insights and improve decision-making. Expertise in a range of data science and data analytics techniques and tools, including data structures and algorithms, Core Java, statistical analysis, data visualization, EDA, Python, PySpark, machine learning, deep learning, NLP, SQL, Tableau, and AWS. Strong communication and presentation skills.

Kaggle Competition score: 90th Percentile

EDUCATION:

Master of Science (M.S.) in Computer Science, University of Wisconsin-Milwaukee.

May 2023, GPA 3.4

Master of Technology in Digital Systems & Computer Electronics (DSCE-ECE), Jawaharlal Nehru Technological University, Hyderabad, India, GPA 3.1.

SKILL SET:

Python, Core Java, Data Structures & Algorithms, Statistics, Data Analytics, Machine Learning (ML), Deep Learning (DL), PySpark, Natural Language Processing (NLP), Computer Vision, MySQL, SQLite, Tableau, Amazon Web Services (AWS), Cloud Computing, Git, GitHub, EDA, Data Cleaning, Data Wrangling, Feature Selection, Requirement Management Tools (Jira, Jama), AI.

Libraries: Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, SciPy, NLTK, TensorFlow, Keras.

Techniques: Hyperparameter Tuning, Cross-Validation, Overfitting/Underfitting, ML Pipeline.

Algorithms: Regression (Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Lasso, Ridge), Classification (Logistic Regression, KNN, SVMs, Decision Trees, Random Forests), Boosting (Gradient Boosting, AdaBoost, XGBoost), Ensemble Techniques, Bagging, Clustering (K-Means, Hierarchical).

CERTIFICATIONS:

IBM Data Science Professional Certificate, 2023 (Coursera certification).

Google Advanced Data Analytics Certificate, 2023 (Grow with Google, Coursera).

AWS Certified Cloud Practitioner by Udemy.

Python for Everybody (Coursera certification).

Database and SQL for beginners (Udemy).

EXPERIENCE:

Client: DocuSign - Data Scientist Duration: Jan 2023 - Present.

Key Responsibilities & Contributions:

Followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology: business problem understanding, data understanding, data preprocessing (EDA, data cleaning, data wrangling, train-test split), modelling, evaluation, and deployment.

Built regression models using simple linear, multiple linear, and polynomial regression, with regularization (Lasso, Ridge, Elastic Net).

Built classification models using logistic regression, K-NN, SVM, decision trees, random forests, AdaBoost, gradient boosting, and XGBoost, and identified the best-performing model.

Carried projects from data preprocessing (EDA, data visualization, data cleaning, data wrangling, train-test split) through modelling. Identified business challenges and provided data-driven solutions to address them.

Worked with data to extract insights that inform business decisions.

Cleaned and preprocessed data to ensure accuracy and consistency, including handling missing values, removing duplicates, and transforming data formats.

Used statistical methods and tools to analyze data and identify trends, patterns, and insights, including exploratory data analysis, hypothesis testing, and regression analysis.

Created visual representations of data (charts, graphs, and dashboards) in Tableau and Power BI. Communicated complex findings to non-technical stakeholders.

Generated regular reports based on data analysis to support decision-making and presented them to team members and management.

Experience developing solutions using agile methods. Experience with testing and with data accuracy/quality designs.

Deep understanding of analytics, data science applications, and machine learning systems. Developed requirement-driven data models, such as dimensional and relational models.

Used programming languages (e.g., Python, R, SQL) to manipulate and analyze data. Ensured compliance with data governance policies and standards. Implemented best practices for data management.

Predicted values using the trained model, followed by evaluation and deployment. Technology used: Machine Learning, Python, NumPy, Pandas.

Experience developing applications using Amazon Web Services such as EC2, CloudSearch, Elastic Load Balancing (ELB), S3, CloudFront, Route 53, Virtual Private Cloud (VPC), and Lambda. Finally worked on development.

Client: U.S. Bank - Data Scientist Duration: 3+ years (Nov 2018 - Nov 2021)

Applied the CRISP-DM methodology end to end, from understanding the business problem to deployment: data cleaning, data wrangling, exploratory data analysis, and data visualization with Python (NumPy, Pandas, Matplotlib, Requests, SciPy, and scikit-learn); statistical analysis; and model building with various ML algorithms. Worked on deep learning and NLP, CNNs, feature selection, ML pipelines, clustering, MySQL, Tableau, and AWS. Experience with 20+ real-time projects in various domains and knowledge of 20+ algorithms.

Key Responsibilities & Contributions:

Followed CRISP-DM (Cross-Industry Standard Process for Data Mining) through business understanding, data understanding, data preparation, modelling, evaluation, and deployment.

Performed data cleaning, data wrangling, exploratory data analysis, and data visualization using Python (NumPy, Pandas, Matplotlib, SciPy, and scikit-learn); performed statistical analysis and built models using various ML algorithms.

Started from business problem understanding under the CRISP-DM methodology, then performed exploratory data analysis, data visualization, and data cleaning.

Imported libraries, treated missing values, outliers, and null values. Converted the text data into feature vectors.

Performed data cleaning, data wrangling, feature transformation, and feature scaling. Also worked on descriptive and inferential statistics, i.e., hypothesis testing and statistical testing, along with confusion matrix analysis and train-test split.

Built regression models using simple linear, multiple linear, and polynomial regression, with regularization (Lasso, Ridge, Elastic Net).

Selected the best ML algorithm based on test accuracy and deployed the model using joblib or pickle. Technologies: Python, TensorFlow, Keras, Matplotlib, NumPy, Pandas, KNN model, JupyterLab.

Worked on deep learning and NLP, ANNs, CNNs, feature selection, ML pipelines, clustering, MySQL, Tableau, and AWS. Experience with 20+ real-time projects in various domains and knowledge of 20+ algorithms.

Proficiency in navigating various types of database models and DBMSs to create data sets for analytics and model training and development.

Strong Python and cloud computing skills. Strong SQL skills and background in ETL. Deep understanding of distributed data systems and applications.

Used programming languages (e.g., Python, SQL) to manipulate and analyze data. Ensured compliance with data governance policies and standards. Implemented best practices for data management.

Created visual representations of data (charts, graphs, and dashboards) in Tableau and Power BI. Communicated complex findings to non-technical stakeholders.

Worked extensively with PySpark on large datasets, i.e., performed data cleaning, data wrangling, feature selection, encoding, and missing-value and outlier treatment, and finally regression analysis.

PROJECTS:

Data Science: Market Segmentation: GitRepo

A regression problem. Followed the CRISP-DM methodology: business problem understanding, data understanding, data preprocessing (EDA, data cleaning, data wrangling, train-test split), modelling, evaluation, and deployment.

Used the following machine learning techniques: simple linear regression, multiple linear regression, polynomial regression, and regularization. Selected the best ML algorithm based on test accuracy and deployed it.

PySpark: Regression: GitRepo

Started a PySpark session, read the dataset, added and dropped columns, performed feature engineering, handled missing values, and performed filter operations. Implemented simple linear regression with a train-test split, followed by modelling, evaluation, and deployment.

Data Science: University Recommendation System: GitRepo

Started from business problem understanding under the CRISP-DM methodology, then performed exploratory data analysis, data visualization, and data cleaning.

Built models using various algorithms, including linear regression, Lasso, SVM, decision tree, random forest, and K-NN, and identified the best model. Verified that linear regression had the highest accuracy. Selected the best ML algorithm based on test accuracy and deployed it.

Technologies: Python, TensorFlow, NumPy, Pandas, KNN model, JupyterLab.

Data Analysis: EDA (Exploratory Data Analysis): GitRepo

Started from business problem understanding under the CRISP-DM methodology, then performed exploratory data analysis, data visualization, data cleaning, and data wrangling.

Performed statistical analysis, i.e., descriptive and inferential, along with feature engineering, and performed univariate and bivariate analysis. Treated missing values and outliers. Finally performed encoding.

NLP: Recommendation Engine: GitRepo

Imported libraries, identified null values, and converted the text data into feature vectors. Computed cosine similarity, sorted the similarity scores, found the closest match, and verified similar movies based on similarity score. Technology used: Machine Learning, NLP, Python, NumPy, Pandas.

Deep Learning: CNN: Multi-Classification Review of CNN Models:

Performed multi-class classification on the cats & dogs dataset using 4 different CNN architectures.

CNN Steps: Convolution, Max pooling, Flattening, Full connection.

Technologies: TensorFlow, Keras, NumPy, Pandas, Matplotlib.

Payroll System Implementation - Python & Django:

Implemented with Django and Python; created a user interface for payroll, including monthly and weekly payroll. Created a payroll system database to calculate staff members’ salaries with all benefits. Followed the waterfall SDLC model for development. Additionally, generated reports in PDF and CSV format. Technologies: Python, Django, HTML.
