Arshita Srinivasa Reddy
adnaqy@r.postjobfree.com
Data Scientist/Machine Learning
Summary
8+ years of experience in Data Analytics, Machine Learning (ML), Predictive Modeling, and Natural Language Processing (NLP), and in developing enterprise applications across the SDLC
Developed Python programs for the data pre-processing and validation required by ML models
Excellent knowledge of Python packages (pandas, NumPy, scikit-learn, TensorFlow, Matplotlib, Seaborn) for solving a variety of problems
Adept in applying Machine Learning techniques such as Supervised Learning (Classification and Regression) and Unsupervised Learning (Clustering) on different datasets
Familiarity with Machine Learning offerings and operationalization from cloud providers such as AWS, Azure, and GCP
Strong experience in developing applications using the Flask and Django frameworks
Basic knowledge of DevOps and Master Data Management
Knowledge in creating data mapping documents for Extracting, Transforming, and Loading (ETL) tasks.
Excellent knowledge of creating databases, tables, DDL/DML queries, triggers, views, user-defined data types, functions, cursors, and indexes using PSQL
Extensive experience with in-depth data analysis across different databases and structures. Strong knowledge of writing PSQL and MySQL queries, sub-queries, CTEs, and complex joins
Experience with Hive queries and tables that helped identify trends and patterns in historical data
Knowledge of Hadoop architecture, the HDFS framework, and its ecosystem, including Hadoop MapReduce, Hive, HBase, Sqoop, and Oozie
Knowledge of Java/Scala/Apache Spark
Performed statistical data analysis such as hypothesis testing, ANOVA, chi-square tests, regression, association, and correlation
Strong foundation in the probability, statistics, and mathematics underlying AI/ML
Strong computer science fundamentals such as algorithms, data structures, multithreading, object-oriented development, distributed applications, client-server architecture
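As a small illustration of the hypothesis-testing work listed above, the following sketch computes a Pearson chi-square statistic for a contingency table by hand; the observed counts are hypothetical values chosen only for the example.

```python
# Pearson chi-square test of independence for a 2x2 contingency table.
# The observed counts below are hypothetical, chosen only for illustration.

def chi_square_statistic(table):
    """Compute the Pearson chi-square statistic for a 2D contingency table."""
    rows = [sum(r) for r in table]                 # row totals
    cols = [sum(c) for c in zip(*table)]           # column totals
    total = sum(rows)                              # grand total
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total   # expected count under independence
            stat += (observed - expected) ** 2 / expected
    return stat

observed = [[30, 10],   # hypothetical counts, e.g. outcome by group
            [20, 40]]
print(round(chi_square_statistic(observed), 3))  # → 16.667
```

The statistic would then be compared against a chi-square critical value (here with 1 degree of freedom) to accept or reject independence.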
Technical Skills
Libraries
Pandas, Numpy, Scikit learn, Matplotlib, nltk, Plotly, Seaborn
Machine Learning Algorithm
Linear & Logistic Regression, SVM, Decision Tree, Random Forest, KNN, Naïve Bayes, K-Means Clustering, XGBoost
Framework
Flask, Spring MVC, Django
Tools
PyCharm, Jupyter Notebook, Anaconda, Eclipse
Database
SQLite, MongoDB
Cloud
Heroku, AWS (EC2, S3, SageMaker)
Others
HTML, CSS, Statistics (hypothesis testing), AutoML, Postman, REST APIs, Chatbots (Google Dialogflow), NLP, Jira, Docker, Kubernetes
J2EE (JDBC, EJB), XML, JSP, Servlets, JavaScript
Professional Experience
State Auto Insurance, Columbus OH June 2019 – Present
Data Scientist/Machine Learning
Responsibilities
Developing an AutoML application; built a classification model to determine whether a customer is filing a fraudulent insurance claim
Involved in the entire data science project life cycle and actively involved in all the phases including data extraction, data cleaning, statistical modeling and data visualization with large data sets of structured and unstructured data.
Design and implement Machine learning models and data ingestion pipelines.
Experience with continuous integration/continuous delivery (CI/CD) tools for data science
Practiced in exploratory data analysis (EDA) and manipulating large data sets
Collected data from different sources including MS SQL Server and flat files (Excel, CSV, TXT)
Good understanding of MLOps
Wrote Python scripts to scrape web data for data collection using Beautiful Soup and Scrapy
Used a GitHub repository to clone the code, commit changes, and push from the feature branch to the develop branch
Performed model validation and model tuning with model selection, k-fold cross-validation, a hold-out scheme, and hyperparameter tuning via grid search to find the optimal hyperparameters for the model
Performed data preprocessing, including resolving data quality issues and data transformation
Adhere to agile project management frameworks
Used pandas, NumPy, Seaborn, Matplotlib, scikit-learn, SciPy, NLTK, TensorFlow, Keras, and PyTorch in Python for developing various machine learning algorithms
Dockerized models for deployment
Created machine learning pipelines using big data technologies such as PySpark
Experience in data extraction, ingestion and processing of large data sets
Performed data analysis using SQL to retrieve data from an Oracle database
Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries
Deployed end-to-end machine learning applications on AWS EC2 instances
Environment: Power BI, AWS, JavaScript, HDFS, CSS, Python, Hive, Machine Learning Algorithms, NLP, Spark, A/B testing, HTML, PyCharm, Flask, API, SQL, MongoDB, GitHub
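The k-fold cross-validation and grid-search tuning described in this role can be sketched as follows; the synthetic dataset, the Random Forest parameter grid, and the F1 scoring choice are illustrative assumptions, not details from the actual claims project.

```python
# Sketch of hyperparameter tuning with 5-fold cross-validation and grid search.
# The synthetic data and parameter grid are illustrative stand-ins, not the
# actual insurance-claims data or production search space.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced binary target standing in for "fraudulent" vs. "legitimate".
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1],
                           random_state=42)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="f1")  # 5-fold CV per combination
search.fit(X, y)
print(search.best_params_)   # combination with the best cross-validated F1 score
```

The same pattern extends to a hold-out scheme by fitting `search` on a training split only and scoring `search.best_estimator_` on the held-out data.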
AT&T, Atlanta GA Jan 2017 – June 2019
Data Scientist
Responsibilities
Built an ML classification model to predict faulty wafers
Performed data validation and data insertion into a MongoDB database, and processed batch files coming from the client as per requirements
Divided data into subsets for better model training using the K-Means clustering algorithm
Worked on developing production ready models using Random forest, K means clustering
Used pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python for developing various machine learning algorithms, and utilized algorithms such as XGBoost and Random Forest
Created machine learning pipelines using big data technologies such as PySpark
Familiarity with NLP libraries (spaCy, NLTK) and using them to operationalize and enhance the chatbot user experience
Used the SMOTE oversampling technique (Python imbalanced-learn library) to up-sample low-occurrence target values so they are learned better by the model
Developed train and test sets with scikit-learn's train_test_split, using stratification to keep the same proportion of high- and low-occurrence target values in each split
Used k-fold cross-validation (scikit-learn) on the training data to minimize variance and predict better results on the test data
Explored grid search cross-validation (scikit-learn) to tune hyperparameters and find the best parameters. Tested various performance metrics such as accuracy, F1-score, precision, and recall, and used recall to measure performance
Used the Google Colab environment for running Python code and Jupyter notebooks
Environment: Python (pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Pysql), Machine Learning, RStudio, Jupyter Notebooks, Google Colab, Excel, Django
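The imbalance-handling workflow described in this role, stratified splitting, minority oversampling, and recall as the headline metric, can be sketched as below. Plain random oversampling with scikit-learn's `resample` is used here as a dependency-light stand-in for SMOTE, and the synthetic data is illustrative only.

```python
# Sketch of an imbalanced-classification workflow: stratified train/test split,
# oversampling of the minority class (random oversampling here as a stand-in
# for SMOTE, to keep the sketch dependency-light), and recall as the metric.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps the minority-class proportion equal in train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Up-sample the minority class in the training set only (never the test set).
minority = X_tr[y_tr == 1]
extra = resample(minority, n_samples=len(X_tr[y_tr == 0]) - len(minority),
                 random_state=0)                # sampled with replacement
X_bal = np.vstack([X_tr, extra])
y_bal = np.concatenate([y_tr, np.ones(len(extra), dtype=int)])

model = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print(round(recall_score(y_te, model.predict(X_te)), 3))  # minority-class recall
```

Oversampling after the split avoids leaking duplicated minority rows into the held-out evaluation data.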
Aprimo, Chicago IL March 2015 – Dec 2016
Data Scientist
Responsibilities
Tracked the sales of CPG products across multiple categories to identify key drivers of revenue generated on a weekly basis
Worked on end-to-end data pipeline solution: acquisition, extraction, transformation, loading and visualization of data.
Built machine learning (ML) and artificial intelligence (AI) applications and deployed them across large data sets
Worked on an NLP project analyzing tweets, reviews, and feedback about the brand; leveraged brand reputation by understanding customers' emotions, requirements, and needs in Python. Applied topic modeling and data extraction; data pre-processing included stemming, stop-word removal, and handling of outliers and missing values
Used AWS S3 for data storage
Extracted terabytes of structured and unstructured data using SQL queries and performed data mining tasks including handling missing data, data wrangling, feature scaling, and outlier analysis in Python with pandas
Utilized data visualization tools such as Tableau and Python’s vast data visualization libraries to communicate findings to the data science, marketing and engineering teams.
Defining requirements, coding, and delivering new functionality to be rolled out to clients.
Defining requirements for and/or coding new internal analytical capabilities (e.g., Balance and Profit and Loss monitoring)
Building relationships with clients, the global Prime team, internal business management, and support partners.
Solving issues for clients and ensuring the client experience is balanced with risk and other financial attribute allocation.
Building new statistical models and Machine Learning algorithms that drive Marketing Customer Experience, Credit Valuations, and Operations programs
Leading data science projects from concept to deployment for delivering substantial business value through the application of new-age AI-driven methodologies.
Utilizing Machine Learning algorithms (k-Means, Random Forest, Gradient Boosting, etc.) and programming languages such as Python
Environment: Python, SciPy, Machine Learning, pandas, scikit-learn, Matplotlib, k-Means, Random Forest, Gradient Boosting, SQL, Tableau
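The tweet pre-processing pipeline described in this role (lower-casing, stop-word removal, stemming) can be sketched as follows; the tiny stop-word list and the naive suffix stemmer are illustrative stand-ins for NLTK's stopword corpus and PorterStemmer.

```python
# Sketch of NLP text pre-processing for tweets: tokenize, drop stop words, stem.
# The stop-word list and suffix stemmer below are deliberately tiny stand-ins
# for NLTK's stopword corpus and PorterStemmer, used here for illustration.
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "this"}

def naive_stem(word):
    """Strip a few common suffixes -- a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(tweet):
    tokens = re.findall(r"[a-z']+", tweet.lower())   # tokenize, drop punctuation
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Loving the new features of this brand!"))
# → ['lov', 'new', 'feature', 'brand']
```

The cleaned token lists would then feed a topic model such as LDA via a document-term matrix.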
NTT DATA, India Feb 2013 – Jan 2015
Software Engineer
Responsibilities
Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
Supervised data collection and reporting. Ensured relevant data is collected at designated stages, entered into appropriate database(s) and reported appropriately.
Extensively worked on data acquisition and data integration of the source data from DataStage to Talend
Performed data analysis on the list of tables in scope of the project and determined the source and target at table/column level for the data mapping using SQL Server Integration Services (SSIS)
Dealt with descriptive statistical analysis like customer profiling, content classifications and categorical data analysis from databases.
Utilized SQL testing and scripting in Databases to integrate data for data analysis, development of data visualization for presentation.
Solving Production Issues and Bug fixing.
Good understanding of OOP concepts in Core Java
Developed PL/SQL scripts to fix production issues
Worked on the coding of Servlets and EJB communication
Used Maven to fetch the latest jar files, including commons-collections.jar, commons-logging.jar, etc., from Apache
Analyzed code to fix UI issues such as browser incompatibility; sound knowledge of debugging JSP pages using the IE developer tool
Used IBM RSA as an IDE for application development.
Developed code for obtaining bean references in the Spring IoC framework
Used defect tracker to track all the QA and Production issues
Handled production support of the application
Environment: Windows XP, Java 5.0, Eclipse 3.3, Tomcat 5.5, JSP, Servlets, Oracle PL/SQL, Jira, Caliber, WebLogic, JUnit, HTML, JDBC
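The extract-transform-load steps described in this role can be sketched with the standard-library sqlite3 module; the table names, columns, and sample rows below are hypothetical placeholders, not the actual project schema.

```python
# Sketch of an extract-transform-load (ETL) step using stdlib sqlite3.
# Table and column names are hypothetical placeholders, not the real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT, amount TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)",
                 [(1, " alice ", "10.5"), (2, "BOB", "3")])

# Extract source rows, transform (trim/normalize names, cast amounts to REAL),
# then load into the target table.
conn.execute("CREATE TABLE tgt (id INTEGER, name TEXT, amount REAL)")
rows = conn.execute("SELECT id, name, amount FROM src").fetchall()
cleaned = [(i, name.strip().title(), float(amount)) for i, name, amount in rows]
conn.executemany("INSERT INTO tgt VALUES (?, ?, ?)", cleaned)

print(conn.execute("SELECT name, amount FROM tgt ORDER BY id").fetchall())
# → [('Alice', 10.5), ('Bob', 3.0)]
```

In a production pipeline the same extract/transform/load split maps onto source queries, a transformation layer, and bulk loads into the warehouse target.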
Education:
Bachelor of Engineering [Electronics], VTU, 2012
Certifications:
Oracle Certified Professional, Java SE 6 Programmer [Mar 2013]
ML Masters Certification, INeuron.ai