Arshita Srinivasa Reddy
adnaqy@r.postjobfree.com
Data Scientist/Machine Learning
Summary
8+ years of experience in Data Analytics, Machine Learning (ML), Predictive Modeling, and Natural Language Processing (NLP), and in developing enterprise applications across the SDLC
Developed Python programs for the data pre-processing and validation required by ML models
Excellent knowledge of Python packages (pandas, NumPy, scikit-learn, TensorFlow, Matplotlib, Seaborn) for solving a variety of problems
Adept in applying Machine Learning techniques such as Supervised Learning (Classification and Regression) and Unsupervised Learning (Clustering) on different datasets
Familiarity with Machine Learning offerings and operationalization from cloud providers such as AWS, Azure, and GCP
Strong experience in developing applications using the Flask and Django frameworks
Basic knowledge of DevOps and Master Data Management
Knowledge in creating data mapping documents for Extracting, Transforming, and Loading (ETL) tasks.
Excellent knowledge of creating databases, tables, DDL/DML queries, triggers, views, user-defined data types, functions, cursors, and indexes using PSQL
Extensive experience with in-depth data analysis across different databases and structures. Strong knowledge of writing PSQL and MySQL queries, sub-queries, CTEs, and complex joins
Experience with Hive queries and tables that helped identify trends and patterns in historical data
Knowledge of Hadoop architecture, the HDFS framework, and its ecosystem, including Hadoop MapReduce, Hive, HBase, Sqoop, and Oozie
Knowledge of Java/Scala/Apache Spark
Performed statistical data analysis such as hypothesis testing, ANOVA, chi-square tests, regression, association, and correlation
Strong foundation in the probability, statistics, and mathematics underlying AI/ML
Strong computer science fundamentals such as algorithms, data structures, multithreading, object-oriented development, distributed applications, client-server architecture
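As a small illustration of the hypothesis-testing work listed above, the following sketch computes a Pearson chi-square statistic for a contingency table by hand; the observed counts are hypothetical values chosen only for the example.

```python
# Pearson chi-square test of independence for a 2x2 contingency table.
# The observed counts below are hypothetical, chosen only for illustration.

def chi_square_statistic(table):
    """Compute the Pearson chi-square statistic for a 2D contingency table."""
    rows = [sum(r) for r in table]                 # row totals
    cols = [sum(c) for c in zip(*table)]           # column totals
    total = sum(rows)                              # grand total
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total   # expected count under independence
            stat += (observed - expected) ** 2 / expected
    return stat

observed = [[30, 10],   # hypothetical counts, e.g. outcome by group
            [20, 40]]
print(round(chi_square_statistic(observed), 3))  # → 16.667
```

The statistic would then be compared against a chi-square critical value (here with 1 degree of freedom) to accept or reject independence.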
Technical Skills
Libraries
Pandas, Numpy, Scikit learn, Matplotlib, nltk, Plotly, Seaborn
Machine Learning Algorithm
Linear & Logistic Regression, SVM, Decision Tree, Random Forest, KNN, Naïve Bayes, K-Means Clustering, XGBoost
Framework
Flask, Spring MVC, Django
Tools
PyCharm, Jupyter Notebook, Anaconda, Eclipse
Database
SQLite, MongoDB
Cloud
Heroku, AWS (EC2, S3, SageMaker)
Others
HTML, CSS, Statistics (hypothesis testing), AutoML, Postman, REST APIs, Chatbots (Google Dialogflow), NLP, Jira, Docker, Kubernetes
J2EE (JDBC, EJB), XML, JSP, Servlets, JavaScript
Professional Experience
State Auto Insurance, Columbus OH June 2019 – Present
Data Scientist/Machine Learning
Responsibilities
Developing an AutoML application; built a classification model to determine whether a customer is filing a fraudulent insurance claim
Involved in the entire data science project life cycle and actively involved in all the phases including data extraction, data cleaning, statistical modeling and data visualization with large data sets of structured and unstructured data.
Design and implement Machine learning models and data ingestion pipelines.
Experience with continuous integration/continuous delivery (CI/CD) tools for data science
Practiced in exploratory data analysis (EDA) and manipulating large data sets
Collected data from different sources including MS SQL Server and flat files (Excel, CSV, TXT)
Good understanding of MLOps
Wrote Python scripts to scrape web data for data collection using Beautiful Soup and Scrapy
Used a GitHub repository to clone the code, commit changes, and push from the feature branch to the develop branch
Performed model validation and model tuning with model selection, k-fold cross-validation, a hold-out scheme, and hyperparameter tuning via grid search to find the optimal hyperparameters for the model
Performed data preprocessing, including resolving data quality issues and data transformation
Adhere to agile project management frameworks
Used pandas, NumPy, Seaborn, Matplotlib, scikit-learn, SciPy, NLTK, TensorFlow, Keras, and PyTorch in Python for developing various machine learning algorithms
Dockerized models for deployment
Created machine learning pipelines using big data technologies such as PySpark
Experience in data extraction, ingestion and processing of large data sets
Performed data analysis using SQL to retrieve data from an Oracle database
Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries
Deployed end-to-end machine learning applications on AWS EC2 instances
Environment: Power BI, AWS, JavaScript, HDFS, CSS, Python, Hive, Machine Learning Algorithms, NLP, Spark, A/B testing, HTML, PyCharm, Flask, API, SQL, MongoDB, GitHub
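The k-fold cross-validation and grid-search tuning described in this role can be sketched as follows; the synthetic dataset, the Random Forest parameter grid, and the F1 scoring choice are illustrative assumptions, not details from the actual claims project.

```python
# Sketch of hyperparameter tuning with 5-fold cross-validation and grid search.
# The synthetic data and parameter grid are illustrative stand-ins, not the
# actual insurance-claims data or production search space.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced binary target standing in for "fraudulent" vs. "legitimate".
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1],
                           random_state=42)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="f1")  # 5-fold CV per combination
search.fit(X, y)
print(search.best_params_)   # combination with the best cross-validated F1 score
```

The same pattern extends to a hold-out scheme by fitting `search` on a training split only and scoring `search.best_estimator_` on the held-out data.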
AT&T, Atlanta GA Jan 2017 – June 2019
Data Scientist
Responsibilities
Built an ML classification model to predict faulty wafers
Performed data validation and data insertion into a MongoDB database, and processed batch files coming from the client as per requirements
Divided data into subsets for better model training using the K-Means clustering algorithm
Worked on developing production ready models using Random forest, K means clustering
Used pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python for developing various machine learning algorithms, and utilized algorithms such as XGBoost and Random Forest
Created machine learning pipelines using big data technologies such as PySpark
Familiarity with NLP libraries (spaCy, NLTK) and using them to operationalize and enhance the chatbot user experience
Used the SMOTE oversampling technique (Python imbalanced-learn library) to up-sample low-occurrence target values so they are learned better by the model
Developed train and test sets with scikit-learn's train_test_split, using stratification to keep the same proportion of high- and low-occurrence target values in each split
Used k-fold cross-validation (scikit-learn) on the training data to minimize variance and predict better results on the test data
Explored grid search cross-validation (scikit-learn) to tune hyperparameters and find the best parameters. Tested various performance metrics such as accuracy, F1-score, precision, and recall, and used recall to measure performance
Used the Google Colab environment for running Python code and Jupyter notebooks
Environment: Python (pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Pysql), Machine Learning, RStudio, Jupyter Notebooks, Google Colab, Excel, Django
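The imbalance-handling workflow described in this role, stratified splitting, minority oversampling, and recall as the headline metric, can be sketched as below. Plain random oversampling with scikit-learn's `resample` is used here as a dependency-light stand-in for SMOTE, and the synthetic data is illustrative only.

```python
# Sketch of an imbalanced-classification workflow: stratified train/test split,
# oversampling of the minority class (random oversampling here as a stand-in
# for SMOTE, to keep the sketch dependency-light), and recall as the metric.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps the minority-class proportion equal in train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Up-sample the minority class in the training set only (never the test set).
minority = X_tr[y_tr == 1]
extra = resample(minority, n_samples=len(X_tr[y_tr == 0]) - len(minority),
                 random_state=0)                # sampled with replacement
X_bal = np.vstack([X_tr, extra])
y_bal = np.concatenate([y_tr, np.ones(len(extra), dtype=int)])

model = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print(round(recall_score(y_te, model.predict(X_te)), 3))  # minority-class recall
```

Oversampling after the split avoids leaking duplicated minority rows into the held-out evaluation data.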
Aprimo, Chicago IL March 2015 – Dec 2016
Data Scientist
Responsibilities
Tracked the sales of CPG products across multiple categories to identify key drivers of revenue generated on a weekly basis
Worked on end-to-end data pipeline solution: acquisition, extraction, transformation, loading and visualization of data.
Built machine learning (ML) and artificial intelligence (AI) applications and deployed them across large data sets
Worked on an NLP project analyzing tweets, reviews, and feedback about the brand; leveraged brand reputation by understanding customers' emotions, requirements, and needs in Python. Applied topic modeling and data extraction; data pre-processing included stemming, stop-word removal, and handling of outliers and missing values
Used AWS S3 for data storage
Extracted terabytes of structured and unstructured data using SQL queries and performed data mining tasks including handling missing data, data wrangling, feature scaling, and outlier analysis in Python with pandas
Utilized data visualization tools such as Tableau and Python’s vast data visualization libraries to communicate findings to the data science, marketing and engineering teams.
Defining requirements, coding, and delivering new functionality to be rolled out to clients.
Defining requirements for and/or coding new internal analytical capabilities (e.g., Balance and Profit and Loss monitoring)
Building relationships with clients, the global Prime team, internal business management, and support partners.
Solving issues for clients and ensuring the client experience is balanced with risk and other financial attribute allocation.
Building new statistical models and Machine Learning algorithms that drive Marketing Customer Experience, Credit Valuations, and Operations programs
Leading data science projects from concept to deployment for delivering substantial business value through the application of new-age AI-driven methodologies.
Utilizing Machine Learning algorithms (k-Means, Random Forest, Gradient Boosting, etc.) and programming languages such as Python
Environment: Python, SciPy, Machine Learning, pandas, scikit-learn, Matplotlib, k-Means, Random Forest, Gradient Boosting, SQL, Tableau
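The tweet pre-processing pipeline described in this role (lower-casing, stop-word removal, stemming) can be sketched as follows; the tiny stop-word list and the naive suffix stemmer are illustrative stand-ins for NLTK's stopword corpus and PorterStemmer.

```python
# Sketch of NLP text pre-processing for tweets: tokenize, drop stop words, stem.
# The stop-word list and suffix stemmer below are deliberately tiny stand-ins
# for NLTK's stopword corpus and PorterStemmer, used here for illustration.
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "this"}

def naive_stem(word):
    """Strip a few common suffixes -- a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(tweet):
    tokens = re.findall(r"[a-z']+", tweet.lower())   # tokenize, drop punctuation
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Loving the new features of this brand!"))
# → ['lov', 'new', 'feature', 'brand']
```

The cleaned token lists would then feed a topic model such as LDA via a document-term matrix.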
NTT DATA, India Feb 2013 – Jan 2015
Software Engineer
Responsibilities
Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
Supervised data collection and reporting. Ensured relevant data is collected at designated stages, entered into appropriate database(s) and reported appropriately.
Extensively worked on data acquisition and data integration of the source data from DataStage to Talend
Performed data analysis on the list of tables in scope of the project and determined the source and target at table/column level for the data mapping using SQL Server Integration Services (SSIS)
Dealt with descriptive statistical analysis like customer profiling, content classifications and categorical data analysis from databases.
Utilized SQL testing and scripting in Databases to integrate data for data analysis, development of data visualization for presentation.
Solving Production Issues and Bug fixing.
Good understanding of OOP concepts in Core Java
Developed PL/SQL scripts to fix production issues
Worked on the coding of Servlets and EJB communication
Used Maven to fetch the latest jar files, including commons-collections.jar, commons-logging.jar, etc., from Apache
Analyzed code to fix UI issues such as browser incompatibility; sound knowledge of debugging JSP pages using the IE developer tool
Used IBM RSA as an IDE for application development.
Developed code for obtaining bean references in the Spring IoC framework
Used defect tracker to track all the QA and Production issues
Handled production support of the application
Environment: Windows XP, Java 5.0, Eclipse 3.3, Tomcat 5.5, JSP, Servlets, Oracle PL/SQL, Jira, Caliber, WebLogic, JUnit, HTML, JDBC
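The extract-transform-load steps described in this role can be sketched with the standard-library sqlite3 module; the table names, columns, and sample rows below are hypothetical placeholders, not the actual project schema.

```python
# Sketch of an extract-transform-load (ETL) step using stdlib sqlite3.
# Table and column names are hypothetical placeholders, not the real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT, amount TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)",
                 [(1, " alice ", "10.5"), (2, "BOB", "3")])

# Extract source rows, transform (trim/normalize names, cast amounts to REAL),
# then load into the target table.
conn.execute("CREATE TABLE tgt (id INTEGER, name TEXT, amount REAL)")
rows = conn.execute("SELECT id, name, amount FROM src").fetchall()
cleaned = [(i, name.strip().title(), float(amount)) for i, name, amount in rows]
conn.executemany("INSERT INTO tgt VALUES (?, ?, ?)", cleaned)

print(conn.execute("SELECT name, amount FROM tgt ORDER BY id").fetchall())
# → [('Alice', 10.5), ('Bob', 3.0)]
```

In a production pipeline the same extract/transform/load split maps onto source queries, a transformation layer, and bulk loads into the warehouse target.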
Education:
Bachelor of Engineering [Electronics], VTU, 2012
Certifications:
Oracle Certified Professional, Java SE 6 Programmer [Mar 2013]
ML Masters Certification, INeuron.ai