
Divya Madamanchi

Data Scientist

Chicago, IL

+1-847-***-****

adp9fe@r.postjobfree.com

Career Summary:

Certified Data Scientist with a total of 9+ years of IT experience and close to 4 years of relevant Data Science experience. Skilled in Machine Learning, Deep Learning, Natural Language Processing, and applying Statistics and Probability to decision modeling, with strong intuition for algorithms.

Professional Summary:

• Proficient in Statistical Methodologies including Confidence Intervals, Hypothesis Testing, and Statistical Tests, and in Statistical Modeling including Linear and Logistic Regression, Naïve Bayes, L1 and L2 Regularization, and Dimensionality Reduction using Principal Component Analysis.

• Proficient in Machine Learning algorithms and Predictive Modeling including Regression Models, Decision Trees, Random Forests, Naïve Bayes Classifier, SVM, KNN, Ensemble Models.

• Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.

• Engineering skills in Clustering techniques, Artificial Neural Networks (ANNs), and Deep Learning models including Convolutional Neural Networks, Recurrent Neural Networks, LSTM, and BERT.

• Expertise in Text Mining methods and Natural Language Processing, including TF-IDF, Sentiment Analysis, Topic Modeling, and Document Summarization.

• Experimental design experience using packages like Keras, PyTorch, scikit-learn, Matplotlib, and Seaborn.

• Proficiency in Programming Languages: Python, R, SQL, Scala.

• Experience in IDE tools (PyCharm, Spyder) and end-to-end machine learning platforms like Dataiku DSS.

• Knowledge and experience in agile environments such as Scrum, and with version control on GitHub.

• Experience with IBM DataStage (versions 8.5 and 9.1) to perform ETL operations on data.

• Experience in using Transformer, Aggregator, Look-up, Filter, Join Stage, Sequential file stage, Funnel Stage and Oracle stages in DataStage.

• Experience in working with Server jobs, Parallel jobs, Troubleshooting and Performance tuning.

• Experience in Robotic Process Automation using Blue Prism.

• Good understanding of business practices in Insurance, Banking & Finance domains.

Education:

Bachelor of Technology in Computer Science Engineering, Nagarjuna University, India

Post-Graduation Program in Data Science, INSOFE, India

Work Experience:

L3 Data Scientist, Soothsayer Analytics (July 2020 – Jan 2022)

Duplicate Parts Detection in Inventory:

• Used Dataiku platform for building a workflow for “Duplicate Parts Detection”.

• Used Dataiku’s visual recipes to perform various data manipulation operations like filtering, aggregation, and merging.

• Read inventory data from HDFS and performed data cleaning and data enrichment activities.

• Used the fuzzywuzzy library to compute string similarity scores for the product descriptions of material pairs (see the sketch below).

• Developed PySpark scripts which can identify duplicate parts at class and segment levels.

• Used Cloudera Data Science Workbench (CDSW) to execute the PySpark jobs for all the segments and write the duplicates information to HDFS.

• Used a Dataiku workflow to read the duplicates information from HDFS and write the material-pair results to an MSSQL table for UI and dashboarding purposes.
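
A minimal sketch of the description-similarity step, assuming pairwise comparison with fuzzywuzzy’s token_set_ratio; the sample data, column names, and 90-point threshold are illustrative assumptions, not the production logic:

    # Sketch: score description similarity for candidate material pairs with fuzzywuzzy.
    # Sample data, column names, and the 90-point cut-off are illustrative assumptions.
    from itertools import combinations

    import pandas as pd
    from fuzzywuzzy import fuzz

    parts = pd.DataFrame({
        "material_id": ["M001", "M002", "M003"],
        "description": ["HEX BOLT M8 X 40 ZINC", "BOLT HEX M8X40 ZN", "GASKET RUBBER 50MM"],
    })

    pairs = []
    for (_, a), (_, b) in combinations(parts.iterrows(), 2):
        score = fuzz.token_set_ratio(a["description"], b["description"])  # 0-100 similarity
        pairs.append({"material_a": a["material_id"], "material_b": b["material_id"], "score": score})

    pairs_df = pd.DataFrame(pairs)
    duplicates = pairs_df[pairs_df["score"] >= 90]  # pairs flagged as likely duplicate parts
    print(duplicates)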

Customer Lifetime Value Prediction:

• Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.

• Designed rich data visualizations to present data in human-readable form using Tableau and Python modules (Seaborn, Matplotlib).

• Performed data pre-processing to handle noise and other data discrepancies.

• Performed data imputation using Scikit-learn package in Python.

• Used Python 3.X (NumPy, Pandas, SciPy, scikit-learn, seaborn) to develop variety of models and algorithms for analytic purposes.

• Developed and implemented predictive models using statistical and machine learning algorithms such as Linear Regression, Decision Trees, Random Forests, and XGBoost (see the sketch below).

• Applied customer segmentation with K-Means Clustering.

• Used MAE and RMSE to evaluate and compare model performance.

• Performed model tuning by adjusting hyperparameters with K-Fold cross-validation, which improved model performance.

• Deployed the predictive model on AWS EC2.
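
A minimal sketch of the modeling, tuning, and evaluation flow above, using scikit-learn; the synthetic data, Random Forest parameter grid, 5-fold setup, and 4-segment K-Means are illustrative assumptions:

    # Sketch: tune a regressor for lifetime value with K-Fold CV, score with MAE/RMSE,
    # and segment customers with K-Means. Data and settings are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Stand-in for customer features (recency, frequency, monetary value, ...) and lifetime value.
    X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    param_grid = {"n_estimators": [200, 500], "max_depth": [None, 10]}
    search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                          cv=5, scoring="neg_mean_absolute_error")  # 5-fold cross-validation
    search.fit(X_train, y_train)

    preds = search.best_estimator_.predict(X_test)
    mae = mean_absolute_error(y_test, preds)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")

    segments = KMeans(n_clusters=4, random_state=42).fit_predict(X)  # customer segmentation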

Data Scientist, Tata Consultancy Services (May 2017 – Nov 2018)

Gateway Fraud Detection:

• Tackled highly imbalanced Fraud dataset using oversampling with SMOTE.

• Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.

• Utilized techniques like histograms, bar plots, pie charts, scatter plots, and box plots to determine the condition of the data.

• Performed Feature Extraction from the existing datetime columns.

• Performed data preprocessing steps like binning and categorical encoding.

• Built classification models for fraud detection using Logistic Regression, KNN, Random Forest, SVM, Gradient Boosting.

• Implemented autoencoders in Keras on non-fraud data to learn the features of non-fraud transactions, and classified fraud and non-fraud records based on their reconstruction error (see the sketch below).

• Used F-Score, AUC/ROC, and Confusion Matrix to evaluate the performance of each model.
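
A minimal sketch of the autoencoder approach, trained only on non-fraud records and flagging transactions whose reconstruction error exceeds a threshold; the synthetic data, layer sizes, and 95th-percentile cut-off are illustrative assumptions:

    # Sketch: Keras autoencoder trained on non-fraud transactions; records with high
    # reconstruction error are flagged as fraud. Data, layer sizes, and the
    # 95th-percentile threshold are illustrative assumptions.
    import numpy as np
    from tensorflow.keras import layers, models

    rng = np.random.default_rng(42)
    X_train_nonfraud = rng.normal(size=(1000, 10)).astype("float32")  # stand-in for scaled non-fraud features
    X_test = rng.normal(size=(200, 10)).astype("float32")             # stand-in for scaled test features

    n_features = X_train_nonfraud.shape[1]
    autoencoder = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(16, activation="relu"),
        layers.Dense(8, activation="relu"),        # bottleneck
        layers.Dense(16, activation="relu"),
        layers.Dense(n_features, activation="linear"),
    ])
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X_train_nonfraud, X_train_nonfraud, epochs=20, batch_size=256,
                    validation_split=0.1, verbose=0)

    # Reconstruction error per record; higher error means less like normal transactions.
    reconstructed = autoencoder.predict(X_test, verbose=0)
    errors = np.mean((X_test - reconstructed) ** 2, axis=1)
    threshold = np.percentile(errors, 95)          # illustrative cut-off
    predicted_fraud = errors > threshold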

RPA Blue Prism Developer, Tata Consultancy Services (Sep 2015 – Apr 2017)

• Provided end-to-end solutions for business processes using Robotic Process Automation with Blue Prism.

• Worked with Business Analysts in identifying and defining requirements.

• Worked closely with Process SMEs/Business users to translate processes into the Process Definition Document (PDD) and Solution Design Document (SDD).

• Developed the object and project workflow as per the Functional requirements.

• Involved in managing bot workloads using Blue Prism Work Queues.

• Used the Blue Prism MAPIEx add-on library for manipulating mailboxes.

• Troubleshot the Blue Prism environment through Control Room queue management.

• Analyzed the feasibility of existing processes for automation.

• Created Object Design Instruction for clear understanding of the activities performed during development.

• Provided exception handling for every possible scenario for robust, error-free development.

DataStage Developer, Tata Consultancy Services (Sep 2011 – Aug 2015)

• Involved in business requirements gathering and preparing architecture design documents.

• Extensively used DataStage Designer to develop various Parallel jobs to extract, cleanse, transform, integrate, and load data into EDW and then to Data Mart.

• Extensively worked on Parallel Jobs using various stages like Sequential File, CFF, Data Set, Lookup, Join, Aggregator, Remove Duplicates, Modify, Filter, Funnel, Copy, Surrogate Key Generator, Sort, Stored Procedure, Transformer, Change Capture, Oracle Connector/Oracle Enterprise, and ODBC.

• Used Parameter Sets, Environment Variables, Stage Variables and Routines for developing Parameter Driven Jobs.

• Implemented logic for Slowly Changing Dimensions.

• Created Test Plans and Test Scripts for unit testing and integration testing.

Voluntary Projects in Data Science:

Hate Speech Detection in Twitter Tweets:

• Performed data cleaning and preprocessed text data.

• Extracted features using TfidfVectorizer and built Logistic Regression, SVM, and Naïve Bayes classifiers (see the sketch below).

• Used pre-trained GloVe embeddings to extract features and trained a bidirectional LSTM to classify tweets.

• Used a pre-trained BERT base-cased model to obtain contextual word embeddings and fine-tuned BERT.
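
A minimal sketch of the TF-IDF baseline for this project, using scikit-learn; the tiny example tweets, labels, and vectorizer settings are illustrative assumptions:

    # Sketch: TF-IDF features plus Logistic Regression as a hate-speech baseline.
    # The tiny example corpus, labels, and vectorizer settings are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    tweets = ["what a lovely day", "you people are all terrible", "great match last night", "i hate everyone like you"]
    labels = [0, 1, 0, 1]  # 1 = hateful/offensive, 0 = neither (illustrative)

    X_train, X_test, y_train, y_test = train_test_split(tweets, labels, test_size=0.5,
                                                        random_state=42, stratify=labels)

    vectorizer = TfidfVectorizer(ngram_range=(1, 2))     # word unigrams and bigrams
    X_train_tfidf = vectorizer.fit_transform(X_train)
    X_test_tfidf = vectorizer.transform(X_test)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_tfidf, y_train)
    print(classification_report(y_test, clf.predict(X_test_tfidf)))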

Traffic Signs Image Classification:

• Used Transfer Learning to classify traffic sign images.
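
A minimal sketch of one possible transfer-learning setup, assuming a torchvision ResNet-18 backbone (torchvision 0.13+) with its head replaced for the traffic-sign classes; the class count and frozen-backbone choice are illustrative assumptions:

    # Sketch: transfer learning for traffic-sign classification with a pretrained ResNet-18.
    # The 43-class count (GTSRB-style) and freezing the backbone are illustrative assumptions.
    import torch
    import torch.nn as nn
    from torchvision import models

    num_classes = 43  # e.g. the GTSRB benchmark has 43 traffic-sign classes
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    for param in model.parameters():   # freeze the pretrained backbone
        param.requires_grad = False

    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable classification head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # only the head is fine-tuned
    criterion = nn.CrossEntropyLoss()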

Sentiment Classification with BERT:

• Preprocessed text data for BERT (tokenization, attention masks, and padding) and built a PyTorch dataset.

• Used Transfer Learning to build a Sentiment Classifier using the Transformers library by Hugging Face.
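
A minimal sketch of the BERT preprocessing and classifier setup, using the Hugging Face Transformers library; the checkpoint name, max length, and two-label setup are illustrative assumptions (the classification head still needs fine-tuning on labeled sentiment data):

    # Sketch: tokenize text for BERT (input ids, attention masks, padding) and run a
    # sequence-classification head for sentiment. Checkpoint, max_length, and the
    # two-label setup are illustrative assumptions; the head is untrained until fine-tuned.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = "bert-base-cased"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    texts = ["the movie was wonderful", "what a waste of two hours"]
    encoded = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    # `encoded` holds input_ids and attention_mask tensors ready for the model.

    with torch.no_grad():
        logits = model(**encoded).logits
    predictions = logits.argmax(dim=-1)  # predicted sentiment class per text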


