Data Sales

Location:

St. Charles, MO

Salary:

125

Posted:

March 21, 2019

Resume:

VENKATA KANDALA

+1-210-***-**** **********@*****.***

Professional Summary:

Well-rounded Data Scientist with around 6 years of experience in Statistical Modeling, Machine Learning, Data Mining, Database Systems and Data Visualization.

Significant industry experience and domain knowledge in Healthcare, Retail, Telecom, and Banking industries.

Expertise in Python (2.x/3.x) programming with multiple packages including NumPy, Pandas, SciPy and Scikit-learn.

Strong business judgment and ability to take ambiguous problems and solve them in a structured, hypothesis-driven, and data-supported way.

Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks and Principal Component Analysis, with good knowledge of Recommender Systems.

Experienced in Machine Learning Classification Algorithms like Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, and Decision Tree.

Experience in tuning algorithms using methods such as Grid Search, Randomized Search, K-Fold Cross Validation and Error Analysis.

Performed outlier analysis using methods such as Z-score analysis, Linear Regression, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and Isolation Forest.

Also worked with several boosting methods such as AdaBoost, Gradient Boosting and XGBoost.

Implemented various statistical tests like ANOVA, A/B testing, Z-Test, T-Test for various business cases.

Implemented and analyzed RNN-based approaches for automatically predicting implicit relations in text. The discourse relation has potential applications in NLP tasks such as Text Parsing, Text Analytics, Text Summarization and Conversational Systems.

Worked with various text analytics and word embedding tools such as Word2Vec, CountVectorizer, GloVe and LDA.

Solid knowledge of and experience in Deep Learning techniques including Feedforward Neural Networks and ANNs using Theano and Keras, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).

Worked with numerous data visualization tools in Python such as matplotlib, seaborn, ggplot and pygal.

Worked and extracted data from various database sources like Oracle, SQL Server, DB2, MongoDB and Teradata.

Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.

Willing to relocate: Anywhere

PROFESSIONAL EXPERIENCE:

Centene Corporation, Washington DC 2018 April - Present

Role: Data Scientist

Description: For more than 30 years, Centene has operated government-sponsored healthcare programs that save money and improve medical outcomes. Centene delivers results for its stakeholders, including state governments, members, healthcare providers, individuals and families, and other healthcare and commercial organizations. The job involves creating statistical machine learning models for customer churn, ticket routing, invoice premium prediction and claim classification.

Responsibilities:

Collaborated with other departments to collect and understand client business requirements.

Collaborated with Data Engineers to gather business requirements and filter the data according to project requirements.

Imported and cleansed high-volume data from various sources such as Teradata, Oracle, flat files and SQL Server 2005.

Aggregated data from multiple sources and performed resampling to handle imbalanced data.

Treated missing values and outliers using techniques such as boxplots, Z-score and DBSCAN.

Explored and visualized the data to check patterns, distributions, descriptive statistics and correlations using Python, Matplotlib and Seaborn.

Applied dimensionality reduction techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) to the feature matrix.

Worked on various customer analytics tasks such as customer segmentation, product recommendation and NLP.

Worked with clustering algorithms such as K-Means, K-Means++, DBSCAN and Agglomerative Hierarchical Clustering to target specific groups of customers and generate profitable revenue.

Used several Classification algorithms including Decision Trees, Naïve Bayes, SVM, KNN.

Produced ROC curves with graphical packages to visualize the true positive rate against the false positive rate, and likewise produced precision-recall curves to visualize the area under the curve.

Used market basket analysis and association rules analysis to identify patterns and data quality issues and to leverage insights.

Improved the model's accuracy by using gradient boosting techniques such as LightGBM, and achieved around 82% accuracy with Random Forest and 77% with Logistic Regression.

Used K-fold cross-validation to improve model performance and worked with hyperparameter tuning methods such as Grid Search.

Worked with visualization tools like Tableau, Cognos and MicroStrategy to create business reports for higher management and used Python visualization libraries like Seaborn, Matplotlib and ggplot depending on business requirements.

Staples, MA 2017 Jan – 2018 April

Role: Data Scientist

Description: Staples helps the world work better with work solutions that deliver industry-leading products, services and expertise across office supplies, facilities, breakroom, furniture, technology, promotional products, and print & marketing services. We served small and medium-sized business customers through customer lifetime value (CLV) modeling, customer segmentation, product recommendation, time series forecasting and market mix modelling.

Responsibilities:

Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to extract data that fit the analytical requirements.

Performed data cleaning, feature scaling and feature selection using the Pandas, NumPy and scikit-learn packages in Python.

Used Matplotlib and Seaborn libraries in Python to visualize the data for detecting outliers, missing values and interpreting variables.

Reduced dimensionality of the dataset using Principal Component Analysis (PCA) and feature importance ranked by tree-based classifiers.

Designed fuzzy logic rules and implemented statistical tests including hypothesis testing, ANOVA and the Chi-square test to verify the models' significance.

Used Pareto/NBD to find the purchase counts with customer lifetime and also used Gamma-Gamma as an extension to Pareto/NBD to find the Monetary Value.

Worked with forecasting models such as ARIMAX, VARMAX, SARIMAX and Holt-Winters exponential smoothing for various tasks assigned by higher management.

Predicted the likelihood of customer attrition by developing classification models based on customer attributes like customer size, revenue, type of industry, competitor products and growth rates etc.

The models deployed in the production environment helped detect churn in advance and helped the sales and marketing teams plan retention strategies such as price discounts and custom licensing plans.

They also helped the business establish appropriate marketing strategies based on customer value.

Developed Tableau data visualizations using crosstabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts and density charts.

Worked closely with Hadoop ecosystem components such as Hadoop, MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig and Flume, including their installation and configuration.

Union Bank of India, Hyderabad 2015 May – 2016 Dec

Role: Data Scientist

Description: Union Bank has been playing a very proactive role in the economic growth of India, and it extends credit for the requirements of different sectors of the economy: industries, exports, trading, agriculture, infrastructure and the individual segments. We worked on various projects covering customer analytics, credit risk analysis and the assessment of loan-related risks, such as identifying and preventing fraudulent loans and fraudulent transactions.

Responsibilities:

Compiled data from various sources, including public and private databases, to perform complex analysis and data manipulation for actionable results.

Applied concepts of probability, distributions and statistical inference to the given dataset to unearth interesting findings using comparisons, T-tests, F-tests, R-squared, P-values etc.

Applied linear regression, multiple regression, the ordinary least squares method, mean-variance, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Naive Bayes, function fitting etc. to data with the help of the Scikit-learn, SciPy, NumPy and Pandas modules of Python.

Applied Principal Component Analysis (PCA) based unsupervised technique to determine unusual VPN log-on time.

Also created classification models using Logistic Regression and Random Forests to classify the dependent variable into two classes, risky and okay.

Used F-score, precision and recall to evaluate model performance.

Built user behavior models to find activity patterns and evaluate risk scores for every transaction, using historical data to train supervised learning models such as Decision Trees, Random Forests and SVM.

Performed real-time analysis of customers' financial profiles and recommended the best-suited financial products.

Collected historical data and third-party data from different data sources and performed data integration using Alteryx.

Forecasted demand for loans and interest rates using time series models such as ARIMAX, VARMAX and Holt-Winters.

Obtained better predictive performance of 81% accuracy using ensemble methods such as bootstrap aggregation (Bagging) and Boosting (LightGBM, Gradient Boosting).

Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.

Developed visualizations and dashboards using ggplot and Tableau.

Prepared and presented data quality reports to stakeholders to give them an understanding of the data.

Manthan Software Services Pvt. Ltd, Bangalore 2014 Jan – 2015 May

Role: Machine Learning Engineer

Description: Manthan’s mission is to make the most sophisticated analytics products intuitive for its clients. Manthan does that by integrating decision sciences, advanced math and artificial intelligence into the way businesses run. I worked on various in-house projects covering customer analytics, NLP tasks, OCR models etc.

Responsibilities:

Developed various machine learning applications using Python for several clients.

Gathered data from different sources, designed SSIS packages to extract, transform and load (ETL) data across those sources, and validated the data from the database.

Cleaned data and ensured data quality, consistency and integrity using Python (NumPy, Pandas).

Participated in feature engineering: generating feature intersections to add potentially powerful features, plotting the feature correlation matrix for feature selection and reduction, normalizing features to ease the implementation of machine learning algorithms, applying Principal Component Analysis (PCA) for dimensionality reduction, and label encoding with Scikit-learn preprocessing.

Worked with several use cases like campaign sales analysis, forecasting sales, KPI analysis and NLP models.

Worked with clustering algorithms to target specific groups of customers and generate profitable revenue.

For one client, we implemented association rules analysis and market basket analysis to identify patterns and leverage insights.

Worked with OpenCV, pytesseract and other image processing techniques to extract and analyze text from scanned documents.

Performed image classification and object detection by using Convolutional Neural Networks (CNN).

Used Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks with TensorFlow to build text classification models.

Worked with word embedding techniques such as Word2Vec and GloVe for sentiment analysis and text classification.

Worked with text-to-vector representation methods including CountVectorizer and TF-IDF, and with Latent Dirichlet Allocation (LDA) for topic modelling.

Sutherland Global Services, Hyderabad 2013 April – 2014 Jan

Role: ETL Developer

Description: Sutherland builds processes for the digital age by combining the speed and insight of design thinking with the scale and accuracy of data analytics. Sutherland has customers across industries from financial services to healthcare. My role was to assist the Analytics department with data extraction and cleaning as data preprocessing steps for building models.

Responsibilities:

Worked with the Business Analysts team on gathering requirements, preparing functional specifications and translating them into technical specifications.

Used data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.

Managed full SDLC processes involving requirements management, workflow analysis, source data analysis, data mapping, metadata management, data quality, testing strategy and maintenance of the model.

Performed extensive data validation by writing several complex SQL queries, took part in back-end testing and worked on data quality issues.

Designed SSIS packages to extract, transform and load existing data into SQL Server, using many SSIS components such as Pivot Transformation, Fuzzy Lookup, Merge, Merge Join, Data Conversion, Row Count, Sort, Derived Columns, Conditional Split, Execute SQL Task, Data Flow Task and Execute Package Task.

Created SSIS packages that dealt with different source formats (flat files, Excel, XML, OLE DB) and different destination formats.

Debugged and troubleshot the ETL packages by using breakpoints, analyzing the process and catching error information with SQL commands in SSIS.

Developed SQL queries in SQL Server Management Studio and Toad, and generated complex reports for the end users.

Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MLOAD, BTEQ and FastLoad.

Experience with Perl.

Performed data analysis and data profiling using complex SQL on various source systems including Oracle and Teradata.

EDUCATION

Master of Science in Information Technology, Texas A&M Commerce, Texas

Bachelor's Degree in Statistics, Osmania University, Hyderabad, India
