Professional Summary and Skills
Over *+ years of experience as a Data Analyst in customer and market analytics, supporting data-driven decisions
Experience in building and automating dashboards to monitor and track KPIs using BI tools like Tableau and Power BI, and Python
Skilled at interpreting and communicating large, complex data sets through exploratory data analysis, descriptive statistics, data modeling, visualization, and presentations on customer behavior, web analytics, and market analytics
Proficient in data analytics and modeling using Python packages: NumPy, pandas, statsmodels, SciPy, scikit-learn
Experience in statistical modeling methods: Linear Regression, Discriminant Analysis, Feature Selection, PCA
Experience in building machine learning models: Linear Models, SVM, Clustering, KNN, K-means, Neural Networks
Good knowledge of database design using PL/SQL (stored procedures, functions, triggers) with MySQL and PostgreSQL
Experience in designing, developing, modeling, and managing relational databases using erwin and Snowflake
Proficient in automating data collection, transformation, and storage using ETL pipeline tools like Talend and Informatica
Experience in coding with Python, R, SPSS, SAS, VBA, PL/SQL, HTML
National Science Foundation Fresno, CA
Data Analyst 01/2018 – Present
Integrated data from 25 projects into a MySQL data warehouse of over 1 million rows.
Built a logistic regression model to predict operational cost overruns, using t-tests, ANOVA, and decision tree techniques for feature selection. Identified 6 of 50 variables that affect operational cost overruns.
Automated SQL-based data collection, preprocessing, and training of the logistic regression model, and built Tableau dashboards to track cost overrun KPIs. Reduced operational cost overruns by 11%.
Toodledo Inc San Ramon, CA
Data Analyst 05/2017 – 02/2018
Responsible for data mining, analysis and modeling using Python, SQL, Tableau, Talend, Excel, Teradata
Cut processing time for data collection, manipulation, and reporting from hours to minutes using Python scripts, Talend for the ETL pipeline, and Tableau for visualization.
Combined social and ad-platform data to draw insights on user groups, improving targeted marketing by 34%
Conducted A/B tests end to end: defined test scope, chose success metrics, ensured tracking, and analyzed results
Web Ninjaz Technology Delhi, India
Data Analyst 05/2015 – 08/2016
Responsible for data mining, analysis and generating reports using SQL, Talend, Tableau, Hadoop, HBase
Automated data aggregation and delivery to various marketing channels using Talend, SQL, and Python, saving 14 hours weekly.
Conducted hypothesis testing and A/B testing using web analytics tools like Google Analytics and Facebook Analytics
Performed ad hoc tasks to generate reports and forecasts on web traffic, revenue, and sales for the marketing and sales teams.
Collaborated with the data engineering team to import and export data between RDBMS and HDFS using Hadoop, HBase, and Hive
California State University-Fresno Fresno, CA
Master of Science, Major: Computer Science 08/2016 – 12/2018
Jawaharlal Nehru Technological University Hyderabad, India
Bachelor of Technology, Major: Computer Science 08/2012 – 05/2016
Personal Projects (Portfolio: Click Here)
Customer Segmentation (Demo: Link )
Segmented e-commerce customers using MySQL to extract and integrate the data, and Python to preprocess, visualize, and train unsupervised models such as K-means and Gaussian mixture models.
The clusters simplified complex patterns in customer, purchase, and web behavior, informing strategies for customer retention, acquisition, spend, and loyalty.
Customer Lifetime Value (Demo: Link )
Developed a predictive model using XGBoost to estimate the lifetime value of each customer of an e-commerce business.
Clustered customers by lifetime value to investigate the variables affecting customer LTV, and used these insights to derive strategies to increase lifetime value and retention and to reduce churn.
Fraudulent Transaction Detection (Demo: Link )
Developed a logistic regression classifier on credit card transaction (time-series) data to detect fraudulent transactions
Applied oversampling, undersampling, and Gaussian noise injection to reduce class imbalance in the data. Tuned the model using regularization, reducing misclassifications on test data from 4% to 0.44%
Customer Churn Analysis (Demo: Link )
Analyzed customer attributes such as login patterns, time spent, likes, dislikes, purchases, and complaints. Built an XGBoost classifier to predict customer churn
Improved the model's F-score from 0.62 to 0.82 by applying oversampling and undersampling to reduce class imbalance, and by using correlation analysis and t-tests for feature selection
Price Forecasting (Demo: Link )
Implemented an ARIMA model to forecast oil prices. Ran the Augmented Dickey-Fuller test to assess stationarity in the data. Reduced non-stationarity using data transformations, and used decomposition to analyze the trend and residual components of the time series.
Tuned model parameters (p, d, q for ARIMA) using walk-forward validation and ACF and PACF analysis
Marketing Campaign Analysis (Demo: Link)
Devised a predictive model to improve targeted marketing strategy by analyzing past campaign data from a Portuguese bank's customers.
Used correlation analysis and t-tests for feature selection. Trained an AdaBoost model and tuned its parameters, reaching a mean AUC of 0.81. The model helped interpret the attributes, draft the strategy, and channel the campaign to grow the customer base.