Naresh Abburi
https://www.linkedin.com/in/nareshabburi/
Advanced Analytics Data Science Products Analytics Engineering AI Solutions
Summary
Principal Data Scientist with over 13 years of experience in advanced analytics, data science, machine learning, reporting, & analytics engineering.
Proven expertise in designing and deploying machine learning models and AI solutions to drive strategic value creation and revenue growth across diverse industries. Adept at converting complex business problems into scalable machine-learning solutions.
Experienced in applying both supervised and unsupervised machine learning techniques, including regression, classification, segmentation, and clustering. Proficient in implementing algorithms such as Linear and Logistic Regression, Decision Trees, Random Forests, K-means clustering, and DBSCAN.
Experienced in time series forecasting (ARIMA), uplift modeling, and natural language processing.
Proficient in utilizing evaluation and validation techniques to ensure model accuracy and reliability. Experienced in applying methods such as cross-validation, confusion matrix, ROC-AUC, precision-recall, and F1 score.
Skilled in performing hyperparameter tuning, model selection, and bias-variance tradeoff analysis to optimize model performance. Adept at using techniques like k-fold cross-validation and holdout validation to assess model generalization and prevent overfitting.
Proficient in designing and managing end-to-end data pipelines, ensuring reproducibility and scalability of machine learning experiments.
Proficient in a wide range of technologies including Python, PySpark, R, SQL, TensorFlow, and PyTorch.
Strong background in data architecture, ETL processes, developing automated data pipelines, and MLOps.
Skilled in data visualization with Tableau, Looker, and Qlik, and familiar with cloud tools such as AWS SageMaker.
Proficient in leveraging large language models (LLMs) such as GPT, XLNet, LLaMA, and BERT for diverse applications including text summarization, chatbot development, and natural language understanding.
Experienced in designing and analyzing experiments (A/B testing, multivariate testing) to drive data-driven decision-making. Skilled in applying causal inference techniques to understand the impact of interventions and inform strategy.
Experienced in implementing data governance frameworks and ensuring compliance with data privacy regulations like GDPR and CCPA. Skilled in data quality management, data lineage, and data cataloging to maintain high data integrity and security standards.
Tech Stack:
Model Building: Supervised and Unsupervised Learning, Ensemble Methods (XGBoost, Random Forest), Clustering Algorithms (K-Means, DBSCAN), Segmentation Techniques, Gradient Boosting
Programming and Scripting: SQL, Python for Machine Learning and Data Analysis, PyTorch for Deep Learning, Scikit-Learn, PyCaret for Automated Machine Learning, Transformers & LLMs, PySpark for Data Engineering
Data Visualization and BI Tools: Tableau, Looker, Qlik, & Power BI
Cloud Platforms: Google Cloud Platform (BigQuery, Dataflow, Vertex AI), Amazon Web Services (SageMaker, EC2, S3, Lambda), Microsoft Azure, Databricks
Data Processing and ETL: Apache Spark for Big Data Processing, Hadoop Ecosystem, ETL Pipeline Development
Statistical Analysis and Testing: A/B Testing, Hypothesis Testing, Statistical Significance Analysis
Continuous Learning: Docker, Kubernetes, Parallel Computing, LLMs, LLMOps, RAG, & Vector Databases
Data Leadership: Predictive Analytics, Data Science, Machine Learning, Artificial Intelligence, MLOps, Data Asset Building, Dashboards, Reports, Feature Engineering, & Storytelling
Management: Decision Making, Project Management, Program Management, Workflows, Scenario Planning, Resource Allocation, Leadership, Vision, & Strategy
Experience
CVS Health, New York City Oct 2022 - Present
Principal Data Scientist, Advanced Analytics
Uplift Modeling for Marketing:
Developed and trained an uplift model to target the most efficient customer segments through direct mail & digital campaigns.
Utilized A/B testing and causal inference methods to validate the effectiveness of the uplift model.
Achieved a 30% increase in marketing campaign response rates by leveraging the uplift model through direct mail campaigns.
Forecasting Model & Automation:
Developed 4,000 lines of code in a Google Vertex AI notebook that generates Medicare forecasts in as little as 30 minutes.
Worked with marketing, sales, & product stakeholders to translate complex marketing strategy questions into actionable analytics queries, facilitating data-driven project planning.
The forecasting model helped the marketing department experiment with budget allocations across media tactics to identify the media mix that maximizes subscriber acquisitions.
Marketing stakeholders achieved a 20% decrease in the time required for the budget allocation exercise and a 200% increase in decision-making efficiency through the automated forecasting model.
Data Asset Building:
Worked with marketing, sales, & product stakeholders to translate business logic into data attribute definitions for an integrated data model.
Architected and implemented an integrated data platform on Google Cloud Platform (GCP) using BigQuery.
Designed and built ETL pipelines using Python and SQL to ingest and transform large datasets.
Developed marketing performance dashboards in Looker, providing real-time insights on data quality.
Data Visualization Building:
Led the end-to-end development of a Tableau dashboard with 150+ metrics to support marketing intelligence for strategic planning.
Employed advanced data visualization techniques to highlight key performance indicators (KPIs).
Integrated multiple data sources using Tableau Prep for seamless data blending.
Comcast, Philadelphia Mar 2019 - Sep 2022
Lead Data Scientist, Product Analytics
Release Hound - ML Product (Random Forest):
Built an ML-based product to evaluate firmware performance improvements and detect bugs/issues during firmware upgrades for 30 million customers, ensuring efficient tracking and prioritization.
Developed the Random Forest model to identify significant differences between Last Known Good (LKG) and new firmware versions, capturing detailed feature changes.
Conducted extensive model validation using cross-validation, precision, recall, and F1-score metrics to ensure high accuracy in detecting firmware performance issues.
Implemented a Go/No-Go decision framework based on key performance indicators and standard deviation thresholds to help decide whether to release firmware to the larger population.
Utilized Databricks and MLflow for scalable data processing and collaborative development in notebooks, optimizing the analysis and validation of large-scale firmware performance data.
Segmentation for Product Feature Launch/Development Prioritization (K-Means):
Developed a segmentation algorithm to divide the broad customer landscape into distinct segments based on behavioral attitudes, shared product experiences, and geographic and demographic characteristics, aiming to tailor firmware updates and reduce negative impacts on customer experience.
Gathered relevant data, including device usage and performance metrics, customer feedback, support call rates, firmware version histories, and geographic and demographic information of device owners.
Applied K-means clustering to partition the data into distinct clusters based on feature similarity, minimizing variance within each cluster and ensuring internal homogeneity and external heterogeneity.
Analyzed each segment to understand its defining characteristics, assessing firmware stability, comparing call and reboot rates, and identifying demographic patterns correlating with firmware performance.
Assisted in tailoring firmware rollouts based on segment analysis, prioritizing segments with fewer issues for early updates and focusing on troubleshooting for higher problem segments.
Established a feedback loop to continuously refine the segmentation model with new data, dynamically adjusting segments for optimal performance.
Utilized Databricks for end-to-end study development, results sharing, & deployment.
Segmentation for Root Cause Analysis (Decision Tree):
Developed a Segment Analyzer & Visualizer to improve the firmware rollout triaging process by identifying segments or paths experiencing the most issues, such as frequent reboots, high call volumes, or poor WiFi signals.
Gathered data on device performance metrics, customer feedback, support call rates, and firmware version histories to inform the analysis.
Implemented a decision tree algorithm to provide a clear and interpretable analysis, identifying the causative path to the most significant problems encountered during firmware rollouts.
Analyzed decision tree outputs to pinpoint specific conditions leading to issues, enabling more targeted interventions and enhancing customer satisfaction.
Shifted from a reactive to a proactive approach in managing firmware updates, preemptively addressing potential issues in future updates based on decision tree insights.
Enhanced the efficiency of the firmware rollout process and improved product reliability, leading to a reduction in customer complaints and an overall improvement in user experience.
Product Performance Metrics Visualization:
Designed and developed interactive Tableau dashboards for product analytics, integrating multiple data sources to provide comprehensive insights into the performance and customer usage patterns of internet & video products.
Implemented advanced data visualization techniques to highlight key performance indicators (KPIs), trends, and outliers, enabling stakeholders to make data-driven decisions and optimize product strategies.
Automated data extraction and transformation processes using SQL and Python, ensuring real-time data updates and accuracy in the dashboards, facilitating timely and actionable insights for product management teams.
●Project Management: Created product development workflows, project plans, & a prioritization matrix to develop highly visible customer-centric analytical/data products that assess the effectiveness of consumer products in terms of WiFi health, customer experience, and NPS.
●Led the development of APIs using the Flask framework to support end product features, helping customers understand their WiFi quality.
●Ensured scalable and efficient API design and oversaw the deployment and management of microservices using containerization technologies like Docker and Kubernetes.
Customer Segmentation using AWS SageMaker:
●Experimented with and leveraged AWS SageMaker to develop, train, and deploy machine learning models, enhancing predictive analytics capabilities and supporting data-driven decision-making. Although the team primarily used Databricks, we also built a few unsupervised models in AWS SageMaker for customer segmentation.
Comcast, Philadelphia Mar 2015 - Mar 2019
Manager, Data Analytics & Reporting
Data Analysis and Interpretation: Designed & developed intuitive business reports for mobile & accessory sales teams to measure the performance of ongoing promotional programs & marketing channels.
National Subscriber Daily (NSD): Acted as the product owner for subscriber product insights, behavioral analytics, and forecasting for Comcast’s official daily subscriber report. This report, distributed to 2,700+ top executives daily, provided critical insights for strategic decision-making.
Forecasting: Developed a forecasting model called PACE to project Comcast’s subscriber acquisition. Consulted with executives to help them understand the monthly volatility of subscriber acquisition.
Customer Lifetime Value (CLV): Served as the subject matter expert for the CLV product, projecting the future value of customer relationships. Automated the CLV scoring in Python, saving 20 manual hours weekly and providing more efficient and reliable insights.
Innovation: Created a proof-of-concept simulation tool called RateCast in the 2017 Comcast Innovations Labs. RateCast helped build strategies to maximize revenue while mitigating churn.
CreditXpert, Baltimore, MD Sep 2014 - Dec 2014
Product Analysis & Research Intern
●Created detailed executive reports using SAS to analyze usage data for credit software. The reports helped make decisions on product offerings, aiming to increase usage.
Headstrong, India Sep 2011 - Jul 2013
Financial Models Developer, Team Lead
●Conducted analysis of daily trade deals and settlements for PIMCO Ltd's decision-making system
●Developed multiple reusable SQL procedures for financial models for daily trade reports
●Led a team of four members in product development
IBM, India Sep 2008 - Jul 2011
Database Developer and Designer
●Participated in the implementation of hundreds of stored procedures for business reports
●Participated in the full design of the relational database and created application modules and stored procedures
●Worked on Oracle PL/SQL and UNIX procedure development
●Developed many shell scripts for daily batch reports
Education
Certificate Program: Massachusetts Institute of Technology - Data Leadership - Transforming the Corporation’s Operations, Management, and Mindset to Leverage Data, AI, and Cloud Computing Nov 2023 - Feb 2024
Oklahoma State University, Stillwater, OK - Management Information Systems (Data & Predictive Analytics) GPA: 3.7 Aug 2013 - Mar 2015
JNTU, Hyderabad, India - Bachelor's in Electrical & Electronics Engineering GPA: 3.7 Mar 2004 - Mar 2008
Certifications
●Generative AI Fundamentals - Databricks Jun 2024
●Machine Learning with Python - IBM Jan 2022
●AWS Certified Cloud Practitioner Dec 2021
●Google Analytics Certified Professional Aug 2014
●Dale Carnegie Skills for Success Feb 2017
●SAS® Certified Base & Advanced Programmer for Machine Learning Dec 2013