Vamshi Krishna
PH: 940-***-****
Email: **************@*****.***
Senior Data Scientist
Professional Summary:
• Data Scientist with around 5 years of experience in data science, machine learning, and data analytics, delivering innovative, data-driven solutions across diverse industries.
• Expertise in data collection, cleaning, and preprocessing to ensure quality and consistency in datasets for model development.
• Skilled in exploratory data analysis (EDA) and uncovering actionable insights through advanced statistical techniques.
• Proficient in building, training, and optimizing predictive and prescriptive models for applications such as fraud detection, risk assessment, and customer segmentation.
• Experienced in developing and deploying algorithms for recommendation systems, anomaly detection, and operational optimization.
• Hands-on expertise in programming languages such as Python, R, and SQL, as well as machine learning frameworks such as TensorFlow, Scikit-learn, and PyTorch.
• Experienced in PySpark and SQL, with expertise in developing and optimizing distributed data processing pipelines, performing complex transformations, and implementing scalable ETL.
• Adept at creating interactive dashboards and data visualizations using Tableau, Power BI, Matplotlib, and Seaborn to communicate findings effectively to stakeholders.
• Expertise in performance tuning and optimization of Spark jobs, SQL queries, and Hadoop workflows, utilizing partitioning, caching, and adaptive query execution for improved efficiency.
• Solid foundation in statistics, linear algebra, and calculus, ensuring strong model development and problem-solving capabilities.
• Expertise in scientific modeling and data-driven simulations, utilizing Python, NumPy, SciPy, and statsmodels to build scalable analytical frameworks for solving complex business and research problems.
• Experienced in monitoring model performance, detecting drift, and retraining workflows to maintain accuracy in production environments.
• Collaborative team player with strong communication skills, translating technical insights into actionable business strategies.
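The drift monitoring mentioned above can be illustrated with a minimal Population Stability Index (PSI) check; the bin count, threshold convention, and synthetic data here are illustrative assumptions, not values from any specific project:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a live score distribution against a baseline.
    A PSI above ~0.2 is commonly treated as significant drift."""
    # Bin edges come from the baseline (training-time) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins at a tiny value to avoid log(0).
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
drifted = rng.normal(0.5, 1.0, 10_000)  # mean shift simulates drift

print(population_stability_index(baseline, baseline))  # 0 (no drift)
print(population_stability_index(baseline, drifted))   # clearly positive
```

In practice a check like this would run on each scoring batch, triggering the retraining workflow when the index crosses the chosen threshold.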
WORK EXPERIENCE:
Role: Senior Data Scientist
Client: Comcast (Philadelphia, PA)
From March 2024 – Present
Responsibilities:
• Led retention chat analysis, revealing a 3.2% lower churn rate for chat users and driving strategic investment in chat capabilities.
• Conducted agent demand forecasting, expanding retention chat agents from 40 to 200, increasing session handling from 28% to full coverage, and optimizing shifts via A/B testing.
• Identified 17K target customers (from 1M) with low IVR containment and no self-serve history in 12 months. Conducted a month-long self-serve trial, measuring churn, tNPS, and repeat callers, achieving a 5% increase in digital engagement within 30 days.
• Developed a decision tree classifier to target customers with less than 20% written-channel share, reducing call volume by promoting chat and self-serve usage.
• Utilized LLMs on 50M+ call/chat transcripts, cutting manual review time by 60% (from 10 to 4 hours per batch) and improving retention interventions.
• Automated text classification on 500K+ transcripts/month, improving response time by 35% and optimizing agent strategies.
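Transcript classification at this scale typically starts from a cheap linear baseline before reaching for an LLM; a minimal sketch with scikit-learn is below (the transcripts and intent labels are invented stand-ins, not real customer data or categories):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy transcripts and intent labels -- purely illustrative.
transcripts = [
    "I want to cancel my service",
    "my bill is too high this month",
    "internet keeps dropping every evening",
    "thinking about cancelling, found a cheaper provider",
    "charged twice on my last bill",
    "wifi is very slow in the bedroom",
]
labels = ["churn", "billing", "tech", "churn", "billing", "tech"]

# TF-IDF features + a linear classifier: a common baseline for
# routing high-volume text into agent strategies.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(transcripts, labels)

print(model.predict(["please cancel my account"]))
```

A production pipeline would train on labeled historical transcripts and score each month's batch, with the predicted intent driving routing and retention interventions.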
• Conducted fiber footprint analysis, identifying fiber competition in 46% of regions (34% in Tier 1 areas) to guide marketing strategies. Designed strategies to shift customers from voice to written channels, reducing call volume and promoting digital engagement.

Role: Data Scientist
Client: BRICKSIMPLE LLC (Philadelphia, PA)
From March 2023 – Feb 2024
Responsibilities:
• AI Chatbot: Developed a RAG-based chatbot using OpenAI API to automate responses for visitors' inquiries on BrickSimple's website. The solution enabled accurate and dynamic engagement with prospective clients by leveraging the company’s past work data, significantly enhancing user interaction and information accessibility. Regularly interacted with directors and key stakeholders to ensure project alignment with business objectives.
• Real Estate Document Analyzer: Managed and engineered an innovative OCR-driven system leveraging named entity recognition (NER) and BERT language models to precisely extract critical data from real estate documents. Achieved an accuracy rate of over 92%, resulting in a substantial reduction in human error and cost associated with manual data extraction processes.
• Fraudulent Insurance Claims Detection: Engineered a big data solution using PySpark and unsupervised learning, leveraging the k-means algorithm to cluster user information for detecting fraudulent insurance claims. Conducted cluster analysis to identify patterns close to known fraudulent data, enabling accurate labeling of suspicious clusters. This solution significantly enhanced the business's ability to filter out fraudulent claims from bots and fake users, improving detection accuracy and reducing fraud-related losses.
• Sales Forecasting Model: Led the design and implementation of a Long Short-Term Memory deep learning model for sales forecasting across multiple SKUs. This initiative led to a notable improvement in inventory management efficiency and optimization, driving substantial cost savings and revenue growth.
• Medical Document Processing: Led the project development and utilized OpenAI (GPT-3) to develop a document classifier for automated classification of medical documents. Achieved an accuracy of over 88%. This solution significantly streamlined workflow processes, enhancing efficiency and reducing costs in healthcare settings.
• IVR Telephonic System: Conceptualized and created an interactive voice response (IVR) system for appointment management in a dentist's office using Google Dialogflow. Successfully streamlined operations and minimized errors, resulting in improved operational outcomes and heightened patient satisfaction.
• Managed end-to-end machine learning lifecycle, including data collection, preprocessing, model development, and deployment in production, with extensive experience working on big data and large datasets to optimize model performance and scalability.
• Engaged with clients and senior stakeholders to align machine learning projects with strategic business objectives.
• Served as a team lead, managing a team of 2-3 people, providing guidance and project oversight to ensure successful delivery of AI solutions.
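The clustering approach behind the fraud-detection project above can be sketched with scikit-learn (the original solution used PySpark; the synthetic claim features and the "smallest cluster" flagging rule here are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic claim features [claim_amount, claims_per_year] --
# illustrative stand-ins for the real engineered features.
normal = rng.normal([500, 1], [100, 0.5], size=(200, 2))
suspicious = rng.normal([5000, 8], [500, 1], size=(20, 2))
X = StandardScaler().fit_transform(np.vstack([normal, suspicious]))

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Treat the smaller cluster as the candidate fraud cluster, then
# compare it against known fraudulent examples before labeling.
labels, counts = np.unique(km.labels_, return_counts=True)
fraud_cluster = labels[np.argmin(counts)]
flagged = int(np.sum(km.labels_ == fraud_cluster))
print(f"flagged {flagged} of {len(X)} claims for review")
```

In the distributed setting the same pattern applies with PySpark MLlib's KMeans over a feature-engineered DataFrame, with flagged clusters routed to analysts for labeling.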
Company: Envestnet Yodlee (Bangalore, India)
From July 2020 – Dec 2022
Role: Data Scientist
Responsibilities:
• Designed and implemented SQL-based workflows for large-scale data integration, improving retrieval times by 40%.
• Implemented incremental data processing using PySpark and Delta Lake to efficiently manage updates in large datasets and improve data freshness.
• Built and maintained highly scalable data lakes on cloud platforms (AWS S3, Azure Data Lake) using PySpark, ensuring efficient data storage and retrieval.
• Developed real-time data streaming pipelines using PySpark Structured Streaming and Kafka, enabling near real-time analytics for high-velocity data sources.
• Conducted performance tuning of Spark clusters using broadcast joins, shuffle optimizations, and adaptive query execution (AQE) for optimal resource utilization.
• Designed and implemented data quality frameworks using PySpark and SQL, ensuring data integrity, completeness, and consistency in large-scale data processing.
• Developed custom Python-based UDFs (User-Defined Functions) in PySpark to handle complex transformations and improve data processing flexibility.
• Implemented real-time data streaming solutions using Spark Streaming, Kafka, and Python, enabling low-latency processing of high-velocity data.
• Designed and optimized SQL-based queries on Hive and Spark SQL, ensuring efficient execution of analytical queries on large datasets.
• Utilized Dask and Pandas for parallel processing of medium-sized datasets when Spark overhead was unnecessary, improving resource utilization.
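A data-quality framework of the kind described above boils down to declarative rules evaluated per batch; a minimal pandas sketch follows (the column names and rules are made-up stand-ins, not the actual production checks):

```python
import pandas as pd

# Illustrative transaction records with deliberate quality issues.
df = pd.DataFrame({
    "account_id": ["A1", "A2", None, "A4"],
    "amount": [120.5, -3.0, 75.0, 9_999_999.0],
    "currency": ["USD", "USD", "EUR", "XXX"],
})

# Each rule is a boolean mask; True means the row passes.
checks = {
    "completeness: account_id present": df["account_id"].notna(),
    "validity: amount non-negative": df["amount"] >= 0,
    "consistency: known currency code": df["currency"].isin(["USD", "EUR", "GBP"]),
}

report = {name: int((~mask).sum()) for name, mask in checks.items()}
for name, failures in report.items():
    print(f"{name}: {failures} failing row(s)")
```

On large datasets the same masks translate directly to PySpark column expressions, with failure counts emitted as pipeline metrics.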
• Developed data ingestion frameworks using Python, Spark, and Apache Airflow, automating end-to-end data pipeline orchestration.
• Integrated Spark with cloud data platforms (AWS EMR, Azure Databricks, Google BigQuery) for high-performance distributed computing.
• Automated ETL pipelines using Talend and Azure Data Factory, ensuring seamless and scalable data preprocessing.
• Built and optimized time-series forecasting models for demand planning, achieving a 30% improvement in resource allocation accuracy.
• Developed interactive dashboards using Power BI to communicate key insights and operational metrics effectively.
• Deployed NLP pipelines for text classification and sentiment analysis, achieving over 90% accuracy in production environments.
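As a minimal illustration of the forecasting family used in this work, an AR(1) model -- the simplest member of the ARIMA family -- can be fit by least squares in plain NumPy (the series here is synthetic; the production models used full ARIMA/SARIMA/Prophet/LSTM tooling):

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic demand series following x_t = 0.8 * x_{t-1} + noise.
n, phi_true = 500, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Least-squares estimate of phi: regress x_t on x_{t-1}.
phi_hat = float(np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1]))

# One-step-ahead forecast from the last observation.
forecast = phi_hat * x[-1]
print(f"estimated phi = {phi_hat:.3f}, next-step forecast = {forecast:.3f}")
```

ARIMA generalizes this by adding differencing and moving-average terms; SARIMA adds seasonal counterparts, which is what drives accuracy gains on seasonal demand data.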
• Designed and implemented time-series forecasting models using ARIMA, SARIMA, Prophet, and LSTMs, improving demand forecasting accuracy by 30%.

Education:
Bachelor's in Computer Science from Lingaya's Vidyapeeth University.
Master's in Advanced Data Analytics from the University of North Texas.