Pradeep Gadanki
Lead Data Scientist
*******************@*****.*** Contact: 326-***-****
https://www.linkedin.com/in/pradeeepreddy
PROFESSIONAL SUMMARY
• Accomplished Lead Data Scientist with 8+ years of progressive experience designing, developing, and deploying advanced machine learning and NLP solutions across various domains. Expertise spans end-to-end ML lifecycle management, including data engineering, feature store implementation, model training, and scalable deployment.
• Designed and deployed Python 3.6–3.9 based ML and NLP models using PyTorch (1.4–1.8), TensorFlow (2.0), and Hugging Face Transformers, enabling domain-specific customer insights and predictive analytics in fintech and retail environments.
• Built reliable, scalable data workflows with Apache Airflow (1.10), Spark (2.4), and Databricks, automating feature extraction and data validation to support rapid model iteration and deployment.
• Architected feature stores leveraging Delta Lake on Azure Databricks to ensure consistent, auditable feature versioning, directly improving model accuracy and repeatability.
• Containerized AI services using Docker 19.x and orchestrated deployments on Kubernetes 1.18, enabling seamless scaling and environment parity for real-time model inference workloads.
• Developed secure RESTful microservices with OAuth2 authentication, translating complex ML models into accessible APIs that powered business-critical applications and decision automation.
• Applied explainability frameworks such as SHAP and LIME to complex models, facilitating compliance with regulatory standards and increasing stakeholder trust through transparent AI outcomes.
• Managed the end-to-end model lifecycle with MLflow, including experiment tracking, model versioning, and automated governance checks, accelerating production readiness and reducing operational risk (a minimal tracking sketch follows this summary).
• Executed extensive data preprocessing and feature engineering using pandas, NumPy, and scikit-learn to improve model performance on large-scale structured and unstructured datasets.
• Spearheaded adoption of reproducible research practices by containerizing Jupyter notebooks with Docker and coupling them with MLflow Projects, facilitating cross-team collaboration and faster iteration on experiments.
• Customized transformer architectures and fine-tuned pretrained models to enhance NLP-based sentiment analysis and recommendation engines, driving improved customer engagement and retention metrics.
• Leveraged cloud infrastructure on Azure ML and AWS (S3, EC2) to provision elastic compute and storage, optimizing cost and performance for training and inference pipelines in dynamic production environments.
• Implemented CI/CD automation using GitHub Actions and Azure DevOps, enabling rapid, reliable deployment cycles and monitoring for ML solutions integrated with broader software platforms.
• Delivered domain-focused analytics for fraud detection, credit risk scoring, demand forecasting, and SaaS user behavior, aligning ML solutions tightly with business KPIs and operational goals.
• Collaborated with product managers, engineers, and business teams to translate data insights into actionable strategies, driving measurable improvements in customer acquisition and operational efficiency.
• Integrated Kafka streaming pipelines for real-time data ingestion and model updates, enhancing responsiveness and accuracy of predictive systems in fast-paced financial and retail scenarios.
• Enforced data governance, security, and compliance policies within AI workflows, balancing innovation with organizational standards for ethical and responsible machine learning deployment.
• Architected scalable microservices and data pipelines to support cross-functional use cases, ensuring seamless integration of machine learning into existing enterprise technology stacks.
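Illustrative sketch (not client code): a minimal MLflow experiment-tracking run of the kind referenced above; the experiment name, model choice, and synthetic data are assumptions for demonstration only.

```python
# Minimal MLflow tracking sketch; experiment name, model, and data are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("demo-classifier")  # hypothetical experiment name
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)     # track hyperparameters
    mlflow.log_metric("accuracy", accuracy)   # track evaluation metrics
    mlflow.sklearn.log_model(model, "model")  # store the artifact for the registry
```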
TECHNICAL SKILLS
1. Programming Languages: Python, R, SQL, Scala, Julia, SAS
2. Machine Learning & AI: Supervised Learning, Unsupervised Learning, Deep Learning, Neural Networks, Natural Language Processing, Computer Vision, Reinforcement Learning, MLOps, AutoML, Ensemble Methods, Feature Engineering, Model Selection, Hyperparameter Tuning
3. Deep Learning Frameworks: TensorFlow, PyTorch, Keras, Scikit-learn, XGBoost, LightGBM, CatBoost, Hugging Face Transformers
4. Data Engineering & Big Data: Apache Spark, Hadoop, Kafka, Airflow, ETL/ELT, Data Pipelines, Apache Beam, Databricks, Snowflake, dbt
5. Cloud Platforms: AWS (SageMaker, EC2, S3, Lambda, Redshift), Microsoft Azure (Machine Learning Studio, Data Factory)
6. Databases: PostgreSQL, MySQL, MongoDB, Cassandra, Redis, Neo4j, ClickHouse, Amazon DynamoDB
7. Data Visualization & BI: Tableau, Power BI, Looker, Matplotlib, Seaborn, Plotly, D3.js, Streamlit, Dash
8. Statistical Analysis: Statistical Modeling, Hypothesis Testing, A/B Testing, Bayesian Statistics, Time Series Analysis, Experimental Design, Causal Inference
9. DevOps & MLOps: Docker, Kubernetes, Git, CI/CD, Jenkins, MLflow, Kubeflow, DVC, Model Deployment
PROFESSIONAL EXPERIENCE
Client: Medallia – California, US | April 2024 – Present
Role: Lead Data Scientist
• Led design and deployment of a customer sentiment analysis system using PyTorch 2.0 and Hugging Face Transformers to extract nuanced insights from unstructured feedback, directly enabling targeted customer experience improvements (see the sentiment-scoring sketch at the end of this role).
• Automated end-to-end ML workflows with Azure ML Pipelines and Databricks Runtime, integrating data preparation, training, and deployment stages, streamlining model lifecycle management and reducing operational overhead.
• Implemented feature engineering and serving with Feast to standardize feature consistency across batch and real-time inference pipelines deployed on Kubernetes, improving model reliability in production.
• Re-architected data transformation pipelines using dbt on Snowflake to ensure data integrity and reproducibility of training datasets, reinforcing compliance with internal governance standards.
• Integrated MLflow for comprehensive experiment tracking and model registry, enforcing auditability and reproducibility standards critical for regulatory requirements in customer data processing.
• Developed interpretable AI solutions applying SHAP and LIME, enabling explainability for complex deep learning models and providing transparency required for stakeholder acceptance and model governance.
• Pioneered development of domain-specific NLP workflows combining LangChain with Hugging Face models, enhancing contextual understanding and automating the extraction of actionable insights from large textual datasets.
• Containerized models using Docker and orchestrated with Kubernetes, deploying microservices for scalable, fault-tolerant inference APIs with seamless integration into internal customer analytics applications.
• Established CI/CD pipelines utilizing GitHub Actions integrated with Docker and Azure ML, ensuring continuous validation, container builds, and automated deployment for rapid, reliable model iteration.
• Partnered with data engineers to optimize Spark workloads on Azure Databricks Runtime, efficiently processing multi-terabyte customer feedback data to deliver clean, enriched features for model consumption.
• Diagnosed model degradation and data drift leveraging MLflow metrics and dbt lineage, enabling proactive retraining triggered by data changes and ensuring sustained model performance.
• Designed secure, scalable APIs behind Azure API Gateway with OAuth2 authentication, facilitating controlled access to ML predictions while maintaining compliance with enterprise security policies.
• Led mentoring sessions on advanced model interpretability and scalable ML architecture, promoting best practices and upskilling data science teams in modern ML operations.
• Collaborated with DevOps and engineering teams to align infrastructure strategy, ensuring that Snowflake, Databricks, and Kubernetes environments were optimized for large-scale ML deployment in a hybrid cloud setup.
• Directed implementation of end-user monitoring and feedback loops on ML predictions to continuously refine model outputs and align with evolving business objectives.
Environment: Python, PyTorch, TensorFlow, Hugging Face Transformers, LangChain, MLflow, Feast, Azure ML Pipelines, Databricks Runtime, dbt, Snowflake, Docker, Kubernetes, GitHub Actions, Azure API Gateway, SHAP, LIME
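Illustrative sketch (not Medallia production code): transformer-based sentiment scoring via the public Hugging Face pipeline API; the default model and the sample sentence are assumptions.

```python
# Hedged sketch of transformer-based sentiment scoring; model choice is illustrative.
from transformers import pipeline

# Downloads a default pretrained sentiment model on first use.
sentiment = pipeline("sentiment-analysis")
feedback = ["The onboarding flow was confusing, but support resolved it quickly."]
print(sentiment(feedback))  # e.g. [{'label': 'POSITIVE', 'score': 0.98}]; output varies by model
```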
Client: Sisense – California, US | Dec 2022 – Mar 2024
Role: Senior ML Engineer
• Engineered scalable distributed training pipelines using TensorFlow 2.11 and PyTorch 2.0 on Kubeflow Pipelines, orchestrating multi-node GPU clusters to accelerate model convergence for large transformer-based NLP models such as BERT, critical for powering enterprise-scale natural language BI queries.
• Architected a microservices-based model serving infrastructure leveraging Kubernetes and Docker, containerizing diverse ML models and managing lifecycle with Helm charts, enabling seamless blue-green deployments.
• Integrated MLflow for comprehensive experiment tracking combined with GitHub Actions CI pipelines, enforcing reproducibility and facilitating A/B testing workflows across multiple model iterations in parallel.
• Developed end-to-end automated ML workflows on Azure ML Pipelines that included data ingestion from Azure Data Lake, feature engineering, hyperparameter tuning tracked through TensorBoard integrations, and batch inferencing.
• Led the design and deployment of RESTful inference APIs using FastAPI coupled with Azure API Management and OAuth2, ensuring secure, scalable, and low-latency model access for interactive analytics dashboards with SLA-driven latency targets (see the endpoint sketch at the end of this role).
• Implemented advanced feature stores integrated with Spark MLlib and Azure Databricks, orchestrating complex feature transformation jobs, enabling feature reuse and reducing data leakage risk, directly improving model robustness.
• Constructed a modular ML pipeline with Kubeflow Pipelines for multi-model ensemble serving, combining outputs from XGBoost, LightGBM, and BERT models to optimize precision-recall tradeoffs in real-time customer segmentation.
• Deployed continuous monitoring and alerting frameworks with Prometheus and Grafana integrated into Kubernetes clusters, tracking model drift, input data anomalies, and infrastructure metrics to proactively trigger retraining and rollback processes.
• Designed and executed an explainability framework incorporating SHAP values and LIME alongside model outputs, visualized through integrated Tableau dashboards, providing business stakeholders with transparency into black-box model decisions.
• Automated feature extraction and data preprocessing pipelines using Apache Airflow orchestrated workflows, ensuring end-to-end data lineage and timely availability of clean data feeds for daily model retraining cycles in cloud environments.
• Orchestrated containerized batch inference workflows with Docker and Azure ML Batch Endpoints, scaling compute dynamically based on workload with autoscaling Kubernetes pods, improving throughput for large-volume analytics data.
• Built NLP-based intent classification and named entity recognition models using BERT and fine-tuned TensorFlow Hub embeddings, enhancing natural language query understanding and enabling sophisticated user interactions.
• Integrated model versioning and feature flagging via MLflow and Kubernetes-native config maps, enabling staged rollouts of new model versions with precise control over production traffic splits, minimizing risk in live Sisense deployments.
• Collaborated closely with DevOps teams to optimize CI/CD pipelines for ML lifecycle management using GitHub Actions and Azure DevOps, embedding static code analysis, unit testing for model validation, and container vulnerability scanning.
Environment: Python, TensorFlow, PyTorch, BERT, Hugging Face Transformers, FastAPI, MLflow, Kubeflow Pipelines, Azure ML Pipelines, Apache Airflow, GitHub Actions, Docker, Kubernetes, Helm, Azure API Management, OAuth2, Prometheus, Grafana, Spark MLlib, Azure Databricks, Azure Data Lake Storage
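Illustrative sketch (not Sisense production code): a minimal FastAPI inference endpoint; the route, request schema, and stubbed scoring logic are assumptions, and OAuth2 wiring and real model loading are omitted.

```python
# Minimal FastAPI inference endpoint sketch; scoring logic is a stub.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    text: str  # natural-language BI query (hypothetical schema)

@app.post("/predict")
def predict(query: Query) -> dict:
    # Stub: production code would call the containerized model behind OAuth2.
    label = "question" if query.text.strip().endswith("?") else "statement"
    return {"label": label}

# Run locally with: uvicorn main:app --reload  (module name assumed)
```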
Client: LTI Mindtree – Charlotte, NC | April 2021 – November 2022
Role: Senior Data Scientist
• Integrated Python, LightGBM, and SQL to build a creditworthiness scoring engine for small business clients, enabling nuanced risk stratification based on transaction and behavioral histories.
• Trained and deployed NLP models using BERT (Hugging Face Transformers), TensorFlow, and spaCy to classify and summarize inbound client communications, automating manual support workflows for high-volume banking channels.
• Leveraged XGBoost, PySpark, and Airflow to orchestrate customer churn prediction pipelines, supporting the CRM team in identifying at-risk segments and guiding retention offers.
• Constructed custom entity recognition models with PyTorch, BERT, and Regex-based NLP preprocessing, enabling automated extraction of legal and compliance metadata from structured and semi-structured documents.
• Operationalized the ML lifecycle using AWS SageMaker, MLflow, and Docker, encapsulating training, validation, and deployment steps within reproducible pipelines for production scoring environments.
• Engineered explainable decision layers using SHAP, LightGBM, and Streamlit, presenting business stakeholders with interpretable model insights and aligning model behavior with internal fairness policies.
• Created an automated document classification system using TensorFlow, Keras, and NLP tokenization to assign taxonomies to financial agreements, reducing downstream legal processing overhead.
• Designed segmentation features with PySpark, Airflow, and SQL, powering campaign targeting logic in digital lending products based on customer lifecycle behavior and payment patterns.
• Built and containerized fraud detection models with PyTorch, Docker, and SageMaker endpoints, integrating inference APIs into core transaction systems for real-time threat flagging.
• Automated model retraining triggers via Airflow, MLflow, and Git, ensuring continuous updates to forecasting models when upstream market signals or economic indicators changed.
• Implemented drift tracking pipelines with Python, Great Expectations, and S3 versioning, identifying schema mismatches and concept shifts in borrower data streams pre-scoring.
• Built transformer-based summarization tools using BERT, NLTK, and SQLAlchemy, streamlining executive insights on regulatory disclosures and improving internal audit traceability.
• Tuned time-series models using LightGBM, Optuna, and SQL for forecasting repayment cycles across mortgage portfolios, improving schedule predictions under fluctuating interest rate regimes (see the tuning sketch at the end of this role).
• Created training data lineage frameworks using Git, MLflow, and Jupyter, enabling version traceability across feature sets, labels, and model configurations within CI/CD-compatible workflows.
Environment: Python (scikit-learn, XGBoost, LightGBM, TensorFlow, PyTorch, spaCy, NLTK, SHAP, BERT, Optuna), SQL, AWS SageMaker, MLflow, PySpark, Airflow, Docker, Git, FastAPI, Streamlit, Jupyter, Great Expectations
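Illustrative sketch (not client code): an Optuna hyperparameter search over a LightGBM regressor on synthetic data; the search space, trial budget, and scoring metric are assumptions.

```python
# Hedged Optuna + LightGBM tuning sketch on synthetic data.
import lightgbm as lgb
import optuna
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1_000, n_features=15, random_state=1)

def objective(trial: optuna.Trial) -> float:
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 16, 128),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": 200,
    }
    model = lgb.LGBMRegressor(**params)
    # Maximizing negative MSE is equivalent to minimizing mean squared error.
    return cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```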
Client: Fintellix – Mumbai, India | Feb 2019 – Mar 2021
Role: Data Scientist
• Built credit risk models using Python, scikit-learn, and XGBoost to profile customer creditworthiness from bureau and behavioral data, aligning risk stratification strategies with evolving regulatory scoring norms.
• Automated model training pipelines using Azure Databricks, PySpark, and Airflow, enabling periodic model lifecycle orchestration for financial products governed by internal compliance frameworks.
• Designed time series forecasting modules using R, statsmodels, and SQL, supporting non-performing asset analysis for unsecured loan portfolios under dynamic macroeconomic assumptions.
• Engineered robust ingestion workflows using Hive, PySpark, and Azure Data Lake, enabling structured parsing of scanned financials and optimizing downstream readiness for credit model inputs.
• Developed customer clustering algorithms with KMeans, Python, and SQL, mapping behavioral segments that informed wealth product recommendations and improved personalization strategies for digital banking journeys (see the segmentation sketch at the end of this role).
• Integrated external data streams using RESTful APIs, Airflow, and Python, automating third-party credit signal acquisition that enriched credit adjudication models with fraud intelligence overlays.
• Delivered interactive explainability dashboards in Tableau, sourcing model insights from Hive and PySpark, aligning transparency standards with internal audit reviews for regulated model usage.
• Implemented anomaly detection frameworks for high-value transactions using Spark, XGBoost, and Azure Blob Storage, isolating behavioral deviations in premium customer segments under internal risk triggers.
• Facilitated model performance experimentation using R, matplotlib, and SQL, comparing alternative model schemas to identify candidates best aligned with operational acceptance thresholds.
• Conducted pipeline-level diagnostics with Hive, Spark, and Tableau, mapping latent SLA violations in loan processing workflows and helping re-engineer bank-client integration protocols.
• Adopted Git to version control model assets, implementing structured branching for reproducibility and traceable experimentation across multiple model release candidates.
• Collaborated with platform teams to productionize features using Python, Airflow, and Hive, contributing to a scalable feature registry that fed scoring APIs integrated into frontline lending systems.
Environment: Python (scikit-learn, XGBoost, statsmodels), R, SQL, PySpark, matplotlib, Azure Data Lake, Azure Databricks, Airflow, Hive, Hadoop, Tableau, Jupyter, Git, RESTful APIs
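Illustrative sketch (not Fintellix production code): KMeans segmentation over synthetic behavioral features; the feature content and cluster count are assumptions.

```python
# Minimal KMeans segmentation sketch; features and k are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
features = rng.normal(size=(1_000, 4))  # stand-in for behavioral aggregates

scaled = StandardScaler().fit_transform(features)  # scale so no feature dominates distance
kmeans = KMeans(n_clusters=5, n_init=10, random_state=7).fit(scaled)
print(np.bincount(kmeans.labels_))  # customers per segment
```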
Client: Marlabs Inc. – Bangalore, India | Aug 2017 – Jan 2019
Role: Junior Data Scientist
• Developed a demand forecasting solution using Python, scikit-learn, and Pandas, improving inventory planning accuracy by uncovering purchase trends across multi-category retail datasets.
• Automated record-level data quality checks using R, tidyverse, and SQL Server, enabling scalable validation of healthcare billing records and flagging inconsistencies before downstream processing.
• Designed A/B testing frameworks with RStudio, ggplot2, and Excel 2013, providing the product team with statistical evidence to guide UI/UX improvements for a digital commerce platform.
• Built an ETL preprocessing layer using NumPy, Python, and T-SQL, harmonizing heterogeneous data feeds to support real-time scoring pipelines for predictive modeling tasks.
• Created reusable visual templates in Matplotlib, Seaborn, and Jupyter Notebook, allowing business analysts to self-serve insights and reduce dependency on ad hoc engineering support.
• Designed SQL-based aggregation logic with T-SQL, Pandas, and SQL Server, optimizing feature set preparation for segmentation models used by the marketing analytics team.
• Engineered a classification pipeline using Python, scikit-learn, and Pandas, prioritizing high-conversion lead scoring for enterprise B2B campaigns in the telecom sector.
• Integrated Git, Jupyter, and RStudio into model development lifecycles, enabling reproducibility and controlled versioning for iterative experimentation and cross-team collaboration.
• Conducted churn pattern analysis using R, SQL Server 2012, and Pandas, surfacing critical behavioral indicators that supported a targeted customer engagement strategy.
• Contributed to feature engineering workflows by testing dimensionality reduction techniques including PCA and SelectKBest via scikit-learn, streamlining high-dimensional model pipelines for improved runtime efficiency (see the sketch following this role).
Environment: Python, R, scikit-learn, NumPy, Pandas, Seaborn, Matplotlib, tidyverse (dplyr, ggplot2), SQL (T-SQL), Jupyter Notebook, RStudio, Excel 2013, Git, SQL Server 2012
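Illustrative sketch (not client code): PCA and SelectKBest applied to synthetic high-dimensional data; the dimensions and k are assumptions.

```python
# Hedged dimensionality-reduction sketch: PCA vs. univariate SelectKBest.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=50, n_informative=10, random_state=3)

X_pca = PCA(n_components=10).fit_transform(X)               # unsupervised projection
X_kbest = SelectKBest(f_classif, k=10).fit_transform(X, y)  # supervised feature selection
print(X_pca.shape, X_kbest.shape)  # both reduce 50 features to 10
```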
EDUCATION
Bachelor of Technology in Electronics and Communication Engineering
• Sri Venkateswara College of Engineering and Technology, India 2017