Machine Learning Data Scientist

Location: Greensboro, NC
Posted: June 04, 2025

Resume:

MANI KANTA REDDY

336-***-**** ***********************@*****.***

LinkedIn URL

PROFESSIONAL SUMMARY

Senior Data Scientist with 8+ years of experience designing scalable machine learning solutions across the finance, healthcare, retail, and ESG sectors.

Designed and implemented predictive modeling pipelines using logistic regression, ARIMA, and deep learning for credit risk, emissions forecasting, and clinical diagnostics.

Operationalized ML solutions using Flask APIs integrated within microservice architectures, enabling real-time inference and system interoperability.

Automated ETL workflows with SQL, Apache Spark, and Pandas, ensuring consistent, low-latency data ingestion and transformation across distributed systems.

Engineered cloud-native ML systems using AWS Lambda, EC2, S3, Redshift, and Azure Machine Learning, optimizing cost and scalability.

Developed machine learning models using Python, scikit-learn, XGBoost, and TensorFlow to drive outcomes in finance, healthcare, and ESG domains.

Built and maintained containerized environments with Docker and Kubernetes, supporting reproducible deployments across staging and production clusters.

Applied Natural Language Processing (NLP) techniques to extract structured insights from regulatory documents, financial disclosures, and ESG reports.

Integrated MLOps practices by implementing model versioning, drift detection, and retraining workflows with CI/CD using Jenkins and Azure DevOps.

Created asynchronous model scoring services using AWS SQS and SNS, enhancing throughput and decoupling inference layers from data streams.

Conducted hyperparameter tuning and optimization using Hyperopt, Optuna, and GridSearchCV to improve model robustness and generalization.

Developed explainable AI workflows using SHAP, LIME, and partial dependence plots to ensure transparency and compliance in regulated environments (a brief sketch follows this summary).

Leveraged Generative AI (LLMs, prompt engineering) to simulate ESG scenarios, generate counterfactuals, and aid strategic sustainability planning.

Constructed resilient data validation and schema enforcement layers using PySpark, Great Expectations, and custom validation frameworks.

Designed end-to-end forecasting pipelines using ARIMA, SARIMA, and Facebook Prophet to predict trends in lending, carbon emissions, and revenue.

Produced interactive data visualizations and executive dashboards using Power BI, Tableau, and Looker, aligning insights with KPIs and business metrics.

Led cross-functional Agile squads and translated business goals into scalable ML tasks, bridging gaps between data science, engineering, and product teams.

Built classification and clustering pipelines to enhance personalization, churn prediction, and targeted marketing in retail and fintech applications.

Collaborated with legal and compliance teams to align model design with regulatory frameworks and risk assessment protocols.

Refactored legacy analytics scripts and workflows into modular, testable components with Pytest, Pylint, and CI-integrated code quality checks.

Deployed real-time monitoring systems using Prometheus, Grafana, and custom alerting tools to observe model performance in production.

Drove stakeholder engagement through technical storytelling, model demos, and documentation to promote adoption of data-driven strategies across the organization.
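
As a brief illustration of the explainability workflow above, the sketch below computes SHAP attributions for an XGBoost classifier. It is a minimal sketch on synthetic data, assuming the shap and xgboost packages; every feature, value, and name in it is hypothetical rather than drawn from any engagement.

    # Minimal SHAP sketch on synthetic data (all values hypothetical).
    import numpy as np
    import shap
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))                   # synthetic feature matrix
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic binary target

    model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

    explainer = shap.TreeExplainer(model)           # exact attributions for tree ensembles
    shap_values = explainer.shap_values(X)          # one attribution per feature per row
    print(np.abs(shap_values).mean(axis=0))         # mean |SHAP| as a global importance proxy

Mean absolute SHAP values per feature are a common artifact in model risk reviews, which is why the last line aggregates them.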

TECHNICAL SKILLS

1. Programming & Scripting: Python (NumPy, Pandas, Scikit-learn, XGBoost, TensorFlow, Keras, PyTorch), R, SQL, Bash, PySpark

2. Machine Learning & AI: Supervised & Unsupervised Learning, Logistic Regression, Random Forest, XGBoost, ARIMA, SARIMA, Classification, NLP (spaCy, NLTK), Generative AI

3. Model Explainability & Optimization: SHAP, LIME, Feature Engineering, Hyperparameter Tuning

4. Big Data & Distributed Systems: Apache Spark, Hadoop, AWS Glue, EMR, Redshift Spectrum

5. Data Engineering & ETL: Pandas, SQL, Apache Spark, Airflow, DBT, Azure Data Factory

6. Cloud Platforms & DevOps: AWS (S3, EC2, Lambda, Redshift, SageMaker, SNS, SQS), Azure (ML Studio, Blob Storage), Docker, Kubernetes, Jenkins, Azure DevOps, Git

7. API Development & MLOps: Flask, FastAPI, RESTful APIs, MLflow, DVC, Prometheus, Grafana

8. Visualization & BI Tools: Power BI, Tableau, Looker, Plotly, Matplotlib, Seaborn

9. Workflow Management: Agile, Scrum, Jira, Confluence, CI/CD Pipelines, Unit Testing, Git

10. Data Storage & Databases: PostgreSQL, MySQL, MongoDB, SQLite, Amazon Redshift, Snowflake

11. Compliance & Governance: ESG Reporting, Regulatory Compliance, Model Validation

PROFESSIONAL EXPERIENCE

Client: Zurich Insurance, Addison, TX (Oct 2023 – Present)

Role: Sr. Data Scientist (ML)

Engineered and launched an ESG strategy roadmap, ensuring full compliance with EU and US climate regulations and targeting revenue growth from sustainable products through advanced data analytics, machine learning, and predictive modeling.

Led cross-functional teams to incorporate market insights, Generative AI, and proprietary analytics into sustainability strategies, driving data-backed decisions that enhanced long-term growth and aligned with corporate sustainability goals.

Streamlined ESG reporting by developing automated ETL pipelines, integrating them with legacy systems to cut manual reporting tasks, while ensuring full compliance with financial and environmental regulations.

Developed custom machine learning models in Python to quantify the financial impact of ESG metrics, feeding directly into executive decision-making processes, improving reporting accuracy for sustainability-related financial disclosures.

Automated ESG data processing by designing cloud-based pipelines on AWS Lambda and Redshift, reducing data latency and delivering real-time insights into environmental and social impacts on corporate performance.

Deployed AI-driven risk assessments using XGBoost and TensorFlow to forecast carbon emissions and assess the environmental footprint of investments, leading to a reduction in carbon intensity across the portfolio.

Revamped financial models by integrating real-time sustainability KPIs, improving the accuracy of sustainability forecasts, and providing actionable insights to guide investment strategies.

Led the creation of an interactive ESG dashboard using Tableau and Power BI, combining real-time data sources to provide C-suite executives with actionable visualizations that prioritized critical areas for governance and sustainability improvements.

Developed machine learning pipelines to predict climate risk and impact on underwriting models, identifying risks early and mitigating potential losses before the policy renewal period.

Optimized ESG reporting by creating AI-based regulatory scrapers to automate compliance checks, reducing manual oversight and ensuring seamless real-time adjustments to evolving EU Taxonomy and SFDR regulations.

Implemented Natural Language Processing (NLP) to analyze large volumes of regulatory documents, extracting actionable insights for compliance strategy in real time and reducing analysis time.

Architected predictive models to forecast loan repayment behavior under climate risk scenarios, providing critical insights for credit risk management, and influencing Zurich's investment and lending strategies.

Built and deployed machine learning models to optimize Zurich's sustainable investments, using Random Forests and XGBoost to enhance asset return predictions and deliver better ROI on green bonds and sustainability-focused portfolios.

Pioneered the use of Generative AI to simulate market volatility under ESG frameworks, improving Zurich’s response to regulatory changes and future-proofing the investment strategy.

Collaborated with executive teams and external stakeholders to align sustainability objectives with Zurich's corporate strategies, advancing Zurich’s position as a leader in sustainable finance.

Led the design and development of automated compliance workflows using Python and SQL, cutting report generation time and enabling faster regulatory filings.

Monitored inference latency and model health using custom logging and alerting scripts built with CloudWatch and Prometheus.

Championed the integration of ESG metrics into Zurich's financial models, leveraging ARIMA and time-series forecasting to assess the long-term impacts of sustainability on future revenues and liabilities (see the sketch after this section).

Environment: Python, SQL, Flask, Generative AI, AWS Lambda, Redshift, S3, Docker, Apache Spark, SQL Server, Tableau, Power BI, DAX, Looker, Random Forests, Logistic Regression, Time-Series Forecasting, ARIMA, NLP, Git, Jenkins, CI/CD pipelines, Agile Scrum
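
The ARIMA-based forecasting bullet above is illustrated by the following minimal sketch, which fits a simple model to a synthetic monthly series with statsmodels; the (p, d, q) order, horizon, and data are hypothetical choices for demonstration only.

    # Illustrative ARIMA forecast on a synthetic monthly series.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    idx = pd.date_range("2020-01-01", periods=48, freq="MS")    # four years of months
    rng = np.random.default_rng(1)
    series = pd.Series(50 + 0.3 * np.arange(48) + rng.normal(0, 2, 48), index=idx)

    res = ARIMA(series, order=(1, 1, 1)).fit()    # order chosen purely for illustration
    print(res.forecast(steps=12))                 # 12-month-ahead point forecast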

Client: Edward Jones, St. Louis, MO (Jul 2021 – Sep 2023)

Role: Sr. Data Scientist

Spearheaded the development and deployment of advanced machine learning models for credit risk prediction, leveraging scikit-learn, XGBoost, and TensorFlow to enhance asset recovery strategies, resulting in improved risk classification and predictive accuracy that reduced default rates by 18%.

Designed and implemented multi-stage ETL pipelines in SQL, utilizing complex window functions and subqueries, transforming unstructured loan data into actionable intelligence for downstream machine learning models and real-time decision-making.

Built and optimized end-to-end data pipelines for credit asset management, automating extraction, transformation, and load processes, ensuring seamless integration of multiple data sources from legacy systems into the Azure cloud platform.

Developed and fine-tuned predictive credit scoring models using logistic regression, random forests, and ensemble methods, resulting in increased loan approval accuracy and reduced manual underwriting time.

Created advanced interactive dashboards using Power BI and Tableau with real-time data connectors, allowing senior executives to visualize complex financial risk metrics, delinquency trends, and loan portfolio health, driving data-backed strategic decisions.

Leveraged Hyperopt and GridSearchCV to automate hyperparameter tuning for gradient boosting models, achieving up to 10% higher AUC while keeping models compliant with stringent financial regulations (see the sketch after this section).

Deployed automated model monitoring systems in AWS Lambda to track real-time model drift, performance degradation, and dataset shifts, ensuring models adapted to evolving credit trends without manual intervention.

Integrated real-time credit risk scoring models directly with AWS SQS queues and SNS topics, facilitating asynchronous data processing and accelerating response times for loan origination systems, enabling near-instant loan decisions.

Engineered RESTful APIs using Flask for seamless integration of predictive models into internal applications, enabling dynamic model scoring within business systems, improving user experience and decision velocity.

Architected and maintained CI/CD pipelines in Azure DevOps and Jenkins, incorporating automated model validation and deployment protocols, ensuring scalable model delivery and reliable production updates.

Optimized machine learning workflows by implementing data preprocessing pipelines with Pandas and NumPy, streamlining large data transformations and achieving significant reductions in model training times.

Conducted model explainability and sensitivity analysis using SHAP and LIME, delivering insights into model behavior for non-technical stakeholders and enhancing transparency, ensuring compliance with FCRA and other financial regulations.

Led the creation of a data governance framework, establishing processes for version control, data lineage, and audit trails for financial models, ensuring traceability and compliance with internal audit requirements.

Employed advanced time-series forecasting techniques to model loan repayment behavior using ARIMA and SARIMA models, providing insights into delinquency trends and supporting proactive financial risk mitigation strategies.

Played a pivotal role in Agile scrum teams, contributing to sprint planning, defining user stories for data science tasks, and delivering key milestones, ensuring the timely and high-quality execution of credit risk projects.

Environment: Python, SQL, Flask, Azure DevOps, AWS Lambda, SQS, SNS, Azure, SQL Server, Jenkins, Docker, RESTful APIs, Power BI, Tableau, DAX, Looker, Hyperopt, GridSearchCV, LIME, SHAP, Random Forests, Logistic Regression, ARIMA, SARIMA, Git, Agile Scrum
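
The hyperparameter-tuning bullet above follows the pattern sketched below: an exhaustive grid search with cross-validated AUC scoring via scikit-learn's GridSearchCV. The grid, estimator, and synthetic data are illustrative assumptions, not the production configuration.

    # Illustrative grid search with AUC scoring (hypothetical grid).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

    grid = {"n_estimators": [100, 200],
            "learning_rate": [0.05, 0.1],
            "max_depth": [2, 3]}

    search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                          grid, scoring="roc_auc", cv=5)    # AUC mirrors the metric cited above
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))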

Client: Albertsons, Arlington, TX (Oct 2019 – Jun 2021)

Role: Associate Data Scientist

Built distributed data pipelines using Python and Apache Spark to process large-scale retail data, improving ETL performance and reducing processing delays.

Applied natural language processing (NLP) for entity resolution tasks, improving the accuracy of customer identity linking across disparate data sources.

Created RESTful APIs with Flask and AWS Lambda to expose real-time customer and transaction insights to internal applications (sketched after this section).

Used Amazon SQS and SNS to decouple services and streamline asynchronous communication between microservices.

Developed classification and clustering models using scikit-learn and XGBoost to support targeted marketing and personalized promotions.

Deployed ML models in containers using Docker and orchestrated them with Kubernetes, ensuring consistency across dev and production.

Built CI/CD pipelines using Azure DevOps and Jenkins, automating code deployment and reducing manual errors in MLOps workflows.

Integrated model retraining logic based on data drift indicators, ensuring that predictions remained accurate as data patterns evolved.

Performed extensive feature engineering on transactional, behavioral, and loyalty program datasets to feed supervised models.

Collaborated with backend engineers to embed ML scoring services into business applications with minimal latency.

Designed data validation checks and implemented schema enforcement to prevent corrupt or out-of-spec records from entering production models.

Created internal dashboards with Power BI and Plotly to visualize model outputs and business impact for non-technical teams.

Used Git, branching strategies, and code reviews to maintain code quality and enforce reproducible ML pipelines.

Participated in Agile sprints, focusing on short release cycles for model iterations and integrating continuous feedback into development.

Environment: Python, SQL, Spark, Flask, TensorFlow, XGBoost, scikit-learn, Docker, Kubernetes, AWS (Lambda, SQS, SNS, S3), Jenkins, Azure DevOps, Power BI, Git, NLP, CI/CD, REST APIs
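
The Flask API bullet above is sketched below as a single scoring endpoint; the route, feature names, and model artifact are hypothetical, and the sketch assumes a scikit-learn model serialized with joblib.

    # Illustrative Flask scoring service (route, features, artifact all hypothetical).
    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = joblib.load("churn_model.pkl")    # hypothetical serialized model

    @app.route("/score", methods=["POST"])
    def score():
        payload = request.get_json()
        row = [[payload["recency"], payload["frequency"], payload["monetary"]]]  # hypothetical features
        proba = float(model.predict_proba(row)[0][1])
        return jsonify({"churn_probability": proba})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)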

Client: Juspay Technologies, Bangalore, India (Feb 2017 – Jul 2019)

Role: Data Scientist

Designed and implemented AI-driven predictive models for clinical diagnostics using Python, leveraging TensorFlow and PyTorch to support early disease detection systems.

Deployed machine learning pipelines within real-time environments by integrating with healthcare applications using Docker, enabling seamless inferencing in production.

Configured cloud-native services on AWS (EC2, S3, Lambda) to support scalable model deployment and real-time data ingestion for clinical decision-making tools.

Applied hyperparameter tuning and cross-validation techniques to improve model generalization and reduce overfitting across varied patient datasets.

Utilized TensorBoard for tracking experiment metrics, loss curves, and optimization trends, enabling transparent and reproducible model training workflows.

Engineered automated model retraining pipelines that incorporated versioning and rollback capabilities to support robust model lifecycle management.

Collaborated with data engineering teams to implement ETL pipelines, transforming raw health records into structured formats for downstream modeling.

Created dynamic data transformation and feature extraction modules using Pandas and NumPy, ensuring clean and consistent input across ML workflows.

Developed and containerized microservices for ML models using Flask and Docker, facilitating modular and portable architecture for deployment.

Led initiatives to automate unit testing and model validation processes using custom test harnesses and PyTest, increasing code reliability (see the sketch after this section).

Documented model architecture, assumptions, and evaluation methodologies in technical reports and stakeholder presentations to ensure alignment and transparency.

Partnered with product owners, clinicians, and QA teams to translate business requirements into analytical solutions, enhancing clinical product features.

Adopted Agile methodology to drive iterative development, collaborating across sprints to refine modeling tasks and integrate stakeholder feedback.

Environment: Python, SQL, TensorFlow, PyTorch, Pandas, NumPy, Flask, Scikit-learn, Docker, TensorBoard, Git, PyTest, AWS (EC2, S3, Lambda), Agile, CI/CD
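
The PyTest bullet above is illustrated by the minimal test below, which gates a model on a cross-validated accuracy floor; the estimator, data, and threshold are hypothetical stand-ins for whatever a real validation harness would check.

    # test_model_validation.py -- illustrative PyTest check (threshold hypothetical).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def test_model_meets_accuracy_floor():
        X, y = make_classification(n_samples=500, n_features=8, random_state=0)
        scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
        assert scores.mean() >= 0.80    # hypothetical acceptance threshold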

Client: HighRadius Technologies, Hyderabad, India (Nov 2015 – Jan 2017)

Role: Data Scientist

Architected and deployed deep learning models using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to process complex data structures in high-performance environments.

Utilized TensorFlow and PyTorch to train and fine-tune neural network architectures for enhanced predictive capabilities in classification and sequence-based tasks.

Configured and optimized GPU-based clusters for parallel processing, accelerating deep learning training pipelines and increasing throughput across compute-intensive workloads.

Engineered a custom data sampling framework to support imbalanced datasets and improve model generalization across diverse data distributions (sketched after this section).

Automated model retraining workflows and integrated continuous testing protocols using TensorBoard, enhancing monitoring and experiment tracking for iterative model refinement.

Developed data preprocessing routines in Python to clean, normalize, and transform unstructured datasets, facilitating accurate model input preparation and reducing noise.

Refined high-performance computing (HPC) scheduling strategies, improving task distribution and resource utilization across cluster environments.

Built reusable model evaluation components leveraging Scikit-learn metrics and custom validation strategies to streamline accuracy assessment and performance benchmarking.

Collaborated with data engineering teams to establish end-to-end data pipelines using ETL best practices and batch-processing tools for scalable data ingestion.

Contributed to the deployment of containerized models with Docker, enabling reproducible environments and seamless integration with downstream applications.

Supported efforts to enforce data governance by creating validation scripts to detect anomalies and inconsistencies in large-scale datasets, enhancing overall data integrity.

Documented workflows, model architecture, and experimental results, effectively communicating findings and improvements to cross-functional teams.

Environment: Python, SQL, TensorFlow, PyTorch, Scikit-learn, NumPy, Pandas, TensorBoard, Docker
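
The custom sampling framework mentioned above can be sketched, in its simplest form, as minority-class upsampling with scikit-learn's resample; the class ratio, sizes, and data below are hypothetical.

    # Illustrative minority-class upsampling (synthetic data, hypothetical ratio).
    import numpy as np
    from sklearn.utils import resample

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = (rng.random(1000) < 0.05).astype(int)    # roughly 5% positive class

    X_pos, X_neg = X[y == 1], X[y == 0]
    X_up = resample(X_pos, replace=True, n_samples=len(X_neg), random_state=0)

    X_bal = np.vstack([X_neg, X_up])             # balanced training matrix
    y_bal = np.array([0] * len(X_neg) + [1] * len(X_up))

A class-weighted loss is the usual alternative when duplicating rows is undesirable.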

EDUCATION

Bachelor of Technology (B.Tech) in Information Technology

JNTUH, Hyderabad, Telangana Jan 2011 – Jan 2015


