
Machine Learning Data Scientist

Location:
Edison, NJ
Salary:
60
Posted:
November 05, 2025

Contact this candidate

Resume:

Veda Sai

Contact: 612-***-****

Email: **********@*****.***

LinkedIn: https://www.linkedin.com/in/veda-sai-m-b80804119/

PROFESSIONAL SUMMARY:

AI/ML Engineer and Senior Data Scientist with around 7 years of proven experience designing and deploying end-to-end machine learning and deep learning solutions across healthcare and enterprise domains. Adept at building scalable data pipelines, predictive models, and advanced analytics workflows involving NLP, computer vision, and time-series forecasting. Proficient in Python, SQL, and R, with hands-on expertise in TensorFlow, PyTorch, Scikit-learn, and MLOps tools such as Docker, Kubernetes, MLflow, and DVC. In the past year, led generative AI projects using LLMs (GPT-4, LLaMA 3, Gemini), RAG architectures, Hugging Face Transformers, and LangChain, deployed on Microsoft Azure and AWS with integrated vector search using FAISS and Pinecone. Skilled in delivering explainable, secure, and regulatory-compliant (HIPAA/GDPR) AI solutions. Passionate about solving real-world problems through responsible AI and innovation at scale; adaptable, hardworking, and committed to continually upgrading skills to stay at the forefront of AI/ML advancements.

TECHNICAL SKILLS:

Programming Languages: Python, SQL, PySpark

Machine Learning Frameworks: Scikit-learn, XGBoost, LightGBM

Deep Learning / LLMs: TensorFlow, PyTorch, GPT-3.5, GPT-4, LLaMA 3, Gemini

NLP & Transformers: Hugging Face Transformers, LangChain, BERT, RoBERTa, SentenceTransformers, RAG

Data Engineering & ETL: Pandas, NumPy, SQLAlchemy, Airflow

Cloud Platforms: AWS, Microsoft Azure

MLOps & Deployment: MLflow, DVC, Docker, Kubernetes, Flask, Django

Databases: MySQL, PostgreSQL, MongoDB, ChromaDB, FAISS, Pinecone

Visualization & BI Tools: Power BI, Tableau, Matplotlib, Seaborn, Streamlit

Tools & Platforms: OpenAI API, Git, GitHub

Statistical Analysis: A/B Testing, Hypothesis Testing, SHAP, LIME, EDA

Version Control & Docs: Git, GitHub, Technical Documentation

Soft Skills: Cross-functional Collaboration, Stakeholder Communication, Agile Development

CERTIFICATIONS:

AWS Certified Machine Learning – Specialty

Microsoft Azure AI Engineer Associate

PROFESSIONAL EXPERIENCE:

AI/ML ENGINEER

JOHNSON CONTROLS TX, Remote Sep 2024 – Present

Project: AI-Powered Smart Building Intelligence Platform

Description: Designed and deployed a cloud-native AI platform at Johnson Controls to automate smart building operations, leveraging Gen AI, LLMs, RAG pipelines, and Azure cloud services. The solution enabled real-time fault diagnostics, predictive maintenance scheduling, and contextual decision support for field technicians. Integrated scalable MLOps pipelines, ensured GDPR and ISO 27001 compliance, and achieved measurable improvements in operational efficiency, energy optimization, and system uptime across enterprise building infrastructures.

Designed and deployed Generative AI (Gen AI) and LLM-powered solutions using Hugging Face Transformers and GPT-4 APIs to automate smart building operations, enhance energy efficiency insights, and enable intelligent fault diagnostics.

Built and maintained Retrieval-Augmented Generation (RAG) pipelines using LangChain, FAISS, and Pinecone, improving contextual search precision by 45% across building manuals, sensor logs, and maintenance protocols.
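The retrieval step in a RAG pipeline like this reduces to embedding documents and ranking them by cosine similarity against the query embedding. A minimal sketch with NumPy stands in for FAISS/Pinecone; the vectors and document names are toy values, not production data:

```python
import numpy as np

# Toy 4-dimensional "embeddings" for three maintenance documents
# (a real pipeline would use SentenceTransformers or OpenAI embeddings).
doc_vectors = np.array([
    [0.9, 0.1, 0.0, 0.0],   # HVAC fault manual
    [0.1, 0.8, 0.1, 0.0],   # sensor calibration log
    [0.0, 0.1, 0.9, 0.1],   # fire-safety protocol
])
doc_names = ["hvac_manual", "sensor_log", "fire_protocol"]

def top_k(query_vec, vectors, k=2):
    """Rank documents by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    order = np.argsort(-sims)[:k]
    return [(doc_names[i], float(sims[i])) for i in order]

# A query close to the "HVAC" direction should rank the HVAC manual first.
results = top_k(np.array([1.0, 0.2, 0.0, 0.0]), doc_vectors)
print(results[0][0])  # hvac_manual
```

FAISS performs the same nearest-neighbor ranking at scale with approximate indexes; the cosine math is identical.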

Fine-tuned transformer models (BERT, RoBERTa) and domain-specific Gen AI models using PyTorch to support tasks such as anomaly detection, system state classification, and predictive maintenance, achieving a 22% improvement in predictive maintenance scheduling accuracy.

Used SHAP and LIME to explain predictions from transformer and Gen AI models for anomaly detection and maintenance, helping stakeholders understand model decisions and build trust.
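SHAP and LIME need their own libraries, but the underlying idea of attributing a model's predictions to input features can be illustrated with scikit-learn's permutation importance on synthetic data; all data and feature roles below are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Synthetic "sensor" data: feature 0 drives the anomaly label, feature 1 is noise.
X = rng.normal(size=(400, 2))
y = (X[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Shuffling an informative feature hurts accuracy; shuffling noise does not.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # feature 0 dominates
```

SHAP refines this idea into per-prediction additive attributions, which is what makes individual maintenance alerts explainable to stakeholders.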

Developed and integrated real-time vector similarity search systems using SentenceTransformers and OpenAI Embeddings, supporting fault resolution dashboards and technician decision-making.

Deployed scalable Gen AI, ML, and LLM-based microservices using Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Functions, achieving 99.9% uptime and reducing model inference latency by 35%.

Extended MLOps pipelines with MLflow, DVC, Docker, and Kubernetes to automate model versioning, drift monitoring, retraining, and CI/CD deployment, reducing manual retraining effort by 40%.
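The drift-monitoring piece of such a pipeline often comes down to comparing the live feature distribution against the training window. A minimal sketch using a two-sample Kolmogorov-Smirnov test (the distributions and threshold are illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# Reference window: feature distribution the model was trained on.
train_window = rng.normal(loc=0.0, scale=1.0, size=1000)
# Live window: the same feature after a shift in operating conditions.
live_window = rng.normal(loc=0.8, scale=1.0, size=1000)

# KS test: small p-value means the two samples differ significantly.
stat, p_value = ks_2samp(train_window, live_window)
drift_detected = p_value < 0.01  # retraining trigger threshold (assumed)
print(drift_detected)  # True for a 0.8-sigma mean shift on 1000 samples
```

In production, a detector like this runs per feature on a schedule and raises the retraining job that MLflow/DVC then version.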

Designed and deployed a cloud-native architecture on Azure, leveraging Azure ML and AKS to ensure high availability, seamless scalability, and built-in disaster recovery for critical smart building operations.

Engineered prompt-optimized APIs and fine-tuned Generative AI and LLMs (GPT-3.5, LLaMA 3, Gemini) for AI-powered dashboards and voice/chat-enabled support systems used by field technicians and facility managers.

Conducted rigorous LLM and Generative AI benchmarking and model evaluations to ensure domain alignment, output reliability, and performance optimization in real-time smart infrastructure environments.

Ensured secure data handling and GDPR, ISO 27001 compliance across all AI/ML and Gen AI model workflows, contributing to successful audits and certification renewals.

Achieved a 28% reduction in unplanned downtime across smart building sites through early fault detection and AI-driven maintenance recommendations.

Tech Stack: Azure Machine Learning, Azure Kubernetes Service (AKS), Python, PyTorch, TensorFlow, Hugging Face Transformers, LangChain, FAISS, Pinecone, MLflow, DVC, Docker, Kubernetes, OpenAI API, SentenceTransformers, Git, Streamlit, Power BI.

DATA SCIENTIST WITH AI/ML

TD BANK Morris Plains, NJ Jan 2024 - Aug 2024

Project: AI-Driven Risk Analytics and Customer Intelligence Platform

Description: Led the development of a scalable, cloud-based ML platform at TD Bank to support credit risk scoring, real-time fraud detection, and personalized customer segmentation. Designed automated data pipelines using PySpark and AWS Glue and deployed end-to-end models via SageMaker for low-latency inference. Integrated NLP and image recognition models for KYC verification and customer feedback analysis, contributing to improved onboarding accuracy, reduced fraud losses, and enhanced regulatory compliance.

Designed and developed ETL workflows using Python, PySpark, and AWS Glue to ingest large-scale financial transaction data, KYC documents, and feedback streams, enabling real-time risk analytics and compliance reporting.

Built scalable ML pipelines on AWS SageMaker Pipelines to automate model training, validation, and deployment, reducing deployment time by 30% and improving model operational efficiency by 25%.

Developed credit risk scoring and fraud detection models using Scikit-learn, TensorFlow, and PyTorch, implementing advanced feature engineering techniques to boost model performance.

Deployed models on AWS SageMaker and orchestrated real-time scoring via SageMaker Endpoints and AWS Lambda, achieving a 15% lift in fraud detection recall while maintaining low latency.

Built RESTful APIs using Flask and Django to expose model predictions for loan approvals, client segmentation, and fraud detection across the bank's digital platforms.
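A scoring endpoint of this shape can be sketched with Flask; the model is replaced by a hypothetical rule-based stand-in, and the route and field names are illustrative, not the bank's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a trained fraud model: flag large amounts
# combined with unusual recent activity.
def score_transaction(amount, n_recent):
    return 0.9 if amount > 5000 and n_recent > 3 else 0.1

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()
    risk = score_transaction(payload["amount"], payload["n_recent"])
    return jsonify({"fraud_risk": risk, "flag": risk > 0.5})

# Exercise the endpoint with Flask's built-in test client (no server needed).
client = app.test_client()
resp = client.post("/score", json={"amount": 9000, "n_recent": 5})
print(resp.get_json())
```

In a real deployment the rule is replaced by a serialized model, and the same route pattern serves low-latency predictions behind a WSGI server.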

Conducted exploratory data analysis (EDA), A/B testing, and hypothesis testing to refine risk models and optimize marketing strategies, uncovering customer behavior patterns that improved targeting by 12%.
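An A/B significance check of the kind described can be sketched as a chi-square test on a 2x2 conversion table; the counts below are hypothetical, chosen only to illustrate the mechanics:

```python
from scipy.stats import chi2_contingency

# Hypothetical campaign A/B test: [converted, not_converted] per arm.
control = [120, 880]   # 12.0% conversion
variant = [160, 840]   # 16.0% conversion

# Chi-square test of independence on the contingency table.
chi2, p_value, dof, _ = chi2_contingency([control, variant])
significant = p_value < 0.05
print(significant)  # True: the lift is unlikely to be chance at these sizes
```

The same scaffolding supports hypothesis tests on risk-model refinements before they are promoted.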

Implemented NLP-based deep learning models for customer feedback analysis and automated KYC document verification, reducing manual verification time by 20%.

Deployed fraud detection models with continuous monitoring via SageMaker Model Monitor, enabling proactive alerts and reducing false positives by 12%.

Collaborated with compliance, analytics, and engineering teams to ensure AI solutions aligned with regulatory guidelines (e.g., AML, GDPR, internal risk policies).

Tech Stack: AWS SageMaker, AWS Glue, Python, PySpark, TensorFlow, Scikit-learn, Flask, Django, SQL, PostgreSQL, MongoDB, Power BI, Seaborn, GitHub.

Data Scientist

OTIS Hyderabad, India Aug 2021 – Apr 2023

Project: Scalable AI-Powered Data Intelligence Platform

Description: Designed scalable data pipelines and deployed ML models using Python, PySpark, and cloud platforms (AWS, Azure) to automate analytics and decision-making. Built APIs for real-time model access, optimized database solutions, and applied deep learning techniques in NLP, computer vision, and forecasting to enhance operational insights and business integration.

Designed and developed scalable data pipelines and ETL workflows using Python (Pandas, NumPy, PySpark), optimizing data processing efficiency.

Built and deployed machine learning models using Scikit-learn, TensorFlow, and PyTorch, applying feature engineering, hyperparameter tuning, and validation to improve accuracy.
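The tuning-and-validation loop mentioned here is typically a cross-validated grid search; a minimal sketch on synthetic data (the dataset and parameter grid are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a tabular modeling task.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 5-fold cross-validated search over regularization strength.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern extends to TensorFlow/PyTorch models via their respective tuners, with the cross-validation guarding against overfitting to one split.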

Created and optimized RESTful APIs with Flask/Django to serve ML models and enable seamless integration with business applications.

Developed and maintained SQL and NoSQL database solutions (MySQL, PostgreSQL, MongoDB), optimizing queries for faster data retrieval and analysis.
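Query optimization of this kind usually starts with indexing the filter columns and confirming the plan. A self-contained sketch with SQLite (table and data are toy values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, f"cust{i % 100}", float(i)) for i in range(1000)],
)
# Index the filter column so lookups avoid a full table scan.
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# EXPLAIN QUERY PLAN confirms the index is used instead of SCAN.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(total) FROM orders WHERE customer = ?",
    ("cust7",),
).fetchall()
print(plan)

total = cur.execute(
    "SELECT SUM(total) FROM orders WHERE customer = ?", ("cust7",)
).fetchone()[0]
print(total)
```

The same discipline (index the predicate, verify the plan) carries over to MySQL and PostgreSQL via their own EXPLAIN output.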

Deployed ML models and data pipelines on cloud platforms (AWS, Azure) using Docker, Kubernetes, and MLOps best practices for scalability and automation.

Conducted EDA, statistical analysis, A/B testing, and data visualization using Matplotlib, Seaborn, and Power BI to extract actionable insights.

Implemented deep learning architectures for NLP, computer vision, and time series forecasting, improving automation and decision-making.

Collaborated with cross-functional teams for code reviews, model documentation, and stakeholder presentations, ensuring transparency and alignment with business goals.

Tech Stack: Python (Pandas, NumPy, PySpark), Flask, Django, AWS, Azure, Docker, Kubernetes, TensorFlow, PyTorch, Scikit-learn, SQL (MySQL, PostgreSQL), MongoDB.

PYTHON DEVELOPER

VENTEK SOLUTIONS Hyderabad, India Jun 2018 - Jul 2021

Project: Data-Driven Healthcare and Outcome Analytics.

Description: Developed data pipelines and analytical models to optimize healthcare outcomes and identify cost-saving opportunities. Analyzed claims and operational data using Python and SQL, created interactive dashboards in Tableau/Power BI, and automated reporting workflows, reducing manual efforts by 30% and improving clinical decision-making through real-time insights.

Analyzed healthcare claims and operational data using Python (Pandas, NumPy) and SQL to identify trends and cost-saving opportunities.
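A cost-driver analysis of this kind typically starts with per-provider spend aggregation in Pandas; the frame below is a hypothetical claims extract with illustrative column names:

```python
import pandas as pd

# Hypothetical claims extract; providers and amounts are toy values.
claims = pd.DataFrame({
    "provider": ["A", "A", "B", "B", "B", "C"],
    "paid_amount": [1200.0, 800.0, 450.0, 300.0, 250.0, 5000.0],
})

# Aggregate spend per provider to surface cost-saving targets.
spend = (claims.groupby("provider")["paid_amount"]
               .agg(total="sum", mean="mean", n_claims="count")
               .sort_values("total", ascending=False))
print(spend)
```

Sorting by total spend immediately flags the outlier provider; the same aggregation feeds the Tableau/Power BI dashboards mentioned below.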

Built and maintained ETL pipelines for structured healthcare data, ensuring quality and compliance.

Performed EDA, statistical evaluations, and A/B testing to assess patient outcomes and treatment impact.

Developed dashboards in Tableau and Power BI to track KPIs, cost drivers, and population health metrics.

Automated reporting workflows, reducing manual effort by 30%, and optimized SQL for real-time insights.

Collaborated with clinical and technical teams to deliver actionable insights and strategic recommendations.

Documented processes and maintained version control using Git for transparency and reproducibility.

Tech Stack: Python (Pandas, NumPy), SQL, Tableau, Power BI, Git, AWS, A/B Testing

EDUCATION:

Bachelor's in Aeronautical Engineering from JNTUH, 2018

Master's in ITM from Concordia University, St. Paul, 2024
