Agam Saraswat
MA, USA +1-508-***-**** ***************@*****.*** LinkedIn
SUMMARY
Data Scientist and Data Analyst with 4 years of experience in building scalable data solutions, analyzing large datasets, and delivering machine learning–driven insights. Skilled in Python, R, SQL, and libraries such as Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch, and XGBoost for analytics, forecasting, and modeling. Proficient in creating predictive models for classification, regression, clustering, NLP, and time-series forecasting. Hands-on with data warehouse and lakehouse solutions using Snowflake, Delta Lake, and Azure Synapse, with performance tuning and query optimization. Adept at designing dashboards in Power BI, Tableau, and Matplotlib to track KPIs and support decision-making. Collaborative team player experienced in Agile, MLOps, and cloud platforms (AWS, Azure).
SKILLS
● Programming Languages & Core Libraries: Python, SQL, R, Pandas, NumPy, SciPy, Statsmodels, PySpark, Data Wrangling, API Integration.
● Machine Learning & Deep Learning: Scikit-learn, XGBoost, LightGBM, TensorFlow/Keras (primary), PyTorch (familiar), CNN/RNN-LSTM, Model Selection, Cross-Validation, AUC/ROC, F1, RMSE.
● AI & Advanced ML Applications: Transfer Learning, AI-powered Recommendation Systems, Retrieval-Augmented Generation (RAG), Conversational AI with LangChain/Rasa.
● Data Visualization & BI Tools: Matplotlib, Seaborn, Plotly, Power BI, Tableau, Looker (LookML – Looker Modeling Language), Excel (Pivot Tables, Power Query).
● NLP & Text Analytics: NLTK, Hugging Face (inference/pipelines), Topic Modeling (LDA), Named Entity Recognition (NER), Text Summarization, Sentiment Analysis.
● Data Engineering & ETL Pipelines: Apache Spark, PySpark, Apache Airflow, AWS Glue, Azure Data Factory, GCP Dataproc, dbt.
● Cloud Platforms:
AWS: S3, Redshift, Lambda, Glue, SageMaker
Azure: Azure Data Factory (ADF), Synapse Analytics, Databricks, Blob Storage, Azure Monitor
GCP: BigQuery, Dataproc, Dataplex, Cloud Storage, Vertex AI.
● Databases & Data Warehousing: PostgreSQL, MySQL, SQL Server, MongoDB, Snowflake.
● ML-Ops & Deployment: MLflow (experiments & model registry), FastAPI/Flask, Streamlit, REST APIs, Model Versioning & Experiment Tracking.
● DevOps & CI/CD for ML: Docker, GitHub Actions, Jenkins, GitLab CI/CD, Git-Based Workflows.
● Data Governance & Compliance: GDPR, HIPAA, Data Quality Checks, Great Expectations, PII Masking; RBAC, Data Lineage.
● Statistical Analysis & Forecasting: Regression (Linear, Logistic), Hypothesis Testing, ANOVA, Chi-Square, A/B Testing, Time Series Forecasting (ARIMA, Prophet), Clustering (K-Means, DBSCAN), PCA.
● Version Control & Collaboration: Git, GitHub, GitLab, Bitbucket, Pull Requests, Code Reviews, Notion, JIRA, API.
PROFESSIONAL EXPERIENCE
Data Scientist Humana USA Jan 2025 – Present
• Assisted in designing and deploying machine learning models (Logistic Regression, Random Forest, XGBoost, CNNs, RNNs) for predictive analytics tasks including readmission risk prediction, disease progression forecasting, clinical decision support, and population health risk stratification under senior data scientist guidance.
• Supported the processing and integration of EHR/EMR datasets from Epic, Cerner, and Allscripts with claims, HL7/FHIR feeds, C-CDA documents, ICD-10 diagnosis codes, and SNOMED CT terminologies to improve data interoperability and align with CMS quality reporting standards (HEDIS, MACRA, MIPS, PQRS).
• Assisted in developing Natural Language Processing (NLP) pipelines using spaCy, Hugging Face Transformers, and BERT-based models for clinical note mining, symptom extraction, medication entity recognition, and adverse event detection from unstructured provider documentation.
• Participated in feature engineering, model interpretability analysis using SHAP and LIME, and AUC/F1-score optimization to enhance model accuracy and clinician trust.
• Assisted in implementing real-time streaming analytics of patient vitals and IoT-connected medical devices using Apache Kafka, AWS Kinesis, and MQTT.
• Supported compliance activities for HIPAA, HITECH, and GDPR by applying PHI de-identification, tokenization, encryption in transit and at rest (AES-256, TLS/SSL), and role-based access control (RBAC).
• Participated in A/B testing, cohort analysis, and survival analysis to evaluate treatment effectiveness, patient engagement platforms, and telemedicine adoption impact.
• Collaborated with clinicians, healthcare administrators, and data engineering teams in Agile/Scrum environments using Jira, Confluence, and Miro for sprint planning, backlog grooming, and clinical workflow mapping.
• Helped optimize Snowflake queries, partitioning strategies, and materialized views for high-performance analytical workloads, reducing BI dashboard load times by 45%.
Data Scientist R-Financial India Sep 2021 – Mar 2023
• Developed and deployed ML models for customer churn prediction, revenue forecasting, and customer segmentation using TensorFlow, Scikit-learn, and XGBoost, improving retention and revenue KPIs by 20%.
• Engineered statistical, behavioral, and NLP features from transactional data, unstructured text (emails, chat logs), and user activity patterns to improve model accuracy across multiple customer personas.
• Designed Apache Airflow DAGs for automated ETL, feature engineering, model training, evaluation, and inference workflows, with MLflow for experiment tracking and model versioning.
• Integrated real-time prediction services into client applications via containerized ML microservices (Flask, FastAPI, Docker, Kubernetes) for seamless inference at scale.
• Built data pipelines to ingest and process structured (SQL, CSV, Parquet) and unstructured (text, JSON) data from Snowflake, PostgreSQL, and API feeds, ensuring low-latency data delivery.
• Collaborated with BI teams to design interactive Power BI dashboards for visualizing prediction insights, churn risk tiers, and ROI contributions to business leaders.
• Optimized SQL queries, indexing strategies, and schema designs in Snowflake and PostgreSQL, reducing report generation time by 40% and improving data retrieval reliability.
• Implemented CI/CD pipelines for ML deployments using GitHub Actions, Docker, and AWS ECS, ensuring rapid, reliable rollouts of updated models.
• Ensured data security and compliance by applying encryption at rest/in transit, IAM-based access controls, and GDPR-compliant anonymization practices.
Data Analyst Hexaware India Aug 2018 – Jan 2020
• Assisted in managing a cross-domain analytics platform leveraging Azure Synapse Analytics, AWS Redshift, and Snowflake for unified querying of structured, semi-structured, and streaming data across multiple business units.
• Supported the development of scalable ETL/ELT pipelines using Azure Data Factory, AWS Glue, and PySpark, integrating diverse sources such as healthcare EMRs (FHIR/HL7), payment transactions, IoT sensor telemetry, and eCommerce clickstream logs.
• Assisted in developing ETL and streaming pipelines using Apache Spark, Kafka, Airflow, AWS Glue, and Azure Data Factory for efficient data ingestion and transformation.
• Contributed to implementing real-time streaming analytics using AWS Kinesis Data Streams and Azure Event Hubs, enabling live fraud detection in financial transactions, patient vitals monitoring, and energy grid anomaly detection.
• Assisted in building predictive models with Azure Machine Learning and AWS SageMaker for churn prediction, personalized product recommendations, risk scoring, and demand forecasting.
• Helped design and maintain interactive Power BI dashboards connected to Azure Analysis Services and Amazon QuickSight, providing role-based KPIs to executives, clinicians, operations teams, and data-driven marketing units.
• Participated in orchestrating cloud workloads with AWS Step Functions and Azure Logic Apps, ensuring integration between on-prem, Azure, and AWS environments; supported compliance and governance using Azure Purview, AWS Lake Formation, encryption (KMS/Key Vault), and access controls in alignment with GDPR, HIPAA, and PCI-DSS.
• Contributed to optimizing performance and cost with serverless compute (AWS Lambda, Azure Functions), query tuning, and autoscaling clusters, reducing data processing costs by 30% and improving SLA adherence.
PROJECTS HIGHLIGHTS
● Image Classification with CNN
Used TensorFlow/Keras to classify CIFAR-10 dataset images; applied Convolutional Neural Networks and data augmentation techniques; improved accuracy with transfer learning (ResNet, VGG16).
● Predictive Housing Price Model
Analyzed real estate data using EDA, feature engineering, and regression models; implemented Gradient Boosting and Random Forest for better predictions; deployed the model using a Flask API for real-time predictions.
EDUCATION
Master of Science in Data Analytics Clark University, MA, USA May 2025
Bachelor of Science in Physics, Chemistry and Mathematics H.N.B.G.U, India Sep 2018