Yashaswini Prakash
Data Scientist
*******************@*****.*** +1-442-***-**** LinkedIn
SUMMARY
● Data Scientist with 4+ years of experience delivering end-to-end AI/ML, NLP, and data engineering solutions across finance, SaaS, and enterprise domains.
● Expertise in building supervised/unsupervised ML models, deep learning architectures (CNN, RNN, Transformers), and advanced NLP workflows using spaCy, NLTK, Hugging Face, BERT.
● Skilled in designing and automating ETL/ELT pipelines, data lakehouse architectures, and streaming data systems using Apache Spark, PySpark, Airflow, Kafka, and Snowflake.
● Strong experience deploying production-grade ML systems using SageMaker, Vertex AI, MLflow, FastAPI, Docker, Kubernetes, and CI/CD (GitHub Actions, Jenkins).
● Proficient in SQL optimization, data modeling (Star/Snowflake schema), big data processing, and cloud platforms (AWS, GCP, Azure).
● Adept at creating interactive dashboards and KPI reports using Power BI, Tableau, Looker, Plotly, supporting strategic and operational decision-making.
● Known for strong problem-solving, stakeholder communication, Agile collaboration, and delivering measurable business impact through AI automation and scalable data systems.
SKILLS
Programming Languages: Python, SQL, R, Scala, Java, Bash
Data Analysis & BI: Data Cleaning & Wrangling, Exploratory Data Analysis (EDA), A/B Testing, Statistical Modeling, Power BI, Tableau, Looker, Excel (Advanced: PivotTables, Power Query)
Machine Learning & AI: Supervised & Unsupervised Learning, Deep Learning (CNN, RNN, Transformers), NLP (spaCy, NLTK, Hugging Face, BERT), Feature Engineering, Time-Series Forecasting, Model Evaluation (AUC, Precision/Recall, RMSE)
MLOps & Deployment: MLflow, Weights & Biases (W&B), Amazon SageMaker, Vertex AI, FastAPI, Flask, Dockerized Model Deployment, CI/CD (GitHub Actions, Jenkins), Model Monitoring & Versioning
Data Engineering & Pipelines: ETL/ELT Pipelines, Apache Airflow, Apache Kafka, Apache Spark / PySpark, Data Warehousing, Data Modeling (Star/Snowflake Schema), Data Lakehouse Architecture
Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, SageMaker), GCP (BigQuery, Dataflow, Vertex AI), Azure (Synapse, Data Factory, Blob Storage)
Big Data & Storage: Hadoop Ecosystem, Databricks, Snowflake, Hive, ClickHouse, NoSQL (MongoDB, DynamoDB), SQL Optimization & Query Tuning
Databases: MySQL, PostgreSQL, Snowflake SQL, BigQuery SQL, MongoDB, DynamoDB
DevOps & Automation: Docker, Kubernetes, Terraform (Basic), Git, GitHub, GitLab CI/CD, Linux Shell Scripting
Data Governance & Quality: Data Lineage, Data Validation, Metadata Management, Data Security & Compliance (HIPAA/GDPR)
Visualization & Reporting: KPI Dashboards, Automated Reporting, Interactive Visualizations, Plotly, Matplotlib, Seaborn
Soft Skills: Stakeholder Communication, Problem-Solving, Requirement Gathering, Agile/Scrum, Cross-Functional Collaboration
EXPERIENCE
American Express, USA | Data Scientist | Jan 2024 – Present
● Built and deployed supervised ML models (XGBoost, Random Forest, LightGBM) using Python, SQL, PySpark, improving fraud-risk scoring accuracy and reducing false positives by 15–20%.
● Designed ETL/ELT pipelines with Airflow, Spark, Glue to automate ingestion for multi-terabyte datasets across Snowflake and Redshift, reducing pipeline latency by 30%.
● Developed NLP models using spaCy, Hugging Face Transformers, BERT for intent classification and entity extraction, increasing text-processing automation by 70%.
● Implemented end-to-end MLOps workflows using SageMaker, Vertex AI, MLflow, Jenkins, enabling automated training, versioning, CI/CD, and real-time model monitoring.
● Built streaming pipelines using Kafka + Spark Structured Streaming for real-time customer risk scoring and anomaly detection.
● Created executive-level Power BI/Tableau dashboards to track KPIs, model performance, and customer behavior insights, improving decision-making across teams.
● Optimized SQL queries and Star/Snowflake schema modeling, improving analytics query performance on cloud warehouses by 40%.
● Developed a centralized Data Lakehouse architecture (S3 + Glue Catalog + Redshift Spectrum) improving cross-team data accessibility and governance by 35%.
● Ensured HIPAA/GDPR compliance, data lineage, metadata validation, and data quality monitoring across analytics and ML pipelines.
Accenture Private Limited, India | Data Engineer | July 2020 – Aug 2022
● Designed end-to-end NLP pipelines using spaCy, NLTK, BERT for automated ticket classification & routing, reducing manual triage time by 65%.
● Built scalable PySpark + SQL ETL pipelines for near real-time reporting across distributed datasets, improving refresh cycles by 45%.
● Developed forecasting and anomaly detection models using ARIMA, LSTM, and clustering techniques for IT operations analytics.
● Automated Power BI/Tableau reporting workflows, improving analytics delivery efficiency and reducing manual reporting efforts by 60%.
● Deployed ML models using FastAPI / Flask, Docker, and GitHub Actions-based CI/CD, ensuring low-latency inference in production.
● Implemented data quality rules, metadata checks, and data governance workflows to enhance reliability and lineage tracking.
● Collaborated with cross-functional teams to gather requirements and deliver ML solutions aligned with business KPIs.
Agimus Technologies, India | Data Analyst Intern | Jan 2020 – June 2020
● Supported development of classification and regression models using Python, pandas, and scikit-learn, improving baseline accuracy by 15–18%.
● Performed EDA, feature engineering, and data cleaning to prepare datasets for senior data scientists and analytics teams.
● Automated pipelines and reporting tasks using SQL, Python scripts, and Excel, reducing manual workflow time by 40%.
● Developed Power BI/Tableau dashboards for internal KPI tracking and performance monitoring.
● Gained hands-on production experience by deploying micro-models using FastAPI and Docker, collaborating closely with engineering teams.
EDUCATION
Master's in Computer Science,
California State University, San Marcos, USA.
Bachelor's in Computer Science,
GSSS Institute of Engineering & Technology for Women, Mysore, India.
CERTIFICATIONS
● AWS Certified Machine Learning – Specialty (2025)
● Generative AI Leader Professional Certificate
ACADEMIC PROJECTS
Predictive Modeling for Liver Disease Diagnosis
Tools & Tech: Python, Scikit-learn, Pandas, Matplotlib, Random Forest, Logistic Regression
● Built a machine learning model to predict liver disease using structured patient health records.
● Performed data cleaning, feature engineering, and exploratory data analysis (EDA) to identify key predictors such as ALT/AST enzyme levels, BMI, and alcohol consumption.
● Trained multiple models including Logistic Regression and Random Forest, optimizing with GridSearchCV and cross-validation.
● Achieved 85% accuracy, demonstrating strong predictive power in early-stage liver disease detection.
● Visualized model insights using Matplotlib and Seaborn, supporting potential clinical decision-making workflows.
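The tuning step described above can be sketched as follows; synthetic data stands in for the patient records, which are not public, and the parameter grid is an illustrative assumption rather than the project's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the structured patient health records (not public).
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grid search with 5-fold cross-validation over a small, illustrative
# Random Forest parameter grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(round(grid.score(X_test, y_test), 2))
```

The same pattern extends to Logistic Regression by swapping the estimator and grid; GridSearchCV refits the best configuration on the full training split automatically.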
● Highlighted the impact of predictive analytics in healthcare by aiding early diagnosis and risk assessment.
Covid-19 Vaccine Provenance Tracking using IoT
Tools & Tech: Python, MQTT, Raspberry Pi, RFID, IoT Sensors, Dashboard (Custom UI)
● Designed and implemented an IoT-based tracking system for Covid-19 vaccines to ensure cold chain compliance across the supply chain.
● Integrated temperature, GPS, and RFID sensors using Raspberry Pi and MQTT protocol to transmit real-time handling data.
● Developed threshold-based alerting mechanisms and a custom dashboard to monitor shipment history and environmental conditions.
● Ensured data integrity and traceability across the supply chain, enhancing transparency and trust in vaccine logistics.
● Demonstrated practical application of IoT in healthcare logistics, addressing critical issues in vaccine handling and verification.
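The threshold-based alerting mechanism described above can be sketched in plain Python; the 2–8 °C cold-chain band and the reading format are illustrative assumptions, not taken from the project:

```python
# Minimal cold-chain alerting sketch. The 2-8 °C safe band and the shape
# of a sensor reading are illustrative assumptions.
SAFE_RANGE_C = (2.0, 8.0)

def check_reading(reading):
    """Return an alert dict if a reading breaches the safe band, else None."""
    temp = reading["temperature_c"]
    low, high = SAFE_RANGE_C
    if temp < low or temp > high:
        return {
            "shipment_id": reading["shipment_id"],
            "temperature_c": temp,
            "breach": "low" if temp < low else "high",
        }
    return None

readings = [
    {"shipment_id": "VX-001", "temperature_c": 5.1},
    {"shipment_id": "VX-002", "temperature_c": 9.4},
]
alerts = [a for a in (check_reading(r) for r in readings) if a]
print(alerts)  # one "high" breach, for VX-002
```

In the deployed system this check would run on each message arriving over MQTT, with breaches pushed to the dashboard and logged for shipment-history traceability.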