Yashaswini Prakash
Data Scientist
*******************@*****.*** +1-442-***-**** LinkedIn
SUMMARY
● Data Scientist with 4+ years of experience delivering end-to-end AI/ML, NLP, and data engineering solutions across finance, SaaS, and enterprise domains.
● Expertise in building supervised/unsupervised ML models, deep learning architectures (CNN, RNN, Transformers), and advanced NLP workflows using spaCy, NLTK, Hugging Face, BERT.
● Skilled in designing and automating ETL/ELT pipelines, data lakehouse architectures, and streaming data systems using Apache Spark, PySpark, Airflow, Kafka, and Snowflake.
● Strong experience deploying production-grade ML systems using SageMaker, Vertex AI, MLflow, FastAPI, Docker, Kubernetes, and CI/CD (GitHub Actions, Jenkins).
● Proficient in SQL optimization, data modeling (Star/Snowflake schema), big data processing, and cloud platforms (AWS, GCP, Azure).
● Adept at creating interactive dashboards and KPI reports using Power BI, Tableau, Looker, Plotly, supporting strategic and operational decision-making.
● Known for strong problem-solving, stakeholder communication, Agile collaboration, and delivering measurable business impact through AI automation and scalable data systems.
SKILLS
Programming Languages: Python, SQL, R, Scala, Java, Bash
Data Analysis & BI: Data Cleaning & Wrangling, Exploratory Data Analysis (EDA), A/B Testing, Statistical Modeling, Power BI, Tableau, Looker, Excel (Advanced: PivotTables, Power Query)
Machine Learning & AI: Supervised & Unsupervised Learning, Deep Learning (CNN, RNN, Transformers), NLP (spaCy, NLTK, Hugging Face, BERT), Feature Engineering, Time-Series Forecasting, Model Evaluation (AUC, Precision/Recall, RMSE)
MLOps & Deployment: MLflow, Weights & Biases (W&B), Amazon SageMaker, Vertex AI, FastAPI, Flask, Dockerized Model Deployment, CI/CD (GitHub Actions, Jenkins), Model Monitoring & Versioning
Data Engineering & Pipelines: ETL/ELT Pipelines, Apache Airflow, Apache Kafka, Apache Spark / PySpark, Data Warehousing, Data Modeling (Star/Snowflake Schema), Data Lakehouse Architecture
Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, SageMaker), GCP (BigQuery, Dataflow, Vertex AI), Azure (Synapse, Data Factory, Blob Storage)
Big Data & Storage: Hadoop Ecosystem, Databricks, Snowflake, Hive, ClickHouse, NoSQL (MongoDB, DynamoDB), SQL Optimization & Query Tuning
Databases: MySQL, PostgreSQL, Snowflake SQL, BigQuery SQL, MongoDB, DynamoDB
DevOps & Automation: Docker, Kubernetes, Terraform (Basic), Git, GitHub, GitLab CI/CD, Linux Shell Scripting
Data Governance & Quality: Data Lineage, Data Validation, Metadata Management, Data Security & Compliance (HIPAA/GDPR)
Visualization & Reporting: KPI Dashboards, Automated Reporting, Interactive Visualizations, Plotly, Matplotlib, Seaborn
Soft Skills: Stakeholder Communication, Problem-Solving, Requirement Gathering, Agile/Scrum, Cross-Functional Collaboration
EXPERIENCE
American Express, USA | Data Scientist | Jan 2024 – Present
● Built and deployed supervised ML models (XGBoost, Random Forest, LightGBM) using Python, SQL, PySpark, improving fraud-risk scoring accuracy and reducing false positives by 15–20%.
● Designed ETL/ELT pipelines with Airflow, Spark, Glue to automate ingestion for multi-terabyte datasets across Snowflake and Redshift, reducing pipeline latency by 30%.
● Developed NLP models using spaCy, Hugging Face Transformers, BERT for intent classification and entity extraction, increasing text-processing automation by 70%.
● Implemented end-to-end MLOps workflows using SageMaker, Vertex AI, MLflow, Jenkins, enabling automated training, versioning, CI/CD, and real-time model monitoring.
● Built streaming pipelines using Kafka + Spark Structured Streaming for real-time customer risk scoring and anomaly detection.
● Created executive-level Power BI/Tableau dashboards to track KPIs, model performance, and customer behavior insights, improving decision-making across teams.
● Optimized SQL queries and Star/Snowflake schema modeling, improving analytics query performance on cloud warehouses by 40%.
● Developed a centralized Data Lakehouse architecture (S3 + Glue Catalog + Redshift Spectrum) improving cross-team data accessibility and governance by 35%.
● Ensured HIPAA/GDPR compliance, data lineage, metadata validation, and data quality monitoring across analytics and ML pipelines.
Accenture Private Limited, India | Data Engineer | July 2020 – Aug 2022
● Designed end-to-end NLP pipelines using spaCy, NLTK, BERT for automated ticket classification & routing, reducing manual triage time by 65%.
● Built scalable PySpark + SQL ETL pipelines for near real-time reporting across distributed datasets, improving refresh cycles by 45%.
● Developed forecasting and anomaly detection models using ARIMA, LSTM, and clustering techniques for IT operations analytics.
● Automated Power BI/Tableau reporting workflows, improving analytics delivery efficiency and reducing manual reporting efforts by 60%.
● Deployed ML models using FastAPI / Flask, Docker, and GitHub Actions-based CI/CD, ensuring low-latency inference in production.
● Implemented data quality rules, metadata checks, and data governance workflows to enhance reliability and lineage tracking.
● Collaborated with cross-functional teams to gather requirements and deliver ML solutions aligned with business KPIs.
Agimus Technologies, India | Data Analyst Intern | Jan 2020 – June 2020
● Supported development of classification and regression models using Python, pandas, and scikit-learn, improving baseline accuracy by 15–18%.
● Performed EDA, feature engineering, and data cleaning to prepare datasets for senior data scientists and analytics teams.
● Automated pipelines and reporting tasks using SQL, Python scripts, and Excel, reducing manual workflow time by 40%.
● Developed Power BI/Tableau dashboards for internal KPI tracking and performance monitoring.
● Gained hands-on production experience by deploying micro-models using FastAPI and Docker, collaborating closely with engineering teams.
EDUCATION
Master's in Computer Science,
California State University, San Marcos, USA.
Bachelor's in Computer Science,
GSSS Institute of Engineering & Technology for Women, Mysore, India.
CERTIFICATIONS
● AWS Certified Machine Learning – Specialty (2025)
● Generative AI Leader Professional Certificate
ACADEMIC PROJECTS
Predictive Modeling for Liver Disease Diagnosis
Tools & Tech: Python, Scikit-learn, Pandas, Matplotlib, Random Forest, Logistic Regression
● Built a machine learning model to predict liver disease using structured patient health records.
● Performed data cleaning, feature engineering, and exploratory data analysis (EDA) to identify key predictors such as ALT/AST enzyme levels, BMI, and alcohol consumption.
● Trained multiple models including Logistic Regression and Random Forest, optimizing with GridSearchCV and cross-validation.
● Achieved 85% accuracy, demonstrating strong predictive power in early-stage liver disease detection.
● Visualized model insights using Matplotlib and Seaborn, supporting potential clinical decision-making workflows.
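The tuning step described above can be sketched as follows; synthetic data stands in for the patient records, which are not public, and the parameter grid is an illustrative assumption rather than the project's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the structured patient health records (not public).
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grid search with 5-fold cross-validation over a small, illustrative
# Random Forest parameter grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(round(grid.score(X_test, y_test), 2))
```

The same pattern extends to Logistic Regression by swapping the estimator and grid; GridSearchCV refits the best configuration on the full training split automatically.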
● Highlighted the impact of predictive analytics in healthcare by aiding early diagnosis and risk assessment.
Covid-19 Vaccine Provenance Tracking using IoT
Tools & Tech: Python, MQTT, Raspberry Pi, RFID, IoT Sensors, Dashboard (Custom UI)
● Designed and implemented an IoT-based tracking system for Covid-19 vaccines to ensure cold chain compliance across the supply chain.
● Integrated temperature, GPS, and RFID sensors using Raspberry Pi and MQTT protocol to transmit real-time handling data.
● Developed threshold-based alerting mechanisms and a custom dashboard to monitor shipment history and environmental conditions.
● Ensured data integrity and traceability across the supply chain, enhancing transparency and trust in vaccine logistics.
● Demonstrated practical application of IoT in healthcare logistics, addressing critical issues in vaccine handling and verification.
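The threshold-based alerting mechanism described above can be sketched in plain Python; the 2–8 °C cold-chain band and the reading format are illustrative assumptions, not taken from the project:

```python
# Minimal cold-chain alerting sketch. The 2-8 °C safe band and the shape
# of a sensor reading are illustrative assumptions.
SAFE_RANGE_C = (2.0, 8.0)

def check_reading(reading):
    """Return an alert dict if a reading breaches the safe band, else None."""
    temp = reading["temperature_c"]
    low, high = SAFE_RANGE_C
    if temp < low or temp > high:
        return {
            "shipment_id": reading["shipment_id"],
            "temperature_c": temp,
            "breach": "low" if temp < low else "high",
        }
    return None

readings = [
    {"shipment_id": "VX-001", "temperature_c": 5.1},
    {"shipment_id": "VX-002", "temperature_c": 9.4},
]
alerts = [a for a in (check_reading(r) for r in readings) if a]
print(alerts)  # one "high" breach, for VX-002
```

In the deployed system this check would run on each message arriving over MQTT, with breaches pushed to the dashboard and logged for shipment-history traceability.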