Data Scientist

Location:

Jersey City, NJ, 07307

Posted:

April 19, 2026

Resume:

NIPA SHAH

Data Scientist AI / ML Engineer

Jersey City, NJ • +1-551-***-**** • ************@*****.*** • LinkedIn • GitHub: nipa-analytics

PROFESSIONAL SUMMARY

Results-driven Data Scientist and AI/ML Engineer with 7+ years of experience designing end-to-end machine learning pipelines, deploying production-grade models, and translating complex data into strategic business insights. Proven expertise in NLP, LLMs (BERT, GPT, HuggingFace), healthcare analytics (MIMIC-IV EHR data), and cloud-based MLOps (AWS SageMaker, Docker, MLflow). Skilled in building scalable ETL workflows, real-time dashboards, and AI-powered solutions using Python, SQL, and leading ML frameworks. Adept at collaborating cross-functionally with product, engineering, and leadership teams. Seeking a Data Scientist role in the U.S. where I can drive measurable impact through advanced analytics and AI innovation.

TECHNICAL SKILLS

Languages: Python, R, SQL, Jupyter Notebook, Git / GitHub

ML / AI: scikit-learn, XGBoost, LightGBM, TensorFlow, Keras, BERT, GPT, Transformers (HuggingFace)

LLM / GenAI: LangChain, Prompt Engineering, Embedding Search, Fine-Tuning LLMs (OpenAI), VectorDB, Pinecone, FAISS

MLOps: Flask, FastAPI, AWS SageMaker, Docker, MLflow, CI/CD Pipelines, Model Monitoring & Drift Detection

Cloud & Big Data: AWS (S3, EC2, SageMaker), Azure (ADF), Hadoop, Spark, Hive, Pig

Databases: PostgreSQL, MS SQL, Oracle, Snowflake, BigQuery, Redshift, Aurora

ETL / Integration: Airflow, Dataiku, Alteryx, Informatica, Pentaho, Knime

Visualization: Tableau, Power BI, Looker, QlikView, Domo, Excel (Advanced), ggplot

Healthcare: EHR Data Modeling, MIMIC-IV, Patient-Level Feature Engineering, Risk Modeling, Cohort Analysis, Population Health Analytics

Statistics: A/B Testing, Hypothesis Testing, Regression, Clustering, NLP, Bayesian Analysis

CERTIFICATIONS

Databricks – Generative AI Fundamentals • AWS – Cloud Practitioner (CLF-C02) • Google – Looker Studio for Dashboards • Python Essential Training • Data Science Foundations • Generative AI Fundamentals • MySQL Essential Training • Supply Chain Basics

PROFESSIONAL EXPERIENCE

Data Scientist AdvanceInnovative LLC – New Jersey June 2024 – Present

• Collected, cleaned, and preprocessed structured and unstructured datasets using Python (Pandas, NumPy) and SQL across healthcare, retail, and business domains, enabling efficient downstream ML modeling.

• Built and deployed classification, regression, and clustering models using scikit-learn, XGBoost, and LightGBM; improved prediction accuracy by up to 25%.

• Developed end-to-end NLP pipelines using BERT and GPT models (HuggingFace Transformers) for sentiment analysis, document summarization, and named entity extraction.

• Deployed ML models via Flask APIs and AWS SageMaker; integrated Docker containers and MLflow for reproducible, scalable production pipelines.

• Designed and maintained Airflow-orchestrated ETL workflows for real-time model inputs and consistent data delivery across platforms.

• Tracked post-deployment model performance using drift detection frameworks; retrained models proactively to maintain long-term business accuracy.

• Performed EDA and statistical testing (A/B testing, hypothesis testing) to uncover patterns; visualized insights in Power BI and Seaborn for executive stakeholders.

• Collaborated cross-functionally with product managers, analysts, and engineers to translate business objectives into data-driven solutions.

Healthcare Analytics Projects (MIMIC-IV EHR)

Advanced clinical ML projects using real-world ICU/EHR data (MIMIC-IV Clinical Database), demonstrating production-grade healthcare data science capabilities sought by health-tech, pharma, and hospital systems.

Patient Segmentation & Retention Analytics GitHub 2024–2025

• Engineered patient-level features from MIMIC-IV admissions, diagnoses, and ICU tables using SQL-style joins to capture clinical complexity, care intensity, and longitudinal engagement patterns.

• Applied Python-based preprocessing (feature scaling, encoding) and K-Means clustering to identify distinct patient cohorts supporting care management and population health programs.

• Generated actionable insights for patient retention strategy, cohort-level risk analysis, and hospital resource optimization — mirroring real-world payer and provider analytics workflows.

• Addressed real clinical data challenges: sparse records, high-dimensional categorical features, and irregular longitudinal histories.

Stack: Python · Pandas · Scikit-learn · SQL · Jupyter Notebook · K-Means Clustering

30-Day Hospital Readmission Risk Prediction & Care Prioritization GitHub 2024–2025

• Designed a readmission risk prediction model using patient-level features derived from ICU stays, diagnoses, demographics, and admission history via SQL-style feature engineering in Python.

• Built a complementary patient segmentation framework using MiniBatchKMeans to identify high-risk cohorts and prioritize proactive care interventions.

• Applied a full ML pipeline: missing value handling, categorical encoding, feature scaling, model training, and performance evaluation (AUC-ROC, precision-recall).

• Delivered insights supporting care coordination, hospital quality improvement (HCAHPS/CMS metrics), and readmission penalty reduction strategies aligned with U.S. healthcare regulations.

Stack: Python · Pandas · NumPy · Scikit-learn · SQL · MiniBatchKMeans · Jupyter Notebook

ICU Deterioration Risk Prediction GitHub 2025

• Built a machine learning model to predict early deterioration of ICU patients using MIMIC-IV vitals, lab results, and clinical observations.

• Engineered time-series features capturing physiological trends, early warning score proxies, and clinical event sequences.

• Evaluated model performance using clinical validation metrics (AUROC, sensitivity/specificity) relevant to clinical decision support systems.

Stack: Python · Pandas · Scikit-learn · MIMIC-IV · Jupyter Notebook

Data Analyst Code-Criteria Labs May 2018 – December 2023

• Owned end-to-end revenue data curation via ETL pipelines; automated daily distribution of revenue reports organization-wide, saving 500+ hours annually.

• Conducted Point of Sale (POS) analysis on major retailers (Amazon, Walmart, SharkNinja) to identify category trends in actualized and forecasted sales data.

• Built and maintained forecasting models; performed price audits and negative inventory audits to reduce dynamic pricing and inventory errors.

• Migrated data from legacy systems to Snowflake data warehouse for international regions, improving query performance and data governance.

• Developed KPI dashboards in Domo and QlikView for leadership and cross-functional teams using automated datasets; reported monthly revenue numbers across international regions.

• Performed gap analysis, root cause analysis, and data mining for cross-functional stakeholders; gathered business requirements and authored technical documentation.

• Conducted UAT approvals with IT for data governance systems; mentored new analysts on processes and systems.

Jr. Data Analyst – Intern TATA Motors Ltd. – Ahmedabad, India June 2017 – August 2017

• Leveraged PMG catalog suite to automate business processes across R&D teams.

• Developed SQL reports for stakeholders and calculated KPIs for performance analysis.

EDUCATION

Master of Science, Business Analytics GPA: 3.75 / 4.00 Sacred Heart University – Fairfield, CT May 2025

VOLUNTEER & TEACHING

Graduate Teaching Assistant (Volunteer) Sacred Heart University – Applied Statistics Jan 2025 – May 2025

• Mentored 20+ graduate students on statistical assignments, data interpretation, and structured problem-solving techniques.

• Conducted academic guidance sessions on applied statistics, analytical methods, and coursework requirements; managed midterm/final project workflows.

• Served as liaison between faculty and students to ensure smooth communication and academic delivery.
