SOMYA KHATRI
New York 347-***-**** ***********@***.*** LinkedIn
EDUCATION
New York University New York, NY
Master of Science in Information Systems May 2025
Vellore Institute of Technology Vellore, India
Bachelor's in Information Technology July 2023
PROFESSIONAL EXPERIENCE
Data Analyst Researcher June 2024 – Present
Stern Learning Science Lab, NYU New York, NY
• Developed an AI-powered analytics copilot using Streamlit, GPT-4o, Pandas AI, and LangChain, reducing manual data wrangling by 80% and enabling 5x faster insight generation with interactive visualizations.
• Led ad-hoc analysis of learner behavior and engagement trends using SQL, Pandas, and Alteryx, driving a 30% improvement in research reporting speed and enhancing experimental design strategies.
• Crafted and deployed interactive dashboards (Streamlit, Tableau) and partnered cross-functionally with researchers to translate data insights into a 20% increase in program efficiency and impact.
MetaML - Capstone Project June 2024 – August 2024
Archemy Inc. New York, NY
• Engineered a large-scale query parsing and prompt mapping system using LLMs, increasing retrieval accuracy by 92% and reducing client search time by 85%.
• Implemented embeddings across 300+ technical and business nodes, enhancing metadata retrieval via ChromaDB and improving insight delivery speed.
• Secured knowledge base access through ArcNav API integration, ensuring regulatory-compliant, controlled data retrieval for all client interactions.
Analyst - DTAC January 2023 – July 2023
KPMG India Mumbai, India
• Created interactive Tableau dashboards for audit and risk assessments, cutting manual effort by 40% and enhancing stakeholder visibility.
• Automated compliance reporting and data extraction using Python, SQL, and Excel, improving reporting accuracy by 25% and turnaround time by 60%.
• Led end-to-end delivery of GITC audits, IT application controls, and report testing, ensuring regulatory compliance across stakeholder systems.
SKILLS AND CERTIFICATIONS
Functional Competencies: Data Analysis, Business Intelligence, Data Warehousing, Data Visualization, Statistical Analysis, Experimentation Design, Workflow Automation.
Data Tools: Tableau, Power BI, Looker, Matplotlib, Alteryx Designer, Excel (PivotTables, VBA macros), Jira, Salesforce.
Programming & Cloud: Python (pandas, numpy, matplotlib, seaborn), R (basic), Advanced SQL (Joins, CTEs, Window Functions), PostgreSQL, MySQL, BigQuery, AWS (S3, Redshift), GCP (BigQuery), Hadoop (familiarity).
Certifications & Training: Google Data Analytics Professional Certificate [Link], Tableau Desktop Specialist (In Progress), AWS Certified Data Analytics – Specialty (In Progress), Alteryx Foundation Micro-Credential [Link].
PROJECTS AND PUBLICATIONS
End-to-End Analytics on AWS AWS S3, AWS Glue, AWS Athena, AWS QuickSight, SQL [Link]
• Architected a dataset-agnostic data pipeline leveraging AWS S3, Glue, Athena, and QuickSight, enabling scalable analytics on 50K+ Spotify records through reusable, production-grade architecture.
• Engineered ETL workflows in AWS Glue to clean, transform, and model raw data into a star schema, improving query efficiency in Athena by 3x and enhancing data usability for BI tasks.
• Developed interactive dashboards in AWS QuickSight to analyze audio features, artist popularity, and genre trends, driving stakeholder insights and strategic decision-making.
Customer Churn/Retention Analysis – Telecom Case Study SQL, Power BI, Python [Link]
• Analyzed a dataset of 7,043 simulated streaming subscribers, uncovering that monthly plan users had a 3x higher churn rate than annual subscribers.
• Built a logistic regression model with 82% accuracy to predict churn risk, and developed interactive Power BI dashboards to visualize churn patterns, highlight at-risk cohorts, and simulate retention strategies.
• Presented strategic recommendations to stakeholders, demonstrating that a 15% churn reduction could drive a projected $300K annual revenue increase based on ARPU analysis.