Richik Ghosh
Glen Allen, VA +1-571-***-**** **************@*****.*** LinkedIn GitHub Portfolio Publications
(Open to Relocate)
SUMMARY
Results-oriented Business Intelligence Engineer with 4+ years of industry experience delivering high-impact solutions across marketing, sales, finance, and audit domains. Expert in Python, SQL, AWS, Databricks, and Tableau, with a proven ability to architect scalable data systems, deploy machine learning models, and drive strategic decisions through actionable insights. Actively seeking full-time opportunities to lead data initiatives that fuel growth and innovation.
AWARDS, RECOGNITION, CERTIFICATIONS
● Selected for Ericsson’s Cloud Computing Talent Development Program, ranking in the top 4% out of 3,000 applicants
● Awarded 2 Applause Awards and 3 Spot Awards at KPMG, featured in the monthly “Spotlight” newsletter, and received multiple shout-outs for outstanding project delivery
WORK EXPERIENCE
AARP Washington, DC
Business Intelligence Engineer May 2024 - Present
Designed and implemented data-driven ML systems at AARP to track user interactions, analyze behavior, and apply AI for intelligent article tagging and personalized content delivery, enhancing campaign performance, engagement, and organic traffic.
● Instrumented event tracking across AARP’s digital ecosystem, covering email, SEO, and content categories such as Health, Money, Medicare, and Travel. Used Google Tag Manager to capture behavioral events and routed them through API Gateway and AWS Lambda into Amazon S3 for downstream processing
● Built a cohort-level predictive modeling framework in Python (Random Forest, Gradient Boosted Trees), incorporating cross-validation and stratified sampling, achieving measurable lifts in CTR (+3%) and conversions (+1%)
● Engineered audience cohorts in SQL for downstream ML modeling, incorporating behavioral, demographic, and socioeconomic dimensions, validated with statistical tests
● Deployed end-to-end machine learning workflows, surfacing demographic, behavioral, and content-level insights that power audience personalization and marketing ROI optimization across subscribed and non-subscribed users
● Defined domain-specific conversion metrics (e.g., sign-ups, resource downloads, tool usage, donations) in collaboration with marketing teams, sales teams, and stakeholders, and used Google Analytics to track paid ad performance through CTR, CPC, and bounce rate
● Consolidated, modeled, and engineered over 108 million user interactions in Databricks using PySpark and SQL by integrating clickstream logs, CRM data, and campaign metadata. Optimized workflows, scheduled jobs, and parallelized processing to improve ETL speed and pipeline efficiency
● Applied EDA and segmented users with K-means on behavioral, demographic, and socioeconomic data, validated clusters with Silhouette score, ANOVA, and chi-squared tests, and analyzed cohorts to surface engagement patterns and behavioral traits
● Trained Random Forest models per cohort to predict conversion likelihood and click-through rate, to identify barriers and drivers, and executed targeted campaigns; A/B tests confirmed a 3% lift in CTR and a 1% lift in conversion
● Used Hyperopt for Bayesian hyperparameter tuning and MLflow for experiment tracking (params, metrics, artifacts) and the Model Registry; scheduled retraining and batch scoring with Databricks Jobs; built Tableau dashboards to monitor cohort performance, drift, and campaign outcomes
● Led root cause analysis on underperforming cohorts using SHAP, isolating audience drift, feature leakage, and data quality issues, then implemented fixes and monitoring
● Built an AI agent framework using LLaMA and LangChain to automate semantic article tagging across 9+ AARP domains. Applied prompt engineering to inject existing taxonomy into the agent workflow, ensuring consistent and domain-aligned labeling
● Implemented a self-looping category validation mechanism to compare newly generated tags against historical categories, preventing redundancy and noise. Incorporated human-in-the-loop (HITL) validation to assess tag accuracy and maintain quality. This improved SEO metadata, internal search accuracy, and organic traffic while reducing manual effort through intelligent re-tagging
GW SCHOOL OF BUSINESS Washington, DC
Research Assistant Jan 2024 - May 2024
Automated extraction and rechecking of 10-K risk disclosures, reducing assessment times and enabling more informed, data-driven investment decisions.
● Automated an end-to-end risk-analysis pipeline on AWS Fargate and ECS, containerizing and orchestrating the ingestion and parsing of 1 TB of 10-K SEC filings stored in S3 via GitHub Actions.
● Built an ETL workflow leveraging regex and spaCy to extract Risk Factor 1A passages from raw filings, transforming unstructured text into structured tables, and persisting outputs back to S3 for scalable access.
● Integrated a transformer-based text similarity module within the ECS pipeline to detect anomalous or shifted risk disclosures, enabling proactive identification of emerging risk signals.
● Reduced manual review effort and accelerated investment-risk assessments, allowing downstream teams to monitor risk posture changes across companies and quarters more efficiently.
KPMG Gurugram, India
Consultant: Data Analytics & Science Jan 2021 - Jul 2023
Client: AB InBev, USA
Delivered a data-driven forecasting and reporting platform that improved planning and visibility across inventory, logistics, and marketing
● Developed an LSTM-based sales forecasting solution to predict SKU-level demand, driving smarter inventory, logistics, and marketing decisions, resulting in reduced stockouts and increased sales.
● Ingested high-volume raw sales data into S3, performed data quality checks, engineered feature workflows (moving averages, DTW) to capture seasonality and temporal patterns, and loaded the results back via GitHub Actions with CRON-scheduled jobs to automate the ETL process.
● Operationalized the model as a Flask API, containerized with Docker, and deployed on AWS ECS for low-latency serving.
● Enabled performance monitoring to retrain the model when drift was detected, and presented the findings in Power BI dashboards for deeper insights.
Client: Nestle
Led SOX 404 reviews across P2P (Procure to Pay), O2C (Order to Cash), Finance, and HR functions, strengthening controls and streamlining audits through automation and reporting
● Led SOX 404 compliance reviews, implementing transaction-level controls such as order value thresholds, duplicate detection, credit holds, vendor onboarding checks, and payroll change approvals.
● Validated IT general controls, including user access reviews, change management, and data reconciliation, delivering audit-ready evidence to support internal and external compliance audits.
● Engineered automated ETL pipelines using Theobald, GeoKettle, and Python to extract SAP ERP data into PostgreSQL. Added data quality checks and lineage to enable root cause analysis of anomalies and control exceptions, improving accuracy and reducing processing time.
● Developed four Tableau dashboards to monitor control effectiveness and exception trends in real time. Built RCA drill-downs that trace issues to source systems, transformations, and owners, strengthening oversight and accelerating resolution across business units.
Other relevant projects
● Developed and deployed a hybrid churn prediction model in Python using the BG-NBD and regression approach, accurately identifying the top 10% of clients with the highest probability of churn. This strategic model enabled targeted client retention strategies, safeguarding an estimated $70k in revenue.
● Contributed to backend development and KPI implementation for the kPrism analytics app, automating 315 risk and governance KPIs and reducing manual effort for analysts.
● Mentored two interns through the full project lifecycle, accelerating their skill development, ensuring on-time, high-quality deliverables, and helping them secure subsequent full-time offers.
HIGH RADIUS Bhubaneswar, India
Machine Learning Intern Apr 2020 - Jun 2020
Built a system that analyzes financial data to find what drives customer profits and makes those insights instantly available in a web app.
● Contributed to the backend and machine learning components of a cloud-based AI-enabled Fintech B2B application, with a focus on uncovering key drivers of profit per customer.
● Built scalable data pipelines and engineered features from financial datasets to support model training, and developed a Flask API to deploy the trained model for real-time inference within the application.
PROJECTS
Quora Question Similarity Classifier
● Deployed a PyTorch inference pipeline backed by AWS S3 and EC2 GPU instances with CUDA mixed precision, containerizing the Flask API with Docker.
● Fine-tuned Siamese and cross-encoder BERT models to optimize similarity detection, achieving a 93.3% macro F1 score.
● Integrated real-time deduplication results into the Dash UI for seamless analyst review and cleanup.
Multimodal Image Retrieval System
● Hosted PyTorch training and inference on AWS EC2, leveraging scalable GPU resources.
● Fine-tuned Vision Transformer and BERT models via contrastive learning to align 512-dimensional image and text embeddings.
● Achieved 78% Recall@5 on test data, enabling accurate retrieval of relevant images from natural language queries.
Data Fellow, The World Bank (Capstone Project)
Provided data-driven insights to enable economists and energy teams to optimize electrification investments in Sub-Saharan Africa.
● Engineered Python-based causal inference and predictive modeling pipelines on multi-country financial datasets to quantify the impact of electricity reliability on firm productivity, and surfaced insights through interactive Tableau dashboards.
● Partnered with economists, data engineers, and regional energy teams to translate findings into electrification investment roadmaps and KPI frameworks that guided funding decisions across Sub-Saharan Africa.
SKILLS
● Databases & Warehousing: AWS Redshift, PostgreSQL, SQL Server, BigQuery
● Programming Languages: Python (Pandas, NumPy, scikit-learn, Matplotlib, Plotly, statsmodels, PyTorch, Flask), R, SQL
● Cloud & Big Data Platforms: AWS (S3, EC2, Redshift, SageMaker), Databricks, GCP
● Tools: GitHub Actions, GitHub, Alteryx, Jira, Docker
● Data Visualization & BI: Tableau, Looker, Qlik, Power BI, PowerPoint, Excel
● Analytical Techniques: Data warehousing, data mining, data modeling, statistical analysis, machine learning, A/B testing, Generative AI
EDUCATION
● GEORGE WASHINGTON UNIVERSITY Washington, US
Master of Science in Data Science, GPA: 4.0/4.0 Aug 2023 - May 2025
Served as a TA for a Machine Learning course
● KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY Bhubaneswar, India
B.Tech. in Electronics and Telecommunication Engineering, GPA: 8.41/10 May 2017 - May 2021