Shahab Kazemi
818-***-**** Woodland Hills, CA *1364 ************@*****.*** shahab-kazemi samples
Skills
Data Science: scikit-learn • SciPy • NumPy • Matplotlib • pandas
Deep Learning: PyTorch • TensorFlow • Keras
Programming: Python • R • PySpark • C • C++ • C# • Java
Data Engineering / BI: Athena • Redshift • SSAS • SSIS • SSRS • Tableau • Power BI • Spotfire • SQL • DAX
Cloud Platforms: AWS (SageMaker • Bedrock • Lambda) • Azure ML • Databricks
RDBMS: MySQL • SQL Server (T-SQL • Management Studio) • Oracle (PL/SQL)
Data Analysis and Statistics: Hypothesis Testing • Regression Analysis • Experimental Design • Causal Inference
Statistical Tools: SAS (Base • Macros • IML • JMP • FedSQL • SAS Enterprise Miner) • R • Stata
Summary
I bring over 12 years of experience in developing scalable Machine Learning (ML) and Business Intelligence (BI) solutions, with a focus on using analytics to solve complex product and business problems. My expertise spans statistical modeling, predictive analytics, and causal inference techniques to drive data-driven decision-making. With a strong foundation in Python, SQL, and cloud platforms (AWS, Azure), I have led initiatives in optimizing resource allocation, designing large-scale ETL pipelines, creating actionable insights using advanced data visualizations, and developing, training, and deploying deep learning models.
Experience
2023 – 2024 Data Scientist • Novo Nordisk
- Develop deep learning models (CNN, Transformer-based architectures) for clinical trial result classification, leveraging PyTorch and TensorFlow with distributed training for large-scale medical datasets.
- Lead the design of ETL pipelines for real-time data ingestion and capacity planning using SQL, Python, and AWS services, delivering data-driven insights into resource utilization.
- Manage data visualization dashboards (Tableau, Power BI) to monitor project KPIs and optimize model performance.
- Develop and optimize predictive models using SQL and AWS Redshift to forecast service consumption and resource utilization for large-scale business applications.
- Use version control and automated deployment (GitLab CI/CD, Azure DevOps) to manage code for reproducibility and collaboration.
2022 – 2022 Data Scientist (Visiting Researcher) • Stanford University • United States
- Design and build a sequence-to-sequence predictive model based on an LSTM architecture using PyTorch.
- Design experiments and run A/B tests using causal inference techniques, including controlled experiments.
2017 – 2020 Data Scientist • Aarhus University • Denmark
- Design and implement predictive models using deep learning frameworks such as TensorFlow and PyTorch to solve real business problems.
- Develop scripts for statistical analysis (Python, R, SAS).
- Discover sentiment and topic patterns by mining unstructured text (Gensim, NLTK).
2012 – 2015 Data Scientist / BI Manager • MCI • Iran
- Develop and optimize data models for capacity planning and resource utilization.
- Develop machine learning models to forecast service consumption patterns, applying techniques (NumPy, SciPy, scikit-learn) for trend analysis and resource allocation optimization.
- Design and optimize data models (SSIS, SSAS, SSRS) for robust reporting and analytics, improving decision-making for key business stakeholders.
- Lead the development of ETL processes and predictive dashboards to optimize business operations, enabling data-driven ad ranking and allocation strategies.
- Collaborate with stakeholders to define KPIs and business objectives, translating them into scalable analytical models to drive performance and pricing decisions.
Education
2020 – 2023 Ph.D. • NLP • Aarhus University • Denmark
- Research and develop novel transformer-based architectures (e.g., BERT) enhanced with graph-based learning.
- Advanced topic modeling techniques for short-text analysis using graph structures.
Thesis: Enhancing Transformers with Graph-Based Learning for Improved Topic Modeling.
2015 – 2017 MSc. • Data Science • Aarhus University • Denmark
- Specialized in machine learning, data mining, and Bayesian networks using Python, SAS, and TensorFlow.
Thesis: Improving Bagging and Boosting with Time Weighting.
2012 – 2012 MBA • Data Analysis • Sharif University of Technology • Iran
- Marketing analysis, motivation, management theory, strategy, microeconomics, macroeconomics.
2002 – 2006 BSc. • Software Engineering • University of Tehran • Iran
Publications (Top-Tier journals)
- The exploration of users’ perceived value from personalization • International Journal of Information Management • 2025 • NLP
- Weeding out or picking winners in open innovation? • Research Policy • 2023 • Causal Inference with ML
- Consumers’ associative networks of plant-based food product communications • Food Quality and Preference • 2019 • Fuzzy Text Mining
Certificates
- ML Engineering • Databricks Fundamentals (2025)
- AI & Machine Learning • Stanford University (NLP with Deep Learning, AI Professional Certificate, Graph ML, RL) (2022)
- Large Scale Computing • University of California, San Diego (Hadoop Platform and Application Framework) (2020)
Technologies & Tools
MLOps & ML Engineering: Amazon SageMaker • Azure ML • MLflow • TFX • Domino • FastAPI
Big Data: PySpark • Hadoop • MLlib
Generative AI: LangChain • LlamaIndex • Ollama • Hugging Face • Prompt Engineering • Multimodal LLM (text & image integration)
NLP & Graph Analytics: NLTK • Gensim • LIWC • Data Embeddings (word • sentence • graph • multimodal) • Scrapy • BeautifulSoup • Selenium • NetworkX • DeepSNAP
Computer Vision: OpenCV • Pillow
DevOps & Development Tools: GitLab CI/CD • Azure DevOps • Git • Jupyter • RStudio • Git Bash