Data Analyst

Location:
San Francisco, CA
Posted:
March 21, 2025

Resume:

Bashaarat Nawaz Mohammad

Apt **, **** Broadway Street, Redwood City, CA 94063

513-***-**** | **********.******@*****.*** | bashaarat-nawaz-mohammad | Bash-1

Education

San Jose State University Jan. 2024 – Dec 2025

Master of Science in Data Analytics San Jose, CA

Relevant Coursework

• Database Systems

• Data Visualization

• Big Data Technologies and Applications

• Machine Learning

• Distributed Systems

• Deep Learning

• Generative AI

• Data Structures

Experience

Sas2py.com Feb 2023 – Dec 2023

Junior Data Engineer Hyderabad, India

• Worked on a migration project for various clients, enabling the smooth transition of legacy SAS files to PySpark.

• Automated SAS-to-PySpark code conversion with custom functions, reducing manual effort and enhancing efficiency.

• Manually converted non-translatable SAS scripts to PySpark, ensuring full migration compliance.

• Optimized embedded SQL queries from SAS to PySpark, enhancing data processing performance.

• Designed ETL pipelines for Snowflake and Databricks, ensuring high data quality and preprocessing efficiency.

• Optimized SQL queries for data extraction, transformation, and analysis, ensuring data integrity.

Projects

FIFA Player Position Prediction | Python, PySpark, Machine Learning December 2024

• Built a scalable PySpark Random Forest model to predict soccer player positions with 82% accuracy.

• Processed and cleaned large-scale FIFA datasets (2015–2023) containing millions of records with over 19 attributes, producing high-quality, analysis-ready data.

• Applied feature engineering, oversampling, and class weighting to address class imbalance, achieving an F1-score of 0.81.

• Deployed the model via a Flask-based web application, enabling real-time predictions through a user-friendly interface accessible to non-technical users.

National Anthem Analysis through Machine Learning | Python, NLP December 2024

• Leveraged machine learning and NLP to analyze thematic similarities in national anthems.

• Designed and implemented data pipelines for preprocessing anthem lyrics, including text cleaning, tokenization, stemming, and TF-IDF vectorization, ensuring high-quality input for machine learning models.

• Developed and fine-tuned K-Means and Hierarchical Clustering models to uncover thematic relationships, using the Silhouette Coefficient and Dunn Index for performance evaluation.

E-Commerce Website Development and Data Analytics | HTML, CSS, JavaScript, React May 2024

• Developed a scalable e-commerce platform using React (frontend) and Spring Boot (backend) with real-time inventory tracking and dynamic theme switching.

• Designed an Inventory Dashboard to analyze monthly orders, revenue trends, top-selling categories, and user engagement, utilizing SQL-based data aggregation.

• Deployed APIs and cloud infrastructure using GitHub Actions and Heroku, ensuring fast API responses, real-time updates, and seamless scalability.

• Conducted data analysis and validated hypotheses, optimizing inventory management and enhancing user engagement.

Cloud Analytics and Data Warehouse Implementation – New York Times | Python, GCP, SQL May 2024

• Collaborated with a team of five to design and implement a data warehouse integrating the New York Times API and historical datasets from Kaggle.

• Processed and cleaned historical data using Python, ensuring quality before loading into BigQuery.

• Built an automated ETL pipeline with Google Cloud Composer (Airflow) for daily API data updates in BigQuery.

• Created interactive dashboards in Looker Studio, delivering actionable insights from aggregated data.

Technical Skills

Languages: Python, Java, C, C++, HTML/CSS, JavaScript, SQL

Python Libraries: Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, TensorFlow, PyTorch, PySpark

Databases: Microsoft Excel, MySQL, PostgreSQL, MongoDB, Elasticsearch, BigQuery, AWS Redshift, Neo4j

Data Warehouse: Google BigQuery, AWS Redshift

Visualization Tools: Tableau, Power BI, R Shiny, Amazon QuickSight, Looker Studio

Operating Systems: Linux, Windows


