Data Scientist Data Analyst Business Intelligence Python, SQL

Location:

San Jose, CA

Posted:

September 08, 2024


Resume:

Ankur Ojha

Sunnyvale, CA, ***** +1-213-***-**** ***********@*****.*** linkedin portfolio tableau

Education

University of Southern California Aug 2022 – May 2024

Master of Science, Analytics (Data Science), GPA: 3.85/4.0 Los Angeles, CA

Courses: Data Mining, Machine Learning, Text Mining (NLP), Fraud Analytics, Data Management (BI, SQL)

Galgotias College of Engineering & Technology Jul 2015 – Jun 2019

Bachelor of Technology, Mechanical Engineering, GPA: 3.44/4.0 Greater Noida, India

Courses: Engineering Mathematics, Statistics, Project Management, Computer Programming

Tools & Technologies

• Programming: Python (Pandas, SciPy, Scikit-learn, LSTM, NLTK, TensorFlow), SQL, R, PySpark

• Databases: Hive, MySQL, PostgreSQL, Snowflake

• Cloud Platforms: AWS (S3, IAM, VPC, RDS, EC2, SageMaker), Google Cloud, Databricks

• Tools/Frameworks: Tableau, A/B Testing, REST API, Spark, Kafka, Hadoop, Airflow, Docker, Agile, MS Excel, Git

Experience

Data Analytics Intern Jun 2023 – Aug 2023

Marqeta Inc Oakland, CA

• Developed a Snowflake data repository to efficiently manage diverse datasets, including 50K Okta SSO entries, 3.4M minutes of app usage, user data and IT system data, reducing report generation SLA from 7 days to real-time.

• Built a data pipeline using Airflow and Spark to streamline data from diverse IT systems, reducing processing time by 80%.

• Built Looker dashboards on top of the data repository, enabling real-time analytics that streamlined decision-making, reduced maintenance response times by 35%, and boosted policy compliance by 25%.

Data Science Research Assistant Jan 2024 – May 2024

USC Viterbi School of Engineering Los Angeles, CA

• Designed big data pipelines with Spark & SageMaker, enhancing data processing by 30% for rapid model iterations.

• Utilized AWS Glue for serverless ETL and Athena for efficient SQL querying on S3, streamlining data analysis and reducing overhead. Tuned Spark SQL for faster data transformations, enhancing overall efficiency by 20%.

Data Analytics Consulting Practicum Jan 2023 – May 2023

Kiana Analytics Los Angeles, CA

• Analyzed indoor device location data to offer spatial analytics for predictive maintenance and intrusion detection.

• Engineered a scalable Kafka streaming pipeline for real-time ingestion of high-volume data (200 million data points).

• Leveraged Spark-ML to analyze ingested data, identify movement patterns, and tag devices as fixed or moving.

Business Analyst Apr 2020 – Mar 2022

R.K. BrickField Gorakhpur, India

• Developed predictive models to optimize procurement & delivery, increasing efficiency for 10 million bricks annually.

• Built a MySQL data repository for manufacturing processes and automated SQL-based reporting to visualize business KPIs, improving decision time by 55% through key insights delivered via Tableau dashboards.

Key Projects on GitHub

Customer Segmentation Using Yelp Data (NLTK, spaCy, K-means, Random Forest), Nov 2023

• Built a segmentation data cube using the Yelp Review Dataset, incorporating customers’ behavioral & predictive attributes.

• Engineered a streaming data pipeline with Kafka, employing Spark SQL for efficient data manipulation and analysis.

• Implemented Aspect-Based Sentiment Analysis with TF-IDF and LDA to extract key topics from reviews.

Credit Card Application Fraud Detection (Neural Network, Random Forest, XGBoost), Jan 2023

• Developed classification models to identify and predict fraudulent applications among 1 million credit card applications.

• Created 2242 candidate variables to capture fraud, selected top 25 variables using KS filter & wrapper methods.

• Tuned hyperparameters and compared ML models; LightGBM achieved a 55.43% fraud detection rate at 3%.

Stock Price Prediction Comparison of ARIMA and LSTM (Time Series Data, TensorFlow), Oct 2023

• Utilized ARIMA and LSTM models, leveraging Python to forecast stock trends for NVIDIA and AMD.

• Employed stationarity tests and time series decomposition to improve model accuracy and refine predictive algorithms.

Weather Data Analysis with Real-time Updates in AWS (AWS, Snowflake, Airflow, Kafka, Spark), Oct 2022

• Built a data pipeline using Airflow and Kafka in AWS to ingest real-time weather updates and store them in Snowflake.

• Used Spark Streaming for data processing and later created a Tableau application highlighting insights from the data.