Detail-oriented Data Science & Engineering fresher with practical experience in Big Data processing, predictive modeling, and cloud-based analytics. Proficient in EDA, ETL, SQL, and machine learning workflows, with a track record of building dashboards and analytical reports that enhance business understanding. Worked on 10+ data science projects, including cleaning and transforming real-world datasets of 100k+ rows, optimizing pipelines, and delivering actionable insights. Skilled in problem solving through data interpretation, with a strong foundation in statistics and a continuous learning mindset.
· Programming Languages & Databases: Python, SQL, MySQL, NoSQL (HBase), PySpark
· Big Data & Cloud Tools: Apache Spark, Hadoop, HDFS, MapReduce, HBase, Sqoop, AWS (S3, RDS, EC2, IAM)
· Data Analysis & Visualization: Power BI, Excel, Matplotlib, Pandas, NumPy
· ETL & Data Pipelines: Data Cleaning, Data Ingestion, Data Transformation
Predictive Analytics, Exploratory Data Analysis (EDA), Statistical Modeling, Data Cleaning, Data Warehousing, Data Pipelines, Data Wrangling, Data Visualization, Feature Engineering, Machine Learning Algorithms (Regression, Classification), A/B Testing, Business Problem Solving, Requirements Gathering, Data Storytelling, Dashboarding, ETL Workflow Understanding, Data Quality Assessment, Hypothesis Testing, Model Evaluation, Analytical Thinking, Problem-Solving, Communication Skills
Project Name: Credit Card Fraud Detection – NoSQL
Project: ETL Traffic Collision Analysis
Project Name: RSVP Movies Case Study – SQL Analytics
Data Engineering Specialization [CGPA: 3.7 / 4.0]
Grade “A” (62.18%), CGPA: 7.91
Power BI for Beginners from Upskill
Anish Joshi
Data Scientist
+917********* **************@*****.*** Pune LinkedIn GitHub
SUMMARY
TECHNICAL SKILLS
KEY SKILLS
PROJECTS
• Developed a real-time fraud detection pipeline using Spark, Kafka, Hive, and HBase on AWS EMR, reducing lookup latency by ~40% through optimized NoSQL key-based retrievals.
• Automated end-to-end batch and streaming ingestion from MySQL (RDS) via Sqoop, increasing data throughput by 30% and enabling continuous fraud rule validation at scale.
• Engineered a Python–Pandas ETL pipeline that cleaned and standardized messy collision data, improving data quality by 95% and enabling reliable trend analysis.
• Conducted EDA to uncover high-risk zones and peak collision patterns, driving data-backed safety recommendations that improved insight accuracy by 30%.
• Analyzed IMDb data using advanced SQL (CTEs, window functions) to identify genre-rating trends, improving content decision insights by 35%.
• Cleaned and validated multi-table datasets to raise reporting accuracy by 90%, enabling reliable dashboards and production-level business analytics.
EDUCATION
Executive PG Programme in Data Science & AI, IIIT-Bangalore, Sep '25
B.Com - Modern College of Arts, Science & Commerce, Pune, May '24
CERTIFICATIONS/TRAINING