Detail-Oriented Data Science Fresher with Python & SQL

Location:

Posted: December 06, 2025

Resume:

Anish Joshi

Data Scientist

+917********* | **************@*****.*** | Pune | LinkedIn | GitHub

SUMMARY

Detail-oriented Data Science & Engineering fresher with practical experience in Big Data processing, predictive modeling, and cloud-based analytics. Proficient in EDA, ETL, SQL, and machine learning workflows, with a track record of building dashboards and analytical reports that enhance business understanding. Worked on 10+ data science projects, cleaning and transforming real-world datasets of 100k+ rows, optimizing pipelines, and delivering actionable insights. Skilled in problem solving through data interpretation, with a strong foundation in statistics and a continuous-learning mindset.

TECHNICAL SKILLS

· Programming Languages & Databases: Python, SQL, MySQL, NoSQL (HBase), PySpark

· Big Data & Cloud Tools: Apache Spark, Hadoop, HDFS, MapReduce, HBase, Sqoop, AWS (S3, RDS, EC2, IAM)

· Data Analysis & Visualization: Power BI, Excel, Matplotlib, Pandas, NumPy

· ETL & Data Pipelines: Data Cleaning, Data Ingestion, Data Transformation

KEY SKILLS

Predictive Analytics, Exploratory Data Analysis (EDA), Statistical Modeling, Data Cleaning, Data Warehousing, Data Pipelines, Data Wrangling, Data Visualization, Feature Engineering, Machine Learning Algorithms (Regression, Classification), A/B Testing, Business Problem Solving, Requirements Gathering, Data Storytelling, Dashboarding, ETL Workflow Understanding, Data Quality Assessment, Hypothesis Testing, Model Evaluation, Analytical Thinking, Problem-Solving, Communication Skills.

PROJECTS

Project Name: Credit Card Fraud Detection – NoSQL

• Developed a real-time fraud detection pipeline using Spark, Kafka, Hive, and HBase on AWS EMR, reducing lookup latency by ~40% through optimized NoSQL key-based retrievals.

• Automated end-to-end batch and streaming ingestion from MySQL (RDS) via Sqoop, increasing data throughput by 30% and enabling continuous fraud rule validation at scale.

Project Name: ETL Traffic Collision Analysis

• Engineered a Python–Pandas ETL pipeline that cleaned and standardized messy collision data, improving data quality by 95% and enabling reliable trend analysis.

• Conducted EDA to uncover high-risk zones and peak collision patterns, driving data-backed safety recommendations that improved insight accuracy by 30%.

Project Name: RSVP Movies Case Study – SQL Analytics

• Analyzed IMDB data using advanced SQL (CTEs, window functions) to identify genre-rating trends, improving content decision insights by 35%.

• Cleaned and validated multi-table datasets to raise reporting accuracy by 90%, enabling reliable dashboards and production-level business analytics.

EDUCATION

Executive PG Programme in Data Science & AI, IIIT-Bangalore, Sep '25
Data Engineering Specialization [CGPA: 3.7 / 4.0]

B.Com, Modern College of Arts, Science & Commerce, Pune, May '24
Grade “A” (62.18%), CGPA: 7.91

CERTIFICATIONS/TRAINING

Power BI for Beginners, from Upskill
