Anish Joshi
Data Engineer | Data Analyst | Python | SQL | AWS | ETL Pipelines
+917********* | **************@*****.*** | Pune | LinkedIn | GitHub

SUMMARY
Detail-oriented Data Engineer with hands-on experience in big data processing and predictive modeling using Python, SQL, and AWS. Managed 10+ data projects, optimizing ETL pipelines and conducting EDA to deliver actionable insights. Developed a real-time fraud detection pipeline that reduced lookup latency by ~40% and improved data quality and analytical accuracy.

TECHNICAL SKILLS
• Programming Languages & Databases: Python, SQL, PySpark, MySQL, NoSQL (HBase)
• Big Data & Cloud Tools: Apache Spark, Hadoop, HDFS, MapReduce, HBase, Sqoop, AWS (S3, RDS, EC2, IAM), Snowflake (SnowPro Associate)
• Data Analysis & Visualization: Power BI, Excel, Matplotlib, Pandas, NumPy
• ETL & Data Pipelines: Data Cleaning, Data Ingestion, Data Transformation

KEY SKILLS
• Data Engineering: ETL Pipelines, Data Warehousing, Data Cleaning, Data Quality Assessment, Big Data Processing
• Analytics & ML: Predictive Analytics, Exploratory Data Analysis (EDA), Statistical Modeling, Feature Engineering, Machine Learning, A/B Testing, Hypothesis Testing, Model Evaluation
• Business & Communication: Business Problem Solving, Requirements Gathering, Data Storytelling, Dashboarding, Analytical Thinking, Cross-functional Collaboration

PROFESSIONAL EXPERIENCE
Gen-AI Powered Data Analytics Intern | Feb '26 - Present
Forage (TCS Virtual Job Simulation) | Remote
• Completed a job simulation involving AI-powered data analytics and strategy development for the Financial Services team at Tata iQ.
• Conducted exploratory data analysis (EDA) using GenAI tools to assess data quality, identify risk indicators, and structure insights for predictive modeling.
• Proposed and justified an initial no-code predictive modeling framework to assess customer delinquency risk, leveraging GenAI for structured model logic and evaluation criteria.
• Designed an AI-driven collections strategy leveraging agentic AI and automation, incorporating ethical AI principles, regulatory compliance, and scalable implementation frameworks.

KEY INDUSTRY PROJECTS
Project: Credit Card Fraud Detection – NoSQL
• Developed a real-time fraud detection pipeline using Spark, Kafka, Hive, and HBase on AWS EMR, reducing lookup latency by ~40% through optimized NoSQL key-based retrievals.
• Automated end-to-end batch and streaming ingestion from MySQL (RDS) via Sqoop, increasing data throughput by 30% and enabling continuous fraud rule validation at scale.
Project: ETL Traffic Collision Analysis
• Engineered a Python–Pandas ETL pipeline that cleaned and standardized messy collision data, improving data quality by 95% and enabling reliable trend analysis.
• Conducted EDA to uncover high-risk zones and peak collision patterns, driving data-backed safety recommendations that improved insight accuracy by 30%.

EDUCATION
Executive PG Programme in Data Science & AI | IIIT-Bangalore | Sep '25
Data Engineering Specialization with coursework in Big Data Analytics, Cloud Computing, ETL Pipelines, Advanced SQL
B.Com | Modern College of Arts, Science & Commerce, Pune | May '24

CERTIFICATIONS/TRAINING
• SnowPro Associate: Platform | Snowflake | Feb '26
• Power BI for Beginners | Simplilearn | Jan '24
• Analyzing and Visualizing Data with Microsoft Power BI | Microsoft | Dec '23