Rahul Mishra
Data Engineer
NJ 201-***-**** *****.*@**********.*** LinkedIn GitHub
SUMMARY
Data Engineer with 3+ years of experience in designing and implementing scalable ETL pipelines, big data frameworks, and cloud-native solutions across healthcare, pharmaceutical, and IT domains. Skilled in Apache Airflow, Kafka, dbt, Spark, Hadoop, and Hive, with expertise in leveraging AWS, Snowflake, and Redshift for high-volume data processing and secure data management. Proficient in SQL, Python, and R for advanced data modeling, predictive analytics, and deploying ML/AI solutions, including NLP and Generative AI (LLMs, RAG, LangChain) to optimize forecasting.
SKILLS
Methodology: SDLC, Agile, Waterfall
Programming Languages/Frameworks: Python, R, SQL, Django
IDEs: PyCharm, Jupyter Notebook, IntelliJ IDEA, Visual Studio, NetBeans
Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, Spark SQL, Pig, YARN, Data Lake, Data Warehouse
Visualization and Reporting Tools: Power BI, Tableau, MATLAB, SAS, Bloomberg Terminal, Argos
AI/ML: Natural Language Processing (NLP), Named Entity Recognition (NER), Semantic Chunking, TensorFlow, LangChain, LlamaIndex, Generative AI (LLMs), Retrieval-Augmented Generation (RAG), Text Classification
ETL and Orchestration Tools: SSIS, Apache Kafka, Apache Airflow, dbt (Data Build Tool)
Cloud Technologies and Infrastructure: AWS (S3, DynamoDB, Lambda, EC2, Redshift), GCP, Azure, Snowflake
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow
Version Control and Operations Tools: Git, GitLab, GitHub, Jenkins (CI/CD), Docker, Kubernetes, Grafana, Splunk, Jira
Databases: MySQL, PostgreSQL, MongoDB, Neo4j, RDBMS, NoSQL, Cassandra, HBase, Elasticsearch
EDUCATION
Master of Science in Business Analytics May 2025
University of Massachusetts Amherst – Isenberg School of Management Amherst, MA
EXPERIENCE
Cardinal Health, NJ Nov 2024 – Current
Data Engineer
• Designed and implemented robust ETL pipelines using Apache Airflow, Kafka, and dbt to ingest, clean, and transform large-scale healthcare data into centralized data warehouses.
• Built scalable cloud-native data solutions leveraging AWS (S3, Redshift, Lambda, EC2) and Snowflake to optimize storage, retrieval, and processing of sensitive healthcare data.
• Developed advanced data models in SQL and Python to support predictive analytics for demand forecasting, inventory optimization, and patient outcomes.
• Implemented big data processing frameworks (Hadoop, Spark, Hive, Spark SQL) for high-volume transaction data, accelerating processing by 40% and enabling near real-time reporting.
• Delivered actionable insights on pharmaceutical distribution and provider performance through interactive dashboards and reports in Tableau and Power BI.
WEBCRAFT-IT, India Jan 2020 – Jul 2023
Data Analyst
• Designed and optimized ETL pipelines in Python (Pandas, NumPy) to clean, preprocess, and transform raw datasets from multiple sources, increasing reporting accuracy by 30%.
• Built interactive dashboards in Tableau and Power BI to visualize sales, finance, and customer analytics, enabling stakeholders to track KPIs in real time and accelerate decision-making.
• Conducted SQL-based data extraction and advanced queries (PostgreSQL, MySQL) for trend and performance analysis, driving actionable insights into customer behavior and product growth.
• Deployed predictive analytics models (Scikit-learn) for churn prediction and demand forecasting, improving forecast accuracy and reducing churn risk by 15%.
• Migrated datasets and analytics workflows to cloud platforms (AWS, Azure, GCP), enhancing scalability, cost efficiency, and global data availability.
• Automated data validation and anomaly detection scripts in Python, cutting manual quality-check efforts and improving data reliability.
• Collaborated with cross-functional teams (business, tech, and operations) to provide data-driven recommendations, leading to process optimization and measurable cost savings.
HONORS & AFFILIATIONS
Member, The Honor Society of Phi Kappa Phi (Inducted 2025)
CERTIFICATIONS
Associate Data Analyst in SQL (DataCamp)
Supervised Learning with scikit-learn (DataCamp)
Data Manipulation with pandas (DataCamp)
EDA in Python & SQL (DataCamp)