Data Engineer Big

Location: Harrison, NJ

Salary: 80000

Posted: October 15, 2025


Resume:

Rahul Mishra

Data Engineer

NJ 201-***-**** *****.*@**********.*** LinkedIn GitHub

SUMMARY

Data Engineer with 3+ years of experience designing and implementing scalable ETL pipelines, big data frameworks, and cloud-native solutions across the healthcare, pharmaceutical, and IT domains. Skilled in Apache Airflow, Kafka, dbt, Spark, Hadoop, and Hive, with expertise in AWS, Snowflake, and Redshift for high-volume data processing and secure data management. Proficient in SQL, Python, and R for advanced data modeling and predictive analytics, and in deploying ML/AI solutions, including NLP and Generative AI (LLMs, RAG, LangChain), to improve forecasting.

SKILLS

Methodology: SDLC, Agile, Waterfall

Programming Languages/Framework: Python, R, SQL, Django

IDEs: PyCharm, Jupyter Notebook, IntelliJ IDEA, Visual Studio, NetBeans

Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, Spark SQL, Pig, YARN, Data Lake, Data Warehouse

Visualization and Reporting Tools: Power BI, Tableau, MATLAB, SAS, Bloomberg Terminal, Argos

AI/ML: Natural Language Processing (NLP), Named Entity Recognition (NER), Semantic Chunking, TensorFlow, LangChain, LlamaIndex, Generative AI (LLMs), Retrieval-Augmented Generation (RAG), Text Classification

ETL and Orchestration Tools: SSIS, Apache Kafka, Apache Airflow, dbt (Data Build Tool)

Cloud Technologies and Infrastructure: AWS (S3, DynamoDB, Lambda, EC2, Redshift), GCP, Azure, Snowflake

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow

Version Control and Operations Tools: Git, GitLab, GitHub, Jenkins (CI/CD), Docker, Kubernetes, Grafana, Splunk, Jira

Databases: MySQL, PostgreSQL, MongoDB, Neo4j, RDBMS, NoSQL, Cassandra, HBase, Elasticsearch

EDUCATION

Master of Science in Business Analytics May 2025

University of Massachusetts Amherst – Isenberg School of Management Amherst, MA

EXPERIENCE

Cardinal Health, NJ Nov 2024 – Current Data Engineer

• Designed and implemented robust ETL pipelines using Apache Airflow, Kafka, and dbt to ingest, clean, and transform large-scale healthcare data into centralized data warehouses.

• Built scalable cloud-native data solutions leveraging AWS (S3, Redshift, Lambda, EC2) and Snowflake to optimize storage, retrieval, and processing of sensitive healthcare data.

• Developed advanced data models in SQL and Python to support predictive analytics for demand forecasting, inventory optimization, and patient outcomes.

• Implemented big data processing frameworks (Hadoop, Spark, Hive, Spark SQL) for high-volume transaction data, accelerating processing by 40% and enabling near real-time reporting.

• Delivered actionable insights on pharmaceutical distribution and provider performance through interactive dashboards and reports in Tableau and Power BI.

WEBCRAFT-IT, India Jan 2020 – Jul 2023 Data Analyst

• Designed and optimized ETL pipelines in Python (Pandas, NumPy) to clean, preprocess, and transform raw datasets from multiple sources, increasing reporting accuracy by 30%.

• Built interactive dashboards in Tableau and Power BI to visualize sales, finance, and customer analytics, enabling stakeholders to track KPIs in real time and accelerate decision-making.

• Performed SQL-based data extraction and wrote advanced queries (PostgreSQL, MySQL) for trend and performance analysis, surfacing actionable insights into customer behavior and product growth.

• Deployed predictive analytics models (Scikit-learn) for churn prediction and demand forecasting, improving forecast accuracy and reducing churn risk by 15%.

• Migrated datasets and analytics workflows to cloud platforms (AWS, Azure, GCP), enhancing scalability, cost efficiency, and global data availability.

• Wrote Python scripts to automate data validation and anomaly detection, reducing manual quality-check effort and improving data reliability.

• Collaborated with cross-functional teams (business, tech, and operations) to provide data-driven recommendations, leading to process optimization and measurable cost savings.

HONORS & AFFILIATIONS

Member, The Honor Society of Phi Kappa Phi (Inducted 2025)

CERTIFICATIONS

Associate Data Analyst in SQL (DataCamp)

Supervised Learning with scikit-learn (DataCamp)

Data Manipulation with pandas (DataCamp)

EDA in Python & SQL (DataCamp)
