
Data Engineer Processing

Location:
Boston, MA
Posted:
October 15, 2025


Resume:

Dipen Patel

857-***-**** *****.****@************.*** linkedin.com/in/dipenpatel11 github.com/Dipenpatel3

SUMMARY

Data Engineer with 3+ years of experience in building scalable data pipelines and cloud solutions. Skilled in AWS, Azure, Apache Kafka, and Airflow for real-time data processing. Experienced in containerization with Docker and Kubernetes, and in optimizing workflows with Python, SQL, and Tableau to drive data-driven insights and improve efficiency.

PROFESSIONAL EXPERIENCE

Data Engineer Co-op, OneSpan, San Jose, CA (Feb 2025 - Aug 2025)

• Engineered Python-based ETL pipelines to process and transform 250GB+ of monthly log files from AWS S3, streamlining data workflows, cutting on-prem storage by 70%, and accelerating downstream data processing.

• Built a serverless workflow in Java to convert nested JSON logs into CSVs, boosting transformation efficiency by 60%.

• Spearheaded backend services and RESTful API development for a client-facing dashboard using Python, overseeing 10+ AWS Lambda functions and redesigning the MySQL schema to increase data retrieval speed by 35%.

• Integrated AWS CloudWatch for pipeline observability, enabling real-time alerts and proactive failure monitoring.

• Developed and optimized data pipelines using FastAPI with JSON Web Token authentication, improving data processing efficiency by 30% and securing user data access.

• Led design of validation frameworks to improve data integrity and reduce pipeline errors for transformed data by 30%.

Data Engineer, Skyfi Labs, Bangalore, India (Jan 2021 - Aug 2023)

• Optimized ETL pipelines with Azure Data Factory, enhancing data ingestion from diverse sources to improve processing speed by 60% and cut overall storage costs by 40%.

• Automated DAG scheduling in Airflow to streamline ETL workflows, reducing manual intervention by 70%.

• Optimized the supply chain database with SQL indexing and data model refinement, reducing query times by 30%.

• Integrated Apache Kafka to strengthen real-time data processing, managing high-velocity data streams efficiently.

• Containerized and deployed a scalable application using Docker and AWS EC2, improving resource utilization by 25%.

• Standardized the CI/CD pipeline for microservice deployment with GitHub Actions, yielding a time savings of 15 hours.

Data Engineer Intern, Revitech InfoSolutions, Mumbai, India (May 2020 - Sept 2020)

• Leveraged Python for data processing, delivering 90% data cleanliness for sales analysis of pharmaceutical products.

• Created DAX calculations and measures in PowerBI for key performance indicators, boosting subscription share by 35%.

• Built an Optical Character Recognition (OCR) model with Pytesseract and TensorFlow YOLOv3, achieving 98% accuracy.

• Extracted data from Excel (100K+ rows) using pyodbc and pymysql, improving SQL Server data processing in Python.

PROJECTS

Multi-Agent Coding Assistant (Airflow, Snowflake, Pinecone, Git)

• Centralized 50GB of code and docs in Snowflake and built Pinecone indexes enabling sub-second semantic search.

• Architected scalable multi-agent workflows with LangGraph and LangChain, automating query resolution on large-scale code datasets and reducing response latency by 60% while improving system reliability.

• Built intelligent orchestration workflows with Airflow and Python to process user query data, dynamically routing tasks to AI services, boosting insight delivery speed by 50% and reducing manual research.

Food Inspection Insights (Alteryx, Talend, ER/Studio, Azure SQL, Tableau, Power BI)

• Conducted detailed data profiling with Alteryx to resolve 10+ data quality issues, and ingested data into normalized staging tables in Azure SQL using Talend for consistent, reliable data handling.

• Designed a dimensional model in ER/Studio, optimizing the schema for data warehousing and enhancing query performance.

• Created interactive dashboards in Power BI and Tableau, strengthening insights and data-driven decision-making by 90%.

Credit Card Purchase Analytics (Azure Data Lake, Databricks, PySpark, Power BI)

• Transformed 1.2 million records from Azure Data Lake Gen2 in Azure Databricks, enabling faster data processing.

• Developed a logistic regression model on the transformed data that attained 90% accuracy in predicting purchases, and integrated it with Power BI to create intuitive dashboards presenting customer demographics and purchase history.

• Implemented PySpark functions for layered data processing and cleaning, improving data analysis efficiency by 30%.

TECHNICAL SKILLS

Languages and Tools: Python, SQL, PySpark, FastAPI, Flask, Java, C, Shell Scripting
Cloud and Technology: Azure Data Factory, Airflow, Kafka, AWS, Databricks, GitHub Actions, Docker, Kubernetes
Visualizations and BI: Splunk, Tableau, PowerBI, Talend Studio, Alteryx, ER Studio, Grafana, QuickSight
Databases: Snowflake, Oracle SQL, PostgreSQL, MSSQL Server, MySQL, MongoDB, Cassandra, Redis

EDUCATION

Master of Science - Information Systems, Northeastern University, Boston, MA (Dec 2025)
Bachelor of Engineering - Electronics & Telecommunication, Mumbai University, Mumbai, India (Oct 2020)
