Data Engineer

Location:

United States

Salary:

$85,000

Posted:

March 25, 2025

Resume:

Ansh Patel

+1-469-***-**** - *************.****@*****.*** - linkedin.com/in/patelansh110 - github.com/ANSH15007

EDUCATION

The University of Texas at Dallas, Richardson, TX, USA Aug 2022 - May 2024

Master of Science in Computer Science, GPA: 3.70/4

Gujarat Technological University, Ahmedabad, India Aug 2018 - May 2022

Bachelor of Engineering in Information and Communication Technology, GPA: 3.80/4

TECHNICAL SKILLS

Languages: Python, Java, SQL, NoSQL, Linux

Cloud: AWS (EC2, S3, Lambda, Glue, Redshift, Kinesis, CloudFormation, CloudWatch), Databricks

Tools: Apache Spark, Apache Airflow, Kafka, Hadoop, dbt, Informatica

DevOps: Docker, Kubernetes, Jenkins, Terraform, GitHub Actions

Databases: MySQL, PostgreSQL, DynamoDB, MongoDB, Snowflake

Data Visualization: Tableau, Power BI, Amazon QuickSight

WORK EXPERIENCE

Data Engineer

AKS Infotech Inc, NJ, US Remote Jun 2024 - Present

• Engineered robust data pipelines using Apache Airflow and Databricks for a retail client, processing daily e-commerce transactional data across 20+ microservices and achieving 99.95% uptime.

• Implemented ETL processes using AWS Glue and Databricks Auto Loader for customer purchase history and inventory management data, reducing processing time by 40% and improving data quality.

• Optimized Spark jobs on Databricks, resulting in a 30% reduction in processing time for large-scale data transformations.

• Developed Python frameworks for data integration and implemented Infrastructure as Code with Terraform for AWS resource management.

Data Engineer

Digital Sky 360, Ahmedabad, India On-site May 2020 - June 2022

• Engineered real-time IoT data streaming architecture using Apache Kafka and AWS Kinesis, processing 5k+ events per second from industrial sensors, reducing latency by 28% for critical business metrics including equipment performance monitoring.

• Implemented comprehensive data quality checks and governance policies for IoT data, including validation, metadata management, and access controls, improving overall data accuracy by 15% across all pipelines.

• Containerized data processing applications using Docker and Kubernetes, enabling seamless deployment across development and production environments, reducing deployment failures by 65% and MTTR by 43%.

PROJECTS

ShopSense - Real-Time Fashion Analytics Pipeline

• Developed a scalable data pipeline using AWS Kinesis, Lambda, Databricks Delta Lake, and S3 to analyze the ASOS public dataset (15 GB) containing 50K+ daily clickstream events, enabling real-time customer behavior analysis and personalized recommendations.

• Orchestrated the ETL workflow using Apache Airflow with automated data transformations and quality checks using AWS Glue, Databricks, and dbt, reducing manual interventions by 40%.

SentiTrack - Social Media Brand Perception Analyzer

• Built an end-to-end pipeline to collect and analyze Nike and Adidas public Twitter data (25 GB dataset) using the Twitter API, AWS Glue, and AWS Redshift, processing 50K tweets daily for brand sentiment analysis.

• Designed a star schema in Redshift with automated alerting via Amazon SNS for negative sentiment spikes, reducing average query time by 50% and enabling faster response to potential PR issues.

GroceryLens - Retail Analytics Platform

• Designed a star schema data model in Snowflake using Instacart's public dataset (8 GB), integrating purchase history, product inventory, and customer loyalty information from multiple retail channels.

• Built ETL workflows using Databricks and PySpark, processing 100K+ daily transactions with Delta Lake partitioning for optimized storage and enhanced query performance for sales trend visualization.

COURSES AND CERTIFICATIONS

• AWS Cloud Practitioner Essentials offered by AWS Training in May 2024

• DevOps Foundations offered by SKILLup IT/DevOps Institute in March 2023
