Anjan Depuru
Charlotte, NC +1-803-***-**** ***********@*****.*** github.com/Anjandepuru/
SUMMARY
Data Engineer with 3+ years of experience architecting and automating scalable ETL pipelines using Apache Spark, Hadoop, and Azure/AWS services. Experienced in building high-performance data architecture and infrastructure that powers business intelligence, improves data accessibility, supports regulatory compliance, and increases operational efficiency.
EDUCATION
Clemson University, Master of Computer Science GPA: 3.8/4 May 2024
Sri Venkateswara University, Bachelor of Technology in Computer Science GPA: 7.0/10 June 2020
TECHNICAL SKILLS
Languages: Python, Java, R, Scala, SAS, HTML, CSS, JavaScript
Tools: Power BI, Tableau, MySQL, SQL Server, MongoDB, dbt, Snowflake, PostgreSQL, Jupyter, Jenkins, Hadoop, Splunk, Docker, Git, ETL, Kafka, Databricks
Methodologies: Agile, Scrum, Project Management
PROFESSIONAL EXPERIENCE
American Airlines, United States – Data Engineer (Contract) July 2024 – Present
• Designed scalable ETL pipelines using Azure Data Factory and Databricks (PySpark) with a Bronze–Silver–Gold Delta Lake architecture, reducing processing time by 25% and optimizing resource usage by 50%.
• Optimized data models and schemas in Azure Synapse and Delta Lake using partitioning, Z-Ordering, and query tuning, boosting data retrieval and reporting performance by 30%.
• Automated data pipeline monitoring and alerting using Apache Airflow, reducing downtime by 50% and ensuring high availability of critical workflows.
Tech Mahindra, India – Data Engineer September 2020 – June 2022
• Migrated legacy ETL workflows to Azure Data Factory (ADF) and Databricks pipelines in the automotive domain, improving scalability and data accuracy and reducing redundant processes by 67%.
• Implemented CI/CD pipelines for Azure Databricks using Git, automating code versioning, continuous integration, and deployment to streamline the development lifecycle.
Mphasis, India – Data Analyst February 2020 – August 2020
• Designed and executed complex SQL queries to extract and transform data from Excel, REST APIs, and flat files, enabling centralized reporting and accelerating decision-making across sales and finance teams.
• Built interactive Power BI dashboards for regional automotive sales using DAX, dynamic filters, and drill-through reports, enabling stakeholders to identify underperforming regions and optimize marketing strategies.
ACADEMIC PROJECT EXPERIENCE
Data Warehouse Design for Retail Analytics – SQL, AWS Redshift, Tableau January 2023 – May 2023
• Designed a data warehouse on AWS Redshift, processing 1M+ transactions and improving query performance.
• Developed ETL processes with SQL and Python, ensuring data quality, and created 5 Tableau dashboards.
Real-time Data Processing System – Apache Kafka, Apache Flink, MongoDB August 2022 – December 2022
• Built a real-time pipeline with Apache Kafka and Flink, processing 500K+ events/sec with 99.9% availability.
• Integrated real-time data into MongoDB, enabling faster analytics and reducing decision-making time.
Big Data Analytics for Social Media – Hadoop, Spark, Python February 2020 – May 2020
• Developed a big data analytics platform using Hadoop and Spark, analyzing social media data from 1M users.
• Created data ingestion pipelines with Python and Spark, reducing data processing time by 50%.
CERTIFICATIONS
• Databricks Data Engineer Associate (Databricks)
• Python Data Analytics (Udemy)
• Gen AI Fundamentals (NVIDIA)
• Apache Airflow (DataCamp)
LEADERSHIP AND ORGANIZATION EXPERIENCE
• Student Mentor and Volunteer, Graduate Student Assembly (GSA) at Clemson University