Big Data Engineer

Location:

Hyderabad, Telangana, India

Posted:

October 17, 2025

Resume:

Anudeep Pulluri

Mobile: +1-913-***-**** Email: **************.******@*****.*** Address: Chantilly, VA, USA

Professional Summary

●Senior Data Engineer with 3+ years of experience designing, developing, and deploying scalable ETL and data pipelines for batch and streaming data, using Java (mid-senior level) and Python/PySpark (mid level).

●Expert knowledge of Apache Spark (DataFrames, Spark SQL, Spark Streaming) for executing complex ETL transformations and turning raw Change Data Capture (CDC) output into queryable data lake tables.

●Extensive hands-on experience with the AWS Big Data ecosystem, including EMR, EMR Serverless, Glue Data Catalog, and S3 operations (CRUD), used to build robust, scalable cloud-native data solutions.

●Proven ability to design and automate data workflows using Apache Airflow (MWAA) and AWS Step Functions, coordinating complex multi-step processes and using AWS Lambda functions and AWS Batch for efficient task execution.

●Strong understanding of Big Data concepts and performance tuning, coupled with foundational experience in algorithms and data structures, ensuring efficient and resilient data systems.

Professional Skills

●Big Data Frameworks: Apache Spark (DataFrames, SQL, Streaming), Change Data Capture (CDC), ETL / ELT, Big Data Concepts, Performance Tuning

●Programming Languages: Java (Mid-Senior Level), Python (Mid Level / PySpark), Scala (Familiarity)

●AWS Big Data & Orchestration: EMR & EMR Serverless, S3 & S3 Operations (CRUD), Glue Data Catalog, MWAA (Apache Airflow), Step Functions, AWS Lambda (Python), AWS Batch

●Databases & Tools: Relational Databases (SQL), NoSQL (Principles), Apache Hudi (Familiarity), AWS Deequ (Familiarity), Git, Apache Airflow

Work Experience

Data & Application Engineering (Focus on Scalable ETL and Big Data)
DBS Tech – Hyderabad, India | January 2021 – January 2024

●Applied expertise in Apache Spark (PySpark/Java) to design and implement ETL jobs and streaming data pipelines that processed high-volume raw data and transformed it into clean, queryable datasets for analytics.

●Developed deep knowledge of Change Data Capture (CDC) from multiple relational databases and architected downstream solutions to hydrate a centralized data lake efficiently and reliably.

●Managed AWS Big Data services, including EMR clusters and Glue Data Catalog, tuning their configuration for Spark DataFrame jobs and high-throughput data processing workloads.

●Automated complex data ingestion and transformation workflows using orchestration tools like MWAA (Apache Airflow) and AWS Step Functions, significantly reducing manual intervention and increasing pipeline reliability.

●Maintained and optimized data storage layers within Amazon S3, demonstrating extensive knowledge of S3 operations (CRUD) and lifecycle policies to manage vast datasets securely and cost-effectively.

●Wrote clean, high-quality, testable code in Java and Python for core data services and utility Lambda functions, focusing on system resilience, stability, and high performance.

●Contributed to code reviews and technical documentation, adhering to engineering standards for big data applications and applying principles of performance tuning to Spark jobs.

Projects

Full Stack Virtual Book Web App (Java/Data Management)

●Designed a full-stack solution utilizing Java and relational databases, applying fundamental principles of data integrity and transactional processes relevant to CDC.

Project Management Dashboard (Python/DataFrames & Automation)

●Developed complex data analysis and automated calculations using Python (Pandas), demonstrating proficiency in data manipulation and transformation analogous to using Spark DataFrames for ETL.

Multiple Disease Detection Web App (Algorithm & Scalability Foundation)

●Applied fundamental algorithms and data structures to improve predictions, demonstrating a commitment to performance optimization relevant to tuning Big Data job execution.

Education

University of Central Missouri, Warrensburg, Missouri
Master of Science (MSc) in Computer Science

Certifications

●AWS Certified Solutions Architect – Associate (Demonstrates experience with core AWS services)

●Google Cloud Professional Cloud Architect (Demonstrates multi-cloud architectural knowledge)

Awards & Leadership

●Best Project Award at DBS Tech India for developing an innovative and efficient web application that improved operational efficiency and user experience.

●Served as a mentor at Smart Interviews, providing technical guidance to over fifty students, showcasing strong communication and collaboration skills vital for cross-functional data teams.
