Nandhini N
+1-586-***-**** | **************@*****.*** | Santa Clara, CA
Work Authorization: L2 Visa with EAD (Eligible to work in the U.S.)
Professional Summary
Big Data Developer with 4.5+ years of experience designing large-scale ETL pipelines using PySpark, Hadoop, Hive, and AWS (EMR, S3). Skilled in Spark RDD/DataFrame APIs, schema evolution in Hive (Avro), and integrating Sqoop with RDBMS and S3.
Technical Skills
Big Data: Apache Spark, PySpark, Hadoop (HDFS, YARN, MapReduce)
Cloud & Orchestration: AWS (EMR, S3), Airflow, NiFi
Tools: Hive, Sqoop
Databases: SQL Server, MySQL
Languages: Python, SQL
Operating Systems: Windows
Work Experience
1. Cognizant Technology Solutions - Big Data Developer, Oct 2020 - Feb 2025
Client: Abbott
• Developed large-scale distributed data pipelines using PySpark on AWS EMR, processing terabytes of data.
• Designed and implemented end-to-end ETL processes using Apache Spark, with a focus on scalability and performance.
• Created and managed Spark RDDs and DataFrames for complex data transformations, leveraging caching and persistence for optimized performance.
• Integrated Spark SQL with Hive and Hadoop, enabling efficient query execution and improved data workflow performance.
• Tuned Spark applications through configuration optimization (memory allocation, serialization, caching), leading to significant runtime improvements.
• Developed custom Spark functions to handle complex business logic and transformation rules.
• Worked with a variety of data serialization formats (Avro, Parquet, JSON) and schema evolution techniques in Hive.
• Experienced in Hive data modeling: created partitioned and bucketed tables for faster querying of large datasets.
• Developed PySpark applications on AWS EMR, integrating with S3 and other AWS services for cloud-native big data solutions.
• Proficient in Sqoop for data ingestion between Hadoop and relational databases (MySQL, SQL Server), including incremental imports and custom queries.
• Designed complex data integration solutions and automated ETL workflows using Sqoop, Spark, and Hive.
• Performed ETL testing covering aggregations, joins, filtering, and sorting to ensure data quality and accuracy.
• Identified and resolved ETL performance bottlenecks, ensuring pipeline reliability at scale (see the pipeline sketch after this list).
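A minimal sketch of the PySpark-on-EMR pattern described above (read from S3, cache an intermediate DataFrame, aggregate, write a partitioned Hive table). All bucket paths, column names, and table names are illustrative placeholders, not actual project assets:

    # Assumes an EMR cluster with Hive support enabled; every identifier below is hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("example_etl")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw Parquet data from S3 into a DataFrame.
    raw = spark.read.parquet("s3://example-raw-bucket/events/")

    # Cache the cleansed DataFrame because several downstream aggregations reuse it.
    cleansed = (raw
                .withColumn("event_date", F.to_date("event_ts"))
                .filter(F.col("amount").isNotNull())
                .cache())

    # Daily aggregate consumed by reporting jobs.
    daily = (cleansed
             .groupBy("event_date", "source_system")
             .agg(F.sum("amount").alias("total_amount"),
                  F.count("*").alias("record_count")))

    # Persist as partitioned Parquet exposed through a Hive table.
    (daily.write
          .mode("overwrite")
          .format("parquet")
          .partitionBy("event_date")
          .saveAsTable("analytics.daily_events"))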
2. Accenture - Senior Big Data Developer (Analyst), Feb 2025 - Present
Client: BCBS - MI
• Expertise in applying Spark RDD transformations and actions (filter, map, reduce, groupBy, aggregate) to process large-scale structured and unstructured datasets.
• Strong troubleshooting skills with Spark RDD, including resolution of performance bottlenecks, data processing errors, and scalability issues.
• Proficient in Spark DataFrame operations: column manipulation (add, rename, drop), type casting, and null handling for robust schema management.
• Skilled in writing complex SQL queries and DataFrame logic using Spark SQL to solve business use cases efficiently.
• Experienced in data serialization formats such as Avro, Parquet, and ORC, and their use in Spark-based distributed data pipelines.
• Designed and implemented Spark SQL queries for querying, aggregation, and data analysis.
• Integrated AWS S3 with PySpark for processing large datasets in cloud-native big data environments.
• Hands-on experience with semi-structured data processing in Hive using Avro, Parquet, and ORC formats.
• Deep understanding of Hive table formats and their trade-offs based on use cases and performance requirements.
• Optimized Hive query performance using partitioning, bucketing, indexing, and caching techniques.
• Designed and managed Avro schemas for Hive tables, ensuring forward and backward schema evolution support.
• Configured and maintained Sqoop jobs for incremental imports/exports using features like last modified checks and custom SQL queries.
• Used Sqoop to move data between Hadoop, SQL Server/MySQL, and AWS S3, supporting CSV, Avro, and Parquet file formats.
• Experienced in writing custom Sqoop queries and stored procedures for advanced data ingestion scenarios.
• Tuned Sqoop performance with custom mappers and batch sizes for low-latency, high-throughput ETL pipelines.
• Conducted thorough ETL testing and validation on Hadoop and Spark platforms to ensure data accuracy and pipeline reliability (see the DataFrame sketch after this list).
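A minimal sketch of the DataFrame schema-management and Spark SQL patterns listed above (rename, cast, null handling, drop, then an ORC-backed Hive write). Every column, view, and table name here is an illustrative placeholder:

    # All identifiers below are hypothetical; the operations mirror the bullet points above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DecimalType

    spark = (SparkSession.builder
             .appName("example_schema_mgmt")
             .enableHiveSupport()
             .getOrCreate())

    # Load ORC data from S3.
    df = spark.read.format("orc").load("s3://example-bucket/members/")

    shaped = (df
              .withColumnRenamed("mbr_id", "member_id")                                   # rename a column
              .withColumn("claim_amt", F.col("claim_amt").cast(DecimalType(12, 2)))       # type cast
              .withColumn("plan_code", F.coalesce(F.col("plan_code"), F.lit("UNKNOWN")))  # null handling
              .drop("tmp_load_flag"))                                                     # drop a helper column

    # Register a temp view and run a Spark SQL aggregation over it.
    shaped.createOrReplaceTempView("members_stg")
    summary = spark.sql("""
        SELECT plan_code,
               COUNT(*)       AS member_count,
               SUM(claim_amt) AS total_claims
        FROM members_stg
        GROUP BY plan_code
    """)

    # Persist the result as an ORC-backed Hive table.
    (summary.write
            .mode("overwrite")
            .format("orc")
            .saveAsTable("analytics.plan_summary"))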
Education
B.E. Biomedical Engineering
Sri Ramakrishna Engineering College, Coimbatore | 2016 - 2020