Nandhini N
+1-586-***-**** | **************@*****.*** | Santa Clara, CA
Work Authorization: L2 Visa with EAD (Eligible to work in the U.S.)
Professional Summary
Big Data Developer with 4.5+ years of experience designing large-scale ETL pipelines using PySpark, Hadoop, Hive, and AWS (EMR, S3). Skilled in Spark RDD/DataFrame APIs, schema evolution in Hive (Avro), and integrating Sqoop with RDBMS and S3.
Technical Skills
Big Data: Apache Spark, PySpark, Hadoop (HDFS, YARN, MapReduce)
Cloud & Orchestration: AWS (EMR, S3), Airflow, NiFi
Tools: Hive, Sqoop
Databases: SQL Server, MySQL
Languages: Python, SQL
Operating Systems: Windows
Work Experience
1. Cognizant Technology Solutions - Big Data Developer, Oct 2020 - Feb 2025
Client: Abbott
• Developed large-scale distributed data pipelines using PySpark on AWS EMR, processing terabytes of data.
• Designed and implemented end-to-end ETL processes using Apache Spark, with a focus on scalability and performance.
• Created and managed Spark RDDs and DataFrames for complex data transformations, leveraging caching and persistence for optimized performance.
• Integrated Spark SQL with Hive and Hadoop, enabling efficient query execution and improved data workflow performance.
• Tuned Spark applications through configuration optimization (memory allocation, serialization, caching), leading to significant runtime improvements.
• Developed custom Spark functions to handle complex business logic and transformation rules.
• Worked with a variety of data serialization formats (Avro, Parquet, JSON) and schema evolution techniques in Hive.
• Experienced in Hive data modeling: created partitioned and bucketed tables for faster querying of large datasets.
• Developed PySpark applications on AWS EMR, integrating with S3 and other AWS services for cloud-native big data solutions.
• Proficient in Sqoop for data ingestion between Hadoop and relational databases (MySQL, SQL Server), including incremental imports and custom queries.
• Designed complex data integration solutions and automated ETL workflows using Sqoop, Spark, and Hive.
• Performed ETL testing covering aggregations, joins, filtering, and sorting to ensure data quality and accuracy.
• Identified and resolved ETL performance bottlenecks, ensuring pipeline reliability at scale (see the pipeline sketch after this list).
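A minimal sketch of the PySpark-on-EMR pattern described above (read from S3, cache an intermediate DataFrame, aggregate, write a partitioned Hive table). All bucket paths, column names, and table names are illustrative placeholders, not actual project assets:

    # Assumes an EMR cluster with Hive support enabled; every identifier below is hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("example_etl")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw Parquet data from S3 into a DataFrame.
    raw = spark.read.parquet("s3://example-raw-bucket/events/")

    # Cache the cleansed DataFrame because several downstream aggregations reuse it.
    cleansed = (raw
                .withColumn("event_date", F.to_date("event_ts"))
                .filter(F.col("amount").isNotNull())
                .cache())

    # Daily aggregate consumed by reporting jobs.
    daily = (cleansed
             .groupBy("event_date", "source_system")
             .agg(F.sum("amount").alias("total_amount"),
                  F.count("*").alias("record_count")))

    # Persist as partitioned Parquet exposed through a Hive table.
    (daily.write
          .mode("overwrite")
          .format("parquet")
          .partitionBy("event_date")
          .saveAsTable("analytics.daily_events"))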
2. Accenture - Senior Big Data Developer (Analyst), Feb 2025 - Present
Client: BCBS - MI
• Expertise in applying Spark RDD transformations and actions (filter, map, reduce, groupBy, aggregate) to process large-scale structured and unstructured datasets.
• Strong troubleshooting skills with Spark RDD, including resolution of performance bottlenecks, data processing errors, and scalability issues.
• Proficient in Spark DataFrame operations: column manipulation (add, rename, drop), type casting, and null handling for robust schema management.
• Skilled in writing complex SQL queries and DataFrame logic using Spark SQL to solve business use cases efficiently.
• Experienced in data serialization formats such as Avro, Parquet, and ORC, and their use in Spark-based distributed data pipelines.
• Designed and implemented Spark SQL queries for querying, aggregation, and data analysis.
• Integrated AWS S3 with PySpark for processing large datasets in cloud-native big data environments.
• Hands-on experience with semi-structured data processing in Hive using Avro, Parquet, and ORC formats.
• Deep understanding of Hive table formats and their trade-offs based on use cases and performance requirements.
• Optimized Hive query performance using partitioning, bucketing, indexing, and caching techniques.
• Designed and managed Avro schemas for Hive tables, ensuring forward and backward schema evolution support.
• Configured and maintained Sqoop jobs for incremental imports/exports using features like last modified checks and custom SQL queries.
• Used Sqoop to move data between Hadoop, SQL Server/MySQL, and AWS S3, supporting CSV, Avro, and Parquet file formats.
• Experienced in writing custom Sqoop queries and stored procedures for advanced data ingestion scenarios.
• Tuned Sqoop performance with custom mappers and batch sizes for low-latency, high-throughput ETL pipelines.
• Conducted thorough ETL testing and validation on Hadoop and Spark platforms to ensure data accuracy and pipeline reliability (see the DataFrame sketch after this list).
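A minimal sketch of the DataFrame schema-management and Spark SQL patterns listed above (rename, cast, null handling, drop, then an ORC-backed Hive write). Every column, view, and table name here is an illustrative placeholder:

    # All identifiers below are hypothetical; the operations mirror the bullet points above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DecimalType

    spark = (SparkSession.builder
             .appName("example_schema_mgmt")
             .enableHiveSupport()
             .getOrCreate())

    # Load ORC data from S3.
    df = spark.read.format("orc").load("s3://example-bucket/members/")

    shaped = (df
              .withColumnRenamed("mbr_id", "member_id")                                   # rename a column
              .withColumn("claim_amt", F.col("claim_amt").cast(DecimalType(12, 2)))       # type cast
              .withColumn("plan_code", F.coalesce(F.col("plan_code"), F.lit("UNKNOWN")))  # null handling
              .drop("tmp_load_flag"))                                                     # drop a helper column

    # Register a temp view and run a Spark SQL aggregation over it.
    shaped.createOrReplaceTempView("members_stg")
    summary = spark.sql("""
        SELECT plan_code,
               COUNT(*)       AS member_count,
               SUM(claim_amt) AS total_claims
        FROM members_stg
        GROUP BY plan_code
    """)

    # Persist the result as an ORC-backed Hive table.
    (summary.write
            .mode("overwrite")
            .format("orc")
            .saveAsTable("analytics.plan_summary"))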
Education
B.E. Biomedical Engineering
Sri Ramakrishna Engineering College, Coimbatore | 2016 - 2020