Total Experience: 8-10 years.
Python and PySpark.
SUMMARY
Develop and optimize ETL pipelines using Python, PySpark, and PySpark notebooks on AWS EMR.
Demonstrate a solid understanding of Spark Resilient Distributed Datasets (RDDs), DataFrames, and Datasets (the first sketch after this list illustrates the RDD and DataFrame APIs).
Work with large-scale datasets and build distributed computing solutions.
Design and implement data ingestion, transformation, and processing workflows using IICS (Informatica Intelligent Cloud Services) jobs.
Write efficient and scalable Python code for data processing (see the second sketch after this list).
Collaborate with data engineers, data scientists, and business teams to deliver insights.
Optimize performance and cost efficiency for big data solutions.
Implement best practices for CI/CD, testing, and automation in a cloud environment.
Monitor job performance, troubleshoot failures, and tune queries (the third sketch after this list shows common starting points).
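As a rough illustration of the RDD and DataFrame APIs named above, here is a minimal PySpark sketch. It assumes a local SparkSession with hypothetical data; on AWS EMR the cluster typically provides the session (for example, inside a PySpark notebook). Note that the typed Dataset API exists only in Scala and Java, so PySpark work centers on RDDs and DataFrames.

# Minimal sketch contrasting Spark's RDD and DataFrame APIs in PySpark.
# Assumes a local SparkSession; the sample data is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-vs-dataframe").getOrCreate()

# RDD: low-level, untyped records, manually specified transformations.
rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)])
totals_rdd = rdd.reduceByKey(lambda x, y: x + y)

# DataFrame: declarative, columnar, optimized by Spark's Catalyst planner.
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
totals_df = df.groupBy("key").sum("value")

print(totals_rdd.collect())
totals_df.show()

spark.stop()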
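On writing efficient Python code for Spark, one common guideline is to prefer Spark's built-in column functions over Python UDFs, since built-ins execute inside the JVM and avoid per-row serialization to Python workers. A small sketch, again with hypothetical data:

# Sketch: prefer built-in column functions over Python UDFs for speed.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("builtin-vs-udf").getOrCreate()

df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

# Slower pattern: a Python UDF ships every row out to a Python worker.
upper_udf = udf(lambda s: s.upper(), StringType())
df.withColumn("upper", upper_udf("name")).show()

# Faster pattern: the equivalent built-in runs natively in the JVM.
df.withColumn("upper", F.upper("name")).show()

spark.stop()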
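For monitoring and tuning, a sketch of common starting points: inspecting the physical plan with explain(), caching an intermediate result that is reused downstream, and controlling partition counts before a write. The dataset, column names, and output path here are hypothetical.

# Sketch of basic tuning levers in PySpark; data and paths are made up.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning").getOrCreate()

orders = spark.createDataFrame(
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 7.5)],
    ["order_id", "region", "amount"],
)

# explain() prints the physical plan, a first stop when tuning a query.
by_region = orders.groupBy("region").agg(F.sum("amount").alias("total"))
by_region.explain()

# Cache an intermediate result that several downstream jobs reuse.
by_region.cache()
by_region.count()  # an action that materializes the cache

# Coalesce to fewer, larger partitions before writing a small output.
by_region.coalesce(1).write.mode("overwrite").parquet("/tmp/totals")

spark.stop()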
Skills: Python and PySpark.
Contract