About the Client
ARA's Client is a leading technology-driven organization focused on building scalable, data-centric solutions that power business intelligence and digital transformation. The company fosters innovation, collaboration, and continuous learning, enabling teams to work on cutting-edge data platforms and enterprise-scale systems.
Role Summary
We are seeking a highly skilled Data Engineer with strong PySpark expertise to design, build, and optimize large-scale data pipelines. This role involves working with distributed data systems, ensuring high data quality, and enabling seamless data flow across platforms. The ideal candidate will combine strong technical expertise with leadership capabilities to drive data engineering best practices.
Key Responsibilities
Design, develop, and maintain scalable data pipelines using PySpark
Build and optimize ETL processes for efficient data ingestion and transformation
Ensure data quality, integrity, and governance across systems
Collaborate with cross-functional teams to define data requirements and solutions
Lead technical decision-making and contribute to architectural discussions
Troubleshoot and optimize data workflows for performance and reliability
Mentor junior engineers and promote knowledge sharing
Ensure compliance with data governance and security standards
Must-Have Qualifications
5+ years of experience in Data Engineering
Strong hands-on experience with PySpark
Experience with distributed data processing frameworks
Solid understanding of ETL processes and data integration techniques
Experience with cloud-based data platforms (AWS / Azure / GCP)
Strong problem-solving and debugging skills
Nice to Have
Experience with data warehousing solutions
Familiarity with workflow orchestration tools (e.g., Airflow)
Knowledge of big data ecosystems (Hadoop, Hive, etc.)
Exposure to real-time data processing frameworks
Tier 2 locations preferred.