RQ00111 - Senior Big Data Engineer - Location: Rockville, MD or Tysons, VA
A face-to-face (F2F) interview is required after prescreening.
Requirements:
Sr. Big Data Engineer – AWS, Hadoop, Athena, Glue Data Catalog, Lake Formation, Trino, EKS, AI, CI/CD & MCP Server
We are looking for a Senior Big Data Engineer who thinks beyond tickets and task execution. This role is for someone who questions the “why,” modernizes tech stacks, optimizes performance, and drives outcomes rather than simply executing tasks.
What You’ll Do
• Design and build scalable data lakes and pipelines on AWS using cloud-native and automated solutions.
• Enable fast, federated analytics using Amazon Athena and Trino, with performance tuning for large-scale queries.
• Manage metadata, schemas, and discovery using AWS Glue Data Catalog.
• Implement fine-grained data access and governance using AWS Lake Formation, KMS encryption, and SSL.
• Build and operate data services on EKS (Kubernetes).
• Work with the Hadoop ecosystem (Spark, Hive, HDFS) using partitioning, bucketing, and columnar formats (Parquet, ORC).
• Troubleshoot and resolve complex big data issues across pipelines, clusters, and queries.
• Design, implement, and maintain CI/CD pipelines using Jenkins or similar tools.
• Monitor pipelines and clusters using CloudWatch and Grafana.
• Prepare high-quality datasets for AI/ML use cases.
• Build, configure, and operate an MCP (Model Context Protocol) server for AI/ML integration.
• Collaborate in Scrum teams; proactively identify gaps and propose out-of-the-box, scalable solutions.
What We’re Looking For
• 8+ years of experience in Big Data / Data Engineering.
• Strong hands-on experience with AWS (S3, Glue Data Catalog, Lake Formation, Athena, EMR, EKS).
• Proven experience with Trino (or Presto) and optimizing query performance.
• Working knowledge of Kubernetes / EKS for data workloads.
• Strong SQL, Python, and shell scripting skills.
• Experience with CI/CD pipelines and Jenkins.
• Experience building and configuring an MCP server for AI/ML integration.
• Ownership mindset: a problem solver, not just a task executor.
Good to Have
• AI/ML data pipeline exposure
• Cloud-native data modernization experience