We are seeking an experienced Data Engineer to design, build, and maintain scalable data pipelines and distributed data systems. The ideal candidate will have strong expertise in Python, PySpark, and AWS-based data platforms, along with solid experience in data modeling and ETL frameworks.
Location: Columbus, OH
Work Mode: Onsite
Employment Type: Contract
Preferred / Nice to Have
Experience with Airflow or other workflow orchestration tools
Knowledge of Kafka, Kinesis, or other streaming data platforms
Experience with Docker/Kubernetes
Exposure to Delta Lake, Iceberg, or Hudi
Required Skills & Qualifications
Strong proficiency in Python for data processing and pipeline development
Hands-on experience with Apache Spark (PySpark preferred)
Solid experience with AWS services such as S3, Glue, EMR, Redshift, Athena, and Lambda
Strong experience with SQL and relational/non-relational databases
Knowledge of data modeling, data warehousing concepts, and ETL frameworks
Experience working with large-scale distributed data systems
Familiarity with CI/CD pipelines and Git
Strong analytical, problem-solving, and communication skills