Title: Data Engineer (Python, Spark, AWS - AI Exposure)
Location: Columbus, OH - Hybrid (3 Days Onsite / 2 Days Remote)
Duration: 6+ months (possibility of an extension)
Implementation Partner: Tekwings
End Client: To be disclosed
JD:
We are seeking a Data Engineer with strong expertise in Python, Apache Spark, and AWS, along with exposure to AI/ML data pipelines, to support scalable data processing and analytics initiatives.
The ideal candidate will work closely with data scientists, AI engineers, and business stakeholders to design, build, and optimize high-performance data pipelines that enable analytics and AI-driven use cases.
Key Responsibilities
Design, develop, and maintain scalable data pipelines using Python and Apache Spark
Build and optimize ETL/ELT workflows for structured and semi-structured data
Develop data processing jobs for batch and near real-time workloads
Integrate data from multiple sources including APIs, databases, and cloud storage
Support AI/ML workflows by preparing, transforming, and validating datasets
Collaborate with Data Scientists to enable feature engineering and model training pipelines
Implement data quality checks, validation rules, and monitoring processes
Optimize Spark jobs for performance, scalability, and cost efficiency
Deploy and manage data solutions on AWS cloud infrastructure
Participate in code reviews, maintain documentation, and follow engineering best practices
Work in an Agile environment and support production data issues as needed
Required Skills & Experience
Strong experience with Python for data engineering
Hands-on experience with Apache Spark (PySpark)
Solid experience working with AWS services, including:
S3
EC2
Glue
Lambda
EMR (preferred)
Experience with SQL and relational databases
Strong understanding of data modeling, data warehousing, and analytics concepts
Experience building and maintaining large-scale data pipelines
Familiarity with CI/CD and version control (Git)
AI / ML Exposure (Preferred)
Exposure to AI/ML data pipelines and workflows
Experience supporting feature engineering for ML models
Understanding of how data is prepared for model training and inference
Familiarity with ML tools or frameworks is a plus (not mandatory)
Nice-to-Have Skills
Experience with streaming technologies (Kafka, Spark Streaming)
Experience with Airflow or similar orchestration tools
Knowledge of data lakes and lakehouse architectures
Exposure to Docker or containerized environments
Experience working in regulated or enterprise-scale environments
Ideal Candidate Profile
Strong problem-solving and analytical skills
Comfortable working in a hybrid onsite/remote arrangement
Able to collaborate effectively with cross-functional teams
Proactive, detail-oriented, and delivery-focused
Clear communicator with strong documentation skills