Job Description
Benefits:
Dental insurance
Health insurance
Paid time off
Python Developer
Onsite Role
Charlotte NC
Key Responsibilities
Build and maintain large-scale data processing pipelines using Apache Spark for batch and streaming data.
Design and implement ML training and inference workflows using PyTorch and integrate them into production systems.
Develop and orchestrate ETL and ML pipelines with Apache Airflow, ensuring reliability, scalability, and observability.
Optimize the performance of data pipelines and ML model training on distributed clusters.
Collaborate with Data Scientists and ML Engineers to productize models and deploy them into production environments.
Implement best practices for code quality, CI/CD, unit testing, and monitoring.
Ensure data quality, integrity, and security across all pipelines.
Troubleshoot performance bottlenecks and optimize resource utilization.
Stay up to date with advancements in ML frameworks, distributed computing, and workflow orchestration tools.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of professional Python development experience, with strong object-oriented programming and software engineering fundamentals.
Hands-on experience with PyTorch for model training and inference.
Deep understanding of Apache Spark for distributed data processing (PySpark or Scala is a plus).
Strong experience with Apache Airflow for workflow orchestration in production environments.
Proficiency in SQL and experience working with relational and NoSQL databases.
Experience with Docker, Kubernetes, and cloud platforms (AWS/GCP/Azure).
Familiarity with data versioning and ML model lifecycle management (MLflow or similar).
Strong problem-solving and debugging skills in distributed systems.
Preferred Skills
Experience with real-time data processing frameworks (Kafka, Flink).
Knowledge of feature stores, data lake architectures, and Delta Lake.
Familiarity with MLOps practices (CI/CD for ML, model registry, automated retraining).
Experience with GPU-accelerated ML training and performance optimization.
Contributions to open-source ML or data engineering projects.
Flexible work from home options available.
Full-time
Hybrid remote