Data Scientist

Company:

CloudTech Innovations

Location:

Dallas, TX

Pay:

$100,000 - $120,000 USD per year

Posted:

August 22, 2025

Description:

Job Title: Data Scientist – Machine Learning, Big Data, GenAI (8–10 Years Experience)

Location: Remote

Employment Type: Contract

About the Role

We are seeking a highly experienced Data Scientist with 8–10 years of expertise delivering production-grade AI/ML solutions at scale. This role requires deep technical proficiency in Machine Learning, Big Data, Generative AI, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG), combined with hands-on cloud experience (AWS, Azure, or GCP) and migration expertise for modernizing data and AI platforms.

The ideal candidate can lead projects end-to-end, from architecture design to deployment, while mentoring teams, optimizing for performance and cost, and ensuring alignment with business objectives.

Key Responsibilities

Develop and deliver end-to-end ML/AI solutions in cloud-native environments, from design through deployment and monitoring.

Architect and implement Generative AI solutions leveraging LLMs (e.g., GPT, LLaMA, Claude, Mistral) and RAG pipelines with vector search.

Build and optimize Big Data pipelines using Apache Spark, PySpark, and Delta Lake integrated with cloud storage (AWS S3, Azure Data Lake, GCP Cloud Storage).

Design and maintain data lakehouse architectures with Databricks, Snowflake, or Delta Lake.

Deploy scalable MLOps pipelines using MLflow, SageMaker, Azure ML, or Vertex AI with Docker, Kubernetes (EKS, AKS, GKE), and CI/CD.

Implement and manage vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB) for RAG applications.

Oversee ETL/ELT workflows and pipeline orchestration using Airflow, dbt, or Azure Data Factory.

Lead migration projects (on-prem to cloud, cross-cloud, or legacy platform upgrades such as Hadoop to Databricks or Hive to Delta Lake), ensuring data integrity and minimal downtime.

Integrate streaming data solutions using Apache Kafka and real-time analytics frameworks.

Conduct feature engineering, hyperparameter tuning, and model optimization for performance and scalability.

Mentor junior data scientists and guide best practices for AI/ML development and deployment.

Collaborate with product, engineering, and executive teams to align AI solutions with business KPIs and compliance requirements.

Required Skills & Experience

8–10 years in data science, machine learning, and AI/ML solution delivery.

Strong hands-on expertise in at least one major cloud platform (AWS, Azure, or GCP) with proven production deployments.

Proficiency in Python, PySpark, and SQL.

Proven experience with Apache Spark, Hadoop ecosystem, and Big Data processing.

Hands-on experience with Generative AI tooling such as Hugging Face Transformers, LangChain, or LlamaIndex.

Expertise in RAG architectures and vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB).

Experience with MLOps workflows using MLflow, Docker, Kubernetes, and CI/CD tools (Jenkins, GitHub Actions, GitLab CI).

Migration experience involving AI/ML workloads, big data pipelines, and data platforms to modern cloud-based architectures.

Knowledge of data services (AWS S3, Redshift; Azure Synapse; GCP BigQuery) and infrastructure-as-code (Terraform, CloudFormation, ARM templates).

Familiarity with streaming technologies (Kafka) and query engines (Hive, Presto, Trino).

Strong foundation in statistics, probability, and ML algorithms.

Preferred Qualifications

Experience with knowledge graphs and semantic search.

Background in NLP, transformer architectures, and deep learning frameworks (TensorFlow, PyTorch).

Exposure to BI tools (Power BI, Tableau, Looker).

Domain expertise in finance, healthcare, or e-commerce.

Fully remote
