
Data Engineer

Company:
Schemata
Location:
San Francisco, CA
Posted:
May 21, 2025

Description:

We are seeking a highly skilled Data Engineer to join our dynamic, early-stage team full-time. You will play a foundational role in designing, building, and scaling both the cloud infrastructure and data pipelines that power our AI-driven 3D interactive applications, neural rendering systems, and analytics frameworks.

At Schemata, we are transforming the $400B Virtual Training and Simulation market by integrating AI, neural rendering, and spatial computing into highly regulated industries. Our platform ingests vast amounts of structured and unstructured data, including 3D scans, dense technical documentation, and user interactions, and we need a versatile data engineer to build robust data pipelines, infrastructure, and analytical tools.

This is a high-impact, cross-functional role: you will work end-to-end on everything from cloud architecture and infrastructure-as-code to data ingestion, machine learning pipelines, and multi-modal outputs that empower both our internal teams and external customers.

Responsibilities

Build and optimize scalable machine learning-based pipelines to process diverse data sources, including 3D spatial data, technical documentation, and user data

Implement real-time and batch data processing systems

Support ML engineers with data pipelines for training/inference across structured and unstructured data (text, images, video, 3D assets)

Design and manage scalable AWS cloud architecture while implementing infrastructure-as-code for reliability

Optimize distributed computing and storage solutions for cost-effective, high-performance workloads

Work closely with product and engineering teams to integrate data-driven features into interactive 3D applications

Document cloud architecture, data models, and infrastructure for cross-team collaboration

Stay current with emerging technologies in cloud/data engineering, AI, and spatial computing to continuously improve our stack

Qualifications

Need to have

4+ years of experience in data engineering, platform engineering, or cloud engineering roles, with a proven track record of delivering end-to-end solutions

Proficiency in Python and SQL, with experience building scalable cloud pipelines (e.g., AWS Batch)

Strong expertise in AWS and infrastructure-as-code (e.g., Terraform)

Experience designing and implementing real-time and batch processing workflows

Knowledge of data modeling, distributed computing, and storage architectures

Familiarity with containerization and orchestration tools (Docker, Kubernetes)

Experience with data visualization tools (Grafana or similar)

Nice to have

Experience working with unstructured 3D data, such as point clouds, mesh files, or volumetric captures

Expertise in modern data warehousing and lakehouse architectures (Databricks, Snowflake, Redshift, or BigQuery)

Familiarity with ML Ops and integrating machine learning pipelines into data workflows

Experience working with graph or multimodal data architectures

Knowledge of graph databases (e.g., Neo4j) or vector search for AI-powered retrieval

Previous experience working in highly regulated industries (e.g., defense, energy, finance)

Why Join Us?

Own and shape the platform engineering function at a fast-growing company

Tackle unique, high-impact infrastructure and data challenges at the intersection of AI, spatial computing, and neural rendering

Work with a world-class team of engineers, researchers, and product builders, solving real-world problems in high-stakes industries

Fast-paced, high-ownership environment: your work will directly impact the scalability, reliability, and performance of our core products
