Title: Data Engineer
Location: San Francisco (hybrid schedule)
Reporting Structure: This role reports to the CTO
Company Description
With experience spanning banks, Fortune 500 tech companies, fintech unicorns, and AI experts, our client is built by financial institutions, for financial institutions. Founded in 2023 by experienced founders, the company has raised $20 million and reached $2 million in ARR faster than any other identity company in history. Today, with more than 2,000 financial institution and government agency customers, they are revolutionizing the way businesses approach fraud prevention and compliance.
Responsibilities
We are looking for a data engineer to assist with data collection across a variety of fragmented data sources, including government, public, and private databases.
Is this you? We’re looking for the following experience:
ETL (extract, transform, load)
Database design
Primary and foreign keys
Indexing
Partitioning
Access patterns
Migrations
Data pipelining
Core data concepts: ACID transactions, idempotency, orchestration (see the sketch just below)
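For a concrete flavor of these concepts, here is a minimal sketch of an idempotent, transactional load into PostgreSQL using psycopg2. The DSN, table, and column names are invented for illustration only and are not taken from the client's systems.

```python
# Sketch: an idempotent load into PostgreSQL. Re-running the same batch
# leaves the table unchanged (primary key + ON CONFLICT upsert), and the
# whole batch commits atomically or not at all (ACID). All names are
# hypothetical placeholders.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS entities (
    entity_id TEXT PRIMARY KEY,   -- primary key enforces uniqueness
    name      TEXT NOT NULL,
    source    TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_entities_source ON entities (source);
"""

UPSERT = """
INSERT INTO entities (entity_id, name, source)
VALUES (%s, %s, %s)
ON CONFLICT (entity_id) DO UPDATE
SET name = EXCLUDED.name, source = EXCLUDED.source;
"""

def load(rows, dsn="postgresql://localhost/example_db"):
    # The connection context manager wraps one transaction:
    # commit on success, rollback on error.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)
            cur.executemany(UPSERT, rows)

if __name__ == "__main__":
    load([("abc-123", "Example Corp", "source_a"),
          ("abc-123", "Example Corp", "source_a")])  # duplicate: still one row
```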
Technologies (a short sketch of how these fit together follows this list):
Airflow
Google Cloud Platform (GCP)
GCP Dataflow (managed runner for Apache Beam)
PostgreSQL
Python
Pydantic (a Python library)
Distributed systems
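As a rough sketch of how these pieces typically fit together (not the client's actual pipeline), here is a small Airflow 2.x TaskFlow DAG that extracts records, validates them with Pydantic, and hands them to a loader; the DAG id, schedule, and record fields are assumptions made for the example.

```python
# Sketch: a minimal Airflow 2.x TaskFlow DAG wiring extract -> transform
# -> load, with Pydantic validating records in the transform step.
# The DAG id, schedule, fields, and loader are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task
from pydantic import BaseModel


class Record(BaseModel):
    entity_id: str
    name: str
    source: str


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_ingest():
    @task
    def extract() -> list[dict]:
        # In practice: pull from a government, public, or private source.
        return [{"entity_id": "abc-123", "name": "Example Corp", "source": "source_a"}]

    @task
    def transform(raw: list[dict]) -> list[dict]:
        # Pydantic raises on malformed rows, so bad data fails fast.
        return [Record(**row).dict() for row in raw]  # .model_dump() in Pydantic v2

    @task
    def load(rows: list[dict]) -> None:
        # Placeholder: an idempotent upsert into PostgreSQL would go here.
        print(f"would load {len(rows)} rows")

    load(transform(extract()))


example_ingest()
```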
Ideal candidates will have experience building data pipelines end-to-end. What does end-to-end mean?
Designing the database schema (i.e. table structure).
Migrating production tables.
Working with large-scale data (i.e., terabytes).
Maintaining operational efficiency of databases.
Normalizing disparate schemas to a single unified schema (see the sketch below).
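To make that last point concrete, the sketch below normalizes two differently shaped source records into one unified Pydantic schema; every field name here is invented for the example.

```python
# Sketch: two sources describe the same kind of entity with different
# shapes; both are mapped onto one unified Pydantic schema.
# All field names are invented for the example.
from pydantic import BaseModel


class UnifiedEntity(BaseModel):
    entity_id: str
    name: str
    country: str   # ISO 3166-1 alpha-2
    source: str


def from_source_a(row: dict) -> UnifiedEntity:
    # Source A: flat record with "id" / "legal_name" / "country_code".
    return UnifiedEntity(
        entity_id=row["id"],
        name=row["legal_name"],
        country=row["country_code"],
        source="source_a",
    )


COUNTRY_CODES = {"Germany": "DE", "United States": "US"}  # tiny demo lookup


def from_source_b(row: dict) -> UnifiedEntity:
    # Source B: nested record with the country spelled out in full.
    return UnifiedEntity(
        entity_id=row["registration"]["number"],
        name=row["registration"]["entity_name"],
        country=COUNTRY_CODES.get(row["jurisdiction"], "??"),
        source="source_b",
    )


if __name__ == "__main__":
    print(from_source_a({"id": "123", "legal_name": "Acme Ltd", "country_code": "US"}))
    print(from_source_b({"registration": {"number": "456", "entity_name": "Beta GmbH"},
                         "jurisdiction": "Germany"}))
```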
Perks
Hybrid in SF. In office 3 days/week
Flexible PTO
Healthcare, 401(k)
Smart, genuine, ambitious team