Job Description
Data Engineer:
Delivering the most comprehensive identity insights, our client's platform equips businesses with fully automated KYB (Know Your Business) solutions for risk and fraud management, setting new standards in business verification.
The solution is designed for financial services (FS) institutions committed to a coordinated effort to mitigate B2B fraud, reduce the risk of working with small businesses, and create a centralized, privacy-compliant entity for data sharing between financial institutions.
We are looking for a data engineer to assist with data collection across a variety of fragmented data sources drawn from government, public, and private databases.
Responsibilities:
Designing the database schema
Migrating production tables
Working with large-scale data in the terabyte range
Maintaining the operational efficiency of the database
Normalizing disparate schemas into a single unified schema
Abstracting reusable components
Minimizing the amount of new code written for each new pipeline by building internal packages that allow a high degree of reuse (a minimal sketch follows this list)
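To give a flavor of what this looks like in practice, here is a minimal sketch of normalizing two disparate sources into a single unified schema with Pydantic. All names, fields, and sources below are hypothetical illustrations, not details from the client's codebase; the point is that each new source only needs a small mapping function, while validation and normalization live in one reusable component.

```python
from datetime import date
from typing import Any, Callable

from pydantic import BaseModel


class Business(BaseModel):
    """Hypothetical unified schema that every source is normalized into."""
    registration_id: str
    legal_name: str
    incorporated_on: date | None = None


# One small mapping function per source, instead of a bespoke pipeline each time.
def from_state_registry(row: dict[str, Any]) -> Business:
    return Business(
        registration_id=row["reg_no"],
        legal_name=row["entity_name"],
        incorporated_on=row.get("inc_date"),  # ISO date string; Pydantic coerces it
    )


def from_credit_bureau(row: dict[str, Any]) -> Business:
    return Business(registration_id=row["business_id"], legal_name=row["name"])


# The reusable core: validate and normalize rows from any source, given its mapper.
def normalize(rows: list[dict[str, Any]],
              mapper: Callable[[dict[str, Any]], Business]) -> list[Business]:
    return [mapper(row) for row in rows]


unified = normalize(
    [{"reg_no": "12-345", "entity_name": "Acme Ltd", "inc_date": "2019-04-01"}],
    from_state_registry,
)
```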
Requirements
Experience with:
ETL (extract, transform, load)
Database design
Primary key, foreign key
Indexing
Partitioning
Access patterns
Migrations
Data pipelining
Core data concepts: ACID transactions, idempotency, orchestration (illustrated together in the sketch after this list)
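As one concrete illustration of several of these concepts together, the sketch below creates a range-partitioned PostgreSQL table with a primary key and a secondary index, then performs an idempotent upsert inside a single ACID transaction. The table, partition, connection string, and choice of psycopg2 as the driver are all assumptions for illustration; re-running the load after a partial failure leaves the table in the same end state.

```python
import psycopg2

# Hypothetical connection string; replace with real credentials.
conn = psycopg2.connect("dbname=kyb user=etl")

with conn:  # one ACID transaction, committed on success, rolled back on error
    with conn.cursor() as cur:
        # Hypothetical unified table, range-partitioned by ingestion date.
        # On a partitioned table the primary key must include the partition key.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS businesses (
                registration_id text NOT NULL,
                legal_name      text NOT NULL,
                ingested_on     date NOT NULL,
                PRIMARY KEY (registration_id, ingested_on)
            ) PARTITION BY RANGE (ingested_on);

            CREATE TABLE IF NOT EXISTS businesses_2024
                PARTITION OF businesses
                FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

            CREATE INDEX IF NOT EXISTS businesses_name_idx
                ON businesses (legal_name);
        """)

        # Idempotent load: the upsert makes re-runs converge to the same state
        # instead of duplicating rows.
        cur.execute(
            """
            INSERT INTO businesses (registration_id, legal_name, ingested_on)
            VALUES (%s, %s, %s)
            ON CONFLICT (registration_id, ingested_on)
            DO UPDATE SET legal_name = EXCLUDED.legal_name
            """,
            ("REG-001", "Acme Ltd", "2024-06-01"),
        )

conn.close()
```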
Technologies:
Airflow (see the DAG sketch after this list)
Google Cloud Platform (GCP)
GCP Dataflow (Google's managed runner for Apache Beam)
PostgreSQL
Python
Pydantic (a Python data-validation library)
Distributed systems
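For orchestration, here is a minimal Airflow TaskFlow sketch (Airflow 2.4+, where the `schedule` argument replaced `schedule_interval`) wiring a daily extract step into a load step. The DAG name, schedule, tasks, and data are hypothetical placeholders, not the client's actual pipeline.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def kyb_ingest():
    @task
    def extract() -> list[dict]:
        # Hypothetical pull from a government registry API.
        return [{"reg_no": "REG-001", "entity_name": "Acme Ltd"}]

    @task
    def load(rows: list[dict]) -> None:
        # A real pipeline would normalize and upsert the rows
        # (see the earlier sketches); here we just report the count.
        print(f"loaded {len(rows)} rows")

    load(extract())


kyb_ingest()
```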
Full-time