The Role:
The data engineering team is an experienced team that supports product development and the wider organization. In addition to building ETL pipelines that automate analytics and integrations between systems, the team builds and maintains the infrastructure that hosts these pipelines and integrations. The team also builds and maintains data access components and provides the tooling and analytics required for our predictive/ML models.
What you will be doing:
Build and maintain analytics with Python (pandas/PySpark)
Build and maintain ETL pipelines on AWS (EC2, Glue, Airflow)
Build and maintain infrastructure components to support our pipelines and integrations (CDK)
Set up and maintain integrations between systems to enable data flow between them (Amazon AppFlow)
Actively contribute to shaping the direction of our data platform, including architecting our data warehouse, machine learning deployment infrastructure, and ETL/ELT workflows
Gather and understand data requirements by working with stakeholders across multiple teams
Work closely with Engineering, IT, and Security to build processes and standards for our data science platform and its integration with data sources across the company
Develop ingestion, transformation, and cleansing pipelines to prepare a variety of structured and unstructured data sources for analytics
Maintain our data platform, including managing and improving our Redshift cluster and monitoring our data pipelines
Develop infrastructure using CDK to deploy data products to internal and external users
Provide operational support to the data science team
Be the go-to person for data-related questions company-wide
What you bring to this role:
Bachelor’s degree in Engineering, Computer Science, Mathematics, or a related technical discipline
4+ years of experience in data engineering
Experience setting up and maintaining high-volume ETL pipelines
Experience setting up ETL orchestration
Familiarity with infrastructure as code (CDK or Terraform) is a plus
Advanced knowledge of SQL and working knowledge of NoSQL (e.g., MongoDB)
Ability to communicate effectively with both technical and non-technical audiences
Strong analytical skills and an understanding of data science
Driven, passionate, and creative; thrives in a fast-paced environment
Knowledge of data modeling and system design using UML
Experience with AWS compute (e.g., EC2, Lambda) and data storage technologies (e.g., Redshift)
Tech Stack:
PostgreSQL
Python
Pandas
PySpark (nice to have)
CDK or Terraform (nice to have)
AWS