Job Description
We are looking for a skilled Data Infrastructure Engineer to design, develop, and optimize data pipelines and analytics platforms for a variety of business needs. In this long-term contract position, you will play a pivotal role in managing real-time and batch processing systems, ensuring data integrity, and enabling actionable insights. This opportunity is based in New York, New York.
Responsibilities:
• Design and maintain robust data pipelines and integrations to process real-time and batch data from systems such as PostgreSQL, Kafka, and DynamoDB.
• Implement and configure business intelligence tools to deliver meaningful insights and reports.
• Collaborate with cross-functional teams to create dashboards and datasets that address compliance, operational, and customer requirements.
• Define and enforce data governance policies, ensuring data security, schema management, and integrity.
• Develop secure permission models and implement data masking or tokenization techniques.
• Assess platform requirements, recommend new tools or resources, and oversee project implementation.
• Optimize data workflows using modern tools like Snowflake, Databricks, and AWS Glue.
• Support real-time data streaming using technologies such as Spark Streaming and Kafka.
• Utilize orchestration tools, such as Apache Airflow, to automate and manage workflows effectively.
• Leverage cloud-based data lakes, Infrastructure as Code (IaC) tools like Terraform, and container platforms such as Kubernetes for scalable solutions.
Requirements:
• Minimum of 5 years of experience in data engineering or analytics, ideally within a startup environment.
• Advanced proficiency in SQL and Python for data analysis and automation.
• Extensive experience with data pipeline and warehousing tools like Snowflake and Databricks.
• Familiarity with real-time streaming technologies such as Apache Kafka and Spark.
• Hands-on experience with orchestration tools like Apache Airflow for workflow automation.
• Knowledge of cloud-based data lakes and Infrastructure as Code (IaC) tools, such as Terraform.
• Expertise in containerization platforms like Kubernetes.
• Strong understanding of data governance, including schema management and security best practices.