Responsibilities:
Assemble large, complex data sets that meet functional and non-functional business requirements
Design, develop, monitor, and maintain scalable data pipelines and ETL processes
Build the infrastructure required for optimal extraction, transformation, and loading of data from various data sources using SQL and integration technologies, often cloud-based
Identify, design, and implement internal process improvements, including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes
Build analytical tools that utilise the data pipeline to provide actionable insights into key business performance metrics
Ensure data quality, consistency, integrity, and security across all systems
Drive continuous improvement of data engineering best practices and tooling
Required Skills and Experience:
Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field
5–7 years of experience in a database management, data engineering, or similar role
Proficiency in programming languages such as Python or Scala
Strong proficiency in SQL and experience with relational databases (e.g., MS SQL Server, PostgreSQL, MySQL)
Hands-on experience with NoSQL database technologies
Experience in database optimization and performance tuning is required
Good understanding of data integration patterns
Exposure to BI tooling such as Power BI or Yellowfin is advantageous
Experience setting up MS SQL replication and data archiving strategies will be beneficial
Experience with cloud platforms (e.g., AWS, GCP, Azure) and services like S3, Lambda, Redshift, BigQuery, or Snowflake
Familiarity with big data technologies like Apache Spark, Databricks, and Hive
Familiarity with data modelling, warehousing concepts, and data governance practices
Exposure to data cleansing and de-duplication techniques will be beneficial
Advantageous:
Experience with stream processing tools (e.g., Kafka, Spark Streaming, Flink)
Knowledge of containerization (Docker) and orchestration tools (Kubernetes)
Understanding of CI/CD principles and infrastructure-as-code
Exposure to machine learning workflows and MLOps