Hadoop ETL Developer with Data Governance (6–10+ years of experience)
Join our Information Technology team, where you will work on new technologies and find ways to meet our customers’ needs and make it easy for them to do business with us.
You will work largely independently on complex operational and technical projects, issues, systems, and applications, using your experience, expertise, and skills to solve the more difficult and less frequent problems.
What you’ll be doing:
• Leading multiple projects using on-site resources to maintain and develop applications for both enterprise releases and smaller maintenance releases.
• Working closely with the business team, 3rd party vendors, and other internal IT teams to deliver projects on time.
• Good experience with Hadoop big data tools: HDFS, Hive, Hive Metastore, Pig, Oozie, etc.
• Experience developing data ingestion pipelines, data processing, and data modeling processes
• Experience proactively identifying typical problems and issues during the normal course of work and solving them with minimal guidance
• Gathering metadata and/or data quality requirements about critical data elements in the data lake from across the business units
• Recording metadata in Informatica and other client-approved tools, and preparing it for review by business users and partners
• Supporting data quality efforts, including data profiling, business rule development, and helping identify standards that may be necessary to support data quality
What we are looking for:
• Bachelor’s degree or four or more years of work experience.
• Experience with real-time streaming tools such as Kafka, Pulsar, and messaging buses is key
• Four or more years of relevant work experience implementing data governance tools such as Collibra or Informatica.
• Strong data analytical skills and content knowledge of Data Governance, Metadata Management and/or Data Quality required. Familiarity with data governance tools like Collibra, Atlas, Informatica (Metadata Manager or IDQ), Oracle EDQ a plus, but not essential.
• Hands-on experience designing, constructing, testing, tuning, and deploying ETL infrastructure based on the Hadoop ecosystem
• Experience in writing business definitions of tables, entities, files, columns, attributes and fields.
• Understanding of business and technical metadata, data lineage and data models and methods to document this metadata for both technical and business users
• Experience developing applications using Agile methodology and CI/CD pipeline processes
• Experience with DevOps automation and tool chains including Jenkins, Jira, Git, and Maven.
• Experience in ETL development for KPI extraction and the data serving layer using Pig, Hive, Sqoop, Oozie
• Experience working with big data flow tools such as Apache NiFi, Data Highway, etc.
• Experience managing Hadoop platform across multiple data centers
• Experience with UDFs (Java) and understanding of distributed caching mechanisms as part of Pig ETL.
• Experience with multiple database engines like Oracle, MySQL, Teradata
• Must have strong programming experience with languages like Java, Python or Scala
• Strong experience with UNIX/Linux systems including scripting
Even better if you have:
• A Master’s degree in Data Science or Business Analytics
• Experience implementing Data Science/AI/ML use cases using Jupyter notebooks, SparkML, and TensorFlowOnSpark
• Familiarity with Scaled Agile Framework
• Experience creating dashboards and reporting using visualization tools like Tableau, Qlik etc.