
Data Engineer

Location: Surrey, BC, Canada

Posted: May 12, 2020


Expertise:

Over *.5 years of overall experience in Data Warehousing and Business Intelligence, including almost a year in the retail industry.

Good knowledge of AWS Aurora Postgres, Apache Hadoop (Hive), Teradata, Informatica, Snowflake, Power BI and Excel reporting, covering development, migration, and enhancement work.

Familiar with creating buckets and storing and retrieving large data files in AWS S3.

Experience working in Agile/DevOps environments with CI/CD and application lifecycle management.

Hands-on experience with version control tools such as Azure DevOps (VSTS), GitLab and GitHub.

Over 4.5 years of experience building enterprise data warehouses/data marts, dimensional modeling schemas (star/snowflake) and logical/physical data models, along with business/system analysis, ETL solution design and development, release/change management, requirements gathering, process flow design, production support, testing, and detailed deliverables.

Specialist in Extraction, Transformation & Loading (ETL) using Informatica PowerCenter and PL/SQL: data mappings, metadata capture, change data capture (CDC), and converting business rules into technical design artifacts. Excellent at creating BRDs, SRDs, IRDs, traceability matrices, etc.

Strong at multitasking, prioritizing work/projects, all phases of the ETL application SDLC, implementations, recognizing data patterns, and collaborating with business, system, testing, infrastructure and development teams. Vendor management experience.

Expert in writing UNIX commands for ETL micro jobs used in near-real-time processing.

Expertise in writing complex SQL queries for data extraction and data-quality rule mapping.

Proficient with Agile/Scrum methodologies and process/timeline improvement in Jira and Target Process.

Excellent team player and self-starter with the ability to work independently; strong analytical and problem-solving skills grounded in root-cause analysis, and the interpersonal skills to manage business expectations under tight timelines.

Technical Skills:

ETL Tools: AWS Aurora Postgres, Snowflake, Big Data Hadoop Hive, Sqoop, Informatica PowerCenter 9.x/8.x.

Databases: Oracle 11gR2/10g/9i, Microsoft SQL Server, MySQL Server, Teradata.

Database Tools: Aurora Postgres DB, Hadoop Hive, MySQL, SQL Developer, TOAD, SQL*Plus, SQL*Loader.

Programming: SQL, PL/SQL, HQL, SnowSQL, Python (basics).

Client/Server OS: Windows 10, AIX-UNIX/Linux-Solaris.

Versioning Tools: Azure DevOps (VSTS), GitLab, GitHub.

Tools: MS Visio, Outlook, Word, Excel, PowerPoint, Control M, PuTTY, Confluence.

QA Tools: Jira, Target Process.

Reporting Tools: Power BI, Excel Reporting, SSRS.

Education:

Bachelor of Technology in Engineering, VIT University, Vellore, India.

Professional Experience:

Data Engineer (Contract)

Lululemon Athletica (FLOW Project) (May 2019 – Present)

As part of the daily Agile/Scrum routine, retail business stakeholders provide specs for the data to be gathered from existing source systems.

We identify where the required attributes live across the different source systems and get sign-off from the business and the solution architect on Confluence before creating a pipeline.

Designed extract scripts against the different source systems (RMS, ELA, BI Netezza) to land the source CSV files in OneDrive.

Pivoted data from business Excel spreadsheets into a consumable format using VBA and extracted it as CSV.

Reverse-engineered RMS (Oracle DB tables) using Toad Data Modeler to create the table structures in the Postgres DB instance.

Mainly involved in building facts and dimensions and the migration scripts that load them into the Postgres DB instance (see the load sketch at the end of this list).

Created an ODS (Operational Data Store) data dictionary on Confluence covering the entire Postgres DB instance as part of the documentation.

All tasks are traceable in Jira and documented in detail on Confluence.

Used the Jenkins UI to deploy schema changes to the database instance.

Performance-tuned the existing scripts so jobs finish faster and business deliverables stay on schedule.

Created batch scripts to run transformations and load the CSV files into the Anaplan UI and a shared drive.

Automated manual data-transfer processes using the TES scheduler and later migrated them to Apache Airflow.

Created a process to delete, purge and archive data per the SLAs.

Experienced in error handling, debugging code issues and troubleshooting production problems.

Held ad-hoc meetings with stakeholders and the team to implement urgent production changes since Lululemon's go-live.
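A minimal sketch of the CSV-to-Postgres load step referenced above, assuming a hypothetical extract file, target table and connection string; the project's actual scripts are not reproduced here.

```python
# Minimal sketch: bulk-load a source CSV extract into an Aurora Postgres table.
# Table name, extract file and connection settings are hypothetical placeholders.
import csv
import psycopg2

def load_csv_to_postgres(csv_path: str, table: str, dsn: str) -> int:
    """COPY a CSV file into the given table and return the number of rows loaded."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))          # first line holds the column names
        columns = ", ".join(header)
        f.seek(0)                             # rewind; COPY ... HEADER skips the header row
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.copy_expert(
                f"COPY {table} ({columns}) FROM STDIN WITH CSV HEADER",
                f,
            )
            return cur.rowcount               # rows copied by the COPY command

if __name__ == "__main__":
    rows = load_csv_to_postgres(
        "rms_sales_extract.csv",              # hypothetical extract file
        "ods.rms_sales",                      # hypothetical ODS table
        "host=aurora-host dbname=ods user=etl password=secret",
    )
    print(f"Loaded {rows} rows")
```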

Environment:

AWS S3, AWS Aurora PostgreSQL, GitHub, Jira, Anaplan, TES Scheduler, Agile/Scrum, Excel, Power BI.

Data Engineer (Contract)

Electronic Arts (Market Research and Business Analytics) (Jan 2019 – April 2019)

Designed DAGs in Apache Airflow to orchestrate Snowflake SQL (a sketch follows at the end of this list).

Responsible for DevOps duties on a weekly schedule and for monitoring Airflow DAGs on Astronomer.io.

Familiar with cloud services such as Azure and AWS.

Mainly involved in building PFOs, facts and dimensions, and migration scripts to run on Snowflake.

Understanding of and proficiency in connecting to backend systems and cloud services.

Worked on maintaining Kubernetes clusters on GCP.

Managed and maintained Snowflake SQL for creating the data lake and downstream analytics services.
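A minimal sketch of the kind of DAG referenced above, using a PythonOperator with the snowflake-connector-python package; the schedule, credentials, table names and SQL are illustrative assumptions rather than the actual pipeline.

```python
# Minimal sketch of an Airflow DAG that runs one Snowflake SQL statement daily.
# Connection settings, schedule and SQL are hypothetical placeholders.
from datetime import datetime

import snowflake.connector
from airflow import DAG
from airflow.operators.python import PythonOperator

DIM_REFRESH_SQL = """
    INSERT INTO analytics.dim_player
    SELECT * FROM staging.player_updates
"""  # hypothetical dimension refresh

def run_snowflake_sql(sql: str) -> None:
    """Open a Snowflake connection and execute one statement."""
    conn = snowflake.connector.connect(
        account="my_account",        # placeholder credentials
        user="etl_user",
        password="secret",
        warehouse="ETL_WH",
        database="ANALYTICS",
    )
    try:
        conn.cursor().execute(sql)
    finally:
        conn.close()

with DAG(
    dag_id="snowflake_dim_refresh",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    refresh_dims = PythonOperator(
        task_id="refresh_dim_player",
        python_callable=run_snowflake_sql,
        op_args=[DIM_REFRESH_SQL],
    )
```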

Environment:

AWS S3, Snowflake, SnowSQL, Azure DevOps (VSTS), GitLab, Docker, Airflow DAGs (Astronomer.io), UNIX AIX, Jira, Target Process (TP), Agile/Scrum.

Senior Software Developer (Lead)

Verilog Networks Pvt Ltd. (Chennai) (Jan 2017 – Aug 2018)

Job Responsibilities

Played the role of lead developer on projects.

Mainly involved in building facts and dimensions in the Hadoop environment based on project requirements.

Developed OLAP multidimensional cubes (SSAS) using data from the Hadoop environment.

Created analysis and trend charts using Power View.

Involved in sprint planning, daily scrum calls, retrospective meetings and backlog grooming sessions.

Involved in performance tuning of Hive queries.

Led the team from offshore.

Thorough knowledge of reporting; built Excel and Power BI reports.

Understood the project requirements.

Created Hive queries to transfer existing data from Teradata to the new Hadoop platform (see the sketch at the end of this list).

Decoded the existing Teradata system and provided mapping documents for various tracks.

Responsible for system testing, integration testing and bug fixing.

Responsible for end-to-end delivery from the Teradata perspective.
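A minimal sketch of the Teradata-to-Hive migration work mentioned above, assuming the Teradata data has already been landed in a Hive staging table (e.g. via Sqoop); the host, database and table names are hypothetical.

```python
# Minimal sketch: run HiveQL to move Teradata-sourced staging data into an ORC fact table.
# Host, database and table names are hypothetical placeholders.
from pyhive import hive

CREATE_SALES_FACT = """
    CREATE TABLE IF NOT EXISTS dw.sales_fact (
        sale_id BIGINT,
        store_id INT,
        sale_date DATE,
        amount DECIMAL(18, 2)
    )
    STORED AS ORC
"""

LOAD_SALES_FACT = """
    INSERT OVERWRITE TABLE dw.sales_fact
    SELECT sale_id, store_id, sale_date, amount
    FROM staging.td_sales            -- staging table landed from Teradata
"""

def run_hive_statements(host: str, statements: list) -> None:
    """Execute each HiveQL statement on the given HiveServer2 host."""
    conn = hive.Connection(host=host, port=10000, username="etl_user")
    try:
        cursor = conn.cursor()
        for stmt in statements:
            cursor.execute(stmt)
    finally:
        conn.close()

if __name__ == "__main__":
    run_hive_statements("hive-server.example.com", [CREATE_SALES_FACT, LOAD_SALES_FACT])
```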

Environment: Teradata, Big Data, Apache Hadoop Hive, SSAS, SSRS, Power View, UNIX AIX, Excel, Power BI, MySQL.

ETL Developer

FIRB Manitoba (June 2014 – March 2016)

Job Responsibilities

Designed data mapping documents and detailed design documents for the ETL flow; delivered BRDs, SRDs and IRDs.

Created sessions and configured workflows to extract data from various sources, transform it, and load it into the data warehouse.

Created Informatica mappings that implement business rules for loading data into different target systems.

Extensively used Informatica PowerCenter Designer, Workflow Manager and Workflow Monitor to develop, manage and monitor workflows and sessions.

Most of the logic was designed for reuse through mapplets, reusable transformations and sessions, and optimized at the mapping and session level. Parameterized workflows for automatic environment selection.

Extensively worked on mapping variables, mapping parameters and session parameters.

Scheduled batches and sessions at the required frequency using the Workflow Scheduler and crontab.

Used Erwin to understand the existing data model of the data warehouse and added new entities.

Created dimension and fact tables for the data mart, along with the Informatica mappings, sessions and workflows that load them for the data mart presentation layer.

Implemented SCD (Slowly Changing Dimension) Type I and Type II for full and delta loads of more than 50k rows per feed daily, with dynamic parameters generated for the load process (the underlying SQL pattern is sketched at the end of this list).

Performed performance tuning of Informatica components for daily and monthly incremental load tables.

Developed mapplets, reusable transformations, source and target definitions, and mappings using Informatica. Developed mappings using parameters and variables.

Created complex workflows with multiple sessions and worklets running consecutively or concurrently. Implemented source- and target-based partitioning for existing workflows in production.

Involved in migrating all ETL code from SSIS to Informatica.

Experience in writing custom code expressions in SSRS for ad-hoc reports.
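The SCD logic above was built as Informatica mappings rather than hand-written SQL, but the Type II pattern it implements can be sketched in plain SQL; the dim_customer/stg_customer tables, their columns and the Oracle connection below are hypothetical.

```python
# Minimal sketch of the SCD Type II pattern: expire changed current rows, then insert new versions.
# Table names, columns and connection details are hypothetical placeholders.
import cx_Oracle

EXPIRE_CHANGED_ROWS = """
    UPDATE dim_customer d
       SET d.end_date = TRUNC(SYSDATE) - 1,
           d.current_flag = 'N'
     WHERE d.current_flag = 'Y'
       AND EXISTS (
            SELECT 1
              FROM stg_customer s
             WHERE s.customer_id = d.customer_id
               AND (s.address <> d.address OR s.segment <> d.segment)
           )
"""

INSERT_NEW_VERSIONS = """
    INSERT INTO dim_customer
        (customer_id, address, segment, start_date, end_date, current_flag)
    SELECT s.customer_id, s.address, s.segment, TRUNC(SYSDATE), NULL, 'Y'
      FROM stg_customer s
     WHERE NOT EXISTS (
            SELECT 1
              FROM dim_customer d
             WHERE d.customer_id = s.customer_id
               AND d.current_flag = 'Y'
           )
"""

def apply_scd_type2(dsn: str, user: str, password: str) -> None:
    """Run the expire-then-insert steps of an SCD Type II delta load in one transaction."""
    conn = cx_Oracle.connect(user, password, dsn)
    try:
        cur = conn.cursor()
        cur.execute(EXPIRE_CHANGED_ROWS)   # close out current rows whose attributes changed
        cur.execute(INSERT_NEW_VERSIONS)   # open new current rows for changed and brand-new keys
        conn.commit()
    finally:
        conn.close()
```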

Environment: Informatica Power Center 9.5, Oracle 11g R2, SQL server, TOAD, UNIX AIX, SQL *Loader, PL/SQL, Sun Solaris UNIX, Windows-XP, Control M.

ETL Data Consultant

Zurich, Canada (April 2012 – May 2014)

Job Responsibilities and Achievements:

Worked on Informatica PowerCenter to load data from various sources (Oracle database, XML files, COBOL files, and fixed-length/delimited flat files) into the data marts in an Oracle database.

Implemented SCD (Slowly Changing Dimensions) Type II for delta data loads and facts.

Created complex workflows with multiple sessions and worklets running consecutively or concurrently.

Designed workflows using Session, Command, Event Raise, Event Wait, Decision, and Email tasks in Informatica Workflow Manager for running sequential and concurrent batches.

Implemented source and target pass-through partitioning for existing workflows in production to improve performance in Informatica.

Designed and developed reusable logic to perform ETL audits on the various workflow sessions (see the audit sketch at the end of this list).

Analyzed workflows, sessions, events and error logs to troubleshoot Informatica ETL processes.

Involved in creating test plans and test scenarios to unit-test Informatica mappings, sessions and workflows. Took a proactive approach to performance tuning of Informatica components for daily and monthly incremental loads.

Designed and Developed Informatica mappings, sessions, workflows for loading dimensions and fact tables for data mart presentation layer.

Developed mapping using parameters and variables.

Used reverse engineering in ERwin Data Modeler to understand the existing data model of the data warehouse.
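The audit itself was reusable Informatica logic, but the basic reconciliation idea, checking that the rows read from the source equal the rows loaded to the target plus the rows rejected for a session, can be sketched as below; the workflow/session names and counts are hypothetical.

```python
# Minimal sketch of an ETL audit check: reconcile source, target and rejected row counts
# for a workflow session and report pass/fail. All names and numbers are hypothetical.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SessionAudit:
    workflow: str
    session: str
    source_rows: int
    target_rows: int
    run_time: datetime

def audit_session(audit: SessionAudit, rejected_rows: int = 0) -> bool:
    """Pass only if every source row is accounted for as loaded or rejected."""
    accounted = audit.target_rows + rejected_rows
    passed = accounted == audit.source_rows
    status = "PASS" if passed else "FAIL"
    print(f"{audit.workflow}/{audit.session}: "
          f"{audit.source_rows} read, {audit.target_rows} loaded, "
          f"{rejected_rows} rejected -> {status}")
    return passed

if __name__ == "__main__":
    audit_session(
        SessionAudit("wf_policy_load", "s_m_policy_dim", 120_000, 119_950, datetime.now()),
        rejected_rows=50,
    )
```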

Environment: Informatica PowerCenter 9.1, ERWin Data Modeler 8.2, TOAD 9.0, PL/SQL, Flat files, XML, Oracle 11g/10g, Autosys, WinSCP, Putty, AIX-UNIX.


