Chittaranjan Sahoo
Cumming, GA
+1-404-***-**** ********@*****.*** LinkedIn Github Kaggle
Professional Summary
IT professional with 15 years in Database Development, ETL Development, Data Engineering, Data Warehousing, Data Migration, Data Quality, Business Intelligence & Data Analytics, including 4 years in the United States.
Expertise in database development, OLAP, OLTP, and ETL/ELT data pipelines
Expertise in ANSI SQL, PL/SQL, and T-SQL, and in RDBMSs such as Oracle, MS SQL Server, MySQL, and PostgreSQL
In-depth knowledge of Azure data services such as Azure Data Factory, Azure Data Lake Storage, Azure Databricks, and Azure Synapse Analytics.
Write Spark applications to process and analyze large datasets efficiently using Spark's core abstractions: RDDs (Resilient Distributed Datasets), DataFrames, and Spark SQL.
Experienced with the Spark ecosystem using Scala and Hive queries on data formats such as text, Parquet, Avro, and ORC.
Implemented data integration solutions, including ETL (Extract, Transform, Load) processes, using Azure Data Factory, Informatica, and Python.
Proficiency in Azure Cosmos DB and its associated APIs
Experience in data visualization using Cognos and Tableau.
Experience in Snowflake Cloud Technology.
Working knowledge of Microsoft Azure, AWS, and GCP
Understanding of data science areas such as statistical analysis, machine learning algorithms, classification, clustering, regression, and feature engineering
Worked closely with product, data science, data visualization, risk & compliance, and sourcing teams to provide required datasets.
Excellent problem-solving and debugging skills.
Strong teamwork and communication skills.
Able to adapt quickly to new technologies and challenges.
Academics
10/2011 – Master of Computer Applications, MKU, TN, India
(US equivalent of Master of Science Degree in Computer Information Systems)
Certifications & Badges
Python for Data Science - University of California, San Diego (EdX)
Microsoft Azure AZ-900
Snowflake - Hands On Essentials - Data Warehouse
IBM Certified Designer Cognos 10 BI Reports
Oracle Certified Associate- Oracle Database 10g
Work Experience
# 1 Azure Lead, Cielo Talent (RPO) (through KForce)
Feb/ 2022 – Present
Working with business partners, architects, and other groups to identify the technical and functional needs of analytical systems and determine the priority of needs.
Data ingestion into Azure: data sources included files in various formats and an on-prem data warehouse (MS SQL Server). Built more than 200 data pipelines delivering output into Azure SQL Database or Azure Cosmos DB for advanced analytics, with pipeline orchestration in Azure Data Factory, Databricks, and Synapse.
As Azure Data Factory (ADF) lead, managed and oversaw the development, deployment, and maintenance of data pipelines in the ADF environment, and led a team of data engineers and developers through the design, development, testing, and deployment of these pipelines.
Develop, test, implement, and document technical engineering solutions to assist business partners' self-service analytic needs, client reporting, and data consumption requirements.
Optimize the data integration platform to provide optimal performance under increasing data volumes.
Analyze, define, design, and document requirements for data, workflow, logical processes, and system environments.
Provide post-deployment/production support for users through Zendesk/Jira tickets
# 2 Data Engineer, Home Depot (Inventory Planning) (through KForce)
Sep/ 2021 – Feb/ 2022
Worked with the business team to understand requirements; developed Python code with Airflow to design data pipelines, and SQL for transformations.
Analyzed data and developed BigQuery pipelines, orchestrated with Airflow, for downstream EDW Advanced Service Level apps to calculate weekly safety stock quantities for different locations.
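As an illustration of the kind of transformation these pipelines performed, a minimal safety-stock calculation can be sketched in plain Python (the service level, demand variability, and lead time below are hypothetical, not figures from the project):

```python
import math

def safety_stock(demand_std: float, lead_time_weeks: float, service_z: float = 1.65) -> float:
    """Classic safety-stock formula: z * sigma_demand * sqrt(lead time)."""
    return service_z * demand_std * math.sqrt(lead_time_weeks)

# Hypothetical example: weekly demand std dev of 40 units, 4-week lead time,
# roughly 95% service level (z ≈ 1.65).
print(round(safety_stock(40, 4), 1))  # 132.0
```

In the actual pipelines, a formula of this shape would run as SQL inside BigQuery per location, scheduled by Airflow.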
# 3 Data Engineer, Verizon (DCIM Migration)
Feb/ 2021 – Sep/2021
Designed the migration of DCIM (Data Center Infrastructure Management) data from Oracle and SQL Server to Sunbird dcTrack and PostgreSQL
Designed and implemented the migration of Salesforce data for the Oracle Fusion application.
Used SQL Server Integration Services (SSIS) packages for data integration.
Defined the migration approach and a detailed migration plan
Analyzed interdependencies and constraints of SFDC reports with respect to the target schema
Defined data validation, data profiling, and data quality rules
Provided L2 support for user requests and incidents, providing technical solutions
Prepared reconciliation, DQ/DI, and data validation reports using Tableau
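The core of a reconciliation report of this kind can be sketched in plain Python (table names and row counts here are hypothetical; in practice the counts would be queried from the source and target databases):

```python
def recon_report(source_counts: dict, target_counts: dict) -> list:
    """Compare row counts per table between source and target systems."""
    report = []
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table, 0)
        tgt = target_counts.get(table, 0)
        report.append({
            "table": table,
            "source_rows": src,
            "target_rows": tgt,
            "status": "MATCH" if src == tgt else "MISMATCH",
        })
    return report

# Hypothetical counts from Oracle (source) and PostgreSQL (target).
for row in recon_report({"assets": 1200, "racks": 300},
                        {"assets": 1200, "racks": 298}):
    print(row["table"], row["status"])
```

The same comparison extends naturally to checksums or column-level profiles for the DQ/DI checks.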
# 4 Big Data Engineer, AT&T (Customer Experience Analysis of Postpaid Users)
Jun/ 2019 – Jan/2021
Worked with the client to understand the business problem.
Identified the sources required to address the business problem of classifying customer experience.
Designed a Data Lake in Hadoop (HDP) to ingest big data from heterogeneous sources like Dynatrace, Splunk, Oracle, HTML logs and Quantum Metric.
Developed scripts to extract, transform and load data into Data Lake.
Designed and developed data processing applications using Apache Spark and Scala to streamline the data ingestion process.
Worked extensively with the Spark ecosystem using Scala and Hive queries on data formats such as text and Parquet; installed Spark on top of Hadoop and built advanced analytical applications combining Spark with Hive and SQL/Oracle.
Automated the whole data loading process from landing to staging to master tables.
Developed Data Cleansing programs to load master tables in Hive for analysis.
Implemented feature engineering and performed exploratory data analysis
Developed Python code to classify data using unsupervised machine learning.
Designed and Developed an Enterprise Data lake using Azure Data Lake Storage (ADLS) with Databricks.
Prepared 60+ aggregated tables for visualization in Cognos / QlikView
Developed a Sankey diagram to present different customer browsing behavior patterns, which was highly appreciated by the client.
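The unsupervised classification mentioned above can be illustrated with a minimal one-dimensional k-means sketch in pure Python (the session-duration values and k=2 are hypothetical; the project used richer features and standard ML libraries):

```python
import random

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Minimal 1-D k-means: assign each point to its nearest centroid,
    then recompute centroids as cluster means, repeating for `iters` rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Hypothetical session durations (seconds): two clear groups near 10 and 100.
print(sorted(kmeans_1d([8, 9, 11, 12, 95, 100, 105], k=2)))  # [10.0, 100.0]
```

Once centroids stabilize, each customer session is labeled by its nearest centroid, which is the "classification via unsupervised learning" pattern.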
# 5 Business Intelligence Analyst, AT&T (Gate Keeper)
Jun/ 2018 – May/ 2019
Designed Data warehouse model for Oracle ERP using Star Schema model.
Consolidated ETL and visualization into a single platform.
Implemented reusable ETL data pipelines for complex business rules.
Created a batch process that scheduled and ran the ETL via parameterization.
Redesigned and tuned long-running queries for the nightly load.
Created health checks to compare incoming data vs. data loaded in target.
Developed Tableau dashboards and analytical reports using KPIs.
Prepared Business Requirements, GAP Analysis & ETL Design Documents.
# 6 Business Intelligence Analyst, CenturyLink (Revenue Assurance)
Jan/ 2018 – May/ 2018
Designed enterprise Data Model for Oracle and SAP ERP system.
Created ETL data pipelines to load complex datasets in data warehouse.
Implemented SCD to track history from different OLTP systems.
Developed UNIX scripts for pre- and post-loading processes.
Migrated various legacy applications to a new ETL and visualization platform
Improved performance of various SQL-based jobs
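The SCD history tracking mentioned above (assumed here to be Type 2, the most common variant) can be sketched in plain Python; in the project it was implemented in ETL tooling against the warehouse, and the keys and attributes below are hypothetical:

```python
from datetime import date

def scd2_apply(dimension: list, incoming: dict, load_date: date) -> list:
    """Type 2 slowly changing dimension: when a tracked attribute changes,
    close the current row (end_date, is_current=False) and append a new
    version; if nothing changed, leave history untouched."""
    for row in dimension:
        if row["key"] == incoming["key"] and row["is_current"]:
            if row["attrs"] == incoming["attrs"]:
                return dimension  # no change detected
            row["is_current"] = False
            row["end_date"] = load_date
            break
    dimension.append({
        "key": incoming["key"],
        "attrs": incoming["attrs"],
        "start_date": load_date,
        "end_date": None,
        "is_current": True,
    })
    return dimension

# Hypothetical customer changing plans: history grows to two versions.
dim = []
scd2_apply(dim, {"key": "C001", "attrs": {"plan": "basic"}}, date(2018, 1, 5))
scd2_apply(dim, {"key": "C001", "attrs": {"plan": "premium"}}, date(2018, 3, 1))
print(len(dim), dim[0]["is_current"], dim[1]["is_current"])  # 2 False True
```

The same close-and-append pattern is what a warehouse MERGE statement performs when loading from the OLTP change feed.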
# 7 Business Intelligence Developer, AT&T (Broadband Operations)
Apr/ 2015 – Dec/ 2017
SME for the BOBI, RDW, and WDW applications.
Reverse-engineered existing workflows and mappings to optimize code, achieving a 20% YoY reduction in job run time.
Created shell/Python scripts to generate automated alerts and avoid potential SLA misses.
Modified and configured a chatbot to generate automated alerts for job failures, long-running critical SLA jobs, and database usage.
Created one-time/ad-hoc Teradata load/pull queries to resolve skew errors and provide quick resolution for SLA jobs.
Modified existing Teradata queries to resolve skew and spool errors.
Modified Informatica mappings to accommodate business requirements via monthly maintenance change requests.
# 8 Business Intelligence Developer, General Electric (Digital Energy)
Apr/ 2011 – Mar/ 2015
Developed complex aggregated queries using joins and subqueries.
Developed complex ETL jobs using Informatica and DataStage.
Designed several indexes for faster data retrieval.
Created database metadata catalogs/data dictionary.
Provided hands-on support for all data related activities.
# 9 Business Intelligence Developer, HP
Dec/ 2008 – Mar/ 2011
Developed various database objects.
Member of a 24x7 Oracle on Demand Team supporting Oracle Database and Application.
Created data selection rules using data query languages via joins.
Set up Automatic Storage Management; performed reconciliation checks for the data migration.
Industries:
Telecom
Retail, Orders & Sales
Manufacturing
RPO
Skills:
Data Infrastructure
Data Engineering
Big Data
Data Warehouse & BI
Data Migration
Data Integration
Data wrangling, Statistics
Master Data Management
Change Data Capture
Oracle, MS SQL Server, MySQL, PostgreSQL, MongoDB
Azure Data Factory
Azure SQL Database
Azure Blob Storage
Azure Cosmos DB
Azure Data Lake Storage
Azure Synapse Analytics
Hadoop, Hive, Sqoop, Kafka
Apache Spark, Scala
GCP, BigQuery
SQL Performance Tuning
Snowflake Cloud Database
SoapUI, REST API, JSON
CI/CD (Git)
Unix Shell Script
Scrum, Agile Methodologies
Microsoft Excel