Sai Siva Prasad Ande
Senior Data Engineer
317-***-****/************@*****.***
Summary:
10+ years of IT experience in Azure cloud services, cloud data warehouses, ETL tools, and Business Intelligence, spanning analysis, design, development, testing, and deployment of various applications.
Experience working across domains such as Banking, Healthcare, and Manufacturing.
Experience in building data pipelines using Azure Data Factory and Azure Databricks, and loading data into Azure Data Lake and Azure SQL Data Warehouse.
Experience in creating data models and schemas to support business requirements, ensuring data integrity, security, and optimal performance.
Developed robust data ingestion pipelines using Azure Data Factory, managing the extraction, transformation, and loading (ETL) processes for diverse data sources.
Implemented medallion architectures in data warehousing solutions, facilitating efficient storage, organization, and retrieval of large volumes of structured and unstructured data.
Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal PySpark sketch follows at the end of this summary).
Strong experience writing scripts with the Python, PySpark, and Spark APIs for data analysis.
Experience in organizing transformation logic into modular DBT models, promoting code reuse and maintainability across projects, resulting in streamlined development workflows.
Experience in configuring and fine-tuning Databricks clusters for different workloads.
Utilized R for data cleaning, processing and transformation tasks, ensuring data quality and integrity in preparation for downstream analytics.
Implemented Apache Airflow for authoring, scheduling and monitoring data pipelines.
Developed custom connectors to external APIs, streamlining data retrieval processes and improving overall dataflow efficiency.
Ability to write scripts (e.g., Python) for automating data pipelines, monitoring, and maintenance tasks on Snowflake.
Worked on Informatica PowerCenter ETL tools: Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
Experience in optimizing Snowflake performance by fine-tuning configurations, query optimization, and resource management.
Hands-on experience with Tableau Desktop, Tableau Reader, and Tableau Server.
Experienced in handling different file formats such as text, Parquet, and JSON files.
Experienced in building Automation Regression Scripts for validation of ETL process between multiple databases like Oracle, SQL Server and Hive using Python.
Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS/Azure Blob storage.
Designed the end-to-end flow to deliver/consume the raw/clean logs to/from HDFS and AWS S3.
Proficient in leveraging DBT for building and managing complex data transformation pipelines, ensuring efficient and reliable data processing from raw sources to curated datasets.
Experience in Data Integration and Data Warehousing using various ETL tools: Informatica PowerCenter, AWS Glue, SQL, Data Flow, SQL Server Integration Services (SSIS), and Talend.
Experienced with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, and Sqoop.
Experienced in integrating Snowflake with BI and analytics tools such as Tableau, Looker, or Power BI, enabling users to perform ad-hoc queries and create interactive dashboards and reports on Snowflake data.
Knowledge of Snowflake security features and best practices for data security, access control, and compliance requirements.
Experience with ETL workflow management tools like Apache Airflow, with significant experience writing Python scripts to implement workflows.
Hands-on experience with various AWS cloud services such as Redshift clusters and Route 53 domain configuration.
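The following is a minimal, hedged PySpark sketch of the multi-format extraction and aggregation work described in this summary; the paths, table names, and columns are illustrative assumptions rather than client artifacts.

# Minimal PySpark sketch: read Parquet and JSON sources, join, and aggregate usage metrics.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

events = spark.read.parquet("/mnt/raw/events/")        # hypothetical mount path
customers = spark.read.json("/mnt/raw/customers/")     # hypothetical mount path

# Join the sources and aggregate to surface usage patterns per customer segment.
usage = (
    events.join(customers, on="customer_id", how="inner")
          .groupBy("segment", F.to_date("event_ts").alias("event_date"))
          .agg(F.count("*").alias("event_count"),
               F.countDistinct("customer_id").alias("active_customers"))
)

usage.write.mode("overwrite").parquet("/mnt/curated/usage_by_segment/")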
EDUCATION:
Master of Sciences in Computer Science, Trine University, USA, 2022.
TECHNICAL SKILLS:
BigData/Hadoop Technologies
MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, YARN, Oozie, ZooKeeper.
Languages
HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS Enterprise Guide, SAS, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting.
NoSQL Databases
Cassandra, HBase, MongoDB, MariaDB
Web Design Tools
HTML, CSS, JavaScript, JSP, jQuery, XML
Development Tools
Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans.
Public Cloud
EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Redshift, Snowflake.
Development Methodologies
Agile/Scrum, UML, Design Patterns, Waterfall
Build Tools
Jenkins, Toad, SQL Loader, T-SQL, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Reporting Tools
MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos, Tableau.
Databases
Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza
Operating Systems
All versions of Windows, UNIX, LINUX, macOS, Sun Solaris
WORK EXPERIENCE:
Client: Merchants Bank of Indiana, IN April 2024 – Present
Role: Senior Data Engineer (Azure Fabric)
Responsibilities:
Designed and implemented robust data pipelines on Azure Service Fabric to handle ingestion, transformation, and storage of terabytes of data daily.
Developed and maintained scalable, fault-tolerant microservices using Azure Service Fabric, ensuring high availability and performance of critical data applications.
Integrated Azure Data Factory with Azure Service Fabric to automate ETL processes, enhancing data flow efficiency and reliability.
Steered the team through critical project challenges, demonstrating decisive leadership and problem-solving skills; successfully navigated tight deadlines and unexpected issues while maintaining project timelines and quality standards.
Built a data warehousing solution utilizing Azure SQL Data Warehouse, improving data query performance and enabling advanced analytics.
Championed diversity and inclusion within the team, promoting a diverse range of perspectives that enriched team creativity and problem-solving abilities.
Implemented data backup and disaster recovery strategies for Azure Data Lake Storage and Azure SQL databases, ensuring data integrity and availability.
Optimized data workflows and processes by leveraging Azure Databricks, resulting in a 30% reduction in data processing time.
Ensured data security and compliance by implementing Azure security best practices, including encryption, identity management, and access controls.
Used Azure Service Fabric for managing and orchestrating microservices, ensuring seamless data flow and minimal downtime.
Developed complex data transformation workflows with Azure Data Factory, integrating multiple data sources and transforming data for business intelligence.
Managed and optimized data models for use with Azure Synapse Analytics, enabling rapid and flexible data analysis.
Client: FSSA State of Indiana, IN Feb 2023 – March 2024
Role: Senior Data Engineer
Responsibilities:
Designed and implemented scalable and high-performance data architectures on Azure, incorporating services such as Azure Data Lake Storage and Azure SQL Data Warehouse.
Developed robust data ingestion pipelines using Azure Data Factory, managing the extraction, transformation, and loading (ETL) processes for diverse data sources.
Created data transformation and processing logic using SQL and data transformation activities.
Developed data integration solutions, ensuring data quality and reliability.
Designed and developed bronze, silver, and gold layers following a medallion lakehouse architecture (a minimal sketch follows at the end of this section).
Automated data workflows and orchestration in ADF and Azure Databricks.
Implemented data ingestion pipelines using PySpark to extract, transform, and load (ETL) data from diverse sources.
Utilized PySpark transformations and actions to clean, filter, and manipulate data as per business requirements.
Performed performance tuning and optimization of Spark jobs on Databricks clusters.
Designed, developed, and maintained APIs for data extraction, transformation, and loading (ETL) processes.
Designed and implemented job scheduling and automation workflows using Databricks Jobs.
Designed and implemented Delta tables in Azure Databricks for efficient storage and processing of structured and semi-structured data.
Implemented RESTful and SOAP APIs to facilitate seamless data communication between systems and applications.
Implemented secure authentication mechanisms for API access with Azure environments, including the use of Azure Active Directory and API keys.
Troubleshooting and resolving issues in Databricks notebooks and clusters.
Integrated Azure Logic Apps with external systems and APIs, facilitating data exchange and synchronization with third-party applications.
Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
Performed several ad-hoc data analyses on the Azure Databricks analytics platform, tracked on a Kanban board.
Created complex stored procedures, triggers, cursors, tables, views, and other database objects using T-SQL.
Used T-SQL to create queries and stored procedures to produce reports for end-users.
Created interactive and visually appealing dashboards using Tableau Desktop to convey complex data insights to stakeholders.
Collaborated with business analysts and stakeholders to understand reporting requirements and translate them into Tableau solutions.
Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, and other visualizations.
Implemented distributed computing solutions in R to handle big data analytics and processing tasks efficiently.
Designed and implemented an end-to-end big data platform on the Teradata database.
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
Wrote UDFs in PySpark on Hadoop to perform transformations and loads.
Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and performed cloud migration, processing the data in Azure Databricks.
Developed Python, PySpark, and Bash scripts to transform and load log data across on-premises and cloud platforms.
Performed source analysis, tracing data back to its roots through Teradata, Oracle, etc.
Identified the jobs that load data into the source tables and documented them.
Implemented Continuous Integration and Continuous Delivery processes using GitLab along with Python and shell scripts to automate routine jobs, including synchronizing installers, configuration modules, packages, and requirements for the applications.
Environment: Azure Databricks, Azure Data Factory, Python, PySpark, DBFS, Data Lake, Azure SQL, SharePoint, JSON, Teradata, SQL Server, Tableau, Azure Logic Apps, Apache Airflow.
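Below is a hedged sketch of the bronze/silver/gold (medallion) layering referenced in this section, using Delta tables on Azure Databricks; the storage paths, columns, and business rules are assumptions for illustration only.

# Hedged sketch: medallion (bronze/silver/gold) layering with Delta tables on Databricks.
# Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the source data as-is.
bronze = spark.read.json("/mnt/landing/claims/")                      # hypothetical source
bronze.write.format("delta").mode("append").save("/mnt/bronze/claims")

# Silver: cleanse and conform the bronze data.
silver = (
    spark.read.format("delta").load("/mnt/bronze/claims")
         .dropDuplicates(["claim_id"])
         .withColumn("claim_date", F.to_date("claim_date"))
         .filter(F.col("claim_amount").isNotNull())
)
silver.write.format("delta").mode("overwrite").save("/mnt/silver/claims")

# Gold: business-level aggregate for reporting.
gold = silver.groupBy("provider_id").agg(F.sum("claim_amount").alias("total_claims"))
gold.write.format("delta").mode("overwrite").save("/mnt/gold/claims_by_provider")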
Client: Mastercard, O’Fallon, MO Dec 2021 - Jan 2023
Role: Senior Data Engineer
Responsibilities:
Designed and implemented end-to-end data ingestion and ETL pipelines using Azure Data Factory to extract, transform, and load data from various source systems.
Integrated Azure Data Factory with Azure SQL Database, Azure Blob Storage, Azure Data Lake Storage, and other Azure services for seamless data movement and transformation.
Utilized Azure Data Factory linked services to establish connections with external data sources.
Implemented dynamic allocation and scaling of Databricks clusters to match workload demands.
Orchestrated complex data workflows and dependencies using Azure Data Factory pipelines, ensuring reliable and scalable execution.
Managed scheduling, monitoring, and logging of data pipelines for efficient operation.
Implemented incremental loading strategies and Change Data Capture (CDC) mechanisms to optimize data processing and reduce latency.
Utilized Azure Data Factory capabilities to handle delta loads and track data changes efficiently.
Implemented data cleansing, validation, and enrichment processes as part of ETL workflows.
Configured robust error handling mechanisms within Azure Data Factory pipelines, including retry policies and notifications.
Integrated Azure Data Factory with external systems and data sources using REST APIs, custom connectors, and other integration techniques.
Implemented security measures, including Azure Key Vault integration, for securing sensitive information such as connection strings and credentials in Azure Data Factory.
Utilized Azure Data Factory monitoring tools to identify bottlenecks and optimize resource usage.
Configured role-based access control (RBAC) and fine-grained permissions for Databricks workloads.
Applied data warehousing principles to build and maintain data warehouses on Snowflake.
Developed and implemented Python scripts and applications within Azure Databricks for various data processing and analysis tasks.
Utilized Python libraries such as pandas, NumPy, and scikit-learn for data manipulation, analysis, and machine learning.
Conducted large-scale data processing and transformation using PySpark within Azure Databricks.
Developed Python scripts for real-time data processing using Databricks Structured Streaming (a minimal sketch follows at the end of this section).
Strong understanding of data modeling concepts and experience in designing and implementing data models optimized for Snowflake.
Implemented Delta tables to support both batch and streaming workloads, providing a unified platform for diverse data processing scenarios.
Implemented ETL processes and data transformations to prepare data for analytical purposes.
Implemented windowing and event-time processing for time-sensitive data.
Utilized Azure Monitor and Azure Log Analytics to monitor and log Databricks job performance and diagnose issues.
Implemented event-driven architecture using Azure Logic Apps, responding to events and triggers from various sources such as Azure Event Grid and Azure Service Bus.
Collaborated with business analysts, data scientists, and other stakeholders to understand requirements and deliver Tableau solutions.
Experience in designing, developing, and maintaining ETL pipelines within Snowflake using tools like Snowflake's native features, Apache Airflow, or other ETL tools.
Extracted data from various sources, including databases and flat files, and performed data cleansing and transformation for Tableau visualization.
Conducted performance tuning of Tableau workbooks and dashboards, optimizing queries and reducing load times.
Environment: Azure Data Factory, Azure Databricks, Tableau, Python, PySpark, MS SQL, Data Lake, DBFS, SharePoint, Azure Logic Apps
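The sketch below illustrates the kind of event-time windowing with Databricks Structured Streaming mentioned in this section; the source path, schema, and checkpoint location are hypothetical assumptions.

# Hedged sketch: event-time windowed aggregation with Structured Streaming on Databricks.
# Source path, schema, and checkpoint location are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("amount", DoubleType()),
])

windowed = (
    spark.readStream.schema(schema).json("/mnt/streaming/transactions/")   # hypothetical path
         .withWatermark("event_time", "10 minutes")                        # tolerate late-arriving events
         .groupBy(F.window("event_time", "5 minutes"))
         .agg(F.sum("amount").alias("total_amount"))
)

query = (
    windowed.writeStream.format("delta")
            .outputMode("append")
            .option("checkpointLocation", "/mnt/checkpoints/txn_windows")
            .start("/mnt/gold/txn_windows")
)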
Client: Advance Auto Parts, Glen Allen, VA (India) May 2018 – Nov 2021
Role: Data Engineer
Responsibilities:
Worked on UNIX and LINUX environments to work on the data files received from various clients and developed UNIX shell scripts to automate the build process and generated reports on top.
Worked with AWS cloud services, i.e., EC2, EMR, and S3 buckets.
Created and maintained tables and views in Snowflake.
Experience with the Snowflake cloud data warehouse for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake.
Practical expertise in bulk loading and unloading data in Snowflake using the COPY command (a Python-driven sketch follows at the end of this section).
Working knowledge of Python libraries such as NumPy, SciPy, matplotlib, urllib2, pandas (DataFrames), and PyTables.
Experience with data transformations in Snowflake using Python.
Worked on creating Python scripts to migrate data from MongoDB to Snowflake.
Worked on creating Python scripts to parse JSON and XML documents and load them into the target database.
Developed transformation logic using Snowpipe; hands-on experience with Snowflake utilities such as SnowSQL and Snowpipe.
Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
Involved in migrating objects from Teradata to Snowflake and created Snowpipe for continuous data loading.
Developed a POC that leverages Snowpark, a Snowflake tool, to query and process data in a pipeline.
Designed and implemented the ETL process using Talend to load data from source to target.
Created and maintained secure data transfer pipelines, including batch data processing.
Worked with Autosys scheduler to schedule daily batch jobs.
Developed ETL pipelines in and out of the data warehouse using Python.
Experience in change implementation, monitoring, and troubleshooting of AWS, Snowflake databases, and cluster-related issues.
Developed merge scripts to UPSERT data into Snowflake from an ETL source.
Environment: HDFS, AWS, SSIS, Snowflake, Hadoop, Hive, HBase, MapReduce, Spark, Sqoop, Pandas, MySQL, SQL Server, PostgreSQL, Teradata, Java, Unix, Python, Tableau, Oozie, Git.
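The following hedged sketch shows a bulk load into Snowflake with the COPY command, driven from Python via the snowflake-connector-python package; the stage, table, and credential values are placeholders, not project values.

# Hedged sketch: bulk load into Snowflake using COPY INTO, issued from Python.
# Account, credentials, stage, and table names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder
    user="etl_user",           # placeholder
    password="***",            # placeholder; in practice pulled from a vault or env var
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Load newline-delimited JSON files from an external stage into a staging table.
    cur.execute("""
        COPY INTO STAGING.ORDERS_RAW
        FROM @ext_stage/orders/
        FILE_FORMAT = (TYPE = 'JSON')
        ON_ERROR = 'CONTINUE'
    """)
    print(cur.fetchall())      # per-file load results
finally:
    conn.close()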
Client: SunTrust, Florence, AL (India) Jan 2016 – May 2018
Role: Data Engineer
Responsibilities:
Created consumption views on top of metrics to reduce the running time for complex queries.
Created performance dashboards in Tableau, Excel, and PowerPoint for the key stakeholders.
Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs.
Using Nebula Metadata, registered business and technical datasets for the corresponding SQL scripts.
Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
Worked with stakeholders to communicate campaign results, strategy, issues or needs.
Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
Uploaded and processed more than 10 terabytes of data from various structured and unstructured sources into HDFS using Sqoop and Flume.
Incorporated predictive modeling (rule engine) to evaluate the Customer/Seller health score using Python scripts, performed computations, and integrated the results with the Tableau visualization.
Wrote various data normalization jobs for new data ingested into Redshift.
Prepared a test plan to ensure the QA and development phases ran in parallel.
Loaded data from the Linux/Unix file system to HDFS and worked with PuTTY for communication between Unix and Windows systems and for accessing data files in the Hadoop environment.
Wrote and executed test cases and reviewed them with the Business and Development teams.
Monitored the daily, weekly, and monthly jobs and provided support in case of failures/issues.
Environment: MapReduce, Python, ServiceNow, Pig, Hive, Teradata, SQL Server, Scala, Apache Spark, Sqoop, GitHub.
Client: WNS Global Services, Hyderabad, India Apr 2014 – Oct 2015
Role: Data Engineer
Responsibilities:
Designed and implemented Sqoop incremental and delta imports on tables without primary keys or dates from Teradata and SAP HANA, appending directly into the Hive warehouse.
Converted HiveQL to the Spark API.
Worked on-call on production issues: data scrubbing, resolving Hive query issues, and providing workarounds for defects within SLA durations.
Developed PL/SQL Procedures, Functions, Packages, Triggers, Normal and Materialized Views.
Utilized SQL functions to select information from database and send it to the front end upon user request.
Developed Spark streaming pipeline in Java to parse JSON data and to store in Hive tables.
Worked extensively with Sqoop for importing metadata from Oracle.
Involved in creating Hive tables and loading and analyzing data using Hive queries.
Developed Hive queries to process the data and generate the data cubes for visualizing.
Implemented schema extraction for Parquet and Avro file Formats in Hive.
Good experience with Talend Open Studio for designing ETL jobs for data processing.
Implemented partitioning, dynamic partitions, and buckets in Hive (a Spark SQL sketch follows at the end of this section).
Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
Migrated an existing on-premises application to AWS.
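As a closing illustration of the Hive partitioning work above, here is a hedged Spark SQL sketch of a dynamic-partition insert; the table and column names are assumptions for illustration only.

# Hedged sketch: dynamic partitioning in Hive via Spark SQL with Hive support enabled.
# Table and column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-dynamic-partitions")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Insert into a partitioned Hive table; the partition value is derived from the
# last column of the SELECT list (sale_date).
spark.sql("""
    INSERT OVERWRITE TABLE sales_partitioned PARTITION (sale_date)
    SELECT order_id, customer_id, amount, sale_date
    FROM sales_staging
""")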