
Big Data Engineer

Location:
Irving, TX
Posted:
March 29, 2024


Resume:

Surya J

Senior Data Engineer

Phone: +1-203-***-****

Email: ad4npq@r.postjobfree.com

PROFESSIONAL SUMMARY

As a Data Engineer with 9 years of experience in the software industry, I bring a wealth of experience in Azure and Amazon cloud services, data warehousing, and big data technologies.

Expertise in setting up data warehouses, with hands-on knowledge of Azure services such as Azure Data Factory, Azure Databricks, and Azure Data Lake Storage Gen2.

Proficient in setting up, configuring, and managing Databricks clusters for data processing and analytics workloads.

Experienced in administering production databases, ensuring data availability, performance, and security.

Hands-on experience with the Databricks Workspace UI, managing Databricks notebooks, and Delta Lake with PySpark and Spark SQL.

Skilled in tuning Databricks clusters and production databases for optimal performance and efficiency.

Implemented robust backup and disaster recovery strategies for Databricks environments and production databases to safeguard against data loss.

Designed and implemented IAM roles and policies to manage access control across AWS resources.

Orchestrated serverless ETL workflows using AWS Lambda, streamlining data processing tasks and enhancing system efficiency.

Conducted performance tuning and optimization of SQL queries on Redshift for improved data retrieval speed.

Expertise in utilizing Azure Storage Explorer to manage and monitor storage accounts and containers, and to integrate Azure Storage with other Azure services.

Created batch and structured streaming ADF pipelines using Databricks Spark, Delta tables, ADLS Gen2, Azure Key Vault, and Azure Event Hubs.

Experience building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL (a brief sketch follows this summary).

Developed and implemented data ingestion pipelines using Azure Synapse Analytics to efficiently extract, transform and load (ETL) large volumes of structured and unstructured data into the data warehouse.

Experience in software development for data warehousing projects using Informatica PowerCenter as the ETL tool on Oracle and Unix with flat files.

Experience with Azure Event Hubs, Spark Streaming and Apache Kafka for messaging and streaming applications.

Extensive hands-on experience in Spark performance tuning.

Utilized Snowflake to analyze large datasets, extracting valuable insights while also overseeing the management and maintenance of Snowflake environments.

Extensive experience in developing, maintaining, and implementing EDWs, data marts, ODS, and data warehouses with star and snowflake schemas.

Experience in building database architectures for OLAP and OLTP applications, database design, data migration, and data warehousing concepts with an emphasis on ETL.

Developed serialization and deserialization routines to convert complex data structures into a compact binary format, reducing storage footprint and improving data transfer efficiency.

Experience with data governance and data security practices to ensure data privacy and compliance.

Developed data ingestion workflows to read data from various sources and write it to Avro, Parquet, Sequence, JSON, and ORC file formats for efficient storage and retrieval.

Hands-on experience with Git and GitHub for version control.

Experience with the Software Development Life Cycle, working in an Agile methodology.
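For illustration, a minimal sketch of the kind of PySpark/Spark SQL ETL pipeline and columnar-format output described above; the paths, view name, and column names are hypothetical placeholders rather than references to any actual project.

# Minimal PySpark ETL sketch: read JSON, apply a Spark SQL transformation,
# and write the result as Parquet. Paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read semi-structured JSON from a landing zone (placeholder path)
raw_df = spark.read.json("/mnt/landing/orders/*.json")

# Transform: register a temp view and apply business rules in Spark SQL
raw_df.createOrReplaceTempView("orders_raw")
curated_df = spark.sql("""
    SELECT order_id,
           customer_id,
           CAST(order_ts AS date) AS order_date,
           ROUND(amount, 2)       AS amount
    FROM orders_raw
    WHERE amount IS NOT NULL
""")

# Load: write to Parquet, partitioned by date, for efficient downstream reads
curated_df.write.mode("overwrite").partitionBy("order_date") \
    .parquet("/mnt/curated/orders")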

TECHNICAL SKILLS

Cloud Services

Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Azure Synapse Analytics, Azure DevOps, AWS Glue, Amazon Redshift, Amazon S3

Hadoop Distribution

Cloudera

Big Data Technologies

Hive, Python, PySpark, HDFS

Languages

SQL, SAS, PL/SQL, Python, HiveQL.

Operating Systems

UNIX, LINUX

ETL

Informatica PowerCenter 9.6.x

Version Control

GIT, GitHub.

Methodology

Agile

IDE & Build Tools

PyCharm, Visual Studio

Databases

MS SQL Server 2016, DB2, Azure SQL DB, Azure Synapse, Oracle 11g/12c, Teradata

Reporting Tools

SAS

PROFESSIONAL EXPERIENCE

Role: Data Engineer Mar 2023 – Present

Client: Elevance Health

Atlanta, GA

Responsibilities:

●Create and maintain optimal data pipeline architecture in the Microsoft Azure cloud using Azure Data Factory and Azure Databricks.

●Skilled in tuning Databricks clusters and production databases for optimal performance and efficiency.

●Implemented robust backup and disaster recovery strategies for Databricks environments and production databases to safeguard against data loss.

●Proficient in setting up, configuring, and managing Databricks clusters for data processing and analytics workloads on production databases.

●Developed and maintained end-to-end ETL data pipelines and worked with large data sets in Azure Data Factory.

●Involved in designing and developing Azure Stream Analytics jobs to process real-time data using Azure Event Hubs.

●Exposed transformed data from the Azure Databricks Spark platform in Parquet format for efficient data storage.

●Wrote PySpark and Spark SQL transformations in Azure Databricks to perform complex transformations for business rule implementation.

●Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.

●Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data; configured Azure Data Factory triggers and scheduled the pipelines (see the streaming sketch at the end of this section).

●Configured Logic Apps to handle email notifications to end users and key stakeholders using the web services activity.

●Developed ELT/ETL pipelines to move data to and from the Snowflake data store using a combination of Python and Snowflake SnowSQL.

●Worked on Azure Data Factory to integrate both on-premises (MySQL) and cloud (Blob Storage, Azure SQL DB) data and applied transformations before loading it back to Snowflake.

●Applied an analytical approach to problem-solving, using Azure Data Factory, Azure Data Lake, and Azure Synapse to solve business problems.

●Used Power Query, the data preparation and transformation tool within Power BI, to clean, shape, and combine data from multiple sources.

●Proven ability in Spark performance tuning and query optimization.

●Created a Git repository and added the project to GitHub.

●Utilized Agile process and JIRA issue management to track sprint cycles.

Environment: PySpark, Azure Databricks, Azure Data Lake, Azure Blob Storage, Azure SQL Database, Azure Data Lake Gen2, Delta Lake, Snowflake, SQL Server, MS SQL, Git, GitHub, Power BI
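A minimal sketch of the Kafka-to-Delta streaming pattern referenced in this section, assuming a Databricks-style environment with Delta Lake available; the broker address, topic, schema, and storage paths are hypothetical placeholders.

# Minimal Structured Streaming sketch: consume a Kafka topic, parse the JSON
# payload, and append it to a Delta table. Broker, topic, schema, and paths
# are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("source", StringType()),
    StructField("amount", DoubleType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "ingest-events")                # placeholder topic
          .option("startingOffsets", "latest")
          .load())

parsed = (events
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Append the parsed events to a Delta table with a checkpoint for fault tolerance
(parsed.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/ingest-events")  # placeholder
       .outputMode("append")
       .start("/mnt/delta/ingest_events"))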

Role: Hadoop Engineer/Software Engineer Dec 2020 – Dec 2021

Client: Oracle Corp Pvt Ltd (First Republic Bank)

Hyderabad, India

Responsibilities:

Hands-on experience working with the Cloudera Hadoop distribution and Microsoft Azure.

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.

Created end-to-end solutions for ETL transformation jobs, writing Informatica workflows and mappings.

Developed ELT/ETL pipelines to move data to and from on-premises data stores using Informatica.

Performed data integration using Informatica.

Deployed and optimized Python web applications through Azure DevOps CI/CD pipelines.

Developed enterprise-level solutions using batch processing and streaming frameworks (Spark Streaming, Apache Kafka).

Created partitions and buckets based on state for further processing using bucket-based Hive joins.

Imported data using Informatica to load data from MySQL into HDFS on a regular basis.

Proficient in writing complex SQL queries for data retrieval, manipulation, and analysis in the Netezza environment.

Worked with Data Lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).

Load and transform large sets of structured, semi structured, and unstructured data.

Wrote Hive queries for data analysis to meet business requirements, creating Hive tables and working on them using HiveQL to simulate MapReduce functionality.

Worked on RDDs and DataFrames (Spark SQL) using PySpark for analyzing and processing the data.

Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster data processing (see the bucketing sketch at the end of this section).

Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.

Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.

Used JIRA to manage issues and project workflow.

Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.

Used Git as a version control tool to maintain the code repository.

Environment: HDFS, Hive, SQL, Python, PySpark, Git, GitHub, Kafka, Azure Databricks, Data Factory, ADF pipelines, Azure SQL, Power BI.
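A minimal sketch of the partitioning/bucketing pattern referenced in this section, shown here through Spark's DataFrameWriter against the Hive metastore rather than raw HiveQL; the database, table, and column names are hypothetical placeholders.

# Minimal sketch: partition by state and bucket by customer_id so that joins
# on the bucketed key between co-bucketed tables can avoid a full shuffle.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("bucketing-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Placeholder source: an existing table registered in the Hive metastore
customers = spark.table("default.customers_raw")

# Write a partitioned, bucketed table into the metastore
(customers.write
          .mode("overwrite")
          .partitionBy("state")
          .bucketBy(32, "customer_id")
          .sortBy("customer_id")
          .saveAsTable("default.customers_bucketed"))

# Analysis over the bucketed table with Spark SQL
spark.sql("""
    SELECT state, COUNT(*) AS customer_count
    FROM default.customers_bucketed
    GROUP BY state
""").show()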

Role: ETL Developer July 2017 – Nov 2020

Client: Enormous IT Solutions (PEPSICO)

Responsibilities:

●Developing ETL (Extract/Transform/Load) mappings using Informatica to support business requirements, loading data from source systems such as Oracle and flat files into downstream integration applications for the reporting layer.

●Performing Informatica code reviews and deploying code to each higher environment using deployment groups (via the MKS tool).

●Fine-tuning existing Informatica mappings for performance optimization.

●Involved in testing and troubleshooting Informatica mappings and SQL queries across environments, performing unit testing and developing test cases.

●Utilizing various transformations such as Lookup, Joiner, Update Strategy, Filter, Union, Sorter, and Router.

●Involved in requirement gathering, implementing code changes, and refreshing XML every sprint.

●Performed UAT deployments and coordinated PROD deployments.

●Tracked reviews, documentation, and test activities.

●Architected and implemented end-to-end data solutions on AWS, driving efficiency improvements and cost savings.

●Successfully integrated Informatica PowerCenter with AWS services, optimizing ETL workflows and ensuring data consistency across the organization.

●Used AWS Glue for ETL processes, streamlining data transformations and enhancing system scalability (see the Lambda/Glue sketch at the end of this section).

●Managed the development and maintenance of Informatica mappings and workflows, ensuring data accuracy and timeliness.

●Designed and implemented IAM roles and policies to manage access control across AWS resources.

●Orchestrated serverless ETL workflows using AWS Lambda, streamlining data processing tasks and enhancing system efficiency.

●Conducted performance tuning and optimization of SQL queries on Redshift for improved data retrieval speed.

Environment: Data pipelines, Oracle 12c/19c, MongoDB, Python, Informatica PowerCenter 9.x, AWS Glue, EC2, Amazon Redshift, AWS Lambda, IAM
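A minimal sketch of a serverless ETL trigger of the kind referenced in this section: an AWS Lambda handler that starts a Glue job when a new object lands in S3. The Glue job name, argument key, and event wiring are hypothetical, and the function's IAM role is assumed to permit glue:StartJobRun.

# Lambda handler (Python) that starts a Glue job run for a newly arrived S3 object.
import boto3

glue = boto3.client("glue")

GLUE_JOB_NAME = "sales-curation-job"   # hypothetical placeholder job name


def lambda_handler(event, context):
    """Triggered by an S3 put event; pass the new object path to a Glue job run."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    response = glue.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments={
            # consumed by the Glue script as a job argument (placeholder key)
            "--source_path": f"s3://{bucket}/{key}",
        },
    )
    return {"JobRunId": response["JobRunId"]}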

Role: Data Warehouse Developer Apr 2015 – Jun 2017

Client: VARA INFO TECH

Hyderabad, India (ICICI Bank)

Responsibilities:

●Worked as a SQL Server Analyst / Developer / DBA using SQL Server 2012, 2015, and 2016.

●Created jobs, SQL Mail Agent, and alerts, and scheduled DTS/SSIS packages.

●Managed and updated the Erwin models (logical/physical data modeling) for the Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB according to user requirements.

●Source control and environment-specific script deployment tracking using TFS.

●Exported current data models to PDF from Erwin and published them to SharePoint for various users.

●Developed, administered, and managed the corresponding databases: Consolidated Data Store, Reference Database (source for the codes/values of the legacy source systems), and Actuarial Data Mart.

●Wrote triggers, stored procedures, and functions using Transact-SQL (T-SQL), and created and maintained physical structures (a stored-procedure sketch follows this section).

●Deployed scripts to different environments according to configuration management and playbook requirements; created and managed files/filegroups and table/index associations; performed query tuning and performance tuning.

●Tracked and closed defects using Quality Center; maintained users, roles, and permissions.

Environment: SQL Server 2012 Enterprise Edition, SSRS, SSIS, T-SQL, Shell scripting, Oracle 10g, Visual Studio 2010
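A minimal sketch of the T-SQL stored-procedure work described in this section, driven from Python via pyodbc purely for illustration; the server, database, procedure, and table names are hypothetical placeholders.

# Create and call a simple parameterized T-SQL stored procedure via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=CDS;"   # placeholder server/database
    "Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Drop and recreate a hypothetical procedure over a hypothetical policy table
cursor.execute("""
IF OBJECT_ID('dbo.usp_GetPoliciesByState', 'P') IS NOT NULL
    DROP PROCEDURE dbo.usp_GetPoliciesByState;
""")
cursor.execute("""
CREATE PROCEDURE dbo.usp_GetPoliciesByState
    @State CHAR(2)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT PolicyId, HolderName, Premium
    FROM dbo.Policy
    WHERE State = @State;
END
""")
conn.commit()

# Call the procedure with a bound parameter
for row in cursor.execute("EXEC dbo.usp_GetPoliciesByState @State = ?", "TX"):
    print(row.PolicyId, row.Premium)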


Role: Data Warehouse Developer July 2014 – Mar 2015

Client: Citi Financial

Kolkata, India

Responsibilities:

●Proficient in dimensional data modelling for data mart design, identifying facts and dimensions, and developing fact and dimension tables using Slowly Changing Dimension (SCD) techniques (an SCD Type 2 sketch follows this section).

●Developed complex stored procedures, efficient triggers, and necessary functions, along with creating indexes and indexed views to optimize performance in SQL Server.

●Extensive experience in monitoring and tuning SQL Server performance, employing best practices to ensure optimal database performance.

●Developed and implemented SSIS and SSRS packages to extract, transform, and load data from various sources, including DB2, SQL, Oracle, flat files (CSV, delimited), APIs, XML, and JSON.

●Skilled in error and event handling techniques, such as precedence constraints, breakpoints, checkpoints, and logging, ensuring reliable and robust ETL processes.

●Experienced in building cubes and dimensions with different architectures and data sources for business intelligence purposes, including writing MDX scripting.

●Expertise in designing ETL data flows using SSIS, creating mappings and workflows for extracting data from SQL Server, as well as performing data migration and transformation from Access/Excel sheets using SQL Server SSIS.

●Proficient in developing SSAS cubes, implementing aggregations, defining KPIs (Key Performance Indicators), managing measures, partitioning cubes, and creating data mining models. Deploying and processing SSAS objects.

●Experience in creating ad hoc reports and reports with complex formulas, utilizing querying capabilities of the database for business intelligence purposes.

●Expertise in developing parameterized, chart, graph, linked, dashboard, scorecards, and drill-down/drill-through reports on SSAS cubes using SSRS (SQL Server Reporting Services).

Environment: MS SQL Server, Visual Studio, SSIS, SharePoint, MS Access, DB2, Team Foundation Server, Git.
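A minimal sketch of the Slowly Changing Dimension Type 2 pattern described in this section (expire the current dimension row when a tracked attribute changes, then insert a new current row), driven from Python via pyodbc for illustration; connection details, table, and column names are hypothetical placeholders.

# SCD Type 2 upsert in two T-SQL steps, executed from Python via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=DataMart;"   # placeholders
    "Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Step 1: close out current dimension rows whose tracked attributes changed
cursor.execute("""
UPDATE d
SET d.EndDate   = SYSDATETIME(),
    d.IsCurrent = 0
FROM dbo.DimCustomer AS d
JOIN stg.Customer    AS s ON s.CustomerKey = d.CustomerKey
WHERE d.IsCurrent = 1
  AND (d.City <> s.City OR d.Segment <> s.Segment);
""")

# Step 2: insert a new current version for changed or brand-new customers
cursor.execute("""
INSERT INTO dbo.DimCustomer (CustomerKey, City, Segment, StartDate, EndDate, IsCurrent)
SELECT s.CustomerKey, s.City, s.Segment, SYSDATETIME(), NULL, 1
FROM stg.Customer AS s
LEFT JOIN dbo.DimCustomer AS d
       ON d.CustomerKey = s.CustomerKey AND d.IsCurrent = 1
WHERE d.CustomerKey IS NULL;
""")

conn.commit()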

EDUCATION

•Master’s in Business Analytics


