SARANYA DINESH CHANDAR
(Azure Data Engineer)
Phone: +1-513-***-****
Email: ****************@*****.***
PROFESSIONAL SUMMARY
Data engineering professional with solid foundational skills and a proven track record of implementations across a variety of data platforms. Self-motivated with a strong sense of personal accountability in both individual and team settings.
9+ years of experience in the IT industry, including 5+ years in Azure data engineering, data pipeline design, development, and implementation as a Data Engineer/Data Developer, Data Modeler, and Big Data/Hadoop developer, plus 4+ years of experience in Java/J2EE.
Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure Synapse.
Experience developing Spark applications using PySpark and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
Good understanding of Hadoop and YARN architecture along with the various Hadoop daemons such as Job Tracker, Task Tracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
Domain knowledge of finance, logistics, and health insurance.
Strong skills in visualization tools (Power BI, Excel): formulas, pivot tables, charts, and DAX.
Expertise in all phases of the project life cycle (design, analysis, implementation, and testing).
Hands-on experience with Azure cloud services (PaaS & IaaS): Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitor, Key Vault, and Azure Data Lake.
Good experience tracking and logging end-to-end software application builds using Azure DevOps.
Data warehousing experience across business intelligence technologies and databases, with extensive knowledge of data analysis, T-SQL queries, and ETL & ELT processes.
Designed enterprise-level data warehouse solutions and led multiple implementations.
Good experience working with SQL databases of 11-12 TB.
Extensive experience creating pipeline jobs and schedule triggers using Azure Data Factory.
Good experience implementing and orchestrating data pipelines using Oozie and Airflow.
Experience designing star and snowflake schemas for data warehouse and ODS architectures.
Expertise in migrating SAP databases to Oracle environments: assessing requirements, creating a plan, preparing the infrastructure, performing the migration, and validating the result against client requirements.
Experience with Snowflake multi-cluster warehouses.
Understanding of the Snowflake cloud platform.
Proficient at installing and configuring Oracle 10g/11g/12c/18c/19c multi-node RAC database instances on ASM/OCFS/VCS/Sun Cluster in Windows and UNIX flavors.
Upgraded Clusterware across versions by applying CPU patches using OPatch, and migrated databases from non-clustered to clustered configurations using RMAN/rconfig.
Configured Oracle Cluster File System (OCFS2) and RAW devices to host redundant OCR files and VOTING disks.
Extensive understanding of RAC and non-RAC databases.
Successfully installed and configured Oracle GoldenGate for data replication and cross-environment migrations.
SKILLS
Big Data Technologies
Hadoop, MapReduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark.
Databases
Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Cassandra, Snowflake.
Programming Languages
Python, PySpark, Shell scripting, Perl scripting, SQL, Java.
Tools
PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Postman.
Version Control
SVN, Git, GitHub, Maven
Operating Systems
Windows 10/7/XP/2000/NT/98/95, UNIX, LINUX, OS
Visualization/ Reporting
Tableau, ggplot2, matplotlib
EDUCATION
Bachelor of Electronics and Communication Engineering, Annamalai University, India.
WORK HISTORY
Client: Johnson & Johnson, Cincinnati, OH Oct 2021 – Present
Role: Azure Data Engineer
Responsibilities:
Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements.
Performed PySpark tuning on Azure Databricks.
Implemented production pipelines on Azure Databricks.
Worked on migrating data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
Set up separate application and reporting data tiers across servers using geo-replication.
Implemented disaster recovery and failover servers in the cloud by replicating data across regions.
Created pipeline jobs, schedule triggers, and mapping data flows using Azure Data Factory (V2), storing credentials in Azure Key Vault.
Created elastic pool databases and scheduled elastic jobs for executing T-SQL procedures.
Created tabular models in Azure Analysis Services to meet business reporting requirements.
Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (DW).
Created correlated and non-correlated subqueries to resolve complex business queries involving multiple tables from different databases.
Developed business intelligence solutions using SQL Server Data Tools 2015 and 2017 and loaded data into SQL Server and Azure cloud databases.
Performed data quality analyses and applied business rules in all layers of the extraction, transformation, and loading process.
Performed validation and verification across all testing phases, including functional, system integration, end-to-end, regression, sanity, user acceptance, smoke, disaster recovery, production acceptance, and pre-production testing.
Worked with SnowSQL and Snowpipe.
Converted Talend joblets to support Snowflake functionality.
Created Snowpipe for continuous data loading.
Used COPY INTO to bulk load data (a sketch of this load pattern appears at the end of this responsibilities list).
Created data sharing between two Snowflake accounts.
Logged and tracked defects in Jira and Azure DevOps.
Involved in planning the cutover strategy and go-live schedule, including the scheduled release dates of Portfolio Central DataMart changes.
Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
Assessed the current production state of the application and determined the impact of new implementations on existing business processes.
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
Loaded tables from Azure Data Lake to Azure Blob Storage to push them to Snowflake.
Wrote ETL jobs to read from web APIs using REST/HTTP calls and loaded the results into HDFS using Java and Talend.
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data to and from sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Implemented Auto Loader functionality in Azure Databricks.
Performed performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and memory configuration.
Wrote UDFs in Scala and PySpark to meet specific business requirements.
Developed JSON definitions for deploying Azure Data Factory (ADF) pipelines that process data using the SQL activity.
Developed SQL scripts for automation purposes.
Created build and release pipelines for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
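Below is a minimal sketch of the Snowpipe and COPY INTO load pattern referenced in the Snowflake items above, written against the snowflake-connector-python client; the account, stage, table, pipe, and notification-integration names are illustrative placeholders, not the project's actual objects.

import snowflake.connector  # pip install snowflake-connector-python

# Connection details are placeholders; in practice credentials were kept in Azure Key Vault.
conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    warehouse="LOAD_WH",
    database="ANALYTICS_DB",
    schema="STAGING",
)
cur = conn.cursor()

# One-off bulk load from an external (Azure Blob) stage with COPY INTO.
cur.execute("""
    COPY INTO staging.orders
    FROM @azure_blob_stage/orders/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")

# Snowpipe definition for continuous, auto-ingested loads from the same stage.
cur.execute("""
    CREATE PIPE IF NOT EXISTS staging.orders_pipe
    AUTO_INGEST = TRUE
    INTEGRATION = 'AZURE_EVENTGRID_INT'  -- placeholder notification integration
    AS
    COPY INTO staging.orders
    FROM @azure_blob_stage/orders/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")

cur.close()
conn.close()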
Environment: Microsoft SQL Server 2012/2016, SSDT 2012 & 2015, Azure Synapse Analytics, Snowflake, Azure Data Lake & Blob Storage, Azure SQL, Azure Data Factory, Azure Analysis Services, BIDS.
Client: Guardian, Cincinnati, OH Jun 2020 – Sep 2021
Role: Azure Data Engineer
Responsibilities:
Responsible for gathering requirements, system analysis, design, development, testing and deployment.
Developed Spark applications using Spark SQL and PySpark in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Designed and implemented migration strategies with the Azure suite: Azure SQL Database, Azure Data Factory (ADF) V2, Azure Key Vault, and Azure Blob Storage.
Created end-to-end data pipelines using ADF services to load data from on-premises sources to Azure SQL Server for data orchestration.
Built scalable and reliable ETL pipelines to pull large and complex data from different systems efficiently.
Built data pipelines with Azure Data Factory and Databricks notebooks to load data from legacy and SQL Server sources into the Azure data warehouse.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats for analysis.
Designed and developed real-time stream processing applications using Spark, Kafka, and Scala to perform streaming ETL.
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Developed a Python-based RESTful web service (API) using Flask; involved in the analysis, design, development, and production phases of the application.
Rewrote an existing Java application as a Python module to deliver data in a specific format.
Developed Python batch processors to consume and produce various feeds.
Constructed product-usage data aggregations using PySpark and Spark SQL and maintained them in the Azure data warehouse for reporting, data science dashboards, and ad hoc analyses (a sketch of this aggregation pattern appears at the end of this responsibilities list).
Used Jenkins for continuous integration services.
Worked on an AJAX-driven application, invoking web services/APIs and parsing the JSON responses.
Created a Git repository and added the project to GitHub.
Utilized Agile process and JIRA issue management to track sprint cycles.
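A minimal PySpark sketch of the product-usage aggregation pattern described above; the ADLS paths, column names, and target table are illustrative placeholders rather than the client's actual objects.

from pyspark.sql import SparkSession, functions as F

# Paths, columns, and table names below are placeholders for illustration only.
spark = SparkSession.builder.appName("product-usage-aggregation").getOrCreate()

# Read raw usage events (Parquet) and a customer lookup (CSV) from ADLS Gen2.
events = spark.read.parquet("abfss://raw@<storageaccount>.dfs.core.windows.net/usage_events/")
customers = (spark.read.option("header", True)
             .csv("abfss://raw@<storageaccount>.dfs.core.windows.net/customers.csv"))

# Aggregate daily product usage per customer.
daily_usage = (
    events.join(customers, "customer_id")
          .groupBy("customer_id", F.to_date("event_ts").alias("event_date"))
          .agg(F.count("*").alias("event_count"),
               F.countDistinct("feature_name").alias("distinct_features"))
)

# Persist for reporting and ad hoc analysis; a managed table keeps the sketch
# self-contained (a Synapse/SQL DW write would use the dedicated connector in practice).
daily_usage.write.mode("overwrite").saveAsTable("analytics.daily_product_usage")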
Environment: Azure, Azure Data Factory, Databricks, PySpark, Python, Pandas, NumPy, PyCharm, REST API, Flask, JSON, Node.js, JIRA, GitHub.
Client: Caterpillar, TN Jan 2019 – May 2020
Role: Data Engineer
Responsibilities:
Involved in the complete project life cycle, from design discussion to production deployment.
Developed a job server (REST API, Spring Boot, Oracle DB) and a job shell for job submission, job profile storage, and job data (HDFS) query/monitoring.
Designed and developed a daily process to incrementally import raw data from DB2 into Hive tables using Sqoop.
Debugged MapReduce jobs using the MRUnit framework and optimized MapReduce performance.
Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest data into HDFS for analysis.
Used Oozie and Zookeeper for workflow scheduling and monitoring.
Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
Developed new Spark SQL ETL logic in the big data platform for the migration and availability of the facts and dimensions used for analytics (a sketch of this pattern appears at the end of this responsibilities list).
Expert in developing JSON definitions for deploying Azure Data Factory (ADF) pipelines that process the data.
Created loads from Azure Blob Storage to Azure SQL and from web APIs to Azure SQL, and scheduled WebJobs for daily loads.
Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (DW).
Developed JSON definitions for deploying ADF pipelines that process the data using the SQL activity.
Implemented data ingestion from various source systems using Sqoop and PySpark.
Performed performance tuning of Spark and Hive jobs.
Designed Hive external tables using a shared metastore instead of Derby, with dynamic partitioning and bucketing.
Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, NiFi, and webMethods.
Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
Participated in building the data lake environment on Hadoop (Cloudera) and building the Campaign Pre-processing and opportunity generation pipeline using Hadoop services such as Hive and Spark.
Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
Involved in designing and developing Kafka- and Storm-based data pipelines with the infrastructure team.
Worked with major components of the Hadoop ecosystem, including Hive, Pig, HBase, HBase-Hive integration, PySpark, Sqoop, and Flume.
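A minimal Spark SQL sketch of the fact/dimension ETL pattern referenced above; the databases, tables, and columns are illustrative placeholders, not the actual project schema.

from pyspark.sql import SparkSession

# Table and column names are placeholders for illustration only.
spark = (SparkSession.builder
         .appName("fact-dimension-etl")
         .enableHiveSupport()
         .getOrCreate())

# Allow dynamic partition inserts for the load-date partition.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Load the fact table from the staging layer, resolving the surrogate key
# against an existing dimension, partitioned by load date.
spark.sql("""
    INSERT OVERWRITE TABLE analytics.fact_shipments PARTITION (load_date)
    SELECT s.shipment_id,
           d.customer_key,
           s.weight_kg,
           s.ship_ts,
           current_date() AS load_date
    FROM staging.shipments s
    JOIN analytics.dim_customer d
      ON s.customer_id = d.customer_id
""")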
Environment: Hadoop, Hive, Impala, Oracle, Spark, Sqoop, Oozie, MapReduce, Teradata, SQL, Kafka, Zookeeper, PySpark.
Client: William Scotsman, SFO, CA Aug 2017 – Dec 2018
Role: Big Data Developer
Responsibilities:
Involved in requirements gathering and business analysis, and translated business requirements into technical designs in Hadoop and big data.
Involved in Sqoop implementation, which helps load data from various RDBMS sources to Hadoop systems and vice versa.
Developed Python scripts to extract the data from the web server output files to load into HDFS.
Wrote a Python script that automates launching the EMR cluster and configuring the Hadoop applications.
Worked extensively with Avro and Parquet files, converting data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
Analyzed system failures, identified root causes, and recommended courses of action; documented system processes and procedures for future reference.
Involved in configuring the Hadoop cluster and load balancing across the nodes.
Involved in Hadoop installation, commissioning, decommissioning, balancing, troubleshooting, monitoring, and debugging configuration of multiple nodes on the Hortonworks platform.
Worked with Spark on top of YARN/MRv2 for interactive and batch analysis.
Involved in managing and monitoring Hadoop cluster using Cloudera Manager.
Used Python and Shell scripting to build pipelines.
Developed a data pipeline using Sqoop, HQL, Spark, and Kafka to ingest enterprise message delivery data into HDFS.
Developed workflows in Oozie and Airflow to automate loading data into HDFS and pre-processing it with Pig and Hive (see the Airflow sketch at the end of this responsibilities list).
Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive semi-structured and unstructured data; loaded unstructured data into the Hadoop Distributed File System (HDFS).
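A minimal Airflow sketch (written in Airflow 2.x style) of the load-then-preprocess workflow described above; the DAG id, schedule, HDFS paths, and Hive script are illustrative placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# DAG id, schedule, paths, and scripts are placeholders for illustration.
with DAG(
    dag_id="daily_hdfs_ingest",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Land the day's raw files in HDFS.
    load_to_hdfs = BashOperator(
        task_id="load_to_hdfs",
        bash_command="hdfs dfs -put -f /data/incoming/messages_{{ ds }}.json /raw/messages/",
    )

    # Pre-process the landed data with a Hive script.
    preprocess_with_hive = BashOperator(
        task_id="preprocess_with_hive",
        bash_command="hive -f /scripts/preprocess_messages.hql --hivevar run_date={{ ds }}",
    )

    load_to_hdfs >> preprocess_with_hive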
Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, jQuery.
Client: NCS Pearson India Private Limited, Delhi, India Jul 2012 – Nov 2016
Role: Java Developer
Responsibilities:
Involved in analysis, design, and implementation using the Agile methodology for iterative development of the application.
Participated in Scrum, Retrospective and Release Planning Meetings.
Developed the application using Spring, Hibernate, and RESTful web services.
Worked on Spring controllers configured with annotations to handle presentation requests, integrated with services and DAOs using annotations.
Used Spring Boot, which is radically faster for building services, to develop Spring-based applications with minimal configuration.
Used Java 8 Stream operations including map, filter, reduce, and collect.
Used JSP, JSTL, and Spring tags for retrieving and displaying data.
Helped the team integrate code through continuous integration using Jenkins.
Used Log4j framework to log system execution details to log files.
Used Git version control to maintain project versions.
Participated in the code review process as part of quality assurance.
Used the Tomcat web server to deploy the application locally and IBM WebSphere to deploy the application in production.
Environment: Java 8, JSP, Servlets, Spring MVC/DI/AOP/Templates, Spring Boot, Hibernate, Web Services, RESTful Services, Jenkins, UML, Tomcat Server, Eclipse, Oracle, Linux/Unix.