Data Analyst Engineering

Location:
Beaverton, OR
Posted:
April 01, 2024

EZHILARASI CHINNAPAIYAN

Email: ad4pou@r.postjobfree.com

PH: +1-971-***-****

** ***** ** *** *** AWS Azure Certified Spark Python Scala SQL

PROFESSIONAL SUMMARY

10+ years of professional IT experience with a data warehousing and business intelligence background, spanning design, development, analysis, implementation, and post-implementation support and enhancement.

Around 6 years of experience in data engineering and data analyst roles on big data using the Hadoop framework and related technologies such as HDFS, MapReduce, YARN, Spark, Scala, Python, PySpark, Hive, Oozie, and Airflow.

Extensive experience across the data engineering field, including ingestion, data lakes, data warehouses, reporting, and analysis, managing billions of records.

Familiarity with cloud platforms (e.g., AWS, Google Cloud Platform (GCP), Azure) for data storage and processing.

Extract, load, model, and reconcile large amounts of data across multiple system platforms and sources.

Perform data mining, cleansing, and transformation tasks to ensure data accuracy, completeness, and integrity.

Experience with dbt for developing and managing data transformations.

Strong hands-on experience with SQL, SQL Server, Teradata, and PostgreSQL.

Experience with the ELK stack to monitor system health, build and maintain logging pipelines, improve logging efficiency, measure alerting effectiveness, and reduce time-to-detect and time-to-respond.

Hands-on experience improving Spark performance and optimization through various transformations on Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.

Used expert SQL skills to provide data extracts as needed to address project and business needs.

Hands-on experience with GCP: BigQuery, GCS buckets, Dataproc, cross-project access, hashing, and data security.

Strong knowledge and hands-on experience with data lakes, data modeling, data processing, and ETL on the Azure and GCP cloud platforms.

Experience with SQL databases such as Azure SQL Database, PostgreSQL, SQL Server, and Teradata.

Responsible for implementing a generic framework to handle different data collection methodologies from the client's primary data sources, validating and transforming the data using Spark, and loading it into S3 or GCS buckets.

Extensive experience with SQL, data analysis, Spark, and Scala in the GCP environment.

Created modules using Scala and Python programming in the Spark environment.

Experience with ETL tooling, developing data models using dbt, AWS Redshift, and Glue, and performing analysis.

Created logical views instead of tables to enhance the performance of Hive queries.
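
For illustration, a minimal sketch of such a view (table and column names are hypothetical), expressed here as HiveQL submitted through PySpark:

    # Hypothetical example: expose only the pre-joined, aggregated columns the
    # reports need, so downstream Hive queries avoid repeating the join logic.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.sql("""
        CREATE OR REPLACE VIEW sales_db.v_daily_revenue AS
        SELECT o.order_date, c.region, SUM(o.amount) AS revenue
        FROM   sales_db.orders o
        JOIN   sales_db.customers c ON o.customer_id = c.customer_id
        GROUP  BY o.order_date, c.region
    """)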

Wrote Oozie and Airflow workflows to invoke jobs at predefined intervals.
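
A minimal Airflow sketch of a workflow scheduled at a predefined interval (the DAG id, schedule, and job path are hypothetical):

    # Hypothetical Airflow DAG that invokes a Spark job once a day at 02:00.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_ingest",
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 2 * * *",   # predefined interval (cron expression)
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="spark_ingest",
            bash_command="spark-submit /opt/jobs/ingest.py",
        )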

Experience in creating batch DataFrames and fine-tuning Spark applications using broadcasting, cache, and persist methods.
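
A minimal PySpark sketch of those tuning techniques (paths and column names are hypothetical, not from any specific project):

    # Broadcast the small lookup table to avoid a shuffle join, and cache/persist
    # reused DataFrames so they are not recomputed on every action.
    from pyspark import StorageLevel
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    facts = spark.read.parquet("s3://example-bucket/facts/")   # large fact table
    dims = spark.read.parquet("s3://example-bucket/dims/")     # small dimension table

    joined = facts.join(F.broadcast(dims), "dim_id")           # broadcast join
    joined.cache()                                             # default MEMORY_AND_DISK
    daily = joined.groupBy("event_date").count()
    daily.persist(StorageLevel.MEMORY_AND_DISK)                # explicit storage level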

Experience with version control and DevOps platforms such as GitHub and GitLab.

Solid experience working with various data formats such as Parquet, ORC, Avro, and JSON.

Experience ingesting, transforming, and analyzing large datasets to support the enterprise data factory on Google Cloud Platform.

Experience with version control and DevOps platforms such as GCP DevOps, GitHub, and GitLab, and with Agile methods and project management tools such as Jira and Kanban boards.

Experience with data warehouse platforms such as BigQuery, Databricks, Snowflake, PostgreSQL, and SQL Server, as well as dbt.

Technical experience with Azure Data Factory, Azure DevOps, data catalogue tools, AD security, Azure Databricks, client tools such as SSMS and Azure Data Studio, PowerShell, PySpark, Spark SQL, and ER modeling.

Proficient with data lake, data warehouse, and Snowflake platforms to analyze and store data in AWS and Azure Data Factory.

TECHNICAL SKILLS:

Hadoop/Spark Ecosystem: Hadoop, MapReduce, Hive, YARN, Flume, Sqoop, Oozie, Spark, Airflow.

Hadoop Distribution: Cloudera and Hortonworks.

Cloud: GCP – BigQuery, GCS buckets, Dataproc clusters; AWS – S3, EC2, EMR, Glue, Redshift, Data Catalog, Athena; Azure – Data Factory, Azure Synapse Analytics, Data Lake, Databricks, dbt.

Programming Languages: Scala, Python, PySpark, C, C++, Java.

Databases: Oracle, MySQL, MongoDB, SQL Server, Teradata, Snowflake, ELK, MS Access.

Operating Systems: Linux, Windows, Ubuntu.

IDEs: IntelliJ, Eclipse, PyCharm, Jupyter Notebook.

Scripting Languages: HTML, SQL, XML, HQL.

Version Control: Bitbucket, Git, GitLab.

Methodology: Agile.

Verizon Technologies, TX April 2022 – Jan 2024

Data Engineer

Data Engineer:

Responsible for developing code with various transformations using Hive SQL and Spark Scala.

Converted Hive SQL queries into Spark transformations using Spark RDDs and DataFrames with Scala and PySpark.
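
As an illustration of this kind of conversion (table and column names are hypothetical), a Hive SQL aggregation and its PySpark DataFrame equivalent:

    # Hive SQL:  SELECT region, COUNT(*) AS cnt
    #            FROM db.orders WHERE status = 'SHIPPED' GROUP BY region
    # Equivalent DataFrame transformation:
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    orders = spark.table("db.orders")
    shipped_by_region = (
        orders.filter(F.col("status") == "SHIPPED")
              .groupBy("region")
              .agg(F.count("*").alias("cnt"))
    )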

Expertise in analyzing source data and designing robust, scalable data ingestion frameworks and data pipelines adhering to the client's enterprise data architecture guidelines.

Hands-on experience with GCP: BigQuery, GCS buckets, and Dataproc.

Hands-on experience with the Snowflake data warehouse on GCP and AWS for storing and analyzing millions of records.

Expertise in optimizing Spark jobs on GCP using Scala with bucketing, broadcasting, and persist functions.

Proficiency in SQL and experience with Spark, PySpark, Scala, and Python for data manipulation and transformation.

Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files, performing the necessary transformations and aggregations on the data model, and persisting the data in HDFS.

Designed, implemented, and developed ETL solutions for data ingestion, cleansing, and business-rule execution per business requirements.

Knowledge of and experience with code versioning tools such as Git, GitLab, and GitHub Desktop.

Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.

Wrote Oozie and Airflow workflows for orchestration, invoking jobs at predefined intervals.

Used PostgreSQL for data storage and data analysis.

Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.

Familiarity with cloud platforms (e.g., AWS, Google Cloud Platform, Azure) for data storage and processing.

Proficient with AWS services: S3, AWS Glue, AWS Lambda, Hive, EMR, and Elasticsearch.

High-level knowledge of DAGs for creating, scheduling, and monitoring multiple tasks using Airflow.

Production Support:

Responsible for monitoring production servers and scheduled jobs, managing incidents, and receiving incidents and requests from end users; analyzed the available data to find the root cause of problems.

Analyzed requests and either responded to the end user with a solution or escalated them to other IT teams.

Ability to prioritize work to successfully deliver service to agreed levels in a diverse and constantly changing technical environment.

Environment: Spark, Hive, Spark SQL, Oozie, Airflow, Scala, PySpark, Python, Jupyter Notebook, GCS buckets, BigQuery, Dataproc clusters, S3, Redshift, Glue, AWS Lambda, Hive, Elasticsearch, Snowflake, PostgreSQL, SQL Server, Teradata, IntelliJ.

AT&T, Dallas, TX Jan 2019 – March 2022

Data Engineer

Responsibilities:

Developed a fully automated, configuration-driven data pipeline using Spark, Hive, HDFS, SQL Server, Oozie, and AWS S3 file storage to load client data into mirror databases.

Strong experience in data engineering and in building batch and streaming ETL pipelines using Spark SQL.

Designed and implemented data pipelines that launch several Spark clusters equipped with Glue, read datasets from various data sources, perform transformations and analytics, and finally store results for the application.

Performed Hive performance-tuning tasks such as partitioning and bucketing data and computing various metrics for reporting, using map joins, cost-based optimization, and column-level statistics.
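
A hedged sketch of this kind of tuning (table and column names are hypothetical), issued here as HiveQL through PySpark: a table partitioned by date and bucketed by key, with column statistics for the cost-based optimizer.

    # Partitioning prunes scans, bucketing on the join key helps map-side joins,
    # and column statistics feed the cost-based optimizer.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.sql("""
        CREATE TABLE IF NOT EXISTS db.events (user_id BIGINT, amount DOUBLE)
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)
    spark.sql("ANALYZE TABLE db.events COMPUTE STATISTICS FOR COLUMNS user_id, amount")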

Created DataFrames from that data and performed transformations such as filter, join, grouping, aggregation, and sorting.

Involved in developing Hive DDLs to create, alter, and drop Hive tables, and created UDFs and UDAFs for Hive analysis.
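
Hive UDFs and UDAFs are typically written in Java; as a loosely analogous sketch (function, table, and column names are hypothetical), PySpark lets a custom function be registered and called from SQL like this:

    # Hypothetical custom function registered for use in SQL queries.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    def mask_email(email):
        # Keep the domain, hide the local part.
        return "***@" + email.split("@")[-1] if email else None

    spark.udf.register("mask_email", mask_email, StringType())
    spark.sql("SELECT mask_email(email) AS masked_email FROM db.users").show()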

Experience with dbt for developing and managing data transformations.

Worked with business subject-matter experts to analyze, validate, and utilize data for consumption by business stakeholders.

Managed the ELK stack to monitor system health, build and maintain logging pipelines, improve logging efficiency, measure alerting effectiveness, and reduce time-to-detect and time-to-respond.

Determined the required data and transformation needs for project delivery, scope, acceptance, installation, and deployment.

Responsible for understanding the ETL flow to document data flow throughout the organization; created S3 buckets, managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.

Good working experience with Hadoop data warehousing tools such as Hive, and involved in extracting data from these tools onto the cluster using Sqoop.

Strong experience with ETL and data pipelines on Azure cloud infrastructure, including Azure Data Factory (ADF).

Hands-on experience with Azure Databricks, coding in Python and PySpark.

Environment: Spark, Python, PySpark, Hive, Spark SQL, Oozie, PostgreSQL, Jupyter Notebook, Azure, Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, Databricks, dbt, SQL, Snowflake, Unix shell scripting, Tableau.

Zubaid InfoTech, Chennai, India December 2016 – November 2018

Data Analyst

Responsibilities:

Responsible for managing data coming from different sources such as RDBMS, Oracle, mainframe systems, and web server logs.

Handled data ingestion: importing and exporting data to and from HDFS using Sqoop.

Loaded and transformed large sets of structured and semi-structured data in formats such as XML and JSON.
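
A generic sketch of loading semi-structured JSON data (shown with PySpark purely as an illustration; the path and field names are hypothetical):

    # Read newline-delimited JSON, flatten one nested field, and save the result
    # as a table that can be queried downstream.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    raw = spark.read.json("hdfs:///data/raw/clicks/")
    flat = raw.select("user_id", F.col("device.os").alias("os"), "event_ts")
    flat.write.mode("overwrite").saveAsTable("db.clicks_flat")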

Experienced in developing Hive queries, which compile to MapReduce programs, on different data formats such as text, CSV, and log files.

Involved in creating Hive tables and in loading and analyzing data using Hive queries.

Developed and maintained data models, algorithms, and statistical analyses to extract actionable insights from large datasets.

Collaborated with cross-functional teams to understand business requirements and translate them into data-driven solutions.

Applied advanced Excel skills for data manipulation, analysis, and reporting; assisted in exporting analyzed data to relational databases using Sqoop.

Environment: Hadoop, HDFS, Hive, MapReduce, NiFi, Sqoop, Linux, SQL, HQL, SQL Server.

Siri Data Services Private Ltd March 2015 - November 2016

Senior Developer

Responsibilities:

Responsible for designing the database (table design with relationships).

Involved in modifying components using Java technologies to implement the business logic per the client's needs.

Responsible for writing all required SQL scripts, joins, stored procedures, and triggers.

Involved in developing code for use beans, DAO classes, and POJO classes for database interactions.

Responsible for development using arrays and the Collections API, and for handling exceptions.

Responsible for developing the data warehousing design based on OLTP.

Environment: Tomcat, MySQL, Maven, Java SE, JSP, Oracle, SSIS, PostgreSQL.

Siri Data Services Private Ltd January 2012 - February 2015

Developer

Responsibilities:

Involved in client interaction for gathering requirements and building a solution to implement the business logic.

Involved in designing and creating DB objects per the requirements.

Involved in configuring JDBC connections to interact with the database and writing SQL scripts to perform DB-related operations.

Developed simple web pages using JSP and handled requests using Java and servlets.

Responsible for deploying code on a local machine for end-to-end testing of changes, and then deploying the same to the production server.

Developed master, indexing, and quality-check screens.

Created views with business logic to reduce database complexity for easier ad hoc reporting.

Involved in all stages of the software life cycle in the project.

Environment: Java SE, JDBC, Servlets, JSP, HTML, JavaScript, Tomcat, SQL Server, SSIS.

EDUCATION

BACHELOR OF ENGINEERING - Computer Science (73%)

MASTER OF ENGINEERING – Computer Science (9.3/10 CGPA)

PUBLICATIONS - INTERNATIONAL CONFERENCE

Presented a paper titled “Enterprise Integration Using Open Source Middleware” at the International Conference ISSISS’09 held at GCT, Coimbatore, India.

Speaker in a Faculty Development Program titled “Enterprise Integration using open source middleware” held at PSG College of Technology, Coimbatore.

CERTIFICATION

Microsoft Certified: Azure Data Fundamentals

Feb 2024 Credential ID CDBA7EDB0CD3F6F2


