
Data Entry Mainframe Developer

Location:
Atlanta, GA
Posted:
February 09, 2023


Resume:

Senior Data Engineer

E-mail: *********.*******@*****.***

Mobile: +1-470-***-****

PROFESSIONAL SUMMARY

Around 10 years of professional experience in Information Technology, including around 5 years of expertise in Big Data using the Hadoop framework, covering analysis, design, development, testing, documentation, deployment, and integration using SQL and Big Data technologies.

Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Flume, Oozie, ZooKeeper, and Hue.

Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.

Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.

Deployed Big Data Hadoop applications on Microsoft Azure using Talend.

Involved in creating external Hive tables from files stored in Azure ADLS.

Good knowledge of AWS cloud services such as Amazon S3, Glue, EC2, and Redshift.

Optimized Hive tables using partitioning and bucketing to improve the execution of HiveQL queries.

Used Spark SQL to read data from Hive tables and perform data cleansing, validation, transformations, and aggregations per downstream business team requirements.
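
A minimal sketch of this pattern, assuming a Hive-enabled SparkSession; the database, table, and column names (sales.transactions, analytics.daily_customer_totals, customer_id, amount) are illustrative placeholders, not names from the projects listed below:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveCleansingJob {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read metastore-managed tables directly.
    val spark = SparkSession.builder()
      .appName("hive-cleansing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source table; replace with the real database/table.
    val txns = spark.table("sales.transactions")

    // Cleansing and validation: drop rows missing keys, keep positive amounts.
    val cleaned = txns
      .na.drop(Seq("customer_id", "txn_date"))
      .filter(col("amount") > 0)

    // Aggregation for a downstream team: daily totals per customer.
    val daily = cleaned
      .groupBy(col("customer_id"), col("txn_date"))
      .agg(sum("amount").alias("daily_amount"), count("*").alias("txn_count"))

    daily.write.mode("overwrite").saveAsTable("analytics.daily_customer_totals")
    spark.stop()
  }
}
```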

Worked extensively with the Data Science team to help productionize machine learning models and to build feature datasets as needed for data analysis and modeling.

Hands-on experience in Scala programming for implementing Spark code.

Good working knowledge of RDDs and DataFrames.

Worked with different RDD transformations and actions to transform data.
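
An illustrative RDD pipeline showing common transformations (map, filter, reduceByKey) followed by an action (collect); the sample records are made up for the sketch:

```scala
import org.apache.spark.sql.SparkSession

object RddBasicsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-basics").getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.parallelize(Seq("a,1", "b,2", "a,3", "c,4"))

    val totals = lines
      .map(_.split(","))                    // transformation: parse each record
      .filter(_.length == 2)                // transformation: drop malformed rows
      .map(parts => (parts(0), parts(1).toInt))
      .reduceByKey(_ + _)                   // transformation: sum per key

    totals.collect().foreach(println)       // action: triggers the computation
    spark.stop()
  }
}
```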

Involved in Spark SQL integration with Hive to work with Hive tables and process data efficiently in Spark.

Experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.

Excellent programming skills with experience in Core Java, C, SQL, and Python.

Worked across various programming languages using IDEs and tools such as Eclipse, IntelliJ, PuTTY, and Git.

Experienced in working within the SDLC using Agile and Waterfall methodologies.

EDUCATION

Master of Computer Applications (M.C.A.) from Bharath University, India, in 2010.

Bachelor of Science (Math, Physics & Computers) from Nagarjuna University, India, in 2007.

TOOLS AND TECHNOLOGIES

BigData/Hadoop Technologies

MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, Pig, Hive, HBase, Flume, YARN, Oozie, ZooKeeper, Hue, Ambari Server

Languages

C, C++, Core Java, Scala, Python, Shell Scripting

NoSQL Databases

Cassandra, HBase, MongoDB

Web Design Tools

HTML, JavaScript, XML

Development Tools

Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse.

Public Cloud

Azure ADLS, Data Factory, Databricks

Orchestration tools

Oozie, Airflow, Azkaban, Control M, DSeries

Development Methodologies

Agile/Scrum, Waterfall

Build Tools

Jenkins, SQL Loader, Talend, Maven, Control-M, Oozie, Hue

Reporting Tools

MS Office (Word/Excel/PowerPoint/Outlook)

Databases

PostgreSQL, DB2, MySQL 4.x/5.x, Oracle 11g/12c, Teradata

Operating Systems

Windows (all versions), UNIX, Linux

PROJECT #:

Project Name : Visa

Client : Visa, Austin

Environment : Apache Hadoop, Hortonworks

Tools : Hive, Sqoop, Spark, Scala, Control M

Duration : May 2022 to Present

Role : Bigdata Engineer

Description:

Visa runs a number of different applications; this project processes the data produced by those applications.

Roles and Responsibilities:

Involved in analyzing the system and business requirements.

Involved in gathering the requirements, designing, development and testing.

Worked with Sqoop import commands to ingest data from SQL Server into Hadoop.

Created Hive external tables in the publish layer and loaded data into them.

Worked with Hive optimization techniques such as partitioning and bucketing to improve query performance.
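
A hedged sketch of this idea using Spark's DataFrameWriter bucketing as a stand-in for the HiveQL DDL the bullet refers to; the staging and publish table names, columns, and bucket count are assumptions:

```scala
import org.apache.spark.sql.SparkSession

object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table produced by ingestion.
    val txns = spark.table("staging.card_txns_raw")

    txns.write
      .mode("overwrite")
      .partitionBy("txn_date")        // partition pruning on the date column
      .bucketBy(32, "account_id")     // spreads rows evenly for joins/sampling
      .sortBy("account_id")
      .saveAsTable("publish.card_txns")

    spark.stop()
  }
}
```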

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

Analyzed the SQL scripts and designed the solution to be implemented in Scala.

Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.
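
A small sketch of that flow, assuming line-delimited JSON at a placeholder path and a hypothetical publish.events Hive table (the column names are also illustrative):

```scala
import org.apache.spark.sql.SparkSession

object JsonToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Schema is inferred from the JSON records (one JSON object per line).
    val events = spark.read.json("/data/raw/events/*.json")
    events.printSchema()

    // Register a temporary view so the structured data can be queried with SQL.
    events.createOrReplaceTempView("events_raw")
    val valid = spark.sql(
      "SELECT event_id, user_id, event_ts FROM events_raw WHERE event_id IS NOT NULL")

    valid.write.mode("append").saveAsTable("publish.events")
    spark.stop()
  }
}
```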

Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.

Tested Apache Tez for building high-performance batch and interactive data processing applications on Pig and Hive jobs.

Environment: Hadoop (HDFS, MapReduce), Scala, YARN, Spark, Hive, Pig, Azure ADLS, ADF, PySpark, Hue, Sqoop, Oracle, PostgreSQL, NiFi, Git, Gerrit, Jenkins, Control M, DSeries, Jira

PROJECT #:

Project Name : Customer Data Insights

Client : Synchrony Financial, Chicago

Environment : Apache Hadoop, Hortonworks

Tools : Hive, Sqoop, Spark, Scala, Azkaban

Duration : April 2019 to May 2022

Role : Bigdata Engineer

Description:

The Customer Data Insights project deals with customer data related to complaints about credit card transactions. It carries information about customer bank accounts, credit card details, credits, and debits. This is a migration project from a data warehouse to a data lake: existing Ab Initio graphs are reverse engineered and the equivalent code is developed in Spark for data ingestion into the data lake.

Roles and Responsibilities:

Involved in the implementation of a generic framework for data onboarding and processing.

Responsible for onboarding data into HDFS from different source systems, including file systems and RDBMS sources, using the framework.

Responsible for maintaining and organizing data in the different layers of the centralized data lake.

Built Spark code from existing Ab Initio graphs.

Met business requirements and performed unit testing of the developed code.

Performed code integration using Git Bash and Bitbucket.

Ran builds through Jenkins.

Created various Hive external and staging tables and joined tables as per requirements. Implemented static partitioning, dynamic partitioning, and bucketing.

Worked with various HDFS file formats such as Parquet, ORC, and JSON for serialization and deserialization.
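
A brief sketch of round-tripping data through these formats from Spark; the input and output paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object FileFormatSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("file-format-sketch").getOrCreate()

    val df = spark.read.json("/data/raw/accounts.json")     // schema inferred from JSON

    // Columnar formats keep the schema with the data and compress well.
    df.write.mode("overwrite").parquet("/data/published/accounts_parquet")
    df.write.mode("overwrite").orc("/data/published/accounts_orc")

    // Reading back deserializes into the same DataFrame structure.
    val fromParquet = spark.read.parquet("/data/published/accounts_parquet")
    val fromOrc     = spark.read.orc("/data/published/accounts_orc")
    println(s"parquet rows=${fromParquet.count()}, orc rows=${fromOrc.count()}")

    spark.stop()
  }
}
```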

Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, Azure, PySpark, pair RDDs, and Spark on YARN.
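
A small pair-RDD tuning sketch (written in Scala rather than PySpark to keep all the examples in one language), contrasting reduceByKey with groupByKey on made-up data:

```scala
import org.apache.spark.sql.SparkSession

object PairRddTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pair-rdd-tuning").getOrCreate()
    val sc = spark.sparkContext

    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 100, 1L))

    // Preferred: partial sums are computed map-side before the shuffle.
    val viaReduce = pairs.reduceByKey(_ + _)

    // Works, but ships every value across the network before summing.
    val viaGroup = pairs.groupByKey().mapValues(_.sum)

    println(s"reduceByKey keys=${viaReduce.count()}, groupByKey keys=${viaGroup.count()}")
    spark.stop()
  }
}
```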

Used PySpark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for large volumes of data.

Environment: Hadoop (HDFS, MapReduce), Scala, YARN, Spark, Hive, Pig, Azure ADLS, ADF, PySpark, Azure Databricks, MongoDB, Control M, HBase, Hue, Sqoop, Oracle, PostgreSQL, NiFi, Git, Gerrit, Jenkins, Tonomy, Jira

PROJECT #:

Project Name : Barclays – ABSA Power Curve

Client : ABSA, Johannesburg

Environment : Apache Hadoop, Hortonworks

Tools : Hive, Sqoop, PySpark, Scala, Azkaban

Duration : Sep 2018 to April 2019

Role : Data Engineer

Description:

Data from different external vendors flows into Power Curve, where it is stored in SQL Server (2014). We are migrating that data to a Hadoop environment, organized into layers such as Raw, Published, and Insight, and building a data lake.

Roles and Responsibilities:

Involved in analyzing the system and business requirements.

Involved in gathering the requirements, designing, development and testing.

Worked with Sqoop import commands to ingest data from SQL Server into Hadoop.

Created Hive external tables in the publish layer and loaded data into them.

Worked with Hive optimization techniques such as partitioning and bucketing to improve query performance.

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

Analyzed the SQL scripts and designed the solution to be implemented in Scala.

Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.

Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.

Tested Apache Tez for building high-performance batch and interactive data processing applications on Pig and Hive jobs.

Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, PostgreSQL, Scala, DataFrames, Impala, OpenShift, Talend, and pair RDDs.

Environment: Hadoop (HDFS, MapReduce), Scala, YARN, Spark, Hive, Pig, MongoDB, Control M, HBase, Hue, Sqoop, PostgreSQL, NiFi, Git, Gerrit, Jenkins, Tonomy, Jira

PROJECT #:

Project Name : Marketing Transformation Program (MTP Ref-60)

Client : KOHL’s, California

Environment : Apache Hadoop, Google Cloud Platform

Tools : Apache Pig, Hive, Sqoop, Spark, Scala, Azkaban

Duration : Dec 2016 to Aug 2018

Role : Hadoop Developer

Description:

MTP Ref-60 is a migration project from different platforms to Google Cloud Platform. The main aim of this project is to move all of the traditional retail outlet data, which is semi-structured, into the cloud platform after performing cleansing and transformations in the Hadoop environment. The data is then pushed into Google Cloud Platform, where BigQuery and Bigtable operations are performed for analytical purposes.

Roles and Responsibilities:

Worked with Hive optimization techniques such as partitioning and bucketing to improve query performance.

Wrote UDFs in Hive based on business requirements.
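
The UDFs here were written as Hive UDFs; the following is only an analogous sketch of a user-defined function registered through Spark SQL, with a made-up card-masking rule and a hypothetical publish.card_txns table:

```scala
import org.apache.spark.sql.SparkSession

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical rule: mask all but the last four digits of a card number.
    spark.udf.register("mask_card", (card: String) =>
      if (card == null || card.length < 4) card
      else "*" * (card.length - 4) + card.takeRight(4))

    // Once registered, the function is callable from SQL over Hive tables.
    spark.sql("SELECT mask_card(card_number) AS card_masked FROM publish.card_txns")
      .show(5)

    spark.stop()
  }
}
```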

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

Analyzed the SQL scripts and designed the solution to be implemented in Scala.

Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.

Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.

Tested Apache Tez for building high-performance batch and interactive data processing applications on Pig and Hive jobs.

Worked with the HCatalog Loader and Storer to bring Hive table data into Pig for processing and store the results back in Hive.

Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
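
A sketch of that comparison idea as a query issued through Spark SQL; the fresh.weekly_sales, edw.category_reference, and edw.historical_metrics tables and their columns are illustrative assumptions, not Kohl's schemas:

```scala
import org.apache.spark.sql.SparkSession

object TrendComparisonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trend-comparison-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Join fresh sales against EDW reference and historical tables,
    // then rank categories by lift over their historical average.
    spark.sql(
      """SELECT f.category,
        |       SUM(f.sales_amount)                           AS current_sales,
        |       AVG(h.weekly_avg_sales)                       AS historical_avg,
        |       SUM(f.sales_amount) / AVG(h.weekly_avg_sales) AS lift
        |FROM   fresh.weekly_sales f
        |JOIN   edw.category_reference r ON f.category_id = r.category_id
        |JOIN   edw.historical_metrics h ON r.category_id = h.category_id
        |GROUP  BY f.category
        |ORDER  BY lift DESC""".stripMargin).show(20)

    spark.stop()
  }
}
```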

Worked with different RDD transformations and actions to transform data.

Involved in Spark SQL integration with Hive to work with Hive tables and process data efficiently in Spark.

Completely involved in the requirement analysis phase.

Environment: Hadoop (HDFS, MapReduce), Scala, YARN, Spark, Hive, Pig, MongoDB, Control M, HBase, Hue, Sqoop, Oracle, NiFi, Git, Gerrit, Jenkins, Tonomy, Jira

PROJECT #:

Project Name : Apple

Client : Apple – California

Environment : Hadoop, Apache Pig, Hive, Sqoop, Spark, Scala, Unix, Mysql

Duration : Oct 2015 to Aug 2016

Role : Hadoop Developer

Description:

Apple account has many Hadoop projects like iTunes Store, App Store and iBook store. Our team is responsible for much of iTunes’ big data infrastructure, as well as designing and delivering key systems powering iTunes Radio, iTunes Charts and many other personalized and cloud features of the iTunes ecosystem. Our work covers the full stack from iTunes’ internet-facing services (public HTTP services), internal services used by customer features (internal RPC APIs); design and implementation of data pipelines/lifecycles; Hadoop infrastructure, strategy and implementation; distributed key-value storage and putting all this together to operate live customer-facing features.

Roles and Responsibilities:

Involved in analyzing the system and business requirements.

Involved in gathering the requirements, designing, development and testing.

Wrote Pig scripts based on the requirements.

Developed UDFs (User Defined Functions) in Hive using Java and created temporary functions to execute them.

Analyzed data using HiveQL and custom MapReduce programs in Java.

Worked with the Hive data warehouse tool, creating tables and distributing data by implementing partitioning and bucketing.

Data ingestion from MySQL to HDFS through Sqoop.

Worked with Spark RDDs, creating different transformations and actions in the Scala shell.

Environment: Hadoop (HDFS, MapReduce), Scala, YARN, Spark, Hive, Pig, MongoDB, Control M, HBase, Hue, Sqoop, Oracle, NiFi, Git, Gerrit, Jenkins, Tonomy, Jira

PROJECT #:

Project Name : Image Plus

Client : CIGNA, Bloomfield

Environment : Core Java, Servlets, DB2

Duration : Dec 2013 to Nov 2014

Role : Java Developer

Description: Image Plus is a desktop application used by the claim services operations of Cigna Healthcare for logical work distribution and inventory management (similar to a workflow engine). iTrack uses an RBAC (Role Based Access Control) mechanism to authenticate users and restrict the functions they can perform. It comprises batch and online applications.

Roles and Responsibilities:

Code review and preparation of unit test documents.

Fixed bugs and prepared and reviewed test cases.

Environment: Core Java, Mainframe, DB2, JCL, ESP

PROJECT #:

Project Name : Proclaim

Client : CIGNA, Chattanooga

Environment : COBOL, JCL, DB2, VSAM

Duration : Sep 2011 to Nov 2013

Role : Mainframe Developer

Description: The project involved a sophisticated online health claim processing system developed by CIGNA Health Care. Numerous changes and rapidly increasing costs within the health care industry demanded a claims processing system that could meet traditional requirements for accuracy, control, speed, and data collection and retention, while accommodating changing benefit designs and cost containment provisions. Proclaim was divided into five major domains: Acquisition, Pre-processing, Adjudication, Post Processing, and Inquiry.

Roles & Responsibilities:

As a Team Member, responsible for

Understanding the business requirements sent by the Client

Impact analysis and preparation of estimations

Design and UTC preparation and getting approvals

Coding as per coding standards

Build test data and execute test cases during unit testing

Execution of test cycles in different regions and resolution of abends.

Environment: Mainframe, COBOL, DB2, JCL, ESP, CICS

PROJECT #:

Project Name : DENTACOM

Client : CIGNA, Chattanooga

Environment : COBOL, JCL, DB2, VSAM

Duration : SEP 2010 to AUG 2011

Role : Mainframe Developer

Description: Dentacom is a dental claim paying system written about 20 years ago. It has an online portion that performs claim payments, data entry, inquiry, and update functions. In addition, a batch cycle runs nightly whenever the online system has been up during the day. Currently, the online system is up Monday through Saturday and every holiday except Christmas.

Roles & Responsibilities:

As a Team Member, responsible for

Monitor the cycle while the batch is running.

Fixed production abends.

Involved in application maintenance tasks, including table updates.

Involved in running test cycles in different regions and generating reports.

Environment: Core Java, Mainframe, DB2, JCL, ESP


