
Data Engineer Software Development

Location:
Smyrna, GA
Posted:
June 06, 2025


Resume:

Arvind Kumar Agrawal

******.******@*****.***/+1-848-***-****

PROFESSIONAL SUMMARY

More than 18 years of experience in software development, coding, and application support, including more than 10 years in Big Data (Hadoop, Spark), AWS, Azure, and DevOps. Worked across financial services and technology services business domains. Led onshore and offshore teams to deliver projects successfully.

Excellent implementation knowledge of Distributed / Enterprise / Web / Client Server systems using Java, J2EE (JSP, Servlets, JDBC, EJB, JNDI, JMS, Custom Tags), XML, Spring, Struts, AJAX, Hibernate, Web Services, ANT, JUnit, Log4J and Maven.

Hands-on experience building ETL pipelines using Kafka, Spark, Scala, NoSQL databases, and Hive.

Hands-on experience with Apache Spark integration with Kafka, Elasticsearch, Cassandra, MongoDB, file system sources (CSV, XML, JSON), and RDBMS sources (Oracle, DB2).

Experience building end-to-end streaming pipelines with Docker, Kafka, Spark, and Cassandra, scheduled through the Airflow Spark operator on AWS.

Good experience with Amazon Web Services, including S3, IAM, EC2, EMR, Kinesis, VPC, DynamoDB, Redshift, Amazon RDS, Lambda, Athena, Glue, DMS, QuickSight, Elastic Load Balancing, Auto Scaling, CloudWatch, cloud migrations, SQS, and other services of the AWS family.

Hands-on experience with data analytics services such as Athena, Glue, Glue Data Catalog, and QuickSight.

Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).

Hands-on experience architecting legacy data migration projects from on-premises to the AWS Cloud.

Wrote AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.

Experience loading data to Azure Data Lake, Azure SQL Data Warehouse, and Azure SQL Database, and building data pipelines using Azure Databricks and SQL.

Experience with Azure services like HDInsight, Active Directory, Storage Explorer, and Stream Analytics.

Extensive experience in Azure Cloud Services (PaaS & IaaS), Storage, Data Factory, Data Lake (ADLA & ADLS), Active Directory, Synapse, Logic Apps, Azure Monitoring, Key Vault, and Azure SQL.

Experience working with Microsoft Azure for building, managing, and deploying applications and creating Azure virtual machines.

Hands-on experience with the Scala language as used in Apache Spark applications.

Hands-on experience writing Python validation scripts for data quality checks on batch and streaming pipelines using PySpark.

Hands-on experience with application servers such as WebLogic 8.1, Tomcat 9, JBoss AS 7.0, and WAS 7.0.

Hands-on experience with the Hadoop big data stack, working with MapReduce, Pig, and Hive as analysis tools, and Sqoop and Flume as data import/export tools.

Experienced with IDE tools such as IntelliJ IDEA for Java and Scala, and PyCharm for Python and PySpark.

Good experience in recognizing and reusing Design Patterns – classical and J2EE design patterns.

Developed architecture framework for Presentation layer, Service and Business layer using Spring, JPA, Spring boot REST and Data Access Object using Hibernate.

Proficient in relational database environments (Oracle, DB2, MS-SQL, and MySQL).

Strong experience in Systems Development Life Cycles and Object-Oriented Design and Development and Agile Development (Jira, Sprint, Stories).

Experienced in handling offshore business process model.

Proficient in MS Office, particularly Excel, PowerPoint and MS Visio.

TECHNICAL SKILLS

Programming Languages

Core JAVA, JDBC 2.0, Servlets, JSP, JavaBeans, Python, Scala, Shell Scripting

Database

Oracle 12c/11g/10g/9i

Framework

Spring 3.1/4.x, Hibernate 3.x/4.x, JPA 2.1, Struts 1.3 & Web services JAX RPC, JAX WS, JAX RS, JMS

Other Tools

TOAD, SQL Navigator, FileZilla, HP Quality Center, JIRA, Elastic Search

IDE Tools

IntelliJ IDEA 2024

Web Server

Apache Tomcat 5.5/6.0/7.0/8.0/9.0

Other Databases

Oracle 9i/10g (SQL, PL/SQL), DB2, MySQL, Derby, SQL Server Express

Design Pattern

MVC, Service Locator, Business Delegate, DAO, Value Object, Singleton, Factory, Abstract Factory, Builder, Prototype Design Pattern

Version Control Tools

CVS, SVN, GIT

Code Review Tool

Sonar, Crucible

Database Tool

Toad, SQL Developer

JavaScript Framework

Angular JS 1.0, jQuery

Testing Framework

Mockito, JUnit

Security Framework

ACEGI Spring Security, OWASP Security

Big Data framework

Apache Hadoop, Spark, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Storm, ETL

NoSQL Database

MongoDB, Cassandra

Message Broker Tool

ActiveMQ, Apache Kafka

Web Service Framework

SOAP, Restful Web service (Jersey), Micro Services

Caching Framework

EHCache

Hadoop Distributions/Spark

Cloudera distribution 6.12 and Horton Works (HDP 2.5), Spark 3.0.1(spark core, spark SQL, spark streaming)

Rest Client

Postman, Advanced Rest client, SOAP UI

AWS Cloud Technologies

S3, Redshift, EC2, RDS, DynamoDB, AWS Lambda, AWS Glue, AWS EMR, OpenSearch, DocumentDB, Keyspaces, SNS, ElastiCache (Memcached & Redis), SQS, EventBridge, Step Functions

Azure Data engineering

ADF, ADB (Azure Databricks), Azure Synapse, Azure Functions, Azure Event Hubs, Azure Blob Storage, Azure VM, Azure Cosmos DB, Azure SQL Server

Visualization Tools

Tableau, Kibana

Devops Tools

GIT, Jenkins, Docker, Kubernetes, Helm chart, Grafana, Prometheus

Ingestion Tool

NiFi

Scheduler tools

Control-M, Airflow, Oozie

Cloud SQL Datawarehouse

Snowflake

PROFESSIONAL EXPERIENCE

Employer: Compunnel Inc July 2022 – Present

Client: Care First Virginia

Role: Bigdata Lead Engineer

Responsibilities:

Worked on a notification framework using Spring Boot to read data from MongoDB with BSON queries, filter it against a lookup list of notification IDs from another Mongo collection, send XML notification messages (JAXB) to the downstream system via JmsTemplate with an IBM MQ connection factory, and write the output JSON documents into the notified-message collection.

Established and implemented standards, best practices, and design patterns for object-oriented programming; designed and developed a real-time data ingestion framework that brings data from a Kafka broker to be sourced by Spark 3.2. Implemented CI/CD-based data pipelines in the big data framework, with Python and Bash for scripting.

Worked on NiFi-Kafka-Spark integration: read clickstream data through a FastAPI REST endpoint, pushed it into NiFi via an HTTP invoker, wrote it to a Kafka topic, and consumed the topic with Spark Structured Streaming, which processes the stream and writes it to a Cassandra table in micro-batch update mode, with watermarking and checkpointing for fault tolerance.
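
A minimal sketch of the streaming leg of this pipeline (Kafka topic into Spark Structured Streaming, then Cassandra), assuming the spark-cassandra-connector is available; the broker address, topic, schema, keyspace, and table names are illustrative, not the project's actual values.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("clickstream-to-cassandra").getOrCreate()

# Assumed clickstream event schema
schema = StructType([
    StructField("user_id", StringType()),
    StructField("page", StringType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
          .option("subscribe", "clickstream")                  # assumed topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*")
          .withWatermark("event_time", "10 minutes"))          # handle late events

def write_to_cassandra(batch_df, batch_id):
    # Each micro-batch is appended to Cassandra through the connector.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="analytics", table="click_events")      # assumed keyspace/table
     .mode("append")
     .save())

(events.writeStream
 .foreachBatch(write_to_cassandra)
 .option("checkpointLocation", "/tmp/checkpoints/clickstream") # checkpointing for fault tolerance
 .start()
 .awaitTermination())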

Worked on data quality validation: read CSV files into data frames, applied missing-value, null-value, and duplicate-record-count checks plus business validation rules, split the results into valid and invalid data frames, and wrote the valid records into Hive tables partitioned by a date column.
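
A minimal PySpark sketch of this data-quality flow; the file path, column names, and Hive table names are hypothetical stand-ins for the real rules.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_date

spark = SparkSession.builder.enableHiveSupport().appName("dq-check").getOrCreate()

df = spark.read.option("header", "true").csv("/data/incoming/customers.csv")  # assumed path

# Profiling: missing/null values per column and duplicate record count.
null_counts = {c: df.filter(col(c).isNull()).count() for c in df.columns}
duplicate_count = df.count() - df.dropDuplicates().count()

# Business validation (a non-null id and a positive amount stand in for the real rules).
valid = df.filter(col("id").isNotNull() & (col("amount").cast("double") > 0))
invalid = df.subtract(valid)

# Valid rows go to a date-partitioned Hive table, invalid rows to an error table.
(valid.withColumn("load_date", current_date())
      .write.mode("append").partitionBy("load_date")
      .saveAsTable("staging.customers_clean"))
invalid.write.mode("append").saveAsTable("staging.customers_errors")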

Experienced with various transformations (filter, join, union, and aggregation) in Spark Scala for data transformation and enrichment.

Worked on REST-based microservices deployed in an AWS EKS cluster, load-balanced using HAProxy.

Implemented Airflow jobs with a PySpark operator to schedule ingestion of filesystem data (CSV, XML) using Spark, performing custom and type-casting validation to split valid and invalid data frames; valid records are written into the target Hive tables based on load date and load type, invalid records are written to a Hive error table, and a report of the total correct and incorrect record counts is generated.
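
A sketch of the Airflow scheduling piece, assuming the apache-airflow-providers-apache-spark package; the DAG id, job script path, and arguments are hypothetical.

from datetime import datetime
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="file_ingest_validation",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_and_validate = SparkSubmitOperator(
        task_id="ingest_and_validate",
        application="/opt/jobs/file_ingest_validation.py",   # hypothetical PySpark job
        conn_id="spark_default",
        application_args=["--load-date", "{{ ds }}", "--load-type", "full"],
    )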

Wrote core Python validation scripts for record-count reconciliation between RDBMS databases and Hive tables.
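
An illustrative sketch of such a record-count reconciliation, assuming a hypothetical Oracle JDBC URL, credentials, and table names (the Oracle JDBC driver must be on the Spark classpath).

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().appName("count-validation").getOrCreate()

rdbms_count = (spark.read.format("jdbc")
               .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # assumed URL
               .option("dbtable", "SALES.ORDERS")                      # assumed table
               .option("user", "etl_user")
               .option("password", "***")
               .load()
               .count())

hive_count = spark.table("warehouse.orders").count()                   # assumed Hive table

if rdbms_count != hive_count:
    raise ValueError(f"Count mismatch: RDBMS={rdbms_count}, Hive={hive_count}")
print(f"Counts match: {rdbms_count}")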

Worked on AWS Glue to read Parquet files arriving in S3 via custom events, convert them to CSV, and write them back to S3, then used the COPY command to load the output from S3 into Redshift tables for further analytics.
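
A hedged Glue job sketch for the Parquet-to-CSV step; bucket names and prefixes are assumptions, and the Redshift COPY is shown only as a comment since it is issued separately.

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://raw-bucket/incoming/"]},   # assumed source prefix
    format="parquet",
)

glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://curated-bucket/csv/"},       # assumed target prefix
    format="csv",
)

# Downstream load (run in Redshift, not in this job):
#   COPY analytics.orders FROM 's3://curated-bucket/csv/' IAM_ROLE '<role-arn>' CSV;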

Worked on AWS Lambda, triggering jobs from various sources (S3, SQS, streams) to parse XML data in the Lambda handler function and write it into DynamoDB tables.
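
A minimal Lambda handler sketch for the S3-triggered variant; the bucket layout, XML structure, and DynamoDB table name are illustrative assumptions.

import boto3
import xml.etree.ElementTree as ET

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("orders")                # assumed table

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        root = ET.fromstring(body)
        for order in root.findall("order"):                       # assumed XML layout
            table.put_item(Item={
                "order_id": order.findtext("id"),
                "status": order.findtext("status"),
            })
    return {"processed": len(event["Records"])}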

Worked on AWS EMR to set up transient and long-running clusters with the AWS CLI, deploy Spark jobs on the EMR master node to read data from S3 and generate output files in S3, and monitor the jobs via the Spark Web UI. Also worked with other AWS services (EC2, S3, VPC, load balancers) and created infrastructure using Terraform scripts.
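
The CLI-driven transient-cluster setup can be sketched equivalently with boto3; the release label, instance types, roles, and S3 paths below are assumptions rather than the project's actual values.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="transient-spark-cluster",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,   # transient: terminate when steps finish
    },
    Steps=[{
        "Name": "spark-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://jobs-bucket/etl_job.py",
                     "--input", "s3://raw-bucket/", "--output", "s3://curated-bucket/"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])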

Worked on Spark-Elasticsearch integration to write data into Elasticsearch and generate reports in Kibana.
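
A brief sketch of the Spark-to-Elasticsearch write, assuming the elasticsearch-hadoop (elasticsearch-spark) connector is on the classpath; the host, source table, and index name are illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().appName("spark-to-es").getOrCreate()
report_df = spark.table("warehouse.daily_metrics")   # assumed source table

(report_df.write
 .format("org.elasticsearch.spark.sql")
 .option("es.nodes", "es-host")                      # assumed Elasticsearch host
 .option("es.port", "9200")
 .mode("append")
 .save("daily-metrics"))                             # index then visualized in Kibana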

Worked on Docker- and Jenkins-based CI/CD pipelines with integrated Grafana and Prometheus monitoring.

Worked on python validation script to validate data in RDBMS database and hive tables.

Set up a Kafka cluster on Docker containers, ran Spark-Kafka streaming jobs, and monitored the cluster using Grafana and Prometheus.

Tuned Splunk search queries to improve search performance.

Involved in monitoring server health using the Splunk monitoring and alerting tool.

Involved in the reporting mechanism of the Splunk tool for the client.

Gained experience with Splunk DB Connect and imported data from the Oracle platform into the Splunk platform.

Designed and developed automated data ingestion pipelines to collect raw data from various sources (e.g., APIs, databases, streaming services) into the bronze layer, using Spark and Delta Lake in Databricks. Worked on Linux shell scripting with AWK and sed for file manipulation, job scheduling, and job automation.
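
An illustrative bronze-layer ingestion sketch, assuming Databricks Auto Loader and Delta Lake; the mount points, schema location, and table name are assumptions.

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()

raw = (spark.readStream
       .format("cloudFiles")                                        # Databricks Auto Loader
       .option("cloudFiles.format", "json")
       .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/events")
       .load("/mnt/raw/events/")                                     # assumed landing zone
       .withColumn("ingest_ts", current_timestamp()))

(raw.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/bronze/_checkpoints/events")
 .outputMode("append")
 .toTable("bronze.events"))                                          # bronze Delta table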

Built efficient and scalable Spark-based transformations for data cleaning, enrichment, and deduplication in the silver layer, ensuring high-quality data for downstream analytics.

Developed and managed ETL/ELT pipelines to ingest data from various sources into Snowflake, using tools such as Apache Airflow and custom SQL scripts.

Technologies Used: Java, Scala, Python, HDFS, Hive, Sqoop, Apache Spark, RESTful web services, AWS Cloud, Oracle, Kafka, HBase, Elasticsearch, Grafana, Prometheus, Docker, Jenkins, AWS Databricks, Medallion Architecture, Snowflake, MongoDB

Employer & Client: Enhance It, Atlanta, GA May 2021 – June 2022

Description: This is a services project to migrate on-premises data to the Azure cloud: ingesting data from file systems, SQL databases, and NoSQL databases with Azure Data Factory, performing transformations in Azure Databricks, writing the results to the Azure Synapse SQL data warehouse, and using Azure Event Hubs and Stream Analytics on server-log data from clickstream events.

Role: Bigdata Lead Engineer

Responsibilities:

Scheduled and maintained data warehouse ELT package load plans and outgoing vendor SFTP package files.

Developed generic code modules and defined a strategy to extract data from different vendors and build ETL logic on top of them to feed the data to the central data warehouse.

Performed Data Pre-processing, Data Profiling, and Data Modeling and created frameworks using Python framework for validation and automation of regular checks.

Involved in daily operational activities to troubleshoot ad-hoc production and data issues and enhancement of infrastructure in the space of Big Data to provide better solutions to delegate the existing issues.

Developed and enhanced code for implementing required data quality checks before sending it into the cloud-based data warehouse.

Developed Spark jobs to transform data and apply business transformation rules to load/process data across the enterprise and application-specific layers.

Worked on submitting the Spark jobs which show the metrics of the data which is used for Data Quality Checking.

Worked on building efficient data pipelines that transform high-volume data into a format used for analytical and fraud-prevention use cases.

Extensively used Splunk Search Processing Language (SPL) queries, Reports, Alerts, and Dashboards.

Excellent knowledge of source control management concepts such as Branching, Merging, Labeling/Tagging, and Integration, with a tool like Git.

Collaborated with and across agile teams to design, develop, test, implement, and support technical solutions in full-stack development tools and technologies.

Created Databricks notebooks to streamline and curate data for various business use cases, and mounted Blob storage on Databricks.

Created and maintained Snowflake schemas (star, snowflake) to organize data efficiently and enable optimized querying and reporting for business users and analysts.

Automated data workflows and processes using Snowflake Tasks, Streams, and Snowpipe for real-time data ingestion, reducing manual intervention and ensuring continuous data flow into the data warehouse.
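
A hedged sketch of setting up this kind of automation through snowflake-connector-python; the account details, stage, and object names are illustrative assumptions.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
)
cur = conn.cursor()

# Snowpipe for continuous ingestion from an external stage.
cur.execute("""
    CREATE PIPE IF NOT EXISTS raw.orders_pipe AUTO_INGEST = TRUE AS
    COPY INTO raw.orders FROM @raw.orders_stage FILE_FORMAT = (TYPE = CSV)
""")

# Stream plus scheduled Task to move new rows downstream without manual intervention.
cur.execute("CREATE STREAM IF NOT EXISTS raw.orders_stream ON TABLE raw.orders")
cur.execute("""
    CREATE TASK IF NOT EXISTS raw.orders_task
    WAREHOUSE = ETL_WH SCHEDULE = '5 MINUTE' AS
    INSERT INTO curated.orders SELECT * FROM raw.orders_stream
""")
conn.close()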

Collaborated with BI tools like Power BI to create automated reporting pipelines and dynamic dashboards directly on Snowflake data, providing actionable insights to business stakeholders.

Responsible for implementing monitoring solutions using Azure Data Factory, Terraform, Docker, and Jenkins.

Technologies Used: Java, Hive, Spark, SQL Server, PySpark, Databricks, Ansible, Hortonworks, Apache Solr, Apache Tika, Linux, Azure Synapse, Azure Databricks, Maven, Azure Data Factory, GIT, JIRA, ETL, Toad 9.6, UNIX Shell Scripting, Scala, Apache NiFi, Snowflake, Power BI

Project: Risk Management data platform

Description: This is a legal-domain project to develop Java Spring Boot applications with an Oracle backend database, Elasticsearch as the online search store for JSON logs, and an Angular front end.

Client: Ventiv, Atlanta, USA

Role: Senior Java Developer

Duration: May 2020 – April 2021

Responsibilities:

Worked on migration of the RC legacy application to the IRM Spring Boot application.

Worked on various modules User management, Business Rule management, Data Model Design, Change Password, Structured Search, Workflow automation, UI workbench model.

Worked on processing logs generated by the application portal: sent the log file data to a Kafka topic, where a Spark Scala consumer analyzed it, separated application logs from system logs, performed aggregations per log category, and stored the results in an HBase column family.

Worked on quick search and advanced query search, taking input for different record types: Policy, Claim, Organization, Property, and Risk.

Worked on ingesting filesystem data (CSV, XML) with Spark, applying custom and type-casting validation to split valid and invalid data frames; valid records were written into the target Hive tables based on load date and load type, invalid records were written to a Hive error table, and a report of the total correct and incorrect record counts was generated.

Worked on the AWS cloud: wrote Python AWS Lambda code to parse XML from an SQS queue and emit logs to CloudWatch for monitoring, and used AWS Glue crawlers to convert Parquet data to CSV for analytics in Athena.

Worked on Spark-Elasticsearch integration to write data into Elasticsearch and generate reports in Kibana.

Worked on Docker- and Jenkins-based CI/CD pipelines with integrated Grafana and Prometheus monitoring.

Worked on python validation script to validate data in RDBMS database and hive tables.

Worked on AWS EMR, scheduling Spark jobs through Oozie coordinator jobs and monitoring them via the Spark Web UI.

Worked on Linux file management and shell scripting commands for application deployment in on-premises and cloud environments.

Technologies Used: Java, Spring boot microservices, Hive, Sqoop, Apache Spark, Restful Web service, AWS cloud, Oracle, Kafka, HBase, Elasticsearch, Grafana, Prometheus, Angular JS, Docker and Jenkins

Project: Trinity Big data platform

Description: This is an insurance-domain project to read different business data files (Policy, Contract, Activities), apply data cleansing and various business transformations, and write the results into Hive tables. It uses an Azure Databricks job cluster, scheduled through Azure Data Factory.

Client: Transamerica, USA

Role: Senior Bigdata Developer

Duration: Dec 2018 – April 2020

Responsibilities:

Wrote RESTful web services to expose services to clients and created a UI-based application for Trinity processes covering Party, Contract, Activity, Contract Option, and Contract Fund Option business data.

Generated pipe-separated flat files for this business domain data for consumption in the big data platform.

Worked on enforce data lake files loaded from the data lake into Hive staging tables; wrote validations for the enforce tables in Hive, applied validation through dynamic SQL to populate error and base tables, applied custom validations for duplicate policies, and finally loaded the data into the Trinity enforce tables in Hive.

Worked on the Azure platform to develop Azure Data Factory pipelines that read CSV files from Azure Blob Storage and use a Spark Scala activity to store the data in the SQL data warehouse.

Worked on PowerShell scripts to copy files between servers and connect to SQL Server to generate reports.

Integrated Kafka with Spring using KafkaTemplate to read REST service data and store it in Elasticsearch.

Worked on FIA enforce CSV data files: read them using Apache Spark, validated them against the Hive table schema, applied transformations on specific columns using transformation rules from a property file, insert-overwrote correct records into the actual Hive table, and wrote the error records into a Hive error table.

Designed an ETL pipeline that reads data from an Oracle RDBMS into a Kafka broker using Kafka Connect, consumes it with a Spark consumer, and after applying transformations stores the results in Elasticsearch for reporting.

Created a dashboard application UI showing the status of jobs running on the big data platform.

Technologies Used: Java, Spring Boot microservices, Apache Spark, RESTful web services, Azure cloud, PowerShell scripting, Oracle, Kafka, Elasticsearch

Project: Data Governance System

Description: This is a migration project loading data from various sources (SQL, NoSQL, and streaming data sources) into the data lake.

Client: Citi Bank, USA

Role: Senior Hadoop Developer

Duration: June 2018 – Nov 2018

Responsibilities:

Data preparation using REST API

Integrated Hive with Tableau for dashboard reports and graphs for visualization.

Integrated Kafka with Spark Streaming to read online streaming data; after transformation and aggregation, results are stored in HBase.

Used Microservice architecture, with Spring Boot-based services interacting through a combination of REST and Apache Kafka message brokers.

Performed incremental loads using Sqoop import into Hive tables.

Used Python scripts to validate data loaded into HDFS from the RDBMS.

Worked on an ETL pipeline that moves data from the data lake to the landing zone, then to the staging layer and the core layer, transforming it into structured data, applying transformations, and storing it in Hive external tables.

Worked on Oozie workflows for notification emails to end users.

Technologies Used: Java, Hive, Sqoop, Flume, Oozie, Python, Spark, REST, Service now ticketing tool

Project: Cornerstone Big data Platform

Description: This project manages the various projects running on the big data Hadoop cluster: handling their job failures, analyzing their logs, and creating tickets to resolve issues within SLA.

Client: Amex, Phoenix, USA

Role: Senior Hadoop Developer

Duration: Feb 2018 – April 2018

Responsibilities:

Monitored the Event Engine tool, which raises alerts for failed jobs; accepted the alarms and alerts and analyzed job logs by logging into the edge node.

Worked on ServiceNow incident tickets, taking assignment of issues and restoring service once each issue was fixed.

Worked on different parsers in the big data application, e.g., delimited parser, mainframe MR parser, fixed-width parser, JSON parser, and RFC delimited parser.

At the end of all parsers, there is a record count check, with the number of parsed records validated against the total record count sent by the SOR system as part of control info.

Worked in the PDS (Platform Data Standardizer) module, which covers platform-specific functions such as data validation against the ingest/feed layer, Voltage decryption, and threshold-based ingestion.

Worked in the FDS (Feed Data Standardizer) module, which creates Hive tables on top of the PDS output; the Hive query output is expected to match the storage schema, after which the external Hive metadata is purged.

Worked on the data load module, which follows input data processing and supports different load types: Load Append (appending to existing data), Snapshot (deduplicating data and replacing the older file), and Organization (transforming existing data into a new data set).

DQM is usually executed once the data has been standardized and transformed (if required).

The masking module is responsible for masking sensitive column values in the data file. A third-party tool, DgSecure, was selected to provide the masking capability; the Dataguise tool accepts delimited data and a set of structure files as input.

Worked on Hadoop and Apache Spark jobs in production support to debug and fix issues.

Technologies Used: Java, Hive, Sqoop, Flume, Oozie, Python, Spark, REST, Service now ticketing tool

Project: Event Management System

Description: This project uses a Spring Boot application with an Angular front end, where batch and streaming event data is processed and, after analysis, stored in an Oracle database.

Client: Apple, Sunnyvale, USA

Role: Senior Java Developer

Duration: June 2017 – Dec 2017

Responsibilities:

Created the login module using Spring Security, plus user management, event, and event attendee modules using the Spring Boot architecture, for various environments (DEV, IT, QA, UAT, and Production).

Worked on integrating the user management module with the sales platform using Kafka.

Worked on workflow management module to create workflow and assign workflow to contact and add registration fields in workflow.

Created RESTful web services using Spring Boot architecture.

Used Microservice architecture, with Spring Boot-based services interacting through a combination of REST and Apache Kafka message brokers.

Worked on configuration changes for different environments to design, implement and test application.

Worked on Kafka node scalability, the number of messages per batch, and the number of threads involved in pulling data from the Kafka message broker.

Worked on mail management module to create segment, assign to campaign, sending mail to campaign using Email template.

Worked on Apache Spark to analyze event data arriving through REST services and Kafka integration, storing the results in Cassandra.

Worked in an Agile environment with daily scrum calls and JIRA for issue tracking, covering new requirements and fixes to existing code.

Used Java 8 lambda expressions and the Stream API in the project.

Technologies Used: Java 8, J2EE, Spring MVC, Spring Boot, Microservices, Kafka, Oracle 12c, Spring REST, Cassandra, Apache Spark, Hibernate, JPA

Project: Hadoop Platform Job Monitoring (MDP)

Description: This was a maintenance and production support project in which Hadoop and Spark jobs were controlled by the Cisco Tidal tool, and issues were monitored, debugged, and fixed on a priority basis. A daily report on the status of every job was generated and presented to stakeholders for analysis.

Client: Adobe Systems

Role: Senior Developer

Duration: January 2016 – June 2017

Technologies Used: HDFS, MapReduce, Linux, Pig, Hive, Sqoop, HBase, Oozie, Cisco Tidal, Spark, Scala, Snap Logic and Oracle

Project: Stream Processing System

Client: Bureau VERITAS, France

Role: Technical Specialist

Duration: November 2015 – December 2015

Technologies Used: Spring MVC and Hibernate, Web services and DB2, MapReduce, Pig, Hive, Sqoop

Project: Global Application Portfolio (GAP)

Client: Bureau VERITAS, France

Role: Technical Specialist

Duration: May 2015 – October 2015

Technologies Used: Spring MVC and Hibernate, Web services and DB2, MapReduce, Pig, Hive, Sqoop, Dojo

Project: Choctaw Partner Website

Client: Choctaw, USA

Role: Technical Lead

Duration: May 2014 –March 2015

Technologies Used: Spark, Kafka, ZooKeeper, Storm, MQTT, HBase, RabbitMQ, Riak

Project: Incident Management System (IMS)

Client: UAE Traffic Police

Role: Senior Developer

Duration: March 2013 – December 2013

Technologies Used: HTML, Spring and Hibernate and SQL Server 2005

Project: Breezway CSA

Client: CSC, USA

Role: Senior Developer

Duration: November 2011 – Sept 2012

Technologies Used: HTML, Spring and Hibernate and SQL Server 2005, Salesforce, JBoss 7.1

Project: Integral Policy

Client: CSC, USA

Role: Senior Developer

Duration: March 2010 – October 2011

Technologies Used: HTML, Core Java, Spring, Hibernate, SQL Server 2005

Project: VERIZON VERIFY

Client: Verizon, Prolifics INDIA

Role: Senior Software Engineer

Duration: June 2009 – March 2010

Technologies Used: HTML, Core Java, Spring, Hibernate, SQL Server 2005

Project: EGPOLICY

Client: Generali, Japan

Role: Senior Developer

Duration: November 2007 – June 2009

Technologies Used: HTML, Core Java, Spring and Hibernate, Struts 1.2 and SQL Server 2005

Education & Certifications

Master of Computer Application (MCA), Rajiv Gandhi Technical University

Bachelor of Computer Science, Dau Dayal Institute of Vocational Education


