Vijay Gopi Krishna Danda
Email: ************@*****.***
Mobile: (860) 946-0057
LinkedIn: www.linkedin.com/in/gopidanda999
PROFESSIONAL SUMMARY:
• 5+ years of overall experience in Big Data technologies, including Spark and Hadoop development along with cloud computing.
• Hands-on experience with Big Data core components and ecosystem (Spark, Spark SQL, Hadoop, HDFS, MapReduce, YARN, ZooKeeper, Hive, Pig, HBase, Sqoop, Python).
• Hands-on experience with AWS services such as Glue, Redshift, Athena, IAM, EC2, Step Functions, S3, EMR, and RDS, as well as Snowflake, Hive queries, and Pig scripting.
• Implemented a couple of experiments in Azure, GCP, and Snowflake to build efficient and cost-effective solutions.
• Experience in data storage, querying, processing and analysis.
• Working experience in ingesting data into HDFS using Sqoop and Talend.
• Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and from RDBMS to HDFS.
• Exposure to ETL methodologies for data integration, extraction, transformation, and loading into the data lake layer using ETL tools.
• Completed a couple of POCs in Azure and GCP, especially with Big Data services.
• Experience in Integrating Hive with HBase and Pig with HBase.
• Good understanding and knowledge of the NoSQL database HBase.
• Hands-on experience installing, configuring, and using the Hadoop ecosystem.
• Worked with Terraform and Kubernetes to manage CI/CD pipelines for Azure Databricks, ADF, and SQL Server resources.
• Automated Azure DevOps pipelines to streamline releases for ADF pipelines, Databricks, ADLS, and SQL Server.
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, PySpark, Spark, Apache Kafka, ZooKeeper, Solr, Ambari, Oozie
NoSQL Databases: HBase, Elasticsearch, Cassandra, MongoDB, Amazon DynamoDB
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, and Apache
Languages: Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting
Source Code Control: GitHub, Bitbucket, SVN
Cloud Computing Tools: Amazon AWS (S3, EMR, EC2, Glue, Redshift, Athena, Step Functions, Lambda, VPC, Route 53, CloudWatch, CloudFront), Microsoft Azure
Databases: Teradata, Oracle 10g/11g, Microsoft SQL Server, MySQL, DB2, Microsoft Fabric
DB Languages: MySQL, PL/SQL, PostgreSQL & Oracle
Build Tools: Jenkins, Maven, ANT, Log4j
Business Intelligence Tools: Power BI, Tableau
Development Tools: Eclipse, IntelliJ
ETL Tools: Talend, Pentaho, Informatica, SSIS (SQL Server Integration Services)
Development Methodologies: Agile, Scrum, Waterfall
PROJECT #1:
Client: UBS Bank
Duration: Jan 2024 - Current
Role: Senior Data Engineer
Services: AWS Glue, Redshift, S3, Athena, Step Functions, EC2, IAM, Spark, Scala, PySpark, Databricks
Roles & Responsibilities:
• Loaded data from on-premises systems to S3 using AWS DMS.
• Converted Greenplum stored procedures to PySpark and Redshift SQL.
• Created Glue jobs implementing custom logic per business needs.
• Created EMR clusters to run jobs in both persistent and step execution modes.
• Loaded the final data into Redshift and S3 using Python shell jobs.
• Scheduled Glue jobs with Step Functions triggered through Lambda functions (see the sketch after this list).
• Developed Oozie actions (Hive, shell, and Java) to submit and schedule applications on the Hadoop cluster.
• Experienced in building data warehouses on the Azure platform using Azure Databricks and Data Factory.
• Worked with the production support team to resolve issues with the CDH cluster and data ingestion.
• Implemented Unity Catalog in Databricks to enhance data governance and security.
• Created Delta Live Tables for real-time data processing and quality improvements.
• Proficient with Azure Data Lake Storage (ADLS), Databricks and Python notebooks, Databricks Delta Lake, and Amazon Web Services (AWS).
• Provided guidance to the development team using PySpark as the ETL platform.
• Worked in the Azure environment on development and deployment of custom Hadoop applications.
• Demonstrated QlikView to data analysts for creating custom reports, charts, and bookmarks.
• Designed and implemented scalable cloud data and analytics solutions for various public and private cloud platforms using Azure.
• Developed a tool that automatically reads tens of millions of rows of street addresses upon user upload and standardizes them using the SmartyStreets API; used Scala and Spark to process ~1 million rows per minute.
• Worked on developing and predicting trends for business intelligence.
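A minimal boto3 sketch of the Lambda-triggered Glue scheduling referenced in the list above; the job name, event fields, and job arguments are hypothetical placeholders rather than the actual project configuration.

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Step Functions passes the target Glue job and run date in the task input.
    job_name = event.get("job_name", "daily-redshift-load")  # hypothetical job name
    run_date = event.get("run_date", "")

    # Kick off the Glue job; the state machine can poll glue.get_job_run
    # with the returned run id to wait for completion.
    response = glue.start_job_run(
        JobName=job_name,
        Arguments={"--run_date": run_date},
    )
    return {"job_name": job_name, "job_run_id": response["JobRunId"]}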
PROJECT #2:
Client: Southern California Edison
Duration: Oct 2022 - Dec 2023
Role: Data Engineer Developer
Environment: Hadoop, HDFS, Hive, Impala, SAP HANA, Talend, Spark, Python, ADF, ADLS Gen2 & Databricks
Roles & Responsibilities:
• Involved in transferring data from SAP HANA to the Hadoop file system using Sqoop.
• Loaded data from the file system into Spark RDDs.
• Applied transformations to the data to produce pair RDDs.
• Worked with both Spark DataFrames and Pandas DataFrames.
• Developed DataFrames and case classes for the required input data and performed data transformations using Spark Core.
• Used Spark SQL to process large volumes of structured data with SQL operations (a minimal sketch follows this list).
• Worked with Hive for the data retrieval process.
• Designed robust, reusable, and scalable data-driven solutions and data pipeline frameworks in Python to automate the ingestion, processing, and delivery of structured and unstructured data, both batch and real-time streaming.
• Built data warehouse structures, creating fact, dimension, and aggregate tables through dimensional modeling with Star and Snowflake schemas.
• Applied transformations to data loaded into Spark DataFrames and performed in-memory computation to generate the output response.
• Worked with Google BigQuery, Google Dataflow, and Cloud Composer for large-scale data processing and workflow automation.
• Good knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
• Used the Spark DataFrames API to perform analytics on Hive data and DataFrame operations to run the required validations on the data.
• Built end-to-end ETL models to sort vast amounts of customer feedback and derive actionable insights and tangible business solutions.
• Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
• Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation; used the Spark engine and Spark SQL for data analysis and provided the results to data scientists for further analysis.
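A minimal PySpark sketch of the Spark SQL pattern referenced in the list above; the file path, table names, and columns are illustrative placeholders rather than the client's actual schema.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("usage-aggregation")
         .enableHiveSupport()
         .getOrCreate())

# Load structured data from the file system into a DataFrame.
usage_df = (spark.read
            .option("header", "true")
            .csv("/data/raw/meter_usage"))  # hypothetical path

# Expose the DataFrame to Spark SQL and run a set-based aggregation.
usage_df.createOrReplaceTempView("meter_usage")
daily_totals = spark.sql("""
    SELECT meter_id, usage_date, SUM(CAST(kwh AS DOUBLE)) AS total_kwh
    FROM meter_usage
    GROUP BY meter_id, usage_date
""")

# Persist the result as a Hive table for downstream retrieval.
daily_totals.write.mode("overwrite").saveAsTable("analytics.daily_meter_usage")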
PROJECT #3:
Client: Value Labs
Duration: Oct 2019 - Aug 2022
Role: Data Engineer Developer
Environment: Azure Cloud, Zeppelin, Python, Spark, Postgres & Power BI
Roles & Responsibilities:
• Developed solutions to import/export data from Teradata and Oracle to HDFS and S3, and from S3 to Snowflake.
• Loaded the raw data into Azure Blob storage.
• Created DataFrames from the Azure Blob storage data.
• Migrated shell script code to PySpark and applied transformations as well as actions on the DataFrames.
• Calculated KPI formulae per customer requirements and checked for performance issues.
• Loaded the processed data into Postgres in an efficient manner (see the sketch after this list).
• Generated reports and visuals by connecting Power BI to Postgres.
• Resolved Spark and YARN resource management issues, including shuffle issues, out-of-memory and heap space errors, and schema compatibility.
• Imported and exported data between HDFS and relational databases (Oracle and Netezza) using Sqoop.
• Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Spark RDDs.
• Extensively involved in Installation and configuration of Cloudera Hadoop Distribution.
• Built ETL pipeline for scaling up data processing flow to meet the rapid data growth by exploring Spark and improving the performance of the existing algorithm in Hadoop using Spark-Context, Spark-SQL, Data Frame, Pair RDD’s and Spark YARN.
• Created an automated loan leads and opportunities match back model used to analyze loan performance and convert more business leads.
• Ingested forecasted budgets history into data warehouse.
• Worked on PySpark APIs for data transformation.
• Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
• Connected a third-party software API for data validation in the direct mail match-back project, used to analyze direct mail performance by mapping loan applications to mailing instances.
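A minimal PySpark sketch of the Blob-storage-to-Postgres flow referenced in the list above; the storage path, JDBC URL, table, and credentials are placeholders, not production settings.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kpi-load").getOrCreate()

# Read raw data landed in Azure Blob storage (the wasbs:// path is illustrative
# and assumes the storage account key is already set in the Spark config).
raw_df = spark.read.parquet("wasbs://raw@storageaccount.blob.core.windows.net/events/")

# Example KPI-style transformation: count events per customer per day.
kpi_df = (raw_df
          .withColumn("event_date", F.to_date("event_ts"))
          .groupBy("customer_id", "event_date")
          .agg(F.count("*").alias("event_count")))

# Write the processed result to Postgres over JDBC using batched inserts
# (assumes the PostgreSQL JDBC driver is on the Spark classpath).
(kpi_df.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/analytics")
    .option("dbtable", "kpi.daily_events")
    .option("user", "etl_user")
    .option("password", "****")
    .option("batchsize", "10000")
    .mode("append")
    .save())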
CERTIFICATION:
• Microsoft Certified: Azure Data Engineer Associate
EDUCATION:
• Completed B.Tech from QIS Engineering College.
• Completed a master's degree from Lindsey Wilson College.