JAYANTH KUMAR DEVALLA
*******************@*****.***
http://www.linkedin.com/in/jayanth-kumar-devalla-5a0196211
Sr. Data Engineer
PROFESSIONAL SUMMARY:
Senior Data Engineer with 9+ years of experience in Big Data, Cloud Computing, and Data Engineering.
Expertise in designing scalable data architectures, ETL pipelines, and real-time data processing solutions.
Fluent in Python, Scala, and SQL, with experience in Hadoop, Apache Spark, Apache Kafka, and Databricks.
Good experience in application development, primarily using Hadoop and Python, including data analysis.
Experience with IDEs like Eclipse, IntelliJ, PyCharm, and Visual Studio.
Experienced in developing and deploying enterprise applications using major Hadoop ecosystem components.
Experienced in handling large datasets using Spark in-memory capabilities, partitions, broadcast variables, accumulators, and effective, efficient joins; used Scala to develop Spark applications (see the join sketch at the end of this summary).
Extensive experience in cloud platforms: AWS (Lambda, Redshift, S3, EMR), Azure (ADLS, ADF, Cosmos DB, HDInsight, Key Vault) and GCP (BigQuery, Cloud Dataflow, Pub/Sub, Dataproc).
In-depth understanding of Spark Architecture and performed several batch and real-time data stream operations using Spark (Core, SQL, Streaming).
Proficient in Data Warehousing and ETL tools: Talend, Apache Airflow, StreamSets, SSIS, Azure Data Factory (ADF).
Experience in designing, developing, and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
Set up standards and processes for Hadoop based application design and implementation.
Adept at configuring and installing Hadoop/Spark Ecosystem Components.
Skilled in developing MapReduce and Streaming jobs using Scala and Python.
Hands-on experience in ETL processes and tools such as Apache Kafka, Power BI, and Microsoft SSIS.
Proficient in using Sqoop to migrate data between RDBMS, NoSQL databases, and HDFS.
Expertise in DevOps & CI/CD: Docker, Kubernetes, Terraform, Jenkins, Azure DevOps.
Strong knowledge of NoSQL databases: Couchbase, MongoDB, HBase and relational databases like Azure SQL Database, MySQL, PostgreSQL.
Experience in Data Visualization & Monitoring using Power BI, Tableau, Grafana.
Ample knowledge of data architecture including data ingestion pipeline design, Hadoop/Spark architecture, data modeling, data mining and advanced data processing.
Strong knowledge in data preparation, data modelling and data visualization using Power BI.
Experience with different Hadoop distributions such as Cloudera, Hortonworks Data Platform (HDP), and Amazon Elastic MapReduce (EMR).
Hands-on experience in Azure development; worked on Azure web applications, App Services, Azure Storage, Azure SQL Database, Virtual Machines, Azure AD, Azure Search, and Notification Hubs.
Hands-on knowledge of installing Hadoop clusters using different distributions: Apache Hadoop, Cloudera, and Hortonworks.
Experience in creating Tableau dashboards using stacked bar charts, bar graphs, and geographical maps.
Good understanding of Mapper, Reducer, and Driver class structures for MapReduce.
Hands-on experience in writing applications on NoSQL databases like HBase and Cassandra.
Extensive experience with big data ETL and query tools such as Pig Latin and HiveQL.
Hands-on experience with big data ingestion tools like Flume and Sqoop.
Experience in managing and reviewing Hadoop Log files.
Good experience in performing data analytics and generating insights using Impala and Hive; working knowledge of Kubernetes.
Experience in installation, configuration, support, and monitoring of Hadoop clusters using Apache Hadoop.
In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
Experience in using Pig, Hive, Sqoop, and Oracle VM.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
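The Spark bullets above mention broadcast variables and efficient joins; the following is a minimal illustrative sketch of that join pattern, written in PySpark rather than the Scala used on the projects above, with hypothetical paths, table names, and columns.

```python
# Minimal PySpark sketch of a broadcast join on a large dataset.
# Paths, table names, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

# Large fact table on disk, small dimension table that fits in memory.
orders = spark.read.parquet("s3://example-bucket/orders/")        # hypothetical path
customers = spark.read.parquet("s3://example-bucket/customers/")  # hypothetical path

# Broadcasting the small side avoids a full shuffle of the large table.
enriched = orders.join(F.broadcast(customers), on="customer_id", how="left")

# Repartition by a business key before writing to keep file sizes even.
(enriched
    .repartition("order_date")
    .write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/enriched_orders/"))
```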
TECHNICAL SKILLS:
Big Data Ecosystem:
Hadoop, Spark, MapReduce, YARN, Hive, SparkSQL, Pig, Sqoop, HBase, Flume, Oozie, Zookeeper, Avro, Parquet, Maven, Snappy, StreamSets SDC
Cloud Technologies:
AWS (Lambda, Redshift, S3, EMR), Azure (ADLS, ADF, Cosmos DB, HDInsight, Key Vault)
Hadoop Architecture:
HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce
Hadoop Distributions:
Cloudera, MapR, Hortonworks
Programming Languages:
Python, Scala, SQL
NoSQL Databases:
Cassandra, MongoDB, HBase
ETL & Data Pipelines:
Apache Airflow, StreamSets, Talend, SSIS, Azure Data Factory (ADF)
Data Storage & Warehousing:
Azure SQL Database, Redshift, Snowflake, Couchbase, MongoDB
Data Streaming:
Apache Kafka, AWS Kinesis, Apache Flink
DevOps & CI/CD:
Docker, Kubernetes, Terraform, Jenkins, Azure DevOps
Data Visualization & Monitoring:
Power BI, Tableau, Grafana
PROFESSIONAL EXPERIENCE:
Client: Epic Systems, Savannah, GA Sep 2023 – Present
Role: Senior Data Engineer
Roles & Responsibilities:
Engineered an Analytics Data Warehouse by designing fact and dimension tables and orchestrating ETL pipelines using Informatica PowerCenter to enhance data processing efficiency by 40%.
Implemented real-time data streaming using AWS Lambda, AWS EventBridge, and Apache Kafka.
Transformed raw data into optimized parquet files using Impala, improving data retrieval speeds.
Built CI/CD pipelines using Jenkins and Kubernetes, automating data workflows.
Architected a Star and Snowflake schema data model, improving query performance.
Provisioned and managed Databricks clusters for batch and streaming workloads, enhancing data processing.
Developed advanced data integration solutions using Apache Hadoop, Hortonworks, and Cloudera to optimize pipeline performance.
Migrated Redshift Data Warehouse to Snowflake, reducing storage costs and improving query execution times.
Automated AWS EMR job launches using S3 Events, SNS, KMS, and Lambda, cutting deployment time.
Designed data ingestion pipelines using Couchbase for real-time analytics.
Administered AWS resources, including EC2, EMR, S3, and Elasticsearch, ensuring 99.9% system uptime.
Designed and enforced S3 bucket policies, securing 100TB+ of stored data.
Built dynamic Power BI dashboards, enabling real-time analytics and accelerating decision-making.
Developed and optimized PySpark code for AWS Glue and EMR, significantly improving ETL job execution time (a minimal Glue job sketch follows this role's responsibilities).
Automated end-to-end workflows using Shell scripting, AWS Lambda, PagerDuty, and Python, reducing manual intervention.
Utilized Flume to ingest real-time log data into HDFS in Cloudera, increasing logging efficiency.
Deployed AWS services (Lambda, SQS, SNS, Dead Letter Queues) using Jenkins blue/green deployment, minimizing downtime.
Designed HBase tables to handle structured, semi-structured, and unstructured data, improving query speeds.
Implemented IoT streaming using Databricks Delta Lake, enabling ACID transactions and ensuring data consistency.
Developed machine learning models in Databricks, integrating predictive analytics into business intelligence.
Ensured data security using Azure Key Vault and AWS security services.
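Illustrative sketch for the AWS Glue bullet in this role: a minimal Glue PySpark job that reads raw JSON from S3, filters invalid records, and writes partitioned Parquet. Bucket paths, field names, and the filter rule are assumptions, not the client's actual code.

```python
# Minimal AWS Glue PySpark job sketch: read raw JSON from S3, drop records
# without an event id, and write partitioned Parquet. Paths and fields are hypothetical.
import sys
from awsglue.transforms import Filter
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw events from S3 as a DynamicFrame.
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-raw-bucket/events/"]},  # hypothetical bucket
    format="json",
)

# Keep only records that carry an event id.
valid = Filter.apply(frame=raw, f=lambda rec: rec["event_id"] is not None)

# Write back as Parquet, partitioned by event date.
glue_context.write_dynamic_frame.from_options(
    frame=valid,
    connection_type="s3",
    connection_options={
        "path": "s3://example-curated-bucket/events/",  # hypothetical bucket
        "partitionKeys": ["event_date"],
    },
    format="parquet",
)

job.commit()
```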
Client: USAA, San Antonio, TX Jun 2021 – Aug 2023
Role: Data Engineer
Roles & Responsibilities:
Designed and built ETL pipelines using Apache Airflow, Talend, and Azure Data Factory (ADF); a minimal DAG sketch follows this role's responsibilities.
Wrote Hadoop jobs to analyze and transform data using PySpark and Hive.
Designed and built multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Amazon Redshift, handling millions of records per day.
Used Redshift Spectrum and AWS Athena query services to read data directly from S3.
Created Databricks Spark jobs with PySpark to perform table-to-table operations.
Developed data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
Developed Spark scripts by using Scala shell commands as per the requirement.
Implemented real-time analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Splunk, and Greenplum.
Built a proof of concept (POC) to explore AWS Glue capabilities for data cataloging and data integration.
Provisioned EMR clusters, configured master and core nodes, and installed the Talend application for running Hadoop jobs.
Developed big data solutions using Databricks, Apache Spark, and Kafka for large-scale data processing.
Designed and developed a framework to leverage platform capabilities using MapReduce and Hive UDFs.
Worked on data transformation pipelines using Storm; handled operational analytics and log management with ELK and Splunk; assisted teams with SQL and MPP databases such as Greenplum.
Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL; wrote SQL queries against Snowflake.
Designed and developed Informatica BDE applications and Hive queries to ingest data into the raw landing zone, transform it with business logic into the refined zone, and load Greenplum data marts feeding the Tableau reporting layer.
Orchestrated real-time analytics pipelines using Apache Flink.
Managed cloud-based databases: Azure SQL, Cosmos DB, Redshift.
Integrated Grafana for real-time data monitoring and visualization.
Implemented Terraform for cloud resource automation and infrastructure as code.
Automated data ingestion and transformations using Python & Scala.
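Illustrative sketch for the Airflow ETL bullet in this role: a minimal daily DAG with two Python tasks, assuming Airflow 2.4+ and hypothetical task logic, DAG id, and schedule.

```python
# Minimal Apache Airflow DAG sketch for a daily extract-then-load run.
# Task logic, DAG id, and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull the previous day's records from the source system.
    print("extracting rows for", context["ds"])


def load(**context):
    # Placeholder: write transformed rows to the warehouse.
    print("loading rows for", context["ds"])


with DAG(
    dag_id="daily_warehouse_load",      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load each day.
    extract_task >> load_task
```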
Client: Home Depot, Atlanta, GA Dec 2019 – May 2021
Role: Hadoop Engineer
Roles & Responsibilities:
Built Azure ADLS-based data pipelines to support business intelligence and analytics.
Managed batch and real-time data workflows using Apache Airflow.
Utilized Apache Kafka for streaming data processing and real-time analytics (see the streaming sketch at the end of this role).
Developed ETL processes with Talend & Azure Data Factory (ADF).
Designed NoSQL data models in Azure Cosmos DB & Couchbase.
Built machine learning models and data transformations in Databricks.
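Illustrative sketch for the Kafka streaming bullet in this role: a minimal Spark Structured Streaming job that consumes JSON events from a Kafka topic and appends them to Parquet on ADLS. It assumes the spark-sql-kafka connector is on the classpath; the broker, topic, schema, and storage paths are hypothetical.

```python
# Minimal Spark Structured Streaming sketch: consume JSON events from a
# Kafka topic and append them to Parquet. Broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
    .option("subscribe", "order-updates")                # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("event"))
    .select("event.*")
)

# Append each micro-batch to Parquet, with a checkpoint for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "abfss://curated@example.dfs.core.windows.net/orders/")                 # hypothetical ADLS path
    .option("checkpointLocation", "abfss://curated@example.dfs.core.windows.net/_chk/orders/")
    .start()
)
query.awaitTermination()
```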
Client: Juspay Technologies, Bengaluru, India Jun 2017 – Sep 2019
Role: Python Developer
Roles & Responsibilities:
Designed and developed big data frameworks using Apache Spark.
Automated data integration & transformations using Talend & Apache Airflow.
Developed Python-based data processing scripts for high-performance analytics.
Implemented Azure Key Vault for data security and encryption (see the sketch below).
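Illustrative sketch for the Azure Key Vault bullet above: retrieving a database secret at runtime with the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are hypothetical.

```python
# Minimal sketch of reading a connection secret from Azure Key Vault before
# opening a database connection. Vault URL and secret name are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://example-vault.vault.azure.net/",  # hypothetical vault
    credential=credential,
)

# Fetch the secret at runtime so credentials never live in code or config files.
db_password = client.get_secret("warehouse-db-password").value
print("retrieved secret of length", len(db_password))
```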
Client: Yana Software Private Limited, Hyderabad, India Aug 2015 – May 2017
Role: Business Analyst
Roles & Responsibilities:
Developed ETL solutions using SSIS & Apache Airflow.
Designed SQL-based data models to support business intelligence.
Integrated Grafana for real-time monitoring & dashboarding.
Implemented Docker & Kubernetes for scalable data processing workflows.
EDUCATION
Bachelor of Technology (B.Tech) in Information Technology
JNTUH, Hyderabad, Telangana, India – 2015