Prasanna Velagala
Sr Data Engineer
E-mail: **************@*****.***
Phone: +1-361-***-****
PROFESSIONAL SUMMARY:
Experienced Big Data Developer with 6 years of experience in the financial services industry. Skilled in Python, PySpark, Unix, Jenkins, and Amazon Web Services (AWS).
Proficient in Python, Big Data, Spark, and AWS for handling complex use cases; exposure to the SDLC and Agile methodologies.
Developed ETL pipelines using Python, PySpark, and AWS.
Hands-on experience with Amazon Web Services (AWS), using Elastic MapReduce (EMR), EC2, Kubernetes, and Lambda for data processing.
Hands-on experience in application development using Python and AWS.
Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
Expertise in designing and automating ETL pipelines with AWS Glue and Apache Airflow, transforming data from diverse sources such as SQL databases, APIs, and cloud storage.
Designed and automated ETL/ELT pipelines, integrating data from multiple sources into Azure Data Lake, Azure SQL, and Synapse Analytics.
Designed distributed data processing solutions using Azure Databricks.
Designed high-performance data pipelines using Azure Data Factory (ADF), Databricks, and Synapse Pipelines.
Leveraged Apache Kafka for real-time streaming data, building consumer-producer applications to process high-velocity data for use in analytics and decision-making.
Utilized Hadoop and Apache Spark for distributed data processing of large datasets, reducing ETL times and enhancing data availability for analytics teams.
Proficient in Big Data technologies including Apache Spark, Hive, and Hadoop, with hands-on experience in data processing and analytics for large-scale datasets.
Skilled in continuous integration/continuous deployment (CI/CD) using Jenkins for automated deployment pipelines.
Proficient in Kafka for real-time data streaming and message queuing.
Optimized Spark jobs and workflows by tuning Spark configurations, partitioning and memory allocation settings.
Proficient in Snowflake for data warehousing and MS SQL for relational database management.
Hands-on experience with GitHub for pushing code and maintaining versions.
Skilled in leveraging Databricks for efficient data processing and analytics, with hands-on experience optimizing Spark workflows and managing high-performance data pipelines.
TECHNICAL SKILLS
Programming Languages: Python, PySpark, JavaScript, HTML, SQL, Shell Scripting
Databases: Relational – PostgreSQL, MySQL; NoSQL – MongoDB, Cosmos DB, DynamoDB
Business Intelligence: Kibana, Tableau, Power BI
Virtualization/Cloud: AWS (Lambda, Glue, EMR, S3, ECS, CFT, CloudWatch, IAM, RDS, KMS, EC2, VPC, ELB), Azure Databricks
DevOps: Jenkins, Maven, GitHub
Big Data: Snowflake, Apache Kafka, Apache Spark, Apache Hadoop, Apache Hive, Airflow
PROFESSIONAL EXPERIENCE
Client: Citi Bank
Location: Dallas, TX
Duration: Sep 2022 – Present
Role: Data Engineer
Responsibilities:
Worked across all SDLC phases: requirements specification, analysis, design, implementation, testing, deployment, and maintenance.
Designed push/pull SNS event-triggered listener AWS Lambda functions to run ETL workflows on schedule (an illustrative Lambda sketch follows this list).
Developed ETL processes using AWS Glue and Apache Spark to extract customer interaction data from multiple sources, transform it for analysis, and load it into AWS RDS, ensuring the team had access to real-time insights.
Implemented ETL jobs that perform data transformations with Spark DataFrames on ECS Fargate containers.
Implemented and optimized infrastructure as code (IaC) using AWS CloudFormation, enabling infrastructure provisioning and management in a repeatable and automated manner.
Configured APIs for the applications and managed routing through load balancers and Route 53.
Developed user-defined functions (UDFs) in PySpark to calculate required metrics from the data (see the UDF sketch after this list).
Used batch processing to run the Spark applications.
Implemented column tagging and identified PII for data assets residing in AWS S3 buckets.
Worked with EMR clusters and REST APIs to integrate and deploy PySpark jobs.
Applied performance-tuning techniques when implementing PySpark transformations.
Parsed JSON from incoming Kafka topics in real time.
Created Docker images and used them in Helm charts to run Kubernetes pods.
Built logical and physical data models for Snowflake per requirements.
Performed configuration, deployment, and support of cloud services on Amazon Web Services (AWS) per requirements.
Involved in analysis, design, development, and testing, following Agile methodology.
Implemented data cleansing and validation processes using Python scripts.
Developed ETL workflows with Python scripts in Apache Airflow, using cron scheduling to trigger the jobs.
Developed PySpark scripts utilizing SQL and Data Frames for data analysis and storage in S3.
Wrote Python scripts to build DAGs in Apache Airflow for creating ETL pipelines and dependency checks.
Automated data ingestion with PySpark from sources such as APIs and AWS S3.
Developed Spark Streaming applications for processing data through Hive tables.
Developed a generalized Kafka producer to publish messages to Kafka topics; multiple applications use it to produce data to a given topic (a minimal producer sketch follows this list).
Developed Kafka Consumer to stream data to Hive with transformations in Python.
Created Spark applications using PySpark and Spark-SQL for extracting and processing the data.
Integrated Elasticsearch with Kibana for real-time dashboards to monitor maintenance data.
Created and managed Docker containers for running API use cases.
Developed complex SQL queries to aggregate data across AWS DynamoDB, enabling comprehensive cross-cloud analytics for unified reporting on customer behaviors and trends.
Implemented CI/CD using in-house automation frameworks with GitHub as the VCS.
Supported AWS infrastructure maintenance and enhancements.
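A minimal sketch of the SNS-triggered Lambda pattern referenced above, assuming boto3 and a Glue-based ETL job; the job name and message fields are hypothetical placeholders, not the actual workflow.

```python
# Hypothetical sketch of an SNS-triggered AWS Lambda that kicks off a Glue ETL
# job; the job name and message fields are illustrative placeholders.
import json
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # SNS delivers one record per notification under Records[*].Sns.Message
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        run = glue.start_job_run(
            JobName="customer-interactions-etl",                  # hypothetical job name
            Arguments={"--run_date": message.get("run_date", "")},
        )
        print(f"Started Glue job run {run['JobRunId']}")
```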
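Illustrative sketch of the kind of PySpark UDF used for metric calculation; the metric, column names, and S3 path are assumptions, not the production logic.

```python
# Minimal sketch of a PySpark UDF for a derived metric; column names and
# the S3 path are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("metrics-udf").getOrCreate()

@F.udf(returnType=DoubleType())
def utilization_ratio(used, limit):
    # Guard against missing or zero limits before dividing
    if used is None or not limit:
        return None
    return float(used) / float(limit)

df = spark.read.parquet("s3://example-bucket/accounts/")          # hypothetical path
df = df.withColumn("utilization", utilization_ratio("used_amount", "credit_limit"))
```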
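A hedged sketch of a generalized Kafka producer wrapper of the sort described above, using the kafka-python client; the broker address, topic, and payload are assumptions for the example.

```python
# Illustrative reusable Kafka producer wrapper (kafka-python);
# broker address, topic, and payload are assumptions.
import json
from kafka import KafkaProducer

class GenericProducer:
    """Serializes dicts to JSON and publishes them to any Kafka topic."""

    def __init__(self, bootstrap_servers="localhost:9092"):
        self.producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )

    def publish(self, topic, payload):
        # send() is asynchronous; callers can .get() the future to confirm delivery
        return self.producer.send(topic, payload)

producer = GenericProducer()
producer.publish("customer-events", {"id": 123, "action": "login"})
producer.producer.flush()
```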
Client: Visa
Location: Denver, CO
Duration: June 2020 – August 2022
Role: Data Engineer
Responsibilities:
Developed ELT/ETL pipelines to move data to and from the Snowflake data store using a combination of Python and Snowflake SnowSQL (an example load is sketched after this list).
Developed ETL transformations and validations using Spark SQL/Spark DataFrames with Azure Databricks and Azure Data Factory (see the Spark SQL sketch after this list).
Worked with Azure Logic Apps administrators to monitor and troubleshoot issues related to process automation and data processing pipelines.
Developed and optimized code for Azure Functions to extract, transform, and load data from various sources, such as databases, APIs, and file systems.
Designed, built, and maintained data integration programs in Hadoop and RDBMS.
Developed a CI/CD framework for data pipelines using Jenkins.
Collaborated with DevOps engineers to develop automated CI/CD and test-driven development pipelines on Azure per client requirements.
Hands-on programming experience in scripting languages such as Python and Scala.
Involved in running all the Hive scripts through Hive on Spark and some through Spark SQL.
Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform and analyze data.
Developed Spark code and Spark SQL scripts using Scala for faster data processing.
Developed interactive dashboards using Tableau and Power BI, empowering business users to access real-time data insights from both data warehouses without requiring extensive technical knowledge.
Automated SQL-based data transformations as part of the ETL pipeline, reducing manual effort and ensuring consistent data preparation for analytics and machine learning models.
Implemented partitioning and indexing strategies in SQL databases, enhancing query performance on large datasets and improving the efficiency of reporting systems.
Collaborated with the engineering team to tune queries and pipelines, ensuring faster data processing and optimized performance for analytics workflows.
Implemented fine-grained access control using AWS IAM roles and policies, ensuring secure and compliant data access across the organization for both Amazon Redshift and Azure Synapse Analytics.
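A hedged example of the Python-driven Snowflake load described above, using the snowflake-connector-python client; the account, credentials, warehouse, stage, and table names are placeholders.

```python
# Illustrative Python-driven Snowflake load; connection details, stage,
# and table names are assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",       # hypothetical credentials
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
with conn.cursor() as cur:
    # COPY INTO pulls staged CSV files into the target table
    cur.execute("""
        COPY INTO transactions
        FROM @etl_stage/transactions/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
conn.close()
```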
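Sketch of the Spark SQL / DataFrame style of transformation used in Azure Databricks, under the assumption of Delta tables on mounted storage; mount points, table, and column names are illustrative only.

```python
# Spark SQL / DataFrame transformation sketch for Azure Databricks;
# paths and schema are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-transform").getOrCreate()

raw = spark.read.format("delta").load("/mnt/raw/payments")        # hypothetical mount
raw.createOrReplaceTempView("payments_raw")

cleaned = spark.sql("""
    SELECT payment_id,
           CAST(amount AS DECIMAL(18, 2)) AS amount,
           to_date(event_ts)              AS event_date
    FROM payments_raw
    WHERE amount IS NOT NULL
""")

# Write back partitioned by date for downstream reporting loads
cleaned.write.format("delta").mode("overwrite") \
    .partitionBy("event_date").save("/mnt/curated/payments")
```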
Client: Incomm Incentives
Location: St. Louis, MO
Duration: Nov 2018 – May 2020
Role: Data Engineer
Responsibilities:
Implemented a one-time data migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL (a migration sketch follows this list).
Developed Snowflake views to load and unload data from and to an AWS S3 bucket, and promoted the code to production.
Hands-on experience with Snowflake utilities, Snowflake SQL, Snowpipe, etc.
Resolved ongoing operational issues with Snowflake compute clusters.
Collaborated with stakeholders to identify key business metrics and KPIs and developed dashboards using Athena and visualization tools to track and monitor these metrics in real-time, resulting in better visibility and insights.
Worked with advanced Snowflake concepts such as Resource Monitors, role-based access control, data sharing, cross-platform database replication, virtual warehouse sizing, query performance tuning, Snowpipe, Tasks, Streams, and zero-copy cloning.
Reduced the time-to-insight for the data analysis workflow by over 50% by implementing a highly optimized PySpark and AWS Glue pipeline, resulting in faster and more efficient data processing.
Designed and implemented a scalable data warehousing solution using Redshift that could handle over 100 terabytes of data and supported concurrent querying from multiple users.
Enabled real-time insights for driver and client application by configuring Kinesis to process and analyze data streams in near-real-time, resulting in faster decision-making and improved operational efficiency.
Successfully automated the end-to-end data processing workflow using Airflow, resulting in a 70% reduction in manual effort and faster time-to-market for new features and capabilities (a DAG skeleton follows this list).
Employed AWS Glue for automated ETL (Extract, Transform, Load) processes, enabling efficient data preparation and transformation at scale.
Utilized Glue Crawlers to discover and catalog metadata from various data sources, ensuring dynamic and adaptive data processing.
Orchestrated database migrations seamlessly using AWS DMS to transfer data between SQL Server and Redshift.
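A hypothetical sketch of the one-time SQL Server to Snowflake migration described above: chunked reads with pyodbc/pandas and bulk loads with write_pandas. Connection strings, credentials, and table names are assumptions.

```python
# One-time SQL Server -> Snowflake migration sketch; all connection
# details and table names are placeholders.
import pandas as pd
import pyodbc
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

src = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql-host;DATABASE=sales;UID=etl;PWD=***"
)
snow = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="LOAD_WH", database="SALES", schema="PUBLIC",
)

for chunk in pd.read_sql("SELECT * FROM dbo.orders", src, chunksize=100_000):
    # write_pandas stages each chunk and runs COPY INTO the target table
    write_pandas(snow, chunk, table_name="ORDERS")

src.close()
snow.close()
```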
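An illustrative Airflow DAG skeleton for the automated workflow mentioned above; the DAG id, task names, schedule, and callables are placeholders rather than the production pipeline.

```python
# Minimal Airflow DAG skeleton; dag_id, schedule, and tasks are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    print("pull source data")           # placeholder extract step

def transform(**context):
    print("apply transformations")      # placeholder transform step

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 6 * * *",      # cron-style daily trigger
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task      # transform waits on extract
```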
Education:
Master's in Computer Science, Jan 2017 – May 2018
Texas A&M University