
Data Engineer Processing

Location:
Irving, TX
Posted:
September 10, 2025


PRASHANTHI RAYALA

Data Engineer

Objective

Experienced Data Engineer with 5+ years of hands-on expertise in building ETL pipelines, optimizing data workflows, and managing cloud data infrastructure (AWS/GCP/Azure). Strong problem-solving and decision-making abilities, with excellent communication skills and a collaborative approach to delivering high-quality analytics solutions. Looking to contribute to a forward-thinking company where I can solve complex data challenges and enable impactful analytics.

Profile Summary

Results-driven Data Engineer with 5+ years of experience in designing, developing, and optimizing scalable data pipelines and analytics solutions.

Expertise in building batch and real-time data processing systems leveraging AWS services (S3, Redshift, EMR, Lambda, and DynamoDB) and Azure (ADLS, ADF, Databricks, and Synapse Analytics).

Hands-on experience in ETL/ELT development, data migration, and data processing using AWS services such as EC2, Athena, Glue, Lambda, S3, and Relational Database Service (RDS). Proficient with Scala, Apache HBase, Hive, Pig, Sqoop, ZooKeeper, Spark, Spark SQL, Spark Streaming, Kinesis, Airflow, YARN, and Hadoop (HDFS, MapReduce).

Skilled in streaming and real-time processing using Kafka, Flume, PySpark, and orchestration via Apache Airflow.

Experience with Hadoop Distributions such as Cloudera, MapR, Hortonworks, and Azure HDInsight.

Proficient in Python, Scala, SQL; experienced with relational (MySQL, PostgreSQL) and NoSQL (MongoDB, Cassandra, HBase) databases.

Expertise in data warehousing technologies like Amazon Redshift, Google BigQuery, Snowflake; and Hadoop ecosystem tools including HDFS, Hive, Pig, Sqoop, Impala, and MapReduce.

Developed Python scripts to extract data from HBase and implement PySpark solutions.

Strong understanding of the Software Development Lifecycle (SDLC) and Agile and Waterfall methodologies.

Experience with containerization and DevOps tools such as Docker, Kubernetes, Jenkins, Maven, and Git, including CI/CD pipeline implementations.

Familiar with workflow orchestration tools like Oozie, Zookeeper, Apache NiFi, and Apache Airflow.

Experience with Python, PL/SQL, SQL, REST APIs, and Azure-based big data integrations.

Skilled in data modeling (e.g., star and snowflake schemas), BI pipeline development, and visualization using Power BI and Tableau.

Proficient in AWS tools like EMR, S3, and CloudWatch for managing and monitoring Hadoop and Spark jobs.

Performed collaborative code reviews for Python, PySpark, and SQL, ensuring adherence to best practices, optimizing performance, and maintaining high-quality, scalable data solutions.

Demonstrated teamwork, collaboration, and problem-solving skills in information technology projects, applying classification techniques to deliver accurate and efficient solutions.

Contact: 469-***-**** | *******************@*****.*** | Texas, USA

Technical Skills

Hadoop Ecosystem: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Apache Airflow, HBase

Programming Languages: PL/SQL, SQL, Python, PySpark, Scala, Java

Databases: MySQL, SQL Server, Oracle, MS Access, Teradata

NoSQL Databases: Cassandra, HBase, DynamoDB, MongoDB

Workflow Management Tools: Oozie, Autosys, Airflow

Visualization & ETL Tools: Tableau, Power BI, Informatica, Talend

Cloud Technologies: AWS, Azure, GCP

IDEs: Eclipse, Jupyter Notebook, Spyder, PyCharm, IntelliJ

Version Control & DevOps: Git, SVN, Jenkins, CI/CD, Docker, Kubernetes

Operating Systems: Windows, Linux, Unix

Data Governance: Data Quality Frameworks, Governance Practices

Streaming Technologies: Apache Kafka, Apache Flink

Project Management Tools: JIRA, ServiceNow

ETL/BI: Informatica, SSIS, SSRS, SSAS, Tableau, Power BI, QlikView, Arcadia, Erwin, Matillion, Rivery

Testing: Unit Testing, Integration Testing, Automation Testing

Education

Northwest Missouri State University – Missouri, USA (Jan 2023 – Apr 2024)

Master's in Applied Computer Science

Professional Experience

Azure Data Engineer - Aimbridge Hospitality
Plano, Texas, USA | Jan 2024 - Present

Description: Aimbridge Hospitality is a leading third-party hotel management company that manages hotels and resorts for various ownership groups, including real estate investment trusts (REITs). I work on the client data & analytics team, creating high-quality, reusable data products that support data-driven client insights and improve access to client, product, and transactional data, using Python, PySpark, Scala, Kafka, Airflow, and ETL frameworks.

Key Responsibilities:

Designed and developed scalable, fault-tolerant data pipelines using Python, PySpark, Scala, Kafka, Airflow, and custom ETL frameworks.

Ingested and processed ~5 TB of structured, semi-structured, and unstructured data using Flume, Sqoop, Kafka, and ZooKeeper.

Optimized storage solutions in Azure Synapse Analytics, Snowflake, and HDFS, reducing query response times by 40%.

Migrated legacy ETL workflows to Azure Synapse and performed transformations using Spark and Azure Data Factory.

Implemented real-time data ingestion with Kafka + Spark Streaming including data quality checks and transformation logic.
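
A minimal PySpark Structured Streaming sketch of this kind of ingestion is shown below; the broker address, topic name, schema, and output paths are hypothetical placeholders rather than project details:

```python
# Hedged sketch: consume JSON events from Kafka, apply basic data-quality
# checks, and write the cleaned stream to the data lake. All names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

schema = StructType([
    StructField("booking_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "bookings")                     # hypothetical topic
       .load())

events = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Data-quality checks: drop records missing the key or with non-positive amounts.
clean = events.filter(F.col("booking_id").isNotNull() & (F.col("amount") > 0))

(clean.writeStream
 .format("parquet")
 .option("path", "/mnt/datalake/bookings_clean")            # placeholder output path
 .option("checkpointLocation", "/mnt/checkpoints/bookings") # placeholder checkpoint path
 .start())
```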

Built batch and streaming pipelines in Azure Databricks integrating with SQL Server and SFTP targets.

Developed multi-threaded Java ingestion jobs for FTP and data warehouse ingestion.

Created PySpark validation scripts reducing manual QA time for Snowflake table loads by 60%.
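
A hedged sketch of such a validation script follows, comparing row counts and key-column nulls between a source dataset and the loaded Snowflake table; the connection options, paths, table, and column names are illustrative assumptions:

```python
# Illustrative post-load validation: compare the source DataFrame against the
# Snowflake target it was loaded into. All names and options are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("snowflake-load-validation").getOrCreate()

sf_options = {
    "sfURL": "account.snowflakecomputing.com",   # placeholder account URL
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
    "sfUser": "svc_user",
    "sfPassword": "***",
}

source = spark.read.parquet("/mnt/datalake/bookings_clean")   # hypothetical source path
target = (spark.read
          .format("net.snowflake.spark.snowflake")            # Snowflake Spark connector
          .options(**sf_options)
          .option("dbtable", "BOOKINGS")                      # hypothetical target table
          .load())

# Check 1: row counts must match between source and target.
assert source.count() == target.count(), "Row count mismatch after Snowflake load"

# Check 2: the key column must not contain nulls after the load.
null_keys = target.filter(F.col("BOOKING_ID").isNull()).count()
assert null_keys == 0, f"{null_keys} null keys found in target table"

print("Snowflake load validation passed")
```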

Integrated Azure Synapse with Databricks notebooks, improving loading performance via dynamic partitioning.

Built and deployed containerized applications using Docker and managed CI/CD pipelines in Jenkins.

Managed real-time log analysis with Elasticsearch, Logstash, Kibana (ELK) for end-to-end transaction monitoring.

Configured Azure ADLS, ARM templates, Virtual Networks, and Azure Machine Learning for analytics integration.

Performed performance tuning and optimization for Kubernetes deployments to improve system KPIs.

Implemented data governance and database access control, migrating on-prem databases to Azure Data Lake.

Collaborated with cross-functional teams to test and validate information systems and full-stack applications, ensuring accuracy and scalability in production.

Hands-on experience with Postgres, information system debugging, and ensuring compliance through effective collaboration and rigorous code review practices.

Environment: Databricks, Azure Synapse, Azure Cosmos DB, Azure Data Factory (ADF/ADF V2), SSRS, Power BI, Azure Data Lake Storage (ADLS), Azure Resource Manager (ARM), Azure HDInsight, Azure Blob Storage, Apache Spark, Spark SQL, Python, Scala, Ansible, Kubernetes, Docker, Jenkins.

AWS Data Engineer - Amazon

Houston, Texas, USA | Apr 2023 - Dec 2023

Description: Amazon.com, Inc. is a global technology and e-commerce leader and one of the world's most influential companies, spanning retail, cloud computing, digital streaming, and artificial intelligence. I integrated data from various internal and external sources to create a cohesive data environment using Hive, Pig, Spark, MapReduce, and SQL.

Key Responsibilities:

Designed and implemented data models, partitions, and clustering strategies in Snowflake, Amazon Redshift, and PostgreSQL for optimized querying.

Developed data transformation workflows using Hive, Pig, Apache Spark, MapReduce, and SQL on Amazon EMR and Databricks (AWS).

Built ETL processes using Talend Open Studio, Flume, and Sqoop to ingest data into HDFS.

Managed AWS infrastructure with EC2, S3, Lambda, and CloudWatch for scalable and monitored data processing.

Created AWS Glue (Spark) jobs with complex transformations, triggering downstream processing via AWS Lambda.
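
A minimal sketch of such a job is shown below, applying an aggregation in Glue (PySpark) and then invoking a downstream Lambda through boto3; the job arguments, column names, and Lambda function name are hypothetical:

```python
# Hedged Glue job sketch: transform S3 data, then trigger a downstream Lambda.
import sys
import json
import boto3
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read raw data from S3, aggregate it, and write partitioned output.
df = spark.read.parquet(args["source_path"])
daily = (df.withColumn("order_date", F.to_date("order_ts"))   # hypothetical columns
           .groupBy("order_date", "region")
           .agg(F.sum("amount").alias("daily_amount")))
daily.write.mode("overwrite").partitionBy("order_date").parquet(args["target_path"])

# Trigger downstream processing asynchronously once the write succeeds.
boto3.client("lambda").invoke(
    FunctionName="downstream-processor",                      # placeholder function name
    InvocationType="Event",
    Payload=json.dumps({"path": args["target_path"]}).encode(),
)
```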

Developed CI/CD pipelines using Jenkins, Maven, GitHub, Chef, and Terraform for automated deployments.

Implemented containerized deployments on Docker and Kubernetes for AWS-hosted applications.

Leveraged Apache Kafka and Spark Streaming for real-time data ingestion and transformation.

Utilized T-SQL and ANSI SQL for database object management and performance optimization.

Configured and maintained ELK Stack (Elasticsearch, Logstash, Kibana) for log analytics and monitoring.

Queried datasets from Amazon S3 using AWS Athena and built visualizations with AWS QuickSight.

Integrated data migration processes involving MongoDB, ensuring integrity, privacy, and consistency.

Used Nagios and Amazon CloudWatch for infrastructure health checks and alerting.

Applied data governance best practices and leveraged modern ELT tools like dbt and Fivetran for streamlined workflows.

Environment: Python, SQL, PySpark, Hive, Pig, Apache Spark, MapReduce, Snowflake, Amazon Redshift, PostgreSQL, AWS Glue, AWS S3, AWS EC2, AWS Lambda, AWS Athena, AWS QuickSight, Talend, Flume, Sqoop, Kafka, Spark Streaming, Docker, Kubernetes, Terraform, Jenkins, Maven, GitHub, Chef, ELK Stack (Elasticsearch, Logstash, Kibana), MongoDB, Nagios, dbt, Fivetran.

GCP Data Engineer - Eli Lilly and Company
Bengaluru, Karnataka, India | Jun 2021 - Dec 2022

Description: Eli Lilly and Company is a leading American pharmaceutical and healthcare company developing a new oral diabetes and weight-loss drug based on the GLP-1 hormone, which has shown promising trial data. I was involved in leveraging Google Cloud Platform services to manage, process, and analyze data for Siemens, using MySQL, PostgreSQL, MongoDB, Cassandra, and HBase.

Key Responsibilities:

Designed and managed MySQL, PostgreSQL, MongoDB, Cassandra, and HBase databases for healthcare data solutions.

Built and managed cloud-native data lakes using Google Cloud Storage (GCS) and HDFS.

Developed GCP Dataflow (Apache Beam) pipelines and Cloud Functions for serverless data ingestion and orchestration.
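
A minimal Apache Beam (Python SDK) sketch of this style of Dataflow pipeline follows, reading newline-delimited JSON from GCS, dropping invalid records, and writing to BigQuery; the project, bucket, table, and field names are illustrative assumptions:

```python
# Hedged Beam/Dataflow sketch: GCS JSON -> validation -> BigQuery.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_and_validate(line):
    """Parse one JSON record and keep it only if the key field is present."""
    record = json.loads(line)
    if record.get("patient_id"):                              # hypothetical key field
        yield {
            "patient_id": record["patient_id"],
            "measurement": float(record.get("measurement", 0)),
        }


options = PipelineOptions(
    runner="DataflowRunner",            # use DirectRunner for local testing
    project="example-project",          # placeholder project
    region="us-central1",
    temp_location="gs://example-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromGCS" >> beam.io.ReadFromText("gs://example-bucket/clinical/*.json")
     | "ParseAndValidate" >> beam.FlatMap(parse_and_validate)
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
         "example-project:clinical.trial_measurements",       # placeholder table
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
         create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))
```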

Created BigQuery-based analytics processing ~4TB of clinical trial data, improving reporting accuracy by 25%.

Designed Google Looker Studio dashboards for billing and usage optimization.

Developed ETL workflows using IBM DataStage, PySpark, Scala, Sqoop, Hive, and Pig.

Built CI/CD pipelines with Cloud Build, automating environment and application deployments.

Processed and persisted real-time streaming data using HBase, PySpark, and NoSQL databases.

Utilized GCP Dataproc, GCS, and BigQuery for large-scale healthcare data processing.

Performed Exploratory Data Analysis (EDA) using R and Python.

Orchestrated workflows with Oozie and optimized MapReduce jobs via compression techniques.

Deployed and managed infrastructure with GCP Deployment Manager, Cloud SDK, and GCP client libraries.

Integrated Snowflake, Cloud Bigtable, and Power BI for advanced reporting and analytics.

Applied dbt for modern ELT transformations and healthcare metrics computation.

Environment: GCP, PySpark, Dataproc, BigQuery, Hadoop, XML, Hive, GCS, Python, Snowflake, Cloud Bigtable, Oracle Database, Power BI, SDKs, GTM, Dataflow, dbt, SQL Database, Visual Studio, Databricks.

Data Engineer - UBS
Bengaluru, Karnataka, India | Mar 2020 - May 2021

Description: UBS Group AG is a multinational investment bank and financial services company. It stands as the largest Swiss banking institution and the world's leading private bank. I worked closely with data analysts and business intelligence teams to provide clean, organized data for dashboards, reports, and advanced analytics tools, using Docker and Kubernetes for scalable processing environments.


Key Responsibilities:

Scheduled and monitored ETL workflows using Apache Airflow, Oozie, and CI/CD pipelines with Jenkins, Git, and Maven.

Deployed containerized data processing applications using Docker and Kubernetes for scalable and consistent environments.

Managed configuration data in MongoDB and performed operations using PyMongo.

Designed and implemented ETL solutions with IBM DataStage components (Transformer, Aggregator, Merge, Join, Lookup, Sort, Remove Duplicates, Funnel, Filter, Pivot).

Built fully automated CI systems integrating GitHub, Jenkins, MySQL, and custom Python/Bash tools.

Updated Django models and database schemas using Django Evolution and manual SQL modifications.

Improved data quality and reporting accuracy from 92% to 99% through validation and transformation processes.

Developed Airflow DAGs in Python for automated job flows executed on Amazon EMR and EC2 clusters.
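
A minimal Airflow DAG sketch for this kind of flow is shown below, submitting a Spark step to an existing EMR cluster with the Amazon provider operators; the DAG id, cluster-id variable, and script path are hypothetical placeholders:

```python
# Hedged Airflow sketch: add a Spark step to an EMR cluster and wait for it.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [{
    "Name": "daily_transform",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://example-bucket/jobs/daily_transform.py"],  # placeholder script
    },
}]

with DAG(
    dag_id="emr_daily_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    add_step = EmrAddStepsOperator(
        task_id="add_spark_step",
        job_flow_id="{{ var.value.emr_cluster_id }}",   # cluster id kept as an Airflow Variable
        steps=SPARK_STEP,
    )

    wait_for_step = EmrStepSensor(
        task_id="wait_for_spark_step",
        job_flow_id="{{ var.value.emr_cluster_id }}",
        step_id="{{ task_instance.xcom_pull(task_ids='add_spark_step')[0] }}",
    )

    add_step >> wait_for_step
```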

Orchestrated and monitored data pipelines in Azure Data Factory (ADF), staging API and Kafka JSON data into Snowflake.

Flattened complex JSON structures for analytical services and reporting needs.
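
As a hedged illustration (the trade/leg field names are assumptions, not the project's actual schema), flattening nested JSON in PySpark typically combines explode with nested-column selection:

```python
# Illustrative flattening of a nested JSON structure into a tabular layout.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten-json").getOrCreate()

# Hypothetical nested structure: one trade record containing an array of legs.
raw = spark.read.json("/mnt/staging/trades.json")             # placeholder path

flat = (raw
        .select("trade_id",
                F.col("counterparty.name").alias("counterparty_name"),
                F.explode("legs").alias("leg"))
        .select("trade_id", "counterparty_name",
                F.col("leg.leg_id").alias("leg_id"),
                F.col("leg.notional").alias("notional")))

flat.show(truncate=False)
```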

Utilized MicroStrategy, Power BI, and Tableau for analytics reporting and dashboard creation.

Automated deployments using Terraform and PowerShell, provisioning Azure environments and deploying microservices.

Created UNIX shell scripts for database connectivity, troubleshooting, and parallel query execution.

Collaborated with stakeholders to recommend big data solutions, improve data quality, and define reporting and analytics requirements.

Applied strong problem-solving skills to build distributed systems and GraphQL-based integrations, leveraging React and Excel for data-driven insights while fostering effective collaboration across teams.

Environment: AWS (EMR, EC2, S3), Azure (ADF, Synapse, Power BI), HDFS, HBase, Hive, JavaScript, Spark, Tableau, OMOP, SQL, HTML, Terraform, RDBMS, Python, Excel, Delta Lake, Jira, Confluence, Git, Kafka, Jenkins, Snowflake, Docker, Kubernetes, MongoDB, PyMongo.


