Akshatha Rajanala
Sr. Data Engineer
Email: ******************@*****.***
Mobile: +1-901-***-****
Professional Summary:
Over 10 years of experience across the complete Software Development Life Cycle (SDLC), with expertise in multi-cloud platforms (AWS, Azure, GCP), ETL development, and data engineering for large-scale data solutions. Proficient with AWS (EMR, EC2, RDS, S3, Lambda, Glue, Redshift), Azure (Data Lake, Storage, SQL, Databricks), and GCP (Dataflow, BigQuery) for scalable data architecture, processing, and storage. Specializes in optimizing data workflows, implementing real-time data processing with Kafka, AWS Kinesis, and Apache Flink, and building serverless architectures for cost-effective, agile data solutions. Experienced in deploying and managing machine learning models within data pipelines, enabling predictive analytics and AI-driven insights.
Technical Skills:
Cloud Computing: Amazon Web Services (EMR, EC2, RDS, S3, Lambda, Glue, Redshift), Azure (Data Lake, Storage, SQL, Databricks), Google Cloud Platform (Dataflow, BigQuery).
ETL Processes: SQL Server Integration Services (SSIS), Informatica PowerCenter, SnowSQL, OLAP, OLTP, Talend.
Data Modelling & Databases: SQL Server, NoSQL (DynamoDB, MongoDB), Oracle (PL/SQL), Star Schema, Snowflake Schema.
Programming Languages: Python, Scala, Java, Shell scripting.
Real-Time Data Processing: Apache Kafka, AWS Kinesis, Apache Flink, Spring Boot and Golang-based pipelines.
Visualization Tools: Tableau, Power BI.
Networking Protocols: DNS, TCP/IP, VPN.
DevOps & CI/CD: Terraform, Jenkins, Docker, Concourse, Bamboo, Bitbucket.
Version Control Systems: Git, SVN, Bitbucket.
Testing Tools: Apache JMeter, QuerySurge, Talend Data Quality.
Methodologies: Agile, Scrum, Test-Driven Development (TDD).
Professional Experience:
Client: American Airlines, Phoenix, Arizona Mar 2022 – Present
Role: Sr. Data Engineer
Responsibilities:
Developed and managed scalable EC2 instances, configured with EBS volumes to ensure high availability and reliability of data storage for critical applications.
Implemented comprehensive monitoring and logging solutions using AWS CloudWatch and CloudTrail to enhance the observability and auditability of cloud infrastructure.
Designed data ingestion pipelines leveraging S3 for storage and SNS for real-time notifications, facilitating efficient data processing workflows.
Optimized data warehousing solutions using AWS Redshift and Snowflake, ensuring fast query performance and scalability for large datasets.
Built distributed task queues with Celery and RabbitMQ to handle asynchronous tasks, improving the performance and scalability of data processing jobs.
Implemented high-speed caching solutions using Redis and DynamoDB for low-latency access to frequently accessed data, enhancing application performance.
Developed serverless data processing workflows using AWS Lambda and Glue, automating ETL processes and reducing operational overhead.
Executed large-scale data processing tasks using EMR clusters with PySpark, optimizing resource usage and processing times for big data analytics (see the PySpark sketch at the end of this section).
Managed relational databases such as SQL Server and MySQL, ensuring data integrity, performance tuning, and efficient query execution.
Administered PostgreSQL and Oracle databases, implementing robust backup, recovery, and security strategies to safeguard critical data.
Leveraged NoSQL databases like MongoDB and Cassandra for handling large volumes of unstructured data, providing high availability and scalability.
Developed data warehousing solutions using Hive for querying large datasets and utilized Golang for developing efficient, high-performance backend services.
Built scalable data processing pipelines using Scala and facilitated data interchange between systems using JSON format for lightweight data transfer.
Utilized Avro for efficient data serialization and Teradata for enterprise-level data warehousing, ensuring high performance and scalability.
Designed data models using Power BI and implemented Star Schema for optimized data querying and reporting in business intelligence applications.
Applied Snowflake Schema design principles and Ralph Kimball methodologies to build efficient and scalable data warehousing solutions.
Deployed containerized applications using Kubernetes and implemented CI/CD pipelines with Jenkins to automate build and deployment processes.
Utilized SVN and Bitbucket for version control and code repository management, maintaining a reliable source control system for development projects.
Designed and implemented data lake architecture on AWS, integrating various data sources to provide a unified data repository for analytics.
Developed and optimized ETL processes to extract, transform, and load data from multiple sources into data warehouses and data lakes.
Automated data pipelines using Apache Airflow, ensuring timely and reliable data processing workflows with minimal manual intervention.
Implemented data quality management frameworks to ensure accuracy, consistency, and reliability of data across different stages of processing.
Built real-time data processing solutions using Apache Kafka and AWS Kinesis, enabling timely insights and actions on streaming data.
Integrated data from various sources such as APIs, databases, and flat files, ensuring seamless data flow across different systems.
Created intuitive and interactive data visualizations using tools like Tableau and Power BI to present complex data in an easily understandable format.
Conducted performance tuning of databases and data processing workflows to optimize query execution times and resource utilization.
Integrated machine learning models into data processing pipelines, enabling predictive analytics and advanced data-driven decision-making.
Optimized ETL processing speed by 30% through AWS Glue and Redshift tuning, reducing processing times and improving data availability.
Implemented cloud cost-optimization strategies that cut expenses by 20% without compromising performance, leveraging AWS cost management tools and usage monitoring.
Environment: AWS, EBS, EC2, CloudWatch, CloudTrail, S3, SNS, Redshift, Snowflake, Celery, RabbitMQ, Redis, DynamoDB, Lambda, Glue, EMR, SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, PySpark, Hive, Golang, Scala, JSON, CSV, Parquet, Avro, Teradata, Oozie, Tableau, Power BI, Star Schema, Snowflake Schema, Ralph Kimball, Bill Inmon, REST APIs, Kubernetes, Jenkins, Jira, Git, SVN, Bitbucket.
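The EMR/PySpark bullet above refers to jobs of the following shape. This is a minimal, hypothetical sketch only: the S3 paths, column names, and aggregation are placeholders, not the actual American Airlines pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: roll up daily event counts from raw S3 data and
# write the result back to S3 as Parquet for downstream warehouse loads.
spark = SparkSession.builder.appName("daily-event-rollup").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/events/")           # placeholder path
rollup = (
    raw.withColumn("event_date", F.to_date("event_ts"))            # placeholder column
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)
rollup.write.mode("overwrite").partitionBy("event_date") \
      .parquet("s3://example-curated-bucket/event_rollup/")        # placeholder path

spark.stop()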
Client: AutoZone, Memphis, Tennessee Jul 2019 – Mar 2022
Role: Sr. Data Engineer
Responsibilities:
Developed and maintained ETL pipelines using Azure Data Factory and T-SQL, optimizing data flows for seamless integration and ensuring data accuracy and integrity across multiple sources.
Implemented complex data transformations and aggregations using Spark SQL and U-SQL, improving processing efficiency and enabling advanced analytics for large datasets (see the Spark SQL sketch at the end of this section).
Designed and executed data analytics solutions leveraging Azure Data Lake Analytics and Azure Data Storage, resulting in scalable and cost-effective data processing capabilities.
Managed and optimized data storage solutions on Azure Data Lake and Azure Storage, ensuring high availability, durability, and performance for critical data assets.
Built and maintained data warehouses using Azure SQL and Azure DW, providing robust data management and reporting capabilities for enterprise-wide business intelligence.
Utilized Azure Databricks for big data processing and analytics, integrating with Azure SQL Data Warehouse to deliver real-time insights and scalable data solutions.
Implemented and managed MySQL and PostgreSQL databases, focusing on performance tuning, data migration, and backup strategies to support business continuity.
Deployed and maintained NoSQL databases like MongoDB and Cassandra, enabling flexible and scalable storage solutions for unstructured and semi-structured data.
Created interactive dashboards and visualizations using Tableau and Power BI, transforming complex data sets into actionable insights for stakeholders.
Developed data models and reports in Looker, integrating with Amazon Redshift to deliver scalable analytics solutions and enhance data-driven decision-making.
Designed and implemented data flow solutions with Apache NiFi, leveraging Scala for efficient data processing and transformation in big data environments.
Developed ETL processes using PySpark and SSIS, ensuring seamless data integration and transformation for complex data workflows and business intelligence initiatives.
Automated data pipeline deployments and CI/CD processes with Jenkins and Artifactory, enhancing development efficiency and ensuring consistent data delivery.
Implemented code quality and security checks using SonarQube, and automated infrastructure management with Chef, improving overall system reliability and maintainability.
Leveraged U-SQL and Azure Data Lake Analytics to process and analyze big data, delivering high-performance solutions for data-driven applications.
Integrated Azure DW with Azure Databricks for advanced data analytics and machine learning, providing robust and scalable data solutions for diverse business requirements.
Administered Oracle and MySQL databases, focusing on performance optimization, data security, and reliable backup and recovery processes.
Implemented and optimized PostgreSQL and MongoDB databases, ensuring high availability and performance for diverse data workloads.
Developed interactive reports and dashboards with Power BI and Looker, enabling real-time data exploration and analytics for business users.
Automated data pipelines and workflows using Apache Airflow and Apache NiFi, ensuring efficient data processing and integration across various sources.
Developed data processing solutions using Scala and PySpark, enhancing the performance and scalability of big data applications and workflows.
Environment: Azure Data Factory, T-SQL, Spark SQL, U-SQL, Azure Data Lake Analytics, Azure Data Storage, Azure Data Lake, Azure Storage, Azure SQL, Azure DW, Azure Databricks, Azure SQL Data Warehouse, SQL Server, Oracle, MySQL, PostgreSQL, MongoDB, Cassandra, Tableau, Power BI, Looker, Amazon Redshift, Apache Airflow, Apache NiFi, Scala, PySpark, SSIS, Jenkins, Artifactory, SonarQube, Chef, Puppet, SQL Server Analysis Services (SSAS).
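The Spark SQL bullet above refers to transformations of the following kind, typically run in Azure Databricks. This is a hypothetical sketch: the storage account, container, table, and column names are placeholders rather than AutoZone objects.

from pyspark.sql import SparkSession

# Hypothetical Spark SQL aggregation over retail sales data in ADLS Gen2.
spark = SparkSession.builder.appName("store-sales-aggregation").getOrCreate()

spark.read.parquet("abfss://raw@exampleaccount.dfs.core.windows.net/sales/") \
     .createOrReplaceTempView("raw_sales")                          # placeholder source

daily_sales = spark.sql("""
    SELECT store_id,
           CAST(sale_ts AS DATE) AS sale_date,
           SUM(sale_amount)      AS total_sales,
           COUNT(*)              AS transactions
    FROM raw_sales
    GROUP BY store_id, CAST(sale_ts AS DATE)
""")

daily_sales.write.mode("overwrite") \
    .parquet("abfss://curated@exampleaccount.dfs.core.windows.net/daily_sales/")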
Client: Sprint Telecom, Overland Park, KS Apr 2017 – Jul 2019
Role: Sr. Data Engineer
Responsibilities:
Utilized Python and PySpark to design and implement efficient ETL pipelines, processing large datasets and ensuring data integrity and quality across diverse sources.
Leveraged Apache Airflow to schedule and manage complex workflows, ensuring seamless data processing and integration across data platforms such as GCP and AWS (see the Airflow sketch at the end of this section).
Implemented Google Cloud Storage solutions for scalable and secure data storage, integrating with Dataproc for distributed data processing and analytics.
Deployed and managed Snowflake Cloud and AWS Redshift for scalable and high-performance data warehousing, enabling advanced analytics and reporting.
Designed and implemented data integration solutions using Informatica Intelligent Cloud Services (IICS), facilitating seamless data migration and transformation.
Created interactive and insightful dashboards in Power BI, providing stakeholders with real-time data insights and visual analytics.
Implemented AWS Neptune and Gremlin for advanced graph database solutions, enabling complex relationship and network analysis.
Utilized Grafana for monitoring and visualizing the performance of data pipelines and systems, ensuring high availability and optimal performance.
Deployed and managed Cassandra for high availability and scalability in handling large volumes of unstructured data.
Developed and maintained CGI scripts for integrating legacy systems with modern data platforms, ensuring data consistency and availability.
Designed and optimized schemas for SQL Server, Oracle, MySQL, and PostgreSQL databases, ensuring efficient data storage and retrieval.
Utilized Terraform Cloud and Terraform Enterprise to automate infrastructure deployment and management, ensuring consistency and scalability.
Implemented microservices in Golang, leveraging Kafka for messaging and data streaming, ensuring robust and scalable data processing.
Developed data integration workflows using Informatica IICS, enabling seamless data flow between on-premises and cloud environments.
Architected data solutions on Google Cloud Platform (GCP), leveraging Cloud Storage, Dataproc, and Dataflow for end-to-end data processing.
Designed and implemented data warehousing solutions using Snowflake Cloud and AWS Redshift, optimizing for performance and cost.
Developed real-time analytics platforms using Kafka for data streaming and processing, enabling timely data-driven decision-making.
Created comprehensive dashboards and reports in Power BI, transforming raw data into actionable insights for business stakeholders.
Employed Grafana to monitor and visualize system metrics, ensuring data pipeline reliability and performance.
Implemented scalable NoSQL databases like Cassandra and MongoDB, supporting high-throughput data operations.
Managed the deployment and scaling of applications on Glassfish and Tomcat servers, ensuring high availability.
Developed CGI scripts to bridge legacy systems with modern data architectures, ensuring continuous data flow.
Administered SQL Server, Oracle, MySQL, and PostgreSQL databases, optimizing for performance and reliability.
Leveraged Terraform Cloud and Enterprise to automate the provisioning and management of cloud infrastructure, enhancing operational efficiency.
Environment: Python, PySpark, Airflow, Google Cloud Platform (GCP), Cloud Storage, Dataproc, Dataflow, Snowflake Cloud, AWS Redshift, Informatica Intelligent Cloud Services (IICS), Power BI, Neptune, Gremlin, Grafana, Cassandra, Pig, Hive, Glassfish, Tomcat, CGI, SQL Server, Oracle, MySQL, PostgreSQL, MongoDB, Terraform Cloud, Terraform Enterprise, Golang, Kafka
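The Airflow bullet above refers to DAGs of the following shape. This is a minimal, hypothetical sketch assuming Airflow 2.x-style imports; the DAG id, task names, and task bodies are placeholders.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    print("pulling source data")        # placeholder for an extract step

def load(**context):
    print("loading into warehouse")     # placeholder for a load step

with DAG(
    dag_id="daily_warehouse_load",      # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task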
Client: Advent Health, Orlando, FL Sep 2015 – Apr 2017
Role: Data Engineer
Responsibilities:
Developed and maintained data pipelines using Python and Apache Kafka to handle high-throughput data ingestion for real-time analytics, ensuring minimal latency and high availability (see the Kafka consumer sketch at the end of this section).
Utilized Django and Flask to create RESTful APIs for data processing applications, facilitating efficient data exchange and integration across microservices architecture.
Deployed, monitored, and managed scalable applications on AWS EC2 and EMR, optimizing resource allocation and cost through auto-scaling and spot instances.
Implemented data storage solutions on AWS using S3, RDS, DynamoDB, and Redshift, ensuring data durability, high availability, and optimized query performance.
Employed Apache Flink and NiFi to design and manage complex data flows, enabling real-time data processing and seamless integration across multiple data sources.
Utilized QuerySurge to automate the testing of data warehouses and ETL processes, ensuring data accuracy and integrity throughout the data lifecycle.
Integrated GitHub for version control and collaborative development, ensuring codebase integrity and streamlined workflow through continuous integration and branching strategies.
Containerized applications with Docker to ensure consistency across development, testing, and production environments, facilitating easier deployment and scaling.
Applied Agile (SCRUM) methodologies to manage data engineering projects, enhancing team collaboration and adaptability to changing requirements and priorities.
Utilized Scikit-learn for developing and deploying machine learning models, enabling predictive analytics and data-driven decision-making processes.
Implemented real-time data streaming solutions with AWS Kinesis, enabling low-latency data processing and analytics for time-sensitive applications.
Managed data warehousing solutions on AWS Redshift, optimizing ETL processes and query performance for large-scale data analytics.
Configured and managed AWS RDS instances, ensuring high availability, automated backups, and efficient performance tuning for relational databases.
Developed ETL workflows using NiFi, automating data ingestion, transformation, and loading processes across diverse data sources and destinations.
Employed Flink for stateful stream processing, handling complex event-driven applications with high throughput and fault tolerance.
Designed and implemented distributed data storage solutions with Cassandra, ensuring high availability and horizontal scalability for large datasets.
Developed data processing scripts in Python, leveraging libraries like Pandas and NumPy for efficient data manipulation and analysis.
Utilized Jenkins pipelines to automate testing and deployment of data engineering solutions, improving CI/CD workflows and reducing time to market.
Deployed and managed Docker containers on AWS ECS, ensuring scalable and resilient microservices architecture for data processing applications.
Implemented data replication and synchronization strategies across multiple databases (Oracle SQL, MongoDB, MySQL, MS SQL) to ensure data consistency and reliability.
Environment: Python, Django, Flask, Kafka, AWS (EC2, Route 53, S3, RDS, DynamoDB, EMR, Kinesis, Redshift), Flink, NiFi, QuerySurge, Oracle SQL, MongoDB, MySQL, MS SQL, Cassandra, GitHub, Jenkins, Docker, Agile (SCRUM), Scikit-learn.
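The Kafka ingestion bullet above refers to consumers of the following shape. This is a minimal, hypothetical sketch assuming the kafka-python client; the topic, broker address, consumer group, and record fields are placeholders.

import json
from kafka import KafkaConsumer   # kafka-python client (assumed library choice)

# Hypothetical streaming-ingestion consumer for JSON events.
consumer = KafkaConsumer(
    "patient-events",                              # placeholder topic
    bootstrap_servers=["localhost:9092"],          # placeholder broker
    group_id="ingestion-service",                  # placeholder consumer group
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # placeholder transform / write-out step
    print(record.get("event_type"), message.offset)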
Client: Invesco India Pvt. Ltd, Hyderabad, India Jun 2013 – Dec 2014
Role: Data Engineer
Responsibilities:
Developed and optimized complex SQL queries for data extraction, transformation, and loading (ETL) processes, ensuring data integrity and improving query performance across PostgreSQL databases.
Managed and maintained Oracle databases, performed regular backups and recovery operations, and implemented security measures to protect sensitive data.
Leveraged Pandas and NumPy for efficient data manipulation and analysis, enabling the processing of large datasets and facilitating exploratory data analysis (EDA) tasks.
Created insightful data visualizations using Matplotlib and Seaborn, effectively communicating complex data trends and patterns to stakeholders through interactive plots and charts.
Implemented real-time data streaming solutions using AWS Kinesis, allowing for the ingestion and processing of large volumes of data with minimal latency (see the Kinesis sketch at the end of this section).
Deployed and managed Apache NiFi workflows on AWS for seamless data integration and automated data flow between various sources and destinations, enhancing data pipeline efficiency.
Provisioned and managed scalable AWS EC2 instances to support data processing and analysis workloads, ensuring high availability and performance of data engineering tasks.
Utilized AWS S3 for data storage and retrieval, implementing cost-effective and scalable solutions for storing vast amounts of structured and unstructured data.
Managed and optimized AWS RDS and DynamoDB databases for high-performance data storage and retrieval, ensuring low-latency access to critical data.
Designed and implemented data workflows using Apache Oozie, orchestrating complex ETL processes and ensuring timely execution of data pipelines.
Worked with Apache Cassandra to handle large-scale, distributed data storage needs, ensuring high availability and fault tolerance of critical data services.
Collaborated with team members using GitHub for version control, managing code repositories, and ensuring seamless integration and deployment of data engineering projects.
Automated data pipeline deployments using Jenkins, setting up continuous integration and continuous delivery (CI/CD) pipelines to streamline development and operational workflows.
Conducted data validation and testing using QuerySurge, ensuring data accuracy and consistency across ETL processes and database migrations.
Containerized data engineering applications using Docker, facilitating consistent deployment environments and improving scalability and portability of data solutions.
Managed and optimized AWS cloud infrastructure components including S3, EC2, RDS, and DynamoDB, ensuring reliable and cost-effective data storage, processing, and analysis solutions.
Environment: SQL, SAS, Oracle, PostgreSQL, Pandas, NumPy, Matplotlib, Seaborn, AWS Kinesis, NiFi, EC2, Route 53, S3, RDS, DynamoDB, Oozie, MongoDB, Cassandra, GitHub, Jenkins, QuerySurge, Docker.
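The Kinesis bullet above refers to producers of the following shape. This is a minimal, hypothetical sketch using boto3; the region, stream name, and payload fields are placeholders.

import json
import boto3

# Hypothetical producer that pushes JSON events into a Kinesis stream.
kinesis = boto3.client("kinesis", region_name="us-east-1")   # placeholder region

def publish_event(event: dict) -> None:
    """Send one JSON-encoded event to the stream, keyed by account id."""
    kinesis.put_record(
        StreamName="example-transactions-stream",            # placeholder stream
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("account_id", "unknown")),
    )

publish_event({"account_id": 42, "amount": 125.50, "type": "purchase"})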