Senior Data Engineer with Cloud & ETL Expertise

Location:
San Antonio, TX, 78205
Posted:
January 30, 2026

Resume:

Rajesh Kuppala

Sr. Data Engineer

***************@*****.***

469-***-****

PROFESSIONAL SUMMARY:

10+ years of hands-on experience as a Senior Data Engineer, specializing in data processing, ETL development and data analytics across cloud platforms including AWS, Azure and Google Cloud.

Skilled in programming languages such as Python, Scala, SQL, PL/SQL, PowerShell and JavaScript, leveraging these tools for efficient data manipulation and analysis.

Experienced in building end-to-end data workflows including Excel-based business forms, SQL stored procedures, automated pipelines, scheduling, and source-to-target data validation in Azure environments.

Proficient in leveraging AWS services including Amazon S3, Amazon Redshift, AWS Lambda and AWS Data Pipeline, along with Azure Data Factory and Google Cloud Dataflow for scalable data solutions.

Expertise in designing and optimizing data warehousing solutions using Snowflake, Amazon Redshift and BigQuery, resulting in improved data retrieval and reporting efficiency.

Extensive experience in using Apache Spark and PySpark for large-scale data processing, along with proficiency in Apache Hive, Apache Pig and AWS Glue for data transformation and integration.

Successfully designed and implemented robust ETL pipelines utilizing tools like SSIS, Talend and Informatica, ensuring seamless data flow and high data quality.

Strong background in database technologies such as MySQL, Oracle, SQL Server, PostgreSQL, MongoDB and Azure Cosmos DB, ensuring reliable data storage and management.

Skilled in creating insightful dashboards and reports using Tableau, Power BI, Looker and AWS QuickSight, facilitating data-driven decision-making for stakeholders.

Adept at working in Agile environments, utilizing Scrum and Kanban frameworks to enhance team collaboration and project delivery efficiency, with proficiency in JIRA and Confluence for project tracking and documentation.

Strong verbal and written communication skills facilitate collaboration with cross-functional teams and simplify complex technical concepts for non-technical stakeholders.

Demonstrated ability to identify challenges and implement innovative solutions, showcasing a proactive approach to enhancing data processes and system performance.

TECHNICAL SKILLS:

Programming Languages: Python, Scala, SQL, PL/SQL, PowerShell, JavaScript

Data Processing and Analytics: Apache Spark, PySpark, Apache Hive, Apache Pig, AWS Glue, AWS Lambda, Amazon Kinesis, AWS Data Pipeline, Apache Airflow, Google Cloud Dataflow, Google Dataprep, Azure Synapse Analytics (Notebooks, Pipelines), Azure Data Lake Storage (ADLS Gen2)

ETL Tools: SSIS, Talend, Informatica

Cloud Platforms: AWS: Amazon S3, Amazon Redshift, AWS CodeBuild, AWS CloudFormation, AWS CDK, AWS Athena, AWS IAM, AWS KMS, AWS Lake Formation, AWS QuickSight; Azure: Azure Data Factory (ADF), Azure Blob Storage, Azure Data Lake Storage (ADLS), Azure Active Directory (AAD), Azure Key Vault; GCP: Google Cloud Storage, Google Pub/Sub, Google DataProc

Data Warehousing: Snowflake, Amazon Redshift, BigQuery

Databases: MySQL, Oracle, SQL Server, PostgreSQL, MongoDB, Azure Cosmos DB

CI/CD and Version Control: Git, Jenkins, Azure DevOps, Terraform

Data Visualization: Tableau, Power BI, Looker, AWS QuickSight

Reporting / Validation: MSRA, Cube Validation

Other Technologies: Agile, Scrum, Kanban, JIRA, Confluence

WORK EXPERIENCE:

Microsoft, Seattle, WA Jan 2025 – Present

Sr. Data Engineer

Responsibilities:

Developed and implemented end-to-end data ingestion pipelines using PySpark in Azure Synapse Notebooks to onboard multiple source systems into enterprise databases, ensuring scalable and reliable data processing.
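
A minimal sketch of this ingestion pattern, assuming hypothetical ADLS paths and table names (the abfss URL, source system, and staging table below are placeholders, not the actual project objects):

    # Hypothetical Synapse notebook cell: land raw source files from ADLS into a staging table.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # in Synapse Notebooks the session is pre-provided

    raw_path = "abfss://raw@examplelake.dfs.core.windows.net/source_system/2025/01/"  # placeholder path

    df = (
        spark.read
        .option("header", "true")
        .csv(raw_path)
        .withColumn("load_ts", F.current_timestamp())    # audit column used later for reconciliation
        .withColumn("source_file", F.input_file_name())  # lineage back to the landed file
    )

    # Overwrite the staging layer; curated/main tables are loaded downstream by stored procedures.
    df.write.mode("overwrite").saveAsTable("staging.source_system_daily")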

Designed and maintained SQL-based solutions by writing complex queries and stored procedures (MERGE logic, partition switch operations) to load data efficiently from Azure Data Lake Storage (ADLS) into staging and target tables.
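
The staging-to-target load could take roughly the following shape, shown as a Python sketch that submits a MERGE statement over pyodbc; the server, database, tables, and key column are illustrative assumptions, not the actual stored procedures:

    # Illustrative upsert from a staging table into a target table (all names are placeholders).
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=example-synapse.sql.azuresynapse.net;DATABASE=dw;"
        "Authentication=ActiveDirectoryInteractive;"  # assumes AAD authentication is configured
    )

    merge_sql = """
    MERGE dbo.customer AS tgt
    USING staging.customer AS src
        ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN
        UPDATE SET tgt.name = src.name, tgt.updated_at = src.load_ts
    WHEN NOT MATCHED THEN
        INSERT (customer_id, name, updated_at)
        VALUES (src.customer_id, src.name, src.load_ts);
    """

    cur = conn.cursor()
    cur.execute(merge_sql)
    conn.commit()
    cur.close()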

Built and orchestrated GNA pipelines to automate data movement from ADLS to staging layers, supporting seamless integration between Synapse, databases, and downstream analytics systems.

Implemented job orchestration and scheduling to run source-to-target pipelines automatically each day at 9 AM, enabling fully automated end-to-end workflows from Synapse notebooks to curated main tables.

Optimized PySpark transformations and SQL queries by eliminating unnecessary joins, views, and redundant computations, significantly reducing execution time and improving overall pipeline performance.
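
One common shape of such an optimization, sketched with made-up table and column names: prune columns before the join and broadcast the small dimension so Spark avoids shuffling the large fact table.

    # Hypothetical join optimization: early column pruning plus a broadcast join.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    facts = spark.table("staging.transactions").select("txn_id", "account_id", "amount")
    dims = spark.table("staging.accounts").select("account_id", "segment")

    enriched = (
        facts.join(F.broadcast(dims), "account_id")  # broadcast keeps the large side un-shuffled
        .groupBy("segment")
        .agg(F.sum("amount").alias("total_amount"))
    )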

Performed data validation and reconciliation between source and target datasets using MSRA reports and cube validation tools, ensuring data accuracy, completeness, and compliance with business rules.

Collaborated with business and analytics teams to troubleshoot data discrepancies and resolve data quality issues across multiple environments.

Developed end-to-end Excel-based forms for business users, including building required filters, SQL queries, and stored procedures to support data entry, validation, and controlled data submission based on requirements.

Implemented complete data flow and automation behind the forms by integrating Synapse notebooks, GNA pipelines, stored procedures, scheduling, and validations, ensuring data moved correctly from Excel inputs to staging and final tables with accurate results.

Used GitHub integrated with VS Code for version control, managing feature development and promoting code across Dev, UAT, and Main branches following standard branching and PR practices.

Deployed and maintained data solutions across multiple environments while ensuring consistency, reliability, and adherence to enterprise deployment standards.

Worked extensively with Azure services including Azure Synapse Analytics, ADLS, Azure SQL, Azure Active Directory (AAD), and Azure DevOps, supporting secure and scalable cloud-native data platforms.

Participated in Agile/Scrum ceremonies, collaborating with cross-functional teams for sprint planning, code reviews, and production releases.

Environment: PySpark, SQL, Azure Synapse, Azure Synapse Notebooks, ADLS, MySQL / Azure SQL, GNA Pipelines, MSRA, Cube Validation Tools, GitHub, VS Code, Azure DevOps, Azure Active Directory, Agile.

USAA, San Antonio, TX Mar 2024 – Dec 2025

Sr. Data Engineer

Responsibilities:

Developed and implemented over 20 data processing scripts using Python, improving data quality and consistency across the project.

Designed and optimized data processing workflows on Amazon EMR with Apache Spark, improving performance and scalability for large datasets.

Managed and processed millions of records using Hadoop technologies, including Hive and Pig, resulting in actionable insights that drove an increase in operational efficiency.

Developed and maintained data lakes on Amazon S3, facilitating cost-effective storage and improving data accessibility for analytics.

Automated serverless data workflows using AWS Lambda enhancing data ingestion efficiency and reducing manual processing time.
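
A hedged sketch of an S3-triggered Lambda handler in Python; the bucket layout and the copy-to-curated step are assumptions for illustration only, not the actual workflow.

    # Illustrative Lambda handler: move newly landed S3 objects into a curated prefix.
    import json
    import urllib.parse
    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            # Placeholder transformation step: here the object is simply copied forward.
            s3.copy_object(
                Bucket=bucket,
                Key=f"curated/{key.split('/')[-1]}",
                CopySource={"Bucket": bucket, "Key": key},
            )
        return {"statusCode": 200, "body": json.dumps("ok")}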

Developed interactive dashboards and reports using AWS QuickSight, enhancing stakeholder insights and driving an increase in data-driven decision-making.

Utilized Git for version control, enabling collaboration and tracking changes in data pipeline code and configurations.

Conducted performance tuning and optimization of data processes ensuring timely and efficient data delivery to stakeholders.

Collaborated with data scientists and analysts to understand data requirements and provide solutions that meet business needs.

Created and maintained CI/CD pipelines with Jenkins, automating deployment and testing processes, which improved software delivery times.

Participated in Agile and Scrum methodologies, contributing to sprint planning and daily stand-ups for effective project management.

Tracked project progress and managed tasks using JIRA, facilitating communication and collaboration within cross-functional teams.

Environment: SQL, Python, Apache Spark, Amazon EMR, AWS Glue, Amazon S3, Amazon Redshift, AWS Lambda, Amazon RDS, MySQL, MongoDB, Jenkins, AWS CodeBuild, AWS CloudFormation, AWS KMS, AWS QuickSight, Amazon Kinesis, Git, JIRA.

State of Utah, Salt Lake City, UT Mar 2023 – Feb 2024

Data Engineer

Responsibilities:

Developed and maintained over 15 data pipelines using Azure Data Factory (ADF), orchestrating data workflows across multiple sources and improving processing efficiency.

Utilized Azure Key Vault for securely storing and managing sensitive information such as connection strings and API keys.

Created interactive reports and dashboards in Power BI, providing stakeholders with actionable insights from complex datasets.

Developed and deployed real-time data streaming applications using Kafka, integrating with various data sources, which improved data ingestion speed.
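
One possible shape of such a streaming job, sketched with PySpark Structured Streaming (requires the spark-sql-kafka connector on the cluster); the broker address, topic, and sink paths are illustrative assumptions.

    # Hypothetical streaming read from a Kafka topic into a data lake sink.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka_ingest_demo").getOrCreate()

    stream = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "events")                      # placeholder topic
        .load()
        .select(F.col("value").cast("string").alias("payload"),
                F.col("timestamp").alias("event_ts"))
    )

    query = (
        stream.writeStream
        .format("parquet")
        .option("path", "/mnt/datalake/events/")            # placeholder sink
        .option("checkpointLocation", "/mnt/datalake/_chk/events/")
        .start()
    )
    query.awaitTermination()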

Designed and optimized data models in Snowflake, enabling efficient data warehousing and analytics.

Managed Azure Active Directory (AAD) for user authentication and access control ensuring secure access to data resources.

Developed complex SQL queries for data extraction, transformation and analysis in relational and non-relational databases enhancing data retrieval efficiency.

Employed Apache Spark and PySpark for processing large datasets, ensuring high performance and scalability in data transformations and leading to an increase in processing speed.

Automated data workflows and processes using PowerShell, improving operational efficiency and reducing manual intervention.

Managed and optimized data processing in Hadoop and Hive ensuring efficient handling of large datasets.

Utilized JIRA for tracking project tasks, bugs and feature requests, facilitating effective communication within the team.

Integrated Azure Cosmos DB for handling unstructured data, providing high availability and low-latency data access that improved application performance.

Designed and implemented CI/CD pipelines using Azure DevOps, automating testing and deployment of data applications, which decreased deployment time.

Collaborated with cross-functional teams to design and implement data governance policies and best practices.

Participated in Agile and Scrum methodologies, contributing to sprint planning and daily stand-ups to ensure timely project delivery.

Environment: SQL, Python, PySpark, Azure Data Factory, Azure Key Vault, Power BI, Azure Data Lake Storage (ADLS), Kafka, Snowflake, Azure Active Directory (AAD), Terraform, Apache Spark, PowerShell, Hadoop, Hive, Azure Cosmos DB, Azure DevOps, JIRA.

Thomson Reuters, India Sep 2019 – Dec 2022

Data Engineer

Responsibilities:

Developed distributed data processing workflows using Hadoop and Spark to analyze large datasets and optimize data operations.

Managed data storage and retrieval systems using Google Cloud Storage ensuring scalability and security for cloud-based data access.

Designed and deployed infrastructure as code (IaC) using Terraform, automating resource provisioning on Google Cloud Platform.

Wrote and optimized Pig scripts to process large-scale data within Cloudera Distribution of Hadoop (CDH) environments.

Developed and managed ETL processes using Informatica ensuring seamless data integration and transformation across various sources to support data warehousing and analytics initiatives.

Utilized Google Cloud Platform tools such as BigQuery, Dataflow, Pub/Sub and DataProc to create and manage efficient cloud data solutions.
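
A small sketch of querying BigQuery from Python with the google-cloud-bigquery client; the project, dataset, table, and columns are placeholders, not actual project objects.

    # Illustrative BigQuery query; all object names are hypothetical.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up the active GCP project and credentials

    sql = """
        SELECT customer_region, COUNT(*) AS orders
        FROM `demo_project.sales.orders`
        WHERE order_date >= '2022-01-01'
        GROUP BY customer_region
        ORDER BY orders DESC
    """

    for row in client.query(sql).result():
        print(row.customer_region, row.orders)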

Monitored system performance with Grafana ensuring optimal operation of data systems and timely resolution of performance issues.

Developed interactive dashboards and reports using Looker, enabling real-time data insights for business stakeholders.

Created and optimized Hive queries for extracting insights from datasets stored within the Hadoop ecosystem.

Employed Google Dataprep to clean, prepare and transform raw data for seamless downstream processing and analysis.

Participated in Agile and Scrum methodologies using Jira for task tracking and sprint management ensuring project goals were met.

Integrated data systems and applications by developing and maintaining REST APIs for seamless data exchange.
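
A minimal sketch of what such an endpoint could look like, using Flask; the route and stubbed response are assumptions for illustration, not the actual service.

    # Hypothetical Flask endpoint exposing curated data to downstream consumers.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/v1/orders/<region>", methods=["GET"])
    def orders_by_region(region):
        # The real service would query the warehouse; a stub payload is returned here.
        return jsonify({"region": region, "orders": []})

    if __name__ == "__main__":
        app.run(port=8080)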

Wrote and optimized PL/SQL procedures within SQL Server, improving the performance of queries and data operations.

Used Git and Jenkins for version control and CI/CD pipeline automation ensuring smooth deployments of data workflows.

Environment: Hadoop, Spark, Google Cloud Storage, Terraform, Pig, Informatica, Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, DataProc), Grafana, Looker, Hive, Google Dataprep, Agile, Scrum, Jira, REST APIs, PL/SQL, SQL Server, Git, Jenkins.

NewVision Software, India Apr 2016 – Aug 2019

Data Engineer

Responsibilities:

Processed and analyzed big data using Apache Spark and Scala, optimizing data transformations and aggregations for improved performance.

Automated data ingestion workflows with AWS Lambda functions, integrating data pipelines with S3 for efficient data storage and retrieval.

Utilized Python libraries such as NumPy, SciPy and Pandas for data manipulation, statistical analysis and data preparation for models.
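
A short sketch of this kind of preparation step, run against a made-up transactions extract; the file name, columns, and derived features are illustrative assumptions.

    # Hypothetical cleanup and feature-preparation step on a transactions extract.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("transactions.csv", parse_dates=["txn_date"])  # placeholder file

    df = df.drop_duplicates(subset="txn_id")
    df["amount"] = df["amount"].fillna(0.0)
    df["log_amount"] = np.log1p(df["amount"])            # stabilize a skewed amount distribution
    df["is_weekend"] = df["txn_date"].dt.dayofweek >= 5  # simple derived feature

    summary = df.groupby("is_weekend")["amount"].agg(["count", "mean"])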

Designed and implemented data pipelines for seamless ingestion from on-premise sources to Hadoop ecosystems using Hive and Sqoop.

Developed and maintained scalable ETL processes to extract, transform and load data into AWS Redshift and Oracle databases.

Created and optimized complex SQL and PL/SQL queries for data extraction, validation and reporting across AWS Redshift and Oracle systems.

Monitored and troubleshot data pipelines, addressing issues in performance, data accuracy and system integration within data warehousing environments.

Documented ETL workflows, data models and technical processes to ensure compliance with data governance and facilitate team knowledge sharing.

Collaborated with cross-functional teams to ensure data pipelines' smooth operation, supporting real-time data warehousing and processing solutions.

Developed data visualizations and interactive dashboards in Tableau, enabling stakeholders to make data-driven decisions based on insights.

Environment: Scala, Python, SQL, PL/SQL, Apache Spark, Hadoop, Hive, Sqoop, AWS Lambda, Amazon S3, AWS Redshift, Oracle, NumPy, SciPy, Pandas, Tableau.

EDUCATION:

Master of Computer and Information Sciences, Southern Arkansas University, Arkansas.

Bachelor of Information Technology, CGIT, JNTU, India.


