Post Job Free

Senior Data Engineer - Multi-Cloud Analytics Leader

Location:
San Antonio, TX
Salary:
$80,000
Posted:
March 19, 2026

Contact this candidate

Resume:

CHAITHANYA KONDA

Data Engineer

+1-520-***-**** *****************@*****.*** LinkedIn

CAREER OBJECTIVE

Results-driven Data Engineer with 5+ years of experience designing, building, and optimizing scalable data pipelines and infrastructure on AWS, Azure, and GCP. Skilled in transforming raw data into actionable insights through ETL processes, data warehousing, and advanced analytics solutions.

PROFILE SUMMARY

Senior Data Engineer with 5+ years of experience designing, building, and optimizing scalable, high-performance data pipelines across multi-cloud environments (Azure, AWS, GCP).

Domain expertise in healthcare, e-commerce, and financial services, with a focus on delivering analytics-ready data platforms with strong governance and data quality controls.

Current specialization as an Azure Data Engineer, proficient in Azure Databricks, ADF, ADLS, Synapse, Delta Lake, dbt, and Terraform for end-to-end data solution delivery.

Proven expertise in big data processing using PySpark/Spark, and building batch & streaming pipelines with Kafka, Kinesis, and Dataflow.

Extensive experience with cloud-native data warehousing solutions including Azure Synapse, Amazon Redshift, and BigQuery.

Strong practitioner of modern DevOps practices: implementing CI/CD pipelines, Infrastructure as Code (IaC), and managing containerized workloads with Docker, Kubernetes, Azure DevOps, and GitHub Actions.

Delivered petabyte-scale analytics at Amazon, enabling near-real-time insights and optimizing large-scale data lakes on AWS.

Built foundational experience in financial services at State Street, developing enterprise ETL pipelines with SQL Server, SSIS, and Python to support regulatory reporting, trade settlement, and audit-ready data.

Highly collaborative professional with strong experience in Agile environments, effectively partnering with data scientists, analysts, and business stakeholders to drive data-driven decision-making, operational efficiency, and regulatory compliance.

PROFESSIONAL EXPERIENCE

Fresenius Medical Care Waltham, MA, USA

Azure Data Engineer July 2025 – Present

•Develop SQL and PySpark optimization strategies (broadcast joins, caching, adaptive query execution) in Azure Databricks to improve pipeline performance and reduce compute costs.
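
As a rough illustration of the broadcast-join heuristic behind this kind of tuning, here is a minimal Python sketch. The threshold mirrors Spark's default `spark.sql.autoBroadcastJoinThreshold` (10 MB); `choose_join_strategy` is a hypothetical helper, not a Spark API.

```python
# Sketch: decide whether a dimension table is small enough to broadcast.
# Broadcasting ships the small table to every executor, avoiding a shuffle
# of the large fact table; otherwise Spark falls back to a sort-merge join.

BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024  # Spark's default: 10 MB

def choose_join_strategy(dim_table_bytes: int) -> str:
    """Pick a join strategy based on the smaller table's size."""
    if dim_table_bytes <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast_hash_join"  # no shuffle of the fact table
    return "sort_merge_join"          # both sides shuffled and sorted on the key

print(choose_join_strategy(2 * 1024 * 1024))    # small dimension table
print(choose_join_strategy(500 * 1024 * 1024))  # large table
```

In Spark itself the same decision is forced with a `broadcast()` hint or left to adaptive query execution at runtime.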

•Manage schema evolution and data versioning using Delta Lake features such as time travel and merge operations for reliable analytics and auditing.
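
The MERGE (upsert) semantics referenced above can be sketched in plain Python, independent of Delta Lake itself; the record shapes and function name below are hypothetical.

```python
# Sketch of Delta Lake MERGE (upsert) semantics:
# match rows on a key, update matched rows, insert unmatched ones.

def merge_upsert(target: dict, updates: list, key: str) -> dict:
    """target: {key_value: row}; updates: list of row dicts."""
    merged = dict(target)
    for row in updates:
        # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
        merged[row[key]] = row
    return merged

target = {1: {"id": 1, "status": "open"}, 2: {"id": 2, "status": "open"}}
updates = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
print(merge_upsert(target, updates, "id"))
```

In Delta Lake the equivalent is a single `MERGE INTO` statement, with each version of the table retained for time travel.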

•Leverage dbt and SQL-based transformations to standardize analytics-ready datasets in Azure Synapse and BigQuery.

•Monitor and troubleshoot pipelines using Azure Monitor, Log Analytics, and GCP Cloud Monitoring, ensuring SLA compliance and proactive issue resolution.

•Implement CI/CD pipelines using Azure DevOps and GitHub Actions to automate deployment of ADF pipelines, Databricks notebooks, and infrastructure changes.

•Containerize data workloads using Docker and deploy scalable processing jobs for batch analytics and experimentation.

•Apply Infrastructure as Code (IaC) using Terraform to provision and manage Azure and GCP data resources consistently across environments.

•Enforce data quality checks using Great Expectations and custom Python validation frameworks, reducing downstream data defects.

•Integrate REST APIs using Python and Azure Functions to ingest third-party healthcare and research data into Azure and GCP platforms.

•Collaborate in Agile/Scrum environments using JIRA and Confluence, partnering with data scientists, analysts, and clinicians to deliver analytics solutions.

•Support BI and reporting workloads using Power BI, Tableau, and SQL, enabling data-driven operational and clinical decision-making.

Amazon Hyderabad, TG, India

AWS Data Engineer Jan 2021 - Jul 2023

•Designed and implemented scalable batch data pipelines using AWS Glue and PySpark to process daily e-commerce transactions across global marketplaces, enabling analytics at petabyte scale.

•Implemented dynamic partitioning strategies in Amazon S3, optimizing storage costs and significantly improving query performance for downstream analytics teams.
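
The Hive-style partition layout that such S3 strategies rely on can be sketched in a few lines of Python; bucket, table, and partition column names here are hypothetical.

```python
from datetime import date

def partition_prefix(bucket: str, table: str, d: date, region: str) -> str:
    """Build a Hive-style S3 partition prefix (key=value path segments),
    so query engines can prune partitions instead of scanning everything."""
    return (f"s3://{bucket}/{table}/"
            f"region={region}/year={d.year}/month={d.month:02d}/day={d.day:02d}/")

print(partition_prefix("analytics-lake", "orders", date(2023, 3, 5), "us-east-1"))
# s3://analytics-lake/orders/region=us-east-1/year=2023/month=03/day=05/
```

Engines like Athena, Glue, and Spark recognize this `key=value` layout and skip partitions that a query's filters rule out, which is where both the cost and speed gains come from.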

•Enabled near-real-time inventory tracking by integrating Kafka streams and Amazon Kinesis with AWS Lambda and DynamoDB, reducing fulfillment delays by 20% during peak sales events.

•Led migration of legacy on-premises sales data to Amazon Redshift using AWS DMS and Step Functions, applying dimensional data modeling to support cross-regional BI and reporting.

•Developed Python-based data validation frameworks integrated with Amazon CloudWatch and Grafana, reducing manual product catalog validation efforts by 80% and ensuring accurate REST API data mappings.
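
The core of a rule-based validation framework like the one described can be sketched as follows; the rule names, field names, and `validate` helper are hypothetical.

```python
# Minimal sketch of a rule-based validation pass over catalog records:
# each rule is a named predicate; records failing any rule are reported.

def validate(records, rules):
    """Return (valid_records, failures), where failures is a list of
    (record_index, rule_name) pairs for every rule a record failed."""
    valid, failures = [], []
    for i, rec in enumerate(records):
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            failures.extend((i, name) for name in failed)
        else:
            valid.append(rec)
    return valid, failures

rules = {
    "has_sku": lambda r: bool(r.get("sku")),
    "positive_price": lambda r: r.get("price", 0) > 0,
}
records = [{"sku": "A1", "price": 9.99}, {"sku": "", "price": 5.0}]
valid, failures = validate(records, rules)
print(len(valid), failures)  # 1 [(1, 'has_sku')]
```

In production the failure pairs would be emitted as metrics (e.g. to CloudWatch) so dashboards and alarms can track data quality over time.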

•Optimized Amazon Athena query performance by transforming raw JSON logs into partitioned Parquet datasets using AWS Glue DataBrew, accelerating ad-hoc analytics for logistics and delivery optimization.

•Built and maintained serverless data lake architectures using Amazon S3, AWS Lake Formation, and HDFS, centralizing clickstream and transactional data from Aurora PostgreSQL systems.

•Implemented HBase-backed real-time lookup tables to support customer journey mapping and low-latency analytics use cases.

•Collaborated with DevOps teams to containerize Spark workloads using Docker, AWS ECR, and Amazon EKS, enabling efficient batch processing of terabyte-scale pricing datasets.

•Supported machine learning workflows by integrating curated datasets with Amazon SageMaker for model training, retraining, and experimentation.

•Implemented data security and governance controls using AWS IAM and AWS KMS, ensuring encryption at rest and in transit and maintaining GDPR compliance.

•Designed and executed disaster recovery strategies by replicating Amazon Redshift clusters across regions, ensuring high availability and business continuity during regional outages.

State Street Corporation Hyderabad, TG, India

Data Engineer May 2018 - Dec 2020

•Designed and developed enterprise ETL pipelines using SQL Server and SSIS to consolidate transactional data from legacy ERP systems and CRM platforms, including REST API–based data extractions.

•Streamlined daily batch ingestion workflows, improving data accuracy and reliability for month-end and quarter-end financial reporting.

•Partnered with compliance and regulatory teams to translate SEC and SOX reporting requirements into structured SQL workflows using dimensional data modeling, delivering audit-ready datasets.

•Automated recurring NAV and fund accounting reports using Python and T-SQL, enabling straight-through processing and significantly reducing manual intervention.

•Optimized high-volume trading and transaction tables by implementing indexing strategies and query tuning, improving performance of client portfolio and risk dashboards during peak market hours.

•Built robust data validation and reconciliation checks using T-SQL, minimizing trade settlement discrepancies and reducing downstream operational delays.
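
The reconciliation logic behind checks like these can be sketched in plain Python (the T-SQL version would be a keyed outer join); trade IDs, amounts, and the tolerance below are hypothetical.

```python
# Sketch of a trade reconciliation check: compare booked trades against
# settlement records by trade id, flagging missing and mismatched trades.

def reconcile(booked: dict, settled: dict, tolerance: float = 0.01):
    """booked/settled: {trade_id: amount}. Returns (missing_ids, mismatched_ids):
    trades absent from settlement, and trades whose amounts differ beyond
    the tolerance."""
    missing = sorted(set(booked) - set(settled))
    mismatched = sorted(
        tid for tid in set(booked) & set(settled)
        if abs(booked[tid] - settled[tid]) > tolerance
    )
    return missing, mismatched

booked = {"T1": 100.00, "T2": 250.50, "T3": 75.25}
settled = {"T1": 100.00, "T2": 250.00}
print(reconcile(booked, settled))  # (['T3'], ['T2'])
```

Surfacing both categories separately matters operationally: missing trades need re-submission, while amount mismatches need investigation before settlement.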

•Integrated REST API–based alerting mechanisms to notify operations teams of reconciliation mismatches, enabling faster issue resolution.

•Standardized deployment and version control by migrating SSIS packages and SSRS reports to a centralized Azure DevOps repository, improving collaboration across global development teams.

•Implemented Infrastructure as Code (IaC) practices using Terraform to manage database and ETL-related infrastructure in alignment with SDLC standards.

•Resolved data latency issues in intraday trade settlement pipelines by tuning SSIS buffer configurations, achieving SLA compliance for time-sensitive workflows.

•Collaborated with business analysts in an Agile/Scrum environment to document data lineage and transformation logic for critical asset management datasets, improving audit traceability.

•Supported SQL Server upgrades (2012/2014 to 2016) by testing and refactoring legacy stored procedures, ensuring zero downtime during production migrations.

EDUCATION

The University of Arizona, Tucson, Arizona, USA July 2023 – May 2025

Master's in Information Science with a specialization in Machine Learning

SKILLS

Programming & Query Languages: Python, PySpark, SQL, T-SQL, Java, Scala, Spark

Cloud Platforms: Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP)

Azure (Current): Azure Data Factory (ADF), Azure Databricks, Snowflake, Azure Synapse Analytics, Azure Data Lake Storage Gen2 (ADLS), Azure Functions, Azure Monitor, Log Analytics, Azure DevOps

AWS: AWS Glue, Amazon S3, Amazon Redshift, Snowflake, Amazon Athena, AWS Lambda, AWS DMS, AWS Step Functions, Amazon Kinesis, DynamoDB, AWS Lake Formation, Amazon EMR, AWS KMS, AWS IAM, Amazon Aurora PostgreSQL, Amazon SageMaker, ECS

GCP: Google Cloud Storage (GCS), BigQuery, Dataflow (Apache Beam), Cloud Composer (Apache Airflow), GCP Cloud Monitoring

ETL, Orchestration & Data Modeling: SSIS, dbt, Apache Airflow, Dimensional Modeling, Star Schema, Data Lakehouse Architecture, Informatica, Talend, Kafka, Data Architecture, Data Engineering, ERM, Hive, Data Warehousing

DevOps & Infrastructure: Docker, Kubernetes (EKS), Terraform (IaC), GitHub Actions, Azure DevOps (Repos & Pipelines), CI/CD, Troubleshooting, Infrastructure As Code, Scripting

Data Quality, Governance & Security: Great Expectations, Data Validation Frameworks, Data Versioning & Schema Evolution, AWS KMS, Azure RBAC, IAM, GDPR Compliance

Monitoring & Logging: Azure Monitor, Log Analytics, Amazon CloudWatch, Grafana, GCP Cloud Monitoring

BI & Reporting: Power BI, Tableau, SSRS

Methodologies: Agile, Scrum, SDLC, Root Cause Analysis, Analytical Skills
