Data Engineer Azure

Location:
St. Louis, MO
Posted:
June 25, 2025

KOKKONDA REVANTH REDDY

Sr Data Engineer 314-***-**** **********************@*****.*** Linkedin

PROFESSIONAL SUMMARY

Highly accomplished Sr. Data Engineer with 10 years of experience architecting and implementing scalable data solutions across healthcare, finance, and retail on Azure, AWS, and GCP. Proven expertise in building and optimizing data pipelines, data warehouses, and data lakes, and in applying big data technologies, ETL tools, and advanced data processing techniques to improve data flow, efficiency, and performance. Experienced in architecting event-driven platforms with AWS MSK and Kinesis Data Streams, and in building machine learning-powered financial forecasting systems on Google AI Platform with TensorFlow to predict market volatility and inform real-time trading strategies.

Skilled in Infrastructure-as-Code (IaC) with AWS CDK, embedding unit tests to enable repeatable deployments across development, staging, and production environments. Expert in Snowflake security architecture using row-level security, column-level masking, and custom roles, and in building data transformation frameworks with Snowflake Stored Procedures in JavaScript and SQL. Experienced in designing and maintaining scalable data pipelines with Azure Data Factory (ADF) and Synapse Analytics that integrate diverse data sources into Azure Data Lake, and in implementing Delta Lake architecture on Databricks for ACID transactions, streaming, and batch processing. Proficient in developing automated data validation and transformation processes with PySpark and ADF to ensure accurate data integration, in building data visualizations with Azure Analysis Services and Power BI, and in using GCP services such as BigQuery, Dataflow, and Cloud Composer.

Proficient with Amazon S3 for secure, cost-effective data storage and management, with hands-on experience using Amazon ECR for containerized application deployment. Extensive knowledge of AWS Lambda for building serverless functions that streamline data processing and integration, Amazon EMR for large-scale data processing, Amazon Athena for efficient querying and analysis, and AWS Glue for robust ETL workflows, data transformations, and schema management. Experienced with Apache Iceberg on AWS for managing large datasets with improved storage efficiency and version control, and skilled in building event-driven architectures with Amazon EventBridge for seamless integration and real-time data flow between distributed systems.

PROFESSIONAL EXPERIENCE

SR DATA ENGINEER EXPRESS SCRIPTS ST. LOUIS, MO Jun 2023 – Present

Designed, developed, and maintained scalable data pipelines using Azure Data Factory (ADF) and Azure Synapse Analytics, integrating patient data, medical records, and clinical data from various healthcare systems into Azure Data Lake for advanced analytics and regulatory reporting.

Implemented Delta Lake architecture with Databricks to enable ACID transactions, streaming, and batch processing on healthcare data, creating Delta Tables for storing sensitive clinical and operational data with high reliability and consistency.
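
For illustration only, a minimal PySpark sketch of writing records into a Delta table on Databricks; the storage path, table name, and partition column are assumptions, not details from this role:

from pyspark.sql import SparkSession

# On Databricks the Delta extensions are preconfigured; shown here for completeness.
spark = (
    SparkSession.builder.appName("clinical-delta-ingest")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical raw-zone path in Azure Data Lake Storage.
raw = spark.read.json("abfss://raw@datalake.dfs.core.windows.net/ehr/encounters/")

# Delta provides ACID guarantees for concurrent batch and streaming writers.
(raw.write.format("delta")
    .mode("append")
    .partitionBy("encounter_date")
    .saveAsTable("clinical.encounters"))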

Integrated and maintained ERP systems with Azure Synapse Analytics and Azure Data Lake, centralizing clinical, billing, and inventory data for improved reporting, compliance, and business intelligence in healthcare settings.

Developed and implemented automated healthcare data validation and transformation processes using PySpark and Azure Data Factory, ensuring accurate and consistent integration of electronic health records (EHR) and patient data across multiple healthcare systems.
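
A simplified sketch of that kind of validation step in PySpark; the column names, value ranges, and paths are illustrative assumptions:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ehr-validation").getOrCreate()

ehr = spark.read.parquet("/mnt/datalake/staging/ehr/")

# Flag rows with missing identifiers or implausible values before loading.
validated = ehr.withColumn(
    "is_valid",
    F.col("patient_id").isNotNull()
    & F.col("admission_date").isNotNull()
    & F.col("age").between(0, 120),
)

validated.filter("is_valid").write.mode("overwrite").parquet("/mnt/datalake/curated/ehr/")
validated.filter(~F.col("is_valid")).write.mode("append").parquet("/mnt/datalake/quarantine/ehr/")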

Developed and maintained healthcare data visualizations using Azure Analysis Services and Power BI, delivering real-time dashboards for healthcare administrators and clinicians to track patient outcomes, hospital operations, and treatment efficacy.

Implemented advanced security architecture in Snowflake using row-level security policies, column-level masking, and custom role hierarchies to enforce data governance while maintaining GDPR and SOC2 compliance for sensitive financial data.
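
As a rough illustration of those controls, applying a masking policy and a row access policy from Python via the Snowflake connector; all object names, roles, and connection details below are hypothetical:

import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345", user="etl_user", password="***", warehouse="ANALYTICS_WH"
)
cur = conn.cursor()

# Column-level masking: only privileged roles see the raw identifier.
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS member_id_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***MASKED***' END
""")
cur.execute("ALTER TABLE claims MODIFY COLUMN member_id SET MASKING POLICY member_id_mask")

# Row-level security: restrict rows by region unless the role is privileged.
cur.execute("""
    CREATE ROW ACCESS POLICY IF NOT EXISTS region_rls AS (region STRING) RETURNS BOOLEAN ->
      CURRENT_ROLE() IN ('DATA_ADMIN', 'COMPLIANCE') OR region = 'US'
""")
cur.execute("ALTER TABLE claims ADD ROW ACCESS POLICY region_rls ON (region)")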

Developed comprehensive data transformation framework using Snowflake Stored Procedures with JavaScript and SQL, automating complex ETL workflows with embedded error handling.

Utilized GCP services such as BigQuery for large-scale healthcare data analysis, Dataflow for real-time processing of patient data streams, and Cloud Composer for orchestrating healthcare data workflows, enabling faster decision-making and efficient operations.

Developed and optimized SQL queries and data models for querying healthcare data in Azure SQL Databases and Google BigQuery, ensuring fast and reliable access to patient records, claims data, and health outcomes for analysis and reporting.

Integrated healthcare data from clinical trials and patient monitoring systems into a centralized data warehouse, leveraging Azure Synapse Analytics and BigQuery, enabling real-time insights into clinical outcomes and improving the efficacy of treatment plans.

Implemented secure data sharing between healthcare providers and external organizations using Kafka, Azure Data Lake, and GCP Pub/Sub, ensuring seamless, compliant data exchange while maintaining the integrity and privacy of sensitive patient information.

Automated healthcare data pipelines using Apache Airflow and Cloud Composer, ensuring seamless orchestration and timely delivery of patient records, lab results, and operational data to downstream analytics platforms for actionable insights.
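
An illustrative Airflow DAG skeleton for that style of orchestration; the DAG id, schedule, and task callables are assumptions rather than the actual pipelines:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_lab_results(**context):
    ...  # pull new lab results from the source system

def load_to_warehouse(**context):
    ...  # write curated records to the analytics warehouse

with DAG(
    dag_id="patient_records_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_lab_results", python_callable=extract_lab_results)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    extract >> load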

Managed healthcare data streaming and real-time processing with Kafka and Pub/Sub, facilitating the processing of real-time patient monitoring data, clinical event data, and alerting systems for healthcare operations and decision-making.

Utilized Google Cloud Storage for scalable and secure storage of patient records, medical images, and clinical data, optimizing access for healthcare providers and researchers while ensuring compliance with regulatory standards.

SR DATA ENGINEER UBS SAN JOSE, CA Sep 2021 – May 2023

Architected and implemented an event-driven financial data platform using AWS MSK and Kinesis Data Streams capable of processing 5M+ transactions per second with 99.99% durability guarantees.
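
A rough boto3 sketch of publishing transaction events to a Kinesis Data Stream; the stream name, region, and record shape are hypothetical:

import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_transactions(transactions):
    records = [
        {
            "Data": json.dumps(txn).encode("utf-8"),
            "PartitionKey": txn["account_id"],  # keeps an account's events ordered
        }
        for txn in transactions
    ]
    # put_records accepts up to 500 records per call
    return kinesis.put_records(StreamName="financial-transactions", Records=records)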

Engineered serverless ETL pipelines using AWS Glue with custom PySpark transforms that reduced processing latency by 67% while handling 30TB+ of daily financial transaction data.

Designed and implemented BigQuery-based data warehouse with materialized views and partitioning strategies that processed financial data with sub-second query response times.
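
A hedged sketch of a date-partitioned, clustered BigQuery table created with the google-cloud-bigquery client; the project, dataset, and column names are assumptions:

from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.market_data.trades",
    schema=[
        bigquery.SchemaField("trade_ts", "TIMESTAMP"),
        bigquery.SchemaField("symbol", "STRING"),
        bigquery.SchemaField("price", "NUMERIC"),
    ],
)
# Daily partitioning plus clustering prunes scanned data for time- and symbol-filtered queries.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="trade_ts"
)
table.clustering_fields = ["symbol"]
client.create_table(table, exists_ok=True)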

Engineered real-time data processing pipeline using Dataflow with custom Apache Beam transforms that reduced ETL processing time for market data analysis.

Implemented event-driven architecture using Cloud Pub/Sub with exactly-once delivery semantics, handling 4M+ financial events per minute.

Implemented fine-grained security controls using AWS IAM roles, KMS encryption, and VPC endpoint policies to maintain GDPR and PCI-DSS compliance for all financial data processing.

Designed and deployed multi-region data replication architecture using AWS S3 Cross-Region Replication with automated failover capabilities.

Designed and engineered Snowflake data pipelines with dynamic materialized views and zero-copy cloning, reducing storage costs while enabling parallel development environments for multiple teams.

Built real-time anomaly detection system using AWS Lambda integrated with DynamoDB streams to identify fraudulent patterns of transaction events.
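
A simplified Lambda handler for a DynamoDB stream illustrating the pattern; the threshold, topic ARN, and attribute names are placeholders:

import json
import boto3

sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:fraud-alerts"  # hypothetical
AMOUNT_THRESHOLD = 10_000

def handler(event, context):
    for record in event.get("Records", []):
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"].get("NewImage", {})
        amount = float(new_image.get("amount", {}).get("N", 0))
        if amount > AMOUNT_THRESHOLD:
            # Publish a simple alert for downstream review.
            sns.publish(
                TopicArn=ALERT_TOPIC_ARN,
                Message=json.dumps({"transaction": new_image}),
                Subject="Possible fraudulent transaction",
            )
    return {"processed": len(event.get("Records", []))}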

Developed custom data quality validation framework using Google Cloud Functions with Cloud Tasks that identified and remediated data anomalies before warehouse ingestion.

Architected multi-regional backup strategy using Google Cloud Storage with object versioning and retention policies that maintained audit history for regulatory compliance.

Built ML-powered financial forecasting system using Google AI Platform with TensorFlow to predict market volatility with 92% accuracy, informing real-time trading strategies.

Orchestrated complex data transformation workflows using AWS Step Functions integrated with SQS dead-letter queues.

Optimized Amazon RDS PostgreSQL instances with custom parameter groups and read replicas, reducing query latency by 72% for critical financial reporting workloads.

Implemented infrastructure-as-code using AWS CDK (TypeScript) with embedded unit tests, enabling repeatable deployments across development, staging, and production environments.

Designed auto-scaling data ingestion platform using AWS Application Auto Scaling policies that dynamically adjusted ECS container counts based on SQS queue depth metrics.
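
A sketch of the underlying configuration, expressed with boto3; the cluster, service, queue, and target values are illustrative only:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service's desired count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/ingest-cluster/ingest-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=50,
)

# Target-tracking policy keyed to the SQS backlog.
autoscaling.put_scaling_policy(
    PolicyName="scale-on-queue-depth",
    ServiceNamespace="ecs",
    ResourceId="service/ingest-cluster/ingest-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,  # desired visible messages per task (illustrative)
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateNumberOfMessagesVisible",
            "Namespace": "AWS/SQS",
            "Dimensions": [{"Name": "QueueName", "Value": "ingest-queue"}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 120,
        "ScaleOutCooldown": 60,
    },
)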

Developed custom CloudWatch dashboards with anomaly detection bands to proactively identify data pipeline performance degradation before impacting business operations.

Architected data lake solution using S3 with partitioning schemes and Athena query optimization, reducing average query execution time.

Engineered fault-tolerant API gateway using Amazon API Gateway with custom Lambda authorizers and AWS WAF rules to protect financial data endpoints from malicious traffic.

Created automated compliance reporting system using AWS Config with custom rules that generated daily audit reports for SOC2 and FINRA regulatory requirements.

AZURE DATA ENGINEER GIANT EAGLE GLENSHAW, PA Jan 2020 – Aug 2021

Designed and implemented end-to-end data solutions to align with retail business needs, translating complex requirements into efficient technical architectures and seamless data flows for retail operations and customer insights.

Built and automated data pipelines to ingest, clean, transform, and aggregate large volumes of retail data from various sources, ensuring real-time access to analytics and business intelligence for improved decision-making.

Developed a hybrid ETL pipeline using Informatica PowerCenter for on-premises data sources and Informatica Cloud Data Integration for cloud-based data, resolving connectivity issues and achieving seamless data integration across environments.

Designed and deployed a star-schema data warehouse on Azure Synapse Analytics, utilizing partitioned tables and clustered column store indexes to optimize query performance.

Integrated RESTful web services from third-party vendors using Azure Logic Apps and secured them with OAuth 2.0 authentication via Azure API Management, enabling real-time data exchange with e-commerce platforms.

Leveraged Azure Stream Analytics and Event Hubs to design real-time data pipelines, enabling fast processing of customer behavior and transaction data, driving dynamic retail strategies and personalized customer experiences.

Developed predictive analytics models using Azure Machine Learning and Python to forecast customer demand, optimize inventory management, and personalize retail marketing campaigns, driving improved sales and customer satisfaction.

Developed optimized data models and schema designs for relational and NoSQL databases (Oracle, MySQL, Redshift), ensuring fast and reliable access to retail data for analytics and operational decisions.

Applied Spark (PySpark) with optimized partitioning strategies to process 5TB daily retail transaction data, reducing processing time by 30% compared to previous Hive-based solutions.
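
The partitioning pattern might look roughly like the following PySpark snippet; the paths, partition count, and column names are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("retail-transactions").getOrCreate()

txns = spark.read.parquet("/data/retail/raw/transactions/")

(
    txns.repartition(400, "transaction_date")   # balance work across executors
        .write.mode("overwrite")
        .partitionBy("transaction_date")        # enables partition pruning downstream
        .parquet("/data/retail/curated/transactions/")
)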

Designed and automated customer segmentation using Azure Data Factory and SQL, enabling dynamic segments for targeted marketing and personalized promotions based on purchasing behavior.

Implemented real-time data processing for retail promotions and sales campaigns using Azure Stream Analytics and Event Hubs, ensuring that offers and discounts were dynamically adjusted based on live customer and transactional data.

Utilized Azure Synapse Analytics for large-scale retail data transformation, enabling real-time analytics that support inventory optimization, demand forecasting, and personalized marketing.

Built and managed interactive Tableau dashboards and visualizations, allowing retail managers to access key performance indicators (KPIs) and make data-driven decisions quickly for improved store operations.

Applied big data technologies like Spark and Hive to handle large-scale retail datasets, enabling efficient processing of sales data, customer interactions, and supply chain analytics.

DATA ENGINEER ELI LILLY AND COMPANY INDIANAPOLIS, IN Mar 2018 – Dec 2019

Designed, developed, and deployed high-performance data pipelines in AWS, adhering to modern data Lakehouse architecture to ensure efficient data processing, storage, and retrieval.

Built reusable data components in AWS Lambda, Glue, and Step Functions, accelerating enterprise-wide data delivery and streamlining data engineering workflows.

Implemented data integrity checks, security protocols, and privacy controls in AWS, ensuring that data processing and storage meet industry standards and organizational requirements.

Developed self-healing automation features for data pipelines, integrating error detection, anomaly identification, and automated recovery systems to maintain uninterrupted service and mitigate data issues.

Collaborated with cross-functional teams to design, build, and optimize cloud-based data solutions in Amazon S3, Redshift, and Athena, ensuring alignment with business and analytics needs.

Utilized PySpark for data transformation, processing large-scale datasets efficiently in distributed computing environments, ensuring scalable and cost-effective processing.

Leveraged AWS Glue for ETL (Extract, Transform, Load) operations, building custom scripts to streamline data ingestion, transformation, and loading across different systems.
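
A skeleton Glue PySpark job of the kind described; the catalog database, table, and target bucket are placeholders:

import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog, drop malformed rows, and write Parquet to S3.
source = glue_context.create_dynamic_frame.from_catalog(
    database="clinical_raw", table_name="lab_results"
)
cleaned = source.drop_fields(["_corrupt_record"]).resolveChoice(choice="make_struct")

glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://curated-bucket/lab_results/"},
    format="parquet",
)
job.commit()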

Implemented CI/CD pipelines using GitHub, automating code deployment processes for continuous integration and delivery, ensuring smooth and efficient updates to data pipeline solutions.

Conducted performance tuning and optimization of SQL queries and Python scripts, improving the efficiency and scalability of data processing workflows.

Maintained comprehensive system testing for data pipelines, performing unit, integration, and performance tests to ensure reliability and catch defects before deployment.

AWS DATA ENGINEER CLAIRVOYANT INDIA Sep 2014 - Aug 2017

Designed, developed, and tested data pipelines to ingest large-scale datasets into AWS cloud platforms, ensuring seamless integration for analytics and reporting needs.

Enhanced and optimized existing AWS-based data pipelines to improve data flow, efficiency, and performance.

Troubleshot and resolved data pipeline issues using AWS monitoring tools to ensure high data reliability and performance.

Ensured data consistency and reliability within the AWS cloud infrastructure by implementing automated data quality checks and pipeline validation processes.

Worked with AWS Glue, Amazon Redshift, Fivetran, and dbt to support data processing, transformation, and analytics.

Collaborated with cross-functional teams to identify and gather data requirements, and developed scalable, cloud-based data solutions for business intelligence and reporting.

Applied data warehousing principles within AWS to structure, store, and organize large datasets for seamless querying and analysis.

Utilized SQL and Python for data manipulation, building custom data extraction, transformation, and loading (ETL) processes within the AWS ecosystem.

Implemented automated data processing workflows using AWS Lambda and AWS Step Functions to streamline and optimize data pipeline operations.

Actively sought new technologies and AWS services to enhance data engineering capabilities, maintaining a proactive approach to adopting innovative solutions.

TECHNICAL SKILLS

Hadoop Components / Big Data

HDFS, Hive, HBase, Kafka, YARN, PySpark, AWS MSK, Airflow, Snowflake

Programming Languages

Scala, SQL, Python, HiveQL, KSQL, Boto3, Java

IDE Tools

Eclipse, IntelliJ, PyCharm, VS Code.

Cloud Platforms

GCP (BigQuery, Cloud Composer, Cloud Storage, Dataflow, Pub/Sub), AWS, Azure.

ETL Tools

Talend, dbt Cloud, Pentaho, AWS Glue.

Databases

Oracle, SQL Server, MySQL, Druid, MS Access, NoSQL databases (HBase, Cassandra, MongoDB), T-SQL

Data Analysis Libraries

Pandas, NumPy, SciPy, Scikit-learn, NLTK, Matplotlib

Data Warehousing

Teradata, BigQuery, Redshift, Snowflake, Azure Synapse Analytics.

Data Migration Expertise

Teradata-to-GCP migrations, claims systems integration.

BI Tools

Alteryx, Tableau, Power BI, Sisense, Streamlit, Looker.

Containerization

Docker, Kubernetes

CI/CD Tools

Jenkins, Bamboo, GitLab

Operating Systems

UNIX, Linux, Ubuntu, CentOS.


