SAI KIRAN KURVA
DATA ENGINEER
806-***-**** **************@*****.*** www.linkedin.com/in/sai-kiran-kurva
SUMMARY
Data Engineer with 5+ years of experience building scalable, AI-integrated data platforms and pipelines across AWS, Azure, and Snowflake, enabling analytics, machine learning, and GenAI applications. Designed and deployed real-time and batch ETL pipelines using PySpark, Airflow, and Glue, processing over 50 TB of data monthly while reducing latency by up to 87%. Led cloud migrations and implemented data quality frameworks, cutting manual validation by 10+ hours weekly. Built Lakehouse environments and self-service BI platforms that accelerated analytics for 50+ business users while ensuring compliance with HIPAA, GDPR, and banking standards.
TECHNICAL SKILLS
• Languages: Python, SQL, Scala, Shell, Java (Basic)
• Big Data & ETL: Spark, Kafka, Flink, Hadoop (MapReduce, Hive, HBase, Pig, Sqoop), NiFi, Glue, Informatica, Talend, SSIS
• Orchestration & CI/CD: Airflow, Step Functions, Control-M, Jenkins, GitHub Actions, Terraform, CloudFormation, Docker
• Cloud & Data Warehousing: AWS (S3, Redshift, Lambda, Glue, EMR, Athena, RDS, EC2), Azure (ADF, Blob, Synapse), Oracle Cloud, Snowflake, BigQuery, Teradata, SQL Server
• Databases: PostgreSQL, MySQL, Oracle, MongoDB, Cassandra, DynamoDB
• Monitoring & Logging: CloudWatch, Grafana, Splunk, Prometheus, Log4j
• AI & ML Tools: MLflow, Feature Store (Databricks / SageMaker), LLM dataset preparation, Bias validation frameworks, AI-driven data pipelines
• Data Management & Formats: DataOps, Governance, Data Quality, Metadata, Parquet, Avro, JSON, XML, CSV
• BI & Version Control: Power BI, Tableau, QuickSight, Git, Bitbucket, SVN
EXPERIENCE
Citizens Bank USA
Data Engineer July 2024 – Present
• Architected AWS-based real-time and batch pipelines (Glue, Lambda, Redshift) using PySpark, reducing analytics latency from 2 hours to 15 minutes through SLA-driven optimization.
• Engineered a scalable data ingestion framework, acquiring data from Kafka, third-party APIs, and on-premise databases into AWS S3; enabled a 50% increase in data source integration.
• Partnered with data science and AI teams to deliver feature-store datasets and inference pipelines supporting fraud detection and customer segmentation models.
• Built governed data flows and bias-validation checks for ML and GenAI workloads, ensuring model input quality and transparency.
• Orchestrated complex ETL workflows by developing and managing Airflow DAGs, improving pipeline reliability and raising the success rate of critical financial reporting jobs to over 95% (a minimal DAG sketch follows this section).
• Optimized Spark applications by refining partitioning, memory configurations, and job parallelism, achieving a 50% runtime improvement and reducing cloud compute costs by an estimated $18,000 annually.
• Pioneered the integration of Snowflake as an analytical layer over S3, creating a data Lakehouse environment that advanced self-service BI and empowered 50+ business users with faster Tableau reporting capabilities.
• Automated data quality validation and pipeline monitoring through validation scripts, CloudWatch alerts, and Grafana dashboards, enabling earlier issue detection and increasing stakeholder trust in executive-level dashboard metrics.
• Implemented change data capture (CDC) and schema evolution in Redshift data pipelines, standardizing data processing and reducing manual data correction efforts by 10 hours per week.
• Introduced advanced data encryption and IAM-based access control across 3 key systems, maintaining compliance with banking security standards and drastically reducing vulnerabilities.
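A minimal sketch of the SLA-driven Airflow orchestration described above; the DAG id, task callables, schedule, and SLA threshold are illustrative assumptions, not production values:

# Minimal sketch of an SLA-driven Airflow DAG (all names hypothetical).
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():    # placeholder: pull from Kafka / APIs / on-premise DBs into S3
    ...

def transform():  # placeholder: PySpark transformation job
    ...

def load():       # placeholder: load curated data into Redshift
    ...

with DAG(
    dag_id="financial_reporting_etl",      # hypothetical DAG name
    start_date=datetime(2024, 7, 1),
    schedule_interval="@hourly",
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=15),      # flag tasks that miss the 15-minute SLA
    },
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3

Moderna USA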
Data Engineer December 2021 – July 2023
• Spearheaded the development of an AWS data Lakehouse (S3, Glue, Redshift) to centralize molecular, clinical, and R&D data, cutting two months from the vaccine development timeline.
• Developed PySpark data pipelines to transform and process sensitive laboratory instrument logs and clinical data, and built reusable data masking and tokenization modules for 15+ data sources (a minimal sketch follows this section).
• Orchestrated complex data workflows using AWS Glue and Batch, automating the ingestion of real-time research notes and instrument data and reducing manual data handling by 20 hours per week for laboratory personnel.
• Pioneered the adoption of TDD and Pytest, boosting test coverage and reducing critical data-transformation errors by 90%.
• Curated and prepared high-quality, feature-rich datasets in direct partnership with the data science team, directly contributing to the training of ML models that predict vaccine efficacy with improved accuracy.
• Architected the AWS Glue Data Catalog and deployed detailed data lineage tracking, modernizing dataset discovery for analytics teams and cutting data search time by 83%.
• Automated data infrastructure provisioning by contributing Terraform scripts for S3, IAM, and Redshift, establishing a repeatable and secure foundation that cut environment setup time from 3 days to under 4 hours.
• Integrated CI/CD for ETL deployments via GitHub Actions and Terraform, ensuring consistent infrastructure provisioning.
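A minimal PySpark sketch of the reusable masking/tokenization pattern described above; the input path, column names, and the choice of SHA-256 tokenization are illustrative assumptions:

# Minimal sketch of a reusable PySpark masking helper (paths and columns hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def mask_columns(df, pii_cols):
    # Replace each PII column with a deterministic SHA-256 token.
    for c in pii_cols:
        df = df.withColumn(c, F.sha2(F.col(c).cast("string"), 256))
    return df

spark = SparkSession.builder.appName("masking-demo").getOrCreate()
raw = spark.read.parquet("s3://bucket/lab_logs/")         # hypothetical source path
masked = mask_columns(raw, ["subject_id", "email"])       # hypothetical PII columns
masked.write.mode("overwrite").parquet("s3://bucket/lab_logs_masked/")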
Farmers Insurance USA
Data Engineer June 2019 – November 2021
• Constructed resilient Azure Data Factory pipelines, migrating 15 TB of mission-critical data from on-premise SQL Servers, facilitating a 25% increase in data accessibility for stakeholders.
• Optimized reusable PySpark notebooks within Azure Databricks to execute data transformations, applying partitioning strategies that improved query performance for structured datasets by 40% and reduced compute costs.
• Implemented a monitoring framework using Azure Log Analytics and custom alerts to track pipeline health, identifying and documenting 20+ potential failure points and helping the team maintain a 99.9% pipeline success rate.
• Validated data integrity and load completeness by writing and executing complex SQL queries, directly supporting BI developers in the creation of 5+ key dashboards for clinical research teams.
• Automated data quality checks and unit tests for production pipelines using PyTest, working closely with the QA team to ensure deployment stability and eliminate 3 recurring data defects from release cycles (a minimal sketch follows this section).
• Accelerated team onboarding and knowledge sharing by creating comprehensive documentation and training wikis for internal data engineering processes, reducing the ramp-up time for 2 new hires by 2 weeks each.
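A minimal PyTest sketch of the data quality checks described above; the fixture paths and column names are illustrative assumptions:

# Minimal sketch of PyTest data quality checks (paths and columns hypothetical).
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("dq-tests").getOrCreate()

def test_no_null_policy_ids(spark):
    df = spark.read.parquet("tests/fixtures/policies.parquet")  # hypothetical fixture
    assert df.filter(df["policy_id"].isNull()).count() == 0

def test_load_completeness(spark):
    src = spark.read.parquet("tests/fixtures/source.parquet")
    tgt = spark.read.parquet("tests/fixtures/target.parquet")
    assert src.count() == tgt.count()  # row counts must match after the load

EDUCATION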
Texas Tech University – Master of Science in Computer Science
CERTIFICATIONS
• Google Data Analytics Certificate
• Microsoft Certified: Azure Data Engineer Associate
• AWS Certified Data Analytics - Specialty