SAI KIRAN KURVA
DATA ENGINEER
806-***-**** **************@*****.*** www.linkedin.com/in/sai-kiran-kurva
SUMMARY
Data Engineer with 5+ years of experience building scalable, AI-integrated data platforms and pipelines across AWS, Azure, and Snowflake, enabling analytics, machine learning, and GenAI applications. Designed and deployed real-time and batch ETL pipelines using PySpark, Airflow, and Glue, processing over 50 TB of data monthly while reducing latency by up to 87%. Led cloud migrations and implemented data quality frameworks, cutting manual validation by 10+ hours weekly. Built Lakehouse environments and self-service BI platforms that accelerated analytics for 50+ business users while ensuring compliance with HIPAA, GDPR, and banking standards.
TECHNICAL SKILLS
• Languages: Python, SQL, Scala, Shell, Java (Basic)
• Big Data & ETL: Spark, Kafka, Flink, Hadoop (MapReduce, Hive, HBase, Pig, Sqoop), NiFi, Glue, Informatica, Talend, SSIS
• Orchestration & CI/CD: Airflow, Step Functions, Control-M, Jenkins, GitHub Actions, Terraform, CloudFormation, Docker
• Cloud & Data Warehousing: AWS (S3, Redshift, Lambda, Glue, EMR, Athena, RDS, EC2), Azure (ADF, Blob, Synapse), Oracle Cloud, Snowflake, BigQuery, Teradata, SQL Server
• Databases: PostgreSQL, MySQL, Oracle, MongoDB, Cassandra, DynamoDB
• Monitoring & Logging: CloudWatch, Grafana, Splunk, Prometheus, Log4j
• AI & ML Tools: MLflow, Feature Store (Databricks / SageMaker), LLM dataset preparation, Bias validation frameworks, AI-driven data pipelines
• Data Management & Formats: DataOps, Governance, Data Quality, Metadata, Parquet, Avro, JSON, XML, CSV
• BI & Version Control: Power BI, Tableau, QuickSight, Git, Bitbucket, SVN
EXPERIENCE
Citizens Bank USA
Data Engineer July 2024 – Present
• Architected AWS-based real-time and batch pipelines (Glue, Lambda, Redshift) using PySpark, reducing analytics latency from 2 hours to 15 minutes through SLA-driven optimization.
• Engineered a scalable data ingestion framework, acquiring data from Kafka, third-party APIs, and on-premise databases into AWS S3; enabled a 50% increase in data source integration.
• Partnered with data science and AI teams to deliver feature-store datasets and inference pipelines supporting fraud detection and customer segmentation models.
• Built governed data flows and bias-validation checks for ML and GenAI workloads, ensuring model input quality and transparency.
• Orchestrated complex ETL workflows by developing and managing Airflow DAGs, improving pipeline reliability and raising the success rate of critical financial reporting jobs to over 95% (a minimal DAG sketch follows this section).
• Optimized Spark applications by refining partitioning, memory configurations, and job parallelism, achieving a 50% runtime improvement and reducing cloud compute costs by an estimated $18,000 annually.
• Pioneered the integration of Snowflake as an analytical layer over S3, creating a data Lakehouse environment that advanced self-service BI and empowered 50+ business users with faster Tableau reporting capabilities.
• Automated data quality validation and pipeline monitoring through validation scripts, CloudWatch alerts, and Grafana dashboards, enabling earlier issue detection and increasing stakeholder trust in executive-level dashboard metrics.
• Implemented change data capture (CDC) and schema evolution in Redshift data pipelines, standardizing data processing and reducing manual data correction efforts by 10 hours per week.
• Introduced advanced data encryption and IAM-based access control across 3 key systems, maintaining compliance with banking security standards and drastically reducing vulnerabilities.
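A minimal sketch of the SLA-driven Airflow orchestration described above; the DAG id, task callables, schedule, and SLA threshold are illustrative assumptions, not production values:

# Minimal sketch of an SLA-driven Airflow DAG (all names hypothetical).
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():    # placeholder: pull from Kafka / APIs / on-premise DBs into S3
    ...

def transform():  # placeholder: PySpark transformation job
    ...

def load():       # placeholder: load curated data into Redshift
    ...

with DAG(
    dag_id="financial_reporting_etl",      # hypothetical DAG name
    start_date=datetime(2024, 7, 1),
    schedule_interval="@hourly",
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=15),      # flag tasks that miss the 15-minute SLA
    },
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3

Moderna USA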
Data Engineer December 2021 – July 2023
• Spearheaded the development of an AWS data Lakehouse (S3, Glue, Redshift) to centralize molecular, clinical, and R&D data, cutting two months from the vaccine development timeline.
• Developed PySpark data pipelines to transform and process sensitive laboratory instrument logs and clinical data, and built reusable data masking and tokenization modules for 15+ data sources (a minimal sketch follows this section).
• Orchestrated complex data workflows using AWS Glue and Batch, automating the ingestion of real-time research notes and instrument data and reducing manual data handling by 20 hours per week for laboratory personnel.
• Pioneered the adoption of TDD and Pytest, boosting test coverage and reducing critical data-transformation errors by 90%.
• Curated and prepared high-quality, feature-rich datasets in direct partnership with the data science team, directly contributing to the training of ML models that predict vaccine efficacy with improved accuracy.
• Architected the AWS Glue Data Catalog and deployed detailed data lineage tracking, modernizing dataset discovery for analytics teams and cutting data search time by 83%.
• Automated data infrastructure provisioning by contributing Terraform scripts for S3, IAM, and Redshift, establishing a repeatable and secure foundation that cut environment setup time from 3 days to under 4 hours.
• Integrated CI/CD for ETL deployments via GitHub Actions and Terraform, ensuring consistent infrastructure provisioning.
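A minimal PySpark sketch of the reusable masking/tokenization pattern described above; the input path, column names, and the choice of SHA-256 tokenization are illustrative assumptions:

# Minimal sketch of a reusable PySpark masking helper (paths and columns hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def mask_columns(df, pii_cols):
    # Replace each PII column with a deterministic SHA-256 token.
    for c in pii_cols:
        df = df.withColumn(c, F.sha2(F.col(c).cast("string"), 256))
    return df

spark = SparkSession.builder.appName("masking-demo").getOrCreate()
raw = spark.read.parquet("s3://bucket/lab_logs/")         # hypothetical source path
masked = mask_columns(raw, ["subject_id", "email"])       # hypothetical PII columns
masked.write.mode("overwrite").parquet("s3://bucket/lab_logs_masked/")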
Farmers Insurance USA
Data Engineer June 2019 – November 2021
• Constructed resilient Azure Data Factory pipelines, migrating 15 TB of mission-critical data from on-premise SQL Servers, facilitating a 25% increase in data accessibility for stakeholders.
• Optimized reusable PySpark notebooks within Azure Databricks to execute data transformations, applying partitioning strategies that improved query performance for structured datasets by 40% and reduced compute costs.
• Implemented a monitoring framework using Azure Log Analytics and custom alerts to track pipeline health, identifying and documenting 20+ potential failure points and helping the team maintain a 99.9% pipeline success rate.
• Validated data integrity and load completeness by writing and executing complex SQL queries, directly supporting BI developers in the creation of 5+ key dashboards for clinical research teams.
• Automated data quality checks and unit tests for production pipelines using PyTest, working closely with the QA team to ensure deployment stability and eliminate 3 recurring data defects from release cycles (a minimal sketch follows this section).
• Accelerated team onboarding and knowledge sharing by creating comprehensive documentation and training wikis for internal data engineering processes, reducing the ramp-up time for 2 new hires by 2 weeks each.
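A minimal PyTest sketch of the data quality checks described above; the fixture paths and column names are illustrative assumptions:

# Minimal sketch of PyTest data quality checks (paths and columns hypothetical).
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("dq-tests").getOrCreate()

def test_no_null_policy_ids(spark):
    df = spark.read.parquet("tests/fixtures/policies.parquet")  # hypothetical fixture
    assert df.filter(df["policy_id"].isNull()).count() == 0

def test_load_completeness(spark):
    src = spark.read.parquet("tests/fixtures/source.parquet")
    tgt = spark.read.parquet("tests/fixtures/target.parquet")
    assert src.count() == tgt.count()  # row counts must match after the load

EDUCATION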
Texas Tech University – Master of Science in Computer Science
CERTIFICATIONS
• Google Data Analytics Certificate
• Microsoft Certified: Azure Data Engineer Associate
• AWS Certified Data Analytics - Specialty