Saikiran Gayaru Data Engineer
TX, USA +1-414-***-**** ****************@*****.***
SUMMARY
Senior Data Engineer with 6+ years of experience designing and implementing scalable cloud-native data platforms. Strong expertise in Google Cloud Platform (BigQuery, Dataflow, Cloud Composer), advanced SQL, Informatica, MS SQL, and enterprise data analytics solutions. Proven experience building analytics-ready datasets, optimizing BigQuery performance, and delivering governed reporting solutions using Looker, Power BI, and Qlik Sense. Experienced in production support, SLA management, and cross-functional collaboration across analytics and platform teams.
SKILLS
Snowflake & Data Modeling: Snowflake (Virtual Warehouses, Clustering, Micro-partition Optimization, Resource Monitors, Cost Governance), Dimensional Modeling (Star/Snowflake Schema), Domain-Oriented Modeling, Query Optimization, Secure Data Sharing.
Python & Engineering Practices: Python (Packaging, Logging, Metrics, Error Handling, Performance-Aware Processing), PySpark, Testable Code Design, Modular ETL Frameworks.
GCP Data Engineering: BigQuery (Partitioning, Clustering, Query Optimization), Cloud Composer (Airflow), Dataflow (Batch & Streaming), Pub/Sub, Cloud Storage, Cloud Monitoring, IAM, VPC, GCP Infrastructure Frameworks.
Data Analytics: Looker Core (Explores, LookML modeling), BigQuery Analytics, Qlik Sense, Power BI, Advanced SQL.
Secondary / Hybrid Skills: Informatica (ETL workflows), MS SQL Server, Oracle SQL, Advanced SQL (CTEs, Window Functions, Performance Tuning).
SQL & ELT: Advanced SQL (CTEs, Window Functions, Query Optimization), Incremental Loads, CDC Concepts, Backfills, Error Handling Strategies.
DBT & Transformation: dbt Core (Models, Macros, Tests, Seeds, Documentation), CI/CD Deployments, Version-Controlled Transformations, Data Lineage.
Orchestration & Pipelines: OpenFlow (Flow Development, Execution, Monitoring, Troubleshooting), Airflow, Azure Data Factory.
Cloud & Operations: AWS (S3, Glue, Redshift, Lambda), Azure (ADF, Databricks), Production Monitoring, Incident Response, SLA Management.
DevOps & Governance: Git-based workflows, CI/CD (GitHub Actions), Terraform, RBAC, IAM, Data Governance.
Domains: Insurance, Financial Services, Real Estate.
PROFESSIONAL EXPERIENCE
Senior Data Engineer Anthem TX, USA Jan 2024 – Present
Designed, developed, and maintained scalable ETL/ELT pipelines using Azure Databricks, PySpark, and Spark SQL to process large-scale healthcare claims, membership, and provider datasets at Anthem.
Built and maintained analytics-ready datasets in BigQuery to support claims reporting, regulatory compliance, and operational KPIs across multiple lines of business.
Developed complex SQL transformations in BigQuery using window functions, CTEs, and performance optimization techniques to support quality metrics and reimbursement analytics.
Built and scheduled ETL workflows using Cloud Composer (Airflow DAGs) to orchestrate ingestion of claims, eligibility, and provider data from upstream systems.
Implemented batch and streaming data pipelines using Dataflow to process high-volume healthcare transactions and near real-time operational feeds.
Migrated and integrated data from MS SQL Server and Oracle-based legacy healthcare systems into BigQuery using Informatica and custom ingestion frameworks.
Designed scalable BigQuery-based data marts supporting HEDIS reporting, claims analytics, enrollment analysis, and executive dashboards.
Optimized BigQuery performance using partitioning, clustering, and query plan analysis, reducing query runtime by 35% and improving report delivery SLAs.
Built governed, analytics-ready datasets consumed by Looker and Power BI dashboards for leadership and compliance reporting.
Implemented IAM-based access controls and dataset-level security to ensure HIPAA-compliant access management.
Configured Cloud Monitoring alerts and SLA tracking to proactively detect pipeline failures and ensure timely regulatory reporting.
Partnered with business stakeholders including claims, provider operations, and compliance teams to translate KPIs into scalable data models.
Provided production support, incident resolution, and performance troubleshooting for enterprise healthcare reporting systems.
Participated in infrastructure configuration including VPC setup, service account governance, and Terraform-based deployment automation.
Environment: GCP (BigQuery, Cloud Composer, Dataflow, Pub/Sub, Cloud Monitoring, IAM), Azure Databricks, PySpark, Informatica, MS SQL Server, Oracle, Looker, Power BI, Git, Terraform.
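The BigQuery window-function transformations described above commonly include a "latest record per key" dedup pattern (ROW_NUMBER() OVER (PARTITION BY claim_id ORDER BY updated_at DESC) = 1). A minimal plain-Python sketch of that logic follows; the claim records and field names are illustrative, not taken from the production systems above:

```python
def latest_per_claim(rows, key="claim_id", order_by="updated_at"):
    """Keep only the most recent record per key, mirroring
    ROW_NUMBER() OVER (PARTITION BY claim_id ORDER BY updated_at DESC) = 1."""
    best = {}
    for row in rows:
        k = row[key]
        # Replace the stored record whenever a newer version of the key arrives.
        if k not in best or row[order_by] > best[k][order_by]:
            best[k] = row
    return sorted(best.values(), key=lambda r: r[key])

claims = [
    {"claim_id": "C1", "updated_at": 1, "status": "open"},
    {"claim_id": "C1", "updated_at": 3, "status": "paid"},
    {"claim_id": "C2", "updated_at": 2, "status": "denied"},
]
deduped = latest_per_claim(claims)  # keeps the updated_at=3 "paid" record for C1
```

In the warehouse itself this runs as a single SQL pass over a partitioned table; the Python version is only meant to make the semantics concrete.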
Data Engineer Mphasis India Mar 2021 – Oct 2022
Designed and maintained scalable ETL pipelines using Python and SQL to process 1.5+ TB of daily transaction and customer data for a global banking client, reducing batch failures by 45%.
Built data ingestion and transformation workflows supporting credit card, loan, and customer portfolio analytics across 7 regions.
Developed reusable Python-based data validation frameworks (500+ scripts) to enforce data quality checks, reducing downstream production defects and manual QA efforts.
Designed dimensional data models and transformation logic for risk, churn, and fraud analytics use cases.
Built interactive Tableau dashboards for senior leadership to monitor churn, portfolio risk, and performance KPIs, contributing to a 13% reduction in customer attrition.
Implemented clustering models (KMeans, DBSCAN) in Python for credit card risk segmentation, improving targeted marketing response rates by 21%.
Automated MIS reporting pipelines consumed by 40+ branch managers, eliminating 15+ hours/week of manual Excel effort per user.
Partnered with fraud analytics and product teams to recalibrate detection KPIs using historical transaction data, increasing fraud detection accuracy by 9%.
Documented 300+ source-to-target mappings and transformation rules for loan processing systems, improving audit traceability and reducing issue resolution time.
Provided production support, root cause analysis, and performance tuning for enterprise reporting pipelines.
Environment: Python, SQL Server, Oracle, Tableau, Pandas, NumPy, Scikit-learn, Git, Excel Automation (VBA), Linux, Banking Data Warehousing Systems.
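A reusable validation framework of the kind described above typically composes small named checks and reports which rows failed which rules. This is a minimal sketch under assumed names (the check functions, transaction fields, and thresholds are illustrative):

```python
def not_null(field):
    """Check that a field is present and non-null."""
    def check(row):
        return row.get(field) is not None
    check.name = f"not_null({field})"
    return check

def in_range(field, lo, hi):
    """Check that a numeric field falls within [lo, hi]."""
    def check(row):
        v = row.get(field)
        return v is not None and lo <= v <= hi
    check.name = f"in_range({field})"
    return check

def validate(rows, checks):
    """Split rows into valid records and (index, failed-check-names) failures."""
    valid, failures = [], []
    for i, row in enumerate(rows):
        failed = [c.name for c in checks if not c(row)]
        if failed:
            failures.append((i, failed))
        else:
            valid.append(row)
    return valid, failures

txns = [
    {"amount": 120.0, "customer_id": "A1"},
    {"amount": -5.0, "customer_id": None},
]
valid, failures = validate(txns, [not_null("customer_id"), in_range("amount", 0, 1e6)])
```

Keeping each rule as a named closure lets the same catalogue of checks be reused across pipelines, which is the property that reduces duplicated QA effort.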
Data Engineer Prop Technology India Jun 2019 – Feb 2021
Designed ETL workflows to process and analyze 60,000+ property listings and rental agreements, enabling dynamic pricing strategies across 12 cities.
Built SQL-based transformation pipelines to normalize historical sales and rental data across multiple platforms, improving long-term trend analysis accuracy.
Developed automated data reconciliation scripts in Python to validate property listings and agreements, improving SLA compliance and reducing manual verification.
Integrated third-party APIs (Google Maps, Tax Boards, Zoning Authorities) to enrich property metadata, increasing structured data completeness from 68% to 96%.
Created Power BI and Google Data Studio dashboards for sales and operations teams, increasing lead conversion by 22% through data-driven regional targeting.
Conducted inventory and pricing analytics to identify seasonal demand fluctuations, enabling optimized marketing campaign timing and improved engagement.
Automated executive-level reporting workflows, reducing ad-hoc reporting requests and enabling proactive business reviews.
Supported database performance tuning and query optimization to improve dashboard refresh speed and reporting accuracy.
Environment: Python, SQL, MySQL, Power BI, Google Data Studio, REST APIs, Pandas, Excel, Git, Property Management Systems.
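The completeness improvement cited above (68% to 96%) is typically measured as the fraction of populated cells across the enriched fields. A small sketch of that metric, with invented listing records for illustration:

```python
def completeness(records, fields):
    """Fraction of (record, field) cells that are populated (not None/empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total

listings = [
    {"address": "12 Main St", "zoning": None, "tax_id": "T9"},
    {"address": "8 Hill Rd", "zoning": "R2", "tax_id": ""},
]
score = completeness(listings, ["address", "zoning", "tax_id"])  # 4 of 6 cells filled
```

Running the metric before and after each third-party enrichment pass gives a concrete, auditable number for data-completeness SLAs.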
EDUCATION
Master's in Information Technology Management, University of Wisconsin-Milwaukee, USA May 2024
Bachelor of Commerce in Information Technology, St. Joseph's Degree and PG College, India May 2020
PROJECTS
GenAI Data Catalog Automation: Designed and implemented an AI-driven metadata enrichment solution to automate data classification, tagging, and governance across analytics datasets. Leveraged foundation models (Bedrock / OpenAI APIs) to improve data discoverability, usability, and trust for analytics and reporting consumers, accelerating insight delivery and reducing manual governance effort.
Enterprise Data Platform Migration to Snowflake: Led the design and implementation of a production-grade Snowflake data platform to modernize legacy ETL pipelines and enable scalable analytics across commercial, finance, and operational domains.
Environment: Snowflake, dbt Core, OpenFlow, Python, SQL, AWS S3, Terraform, GitHub Actions.