SURYA TEJA
Senior Data Engineer | Data Platform Builder | Snowflake, Databricks, AWS
***************@*****.*** +1-205-***-**** https://www.linkedin.com/in/surya-teja-a-
PROFESSIONAL SUMMARY
Senior Data Engineer with 10+ years of experience building and owning enterprise data platforms across Banking and Healthcare. Strong hands-on background in Snowflake, Databricks, Airflow, and DBT, with experience taking platforms from scratch to production. Built pipelines processing 50M+ daily records, Kafka-based streaming, and regulated reporting workflows. Primary cloud is AWS, with solid working knowledge of Azure and GCP. Comfortable leading small engineering teams, mentoring junior engineers, and working directly with business stakeholders to turn requirements into reliable data solutions. Available immediately.
CORE STRENGTHS
Platform Ownership : End-to-end platform ownership at Flagstar Bank and Cencora, from ingestion through reporting.
Technical Depth : Primary stack: Snowflake, Databricks, Airflow, DBT, AWS; Kafka streaming, Azure, and GCP in supporting roles.
Domain Experience : 10+ years in Banking, Financial Services, and Healthcare, covering regulated reporting, governance, and compliance.
TECHNICAL SKILLS
Languages
Python, SQL, PySpark, Scala, Java
Snowflake
Schema Design, Virtual Warehouses, Snowpipe, Data Masking, UDFs, Column / Row / Tag-Based Security, Time Travel, Snowpark
Databricks
Delta Lake, Delta Live Tables (DLT), PySpark, Unity Catalog, Databricks SQL, Workflows
Streaming
Apache Kafka, Kafka Streams, Spark Structured Streaming, AWS Kinesis
Orchestration
Apache Airflow (DAGs, Operators, XComs), DBT Core & Cloud, AWS Step Functions, Databricks Scheduler, Control-M
AWS (Primary)
S3, Glue, Lambda, EMR, Step Functions, EC2, CloudWatch, Redshift, CloudFormation, Terraform
Azure (Working)
ADLS Gen2, Azure Data Factory, AAD, RBAC, Synapse, VPN Gateway
GCP (Working)
BigQuery, Cloud Storage, Cloud SQL, Dataflow
Data Warehousing
Snowflake, Amazon Redshift, BigQuery, Vertica, Teradata
Databases / NoSQL
PostgreSQL, MongoDB, Oracle, DB2, Teradata, DynamoDB
Data Modeling
Star Schema, Data Vault 2.0, Bronze / Silver / Gold Lakehouse
DevOps / CI-CD
Jenkins, GitHub Actions, Terraform, Docker, Kubernetes, OpenShift
Data Quality
Great Expectations, Data Profiling, Anomaly Detection, Lineage Tracking
ETL / Integration
AWS Glue, ADF, Informatica PowerCenter, Ab-Initio
BI & Monitoring
Tableau, QuickSight, Power BI, Splunk, CloudWatch
PROFESSIONAL EXPERIENCE
Flagstar Bank Apr 2023 – Present
Senior Data Engineer : Financial Services
•Owned end-to-end data platform design at Flagstar, responsible for architecture decisions, pipeline development, and production support across ingestion, transformation, and reporting layers.
•Built real-time and batch pipelines using Kafka, PySpark, AWS Glue, and Databricks Delta Live Tables, processing 50M+ daily records; maintained 99.9% uptime across financial reporting workflows.
•Wrote and maintained Apache Airflow DAGs for 30+ workflows across Snowflake, AWS, and GCP; reduced manual intervention by 70% and improved on-time regulatory report delivery.
•Built DBT models for Bronze-to-Gold Snowflake transformation layers; added automated testing and documentation that cut downstream data issues by 45%.
•Set up Snowflake security controls including column masking, row access policies, and tag-based governance to protect PII data across banking datasets.
•Led Teradata-to-Snowflake migration for 100+ tables; wrote mapping specs and reconciliation scripts to validate data parity post-migration.
•Debugged production pipeline failures using CloudWatch and Airflow logs; reduced mean time to resolution by 40%.
•Supported delivery of 15+ regulatory and executive reporting dashboards through Databricks SQL and BigQuery.
Environment: Python, SQL, PySpark, Kafka, Airflow, DBT, Snowflake, Databricks, Delta Lake, AWS (S3, Glue, Lambda, EMR), GCP (BigQuery), PostgreSQL, MongoDB, Teradata, Kubernetes, GitHub, Jira
Cencora Apr 2021 – Mar 2023
Senior Data Engineer : Healthcare
•Took ownership of the data platform from the ground up; designed pipeline architecture, built ingestion layers, and established transformation standards across 15+ source systems processing 20M+ daily records.
•Built Airflow DAGs and DBT models for Snowflake transformation workflows; improved pipeline throughput by 35% and cut ad hoc SQL requests by 50%.
•Refactored 40+ PL/SQL procedures into Snowflake SQL using CTEs and window functions; reduced average query runtime by 55%.
•Worked closely with business analysts, data scientists, and reporting teams to gather requirements, troubleshoot data issues, and ensure pipeline outputs met stakeholder needs.
•Enforced HIPAA-compliant governance using Azure AAD, RBAC, and encryption with lineage tracking; passed compliance audit with zero findings.
•Added Great Expectations validation checks to pipelines; caught 98%+ of data issues before they reached reporting layers.
•Built reusable Python ETL utilities for Snowflake and PostgreSQL; saved the team 20+ hours of manual work per week.
•Delivered QuickSight dashboards to 200+ business users; implemented Jenkins CI/CD, cutting release cycles from 2 weeks to 3 days.
Environment: Python, SQL, PySpark, Airflow, DBT, Snowflake, Azure (ADLS Gen2, AAD, ADF), AWS (Lambda, CloudWatch), PostgreSQL, Great Expectations, Jenkins, GitHub, Kubernetes, QuickSight, Jira
TIBCO Software Jul 2019 – Mar 2021
Senior Data Engineer : Enterprise Technology
•Owned pipeline reliability for 10M+ daily records across Snowflake, Databricks, and GraphDB; served as primary point of contact for data issues across 50+ downstream reporting consumers.
•Designed and built Apache Airflow DAGs for 20+ enterprise workflows with retry logic, SLA monitoring, and dependency management; achieved 99.5% on-time pipeline completion.
•Developed DBT transformation models with data contracts, automated quality tests, and lineage documentation; standardized the transformation layer used across 3 business units.
•Tuned PySpark jobs on Databricks through broadcast joins, partition optimization, and cluster sizing; cut batch processing time by 60% and significantly reduced compute costs.
•Built Kafka-based event ingestion pipelines for near-real-time data availability; reduced data latency from hours to minutes for operational reporting workflows.
•Ran deep data profiling across 100M+ records; identified and resolved 500+ quality issues and set up CloudWatch and Splunk alerts that reduced the pipeline failure rate by 45%.
•Worked directly with application owners and business stakeholders to gather requirements, debug data discrepancies, and deliver on-demand Tableau reports.
•Led and mentored a team of 4 junior engineers; ran weekly code reviews and walkthroughs on Snowflake performance tuning and Airflow DAG best practices.
Environment: Python, SQL, PySpark, Kafka, Airflow, DBT, Snowflake, Databricks, AWS (CloudWatch, Step Functions), GraphDB, Tableau, Splunk, CI/CD, GitHub, Jira
Terralogic Software Solutions Jan 2018 – Jun 2019
Data Engineer : Snowflake & Cloud
•Migrated 80+ enterprise tables from Teradata to Snowflake and Redshift; defined mappings and transformation rules that cut migration defects by 70%.
•Built ETL pipelines using Python, Spark, and AWS (Lambda, EMR, Step Functions) loading 5M+ daily records; deployed infrastructure via CloudFormation and Terraform.
•Rebuilt Ab-Initio business logic as Snowflake SQL; validated 100% parity through reconciliation scripts; set up Jenkins CI/CD for Kubernetes microservices.
Environment: Snowflake, Redshift, Teradata, Ab-Initio, Python, PySpark, Spark, AWS (Lambda, EMR, CloudFormation), Terraform, Jenkins, Docker, Kubernetes, Airflow, GitHub
Zytrix Labs Aug 2015 – Dec 2017
Data Analyst / Data Engineer
•Gathered functional and non-functional requirements through JAD sessions with business stakeholders, ETL developers, and vendors; translated them into data mappings, BRDs, and SQL transformation logic.
•Wrote advanced SQL (DML/DDL) against Oracle, DB2, and Teradata to extract and analyze large-scale enterprise datasets; used results to build Business Objects reports for operational decision-making.
•Developed Hive SQL transformation scripts for big data processing and created SQL validation scripts to verify transformation accuracy against defined business rules.
•Analyzed Ab-Initio business logic and mapped equivalent transformations in Informatica PowerCenter; validated 100% functional parity through post-migration reconciliation.
•Designed Control-M jobs to automate 20+ daily batch loads; reduced manual intervention by 80% and ensured consistent SLA adherence across reporting cycles.
•Participated in the full Agile/SAFe Scrum lifecycle: sprint planning, backlog refinement, daily standups, and retrospectives; wrote detailed user stories and use cases in JIRA.
Environment: Oracle, DB2, Teradata, Hive SQL, Informatica PowerCenter, Ab-Initio, Control-M, UNIX, Java, JIRA, Business Objects
EDUCATION
Bachelor of Engineering in Information Technology
Osmania University, Hyderabad, Telangana, India