Sai Teja M
Technology: Data Engineer
Total Experience: 5+ Years
Email ID: *********.*@*****.*** Phone Number: 804-***-****
PROFESSIONAL SUMMARY
• Data Engineer with over 5 years of experience building cloud-native data solutions across AWS, Azure, and GCP. Skilled in designing data lakes, warehouses, and real-time streaming pipelines using tools such as Spark (Scala/PySpark), Kafka, Snowflake, Databricks, and BigQuery.
• Experienced in ETL development and orchestration using Airflow, NiFi, ADF, and Informatica, integrating diverse data from APIs, flat files, and RDBMS sources. Skilled in real-time processing with Kafka, Kinesis, Event Hubs, and Spark Streaming for actionable analytics.
• Built secure data platforms with IAM, KMS encryption, masking, and HIPAA/GDPR compliance. Managed CI/CD pipelines using GitHub, Bitbucket, GitLab, Jenkins, and Azure DevOps. Proficient in Docker and Kubernetes for containerized data workloads.
• Collaborated with cross-functional teams to deliver ML feature pipelines, BI-ready datasets, and automated data quality checks using PyTest, ScalaTest, Deequ, and Great Expectations. Supported real-time analytics and reporting via Power BI, Looker, Tableau, and Athena.
• Optimized large-scale data processing on AWS EMR, Databricks, and GCP Dataproc while maintaining cost efficiency, performance, and data security standards. Passionate about building reliable, scalable, and production-grade data systems that drive business insights.
TECHNICAL SKILLS
• Programming Languages: Python, Java, R, SQL, Scala, Shell Scripting
• Big Data Tools: Apache Spark, Hadoop, Hive, Pig, HDFS, Sqoop, Kafka, Flume
• ETL & Workflow Orchestration: Apache Airflow, AWS Glue, Informatica, Apache NiFi, SSIS
• Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, EMR, Athena), Azure, Google Cloud Platform (GCP)
• Databases: Oracle, PostgreSQL, SQL Server, MongoDB, Snowflake
• Streaming Technologies: Apache Kafka, Spark Streaming, AWS Kinesis
• Data Visualization: Power BI, Tableau, Looker
• DevOps & Infrastructure: Git, Bitbucket, Jenkins, Terraform, Docker
• Other Tools & Frameworks: Databricks, Delta Lake, YARN, Confluent Schema Registry, JIRA, Confluence
PROFESSIONAL EXPERIENCE
Liberty Mutual Insurance
July 2024 - Present
Role: Data Engineer
• Built AWS S3 data lake using Parquet/Avro formats with schema standardization for claim and policy data.
• Developed ETL pipelines in Spark (Scala) on Databricks, integrating data from JSON, CSV, APIs, and Oracle.
• Migrated ETL from Informatica to Spark, reducing cost and improving scalability.
• Enabled real-time policy updates with Kafka to Snowflake streaming pipelines for near real-time analytics.
• Created Snowflake star schema models, secured with row-level access, column masking, and GDPR compliance.
• Automated workflows using Airflow DAGs with retries, SNS alerts, and checkpointing (see the sketch after this list).
• Built unit tests with ScalaTest, validated data transformations, and logged pipelines via Log4j and CloudWatch.
• Integrated external APIs (weather, vehicle) for enriched risk models.
• Deployed Dockerized pipelines on Kubernetes, managed CI/CD using GitHub, GitFlow, Jenkins, and collaborated in Agile (Jira).
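The Airflow automation above can be summarized in a minimal sketch: a daily DAG whose tasks retry on transient failures and publish an SNS alert once retries are exhausted. The DAG name, task, topic ARN, and transformation step are illustrative placeholders, not the production pipeline.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.hooks.sns import SnsHook


    def notify_failure(context):
        # Publish a failure alert to SNS with the failed task's identifiers.
        SnsHook(aws_conn_id="aws_default").publish_to_target(
            target_arn="arn:aws:sns:us-east-1:000000000000:data-alerts",  # placeholder ARN
            subject="Airflow task failed",
            message=f"{context['task_instance'].task_id} failed in {context['dag'].dag_id}",
        )


    def run_claims_transform(**_):
        # Placeholder for the Spark/Databricks transformation step.
        print("Submitting claims transformation job")


    with DAG(
        dag_id="claims_lake_ingest",                     # illustrative DAG name
        start_date=datetime(2024, 7, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={
            "retries": 3,                                # retry transient failures
            "retry_delay": timedelta(minutes=10),
            "on_failure_callback": notify_failure,       # SNS alert when retries are exhausted
        },
    ) as dag:
        PythonOperator(task_id="transform_claims", python_callable=run_claims_transform)

Putting retries and the failure callback in default_args keeps every task in the DAG covered without per-task configuration.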
Key Bank
June 2021 - July 2023
Role: Data Engineer
• Built a data lake on Azure ADLS for loan and transaction data using Parquet/ORC, structured for efficient access.
• Developed ETL pipelines in PySpark (Databricks), integrating CSV, JSON, Avro, SQL Server, and API data with schema standardization (see the sketch after this list).
• Enabled real-time transaction ingestion via Event Hubs & Structured Streaming, pushing enriched data to consumers.
• Migrated Teradata to Snowflake, rewriting logic with Snowflake SQL/UDFs and building Star Schema models for OLAP workloads.
• Ensured data security with Azure RBAC, Snowflake masking, KMS encryption, and row-level security for compliance.
• Automated workflows via ADF, logged pipelines in Azure Log Analytics, and enforced data quality with PyTest.
• Published curated data using Hive SQL and Power BI, enabled self-service Snowflake querying for analysts.
• Managed CI/CD with Bitbucket and Azure DevOps, captured lineage and metadata for governance, and created ML feature sets in Snowflake.
• Worked in Agile sprints via Jira, collaborating with InfoSec, risk, and finance teams.
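A minimal PySpark sketch of the schema-standardized batch ingestion referenced above: an explicit schema is enforced on raw CSV loan extracts and the result is landed as partitioned Parquet in ADLS. The storage account, container, and column names are placeholders, not the actual bank layout.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DateType, DecimalType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("loan_ingest").getOrCreate()

    # Standardized schema applied at read time so malformed records surface early.
    loan_schema = StructType([
        StructField("loan_id", StringType(), nullable=False),
        StructField("customer_id", StringType(), nullable=False),
        StructField("principal_amount", DecimalType(18, 2)),
        StructField("origination_date", DateType()),
    ])

    raw = (
        spark.read
        .option("header", "true")
        .schema(loan_schema)
        .csv("abfss://raw@examplelake.dfs.core.windows.net/loans/")  # placeholder path
    )

    curated = raw.withColumn("ingest_date", F.current_date())

    # Partitioned Parquet in the curated zone keeps downstream reads selective.
    (
        curated.write
        .mode("overwrite")
        .partitionBy("ingest_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/loans/")
    )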
UnitedHealth Group (UHG)
June 2019 - May 2021
Role: Data Engineer
• Built batch and real-time pipelines in Spark (Scala) on GCP Dataproc, processing claims and pharmacy data into a GCS data lake.
• Designed AVRO/JSON/CSV ingestion, exposed BigQuery data via Cloud Functions APIs, and integrated Kafka streaming for EDI feeds.
• Migrated ETL from SQL Server/Oracle to BigQuery, optimized pipelines, and developed Star Schema data marts that cut load times by 70% (see the sketch after this list).
• Implemented data quality checks (Deequ), exception handling, and lineage capture for 100+ pipelines.
• Secured data with IAM roles, KMS encryption, and masking, ensuring HIPAA/GDPR compliance.
• Deployed Dockerized Spark jobs via Cloud Composer (Airflow) and built ML feature stores in BigQuery.
• Published datasets to Looker/Data Studio, logged pipelines via Log4j/Stackdriver, and managed CI/CD with GitLab.
• Led Agile delivery via Jira, supporting the on-prem-to-GCP migration and collaborating with cross-functional teams.
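An illustrative sketch of the GCS-to-BigQuery load step behind the migration above, using the google-cloud-bigquery client: curated Parquet files are appended into a date-partitioned fact table. The project, dataset, table, bucket, and partition column are placeholders, not UHG resources.

    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # placeholder project

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        time_partitioning=bigquery.TimePartitioning(field="service_date"),  # placeholder partition column
    )

    load_job = client.load_table_from_uri(
        "gs://example-claims-lake/curated/claims/*.parquet",  # placeholder GCS path
        "example-project.claims_mart.fact_claims",            # placeholder fact table
        job_config=job_config,
    )
    load_job.result()  # block until the load finishes, raising on failure

    table = client.get_table("example-project.claims_mart.fact_claims")
    print(f"Loaded table now has {table.num_rows} rows")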
EDUCATION
Old Dominion University
Master of Science in Computer Science