
Data Engineer Quality

Location:
San Francisco, CA
Posted:
September 10, 2025


Resume:

Nomula Kruthagnareddy

Data Engineer

San Francisco, CA | Open to relocation to Seattle, WA; New York, NY; Chicago, IL; Dallas, TX
***************@*****.*** | 763-***-**** | https://www.linkedin.com/in/kruthagna-reddy-nomula-123a5b2a1/

SUMMARY

Data Engineer with 5+ years of experience designing production-scale ETL/ELT pipelines (batch + streaming) and building cloud-native data platforms on AWS, Azure, and GCP. Skilled in SQL performance tuning, schema design, and dimensional modeling (star/snowflake) to support BI and analytics. Proficient in Apache Spark (PySpark/Scala), Databricks, Airflow, and DBT, with expertise in Snowflake, Redshift, and BigQuery. Strong DevOps background in Docker, Kubernetes, Terraform, and CI/CD automation, with proven success in data quality (Great Expectations, Deequ), governance (Atlas, Glue Catalog), compliance (HIPAA, SOX, GDPR, FedRAMP), and support for AI/ML pipelines (PyTorch, TensorFlow).

PROFESSIONAL EXPERIENCE

Splunk (Data Management Pipeline Builders (Edge & Ingest)) Nov 2024 – Present

Data Engineer San Francisco, CA

• Built Databricks Delta Lake pipelines integrating Kafka and Spark for unified batch and streaming analytics, enabling real-time plus historical processing on AWS.

• Designed data quality validation frameworks with Great Expectations inside Airflow DAGs, reducing downstream data errors by 35% and improving data reliability.

• Migrated warehousing workloads to Snowflake and Amazon Redshift, applying schema tuning and partitioning strategies that reduced BI query latency by 40%.

• Engineered secure, scalable AWS S3-based data architectures, ensuring compliance with GDPR, HIPAA, and FedRAMP High while optimizing tiered storage costs.

• Developed high-throughput streaming pipelines using Kafka, Amazon Kinesis, and AWS-native services, automating workflows with Python, SQL, and Terraform for scale.

• Enhanced observability by integrating Splunk, Grafana, and Prometheus, reducing Mean Time to Resolution (MTTR) by 30% and improving SLA-driven monitoring.

• Deployed containerized ingestion workflows with Docker and Kubernetes on AWS EKS, ensuring fault-tolerant distributed pipelines across cloud and hybrid environments.

• Automated ETL/ELT pipelines using Python (Pandas, PySpark), SQL, and Terraform, accelerating transformations, reducing operational overhead, and improving scalability.

• Enabled federated queries combining Splunk indexes with AWS Security Lake via OpenTelemetry, providing unified observability across distributed AWS microservices.

• Partnered with security and compliance teams to integrate AWS pipelines with SIEM platforms and anomaly monitoring, improving proactive threat detection and audit readiness.

IBM June 2019 – April 2023

Data Engineer India

• Spearheaded a 10+ PB Hadoop-based big data platform for Vodafone Idea, processing 15B+ records daily and enabling real-time customer analytics.

• Delivered enterprise data models (star/snowflake schemas) powering Power BI and Tableau dashboards for 50M+ telecom subscribers with faster and more reliable KPI reporting.

• Built real-time streaming pipelines on Airtel’s Edge Cloud using Kafka and Spark Streaming, supporting AI-driven inspections at Maruti Suzuki with <1s latency.

• Automated ETL/ELT orchestration using Airflow and DBT, reducing manual maintenance by 60% and improving deployment reliability.

• Deployed multi-cloud data services using AWS Glue/EMR, Azure Synapse, and GCP BigQuery, ensuring scalable, compliant enterprise processing at petabyte scale.

• Engineered large-scale ETL workflows with Hadoop, Hive, Spark, and Kafka, enabling ingestion, cleansing, and transformation of petabyte-scale telecom datasets.

• Partnered with ML engineers to prepare datasets for PyTorch and TensorFlow pipelines, enabling predictive analytics and real-time computer vision applications.

• Implemented DataOps automation with Ansible, Airflow, and CI/CD (Git, Jenkins) to streamline deployments and monitoring of distributed data workflows.

• Ensured data governance, lineage, and security using Apache Atlas, RBAC, and encryption standards, aligning with strict telecom regulatory requirements.

• Optimized big data storage formats (Parquet, ORC, Avro), reducing query costs and improving analytics performance for large-scale customer datasets.

TECHNICAL SKILLS

Programming & Scripting: Python (Pandas, PySpark, SQLAlchemy), SQL (T-SQL, PL/pgSQL, optimization), Java, Scala, Bash, Unix/Linux

Data Engineering & Modeling: Apache Spark (Batch & Streaming), Databricks (Delta Lake), DBT, Kafka, Flink, Hive, Hadoop, Airflow, ETL/ELT Pipelines, DataOps, Star/Snowflake Schemas

Cloud & Warehousing: AWS (S3, Glue, EMR, Redshift, Athena, Lambda, Kinesis), Azure (Data Factory, Synapse, Databricks, Event Hubs), GCP (BigQuery, Dataflow, Pub/Sub)

DevOps & Governance: Docker, Kubernetes (EKS, AKS, GKE), Terraform, Jenkins, GitHub Actions, Azure DevOps, Ansible, Apache Atlas, AWS Glue Catalog, GDPR, HIPAA, SOX, FedRAMP

Analytics, Observability & ML: Tableau, Power BI, Looker, Splunk, Grafana, Prometheus, OpenTelemetry, Great Expectations, Deequ, PyTorch, TensorFlow

EDUCATION

Master's in Computer/Information Technology

Concordia University, St. Paul, MN

Bachelor of Science

Jawaharlal Nehru Institute of Technology and Management, Hyderabad


