Data Engineer Senior

Location:

United States

Posted:

September 10, 2025

Resume:

Pranitha Kamishetty

Nashua, NH 682-***-**** *******************@*****.*** linkedin.com/in/pneethu

Experience

Senior Data Engineer, Walmart Jun 2023 – Present

– Designed and built robust ETL pipelines using Scala and Apache Spark on Google Cloud Platform for the Know Your Store Insights program, shrinking store-level reporting latency from hours to minutes.

– Orchestrated Apache Airflow DAGs via Cloud Composer to automate Spark workflows on Dataproc, driving 99.9% SLA compliance across daily and weekly loads.

– Leveraged Cloud Storage, BigQuery, and IAM policies to secure distributed processing and cut storage costs 25% through tiered lifecycle rules.

– Built Spark applications to transform raw GCP tables and write curated outputs into Hive (Parquet), boosting query performance 3x for downstream analytics.

– Architected date- and week-partitioned Hive tables plus Spark SQL aggregations, trimming load times for non-partitioned fact tables by 40%.

– Automated on-demand Dataproc cluster provisioning with Automic, saving $120k/yr by removing idle nodes.

– Tuned Spark jobs (executors, memory, partition counts) via YARN insights to cut runtimes up to 40%.

– Stress-tested UI traffic with Automaton, ensuring sub-second supplier-facing response times under peak load.

– Designed Druid ingestion specs loading from Hive, delivering <200 ms aggregation latency for dashboards.

– Optimized Druid clusters (task slots, heap, segment tuning) to serve billions of records at multiple granularities without downtime.

– Acted as engineering–API liaison, providing high-speed metrics that lifted supplier engagement 18%.

– Implemented data-quality gates using DPAT and GDP Portal rules, achieving 99.5% schema conformance and automated prod-to-dev sync.

– Maintained CI/CD via Automic, Concord, and GitHub; stewarded reference data in MongoDB/Cosmos DB; documented in Confluence; drove Agile delivery through Jira, code reviews, and KT sessions.

Data Engineer, Accenture Jun 2018 – Dec 2021

– Designed and maintained cloud-ready ETL pipelines with Informatica PowerCenter & Informatica Cloud, ingesting 40+ sources (SQL, Salesforce, SAP, AWS S3, Azure Blob) into an enterprise EDW on Snowflake, enabling global reporting in minutes.

– Developed serverless data-ingestion frameworks using AWS Glue, AWS Lambda, and Step Functions, cutting time-to-insight from 24 h to 2 h.

– Orchestrated daily workloads with AWS CloudWatch Events and Apache Airflow on Amazon MWAA, achieving 99.9% on-time SLA across 120+ pipelines.

– Containerized legacy PowerCenter sessions with Docker and deployed on Amazon EKS, reducing provisioning lead-time from days to hours.

– Migrated 15 TB of historical data from on-prem SQL Server to Amazon Redshift via S3 and Glue, saving $60k in licensing annually.

– Implemented Snowpipe auto-ingest via S3 event notifications to load streaming IoT data in near real time (<5 min lag).

– Introduced Terraform and GitLab CI/CD for IaC and pipeline promotion, ensuring 100% environment reproducibility and eliminating manual errors.

Education

University of Texas at Arlington, Arlington, TX – Master of Science in Data Science

Sreenidhi Institute of Science and Technology, India – B.Tech. in Computer Science

Technical Skills

Programming Languages: Python, Scala, Java, SQL, Bash, JavaScript, TypeScript, Go, R

Frameworks/Libraries: Apache Spark, Apache Airflow, Hadoop, Pandas, NumPy, Scikit-learn, TensorFlow, Dask, Kafka, Flink

Technologies: Git, Docker, Kubernetes, Jenkins, Terraform, Automic, Tableau, Gradle, REST/GraphQL APIs

Databases/Cloud: BigQuery, Hive, MongoDB, Cosmos DB, MySQL, PostgreSQL, Snowflake, GCP, AWS

Certification

Microsoft Certified: Fabric Data Engineer Associate Jul 2025
