Pranitha Kamishetty
Nashua, NH 682-***-**** *******************@*****.*** linkedin.com/in/pneethu
Experience
Senior Data Engineer, Walmart Jun 2023 – Present
– Designed and built robust ETL pipelines using Scala and Apache Spark on Google Cloud Platform for the Know Your Store Insights program, shrinking store-level reporting latency from hours to minutes.
– Orchestrated Apache Airflow DAGs via Cloud Composer to automate Spark workflows on Dataproc, driving 99.9% SLA compliance across daily and weekly loads.
– Leveraged Cloud Storage, BigQuery, and IAM policies to secure distributed processing and cut storage costs 25% through tiered lifecycle rules.
– Built Spark applications to transform raw GCP tables and write curated outputs into Hive (Parquet), boosting query performance 3× for downstream analytics.
– Architected date- and week-partitioned Hive tables plus Spark SQL aggregations, trimming load times by 40% compared with non-partitioned fact tables.
– Automated on-demand Dataproc cluster provisioning with Automic, saving $120k/yr by removing idle nodes.
– Tuned Spark jobs (executors, memory, partition counts) via YARN insights to cut runtimes up to 40%.
– Stress-tested UI traffic with Automaton, ensuring sub-second supplier-facing response times under peak load.
– Designed Druid ingestion specs loading from Hive, delivering <200 ms aggregation latency for dashboards.
– Optimized Druid clusters (task slots, heap, segment tuning) to serve billions of records at multiple granularities without downtime.
– Acted as engineering–API liaison, providing high-speed metrics that lifted supplier engagement 18%.
– Implemented data-quality gates using DPAT and GDP Portal rules, achieving 99.5% schema conformance and automated prod-to-dev sync.
– Maintained CI/CD via Automic, Concord, GitHub; stewarded reference data in MongoDB/Cosmos DB; documented in Confluence; drove Agile delivery through Jira, code reviews, and KT sessions.
Data Engineer, Accenture Jun 2018 – Dec 2021
– Designed and maintained cloud-ready ETL pipelines with Informatica PowerCenter & Informatica Cloud, ingesting 40+ sources (SQL, Salesforce, SAP, AWS S3, Azure Blob) into an enterprise EDW on Snowflake, enabling global reporting in minutes.
– Developed serverless data-ingestion frameworks using AWS Glue, AWS Lambda, and Step Functions, cutting time-to-insight from 24 h to 2 h.
– Orchestrated daily workloads with AWS CloudWatch Events and Apache Airflow on Amazon MWAA, achieving 99.9% on-time SLA across 120+ pipelines.
– Containerized legacy PowerCenter sessions with Docker and deployed on Amazon EKS, reducing provisioning lead-time from days to hours.
– Migrated 15 TB of historical data from on-prem SQL Server to Amazon Redshift via S3 and Glue, saving $60k in licensing annually.
– Implemented Snowpipe auto-ingest via S3 event notifications to load streaming IoT data in near real time (<5 min lag).
– Introduced Terraform and GitLab CI/CD for IaC and pipeline promotion, ensuring 100% environment reproducibility and eliminating manual errors.
Education
University of Texas at Arlington, Arlington, TX – Master of Science in Data Science
Sreenidhi Institute of Science and Technology, India – B.Tech. in Computer Science
Technical Skills
Programming Languages: Python, Scala, Java, SQL, Bash, JavaScript, TypeScript, Go, R
Frameworks/Libraries: Apache Spark, Apache Airflow, Hadoop, Pandas, NumPy, Scikit-learn, TensorFlow, Dask, Kafka, Flink
Technologies: Git, Docker, Kubernetes, Jenkins, Terraform, Automic, Tableau, Gradle, REST/GraphQL APIs
Databases/Cloud: BigQuery, Hive, MongoDB, Cosmos DB, MySQL, PostgreSQL, Snowflake, GCP, AWS
Certification
Microsoft Certified: Fabric Data Engineer Associate, Jul 2025