Sarvani Panguluri
913-***-**** ******************@*****.*** LinkedIn
Professional Summary
Cloud Data Engineer with extensive experience designing and optimizing scalable data pipelines on AWS and GCP. Proven expertise in Apache Spark for both batch and streaming ETL transformations and in integrating change data capture for robust data lake ingestion. Successfully improved performance and reduced query latencies through strategic optimizations and automation. Focused on leveraging CDC, ETL, and big data concepts to drive impactful, data-driven solutions.
Technical Skills
• Cloud Platforms: GCP (BigQuery, Dataflow, Cloud Storage, Pub/Sub), AWS, Azure
• ETL & Data Modeling: Data Warehousing, Data Pipelines, Workflow Automation, Change Data Capture
• Languages: Python, SQL, Scala, Java
• Frameworks/Tools: PySpark, Apache Beam, Flask, Cloud Composer, Apache Airflow, Apache Spark, Apache Griffin, AWS Deequ, Apache Hudi
• Monitoring: GCP Monitoring, AWS CloudWatch
• BI & Reporting: Tableau, Power BI
• Logging & Observability: Splunk, Kibana, Grafana
• Version Control: Git, GitLab
• Operating Systems: Linux (Ubuntu, RedHat, CentOS), Windows
Work Experience
Verizon Mar 2023 - Present
Cloud Data Engineer Irving, TX
• Migrated and integrated data systems from AWS to GCP, enhancing performance, scalability, and cost-efficiency while leveraging cloud-based ETL transformations.
• Redesigned data pipelines using Apache Beam on Google Dataflow to replace AWS Glue, improving flexibility and resilience in processing both batch and streaming data.
• Transferred 100+ TB of data from S3 to GCS with minimal downtime and zero data loss, ensuring compliance with data integrity standards.
• Optimized Spark and Hadoop workloads on GCP Dataproc to achieve a 40% improvement in resource utilization, aligning with Big Data performance tuning best practices.
• Implemented robust security measures, including IAM-based policies and encryption for data at rest and in transit.
• Tuned SQL queries and optimized GCP resource usage, reducing operational costs without compromising data accessibility.
• Orchestrated daily workflows using Python, SQL, gsutil, and Cloud Composer, supporting efficient data operations and maintenance.
• Documented technical processes and trained team members on the effective use of GCP infrastructure and tools, facilitating knowledge transfer.
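The dependency-ordered daily workflows described above can be sketched in plain Python as a minimal topological ordering of tasks. This is an illustrative sketch only: the task names and dependencies are hypothetical, and in Cloud Composer this logic would live in an Airflow DAG rather than the standard-library `graphlib` used here.

```python
from graphlib import TopologicalSorter

# Hypothetical daily pipeline: each task maps to the set of tasks it
# depends on. Cloud Composer/Airflow expresses the same idea as a DAG.
pipeline = {
    "extract_s3":      set(),
    "load_gcs":        {"extract_s3"},
    "transform_spark": {"load_gcs"},
    "load_bigquery":   {"transform_spark"},
    "refresh_reports": {"load_bigquery"},
}

def run_order(tasks):
    """Return one valid execution order respecting dependencies."""
    return list(TopologicalSorter(tasks).static_order())

print(run_order(pipeline))
# → ['extract_s3', 'load_gcs', 'transform_spark', 'load_bigquery', 'refresh_reports']
```

The same ordering guarantee is what an orchestrator provides at scale, plus scheduling, retries, and monitoring on top.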
Caliber Mar 2021 - Feb 2023
AWS Data Engineer Hyderabad, India
• Developed a cloud-based data warehouse in Amazon Redshift optimized for high-volume analytics, incorporating ETL transformations and query optimizations.
• Constructed big data processing pipelines on Amazon EMR using Apache Spark and Hadoop, covering both batch and streaming workloads.
• Engineered real-time ingestion using Amazon Kinesis for streaming analytics, ensuring timely data processing and insight generation.
• Orchestrated ETL workflows using PySpark to transform raw data into optimized formats for Redshift and Snowflake, applying change data capture (CDC) patterns for incremental loading.
• Migrated datasets from S3 to Redshift, and later to Snowflake, achieving a 40% reduction in query time while ensuring data consistency.
• Optimized EMR configurations and tuned Redshift performance to reduce compute costs and enhance overall system efficiency.
• Collaborated with data analysts and business teams to deliver actionable data solutions, emphasizing scalable and high-performance data engineering approaches.
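The CDC patterns referenced above reduce to replaying ordered change events against a target table. A minimal sketch in plain Python, assuming a simple (op, key, row) event shape; record schemas and the in-memory dict standing in for a warehouse table are illustrative only:

```python
def apply_cdc(target, changes):
    """Apply ordered CDC events to a target keyed by primary key.

    Each change is (op, key, row): op is 'I' (insert), 'U' (update),
    or 'D' (delete). Later events win, mirroring log-ordered replay.
    """
    for op, key, row in changes:
        if op == "D":
            target.pop(key, None)   # delete is a no-op if key is absent
        else:                       # 'I' and 'U' both upsert
            target[key] = row
    return target

table = {1: {"name": "a"}}
events = [("U", 1, {"name": "b"}), ("I", 2, {"name": "c"}), ("D", 1, None)]
print(apply_cdc(table, events))
# → {2: {'name': 'c'}}
```

In Redshift or Snowflake the upsert/delete step is typically expressed as a `MERGE` over a staging table rather than row-at-a-time application.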
Education
University of the Cumberlands – Williamsburg, KY
M.S., Information Technology
JNTU Kakinada – India
B.Tech, Electrical and Electronics Engineering