Varshitha Puthalapattu
Email: ************@*****.***
Mobile: +1-216-***-****
Data Engineer
PROFESSIONAL SUMMARY:
5+ years of experience designing and operating enterprise-grade data pipelines in scalable cloud environments, applying strong analytical and problem-solving skills and presenting results to both technical and non-technical audiences.
Designed end-to-end ELT workflows in Python and PL/SQL, writing complex queries for data analysis and tracing data end-to-end across applications.
Advanced hands-on experience with Databricks and Apache Spark for both streaming and batch workloads.
Implemented optimized data models with historical tracking to support accurate point-in-time reporting.
Developed Azure Data Factory pipelines to ingest data into curated layers for analytics consumption.
Built scalable ingestion and transformation pipelines orchestrated with Apache Airflow.
Created cost-efficient Snowflake datasets, accelerating analytics workloads and maintaining data freshness with scheduled task orchestration.
Leveraged Apache Iceberg and Delta Lake for schema evolution and consistent data pipelines across environments.
Captured streaming data with Kafka and Azure Event Hubs, transforming payloads with Spark Streaming at low latency.
Collaborated with data scientists to build feature stores, optimizing joins and aggregations with scalable PySpark scripts.
Integrated Azure DevOps CI/CD pipelines with automated unit testing for data workloads.
Engineered real-time alerting and audit trails by integrating event-driven systems.
Developed reusable Python ingestion frameworks, reducing code duplication and improving operational maintainability.
Delivered multi-cloud solutions on Snowflake and Databricks with a focus on vendor-neutral architectures.
Implemented fine-grained access control across sensitive datasets to meet data governance requirements.
Tuned complex SQL queries on Synapse, Snowflake, and Redshift, improving reporting performance and reducing compute cost.
Managed the DataOps lifecycle using GitOps methodologies.
Collaborated with cross-functional teams to define SLAs and data contracts.
Participated in agile ceremonies, including sprint planning, backlog grooming, and release management.
Conducted in-depth data validation and pipeline QA using Pytest, delivering high-reliability datasets.
Additional experience with Oracle Exadata, PL/SQL, and the Microsoft Office suite; a detail-oriented Agile/Scrum team player who works well with minimal supervision, manages multiple projects and priorities, asks questions early, and communicates effectively across the organization.
TECHNICAL SKILLS:
Cloud Platforms - Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Composer), Azure (ADF, ADLS, Databricks, Synapse, SQL DB), AWS (Glue, S3, Redshift)
Languages - Python, SQL, Scala, Shell Scripting, PL/SQL
Big Data Frameworks - Apache Spark, PySpark, Apache Kafka, Apache Beam
Data Integration Tools - Azure Data Factory, GCP Cloud Composer, Informatica, Airflow
Databases - BigQuery, Azure SQL DB, Oracle, PostgreSQL, SQL Server, Oracle Exadata
Data Modeling - Star/Snowflake Schema, Fact-Dimension modeling, ER Diagrams
Version Control / CI/CD - Git, GitHub, Azure DevOps, Jenkins, YAML
Visualization - Power BI, Tableau (basic exposure)
Tools - JIRA, ServiceNow, Unix/Linux, Postman, Microsoft Office suite
Methodologies - Agile, Scrum
PROFESSIONAL EXPERIENCE:
CVS Health Dec 2024 – Present
Data Engineer
Responsibilities:
Designed and implemented data ingestion pipelines in GCP using Dataflow and Pub/Sub for real-time retail analytics, processing high-volume transactional data streams with careful schema inference and validation.
Built clustered and partitioned BigQuery tables with optimized DDL scripts, achieving 75% faster query performance while managing multi-terabyte transactional and demographic datasets for reporting and dashboard consumption (a minimal provisioning sketch follows this list).
Developed automated DAGs in Cloud Composer to orchestrate complex workflows across BigQuery, GCS, and external APIs, meeting SLAs and delivering critical metrics to business stakeholders.
Implemented Cloud Functions to automate data ingestion from GCS triggers, sending Slack and email notifications for event monitoring and end-to-end operational transparency in production environments.
Integrated Pub/Sub streaming messages with BigQuery sinks to enrich and store transformed retail data in real time, reducing latency for downstream Looker-based analytics dashboards.
Collaborated with data analysts to create federated BigQuery views for standardized KPI calculations, ensuring accuracy and consistency across finance, operations, and customer-behavior reporting domains.
Configured IAM roles, service accounts, and resource-level permissions across GCP projects to enforce least-privilege access policies and maintain full compliance with audit and internal data governance standards.
Optimized BigQuery SQL transformations using WITH clauses, staging tables, and caching to accelerate complex multi-table joins and aggregations used in finance and supply chain analysis.
Used Terraform scripts to provision reusable cloud infrastructure, including Cloud Storage buckets, BigQuery datasets, and Pub/Sub topics, across dev, test, and production environments with consistent tagging.
Developed a metadata-driven logging framework in Python for ingestion pipelines, capturing file-level details, processing metrics, and row-level validation errors in centralized BigQuery audit tables.
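A minimal sketch of the kind of clustered, partitioned BigQuery table provisioning described above, using the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical placeholders, not the production objects.

from google.cloud import bigquery

client = bigquery.Client(project="retail-analytics-dev")  # hypothetical project ID

# Hypothetical transactional table: partitioned by day on the event timestamp,
# clustered on the columns most reporting queries filter and join on.
table = bigquery.Table(
    "retail-analytics-dev.curated.sales_transactions",
    schema=[
        bigquery.SchemaField("transaction_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("store_id", "STRING"),
        bigquery.SchemaField("product_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
        bigquery.SchemaField("transaction_ts", "TIMESTAMP", mode="REQUIRED"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="transaction_ts",
)
table.clustering_fields = ["store_id", "product_id"]

# exists_ok keeps the provisioning step idempotent across dev/test/prod runs
client.create_table(table, exists_ok=True)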
Centene Corporation Aug 2022 – Nov 2024
Data Engineer
Responsibilities:
Created dynamic, reusable ADF pipelines with parameterized datasets to extract claims and eligibility data from Oracle and CSV files into Azure Data Lake Storage Gen2 with robust transformation logic.
Developed PySpark notebooks in Azure Databricks to cleanse, join, and validate provider and patient records, achieving a 60% performance improvement in healthcare data aggregation workflows.
Built scalable analytical models in Synapse SQL dedicated pools to handle millions of records with nested joins and CTE logic, enabling advanced insights into claims and risk categories.
Implemented SCD Type 2 tracking in PySpark to manage historical versions of demographic changes, giving downstream systems accurate timelines of member and provider profiles (a minimal sketch follows this list).
Protected secrets and tokens with Azure Key Vault integration across ADF, Databricks, and Azure SQL pipelines, ensuring secure authentication to source systems and compliance with enterprise policies.
Developed incremental dataflow logic using watermark columns and control tables in ADF, enabling efficient, fault-tolerant daily ingestion of newly arrived claims and authorization updates.
Applied partitioning, indexing, and distribution strategies in Synapse to reduce query execution times by over 80%, especially on complex inner joins across multi-terabyte fact tables.
Created pipeline alerting rules in Azure Monitor to detect failures and anomalies, triggering automatic retries, escalation alerts, and logging into Log Analytics workspaces for issue resolution.
Developed YAML-based CI/CD pipelines in Azure DevOps to deploy ADF components, SQL scripts, and notebook artifacts to dev, test, and prod environments using ARM templates and environment variables.
Migrated 10+ TB of legacy SQL Server claims data to Azure Synapse, using validation queries and row-level comparison to verify accuracy, consistency, and referential integrity across systems.
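A minimal sketch of the SCD Type 2 pattern referenced above, assuming a Parquet-backed member dimension that already carries effective_from, effective_to, and is_current columns; the paths, key, and tracked attributes are hypothetical, and the real pipelines would differ in detail.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_member_dim").getOrCreate()

# Hypothetical inputs: the existing dimension and the day's incoming member extract
dim = spark.read.parquet("/mnt/curated/member_dim")
src = spark.read.parquet("/mnt/raw/member_daily")

# Hash the tracked attributes so changed rows can be detected with a single comparison
tracked = ["first_name", "last_name", "address", "plan_code"]

def with_hash(df):
    return df.withColumn("attr_hash", F.sha2(F.concat_ws("||", *tracked), 256))

dim = with_hash(dim)
src = with_hash(src)
active = dim.filter(F.col("is_current"))

# Members that are new, or whose tracked attributes changed, get a fresh current version
join_cond = F.col("s.member_id") == F.col("a.member_id")
changed = (src.alias("s")
              .join(active.alias("a"), join_cond, "left")
              .filter(F.col("a.member_id").isNull() | (F.col("s.attr_hash") != F.col("a.attr_hash")))
              .select("s.*"))

new_rows = (changed
            .withColumn("effective_from", F.current_date())
            .withColumn("effective_to", F.lit(None).cast("date"))
            .withColumn("is_current", F.lit(True)))

# Close out the current rows that were superseded today
expired = (active.join(changed.select("member_id").distinct(), "member_id", "inner")
                 .withColumn("effective_to", F.current_date())
                 .withColumn("is_current", F.lit(False)))

# Keep history and untouched current rows, then stitch the dimension back together
kept = dim.join(expired.select("member_id", "effective_from"),
                ["member_id", "effective_from"], "left_anti")
result = kept.unionByName(expired).unionByName(new_rows, allowMissingColumns=True)

result.drop("attr_hash").write.mode("overwrite").parquet("/mnt/curated/member_dim_out")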
Franklin Templeton Investments Mar 2020 – Jul 2022
Data Analyst
Responsibilities:
Analyzed investment and portfolio data in Aladdin’s cloud-based OMS, applying validations and transformations to prepare accurate, regulatory-compliant datasets for downstream financial systems and executive dashboards.
Standardized Bloomberg, Reuters, and Morningstar datasets using complex SQL queries and Python logic, enabling daily ingestion of NAV, benchmark, and duration metrics into performance and risk dashboards.
Automated data pipelines to extract portfolio holdings, benchmarks, and attribution records, giving performance teams real-time visibility into fund positions, daily returns, and sector-level contribution analytics.
Developed reconciliation scripts to compare custodian and fund-level transactions, isolating discrepancies and ensuring reporting accuracy across fixed income, equity, and alternative investment classes (a minimal sketch follows this list).
Coordinated with risk teams to ingest daily market data for VaR calculations, stress tests, and sensitivity modeling across multi-asset portfolios, including corporate bonds, sovereign debt, and derivatives.
Built interactive Power BI dashboards to visualize portfolio duration, yield movements, credit ratings, and exposure breakdowns across fixed income strategies and managed investment mandates.
Delivered accurate, auditable reports for internal compliance and regulatory requirements, supporting audits by producing historical extracts and flagging material discrepancies in investment recordkeeping systems.
Authored technical documentation for ingestion workflows, data transformation logic, and lineage tracking used within Aladdin’s audit portal for traceability across front-office and reporting operations.
Participated in model validation sessions measuring the impact of macroeconomic volatility on portfolio NAV, applying historical simulations and backtesting against internal benchmarks.
Developed SQL Server stored procedures and SSIS jobs to refresh high-frequency datasets for intra-day dashboards and pre-market review by trading and strategy teams.
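A minimal sketch of the custodian-versus-fund reconciliation described above, written in Python with pandas; the file names, join key, and compared fields are hypothetical stand-ins for the actual extracts.

import pandas as pd

# Hypothetical daily extracts: custodian file vs. fund accounting system, keyed by trade ID
custodian = pd.read_csv("custodian_trades.csv", parse_dates=["trade_date"])
fund = pd.read_csv("fund_accounting_trades.csv", parse_dates=["trade_date"])

merged = custodian.merge(fund, on="trade_id", how="outer",
                         suffixes=("_cust", "_fund"), indicator=True)

# Trades present on only one side are breaks by definition
one_sided = merged[merged["_merge"] != "both"]

# For matched trades, flag field-level mismatches (small tolerance on monetary amounts)
both = merged[merged["_merge"] == "both"].copy()
field_break = (
    (both["security_id_cust"] != both["security_id_fund"])
    | (both["quantity_cust"] != both["quantity_fund"])
    | ((both["settle_amount_cust"] - both["settle_amount_fund"]).abs() > 0.01)
)

# Consolidated break report handed to the performance and reporting teams
breaks = pd.concat([one_sided, both[field_break]], ignore_index=True)
breaks.to_csv("reconciliation_breaks.csv", index=False)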
CERTIFICATIONS:
AWS Certified Solutions Architect - Professional
EDUCATION:
Master of Science in Information Technology - Belhaven University
Bachelor of Technology in Computer Science - Saveetha University