Senior Data Engineer

Location: Tampa, FL
Posted: October 15, 2025

Venkata Gupta Penugonda

Data Engineer

206-***-**** | *********************@*****.*** | LinkedIn | Portfolio

Professional Summary

Senior Data Engineer with 5+ years of experience designing and modernizing enterprise data platforms across Azure, AWS, and GCP. Proven success leading large-scale cloud migrations, developing real-time data pipelines, and building secure, high-performance data lakehouses supporting finance, insurance, and healthcare use cases. Skilled in Spark, Databricks, and SQL, with strong expertise in CI/CD automation, data governance, and regulatory compliance (HIPAA, SOX). Recognized for reducing fraud detection latency by 90%+ and accelerating ETL performance by 45%, enabling data-driven decision-making for Fortune 500 organizations.

Education

University of South Florida | Florida, USA

Master's in Computer Science

Experience

Bank of America | Oct 2023 - Present

Senior Data Engineer | Cleveland, Ohio

• Enterprise Cloud Modernization for Banking Data Lake – Led cloud modernization by migrating 50+ legacy batch workflows into a unified Azure data lakehouse, centralizing 20TB+ of financial/transactional data and improving accessibility for 500+ risk and compliance users.

• Engineered 30+ Azure Data Factory (ADF) pipelines to ingest data from 15 core banking systems into ADLS and Azure SQL, cutting ingestion time by 40% and ensuring reliable daily reporting.

• Built modular Delta Lake architecture in Databricks supporting ACID transactions and schema enforcement, enabling auditors and analysts to run historical financial queries 2x faster.

• Boosted ETL performance by 45% by refactoring SSIS workflows into parameterized ADF pipelines integrated with Key Vault, reducing overnight batch windows and improving SLA compliance.

• Developed PySpark jobs in Databricks to cleanse and aggregate 500M+ monthly transactions, enabling fraud analytics teams to reduce false positives by 20% (a sketch of this pattern appears after this role's bullets).

• Enabled near real-time processing of streaming data using Azure Event Hubs, Stream Analytics, and Synapse Analytics, reducing fraud detection latency from hours to under 5 minutes.

• Automated CI/CD workflows in Azure DevOps with ARM templates, cutting deployment time from 2 days to under 4 hours across dev, UAT, and prod environments.

• Implemented data lineage and classification in Microsoft Purview, achieving full compliance in SOX audits and reducing manual reporting effort by 30%.

• Optimized T-SQL and Spark SQL queries for regulatory reporting, reducing execution time by 30% and enabling timely submission of compliance reports.

• Partnered with compliance, cybersecurity, and data science teams in Agile sprints to deliver 10+ scalable data products supporting the bank's key cloud modernization initiative.
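
Below is a minimal PySpark sketch of the cleanse-and-aggregate pattern from the Databricks bullet above. It assumes a Spark session with Delta Lake support (e.g., Databricks); the table paths and column names (txn_id, txn_ts, amount, account_id) are hypothetical placeholders, not the actual Bank of America pipeline.

```python
# Hypothetical sketch of a monthly transaction aggregation job.
# Assumes a Spark session with Delta Lake support (e.g., Databricks).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-monthly-aggregates").getOrCreate()

txns = (
    spark.read.format("delta").load("/mnt/lake/raw/transactions")  # hypothetical path
    .dropDuplicates(["txn_id"])               # drop replayed/duplicate events
    .filter(F.col("amount").isNotNull())      # discard malformed rows
    .withColumn("month", F.date_trunc("month", F.col("txn_ts")))
)

monthly = txns.groupBy("account_id", "month").agg(
    F.count("*").alias("txn_count"),
    F.sum("amount").alias("total_amount"),
    F.avg("amount").alias("avg_amount"),
)

# Persist as a Delta table so downstream fraud analytics can query it.
monthly.write.format("delta").mode("overwrite").save("/mnt/lake/curated/txn_monthly")
```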

Root Insurance | Nov 2022 - Sep 2023

AWS Data Engineer | Tampa, Florida

• Real-Time Telematics and Claims Data Platform – Engineered an AWS-based solution to process 2M+ vehicle telemetry and claims records daily, enabling fraud detection, underwriting, and personalized pricing for insurance policies.

• Designed and implemented scalable ETL pipelines using AWS Glue, ingesting and transforming JSON/CSV/Parquet data from S3, API Gateway, and third-party telematics sources, improving ingestion efficiency by 35%.

• Built near real-time ingestion and processing flows with Kinesis Data Streams, Lambda, and Glue Streaming, reducing claim fraud detection latency from hours to under 10 minutes.

• Optimized Athena and Redshift Spectrum queries for analytical workloads, improving performance by 35% and accelerating access to behavioral risk models for data science teams.

• Implemented robust orchestration with Apache Airflow (MWAA), managing 80+ DAGs across dev, QA, and prod for data quality, batch updates, and ML feature generation, reducing operational failures by 20% (see the DAG sketch after this role's bullets).

• Developed infrastructure-as-code using Terraform and CloudFormation, standardizing provisioning of IAM roles, S3 buckets, and KMS-encrypted resources across environments.

• Established governance and access control with Lake Formation and Glue Catalog, ensuring HIPAA and PCI compliance and reducing access violations by 25%.

• Monitored and debugged pipelines with CloudWatch, X-Ray, and CloudTrail, cutting SLA breaches by 25% and improving platform observability.

• Partnered with actuarial and data science teams to design high-availability architecture supporting ML-based risk scoring models for 2M+ customers.

• Contributed to Agile sprints by leading story refinement and sprint retrospectives, ensuring incremental delivery of critical data platform features.
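
As a sketch of the MWAA orchestration pattern referenced above, here is a minimal Airflow 2 DAG with a single data-quality task. The DAG id, schedule, and check logic are assumptions for illustration; the production platform ran 80+ DAGs with far richer task graphs.

```python
# Hypothetical daily data-quality DAG (Airflow 2.4+ style `schedule` argument).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_quality_checks(**context):
    # Placeholder: the real task would validate row counts, null rates,
    # and schema drift on the day's ingested claims data.
    print("running data-quality checks for", context["ds"])

with DAG(
    dag_id="claims_quality_checks",   # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="quality_checks",
        python_callable=run_quality_checks,
    )
```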

Novo Nordisk | Jun 2021 - Jul 2022

Data Engineer | Hyderabad, India

• Clinical Trial Analytics and Pharma Data Integration – Developed a hybrid data engineering solution supporting global trial data processing, patient adherence tracking, and regulatory submissions.

• Built automated batch pipelines using Apache Airflow and Python, integrating EHR and clinical trial data from flat files, APIs, and relational databases into a centralized data warehouse.

• Leveraged Google Cloud Storage (GCS), Cloud Composer, and BigQuery for secure storage and querying of anonymized patient datasets to support pharmacovigilance analytics (see the load sketch after this role's bullets).

• Developed ETL jobs using Informatica and custom PySpark scripts for transforming trial protocol, adverse event, and medication data into standardized formats (CDISC, SDTM).

• Enabled self-service reporting and dashboarding for clinical stakeholders by integrating curated datasets into Looker and Power BI, improving data accessibility and decision-making.

• Worked closely with GCP security teams to enforce access policies using IAM roles, service accounts, and VPC-SC configurations for compliant healthcare data handling.

• Deployed ML-ready datasets to BigQuery ML for running patient dropout prediction models in collaboration with the data science team.

• Implemented Data Quality Frameworks for consistency checks across trial phases using dynamic rule engines and validation rules stored in metadata tables.

• Conducted daily Agile ceremonies and worked alongside clinical data managers, statisticians, and GCP architects to ensure timely delivery of analytics-ready datasets.
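
As an illustration of the batch loads into BigQuery mentioned above, here is a short sketch using the google-cloud-bigquery client. The project, dataset, table, and file names are hypothetical, and the real pipelines were orchestrated through Cloud Composer rather than run ad hoc.

```python
# Hypothetical load of a curated CSV extract into a BigQuery table.
from google.cloud import bigquery

client = bigquery.Client(project="trial-analytics")  # hypothetical project

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,          # skip the header row
    autodetect=True,              # infer the schema for this sketch
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

with open("adherence_extract.csv", "rb") as f:  # hypothetical extract file
    load_job = client.load_table_from_file(
        f, "trial-analytics.curated.patient_adherence", job_config=job_config
    )
load_job.result()  # block until the load job completes
```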

Bayer | Mar 2020 - May 2021

Junior Data Engineer | Hyderabad, India

• Crop Analytics & Pharmaceutical Supply Chain Data Platform – Supported the development and maintenance of batch ETL processes to unify data from research labs, field trials, and production systems.

• Assisted in building ETL pipelines using Talend and Python, enabling the extraction and integration of crop genetics, chemical trial data, and logistics datasets from multiple silos into enterprise warehouses.

• Supported ingestion of clinical product data from SAP and LIMS systems into SQL Server and PostgreSQL databases using data quality and mapping rules for consistency.

• Developed and maintained scheduled jobs via Apache NiFi and Talend JobServer to automate file transfers and validation processes across research locations.

• Created basic data validation scripts in Python to perform QA checks on raw datasets for trial duration, batch formulation, and shipment tracking data (see the sketch after this role's bullets).

• Worked closely with domain SMEs and senior engineers to support the migration of batch workflows into a centralized AWS S3-based archive for historical analysis.

• Built internal dashboards using Power BI to provide visibility into trial completion timelines, active SKUs, and regional shipment volumes.

• Documented pipeline logic, metadata flows, and SOPs in Confluence to support reproducibility and audit readiness for compliance purposes.

• Participated in daily standups and weekly sprint reviews under the guidance of tech leads and product owners to deliver backlog items and enhancements.
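
The QA checks mentioned above might look like the following minimal pandas sketch; the file name, column names, and rules are hypothetical stand-ins for the real validation logic.

```python
# Hypothetical QA checks on a raw shipment extract.
import pandas as pd

df = pd.read_csv("shipments_raw.csv")  # hypothetical raw extract

issues = []
if df["batch_id"].isna().any():
    issues.append("missing batch_id values")
if (df["trial_duration_days"] <= 0).any():
    issues.append("non-positive trial durations")
if df.duplicated(subset=["shipment_id"]).any():
    issues.append("duplicate shipment_id rows")

if issues:
    raise ValueError("QA checks failed: " + "; ".join(issues))
print(f"QA checks passed for {len(df)} rows")
```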

Technical Skills

Languages: Python, SQL, Java, Shell Scripting, HTML/CSS, JavaScript

Cloud Platforms: Microsoft Azure (ADF, ADLS, Synapse, Databricks, Event Hubs, Purview), AWS (Glue, Redshift, Lambda, Kinesis, S3, CloudWatch, IAM), GCP (BigQuery, GCS, Cloud Composer, BigQuery ML)

ETL/Orchestration Tools: Azure Data Factory, AWS Glue, Apache Airflow, Talend, Informatica, Apache NiFi

Big Data & Processing: Apache Spark, PySpark, Databricks, Delta Lake, Hadoop

DevOps & CI/CD: Azure DevOps, GitHub, Jenkins, Terraform, CloudFormation, Docker

Data Warehousing: Azure Synapse Analytics, Amazon Redshift, Google BigQuery, SQL Server, PostgreSQL

Data Visualization: Power BI, Looker, Tableau

Data Governance: Microsoft Purview, AWS Lake Formation, Glue Catalog, IAM, HIPAA, SOX

Developer Tools: Visual Studio Code, Eclipse, Jupyter Notebook, Postman, Confluence

Operating Systems: Linux, Windows


