Pavan Goud
Data Engineer
*************@*****.*** | +1-216-***-**** | Remote
PROFESSIONAL SUMMARY
AWS Data Engineer with 6+ years of experience designing and optimizing cloud-native data pipelines, ETL workflows, and data lake/warehouse solutions. Hands-on expertise with AWS services (S3, Glue, Redshift, EMR, Kinesis, Lambda, Athena) and strong skills in SQL, Python, and Scala. Proven ability to deliver scalable, fault-tolerant, high-performance data architectures supporting analytics, machine learning, and operational systems. Experienced in collaborating with data scientists, analysts, and business teams to ensure reliable, secure, and high-quality data delivery.
EDUCATION
Master’s in Computer Information Systems – Rivier University
PROFESSIONAL EXPERIENCE
Capital One (Contract), 05/2023 – Present, Remote
AWS Data Engineer – Cloud Data Pipelines
• Designed and maintained ETL/ELT workflows using AWS Glue, Lambda, and Step Functions for financial data pipelines.
• Built data lake and warehouse solutions leveraging S3, Redshift, and Lake Formation.
• Implemented real-time streaming solutions with Kinesis and MSK to support fraud detection pipelines.
• Optimized SQL queries and Redshift schemas for faster query performance.
• Configured and managed EMR clusters for large-scale PySpark data transformations.
• Automated data quality checks and anomaly detection in Glue workflows.
• Collaborated with data scientists and analysts to prepare model-ready datasets.
• Documented AWS data pipelines and trained staff on pipeline maintenance and monitoring.
Cigna Healthcare (Contract), 05/2022 – 04/2023, Remote
AWS Data Engineer – Healthcare Analytics
• Built HIPAA-compliant data pipelines using AWS Glue, S3, and Redshift for claims analytics.
• Developed streaming ingestion workflows with Kinesis + Lambda to process patient monitoring data in near real-time.
• Created Athena queries to provide ad-hoc analytics access on S3-based datasets.
• Deployed and tuned EMR-based Spark jobs for large-scale data processing.
• Applied data cleansing and standardization techniques to ensure quality and compliance.
• Partnered with healthcare analysts to integrate utility-like outage and claims risk data models.
• Optimized partitioning and indexing strategies in Redshift to cut query costs by 30%.
• Supported end-to-end AWS data pipeline documentation and governance.
VIT Solutions, 05/2020 – 12/2021
ETL Developer
• Designed SQL Server and AWS Glue pipelines to integrate structured and semi-structured data.
• Migrated on-prem legacy data systems into AWS S3 and Redshift environments.
• Developed Python and Scala scripts to automate ETL transformations.
• Applied schema validation and DDL management for cloud data warehouses.
• Integrated pipelines with Power BI dashboards for business reporting.
• Supported data lineage and documentation for governance.
• Assisted in configuring Athena queries for quick reporting on S3 data.
• Monitored pipeline performance and optimized for scalability and fault tolerance.
Savantis Solutions, 06/2019 – 04/2020, Hyderabad
Data Warehouse Developer
• Delivered data warehouse solutions using SQL Server and Informatica, later migrated to AWS.
• Built staging and transformation pipelines with S3 + Redshift integration.
• Supported data modeling (Star Schema, Snowflake Schema) for BI reporting.
• Assisted in configuring AWS Lake Formation for secure data governance.
• Created BI dashboards (Power BI, Tableau) with AWS-based datasets.
• Tuned queries and ETL workflows for optimized refresh cycles.
• Participated in Agile/Scrum sprints, supporting backlog refinement and testing.
• Contributed to cloud migration strategies for analytics workloads.
CORE SKILLS
AWS Cloud Data Engineering — S3, Glue, Redshift, EMR, Athena, Kinesis, MSK, Lambda, Step Functions,
Lake Formation, Snowflake, Oracle.
Programming & Transformation — Python, SQL, Scala, PySpark.
Data Architectures — Data lakes, data warehouses, lakehouse design, schema optimization.
ETL/ELT — Batch & streaming pipelines, data cleansing, enrichment, standardization.
Optimization — Query tuning, partitioning, indexing, resource utilization.
Integration — APIs, IoT, third-party data ingestion, real-time streaming.
Tools — Git/GitHub, Airflow, DBT, Power BI, Tableau.
Domains — Finance, Healthcare.