
Data Engineer - Python, Spark, Snowflake, AWS Expert

Location:
Maryland Heights, MO, 63043
Posted:
April 30, 2026


Resume:

Jayavardhan Besta Data Engineer

USA +1-314-***-**** *****************@*****.*** LinkedIn

Professional Summary

Data Engineer with 3.5 years of experience building and maintaining scalable data pipelines supporting analytics and operational reporting. Skilled in Python, SQL, PySpark, and Spark-based processing for large datasets across cloud data platforms. Experienced with ETL development, workflow orchestration, and data warehousing using Snowflake and AWS services. Proven ability to manage ingestion, transformation, validation, and delivery pipelines handling multi-million-record datasets while supplying analytics teams with reliable, structured data.

Technical Skills

• Programming & Data Processing: Python, SQL, PySpark, Bash/Shell Scripting, Advanced SQL Query Optimization, REST API Integration, JSON Data Processing

• Data Engineering & Pipeline Development: ETL / ELT Pipeline Development, Data Pipeline Architecture, Batch & Incremental Data Processing, Data Ingestion, Data Transformation, Data Integration, Workflow Automation, Pipeline Monitoring

• Big Data Processing: Apache Spark, Databricks, Spark SQL, Distributed Data Processing, Large-Scale Data Processing, Performance Optimization

• Streaming Data Processing: Apache Kafka, Spark Structured Streaming, Real-Time Data Pipelines, Event Streaming

• Cloud Data Engineering: Amazon Web Services (AWS), AWS S3, AWS Glue, AWS Lambda, AWS EMR, Amazon Redshift

• Data Warehousing & Modeling: Snowflake, Amazon Redshift, Data Warehouse Architecture, Dimensional Data Modeling, Star Schema, Data Mart Development

• Workflow Orchestration: Apache Airflow, Workflow Scheduling, Pipeline Dependency Management

• Data Quality & Reliability: Data Validation, Data Quality Monitoring, Data Lineage, Error Handling

• DevOps & Development Tools: Docker, Git, CI/CD Pipelines, Linux

• Analytics & BI Support: Tableau, Power BI, Analytical Data Modeling, Reporting Data Preparation

Professional Experience

Data Engineer Salesforce Jul 2025 – Present USA

• Handle ingestion of product usage and customer activity data from internal APIs, application databases, and event logs, preparing roughly 6–7 million records each month for downstream analytics and operational reporting.

• Maintain PySpark and SQL processing scripts that convert raw platform activity data into structured datasets used by internal product, finance, and support teams reviewing subscription usage and customer engagement behavior.

• Adjusted partition logic and Spark job configuration for weekly datasets exceeding 2 TB, reducing average pipeline processing time by roughly 18% and helping reporting tables refresh earlier in morning reporting cycles (a configuration sketch follows this role's bullets).

• Introduced validation checks comparing upstream source counts with warehouse tables, which helped identify recurring mismatches affecting nearly 3% of records, improving dataset consistency across internal reporting dashboards.

• Coordinate with analytics engineers, product analysts, and infrastructure teams during sprint cycles, reviewing pipeline changes and assisting with troubleshooting when upstream schema updates disrupt scheduled ingestion or transformation workflows.

• Built curated Snowflake tables supporting internal Tableau dashboards used by finance and product teams, helping improve visibility into subscription usage trends and service adoption across several product environments.
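A minimal PySpark sketch of the kind of partition and job-configuration tuning described above; the paths, column names, and setting values are illustrative assumptions, not the production configuration:

from pyspark.sql import SparkSession

# Shuffle partition count and AQE settings are assumed values sized for a
# roughly 2 TB weekly input; a real job would tune these against cluster size.
spark = (
    SparkSession.builder
    .appName("weekly_usage_rollup")
    .config("spark.sql.shuffle.partitions", "800")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

# Hypothetical input path for one week of raw usage events.
events = spark.read.parquet("s3://usage-data/raw/week=2025-W30/")

# Repartition on the aggregation key to avoid skewed shuffles, then write
# partitioned by date so reporting queries can prune unneeded files.
(
    events.repartition(800, "account_id")
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://usage-data/curated/weekly_usage/")
)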

Data Engineer Deloitte Jan 2021 – Dec 2023 India

• Managed ingestion and transformation pipelines consolidating operational and financial records from multiple enterprise systems, processing nearly 4–5 million rows monthly into centralized warehouses supporting analytics and internal reporting teams.

• Wrote SQL and Python scripts preparing standardized datasets used by analysts reviewing transaction activity, operational performance, and regulatory reporting across several client business environments.

• Restructured transformation queries and adjusted warehouse indexing strategies, helping reduce average reporting query runtime by around 15%, allowing analytics teams to generate reports more reliably during peak reporting periods.

• Added validation routines comparing source system exports with warehouse tables, which helped highlight data discrepancies affecting roughly 2–3% of incoming records and improved confidence in monthly reporting outputs (a reconciliation sketch follows this role's bullets).

• Worked closely with BI developers, analysts, and application teams to clarify dataset definitions and resolve upstream schema issues affecting scheduled pipelines and dashboard refresh timelines.

• Supported delivery of curated warehouse tables powering Power BI and Tableau dashboards used by business teams tracking operational performance, transaction volumes, and service utilization trends.
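A minimal sketch of the export-versus-warehouse reconciliation described above; the file path, table, key column, and connection string are illustrative assumptions:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; real warehouse credentials would come from
# a secrets store, not a hard-coded DSN.
engine = create_engine("postgresql://user:***@warehouse-host/reporting")

# Hypothetical monthly source export and the corresponding warehouse slice.
export = pd.read_csv("/data/exports/transactions_2023_11.csv")
loaded = pd.read_sql(
    "SELECT txn_id FROM curated.transactions WHERE report_month = '2023-11'",
    engine,
)

# Flag records present in the source export but missing from the warehouse.
missing = export[~export["txn_id"].isin(loaded["txn_id"])]
pct = len(missing) / max(len(export), 1) * 100
print(f"{len(missing)} records ({pct:.2f}%) exported but not loaded")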

Project

Customer Activity Data Pipeline & Analytics Platform

• Designed a cloud-based pipeline collecting customer activity and application event data from APIs and service logs, processing nearly 2 million records daily and storing structured datasets for analytics and operational reporting teams.

• Developed PySpark and SQL transformations converting raw JSON event data into curated analytical tables, improving dataset consistency and enabling analysts to review customer usage behavior, feature adoption patterns, and service performance trends.

• Implemented scheduled Airflow workflows handling ingestion, transformation, and warehouse loading processes, maintaining stable daily refresh cycles and reducing reporting delays affecting product and operations dashboards (a DAG sketch follows the skills list).

Skills Used: Python, SQL, PySpark, Apache Kafka, Apache Airflow, AWS S3, Snowflake, Data Pipeline Development, Data Modeling, ETL Processing
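A minimal Airflow sketch of the ingest-transform-load dependency chain described above; the DAG id, schedule, and task callables are illustrative stubs, not the original implementation:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_events():
    pass  # stub: pull raw JSON events from APIs and service logs into S3

def transform_events():
    pass  # stub: run PySpark transformations producing curated tables

def load_warehouse():
    pass  # stub: load curated datasets into Snowflake

with DAG(
    dag_id="customer_activity_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ "schedule" argument
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=ingest_events)
    transform = PythonOperator(task_id="transform", python_callable=transform_events)
    load = PythonOperator(task_id="load", python_callable=load_warehouse)

    # Enforce the ingestion -> transformation -> warehouse-load ordering.
    ingest >> transform >> load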

Education

Southeast Missouri State University (SEMO), Jan 2024 – Dec 2025
Master’s in Computer Information Systems (CIS)


