
Data Engineer

Location:
Edwardsville, IL, 62025
Salary:
80000
Posted:
October 15, 2025

Vyshnavi Gunda

Data Engineer

+1-618-***-**** | ********.*@*********.*** | LinkedIn | Open to Relocation

SUMMARY

Data Engineer with 4+ years of hands-on experience using Spark, Airflow, Python, SQL, Snowflake, AWS, Azure, Power BI, dbt (data build tool), and PostgreSQL to architect and implement high-performance data pipelines.

Expert in building Spark applications with Spark SQL in Databricks, extracting, transforming, and aggregating data from multiple file formats to uncover insights into customer usage patterns.

Experienced in using Apache Airflow to orchestrate data pipelines, including extraction, transformation, and loading (ETL) processes.

Proficient in data modeling, data migration, data cleansing, and ELT processes, with a solid understanding of RDBMS (SQL Server, MySQL) and NoSQL technologies for designing solutions across diverse data needs; skilled in modern Lakehouse technologies such as Delta Lake, Apache Iceberg, and Hudi.

EDUCATION

Master's in Computer Science

Southern Illinois University Edwardsville Jan 2023 – Dec 2024

TECHNICAL SKILLS

Programming Languages & IDEs: Python, R, SQL, PyCharm, Jupyter Notebook, VS Code

Big Data Ecosystem: Hadoop, Hive, Apache Kafka, Apache Spark, Apache Flink, Databricks, Project Nessie

Cloud & Container Technologies: AWS (EC2, S3, QuickSight, Glue, Athena, AWS Data Pipeline, Redshift, DynamoDB), Azure (Azure Data Factory, Databricks, Azure Synapse, Azure Data Lake Storage Gen2), Docker

Visualizations: Tableau, Power BI, Excel

Packages & Data Processing: NumPy, Pandas, Matplotlib, Seaborn, PySpark, dbt (data build tool), Apache Airflow

Version Control & Databases: GitHub, Git, MySQL, PostgreSQL, MongoDB, Azure DevOps, Snowflake, CI/CD

WORK EXPERIENCE

Morgan Stanley, IL Jan 2024 – Present

Data Engineer

Developed and maintained various DBT models and macros, focusing on incremental model design and modularization, resulting in a 40% increase in data pipeline efficiency and a 25% reduction in pipeline execution time.
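For illustration only, a minimal sketch of driving a dbt incremental model run from Python via dbt-core's programmatic entry point (available in dbt-core 1.5+); the model name is hypothetical, not the actual project code:

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Programmatic equivalent of `dbt run --select orders_incremental`;
# "orders_incremental" stands in for a hypothetical incremental model.
dbt = dbtRunner()
res: dbtRunnerResult = dbt.invoke(["run", "--select", "orders_incremental"])
print(res.success)
```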

Established and deployed a high-performing ETL pipeline in Databricks to process terabytes of data daily, enabling efficient data integration and transformation for the data warehouse.
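As a hedged sketch of the kind of Databricks ETL pipeline described above (the path, table, and column names are assumptions, not the actual pipeline):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-etl").getOrCreate()

raw = spark.read.json("s3://raw-bucket/events/")              # extract
clean = (raw.dropDuplicates(["event_id"])                     # transform
            .withColumn("event_date", F.to_date("event_ts")))
(clean.write.format("delta")                                  # load to Delta
      .mode("append")
      .partitionBy("event_date")
      .saveAsTable("analytics.events"))
```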

Implemented serverless data processing with AWS Lambda and AWS Kinesis, reducing operational overhead and improving cost efficiency by 50%.
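A minimal sketch of a Lambda handler for a Kinesis trigger, assuming the standard event shape AWS delivers (records arrive base64-encoded); the downstream write is elided:

```python
import base64
import json

def handler(event, context):
    # Kinesis delivers records base64-encoded under event["Records"].
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        doc = json.loads(payload)
        print(doc)  # a downstream write (e.g., S3 or DynamoDB) would go here
    # Empty list = no partial batch failures (applies when
    # ReportBatchItemFailures is enabled on the event source mapping).
    return {"batchItemFailures": []}
```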

Optimized Spark job execution time and processing efficiency by 20% through the effective configuration of Spark executor memory and tuning with PySpark.
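Executor memory and parallelism tuning of the sort mentioned above is typically set when building the SparkSession; the values below are placeholders, not the production configuration:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tuned-job")
         .config("spark.executor.memory", "8g")          # per-executor heap
         .config("spark.executor.cores", "4")            # tasks per executor
         .config("spark.sql.shuffle.partitions", "200")  # shuffle parallelism
         .getOrCreate())
```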

Designed and implemented serverless data pipelines using AWS Lambda to process real-time data streams, achieving a 10% reduction in processing latency compared to traditional batch processing methods.

Tata Consultancy Services, India Oct 2021 – Dec 2022

Data Engineer

Built data integration pipelines using Informatica PowerCenter, automating data movement and reducing manual effort.

Implemented checkpointing and state management strategies in Flink to ensure data integrity and recovery from system failures during real-time processing.
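A minimal PyFlink sketch of the checkpointing setup described above; the interval and pause values are illustrative, not the job's actual settings:

```python
from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode

env = StreamExecutionEnvironment.get_execution_environment()
# Snapshot operator state every 60s with exactly-once guarantees so the
# job can recover consistent state after a failure.
env.enable_checkpointing(60_000, CheckpointingMode.EXACTLY_ONCE)
env.get_checkpoint_config().set_min_pause_between_checkpoints(30_000)
```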

Developed efficient ETL/ELT processes using Azure Data Factory, optimizing data flow and performance to achieve faster data processing, and improved data quality by 30%.

Created Airflow DAGs (Directed Acyclic Graphs) to schedule and monitor data processing tasks for various applications, providing a centralized view for data pipeline orchestration.
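A minimal sketch of an Airflow DAG of the kind described above (Airflow 2.x API; the DAG id, schedule, and task bodies are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull data from the source system

def load():
    ...  # write transformed data to the warehouse

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load
```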

Configured Kafka topics and partitions, achieving a 20% increase in message throughput to meet the data ingestion needs of downstream applications.
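As an illustrative sketch (the broker address, topic name, and counts are assumptions), topic and partition configuration of this kind can be done with the kafka-python admin client; more partitions raise the parallelism available to downstream consumer groups:

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
# 12 partitions let up to 12 consumers in one group read in parallel.
admin.create_topics([
    NewTopic(name="ingest.events", num_partitions=12, replication_factor=3)
])
```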

Accenture, India May 2020 – Sep 2021

Data Engineer

Generated and visualized 10+ Power BI dashboards on top of Synapse tables for business reporting and analysis.

Leveraged PySpark for large-scale data processing and analytics tasks within the Apache Spark ecosystem and utilized DataFrames and SQL functionalities for efficient data manipulation.
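For illustration, the same filter-and-aggregate can be expressed through both the DataFrame API and Spark SQL mentioned above (the path and column names are hypothetical):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("df-demo").getOrCreate()
df = spark.read.parquet("s3://data/transactions/")

# DataFrame API: filter, then aggregate per region.
by_region = (df.filter(F.col("amount") > 0)
               .groupBy("region")
               .agg(F.sum("amount").alias("total")))

# Equivalent Spark SQL over a temp view.
df.createOrReplaceTempView("transactions")
by_region_sql = spark.sql(
    "SELECT region, SUM(amount) AS total "
    "FROM transactions WHERE amount > 0 GROUP BY region"
)
```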

Optimized Snowflake schema design, achieving a 15% reduction in storage costs while maintaining efficient data retrieval for complex queries.
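A hedged sketch of one schema-level lever behind gains like those above: adding a clustering key via the Snowflake Python connector so frequently filtered queries scan fewer micro-partitions (the connection parameters and table name are placeholders):

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="compute_wh", database="analytics", schema="public",
)
# Cluster on the columns most queries filter by so Snowflake prunes
# micro-partitions instead of scanning the whole table.
conn.cursor().execute("ALTER TABLE orders CLUSTER BY (order_date, region)")
```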

Improved ETL workflows through rigorous testing and debugging of SQL scripts and Python code, yielding a 50% increase in data processing efficiency and ensuring seamless integration with downstream systems.
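A minimal sketch of the kind of pytest-style unit test implied above; the transformation is a hypothetical stand-in, not the actual pipeline code:

```python
import pandas as pd

def normalize_emails(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical transformation under test: trim and lowercase emails.
    out = df.copy()
    out["email"] = out["email"].str.strip().str.lower()
    return out

def test_normalize_emails():
    df = pd.DataFrame({"email": ["  Foo@Bar.COM "]})
    assert normalize_emails(df)["email"].iloc[0] == "foo@bar.com"
```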

Developed and optimized Apache Spark jobs for large-scale data processing tasks, including filtering, aggregation, and transformations using Spark SQL, DataFrames, and RDDs.


