Vinay Sai Reddy Mandala
********************@*****.*** | 945-***-**** | https://www.linkedin.com/in/vinaymandala99/

Professional Summary
Data Engineer with over 5 years of experience designing and maintaining large-scale ETL pipelines and cloud data architectures across the financial, pharmaceutical, and retail industries. Expert in AWS, Azure, and Big Data technologies (Spark, Kafka, Flink) with a track record of improving processing speeds and reducing infrastructure costs. Skilled at bridging the gap between data and business needs by delivering insightful dashboards and optimized data solutions that support strategic decision-making.

Technical Skills
• Programming Languages: Python, Advanced SQL, PySpark, Shell Scripting
• Cloud Platforms & Tools: AWS (Glue, S3, Firehose, Lambda, Redshift, Athena, Step Functions, EventBridge), Databricks, Microsoft Azure (Data Factory, Data Lake Storage (ADLS), Synapse), SSIS, Alteryx, Apache Airflow, Snowflake
• Big Data & Streaming: Apache Spark, Hadoop, Kafka, Flink, Hive, Databricks
• ETL & Orchestration: Apache Airflow, NiFi, SSIS, Informatica, Docker, Kubernetes
• Visualization: Tableau, Power BI, Excel, Amazon QuickSight
• Databases: MySQL, PostgreSQL, MongoDB, Cassandra, Oracle, SQL Server, Amazon Redshift
• Strengths: Data Modeling (Star/Snowflake schema), ETL Development, Data Warehousing, CI/CD Pipelines, Statistical Analysis, Data Visualization, SQL Query Tuning
• Soft Skills: Leadership, Stakeholder Communication, Technical Design Documentation

Professional Experience
Ameris Bank, Senior Data Engineer Aug 2023 - Present
• Transitioned legacy Redshift and on-prem SQL workloads to a modern Lakehouse architecture using Databricks, AWS Glue, and S3, reducing infrastructure costs.
• Designed and deployed Delta Lake architecture (Bronze, Silver, Gold layers) on AWS S3 using Databricks, improving data quality and reducing query latency by 42%.
• Developed a Silver-layer validation suite in PySpark on Databricks that served as a critical gatekeeper for data quality, ensuring all incoming Personal Banking records met strict GDPR and PII compliance standards.
• Executed end-to-end parity audits to verify that Gold-layer outputs in the Lakehouse matched the legacy Redshift and on-prem SQL results.
• Implemented an AWS-based data pipeline for real-time transaction processing, integrating AWS Glue, Redshift, and S3, improving data processing speed by 40%.
• Orchestrated Apache Airflow ETL workflows with automated validation, reducing manual intervention by 65% and improving operational efficiency.
• Optimized Commercial Banking warehouse in Redshift and implemented partitioning strategies, reducing query execution time by 38%.
• Wrote advanced SQL queries using CTEs, window functions, and subqueries to support complex metric calculations across Redshift and Snowflake.
• Integrated AWS Glue Data Catalog with Databricks for unified data discovery, schema management, and cross-platform access governance.
• Architected and implemented ETL pipelines in Databricks using PySpark, processing over 1 TB of daily transactional data from diverse sources (JSON, Parquet, CSV, API), with advanced partitioning, caching, and adaptive query execution reducing job runtime by 40%.
• Refactored Manufacturing sector schemas by implementing Compound Sort Keys and Distribution Keys, enabling Zone Map Pruning that cut query execution time by 50% for billion-row IoT and asset-lease tables.
• Reduced infrastructure costs by 23% through strategic implementation of serverless AWS technologies while maintaining system performance and reliability.
Environment: AWS (Glue, S3, Redshift, Lambda, Step Functions, EventBridge), Databricks, Apache Airflow, PySpark, Python, Advanced SQL, Delta Lake, Snowflake, AWS Glue Data Catalog.

Novartis, Data Engineer Dec 2020 - Aug 2022
• Unified clinical research data from global trial sites by developing end-to-end ETL pipelines in Apache Airflow that loaded into an Azure environment.
• Acted as a technical bridge for Data Scientists, ensuring that feature engineering for machine learning models was backed by validated data streams.
• Optimized complex SQL queries and stored procedures in PostgreSQL, improving query performance for clinical research databases.
• Led the migration of on-premises clinical databases to Azure Cloud infrastructure, with 99.9% uptime and improved data availability.
• Implemented automated data quality frameworks with robust validation and cleansing processes, reducing data inconsistencies by 75%.
• Designed interactive Power BI dashboards that translated complex trial metrics into real-time insights for senior management and research leads.
• Designed Snowflake / Azure Synapse data warehouse solutions using dimensional modeling techniques including star schema and SCD Type 2 for analytics and reporting use cases.
• Designed and deployed scalable Azure cloud-based data lake architectures, accommodating 5TB+ of structured and unstructured clinical data while ensuring cost-effectiveness.
• Designed and implemented real-time monitoring systems for pipeline health and ETL job tracking, reducing mean time to resolution for data pipeline issues by 65%.
Environment: Azure Data Factory, Azure Data Lake Storage (ADLS), Azure Synapse, PostgreSQL, Databricks, Apache Airflow, Power BI, Python, SQL, ETL, Data Quality Frameworks.

IKEA, Data Engineer Jan 2019 - Dec 2020
• Supported the high-volume ingestion of global supply chain data by developing SQL-based pipelines that processed over 2 million daily transactions from retail outlets.
• Optimized SQL databases for high-volume analytics, supporting peak traffic of 2M+ daily transactions while maintaining query performance.
• Implemented data validation and quality check frameworks within ETL processes, reducing data inconsistencies by 46% and improving decision-making.
• Created interactive Tableau and Power BI dashboards for sales analytics, making complex data accessible to business users and increasing data-informed decision-making.
• Developed automated data validation scripts to identify and flag inconsistencies in inventory and sales records before they reached the reporting layer, improving overall data reliability.
• Created automated alerting systems for data pipeline failures, reducing mean time to detection by 66% and ensuring minimal disruption to critical business operations.
• Implemented a GDPR-compliant data governance framework, ensuring proper handling of customer data across all systems and maintaining regulatory compliance.
Environment: AWS (S3, Glue, Athena), Tableau, Power BI, SQL, SQL Server/Oracle, Informatica, ETL Validation, Python, Shell Scripting.
Education
Master of Science in Data Science GPA 3.9/4.0
The University of Texas at Arlington – Arlington, TX
• Honors in Data Science and Data Mining
Bachelor of Technology in Computer Science & Engineering GPA 8.3/10.0
Jawaharlal Nehru Technological University – Kakinada, India
• Honors in Data Analytics and IoT
Certifications
• Google Data Analytics Professional Certificate
• AWS Certified Data Engineer