VARAS VISHWANADHULA
Kansas, USA (Open to Relocate) | 913-***-**** | ******.******@*****.*** | LinkedIn
SUMMARY
• Data Engineer with 5+ years of expertise in designing, developing, and maintaining large-scale data pipelines and cloud-based data platforms across the finance, healthcare, and asset management domains.
• Skilled in building end-to-end ETL/ELT workflows using Python, PySpark, and Apache Airflow to process structured and unstructured data from multiple sources into centralized data lakes and warehouses.
• Proven ability to implement real-time streaming solutions using Apache Kafka, AWS Lambda, and Spark Streaming to support continuous data ingestion, monitoring, and analytics for enterprise applications.
• Designed and optimized analytical data models, dimensional schemas, and Snowflake/Redshift-based warehouses to support reporting, BI dashboards, and predictive analytics.
• Expertise in data transformation, cleansing, validation, and anomaly detection frameworks to ensure data quality, reliability, and compliance with industry standards.
• Developed interactive dashboards and reporting solutions using Tableau, Power BI, and advanced SQL to provide actionable insights and support data-driven decision-making for stakeholders.
• Experienced in managing cloud-based infrastructure on AWS and Azure, including Data Lakes, Data Factory, Glue, Redshift, Synapse, and Databricks, ensuring scalability, performance, and cost optimization.
• Collaborated with cross-functional teams including data scientists, analysts, and business stakeholders to deliver analytics-ready datasets, enable predictive modeling, and drive operational efficiency in enterprise environments.
SKILLS
Programming Languages: Python, SQL, Java
Packages: PySpark, NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn
Data Warehousing & Databases: Snowflake, Amazon Redshift, Google BigQuery, Microsoft SQL Server, Oracle, MySQL, PostgreSQL, MongoDB, Cassandra, DynamoDB
Big Data Tools: Apache Spark (Batch & Streaming), Hadoop (HDFS, MapReduce), Hive, Pig, Kafka, Flink, Airflow, Databricks
ETL & Data Integration: Talend, Apache NiFi, Informatica PowerCenter, SSIS, Fivetran, Matillion, Azure Data Factory
Cloud Platforms: Azure (Data Lake, Data Factory, Synapse), AWS (S3, EMR, Lambda, Glue, Redshift, RDS)
Data Modeling & Architecture: Star & Snowflake Schema, OLAP/OLTP, Dimensional Modeling, Data Lake & Lakehouse Architecture, Metadata Management
BI & Data Visualization: Tableau, Power BI, Looker, Qlik Sense, Data Studio, Advanced SQL Dashboards
DevOps & CI/CD: Git/GitHub, Jenkins, Docker, Kubernetes, Terraform, Airflow DAG Orchestration
Other Tools & Concepts: REST APIs, JSON, XML, JSON Schema, Message Queues, Real-time Streaming, Data Governance, Data Quality & Lineage, GDPR & HIPAA Compliance
Soft Skills & Leadership: Stakeholder Management, Cross-functional Collaboration, Agile & Scrum, Problem-Solving, Mentorship, Communication, Analytical Thinking, Strategic Planning, Project Management, Decision-Making
WORK EXPERIENCE
American Express Kansas, USA
Data Engineer Jan 2024 – Present
• Designed and implemented a cloud-based financial data lake using Azure Data Lake and Snowflake, consolidating payment, billing, and transaction data from multiple legacy systems into a unified, analytics-ready platform used across Finance and Risk divisions.
• Developed parameterized data pipelines in Azure Data Factory to automate ingestion from Oracle, S3, and REST APIs, handling 12 TB+ of data daily and reducing manual data refresh time by 80%.
• Engineered PySpark data transformation workflows in Databricks to cleanse, standardize, and aggregate financial transactions, applying business rules for reconciliation, revenue classification, and exception handling, which improved data reliability for downstream analytics.
• Built and optimized financial fact-dimension data models in Snowflake, implementing clustering, partitioning, and result caching strategies that reduced analytical query latency by 40% and compute costs by 20%.
• Automated card settlement and merchant reconciliation processes using PySpark and SQL stored procedures, achieving 99.7% matching accuracy and reducing manual reconciliation time by 70%, ensuring audit-ready financial reporting.
• Integrated Kafka and Azure Event Hubs to stream high-volume transaction data for fraud monitoring, achieving near-real-time ingestion with under 3-minute latency, enabling faster anomaly detection.
• Established a data validation framework using dbt and Great Expectations, embedding 60+ data quality checks for completeness, accuracy, and reconciliation, which reduced downstream data issues by 35%.
• Created interactive Power BI dashboards connected to Snowflake for daily revenue and settlement variance reporting, enabling leadership to identify financial anomalies two days earlier and improving decision turnaround time by 40%.
• Deployed CI/CD automation in Jenkins with Git-based version control to streamline code promotion for ADF, dbt, and Databricks workflows, decreasing deployment errors by 30% and improving release consistency.
• Monitored pipeline performance using Azure Monitor and ServiceNow alerts, implementing workload optimization and auto-scaling policies in Databricks and Snowflake that lowered compute expenses by 25% while maintaining SLA compliance.
Deloitte India
Data Engineer Aug 2020 – Dec 2022
• Designed and deployed Python and PySpark ETL pipelines to ingest and process over 5 TB of monthly asset, financial, and market data into Azure Data Lake, improving data accessibility and reliability by 40%.
• Established Data Governance frameworks with Azure Purview, managing metadata, data lineage, and access controls for 10,000+ asset records, ensuring compliance with organizational and regulatory standards and improving overall data quality.
• Developed automated data validation and anomaly detection workflows in PySpark and Apache Airflow, reducing production data errors by 30% and enabling accurate, near real-time monitoring of asset and market data.
• Optimized Snowflake data warehouse schemas using clustering and partitioning, reducing complex query runtimes by 50–60% and enabling faster reporting and analysis for historical and real-time asset performance metrics.
• Integrated ERP systems and market APIs using Python and Spark Streaming, consolidating real-time data into Azure Data Lake, reducing manual aggregation by 70% and improving predictive modeling.
• Collaborated with Data Scientists to design feature-rich datasets for asset risk and performance models, delivering clean, structured, analytics-ready data that improved predictive model accuracy by 20%.
• Built automated dashboards and reporting in Power BI, visualizing KPIs such as asset performance, risk exposure, and compliance metrics in real time, reducing manual reporting effort by 75% and enabling faster executive decision-making.
• Managed cloud infrastructure and optimized compute and storage using Azure Data Lake, Azure Data Factory, and Azure Synapse Analytics, supporting multi-terabyte datasets for real-time analytics while reducing storage costs by 25% annually.
• Performed performance tuning on Spark ETL jobs processing millions of asset records daily, achieving a 30–40% reduction in processing time, which improved SLA adherence for timely business insights.
• Mentored 5 junior data engineers on Spark optimization, Snowflake best practices, Azure Data Lake management, and ETL pipeline design, accelerating feature deployment by 50% and strengthening team capability for ongoing project delivery.
Deloitte India
Data Engineer Associate Jun 2019 – Jul 2020
• Developed real-time data pipelines using Apache Kafka and AWS Lambda to ingest over 400,000 patient vitals and home care records daily from 20+ district hospitals into a centralized AWS S3 data lake, enabling timely monitoring and intervention.
• Automated ETL workflows with Python, orchestrated via Apache Airflow, to standardize and process heterogeneous data from hospital management systems and mobile applications into AWS Redshift, reducing manual data consolidation efforts by 70%.
• Transformed and validated 20 GB of daily health data using AWS Glue, removing duplicates and handling missing values to support accurate patient risk scoring and early intervention planning.
• Designed and implemented analytical data models in AWS Redshift to track patient recovery trends, hospitalization probabilities, and district-level health metrics, facilitating data-driven decision-making for district administrators.
• Created interactive dashboards in Tableau to visualize patient vitals, home care status, and hospital resources in near real-time, reducing reporting time from 8 hours to under 30 minutes and enhancing administrative efficiency.
• Built Python and SQL-based data validation frameworks to detect anomalies, ensure data quality, and maintain compliance with healthcare data standards, achieving over 95% reliability for operational and administrative use.
EDUCATION
Master of Science in Computer Science - University of Central Missouri, Warrensburg, Missouri, USA
Bachelor of Technology in Computer Science and Engineering - Jawaharlal Nehru Technological University Hyderabad, India
CERTIFICATIONS
• Snowflake SnowPro Core
• Snowflake SnowPro Associate
• Databricks Data Engineer Associate