Vishnu Varma
PH: 863-***-**** **************@*****.*** www.linkedin.com/in/vishnu-varma-3537a820a

Professional Summary
5+ years of experience in designing, developing, and managing complex data pipelines and infrastructure.
Skilled in ETL/ELT, data modeling, data wrangling, and enrichment using Hadoop, Spark, PySpark, SQL, Scala, Python, Airflow, Azure, AWS, Snowflake, Databricks, Tableau, and Power BI. Proven ability in real-time processing, CI/CD automation, and dashboard/reporting delivery for enterprise-scale solutions.
Built Spark applications in Azure Databricks using Spark SQL for ETL across diverse file formats, and optimized queries for performance.
Developed PySpark pipelines in Spark Streaming for large-scale processing from Kafka, S3, and Kinesis (a minimal streaming sketch follows this summary).
Tuned Snowflake queries using profiling, execution plans, and rewrites for faster performance.
Integrated Hadoop, PySpark, HBase, MongoDB, and Hive for big data analytics.
Automated ETL using Python with SQL integration and data quality validation.
Experience in Star/Snowflake schema design, slowly changing dimensions (SCDs), and fact/dimension modeling using Erwin.
Migrated SQL workloads to Azure SQL DB, Synapse, and Data Lake via ADF; also migrated on-prem SQL to AWS Redshift.
Worked on CI/CD workflows using Jenkins on Kubernetes; leveraged Git, Docker, and Ansible.
Delivered interactive dashboards in Tableau, Power BI, and QuickSight using structured/semi-structured data.
Implemented data governance and security best practices including RBAC, encryption, and compliance with HIPAA/GDPR.
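A minimal sketch of the Kafka-to-storage streaming pattern referenced above, using PySpark Structured Streaming. The broker address, topic, event schema, and S3 paths are illustrative placeholders rather than values from any project below, and running it requires the spark-sql-kafka package on the classpath.

```python
# Minimal PySpark Structured Streaming job: consume JSON events from Kafka
# and land them as Parquet on S3. All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Assumed event shape; the real schema would come from the upstream producer.
event_schema = StructType([
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
parsed = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

(
    parsed.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/landing/")            # placeholder sink
    .option("checkpointLocation", "s3a://example-bucket/chk/")  # required for recovery
    .start()
    .awaitTermination()
)
```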
Education Details
Master's in Data Analytics, Indiana Wesleyan University, Marion, IN

Skills Summary
Languages: Python, SQL, Scala, PySpark
Big Data: Hadoop, Spark, HBase, Kafka, Pig, Cassandra, MongoDB, Snowflake, Airflow
Cloud Platforms: AWS (S3, Redshift, Lambda, Glue, Kinesis, CloudFormation, DynamoDB, CloudTrail), Azure (ADF, Synapse, Databricks, SQL DB, Data Lake, DevOps), GCP (BigQuery, Dataflow)
ETL & Streaming: Glue, Airflow, DataStage, Kafka, Spark Streaming, Azure Stream Analytics, dbt, Fivetran, Informatica, Talend
DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git, GitHub Actions, Terraform, Ansible, IaC practices
Visualization: Tableau, Power BI, QuickSight
Data Modeling & Governance: Dimensional Modeling, Star/Snowflake Schema, Erwin, Data Validation (Great Expectations), RBAC, Encryption, HIPAA, GDPR
Methodologies: Agile, Scrum, CI/CD
Professional Experience
BNY Mellon, US Oct 2023 – Present
Data Engineer
Designing and managing enterprise-scale data lake and real-time analytics systems for global financial data.
Developed Python-Spark apps to process data from RDBMS and streaming sources (Kafka, Kinesis)
Configured Snowpipe to load real-time data from S3 into Snowflake with <5 min latency (see the Snowpipe sketch after this role's environment list)
Built scalable Glue ETL pipelines to ingest, transform, and load high-volume structured/unstructured data into Redshift
Automated AWS Glue ETL jobs ingesting 50 M+ records/hour into Redshift, slashing manual intervention by 80%
Used Spark Streaming APIs for real-time processing; stored results in DynamoDB and Snowflake
Integrated CodeStar + CodeCommit for version control; automated deployment with Jenkins and Ansible
Enabled micro-batching to ingest millions of files from S3 staging into Snowflake
Built Python scripts to process CSV, JSON, and Parquet from S3 and store in DynamoDB/Snowflake
Used CloudWatch/CloudTrail for monitoring ETL jobs and user activity
Built Alteryx workflows and Tableau dashboards for regulatory and risk reporting
Tuned Redshift performance with sort/dist keys and WLM queues, boosting query efficiency by 40%
Ensured audit-ready pipelines by building custom Python-based data quality checks (see the quality-check sketch below)
Environment: AWS (S3, Redshift, EMR, Lambda, Glue, DynamoDB, Kinesis, CloudFormation, SageMaker), Snowflake, Python, PySpark, Kafka, Jenkins, Tableau, Alteryx, SQL, HDFS, Hive, Pig, RDBMS, Teradata
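A hedged sketch of the Snowpipe setup referenced above: the pipe itself is defined in Snowflake SQL, issued here through the snowflake-connector-python driver. The warehouse, database, schema, stage, table, and pipe names are all hypothetical.

```python
# Defining an auto-ingest Snowpipe over an S3 external stage via the
# snowflake-connector-python driver. Every identifier here is hypothetical.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="LOAD_WH",   # placeholder warehouse
    database="RAW",        # placeholder database
    schema="LANDING",      # placeholder schema
)

# AUTO_INGEST = TRUE lets S3 event notifications trigger the COPY,
# which is how sub-5-minute load latency is typically achieved.
ddl = """
CREATE PIPE IF NOT EXISTS trade_events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO trade_events          -- placeholder target table
  FROM @s3_landing_stage          -- external stage over the S3 bucket
  FILE_FORMAT = (TYPE = 'JSON')
"""
try:
    conn.cursor().execute(ddl)
finally:
    conn.close()
```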
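And a minimal sketch of the kind of custom Python data-quality checks mentioned in the last bullet, illustrated here with pandas; the column names, rules, and input path are assumptions, not details from the production pipeline.

```python
# Illustrative batch-level data-quality gate in pandas; column names and
# rules are assumptions, not details from the production pipeline.
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> list:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    if df["trade_id"].isna().any():          # completeness
        failures.append("null trade_id values")
    if df["trade_id"].duplicated().any():    # uniqueness
        failures.append("duplicate trade_id values")
    if (df["notional"] < 0).any():           # validity
        failures.append("negative notional amounts")
    return failures

batch = pd.read_parquet("batch.parquet")     # placeholder extract
problems = run_quality_checks(batch)
if problems:
    raise ValueError("audit check failed: " + "; ".join(problems))
```

Raising on failure keeps bad batches from reaching downstream consumers, which is what makes the pipeline audit-ready.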
Kogentix, India Sept 2021 – Dec 2022
Data Engineer
Built scalable, cloud-native data pipelines and analytics solutions on Microsoft Azure services for enterprise clients.
Built data pipelines using Azure Data Factory, Spark SQL, and Data Lake Analytics with minimal impact on production systems
Orchestrated Azure Data Factory & Databricks workflows processing 1.5 TB of batch data and 200 M streaming events/day
Migrated on-prem SQL Server data to Azure Synapse & Azure SQL DB, applying transformations with PySpark (see the Synapse load sketch after this role)
Used Kafka & Cassandra for distributed data processing and streaming integration
Integrated REST APIs with ADF pipelines for seamless data exchange between systems
Stored structured/semi-structured data efficiently in Parquet/Avro formats to improve query performance
Automated workflows with ADF scheduling & triggers; integrated Git & CI/CD for DevOps practices
Delivered actionable insights using Power BI integrated with ADF pipelines
Developed reusable data products for schema validation, complex transformations, and multi-port outputs (ADLS/SQL)
Migrated 5 TB of on-prem SQL Server data to Azure Synapse and Azure SQL DB, finishing 30% ahead of schedule
Imported data into Synapse from ADLS using PolyBase and the Spark connector
Environment: Azure Data Factory, Databricks, Synapse, Azure SQL DB, Azure DevOps, Spark SQL, Kafka, Cassandra, PySpark, Python, Git, Power BI, U-SQL, Kubernetes, Jenkins
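A sketch of the ADLS-to-Synapse load pattern described in this role, assuming the Databricks Synapse ("sqldw") connector; the storage paths, JDBC URL, and table names are placeholders.

```python
# Read Parquet from ADLS Gen2, apply a PySpark transformation, and append to
# Synapse via the Databricks Synapse ("sqldw") connector. Paths, JDBC URL,
# and table names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("synapse-load-sketch").getOrCreate()

orders = spark.read.parquet(
    "abfss://raw@exampleaccount.dfs.core.windows.net/orders/"  # placeholder path
)

# Example transformation: derive a date column and drop bad rows.
cleaned = (
    orders.withColumn("order_date", to_date(col("order_ts")))
    .filter(col("amount") > 0)
)

(
    cleaned.write.format("com.databricks.spark.sqldw")  # Databricks Synapse connector
    .option("url", "jdbc:sqlserver://example.database.windows.net;database=dw")
    .option("tempDir", "abfss://tmp@exampleaccount.dfs.core.windows.net/stage/")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.orders")                    # placeholder target table
    .mode("append")
    .save()
)
```

The connector stages data through the tempDir path and loads it into Synapse with PolyBase under the hood, which matches the import pattern in the last bullet.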
Aurobindo Pharma, India July 2019 – Aug 2021
Data Engineer / Production Support
Developed and supported data infrastructure using AWS, Spark, SQL Server, and ETL tools for pharma analytics.
Migrated legacy media data to Wide Orbit using AWS Redshift and custom SQL mappings
Designed dimensional models (Kimball) with facts, dimensions, and referential constraints
Built SSIS ETL flows to move data from FTP and flat files to S3 and Redshift
Converted Informatica ETL to SSIS with dynamic control/script tasks
Created Tableau dashboards with parameters/actions and SSAS cubes with MDX calculations
Delivered Power BI reports using Power Pivot & Power View
Migrated 10 million+ legacy media records to AWS Redshift via SSIS, cutting nightly ETL runtimes by 60% (see the Redshift COPY sketch below)
Slashed report generation time from 5 hours to 30 minutes with Python scripts
Cleaned, transformed, and validated over 75 million transaction and EHR records using SQL and Python, cutting downstream data-error rates from 4% to 0.5%
Environment: AWS Redshift, S3, EMR, Kafka, SQL Server, SSIS, SSAS, Tableau, Power BI, T-SQL, Visual Studio
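A hedged sketch of the S3-to-Redshift load step behind the migration bullets above, issuing a COPY through psycopg2; the host, credentials, table, bucket, and IAM role are all hypothetical.

```python
# Issue an S3-to-Redshift COPY with psycopg2. Host, credentials, table,
# bucket, and IAM role below are all hypothetical.
import os
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="loader",
    password=os.environ["REDSHIFT_PASSWORD"],
)

copy_sql = """
COPY media.spot_records
FROM 's3://example-media-archive/daily/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
FORMAT AS CSV
IGNOREHEADER 1
TIMEFORMAT 'auto';
"""

# The connection context manager wraps the COPY in a transaction,
# committing on success and rolling back on error.
with conn:
    with conn.cursor() as cur:
        cur.execute(copy_sql)
conn.close()
```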