Vishnu Varma
PH: 863-***-**** **************@*****.*** www.linkedin.com/in/vishnu-varma-3537a820a

Professional Summary
5+ years of experience in designing, developing, and managing complex data pipelines and infrastructure.
Skilled in ETL/ELT, data modeling, data wrangling, and enrichment using Hadoop, Spark, PySpark, SQL, Scala, Python, Airflow, Azure, AWS, Snowflake, Databricks, Tableau, and Power BI. Proven ability in real-time processing, CI/CD automation, and dashboard/reporting delivery for enterprise-scale solutions.
Built Spark applications in Azure Databricks using Spark SQL for ETL across diverse file formats, and optimized queries for performance.
Developed PySpark pipelines in Spark Streaming for large-scale processing from Kafka, S3, and Kinesis (a minimal streaming sketch follows this summary).
Tuned Snowflake queries using profiling, execution plans, and rewrites for faster performance.
Integrated Hadoop, PySpark, HBase, MongoDB, and Hive for big data analytics.
Automated ETL using Python with SQL integration and data quality validation.
Experience in Star/Snowflake schema design, slowly changing dimensions (SCDs), and fact/dimension modeling using Erwin.
Migrated SQL workloads to Azure SQL DB, Synapse, and Data Lake via ADF; also migrated on-prem SQL to AWS Redshift.
Worked on CI/CD workflows using Jenkins on Kubernetes; leveraged Git, Docker, and Ansible.
Delivered interactive dashboards in Tableau, Power BI, and QuickSight using structured/semi-structured data.
Implemented data governance and security best practices including RBAC, encryption, and compliance with HIPAA/GDPR.
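A minimal sketch of the Kafka-to-storage streaming pattern referenced above, using PySpark Structured Streaming. The broker address, topic, event schema, and S3 paths are illustrative placeholders rather than values from any project below, and running it requires the spark-sql-kafka package on the classpath.

```python
# Minimal PySpark Structured Streaming job: consume JSON events from Kafka
# and land them as Parquet on S3. All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Assumed event shape; the real schema would come from the upstream producer.
event_schema = StructType([
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
parsed = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

(
    parsed.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/landing/")            # placeholder sink
    .option("checkpointLocation", "s3a://example-bucket/chk/")  # required for recovery
    .start()
    .awaitTermination()
)
```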
Education Details
Master's in Data Analytics, Indiana Wesleyan University, Marion, IN

Skills Summary
Languages: Python, SQL, Scala, PySpark
Big Data: Hadoop, Spark, HBase, Kafka, Pig, Cassandra, MongoDB, Snowflake, Airflow
Cloud Platforms: AWS (S3, Redshift, Lambda, Glue, Kinesis, CloudFormation, DynamoDB, CloudTrail), Azure (ADF, Synapse, Databricks, SQL DB, Data Lake, DevOps), GCP (BigQuery, Dataflow)
ETL & Streaming: Glue, Airflow, DataStage, Kafka, Spark Streaming, Azure Stream Analytics, dbt, Fivetran, Informatica, Talend
DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git, GitHub Actions, Terraform, Ansible, IaC practices
Visualization: Tableau, Power BI, QuickSight
Data Modeling & Governance: Dimensional Modeling, Star/Snowflake Schema, Erwin, Data Validation (Great Expectations), RBAC, Encryption, HIPAA, GDPR
Methodologies: Agile, Scrum, CI/CD
Professional Experience
BNY Mellon, US Oct 2023 – Present
Data Engineer
Designing and managing enterprise-scale data lake and real-time analytics systems for global financial data.
Developed Python-Spark apps to process data from RDBMS and streaming sources (Kafka, Kinesis)
Configured Snowpipe to load real-time data from S3 into Snowflake with <5 min latency (see the Snowpipe sketch after this role's environment list)
Built scalable Glue ETL pipelines to ingest, transform, and load high-volume structured/unstructured data into Redshift
Automated AWS Glue ETL jobs ingesting 50 M+ records/hour into Redshift, slashing manual intervention by 80%
Used Spark Streaming APIs for real-time processing; stored results in DynamoDB and Snowflake
Integrated CodeStar + CodeCommit for version control; automated deployment with Jenkins and Ansible
Enabled micro-batching to ingest millions of files from S3 staging into Snowflake
Built Python scripts to process CSV, JSON, and Parquet from S3 and store in DynamoDB/Snowflake
Used CloudWatch/CloudTrail for monitoring ETL jobs and user activity
Built Alteryx workflows and Tableau dashboards for regulatory and risk reporting
Tuned Redshift performance with sort/dist keys and WLM queues, boosting query efficiency by 40%
Ensured audit-ready pipelines by building custom Python-based data quality checks (see the quality-check sketch below)
Environment: AWS (S3, Redshift, EMR, Lambda, Glue, DynamoDB, Kinesis, CloudFormation, SageMaker), Snowflake, Python, PySpark, Kafka, Jenkins, Tableau, Alteryx, SQL, HDFS, Hive, Pig, RDBMS, Teradata
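A hedged sketch of the Snowpipe setup referenced above: the pipe itself is defined in Snowflake SQL, issued here through the snowflake-connector-python driver. The warehouse, database, schema, stage, table, and pipe names are all hypothetical.

```python
# Defining an auto-ingest Snowpipe over an S3 external stage via the
# snowflake-connector-python driver. Every identifier here is hypothetical.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="LOAD_WH",   # placeholder warehouse
    database="RAW",        # placeholder database
    schema="LANDING",      # placeholder schema
)

# AUTO_INGEST = TRUE lets S3 event notifications trigger the COPY,
# which is how sub-5-minute load latency is typically achieved.
ddl = """
CREATE PIPE IF NOT EXISTS trade_events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO trade_events          -- placeholder target table
  FROM @s3_landing_stage          -- external stage over the S3 bucket
  FILE_FORMAT = (TYPE = 'JSON')
"""
try:
    conn.cursor().execute(ddl)
finally:
    conn.close()
```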
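And a minimal sketch of the kind of custom Python data-quality checks mentioned in the last bullet, illustrated here with pandas; the column names, rules, and input path are assumptions, not details from the production pipeline.

```python
# Illustrative batch-level data-quality gate in pandas; column names and
# rules are assumptions, not details from the production pipeline.
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> list:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    if df["trade_id"].isna().any():          # completeness
        failures.append("null trade_id values")
    if df["trade_id"].duplicated().any():    # uniqueness
        failures.append("duplicate trade_id values")
    if (df["notional"] < 0).any():           # validity
        failures.append("negative notional amounts")
    return failures

batch = pd.read_parquet("batch.parquet")     # placeholder extract
problems = run_quality_checks(batch)
if problems:
    raise ValueError("audit check failed: " + "; ".join(problems))
```

Raising on failure keeps bad batches from reaching downstream consumers, which is what makes the pipeline audit-ready.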
Kogentix, India Sept 2021 – Dec 2022
Data Engineer
Built scalable, cloud-native data pipelines and analytics solutions on Microsoft Azure services for enterprise clients.
Built data pipelines using Azure Data Factory, Spark SQL, and Data Lake Analytics with minimal impact on production systems
Orchestrated Azure Data Factory & Databricks workflows processing 1.5 TB of batch data and 200 M streaming events/day
Migrated on-prem SQL Server data to Azure Synapse & Azure SQL DB, applying transformations with PySpark (see the Synapse load sketch after this role)
Used Kafka & Cassandra for distributed data processing and streaming integration
Integrated REST APIs with ADF pipelines for seamless data exchange between systems
Stored structured/semi-structured data efficiently in Parquet/Avro formats to improve query performance
Automated workflows with ADF scheduling & triggers; integrated Git & CI/CD for DevOps practices
Delivered actionable insights using Power BI integrated with ADF pipelines
Developed reusable data products for schema validation, complex transformations, and multi-port outputs (ADLS/SQL)
Migrated 5 TB of on-prem SQL Server data to Azure Synapse and Azure SQL DB, finishing 30% ahead of schedule
Imported data into Synapse from ADLS using PolyBase and the Spark connector
Environment: Azure Data Factory, Databricks, Synapse, Azure SQL DB, Azure DevOps, Spark SQL, Kafka, Cassandra, PySpark, Python, Git, Power BI, U-SQL, Kubernetes, Jenkins
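A sketch of the ADLS-to-Synapse load pattern described in this role, assuming the Databricks Synapse ("sqldw") connector; the storage paths, JDBC URL, and table names are placeholders.

```python
# Read Parquet from ADLS Gen2, apply a PySpark transformation, and append to
# Synapse via the Databricks Synapse ("sqldw") connector. Paths, JDBC URL,
# and table names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("synapse-load-sketch").getOrCreate()

orders = spark.read.parquet(
    "abfss://raw@exampleaccount.dfs.core.windows.net/orders/"  # placeholder path
)

# Example transformation: derive a date column and drop bad rows.
cleaned = (
    orders.withColumn("order_date", to_date(col("order_ts")))
    .filter(col("amount") > 0)
)

(
    cleaned.write.format("com.databricks.spark.sqldw")  # Databricks Synapse connector
    .option("url", "jdbc:sqlserver://example.database.windows.net;database=dw")
    .option("tempDir", "abfss://tmp@exampleaccount.dfs.core.windows.net/stage/")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.orders")                    # placeholder target table
    .mode("append")
    .save()
)
```

The connector stages data through the tempDir path and loads it into Synapse with PolyBase under the hood, which matches the import pattern in the last bullet.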
Aurobindo Pharma, India July 2019 – Aug 2021
Data Engineer / Production Support
Developed and supported data infrastructure using AWS, Spark, SQL Server, and ETL tools for pharma analytics.
Migrated legacy media data to Wide Orbit using AWS Redshift and custom SQL mappings
Designed dimensional models (Kimball) with facts, dimensions, and referential constraints
Built SSIS ETL flows to move data from FTP and flat files to S3 and Redshift
Converted Informatica ETL to SSIS with dynamic control/script tasks
Created Tableau dashboards with parameters/actions and SSAS cubes with MDX calculations
Delivered Power BI reports using Power Pivot & Power View
Migrated 10 million+ legacy media records to AWS Redshift via SSIS, cutting nightly ETL runtimes by 60% (see the Redshift COPY sketch below)
Slashed report generation time from 5 hours to 30 minutes with Python scripts
Cleaned, transformed, and validated over 75 million transaction and EHR records using SQL and Python, cutting downstream data-error rates from 4% to 0.5%
Environment: AWS Redshift, S3, EMR, Kafka, SQL Server, SSIS, SSAS, Tableau, Power BI, T-SQL, Visual Studio
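A hedged sketch of the S3-to-Redshift load step behind the migration bullets above, issuing a COPY through psycopg2; the host, credentials, table, bucket, and IAM role are all hypothetical.

```python
# Issue an S3-to-Redshift COPY with psycopg2. Host, credentials, table,
# bucket, and IAM role below are all hypothetical.
import os
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="loader",
    password=os.environ["REDSHIFT_PASSWORD"],
)

copy_sql = """
COPY media.spot_records
FROM 's3://example-media-archive/daily/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
FORMAT AS CSV
IGNOREHEADER 1
TIMEFORMAT 'auto';
"""

# The connection context manager wraps the COPY in a transaction,
# committing on success and rolling back on error.
with conn:
    with conn.cursor() as cur:
        cur.execute(copy_sql)
conn.close()
```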