BHARATH KUMAR RAAVI
Role: Power BI/Azure Data Engineer
Phone: 314-***-****
Email: *****************@*****.***
LinkedIn: https://www.linkedin.com/in/bharathkumarraavi/
PROFESSIONAL SUMMARY:
Data Professional with 9+ years of experience in the software industry, including 5+ years specializing in Azure Cloud Services and 4+ years focused on Business Intelligence and Power BI Development.
Expert in developing and deploying interactive Power BI Dashboards that transform complex data into actionable insights.
Proficient in integrating diverse data sources, such as Azure SQL, Azure Cosmos DB, and on-premises systems, to build comprehensive BI Solutions.
Skilled in designing and optimizing data models using Power BI, DAX, and Power Query to ensure high-performance, responsive reporting.
Developed robust ETL processes with Azure Data Factory and Databricks to streamline data ingestion and support real-time Power BI Analytics.
Successfully implemented end-to-end BI solutions on Azure, leveraging Azure Synapse Analytics and Azure Data Lake to feed dynamic Power BI Visualizations.
Utilized advanced DAX functions and custom measures to create tailored KPIs and metrics within Power BI Dashboards.
Extensively designed, developed, and maintained data models that underpin seamless integration and interactive reporting in Power BI.
Collaborated with cross-functional teams to translate business requirements into scalable, user-friendly Power BI Solutions.
Proficient in leveraging Power BI’s data transformation capabilities to clean, enrich, and aggregate data for optimal visualization.
Implemented robust data governance practices using Azure Purview and Unity Catalog, ensuring data quality and compliance for BI initiatives.
Demonstrated expertise in developing custom visuals and interactive reports that drive user engagement and strategic decision-making in Power BI.
Applied best practices in data warehousing and ETL to enhance data accuracy and performance for Power BI Reporting.
Automated data refresh cycles and report deployments using Power BI Service in conjunction with Azure DevOps pipelines (a minimal refresh-trigger sketch follows this summary).
Integrated Power BI with third-party APIs and connectors to extend data source capabilities and enrich business insights.
Proficient in utilizing Python and SQL to support data manipulation and transformation feeding into Power BI Dashboards.
Delivered actionable insights through detailed Power BI Reports that support executive-level decision-making.
Ensured robust data security and compliance within Power BI Environments by applying best practices in data protection.
Designed scalable BI architectures on Azure that support real-time visualization and advanced analytics using Power BI.
Leveraged Snowflake and Hadoop-based Data Architectures to complement and enhance Power BI Data Ecosystems.
Developed CI/CD pipelines for Power BI Projects using Git, Azure DevOps, and related version control tools.
Balanced technical precision with business needs to deliver impactful, visually compelling Power BI Solutions.
Committed to continuous learning and innovation, consistently adopting the latest Power BI features and best practices to drive business transformation.
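For illustration, a minimal sketch of how a dataset refresh of this kind might be triggered from a pipeline stage, assuming a service principal with access to the target workspace; the tenant, client, workspace, and dataset IDs below are placeholders, not values from any engagement:

```python
# Minimal sketch: trigger a Power BI dataset refresh through the Power BI REST API.
# Assumes a service principal with access to the target workspace; all IDs and
# the secret below are placeholders.
import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
WORKSPACE_ID = "<workspace-id>"
DATASET_ID = "<dataset-id>"


def get_token() -> str:
    """Acquire an Azure AD token for the Power BI REST API (client-credentials flow)."""
    resp = requests.post(
        f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "scope": "https://analysis.windows.net/powerbi/api/.default",
        },
    )
    resp.raise_for_status()
    return resp.json()["access_token"]


def trigger_refresh() -> None:
    """Queue an asynchronous refresh of the dataset in the target workspace."""
    url = (
        f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
        f"/datasets/{DATASET_ID}/refreshes"
    )
    resp = requests.post(url, headers={"Authorization": f"Bearer {get_token()}"})
    resp.raise_for_status()  # 202 Accepted means the refresh was queued


if __name__ == "__main__":
    trigger_refresh()
```

A call like this can run as a step in an Azure DevOps release pipeline once report artifacts are deployed.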
Education:
•Bachelor’s in Computer Science and Engineering, Lovely Professional University, June 2014.
•Master’s in Information Studies, Saint Louis University, Saint Louis, Missouri, December 2019.
Certifications:
Microsoft Certified: Azure Solutions Architect Expert (AZ-305)
Technical Skills:
Azure Services: Azure Blob Storage, ADLS Gen2, Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, Key Vault, Azure Event Hub, Azure HDInsight, Azure Cosmos DB, Azure SQL DB, Azure Function Apps, Azure Logic Apps, Azure Active Directory, Azure DevOps, Azure Monitor, Azure Service Bus, Azure Kubernetes Service (AKS), Azure Purview
Big Data Technologies: HDFS, YARN, Pig Latin, MapReduce, Hive, Sqoop, Apache Spark, ZooKeeper, Oozie, Apache Kafka, Cassandra, Apache Airflow, Apache Flume
ETL Tools: Data Build Tool (DBT), IBM DB2, IBM InfoSphere DataStage 7.5X, IICS, Informatica PowerCenter
Languages: Python, R, PySpark, Scala, SQL, PL/SQL, HiveQL
Databases: MS SQL Server, Azure SQL DB, Teradata, Oracle, MySQL, PostgreSQL, HBase, MongoDB
Data Modeling: Star Schema Modeling, Snowflake Schema Modeling, Slowly Changing Dimensions (SCD), Change Data Capture (CDC)
Operating Systems: Linux, Windows, UNIX
Version control: Git, GitHub
Big Data Platforms: Hortonworks, Cloudera
Development Methods: Agile/Scrum, Waterfall
IDEs: PyCharm, Eclipse, Visual Studio
Data Visualization: Power BI, Tableau
CI/CD: Jenkins, Kubernetes
Work Experience:
Client: Fifth Third Bank (Remote) Feb 2024 to Present
Role: Power BI/Azure Data Engineer
Responsibilities:
Collaborated with ML Engineers, Data Scientists, and business stakeholders to design interactive Power BI Dashboards that drive real-time, data-driven decision-making while ensuring robust data pipelines on Azure.
Designed and developed end-to-end data solutions on Azure by integrating Microsoft Fabric, Azure Data Factory, and Synapse Analytics to create a seamless BI ecosystem feeding dynamic Power BI Visualizations.
Led the implementation of integrated solutions combining Power BI with advanced data engineering processes, ensuring data flows are optimized for high-performance analytics.
Developed Python-based ETL pipelines in Azure Data Factory and Databricks, leveraging Spark and PySpark to prepare and transform large-scale data for real-time Power BI Reports (a minimal PySpark sketch follows this list).
Engineered scalable data processing architectures using Azure Data Lake and OneLake to support efficient ingestion, storage, and retrieval for enriched Power BI Analytics.
Designed data extraction, transformation, and distribution workflows via Azure Data Share to enable secure, timely data availability for Power BI reporting.
Optimized real-time data streaming solutions with Azure Event Hubs and Delta Live Tables, providing low-latency feeds directly into interactive Power BI Dashboards.
Implemented robust authentication and security protocols (using OpenID Connect and Federation Models) to safeguard data across Azure and BI platforms.
Developed business-critical APIs to streamline data access and integration, enhancing the accuracy and timeliness of Power BI visualizations and insights.
Constructed dynamic data models in Microsoft Fabric that directly feed into Power BI, ensuring data integrity and rapid response to business queries.
Applied data governance best practices using Azure Purview and Unity Catalog, maintaining compliance and high-quality data standards for BI initiatives.
Established CI/CD pipelines with Terraform, GitHub, and Azure DevOps to automate the deployment of both data engineering processes and Power BI solutions.
Migrated on-premises data solutions to a unified Azure ecosystem using OneLake and Synapse Analytics, streamlining data flows into scalable Power BI environments.
Enhanced data-sharing strategies with Iceberg Data Share, improving connectivity and data exchange across platforms to support comprehensive Power BI reporting.
Led Agile teams using Scrum and Kanban methodologies in JIRA, driving iterative improvements and ensuring successful delivery of integrated Power BI and Azure Data Engineering projects.
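As a representative example, a minimal PySpark sketch of the kind of Databricks transform described above; the storage account, paths, columns, and table name are hypothetical:

```python
# Minimal sketch (hypothetical paths, columns, and table names): a PySpark
# transform in Azure Databricks that lands curated data for Power BI reporting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curate_transactions").getOrCreate()

# Raw transactions previously ingested by Azure Data Factory into ADLS Gen2.
raw = spark.read.format("delta").load(
    "abfss://raw@<storage-account>.dfs.core.windows.net/transactions"
)

# Standardize types, derive reporting attributes, and aggregate to a daily grain.
curated = (
    raw.withColumn("txn_date", F.to_date("txn_timestamp"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .groupBy("txn_date", "branch_id", "product_code")
       .agg(
           F.sum("amount").alias("total_amount"),
           F.count("*").alias("txn_count"),
       )
)

# Persist as a Delta table that the Power BI semantic model reads from.
(curated.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable("gold.daily_branch_sales"))
```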
Environment: Azure Data Factory (ADF), Azure Data Lake Storage Gen2 (ADLS Gen2), Azure SQL Database, Logic Apps, Azure Event Hub, Azure Blob Storage, Azure Monitor, Function Apps, Snowflake, Kafka, Python, Oracle, Hive, SQL, Azure Cosmos DB, PySpark, Power BI, Jenkins, JIRA
Client: UnitedHealth Group (Remote) Oct 2022 to Jan 2024
Role: Power BI/Azure Data Engineer
Responsibilities:
Collaborated with cross-functional teams on a data migration initiative, transitioning on-premises data to the cloud using Azure Data Factory for robust ETL processes.
Developed and maintained complex data orchestration pipelines in Azure Data Factory, connecting diverse sources such as Oracle, SQL Server, Teradata, and APIs.
Automated data flows with Azure Data Factory to ensure efficient transformation and integration for analytics and reporting.
Administered high-availability clusters on Azure Databricks, integrating Azure Data Lake Storage Gen2 and Key Vault for secure, scalable data management.
Built ETL pipelines in PySpark on Azure Databricks to transform large datasets, optimizing storage with Delta Lake and integrating with Azure Cosmos DB.
Implemented Medallion Architecture on Azure with Delta Lake for schema enforcement, ACID transactions, and 7-day time travel for historical data analysis (a bronze-to-silver sketch follows this list).
Seamlessly ingested data from S3 buckets into Azure Databricks using connectors and APIs for efficient cross-platform processing.
Enhanced streaming pipelines using Delta Live Tables to parse and flatten semi-structured data (JSON, XML) for real-time insights.
Created Delta Lake environments on Databricks and ingested data into Azure Synapse Analytics to power dynamic, live Power BI Dashboards.
Utilized Serverless SQL Pools and Polybase in Azure Synapse Analytics for comprehensive querying across diverse data sources, integrating with Azure Service Bus for event-driven messaging.
Orchestrated the migration of an on-premises data warehouse to Azure Synapse Analytics, optimizing performance and reducing costs significantly.
Developed consumer APIs for processing Kafka streams, enabling real-time data ingestion and transformation in distributed environments.
Automated event-driven workflows using Azure Logic Apps, streamlining data processing and enhancing system efficiency.
Leveraged Microsoft Purview for data governance, classification, and lineage tracking, ensuring data quality and compliance.
Implemented CI/CD pipelines with Azure DevOps and Python, complemented by Azure Function Apps for scheduling, to ensure consistent, efficient cloud deployments.
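A minimal sketch of a bronze-to-silver step in the Medallion Architecture noted above, with the seven-day look-back done through Delta time travel; the lake paths and columns are hypothetical:

```python
# Minimal sketch (hypothetical paths and columns): bronze-to-silver step of a
# medallion pipeline on Databricks, plus a Delta time-travel read for history.
from datetime import datetime, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion_silver").getOrCreate()

bronze = spark.read.format("delta").load("/mnt/datalake/bronze/claims")

# Cleanse and conform the bronze data; Delta enforces the target schema on write.
silver = (
    bronze.dropDuplicates(["claim_id"])
          .filter(F.col("claim_id").isNotNull())
          .withColumn("claim_amount", F.col("claim_amount").cast("double"))
          .withColumn("load_ts", F.current_timestamp())
)

silver.write.format("delta").mode("append").save("/mnt/datalake/silver/claims")

# Time travel: read the silver table as it looked seven days ago.
week_ago = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
historical = (
    spark.read.format("delta")
         .option("timestampAsOf", week_ago)
         .load("/mnt/datalake/silver/claims")
)
```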
Environment: Azure Data Factory (ADF), Azure Data Lake Storage Gen2 (ADLS Gen2), Azure HDInsight, Azure SQL Database, Azure Logic Apps, Azure Blob Storage, Azure Cosmos DB, Azure Function Apps, Azure SQL Server, Azure Event Hub, HDFS, Kafka, MapReduce, Snowflake, Python, Sqoop, PySpark, Jenkins, Power BI.
Client: Homesite Insurance, Boston, MA April 2021 to Sep 2022
Role: Data Engineer
Responsibilities:
●Leveraged Apache Spark and Spark SQL to accelerate data testing and processing from diverse sources, enhancing overall workflow efficiency.
●Configured and managed big data ecosystem tools including Hive, Pig, Sqoop, and Oozie on Hadoop clusters, enabling the development of complex MapReduce jobs.
●Optimized MapReduce jobs and improved HDFS storage efficiency through effective data compression techniques.
●Orchestrated data imports from multiple sources and transformed data using Hive and MapReduce, while utilizing Sqoop to extract data from Oracle into HDFS.
●Designed and implemented robust data pipelines and complex data flows using Azure Data Factory and PySpark on Databricks to streamline transformation processes.
●Developed numerous Databricks Spark jobs with PySpark to facilitate table-to-table operations and support advanced analytics (a representative sketch follows this list).
●Architected and maintained both external and internal tables and views in Snowflake, increasing data accessibility for analytical purposes.
●Conducted rigorous data quality analysis using SnowSQL to build analytical warehouses on Snowflake that supported decision-making.
●Implemented business logic through custom UDFs written in Java, enhancing data transformation capabilities.
●Developed and deployed multiple data applications in Spark using Python and Scala for comprehensive data quality validation.
●Migrated legacy Hadoop workflows to modern Spark frameworks, leveraging in-memory distributed computing for real-time fraud detection.
●Transitioned computational code from HQL to PySpark, improving processing efficiency and reducing execution time.
●Designed and implemented batch processing pipelines using Apache Spark, significantly reducing processing times.
●Created Pig Latin scripts to extract data from web server outputs and load it into HDFS for downstream analysis.
●Automated data ingestion and pre-processing workflows using Oozie, reducing manual intervention and improving efficiency.
●Developed a reusable framework with Spark, Python, and Sqoop to manage dynamic metadata changes and streamline data loading into RDBMS systems.
●Collaborated with cross-functional teams to install, update, and optimize Hadoop clusters, ensuring system stability and performance.
●Utilized Python to extract social media metrics and developed interactive dashboards with Power BI for actionable business insights.
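A representative sketch of a table-to-table Databricks job of the kind described above; the schemas and table names are hypothetical:

```python
# Minimal sketch (hypothetical schemas and tables): a Databricks PySpark job
# that performs a table-to-table transformation for downstream analytics.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("policy_claim_rollup")
         .enableHiveSupport()
         .getOrCreate())

# Join staged claims with the policy dimension and roll up by month.
rollup = spark.sql("""
    SELECT p.policy_type,
           date_trunc('month', c.claim_date) AS claim_month,
           COUNT(*)                          AS claim_count,
           SUM(c.paid_amount)                AS total_paid
    FROM   staging.claims c
    JOIN   dims.policy   p ON c.policy_id = p.policy_id
    GROUP  BY p.policy_type, date_trunc('month', c.claim_date)
""")

# Write the result to the analytics schema consumed by reporting tools.
rollup.write.mode("overwrite").saveAsTable("analytics.monthly_claim_rollup")
```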
Environment: Python, Hadoop, MapReduce, HDFS, Hive, Pig, Apache Sqoop, Scala, Spark, PySpark, Oozie, Zookeeper, HBase, PL/SQL, SQL, Cloudera Manager, GitHub, MySQL, Windows, Power BI
Client: Guild Mortgage, Houston, TX Jan 2020 to Mar 2021
Role: Big Data Engineer
Responsibilities:
Utilized Sqoop to routinely transfer data from MySQL to the Hadoop Distributed File System (HDFS), ensuring seamless and reliable data integration.
Leveraged Apache Spark and Scala to perform aggregations on massive datasets, with the processed data stored in a Hive warehouse for comprehensive analysis.
Utilized extensive knowledge of big data ecosystems and Data Lakes—including Hadoop, Spark, Hortonworks, and Cloudera—to drive efficient data processing and management.
Successfully ingested and transformed diverse data sets—including structured, semi-structured, and unstructured formats—to enable effective analysis and insight generation.
Crafted advanced HiveQL queries to analyze data and meet business requirements, effectively simulating MapReduce functionalities for optimized performance.
Employed JIRA to streamline project workflows, track issues, and foster effective collaboration among cross-functional teams.
Harnessed PySpark and Spark SQL to accelerate data testing and processing, facilitating rapid analysis and actionable insights (a data-quality check sketch follows this list).
Utilized Spark Streaming to partition real-time data streams into manageable batches, enabling timely processing and analytics.
Leveraged Zookeeper for coordination, synchronization, and server serialization within clusters, ensuring efficient distributed processing.
Configured and managed Oozie workflows for automated job scheduling, ensuring seamless execution and oversight of data pipelines.
Utilized Git for version control to maintain code repositories, promoting efficient collaboration, version tracking, and code management.
Worked collaboratively with team members to identify and resolve JVM-related issues, ensuring smooth execution and optimal system performance.
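A minimal sketch of the PySpark/Spark SQL data-testing approach mentioned above; the table and column names are hypothetical:

```python
# Minimal sketch (hypothetical table and columns): quick data-quality checks
# with PySpark and Spark SQL before data is promoted for analysis.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("loan_data_checks")
         .enableHiveSupport()
         .getOrCreate())

loans = spark.table("raw.loan_applications")

# Null and duplicate checks on the business key.
null_keys = loans.filter(F.col("application_id").isNull()).count()
dupe_keys = (loans.groupBy("application_id").count()
                  .filter(F.col("count") > 1)
                  .count())

# Range check on a numeric field via Spark SQL.
loans.createOrReplaceTempView("loans")
bad_amounts = spark.sql(
    "SELECT COUNT(*) AS n FROM loans WHERE loan_amount <= 0"
).first()["n"]

assert null_keys == 0, f"{null_keys} rows missing application_id"
assert dupe_keys == 0, f"{dupe_keys} duplicate application_id values"
assert bad_amounts == 0, f"{bad_amounts} rows with non-positive loan_amount"
```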
Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, Zookeeper, Oozie, Data Pipelines, RDBMS, AWS, EC2, Python, PySpark, Ambari, JIRA.
Client: Mutex Software Solutions Pvt Ltd, Hyderabad, India Oct 2015 to Aug 2018
Role: Hadoop Developer
Responsibilities:
Developed ETL jobs using Spark (Scala) to migrate data from Oracle to new MySQL tables.
Rigorously used Spark (Scala) features such as RDDs, DataFrames, and Spark SQL, along with the Spark-Cassandra Connector APIs, for tasks including data migration and business report generation.
Developed a Spark Streaming application for real-time sales analytics (a Structured Streaming sketch follows this list).
Analyzed the source data and handled it efficiently by modifying data types; used Excel sheets, flat files, and CSV files to generate Power BI ad-hoc reports.
Analyzed SQL scripts and designed the solution for implementation in PySpark.
Extracted data from other data sources into HDFS using Sqoop.
Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
Built HBase tables by leveraging HBase integration with Hive on the Analytics Zone, facilitating efficient storage and retrieval of data.
Applied Kafka and Spark Streaming to process streaming data in specific use cases, enabling real-time data analysis and insight generation.
Designed and implemented a data pipeline using Kafka, Spark, and Hive, ensuring seamless data ingestion, transformation, and analysis.
Migrated existing data from RDBMS (Oracle) to Hadoop using Sqoop, facilitating efficient data processing and leveraging Hadoop's capabilities.
Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and transformation processes, ensuring data accuracy and quality.
Implemented Continuous Integration and Continuous Deployment (CI/CD) pipelines to build and deploy projects in the Hadoop environment, ensuring streamlined development and deployment processes.
Extracted data from MySQL into HDFS using Sqoop.
Automated deployments using YAML scripts for large-scale builds and releases.
Worked with Apache Hive, Apache Pig, HBase, Apache Spark, ZooKeeper, Flume, Kafka, and Sqoop.
Implemented data classification algorithms using MapReduce design patterns.
Worked extensively with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
Used Git to maintain source code in GitHub repositories.
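A minimal sketch of the real-time sales analytics described above, written against Spark Structured Streaming's Kafka source rather than the older DStream API; the broker address, topic, and event schema are hypothetical, and the spark-sql-kafka connector package is assumed to be available:

```python
# Minimal sketch (hypothetical broker, topic, and schema): consume a Kafka
# topic with Spark Structured Streaming and aggregate sales per store.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (DoubleType, StringType, StructField, StructType,
                               TimestampType)

spark = SparkSession.builder.appName("sales_stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("store_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Parse the JSON payload of each Kafka message into typed columns.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "<broker>:9092")
          .option("subscribe", "sales-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Aggregate sales per store over five-minute windows with a late-data watermark.
windowed = (events.withWatermark("event_time", "10 minutes")
            .groupBy(F.window("event_time", "5 minutes"), "store_id")
            .agg(F.sum("amount").alias("total_sales")))

query = (windowed.writeStream
         .outputMode("update")
         .format("console")
         .option("checkpointLocation", "/tmp/checkpoints/sales_stream")
         .start())
query.awaitTermination()
```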
Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Cassandra, YAML, ETL
Client: Wissen Infotech, Hyderabad, India June 2014 to Sep 2015
Role: Data Warehouse Developer
Responsibilities:
Worked as a SQL Server Analyst/Developer/DBA using SQL Server 2012, 2014, and 2016.
Created jobs, configured SQL Mail Agent and alerts, and scheduled DTS/SSIS packages.
Managed and updated the Erwin models (logical/physical data modeling) for the Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB according to user requirements.
Performed source control and environment-specific script deployment tracking using TFS.
Exported the current data models from Erwin to PDF and published them to SharePoint for various users.
Developed, administered, and managed the corresponding databases: Consolidated Data Store, Reference Database (source for the codes/values of the legacy source systems), and Actuarial Data Mart.
Wrote triggers, stored procedures, and functions using Transact-SQL (T-SQL), and created and maintained physical structures.
Deployed scripts in different environments according to configuration management and playbook requirements; created and managed files/filegroups and table/index associations; performed query tuning and performance tuning.
Tracked and closed defects using Quality Center; maintained users, roles, and permissions.
Environment: SQL Server 2008/2012 Enterprise Edition, SSRS, SSIS, T-SQL, Windows Server 2003, PerformancePoint Server 2007, Oracle 10g, Visual Studio 2010.