Data Engineer Power BI

Location:
Terrace Park, FL, 33617
Salary:
95000
Posted:
October 16, 2025


Ramana Reddy

Sr. Data Engineer

201-***-**** ****************@*****.***

PROFESSIONAL SUMMARY

6+ years of hands-on experience in architecting, developing, and managing large-scale data engineering solutions across multi-cloud environments including Azure, AWS, and GCP.

Expert in Google Cloud services such as BigQuery, Cloud Storage, Dataflow, Dataproc, Pub/Sub, Cloud Composer, and Cloud Functions, with extensive experience in event-driven architectures and real-time data pipelines.

Skilled in Azure services including Azure Data Factory, Azure Databricks, Synapse Analytics, Data Lake Storage, Azure SQL, and Power BI, with successful delivery of cloud data warehouse migrations, CI/CD pipelines using Terraform, and data security implementation using Azure Key Vault and role-based access.

Proficient in AWS technologies including Redshift, S3, Glue, EMR, Lambda, Kinesis, Athena, and Step Functions for building robust batch and streaming data pipelines supporting large-scale analytics.

Deep expertise in Apache Spark ecosystem using PySpark, Spark SQL, and Spark Streaming for both batch and real-time data processing across GCP, AWS, and Azure platforms.

Built scalable and modular data lakes using GCS, ADLS, S3, and Delta Lake with medallion architecture patterns (bronze, silver, gold) to streamline ingestion and analytics.
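
A minimal PySpark/Delta sketch of the bronze-to-silver promotion in such a medallion layout; the paths, schema, and dedup key here are illustrative assumptions, not the original implementation:

```python
# Bronze -> silver promotion in a medallion-style lake (illustrative names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Bronze: raw ingested events, stored as-is in Delta format.
bronze = spark.read.format("delta").load("/lake/bronze/orders")

# Silver: typed, deduplicated, quality-gated records.
silver = (
    bronze
    .dropDuplicates(["order_id"])                        # drop replayed events
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # enforce types
    .filter(F.col("order_id").isNotNull())               # basic quality gate
)

silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")
```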

Experienced in building dimensional data models (star schemas) in BigQuery, Redshift, and Synapse Analytics following Kimball methodology for optimized BI querying.

Developed automated ETL/ELT pipelines using Python, SQL, and orchestration tools like Apache Airflow, Azure Data Factory, and Cloud Composer.
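
A minimal Apache Airflow sketch of one such orchestrated pipeline; the DAG id, schedule, and task bodies are placeholders for illustration:

```python
# Illustrative daily extract -> transform -> load DAG (Airflow 2.x).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    print("extracting source records for", context["ds"])  # stubbed

def transform(**context):
    print("applying transformations")  # stubbed

def load(**context):
    print("loading into the warehouse")  # stubbed

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
) as dag:
    extract_t = PythonOperator(task_id="extract", python_callable=extract)
    transform_t = PythonOperator(task_id="transform", python_callable=transform)
    load_t = PythonOperator(task_id="load", python_callable=load)
    extract_t >> transform_t >> load_t
```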

Hands-on experience in developing CI/CD pipelines with GitHub Actions, Jenkins, and Terraform for automated deployment and infrastructure provisioning across cloud environments.

Applied enterprise security and compliance frameworks including DLP, GDPR, CCPA, encryption, masking, and access control across GCP, AWS, and Azure.

Built real-time dashboards and visual analytics using Looker, Power BI, Tableau, Amazon QuickSight, and Dash Enterprise for actionable business insights.

Expert in SQL across BigQuery SQL, T-SQL, PL/SQL, PostgreSQL, MySQL, and SQL Server for complex querying, performance tuning, and data modeling.

Successfully led cloud migration initiatives from Hadoop ecosystems (Hive, Sqoop) to cloud-native platforms, improving performance and reducing infrastructure costs.

Extensive experience with orchestration, monitoring, and metadata tools including Cloud Composer, Airflow, Azure Monitor, Stackdriver, and AWS Glue Data Catalog.

Proficient in containerization and microservices using Docker and Kubernetes (AKS, EKS, GKE), supporting scalable and portable data applications.

Strong programming skills in Python, Scala, Java, and shell scripting for developing ETL workflows, validation frameworks, and cloud automation utilities.

Implemented data quality frameworks using tools like Great Expectations and dbt, ensuring trust in data pipelines and enabling robust testing in production environments.
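
A short Great Expectations sketch of such a quality gate, assuming the classic pandas-backed API (newer releases use a context-based API); the input file and column names are illustrative:

```python
# Validate a batch before promoting it downstream (classic GE API).
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(pd.read_parquet("orders.parquet"))  # illustrative input

df.expect_column_values_to_be_not_null("order_id")
df.expect_column_values_to_be_unique("order_id")
df.expect_column_values_to_be_between("amount", min_value=0)

result = df.validate()
if not result.success:
    raise ValueError("Data quality checks failed; halting the pipeline")
```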

Experience in integrating structured, semi-structured, and unstructured data formats including Parquet, Avro, ORC, and JSON across data lakes and warehouses.

Well-versed in MLOps tools such as MLflow and Kubeflow to support ML pipeline deployment and model monitoring within data platforms.

Strong understanding of data governance, lineage, metadata management, and cataloging using tools like Apache Atlas, AWS Glue Catalog, and Collibra.

Excellent collaboration and stakeholder communication skills, driving agile delivery and aligning technical solutions with business objectives across cross-functional teams.

TECHNICAL SKILLS

Cloud Platforms

GCP (BigQuery, Dataproc, Dataflow, Cloud Storage, Cloud Functions, Pub/Sub, Cloud Composer, IAM, gsutil), Azure (Data Factory, Databricks, Synapse Analytics, Data Lake Storage, Azure SQL, CI/CD), AWS (S3, Redshift, Lambda, Glue, EMR, Athena, Amazon QuickSight)

Big Data Technologies

Hadoop (HDFS, Hive, Pig, Sqoop, Flume, Oozie), Apache Spark (PySpark, Spark SQL, Scala), Apache Beam, Dask, Trino, Kafka, Spark Streaming, Snowflake, Athena

Data Warehousing

Snowflake, BigQuery, Redshift, Azure Synapse Analytics, Oracle, MySQL, PostgreSQL

Data Lake Solutions

Delta Lake, GCS, ADLS Gen2, S3, medallion architecture

ETL/ELT Tools

Custom Python pipelines, Azure Data Factory, AWS Glue, Apache Airflow, Cloud Composer, Informatica IICS, SSIS

Programming Languages

Python, Scala, Java, SQL, R, Shell scripting, JavaScript

Databases

MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, HBase, Cassandra, CosmosDB, DynamoDB, Neo4j

DevOps & CI/CD

Terraform, Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps

Data Visualization

Looker, Power BI, Tableau, Dash, Amazon QuickSight

Workflow Orchestration

Apache Airflow, Cloud Composer, Azure Data Factory

Data Quality & Governance

Great Expectations, dbt, DLP, GDPR, CCPA, metadata management, row-level security

Real-time Streaming

Apache Kafka, Spark Streaming, Cloud Pub/Sub

Analytics & Machine Learning

MLflow, Kubeflow

Data Formats & Serialization

Parquet, Avro, ORC, JSON, XML, Protocol Buffers, CSV

API Development

FastAPI, Docker, Kubernetes

PROFESSIONAL WORK EXPERIENCE

Community Choice Financial, Addison, Texas Jun 2023 – Present

Sr. Data Engineer

Created a framework to off-load data into the Azure ecosystem for SAS reporting, loading data into Azure HDInsight using Scala, Spark, and PySpark.

Used Azure Data Factory to extract data from Azure Blob Storage and load it into Azure Synapse Analytics.

Designed and implemented pipelines, data flows, and complex data transformations and manipulations using PySpark on Azure Databricks.

Developed data pipelines to ingest structured and unstructured data from various sources into the ADLS Gen2 data lake.

Responsible for provisioning key Azure Cloud services and configuring them for scalability, flexibility, and cost optimization.

Automated storage of data from streaming sources into Azure sinks such as Azure Blob Storage, Azure Synapse Analytics, and Azure SQL Database by configuring Azure Stream Analytics.

Served as the Azure Synapse Analytics database administrator, leading data model design and database migration deployments across the Dev, Qual, and Prod environments.

Performed ETL data translation with Azure Data Factory per functional requirements, loading large (Big Data) datasets into Azure Synapse Analytics and Azure SQL Database.

Performed analytics using real-time integration capabilities of Azure Stream Analytics on streamed data.

Created monitors, alarms, notifications, and logs for Azure Functions, Azure Data Factory, and Azure Synapse Analytics using Azure Monitor.

Loaded data into HBase on Azure HDInsight using the HBase shell, the HBase Client API, Pig, and Sqoop, and continuously monitored and managed the Hadoop cluster through Azure HDInsight management tooling.

Developed components of the Azure HDInsight ecosystem, including MapReduce and Hive jobs.

Managed Azure infrastructure with orchestration tools such as Azure Resource Manager (ARM) templates, Terraform, and Azure DevOps Pipeline.

Created data ingestion modules using Azure Data Factory for loading data in various layers in Azure Blob Storage and reporting using Azure Synapse Analytics and Power BI.

Created PySpark jobs in Azure HDInsight to produce Parquet files per business requirements, using Python pandas and Spark DataFrames.
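
A sketch of such a Parquet-producing PySpark job; the storage paths and columns are assumptions for illustration:

```python
# Curate raw JSON events into date-partitioned Parquet (illustrative names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("parquet_export").getOrCreate()

raw = spark.read.json("wasbs://raw@account.blob.core.windows.net/events/")

curated = (
    raw.select("customer_id", "event_type", "event_ts")
       .withColumn("event_date", F.to_date("event_ts"))
)

# Partitioning by date lets downstream readers prune files efficiently.
(curated.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("wasbs://curated@account.blob.core.windows.net/events/"))
```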

Used Python's collections module to manipulate and iterate over user-defined objects.

Built database model, Views, and APIs using Python for interactive web-based solutions on Azure App Service.

Developed Spark code using Python for faster processing of data on Azure Databricks.

Applied a solid understanding of Spark architecture and Structured Streaming on Azure Databricks, setting up Databricks workspaces for business analytics.

Created on-demand tables on Azure Blob Storage with Azure Functions and Azure Data Factory, using Python and PySpark.

Wrote and executed various Azure SQL Database queries using Python and implemented integration test cases.

Built Azure Storage Accounts and managed policies for Azure Blob Storage and used Azure Blob Storage for storage and backup.

Installed and configured Azure HDInsight and developed multiple MapReduce jobs in Java for data cleaning and processing.

Developed data pipeline using Azure Event Hubs, Azure Data Factory, Azure HDInsight, and Azure Stream Analytics to ingest customer behavioral data and financial histories into Azure Data Lake Storage for analysis.

Developed GUIs with HTML, XHTML, AJAX, CSS, and JavaScript (jQuery), using CSS exclusively to control page layout and design.

Built Python Flask forms to maintain records of online users.

Performed smoke testing in UAT and production environments for deployment verification.

Environment: Python, Azure, Flask, Git, Azure Databricks, Linux, Azure SQL Database, and Python libraries such as NumPy and pandas.

Innovaccer, San Francisco, CA Dec 2021 – May 2023

Data Engineer

Designed and implemented end-to-end data pipelines using Azure Data Factory (ADF) to facilitate efficient data ingestion, transformation, and loading (ETL) from diverse data sources into Snowflake data warehouse.

Orchestrated robust data processing workflows utilizing Azure Databricks and Apache Spark for seamless large-scale data transformations and advanced analytics, improving data processing speed by 14%.

Developed real-time data streaming capabilities into Snowflake by seamlessly integrating Azure Event Hubs and Azure Functions, enabling prompt and reliable data ingestion.
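
A minimal Azure Functions sketch of that Event Hubs ingestion path, assuming the Python v1 programming model (bindings live in function.json); the event schema is illustrative and the Snowflake load is stubbed:

```python
# Event Hub-triggered function that validates events before staging them.
import json
import azure.functions as func

def main(event: func.EventHubEvent):
    payload = json.loads(event.get_body().decode("utf-8"))

    # Drop malformed events before they reach the warehouse.
    if payload.get("id") is None:
        return

    # In the real pipeline the record would be staged (e.g., to blob storage)
    # and loaded into Snowflake via Snowpipe or a scheduled COPY INTO.
    print(f"staged event {payload['id']}")
```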

Hands-on experience in Azure Cloud Services, Azure Synapse Analytics, Azure SQL Server, Data Factory, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake Storage Gen 2.

Deployed Azure Data Lake Storage (ADLS) as a reliable and scalable data lake solution, implementing efficient data partitioning and retention strategies to store and manage both raw and processed data effectively.

Managed Azure Data Lake Storage for optimized data file storage and retrieval, implementing advanced techniques like compression and encryption to bolster data security and streamline storage costs.

Integrated Azure Logic Apps seamlessly into the data workflows, ensuring comprehensive orchestration and triggering of complex data operations based on specific events, enhancing overall data pipeline efficiency.

Enforced data governance and comprehensive data quality checks using Azure Data Factory and Snowflake, guaranteeing the highest standards of data accuracy and consistency.

Integrated MapR with various data processing frameworks, such as Apache Spark and Apache Drill, to enhance data processing capabilities.

Developed and executed data replication and disaster recovery strategies in MapR, minimizing data loss risks.

Automated data workflows and job scheduling in MapR using tools like Apache Oozie and MapR Control System.

Implemented robust data replication and synchronization strategies between Snowflake and other data platforms leveraging Azure Data Factory and Change Data Capture techniques, ensuring data integrity and consistency with a 98% reduction in data inconsistencies.
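
A hedged sketch of applying captured changes to Snowflake with a MERGE through snowflake-connector-python; the connection parameters, table names, and op column are placeholders:

```python
# Apply a CDC batch from a staging table to the target with one MERGE.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",  # placeholders
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)

merge_sql = """
MERGE INTO customers AS tgt
USING staging_customer_changes AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED AND src.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE
  SET tgt.name = src.name, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, name, updated_at)
  VALUES (src.customer_id, src.name, src.updated_at)
"""

with conn.cursor() as cur:
    cur.execute(merge_sql)
conn.close()
```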

Designed and implemented efficient data archiving and retention strategies utilizing Azure Data Lake Storage Gen 2 and leveraging Snowflake's Time Travel feature, ensuring optimal data management and regulatory compliance.

Developed and deployed Azure Functions to handle critical data preprocessing, enrichment, and validation tasks within the data pipelines, elevating the overall data quality and reliability.

Worked on Azure Machine Learning and Snowflake to architect and execute advanced analytics and machine learning workflows, enabling predictive analytics and data-driven insights and achieving a 23% improvement in predictive accuracy.

Implemented secure authentication and access control measures by integrating Azure Active Directory, ensuring robust identity management and compliance with industry-leading security standards for user authentication and authorization within Azure Cloud environments.

Implemented Unity Catalog to enhance data security, segregating confidential data and managing access controls.

Used Delta Sharing, a secure data-sharing protocol governed through Unity Catalog, to share data in Azure Databricks with external users and applications via standard protocols and formats.

Developed custom monitoring and alerting solutions using Azure Monitor and Snowflake Query Performance Monitoring (QPM), providing proactive identification and resolution of performance bottlenecks.

Integrated Snowflake seamlessly with Power BI and Azure Analysis Services to deliver interactive dashboards and reports, empowering business users with self-service analytics capabilities.

Orchestrated robust data transformation workflows with PySpark, leveraging its powerful libraries and functions for advanced analytics, data cleansing, and feature engineering tasks.

Leveraged PySpark to efficiently handle large-scale data processing tasks, benefiting from Apache Spark's distributed computing capabilities for parallel processing across clusters.

Optimized data pipelines and Spark jobs in Azure Databricks through advanced techniques such as configuration tuning, data caching, and data partitioning in PySpark, resulting in superior performance and efficiency.
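
An illustrative PySpark sketch of the tuning levers named above (shuffle-partition sizing, caching a reused intermediate, repartitioning on the join key); the values and table paths are assumptions:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned_job")
    .config("spark.sql.shuffle.partitions", "400")  # sized to cluster/data
    .config("spark.sql.adaptive.enabled", "true")   # let AQE coalesce shuffles
    .getOrCreate()
)

events = spark.read.format("delta").load("/mnt/silver/events")

# Cache an aggregate that several downstream queries reuse.
daily = events.groupBy("event_date", "customer_id").count().cache()
daily.count()  # materialize the cache once

# Repartition on the join key to spread a skewed wide join evenly.
customers = spark.read.format("delta").load("/mnt/silver/customers")
joined = daily.repartition("customer_id").join(customers, "customer_id")
```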

Utilized Python for scripting and automation purposes, writing scripts to automate repetitive tasks, streamline workflows, and improve productivity.

Worked with data serialization formats like JSON, XML, or Protocol Buffers in Python, facilitating data interchange and compatibility between different systems and applications.

Architected and optimized high-performing Snowflake schemas, tables, and views to accommodate complex analytical queries and reporting requirements, ensuring exceptional scalability and query performance.

Collaborated closely with cross-functional teams including data scientists, data analysts, and business stakeholders, ensuring alignment with data requirements and delivering scalable and reliable data solutions.

Implemented robust CI/CD pipelines using Jenkins, streamlining the deployment process and ensuring consistent and reliable delivery of data solutions.

Integrated Jenkins plugins to automate build, test, and deployment tasks, enhancing the efficiency of the development lifecycle and reducing manual intervention.

Environment: Azure Data Factory, Azure Databricks, Snowflake data warehouse, Azure Event Hubs, Azure Functions, Azure Data Lake Storage, Azure Blob Storage, Azure Logic Apps, Azure Machine Learning, Jenkins, Azure Monitor, Power BI, PySpark, Microsoft Purview, Apache Atlas.

Comerica Bank, Dallas, TX Mar 2020 – Dec 2021

Data Engineer

Designed and implemented lifecycle policies for Amazon S3 buckets, automating data archival and deletion processes, reducing storage costs by 30%.
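
A Boto3 sketch of the lifecycle policy described above (transition to infrequent access, then Glacier, then expiry); the bucket, prefix, and day counts are illustrative:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-bucket",  # illustrative
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},  # delete after one year
            }
        ]
    },
)
```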

Utilized data versioning in S3 to manage changes and ensure data integrity, improving data accuracy across workflows by 20%.

Configured cross-region replication in S3 to enhance data availability and disaster recovery capabilities.

Developed advanced Python scripts for data manipulation and automation, integrating with AWS SDKs (Boto3), which improved data processing efficiency by 35%.

Managed AWS IAM roles, policies and permissions to ensure secure access control and data protection.

Administered Amazon RDS and MySQL databases, including setting up automated backups, read replicas, and performance optimization strategies, reducing downtime by 40%.

Implemented CI/CD pipelines using Jenkins, automating deployment processes and reducing manual errors in data engineering solutions by 50%.

Leveraged AWS Glue for data cataloging, ETL, and integration, deploying Glue jobs that streamlined data workflows and cut processing times by 25%.

Designed and implemented complex data models including star schemas for effective data warehousing and analytics.

Utilized Amazon Redshift for data warehousing, managing clusters, tuning performance, and optimizing data loading/unloading operations, resulting in a 30% improvement in data retrieval times.

Employed Terraform for infrastructure as code (IaC) to efficiently provision and manage AWS resources, reducing provisioning times by 50%.

Utilized Amazon Athena for ad-hoc querying and data analysis over large datasets stored in S3, providing insights that improved decision-making by 20%.

Worked in Agile environments, utilizing tools like JIRA and Confluence and participating in Scrum ceremonies to ensure project alignment and timely delivery.

Collaborated with data scientists, software engineers and business stakeholders to drive data-driven decision-making and solution development.

Designed, developed, and maintained robust ELT pipelines, following best practices for data extraction, transformation, and loading, improving data processing efficiency by 40%.

Utilized AWS Step Functions to orchestrate complex workflows, coordinating data processing activities across AWS services, which improved process reliability by 35%.
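
A hedged Boto3 sketch of driving such a Step Functions workflow from Python; the state machine ARN and input are placeholders, and a production pipeline would typically use callbacks or EventBridge rather than polling:

```python
import json
import time
import boto3

sfn = boto3.client("stepfunctions")

resp = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:etl",
    input=json.dumps({"run_date": "2024-01-01"}),
)

# Poll until the workflow leaves the RUNNING state.
while True:
    desc = sfn.describe_execution(executionArn=resp["executionArn"])
    if desc["status"] != "RUNNING":
        break
    time.sleep(10)

print(desc["status"])  # SUCCEEDED, FAILED, TIMED_OUT, or ABORTED
```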

Managed big data processing using Amazon EMR, including configuring Hadoop, Spark, and Hive clusters, leading to a 45% improvement in data analysis speed for large-scale datasets.

Implemented real-time data streaming solutions with Apache Kafka, setting up Kafka producers and consumers and managing Kafka clusters.
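
A minimal kafka-python producer/consumer pair as a sketch of that setup; the broker address, topic, and consumer group are illustrative:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish JSON-encoded transaction events.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("transactions", {"txn_id": 1, "amount": 42.50})
producer.flush()

# Consumer: read events from the same topic within a consumer group.
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    group_id="fraud-scoring",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for msg in consumer:
    print(msg.value)  # downstream scoring/enrichment would happen here
    break             # demonstrate a single record in this sketch
```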

Developed scalable, event-driven data processing applications using AWS Lambda, orchestrating functions within complex data pipelines.

Used Scala to develop data processing applications within Spark, optimizing performance for large-scale operations, reducing execution time by 40%.

Applied data warehousing concepts such as partitioning, indexing, and data lifecycle management, ensuring efficient data storage and retrieval with a 30% reduction in query latency.

Exhibited advanced SQL skills for complex data manipulation, query optimization, and performance tuning, achieving a 35% increase in query execution speed across databases.

Set up and managed data lakes on AWS using AWS Lake Formation, ensuring secure, scalable, and organized data storage.

Utilized Power BI to create interactive dashboards and reports, providing actionable insights and visualizations for stakeholders.

Applied data modeling techniques to design scalable and maintainable data architectures, supporting business intelligence and analytics needs with a 30% increase in data accessibility.

Integrated Informatica for ELT processes, optimizing data workflows and enhancing data integration capabilities across systems.

Environment: Amazon S3, Python, AWS SDK (Boto3), AWS IAM, Amazon RDS, MySQL, Jenkins, AWS Glue, star schemas, Amazon Redshift, Terraform, Amazon Athena, JIRA, Confluence, AWS Step Functions, Amazon EMR, Apache Kafka, Scala, data warehousing, Power BI, Informatica.

Cigna, Bloomfield, CT Feb 2019 – Mar 2020

Data Engineer

Led the design and implementation of data management solutions for clinical trials data on Azure, enabling efficient data collection, aggregation, and analysis to support research studies and accelerate medical breakthroughs.

Spearheaded the integration of diverse healthcare data sources such as Electronic Health Records (EHR), claims data, medical imaging, and IoT-generated data into Azure data platform, ensuring seamless interoperability and data consistency.

Implemented robust security measures and data governance practices aligned with HIPAA regulations to safeguard patient privacy and ensure compliance with healthcare industry standards, including encryption, access controls, and audit logging.

Conducted assessments of Hadoop and its ecosystem, validating candidate components through various proof-of-concept applications for the project's Big Data initiative.

Designed and implemented a custom data model in Snowflake to bolster a novel customer reporting system, facilitating efficient data analysis and insights.

Assessed and estimated the software and hardware requirements for the NameNode and DataNodes in the cluster, optimizing resource allocation for enhanced performance.

Played a pivotal role in productionizing the application post-testing by BI analysts, ensuring smooth transition to operational status.

Designed and developed data pipelines to transform and analyze clinical data for insights such as patient outcomes, disease trends, and treatment efficacy, leveraging Azure Machine Learning and advanced analytics capabilities.

Implemented real-time monitoring solutions using Azure Stream Analytics and Azure Monitor to detect anomalies, monitor patient vitals, and trigger alerts for timely intervention, enhancing patient care and safety.

Utilized Azure IoT Hub and Azure Machine Learning to build predictive maintenance models for medical equipment, enabling proactive maintenance scheduling and minimizing downtime, thereby optimizing healthcare service delivery.

Designed and deployed interactive dashboards using Power BI to visualize key healthcare metrics and KPIs for stakeholders, providing actionable insights into operational efficiency, patient satisfaction, and clinical outcomes.

Established data governance frameworks and data quality standards to ensure the reliability, accuracy, and completeness of healthcare data assets, enabling informed decision-making and regulatory compliance.

Environment: MapReduce, Hive, Sqoop 1.4.4, Oozie 4.2, Python, Scala, Azure, Azure Data Factory, Databricks, PySpark 2.3, Kafka, Ambari, Cassandra, Linux, Java, Data Migration, CloudFormation.


