
Data Engineer Power BI

Location:
Irving, TX
Salary:
65/hr
Posted:
June 25, 2025


Resume:

Saisree Reddy

+1-774-***-****

**.*********@*****.***

Senior Data Engineer

PROFESSIONAL SUMMARY:

Over 10 years of professional IT experience in application development and data analytics using languages and tools such as SQL, Spark, SSIS, SSMS, SSRS, Power BI, and Azure.

Experience in designing Data Marts following Star Schema and Snowflake Schema methodologies.

Highly skilled in Business Intelligence tools such as Tableau and Power BI.

Expertise in collecting, processing, and aggregating large amounts of streaming data using Kafka and Spark Streaming.

Experience with ADLS, Databricks Spark, Delta tables, and Azure Data Factory for ingesting and transforming batch and streaming data through ADF pipelines.

Experience in designing, developing, testing, and releasing enterprise-level ETL architecture solutions that populate data from various source systems, utilizing Data Vault 2.0 concepts with SSRS, SSIS, Power BI, Tableau, WhereScape RED & 3D, Python, and Oracle.

Experience in designing and developing applications in Spark using Python to compare the performance of Spark with Hive.

Solid experience in an Agile (Scrum) environment with cross-functional roles, working as Tableau Developer, Business Data Analyst, Data Scientist, and Data Engineer in a Cloudera Hadoop environment.

Experience in creating fact and dimension models in MS SQL Server and Snowflake databases utilizing the cloud-based Matillion ETL tool.

Experience with Data Build Tool (dbt), performing schema tests, referential integrity tests, and custom tests on the data to ensure data quality.

Orchestrated data integration pipelines in ADF using various activities such as Get Metadata, Lookup, For Each, Wait, Execute Pipeline, Set Variable, Filter, Until, etc.

Strong experience in implementing data warehouse solutions in Confidential Redshift; worked on various projects to migrate data from on-premises databases to Confidential Redshift, RDS, and S3.

Experience in designing online transactional processing (OLTP), operational data store (ODS), and decision support system (DSS) databases, utilizing Data Vault (hub-and-spoke), dimensional, and normalized data designs as appropriate for enterprise-wide solutions.

Experience with cloud databases and data warehouses (SQL Azure and Confidential Redshift/RDS).

Expertise in design, development, and implementation of Enterprise Data Warehouse solutions.

Experienced in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Python.
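
As a minimal illustration of this kind of conversion (assuming a hypothetical Hive table named orders), the same aggregation can be expressed through Spark SQL or the DataFrame API:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

# Original Hive query, run as-is through Spark SQL
hive_df = spark.sql(
    "SELECT customer_id, SUM(amount) AS total "
    "FROM orders WHERE status = 'CLOSED' GROUP BY customer_id"
)

# Equivalent DataFrame transformation
orders = spark.table("orders")
df = (orders.filter(F.col("status") == "CLOSED")
            .groupBy("customer_id")
            .agg(F.sum("amount").alias("total")))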

Extensive experience working with the Spark distributed framework, involving Resilient Distributed Datasets (RDDs) and DataFrames, using Python and Scala.

Worked on Azure cloud resources such as Key Vault, Azure Data Factory, Synapse, ADLS, and Azure SQL Server.

Experience with data warehousing concepts such as SCD Type 1 and SCD Type 2 for loading data from source to target.
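
A minimal sketch of an SCD Type 2 load on a Delta table; the table and column names are illustrative, and a running SparkSession named spark is assumed (e.g., in Databricks):

from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Hypothetical dimension table and staged source
dim = DeltaTable.forName(spark, "dim_customer")
updates = spark.table("stg_customer")

# Step 1: close out current rows whose tracked attribute changed
(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.address <> s.address",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# Step 2: append new current versions (a full implementation would
# append only new or changed rows; this is simplified for illustration)
(updates.withColumn("is_current", F.lit(True))
        .withColumn("start_date", F.current_date())
        .withColumn("end_date", F.lit(None).cast("date"))
        .write.format("delta").mode("append").saveAsTable("dim_customer"))

An SCD Type 1 load would instead be a single merge using whenMatchedUpdateAll() and whenNotMatchedInsertAll().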

Implemented ETL solutions for accessing data from different sources such as Oracle SQL and file hosts.

Optimized ETL jobs for high performance and scalability, reducing processing time by leveraging Ab Initio's parallel processing capabilities.

Created various reports and dashboards in Power BI with different types of sources such as ADLS Gen2, SharePoint paths, PostgreSQL, SQL Server, Salesforce objects, Power BI shared datasets, and SSAS cubes.

Skilled in designing and implementing ETL architecture for a cost-effective and efficient environment.

Experience in developing production-ready Spark applications using Spark RDD APIs, DataFrames, Spark SQL, and the Spark Streaming API.

Worked on data processing, transformations, and actions in Spark using the Python (PySpark) language.

Exposure to SQL implementations such as complex queries, views, CTEs, stored procedures, and window functions.
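
For example, a CTE combined with a window function can be run through Spark SQL; the transactions table below is hypothetical, and an existing SparkSession named spark is assumed:

# Latest transaction per account via CTE + ROW_NUMBER window
latest_txn = spark.sql("""
    WITH ranked AS (
        SELECT account_id,
               txn_date,
               amount,
               ROW_NUMBER() OVER (PARTITION BY account_id
                                  ORDER BY txn_date DESC) AS rn
        FROM transactions
    )
    SELECT account_id, txn_date, amount
    FROM ranked
    WHERE rn = 1
""")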

Experienced in Snowflake for accessing files from storage accounts and creating external tables on Delta and Parquet files.
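
A hedged sketch of creating an external table over staged Parquet files with the Snowflake Python connector; the connection parameters, stage name, and columns are illustrative assumptions:

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW")

# External table over Parquet files staged in cloud storage
conn.cursor().execute("""
    CREATE OR REPLACE EXTERNAL TABLE customer_ext (
        customer_id NUMBER AS (value:customer_id::NUMBER),
        name        STRING AS (value:name::STRING)
    )
    WITH LOCATION = @adls_stage/customer/
    FILE_FORMAT = (TYPE = PARQUET)
""")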

Technical Skills:

Programming Languages

Java, C, SQL, PL/SQL, T-SQL, R, Python, Hibernate, JDBC, JSON, HTML, CSS

Cloud Technologies

Azure, AWS, OpenStack, Rackspace, GCP, Amazon S3, EMR, Lambda, Athena, Composer, BigQuery

Development & Orchestration Tools

Microsoft SQL Studio, Azure Databricks, Azure Data Factory

Data Warehouses & BI

Star Schema, Snowflake Schema, Fact and Dimension tables, SAS, SSIS, Splunk, ETL, Hadoop, Ab Initio

Reporting Tools

SSRS, SSAS, MDX, Tableau, MS Office, Power BI, Airflow

Databases

Microsoft NoSQL, SQL Server 2005/2008 R2/2010/2012/2014, MySQL 4.x/5.x, Oracle 11g/12c, PostgreSQL, Redshift, Athena, DynamoDB, MongoDB, Azure SQL Server, MS Access

Operating Systems

All versions of Windows, UNIX, Linux, Shell Scripting, macOS.

Big Data Ecosystem

HDFS, NiFi, MapReduce, Oozie, Hive/Impala, Pig, Sqoop, ZooKeeper, HBase, Spark, Scala, Kafka, Apache Flink, AWS EC2, S3, EMR.

Development Methodologies

Agile/Scrum, UML, Design Patterns, Waterfall

PROFESSIONAL EXPERIENCE:

United Healthcare, San Francisco, CA May 2023 - Present

Senior Data Engineer

Responsibilities:

Designed end-to-end scalable architecture to solve business problems using various Azure components such as HDInsight, Data Factory, Data Lake, Storage, and Machine Learning Studio.

Worked on data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, and Azure DW) and processed the data in Azure Databricks.

Understood and managed Hadoop log files; developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP. Also worked with Virtual Private Cloud (VPC), CloudFormation, Lambda, CloudFront, CloudWatch, IAM, EBS, Security Groups, Auto Scaling, DynamoDB, Route 53, and CloudTrail.

Wrote Spark SQL and PySpark scripts in the Databricks environment to validate monthly account-level customer data, and migrated dashboards from other BI tools to Looker.
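
A minimal sketch of the kind of validation involved; Databricks supplies the spark session, and the table and columns are illustrative:

from pyspark.sql import functions as F

monthly = spark.table("gold.account_monthly")

# Basic profile of the monthly snapshot
monthly.agg(
    F.count("*").alias("row_count"),
    F.countDistinct("account_id").alias("distinct_accounts"),
    F.sum(F.when(F.col("balance").isNull(), 1).otherwise(0)).alias("null_balances"),
).show()

# Fail fast if the account/month grain contains duplicates
dupes = monthly.groupBy("account_id", "month").count().filter("count > 1")
assert dupes.count() == 0, "Duplicate account/month rows found"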

Designed and implemented data solutions on Oracle Exadata to optimize performance for large-scale data processing and analytics.

Designed intuitive dashboards and visualizations within OpenGate, facilitating user engagement and improving data literacy.

Developed automated backup and recovery strategies for MongoDB to ensure data durability and business continuity.

Integrated Jira Analytics with business intelligence tools (e.g., Tableau, Power BI) for advanced data visualization and deeper analysis.

Documented data migration processes and created user guides to facilitate knowledge transfer and future migrations.

Designed, developed, deployed, and maintained large-scale Tableau dashboards for Product Insight, Devices and Networking, and Cox Premise Equipment.

Created Spark clusters and configured high-concurrency clusters using Azure Databricks (ADB) to speed up the preparation of high-quality data.

Created and maintained documentation for Snowpark implementations, including data models, transformations, and best practices.

Created comprehensive documentation for MongoDB data models, schema design, and operational procedures to facilitate team collaboration.

Supported the configuration and ongoing setup of data virtualization tools, optimizing data access and integration across platforms.

Developed and implemented data governance frameworks to ensure data quality, integrity, and compliance across the organization.

Collaborated with developers to integrate MySQL databases into applications, providing support for data access and manipulation.

Optimized database schemas and indexing strategies for OLTP applications to enhance transaction performance and minimize latency.

Conducted training sessions for staff on AML policies, procedures, and emerging trends to foster a culture of compliance.

Utilized tools like SQL Server Analysis Services (SSAS) or Apache Kylin for building and managing OLAP solutions.

Used Tableau Desktop to analyze and obtain insights into large data sets using groups, hierarchies, sorts, sets, and filters.

Used Azure Synapse to provide a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs.

Worked on the optimization and migration of AMEX data workloads to Google BigQuery, ensuring efficient data processing, seamless transitions, and minimal downtime across critical business operations.

Environment: Azure (Storage, DW, ADF, ADLS, Databricks), Redshift, Data Vault, BigQuery, Cloud, Scala, Data Lake, Spark (PySpark, Spark Streaming, Spark SQL), Hive, Snowflake.

Walmart, Dallas, TX Jan 2021 - Apr 2023

Sr. Data Engineer

Responsibilities:

Responsible for designing and developing enterprise-level ETL architecture solutions utilizing Data Vault 2.0 concepts, following Agile and Scrum framework best practices.

Designed, created, revised, and managed reports generated from operational and analytical systems using SSRS, Tableau, Power BI, and Crystal Reports.

Extracted, Transformed and Loaded data from source systems to Data Storage services using a combination of Data Factory, T-SQL, Spark SQL and Data Lake Analytics.

Assisted in migrating on-premise data infrastructure to Microsoft cloud platform.

Led the migration of on-premises data workloads to Google BigQuery, ensuring a seamless transition and minimal downtime.

Performed AWS Redshift performance tuning and optimization by choosing the correct DISTSTYLE/DISTKEY (EVEN, KEY, or ALL) and SORTKEY (single, compound, or interleaved).
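
As an illustrative sketch of those choices; the cluster endpoint, credentials, and table are hypothetical:

import psycopg2

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="etl_user", password="...")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE sales_fact (
            sale_id     BIGINT,
            customer_id BIGINT,
            sale_date   DATE,
            amount      DECIMAL(18,2)
        )
        DISTSTYLE KEY
        DISTKEY (customer_id)          -- co-locate rows joined on customer_id
        COMPOUND SORTKEY (sale_date);  -- prune blocks on date-range filters
    """)
    conn.commit()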

Created and developed data load and scheduling processes for ETL jobs using the Matillion ETL package.

Used Ab Initio to design and implement complex ETL operations that extract, process, and load data from many sources.

Implemented complex data transformation logic using Ab Initio's Transform components.

Developed and maintained PySpark-based data processing pipelines for ingesting, transforming, and loading large-scale datasets from various sources into the data lake. Achieved 40% shorter query execution times by designing and developing MapReduce processes for batch processing of customer transaction data.
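
A minimal sketch of one such pipeline step; the paths, bucket names, and columns are illustrative assumptions:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-ingest").getOrCreate()

# Ingest raw CSV drops from the landing zone
raw = (spark.read.option("header", True)
            .csv("s3a://company-landing/transactions/2023-01/"))

# Standardize types and de-duplicate on the business key
cleaned = (raw.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
              .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
              .dropDuplicates(["txn_id"]))

# Load into the curated zone of the data lake, partitioned by date
(cleaned.write.mode("append")
        .partitionBy("txn_date")
        .parquet("s3a://company-data-lake/curated/transactions/"))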

Built and managed high-performance data pipelines using the Cloudera Hadoop distribution, increasing team access to data.

Created unique Oozie workflows to automate Cloudera-based ETL procedures, guaranteeing effective task completion.

Conducted benchmarking and performance testing of various Big Data tools to evaluate their efficiency and scalability, and was involved in migrating the on-premises Hadoop system to GCP.

Integrated Java applications with Hadoop using Hadoop Java APIs and Hadoop Streaming, enabling seamless data exchange between systems.

Led the migration of on-premises data warehousing solutions to Snowflake, ensuring minimal downtime and data integrity.

Designed and implemented scalable data models in MongoDB to support high-performance applications and flexible data requirements.

Enabled data-driven decision-making and enhanced operational efficiency by leveraging OpenGate's real-time analytics capabilities.

Developed Python and SQL used in the transformation process in Matillion.

Utilized Matillion and AWS Redshift for data warehouse implementation of the Epic data warehousing concept in the Snowflake database.

Managed MongoDB clusters, including sharding and replication, to ensure high availability and fault tolerance.

Developed and maintained scalable data pipelines using Apache Spark and PySpark, ingesting and processing large-scale datasets into the data lake.

Assisted in the migration of on-premises data warehouse storage, ensuring minimal downtime and a smooth transition of data and applications; used HANA data as pivot tables in Microsoft Excel via the MDX provider. Migrated previously written cron jobs to Airflow/Composer in GCP.

Leveraged Jira Analytics to generate actionable insights from project data, improving team performance and project outcomes.

Implemented and managed Cornerstone Learning Management System (LMS) integration with the existing enterprise data warehouse, enabling seamless data exchange and reporting across the organization.

Developed custom reports and dashboards within Cornerstone, utilizing its API to extract, transform, and load training data into enterprise analytics platforms like Power BI and Tableau.

Integrated Lumi’s Learning and Engagement platform with enterprise-level data systems, ensuring smooth data flow and consistency across platforms.

Designed and deployed scalable GraphQL APIs using Hasura to provide real-time data access and manipulation across various data sources, including AWS Redshift, Google BigQuery, and Snowflake.

Environment: ETL operations, Data Warehousing, Redshift, Scala, dbt (data build tool), Data Vault, Data Modeling, Cassandra, BigQuery, GitHub, Databricks, Cloud, Data Lake, Java, Data Factory, ERwin, Advanced SQL methods, Python, Linux, Apache Spark, Spark SQL, Oracle 8.x PL/SQL, Snowflake.

Centene Corp Healthcare, San Francisco, CA Oct 2018 - Dec 2020

Sr. Data Engineer

Responsibilities:

Designed and built Spark/PySpark based ETL pipelines for migration of credit card transactions, account, and customer data into enterprise Hadoop Data Lake. Developed strategies in handling large datasets using partitions, Spark SQL, broadcast joins and performance tuning.
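
A minimal sketch of the broadcast-join pattern referenced above; the paths and columns are illustrative, and an existing SparkSession named spark is assumed:

from pyspark.sql.functions import broadcast

# Large fact table and small dimension, both illustrative
txns = spark.read.parquet("/data/lake/card_transactions/")
accounts = spark.read.parquet("/data/lake/accounts/")

# Broadcasting the small dimension avoids shuffling the large transaction set
enriched = (txns.join(broadcast(accounts), "account_id")
                .repartition("txn_month"))

(enriched.write.mode("overwrite")
         .partitionBy("txn_month")
         .parquet("/data/lake/enriched_transactions/"))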

Successfully designed and developed data ingestion pipelines in Azure Data flows to create extracts from Azure SQL server.

Maintained data integration programs into Hadoop and RDBMS environments from both structured and semi-structured data source systems.

Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, and applied SaaS, PaaS, and IaaS cloud computing concepts in implementations on GCP.

Performed tuning of existing Hive queries and UDFs to analyze the data. Used Pig to analyze datasets and perform transformations according to requirements.

Analyzed the data flow from different sources to target to provide the corresponding design Architecture in Azure environment.

Experience in setting up the environment for data governance with Azure Purview; created scans of different resources and other cloud providers.

Supervised data profiling and data validation to ensure the accuracy of the data between the source and the target systems. Performed job scheduling and monitoring using AutoSys and quality testing using the ALM tool.

Worked on building of Tableau desktop reports and dashboards to report customer data.

Built and published customized interactive Tableau reports and dashboards along with data refresh scheduling using Tableau Desktop.

Used the Snowflake data warehouse to consume data from the C3 Platform.

Involved in wiring S3 event notifications to an SNS topic, an SQS queue, and a Lambda function that sends messages to a Slack channel.
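
A hedged sketch of the Lambda at the end of that chain; the SLACK_WEBHOOK_URL environment variable is an assumption, and the event shape follows the standard SQS-wrapping-SNS-wrapping-S3 notification format:

import json
import os
import urllib.request

def lambda_handler(event, context):
    for record in event.get("Records", []):
        # SQS delivers the SNS envelope as a JSON string in the record body
        sns_envelope = json.loads(record["body"])
        s3_event = json.loads(sns_envelope["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            payload = {"text": f"New object landed: s3://{bucket}/{key}"}
            req = urllib.request.Request(
                os.environ["SLACK_WEBHOOK_URL"],
                data=json.dumps(payload).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)
    return {"status": "ok"}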

Transformed Teradata scripts and stored procedures to SQL and Python running on Snowflake's cloud platform.

Automated tasks of extracting metadata and lineage from tools using Python scripts, saving 70+ hours of manual effort.

Analyzed system requirement specifications and participated in client interactions during requirements gathering.

Provided daily reports to the Development Manager and participated in both the design and development phases. Utilized Agile methodology and the Scrum process.

Environment: Data Factory (ADF v2), Azure SQL Database, Logic Apps, Data Lake, BLOB Storage, SQL Server, Windows Remote Desktop, UNIX Shell Scripting, Azure PowerShell, Databricks, Python, ADLS Gen 2

Truist Financial, Charlotte, NC Oct 2016 - Sept 2018

Data Engineer

Responsibilities:

Collaborated with ETL developers to ensure that data was well cleaned and the data warehouse was up to date for reporting purposes, using Pig.

Selected and generated data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and stored the data in AWS Redshift.
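
A minimal sketch of that flow; the bucket, table, IAM role, and connection details are illustrative assumptions:

import boto3
import psycopg2

# Stage the CSV extract in S3
boto3.client("s3").upload_file("daily_extract.csv",
                               "my-data-bucket", "extracts/daily_extract.csv")

# Load the staged file into Redshift with COPY
conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="etl_user", password="...")
with conn.cursor() as cur:
    cur.execute("""
        COPY staging.daily_extract
        FROM 's3://my-data-bucket/extracts/daily_extract.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        CSV IGNOREHEADER 1;
    """)
    conn.commit()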

Created Entity Relationship Diagrams (ERD), Functional diagrams, and Data flow diagrams, enforced referential integrity constraints, and created logical and physical models using Erwin.

Deployed services on AWS and utilized step function to trigger the data pipelines.

Created plugins to extract data from multiple sources like Apache Kafka, Database and Messaging Queues.

Ran Log aggregations, website Activity tracking and commit log for distributed system using Apache Kafka.

Designed and implemented complex ETL processes using Informatica PowerCenter to extract, transform, and load data from heterogeneous sources.

Imported and exported data into HDFS and HIVE using Sqoop, worked on ingesting log data into Hadoop using Flume.

Designed and implemented OLAP cubes to facilitate multidimensional analysis and improve query performance for business intelligence applications.

Evaluated Fivetran and Matillion for streaming and batch data ingestion into Snowflake.

Strong experience working with Python editors such as PyCharm, Spyder, and Jupyter Notebook.

Designed ODS and Data Vault models, with expertise in loan and all types of card data.

Developed ETL processes to transform and load data into OLAP systems, ensuring accurate and timely data availability for reporting.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Configured, designed, implemented, and monitored Kafka clusters, connectors, and payloads in Flume.

Developed logical data model based on the requirements utilizing Erwin.

Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
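
A minimal sketch of such a producer using the kafka-python client; the API endpoint, brokers, and topic name are illustrative assumptions:

import json
import requests
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Pull a page of records from the external REST API and publish each one
response = requests.get("https://api.example.com/v1/events", timeout=30)
response.raise_for_status()

for event in response.json():
    producer.send("external-events", value=event)

producer.flush()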

Performed data visualization and designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.

Environment: AWS (EMR, EC2, S3, Redshift, Glue), Spark (PySpark, Spark SQL, Spark Streaming, Spark MLlib), Kafka, Python 3.x (Scikit-learn, NumPy, Pandas), ERwin, Tableau 10.1, GitHub, Pig

Magnaquest Technologies Limited, Hyderabad, India Jan 2014 - Aug 2016

Data Analyst

Responsibilities:

Designed and developed logical and physical data models using Erwin DM.

Worked with various process improvements, normalization, de-normalization, data extraction, data cleansing, and data manipulation.

Worked with requirements management, workflow analysis, source data analysis, data mapping, Metadata management, data quality, testing strategy and maintenance of the model.

Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software and tools such as TOAD, MS Access, Excel, XLS, and SQL Server.

Assisted in producing OLAP cubes and wrote queries to generate reports using SQL Server Analysis Services (SSAS) and Reporting Services (SSRS); edited, upgraded, and maintained an ASP.NET website and IIS server.

Designed the ER diagrams, logical model (relationship, cardinality, attributes, and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata as per business requirements using Erwin.

Created SSIS Packages using SSIS Designer for exporting heterogeneous data from OLE DB Source, Excel Spreadsheets to SQL Server.

Extensively worked in SQL, PL/SQL, SQL*Plus, and SQL*Loader, query performance tuning, DDL scripts, and database objects such as tables, views, indexes, synonyms, and sequences.

Environment: ERwin 9.1, Netezza, Oracle 8.x, SQL, PL/SQL, SQL*Plus, SQL*Loader, Informatica, CSV, Teradata 13, T-SQL, SQL Server, SharePoint, Pivot Tables, Power View, DB2, SSIS, DVO, Linux, MDM, ETL, Excel, Power BI, Tableau, SAS, SSAS, SPSS, SSRS


