
Sai Prathyusha Kudaka

Senior Azure Data Engineer

Phone: 614-***-****

Email: *******************@*****.***

LinkedIn: www.linkedin.com/in/spkudaka

PROFESSIONAL SUMMARY:

11+ years of IT experience spanning Azure cloud services, big data technologies, data modeling, analysis, ETL development, validation, deployment, monitoring, visualization and reporting, and requirements gathering.

Accomplished data engineer with over 10 years of experience designing and implementing scalable data ingestion pipelines using big data technologies, Microsoft Azure cloud services, Python, and PySpark, including on-premises-to-Azure migration solutions.

Developed and deployed a wide range of data pipelines using Azure Data Factory.

Strong understanding of CI/CD practices in DBT and Airflow, leveraging Git, Azure DevOps, and dbt Cloud to ensure automated testing, deployment, and version control in collaborative environments.

Over 2 years of experience implementing MLOps best practices to automate the deployment, monitoring, and version control of machine learning models using Azure ML, MLflow, and CI/CD pipelines within Databricks and Azure DevOps environments.

Collaborated with cross-functional teams using Microsoft SharePoint for maintaining project documentation, version control, and centralized access to compliance materials and process workflows.

Utilized SharePoint as a collaborative workspace to streamline communication between stakeholders and ensure up-to-date documentation for data governance and fraud analytics projects.

Designed and implemented enterprise-wide data governance policies and frameworks using Azure Purview and Microsoft Fabric, ensuring consistent data lineage, cataloguing, and access control across the organization.

Developed Power BI dashboards to monitor and visualize data quality metrics, compliance KPIs, and lineage tracking, aiding stakeholders in making informed data-driven decisions.

Collaborated with cross-functional teams to establish data stewardship, assign data ownership roles, and enforce GDPR and HIPAA-aligned data privacy and compliance standards.

Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure Synapse Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse, controlling and granting database access, and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

Leveraged Microsoft Fabric’s unified analytics platform to streamline end-to-end data workflows, integrating Data Factory, Synapse, and Power BI into a single solution for efficient business intelligence delivery.

Hands-on experience with OneLake and Fabric Notebooks, building scalable pipelines and real-time dashboards using Microsoft Fabric's lakehouse architecture.

Designed, developed, and deployed data pipelines on Azure platform, demonstrating proficiency in Azure services like Azure Data Factory, Azure Databricks, Unity Catalog, Azure Stream Analytics, and Azure Event Hubs.

Strong understanding of data modeling, data warehousing, and ETL processes, with expertise in SQL and NoSQL databases such as Azure SQL Database, Azure Cosmos DB, and MongoDB.

Proficient in designing and developing cloud-based data solutions, ensuring that they are scalable, reliable, and efficient.

Experienced in working with various Azure services, including Logic Apps, Storage, IoT Hub, Purview, Data Catalog, Databricks, Event Hubs, HDInsight, Virtual Machines, Notification Hubs, Service Bus, Active Directory, Key Vault, and Monitor, with strong knowledge in their implementation, integration, and management.

Demonstrated expertise in utilizing Azure DevOps and Jenkins for streamlined software development and deployment, enabling efficient collaboration, version control, and automated CI/CD pipelines for accelerated software delivery.

Experience in optimizing query performance and scalability in Azure SQL Database and Cosmos DB through partitioning strategies and indexing techniques.

Skilled in creating and managing ETL/ELT workflows with PySpark, Apache Spark, Apache Beam, or Apache Airflow to optimize data extraction, transformation, and loading operations.

Expertise with big data technologies like Hadoop, Spark, and Kafka, with knowledge of programming languages like Python, PySpark, Java, and Scala.

Ability to work in a team environment and collaborate with cross-functional teams, possessing excellent problem-solving and analytical skills.

Strong communication and interpersonal skills, enabling effective communication with clients, stakeholders, and team members.

Skilled at creating end-to-end data solutions suited to a variety of business requirements using Azure services such as Azure Cosmos DB, Azure Synapse Analytics, Azure Data Factory, Azure Databricks, and Azure SQL Database.

Converted business needs into solid technical solutions using agile methodologies, producing actionable insights that promote well-informed decision-making; used JIRA for project management.

Ensured data quality, integrity, and compliance with legal and business standards in PolyBase workloads, drawing on practical experience in data modeling, data warehousing, and data governance.

Specialized in cost-effectiveness, scalability, and performance optimization, approaching every project with a customer-centric attitude, a collaborative mindset, and strong problem-solving abilities.

TECHNICAL SKILLS:

Azure Services:

Azure Data Factory, Azure Databricks, Azure Databricks Unity Catalog, Azure Synapse Analytics, Logic Apps, Azure Functions, Snowflake, Airflow, Azure DevOps, Azure Storage, Azure Event Hubs, Azure Notification Hubs, Azure Service Bus, Azure Data Lake Storage (ADLS) Gen2, Azure Key Vault, Azure IoT Hub, Azure SQL Database, Azure AI, Azure DevOps Repos, Azure HDInsight, Azure Data Catalog, Azure Blob Storage, Monitoring & Logging (ELK Stack, Splunk, Datadog), Azure Virtual Machines, Azure Active Directory, Azure Monitor, Azure Purview, Azure Cosmos DB, Azure Kubernetes Service, Azure Stream Analytics, AWS (S3, Redshift, Lambda), Google Cloud (BigQuery, Cloud Functions), Copy Data Activity, For Each Activity, Azure Entra ID.

Big Data Technologies:

HDFS, MapReduce, Hive, PySpark, Scala, YARN, Kafka, Spark, Oozie, Airflow, Spark Performance Tuning, Talend, Apache NiFi, Snowflake, PolyBase, HQL, Pig, Sqoop, Apache Spark, Zookeeper, Flink, Spark Streaming, DBT, Cloudera, AWS Redshift, Informatica PowerCenter, HBase, Clusters & Nodes, Data Wrangling, Data Cleaning.

Hadoop Distribution:

Cloudera, Hortonworks.

Languages:

Java, Python, PySpark, Shell Script, R, SQL, Pig, HiveQL, Scala, PL/SQL, T-SQL.

Web Technologies:

HTML, CSS, JavaScript, XML, JSP, Restful, SOAP.

Operating Systems:

Windows (XP/7/8/10), UNIX, LINUX, UBUNTU, CENTOS.

Build Automation tools:

Azure DevOps, Jenkins.

Version Control & CI/CD Tools:

Git, GitHub, Jenkins, Bitbucket, GitLab, Azure DevOps, Terraform, Kubernetes, Docker, GitHub Actions, Ant, Maven, YAML Pipelines, CloudFormation, Bicep, RBAC, IAM Policies, Security & Compliance, Informatica IICS, Glue, Athena, EC2, EMR, DBT.

IDE & Build Tools, Design:

Eclipse, Visual Studio, PyCharm, SSIS, SSAS, SSRS, SSMS.

File Formats:

JSON, XML, Avro, Parquet, ORC, CSV, Delta.

Databases/Data Modeling:

MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse Analytics, MS Excel, MS Access, Cosmos DB, Cassandra, MongoDB, PostgreSQL, Star Schema, Snowflake Schema, Dimensional Modeling, Slowly Changing Dimensions, Partitioning, Clustering, Materialized Views, OLAP, OLTP.

Visualization/Reporting Tools:

Tableau, Power BI, Microsoft Fabric, Looker.

Data Warehousing Tools:

Snowflake, Azure Synapse Analytics, BigQuery, Staging, Partition Keys, Change Data Capture, PolyBase.

EDUCATION:

Bachelor’s in Electrical and Electronics Engineering, Jawaharlal Nehru Technological University, Hyderabad, India, Jun 2009 – May 2013

CERTIFICATIONS:

Microsoft Certified: Azure Fundamentals

Microsoft Certified: Azure Data Engineer Associate

Databricks Certified: Data Engineer Associate

WORK EXPERIENCE:

Role: Senior Azure Data Engineer Oct 2022 – Present

Client: Bank of America, New York

Responsibilities:

Implemented end-to-end data pipelines with Azure Data Factory, Azure Databricks, Unity Catalog, and Fabric, optimizing fraud detection workflows and real-time transaction monitoring, ensuring accurate anomaly detection.

Developed robust ETL workflows using Azure Data Factory, PySpark, and Apache Spark, automating data movement and transformation from diverse sources into Azure Data Lake, Azure Blob, Snowflake, and Azure Synapse.
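
A minimal PySpark sketch of the kind of ADF-orchestrated transformation described above; the storage account, container, and column names are illustrative, not taken from the actual project.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn_etl").getOrCreate()

# Read raw JSON landed in the lake by an ADF copy activity (path is hypothetical).
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/transactions/")

# Deduplicate, type, and filter before promoting to the curated zone.
curated = (
    raw.dropDuplicates(["txn_id"])
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .withColumn("txn_date", F.to_date("txn_ts"))
       .filter(F.col("amount").isNotNull())
)

# Write date-partitioned Parquet for downstream Synapse / Snowflake loads.
(curated.write.mode("overwrite")
        .partitionBy("txn_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/transactions/"))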

Maintained technical and operational documentation on Microsoft SharePoint, enabling efficient knowledge sharing and audit preparedness for fraud detection systems.

Leveraged SharePoint to track data pipeline updates, change logs, and stakeholder reviews during the migration of ETL workflows to Azure Databricks.

Designed and implemented ETL workflows using IBM DataStage to extract, transform, and load large volumes of structured and semi-structured data from diverse sources into enterprise data warehouses.

Developed reusable DataStage jobs with parallel processing, optimizing performance and ensuring data quality through validation, error handling, and logging mechanisms.

Defined and enforced data quality rules using Azure Data Factory and Data Flow activities, integrating automated validations and auditing into daily fraud analytics pipelines.

Integrated Azure Purview and Key Vault for centralized metadata management and secure governance, supporting regulatory compliance and internal audit requirements.

Supported the data stewardship process by collaborating with compliance officers and analysts to improve data trust, usability, and accountability in real-time fraud detection systems.

Collaborated with business analysts and data architects to convert business rules into transformation logic, ensuring accurate data integration for reporting and analytics.

Built and managed Apache Airflow DAGs to schedule and monitor ETL workflows, integrating with Azure Data Factory, Databricks, and Snowflake for seamless orchestration.
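
A hedged Airflow sketch of this style of orchestration; the DAG id, schedule, and task bodies are placeholders (in practice the ADF and Databricks steps would use the provider operators, e.g. AzureDataFactoryRunPipelineOperator and DatabricksRunNowOperator).

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def trigger_adf_pipeline():
    # Placeholder: would invoke an Azure Data Factory pipeline run.
    pass

def run_databricks_job():
    # Placeholder: would trigger a Databricks job via the Jobs API.
    pass

with DAG(
    dag_id="fraud_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=trigger_adf_pipeline)
    transform = PythonOperator(task_id="transform", python_callable=run_databricks_job)
    ingest >> transform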

Developed and deployed Microsoft Fabric Notebooks for Spark-based data engineering tasks, improving batch data transformation and visualization within a unified workspace.

Collaborated with application and data engineering teams to build and manage microservices for fraud detection, enabling seamless integration with APIs and secure data flow across platforms.

Automated document processing and data extraction using Azure AI Document Intelligence, streamlining data workflows. Leveraged AI-driven insights to improve data accuracy and operational efficiency.

Led the transition from Azure SQL Data Warehouse to Microsoft Fabric, optimizing data warehousing solutions for enhanced scalability, performance, and cost efficiency.

Implemented Microsoft Fabric Data Engineering and Data Science workloads, ensuring seamless integration with Azure Synapse, Data Factory, and Power BI for real-time analytics and reporting.

Designed and deployed real-time ingestion pipelines leveraging Azure Event Hubs, Kafka, and Flume, ensuring low-latency fraud detection alerts and anomaly tracking for banking transactions.
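
A sketch of low-latency ingestion through the Kafka-compatible endpoint that Azure Event Hubs exposes; the namespace, topic, and connection string are illustrative placeholders, and an ambient Databricks SparkSession (`spark`) is assumed.

# Event Hubs' Kafka-compatible endpoint listens on port 9093.
conn_str = "Endpoint=sb://example-ns.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "example-ns.servicebus.windows.net:9093")
         .option("subscribe", "transactions")
         .option("kafka.security.protocol", "SASL_SSL")
         .option("kafka.sasl.mechanism", "PLAIN")
         .option("kafka.sasl.jaas.config",
                 'org.apache.kafka.common.security.plain.PlainLoginModule required '
                 'username="$ConnectionString" password="' + conn_str + '";')
         .load()
)

# Payloads arrive as bytes; cast for downstream fraud-rule parsing.
events = stream.selectExpr("CAST(value AS STRING) AS payload", "timestamp")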

Enhanced data pipeline efficiency by implementing Azure Data Factory dependencies and fault-tolerant mechanisms, reducing ETL job failures by 25% and improving overall workflow stability.

Developed scalable fraud risk models integrating Azure Machine Learning, Python, and Spark Streaming, enabling transaction velocity analysis, geolocation tracking, and behavioural anomaly detection.

Optimized fraud detection analytics using Azure Synapse, Snowflake Schema, and Star Schema models, improving query execution times and data retrieval speeds for compliance teams.

Implemented secure data governance and quality controls using Azure Purview, Azure Key Vault, and Azure Active Directory, ensuring GDPR and PCI DSS compliance and secure encryption of PII.

Developed Power BI and Tableau dashboards for real-time fraud alerts, transaction risk visualization, and anomaly trend tracking, enabling improved fraud investigation turnaround.

Integrated Azure Databricks Auto Loader to process streaming and batch data, enabling real-time detection of fraudulent activities and continuous risk scoring for banking transactions.
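
A minimal Auto Loader sketch using Databricks' cloudFiles source; the landing path, schema location, and checkpoint location are hypothetical, and the ambient `spark` session of a Databricks notebook is assumed.

# Auto Loader incrementally discovers new files as they land.
stream = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/checkpoints/txn_schema")
         .load("abfss://landing@examplelake.dfs.core.windows.net/transactions/")
)

# Append into a Delta table that the risk-scoring jobs read continuously.
(stream.writeStream.format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/txn_stream")
       .outputMode("append")
       .start("/mnt/delta/transactions"))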

Engineered real-time fraud analytics models using PySpark and Scala, ensuring proactive identification of high-risk transactions before execution.

Developed and optimized CI/CD pipelines in Azure DevOps and Jenkins, automating fraud detection rule deployments, ETL updates, and ML model retraining workflows.

Implemented Spark Streaming and Yarn for high-speed fraud detection model execution, ensuring real-time risk scoring of incoming transactions.

Collaborated with fraud analysts and compliance officers, improving fraud detection rule accuracy by 40% while enhancing real-time reporting and risk assessment.

Leveraged Azure Logic Apps and Azure Kubernetes Service for automated fraud case investigations, integrating event-driven fraud detection workflows into regulatory audit trails.

Designed Azure Data Lake storage (ADLS) solutions, implementing efficient partitioning, encryption, and security enhancements to support banking fraud detection analytics.

Developed and optimized ETL pipelines using Informatica Intelligent Cloud Services (IICS) to extract, transform, and load structured and semi-structured data from on-premises and cloud sources into Azure SQL Database and Snowflake.

Ensured seamless fraud detection by optimizing Azure Synapse queries, Snowflake Time Travel features, and Star Schema-based data models, improving query execution speeds by 30%.

Developed advanced anomaly detection models using NLP-based sentiment analysis in Python, enhancing fraud profiling accuracy by 20%.

Optimized Databricks Spark jobs through query tuning, caching, and partitioning strategies, achieving a 30% improvement in fraud detection model efficiency.
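
A sketch of the caching, partitioning, and join-strategy tuning referred to here; the table names and partition count are illustrative, with an ambient `spark` session assumed.

from pyspark.sql.functions import broadcast

# Cache a small, frequently joined dimension so repeated scoring queries skip re-reads.
accounts = spark.table("dim_accounts").cache()

# Repartition the large fact on the join key to limit shuffle skew,
# and broadcast the small dimension instead of shuffling it across the cluster.
txns = spark.table("fact_transactions").repartition(200, "account_id")
scored = txns.join(broadcast(accounts), "account_id")

scored.write.mode("overwrite").format("delta").saveAsTable("fraud_scored")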

Developed scalable ETL workflows using Talend for real-time and batch data processing, integrating data from diverse financial sources into Azure Data Lake and Snowflake. Optimized data quality, deduplication, and transformation processes to enhance fraud detection efficiency and regulatory compliance.

Implemented fraud risk scoring models using Hive and Pig, optimizing structured and unstructured data processing within fraud analytics pipelines.

Designed real-time fraud case investigation pipelines leveraging Microsoft Fabric and Azure Active Directory, ensuring role-based access to fraud-related transaction logs.

Migrated high-risk transactional datasets using Sqoop, ensuring efficient structured data ingestion from Oracle and SQL Server into Azure Data Lake for fraud analytics.

Automated infrastructure and fraud detection rule deployments using Terraform and Azure DevOps, ensuring seamless model retraining and fraud scoring optimizations.

Built real-time fraud case investigation dashboards in Power BI and Azure Analysis Services, enhancing regulatory compliance reporting.

Reduced fraudulent transaction detection time from minutes to real-time, decreasing fraud losses by 40% and increasing model precision to 92%.

Conducted in-depth fraud analysis by integrating machine learning algorithms in Azure Databricks, improving fraud detection accuracy and reducing false positives.

Implemented automated anomaly detection workflows in Azure Data Factory, reducing fraud investigation time by 35% and increasing transaction monitoring efficiency.

Developed real-time AI-driven predictive analytics for fraud risk scoring using Python, Spark MLlib, and Azure Machine Learning, enhancing fraud detection algorithms.

Leveraged Azure Synapse for historical data trend analysis, identifying emerging fraud patterns through batch and real-time data pipelines.

Optimized data ingestion processes with Azure Event Hubs, Apache Kafka, and Azure Stream Analytics, ensuring real-time fraud monitoring and alerting.

Created automated Power BI reports with role-based access control using Azure Active Directory, allowing fraud analysts to track transaction behaviours dynamically.

Improved fraud response times by 25% through automated alerting and fraud case categorization using Azure AI-driven text analytics.

Built large-scale distributed fraud detection applications using Azure Databricks, Azure Synapse, and Snowflake, reducing processing time for fraud risk scoring.

Environment: Azure Data Factory, Azure Databricks, Azure Synapse, Snowflake, Azure SQL Database, Azure Data Lake Storage (ADLS), Azure Blob Storage, Azure Event Hubs, Azure Stream Analytics, Apache Spark, PySpark, Apache Kafka, Power BI, Azure Key Vault, Azure Active Directory (RBAC), Azure DevOps, Jenkins, Terraform, GitHub Actions, Delta Lake, Microsoft Sentinel (Azure SIEM), Azure Logic Apps, Azure Kubernetes Service (AKS), T-SQL, Python, Scala, JSON, Parquet, Data Governance, SQL Server Management Studio (SSMS), Big Data Integration, Machine Learning, Data Security & Compliance (GDPR, PCI DSS).

Role: Azure Data Engineer June 2018 – Sep 2022

Client: PQE Group, Chicago

Responsibilities:

Implemented end-to-end data pipelines with Azure Data Factory and Azure Databricks for efficient ETL from disparate sources such as XML, Avro, Parquet, JSON, and on-prem Azure SQL Server, enabling high-performance data integration.

Extensively worked on Azure Data Lake Analytics and Azure Databricks to implement SCD Type 1 and SCD Type 2 approaches, ensuring robust ETL processing for large-scale analytics.
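
A hedged Delta Lake sketch of the SCD Type 2 close-out step; table, key, and column names are illustrative, and the matching insert of the new row version would follow as a second pass.

from delta.tables import DeltaTable

# Staged changes and the target dimension; names are placeholders.
updates = spark.table("stg_customer")
target = DeltaTable.forName(spark, "dim_customer")

# Close the current row version when a tracked attribute changed;
# insert rows for customers seen for the first time.
(target.alias("t")
 .merge(updates.alias("s"),
        "t.customer_id = s.customer_id AND t.is_current = true")
 .whenMatchedUpdate(
     condition="t.address <> s.address",
     set={"is_current": "false", "end_date": "s.effective_date"})
 .whenNotMatchedInsert(values={
     "customer_id": "s.customer_id",
     "address": "s.address",
     "effective_date": "s.effective_date",
     "end_date": "null",
     "is_current": "true"})
 .execute())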

Developed Spark and Spark SQL transformations in Azure Databricks and PySpark for data cleansing, aggregation, and enforcing business rules.

Led data classification and privacy impact assessments for pharmaceutical and clinical trial datasets using Azure Purview, aligning data practices with FDA and GxP regulations.

Enabled end-to-end visibility into critical data pipelines with metadata-driven governance models, enhancing master data management and regulatory compliance in pharma operations.

Ingested data into Azure cloud services such as Azure Data Lake, Azure Blob Storage, Azure SQL Database, and Azure SQL Data Warehouse, ensuring secure and scalable cloud-based data processing.

Ensured HIPAA compliance by implementing data masking, role-based access control (RBAC), and audit logging for patient-sensitive data in Azure SQL and Data Lake environments supporting pharmaceutical analytics.

Integrated FHIR-compliant APIs for clinical data ingestion, enabling secure and standardized data exchange between Electronic Health Records (EHRs) and pharma trial systems, while maintaining interoperability and regulatory compliance.

Designed and managed complex Azure pipelines and Azure data flows using Azure Data Factory (ADF) and PySpark, orchestrating workflows via Apache Airflow for automated ETL execution.

Utilized Azure Key Vault for secure storage of credentials, access tokens, and encryption keys, ensuring compliance with enterprise security standards.

Developed scalable data pipelines using PySpark and Azure Data Factory, automating ETL workloads and integrating seamlessly with Azure Synapse Analytics.

Designed optimized data models and schemas for Azure SQL Server and Snowflake, implementing partitioning, clustering, and indexing for efficient query execution.

Implemented ETL pipelines in Talend to automate data ingestion from clinical trial systems and compliance databases into Azure SQL and Snowflake, ensuring accurate reporting for regulatory audits.

Developed advanced T-SQL queries for optimized data retrieval and transformation in Azure SQL Database.

Implemented Delta Lake architecture in Azure Databricks, ensuring ACID transactions, schema evolution, and time-travel capabilities.
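
Two short snippets illustrating the Delta features named above; the paths, version number, and source batch are placeholders, with an ambient `spark` session assumed.

# Time travel: read the table as of an earlier version for audit or rollback.
v_old = spark.read.format("delta").option("versionAsOf", 5).load("/mnt/delta/trials")

# Schema evolution: let new upstream columns merge into the existing table schema.
new_batch = spark.read.parquet("/mnt/landing/trials_batch/")  # illustrative source
(new_batch.write.format("delta")
          .mode("append")
          .option("mergeSchema", "true")
          .save("/mnt/delta/trials"))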

Designed and implemented scalable ETL workflows using Scala and Spark, processing pharmaceutical compliance and clinical trial data in Azure. Optimized data modeling and aggregation to improve reporting accuracy and regulatory audits.

Integrated Snowflake’s Time Travel and Snowpipe for real-time ingestion, automated recovery, and rollback functionality.
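
A sketch of the Snowpipe and Time Travel pieces via the Snowflake Python connector; the account, credentials, stage, pipe, and table names are all illustrative.

import snowflake.connector

# Credentials are placeholders; in practice they were resolved from Azure Key Vault.
conn = snowflake.connector.connect(
    account="example_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="CLINICAL", schema="RAW",
)
cur = conn.cursor()

# Snowpipe: auto-ingest pipe over an external stage for continuous loads.
cur.execute("""
    CREATE PIPE IF NOT EXISTS trial_pipe AUTO_INGEST = TRUE AS
    COPY INTO trial_events FROM @adls_stage FILE_FORMAT = (TYPE = 'JSON')
""")

# Time Travel: query the table as it looked an hour ago for recovery or rollback.
cur.execute("SELECT COUNT(*) FROM trial_events AT(OFFSET => -3600)")
print(cur.fetchone())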

Proficient in migrating on-prem databases to Azure SQL Database, ensuring seamless transition and high availability.

Processed 50TB+ of data daily into Azure Synapse Analytics and Snowflake, supporting machine learning-driven analytics and downstream reporting.

Developed real-time data streaming pipelines using Azure Event Hubs and Apache Kafka, enabling low-latency data availability for analytics use cases.

Built interactive Power BI and Tableau dashboards integrated with Azure Analysis Services, providing real-time monitoring of business KPIs and operational compliance.

Implemented Microsoft Sentinel (Azure SIEM) for proactive security monitoring, ensuring anomaly detection and enterprise-wide data governance.

Created and deployed Snowflake stages for structured and semi-structured data ingestion, ensuring schema enforcement and compliance.

Implemented Azure Role-Based Access Control (RBAC) using Azure Active Directory, ensuring secured access to sensitive business data.

Optimized Snowflake warehouse configurations through workload management, auto-clustering, and caching, improving query performance while reducing costs.

Developed Azure Synapse Analytics Pipelines for seamless integration of Azure Blob Storage and Azure SQL Server, supporting cloud-scale data lake processing.

Designed and implemented Azure Monitor and Azure Log Analytics dashboards, providing real-time tracking of ETL pipeline failures, query latency, and resource usage.

Automated CI/CD pipelines using Azure DevOps, Terraform, and GitHub Actions, enabling seamless deployment and version control for ETL workflows.

Applied advanced T-SQL techniques for performance tuning, materialized views, and stored procedures in Azure Synapse Analytics.

Mentored junior engineers in Azure Databricks, Snowflake, and Azure ADF best practices, improving team adoption of cloud-based ETL solutions.

Resolved performance bottlenecks in Azure Synapse Analytics, optimizing query execution times and reducing computational costs.

Worked in an agile environment, collaborating with business users and data scientists to translate business requirements into scalable Azure-based solutions.

Developed machine learning-driven fraud detection models using Azure Machine Learning and Databricks MLflow, integrating real-time risk scoring via Azure Event Hubs.

Implemented advanced data governance frameworks using Azure Purview, ensuring compliance with industry regulations.

Optimized Azure SQL Data Warehouse structures using Star Schema and Snowflake Schema, enhancing data modeling efficiency for analytics use cases.

Utilized Apache Spark, Spark Streaming, and Yarn to enable high-speed processing for large-scale data transformations.

Leveraged Scala for developing scalable big data processing solutions within Azure Databricks.

Environment: Azure SQL Server, Azure Databricks, Azure Synapse Analytics, Azure Data Factory (ADF), Snowflake, Azure Data Lake Storage (ADLS) Gen2, Azure Blob Storage, Azure Key Vault, Azure Active Directory, Azure Monitor, Apache Spark, PySpark, Apache Kafka, Spark Streaming, Power BI, Tableau, Azure DevOps, Terraform, GitHub Actions, T-SQL, Python, Scala, JSON, Parquet, Delta Lake, Star Schema, Snowflake Schema, Azure Analysis Services, Azure Event Hubs, Azure HDInsight, SQL Server Management Studio (SSMS), Data Governance, CI/CD Pipelines, Machine Learning, Cloud Data Integration.

Role: Big Data Engineer Nov 2015 – April 2017

Client: HDFC Bank, Bengaluru, Karnataka, India

Responsibilities:

Engineered scalable data pipelines using Spark and Python, processing terabytes of data daily for real-time analytics. Implemented efficient ETL processes that reduced data processing time by 30%, improving overall system performance.

Developed and maintained Hadoop clusters for data warehousing, ensuring high availability and optimal performance.

Utilized Hive and Pig for data transformation and analysis, supporting critical business intelligence needs.

Designed and implemented NoSQL databases like Cassandra for handling high-volume, low-latency data workloads.

Optimized database performance through efficient data modeling and query optimization, enhancing application responsiveness.

Built and deployed data solutions on AWS, leveraging services like EMR, S3, and Redshift for cloud-based data processing. Automated infrastructure provisioning and deployment using Infrastructure as Code (IaC) principles.

Implemented data integration strategies to consolidate data from diverse sources into a unified data lake. Ensured data quality and consistency by implementing data validation and cleansing processes.

Applied machine learning algorithms using PySpark and Scikit-learn to extract insights from large datasets. Deployed machine learning models using MLOps practices, ensuring scalability and reliability.
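
A small scikit-learn sketch of the anomaly-detection style described; the feature frame and contamination rate are invented for illustration (real features were engineered in PySpark at scale).

import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative transaction features.
df = pd.DataFrame({
    "amount": [120.0, 80.5, 9500.0, 60.0, 75.2, 102.4],
    "txns_per_hour": [2, 1, 40, 3, 2, 4],
})

model = IsolationForest(n_estimators=100, contamination=0.2, random_state=42)
df["flag"] = model.fit_predict(df[["amount", "txns_per_hour"]])  # -1 marks outliers
print(df[df["flag"] == -1])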

Designed and implemented data models using dimensional modeling techniques, including star schema, for efficient data warehousing. Developed complex SQL queries to support data analysis and reporting requirements.

Implemented data governance policies and procedures to ensure data security and compliance. Conducted data quality audits and implemented data validation rules to maintain data integrity.

Developed and maintained ETL/ELT processes to load and transform data into Snowflake and Redshift data warehouses. Optimized query performance and data storage to improve data retrieval and analytical capabilities.

Utilized data visualization tools like Tableau and Power BI to create interactive dashboards and reports. Communicated complex data insights to stakeholders through clear and concise visualizations.

Participated in Agile/Scrum development methodologies, contributing to sprint planning, daily stand-ups, and retrospectives.

Utilized GCP services such as BigQuery and Dataflow for scalable data analytics and processing. Deployed and managed cloud-based data infrastructure, optimizing cost and performance.

Ensured data quality and integrity by implementing data validation and monitoring processes within data pipelines. Developed automated testing frameworks to validate data transformation logic and ensure data accuracy.

Environment: Apache Spark, PySpark, Apache Kafka, Spark Streaming, Snowflake, Power BI, Tableau, SQL Server, MapReduce, Yarn, Scala, Star Schema, Snowflake Schema, Terraform, DevOps, Ansible, Oozie, Hadoop (HDFS), Apache Hive, AWS Redshift, Google BigQuery, Airflow, DBT, PostgreSQL, MySQL, Cassandra, MongoDB, Kubernetes, Docker, Jenkins, T-SQL, PL/SQL, Spark SQL, IAM Policies, Data Encryption.

Role: Data Warehouse Developer July 2013 – Oct 2015

Client: Axis Max Life Insurance, Gurugram, Haryana, India

Responsibilities:

Implemented Power BI reports using DAX, Drill-through, and Drill-down, enabling interactive data exploration. Designed dashboards with filtering, sorting, and aggregation to enhance business intelligence.

Developed Data Marts using star schema to improve query efficiency and reduce redundancy. Optimized data storage for faster reporting and analytical processing.

Designed and deployed SSIS Packages, automating ETL workflows for seamless data integration. Configured error handling and logging to improve data pipeline reliability.

Utilized Azure SQL Database for transactional and analytical data processing in cloud environments. Integrated Azure Blob Storage for storing large structured and unstructured datasets.

Developed ETL pipelines using Informatica PowerCenter to extract, transform, and load data efficiently. Applied performance tuning techniques to optimize processing time.

Built Hive tables and optimized HQL queries for efficient big data processing. Used partitioning and bucketing to improve query performance in Hadoop environments.
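
A sketch of the partitioning and bucketing pattern, expressed through Spark SQL against the Hive metastore (the pure HiveQL equivalent uses STORED AS ORC); the database, table, and bucket count are illustrative.

# Assumes an ambient `spark` session with Hive metastore support.
spark.sql("CREATE DATABASE IF NOT EXISTS txn_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS txn_db.transactions (
        txn_id STRING, account_id STRING, amount DECIMAL(18,2), txn_date DATE
    )
    USING ORC
    PARTITIONED BY (txn_date)
    CLUSTERED BY (account_id) INTO 32 BUCKETS
""")

# Partition pruning: the txn_date predicate restricts the scan to a single partition.
daily = spark.sql("""
    SELECT account_id, SUM(amount) AS total
    FROM txn_db.transactions
    WHERE txn_date = DATE'2016-01-01'
    GROUP BY account_id
""")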

Integrated real-time event streaming using Kafka and Flume to capture and analyze transactional data. Designed message queuing systems for better event-driven architectures.

Designed OLAP and OLTP models to enhance reporting and transactional processing. Ensured proper indexing and normalization to maintain database performance.

Developed T-SQL and PL/SQL scripts for data transformation, reporting, and business logic execution. Created stored procedures and triggers to maintain data integrity.

Utilized HDFS and MapReduce to process and store large-scale data in a distributed environment. Developed scalable batch processing solutions to handle growing data volumes.

Developed high-performance data transformations using Pig and Hive for structured and semi-structured data. Implemented HQL queries to retrieve insights from large datasets.

Ingested structured data with Sqoop for seamless data migration between relational databases and Hadoop. Ensured optimized data transfer for large-scale batch processing.

Implemented HBase for NoSQL data storage, ensuring high availability and fault tolerance. Designed data models for faster lookups and distributed query execution.

Deployed SSRS reports for generating automated and ad-hoc business intelligence reports. Designed dynamic reports with drill-down functionality for deeper insights.

Utilized SSMS for managing SQL databases, optimizing query performance, and debugging stored procedures. Implemented database maintenance strategies to enhance system reliability.

Implemented IBM DataStage ETL workflows for data extraction, cleansing, and loading into enterprise warehouses. Applied data quality rules to maintain accuracy and consistency.

Worked on Teradata for large-scale data warehousing and analytics workloads.


