
Data Engineer Azure

Location:
Texas City, TX
Posted:
October 15, 2025

Resume:

Vinith

United States 707-***-**** *******@*****.*** https://www.linkedin.com/in/roy-v-7b7098331/

Professional Summary:

•Results-oriented Data Engineer with 8 years of experience designing, developing, and optimizing large-scale data solutions using Big Data and cloud technologies.

•Skilled in building robust ETL/ELT pipelines, real-time streaming architectures, and high-performance data processing systems with Apache Spark, PySpark, Spark SQL, and Kafka.

•Proficient in Azure Data Services (Data Factory, Data Lake, Synapse, Databricks), SQL Server, Hive, and Snowflake, with expertise in data modeling, data warehousing, and performance tuning for OLTP and OLAP systems.

•Adept at integrating APIs, handling diverse file formats (Parquet, Avro, JSON, CSV), and applying advanced Spark optimization techniques for batch and streaming workloads.

•Demonstrated ability to deliver scalable, reliable, and business-focused data solutions across financial, research, and quality analytics domains.

Technical Skills:

•Databases: MS SQL, PostgreSQL, Oracle, MongoDB, Teradata, Cassandra, Snowflake

•Big Data: Apache Spark, PySpark, Spark SQL, Hive, HDFS, Kafka, Sqoop, MapReduce

•Cloud: Azure (ADF, ADLS, Databricks, Synapse, Cosmos DB, Logic Apps), AWS

•ETL/Tools: Azure Data Factory (ADF), SSIS, Fivetran, Airflow, NiFi, Oozie, Informatica, Talend, Alteryx, dbt

•Programming: SQL, T-SQL, Python, Scala, JavaScript

•Visualization: Power BI, Tableau, SSRS, Splunk

•Other: Linux/Windows, Agile/Scrum, CI/CD, Data Modeling, Performance Tuning, Git, Jenkins, Docker, Ansible, JIRA, Excel (Power Query, VBA, Macros), SAP Power Designer

•Governance/Compliance: HIPAA, PHI/PII, AML, ISO, Metadata & Lineage (Collibra, Alation, Informatica Axon)

Professional Experience

Key Bank Jul 2024 - Present

Data Engineer, FL

•Designed and deployed 25+ ETL/ELT pipelines using Azure Data Factory, Databricks, and Snowflake to process large volumes of financial and payments data.

•Optimized Spark jobs on Databricks, reducing pipeline runtime by 40% and improving SLA adherence for financial reporting (a minimal tuning sketch appears at the end of this section).

•Automated incremental and parallel data extraction with Fivetran for real-time, high-volume transaction integration, lowering system load by 30%.

•Developed and tuned complex SQL queries on SQL Server and Oracle to ensure high-performance data extraction and reconciliation.

•Implemented data validation and audit frameworks to maintain integrity across payments and treasury management datasets.

•Created data architecture documentation, including flow diagrams and metadata lineage for cross-system visibility.

•Developed and orchestrated ADF pipelines with parameters, variables, triggers (schedule, tumbling window, event-based), and proactive monitoring for performance optimization.

•Implemented CI/CD pipelines for ADF using Azure DevOps, Key Vault, and pipeline releases.

•Developed proactive monitoring and alerting solutions for ADF pipelines and Azure SQL Database, enhancing system reliability and reducing downtime.

•Built interactive dashboards in Power BI, transforming legacy Excel reports into scalable visualizations, which improved data accessibility and decision-making efficiency.

•Designed and optimized data warehousing solutions in Azure Synapse Analytics and Azure SQL Data Warehouse, and deployed globally distributed, multi-model Cosmos DB databases for scalable, low-latency data access and improved user experience.

•Developed parameterized, reusable Databricks notebooks and ADF pipeline templates for modular ETL design, accelerating development across multiple projects.

•Implemented automated data lineage tracking and impact analysis to improve transparency and auditability of data pipelines.

•Ensured sprint goals were met and code quality was maintained in Agile/Scrum environments by utilizing MS SQL and Databricks, leading to improved project delivery timelines.
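
Example (Python/PySpark): a minimal, illustrative sketch of the kind of Databricks job tuning described in the Spark optimization bullet above. Table paths, column names, and the reporting window are hypothetical placeholders, not details of an actual Key Bank pipeline.

    # Hypothetical PySpark sketch: partition pruning + broadcast join + AQE.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("financial-reporting-optimized")
        # Adaptive Query Execution coalesces shuffle partitions and mitigates skewed joins.
        .config("spark.sql.adaptive.enabled", "true")
        .config("spark.sql.adaptive.skewJoin.enabled", "true")
        .getOrCreate()
    )

    # Read only the partitions needed for the reporting window (partition pruning).
    txns = (
        spark.read.format("delta").load("/mnt/lake/curated/transactions")
        .where(F.col("txn_date").between("2024-07-01", "2024-07-31"))
    )

    # Broadcast the small dimension table to avoid a shuffle-heavy join.
    branches = spark.read.format("delta").load("/mnt/lake/curated/dim_branch")
    daily_totals = (
        txns.join(F.broadcast(branches), "branch_id")
        .groupBy("branch_id", "txn_date")
        .agg(F.sum("amount").alias("daily_amount"))
    )

    daily_totals.write.format("delta").mode("overwrite").save("/mnt/lake/reporting/daily_branch_totals")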

Client: Centene Health Care Nov 2023 - Jun 2024

Data Engineer

•Architected and implemented end-to-end ETL pipelines using Azure Data Factory (ADF), SSIS, and Databricks to modernize healthcare data workflows, transforming complex source data into structured, analytics-ready datasets.

•Designed and executed data migration strategies to Azure SQL Data Warehouse/Synapse Analytics, ensuring data integrity, minimal downtime, and optimized performance.

•Built and deployed ADF pipelines with Blob Storage, Data Lake, stored procedures, lookups, data flows, and Azure Functions to automate and optimize workflows.

•Utilized Azure Databricks and PySpark for large-scale data transformation, migration of SQL ETL processes, and high-performance analytics.

•Led the migration of OLTP to OLAP systems, improving analytics capabilities through scalable ETL packages and optimized queries.

•Built HIPAA-compliant ETL frameworks with encryption, masking, and RBAC.

•Designed and optimized PySpark transformations in Databricks for healthcare analytics, reducing processing time by 45%.

•Implemented Hive partitioning and bucketing strategies, improving query response times by 60% (illustrated in the sketch at the end of this section).

•Contributed to data governance and compliance initiatives, ensuring HIPAA/PII standards were met across healthcare and financial datasets.

•Utilized Python libraries for data cleaning, transformation, and testing of analytics workflows, ensuring data accuracy and reliability.

•Migrated legacy ETL processes to modern big data frameworks and cloud-based solutions, improving scalability and maintainability, which reduced system downtime.

•Prepared project documentation, use cases, and backlogs, ensuring alignment with business requirements, which facilitated smoother project execution.

•Developed and maintained Power BI reports for business and IT stakeholders, enhancing data accessibility and decision-making processes.

•Configured and tuned Databricks clusters for cost efficiency and high availability.

•Collaborated with cross-functional teams (business analysts, QA, DevOps) to deliver scalable cloud-based data pipelines within Agile/Scrum frameworks.

•Wrote and optimized SQL queries across SQL Server, PostgreSQL, and Oracle databases for performance and accuracy.

•Supported data modernization efforts, migrating on-prem datasets to cloud environments for improved performance and governance.

•Collaborated with business users and IT teams to translate functional requirements into technical ETL logic.
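
Example (Python/PySpark): a minimal, illustrative sketch of the Hive partitioning and bucketing strategy referenced above. The database, table, and column names (claims_db.claims, service_year, member_id) are hypothetical placeholders.

    # Hypothetical sketch: write a Hive-managed table partitioned by year and bucketed by member.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("claims-partitioning")
        .enableHiveSupport()  # register the table in the Hive metastore
        .getOrCreate()
    )

    claims = spark.read.parquet("/mnt/lake/raw/claims")

    (
        claims.write
        .partitionBy("service_year")   # filters on year prune whole partition directories
        .bucketBy(64, "member_id")     # co-locate each member's rows to speed joins/aggregations
        .sortBy("member_id")
        .format("parquet")
        .mode("overwrite")
        .saveAsTable("claims_db.claims")
    )

    # A query on the partition column touches only the matching partitions.
    spark.sql(
        "SELECT member_id, SUM(paid_amount) AS total_paid "
        "FROM claims_db.claims WHERE service_year = 2024 GROUP BY member_id"
    ).show()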

Client: Exprad IT Jan 2021 - Jun 2022

Data Analyst, India

•Ingested large-scale customer and behavioral data into Snowflake using Azure pipelines, improving data accessibility for analytics teams.

•Designed warehouse models using SCD, CDC, and surrogate keys, and applied data cleansing strategies, enhancing data integrity and reporting accuracy.

•Built PySpark/Scala pipelines for enrichment, aggregation, and multi-format data ingestion (Parquet, Avro, JSON, CSV), streamlining data processing and reducing latency.

•Developed Kafka producers for ingesting external REST API data streams, enabling real-time data processing and integration.

•Implemented schema evolution and version control for Delta Lake tables to support evolving business requirements, ensuring data consistency and adaptability.

•Integrated Kafka and Spark Structured Streaming for real-time ingestion, enhancing the responsiveness of healthcare applications (see the streaming sketch at the end of this section).

•Automated data transformation tasks with Python and Databricks notebooks, improving efficiency and reducing manual effort.

•Designed and administered high-concurrency Spark clusters in Databricks, optimizing resource usage, performance, and cost management.

•Monitored and optimized SSIS package performance using Azure-SSIS IR.

•Built analytics dashboards in Power BI/Tableau, integrating Athena for reporting.
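
Example (Python/PySpark): a minimal, illustrative sketch of the Kafka-to-Spark Structured Streaming-to-Delta Lake flow described above. The broker address, topic name, schema, and storage paths are hypothetical placeholders.

    # Hypothetical sketch: consume JSON events from Kafka and append them to a Delta table.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("events-streaming").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read the raw Kafka stream and parse the JSON payload.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "customer-events")
        .load()
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    # Append to Delta; mergeSchema tolerates additive schema evolution in the source.
    query = (
        events.writeStream.format("delta")
        .option("checkpointLocation", "/mnt/lake/checkpoints/customer_events")
        .option("mergeSchema", "true")
        .outputMode("append")
        .start("/mnt/lake/curated/customer_events")
    )
    query.awaitTermination()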

Client: Tech Mahindra Aug 2017 - Dec 2020

Data Engineer

•Developed 50+ large-scale ETL pipelines using Spark SQL and MapReduce for OLTP/OLAP data, improving data processing efficiency and supporting real-time analytics.

•Wrote complex SQL queries, indexes, and reporting scripts, improving reporting efficiency by 35% and enhancing marketing and sales analysis for decision-making.

•Worked with multiple data sources including flat files, on-premise databases, and cloud platforms, ensuring seamless data integration and accessibility for business intelligence.

•Migrated legacy reports to cloud BI (Power BI, Tableau), improving adoption by 40%.

•Streamlined OLTP-to-OLAP transformations, reducing ETL cycle time by 25%.

•Designed and implemented Star/Snowflake schemas for OLAP data models.
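
Example (Python/PySpark): a minimal, illustrative sketch of the star-schema modeling described in the bullet above. Source paths, table names, and columns are hypothetical placeholders.

    # Hypothetical sketch: derive a customer dimension with a surrogate key and a sales fact table.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sales-star-schema").getOrCreate()

    orders = spark.read.parquet("/data/staging/orders")

    # Dimension: distinct customer attributes with a generated surrogate key.
    dim_customer = (
        orders.select("customer_id", "customer_name", "region").distinct()
        .withColumn("customer_sk", F.monotonically_increasing_id())
    )

    # Fact: one row per order line, referencing the dimension by surrogate key.
    fact_sales = (
        orders.join(dim_customer, ["customer_id", "customer_name", "region"])
        .select(
            "order_id", "customer_sk", "order_date",
            (F.col("quantity") * F.col("unit_price")).alias("sales_amount"),
        )
    )

    dim_customer.write.mode("overwrite").parquet("/data/warehouse/dim_customer")
    fact_sales.write.mode("overwrite").parquet("/data/warehouse/fact_sales")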

Education:

Indiana Tech University, Indianapolis, USA 2023

Master's, Information Systems


