Data Engineer
Rohit Ibrahimpatnam
Email: ******************@*****.***
Contact: +1-214-***-****
Professional Summary:
Data Engineer with 5 years of experience designing and developing scalable data pipelines across on-prem, AWS, and Azure environments.
Strong expertise in ETL/ELT development, data ingestion, transformation, and optimization using Python, SQL, Apache Spark, and PySpark.
Hands-on experience with Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and ADLS Gen2, as well as AWS Glue, Amazon S3, and Amazon Redshift.
Proven ability to build and maintain incremental and CDC-based data pipelines, ensuring data accuracy, performance, and reliability.
Solid background in data modeling (star and snowflake schemas) to support analytics and business intelligence use cases.
Experienced in implementing data quality checks, validation frameworks, and reconciliation processes across enterprise datasets.
Strong understanding of cloud security and governance, including RBAC, IAM policies, and secrets management.
Skilled in performance tuning of Spark jobs and SQL queries to handle large-scale data processing efficiently.
Hands-on experience with CI/CD pipelines, version control, and production support, ensuring stable and maintainable data platforms.
Effective collaborator with cross-functional teams, translating business requirements into reliable data engineering solutions.
Technical Skills:
Programming Languages: Python (PySpark, Pandas), SQL
Data Engineering: ETL/ELT Development, Data Ingestion, Incremental & CDC Loads, Data Quality & Validation, Data Modeling (Star & Snowflake)
Big Data Processing: Apache Spark, PySpark
Cloud Platforms: Microsoft Azure, Amazon Web Services (AWS)
Azure Services: Azure Data Factory, Azure Databricks, Azure Data Lake Storage Gen2, Azure Synapse Analytics, Azure SQL Database, Azure Key Vault, Azure Active Directory, Azure Monitor, Log Analytics
AWS Services: AWS Glue, Amazon S3, Amazon Redshift, AWS IAM, Amazon CloudWatch
Databases & Storage: Relational Databases (Oracle, MySQL), Azure SQL Database, Amazon Redshift
Data Formats: Parquet, CSV, JSON
Security & Governance: RBAC, IAM Policies, Secrets Management
CI/CD & Version Control: Azure DevOps, Git, CI/CD Pipelines
Scheduling & Orchestration: Azure Data Factory Triggers, AWS Glue Triggers, Cron
Monitoring & Logging: Azure Monitor, Log Analytics, Amazon CloudWatch
Reporting & Analytics: Power BI (Data Consumption Support)
Operating Systems: Linux
Professional Experience:
Client: CMS, Dallas, TX Sept 2023 – Present
Role: Azure Data Engineer
Responsibilities:
Developed and maintained end-to-end data ingestion pipelines using Azure Data Factory, utilizing Self-Hosted Integration Runtime to extract data from on-prem SQL Server and flat files into Azure Data Lake Storage Gen2.
Implemented ETL/ELT processes using Azure Databricks (PySpark) to clean, transform, and standardize large healthcare datasets, applying Spark optimizations such as partitioning, broadcast joins, and caching.
Created and managed analytical tables and views in Azure Synapse Analytics (Dedicated and Serverless SQL Pools) to support reporting and downstream analytics workloads.
Implemented incremental data loads and CDC logic using watermark columns, control tables, and metadata stored in Azure SQL Database to improve pipeline efficiency and reduce processing time.
Performed data modeling in Synapse by building star and snowflake schemas aligned with business reporting requirements and healthcare metrics.
Implemented data quality checks within ADF and Databricks, including schema validation, duplicate detection, null checks, and reconciliation logic to ensure data accuracy and consistency.
Integrated Azure Key Vault with Azure Data Factory and Databricks to securely manage secrets, credentials, and service principals.
Applied role-based access control (RBAC) using Azure Active Directory and managed ADLS Gen2 ACLs to enforce secure access to sensitive healthcare data.
Configured pipeline orchestration using ADF triggers, parameters, and reusable datasets, enabling automated scheduling and dependency management.
Optimized Databricks Spark jobs and Synapse SQL queries by tuning file formats (Parquet), distribution strategies, indexing, and query execution plans.
Monitored pipeline executions and failures using Azure Monitor and Log Analytics, performing root-cause analysis and implementing fixes to improve stability.
Supported business intelligence and reporting teams by delivering curated datasets from Synapse for Power BI consumption.
Managed source control, build, and deployment of data pipelines using Azure DevOps, following CI/CD practices across development, test, and production environments.
Created and maintained technical documentation, data flow diagrams, and operational runbooks for production support.
Tech Stack: Azure Data Factory, Azure Databricks (PySpark/Python), ADLS Gen2, Azure Synapse Analytics, Azure SQL Database, Azure Key Vault, Azure DevOps, Power BI.
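The watermark-based incremental load described above can be sketched in plain Python (standing in for the PySpark/Azure SQL control-table implementation; table, column, and field names are illustrative, not from the actual CMS pipeline):

```python
from datetime import datetime

# Illustrative control-table entry: in the ADF/Databricks pipeline this
# watermark would be stored in Azure SQL Database, not an in-memory dict.
control = {"table": "patient_claims", "last_watermark": datetime(2024, 1, 1)}

def incremental_extract(rows, control):
    """Return only rows modified after the stored watermark, plus the new watermark."""
    wm = control["last_watermark"]
    delta = [r for r in rows if r["modified_at"] > wm]
    new_wm = max((r["modified_at"] for r in delta), default=wm)
    return delta, new_wm

rows = [
    {"id": 1, "modified_at": datetime(2023, 12, 30)},  # already loaded
    {"id": 2, "modified_at": datetime(2024, 1, 5)},    # new since watermark
    {"id": 3, "modified_at": datetime(2024, 1, 7)},    # new since watermark
]

delta, new_wm = incremental_extract(rows, control)
# Only after the load succeeds is the new watermark written back,
# so a failed run safely re-extracts the same delta on retry.
control["last_watermark"] = new_wm
```

The key design point is that the watermark advances only on success, which is what makes the pipeline restartable without full reloads.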
Client: T-Mobile (via Cognizant), India Jan 2022 – July 2023
Role: Data Engineer
Responsibilities:
Developed and maintained scalable data ingestion pipelines using AWS Glue, Python (PySpark), and JDBC connectors to extract data from Oracle and MySQL systems into Amazon S3.
Built ETL workflows using AWS Glue Spark jobs (PySpark) to cleanse, standardize, and enrich high-volume telecom datasets, applying Spark optimizations such as partitioning, caching, and optimized joins.
Designed and managed data lake storage structures in Amazon S3, organizing raw, processed, and curated layers using partitioned Parquet formats for efficient querying.
Implemented analytical data models in Amazon Redshift, creating fact and dimension tables using star and snowflake schemas to support reporting and performance analytics.
Developed incremental data processing logic using Glue job bookmarks and S3 partitioning to reduce full data reloads and improve processing efficiency.
Integrated Apache Spark (PySpark) for complex transformations, aggregations, and window functions on large structured and semi-structured datasets.
Enforced data quality by implementing schema validation, null checks, duplicate detection, and reconciliation logic within Glue and Spark jobs.
Orchestrated end-to-end data workflows using AWS Glue Triggers and Amazon CloudWatch Events, enabling automated scheduling and dependency management.
Implemented security controls using AWS IAM roles and policies, ensuring secure access to S3, Glue, and Redshift in accordance with enterprise standards.
Tuned Redshift performance by configuring distribution styles, sort keys, vacuum operations, and query optimization to support high-concurrency workloads.
Monitored ETL pipelines and cluster performance using Amazon CloudWatch logs and metrics, performing root-cause analysis for job failures and latency issues.
Collaborated with reporting teams to deliver curated datasets from Amazon Redshift for downstream BI and analytics consumption.
Managed source code, versioning, and deployment of data pipelines using Git and CI/CD pipelines, supporting development, test, and production environments.
Tech Stack: AWS Glue (PySpark/Python), Apache Spark, Amazon S3, Amazon Redshift, AWS IAM, Amazon CloudWatch, Git, CI/CD pipelines.
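The data-quality checks listed above (null checks, duplicate detection, record-count reconciliation) follow a pattern that can be sketched in plain Python; in the actual pipelines this logic ran inside Glue/Spark jobs, and the column names here are illustrative:

```python
def quality_report(rows, key, required):
    """Count duplicate keys and nulls in required columns across a batch."""
    seen, duplicates = set(), 0
    null_counts = {c: 0 for c in required}
    for r in rows:
        if r[key] in seen:
            duplicates += 1
        else:
            seen.add(r[key])
        for c in required:
            if r.get(c) is None:
                null_counts[c] += 1
    return {"records": len(rows), "duplicates": duplicates, "null_counts": null_counts}

def reconcile(source_count, target_count):
    """Source-to-target reconciliation: row counts must match exactly."""
    return source_count == target_count

rows = [
    {"msisdn": "111", "plan": "5G"},
    {"msisdn": "222", "plan": None},   # null in a required column
    {"msisdn": "111", "plan": "LTE"},  # duplicate key
]
report = quality_report(rows, key="msisdn", required=["plan"])
```

A batch that fails reconciliation or exceeds a null/duplicate threshold would typically be quarantined rather than promoted to the curated layer.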
Client: RCOM, Hyderabad, India Jan 2021 – Jan 2022
Role: Data Engineer
Responsibilities:
Developed batch data ingestion processes using Python (Pandas, PyODBC) and SQL to extract data from operational systems and load it into a centralized relational database.
Built and maintained ETL pipelines using Python scripts and SQL transformations, performing data cleansing, normalization, and business rule application on telecom datasets.
Designed and managed relational data models, creating fact and dimension tables to support reporting and analytical queries.
Implemented incremental and delta load logic using timestamp columns and surrogate keys to handle daily data refreshes efficiently.
Wrote and optimized complex SQL queries, including joins, CTEs, subqueries, and window functions, to support analytics and reporting needs.
Enforced data quality controls by implementing validation checks, record counts, duplicate detection, and reconciliation logic within ETL processes.
Performed database performance tuning by creating indexes, analyzing execution plans, and optimizing query logic.
Scheduled and monitored ETL jobs using cron and batch scheduling tools, ensuring reliable daily data processing.
Collaborated with reporting teams to deliver curated datasets aligned with business definitions and KPIs.
Documented data flows, transformation logic, and operational procedures to support ongoing maintenance and knowledge transfer.
Tech Stack: Python (Pandas), SQL, Relational Databases (Oracle/MySQL), ETL Scripts, Linux, Cron Scheduling.
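The window-function deduplication mentioned above (latest record per key, a common step in daily delta loads) can be sketched with Python's built-in sqlite3 module, which supports ROW_NUMBER() in SQLite 3.25+; the schema and data are illustrative, not from the actual RCOM systems:

```python
import sqlite3

# In-memory stand-in for the operational database; schema is illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE subscriber_events (
    subscriber_id INTEGER,
    event_type    TEXT,
    event_ts      TEXT
);
INSERT INTO subscriber_events VALUES
    (1, 'activate',    '2021-06-01'),
    (1, 'plan_change', '2021-08-15'),
    (2, 'activate',    '2021-07-10');
""")

# Keep only the latest event per subscriber using ROW_NUMBER() over a
# partition, the standard window-function pattern for deduplication.
latest = con.execute("""
WITH ranked AS (
    SELECT subscriber_id, event_type, event_ts,
           ROW_NUMBER() OVER (
               PARTITION BY subscriber_id ORDER BY event_ts DESC
           ) AS rn
    FROM subscriber_events
)
SELECT subscriber_id, event_type FROM ranked WHERE rn = 1
ORDER BY subscriber_id
""").fetchall()
```

The same CTE-plus-ROW_NUMBER() shape applies unchanged on Oracle or MySQL 8+, only the connection setup differs.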
Certification:
Microsoft Certified: Fabric Data Engineer Associate
Education:
Master's – University of North Texas, Denton, Texas, USA
Bachelor's – CVR College of Engineering, India