
Senior Data Engineer

Location: Irving, TX
Salary: 105000
Posted: July 04, 2025

Resume:

Saran Durgam

+1-314-***-**** ***********@*****.***

Professional Summary:

Over 8 years of experience designing, developing, and optimizing large-scale data engineering solutions across cloud and on-premises ecosystems.

Expertise in building scalable batch and streaming data pipelines using PySpark, Azure Databricks, and Apache Spark for high-performance data processing.

Proficient in designing and orchestrating end-to-end data workflows using Azure Data Factory (ADF) for cloud-native ETL automation.

Strong hands-on experience with AWS services including Lambda, Redshift, and S3 for building serverless and distributed data solutions.

Deep expertise in Informatica PowerCenter, developing complex ETL mappings, reusable mapplets, session configurations, and workflow automation using Control-M and shell scripting.

Solid background in Teradata, including BTEQ scripting, FastLoad/MultiLoad utilities, stored procedures, performance tuning, and advanced SQL optimization.

Skilled in Python development with libraries such as pandas, NumPy, and Scikit-Learn for data transformation, analytics, and feature engineering.

Experienced in using SQL Server, Spark SQL, and Oracle for building optimized queries, stored procedures, and analytical logic.

Worked with large-scale data warehousing and database platforms such as Snowflake, CosmosDB, and PostgreSQL, applying best practices in modeling and indexing.

Implemented data modeling techniques such as Star Schema, Snowflake Schema, and Data Vault 2.0 to support business intelligence and reporting use cases.

Built and optimized Hive tables with partitioning and indexing for improved data query performance within Hadoop ecosystems.

Automated and managed ETL processes using Apache Airflow, shell scripting, and Control-M for robust workflow orchestration.

Designed and implemented data lake architecture with Azure Data Lake Storage (ADLS), handling both structured and unstructured datasets.

Delivered interactive reports and dashboards using Power BI, including complex visualizations like waterfall, matrix, treemap, and funnel charts.

Collaborated closely with cross-functional Agile teams, data scientists, and business analysts to align technical solutions with data requirements.

Practiced Agile methodology using JIRA for sprint planning, backlog grooming, and continuous integration.

Experience in ServiceNow for issue resolution, change tracking, and seamless communication with operations teams.

Consistently focused on improving data quality, operational reliability, and performance across all stages of the data lifecycle.

Technical Skills:

Programming Languages & Scripting: Python (pandas, NumPy, SciPy, Scikit-Learn, SQLAlchemy), SQL (SQL Server, Spark SQL), Shell Scripting

Big Data Technologies & Frameworks: Apache Spark, PySpark, Spark Streaming, Apache Hadoop, MapReduce, Hive, Talend, Informatica PowerCenter

Cloud Platforms & Services: Microsoft Azure (Azure Databricks, Data Factory (ADF), Functions, Logic Apps, Monitor, Security Center, Data Catalog, Synapse Analytics, Data Lake Storage (ADLS), Blob Storage, Key Vault, Site Recovery, Backup, Machine Learning Services); AWS (Lambda, Redshift, S3); Google Cloud (Dataflow, BigQuery)

Data Storage & Databases: Data Warehouse, CosmosDB, MySQL, Teradata, PostgreSQL, HDFS, Distributed File Storage

Data Modeling & Metadata Management: Dimensional Modeling, Data Vault 2.0, Star Schema, Erwin Data Modeler, Metadata Management, Data Lineage, Data Cataloging

DevOps, CI/CD & Version Control: Azure DevOps, Jenkins, Git & GitHub, CloudFormation

Data Pipeline & Workflow Orchestration: Data Pipeline Development & Optimization, ETL/ELT Automation, Data Orchestration (Cloud & On-Premises), Event-Driven Automation, Batch Processing Jobs

Security & Access Control: Role-Based Access Control (RBAC), OAuth Authentication, Data Governance & Compliance, Data Encryption (At-Rest & In-Transit), Data Security Policies

Monitoring & Performance: Query Performance Tuning & Optimization, Platform Health Monitoring, Performance Tuning (SQL Server, Spark SQL)

Project & Collaboration Tools: JIRA (Agile & Scrum), ServiceNow (Incident & Change Management)

Data Visualization & Reporting: Power BI, Matplotlib, Tableau

Experience:

Cisco, Bengaluru, Karnataka (January 2022 – August 2023)
Sr. Data Engineer

Engineered complex data transformations and analytics workflows using Python, leveraging pandas and NumPy for efficient data wrangling and manipulation.

Implemented distributed data processing solutions with Apache Spark, PySpark, and Azure Databricks to efficiently manage large-scale datasets.
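
An illustrative sketch of the kind of PySpark batch transformation used in such pipelines; paths, columns, and aggregation logic are hypothetical placeholders rather than project specifics:

```python
# Hypothetical PySpark batch job: clean raw events and write a partitioned daily aggregate.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_event_aggregation").getOrCreate()

raw = spark.read.parquet("/mnt/raw/events/")  # placeholder input path

daily_counts = (
    raw.filter(F.col("event_type").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Partition output by date so downstream queries can prune efficiently.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "/mnt/curated/daily_event_counts/"
)
```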

Optimized performance of SQL Server queries and Spark SQL jobs through advanced indexing strategies, stored procedure tuning, and query refactoring.

Designed, developed, and managed scalable ETL pipelines using Azure Data Factory (ADF) and Azure Functions to automate diverse business processes.

Built event-driven data workflows using Azure Logic Apps, enabling seamless integration across various cloud services for dynamic automation.

Executed batch processing operations using MapReduce and managed structured and semi-structured data through Hive on the Hadoop platform.

Maintained CI/CD pipelines using Azure DevOps, automating deployment processes and infrastructure provisioning.

Monitored platform stability and resource utilization via Azure Monitor, Log Analytics, and Azure Security Center, enabling proactive issue resolution.

Practiced Agile methodology with JIRA for sprint planning, issue tracking, and team collaboration throughout the development lifecycle.

Administered and optimized Snowflake Data Warehouse environments by partitioning, clustering, and optimizing data storage for high-performance analytics.
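
A minimal sketch of the kind of clustering change applied to a Snowflake table, issued through the Snowflake Python connector; the account, credentials, and table name are placeholders, not real values:

```python
# Hypothetical example: apply a clustering key to a large fact table so that
# frequent date-range scans prune micro-partitions effectively.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # placeholder credentials only
    user="example_user",
    password="example_password",
    warehouse="ANALYTICS_WH",
    database="SALES_DB",
    schema="MART",
)
cur = conn.cursor()
cur.execute("ALTER TABLE fact_orders CLUSTER BY (order_date)")
cur.close()
conn.close()
```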

Built and managed big data processing workloads on Azure HDInsight, orchestrating hybrid data pipelines across on-prem and cloud environments.

Created modular, reusable pipeline components to reduce redundancy, support maintainability, and accelerate development cycles.

Integrated machine learning models into production data pipelines using Azure Machine Learning, supporting advanced predictive analytics.

Enhanced data discoverability and governance with Azure Data Catalog, enabling effective metadata management across enterprise datasets.

Managed source control and team collaboration through Git and GitHub, maintaining version history and enforcing code quality standards.

Designed resilient and scalable systems with fault tolerance and horizontal scalability in mind to ensure consistent data processing.

Architected storage and access layers using Azure Data Lake, Azure Blob Storage, and HDFS for efficient big data storage and retrieval.

Administered CosmosDB to support globally distributed, low-latency, multi-model applications requiring real-time responsiveness.

Applied Dimensional Modelling, Star Schema, and Data Vault 2.0 frameworks for building scalable and flexible enterprise data warehouses.

Used Erwin Data Modeler for designing and documenting data models, ensuring data integrity and consistency across systems.

Spearheaded query optimization and performance tuning initiatives, reducing execution times and improving overall data platform efficiency.

Implemented Role-Based Access Control (RBAC) policies for sensitive data assets, maintaining security and compliance across cloud environments.

Enabled cloud resilience and disaster recovery by configuring Azure Site Recovery and Azure Backup for business continuity.

Developed and enforced enterprise data governance frameworks, aligning with industry standards and regulatory compliance requirements.

Collaborated closely with data scientists, analysts, and business users to transform complex data requirements into scalable, production-ready solutions.

Environment: Python, Pandas, NumPy, Spark, PySpark, Azure Databricks, SQL Server, Spark SQL, ADF, Hive, Hadoop, Azure DevOps, Terraform, Docker, Security Center, JIRA, Snowflake, HDInsight, Azure ML, Data Catalog, Git, GitHub, HDFS, CosmosDB, Data Vault 2.0, Star Schema, Erwin, RBAC, Site Recovery, Azure Backup.

Axis Bank, Bengaluru, Karnataka (October 2020 – January 2022)
Sr. Data Engineer

Designed and implemented scalable data ingestion, transformation, and integration pipelines using Informatica and AWS Data Pipeline to process batch and real-time data from multiple sources.

Created serverless data processing functions using AWS Lambda to support event-driven architectures and real-time data workflows.
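
A simplified sketch of such an event-driven Lambda function, triggered by S3 object-created events; the bucket, key structure, and downstream step are illustrative assumptions:

```python
# Hypothetical S3-triggered AWS Lambda handler for an event-driven data workflow.
import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        payload = json.loads(body)
        # Downstream processing (e.g., staging the payload for a Redshift COPY) would go here.
    return {"statusCode": 200}
```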

Implemented streaming data ingestion and processing using AWS Kinesis, ensuring low-latency and fault-tolerant pipelines.

Developed data transformation scripts and analytics workflows in Python, utilizing libraries such as pandas and NumPy for efficient data manipulation and analysis.
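
An illustrative pandas/NumPy transformation of the kind such scripts perform; the DataFrame, columns, and derived features are made up for the example:

```python
# Hypothetical pandas/NumPy cleanup and feature derivation on transaction data.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "txn_id": [101, 102, 103],
    "amount": [250.0, np.nan, 75.5],
    "txn_ts": pd.to_datetime(["2021-03-01", "2021-03-15", "2021-04-02"]),
})

df["amount"] = df["amount"].fillna(df["amount"].median())   # impute missing amounts
df["txn_month"] = df["txn_ts"].dt.to_period("M").astype(str)
df["log_amount"] = np.log1p(df["amount"])                   # tame skew before analysis
```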

Designed and maintained Teradata databases for semi-structured and unstructured data, integrating with data pipelines and applications.

Conducted metadata management and ensured data lineage to maintain data governance and compliance standards.

Managed and optimized data storage solutions on Amazon Redshift and Amazon S3 to support high-performance data warehousing and analytics.

Developed data processing workflows leveraging Apache Spark and Scala to handle large-scale distributed computing within Hadoop ecosystems.

Automated ETL and data pipeline deployments using CI/CD pipelines, Jenkins, and CloudFormation for consistent and reliable delivery.

Wrote and optimized complex SQL queries for data extraction, transformation, and reporting across relational and columnar databases.

Developed and maintained Hive scripts to facilitate querying and managing large datasets stored in Hadoop clusters.
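
A minimal sketch of the partitioned Hive table pattern used for such datasets, shown here issued through Spark SQL for consistency with the other Python examples; the database, table, and columns are placeholders:

```python
# Hypothetical partitioned Hive table so queries can prune by date.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.page_views (
        user_id  BIGINT,
        page_url STRING,
        view_ts  TIMESTAMP
    )
    PARTITIONED BY (view_date DATE)
    STORED AS PARQUET
""")
```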

Managed source code repositories and version control using Git, supporting collaborative development and code reviews.

Designed and implemented robust data pipelines ensuring scalability, fault tolerance, and high availability within cloud-native environments.

Collaborated with Agile and Scrum teams, leveraging JIRA for sprint planning, task tracking, and continuous delivery.

Optimized cloud resource usage and costs by analyzing workloads and adjusting storage, compute, and data transfer configurations.

Ensured compliance with organizational policies and industry standards through comprehensive data security and access management protocols.

Enforced robust data security measures, including AWS IAM policies, role-based access control (RBAC), AWS KMS encryption, and secret management via AWS Secrets Manager.

Environment: SQL, PySpark, Databricks, Azure Data Factory, AWS Data Pipeline, Lambda, Redshift, S3, Data Lakehouse, Data Warehouse, Informatica PowerCenter, Oracle, Teradata, SQL Server, Linux, PL/SQL, SAS, Python, pandas, NumPy, Power BI, MongoDB, Spark, Hadoop, Hive, CI/CD, Jenkins, Git, JIRA, Control-M.

Landmark Group, Bengaluru, Karnataka (August 2019 – August 2020)
Data Engineer

Designed and developed scalable data pipelines using Azure Data Factory (ADF) to orchestrate data workflows across cloud and on-premises sources.

Created serverless data processing functions with Azure Functions and integrated with external systems through RESTful APIs.

Automated build, test, and deployment processes using Jenkins and CI/CD pipelines, ensuring rapid and reliable delivery.

Maintained version control and collaborative development practices using Git across multiple data engineering projects.

Administered and optimized relational data warehouses using MySQL and Teradata, ensuring data integrity and high query performance.

Orchestrated complex data workflows using Apache Airflow, scheduling and monitoring ETL jobs to meet business needs.
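
A minimal sketch of an Airflow DAG of the kind used for such scheduling, assuming Airflow 2.x import paths; the DAG id, schedule, and task bodies are illustrative stubs:

```python
# Hypothetical nightly ETL DAG: extract runs first, then load.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    pass  # pull data from the source system

def load():
    pass  # write transformed data to the warehouse

with DAG(
    dag_id="nightly_etl",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```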

Engineered robust data storage solutions leveraging Azure Data Lake Storage (ADLS), Azure Blob Storage, and HDFS for efficient big data management.

Developed data processing and transformation jobs using Python, incorporating libraries such as Pandas, NumPy, and Scikit-Learn for advanced data manipulation and feature engineering.

Built and optimized distributed data processing workflows with Apache Spark, Azure Databricks, and Flink to handle real-time and batch processing.

Implemented machine learning pipelines integrated with data workflows, supporting predictive analytics and automated decision-making.

Utilized Azure Synapse Analytics and Hive to enable large-scale data warehousing and querying in hybrid cloud environments.

Monitored data platform health and performance with Azure Monitor, Log Analytics, and debugging tools like IntelliJ IDEA.

Applied Agile methodologies to manage projects, collaborating closely with cross-functional teams to design robust, scalable data architecture.

Secured data access and credentials management through Azure Key Vault and implemented authentication protocols such as OAuth.

Developed analytical models and reporting dashboards with Power BI, translating data insights for business stakeholders.

Environment: ADF (Azure Data Factory), Azure Functions, Jenkins, Git, MySQL, Teradata, ADLS, Azure Blob Storage, HDFS, Python, pandas, NumPy, Scikit-Learn, Apache Spark, Azure Databricks, Hive, Agile, Power BI, SQL, AWS Data Pipeline, Lambda, Redshift, S3, Data Lakehouse, Data Warehouse, Informatica PowerCenter, Oracle, SQL Server, Linux, PL/SQL, SAS, Hadoop, CI/CD, JIRA.

Citi, Bengaluru, Karnataka (January 2016 – July 2019)
Jr. Data Engineer

Designed and developed complex ETL mappings using Informatica PowerCenter to extract, transform, and load data from flat files, Oracle, and SQL Server into Teradata EDW.

Implemented reusable mapplets, parameterized sessions, and dynamic workflows in Informatica to support modular ETL development and improve maintainability.

Implemented robust error handling, rejection logging, and audit tracking in Informatica using expression transformations, lookup, router, filter, update strategy, and aggregator transformations.

Built robust ETL processes for handling SCD Type 1 and Type 2 logic for dimension tables in Teradata, ensuring historical data tracking.
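
A simplified sketch of the SCD Type 2 pattern (expire the changed current row, stage the incoming record as the new version); shown here in PySpark with made-up keys and columns rather than the actual Teradata implementation:

```python
# Hypothetical SCD Type 2 step: detect address changes, expire old rows, stage new versions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

existing_dim = spark.createDataFrame(
    [(1, "Old Street", True)], ["customer_id", "address", "is_current"])
incoming = spark.createDataFrame(
    [(1, "New Street")], ["customer_id", "address"])

changed = (
    existing_dim.alias("d")
    .join(incoming.alias("s"), F.col("d.customer_id") == F.col("s.customer_id"))
    .where(F.col("d.is_current") & (F.col("d.address") != F.col("s.address")))
)

# Close out the prior version and stamp the incoming record as the new current row.
expired = (changed.select("d.*")
           .withColumn("is_current", F.lit(False))
           .withColumn("end_date", F.current_date()))
new_rows = (changed.select("s.*")
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date()))
```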

Developed high-performance BTEQ scripts, FastLoad, and MultiLoad jobs in Teradata for bulk loading and incremental updates across large datasets.

Performed unit testing, batch balancing, and data validation to ensure pipeline accuracy and alignment with business rules.

Optimized Teradata SQL queries using Explain Plans, statistics collection, and indexing strategies to reduce query execution time and resource consumption.

Migrated legacy ETL workflows from SAS and other tools into Informatica, improving job performance and error handling capabilities.

Collaborated with data modelers to implement Star Schema and Snowflake Schema models for building scalable, analysis-ready data marts in Teradata.

Created STTM (Source-to-Target Mapping) documents in collaboration with business analysts to clearly define transformation logic and data flow.

Integrated Informatica with Control-M and Unix shell scripts to schedule, monitor, and automate ETL job execution.

Managed incremental and full data loads using Change Data Capture (CDC) and pre/post-session logic for improved performance and reliability.

Used PL/SQL to write complex queries, stored procedures, and functions for validating and processing data within ETL workflows and downstream systems.

Built data ingestion and transformation pipelines using PySpark and Databricks, integrating with legacy Teradata systems for hybrid processing.

Leveraged Azure Data Factory (ADF) for orchestrating cross-platform data workflows between Teradata, Azure Data Lake, and Snowflake.

Configured and tuned data storage in Snowflake, including clustering, time travel, and warehouse sizing for reporting and dashboard use cases.

Engineered Delta Lake-based pipelines in Azure Databricks, enhancing consistency between staging and EDW layers during Informatica migration.
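
An illustrative Delta Lake merge of the kind used between the staging and EDW layers in Databricks; the paths and join key are placeholder assumptions:

```python
# Hypothetical Delta merge: upsert the latest staging batch into the curated customer table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

staging = spark.read.format("delta").load("/mnt/staging/customers")
curated = DeltaTable.forPath(spark, "/mnt/curated/customers")

(curated.alias("t")
 .merge(staging.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```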

Automated CI/CD deployments for Informatica mappings and ADF pipelines using Azure DevOps and Jenkins, ensuring rapid and stable releases.

Implemented robust error handling, rejection logging, and audit tracking in Informatica using expression transformations and workflow link conditions.

Administered Azure Data Lake Storage to organize raw, staging, and curated data zones, improving pipeline traceability and performance.

Supported business intelligence teams with data mart development and analysis by designing Fact and Dimension tables aligned to Star Schema.

Scheduled, monitored, and managed ETL workflows using Control-M and Informatica PowerCenter, ensuring timely and reliable data processing across environments.

Resolved production issues related to failed jobs, performance degradation, and data mismatches by performing root cause analysis and implementing permanent fixes.

Collaborated with QA, DBAs, and analysts to tune performance, optimize data lineage, and ensure data governance compliance in Teradata and Snowflake environments.

Environment: SQL, Oracle, Informatica PowerCenter, Teradata, PL/SQL, PySpark, Hadoop, Databricks, Python, ADF, Azure Data Lake, CI/CD, ServiceNow, Git, Agile, Power BI, Data Warehousing.


