Data Engineer Azure

Location:
Hyderabad, Telangana, India
Posted:
October 15, 2025

Resume:

Venkata Chaitanya

Sr. Data Engineer

813-***-**** ******************@*****.***

PROFESSIONAL SUMMARY

6+ years of experience in the IT industry, specializing in Azure tools and services, including Azure ADLS Gen2, Azure Blob Storage, Azure Synapse Analytics, Azure Data Factory, Azure Functions, Azure Stream Analytics, Azure Logic Apps, and Azure Cosmos DB.

6+ years as a Data Engineer specializing in Azure, consistently delivering high-impact solutions with optimized performance and efficiency.

Architecting and managing scalable data lakes using Azure ADLS Gen2 and Azure Data Factory; significantly enhanced ETL efficiency by optimizing Azure Data Factory pipelines, reducing data processing time by 40%.

Developing advanced real-time analytics and data processing pipelines with Azure Databricks (leveraging Python, PySpark, Spark SQL), resulting in a 30% reduction in processing time and improving business forecasting accuracy by 25% through ML model integration.

Designing and implementing scalable Delta Lake architecture with Azure Databricks and Spark for efficient big data processing; achieved 15% cloud infrastructure cost savings by optimizing Databricks resource utilization.

Proficient in developing real-time data ingestion and processing solutions using Azure Stream Analytics and Azure Event Hubs.

Led migration of on-premises data to Azure Synapse Analytics, optimizing data storage and streamlining analytics, which resulted in a 20% reduction in query response times.

Expert in implementing and managing robust Snowflake data warehousing solutions; enhanced data accessibility and contributed to a 25% increase in operational efficiency by integrating external data sources with Snowflake on Azure.

Leveraged Snowpipe for real-time data ingestion into Snowflake, reducing data latency by 50%, and utilized SnowSQL for scalable querying and management.

Led Snowpark (Python) integration in Snowflake for complex data transformations, significantly enhancing ETL efficiency within Snowflake's virtual data warehouse.
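
A minimal sketch of the kind of Snowpark (Python) transformation described above, assuming a hypothetical RAW_ORDERS source table and placeholder connection settings; it is illustrative, not the production code.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Placeholder connection parameters; in practice these come from a secrets store.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}).create()

# Aggregate a hypothetical raw orders table inside Snowflake's engine (no data pulled to the client).
orders = session.table("RAW_ORDERS")
daily_totals = (
    orders.filter(col("STATUS") == "COMPLETE")
          .group_by(col("ORDER_DATE"))
          .agg(sum_(col("AMOUNT")).alias("TOTAL_AMOUNT"))
)

# Persist the transformed result for downstream ETL steps.
daily_totals.write.save_as_table("ANALYTICS.DAILY_ORDER_TOTALS", mode="overwrite")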

Mastery in SQL (T-SQL, SnowSQL, Spark SQL) for complex querying and performance optimization; achieved up to 40% reduction in query execution times in Snowflake and 30% in Azure SQL Database through advanced indexing, partitioning, and script optimization.

Experience in infrastructure development and operations on the AWS Cloud platform, including EC2, EBS, S3, VPC, RDS, SES, ELB, Auto Scaling, CloudFront, CloudFormation, ElastiCache, CloudWatch, and SNS.

Transformed data modeling practices using DBT to optimize ETL data pipelines, increase analytics accuracy, and streamline insights for informed decision-making.

Automating complex data workflows using Python-driven Azure Functions and Azure Logic Apps for efficient task orchestration and scheduling.

Extensive experience in enhancing ETL processes using Python and SQL, achieving up to 30% improvement in processing efficiency for large-scale data pipelines and 20% with Informatica/Python/SQL.

Adept in securing Azure environments (Network Security Groups, Azure Active Directory) and utilizing Azure Monitor for operational insights.

Proficient in Azure DevOps for CI/CD and version control (Git, BitBucket), applying Agile methodologies for efficient project delivery.

TECHNICAL SKILLS

Cloud technologies: Azure HDInsight, Azure Data Factory, ADLS Gen2, Azure Blob Storage, Azure Synapse Analytics, Azure Databricks, Azure Cosmos DB, Azure DevOps, Purview, Azure Function Apps, Azure Logic Apps, Entra ID, Azure Resource Manager, Azure Virtual Machines, Azure Load Balancer.

Big Data: Spark, Hadoop, HDFS, MapReduce, YARN, Hive, Oozie, Pig, Sqoop, Presto, Zeppelin, Flink, ZooKeeper.

Programming languages: Python, Scala, Java, SAS, PySpark, SQL, PL/SQL, T-SQL.

Database: HBase, MongoDB, MySQL, SQL Server, Oracle, PostgreSQL, Snowflake, Teradata.

Data Visualization tools: Tableau, Power BI

Machine Learning Libraries: scikit-learn, Pandas, NumPy, PyTorch, TensorFlow, Azure ML.

Version control: Git, GitHub, BitBucket.

Scripting languages: Bash and UNIX/Linux shell scripting, PowerShell.

Streaming platforms: Kafka, Confluent Kafka, Azure Event Hubs.

Data Modeling: Star Schema, Snowflake Schema, Dimensional Modeling

PROFESSIONAL EXPERIENCE

Client: FedEx, Memphis, TN. March 2023 - Present

Sr. Data Engineer

Responsibilities:

Led the development of a customer insights dashboard, orchestrating complex ETL processes from diverse Azure sources (including Azure SQL Database and ADLS Gen2) and handling multiple data formats, laying the foundation for data-driven insights.

Architected and delivered robust Azure Data Factory (ADF) V2 ETL pipelines, featuring Change Data Capture (CDC), reusable frameworks, and comprehensive data cataloging for efficient data lifecycle management.

Achieved a 40% reduction in data processing time by implementing significant ADF ETL pipeline optimizations, including parallel processing, data compression, incremental loading, automated error handling, and secure Azure Key Vault integration.
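
The incremental-loading pattern referenced above can be sketched roughly as follows in PySpark; the watermark control table, storage paths, and column names are assumptions, and in the actual pipelines this logic was driven from ADF activities rather than a standalone script.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Read the last successfully processed timestamp from a hypothetical control table.
last_watermark = (
    spark.table("control.load_watermarks")
         .filter(F.col("source") == "orders")
         .agg(F.max("last_loaded_at"))
         .collect()[0][0]
)

# Pull only rows modified since the previous run (placeholder ADLS Gen2 paths).
incremental = (
    spark.read.format("delta")
         .load("abfss://raw@storageacct.dfs.core.windows.net/orders")
         .filter(F.col("last_modified") > F.lit(last_watermark))
)

# Append the delta to the curated zone; the watermark table is advanced afterwards.
incremental.write.format("delta").mode("append").save(
    "abfss://curated@storageacct.dfs.core.windows.net/orders"
)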

Mastered Azure Databricks with Python (PySpark), Scala, and Spark SQL for advanced ETL data transformations and implementing Slowly Changing Dimensions (SCD) logic, improving Spark SQL efficiency by 75% through optimized query conversion.
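
A condensed sketch of the SCD handling mentioned above, using a Delta Lake MERGE from PySpark; the dimension, staging table, and attribute columns are illustrative, and a full SCD Type 2 flow would also insert new versions of changed rows in a second pass.

from delta.tables import DeltaTable

# `spark` is the ambient SparkSession provided by Databricks; tables are placeholders.
target = DeltaTable.forName(spark, "dw.dim_customer")
updates = spark.table("staging.customer_changes")

(target.alias("t")
       .merge(updates.alias("s"),
              "t.customer_id = s.customer_id AND t.is_current = true")
       # Close out current records whose tracked attribute changed.
       .whenMatchedUpdate(
           condition="t.address <> s.address",
           set={"is_current": "false", "end_date": "current_date()"})
       # Insert records for keys not yet present in the dimension.
       .whenNotMatchedInsert(values={
           "customer_id": "s.customer_id",
           "address": "s.address",
           "is_current": "true",
           "start_date": "current_date()",
           "end_date": "null"})
       .execute())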

Realized a 15% reduction in cloud infrastructure costs by applying advanced Spark optimization techniques in Databricks, including RDD/DataFrame APIs, dynamic partitioning/bucketing, and Python-driven broadcast joins/shuffle optimization.
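
A small sketch of the broadcast-join and partitioning tuning referred to above; the fact and dimension tables, their sizes, and the partition column are assumptions.

from pyspark.sql import functions as F

# `spark` is the ambient SparkSession provided by Databricks.
transactions = spark.table("curated.transactions")   # large fact table (placeholder)
merchants = spark.table("curated.dim_merchant")      # small dimension (placeholder)

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = transactions.join(F.broadcast(merchants), "merchant_id")

# Repartitioning by the write key keeps output files evenly sized and limits shuffle skew.
(enriched.repartition("transaction_date")
         .write.format("delta").mode("overwrite")
         .partitionBy("transaction_date")
         .save("abfss://curated@storageacct.dfs.core.windows.net/enriched_transactions"))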

Developed and deployed machine learning models using Python (PySpark's ML libraries) on Azure Databricks, seamlessly integrating them into ETL workflows to enhance analytical capabilities and improve business forecasting accuracy by 25%.
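
A compact sketch of embedding a PySpark ML model in an ETL step, as described above; the feature columns, label, and table names are illustrative rather than the actual forecasting model.

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# `spark` is the ambient SparkSession; train on a hypothetical history table.
history = spark.table("curated.shipment_history")
assembler = VectorAssembler(
    inputCols=["distance_km", "weight_kg", "day_of_week"], outputCol="features")
model = Pipeline(stages=[assembler, LinearRegression(labelCol="transit_days")]).fit(history)

# Score the latest batch inside the same ETL run and persist forecasts for reporting.
scored = model.transform(spark.table("curated.open_shipments"))
(scored.select("shipment_id", "prediction")
       .write.mode("overwrite").saveAsTable("analytics.transit_forecast"))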

Engineered and managed over 100 Python (PySpark) jobs within Azure Synapse Analytics' Spark engine, establishing cost-efficient, large-scale ETL data processing and leveraging Synapse catalogs for metadata management.

Spearheaded the migration of on-premises data to Snowflake on Azure, achieving a 40% reduction in SQL query times; expertly utilized SnowSQL for scalable querying and leveraged Snowflake’s native features like automatic scaling and semi-structured data support.

Drove a 25% increase in operational efficiency by implementing robust data governance frameworks on Snowflake and orchestrating the ETL integration of external data sources, significantly improving data quality for real-time analytics and decision-making.

Designed and implemented complex ETL workflow orchestration using Apache Airflow (crafting DAGs with Python and SQL), Python-driven Azure Functions (integrated with API Management for financial data submission), and Azure Logic Apps.
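
The Airflow side of the orchestration described above can be pictured with a sketch like the following; the DAG id, schedule, and task callables are placeholders.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task logic; the real callables invoked ADF pipelines, Azure Functions, and SQL steps.
def extract():
    print("extract financial data")

def transform():
    print("transform and validate")

def load():
    print("load to Synapse / Snowflake")

with DAG(
    dag_id="financial_data_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load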

Enhanced data governance, security, and ETL data validation by implementing Unity Catalog in Azure Databricks, defining Azure RBAC policies, establishing data retention/lifecycle processes, and developing comprehensive data validation workflows.

Improved real-time ETL streaming efficiency by 40% using Azure Event Hubs for financial data, by optimizing producers/consumers and topic structures; also applied Spark Streaming for creating data frames/datasets.
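
A trimmed sketch of the real-time ingestion described above, reading Azure Event Hubs through its Kafka-compatible endpoint with Spark Structured Streaming; the namespace, event hub name, schema, and paths are assumptions, and the SASL connection-string configuration is omitted.

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

schema = (StructType()
          .add("transaction_id", StringType())
          .add("amount", DoubleType()))

# `spark` is the ambient SparkSession provided by Databricks.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
       .option("subscribe", "transactions")              # event hub name (placeholder)
       .option("kafka.security.protocol", "SASL_SSL")    # SASL JAAS config with the connection string omitted
       .option("startingOffsets", "latest")
       .load())

# Parse the JSON payload and land the stream in a Delta table for downstream ETL.
parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("event"))
             .select("event.*"))

(parsed.writeStream.format("delta")
       .option("checkpointLocation", "abfss://checkpoints@storageacct.dfs.core.windows.net/tx")
       .start("abfss://curated@storageacct.dfs.core.windows.net/transactions_stream"))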

Established robust CI/CD pipelines with Azure DevOps and Git for automated testing and deployment of Python-based Azure Functions and other critical ETL components, ensuring efficient and reliable financial data processing.

Led data Lakehouse implementation using Delta Lake architecture, resulting in 15% cost savings.

Championed ETL best practices and team skill development by mentoring junior data engineers and establishing reusable frameworks for Data Factory and data lake management, promoting code reusability and maintainability.

Enabled impactful data-driven decision-making and achieved cost savings by developing data sandboxes for analytics teams, proactively minimizing ETL service downtime, and integrating transformed data with Power BI for stakeholder insights.

Environment: Azure Blob Storage, Azure SQL Database, Azure Data Lake Storage, Azure HDInsight, Azure Databricks, Unity Catalog, Azure Logic Apps, Azure Synapse Analytics, Azure Event Hubs, Azure Functions, Azure DevOps, Snowflake on Azure, Python, Scala, Spark (PySpark, SparkSQL), Kafka, Power BI, Linux, Java, Airflow, PostgreSQL, Oracle PL/SQL, Flink.

Enhance Health, Clearwater, Florida Sep 2021 – Feb 2023

Sr. Data Engineer

Responsibilities:

Collaborated with data scientists to develop a Spending Classification model using Azure Data Factory (ADF), including data flows and data catalogs, with Azure Stream Analytics for real-time insights.

Engineered comprehensive ETL processes using ADF, incorporating Change Data Capture (CDC), control flow activities, SQL for transformations, integration with graph databases, and transforming data in Azure Databricks into star schemas with a focus on data quality and SCD implementation.

Orchestrated data migration (using Azure Database Migration Service and ADF) from on-premises systems and integrated diverse data sources into Azure Synapse Analytics, optimizing performance for analytics.

Leveraged Azure Databricks and Python (PySpark) for complex data transformations, cleansing, and implementing automated data quality checks (reducing data errors by 15%); significantly improved PySpark processing performance through proactive data skew mitigation.
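
The data-quality checks and skew mitigation mentioned above might look roughly like this in PySpark; the tables, key columns, thresholds, and salting factor are illustrative.

from pyspark.sql import functions as F

# `spark` is the ambient SparkSession; source table is a placeholder.
claims = spark.table("raw.claims")

# Simple quality gates: fail the batch if required keys are null or duplicated.
null_count = claims.filter(F.col("member_id").isNull()).count()
dup_count = claims.count() - claims.dropDuplicates(["claim_id"]).count()
if null_count > 0 or dup_count > 0:
    raise ValueError(f"Quality check failed: {null_count} null member_ids, {dup_count} duplicate claims")

# Salting a skewed join key spreads hot keys across partitions before a heavy join.
SALT = 8
salted_claims = claims.withColumn("salt", (F.rand() * SALT).cast("int"))
salted_members = (spark.table("raw.members")
                       .crossJoin(spark.range(SALT).withColumnRenamed("id", "salt")))
joined = salted_claims.join(salted_members, ["member_id", "salt"])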

Successfully integrated machine learning models into Azure Databricks ETL pipelines, enabling predictive analytics and improving business forecasting accuracy by 25%; also streamlined overall ETL processes using Databricks, reducing data processing time by 25%.

Employed Spark SQL queries for seamless data source integration, improving ETL process efficiency by 30%; established collaborative data science workflows on Databricks using Git and Agile methodologies, increasing team productivity by 20%.

Designed and implemented robust real-time streaming data pipelines using Azure Event Hubs, Azure Stream Analytics, and integrating with Azure Databricks and Kafka for timely ingestion, processing, and analysis of high-velocity data (e.g., automated ingestion of web server logs).

Developed sophisticated data models, implementing star and snowflake schemas in Azure Synapse Analytics and Snowflake on Azure to optimize query performance; also implemented advanced data management solutions using Apache Iceberg table formats and optimized Hive tables (Partitioning, Dynamic Partitions, Buckets).

Utilized Python-driven Azure Functions (with Azure API Management for integrating data from platforms like Concur) and Azure Logic Apps for complex workflow automation, automated SQL queries, dynamic ADLS Gen2 creation, and flexible data delivery to scientists.
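
A minimal sketch of a Python Azure Function of the kind described above (HTTP-triggered, fronted by API Management); the route, payload shape, and downstream write are assumptions.

import json
import logging
import azure.functions as func

app = func.FunctionApp()

@app.route(route="expenses", auth_level=func.AuthLevel.FUNCTION)
def ingest_expenses(req: func.HttpRequest) -> func.HttpResponse:
    # Accept a JSON payload (e.g. forwarded by API Management) and hand it to the data lake layer.
    try:
        payload = req.get_json()
    except ValueError:
        return func.HttpResponse("Invalid JSON", status_code=400)

    logging.info("Received %d expense records", len(payload.get("records", [])))
    # In the real pipeline this step would write to ADLS Gen2 or trigger a downstream workflow.
    return func.HttpResponse(json.dumps({"status": "accepted"}),
                             mimetype="application/json", status_code=202)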

Implemented feature engineering on large datasets using Azure HDInsight Spark clusters, optimizing data for model development and achieving a 20% improvement in processing efficiency.

Managed global financial data leveraging Azure Cosmos DB and Azure SQL Database for efficient, scalable storage and Azure Event Hubs for streaming.

Utilized Apache Airflow to automate and streamline critical data workflows, significantly reducing data engineering overhead and enabling focus on more productive tasks.

Ensured operational excellence by documenting data pipeline designs, data architecture models, and best practices; configured Azure Monitor for comprehensive monitoring of data pipelines and infrastructure to proactively address performance issues.

Facilitated data exploration and ad-hoc querying by delivering data to data scientists via Azure Data Lake Storage and Azure Synapse Analytics, supporting advanced analytics tools.

Environment: Azure Blob Storage, Azure Data Factory, Azure Synapse Analytics, Azure SQL Database, Azure Data Lake Storage, Azure Databricks, Azure Logic Apps, Azure HDInsight, Azure Event Hubs, Azure Functions, Azure RBAC, Python, Scala, Spark (PySpark, SparkSQL), Kafka, Azure Cosmos DB, Linux, Java, Apache Airflow, PostgreSQL, Snowflake on Azure.

Fidelity Investments, Boston, MA Dec 2017 – Jun 2019

Data Engineer

Responsibilities:

Worked with AWS Data Pipeline to configure data loads from S3 into Redshift; downloaded and uploaded data files (with ETL) to AWS using S3 components, and used AWS Data Pipeline to schedule an Amazon EMR cluster that cleaned and processed web server logs stored in an Amazon S3 bucket.

Built high-quality enterprise data warehouses and data lakes; worked with cross-functional teams to automate data ingestion and schedule jobs at daily, weekly, and monthly frequencies on the AWS cloud.

Used Gradle to build Java JAR files for committing to the framework.

Developed AWS Lambda functions with assigned IAM roles to run Python scripts, triggered by SQS and SNS.
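
A small sketch of the SQS-triggered Lambda pattern noted above; the queue payload, bucket name, and key layout are illustrative.

import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Process each SQS record and land the cleaned payload in S3 (hypothetical bucket/key layout).
    for record in event["Records"]:
        payload = json.loads(record["body"])
        key = f"processed/{payload['date']}/{record['messageId']}.json"
        s3.put_object(
            Bucket="example-data-lake",
            Key=key,
            Body=json.dumps(payload).encode("utf-8"),
        )
    return {"processed": len(event["Records"])}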

Developed and deployed AWS Lambda functions for ETL migration, building a serverless data pipeline whose output is cataloged in Glue and queried from Athena.

Designed AWS landing zones that are safe and secure.

Designed and executed on-prem to AWS cloud migration projects for state agencies.

Assisted with tasks such as data pipeline engineering, data analytics, data scraping and mining, data visualization, data dashboarding, machine learning (ML), and AWS Cloud Computing.

Designed and developed scalable, efficient data pipeline processes to handle data ingestion, cleansing, transformation, and integration using Sqoop, Hive, Python, and Impala.

Developed Python scripts to find SQL query vulnerabilities through SQL injection testing, permission checks, and analysis.

Developed a data pipeline using Kafka and Storm to store data into HDFS.

Implemented and managed ETL solutions and automated operational processes.

Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.

Imported data from relational data sources into HDFS and bulk-loaded data into HBase using MapReduce programs.

Developed Pig UDFs to analyze customer behavior and Pig Latin scripts for processing data in Hadoop.

Developed simple to complex MapReduce streaming jobs using Python.
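
The Python MapReduce streaming jobs mentioned above follow the standard Hadoop Streaming stdin/stdout contract; this mapper/reducer pair is a generic counting sketch, not the production logic.

#!/usr/bin/env python
# Run with: hadoop jar hadoop-streaming.jar -mapper "job.py map" -reducer "job.py reduce" ...
import sys

def map_phase():
    # Emit one (key, 1) pair per input line, keyed on the first tab-separated field.
    for line in sys.stdin:
        fields = line.strip().split("\t")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")

def reduce_phase():
    # Input arrives sorted by key, so counts accumulate over contiguous key runs.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.strip().split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    map_phase() if sys.argv[1] == "map" else reduce_phase()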

Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.

Environment: Python, Hadoop, MapReduce, HDFS, HBase, GCP, Java, AWS, Redshift, S3, IAM, Lambda, SQS, SNS, Kubernetes, Docker, UNIX, Hive, Sqoop, Oozie, Big Data ecosystem, Pig, Cloudera, Impala, Teradata, MongoDB, Cassandra, Unix scripts, XML files, JSON, REST API, Maven, GitHub, Tableau, Agile, Scrum.


