
Data Engineer with Azure Cloud and Big Data Expertise

Location:
Suwanee, GA
Posted:
November 25, 2025


Resume:

Alekhya Chinthala

Data Engineer

+1-470-***-**** *******.******@*****.***

LinkedIn

PROFILE SUMMARY:

Accomplished Data Engineer with 6 years of IT experience, adept in designing and deploying cloud ETL solutions on the Microsoft Azure platform using Azure Data Factory, Azure Databricks, and Azure Data Lake Storage.

4+ years of dedicated experience in Azure Cloud Solutions and Big Data technologies, designing and establishing optimal cloud solutions for efficient data migration and processing.

Developed and maintained Informatica workflows to transform, cleanse, and load data into Snowflake, following the Medallion architecture (bronze, silver, gold layers) for enhanced data quality and accessibility.

Hands-on experience with Big Data technologies such as Hadoop, Hive, Apache Spark, and Databricks.

Proficient in using Azure Data Factory, Logic Apps, Event Grid, Service Bus, Mapping Dataflow, Maven, and Git repository management.

Experienced in creating pipelines in Azure Data Factory using various activities like Copy, Filter, Foreach, GetMetadata, and Lookup.

Managed and secured sensitive information such as passwords, encryption keys, and certificates for cloud applications using Azure Key Vaults.

Expertise in leveraging Azure Databricks and Unity Catalog for distributed data processing, transformation, cleansing, and data governance.

Optimized Spark jobs and workflows by tuning configurations, partitioning, broadcasting, caching, and adjusting memory allocations (see the sketch following this summary).

Designed and optimized CDC pipelines to efficiently capture and replicate data changes from various source systems, including MySQL, PostgreSQL, Oracle, and Snowflake.

Migrated data from the client data scope to Azure SQL Server and ADLS Gen2, and designed a NoSQL data structure to accommodate the new cache, working across the NoSQL/SQL/Python/Databricks/PySpark/Visual Paradigm stack.

Expertise in Spark SQL and DataFrames, working with data formats including JSON, XML and Parquet.

Strong experience in developing Python applications with libraries such as Pandas, NumPy, and PyODBC.

Developed Spark applications using PySpark and Spark SQL to analyze and transform data from multiple file formats, uncovering valuable insights.

Skilled in AI-assisted data quality frameworks, embedding GPT-powered checks within Databricks pipelines to automatically validate business rules and identify anomalies.

Proficient in performance analysis, monitoring, and SQL query tuning, including collecting statistics, hints, and SQL tracing in both SQL Server and Teradata.

Experienced in creating and managing databases, users, tables, triggers, macros, views, stored procedures, functions, joins, and hash indexes using T-SQL for data transformation and manipulation, including complex joins, aggregations, and subqueries.

Proficient in leveraging Snowflake features including SnowSQL, Snowpipe, Tasks, Streams, Optimizer, Metadata Manager, Data Sharing, and Stored Procedures.

Extensive experience in Data Warehousing, encompassing OLTP, OLAP, Dimensions, Facts, and Data modeling with a solid understanding of Azure configuration in relation to Snowflake.

Experienced in data profiling, data mapping, data migration, ELT, and archiving; adopts automation-driven approaches to minimize manual errors and streamline processes.

Implemented data orchestration processes, leveraging Airflow to automate and optimize data workflows and ensure seamless, reliable data pipeline operations.

Automated ETL processes by writing Python scripts with libraries like PyODBC and SQLAlchemy, streamlining data extraction, transformation, and loading tasks.

Utilized Azure DevOps for managing work items, tracking progress, and integrating with Git for version control and collaborative development.
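
A minimal PySpark sketch of the tuning pattern noted above (broadcast joins, caching, repartitioning); the paths, table names, and thresholds are illustrative placeholders and assume a Databricks/Delta runtime, not any specific engagement.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Illustrative only: paths, column names, and thresholds are placeholders.
    spark = (
        SparkSession.builder.appName("tuning-sketch")
        .config("spark.sql.shuffle.partitions", "200")                      # right-size shuffle parallelism
        .config("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024)   # allow broadcasting small dimensions
        .getOrCreate()
    )

    facts = spark.read.format("delta").load("/mnt/silver/transactions")
    dims = spark.read.format("delta").load("/mnt/silver/customers")

    # Broadcast the small dimension table to avoid a shuffle join; cache the
    # enriched frame because two aggregations reuse it.
    enriched = facts.join(F.broadcast(dims), "customer_id").cache()

    daily = enriched.groupBy("txn_date").agg(F.sum("amount").alias("daily_amount"))
    by_segment = enriched.groupBy("segment").agg(F.count("*").alias("txn_count"))

    # Repartition by date before writing so output files stay evenly sized.
    daily.repartition("txn_date").write.format("delta").mode("overwrite").save("/mnt/gold/daily_sales")
    by_segment.write.format("delta").mode("overwrite").save("/mnt/gold/segment_counts")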

TECHNICAL SKILLS:

Azure Services

Azure Data Factory (ADF), AWS, Azure Data Lake Storage (ADLS Gen2), Azure Blob Storage, Azure Functions, Azure Logic Apps, Azure Event Hubs, Azure Synapse Analytics, Azure SQL Database, Azure Key Vault, Azure Purview, Azure Monitor, Azure Analysis Services, Azure CLI

Big Data & Analytics

Databricks, PySpark, Spark SQL, Delta Lake, Delta Live Tables (DLT), Spark Structured Streaming, Hadoop (HDFS, MapReduce), Kafka, Hive

Programming & Scripting

Python (Pandas, NumPy, PyODBC, SQLAlchemy), SQL, T-SQL, Snowflake SQL (SnowSQL), Scala, Shell/Unix, Terraform

Cloud Data Warehousing

Snowflake (Streams, Tasks, Procedures, Snowpipe, Materialized Views, Optimizer, Data Sharing, Streamlit), Azure Synapse Analytics, Redshift

Gen AI & LLMs

OpenAI (GPT, Embeddings, Function Calling), LangChain, Retrieval-Augmented Generation (RAG), Agentic AI, LLMOps, Snowflake Streamlit AI Dashboards

Data Warehousing & ETL

OLTP/OLAP, Dimensions & Facts, Star & Snowflake Schema, SCD handling, Data Modeling, Informatica (legacy experience)

CERTIFICATIONS:

Microsoft Azure Data Engineer Associate: DP-203

Databricks Certified Data Engineer Associate

Academy Accreditation - Generative AI Fundamentals

PROFESSIONAL EXPERIENCE:

Role: Data Engineer Oct 2024 – Present

T-Mobile, Atlanta, GA

Responsibilities:

Designed metadata-driven dynamic pipelines in ADF to auto-generate ingestion workflows for 100+ tables across Salesforce, SAP, Oracle, and SQL Server.

Worked with ADF Mapping Data Flows for transformations, handling schema changes and late-arriving data. Built and optimized data transformation pipelines and ETL/ELT workflows for both batch and near real-time processing.

Supported Lakehouse data processing in Delta Lake and Snowflake for raw-to-curated transformations (see the sketch at the end of this role's responsibilities).

Developed Snowflake stored procedures to automate incremental loads for fact and dimension tables, handling inserts, updates, and deletions efficiently.

Designed APIs and orchestrations to enable seamless integration between backend data pipelines and front-end reporting tools like Power BI and Snowflake Streamlit.

Created audit and logging mechanisms within Snowflake stored procedures to track duplicates, errors, and data consistency during pipeline executions.

Performed data quality validations, optimized Spark jobs for performance, and orchestrated notebooks via ADF for automated end-to-end data processing.

Integrated OpenAI embeddings with Delta Lake to create a vector database for semantic search across telecom billing and customer datasets.

Integrated the Reflection Agent into a Snowflake Streamlit UI for seamless interaction with business users, improving adoption and usability of AI workflows.

Built interactive Streamlit applications integrated with Snowflake to enable advanced business insights and self-service analytics.

Performed data profiling, cleansing, and cross-validation to resolve duplicates, missing values, and inconsistencies, providing reliable hierarchical views for reporting and analysis.

Analyzed product data and mapped attributes to identify and validate a 5-level product hierarchy, ensuring accurate categorization and consistency across multiple source tables.

Migrated an in-house script providing TSM Garasign functionality by reimplementing its ETL logic in Databricks, transforming and processing source data for Snowflake ingestion.

Monitored and supported data pipelines in ADF, Databricks, and Snowflake, troubleshooting failures, data inconsistencies, and system errors to ensure timely and accurate data availability.

Responded to production incidents, performed root cause analysis, applied fixes, and coordinated with stakeholders to maintain smooth operations and prevent recurring issues.
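
A hedged PySpark/Delta sketch of the incremental, CDC-style upsert pattern behind the raw-to-curated loads described above; the spark session is the one Databricks provides in a notebook, and the paths, join key, and "op" change flag are hypothetical.

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    # Hypothetical bronze/silver paths and join key, for illustration only.
    updates = (
        spark.read.format("delta")
        .load("/mnt/bronze/billing_changes")
        .filter(F.col("ingest_date") == F.current_date())   # today's change batch
    )

    target = DeltaTable.forPath(spark, "/mnt/silver/billing")

    # Upsert changed rows and apply deletes flagged by the CDC feed.
    (
        target.alias("t")
        .merge(updates.alias("s"), "t.account_id = s.account_id")
        .whenMatchedDelete(condition="s.op = 'D'")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )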

Environments: ADF, Azure Blob Storage, Azure Data Lake Storage Gen2, Snowflake, Snowflake Stored Procedures, Snowflake Tasks, Databricks, Delta Lake, PySpark, Python, Streamlit, SQL, Data Quality Checks, ETL Pipelines, CDC Processing, Spark Job Optimization, Pipeline Monitoring, Production Support, Power BI

Role: Data Engineer Jan 2024 – Oct 2024

Freeman, Irving, TX

Responsibilities:

Developed Azure Data Factory pipelines, utilizing Linked Services, Datasets, and Integration Runtimes to ingest data from various sources, including on-prem SQL Server, APIs, and CRM flat files, into Azure storage.

Built and maintained end-to-end data pipelines to extract, transform, and load (ETL) data into Snowflake from diverse sources such as databases, files, APIs, and streaming platforms.

Implemented robust data replication and synchronization strategies between Snowflake and other data platforms leveraging Azure Data Factory and Change Data Capture techniques, ensuring data integrity and consistency with a 98% reduction in data inconsistencies.

Developed Python-based pipelines and Databricks notebooks to transform data from diverse sources (SQL, APIs, flat files) into Snowflake.

Consumed Kafka streams and applied transformations using Spark and Scala for analytics-ready datasets.

Automated workflow orchestration using Airflow and ADF, ensuring robust monitoring and timely delivery of datasets.

Utilized SSMS to access, configure, and administer all components of SQL Server, Azure SQL Database, and Snowflake.

Created ELT/ETL pipelines to transfer data to and from the Snowflake data store using a combination of Python and T-SQL, while leveraging Snowflake's data sharing capabilities.

Implemented SCD Types in Snowflake for daily incremental data updates, and reduced warehouse costs by leveraging Snowflake's Pause and Resume features.

Integrated Airflow with Azure Blob Storage and relational databases for efficient data ingestion and loading, ensuring smooth execution of data pipelines.

Leveraged Snowflake's support for semi-structured data formats like JSON and Parquet for flexible data processing, and built and optimized several stored procedures in Snowflake.

Developed Snowpark scripts within stored procedures for advanced data processing and analytics, using Snowflake's compute for efficient in-database operations (sketched below).
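
A minimal Snowpark (Python) sketch of the in-database processing pattern described in the last bullet; the connection settings, table names, and columns are placeholders, and credentials would in practice come from a secrets store such as Azure Key Vault.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col, sum as sum_

    # Placeholder connection parameters; never hard-code real credentials.
    session = Session.builder.configs({
        "account": "<account>",
        "user": "<user>",
        "password": "<password>",
        "warehouse": "<warehouse>",
        "database": "<database>",
        "schema": "<schema>",
    }).create()

    # Filter and aggregate inside Snowflake (pushed down as SQL), then persist
    # the result to a curated table.
    orders = session.table("RAW.ORDERS").filter(col("ORDER_STATUS") == "COMPLETE")
    daily_revenue = (
        orders.group_by(col("ORDER_DATE"))
        .agg(sum_(col("AMOUNT")).alias("REVENUE"))
    )
    daily_revenue.write.mode("overwrite").save_as_table("CURATED.DAILY_REVENUE")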

Environments: Azure Databricks, Scala, Data Factory, Informatica, Azure Synapse, Data Flow, Logic Apps, Function Apps, Azure Stream Analytics, Data Pipeline, Spark Streaming, Kafka, Snowflake, Cosmos DB, SQL Server, Azure Data Lake, Azure Blob, Azure SQL Database, Oracle, Python, PySpark, Spark SQL, Power BI.

Role: Data Engineer Feb 2020 – Aug 2022

IBM, Hyderabad, India

Responsibilities:

Designed and developed end-to-end scalable data ingestion pipelines using Azure Data Factory, ingesting data from various sources such as SQL databases, JSON files, CSV files, and REST APIs.

Integrated Azure Data Factory with Azure Logic Apps to orchestrate complex data workflows and trigger actions based on specific events.

Employed Azure Blob Storage for optimized data file storage and retrieval, implementing advanced techniques like compression and encryption to bolster data security and streamline storage costs.

Implemented robust secrets and key lifecycle management practices within Azure Key Vault, ensuring secure storage, rotation, and versioning of sensitive information.

Optimized data pipelines and Spark jobs in Azure Databricks through advanced techniques like Spark configuration tuning, data caching, and data partitioning, resulting in superior performance and efficiency.

Developed and deployed PySpark jobs on Databricks clusters, leveraging its interactive notebooks for exploratory data analysis, model development, and production-ready workflows.

Developed Spark SQL scripts using PySpark to perform data transformations and aggregations, optimizing performance through parallel processing.

Integrated Synapse SQL Dedicated Pool with other Azure services such as Azure Data Lake Storage, Azure Databricks, and Power BI.

Developed custom monitoring and alerting solutions using Azure Monitor and Snowflake Query Performance Monitoring (QPM), providing proactive identification and resolution of performance bottlenecks.

Implemented efficient data archiving and retention strategies utilizing Azure Blob Storage and leveraging Snowflake's Time Travel feature, ensuring optimal data management and regulatory compliance.

Successfully optimized and fine-tuned Spark jobs in Azure Synapse Spark Pools to maximize performance and resource utilization.

Collaborated with stakeholders to translate business requirements into scalable, secure, and analytics-ready solutions.

Developed Python/Flask APIs and orchestrations to enable analytics integrations with front-end tools like Power BI (see the sketch below).

Created measures, calculated columns, and relationships, and performed time-series analysis using DAX in Power BI.

Proficient in using Power Query for data cleaning, transformation, and shaping within Power BI.

Published reports and visualizations to the Power BI Service and assembled them into dashboards.

Implemented continuous integration and continuous deployment (CI/CD) pipelines using Azure DevOps.
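
A small Flask sketch of the analytics API pattern mentioned above; the endpoint, connection string, and table are hypothetical, and pyodbc is assumed as the Azure SQL driver.

    from flask import Flask, jsonify, request
    import pyodbc  # assumed driver for Azure SQL; connection string is a placeholder

    app = Flask(__name__)
    CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;UID=<user>;PWD=<password>"

    @app.route("/api/v1/sales/daily")
    def daily_sales():
        """Return daily sales totals for a date range (illustrative endpoint)."""
        start = request.args.get("start")
        end = request.args.get("end")
        query = (
            "SELECT sales_date, SUM(amount) AS total "
            "FROM dbo.sales WHERE sales_date BETWEEN ? AND ? "
            "GROUP BY sales_date ORDER BY sales_date"
        )
        with pyodbc.connect(CONN_STR) as conn:
            rows = conn.cursor().execute(query, start, end).fetchall()
        return jsonify([{"date": str(r.sales_date), "total": float(r.total)} for r in rows])

    if __name__ == "__main__":
        app.run(port=5000)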

Environment: Azure Databricks, Azure Data Factory, Informatica, Azure Blob storage, Azure Synapse Analytics, Azure Data Lake, Azure Event hub, Azure DevOps, AWS, Logic Apps, Function Apps, MS SQL, Python, Snowflake, PySpark, Kafka, Power BI.

Role: Data Warehouse Developer Oct 2018 – Feb 2020

Spectrum, Hyderabad, India

Responsibilities:

Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment. Worked with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, to enable data access and analysis.

Employed Spark Streaming APIs for on-the-fly transformations and actions, building a common learner data model sourced from Kafka.

Executed the import of data from diverse sources to the HBase cluster using Kafka Connect. Additionally, contributed to the creation of data models for HBase based on existing data models.

Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.

Good understanding of data ingestion, Airflow operators for data orchestration, and related Python libraries (an illustrative DAG sketch follows this list).

Implemented a Python script to invoke the Cassandra REST API, conducted necessary transformations, and loaded the data seamlessly into Hive.

Collaborated with MDM systems team on technical aspects and report generation. Developed Spark code using Scala and Spark-SQL for accelerated processing and testing, along with executing complex HiveQL queries on Hive tables.

Optimized existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames, and Pair RDDs.

Engaged in the end-to-end Big Data flow of the application, encompassing data ingestion from upstream to HDFS, and subsequent processing and analysis of the data within HDFS.

Utilized reporting tools such as Tableau to connect with Hive, facilitating the generation of daily data reports.
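
An illustrative Airflow DAG for the daily ingest-validate-transform orchestration pattern referenced above; it uses Airflow 2-style imports, and the DAG id, commands, and paths are hypothetical (the role itself relied primarily on Oozie coordinators).

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def validate_partition(**context):
        """Placeholder data-quality check; real logic would query Hive/HDFS."""
        print("validating ingested partition for", context["ds"])

    default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=10)}

    with DAG(
        dag_id="daily_ingest_sketch",        # hypothetical DAG name
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        ingest = BashOperator(
            task_id="sqoop_import",
            bash_command=(
                "sqoop import --connect jdbc:mysql://<host>/sales --table orders "
                "--target-dir /data/raw/orders/{{ ds }} -m 4"
            ),
        )
        validate = PythonOperator(task_id="validate_partition", python_callable=validate_partition)
        transform = BashOperator(
            task_id="hive_transform",
            bash_command="hive -f /opt/etl/load_orders.hql --hivevar ds={{ ds }}",
        )
        ingest >> validate >> transform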

Environment: SQL Server, Snowflake, SSIS, Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, Zookeeper, Oozie, Data Pipelines, RDBMS, Python, PySpark, shell scripting, ETL.

Role: ETL Developer Jun 2017 – Oct 2018

CyGen HealthTech, Hyderabad, India

Responsibilities:

Developed and maintained end-to-end ETL processes, ensuring seamless data extraction, transformation, and loading from source to target systems.

Orchestrated data workflows to support business intelligence, analytics, and reporting requirements.

Demonstrated expertise in ETL tools such as Informatica, Talend, or Apache NiFi, utilizing their functionalities for data integration and transformation.

Implemented indexing, partitioning, and caching strategies to enhance ETL process performance.

Developed and optimized ETL jobs and workflows to align with business objectives and data quality standards.

Implemented and maintained data integration workflows using ETL tools like Informatica, SSIS, or Talend, facilitating seamless data movement across the data warehouse.

Created and maintained data models and mappings, defining the transformation logic to ensure accurate and consistent data representation.

Conducted performance-tuning activities to optimize ETL job execution times, resource utilization, and overall system efficiency.

Established and implemented data quality checks within ETL workflows to identify and address anomalies or discrepancies in the data (see the illustrative sketch after this list).

Collaborated with data stewards and business users to define data quality rules and metrics.

Integrated ETL processes with diverse source systems, including databases, APIs, flat files, and cloud-based platforms.
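
Although this role centered on Informatica/SSIS, the data-quality gating logic described above can be restated as a small pandas sketch; the file paths, columns, and rules are hypothetical.

    import pandas as pd

    # Hypothetical inbound file and rule set, for illustration only.
    claims = pd.read_csv("/data/inbound/claims.csv", dtype={"member_id": str})

    rules = {
        "missing_member_id": claims["member_id"].isna(),
        "negative_amount": claims["claim_amount"] < 0,
        "duplicate_claim": claims.duplicated(subset=["claim_id"], keep="first"),
    }

    # A row is rejected if any rule fires; everything else moves to staging.
    rejects = claims[pd.concat(rules, axis=1).any(axis=1)]
    clean = claims.drop(index=rejects.index)

    rejects.to_csv("/data/rejects/claims_rejects.csv", index=False)
    clean.to_csv("/data/staging/claims_clean.csv", index=False)
    print(f"{len(clean)} rows passed, {len(rejects)} rows rejected")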

Environment: Informatica PowerCenter, SSIS, SQL Developer, MS SQL Server, Flat Files, XML files, Oracle 10g, DB2, SQL, PL/SQL, Unix/Linux.

EDUCATION QUALIFICATION:

Bachelor of Technology in Computer Science from JNTU Hyderabad, India in May 2017.

Master's in Computer Science from Rowan University, December 2023.


