TEJA G
Senior Azure Data Engineer
*********@*****.***/248-***-****
LinkedIn: https://www.linkedin.com/in/tejasreegumpula98/
SUMMARY:
7+ years of hands-on experience designing, building, and maintaining scalable data pipelines and cloud-native solutions on Microsoft Azure and AWS, with deep expertise in Python and Java development.
Led enterprise-wide migrations from legacy systems (Teradata, Netezza) to Azure Synapse Analytics, improving performance by 45% and reducing total cost of ownership.
Led data-driven strategy initiatives for B2C products by applying statistical modeling, experimental design, and quantitative analysis; collaborated cross-functionally with product, engineering, and marketing teams, using strong communication and coding skills to translate insights into actionable business decisions.
Designed and implemented Fivetran-based database replication pipelines to ingest data from heterogeneous sources (PostgreSQL, MySQL, SQL Server) into Snowflake, leveraging incremental sync and change data capture (CDC) for high-volume, real-time replication.
Developed and maintained dbt transformations and SQL-based models within Fivetran pipelines, optimizing data quality, performance, and cost-efficiency while integrating proactive monitoring and alerting for replication jobs.
Led sprint planning and Scrum ceremonies to drive collaboration across ML and reliability teams, applying strong interpersonal skills and industry trend insights to accelerate discovery and deliver innovative, scalable solutions.
TECHNICAL SKILLS:
Cloud Platform
Azure, Azure Data Factory, Data Flow Mapping, Azure Synapse Analytics, Azure Databricks, Azure SQL Database, Azure Storage, Triggers, Delta Lake, Azure IoT Hub, Azure Event Hubs, Azure Stream Analytics, SSIS, AWS, Docker, Kubernetes, Vault, GCP, AWS S3, EC2, CodePipeline, CodeCommit, CodeBuild, CodeDeploy, CloudFormation, CloudWatch, CloudTrail, IAM, VPC, Route 53, Azure Cosmos DB, MSBI (SSIS, SSRS, SSAS), Cosmos, Power BI, Big Data, Kusto
Data Engineering
PySpark, Data Integration, Data Warehouse Concepts, Online Analytical Processing (OLAP), Data Warehousing, Business Intelligence, Data Modeling, Dimensional Modeling, Data Vault Modeling, ETL/ELT Processing, Data Lineage and Data Cataloging, Terraform
Languages
Core Java, Python, SQL, Scala, HTTP, CSS, PowerShell, R, C/C++, TypeScript, Node.js
Framework/APIs / Tools
Azure Databricks, Hadoop, Flask, Django, NumPy, Pandas, PySpark, Matplotlib, Shell Scripting, PowerShell, Power BI, Tableau, SSRS, Cosmos DB, MongoDB, Hive, Google BigQuery, Dataflow, Pub/Sub, ETL, SQL
Databases
MySQL, SQL Server 7/2000
Web Tools/IDE
SQL Server Management Studio (SSMS), Power BI Desktop, Tableau, PostgreSQL
Version Control System
Git, GitHub, GitLab, SVN, Atlassian, Bitbucket
Build Tools
Apache Spark / Apache Airflow, Puppet, Kubernetes, Terraform, Jenkins
PROFESSIONAL EXPERIENCE:
Client: CVS, Austin, Texas [Aug 2023 – Present]
Role: Senior Azure Data Engineer
Responsibilities:
Designed, configured, and deployed Microsoft Azure solutions for a multitude of applications using the Azure stack, including Compute, Web & Mobile, Blob Storage, ADF, Resource Groups, and HDInsight clusters.
Designed and deployed secure, scalable infrastructure using Azure services like VNETs, NSGs, Key Vault, App Gateway, and Azure DevOps.
Refactored legacy ADF pipelines into a modular, metadata-driven framework, improving maintainability and reducing onboarding time by 40%.
Migrated data platforms from Teradata and Netezza to Azure Synapse, achieving a 45% performance improvement.
Built Delta Lake architecture in Azure Databricks supporting ACID transactions, schema evolution, and time travel for IoT pipelines (see the Delta Lake sketch below).
Designed enterprise-grade BI solutions using Power BI, ADF, and Synapse.
Built Power BI reports with advanced DAX logic and drill-through analysis.
Managed Cosmos DB and Azure SQL datasets for real-time insights.
Developed SSIS packages and migrated legacy reports to Power BI.
Delivered ad hoc and scheduled reports for executive stakeholders.
Designed and implemented end-to-end ML workflows using Amazon SageMaker, including feature engineering in S3, model training using XGBoost and Linear Learner, and deploying models using SageMaker Endpoints for real-time predictions.
Automated model retraining and monitoring using SageMaker Pipelines, integrating with CloudWatch and S3 versioning for model lineage and drift detection (see the SageMaker sketch below).
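A minimal PySpark sketch of the Delta Lake pattern from the Databricks bullet above, assuming a Databricks runtime (or local Spark with delta-spark available); the paths and columns are illustrative, not project values:

    from pyspark.sql import SparkSession

    # Assumes delta-spark is on the classpath (preinstalled on Databricks runtimes).
    spark = (SparkSession.builder
             .appName("iot-delta-sketch")
             .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
             .config("spark.sql.catalog.spark_catalog",
                     "org.apache.spark.sql.delta.catalog.DeltaCatalog")
             .getOrCreate())

    TABLE_PATH = "/mnt/datalake/iot/telemetry"  # hypothetical ADLS mount

    # Append new telemetry; mergeSchema lets the schema evolve as devices add fields.
    batch = spark.read.json("/mnt/landing/iot/batch1/")  # hypothetical landing zone
    (batch.write.format("delta")
          .mode("append")
          .option("mergeSchema", "true")
          .save(TABLE_PATH))

    # Time travel: read an earlier version of the table for audit or replay.
    snapshot = spark.read.format("delta").option("versionAsOf", 0).load(TABLE_PATH)
    snapshot.show(5)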
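A hedged sketch of the SageMaker training-and-deployment flow described above, using the SageMaker Python SDK; the role ARN, bucket names, and hyperparameters are placeholders:

    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role

    # Built-in XGBoost container image for the current region.
    image = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

    estimator = Estimator(
        image_uri=image,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://example-bucket/models/",  # placeholder bucket
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

    # Features engineered to S3 beforehand, per the bullet above.
    train = TrainingInput("s3://example-bucket/features/train/", content_type="text/csv")
    estimator.fit({"train": train})

    # Real-time endpoint for online predictions.
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")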
Environment: R, AWS Cloud (S3, EC2, IAM, CloudFormation templates, CloudWatch), SQL, Python, Data Visualization, Azure, Azure Data Factory, Data Flow Mapping, Azure Synapse Analytics, Spark, PySpark, Data Integration, Data Warehouse Concepts, Online Analytical Processing (OLAP), Data Warehousing, Business Intelligence, Data Modeling.
Client: Infosys [Sep 2021 – Jul 2023]
Role: Azure Data Engineer
Responsibilities:
•Developed and deployed Azure Data Factory (ADF) pipelines for loading historical and incremental data from on-prem MSSQL to Azure Synapse and ADLS.
•Delivered real-time IoT telemetry ingestion and anomaly detection using Event Hubs, Databricks (PySpark), and Synapse Analytics (see the streaming sketch after this list).
•Created Power BI dashboards to visualize outliers, operational KPIs, and SLA adherence.
•Enhanced ETL pipeline resilience using custom logging, failover handling, and dynamic retry logic in ADF and Airflow DAGs.
•Migrated on-prem SQL Server and Teradata workloads to Azure Synapse Analytics and Cosmos DB, modernizing data infrastructure and reducing legacy system dependency.
•Built metadata-driven ADF pipelines integrated with SSIS packages, automating batch and incremental data loads (see the trigger sketch after this list).
•Leveraged Kusto Query Language (KQL) to monitor Azure diagnostics and telemetry data, feeding insights into Power BI reports for stakeholders.
•Automated tabular report refreshes and processing using Azure DevOps and Power BI Gateway configurations.
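A minimal Structured Streaming sketch of the telemetry pipeline above, assuming the azure-event-hubs-spark connector is attached to the cluster; the connection string, schema, and anomaly rule are illustrative stand-ins:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("iot-stream-sketch").getOrCreate()

    # Placeholder connection string; encrypt() is the connector's JVM helper.
    conn = "Endpoint=sb://example.servicebus.windows.net/;..."  # placeholder
    jvm = spark.sparkContext._jvm
    eh_conf = {"eventhubs.connectionString":
               jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn)}

    schema = (StructType()
              .add("deviceId", StringType())
              .add("temperature", DoubleType()))

    raw = spark.readStream.format("eventhubs").options(**eh_conf).load()
    telemetry = (raw.select(F.from_json(F.col("body").cast("string"), schema).alias("t"))
                    .select("t.*"))

    # A simple threshold rule stands in for the anomaly-detection logic.
    anomalies = telemetry.filter(F.col("temperature") > 90.0)

    (anomalies.writeStream
              .format("delta")
              .option("checkpointLocation", "/mnt/checkpoints/anomalies")  # placeholder
              .start("/mnt/datalake/anomalies"))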
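A hedged sketch of triggering one of these metadata-driven ADF pipelines programmatically with the azure-mgmt-datafactory SDK; the subscription, resource group, factory, pipeline, and parameter names are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    credential = DefaultAzureCredential()
    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    adf = DataFactoryManagementClient(credential, subscription_id)

    # Parameters would normally come from a metadata/control table.
    run = adf.pipelines.create_run(
        resource_group_name="rg-data-platform",   # placeholder
        factory_name="adf-ingestion",             # placeholder
        pipeline_name="pl_incremental_load",      # placeholder
        parameters={"sourceTable": "dbo.Orders", "loadType": "incremental"},
    )
    print(f"Started pipeline run: {run.run_id}")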
Environment: Azure Data Lake, Azure Data Factory, Data Flow Mapping, Azure Synapse Analytics, Azure Databricks, Azure SQL Database, Azure Storage, Delta Lake, SSIS, Apache Spark, Apache Airflow, PySpark, Data Integration, Data Warehousing, Business Intelligence, Data Modeling, Dimensional Modeling, ETL/ELT Processing, Data Lineage, Terraform, Power BI Dashboards.
Client: Hexaware [Jan 2020 – Aug 2021]
Role: Data Engineer
Responsibilities:
•Engineered scalable staging and processing pipelines using Azure Data Factory (ADF) and Azure Data Lake Storage (ADLS).
•Created and optimized ADF activities like Copy, Stored Procedure, and Custom Activity to streamline data workflows.
•Automated infrastructure deployment using Azure Pipelines and PowerShell, enhancing DevOps efficiency.
•Designed and maintained high-volume data workflows in Parquet and Delta formats, ensuring accuracy and quality (see the quality-check sketch after this list).
•Built Azure Data Lake Analytics jobs to process data stored in ADLS for large-scale analytics.
•Created and deployed tabular models in Azure Analysis Services (AAS) for advanced data modeling.
•Automated tabular model processing using Azure Runbooks, and scheduled via Azure DevOps Pipelines.
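An illustrative PySpark sketch of the Parquet-to-Delta quality gate implied by the workflow bullet above; the paths, columns, and rules are assumptions, not project specifics:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("parquet-quality-sketch").getOrCreate()

    df = spark.read.parquet("/mnt/staging/orders/")  # placeholder path

    # Simple accuracy checks: required keys present, no negative amounts.
    null_keys = df.filter(F.col("order_id").isNull()).count()
    bad_amounts = df.filter(F.col("amount") < 0).count()
    if null_keys or bad_amounts:
        raise ValueError(f"Quality check failed: {null_keys} null keys, "
                         f"{bad_amounts} negative amounts")

    # Passing data lands in the curated Delta zone.
    df.write.format("delta").mode("overwrite").save("/mnt/curated/orders/")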
Environment: Python, R, Azure, SQL, Data Visualization, Azure Data Factory, Data Flow Mapping, Azure Synapse Analytics, Spark, PySpark, Data Integration, Data Modeling, Data Warehousing, Git, Informatica PowerCenter, Informatica Big Data Edition 9.6.1 HotFix 1/2, Oracle 11g/10g, SQL Server, T-SQL, Hadoop 2.5.2/2.6.0, PL/SQL, UNIX, Autosys, Agility Workbench, Power BI.
Client: CGI [Jan 2017 – Aug 2019]
Role: Data Analyst
Responsibilities:
•Developed Spark SQL transformations on large-scale datasets, storing results in HDFS and Azure Blob Storage.
•Built interactive querying logic using aggregate functions in Spark SQL to support analytics use cases (see the Spark SQL sketch after this list).
•Performed data validation, profiling, and transformation using Python (Pandas, NumPy) for structured and semi-structured data (see the Pandas sketch after this list).
•Ingested and processed JSON and CSV files for loading into PostgreSQL and Azure Cosmos DB.
•Managed and transformed HDFS data formats such as Avro and SequenceFile with compression codecs (Snappy, GZIP).
•Executed multiple MapReduce tasks in Hive for data cleaning, pre-processing, and quality checks.
•Leveraged the Spark DataFrame API to clean, filter, and transform data before loading into Hive.
•Built Azure Functions for schema enforcement and structural transformation of Blob storage data.
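A short Spark SQL sketch of the aggregate-style transformations described above; the sales view and its columns are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparksql-agg-sketch").getOrCreate()

    # Hypothetical sales dataset registered as a temp view.
    sales = spark.read.parquet("hdfs:///data/sales/")  # placeholder path
    sales.createOrReplaceTempView("sales")

    summary = spark.sql("""
        SELECT region,
               COUNT(*)    AS order_count,
               SUM(amount) AS total_amount,
               AVG(amount) AS avg_amount
        FROM sales
        GROUP BY region
    """)

    # Persist results for downstream analytics (HDFS or Blob, as in the bullet).
    summary.write.mode("overwrite").parquet("hdfs:///analytics/sales_by_region/")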
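A minimal Pandas sketch of the validation and profiling step above, with an assumed CSV layout and column name:

    import pandas as pd
    import numpy as np

    df = pd.read_csv("events.csv")  # placeholder file

    # Profile: row/column counts, nulls per column, and basic numeric stats.
    print(df.shape)
    print(df.isna().sum())
    print(df.describe(include=[np.number]))

    # Validate: drop duplicates and coerce a timestamp column, flagging bad rows.
    df = df.drop_duplicates()
    df["event_time"] = pd.to_datetime(df["event_time"], errors="coerce")  # assumed column
    bad_rows = df[df["event_time"].isna()]
    print(f"{len(bad_rows)} rows failed timestamp parsing")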
CERTIFICATIONS:
Stanford Relational Databases and SQL certification
Certified in Microsoft Azure Fundamentals (AZ-900)
Google Cloud: Professional Data Engineer (GCP PDE)
HackerRank Python Programming certification
EDUCATION:
Master’s in Information Technology, Central Michigan University, Mount Pleasant, MI.