
Data Engineer Azure

Location:
Indianapolis, IN
Posted:
August 14, 2025



VUNNAM SPANDANA

Data Engineer

Professional Experience

Azure Data Engineer

Ascension St. John Health System, Tulsa, Oklahoma, USA | Dec 2023 - Present

Ascension is a large private Catholic healthcare system. I am designing, building, and maintaining scalable data pipelines to ingest, transform, and store large volumes of data from various sources such as transactional systems, third-party APIs, and external data feeds.

Responsibilities:

Stored and processed data using low-level Java APIs to ingest data directly into HBase.

Used Pig as an ETL tool to perform transformations with joins and pre-aggregations before storing the data in HDFS, and assisted the manager by providing automation strategies, Selenium/Cucumber automation, and Jira reports.

Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming; installed Hadoop, MapReduce, and HDFS; and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.

Integrated T-SQL codebase into Azure DevOps for version control, continuous integration, and automated deployments.

Staged API and Kafka data (in JSON format) into a Snowflake database, flattening it for different functional services.
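
A minimal sketch of that flattening step is shown below. It assumes a hypothetical RAW_EVENTS staging table with a VARIANT column named PAYLOAD and an EVENTS_FLAT target; the connection parameters, table names, and JSON paths are illustrative placeholders, not the actual production objects.

```python
# Sketch: flatten semi-structured JSON staged in Snowflake into a relational table.
# All object names (RAW_EVENTS, PAYLOAD, EVENTS_FLAT) and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder credentials
    user="my_user",
    password="my_password",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

flatten_sql = """
    CREATE OR REPLACE TABLE EVENTS_FLAT AS
    SELECT
        r.PAYLOAD:eventId::STRING      AS event_id,
        r.PAYLOAD:eventTime::TIMESTAMP AS event_time,
        f.value:name::STRING           AS attribute_name,
        f.value:val::STRING            AS attribute_value
    FROM RAW_EVENTS r,
         LATERAL FLATTEN(input => r.PAYLOAD:attributes) f
"""

cur = conn.cursor()
cur.execute(flatten_sql)   # materialize the flattened view of the staged JSON
cur.close()
conn.close()
```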

Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

Imported data from various sources into Spark RDD for processing.

Created data tables using PyQt to display customer and policy information and to add, delete, and update customer records.

Developed Python Spark modules for data ingestion and analytics, loading from Parquet, Avro, and JSON data and from database tables.
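
A small PySpark sketch of this kind of multi-format ingestion follows; the paths, column names, and aggregation are hypothetical examples rather than the actual pipeline, and the Avro reader assumes the spark-avro package is available on the cluster.

```python
# Sketch: ingest Parquet, JSON, and Avro sources into one DataFrame and aggregate.
# Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("multi-format-ingestion").getOrCreate()

parquet_df = spark.read.parquet("/data/landing/claims_parquet/")
json_df = spark.read.json("/data/landing/claims_json/")
avro_df = spark.read.format("avro").load("/data/landing/claims_avro/")  # needs spark-avro

# Align schemas on a shared set of columns, then union and aggregate.
cols = ["claim_id", "member_id", "claim_amount"]
combined = (
    parquet_df.select(cols)
    .unionByName(json_df.select(cols))
    .unionByName(avro_df.select(cols))
)

totals = combined.groupBy("member_id").agg(F.sum("claim_amount").alias("total_claim_amount"))
totals.write.mode("overwrite").parquet("/data/curated/claim_totals/")
```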

Designed and implemented Infrastructure as code using Terraform, enabling automated provisioning and scaling of cloud resources on Azure.

Actively Participated in all phases of the Software Development Life Cycle (SDLC) from implementation to deployment.

Demonstrated skill in parameterizing dynamic SQL to prevent SQL injection vulnerabilities and ensure data security.
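
A minimal sketch of SQL parameterization in the spirit of the bullet above, using pyodbc against SQL Server: the value is bound as a parameter rather than concatenated into the statement. The connection string, table, and column names are placeholders.

```python
# Sketch: parameterized T-SQL via pyodbc; untrusted input is bound, never concatenated.
# Connection string and object names are illustrative placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=claims;Trusted_Connection=yes;"
)
cursor = conn.cursor()

member_id = "M-10042"  # value coming from an untrusted source
cursor.execute(
    "SELECT claim_id, claim_amount FROM dbo.Claims WHERE member_id = ?",
    member_id,
)
for row in cursor.fetchall():
    print(row.claim_id, row.claim_amount)

cursor.close()
conn.close()
```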

Involved in monitoring and scheduling the pipelines using Triggers in Azure Data Factory.

Built and configured Jenkins slaves for parallel job execution. Installed and configured Jenkins for continuous integration and performed continuous deployments.

Successfully completed a POC for Azure implementation, with the larger goal of migrating on-premises servers and data to the cloud.

Designed and implemented Teradata SQL queries, leveraging joins, aggregations, and indexing for high-performance data retrieval.

Built and maintained Docker container clusters managed by Kubernetes, using Linux, Bash, Git, and Docker.

Skilled in monitoring servers using Nagios, CloudWatch, and the ELK stack (Elasticsearch and Kibana).

Implemented Synapse integration with Azure Databricks notebooks, reducing development work by about half.

Conducted Performance tuning and optimization of Snowflake data warehouse, resulting in improved query execution times and reduced operational costs.

Developed a CI/CD system with Jenkins on a Kubernetes environment, using Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.

Environment: Apache, API, Azure, CI/CD, Azure Data Factory, Docker, ETL, Git, HBase, HDFS, Hive, Java, Jenkins, Jira, JS, Kafka, Kubernetes, Data Lake, Linux, MapR, Pig, Python, Scala, Selenium, Snowflake, Spark, SQL

AWS Data Engineer

Devon Energy, Oklahoma City, Oklahoma, USA | Apr 2022 - Nov 2023

Devon Energy Corp. engages in the exploration, development, and production of oil and natural gas properties. I developed and maintained data models and data warehouses to support reporting, analytics, and machine learning initiatives. This involved understanding business requirements, designing efficient data schemas, and optimizing query performance.

Responsibilities:

Worked on AngularJS to augment browser applications with MVC capability.

Created DataStage ETL jobs to populate the data warehouse continuously from different source systems such as ODS, flat files, and Parquet.

Created AWS Lambda functions and assigned IAM roles to schedule Python scripts using CloudWatch triggers to support infrastructure needs (SQS, EventBridge, SNS).
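
A minimal sketch of such a scheduled Lambda is shown below: a handler triggered by a CloudWatch/EventBridge rule checks an SQS queue depth and alerts via SNS. The queue URL, topic ARN, and threshold are hypothetical placeholders.

```python
# Sketch: scheduled Lambda checking SQS backlog and publishing an alert to SNS.
# QUEUE_URL, TOPIC_ARN, and the threshold are illustrative placeholders.
import boto3

sqs = boto3.client("sqs")
sns = boto3.client("sns")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-queue"  # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-eng-alerts"             # placeholder


def lambda_handler(event, context):
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    if backlog > 1000:  # placeholder threshold
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Ingest queue backlog",
            Message=f"Queue depth is {backlog}; check the ingestion pipeline.",
        )
    return {"backlog": backlog}
```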

Combined various datasets in Hive to generate business reports.

Worked on Kafka streaming on the subscriber side, processing messages and inserting them into the database, and used Apache Spark for real-time data processing.

Converted and parsed data formats using PySpark Data Frames, reducing time spent on data conversion and parsing by 40%.

Utilized Elasticsearch and Kibana for indexing and visualizing the real-time analytics results, enabling stakeholders to gain actionable insights quickly.

Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application.

Expertise in Teradata utilities, such as FastLoad, MultiLoad, FastExport, and TPT, for bulk data processing.

Used T-SQL for MS SQL Server and ANSI SQL extensively on disparate databases.

Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.

Ensured data integrity and consistency during migration, resolving compatibility issues with T-SQL scripting.

Dockerized applications by creating Docker images from Dockerfiles and collaborated with the development support team to set up a continuous deployment environment using Docker.

Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near real-time log analysis and monitoring of end-to-end transactions.
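
A small sketch of feeding parsed log events into Elasticsearch so Kibana can chart them; the host, index name, and document fields are hypothetical, and it assumes the 8.x Python client.

```python
# Sketch: index a parsed log event into a daily Elasticsearch index for Kibana.
# Host, index naming, and fields are illustrative placeholders (elasticsearch 8.x client).
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "service": "payments-api",
    "level": "ERROR",
    "message": "Timeout calling downstream ledger service",
    "latency_ms": 5234,
}

# A Kibana index pattern such as app-logs-* can then pick up the daily indices.
es.index(index=f"app-logs-{datetime.now(timezone.utc):%Y.%m.%d}", document=doc)
```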

Implemented a proof of concept deploying this product in AWS S3 bucket and Snowflake.

Executed a full CI/CD pipeline by coordinating SCM (Git) with the automated build and test tool Gradle, deployed using Jenkins (Declarative Pipeline) with Dockerized containers in production, and worked with DevOps tools such as AWS CloudFormation, AWS CodePipeline, Terraform, and Kubernetes.

Successfully integrated Control-M with various AWS services, orchestrating job schedules, data processing, and workflow automation across the AWS ecosystem.

Environment: Apache, AWS, CI/CD, Docker, Elasticsearch, ETL, Git, Jenkins, Jira, JS, Kafka, Kubernetes, Data Lake, Lambda, PySpark, Python, S3, Snowflake, Spark, SQL

GCP Data Engineer

Five Star Business Finance Limited, Chennai, India | Sep 2020 - Nov 2021

Five Star Business Finance Limited is a non-banking financial company (NBFC) that provides financial services to micro, small, and medium-sized enterprises in India. I implemented data quality checks, monitored data integrity, and ensured compliance with data governance policies and regulations. This included identifying and resolving data quality issues, implementing data lineage tracking, and establishing data security controls.

Responsibilities:

Monitored system metrics and logs for problems when adding, removing, or updating Hadoop cluster nodes.

Assessed the infrastructure needs for each application and deployed them on the Azure platform.

Created Data Studio reports to review billing and usage of services, optimizing queries and contributing to cost-saving measures.

Developed ETL/ELT workflows with Dataflow, ensuring streaming and batch processing for large-scale datasets.
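
A minimal sketch of the kind of pipeline Dataflow runs, written with the Apache Beam Python SDK: read CSV lines, filter bad records, and write results back to Cloud Storage. The bucket paths, column layout, and filter rule are hypothetical, and running on Dataflow would add runner/project/region options.

```python
# Sketch: small Apache Beam pipeline (Dataflow-style). Paths and schema are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_row(line):
    loan_id, branch, amount = line.split(",")
    return {"loan_id": loan_id, "branch": branch, "amount": float(amount)}


options = PipelineOptions()  # on Dataflow: add --runner=DataflowRunner, project, region, temp_location

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadCSV" >> beam.io.ReadFromText("gs://my-bucket/landing/loans.csv", skip_header_lines=1)
        | "Parse" >> beam.Map(parse_row)
        | "KeepPositive" >> beam.Filter(lambda r: r["amount"] > 0)
        | "Format" >> beam.Map(lambda r: f'{r["loan_id"]},{r["branch"]},{r["amount"]}')
        | "Write" >> beam.io.WriteToText("gs://my-bucket/curated/loans", file_name_suffix=".csv")
    )
```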

Used Dataproc and BigQuery to develop and maintain GCP cloud-based solutions.

Created Amazon VPC to create public-facing subnet for web servers with internet access, and backend databases & application servers in a private-facing subnet with no Internet access.

Created and provisioned different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries for the clusters.

Developed and maintained BTEQ scripts, ensuring efficient data extraction, transformation, and loading (ETL) processes.

Good knowledge in using Cloud Shell for various tasks and deploying services.

Responsible for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster.

Designed and developed web services and RESTful APIs for mobile apps using the Python Django REST and Flask frameworks.
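
A minimal Flask sketch in the spirit of that bullet; the route, payload shape, and in-memory store are illustrative placeholders rather than the actual service.

```python
# Sketch: minimal Flask REST endpoints; the /api/loans resource is a hypothetical example.
from flask import Flask, jsonify, request

app = Flask(__name__)
loans = {}  # stand-in for a real database


@app.route("/api/loans/<loan_id>", methods=["GET"])
def get_loan(loan_id):
    loan = loans.get(loan_id)
    if loan is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(loan)


@app.route("/api/loans", methods=["POST"])
def create_loan():
    payload = request.get_json()
    loans[payload["loan_id"]] = payload
    return jsonify(payload), 201


if __name__ == "__main__":
    app.run(debug=True)
```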

Integrated Cloud Composer with Pub/Sub, Cloud Functions, and Dataproc for advanced data orchestration.

Experienced with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.

Utilized C# and ADO.NET to establish connections to databases such as SQL Server, Oracle, and MySQL, enabling efficient data retrieval and manipulation.

Developed pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines for the extraction, transformation, and loading of data from diverse sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse.

Created BigQuery authorized views for row-level security and for exposing data to other teams.
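
A sketch of that pattern with the google-cloud-bigquery client: create a view in a shared dataset, then add it to the source dataset's access entries so consumers can query the view without direct table access. The project, dataset, and table IDs are hypothetical placeholders.

```python
# Sketch: create a view and authorize it against the source dataset (authorized view).
# Project/dataset/table IDs and the view query are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

source_dataset_id = "my-project.raw_lending"
shared_dataset_id = "my-project.shared_views"

view = bigquery.Table(f"{shared_dataset_id}.active_loans_v")
view.view_query = (
    f"SELECT loan_id, branch, amount FROM `{source_dataset_id}.loans` WHERE status = 'ACTIVE'"
)
view = client.create_table(view, exists_ok=True)

# Grant the view read access on the source dataset.
source_dataset = client.get_dataset(source_dataset_id)
entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])
```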

Used Sqoop import/export to ingest raw data into Google Cloud Storage by spinning up Cloud Dataproc cluster.

Knowledge of Google Cloud Dataflow and Apache Beam.

Wrote Python DAGs in Airflow to orchestrate end-to-end data pipelines for multiple applications.
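
A minimal Airflow DAG sketch of this kind of orchestration; the dag_id, schedule, and task callables are hypothetical and stand in for real extract/transform steps.

```python
# Sketch: Airflow 2.x DAG with a simple extract -> transform dependency.
# dag_id, schedule, and task bodies are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    print("pulling raw files from Cloud Storage")


def transform(**context):
    print("running Dataproc / BigQuery transformations")


with DAG(
    dag_id="daily_loan_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task
```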

Environment: Apache, Apache Beam, API, Azure, Azure SQL Data Warehouse, BigQuery, Blob Storage, Azure Data Factory, GCP, MySQL, Oracle, Python, Cloud SDK, Spark, SQL, Sqoop, VPC

Data Engineer

MRF, Chennai, India | Mar 2019 - Aug 2020

MRF, or MRF Tyres, is an Indian multinational tyre manufacturing company and the largest manufacturer of tyres in India. Managed and optimized data infrastructure components such as databases, data lakes, and distributed computing platforms (e.g., Hadoop, Spark) to ensure high availability, reliability, and performance.

Responsibilities:

Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.

Performed ETL to move data from source systems to destination systems and worked on the data warehouse. Involved in database migration methodologies and integration conversion solutions to convert legacy ETL processes into an Azure Synapse-compatible architecture.

Applied a strong understanding of partitioning and bucketing concepts in Hive, designing both managed and external tables in Hive to optimize performance.

Involved in configuring and upgrading the On-Premises Data Gateway between various data sources such as SQL Server, Azure Analysis Services, and the Power BI service.

Integrated Kafka with Spark Streaming for real time data processing.
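
The sketch below shows one way to wire Kafka into Spark, using the newer Structured Streaming API rather than the DStream-based Spark Streaming named above; the broker address, topic, schema, and paths are hypothetical, and the job assumes the spark-sql-kafka connector package is on the cluster.

```python
# Sketch: Spark Structured Streaming reading JSON events from Kafka and writing Parquet.
# Broker, topic, schema, and paths are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-orders-stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("plant", StringType()),
    StructField("quantity", DoubleType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
)

orders = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("o")).select("o.*")

query = (
    orders.writeStream.format("parquet")
    .option("path", "/data/streams/orders/")
    .option("checkpointLocation", "/checkpoints/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```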

Developed an end-to-end solution that involved ingesting sales data from multiple sources, transforming and aggregating it using Azure Databricks, and visualizing insights through Tableau dashboards.

Developed multiple notebooks using PySpark and Spark SQL in Databricks for data extraction, analyzing and transforming the data according to the business requirements.

Designed the business requirement collection approach based on the project scope and SDLC methodology.

Well versed with various aspects of ETL processes used in loading and updating Oracle data warehouse.

Used Azure Data Factory to ingest data from log files and custom business applications, processed the data on Databricks per day-to-day requirements, and loaded it into Azure Data Lake.

Implemented Apache Airflow for workflow automation and task scheduling, and created DAGs and tasks.

Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).

Developed metrics based on SAS scripts on the legacy system and migrated the metrics to Snowflake (Azure).

Worked with CI/CD tools such as Jenkins and Docker on the DevOps team, setting up the application process end to end using Deployment for lower environments and Delivery for higher environments, with approvals in between.

Environment: Azure, Azure Analysis Services, Azure Synapse Analytics, CI/CD, Azure Data Factory, Docker, ETL, GCP, Hive, Java, Jenkins, Kafka, Kubernetes, Data Lake, Oracle, Power BI, PySpark, SAS, Snowflake, Spark, Spark SQL, Spark Streaming, SQL, Sqoop, Tableau

*.*********@*****.*** | 316-***-****

As an experienced Data Engineer with 5+ years of expertise in designing, building, and optimizing data pipelines and architectures across AWS, Azure, and GCP, I seek to apply my skills in cloud-based data engineering to drive data-driven decision-making.

Professional Summary

5+ years of expertise designing, developing, and executing data pipelines and data lake requirements in numerous companies using the Big Data Technology stack, Python, PL/SQL, SQL, REST APIs, and the Azure cloud platform.

Experience in evaluation, design, development, and deployment of additional technologies and automation for managed services on S3, Lambda, Athena, EMR, Kinesis, SQS, SNS, CloudWatch, Data Pipeline, Redshift, DynamoDB, AWS Glue, Aurora, RDS, and EC2.

Experience working with the Rivery ELT platform, which performs data integration, data orchestration, data cleansing, and other vital data functions.

Worked with CSV, Avro, and Parquet data, loading it into DataFrames for analysis.

Implemented Big Data solutions using Hadoop technology stack, including PySpark, Hive, Sqoop, Avro and Thrift.

Expert in creating various Kafka producers and consumers for seamless data streaming with AWS services.

Hands-on experience with Spark, Databricks, and Delta Lake.

Created Python code to collect data from HBase and developed the PySpark implementation of the solution.

Experience in performance monitoring, security, troubleshooting, backup, disaster recovery, maintenance, and support of Linux systems.

Participated in the designing and developing of software using Agile methodologies.

Experience in implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.

Implemented production scheduling jobs using Control-M and Airflow.

Experienced in building Snowpipe pipelines and migrating Teradata objects into the Snowflake environment.

Deployed Docker Engine on virtualized platforms for containerization of multiple apps.

Hands-on experience interacting with REST APIs developed using the micro-services architecture for retrieving data from different sources.

Experienced in fact/dimensional modeling (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).

Expertise in building CI/CD in an AWS environment using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, and experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS.

Experience with Cisco CloudCenter to more securely deploy and manage applications in multiple data center, private cloud, and public cloud environments.

Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS).

Education

Wichita State University
Master's in Computer Science, USA (2022 - 2023)

Technical Skills

Big Data Ecosystem: HDFS, YARN, MapReduce, Spark, Kafka, Kafka Connect, Hive, Airflow, StreamSets, Sqoop, HBase, Flume, Pig, Ambari, Oozie, ZooKeeper, NiFi, Sentry

Hadoop Distributions: Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP

Cloud Environment: Amazon Web Services, Microsoft Azure, GCP

Databases: MySQL, Oracle, Teradata, MS SQL Server, PostgreSQL, DB2, MongoDB

NoSQL Database: DynamoDB, HBase

Microsoft Azure: Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse, Cosmos DB, Azure Active Directory

Operating systems: Linux, Unix, Windows 10, Windows 8, Windows 7, Windows Server 2008/ 2003, Mac OS

Reporting Tools: Informatica, Talend, SSIS, SSRS, SSAS, ER Studio, Tableau, Power BI, Arcadia, DataStage, Pentaho

Programming Languages: Python (Pandas, SciPy, NumPy, Scikit-Learn, Stats Models, Matplotlib, Plotly, Seaborn, Keras, TensorFlow), PySpark, T-SQL/SQL, PL/SQL, HiveQL, Scala

Version Control: Git, SVN, Bitbucket

Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office


