Sravani Reddy Donga
838-***-**** *****************@*****.*** www.linkedin.com/in/sravani-reddy-donga-b8051220a
OBJECTIVE
Data Engineer with 5+ years of experience designing and implementing scalable ETL pipelines and robust data architectures. Proficient in SQL, Python, and cloud platforms (AWS, GCP, and Azure), with a focus on optimizing performance and cost efficiency. Proven track record of delivering innovative, high-quality data solutions through effective collaboration and agile methodologies.
TECHNICAL SKILLS
•Cloud Technologies: AWS, GCP, Azure
•AWS Ecosystem: S3, Athena, Glue, EMR, Redshift, Data Lake, AWS Lambda, Kinesis
•Azure Ecosystem: Azure Data Lake, ADF, Databricks, Azure SQL
•Databases: Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB
•Programming Languages & Web Technologies: Java, Python, Hibernate, JDBC, JSON, HTML, CSS
•Script Languages: Python, Shell Script
•Version Control & Build Tools: Git, Maven, SBT, CBT
•Hadoop Components / Big Data: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, Snowflake, Spark
•Visualization & ETL Tools: Tableau, Power BI, Informatica, Talend, Data Warehousing
•Operating Systems: Windows, Linux
•Methodologies: Agile (Scrum), Waterfall, UML, Design Patterns, SDLC
•Webservers: Apache Tomcat, WebLogic
•Collaboration Tools: JIRA, Confluence, Slack, Microsoft Teams
•Integration & Networking: Data Integration, API Management, Cloud Networking, Data Migration
•Containers, Orchestration & IaC: Docker, Kubernetes, Terraform
WORK EXPERIENCE
Delta Airlines May 2024 - Present
Azure Data Engineer Atlanta, Georgia, USA
•Developed ETL processes using PySpark with both DataFrame and Spark SQL APIs to transform and load data efficiently.
•Configured Kerberos authentication for secure cluster communications and tested HDFS, Hive, Pig, and MapReduce access for new users.
•Designed Power BI dashboards with complex aggregate calculations to visualize key business metrics.
•Engineered a Spark Streaming application to process raw packet data from Kafka, format it as JSON, and re-inject it into Kafka for scalable data pipelines.
•Implemented Spark applications for data validation, cleansing, transformation, and custom aggregations using Python.
•Built data tables with PyQt to display and manage customer and policy information, supporting add, delete, and update operations.
•Utilized Python's Beautiful Soup library for web scraping to integrate external data sources.
•Established infrastructure as code with Terraform for automated provisioning and scaling of Azure cloud resources.
•Collaborated in all phases of the SDLC from implementation through deployment to deliver robust solutions.
•Developed database connections using Perl and Python for efficient data retrieval across SQL Server, Oracle, and MySQL platforms.
•Crafted big data solutions with Azure Synapse Studio and utilized Azure Data Factory to ingest, process, and store data in Azure Data Lakes.
•Analyzed and optimized data pipelines using modern scheduling tools like Airflow to facilitate migration to an enterprise data lake on Azure.
•Configured Jenkins and continuous integration pipelines to support parallel job execution and automated deployments in the cloud environment.
•Orchestrated automated ETL pipelines between data warehouses using Python and Snowflake, ensuring high-quality data processing.
Charles Schwab Corporation Jun 2023 - Apr 2024
AWS Data Engineer Westlake, Texas, USA
•Designed scalable data pipelines leveraging AWS Glue, Lambda, and Amazon S3 to automate ETL processes across brokerage, trading, and advisory platforms.
•Established real-time data ingestion systems using Amazon Kinesis and AWS Lambda to process live trading events and customer interactions.
•Constructed optimized data models and warehouses with Amazon Redshift and Snowflake to support portfolio analytics and risk management.
•Managed transactional and reference data using Amazon RDS, Aurora, DynamoDB, and Neptune to ensure secure and efficient access.
•Automated data workflows with AWS Step Functions, Data Pipeline, and Apache Airflow, streamlining compliance reporting and account updates.
•Integrated third-party market and financial data via AWS API Gateway, Lambda, and AppFlow to enrich client portfolios and advisory dashboards.
•Conducted rigorous data quality checks using AWS Glue, PySpark, and Lambda, ensuring the accuracy of financial transaction logs and regulatory datasets.
•Developed end-to-end ETL solutions with Apache Spark and AWS Glue to structure and analyze trade execution data and customer portfolio behavior.
•Engineered Hadoop ecosystem solutions with HDFS, MapReduce, Hive, and Pig to process high-volume trading logs for long-term analytics.
•Designed ETL flows using Apache NiFi, Talend, and AWS Glue to consolidate financial, advisory, and CRM data from distributed infrastructures.
•Migrated data between Hadoop and relational systems using Sqoop, maintained real-time pipelines with Flume and Zookeeper, and executed fast queries with Impala.
•Deployed and managed large-scale unstructured data solutions with MongoDB and Cassandra for customer trade notes, feedback, and communication logs.
•Implemented containerized analytics solutions with Docker and orchestrated deployments using Kubernetes to enhance system scalability and reliability.
•Provisioned infrastructure as code using Terraform and CloudFormation, streamlining the deployment of Schwab's AWS-based data ecosystems.
•Created interactive dashboards with Power BI and Tableau to deliver insights on advisor performance, portfolio allocation, and operational KPIs to leadership.
•Leveraged Control-M to automate and manage end-to-end data pipelines on Azure, facilitating seamless data integration, transformation, and movement.
•Created Data Studio reports to review billing and service usage, optimizing queries and contributing to cost-saving measures.
•Worked with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
•Developed Spark applications using Spark SQL in Databricks to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns.
•Used Cloud Shell for various administrative tasks and service deployments.
•Converted and parsed data formats using PySpark DataFrames, reducing time spent on data conversion and parsing by 40%; developed monitoring and notification tools in Python.
•Used the Cloud Shell SDK in GCP to configure Dataproc, Cloud Storage, and BigQuery services; designed and configured databases and back-end applications; managed large datasets using pandas DataFrames and SQL.
•Developed data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; worked with Cosmos DB (SQL API and Mongo API).
•Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
•Worked across GCP (Cloud Storage, Dataproc, Dataflow, BigQuery) and AWS (EMR, S3, Glacier, EC2 with EMR clusters) environments.
•Used Google Cloud Dataflow with the Python SDK to deploy streaming and batch jobs in GCP for custom cleaning of text and JSON files, writing the results to BigQuery.
•Wrote Python DAGs in Airflow to orchestrate end-to-end data pipelines for multiple applications.
ICICI Prudential May 2019 - Aug 2021
Data Engineer Bangalore, India
•Designed and implemented scalable ETL pipelines using Apache Spark and Apache Kafka to enable seamless integration and real-time processing of policyholder, claims, and premium transaction data.
•Developed predictive models using Python, Scikit-learn, and TensorFlow for customer lifetime value, risk scoring, and lapse prediction to enhance underwriting accuracy and retention strategies.
•Performed advanced analytics and data modeling using SQL, Python, and PySpark on large-scale insurance datasets to support actuarial analysis and regulatory reporting.
•Managed unstructured insurance data using MongoDB and Cassandra, supporting customer service interactions, document storage, and policy issuance records with scalable NoSQL solutions.
•Implemented Jenkins CI/CD pipelines for automating deployment of data models and analytics workflows, ensuring rapid iterations and reduced time-to-market for insurance solutions.
•Scheduled and managed complex data workflows with Apache Airflow, orchestrating data pipelines across CRM, claims processing, and policy administration systems; used Docker and Kubernetes to containerize and orchestrate data engineering workflows, ensuring high availability and scalability across cloud environments.
•Built ETL pipelines using Apache NiFi and Talend to extract, transform, and load data from multiple legacy systems and third-party financial platforms, ensuring data consistency and reliability.
•Applied Apache Atlas and Collibra to establish data governance, metadata management, and compliance with data privacy policies including IRDAI and GDPR in handling sensitive policyholder data.
•Developed customer segmentation and cross-sell/up-sell models using machine learning to personalize insurance offerings and improve campaign conversion rates; integrated analytics pipelines with marketing platforms and customer portals using REST APIs, enhancing real-time, data-driven decision-making for lead generation and policy servicing.
EDUCATION
Southern New Hampshire University Aug 2022 - Apr 2024
Master's, Business Analytics Manchester, New Hampshire
•GPA: 3.9
CERTIFICATIONS
•Microsoft Certified: Azure Data Fundamentals
•AWS Certified Data Engineer - Associate
ABB (Cognizant) Sep 2021 - Jul 2022
GCP Data Engineer Bangalore, India
•Involved in requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.