Vasavi Sathuluri
+1-682-***-**** | Haslet, TX *6052
EDUCATION
Bachelor's in Computer Science, Jawaharlal Nehru Technological University | Aug 2014 - May 2018
Relevant Coursework: Data Mining, Big Data Technology, Cloud Technologies, Machine Learning, Business Intelligence
EXPERIENCE
Southwest Airlines | Data Engineer | Nov 2021 – Present
● Designed and deployed ETL solutions using Azure Synapse, Data Factory, and Kafka to extract, transform, and load data from various sources into the data warehouse, successfully migrating on-premises data to the Azure cloud.
● Engineered data ingestion solutions to capture, process, and transform structured and unstructured data from multiple sources into Azure SQL Data Warehouse and Azure Data Lake Storage, resulting in a 40% reduction in data processing time.
● Leveraged Azure Monitor and Azure Log Analytics to proactively monitor and troubleshoot data pipeline performance issues, ensuring optimal functionality and rapid resolution of anomalies.
● Developed and deployed Power BI dashboards and reports to deliver business insights and visualize key performance indicators (KPIs) for stakeholders, saving 12 hours per week previously spent on manual reporting.
● Implemented an end-to-end CI/CD pipeline using GitHub Actions, integrating Pylint for code quality checks, building Python wheel files, and storing artifacts in Azure Databricks DBFS for seamless integration with the ETL pipeline (see the upload sketch after this section).
● Played a key role in designing, implementing, and integrating data for a real-time performance-tracking dashboard in Datadog, tailored to monitoring Kafka-to-Delta pipelines written in Scala.
● Automated the performance-tracking dashboard by integrating pipeline code with the Datadog UI via its Scala API, including an alerting mechanism that reduced job-failure risk by 94%.
● Transformed data models into dimensional models for optimized data warehousing, and applied SQL optimizations that improved query response times on large datasets by up to 30%.
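A minimal sketch of the artifact-upload step referenced above, assuming the Databricks DBFS REST API (/api/2.0/dbfs/put) and hypothetical environment-variable names for the workspace host and token; an illustration, not the production pipeline.

```python
# upload_wheel.py - hedged sketch: push a built wheel into Databricks DBFS from CI
# Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are exported by the CI job (hypothetical names).
import base64
import glob
import os

import requests

host = os.environ["DATABRICKS_HOST"]      # e.g. https://adb-<id>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]

wheel_path = glob.glob("dist/*.whl")[0]   # wheel produced by the preceding build step
with open(wheel_path, "rb") as f:
    contents = base64.b64encode(f.read()).decode()

# The DBFS put endpoint takes base64 contents for small files (roughly 1 MB max);
# larger artifacts need the create/add-block/close streaming endpoints instead.
resp = requests.post(
    f"{host}/api/2.0/dbfs/put",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": f"dbfs:/FileStore/wheels/{os.path.basename(wheel_path)}",  # illustrative path
        "contents": contents,
        "overwrite": True,
    },
)
resp.raise_for_status()
```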
Tata Consultancy Services | Python Developer | June 2018 – Aug 2021
● Developed Python applications integrating machine learning models and data pipelines.
● Built and optimized Python/SQL notebooks, PySpark jobs, and Azure Data Factory pipelines.
● Designed and maintained structured/unstructured data pipelines in Azure Data Lake.
● Implemented and optimized Azure SQL Data Warehouse solutions and ETL workflows using SSIS.
● Processed large datasets using Azure Databricks, PySpark, and Blob Storage.
● Designed Power BI reports for data visualization and business insights.
● Automated data ingestion, transformation, and failover handling in Databricks.
● Deployed and managed Python libraries, Databricks workflows, and job scheduling.
● Developed dimensional models (Star/Snowflake) and optimized SQL performance.
● Managed Azure Portal, Git version control, and application architecture design.
SKILLS
Programming Languages: Python, Scala, SQL / Advanced SQL (PL/SQL), PySpark
Databases: PostgreSQL, MySQL, Microsoft SQL Server, MongoDB, Cassandra, Elasticsearch
Cloud: Microsoft Fabric, Azure Data Factory, Synapse Analytics, DevOps, Snowflake, AWS (EC2, S3, Glue)
Tools: Spark, Kafka, Airflow, Datadog, Azure Databricks, Power BI, Tableau, GitHub, Streamlit
PROJECTS
AI-Driven PDF Document Assistant with Large Language Models
● Created a user-friendly tool that lets users upload PDF documents for efficient data extraction and analysis, built with OpenAI’s GPT-3.5, LangChain, and Streamlit.
● Leveraged LangChain's text splitter and embeddings to let users upload PDFs, extract their text, and pose queries, with responses generated by an OpenAI-powered question-answering chain (sketched below).
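A minimal sketch of the PDF question-answering flow described above, assuming the classic LangChain 0.0.x API, PyPDF2 for text extraction, and a FAISS vector store; file names and parameters are illustrative, not the project's actual code.

```python
# pdf_qa.py - hedged sketch of a LangChain PDF question-answering flow
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

# Extract raw text from the uploaded PDF.
reader = PdfReader("document.pdf")  # illustrative file name
text = "".join(page.extract_text() or "" for page in reader.pages)

# Split into overlapping chunks and embed them into a FAISS index.
splitter = CharacterTextSplitter(separator="\n", chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(text)
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Retrieve the most relevant chunks and answer the user's query.
query = "What is this document about?"
docs = store.similarity_search(query)
chain = load_qa_chain(OpenAI(), chain_type="stuff")
print(chain.run(input_documents=docs, question=query))
```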
End-to-End Property Market Analysis Project: SQL, Snowflake
● Initiated a data analytics project, using web scraping to extract property listing data and integrate it into Snowflake; designed a robust database schema in SQL.
● Enriched the dataset by converting location coordinates to place names with Python and GeoPy (see the sketch below) and translating Polish descriptions to English with the Google Sheets API.
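A minimal sketch of the coordinate-to-name enrichment step, assuming GeoPy's Nominatim geocoder; the user agent and coordinates are illustrative.

```python
# reverse_geocode.py - hedged sketch: turn (lat, lon) pairs into place names with GeoPy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent="property-market-analysis")  # hypothetical app name
# Nominatim's usage policy requires throttling; one request per second is the usual floor.
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)

coords = [(52.2297, 21.0122), (50.0647, 19.9450)]  # illustrative listing coordinates
for lat, lon in coords:
    location = reverse((lat, lon), language="en")
    print(location.address if location else "unknown")
```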
Real-Time AI-Generated User Data Streaming with Kafka and Airflow
● Built a resilient data streaming pipeline with Kafka for real-time AI-generated user data, using an Airflow DAG to generate and publish records to Kafka clusters on a frequent schedule, ensuring smooth data flow.
● Implemented a PySpark listener to consume data from Kafka, process it, and store it in Amazon S3; used Amazon Athena to analyze the data with SQL for insights into user behavior (a minimal consumer sketch follows).
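A minimal sketch of the Kafka-to-S3 consumer described above, assuming Spark Structured Streaming with the spark-sql-kafka package; the broker, topic, schema, and bucket paths are illustrative.

```python
# kafka_to_s3.py - hedged sketch: consume a Kafka topic with PySpark and land it in S3
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("user-data-stream").getOrCreate()

# Illustrative schema for the AI-generated user records.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("name", StringType()),
    StructField("email", StringType()),
])

# Read the raw Kafka stream and parse the JSON payload.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "users_created")              # hypothetical topic
       .option("startingOffsets", "earliest")
       .load())
users = raw.select(from_json(col("value").cast("string"), schema).alias("u")).select("u.*")

# Sink to S3 as Parquet so Athena can query it with SQL.
query = (users.writeStream.format("parquet")
         .option("path", "s3a://example-bucket/users/")                    # hypothetical bucket
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/users/")
         .start())
query.awaitTermination()
```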