Madhuri Vellaturu | GenAI Engineer | Data Engineer | Python Developer
Email: *********.**@*****.*** | Phone: 510-***-****
SUMMARY
Results-oriented professional with 6 years of experience in Data Analysis and Generative AI, delivering scalable AI-powered solutions across multiple industries.
In-depth expertise in Generative AI and Large Language Models (LLMs), with hands-on integration of GPT-3 and GPT-4 for business automation, intelligent summarization, and natural language analytics.
Skilled in developing AI-driven demand forecasting and dynamic pricing systems using GPT-4, Random Forest, and XGBoost on AWS SageMaker, achieving up to 25% improvement in forecasting accuracy and 10% revenue growth.
Competent in building LLM-enhanced reinforcement learning agents for portfolio optimization, incorporating GPT-generated sentiment and macroeconomic scenarios to improve financial decision-making.
Extensive experience developing ETL pipelines with PySpark, AWS Glue, and SQL, integrating structured and unstructured data from various systems, including retail, finance, and CRM.
Expertise in building real-time analytics systems with AWS Kinesis and LLMs, enabling anomaly detection, demand-spike identification, and operational alerting to improve business response times.
Strong proficiency in Pandas, NumPy, and Matplotlib for data wrangling, transformation, and creating advanced visualizations to support decision-making.
Experienced in containerizing applications and deploying them with Docker, Kubernetes, and AWS Fargate, ensuring efficient CI/CD integration for production-grade ML/GenAI solutions.
Deep understanding of data quality, dimensional modeling, and SQL-based business rule validation to maintain analytical integrity and support compliance.
Skilled in web scraping with BeautifulSoup for integrating external content into data pipelines, aiding training, enrichment, and compliance (e.g., MiFID II, GDPR); a brief scraping sketch follows this summary.
Proficient in building GUI applications using PyQt and wxPython, providing intuitive interfaces for managing policy and transaction data with full CRUD capabilities.
Strong adherence to Python best practices, including PEP8 compliance, unit testing (PyUnit), reusable script development, and structured logging for debugging and system monitoring.
Experienced in Agile development environments, utilizing JIRA, Git, and CI/CD pipelines, coordinating teams to deliver high-quality solutions within deadlines.
Excellent interpersonal, communication, and organizational skills, with the ability to handle multiple tasks and collaborate effectively with others.
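A minimal sketch of the BeautifulSoup scraping pattern referenced above; the URL and record fields are hypothetical placeholders, not taken from any specific engagement.

import requests
from bs4 import BeautifulSoup

def scrape_links(url: str) -> list[dict]:
    """Fetch a page and extract titled links for downstream pipeline enrichment."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    records = []
    for anchor in soup.find_all("a", href=True):
        text = anchor.get_text(strip=True)
        if text:  # keep only links with visible anchor text
            records.append({"title": text, "url": anchor["href"]})
    return records

# Hypothetical usage:
# rows = scrape_links("https://example.com/news")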
TECHNICAL SKILLS
Machine Learning & AI
Random Forest, XGBoost, Logistic Regression, K-Means, Reinforcement Learning, NLP, GPT-3, GPT-4, TensorFlow, PyTorch, Scikit-learn, Pandas, NumPy
Cloud & DevOps
AWS (SageMaker, Lambda, Glue, Kinesis, EC2, Fargate, CloudWatch), Google Cloud, Docker, Kubernetes, Terraform, Jenkins
Data Visualization
Tableau, Amazon QuickSight, Power BI
Version Control & Collaboration
Git, GitHub, GitLab, SVN, JIRA, Confluence
Data Engineering
PySpark, AWS Glue, Apache Kafka, Apache Airflow
Web Scraping & Automation
BeautifulSoup, Selenium, AWS Lambda
Databases
MySQL, PostgreSQL, SQL Server, Amazon Redshift
Operating System
Windows, Linux
EXPERIENCE
Columbia Bank, Tacoma, WA Apr 2024 – Present
GenAI/ML Engineer
Developed an AI-driven demand forecasting and dynamic pricing system for a global retailer, integrating GenAI and LLMs to enhance forecasting precision and business decision-making.
Integrated GPT-4 to generate synthetic demand scenarios, simulate market shifts, and enrich training data for traditional ML models, increasing model resilience to edge cases.
Built a prompt-driven analytics layer using GPT-4 and LLM APIs, enabling business users to query sales forecasts and pricing strategies through natural language interfaces.
Created custom LLM-powered agents to summarize large-scale retail transaction data, providing real-time insights and generating executive-level reporting on sales trends.
Developed forecasting models using Random Forest and XGBoost, trained on AWS SageMaker, with automated hyperparameter tuning and scalable deployment pipelines (a condensed training sketch follows this role's environment list).
Engineered robust ETL workflows using AWS Glue to process and clean multi-source retail data (POS, e-commerce, inventory), prepping it for ML and LLM analysis.
Built a dynamic pricing engine that adjusted prices in real-time based on predictive models, LLM-generated market narratives, competitor pricing, and demand elasticity.
Deployed ML and LLM components using Docker, Kubernetes, and AWS Fargate, ensuring high availability, portability, and efficient compute resource allocation.
Streamed live retail data via AWS Kinesis, triggering on-the-fly GPT-4 analysis to identify anomalies, demand spikes, and inventory risks (a minimal consumer sketch follows this role's environment list).
Queried large-scale datasets from Amazon Redshift and integrated GPT-4 summarization layers to translate raw numbers into decision-ready insights for category managers.
Built interactive dashboards in Amazon QuickSight, enhanced with LLM-based narrative generation, auto-summarizing trends, forecasts, and pricing performance.
Leveraged GPT-4 agents to automate code generation for feature engineering and pipeline debugging, significantly accelerating development cycles.
Implemented CI/CD workflows via AWS CodePipeline, allowing seamless deployment of updated ML models and LLM-enhanced modules.
Monitored infrastructure and model health using Amazon CloudWatch, augmented by GPT-based alerting that provided human-readable explanations of anomalies.
Achieved a 25% increase in forecasting accuracy and a 10% uplift in revenue from pricing optimizations, driven by the synergy of predictive ML and LLM-assisted decision intelligence.
Environment: AWS SageMaker, GPT-4, LLM APIs, Random Forest, XGBoost, Docker, Kubernetes, AWS Fargate, AWS Glue, Amazon Redshift, Amazon QuickSight, AWS Kinesis, Amazon CloudWatch, AWS CodePipeline
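Below is a condensed, local sketch of the forecasting training loop described in the Random Forest/XGBoost bullet above; the demand features are synthetic placeholders, and the SageMaker-managed training, hyperparameter tuning, and deployment steps are omitted.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Hypothetical demand dataset: calendar and price features against units sold.
df = pd.DataFrame({
    "week_of_year": np.random.randint(1, 53, 500),
    "price": np.random.uniform(5, 50, 500),
    "promo_flag": np.random.randint(0, 2, 500),
    "units_sold": np.random.randint(10, 500, 500),
})
X, y = df.drop(columns="units_sold"), df["units_sold"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train both model families and compare holdout error.
for model in (RandomForestRegressor(n_estimators=200, random_state=42),
              XGBRegressor(n_estimators=200, learning_rate=0.1)):
    model.fit(X_train, y_train)
    mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
    print(type(model).__name__, f"MAPE: {mape:.3f}")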
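And a minimal sketch of the Kinesis-to-GPT-4 anomaly flow from the streaming bullet; the stream name, spike threshold, and prompt are hypothetical, and a production consumer would use enhanced fan-out or a Lambda trigger rather than this polling pattern.

import json
import boto3
from openai import OpenAI

kinesis = boto3.client("kinesis", region_name="us-west-2")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def read_batch(stream_name: str) -> list[dict]:
    """Poll one batch of records from the first shard (sketch only)."""
    shard_id = kinesis.describe_stream(StreamName=stream_name)["StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(StreamName=stream_name, ShardId=shard_id,
                                          ShardIteratorType="LATEST")["ShardIterator"]
    records = kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]
    return [json.loads(r["Data"]) for r in records]

def explain_anomaly(events: list[dict]) -> str:
    """Ask the model for a human-readable summary of suspicious sales events."""
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Summarize anomalies in these retail events: " + json.dumps(events)}],
    )
    return response.choices[0].message.content

events = read_batch("retail-sales-stream")  # hypothetical stream name
spikes = [e for e in events if e.get("units_sold", 0) > 1000]  # hypothetical threshold
if spikes:
    print(explain_anomaly(spikes))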
Pacific Specialty Insurance, Anaheim, CA Sept 2022 – Mar 2024
GenAI Engineer
Integrated GPT-3 to generate market sentiment narratives, financial news summaries, and scenario-based prompts to guide reinforcement learning agents in portfolio optimization.
Designed an LLM-powered assistant to interact with investment professionals, allowing natural language queries on portfolio performance, risk metrics, and optimization recommendations.
Developed prompt-based interfaces using OpenAI's GPT-3 API to simulate “what-if” scenarios based on macroeconomic indicators and market signals, feeding them into the RL model environment.
Used PyTorch and Keras to train reinforcement learning agents, enhanced with GPT-3-generated market context to improve decision-making under volatile conditions (a simplified sketch follows this role's environment list).
Leveraged Google Cloud AI Platform for large-scale model training, using LLM-generated synthetic financial datasets to augment training for rare or extreme market conditions.
Built a conversational interface powered by GPT-3, allowing portfolio managers to interact with the system using natural language to retrieve asset allocation strategies and performance breakdowns.
Deployed containerized models using Docker and orchestrated workflows on Kubernetes, integrating LLM-driven insights into the RL training and inference pipelines.
Built a market data ingestion pipeline that combined real-time feeds with GPT-3 sentiment analysis on relevant financial news and reports to enrich asset allocation strategies.
Developed a risk dashboard enhanced with LLM-based summaries, transforming complex financial indicators and risk metrics into explainable narratives for analysts and stakeholders.
Implemented a continuous retraining pipeline in which GPT-3 generated weekly summaries of financial and regulatory changes, which were used to flag potential model drift and retraining needs.
Ensured compliance by using LLMs to assist in automatically tagging and classifying transactions for regulatory requirements (e.g., MiFID II, GDPR), improving audit readiness.
Conducted A/B testing on GPT-3-enhanced versus baseline RL models to evaluate the impact of narrative and sentiment inputs on portfolio return optimization (an illustrative test sketch follows this role's environment list).
Environment: GPT-3, OpenAI API, PyTorch, Keras, Reinforcement Learning, Google Cloud AI, Docker, Kubernetes, TensorFlow, AWS SageMaker, Flask, Financial APIs, LLM APIs, Natural Language Processing, A/B Testing, CI/CD, GDPR Compliance, MiFID II
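A simplified sketch of feeding an LLM-derived sentiment score into an RL agent's observation, in the spirit of the GPT-3 and PyTorch bullets above; the prompt, headline, features, and action space are hypothetical, gpt-3.5-turbo stands in here for the GPT-3 completions endpoint used at the time, and the trading environment and training loop are omitted.

import numpy as np
import torch
import torch.nn as nn
from openai import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def sentiment_score(headline: str) -> float:
    """Ask the model for a -1..1 sentiment score (a robust parser would validate the reply)."""
    response = llm.chat.completions.create(
        model="gpt-3.5-turbo",  # modern stand-in for the original GPT-3 API
        messages=[{"role": "user",
                   "content": f"Rate the market sentiment of this headline from -1 to 1. Reply with a number only: {headline}"}],
    )
    return float(response.choices[0].message.content.strip())

class PolicyNet(nn.Module):
    """Tiny policy head over market features plus the LLM sentiment feature."""
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_actions))

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)

market_features = np.array([0.02, -0.01, 1.3])        # hypothetical return/volatility features
sent = sentiment_score("Fed signals a pause in rate hikes")
state = torch.tensor(np.append(market_features, sent), dtype=torch.float32)
print(PolicyNet(n_features=4, n_actions=3)(state))    # e.g., buy / hold / sell probabilities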
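One plausible framing of the A/B comparison in the last bullet: a Welch t-test on daily returns from the two model variants. The return series below are synthetic placeholders, not actual portfolio results.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
baseline_returns = rng.normal(0.0004, 0.01, 250)   # synthetic daily returns, baseline RL model
enhanced_returns = rng.normal(0.0006, 0.01, 250)   # synthetic daily returns, GPT-3-enhanced model

# Welch's t-test does not assume equal variance between the two variants.
t_stat, p_value = stats.ttest_ind(enhanced_returns, baseline_returns, equal_var=False)
print(f"mean uplift: {enhanced_returns.mean() - baseline_returns.mean():.5f}")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.3f}")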
Focus1 Insurance, Portland, OR Jan 2021 – Aug 2022
Data Engineer
Designed end-to-end data pipelines using PySpark and SQL Server, enabling efficient ETL workflows and near real-time reporting (a compact pipeline sketch follows this role's environment list).
Developed predictive models (Random Forest, Lasso, Logistic Regression) to forecast sales and identify high-value customers.
Used Tableau to develop interactive dashboards for executives to track marketing performance and customer behavior.
Migrated Tableau dashboards and underlying data pipelines from on-prem to AWS Cloud Workspace for scalability.
Created SQL scripts for rule validation, data quality checks, and business metrics computation.
Integrated data from multiple sources including CRM, marketing platforms, and call center logs using SQL Loader and custom Python scripts.
Created dimensional models (fact/dimension tables) to support BI reporting and analytics use cases.
Performed data wrangling, missing value imputation, and outlier detection using Python and Scikit-learn.
Developed reusable Python functions for EDA, visualization, and automated model evaluation.
Conducted cluster analysis using K-means to group customers based on purchasing behavior and demographics (a minimal segmentation sketch follows this role's environment list).
Applied collaborative filtering and content-based filtering techniques for product recommendation models.
Automated weekly performance reporting using Tableau and Python scheduling libraries.
Built shell scripts to automate SQL jobs and validation checks against master datasets.
Used ElasticSearch and Kibana to analyze and visualize text-based data and user search logs.
Developed performance monitoring dashboards for key SLAs using Tableau and email alerting in AWS Lambda.
Environment: Tableau, PySpark, Python, SQL Server, AWS, ElasticSearch, Kibana, Pandas, Scikit-learn, AWS Lambda
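A compact sketch of the PySpark ETL pattern from the first bullet of this role; the S3 paths and column names are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("crm_etl_sketch").getOrCreate()

# Extract: hypothetical CRM export landed as CSV.
raw = spark.read.option("header", True).csv("s3a://example-bucket/crm/daily/")

# Transform: cast types, drop rows missing the business key, derive a reporting month.
clean = (raw
         .withColumn("sale_amount", F.col("sale_amount").cast("double"))
         .filter(F.col("customer_id").isNotNull())
         .withColumn("sale_month", F.date_format("sale_date", "yyyy-MM")))

# Load: write partitioned Parquet for downstream reporting queries.
clean.write.mode("overwrite").partitionBy("sale_month").parquet("s3a://example-bucket/curated/sales/")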
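And a minimal version of the K-means segmentation mentioned above; the customer features are synthetic and the cluster count is illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
customers = np.column_stack([
    rng.gamma(2.0, 150.0, 1000),   # synthetic annual spend
    rng.integers(1, 60, 1000),     # synthetic purchase frequency
    rng.integers(18, 80, 1000),    # synthetic age
])

scaled = StandardScaler().fit_transform(customers)          # scale so no feature dominates
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
print(np.bincount(kmeans.labels_))                          # segment sizes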
Plumas Bank - Reno, NV Mar 2019 – Dec 2020
Data Analyst
Managed multiple small- to large-scale projects through Agile sprints.
Worked closely with stakeholders and subject matter experts to elicit and gather business requirements.
Documented business and functional requirements, user stories, acceptance criteria, test cases, and wireframes/mockups.
Worked directly with the Dev team to ensure the product backlog was understood at the level needed and to keep sprints on track.
Tracked velocity, capacity, burn-down charts, and other metrics during iterations.
Created data flow diagrams.
Prioritized the backlog to ensure on time delivery.
Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop and evaluate machine learning models, applying algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
Facilitated Agile team ceremonies, including Daily Standup, Backlog Grooming, Sprint Review, and Sprint Planning.
Developed a machine learning system that predicted purchase probability for a particular offer based on customers' real-time location data and past purchase behavior.
Assisted the Data Science team in gathering requirements, developing process models, detailing business policies, and revising the business requirements document.
Performed legacy application data cleansing and anomaly resolution, and developed rule sets for ongoing cleansing and data synchronization.
Contributed to the project cycle plan for the data warehouse, covering source data analysis, data extraction, and transformation and loading strategy design.
Automated a Python process to extract data and documents of various types from a website, save them to a specified file path, and load the document details into an Excel template (a trimmed-down sketch follows this role's bullet list).
Performed data analysis and data profiling using SQL on various sources systems including Oracle and Teradata.
Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models.
Maintained and enhanced the data model with changes, furnishing definitions, notes, reference values, and checklists.
Ran SQL queries to test and validate data in database (SQL Server, DB2).
Developed SQL queries/scripts to validate data, checking for duplicates, null values, and truncated values and ensuring correct aggregations within the ETL testing cycle (a self-contained validation sketch follows at the end of this role).
Collaborated with the data warehouse/ETL and reporting teams on data model and reporting requirements for Tableau reports.
Worked with Users to develop Test cases for user acceptance testing.
Drove User Acceptance Testing.
Designed and published visually rich and intuitively interactive Tableau/Excel workbooks and dashboards for executive decision making.
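A trimmed-down sketch of the extraction-and-Excel workflow described above; the site URL, file-type filter, and output paths are hypothetical placeholders.

from pathlib import Path
import pandas as pd
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/filings"   # hypothetical source page
OUT_DIR = Path("downloads")
OUT_DIR.mkdir(exist_ok=True)

soup = BeautifulSoup(requests.get(BASE_URL, timeout=10).text, "html.parser")
rows = []
for link in soup.find_all("a", href=True):
    href = link["href"]
    if href.lower().endswith((".pdf", ".xlsx", ".docx")):    # document types of interest
        name = href.rsplit("/", 1)[-1]
        content = requests.get(href, timeout=30).content     # assumes absolute URLs
        (OUT_DIR / name).write_bytes(content)
        rows.append({"document": name, "source_url": href})

# Log every saved document into an Excel template for review (requires openpyxl).
pd.DataFrame(rows).to_excel(OUT_DIR / "document_index.xlsx", index=False)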
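And the kind of validation query described in the SQL bullet above, shown against an in-memory SQLite table so the sketch is self-contained; in the role, checks like these ran against SQL Server and DB2.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policy (policy_id TEXT, premium REAL);
    INSERT INTO policy VALUES ('P1', 100.0), ('P1', 100.0), ('P2', NULL), ('P3', 250.0);
""")

# Duplicate check: the same business key appearing more than once.
dupes = conn.execute("""
    SELECT policy_id, COUNT(*) AS n FROM policy
    GROUP BY policy_id HAVING COUNT(*) > 1
""").fetchall()

# Null check: required columns with missing values.
nulls = conn.execute("SELECT COUNT(*) FROM policy WHERE premium IS NULL").fetchone()[0]

print("duplicate keys:", dupes)   # -> [('P1', 2)]
print("null premiums:", nulls)    # -> 1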