Machine Learning Data Engineer

Location:

Santa Clara, CA

Salary:

90000

Posted:

October 15, 2025


Resume:

GOWTHAMI CHIKKA

Database Engineer | Data Engineer

PROFESSIONAL SUMMARY

405-***-**** | ******.**********@*****.*** | LinkedIn

Data Scientist with 4+ years of experience in predictive modeling, optimization, and machine learning. Skilled in ARIMA, LightGBM, TensorFlow, PyTorch, SQL, Databricks, Snowflake, and Airflow. Experienced in building and deploying predictive and prescriptive models, optimization (Linear Programming, Knapsack), feature engineering, model tuning, and real-time dashboards for decision-making. Skilled in Azure SQL, Cosmos DB, PostgreSQL, MongoDB, and Redshift with expertise in query tuning, indexing strategies, and performance optimization. Proficient in building ETL/ELT pipelines, data transformation workflows, and automation using Apache Airflow, Azure Data Factory, and AWS Glue. Hands-on experience implementing caching (Redis) and search technologies (Elasticsearch) to reduce latency and improve scalability. Adept at deploying secure, production-grade database solutions using Docker, Kubernetes, and REST APIs integrated with CI/CD pipelines (Azure DevOps, GitHub Actions). Strong foundation in algorithms, data structures, and distributed computing with practical expertise in Python and SQL. Known for collaborating with cross-functional teams to deliver high-performing, compliant, and scalable database systems while maintaining a data-driven approach to performance monitoring and troubleshooting. Experienced in CI/CD (GitHub Actions, Azure DevOps) and MLOps (MLflow, Airflow) for deploying scalable AI workflows. Strong foundation in statistics, linear algebra, and optimization applied to ML research and client solutions.

TECHNICAL SKILLS

Programming & Scripting: Python, PySpark, SQL, Pandas, NumPy, Java, R, C++ (working knowledge)

Databases & Querying: PostgreSQL, NoSQL (MongoDB, Cosmos DB), Snowflake, Redshift

Machine Learning & AI: Scikit-learn, TensorFlow, PyTorch, NLP (spaCy, NLTK), Sentiment Analysis, Recommendation Systems, Personalization, Search Ranking, Cold-Start Modeling, Anomaly Detection, Time-Series Modeling, Hyperparameter Tuning (GridSearchCV, Random Search, Cross-validation), Fraud Detection Models, LLMs, Generative AI, ARIMA, LightGBM

Optimization: Linear Programming, Knapsack

Cloud Platforms & Services: AWS (SageMaker, EC2, S3, Glue, Redshift), Azure (Data Factory, Synapse Analytics), Databricks, GCP, Azure SQL, Cosmos DB

CI/CD & DevOps: Azure DevOps

Data Engineering & Pipelines: Apache Airflow, ETL/ELT Workflows, Data Ingestion, Data Preprocessing, Data Transformation, MLOps Pipelines, MLflow, GitHub Actions

Deployment & Microservices: Docker, Kubernetes, REST APIs, Microservices Integration

Visualization & Reporting: Tableau, Power BI, Looker Studio, A/B Testing, KPI Reporting, Executive Dashboards

GenAI/Research: Generative AI, Agentic Actions research, RAG systems

EXPERIENCE

Machine Learning Engineer | Northern Trust, USA | Feb 2025 – Present

• Optimized Azure SQL and Redshift queries with advanced indexing, reducing execution time by 40%.

• Developed predictive models with ARIMA, LightGBM, and TensorFlow to improve fraud detection accuracy.

• Built automated ETL workflows with Glue, Airflow, and Redshift processing 1TB+ daily data.

• Designed Redis caching layers for real-time fraud scoring, reducing latency by 25%.

• Created monitoring dashboards with Azure Monitor and Datadog for database health & query efficiency.

• Integrated database-backed microservices using Docker, Kubernetes, and REST APIs with 99.9% uptime.

• Delivered compliance and performance dashboards in Tableau/Power BI aligned with FINRA & SOX.

• Designed recommendation-style ranking models for fraud likelihood scoring, leveraging optimization methods to prioritize high-risk transactions (parallel to personalization/search ranking systems).

• Collaborated with risk and compliance teams to run A/B tests, analyze user/system behavior, and address cold-start challenges when new transaction types emerged.

AI Data Engineer | Streebo Inc, India | Sep 2019 – Dec 2022

• Designed ETL workflows (SQL, Pandas, NumPy) to process millions of logs daily, reducing prep time by 60%.

• Built NoSQL pipelines using Cosmos DB and MongoDB for enterprise-scale applications.

• Integrated Elasticsearch and Redis into microservices to improve response times.

• Automated database maintenance & retraining triggers with Airflow + MLflow.

• Developed REST APIs with optimized SQL queries powering 10+ enterprise apps.

• Produced dashboards (Tableau, Power BI) for 50+ stakeholders, supporting database-backed decision-making.

• Integrated ML models into RESTful APIs and microservices (Docker, Kubernetes), enabling adoption across 10+ enterprise apps with seamless performance.

• Built early-stage personalization models to recommend resolution templates based on historical customer service interactions, addressing sparse-data cold-start cases.

• Applied ranking algorithms and online learning techniques to continuously improve recommendation accuracy as new support queries appeared.

• Tuned models with GridSearchCV, Random Search, and cross-validation, improving predictive accuracy by 12% over baseline.

• Set up Airflow + MLflow pipelines to monitor model drift and trigger automated retraining, cutting downtime from weeks to under 24 hours.

• Conducted A/B testing with business teams, producing Tableau/Power BI dashboards adopted by 50+ stakeholders to drive customer service decisions.

EDUCATION

Master's in Data Science and Analytics | University of Oklahoma, Norman, Oklahoma | Aug 2023 – Dec 2024

Conducted research on ML model selection frameworks, synthesizing insights from 300+ academic papers; manuscript in preparation for publication in data science/ML venues.

Bachelor's in Mechanical Engineering | G Pulla Reddy Engineering College, Andhra Pradesh | Sep 2016 – Sep 2020

PROJECTS

Healthcare AI: Fibroid Detection Pipeline

Built a scalable ML pipeline on Azure to predict fibroid occurrence from ultrasound data; improved recall by 15% while ensuring HIPAA compliance.

• Designed and trained a deep learning model (PyTorch) using MRI imaging + patient EHR data to predict fibroid growth.

• Built preprocessing pipelines in Python (Pandas, NumPy) for structured/unstructured data cleaning and feature engineering.

• Applied ranking and optimization techniques to prioritize treatment recommendations, aligning with personalization workflows.

• Deployed end-to-end pipeline on Azure ML, storing medical images in Azure Blob Storage with CI/CD automation via Azure DevOps for continuous retraining and monitoring.

• Delivered explainable AI insights (Grad-CAM visualizations), supporting clinicians in treatment planning and improving decision transparency.

Financial Risk Analytics: Fraud Detection

Designed and deployed real-time fraud detection pipelines on AWS, cutting false positives by 22% and ensuring SOX/FINRA compliance.

• Applied ARIMA and LightGBM for risk scoring, combined with optimization methods to improve accuracy and prioritize alerts.

• Developed scalable ETL workflows (Python + SQL) for data ingestion, cleaning, and transformation from multiple sources.

• Implemented on AWS SageMaker, with real-time scoring endpoints, retraining automation, and monitoring via AWS CloudWatch.

• Designed scoring logic analogous to recommendation ranking systems, prioritizing high-risk transactions in real time; the approach is transferable to content personalization and search ranking.

• Improved risk detection accuracy by 15% while ensuring compliance with Basel III and financial governance standards.
