SUBRAY G. SHETTY
Cell: 732-***-**** Email: ************@*****.*** Location: New Jersey LinkedIn: https://www.linkedin.com/in/subrayshetty/
EXECUTIVE SUMMARY
Strategic Cloud Data Architect and Software Engineering Leader with over 18 years of experience modernizing cloud-native financial technology and AI platforms. Expert in architecting Snowflake and Databricks ecosystems, implementing Data Mesh at scale, and driving GenAI/LLM strategies. Proven track record of delivering governed, high-performance data products for global financial institutions including the Federal Reserve, LSEG, TD Ameritrade, and UBS.
TECHNICAL SKILLS & COMPETENCIES
Data Engineering & Orchestration: dbt Cloud (Advanced Modeling), Apache Airflow, RabbitMQ, AWS MSK (Kafka), Spark/PySpark, Python, SQL, and Shell Scripting.
Data Warehousing & Architecture: Snowflake (Data Cloud Architect), Databricks (Lakehouse/Delta Lake), Oracle, Teradata, Apache Iceberg, Data Vault 2.0, and Dimensional Modeling.
AI, ML & Advanced Analytics: AI Data Marts, Generative AI (RAG, LLMs), Snowflake Cortex, AWS Bedrock, SageMaker, Azure AI Studio, and Tableau.
Cloud & Infrastructure: AWS (EKS, Lambda, S3, EMR), Microsoft Azure, Terraform (IaC), Kubernetes, and Data Mesh Architecture.
Governance & Master Data: Informatica MDM, Unity Catalog, Immuta, Collibra, DataOps, and MLOps.
Programming: Expert-level Python proficiency; working proficiency in C++, Go, and Java for integration and performance-critical workloads.
Frameworks: TensorFlow, PyTorch, JAX, ONNX, Hugging Face Transformers, and scikit-learn.
Data Science: NumPy, pandas, SciPy, scikit-learn, LangChain, R, and SQL.
MLOps Tools: MLflow, Airflow, Kubeflow, DVC, BentoML, and Weights & Biases.
Data Systems: Spark, Databricks, Kafka, Delta Lake, Snowflake, Starburst (Trino), and BigQuery.
Cloud Platforms: AWS (SageMaker, Bedrock), Azure (Machine Learning, Synapse), and GCP (Vertex AI, Dataflow).
Infrastructure: Docker, Kubernetes, Terraform, Helm, ArgoCD, GPU/TPU orchestration, and Starburst Cluster Management.
Security & Compliance: IAM, key management, audit logging, AI model explainability, and Responsible AI design (including agentic guardrails).
Communication: Exceptional writing skills with a track record of producing compliant, compelling narratives; skilled in conceptualizing rich graphic visuals to communicate complex technical concepts and approaches.
PROFESSIONAL EXPERIENCE
NET2SOURCE (Client: FEDERAL RESERVE BANK OF NEW YORK) Nov 2024 – Present Data Platform Engineer (AI Architecture + Data Mesh)
Infrastructure as Code (IaC): Leveraged Terraform to design, provision, and manage secure, scalable cloud infrastructure for the enterprise Data Lake on AWS, ensuring consistent environment parity and automated deployments.
Performance Observability: Engineered a comprehensive monitoring solution using Grafana to visualize Starburst (Trino) performance metrics; developed AWS Lambda functions to extract execution telemetry from CloudWatch and ingest it into Grafana dashboards for real-time query optimization and bottleneck identification.
Data Mesh Leadership: Architected cloud-native Data Mesh infrastructure on AWS, empowering domain teams to build and own data products under a centralized governance model.
Lakehouse Architecture: Operated Databricks-based Lakehouse environments supporting large-scale batch and streaming analytics for regulatory and supervisory domains.
Real-time Ingestion: Built event-driven platforms using AWS MSK (Kafka) for low-latency ingestion of financial events.
Governance: Implemented automated metadata management and policy-based access control using Collibra and Immuta.
Cloud-Native Orchestration & GitOps: Architected and managed scalable containerized workloads on AWS EKS (Elastic Kubernetes Service) using Terraform for infrastructure provisioning and Helm charts for standardized application packaging.
Automated Deployment Pipelines: Implemented a robust GitOps workflow using ArgoCD to automate the continuous delivery of data services and ML models, ensuring declarative state management and rapid, auditable deployments across AWS environments.
Modular Terraform Development: Developed and maintained modular Terraform scripts to manage complex VPC networking, IAM roles for service accounts (IRSA), and high-availability EKS clusters, reducing environment setup time by 40%.
Agentic Workflows & Safeguards: Architected agentic workflows that chain AI outputs into downstream regulatory actions; implemented human-in-the-loop safeguards and agentic guardrails to ensure 100% compliance with Federal guidelines.
Demonstrated courageous team collaboration by leading cross-functional discussions across technology, risk, compliance, and business stakeholders; constructively challenged assumptions and design decisions to drive secure, scalable, and regulator-aligned solutions.
Delivered at enterprise scale by architecting and modernizing large-scale data platforms supporting mission-critical regulatory, monetary policy, and financial supervision workloads, prioritizing simplicity, reliability, and measurable end-user outcomes.
Acted as a trusted technical owner for complex initiatives spanning cloud, on-prem, and hybrid environments, taking end-to-end accountability from architecture and design through implementation, risk review, and production readiness.
FREELANCE (Semantic Search Engine) Jul 2024 – Oct 2024
Designed and implemented a full-stack semantic search engine leveraging RAG, built with a React frontend, Python backend, and Qdrant vector database, enabling intelligent search across social platforms (Facebook, LinkedIn) and e-commerce sites.
LONDON STOCK EXCHANGE GROUP (LSEG) Sept 2021 – July 2024 Sr. Manager Research, Snowflake Data Platform Engineer
Advanced ML & Macro-Correlation: Architected a predictive ML framework to determine the correlation between TBA (To Be Announced) Securities and macroeconomic drivers, including Non-Farm Payroll (NFP), Consumer Price Index (CPI), and consumer indices.
Sentiment & Signal Extraction: Leveraged Snowflake Cortex and Python to perform NLP on market news and speculation data, identifying leading indicators of fixed-income asset price fluctuations.
Data Vault 2.0 & Dimensional Modeling: Engineered a highly scalable and auditable data architecture using Data Vault 2.0 for raw storage and Star Schema/Dimensional modeling for analytics; utilized Lucidchart for complex architectural mapping and stakeholder visualization.
TBA Securities POC: Successfully executed a Proof of Concept (POC) in Snowflake to map and visualize the relationship between real-time market data and mortgage-backed securities (TBAs), enhancing transparency for quant research teams.
Federated Snowflake & Data Sharing: Designed a federated Snowflake environment across global business units, implementing Snowflake Data Sharing (Private Shares and Data Exchange) to enable real-time, zero-copy data access.
Multi-Cloud Distribution: Orchestrated secure market data distribution to external hedge fund clients via Google Cloud Platform (GCP) and Snowflake’s cross-cloud capabilities.
dbt & AI Strategy: Built and maintained dbt data marts to serve as the high-quality data foundation for enterprise-wide Generative AI and Data Science R&D.
AWS Bedrock & ML Testing: Orchestrated the evaluation and testing of various Foundation Models (FMs) using AWS Bedrock, establishing benchmarks for accuracy, latency, and cost-efficiency to determine the optimal models for financial research applications.
GenAI Chatbot Development: Architected and deployed an enterprise-grade Generative AI Chatbot leveraging RAG (Retrieval-Augmented Generation) to provide real-time, context-aware responses to complex market data queries.
Cross-Cloud Infrastructure Migration: Orchestrated the strategic migration of enterprise cloud resources from AWS to Microsoft Azure, ensuring zero-downtime transition of mission-critical data workloads and research applications.
Multi-Cloud Architecture: Re-engineered data pipelines and storage layers to leverage Azure Synapse and Azure Data Lake Storage (ADLS), optimizing for performance-critical financial research workloads while maintaining interoperability with AWS legacy systems.
Infrastructure as Code (IaC) Refactoring: Utilized Terraform to refactor and redeploy cloud infrastructure, ensuring consistent security postures, IAM role mapping, and networking configurations across the hybrid-cloud environment.
Security & Compliance Alignment: Managed the migration of sensitive financial datasets ensuring strict adherence to international data residency and sovereignty requirements during the transition between cloud providers.
RAG Pipeline Design: Designed and maintained sophisticated RAG pipelines grounded in massive financial and fixed-income datasets; achieved significant hallucination reduction by implementing advanced grounding techniques and citation-based verification.
LLM Accuracy & Failure-Mode Handling: Owned the accuracy and failure-mode handling of LLM outputs used for high-stakes financial analysis, ensuring models adhered to strict quantitative constraints.
Macro-Correlation ML Engine: Developed ML-driven backends to quantify relationships between TBA securities and macroeconomic data, utilizing Python-based microservices for real-time signal processing.
Practiced courageous collaboration by working across product, engineering, data, risk, and client-facing teams, constructively challenging architectural decisions to deliver scalable, compliant, and client-centric data solutions.
Delivered at global scale by architecting and modernizing cloud-native data platforms supporting fixed income, market data, and analytics products used by institutional clients worldwide, with a strong focus on simplicity, performance, and reliability.
Acted as an owner and hands-on doer, taking end-to-end responsibility for architecture, data modeling, ingestion, transformation, and delivery across Snowflake, Databricks, and cloud platforms.
Enabled real client outcomes by translating complex market data and regulatory requirements into performant, consumable datasets and APIs, improving analyst productivity, data quality, and time-to-insight.
Fostered a culture of continuous learning and curiosity by mentoring global engineering teams, promoting modern data engineering practices (ELT-first pipelines, Data Mesh, Data Vault 2.0), and encouraging experimentation with AI and advanced analytics.
TD AMERITRADE (Contracting) Nov 2017 – Aug 2021 Data Architect (AI/ML Architect / Data Engineer)
Dimensional Modeling with ERwin: Designed and maintained complex dimensional data models (Star and Snowflake schemas) using the ERwin tool to support high-performance analytics and business intelligence requirements.
Compliance Modernization: Led the transformation of compliance platforms by migrating legacy rule-based engines to a modern, machine learning-driven architecture.
Informatica MDM & Oracle Migration: Engineered a scalable compliance data warehouse using Informatica MDM and Oracle, leading the migration of multi-terabyte legacy environments to a Data Lake using Medallion Architecture (Bronze, Silver, Gold layers) on AWS.
Asynchronous Messaging & Streaming: Implemented RabbitMQ to decouple microservices and integrated with Spark Structured Streaming to process trading events for near real-time risk detection.
Predictive Analytics: Developed and deployed advanced ML models, including Random Forest, SVM, and Deep Learning (CNNs/RNNs), to identify money laundering patterns and assess financial risk.
Salesforce Data Architecture: Designed Salesforce data models and built pipelines between Salesforce, Snowflake, and on-premises systems using REST APIs and MuleSoft to enable real-time synchronization of client data.
Demonstrated courageous collaboration by partnering with trading, risk, compliance, product, and engineering teams, constructively challenging designs to deliver secure, high-performance platforms for retail and institutional investors.
Delivered at enterprise scale by architecting and modernizing data and analytics platforms supporting real-time and batch trading, market data, and client reporting workloads with high availability and low latency requirements.
Acted as an owner and hands-on doer, leading architecture, design, and implementation across cloud and on-prem environments, while taking accountability for reliability, security, and regulatory compliance.
Enabled client-centric outcomes by transforming complex market, order, and portfolio data into trusted, consumable datasets and APIs that improved trader experience, advisor productivity, and decision-making speed.
UBS (Contracting) Oct 2015 – Oct 2017 Solution Architect (CCAR)
Developed complex data extraction and transformation processes for CCAR (Comprehensive Capital Analysis and Review) using Informatica.
Established Data Quality (IDQ) frameworks and dimensional modeling to meet stringent regulatory reporting requirements.
Delivered enterprise-scale data and analytics solutions for wealth and investment banking stakeholders by partnering with front-office, risk, and compliance teams to design secure, high-performance platforms aligned with regulatory and business requirements.
Acted as a hands-on owner for complex data initiatives, leading architecture, data modeling, and integration efforts while ensuring strong governance, auditability, and operational resilience in a highly regulated financial environment.
BARCLAYS CAPITAL / BARCLAYS WEALTH (Contracting) Mar 2009 – Feb 2013 Informatica Architect / Senior Data Integrator
Designed end-to-end integration for asset management data and transfer statements.
Optimized ETL workflows to handle high-volume asset movement and holdings data with 100% reconciliation accuracy.
CREDIT SUISSE Aug 2002 – Mar 2009 Senior Programmer Analyst
Led data integration and analysis for the Prime Brokerage data set, ensuring high availability for hedge fund clients.
EDUCATION
MS in Computer Engineering – University of Bridgeport, CT
BS in Electronics and Communication – University of Mysore, India