
Senior Data Architect / Analytics Engineer with Cloud + Lakehouse

Location:
Jericho, NY, 11753
Salary:
$250K/Annual Base + (Benefits + Bonus); $115/Hour
Posted:
February 12, 2026


Resume:

Ravi Kittappa ********@*****.*** 516-***-****

SUMMARY

●30+ years of IT experience as a Data Architect/Analyst/Engineer/Admin

●Hands-on experience in analyzing, designing & implementing Data Lake, Data Warehouse & Lakehouse solutions

●Solid experience with Cloud (AWS, Azure, Snowflake, GCP); Hadoop, Netezza, Oracle & other RDBMSs

●Mastered data migration & mining with large structured & unstructured data sets, and predictive modeling

●Solid data integration experience (Informatica, Talend & building customized ETL/ELT processes)

●Hands-on experience with data cleansing/quality, performance tuning & troubleshooting techniques

●Sound knowledge of scripting, OLAP/BI tools, application-level DBA work & CI/CD deployment processes

●Knowledge & experience working with secure & sensitive data & systems; data visualization

●Excellent analytical, communication, mentoring & team-building skills; reliable; experienced with onsite & offshore delivery models

TECHNICAL SKILLS

Cloud: AWS, S3, EC2, ECR, Athena, Glue, EMR, Redshift, Spectrum, RDS, DynamoDB, Aurora, Delta Lake, SCT, DMS, ElastiCache, KMS, Lambda, SageMaker, CloudWatch, Glacier, IAM, Step Function, SNS, SQS, Snowflake, Stage, Share, Stream, SnowSQL, Spark/Python/Kafka Connector, GCP, Bigtable, BigQuery, Spanner, Dataproc, Dataflow/Stream, Pub/Sub, Looker, gcloud, Azure, Synapse Analytics, Azure Data Share, Databricks, ADF, Lake Storage/Analytics, Azure SQL DB, Cosmos, Purview, CLI/PowerShell, Docker, Airflow, Terraform, Dremio

Big Data: Hadoop (Hortonworks, Cloudera 6.3), Impala, Hive, Spark, PySpark, Hue, HBase, Python, Scala, Sqoop, Presto, Pig, Cassandra, Splunk, Ambari, Flink, Kafka, Ranger, DistCp, Phoenix, SAP Hana S4

Analytic Appliances: Netezza (NPS 7.2.x), NZSQL, Teradata, Vertica, SAS, Collibra, IBM SPSS

Databases: Oracle 12, PL/SQL, PostgreSQL, PL/pgSQL, MySQL, MariaDB, SQL Server, NoSQL, MongoDB, Redis

DB Utilities: Fluid Query, API, Export/Import, SQL Loader, Stats pack, DBMS_XPLAN, Tkprof, Starburst, Immuta

Data Modeling: Erwin Studio/Enterprise, draw.io, Lucidchart, Visio, TOAD, Aginity, DB Artisan, SQL*Navigator

ETL: Talend (Cloud/Big Data) 7.x/DQ, PowerCenter 9, customized ETL solutions, SSIS, Segment/Lytics/Alteryx

BI: Power BI, Tableau, Business Objects (BOXI), WebFocus, MicroStrategy, SSRS, QuickSight

OS, Language/GUI: Unix/Linux, Windows, macOS, .NET/ASP/VB/JavaScript, HTML, XML, Shell/Bash/Perl Script

Misc: DevOps tools, GitHub, PyCharm, IntelliJ IDEA, Control-M, AutoSys, Scrum, Confluence, Tivoli, Jira, Okera

EXPERIENCE

MetLife Insurance, Cary, NC - Remote. Data Architect Feb 24 – Dec 25

IPDS Project: The Integrated Pet Data Store (IPDS) project develops a data store for analytics by sourcing data from various systems of record (SOR), such as Pet Adobe Analytics, the Pet App Store (for downloads and ratings), and the Pet database itself, to support all current analytics initiatives related to Pet data, with a forward-thinking design that accommodates future digital initiatives with minor enhancements for the IDDS, DNO & GSDM (Group sales mart) projects.

●Helped to enhance existing processes for data visualization and recommended standards/best practices.

●Collaborated with developers, project management, and business owners on Pet DB analytics

●Used Azure Purview to maintain Data Assets/Catalog/Dictionary, data classification and lineage.

●Developed and enhanced data pipeline processes using Databricks, Python and Spark (a minimal sketch follows this section)

●Completed gap analysis between Table Categories (Level 1, Level 2, Level 3) and helped deploy changes.

●Recommended changes/improvements in architectural frameworks for the physical model across all three environments:

Raw Data (RDZ) --> Curated Data (CDZ) --> Data Delivery/BI (DDZ)

●Improved SQL query optimization methods in RDS, SQL Server and PostgreSQL, and data validation

●Used PostgreSQL and PL/pgSQL for data analysis and analytics

●Followed agile development (Scrum) and supported product owners through the migration process

●Supported multi-vendor program by working with Solution and Enterprise Architects

Environment: Azure, HDInsight, AKS, Synapse, SQL, ADLS/Share, Databricks, Lakehouse, Delta Lake, Purview, Spark, Python, Snowflake, RDS, SQL Server, Tableau, Bitbucket, Lucidchart.
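
Below is a minimal PySpark sketch of the zone-to-zone (RDZ -> CDZ -> DDZ) Delta Lake pipeline pattern referenced above; the paths, table names, and columns are illustrative assumptions, not the project's actual objects.

# Minimal PySpark sketch of a raw -> curated -> delivery (RDZ/CDZ/DDZ) Delta pipeline.
# Paths, table names, and columns are hypothetical placeholders, not project objects.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ipds_zone_pipeline").getOrCreate()

# RDZ: land source extracts as-is into Delta.
raw = spark.read.json("/mnt/rdz/pet_app_events/")          # hypothetical landing path
raw.write.format("delta").mode("append").save("/mnt/rdz/delta/pet_app_events")

# CDZ: cleanse, deduplicate, and conform types.
curated = (
    spark.read.format("delta").load("/mnt/rdz/delta/pet_app_events")
    .dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .filter(F.col("policy_id").isNotNull())
)
curated.write.format("delta").mode("overwrite").save("/mnt/cdz/delta/pet_app_events")

# DDZ: aggregate into a BI-facing mart table (assumes a "ddz" schema exists).
daily = (
    curated.groupBy(F.to_date("event_ts").alias("event_date"), "event_type")
    .agg(F.count("*").alias("event_count"))
)
daily.write.format("delta").mode("overwrite").saveAsTable("ddz.pet_app_daily_activity")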

Apollo Global Management, Inc., NYC, NY - Hybrid Snowflake/DBT Architect Mar 23 – Dec 23

Apollo is a leading provider of alternative asset management & retirement solutions, building stronger businesses through innovative capital solutions that can generate excess risk-adjusted returns & retirement income; it invests alongside clients & takes a disciplined, responsible approach to drive positive outcomes. Apollo invests across Credit, Equity & Real Assets ecosystems in a wide range of asset classes & services. CPS NextGen Project: to build & support Digital & Investment Services Tech Group activities by migrating legacy systems to NextGen, integrating the Master Data Management (MDM) project, and developing & maintaining a data warehouse for analytics, upstream apps & consumption.

●Built processes supporting data transformation, data structures, metadata, dependency & workload management

●Wrote SQL statements & performed schema design/dimensional data modeling in Snowflake/DBT and Snowpark/Python (a minimal sketch follows this section)

●Helped to enhance existing processes for data visualization & recommended standards/best practice

●Collaborated between developers, project management, & business/product owners for data Analytics

●Used Azure Purview to maintain Data Assets/Catalog/Dictionary, data classification & lineage

●Developed & enhanced scalable data pipeline process using Databricks, Python & Spark to integrate.

●Setting up & maintain tools such as Lambda, Airflow, RDS and ADF in Azure & Terraform; to bring from S4

●Implemented data pipeline using S3, Lake formation, Snow SQL, Snowflake for internal & external data files

●Completed gap analysis between Categories (Level 1, Level 2, Level 3) & helped implement data integration changes

●Recommended changes/improvements in architectural frameworks for physical models in all 3 categories:

Raw Data Zone (RDZ) => Curated Data Zone (CDZ) => Data Delivery Zone (DDZ)

●Improved SQL query optimization methods in Starburst & Snowflake, data validation & data analytics

●Followed agile development (Scrum) & supported product owners for data migration & Integration process

●Supported multi-vendor program by working with Solution/Enterprise Architects, IT/Network Infrastructure

●Architected and implemented data access controls for the enterprise data lake platform, including row-level security

Environment: Azure, ADF, ADO, HDInsight, AKS, Snowflake, Snowpark, SnowSQL, Databricks, Lakehouse, Delta Lake, Purview, DBT, AWS, RDS, Python, Spark, Tivoli, Lambda, Airflow, Terraform, Starburst, S4, Lakeformation
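
Below is a minimal Snowpark (Python) sketch of the dimensional-modeling pattern referenced above; the connection parameters, table, and column names are hypothetical assumptions, not Apollo's actual objects.

# Minimal Snowpark (Python) sketch of building a conformed dimension in Snowflake.
# Connection parameters, table, and column names are assumptions for illustration only.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, upper

connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "ANALYTICS_WH", "database": "CPS_NEXTGEN", "schema": "CURATED",
}
session = Session.builder.configs(connection_parameters).create()

# Stage table assumed to be loaded upstream (e.g., by ADF or Airflow).
stg = session.table("STG_COUNTERPARTY")

dim = (
    stg.select(
        col("COUNTERPARTY_ID"),
        upper(col("LEGAL_NAME")).alias("LEGAL_NAME"),
        col("COUNTRY_CODE"),
    )
    .drop_duplicates("COUNTERPARTY_ID")
)

# Overwrite the dimension; an incremental/merge strategy would normally go through dbt.
dim.write.mode("overwrite").save_as_table("MART.DIM_COUNTERPARTY")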

Oscar Health Insurance, NYC, NY - Remote. Data Architect Sep 22 – Feb 23

Oscar is a NYC-based health insurance company offering health insurance & health benefit group plans.

Project Airplane (PA): to create & maintain data feeds for eligibility & claims-related data for various vendors & all other business partners, & to migrate existing data/feeds from the legacy SNS to the Ontology environment to make use of the enhanced framework & improve data quality.

●Architected the high-volume data/feed migration & integration backend platform using agile methodology

●Defined current & future state and data flow diagrams using draw.io & Lucidchart

●Converted requirements/Epics into detail design for data feed teams using Canonical model

●Used the GCP platform to migrate data/feeds from legacy SNS to Bigtable & BigQuery (Ontology; NextGen)

●Implemented data feeds for publication using DBT, Dataproc, BigQuery (BQ), EDI/FHIR, YAML & Python (see the sketch after this section)

●Collaborated on database architecture reviews & modeled relational & dimensional data marts

●SME for data Integration & quality for feeds (both legacy & data migration) & performance tuning

●Designed & implemented tracking in PySpark, error handling mechanism & email notifications

●Helped to enhance existing processes for data visualization & recommended standards/best practices

●Followed SDLC process for implementation of existing configuration/new requirements for Payers.

●Made use of existing OMC (Outbound Mission Control) framework for Launchpad & Data Vault model

●Collaborated with onshore/offshore teams to complete data migration for providers, Medicare & Medicaid

Environment: GCP, Bigtable, BigQuery, Dataproc, YAML, Python 3.9, DBT, Bitbucket, draw.io, Lucidchart
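
Below is a minimal Python sketch of a config-driven BigQuery feed-publication step of the kind described above; the YAML layout, project/dataset/table names, and bucket are assumptions for illustration only.

# Minimal Python sketch of a config-driven feed publication step against BigQuery.
# The YAML layout, dataset/table names, and bucket are illustrative assumptions.
import yaml
from google.cloud import bigquery

with open("feeds/eligibility_feed.yaml") as f:       # hypothetical feed config
    cfg = yaml.safe_load(f)

client = bigquery.Client(project=cfg["project"])

# Materialize the vendor-facing extract (dbt would normally build this model).
query = f"""
    SELECT member_id, plan_id, eligibility_start, eligibility_end
    FROM `{cfg['project']}.{cfg['dataset']}.eligibility_current`
    WHERE vendor = @vendor
"""
job = client.query(
    query,
    job_config=bigquery.QueryJobConfig(
        destination=f"{cfg['project']}.{cfg['dataset']}.eligibility_feed_tmp",
        write_disposition="WRITE_TRUNCATE",
        query_parameters=[bigquery.ScalarQueryParameter("vendor", "STRING", cfg["vendor"])],
    ),
)
job.result()

# Export the feed to GCS for downstream vendor pickup (default format is CSV).
extract = client.extract_table(
    f"{cfg['project']}.{cfg['dataset']}.eligibility_feed_tmp",
    f"gs://{cfg['bucket']}/feeds/{cfg['vendor']}/eligibility-*.csv",
)
extract.result()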

Wawa, Inc. Wawa, PA - Remote. Snowflake & Azure Data Architect Jun 21 – Aug 22

Wawa, Inc. is a chain of convenience retail stores/gas stations located in 10+ states with 900 all-day, everyday stores with brand name packaged goods, healthy Ready-to-Go options, & a large variety of made-to-order recipes.

Snowflake Migration: to migrate high-volume data from Hadoop to the Cloud/AWS/Databricks/Snowflake platform & integrate third-party tools to support the EDW environment & the consumption layer for the data, BI & DS/ML teams

●Architected the high-volume data migration from Hadoop to the cloud platform (Cloudera to Snowflake & Snowpark)

●Collected requirements by interacting with technical & business coordinators for Data Lake/Warehouse

●Used Dremio in AWS as Query engine for faster Joins and complex queries over S3 bucket

●Involved in Snowflake performance tuning, capacity planning, and cloud spending & utilization reviews with the external vendor

●Converted requirements/user stories into detail design & Data Integration for delivery using Canonical model

●Created conceptual/logical/physical architecture diagrams for data ingestion (Azure, ADF, Databricks), data integration (ADF/SAP Hana, Lakeformation, S4/Kafka/MuleSoft, JSON/XML/text files, EDI/FHIR) & MDM integration (Segment/Lytics/Alteryx, RDS, Athena, Snowflake), using Snowpark & Python to build projects

●Designed/created Snowflake objects: Database/Schema/Share/Table/View and Dynamic Masking/Access policies (a minimal sketch of the masking pattern follows this section)

●Performed Snowflake data modeling & built pipelines/ELT using Snowflake SQL, implementing stored procedures

●Developed & enhanced scalable data pipelines to integrate various data using tools such as Glue, Airflow

●Collaborated with EA to review Data Vault model & recommend tools/application in ARB, ABB, SRS, SBB

●Created current/future state diagrams for data pipeline process, Azure/Databricks/Python for Tasks/Notebook

●Loaded data from files as soon as they became available in a stage and processed them using SnowPipe, SQS & Starburst

●Trained & deployed machine learning models using AWS SageMaker; used Terraform to manage infrastructure

●Implemented Analysis Services, HDInsight for analytical reporting solutions & ML/Data Science team

●Modeled & created relational & dimensional data marts in Snowflake for BI/data visualization team

●Mentored the team, coordinated with onshore/offshore & IT/Network Infrastructure teams, & planned the cut-over/go-live strategy

Environment: AWS, S3, Glue, SCT, DMS, EC2, RDS, Snowflake, SnowPipe, Snowpark, Athena, CloudWatch, Lakeformation, Terraform, Azure, Databricks, ADF, Synapse, Python 3.5, Vertica, Segment/Lytics/Alteryx, Cloudera, Hive, Power BI, Kafka, Dremio, MuleSoft, Collibra, Starburst, Airflow, SAP S4
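
Below is a minimal sketch of the Snowflake dynamic data masking pattern referenced above, issued through the Python connector; the role, table, and column names are hypothetical examples, not Wawa's actual objects.

# Minimal sketch (Python + snowflake-connector) of a dynamic data masking policy.
# Role, table, and column names are hypothetical examples for illustration only.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="EDW_WH", database="EDW", schema="CUSTOMER",
)
cur = conn.cursor()

# Mask email addresses for everyone except privileged analyst roles.
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('PII_ANALYST', 'SYSADMIN') THEN val
        ELSE '***MASKED***'
      END
""")

# Attach the policy to the column; queries from other roles now see masked values.
cur.execute("ALTER TABLE customer_profile MODIFY COLUMN email SET MASKING POLICY email_mask")

cur.close()
conn.close()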

HRSA- PRF Cloud Project, Remote. Snowflake Architect Dec 20 – May 21

Health Resources & Services Admin (agency of Dept of Health & Human Services)

Provider Relief Funds (PRF) support American families, workers, & heroic healthcare providers in the battle against the COVID-19 outbreak. HHS is distributing relief funds to hospitals & providers on the front lines of the coronavirus. HRSA is in charge of distributing the Provider Relief Fund to Medicare facilities & providers impacted by COVID-19.

●Assisted HRSA in building & enhancing EHB COVID DataMart Services

●Involved in the data quality & integration project around HRSA data from providers (Sogeti)

●Provided technical expertise in the analysis, planning & implementation of solutions around Snowflake

●Involved in design & architecture using Erwin; reviewed & recommended standards/procedures/best practices

●Designed & created data marts in Snowflake to support existing data feeds from third-party plan providers

●Experienced in developing stored procedures/writing queries to analyze & transform data

●Migrated high-volume data from on-prem RDBMS (SQL Server) to the cloud Snowflake platform

●Utilized Snowflake utilities like SnowSQL, stages (internal/external) & big data modeling techniques (a minimal sketch of the staged-load pattern follows this section)

●Used Dremio in AWS as a query engine for faster joins and complex queries over AWS S3 buckets

●Designed & implemented tracking, error handling mechanism & email notifications

●Helped to enhance existing processes for data quality & visualization-BI & Executive dashboards

●Mentored & groomed other team members on Cloud, SnowFlake technologies & best practices

●Implemented data pipeline for data consumption layer for data using FHIR/EDI, BI & Analytics team

●Practiced Agile methodologies & frameworks such as Scrum; implemented HIPAA standards

Environment: AWS, DMS, Dremio, S3, Snowflake, SQL Server, SSIS, MicroStrategy, Tableau, Erwin, LINUX
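
Below is a minimal sketch of the internal-stage load pattern (PUT + COPY INTO) referenced above, issued through the Python connector; the stage, table, and file names are assumptions for illustration only.

# Minimal sketch of loading extracted SQL Server data into Snowflake through an
# internal stage (PUT + COPY INTO); stage, table, and file names are assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="PRF_WH", database="COVID_DM", schema="STAGING",
)
cur = conn.cursor()

# One-time setup: an internal named stage and a CSV file format.
cur.execute("CREATE FILE FORMAT IF NOT EXISTS csv_fmt TYPE = CSV SKIP_HEADER = 1")
cur.execute("CREATE STAGE IF NOT EXISTS prf_stage FILE_FORMAT = csv_fmt")

# Upload the extract produced from SQL Server (e.g., via bcp) and load it.
cur.execute("PUT file:///data/exports/provider_payments.csv @prf_stage AUTO_COMPRESS=TRUE")
cur.execute("""
    COPY INTO provider_payments
    FROM @prf_stage/provider_payments.csv.gz
    FILE_FORMAT = (FORMAT_NAME = 'csv_fmt')
    ON_ERROR = 'ABORT_STATEMENT'
""")

cur.close()
conn.close()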

Cigna Health Insurance, NYC, NY. Big Data/Cloud Consultant Mar 20 – Dec 20

Cigna is a CT-based health & life insurance company - health insurance & health benefit group plans covered.

Performance Data Engineering (PDE): to maintain claims-related data for various environments & all business partners; PHI data masking using third-party & in-house de-identification algorithms for downstream apps & other groups.

●Architected the high-volume data migration & integration backend platform using agile methodology

●Created Impala tables & Hive views in EMR to load & process large volume of data from various sources

●Involved in metadata management, master data management & maintaining master/lookup data

●Groomed user stories by reviewing them with business stakeholders & updating them; designed/modeled using Erwin

●Implemented data pipeline using MuleSoft, Lakeformation, S3, Glue in AWS Redshift, Snowflake, SnowSQL

●Used AWS services like Lakeformation, S3, Glue, Athena, DynamoDB & Aurora to build & maintain the DataMart

●Loaded data from files in a stage using SnowPipe; used EDI/FHIR & SnowStream to implement SCD Type 2

●Eliminated a third-party masking tool by using hashing techniques in Python/Spark for HIPAA requirements/apps/users (see the sketch after this section)

●Designed & created data pipeline process using Python & Databricks in Azure for high volume data

●Used Azure Purview to maintain Data Governance/Assets/Catalog/Dictionary, data classification & lineage.

●SME for data quality & performance tuning; Leveraged Sqoop & DistCp for data migration & Presto

●Designed & implemented tracking using PySpark, Flink, error handling mechanism & email notifications

●Collaborated on database architecture reviews & recommended standards/procedures/best practices for PostgreSQL & PL/SQL

●Helped to enhance existing processes for data quality, DevOps, APIs, QuickSight for data visualization

Environment: Cloudera 6, Impala, Hive, Beeline, PySpark, Sqoop, Hue, AWS, S3, Glue, Athena, ElastiCache, Redshift, PostgreSQL, Aurora, DynamoDB, KMS, SnowFlake, SnowPipe, SnowStream, Teradata, Airflow, Lakeformation, Oracle, Toad, Dremio, Cassandra, QuickSight, Flink, Azure, Databricks, ADF, Purview, DevOps
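
Below is a minimal PySpark sketch of the in-house de-identification approach described above (salted SHA-256 hashing of direct identifiers before data is shared downstream); the column names and salt-handling approach are illustrative assumptions.

# Minimal PySpark sketch of salted SHA-256 hashing for PHI de-identification.
# Column names, paths, and the secret-handling approach are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("phi_deidentification").getOrCreate()
salt = spark.conf.get("spark.phi.salt", "rotate-me")   # supplied securely in practice

claims = spark.read.parquet("s3://claims-curated/medical_claims/")   # hypothetical path

deidentified = (
    claims
    .withColumn("member_key", F.sha2(F.concat(F.col("member_id").cast("string"), F.lit(salt)), 256))
    .withColumn("ssn_key", F.sha2(F.concat(F.col("ssn").cast("string"), F.lit(salt)), 256))
    .drop("member_id", "ssn", "member_name", "dob")     # drop raw PHI columns
)

deidentified.write.mode("overwrite").parquet("s3://claims-deidentified/medical_claims/")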

Coach (Tapestry Inc), NYC, NY. Big Data/Cloud Architect Mar 19 – Feb 20

Tapestry is NYC based house of modern luxury lifestyle brands including Coach, Kate Spade & Stuart Weitzman.

A360 (Article/Products): to maintain the article master, pricing, cost, etc. from various vendors for all internal departments.

C360 (Customer/Transactions): to maintain the customer master & related data, transactions, payments, etc. from worldwide stores.

●Architected high-speed, high-volume data processing for internal & external clients & apps

●Consolidated data pipeline process & implemented tracking, error handling mechanism, email notification

●Implemented data pipeline solution for transaction data using S3, Lakeformation & Snowflake/SnowSQL

●Used MuleSoft and AWS services like S3, Athena, Glue, DynamoDB, Aurora & QuickSight for internal teams to consume data, & SageMaker models for Product owners/Business Analytics team

●Trained & Deployed Machine Learning models in Production using AWS Sagemaker

●Created Hive/Impala tables/views to load & process large dataset from various sources, Flink, Kafka, JSON

●Leveraged Sqoop to migrate data between Netezza & Hive/HBase via automation; used PostgreSQL for data analysis

●Involved in metadata management, master data management & maintaining master/lookup data

●Strengthened data publishing process using RDS for other groups like BI (Tableau) & SAP Hana S4, APIs

●Involved in performance tuning, data quality & the Customer Capture process for associates' incentives

●Involved in database architecture reviews, created dimensional models & recommended standards/procedures

Environment: AWS, S3, Glue, Athena, RDS, EMR, DMS, Lakeformation, DynamoDB, Aurora, IAM, Snowflake, RDS, Hadoop, Hive, HBase, Spark, Python 3, PySpark, Netezza, PostgreSQL, Azure, ADF, ADLS, Databricks, Perl/Bash/Shell Script, Hana S4, CloudFormation, Tableau, SageMaker, MuleSoft, Erwin, Visio

AlticeUSA (Cablevision), Long Island, NY. Data Architect (Big Data/Cloud) Sep 12 – Feb 19

NOC: ODM & ODS (Operational Data Mart & Store): collection of data from CMTS, modems, digital devices & Remedy tickets, processed to build a Data Store which in turn is used to build various Data Marts used by the NOC, BI tools for executive reporting & dashboards, various internal groups & external vendors/agencies (for both Optimum & Suddenlink brands)

●Served as a Data Engineer in all aspects of architecture, design & development, and capacity planning

●Built Data Lakes and Data Marts with large structured/unstructured data sets

●Data Mining, Data Validation, Predictive Modeling, Data Visualization under agile methodology using Sqoop, Kafka, MySQL, MongoDB in combination with Python & Shell scripting; MATLAB & other libraries

●Designed & Developed both stateless (to get status of a modem) & stateful transformations using Flink and JavaScript for near real time & batch processing; & worked with IT/Network Infrastructure teams

●Implemented data pipeline for Node health data using PostgreSQL, Glue & S3 in AWS Redshift Spectrum

●Used AWS S3 & DynamoDB to publish aggregated data to other groups/vendors using Apache Airflow (see the sketch after this section)

●Created Hive/HBase tables to load large set of data from RDBMS, Kafka, NoSQL, JSON, Parquet, ORC files

●Developed multi-million rows data pipeline process using PySpark, HiveQL, Scala, Spark, Presto & Redis

●Architected for high-speed, high-volume, near-real-time data; metadata & master data management

●DBA role for Netezza - Groom & Stats maintenance in a high-availability, zero-downtime, mission-critical platform environment

●Used the SPSS package for statistical analysis of customer satisfaction data

●Created ETL jobs/JobLets using Talend & TAC; Change Data Capture (CDC) process & liaison for BI

●Developed Stored Procedures, Views for ETL/ELT process & publishing data for other groups

Environment: Netezza, NZPLSQL, PostgreSQL, NZAdmin, Hadoop, Hive, HBase, Ambari, Ranger, Sqoop, Presto, Spark, Python, PySpark, MATLAB, Flink, Kafka, Airflow, Erwin, AWS, S3, EMR, Glue, Redshift, DynamoDB, IAM, Bash/Shell, Splunk, WebFocus, Tableau, Talend, Oracle, PL/SQL, MySQL, JavaScript, Unix
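
Below is a minimal Airflow sketch of the S3 publication step described above; the bucket, file path, and schedule are assumptions, and the upstream Spark/Hive aggregation that produces the extract is omitted.

# Minimal Airflow sketch: publish an aggregated extract to S3 for external consumers.
# Bucket, path, and schedule are hypothetical assumptions for illustration only.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def publish_node_health_extract(**context):
    # In the real pipeline the extract is produced by upstream Spark/Hive jobs;
    # here we simply upload the resulting file to the shared S3 location.
    s3 = boto3.client("s3")
    s3.upload_file(
        "/data/exports/node_health_daily.csv",
        "ods-published-extracts",                      # hypothetical bucket
        f"node_health/{context['ds']}/node_health_daily.csv",
    )


with DAG(
    dag_id="publish_node_health",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="publish_to_s3",
        python_callable=publish_node_health_extract,
    )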

S & P (Standard & Poor's) Capital IQ, NYC, NY. Data warehouse/ETL Architect Sep 11 – Aug 12

Data Integration to GCP (Global Credit Portal) - a web-based solution that brings together both fundamental & market analysis

Intex Data Integration - to integrate Intex's data (a structured financial securities data provider) into IPD (Integrated Product Data) for RMBS (Residential Mortgage-Backed Securities), one of 7 security types (Pools, Deals, Loans & Groups)

Xtrakter Data Integration - to integrate daily trade data for mark-to-market & price calculations, bid & offer quotes & traded prices for 6,500 fixed-income instruments & security reference data for 600,000 instruments.

●Collected requirements, designed/created logical/physical relational model (Star & Snowflake schema)

●Participated in DB architecture reviews & created DB objects using ERwin, PL/SQL packages & stored procs

●Created design & mapping document, Informatica mappings using Filter, Aggregator, Normalizer, etc

●Created Mapplets/Sessions/Workflows & automated shell script to execute WorkFlows; AutoSys

●Performed tuning & optimization, data validation, documented & eliminated gaps

●Coordinated with onshore & offshore development & QA teams

Environment: Informatica 9.0.1, Oracle 11g2, PL/SQL, TOAD for Oracle 10.6, Erwin, Visio Data Modeler, OLAP, SharePoint, MicroStrategy, VersionOne, CVS, TortoiseCVS, WinSCP/PuTTy & Windows 7/Unix

Bloomberg, NYC, NY. ETL/Data Integration Consultant Sep 10 - Aug 11

Bloomberg Government (BGov) is a comprehensive, subscription-based online tool that collects best-in-class data, provides high-end analysis & analytic tools, & delivers deep, reliable, timely & unbiased data reporting to help congressional staffers, lobbyists, corporate headquarters, sales executives & foreign government agencies.

●Implemented ETL to integrate Govt spend data for opportunities/awards/contracts/Grants

●Mastered data cleansing/profiling/stewardship & loaded conformed dimensional data for facts & measures

●Improved performance using Parallel processing, database partitioning; maintained Admin Console

Environment: Oracle 11g, Oracle AWM, PL/SQL, TOAD, MySQL 5.5, MySQL WorkBench, Talend Open Studio 5.1/Integration Suite 5.1, ERWin Data Modeler, SVN(Subversion), ToolKit/PuTTy, Windows, Unix/Linux

Dept. of Treasury/Fannie Mae, Herndon, VA. ETL/Tech Lead Mar 09 - Aug 10

Home Affordable Modification Program (HAMP) is part of the Home Affordability & Stability Plan (HASP), the Treasury's comprehensive strategy to get the economy back on track by helping families modify their mortgages to avoid foreclosure. Fannie Mae, as Program Administrator, handles data/payments/reporting. To support the Treasury's analysis & decision making, designed & developed the Data Warehouse & Data Mart for reporting and the ETL process.

Environment: Informatica PowerCenter 8.6, Oracle 11g, PL/SQL, ERStudio/ERWin 7.3, AutoSys, DOORS 8.1, BMC Remedy 7.1, ClearQuest/ClearCase, OLAP, BOXI, PuTTy, Unix/Windows

Up to Feb 09, worked on various projects/clients using ETL/GUI tools, RDBMS, SQL, scripts, version control & monitoring tools:

New York City Transit (MTA), NYC, NY Tracking System (PSRTS) & I-Vault (IDM solution for portal)

MultiPlan, Inc, NYC, NY Supplier of network-based cost mgmt solutions for healthcare providers

Estee Lauder, Melville, NY Global Manufacturing System (GMS): PP, forecasting, procurement, cost & inventory. Demonstration (DeMo): Retail DSS; DW to sales force & mgmt to track sales/personal.

JPMC: Chase Auto Finance, Garden City, NY iCAF - Lease/Loan accounts, Collections &Title Mgmt

Aventis pharmaceuticals, Bridgewater, NJ ISARIS (Institutional Sales&Reporting System): IMS Data

Kraft/Nabisco Inc, Parsippany, NJ Food Service Portal & Integration for Sales Dept.

Camp systems intl., LI, NY AviSource: Computerized Aircraft Maintenance System

NY State insurance fund, NYC, NY Claims Management System; Customer Management System

A. C. Nielsen, Chicago, IL NITE: Converted ratings data from multi-dimensional database to various clients

Other Projects: Production Planning Control & Sales Order Processing, Budget Monitoring(BMS), Inventory & Receivable, Complaints Management & Directory enquiry & Material Requirement Planning (MRP)

EDUCATION: Master of Science in Computer Science


