Ravi Kittappa ********@*****.*** 516-***-****
SUMMARY
●30+ years of IT experience as Data Architect/Analyst/Engineer/Admin
●Hands-on experience in analysis, design & implementation of Data Lake/Warehouse & Lakehouse solutions
●Solid experience with Cloud (AWS, Azure, Snowflake, GCP); Hadoop, Netezza, Oracle & other RDBMS
●Mastered data migration & mining with large structured & unstructured data sets; predictive modeling
●Solid data integration experience (Informatica, Talend & building customized ETL/ELT processes)
●Hands-on experience with data cleansing/quality, performance tuning & troubleshooting techniques
●Sound knowledge of scripting, OLAP/BI tools, application-level DBA & CI/CD deployment processes
●Knowledge & experience working with secure & sensitive data & systems; data visualization
●Excellent analytical, communication, mentoring & team-building skills; reliable; onsite & offshore model
TECHNICAL SKILLS
Cloud: AWS, S3, EC2, ECR, Athena, Glue, EMR, Redshift, Spectrum, RDS, DynamoDB, Aurora, Delta Lake, SCT, DMS, ElastiCache, KMS, Lambda, SageMaker, CloudWatch, Glacier, IAM, Step Functions, SNS, SQS, Snowflake, Stage, Share, Stream, SnowSQL, Spark/Python/Kafka Connector, GCP, Bigtable, BigQuery, Spanner, Dataproc, Dataflow/Stream, Pub/Sub, Looker, gcloud, Azure, Synapse Analytics, Azure Data Share, Databricks, ADF, Lake Storage/Analytics, Azure SQL DB, Cosmos, Purview, CLI/PowerShell, Docker, Airflow, Terraform, Dremio
Big Data: Hadoop (Hortonworks, Cloudera 6.3), Impala, Hive, Spark, PySpark, Hue, HBase, Python, Scala, Sqoop,
Presto, Pig, Cassandra, Splunk, Ambari, Flink, Kafka, Ranger, DistCp, Phoenix, SAP Hana S4
Analytic Appliance: (NPS 7.2.x) Netezza, NZSQL, Teradata, Vertica, SAS, Collibra, IBM SPSS
Databases: Oracle 12, PL/SQL, PostgreSQL, PL/pgSQL, MySQL, MariaDB, SQL Server, NoSQL, MongoDB, Redis
DB Utilities: Fluid Query, API, Export/Import, SQL Loader, Stats pack, DBMS_XPLAN, Tkprof, Starburst, Immuta
Data Modeling: Erwin Studio/Enterprise, draw.io, Lucidchart, Visio, TOAD, Aginity, DB Artisan, SQL*Navigator
ETL: Talend (Cloud/Big Data) 7.x/DQ, PowerCenter 9, Customized ETL Solution, SSIS, Segment/Lytics/Alteryx
BI: Power BI, Tableau, Business Objects (BOXI), WebFocus, MicroStrategy, SSRS, QuickSight
OS, Language/GUI: Unix/Linux, Windows, macOS, .NET/ASP/VB/JavaScript, HTML, XML, Shell/Bash/Perl Script
Misc: DevOps tools, GitHub, Pycharm, IntelliJ IDEA, Control-M, AutoSys, Scrum, Confluence, Tivoli, Jira, Okera
EXPERIENCE
MetLife Insurance, Cary, NC - Remote Data Architect Feb 24 – Dec 25
IPDS Project: The Integrated Pet Data Store (IPDS) project develops a data store for analytics by sourcing data from various systems of record (SOR), such as Pet Adobe Analytics, the Pet App Store (for downloads and ratings), and the Pet database itself, to support all current analytics initiatives related to Pet data, with a forward-looking design that accommodates future digital initiatives with minor enhancements for the IDDS, DNO & GSDM (Group Sales mart) projects.
●Helped to enhance existing processes for data visualization and recommended standards/best practice.
●Collaborated between developers, project management, and business owners for Pet DB – Analytics
●Used Azure Purview to maintain Data Assets/Catalog/Dictionary, data classification and lineage.
●Developed and enhanced the data pipeline process using Databricks, Python and Spark (see sketch below)
●Completed gap analysis between Table Categories (Level 1, Level 2, Level 3) and helped deploy changes.
●Recommended changes/improvements in architectural frameworks for the physical model in all 3 environments:
Raw Data (RDZ) --> Curated Data (CDZ) --> Data Delivery/BI (DDZ)
●Improved SQL query optimization methods in RDS, SQL Server & PostgreSQL, and performed data validation
●Used PostgreSQL, PL/pgSQL for Data Analysis and Analytics
●Followed agile development (Scrum) and supported product owners for migration process
●Supported multi-vendor program by working with Solution and Enterprise Architects
Environment: Azure, HDInsight, AKS, Synapse, SQL, ADLS/Share, Databricks, Lakehouse, Delta Lake, Purview, Spark, Python, Snowflake, RDS, SQL Server, Tableau, Bitbucket, Lucidchart.
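Illustrative sketch (not from the original engagement): a minimal PySpark/Delta example of the RDZ-to-CDZ promotion referenced above, assuming a Databricks/Delta Lake runtime; storage paths, column names and quality rules are hypothetical placeholders.
    # Promote raw-zone JSON events to a curated Delta table with light cleansing.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rdz_to_cdz").getOrCreate()

    raw_df = spark.read.json("abfss://rdz@examplelake.dfs.core.windows.net/pet_events/")

    curated_df = (raw_df
        .dropDuplicates(["event_id"])                  # basic de-duplication
        .withColumn("load_date", F.current_date())     # audit column
        .filter(F.col("event_type").isNotNull()))      # simple quality rule

    (curated_df.write
        .format("delta")
        .mode("overwrite")
        .save("abfss://cdz@examplelake.dfs.core.windows.net/pet_events/"))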
Apollo Global Management, Inc., NYC, NY - Hybrid Snowflake/DBT Architect Mar 23 – Dec 23
Apollo is a leading provider of alternative asset management & retirement solutions, building stronger businesses through innovative capital solutions that can generate excess risk-adjusted returns & retirement income; it invests alongside clients & takes a disciplined, responsible approach to drive positive outcomes, investing across Credit, Equity & Real Assets ecosystems in a wide range of asset classes & services. CPS NextGen Project: to build & support Digital & Investment Services Tech Group activities by migrating legacy systems to NextGen, integrating the Master Data Management (MDM) project, and developing & maintaining the data warehouse for analytics, upstream applications & consumption.
●Built processes supporting data transformation, data structures, metadata, dependency & workload management
●Wrote SQL statements & performed schema design/dimensional data modeling in Snowflake/DBT & Snowpark/Python (see Snowpark sketch below)
●Helped to enhance existing processes for data visualization & recommended standards/best practice
●Collaborated between developers, project management, & business/product owners for data Analytics
●Used Azure Purview to maintain Data Assets/Catalog/Dictionary, data classification & lineage
●Developed & enhanced scalable data pipeline processes using Databricks, Python & Spark for integration
●Set up & maintained tools such as Lambda, Airflow, RDS & ADF using Azure & Terraform to bring data from S4
●Implemented data pipelines using S3, Lake Formation, SnowSQL & Snowflake for internal & external data files
●Completed gap analysis between Categories (Level 1, Level 2, Level 3) & helped implement Data Integration changes
●Recommended changes/improvements in architectural frameworks for physical models in all 3 categories:
Raw Data Zone (RDZ) => Curated Data Zone (CDZ) => Data Delivery Zone (DDZ)
●Improved SQL query optimization methods in Starburst & Snowflake; data validation & data analytics
●Followed agile development (Scrum) & supported product owners for data migration & Integration process
●Supported multi-vendor program by working with Solution/Enterprise Architects, IT/Network Infrastructure
●Architected and implemented data access controls for the enterprise data lake platform to provide row-level security
Environment: Azure, ADF, ADO, HDInsight, AKS, Snowflake, Snowpark, SnowSQL, Databricks, Lakehouse, Delta Lake, Purview, DBT, AWS, RDS, Python, Spark, Tivoli, Lambda, Airflow, Terraform, Starburst, S4, Lake Formation
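Illustrative Snowpark sketch (account, table and column names are hypothetical placeholders): reading a raw table, applying a simple transformation, and persisting a curated table, in the spirit of the Snowflake/Snowpark work described above.
    # Connect, transform and persist with the Snowpark Python API.
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col

    session = Session.builder.configs({
        "account": "<account>", "user": "<user>", "password": "<password>",
        "warehouse": "ANALYTICS_WH", "database": "CPS_DB", "schema": "RDZ",
    }).create()

    positions = session.table("RAW_POSITIONS")

    curated = (positions
               .filter(col("ASSET_CLASS").is_not_null())   # drop rows missing a key attribute
               .drop_duplicates("POSITION_ID"))            # keep one row per position

    curated.write.mode("overwrite").save_as_table("CDZ.POSITIONS")
    session.close()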
Oscar Health Insurance, NYC, NY - Remote. Data Architect Sep 22 – Feb 23
Oscar is an NYC-based health insurance company offering health insurance & health benefit group plans.
Project Airplane (PA): to create & maintain data feeds for eligibility & claims-related data for various vendors & all other business partners, & to migrate existing data/feeds from the legacy SNS to Ontology environments to make use of the enhanced framework & improve data quality.
●Architected the high-volume data/feed migration & integration backend platform using agile methodology
●Defined current & future state, data flow diagrams using draw.io, Lucidchart
●Converted requirements/Epics into detail design for data feed teams using Canonical model
●Used the GCP platform to migrate data/feeds from legacy SNS to Bigtable & BigQuery (Ontology; NextGen)
●Implemented data feed publication using DBT, Dataproc, BigQuery (BQ), EDI/FHIR, YAML & Python (see BigQuery sketch below)
●Collaborated on database architecture reviews & modeled relational & dimensional data marts
●SME for data Integration & quality for feeds (both legacy & data migration) & performance tuning
●Designed & implemented tracking in PySpark, error handling mechanism & email notifications
●Helped to enhance existing processes for data visualization & recommended standards/best practices
●Followed SDLC process for implementation of existing configuration/new requirements for Payers.
●Made use of existing OMC (Outbound Mission Control) framework for Launchpad & Data Vault model
●Collaborated with onshore/offshore teams to complete data migration for providers, Medicare & Medicaid
Environment: GCP, Bigtable, BigQuery, Dataproc, YAML, Python 3.9, DBT, Bitbucket, draw.io, Lucidchart
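Illustrative sketch (project, dataset and column names are hypothetical): the kind of BigQuery query used to publish an eligibility feed, via the google-cloud-bigquery Python client.
    # Run a publication query against BigQuery and iterate the results.
    from google.cloud import bigquery

    client = bigquery.Client(project="pa-analytics-example")

    sql = """
        SELECT member_id, plan_code, eligibility_start, eligibility_end
        FROM `pa-analytics-example.ontology.eligibility`
        WHERE eligibility_end >= CURRENT_DATE()
    """

    rows = client.query(sql).result()   # submits the job and waits for completion
    for row in rows:
        print(row.member_id, row.plan_code)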
Wawa, Inc. Wawa, PA - Remote. Snowflake & Azure Data Architect Jun 21 – Aug 22
Wawa, Inc. is a chain of convenience retail stores/gas stations located in 10+ states with 900 all-day, everyday stores with brand name packaged goods, healthy Ready-to-Go options, & a large variety of made-to-order recipes.
Snowflake Migration: to migrate high volume data from Hadoop to Cloud/AWS/Databricks/Snowflake platform & integrate 3rd party tools to support EDW environment & consumption layer for data team, BI & DS/ML
●Architected high-volume data migration from Hadoop to a cloud platform (Cloudera to Snowflake & Snowpark)
●Collected requirements by interacting with technical & business coordinators for Data Lake/Warehouse
●Used Dremio in AWS as Query engine for faster Joins and complex queries over S3 bucket
●Involved in Snowflake performance tuning, capacity planning, & cloud spending/utilization reviews with the external vendor
●Converted requirements/user stories into detail design & Data Integration for delivery using Canonical model
●Created conceptual/logical/physical architecture diagrams for Data Ingestion (Azure, ADF, Databricks), Data Integration (ADF/SAP Hana, Lake Formation, S4/Kafka/MuleSoft/JSON/XML/Text Files-EDI/FHIR) & MDM integration with Segment/Lytics/Alteryx – RDS, Athena, Snowflake; Snowpark & Python to build projects
●Designed/created Snowflake objects: Database/Schema/Share/Table/View; Dynamic Masking/Access policies.
●Performed Snowflake data modeling & built pipelines/ELT using Snowflake SQL, implementing stored procedures
●Developed & enhanced scalable data pipelines to integrate various data sources using tools such as Glue & Airflow (see Airflow sketch below)
●Collaborated with EA to review Data Vault model & recommend tools/application in ARB, ABB, SRS, SBB
●Created current/future state diagrams for data pipeline process, Azure/Databricks/Python for Tasks/Notebook
●Loaded data from files as soon as they arrived in a stage & processed them using Snowpipe, SQS & Starburst
●Trained & deployed Machine Learning models using AWS SageMaker; used Terraform to manage infrastructure
●Implemented Analysis Services, HDInsight for analytical reporting solutions & ML/Data Science team
●Modeled & created relational & dimensional data marts in Snowflake for BI/data visualization team
●Mentored the team, coordinated with onshore/offshore & IT/Network Infrastructure teams, & planned the cut-over/go-live strategy
Environment: AWS, S3, Glue, SCT, DMS, EC2, RDS, Snowflake, Snowpipe, Snowpark, Athena, CloudWatch, Lake Formation, Terraform, Azure, Databricks, ADF, Synapse, Python 3.5, Vertica, Segment/Lytics/Alteryx, Cloudera, Hive, Power BI, Kafka, Dremio, MuleSoft, Collibra, Starburst, Airflow, SAP S4
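Illustrative Airflow sketch (DAG, task and callable names are hypothetical and the task bodies are stubbed): a daily extract-then-load orchestration of the kind used for the Hadoop-to-Snowflake migration above.
    # A two-task daily pipeline: stage data from the legacy platform, then load Snowflake.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_from_hadoop(**_):
        pass  # e.g. pull the day's partition and stage it to S3

    def load_to_snowflake(**_):
        pass  # e.g. COPY INTO the target table from the external stage

    with DAG(
        dag_id="edw_daily_load",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_from_hadoop",
                                 python_callable=extract_from_hadoop)
        load = PythonOperator(task_id="load_to_snowflake",
                              python_callable=load_to_snowflake)
        extract >> load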
HRSA- PRF Cloud Project, Remote. Snowflake Architect Dec 20 – May 21
Health Resources & Services Admin (agency of Dept of Health & Human Services)
Provider Relief Funds (PRF) support American families, workers, & heroic healthcare providers in the battle against the COVID-19 outbreak. HHS is distributing relief funds to hospitals & providers on the front lines of the coronavirus. HRSA is in charge of distributing Provider Relief Funds to Medicare facilities & providers impacted by COVID-19.
●Assisted HRSA in building & enhancing EHB COVID DataMart Services
●Involved in the data quality & integration project around HRSA data from providers (Sogeti)
●Provided technical expertise in the analysis, planning & implementation of solutions around Snowflake
●Involved in design & architecture using Erwin; reviewed & recommended standards/procedures/best practices
●Designed & created data marts in Snowflake to support existing data feeds from third-party plan providers
●Experienced in developing stored procedures & writing queries to analyze & transform data
●Migrated high-volume data from on-prem RDBMS (SQL Server) to the cloud Snowflake platform
●Utilized Snowflake utilities like SnowSQL, Stages (Internal/External) & Big Data modeling techniques (see COPY INTO sketch below)
●Used Dremio in AWS as a query engine for faster joins and complex queries over the AWS S3 bucket
●Designed & implemented tracking, error handling mechanism & email notifications
●Helped to enhance existing processes for data quality & visualization-BI & Executive dashboards
●Mentored & groomed other team members on Cloud, SnowFlake technologies & best practices
●Implemented a data pipeline for the data consumption layer using FHIR/EDI for the BI & Analytics team
●Practiced Agile methodologies & frameworks such as Scrum; implemented HIPAA standards
Environment: AWS, DMS, Dremio, S3, Snowflake, SQL Server, SSIS, MicroStrategy, Tableau, Erwin, Linux
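Illustrative sketch (stage, table and credential values are hypothetical placeholders): loading staged provider files into a Snowflake mart table with the Python connector, similar to the SnowSQL/stage usage noted above.
    # Execute a COPY INTO from an external stage via snowflake-connector-python.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="<account>", user="<user>", password="<password>",
        warehouse="PRF_WH", database="PRF_DB", schema="MART",
    )
    cur = conn.cursor()
    try:
        cur.execute("""
            COPY INTO MART.PROVIDER_PAYMENTS
            FROM @PRF_DB.STAGES.PROVIDER_FILES
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
            ON_ERROR = 'CONTINUE'
        """)
    finally:
        cur.close()
        conn.close()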
Cigna Health Insurance, NYC, NY. Big Data/Cloud Consultant Mar 20 – Dec 20
Cigna is a CT-based health & life insurance company offering health insurance & health benefit group plans.
Performance Data Engineering (PDE): to maintain claims related data for various env. & all business partners; PHI data mask using third party & in-house de-identification algorithms for downstream apps & other groups.
●Architected the high-volume data migration & integration backend platform using agile methodology
●Created Impala tables & Hive views in EMR to load & process large volume of data from various sources
●Involved in metadata management, Master Data Management & maintaining master/lookup data
●Groomed user stories by reviewing with business stakeholders & updated; designed/ Modeled using Erwin
●Implemented data pipelines using MuleSoft, Lake Formation, S3 & Glue into AWS Redshift, Snowflake & SnowSQL
●Used AWS services like Lake Formation, S3, Glue, Athena, DynamoDB & Aurora to build & maintain the DataMart
●Loaded data from files in a stage using Snowpipe; used EDI/FHIR & SnowStream to implement SCD Type 2
●Eliminated a third-party masking tool by using hashing techniques in Python/Spark for HIPAA requirements/apps/users (see masking sketch below)
●Designed & created data pipeline process using Python & Databricks in Azure for high volume data
●Used Azure Purview to maintain Data Governance/Assets/Catalog/Dictionary, data classification & lineage.
●SME for data quality & performance tuning; Leveraged Sqoop & DistCp for data migration & Presto
●Designed & implemented tracking using PySpark, Flink, error handling mechanism & email notifications
●Collaborated on database architecture reviews & recommended standards/procedures/best practices (PostgreSQL, PL/SQL)
●Helped to enhance existing processes for data quality, DevOps, APIs, QuickSight for data visualization
Environment: Cloudera 6, Impala, Hive, Beeline, PySpark, Sqoop, Hue, AWS, S3, Glue, Athena, ElastiCache, Redshift, PostgreSQL, Aurora, DynamoDB, KMS, Snowflake, Snowpipe, SnowStream, Teradata, Airflow, Lake Formation, Oracle, Toad, Dremio, Cassandra, QuickSight, Flink, Azure, Databricks, ADF, Purview, DevOps
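Illustrative sketch (column names, paths and the salt handling are hypothetical): the hash-based de-identification approach referenced above, replacing direct identifiers with salted SHA-256 digests before publishing.
    # Mask PHI columns in PySpark with a salted SHA-256 hash and drop direct identifiers.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("phi_masking").getOrCreate()

    claims = spark.read.parquet("s3://example-raw/claims/")

    SALT = "replace-with-secret"   # in practice, retrieved from KMS or a secrets manager

    masked = (claims
              .withColumn("member_id",
                          F.sha2(F.concat_ws("|", F.lit(SALT), F.col("member_id")), 256))
              .drop("member_name", "member_ssn"))

    masked.write.mode("overwrite").parquet("s3://example-curated/claims_masked/")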
Coach (Tapestry Inc), NYC, NY. Big Data/Cloud Architect Mar 19 – Feb 20
Tapestry is NYC based house of modern luxury lifestyle brands including Coach, Kate Spade & Stuart Weitzman.
A360 (Article/Products): to maintain article master, pricing, cost, etc. from various vendors for all internal departments.
C360 (Customer/Transactions): to maintain customer master & related data, transactions, payments, etc. from worldwide stores.
●Architected high-speed, high-volume data processing for internal & external clients & apps
●Consolidated data pipeline process & implemented tracking, error handling mechanism, email notification
●Implemented a data pipeline solution for transaction data using S3, Lake Formation & Snowflake/SnowSQL
●Used MuleSoft & AWS services like S3, Athena, Glue, DynamoDB, Aurora & QuickSight for internal teams to consume data, & a SageMaker model for Product owners/Business Analytics team (see Athena sketch below)
●Trained & Deployed Machine Learning models in Production using AWS Sagemaker
●Created Hive/Impala tables/views to load & process large dataset from various sources, Flink, Kafka, JSON
●Leveraged Sqoop to migrate between Netezza & Hive, HBase by Automation, PostgreSQL for Data Analysis
●Involved in metadata management, Master Data Management & maintaining master/lookup data
●Strengthened data publishing process using RDS for other groups like BI (Tableau) & SAP Hana S4, APIs
●Involved in performance tuning, data quality & the Customer Capture process for associates' incentives
●Involved in database architecture reviews, created dimensional models & recommended standards/procedures
Environment: AWS, S3, Glue, Athena, RDS, EMR, DMS, Lake Formation, DynamoDB, Aurora, IAM, Snowflake, RDS, Hadoop, Hive, HBase, Spark, Python 3, PySpark, Netezza, PostgreSQL, Azure, ADF, ADLS, Databricks, Perl/Bash/Shell Script, Hana S4, CloudFormation, Tableau, SageMaker, MuleSoft, Erwin, Visio
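Illustrative sketch (database, table and bucket names are hypothetical): querying published A360 data with Athena over S3 via boto3, similar to how internal teams consumed data above.
    # Start an Athena query, poll for completion, then print result rows.
    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    run = athena.start_query_execution(
        QueryString="SELECT article_id, list_price FROM a360.article_master LIMIT 10",
        QueryExecutionContext={"Database": "a360"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = run["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        result = athena.get_query_results(QueryExecutionId=query_id)
        for row in result["ResultSet"]["Rows"]:
            print([c.get("VarCharValue") for c in row["Data"]])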
AlticeUSA (Cablevision), Long Island, NY. Data Architect (Big Data/Cloud) Sep 12 – Feb 19
NOC: ODM & ODS (Operational Data Mart & Store): collection of data from CMTS, modems, digital devices & Remedy tickets, processed to build a Data Store which in turn is used to build various Data Marts used by NOC & BI tools for executive reporting & dashboards, various internal groups & external vendors/agencies (for both Optimum & Suddenlink brands)
●Served as a Data Engineer in all aspects of architecture, design & development, and capacity planning
●Built Data Lakes & Data Marts with large structured/unstructured data sets
●Data Mining, Data Validation, Predictive Modeling, Data Visualization under agile methodology using Sqoop, Kafka, MySQL, MongoDB in combination with Python & Shell scripting; MATLAB & other libraries
●Designed & Developed both stateless (to get status of a modem) & stateful transformations using Flink and JavaScript for near real time & batch processing; & worked with IT/Network Infrastructure teams
●Implemented data pipeline for Node health data using PostgreSQL, Glue & S3 in AWS Redshift Spectrum
●Used AWS S3 & DynamoDB to publish aggregated data to other groups/vendors using Apache Airflow
●Created Hive/HBase tables to load large set of data from RDBMS, Kafka, NoSQL, JSON, Parquet, ORC files
●Developed multi-million-row data pipeline processes using PySpark, HiveQL, Scala, Spark, Presto & Redis (see streaming sketch below)
●Architecture for high speed, high volume, near-real time data; Metadata & Master Data management.
●DBA role for Netezza - Groom & Statistics in a high-availability, zero-downtime, mission-critical environment
●Used the SPSS package for analysis of statistical data on customer satisfaction
●Created ETL jobs/JobLets using Talend & TAC; Change Data Capture (CDC) process & liaison for BI
●Developed Stored Procedures, Views for ETL/ELT process & publishing data for other groups
Environment: Netezza, NZPLSQL, PostgreSQL, NZAdmin, Hadoop, Hive, HBase, Ambari, Ranger, Sqoop, Presto, Spark, Python, PySpark, MATLAB, Flink, Kafka, Airflow, Erwin, AWS, S3, EMR, Glue, Redshift, DynamoDB, IAM, Bash/Shell, Splunk, WebFocus, Tableau, Talend, Oracle, PL/SQL, MySQL, JavaScript, Unix
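Illustrative sketch (the original near-real-time work used Flink; this shows an analogous Spark Structured Streaming job instead; broker, topic and path names are hypothetical and the Spark-Kafka package is assumed): streaming modem status events from Kafka into the operational data store.
    # Read a Kafka topic as a stream and append raw payloads to a Parquet-based ODS path.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("noc_stream").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "modem_status")
              .load()
              .selectExpr("CAST(value AS STRING) AS payload", "timestamp"))

    query = (events
             .withColumn("load_ts", F.current_timestamp())
             .writeStream
             .format("parquet")
             .option("path", "/data/ods/modem_status/")
             .option("checkpointLocation", "/data/ods/_chk/modem_status/")
             .outputMode("append")
             .start())

    query.awaitTermination()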
S & P (Standard & Poor's) Capital IQ, NYC, NY. Data warehouse/ETL Architect Sep 11 – Aug 12
Data Integration to GCP (Global Credit Portal) - a web-based solution that brings together both fundamental & market analysis
Intex Data Integration – to integrate Intex's (structured financial securities data provider) data into IPD (Integrated Product Data) for RMBS (Residential Mortgage-Backed Securities), one of 7 security types (Pools, Deals, Loans & Groups)
Xtrakter Data Integration - to integrate daily trade data for mark-to-market & price calculations, Bid & Offer quotes & traded price for 6,500 fixed income instruments & Security reference data for 600,000 instruments.
●Collected requirements, designed/created logical/physical relational model (Star & Snowflake schema)
●Participated in DB architecture reviews & created DB objects using ERWin, PL/SQL packages & stored procs
●Created design & mapping document, Informatica mappings using Filter, Aggregator, Normalizer, etc
●Created Mapplets/Sessions/Workflows & automated shell script to execute WorkFlows; AutoSys
●Performed tuning & optimization, data validation, documented & eliminated gaps
●Coordinated with onshore & offshore development & QA teams
Environment: Informatica 9.0.1, Oracle 11g2, PL/SQL, TOAD for Oracle 10.6, Erwin, Visio Data Modeler,
OLAP, SharePoint, MicroStrategy, VersionOne, CVS, TortoiseCVS, WinSCP/PuTTY & Windows 7/Unix
Bloomberg, NYC, NY. ETL/Data Integration Consultant Sep 10 - Aug 11
Bloomberg Government (BGov) is a comprehensive, subscription-based online tool that collects best-in-class data, provides high-end analysis & analytic tools, & delivers deep, reliable, timely & unbiased data reporting to help congressional staffers, lobbyists, corporate headquarters, sales executives & foreign government agencies.
●Implemented ETL to integrate Govt spend data for opportunities/awards/contracts/Grants
●Mastered data cleansing/profiling/stewardship & loaded conformed dimensional data for facts & measures
●Improved performance using Parallel processing, database partitioning; maintained Admin Console
Environment: Oracle 11g, Oracle AWM, PL/SQL, TOAD, MySQL 5.5, MySQL WorkBench, Talend Open Studio 5.1/Integration Suite 5.1, ERWin Data Modeler, SVN(Subversion), ToolKit/PuTTy, Windows, Unix/Linux
Dept. of Treasury/Fannie Mae, Herndon, VA. ETL/Tech Lead Mar 09 - Aug 10
Home Affordable Modification Program (HAMP) is part of the Home Affordability & Stability Plan (HASP), Treasury's comprehensive strategy to get the economy back on track by helping families modify their mortgages to avoid foreclosure. Fannie Mae, as Program Administrator, handles data/payments/reporting. To support Treasury's analysis & decision making, designed & developed the Data Warehouse & Data Marts for reporting & the ETL process.
Environment: Informatica PowerCenter 8.6, Oracle 11g, PL/SQL, ERStudio/ERWin 7.3, AutoSys, DOORS 8.1, BMC Remedy 7.1, ClearQuest/ClearCase, OLAP, BOXI, PuTTY, Unix/Windows
Up to Feb 09, involved in various projects/clients using ETL/GUI, RDBMS, SQL, scripts, version control & monitoring tools:
New York City Transit (MTA), NYC, NY Tracking System (PSRTS) & I-Vault (IDM solution for portal)
MultiPlan, Inc, NYC, NY Supplier of network-based cost mgmt solutions for healthcare providers
Estee Lauder, Melville, NY Global Manufacturing System (GMS): PP, forecasting, procurement, cost & inventory. Demonstration (DeMo): Retail DSS; DW to sales force & mgmt to track sales/personal.
JPMC: Chase Auto Finance, Garden City, NY iCAF - Lease/Loan accounts, Collections & Title Mgmt
Aventis Pharmaceuticals, Bridgewater, NJ ISARIS (Institutional Sales & Reporting System): IMS Data
Kraft/Nabisco Inc, Parsippany, NJ Food Service Portal & Integration for Sales Dept.
Camp Systems Intl., LI, NY AviSource: Computerized Aircraft Maintenance System
NY State Insurance Fund, NYC, NY Claims Management System; Customer Management System
A. C. Nielsen, Chicago, IL NITE: Converted ratings data from multi-dimensional database to various clients
Other Projects: Production Planning Control & Sales Order Processing, Budget Monitoring (BMS), Inventory & Receivable, Complaints Management & Directory Enquiry & Material Requirement Planning (MRP)
EDUCATION: Master of Science in Computer Science