
Senior Data Engineer

Location:
San Jose, CA
Salary:
130000
Posted:
October 07, 2025


Resume:

Kevin Lively, San Francisco, CA *****

818-***-****

*****.********@*****.***

https://kevin-lively.github.io/portfolio/

https://github.com/Kevin-Lively

Data Solution Architect — Big Data Engineer — ETL Pipeline Expert — Data Warehouse Engineer

PROFESSIONAL SUMMARY

Results-driven Senior Data Solution Architect with over 11 years of experience in designing, developing, and optimizing scalable data solutions. Specialized in latency-conscious data engineering, real-time analytics, and high-throughput systems. Expertise in ETL pipelines, big data processing, and cloud-based architectures using tools such as Talend, Apache NiFi, Apache Airflow, and Informatica. Skilled in data warehousing methodologies including Star Schema, Snowflake Schema, and Data Vault, with deep experience across cloud platforms like AWS, Azure, and GCP. Experienced in modern data architectures including Data Lakehouse, Delta Lake, and cloud-native data platforms.

Proficient in big data technologies such as Hadoop, Spark, Kafka, and HDFS, with a strong focus on real-time data streaming, distributed systems architecture, and performance optimization. Proven track record in implementing data governance frameworks to ensure data quality, metadata management, and regulatory compliance (HIPAA, GDPR). Experienced in integrating machine learning models using Scikit-learn, TensorFlow, and PyTorch, and deploying them via Databricks. Adept in data visualization using Tableau, Power BI, QuickSight, and Plotly. Strong background in DevOps practices, including Docker, Kubernetes, and CI/CD pipelines, enabling efficient deployment and scalable operations.

TECHNICAL SKILLS

• Database Management : SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, Redis, SQLite, Amazon Redshift, Google BigQuery

• ETL & Data Integration (Extract, Transform, Load) : Talend, Informatica, Apache NiFi, Apache Airflow, Pentaho, SSIS, Alteryx, dbt, ETL Pipeline Optimization, Data Migration, Real-Time Data Processing

• Data Warehousing : Star Schema, Snowflake Schema, Dimensional Modeling, Data Vault, Kimball Methodology, Fact-Dimension Modeling, Data Marts, Data Lake Architecture, Cloud Data Integration

• Big Data Technologies : Hadoop, Spark, Kafka, Hive, HBase, Presto, Real-Time Data Processing (Apache Storm, Apache Flink), Stream Processing (Kafka Streams, AWS Kinesis)

• Data Modeling : ER Diagrams, Schema Design, UML, Dimensional Modeling, Fact-Dimension Modeling

• Data Governance & Security : Data Quality Management, Metadata Management, Data Lineage, Master Data Management, HIPAA Compliance, GDPR Compliance, Data Privacy

• Programming Languages : Python, Java, Scala, R, SQL, Bash, Rust, .NET

• Cloud Platforms : AWS (Glue, Redshift, Lake Formation), Azure (Synapse Analytics, Data Lake), GCP, Databricks, Snowflake

• Data Visualization : Tableau, Power BI, Matplotlib, Seaborn, Plotly, D3.js, QuickSight, Custom Visualizations, Interactive Reporting (Jupyter Notebooks)

• Machine Learning & AI : Scikit-learn, TensorFlow, PyTorch, Model Deployment, Feature Engineering, Model Evaluation, MLOps (CI/CD for Machine Learning), AI/ML Frameworks, Deep Learning (CNNs, RNNs), Reinforcement Learning

• DevOps & CI/CD : Git, CI/CD, Docker, Kubernetes, Ansible, Terraform, Jenkins, CircleCI, GitLab CI, Automating Cloud Resource Provisioning and Monitoring

• Financial Analysis : Budgeting & Forecasting, Financial Modeling, Risk Management, Variance Analysis, Financial Reporting, KPI Analysis, ROI Calculation

• Healthcare Data Analysis : EHR Data, Claims Data Analysis, Population Health Management, Clinical Data Integration, HL7, Healthcare Informatics, Quality Measures

• Databricks Expertise : Delta Lake, Databricks SQL, MLflow, Unified Data Analytics Platform, Databricks Notebooks, Databricks Lakehouse

• DataOps & Automation : Automated Testing for Data Pipelines, Version Control for Data Assets (DVC), Data Cataloging (AWS Glue, Alation), Data Monitoring and Alerting Systems

• Project Management Tools : Jira, Asana, Trello, Confluence, Agile/Scrum Methodologies

• Cross-Industry Knowledge : AI/ML Applications in Manufacturing (IoT Analytics, Smart Factories), Predictive Maintenance Models, Supply Chain Optimization

PROFESSIONAL EXPERIENCE

Palantir Technologies

Data Solution Architect 2022.10 - Present

• Architected and deployed real-time data streaming infrastructures using Apache Kafka, Apache Flink, and AWS Kinesis, enabling 99.9% uptime for data pipelines and improving supply chain visibility and operational responsiveness by 35%.

• Designed and implemented scalable, cloud-native data architectures across AWS, Azure, and GCP, integrating Amazon Redshift, Google BigQuery, Azure Data Lake, Databricks Lakehouse, and Snowflake, leading to a 50% reduction in infrastructure costs and enhanced performance elasticity.

• Led end-to-end migration of on-premises data warehouses to modern cloud ecosystems, leveraging Snowflake, Delta Lake, and Databricks, resulting in a 60% improvement in query performance and a 70% decrease in maintenance overhead.

• Engineered and optimized over 250 ETL/ELT workflows using Talend, Apache NiFi, Apache Airflow, and Informatica, automating ingestion from MongoDB, flat files, and global systems in structured and semi-structured formats.

• Developed and deployed predictive analytics models in Python, Apache Spark, and MLlib, increasing demand forecast accuracy by 25% and reducing inventory stockouts by 20%.

• Built and maintained CMS-related data pipelines (MOR, MMR, MAO) using GCP Dataflow and BigQuery to support healthcare compliance reporting.

• Established and enforced enterprise-wide data governance policies, ensuring 100% compliance with GDPR and HIPAA, while implementing data lineage tracking, access control, and sensitive data masking using tools like Apache Atlas and AWS Lake Formation.

• Mentored and led cross-functional teams of data engineers, architects, and DevOps professionals, advocating best practices in CI/CD (Jenkins, GitLab CI), containerization (Docker, Kubernetes), and infrastructure as code (Terraform), enhancing team productivity by 40%.

• Boosted analytics performance by 40% through Snowflake query optimization, materialized views, and partitioning, supporting real-time dashboards in Power BI and Tableau for executive reporting.

Cloudera

Senior Data Engineer 2019.08 - 2022.09

• Developed and optimized large-scale data pipelines using Hadoop, Spark, Kafka, and Hive, significantly improving data processing speeds.

• Implemented data contracts and aligned with Data Mesh principles for decentralized ownership across distributed teams.

• Integrated Snowflake for cloud warehouse migrations, optimizing query performance and enabling real-time analytics dashboards.

• Automated ETL workflows with Apache NiFi and Airflow, ensuring seamless data ingestion and transformation across cloud platforms.

• Collaborated with data science teams to deploy and maintain machine learning models on Databricks and MLflow, improving predictive capabilities.

• Implemented advanced performance tuning techniques for Apache Spark, reducing query execution times and improving scalability.

• Established data governance frameworks, enforcing data privacy, data lineage tracking, and industry regulation compliance.

• Engineered Informatica-based ETL pipelines for legacy-to-cloud data migration, reducing processing latency by 35%.

• Built CI/CD pipelines for data workflows using Jenkins and GitLab, enabling faster and more reliable deployments.

DataArt

Data Engineer 2017.06 - 2019.07

• Designed and optimized scalable ETL pipelines using Apache Beam, Python, and Google Cloud Dataflow, supporting high-volume data processing.

• Architected and implemented a data lake on Google Cloud Platform (GCP), enhancing data accessibility and cross-functional analytics.

• Developed and automated data quality validation frameworks using Great Expectations, reducing data discrepancies by 40%.

• Standardized data models and schema designs to improve reporting consistency and reduce redundancy across business units.

• Integrated data from diverse sources, including REST APIs, flat files, and cloud databases, streamlining ingestion workflows and reducing delivery time by 30%.

• Collaborated closely with analytics and engineering teams to support real-time data access, increasing operational efficiency.

• Played a key role in enabling scalable healthcare data infrastructure to support precision oncology research and analytics.

Steer Health

ETL & Data Warehouse Engineer 2014.02 - 2017.05

• Designed, developed, and maintained ETL pipelines using Talend, SSIS, and Apache NiFi, supporting cloud-based data warehousing solutions.

• Architected data warehouse solutions utilizing dimensional modeling, fact-dimension modeling, and Kimball methodologies, optimizing reporting performance.

• Implemented data quality automation frameworks, ensuring accuracy and consistency across enterprise systems.

• Developed real-time data integration solutions for healthcare and financial analytics using AWS and Google BigQuery.

• Created interactive dashboards and reports using Tableau, Power BI, and Matplotlib, enabling data-driven decision-making.

• Spearheaded cloud migration initiatives, ensuring smooth transitions from on-premises data systems to AWS, Azure, and GCP.

• Managed data governance policies, implemented HIPAA compliance standards, and established data lineage tracking and master data management best practices.

PROJECTS

Palantir Technologies

Data Solution Architect

• Designed and led the development of a real-time healthcare analytics platform integrating EHR and claims data using Apache Kafka, Apache Flink, and AWS Kinesis.

• Enabled predictive insights for population health management and reduced data processing latency by 60%.

• Deployed HIPAA-compliant data pipelines with Apache NiFi and Airflow on AWS, enhancing care quality and regulatory compliance.

Cloudera

Cloud Data Lakehouse Migration

• Led the migration of legacy on-premises data infrastructure to a unified cloud-based lakehouse using Databricks and Delta Lake on Azure.

• Streamlined ETL workflows using Apache Spark and Talend, improving data refresh rates by 70%.

• Integrated machine learning models with MLflow to forecast energy demands, increasing predictive accuracy by 30%.

DataArt

Financial Data Pipeline Modernization

• Developed scalable ETL pipelines with Apache Beam, Python, and Google Cloud Dataflow, processing over 10 million financial records daily.

• Designed a cloud-native data lake on GCP, enabling seamless access to structured and unstructured data for cross-team analytics.

• Implemented automated data validation and quality checks using Great Expectations, reducing data inconsistencies by 40%.

Steer Health

ML Feature Store for Fraud Detection

• Designed and deployed a centralized ML Feature Store using Databricks, MLflow, and Feast, enabling 3x faster model iterations.

• Reduced fraud detection false positives by 18% through real-time feature engineering.

CERTIFICATIONS

• AWS Certified Data Analytics – Specialty

• Google Professional Data Engineer Certification

• Microsoft Certified: Azure Data Engineer Associate

• Databricks Certified Data Engineer Professional

EDUCATION

Bachelor of Science

Bluefield State University
