Sudheer Kumar
Data Engineer
Tampa, FL ****************@*******.*** 563-***-**** LinkedIn
Summary
Data Engineer with 5+ years of experience designing and implementing scalable, cloud-native data platforms across AWS, GCP, and Azure. Specializes in building real-time and batch pipelines using Spark (Databricks), Kafka, and Snowflake following lakehouse architecture principles. Skilled in orchestrating end-to-end data workflows with dbt and Airflow, enabling low-latency analytics and reverse ETL to power business operations. Strong collaborator across engineering, analytics, and compliance teams, with proven ability to reduce pipeline failures, optimize cloud spend through FinOps practices, and align with HIPAA, SOC2, and GDPR standards. Experienced in mentoring junior engineers and leading cross-functional delivery across product-driven data initiatives.
Skills
Cloud Platforms & DevOps: AWS (EC2, S3, Lambda, Redshift, Glue, EFS), Azure (ADF, Synapse), GCP (BigQuery, Dataflow, Pub/Sub), Kubernetes, Docker, Jenkins, GitLab CI, Terraform, Helm, Prometheus, Grafana, ELK, Splunk
Programming & Backend: Python, SQL, Java, Bash, RESTful APIs, dbt (Data Build Tool)
ETL, Modeling & Orchestration: Apache Spark (Databricks), Delta Lake, Snowflake, Redshift, BigQuery, Airflow, Talend, Apache NiFi, lakehouse architecture
Streaming & Real-Time Data: Apache Kafka, Spark Streaming, GCP Dataflow, reverse ETL, event-driven design, schema enforcement, data contracts
Data Warehousing & Storage: Snowflake, Redshift, BigQuery, PostgreSQL, MySQL, Oracle, MongoDB, Teradata, S3, GCS
Data Observability & Quality: Great Expectations, SLA dashboards, lineage tracking, query tuning, data anomaly detection
Security & Compliance: IAM, Apache Atlas, Alation, HIPAA, SOX, SOC2, GDPR
Analytics & Visualization: Power BI, Tableau, NLP pipelines, GeoPandas, PostGIS
FinOps & Cloud Cost Optimization: S3 lifecycle policies, warehouse auto-scaling, storage tiering, FinOps practices
Version Control & Collaboration: Git, Jira, Confluence, Agile, stakeholder collaboration
Experience
Clairvoyant Data Engineer Mar 2023 – Present
• Designed lakehouse-style data platforms using Delta Lake and Snowflake to unify batch and streaming pipelines, reducing insight latency by 50%.
• Modeled analytics layers using dbt and orchestrated Airflow DAGs, streamlining transformation logic and increasing code modularity by 40%.
• Delivered real-time ingestion pipelines using Kafka and Spark (Databricks), reducing alert latency by 70% for operational dashboards.
• Integrated reverse ETL flows from Snowflake into marketing CRMs, improving lead conversion tracking by 25%.
• Reduced monthly cloud costs by 22% by applying FinOps principles: S3 lifecycle rules, warehouse auto-scaling, and resource tagging.
• Implemented SLA tracking and data validation via Great Expectations and Prometheus, decreasing production incident resolution time by 40%.
• Automated Terraform-based provisioning of cloud and Kubernetes resources, cutting onboarding time for new environments by 60%.
• Mentored two junior data engineers in Spark tuning and dbt implementation, improving team delivery speed and onboarding effectiveness.
• Created stakeholder-facing dashboards in Power BI/Tableau, reducing manual reporting overhead by 45%.
• Collaborated with product owners, analysts, and client SMEs to align data architecture with strategic KPIs and compliance goals (HIPAA, SOC2, GDPR).
CueTech Systems Data Engineer Aug 2018 – Jan 2022
• Built scalable ETL and streaming pipelines using Spark, Kafka, and NiFi to support multi-source ingestion into Redshift and BigQuery.
• Migrated ETL workflows to Delta Lake for schema enforcement and time-travel support, reducing data reprocessing incidents by 35%.
• Developed dbt-based models for curated finance and marketing layers consumed by BI teams in Tableau and Power BI.
• Tuned SQL queries in PostgreSQL, Redshift, and Oracle, achieving up to 40% runtime improvement on high-volume datasets.
• Created reverse ETL pipelines from BigQuery to Salesforce and HubSpot, enhancing CRM data accuracy and campaign personalization.
• Managed governance integration using Apache Atlas and Alation to establish end-to-end data lineage and access auditing.
• Delivered geospatial analysis with GeoPandas/PostGIS to inform site expansion strategy and sharpen regional sales targeting.
• Led implementation of Jenkins/GitLab CI for ETL deployment automation, reducing change failure rate by 30%.
• Interfaced with project managers and business analysts to transform reporting requirements into production-grade datasets.
Education
Master of Science in Management Information Systems, Hood College