Kiff Sharp
Senior Data Engineer
Miami, FL 430-***-**** https://www.linkedin.com/in/k-sharp-558b9294/ *********@*****.***

PROFESSIONAL SUMMARY
As a Senior Data Engineer with over 8 years of experience, I’m passionate about designing and optimizing large-scale data platforms that help businesses unlock the full value of their data. I’ve built and maintained distributed ETL pipelines, real-time data streaming systems, and cloud-based data solutions across AWS, GCP, and Azure. My hands-on expertise spans Python, SQL, Scala, Apache Spark, and tools such as Databricks, Airflow, Kafka, dbt, Snowflake, MLflow, and Terraform. I enjoy working closely with analytics, data science, and product teams to translate complex requirements into practical, scalable solutions. I’ve led initiatives supporting advanced analytics, machine learning workflows, and the integration of LLMs using OpenAI models and RAG architectures. My focus is always on building reliable, cost-effective systems that deliver meaningful results and actionable insights for the business.
SKILLS
• Languages: Python, Scala, R, Java, C/C++, Bash, Flink, MapReduce, Gurobi
• Data Storage: T-SQL, MySQL, PostgreSQL, SQL Server, SQL DML, Oracle, BigQuery, Elasticsearch, AWS Redshift, Snowflake, AWS S3, AWS DynamoDB, Redis, Hadoop, Enterprise Data Lake, Data Warehouse, Data Hubs
• Tools / Libraries: Ataccama, Acceldata, MLflow, SciPy, NumPy, Pandas, Matplotlib, scikit-learn, PyTorch, TensorFlow, Apache Spark, Apache Airflow, Power BI, Grafana, Tableau, OpenCV, Databricks, Git, RabbitMQ, Anomalo, Unity Catalog, Alation
• Cloud/DevOps: AWS (Glue, EC2, RDS, Lambda, Kinesis, SageMaker, EKS, Redshift), GCP, Azure, Docker, Terraform, CI/CD, Linux
• Others: ETL, Web Scraping, LLMs (OpenAI, Llama), Econometrics, Agile Development, NLP, Vector Search, RAG

WORK EXPERIENCE
Databricks San Francisco, CA 07/2023 - 08/2025
Senior Data Engineer
• Led technical projects with product, analytics, and engineering teams, delivering data solutions that improved reporting accuracy by 25% and reduced time-to-insight for business decisions.
• Managed the migration from Hadoop and Hive to Databricks, cutting data pipeline runtime by 40% and reducing infrastructure costs by 30%.
• Built ETL pipelines in PySpark and Spark SQL, supporting marketing analytics and operations for over 10 million user events daily.
• Set up real-time data ingestion with Azure Event Hub and Kafka, enabling near real-time product launch data with latency under 2 minutes.
• Integrated 15+ new data sources into the data lake, providing comprehensive datasets for business and analytics teams.
• Troubleshot and resolved 100+ pipeline incidents, maintaining 99.9% data availability for downstream applications.
• Communicated project status and outcomes to leadership, improving stakeholder alignment and reducing project delivery times by 20%.
• Implemented data quality and governance controls, supporting successful completion of two major compliance audits and reducing data errors by 35%.
• Mentored five junior engineers, contributing to a 100% team retention rate and helping two members earn promotions.
• Organized post-project reviews, resulting in process improvements that decreased project rework by 15%.
• Maintained and optimized data models in Snowflake, reducing query times by 50% and supporting analytics for over 30 business users.
• Used Unity Catalog and Alation to improve data discoverability, increasing self-service data adoption by 40%.
• Automated Databricks pipeline deployments with Azure DevOps, decreasing deployment failures by 80%.
• Optimized database performance through partitioning and indexing, resulting in savings of $20K/year in cloud compute.
• Orchestrated batch and streaming jobs in Airflow, improving workflow reliability and reducing manual interventions by 70%.
• Monitored data quality with Anomalo and Acceldata, proactively resolving 50+ issues before they impacted business reporting.
• Secured sensitive data by establishing access controls, achieving zero security breaches during tenure.
• Prepared curated datasets and features for machine learning, accelerating model development cycles by 30%.
• Empowered teams with metadata management and self-service data discovery, leading to a 25% increase in project delivery speed.
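The data-quality controls described above can be sketched in outline. This is a minimal, illustrative example in plain Python; the function, column names, and thresholds are hypothetical, not taken from the actual Databricks implementation (which used tools like Anomalo and Acceldata):

```python
# Illustrative pre-publish data-quality gate: block a batch from reaching
# downstream consumers if row counts or null rates breach thresholds.
# All names and thresholds here are hypothetical.

def quality_gate(rows, required_columns, max_null_rate=0.05, min_rows=1):
    """Return (passed, issues) for a batch of dict-shaped records."""
    issues = []
    if len(rows) < min_rows:
        issues.append(f"row count {len(rows)} below minimum {min_rows}")
    for col in required_columns:
        nulls = sum(1 for r in rows if r.get(col) is None)
        rate = nulls / len(rows) if rows else 1.0
        if rate > max_null_rate:
            issues.append(
                f"column '{col}' null rate {rate:.1%} exceeds {max_null_rate:.0%}"
            )
    return (not issues, issues)

# Example batch with missing values in both columns.
batch = [
    {"user_id": 1, "event": "click"},
    {"user_id": 2, "event": None},
    {"user_id": None, "event": "view"},
]
passed, issues = quality_gate(batch, ["user_id", "event"], max_null_rate=0.25)
```

In a production pipeline, a gate like this would run as a task between the transform and publish stages, failing the run (rather than silently loading bad data) when checks do not pass.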
Masterclass San Francisco, CA 07/2021 - 03/2023
• Built real-time data pipelines using Google Cloud Dataflow and Pub/Sub to process subscriber interactions, reducing campaign data latency from 24 hours to under 10 minutes.
• Designed event-driven workflows with Apache Beam, powering personalized product recommendations and customer segmentation for over 3 million users.
• Automated data validation and anomaly detection with Great Expectations, improving data reliability and cutting data-related incidents by 30%.
• Developed and maintained BigQuery data marts to support advanced analytics, executive dashboards, and marketing performance reporting.
• Orchestrated ETL processes with Prefect, increasing pipeline reliability and reducing manual intervention by 40%.
• Created and managed data lake storage on Google Cloud Storage, optimizing partitioning and lifecycle policies to reduce monthly costs by 20%.
• Integrated Looker dashboards for real-time campaign tracking, enabling marketing teams to react quickly and increase engagement rates by 25%.
• Implemented row-level security and access controls in BigQuery, ensuring compliance for sensitive user and campaign data.
• Supported machine learning projects by preparing clean, labeled datasets in BigQuery, accelerating model delivery timelines by 30%.
• Collaborated with product leads and analytics teams to launch new data products, delivering on deadlines and maintaining 99.9% pipeline uptime.
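The retry behavior behind the reliability improvements above can be sketched in plain Python. This is an illustrative outline only; the function names are hypothetical, and in the actual work this logic lived in orchestrated tasks (e.g. Prefect task retries) rather than a hand-rolled helper:

```python
# Illustrative retry-with-backoff wrapper for a flaky ETL step, reducing
# the need for manual re-runs. Names and parameters are hypothetical.
import time

def run_with_retries(task, max_attempts=3, base_delay=0.1):
    """Run task() and retry on failure with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted retries: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulate a step that fails twice before succeeding.
attempts = []
def flaky_load():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "loaded"

result = run_with_retries(flaky_load)
```

Bounding the attempts and re-raising on exhaustion keeps failures visible to the orchestrator instead of masking them.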
Allocations Miami, FL 05/2019 - 07/2021
• Played a key role in designing and maintaining data ingestion pipelines using AWS Glue and AWS Lambda. These solutions increased data availability for client-facing dashboards by 30% and supported multiple new product launches.
• Enabled analysts to access data independently by building modular ELT workows with Amazon Redshift, Snowake, and dbt, which reduced onboarding time for new sources by 60%.
• Contributed to the development of batch and streaming platforms utilizing Amazon Kinesis and Python, which improved real-time analytics and decreased reporting latency by 50%.
• Improved user experience for business teams by optimizing Amazon Redshift and PostgreSQL through materialized views and query tuning, cutting dashboard load times from 3 minutes to under 20 seconds.
• Supported the development and deployment of microservices built with FastAPI and Docker on Amazon ECS, delivering real-time data access for personalization features to over 150,000 users.
• Built an internal data lineage visualization tool using Flask and React, reducing audit preparation time for compliance teams by 35% and improving data traceability.
• Participated in a log migration project, moving legacy logs to Amazon S3 and organizing metadata with AWS Glue and AWS Athena, which improved historical audit efficiency by 40%.
• Helped establish real-time anomaly detection alerts using Grafana and AWS Lambda, reducing false positives by 25% and enhancing system monitoring reliability.
• Collaborated with machine learning engineers to prepare training datasets and automate feature pipelines using TensorFlow and Pandas, accelerating fraud detection model development by 30%.

Meta Fort Worth, TX 11/2017 - 04/2019
• Collaborated with senior engineers to build and maintain data pipelines for supply chain and sales analytics using Google Cloud Pub/Sub, Apache Flink, and Python. Improved data delivery speed by 20%, helping teams access critical reports more quickly.
• Monitored and processed real-time manufacturing IoT data, applying streaming concepts and quality checks in Google Cloud Platform. Early issue detection helped reduce plant downtime incidents by 10%.
• Provided day-to-day support and troubleshooting for Cassandra operational data stores, ensuring analytics teams had consistent access to data and contributing to a 15% decrease in data retrieval times.
• Assisted with ETL error handling and retry logic implementation using SQL and Python, which reduced interruptions in data flow by 25%, supporting more reliable business operations.
• Supported the migration of batch jobs from Hadoop/Hive to Databricks Delta Lake and BigQuery, validating data integrity and improving processing performance, which lowered maintenance workload by 30%.

EDUCATION
Master’s degree, Data Modeling/Warehousing and Database Administration, Texas Southern University, Houston, TX, Jun 2015 – Oct 2017
Bachelor’s degree, Computer Science, University of Houston, Houston, TX, May 2011 – Jun 2015