Sampath Sai Yelleti
Dallas, Texas (Open to Relocate)
+1-703-***-**** – **********@*****.***
Professional Summary
Strategic and innovative Data Engineer with 5+ years of experience designing and deploying scalable, high-impact data solutions across cloud platforms. Proven ability to lead cross-functional teams, enhance project performance, and deliver mission-critical pipelines in high-volume data environments. Demonstrated leadership in optimizing ETL systems and automating workflows, leading to a 35% increase in project efficiency and operational growth. Expert in Azure, AWS, and big data tools with strong collaboration, mentoring, and stakeholder communication skills. Experienced in solving complex data challenges in government, telecom, and enterprise domains.
Work Experience
Data Engineer
January 2023 to Present
Radiant Digital
Client: Administration for Community Living (ACL)
• Worked with a team of 5 engineers to implement scalable ETL pipelines using Azure Data Factory, integrating over 5 TB of data from SQL Server, Oracle DB, and Cosmos DB into Azure Data Lake.
• Reduced report generation time by 40% through streamlined transformation processes and efficient data modeling strategies.
• Spearheaded Databricks-based data processing workflows using PySpark, SAS, and Python to enhance pipeline performance by 48% and minimize latency.
• Provided training and guidance to interns, equipping them with industry-standard tools and methodologies for data engineering.
• Developed 6+ KQL-powered dashboards in Power BI, reducing manual report generation by 50% and enabling real-time insights for operational decision-making across 3 business units.
• Designed and deployed 10+ Azure Functions for API integration, automating data retrieval processes and reducing manual intervention by 70%, while improving data pipeline reliability and uptime.
• Led data governance initiatives by deploying metadata-driven frameworks for traceability, boosting data transparency and compliance across 12 federal government datasets.
• Fostered a culture of collaboration through code reviews, mentoring junior analysts, and organizing knowledge-sharing sessions.
Graduate Teaching Assistant
August 2022 to December 2023
George Mason University, Department of CS
• Mentored over 200 graduate students in developing and optimizing ETL pipelines using Snowflake and Informatica, promoting real-world problem-solving skills.
• Conducted 25 advanced lab sessions on Apache Spark, Kafka, Hadoop, and NoSQL databases, emphasizing best practices in distributed computing.
• Facilitated collaborative peer review processes to drive team-based learning and enhance project execution.
• Designed grading rubrics and evaluation frameworks to ensure consistency and quality in student performance assessments using Blackboard LMS and Microsoft Excel.
• Actively participated in curriculum improvement by integrating cloud and big data industry trends into coursework.
Data Engineer
May 2019 to December 2021
Vizag Broadband Communications
• Led the development and deployment of AWS-native data pipelines using S3, Glue, DynamoDB, and Lambda, improving data ingestion efficiency by 50%.
• Processed over 2 TB of semi-structured and structured data daily using Apache Spark on EMR to support analytics on network performance, customer behavior, and usage trends.
• Designed and maintained Redshift data models, enabling efficient querying and analytics for over 10 key business functions.
• Collaborated with DevOps and QA teams to implement CI/CD pipelines with Git and Jenkins for version-controlled deployment of ETL jobs and data validation scripts.
• Delivered Amazon QuickSight dashboards to leadership teams, enhancing KPI tracking and SLA adherence by 30%.
• Implemented job orchestration and dependency management using Apache Airflow to coordinate daily ETL processes and ensure timely delivery of data to Redshift and QuickSight dashboards.
• Automated ETL workflows and alerts with CloudWatch and SNS, improving system resilience and reducing incident resolution time by 35%.
• Assisted in data modeling and transformation layer design using dbt, ensuring reusable SQL logic and version control across multiple environments.
• Developed RESTful APIs using Java and Spring Boot to expose processed analytics data for use by internal reporting tools and external partner dashboards.
Certifications
Microsoft Certified: Azure Data Engineer Associate, 430EE527980EFCE
Education
Master of Science in Data Analytics Engineering
George Mason University
Bachelor of Science in Computer Science
GITAM University
Websites, Portfolios and Profiles
https://www.linkedin.com/in/SampathSaiYelleti/
Awards
Achieved Best Overall Student Solution and Crowd Source Prize in Bringdown Counterfeiting Hackathon. Awarded $12,500 for building an application to detect phishing URLs in real-time messaging environments.
Projects
Batch ETL Pipeline with Apache Airflow, Python, MySQL, Amazon S3
• Engineered an end-to-end batch ETL pipeline using Apache Airflow, enabling automated extraction of sales data from APIs and MySQL.
• Enhanced data integrity by incorporating Pandas-based validation and transformation layers.
• Leveraged Amazon S3 for cloud storage and configured pipelines for integration with Redshift and BigQuery.
• Enabled retry logic and alerting via Airflow, improving reliability, maintainability, and pipeline recovery.
• Reduced ETL runtime by 20%, improving overall data delivery time and data integrity for BI reporting.
Technical Skills
Cloud: Azure (ADF, Databricks, Data Lake, Functions, Synapse, KQL), AWS (S3, Glue, EMR, Lambda, Redshift, Athena), GCP (BigQuery, Dataflow, Cloud Storage)
Languages: Python, Java, SQL, PySpark, Scala, SAS, KQL, Bash
Big Data: Apache Spark, Hadoop, Kafka, Flink, Hive, HBase
Data Tools: Snowflake, Informatica, NoSQL (MongoDB, Cassandra), Apache NiFi, dbt, Apache Beam, Airflow
Visualization: Power BI, Tableau, Amazon QuickSight, Looker
DevOps & Orchestration: Docker, Kubernetes, Git, Jenkins, CI/CD, Terraform
Data Management: Data Modeling, Data Warehousing, Data Lakes, Data Quality, Metadata Management, Data Governance
Workflow/ETL: Apache Airflow, Azure Data Factory, AWS Glue, Talend, dbt
Other: Agile/Scrum, Team Leadership, Project Optimization, Cost Optimization, Performance Tuning, Monitoring (Datadog, Prometheus), Logging (ELK Stack, Splunk)