Prasanth Goggela Tagoor
Email: *************@*****.*** Mobile: 913-***-**** Location: Texas, United States
PROFESSIONAL SUMMARY
Over 6 years of experience as a Data Engineer, specializing in data hydration and ETL transformations for analytics platforms.
Proven expertise in setting up Change Data Capture (CDC) using Debezium and other tools, ensuring efficient data flow into data lakes.
Skilled in Apache Spark, with hands-on experience in DataFrames, Spark SQL, and Spark Streaming for both batch and streaming data processing.
Proficient in orchestrating ETL jobs and streaming data pipelines, transforming raw CDC data into queryable formats for analytics.
Extensive knowledge of AWS services, including S3, EMR, Glue Data Catalog, and Lambda functions, optimizing data storage and processing.
Strong background in performance tuning and implementing Big Data concepts, enhancing data processing efficiency.
Familiar with Apache Airflow for workflow management, ensuring reliable data pipeline orchestration.
Committed to continuous learning and professional development, holding multiple certifications in data engineering and analytics.
SKILLS
Programming Languages: Java, Python, Scala
Operating Systems: Windows, Linux
Cloud Platforms: AWS (S3, EMR, Glue Data Catalog, Lambda, Step Functions, MWAA), Google Cloud Platform, Azure
DevOps & CI/CD: Apache Airflow, AWS Batch
Development Tools: Apache Spark (DataFrames, Spark SQL, Spark Streaming), Databricks, TensorFlow
Reporting Tools: Power BI, Tableau, Excel
Frameworks & Libraries: Apache Hudi, Apache Griffin
Databases & Data Warehousing: SQL, NoSQL
Big Data & Streaming: Apache Kafka, Change Data Capture (CDC), ETL Pipelines, Streaming Data Processing
Testing & QA: Unit Testing, Integration Testing
Security & Compliance: Data Governance, Compliance Standards
Monitoring & Observability: AWS CloudWatch
Collaboration Tools: JIRA, Confluence
Documentation Tools: Markdown, Microsoft PowerPoint
CERTIFICATIONS
AWS Certified Data Analytics – Specialty
Azure Data Engineer Associate (DP-203)
Databricks Certified Data Engineer Professional
Google Cloud Professional Data Engineer
TensorFlow Developer Certificate
Machine Learning Specialization – Coursera (Andrew Ng)
EDUCATION
Master of Science in Big Data Analytics, University of Central Missouri, GPA: 4.0
Bachelor of Technology in Electrical and Electronics Engineering, KL University, 92%
WORK EXPERIENCE
NVIDIA Corporation – Austin, TX
Senior Data Engineer - AI & Analytics Platforms – Nov 2023 to Present
Spearheaded the design and implementation of a scalable Apache Spark data pipeline that increased data processing speed by 40% and improved analytics capabilities for AI-driven projects.
Optimized ETL processes by integrating AWS Glue and Apache Airflow, resulting in a 30% reduction in data latency and improved data availability for real-time analytics.
Collaborated with cross-functional teams to develop machine learning models, utilizing TensorFlow and PySpark, which increased predictive accuracy by 25% in customer behavior analysis.
Engineered a robust data lake architecture on AWS S3, leveraging EMR for batch processing, which streamlined data access and reduced storage costs by 20%.
Automated data quality checks using AWS Deequ, ensuring data integrity and consistency across various data sources, which enhanced trust in analytics outputs.
Delivered comprehensive documentation and training sessions for team members on best practices in data engineering and analytics, fostering a culture of knowledge sharing.
Implemented performance tuning strategies for Spark SQL queries, resulting in a 35% improvement in query execution times, thereby enhancing overall system efficiency.
Developed and maintained CI/CD pipelines for data workflows using AWS Lambda and Step Functions, improving deployment frequency and reducing rollback incidents by 15%.
Conducted regular code reviews and performance assessments, mentoring junior engineers and promoting adherence to coding standards and best practices.
Pursued continuous professional development, earning the AWS Certified Data Analytics – Specialty and Google Cloud Professional Data Engineer certifications.
Technologies Used: Java, Python, Apache Spark, AWS Glue, Apache Airflow, EMR, S3, TensorFlow, AWS Lambda, AWS Deequ
Humana Inc – Louisville, KY
Big Data Engineer – Mar 2021 to Jul 2023
Automated data ingestion processes using Apache Kafka and Spark Streaming, achieving a 50% increase in data throughput and enabling real-time analytics for healthcare applications.
Engineered ETL pipelines that processed over 5TB of data daily, utilizing Apache Spark DataFrames and SQL, which improved reporting accuracy and reduced processing time by 30%.
Collaborated with data scientists to develop predictive models for patient health outcomes, leveraging machine learning techniques that improved patient care strategies by 20%.
Optimized data storage solutions on AWS S3, implementing lifecycle policies that reduced storage costs by 25% while maintaining compliance with healthcare regulations.
Led the migration of legacy data systems to a cloud-based architecture, enhancing scalability and reliability of data access for over 1,000 users across the organization.
Conducted performance tuning and optimization of Spark jobs, resulting in a 40% reduction in resource consumption and improved job execution times.
Developed comprehensive data governance policies and procedures, ensuring data quality and compliance with HIPAA regulations across all data engineering processes.
Implemented monitoring and alerting systems using AWS CloudWatch, significantly reducing downtime and improving system reliability for critical data pipelines.
Provided mentorship to junior data engineers, fostering a collaborative environment and enhancing team productivity through knowledge sharing and skill development.
Engaged in cross-departmental projects to enhance data accessibility and usability, resulting in improved decision-making processes across the organization.
Technologies Used: Java, Python, Apache Spark, Apache Kafka, AWS S3, AWS CloudWatch, SQL, Data Governance, Machine Learning, ETL
Hexaware Technologies – Chicago, IL
Data Engineer – Jun 2019 to Feb 2021
Developed and maintained ETL processes for data integration from multiple sources, utilizing Apache Spark and Python, which improved data availability for analytics by 35%.
Collaborated with business analysts to gather requirements and translate them into technical specifications, ensuring alignment between data solutions and business needs.
Implemented data validation and cleansing processes, enhancing data quality and reducing errors in reporting by 20%, which improved stakeholder confidence in analytics.
Participated in the design and deployment of a data warehouse solution, leveraging AWS Redshift, which facilitated advanced analytics capabilities for business intelligence teams.
Automated reporting processes using Python scripts, reducing manual effort by 50% and enabling timely insights for strategic decision-making.
Conducted performance tuning of SQL queries, resulting in a 30% improvement in report generation times and enhancing user experience for data consumers.
Followed Agile methodologies, participating in sprint planning and retrospectives, which improved project delivery timelines and team collaboration.
Developed comprehensive documentation for data processes and workflows, ensuring knowledge transfer and continuity within the data engineering team.
Assisted in the migration of on-premises data solutions to cloud-based platforms, enhancing scalability and reducing operational costs by 15%.
Provided support for data-related inquiries and troubleshooting, ensuring timely resolution of issues and maintaining high levels of service for internal stakeholders.
Technologies Used: Python, Apache Spark, SQL, AWS Redshift, ETL, Data Warehousing, Agile, Data Quality, Automation, Reporting