Sahith Kumar Vanaparthi — Senior Data Engineer
470-***-**** ******************@*****.***
PROFESSIONAL SUMMARY:
Senior Data Warehouse Engineer with 5 years of experience in designing, implementing, and managing robust data solutions and infrastructure.
Expertise in Linux-based infrastructure, shell scripting, and Oracle database administration for complex data warehousing environments.
Proficient in developing and optimizing complex ETL processes using industry-standard tools like Informatica and Apache Airflow with Python.
Skilled in Python and SQL for advanced data manipulation, scripting, and performance tuning on various relational databases.
Proven ability to identify and implement system and architecture improvements for enhanced data warehouse efficiency and reliability.
Strong background in data modeling, data quality, and data integrity assurance across diverse datasets and systems.
Experienced in deploying and managing data pipelines, including data ingestion, transformation, and loading into data warehouses.
Adept at applying Agile methodologies to deliver high-quality data engineering solutions and foster collaborative team environments.
Committed to automation and continuous process improvement to streamline data operations and enhance overall system reliability.
EDUCATION:
Master of Science in Computer Science @ Kennesaw State University
TECHNICAL SKILLS:
Programming Languages: Python, Shell Scripting, SQL, Perl
Operating Systems: Linux, Unix
Databases: Oracle Exadata, Oracle, PostgreSQL, MySQL
ETL Tools: Informatica PowerCenter, AWS Glue, Azure Data Factory
Orchestration: Apache Airflow
Data Warehousing: Data Modeling, ETL, ELT, Data Flows, Data Quality
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, Athena), Azure (ADLS, Databricks, Synapse)
Version Control: GitHub
Methodologies: Agile (Scrum)
WORK EXPERIENCE:
Senior Data Engineer @ Kaiser Permanente, Oakland, CA | Sep 2024 – Present
Led the design and implementation of highly scalable data warehousing solutions on Linux-based infrastructure, ensuring optimal performance and reliability.
Developed complex shell scripts to automate data ingestion, transformation, and loading processes for critical business datasets, enhancing operational efficiency.
Managed and optimized Oracle Exadata databases, including performance tuning and schema design to support large-scale analytical workloads efficiently.
Architected and deployed advanced ETL pipelines using Informatica PowerCenter, integrating diverse data sources into the central data warehouse seamlessly.
Implemented robust data governance strategies and quality checks within the Linux environment to ensure data accuracy and compliance standards.
Designed and enhanced Airflow DAGs using Python for orchestrating complex data workflows and ensuring timely data availability for stakeholders.
Identified architectural bottlenecks and implemented system improvements, resulting in significant reductions in data processing times and enhanced reliability.
Configured and maintained various Linux-based toolsets and processes, enhancing operational efficiency, system stability, and data flow integrity.
Collaborated with cross-functional teams to define data requirements and translate them into efficient data warehouse designs and implementations.
Utilized advanced SQL for intricate data modeling, querying, and analysis within the Oracle environment to support business intelligence initiatives effectively.
Managed version control for all scripts and ETL assets using Git, ensuring collaborative development and robust deployment cycles consistently.
Implemented proactive monitoring and alerting for data warehouse processes, minimizing downtime and ensuring continuous data flow for critical operations.
Technologies Used: Linux, Oracle Exadata, Informatica PowerCenter, Apache Airflow (Python), Shell Scripting, SQL, Git, Data Warehousing, Data Governance
Data Engineer @ Visa, Foster City, CA | Apr 2022 – Jul 2023
Designed and enhanced Linux-based data processes and infrastructure components for robust data warehousing operations, ensuring high availability.
Developed comprehensive shell scripts for automating data extraction, transformation, and loading into relational databases efficiently.
Managed and optimized Oracle databases, focusing on query performance tuning and efficient storage management for critical enterprise data.
Implemented and enhanced ETL/database load processes using Informatica, ensuring high data throughput and integrity for analytics reporting.
Contributed to identifying and implementing system architecture improvements, enhancing the scalability and reliability of data pipelines significantly.
Administered Unix file systems, including managing permissions and mount points, and leveraging standard tools for effective data manipulation.
Developed data ingestion solutions using Python, integrating diverse data sources into the data warehouse environment with precision.
Orchestrated complex data workflows using Apache Airflow with Python, optimizing job scheduling and dependency management for timely execution.
Ensured data quality and consistency by implementing rigorous validation checks throughout the ETL process to maintain accuracy and reliability.
Collaborated with data analysts and business users to refine data models and improve data accessibility for comprehensive reporting and insights.
Maintained detailed documentation for data pipelines and infrastructure configurations, ensuring knowledge transfer and operational continuity for the team.
Participated in Agile development sprints, contributing to regular stand-ups and ensuring timely delivery of robust data solutions consistently.
Technologies Used: Linux, Oracle, Informatica PowerCenter, Apache Airflow (Python), Shell Scripting, Python, SQL, Unix, Data Warehousing
Junior Data Engineer @ Wayfair, Boston, MA | Nov 2019 – Mar 2022
Designed and developed complex ETL workflows using Informatica PowerCenter to extract data from various operational sources efficiently.
Managed and optimized Oracle and MySQL databases, ensuring data integrity and efficient query performance for critical analytics.
Developed comprehensive SQL queries and stored procedures for advanced data transformation and cleansing operations accurately.
Implemented critical data integration and migration tasks for large datasets, ensuring accuracy and minimal disruption to operations.
Utilized robust UNIX shell scripting for automating routine data processing tasks and enhancing system monitoring capabilities effectively.
Worked with Hadoop and Hive for querying and managing large volumes of semi-structured data within the data lake environment.
Performed data quality validation and cleansing to ensure the reliability and accuracy of data within the enterprise data warehouse.
Monitored and resolved performance bottlenecks in ETL jobs, ensuring timely delivery of critical business data for reporting and analysis.
Maintained detailed documentation for all ETL processes, ensuring transparency and ease of maintenance for future enhancements.
Collaborated with senior engineers to implement foundational Hadoop ecosystem components for scalable and efficient data storage solutions.
Contributed to the continuous improvement of existing data pipelines, enhancing their efficiency, robustness, and overall performance significantly.
Managed source code and collaborated on development tasks using GitHub in an Agile team setting, ensuring proper version control.
Technologies Used: Unix, Informatica PowerCenter, Oracle, MySQL, Hive, Hadoop, SQL, Shell Scripting, GitHub, Data Warehousing