Deekshith Sathu — Senior Data Engineer
913-***-**** ****************@*****.***
PROFESSIONAL SUMMARY:
Seasoned Data Engineer with about 5 years of experience designing, implementing, and managing robust data warehousing solutions.
Profound expertise in optimizing Linux-based processes and infrastructure for high-performance data storage and retrieval environments.
Demonstrated proficiency in Shell Scripting and Python for automating complex ETL workflows and system administration tasks.
Specialized in Oracle Exadata development, focusing on advanced database optimization, query tuning, and data modeling strategies.
Skilled in identifying and implementing critical system and architecture improvements, enhancing data warehouse scalability and reliability.
Experienced in enhancing ETL and database load/extract processes, ensuring efficient data flow and integrity across diverse platforms.
Strong understanding and practical application of Agile methodology, contributing to rapid development and continuous delivery of data solutions.
Committed to automation and continual process improvement, driving operational efficiencies and reducing manual intervention in data operations.
A results-driven professional with a Master of Science in Computer Science and excellent communication skills for stakeholder engagement.
WORK EXPERIENCE:
Senior Data Engineer @ Molina Healthcare, Long Beach, CA | Sep 2024 – Present
Engineered and managed robust Linux-based data warehousing infrastructure, optimizing performance for large-scale healthcare datasets.
Developed and enhanced complex Shell scripts for automating critical ETL and database load/extract processes, ensuring data integrity and timely delivery.
Implemented advanced Oracle Exadata development techniques to optimize data storage, retrieval, and query execution for analytical reporting.
Identified and applied significant system and architecture improvements, enhancing the scalability and reliability of data processing pipelines.
Orchestrated data workflows using Apache Airflow with Python, automating data ingestion and transformation from diverse sources into data warehouses.
Designed and developed scalable data pipelines using PySpark for processing and integrating various healthcare data formats within AWS.
Conducted comprehensive data quality checks and implemented validation frameworks to ensure accuracy and consistency across data warehousing solutions.
Collaborated with cross-functional teams to define data requirements, implement data models, and support business intelligence initiatives using Agile methodologies.
Managed version control with Git and orchestrated CI/CD pipelines using Jenkins for seamless deployment of data solutions.
Enhanced various Linux-based toolsets and jobs, improving operational efficiency and reducing manual intervention for data management tasks.
Performed in-depth analysis of data flows and identified bottlenecks, implementing solutions that significantly improved data warehouse performance.
Provided expert guidance on Oracle database tuning and optimization strategies, contributing to a 15% reduction in execution times for complex queries.
Technologies Used: Linux, Oracle Exadata, Shell Scripting, Python, Apache Airflow, PySpark, AWS (S3, EMR, Glue, Redshift), Snowflake, Git, Jenkins, Agile
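The data-quality and validation work described above could be sketched roughly as follows. This is an illustrative, stdlib-only sketch, not code from the role; the field names (`member_id`, `claim_date`, `amount`) and rules are hypothetical assumptions.

```python
# Minimal sketch of a row-level data-quality check on a batch of dict records.
# Field names and rules are hypothetical, not taken from any real dataset.

REQUIRED_FIELDS = ("member_id", "claim_date", "amount")

def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable issues found in one record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        issues.append("negative amount")
    return issues

def validate_batch(records: list[dict]) -> dict:
    """Split a batch into clean rows and rejected rows paired with reasons."""
    clean, rejected = [], []
    for rec in records:
        issues = validate_record(rec)
        if issues:
            rejected.append((rec, issues))
        else:
            clean.append(rec)
    return {"clean": clean, "rejected": rejected}
```

In a pipeline, rejected rows would typically be routed to a quarantine table with their reasons rather than silently dropped, so load jobs stay auditable.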
Data Engineer @ Morgan Stanley, New York, NY | Aug 2021 – Jul 2023
Designed and implemented robust data warehousing solutions leveraging Oracle Exadata for high-performance financial data management.
Developed and maintained complex Shell scripts to automate critical ETL processes, including data extraction, transformation, and loading into data warehouses.
Configured and managed Linux-based environments, ensuring optimal performance and security for data processing and storage systems.
Enhanced ETL/database load and extract processes, reducing data latency and improving overall efficiency for financial reporting systems.
Utilized practical knowledge of Unix file systems, including mount types and permissions, to manage and secure data storage effectively.
Developed scalable data pipelines using Spark-Scala for processing large volumes of structured and unstructured financial data within Azure.
Orchestrated automated data workflows using Apache Airflow with Python, scheduling daily loads and ensuring data availability.
Collaborated with database administrators to optimize Oracle database performance and fine-tune complex SQL queries for reporting.
Implemented data quality and validation routines to maintain integrity of critical financial datasets within the data warehouse.
Maintained version control systems with Git and supported CI/CD pipelines for deploying data engineering solutions.
Participated actively in Agile Scrum methodologies, contributing to sprint planning, daily stand-ups, and backlog refinement sessions.
Provided comprehensive documentation for implemented data solutions and Linux-based processes, facilitating knowledge transfer and system maintenance.
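The extract/load integrity checks described above could be sketched as below. This is a hypothetical, stdlib-only illustration, assuming each extract file ships with a sidecar `.sha256` checksum file; the sidecar convention and paths are assumptions, not artifacts of the role.

```python
# Sketch of an extract-file integrity check before loading into the warehouse.
# Assumes a vendor-supplied sidecar "<file>.sha256" containing the expected digest.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks so large extracts don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_extract(data_file: Path) -> bool:
    """Compare the computed digest with the sidecar value; False means reject."""
    sidecar = Path(str(data_file) + ".sha256")
    expected = sidecar.read_text().split()[0].strip()
    return sha256_of(data_file) == expected
```

A shell equivalent (`sha256sum -c`) does the same job; the Python form is easier to embed in an orchestration step that quarantines failed files.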
Technologies Used: Oracle Exadata, Shell Scripting, Linux, Unix, Spark-Scala, Python, Apache Airflow, Azure (ADF, ADLS, Synapse), Hadoop, Kafka, Git, Jenkins, Agile
Data Engineer @ Costco, Issaquah, WA | Nov 2019 – Jul 2021
Designed and developed robust ETL solutions using Informatica PowerCenter for data extraction and transformation from various source systems.
Optimized Oracle database queries and stored procedures, enhancing data load performance into the enterprise data warehouse.
Implemented Unix shell scripting for automating daily data loading tasks, file transfers, and system monitoring activities.
Performed data cleansing, validation, and transformation processes to ensure high data quality and integrity for business reporting.
Managed and maintained relational databases, including Oracle, supporting critical data warehousing operations and analytical requirements.
Collaborated with business analysts to gather requirements and translate them into efficient data models and ETL specifications.
Developed and maintained detailed technical documentation for ETL processes, data flows, and data warehouse schemas.
Utilized SQL extensively for complex data analysis, data manipulation, and report generation from the data warehouse.
Implemented data partitioning strategies and indexing techniques in Oracle to improve query performance significantly.
Supported data integration projects, ensuring seamless flow of data from operational systems to the data warehouse.
Participated in Agile development cycles, contributing to rapid iteration and delivery of data solutions.
Monitored and troubleshot ETL jobs and data loads, proactively resolving issues to maintain data availability and reliability.
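The partitioning idea mentioned above can be illustrated in miniature. This is a stdlib-only sketch of hash-partitioning rows by a key before parallel loading, mirroring the concept behind Oracle hash partitions; the bucket count and key field are arbitrary assumptions.

```python
# Illustrative sketch of hash-partitioning rows by key before parallel load.
# Mirrors the idea behind Oracle hash partitions; bucket count is arbitrary.
import hashlib

def partition_key(key: str, buckets: int = 4) -> int:
    """Stable bucket assignment: the same key always lands in the same partition."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def partition_rows(rows: list[dict], key_field: str, buckets: int = 4) -> dict:
    """Group rows into buckets so each partition can be loaded independently."""
    parts = {b: [] for b in range(buckets)}
    for row in rows:
        parts[partition_key(str(row[key_field]), buckets)].append(row)
    return parts
```

The stability property is what matters: because a given key always maps to the same partition, incremental loads and lookups stay consistent across runs.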
Technologies Used: Informatica PowerCenter, Oracle, SQL, Unix Shell Scripting, SSAS, Power BI, Git, Jenkins, Agile
TECHNICAL SKILLS:
Data Warehousing: Oracle Exadata, Snowflake, Azure Synapse Analytics, Redshift, Hive
Programming Languages: Python, Scala, SQL, PL/SQL, Perl
Scripting & OS: Shell Scripting, Linux, Unix, Bash
Databases: Oracle, PostgreSQL, NoSQL stores, Hadoop, Hive
ETL & Orchestration: Informatica PowerCenter, Azure Data Factory, AWS Glue, Apache Airflow, Spark
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, Lambda), Azure (ADF, ADLS, Synapse)
DevOps & Version Control: Git, Jenkins, Docker
Methodologies: Agile, Scrum
EDUCATION:
Master of Science in Computer Science @ University of Central Missouri