Sudeepthi Rongali — Senior Data Engineer
716-***-**** *************@*****.***
PROFESSIONAL SUMMARY:
Highly experienced Data Warehouse Engineer with approximately 5 years of proven expertise in Linux-based data processing and infrastructure management.
Skilled in implementing, configuring, and managing robust Linux environments to support complex data warehousing solutions and ETL processes.
Adept at developing sophisticated shell scripts to automate database load/extract processes and drive system architecture improvements.
Demonstrates strong proficiency in Oracle development, including performance tuning and advanced SQL queries, particularly on Oracle Exadata.
Expertise in designing and enhancing various Linux-based toolsets, scripts, jobs, and processes for optimal data flow and integrity.
Hands-on experience with Unix file systems, encompassing mount types, permissions, standard tools, and pipe operations.
Proficient in Python programming for building scalable data pipelines and developing orchestration workflows with Apache Airflow.
Extensive experience with relational databases and ETL tools like Informatica, crucial for modern data warehousing initiatives.
Committed to identifying and implementing continuous system and architecture improvements with a strong passion for automation.
Familiar with cloud platforms such as AWS and Azure, leveraging services for hybrid data warehousing and Big Data analytics.
Experienced in Agile methodologies, driving efficient project delivery from requirement gathering through deployment and optimization.
Strong collaborator with excellent communication skills, capable of translating complex data requirements into effective technical solutions.
WORK EXPERIENCE:
Senior Data Engineer @ Molina Healthcare Long Beach, CA Sep 2025 – Present
Implemented and managed robust Linux-based processes and infrastructure, ensuring high availability and performance for critical data warehousing operations on AWS.
Developed sophisticated shell scripts to automate complex ETL/database load and extract processes, significantly reducing manual effort.
Optimized Oracle Exadata database performance and developed stored procedures for efficient data transformations within a large-scale data warehouse environment.
Architected and deployed scalable data lake solutions on AWS S3, leveraging Linux commands for data governance and directory management.
Enhanced existing ETL pipelines using PySpark on EMR, focusing on system improvements for data ingestion from various Linux file systems.
Administered and configured Linux servers supporting data warehouse applications, ensuring proper file system permissions and resource allocation.
Designed and implemented comprehensive data quality checks and validation frameworks using shell scripting for improved data integrity within the data warehouse.
Integrated Oracle Exadata functionalities for advanced analytical querying and rapid data processing, reducing reporting latency and improving efficiency.
Utilized Apache Airflow with Python to orchestrate complex data pipelines, streamlining the end-to-end data flow from source to target systems.
Contributed to identifying and implementing system architecture improvements, specifically enhancing Linux-based toolsets for data movement and transformation.
Ensured HIPAA-compliant data security by applying robust IAM policies and encryption, managed through Linux-based security protocols.
Collaborated closely with cross-functional teams using an Agile methodology, delivering continuous enhancements to data warehousing solutions.
Technologies Used: AWS (S3, EMR, Glue, Redshift, RDS, Lambda), Oracle Exadata, PySpark, Apache Airflow, Shell Scripting, Linux, Python, Tableau, GitHub, Jenkins
Data Engineer @ Morgan Stanley New York, NY Apr 2022 – Jul 2024
Designed and implemented a scalable Azure-based data platform, integrating Linux-based data sources and ensuring efficient data flow for financial data warehousing.
Developed and enhanced various Linux-based shell scripts to automate daily operational tasks, data validation, and file system management within the Azure ecosystem.
Optimized ETL/database load processes using Azure Data Factory, connecting to diverse relational databases, including Oracle, for seamless data ingestion.
Utilized PySpark in Azure Databricks to process large datasets, integrating custom Linux utilities for data manipulation and transformation.
Implemented robust system architecture improvements for data ingestion pipelines, enhancing performance and reliability of financial data warehousing solutions.
Managed Unix file systems within cloud environments, applying practical knowledge of mount types, permissions, and standard tools for data security.
Developed metadata-driven ETL frameworks, incorporating Python and shell scripting for dynamic process orchestration and error handling.
Integrated Snowflake for analytical workloads, designing data flows that leveraged Linux command-line tools for efficient data preparation and transfer.
Performed advanced SQL query optimization and database development, specifically with Oracle databases, to support enterprise-level reporting and analytics.
Implemented CI/CD pipelines using Azure DevOps, automating the deployment of Linux-based scripts and data processing applications.
Containerized Spark jobs using Docker and Kubernetes, ensuring efficient resource utilization on Linux virtual machines.
Participated in Agile ceremonies, contributing to the design and development of data warehousing solutions while maintaining comprehensive documentation.
Enhanced ETL/database load/extract processes, demonstrating a keen understanding of data warehousing principles and automation.
Technologies Used: Azure (ADLS, ADF, Synapse, Azure SQL DB), Databricks, PySpark, Snowflake, SQL Server, Oracle, Shell Scripting, Linux, Python, Docker, Kubernetes, GitLab
Junior Data Engineer @ Sam's Club Bentonville, AR Nov 2019 – Mar 2022
Designed and developed robust ETL workflows using Informatica PowerCenter, significantly enhancing data load/extract processes for retail sales data.
Developed complex SQL and PL/SQL procedures in Oracle, optimizing data transformations and ensuring high data integrity within the data warehouse.
Implemented batch processing workflows on Hadoop, utilizing Linux commands and shell scripts for efficient large-scale data analysis and file system management.
Ingested CSV and JSON files from various upstream systems into HDFS, leveraging Unix file system knowledge for permissions and data organization.
Developed Spark jobs for data cleansing and enrichment, integrating custom Linux-based utilities for improved data quality within the data warehousing solution.
Built data warehouse solutions using star schema modeling techniques, facilitating advanced analytics and reporting through SSAS cubes.
Implemented comprehensive data validation and reconciliation processes using shell scripting, ensuring accuracy across diverse retail datasets.
Managed version control using GitHub and automated deployments using Jenkins, integrating Linux-based scripting for CI/CD pipelines.
Provided production support and optimized long-running SQL queries in Oracle, continually seeking system architecture improvements for data pipelines.
Contributed to requirement gathering and design discussions with business stakeholders, applying Agile methodology for efficient project delivery.
Ensured strict data access control using role-based security in Oracle, aligning with data governance principles for the data warehouse.
Enhanced existing ETL tools and processes, demonstrating a passion for automation and continual process improvement within the data engineering team.
Technologies Used: Informatica PowerCenter, Oracle, Hadoop, Hive, Spark, SSAS, Shell Scripting, Linux, SQL, Power BI, GitHub, Jenkins
TECHNICAL SKILLS:
Programming Languages: Python, SQL, PL/SQL, Perl
Operating Systems & Scripting: Linux, Unix, Shell Scripting (Bash, KornShell), Command Line Interface
Cloud & Data Platforms: AWS (S3, EMR, Glue, Redshift, RDS, Lambda), Azure (ADLS, ADF, Synapse, Azure SQL DB), Hadoop, Databricks, Snowflake
Database Management: Oracle (Exadata), MySQL, PostgreSQL, SQL Server, DynamoDB, Hive
ETL & Orchestration: Informatica PowerCenter, Apache Airflow, AWS Glue, Azure Data Factory
Version Control & DevOps: Git (GitHub, GitLab), Jenkins, Docker, Kubernetes, Azure DevOps
Data Warehousing & BI: Data Modeling (Star/Snowflake Schema), SSAS, Tableau, Power BI EDUCATION:
Master of Science in Data Science @ University at Buffalo