
Senior Data Engineer - Linux Data Warehouse Expert

Location:
Plano, TX, 75023
Salary:
110000
Posted:
April 30, 2026


Resume:

Praveen Yarragunta — Senior Data Engineer

469-***-**** ******************@*****.***

PROFESSIONAL SUMMARY

9+ years of experience as a Data Warehouse Engineer, with a strong emphasis on Linux-based infrastructure management and robust data solutions.

Demonstrated expertise in designing, implementing, and managing Linux-based processes for high-performance data warehousing environments.

Highly skilled in Shell scripting and Python development, building advanced automation for data extraction, loading, and system improvements.

Hands-on experience with relational databases, particularly Oracle Exadata, optimizing complex queries and database interactions.

Adept at identifying and implementing system and architecture improvements to enhance data flow efficiency and reduce processing bottlenecks.

Proficient in enhancing ETL and database load/extract processes, ensuring data integrity and timely delivery for analytical insights.

Skilled in Unix file systems, including mount types, permissions, and standard tools, to maintain secure and organized data infrastructure.

Experienced with orchestration tools such as Apache Airflow with Python, streamlining complex data workflows for reliable execution.

Practices Agile methodology with a passion for automation, consistently driving continuous process improvement in data warehousing operations.

EDUCATION

Master of Science in Information Systems @ Stratford University

TECHNICAL SKILLS

Programming & Scripting: Python, Shell Scripting, Perl, SQL, Scala

Databases & Data Warehousing: Oracle Exadata, Oracle, PostgreSQL, MySQL, Amazon Redshift, Snowflake, MS SQL SERVER, Delta Lake, Hive

ETL & Orchestration: Informatica PowerCenter, Talend, Azure Data Factory, AWS Glue, Apache Airflow

Operating Systems & Tools: Linux, Unix, Docker, Jenkins, Git, JIRA, Confluence

Cloud Platforms: AWS (S3, EMR, Glue, Lambda), Azure (ADLS, ADF, Synapse)

Data Formats & Visualization: Parquet, Avro, ORC, Delta, JSON, CSV, Tableau, Power BI

Methodologies: Agile, Scrum

WORK EXPERIENCE

Senior Data Engineer @ Empower Dallas, TX Jan 2022 – Present

Managed and optimized Linux-based data warehousing infrastructure, enhancing performance of critical ETL processes through shell scripting.

Developed robust Shell scripts and Python utilities to automate database load and extract operations, integrating seamlessly with Oracle Exadata systems.

Designed and implemented scalable data pipelines using Informatica PowerCenter for complex data transformations and movements within Linux environments.

Orchestrated intricate data workflows with Apache Airflow using Python, ensuring timely and efficient data availability for reporting and analytics.

Identified and executed system architecture improvements within the data warehouse, enhancing data flow efficiency and reducing processing bottlenecks by 25%.

Monitored and managed Unix file systems, configuring mount types and permissions to ensure secure and efficient data storage for critical warehouse components.

Optimized SQL queries and database interactions with Oracle databases, significantly improving the performance of data load processes.

Collaborated with cross-functional teams, applying Agile methodologies to deliver high-quality data warehouse solutions and process enhancements.

Technologies Used: Linux, Shell Scripting, Oracle Exadata, Informatica PowerCenter, Apache Airflow (Python), AWS (Redshift, S3), Docker, Jenkins, Agile
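The load/extract automation described above can be sketched in Python. This is an illustrative example, not the candidate's actual code: it renders a SQL*Loader control file for an append-mode CSV load into Oracle, with table, file, and column names chosen purely for demonstration.

```python
def sqlldr_control_file(table, datafile, columns, delimiter=","):
    """Render a SQL*Loader control file for an append-mode CSV load.

    `table`, `datafile`, and `columns` are illustrative inputs; a real
    deployment would derive them from the warehouse's table metadata.
    """
    col_clause = ",\n  ".join(columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{datafile}'\n"
        "APPEND\n"
        f"INTO TABLE {table}\n"
        f"FIELDS TERMINATED BY '{delimiter}' OPTIONALLY ENCLOSED BY '\"'\n"
        f"(\n  {col_clause}\n)\n"
    )

# Hypothetical fact table and staging file for demonstration.
ctl = sqlldr_control_file(
    "sales_fact",
    "/staging/sales_20240101.csv",
    ["order_id", 'order_date DATE "YYYY-MM-DD"', "amount"],
)
print(ctl)
```

A wrapper shell script would typically write this control file and then invoke `sqlldr` against it as one step of the nightly load.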

Data Engineer @ MassMutual Life Insurance Springfield, MA May 2020 – Jan 2022

Implemented and managed Linux-based processes for data warehousing, streamlining ETL workflows and improving data ingestion rates.

Developed advanced Shell scripts and Python programs to automate data extraction from Oracle and SQL Server databases into Azure Data Lake Storage.

Utilized Informatica PowerCenter for designing and enhancing complex data transformations, ensuring data integrity and consistency within the data warehouse.

Configured and maintained Unix file systems, applying appropriate permissions and standard tools to secure sensitive data assets.

Enhanced ETL and database load processes using Azure Data Factory, integrating data from diverse sources into Azure Synapse Analytics.

Applied Agile methodologies in project development, consistently delivering robust data solutions and process improvements.

Optimized large-scale data processing on Azure Databricks with PySpark, focusing on improving query performance and data warehouse efficiency.

Collaborated with data architects to identify and implement system improvements, enhancing overall data pipeline reliability and scalability.

Technologies Used: Linux, Shell Scripting, Oracle, Informatica PowerCenter, Python, Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Agile
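The chunked-extraction pattern behind the Oracle/SQL Server-to-ADLS bullets above can be sketched with the standard DB-API `fetchmany` call, which streams a large result set in fixed-size batches instead of holding it in memory. This sketch uses the stdlib `sqlite3` module purely as a stand-in for the production databases; the table and column names are invented for the demo.

```python
import csv
import io
import sqlite3

def extract_in_chunks(cursor, query, chunk_size=1000):
    """Yield query results in fixed-size batches via DB-API fetchmany,
    so large extracts never materialize the full result set in memory."""
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows

# Demo: sqlite3 standing in for Oracle / SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (id INTEGER, premium REAL)")
conn.executemany("INSERT INTO policies VALUES (?, ?)",
                 [(i, 100.0 + i) for i in range(10)])

buf = io.StringIO()
writer = csv.writer(buf)
for batch in extract_in_chunks(conn.cursor(),
                               "SELECT id, premium FROM policies",
                               chunk_size=4):
    writer.writerows(batch)  # in production: append to a file bound for ADLS

n_lines = buf.getvalue().count("\n")
print(n_lines)  # one CSV line per extracted row
```

The same loop works unchanged against any DB-API-compliant driver (e.g. cx_Oracle or pyodbc), since `fetchmany` is part of the standard cursor interface.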

Data Engineer @ SIRVA Worldwide Chicago, Illinois Oct 2018 – May 2020

Developed and optimized batch data pipelines using Spark (Scala) for large-scale data processing within the Hadoop ecosystem.

Designed and managed Hive tables to improve query performance and support analytical reporting requirements for various business units.

Utilized Sqoop for efficient data ingestion from various relational databases into Hadoop Distributed File System, streamlining data flow.

Implemented complex data transformation and aggregation logic, ensuring data quality and consistency across various data sources.

Built robust workflows using the Oozie scheduler to automate and monitor complex data processing jobs, improving reliability.

Ensured data validation and consistency across pipelines, contributing to reliable data for critical business intelligence initiatives.

Contributed to the development of Tableau dashboards, providing valuable insights from processed data for business users to drive decisions.

Maintained detailed documentation for data pipelines and processes, facilitating knowledge transfer and system maintenance activities.
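The Sqoop ingestion work described above boils down to assembling a parameterized `sqoop import` command per source table. A minimal Python sketch, with a placeholder JDBC URL, table, and HDFS path (a real job would also pass credentials, e.g. via --password-file):

```python
def sqoop_import_args(jdbc_url, table, target_dir, split_by, num_mappers=4):
    """Assemble the argument list for a Sqoop import from a relational
    database into HDFS, split across parallel mappers on a key column."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--split-by", split_by,
        "--num-mappers", str(num_mappers),
    ]

# Placeholder connection details for illustration only.
args = sqoop_import_args(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "ORDERS",
    "/warehouse/staging/orders",
    "ORDER_ID",
)
print(" ".join(args))
```

Generating the argument list in code (rather than hand-editing shell scripts per table) makes it easy to drive ingestion for many tables from a single metadata file.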

Technologies Used: Hadoop, Hive, Spark (Scala), Sqoop, Oozie, Tableau, Oracle, MySQL

Data Engineer @ Seacoast National Bank FL, USA Jan 2018 – Oct 2018

Developed and maintained robust ETL pipelines using Talend to integrate critical financial data from disparate sources efficiently.

Designed and managed data warehouse schemas within Oracle databases to support various reporting and analytical needs.

Extracted and transformed complex datasets from Oracle systems, ensuring data accuracy and compliance with stringent business rules.

Wrote and optimized complex SQL queries for data manipulation and performance tuning across relational databases, enhancing query speeds.

Implemented data validation and reconciliation processes to ensure the integrity and reliability of all processed data.

Automated various data engineering tasks using Unix shell scripting, improving operational efficiency and reducing manual effort by 20%.

Collaborated on data modeling efforts, contributing to the development of efficient and scalable data solutions for the bank's needs.

Utilized Git for version control, ensuring collaborative code development and efficient change management within project timelines.
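The validation and reconciliation bullet above typically means checking row counts and per-key amount totals between a source extract and the loaded target. A small illustrative sketch, with field names invented for the example:

```python
def reconcile(source_rows, target_rows, key, amount_field):
    """Compare row counts and per-key amount totals between a source
    extract and the loaded target; return a list of discrepancy messages."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append(
            f"row count mismatch: {len(source_rows)} vs {len(target_rows)}"
        )

    def totals(rows):
        out = {}
        for r in rows:
            out[r[key]] = out.get(r[key], 0) + r[amount_field]
        return out

    src, tgt = totals(source_rows), totals(target_rows)
    for k in sorted(set(src) | set(tgt)):
        if src.get(k) != tgt.get(k):
            issues.append(
                f"amount mismatch for {k}: {src.get(k)} vs {tgt.get(k)}"
            )
    return issues

# Hypothetical account-level extract vs. loaded target.
source = [{"acct": "A1", "amt": 100}, {"acct": "A2", "amt": 50}]
target = [{"acct": "A1", "amt": 100}, {"acct": "A2", "amt": 45}]
issues = reconcile(source, target, "acct", "amt")
print(issues)  # ['amount mismatch for A2: 50 vs 45']
```

In a production pipeline the discrepancy list would be logged and, if non-empty, fail the load so bad data never reaches downstream reports.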

Technologies Used: Talend, Oracle, SQL, Unix Shell Scripting, Git, Data Warehousing

Junior Data Engineer @ Genpact India Aug 2014 – Oct 2015

Developed foundational ETL workflows using Talend, reliably extracting and loading data from various source systems.

Extracted and transformed data from multiple source systems, ensuring data quality and consistency for downstream applications.

Loaded processed data into Oracle and MySQL databases, supporting operational reporting and analytical requirements accurately.

Built and maintained ETL jobs and workflows, contributing to the overall data integration framework's stability and performance.

Performed data validation and cleansing operations to ensure high data accuracy and reliability for critical business processes.

Assisted in the optimization of SQL queries to improve database performance and data retrieval times for various reports.

Collaborated with senior engineers on data mapping and schema design for new data integration projects, learning best practices.

Gained foundational experience in data warehousing concepts and principles through practical application in a production environment.

Technologies Used: Talend, Oracle, MySQL, SQL
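The validation and cleansing work above can be sketched as a small normalization pass: trim whitespace, map common null markers to a real null, and drop exact duplicate rows. The marker set and sample rows are illustrative, not taken from any actual source system:

```python
NULL_MARKERS = {"", "NULL", "N/A", "n/a"}  # illustrative marker set

def norm_cell(cell):
    """Trim a cell and map common null markers to None."""
    if cell is None:
        return None
    v = str(cell).strip()
    return None if v in NULL_MARKERS else v

def cleanse(rows):
    """Normalize every cell and drop exact duplicate rows,
    preserving first-seen order."""
    seen = set()
    cleaned = []
    for row in rows:
        norm = tuple(norm_cell(cell) for cell in row)
        if norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned

raw = [(" 42 ", "N/A"), ("42", None), ("42", "7")]
print(cleanse(raw))  # [('42', None), ('42', '7')]
```

Running this before the Oracle/MySQL load keeps duplicate and pseudo-null records out of the warehouse, which simplifies downstream reporting logic.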
