
Senior Data Engineer - Linux-based Data Warehousing Expert

Location:
Preston Bend, TX, 75024
Salary:
110000
Posted:
April 30, 2026

Contact this candidate

Resume:

Venkata Sai Kumar — Senior Data Engineer

404-***-**** **************@*****.***

PROFESSIONAL SUMMARY:

Around 5 years of experience as a Data Engineer, specializing in building and optimizing Linux-based data warehousing infrastructure.

Expertise in implementing, configuring, and managing scalable data processes with a strong focus on system and architecture improvements.

Proficient in Shell Scripting and Python for automating ETL workflows, enhancing data pipelines, and managing Linux environments effectively.

Extensive hands-on experience with Oracle development, including Exadata, as well as other relational databases, for robust data management solutions.

Skilled in enhancing ETL and database load/extract processes, ensuring efficient data movement and high-performance data operations across platforms.

Proven ability to work with various Linux-based toolsets, scripts, and jobs to streamline operations and improve data processing efficiency.

Strong understanding of Agile methodology, promoting iterative development and continuous improvement in data engineering projects.

Passionate about automation and continually improving data processes, leveraging advanced scripting and orchestration tools like Airflow with Python.

Experienced in integrating ETL tools such as Informatica within complex data warehousing environments for comprehensive data solutions and analytics.

TECHNICAL SKILLS:

Programming Languages: Python, Perl, Scala

Scripting: Shell Scripting, Unix Scripting, SQL, PL/SQL

Data Warehousing: Oracle Exadata, Snowflake, Azure Synapse Analytics, AWS Redshift, Hive

Databases: Oracle, PostgreSQL, MySQL, MS SQL Server, DynamoDB

ETL Tools: Informatica PowerCenter, AWS Glue, Azure Data Factory, Apache Spark

Orchestration: Apache Airflow, Jenkins

Cloud Platforms: AWS (S3, Glue, EMR, Redshift), Azure (ADLS, ADF, Synapse, Databricks)

Version Control: Git, GitHub

Methodologies: Agile, SDLC

WORK EXPERIENCE:

Senior Data Engineer @ SMBC — Jersey City, NJ Jan 2026 – Present

Implemented and managed advanced Linux-based processes and infrastructure for enterprise data warehousing solutions, ensuring optimal performance.

Designed and deployed robust shell scripts for automating data ingestion, transformation, and database load/extract operations efficiently.

Configured and optimized Oracle database environments, including performance tuning for critical data warehousing workloads and applications.

Identified and implemented significant system and architecture improvements across data platforms, enhancing overall efficiency and scalability.

Developed complex ETL pipelines using Python and PySpark on AWS EMR, integrating with Oracle for large-scale data processing requirements.

Enhanced various Linux-based toolsets and jobs, integrating with Apache Airflow to orchestrate complex data workflows seamlessly.

Managed secure data access and compliance within AWS S3 and Redshift, applying robust security protocols and best practices for data integrity.

Automated operational tasks and monitoring processes using shell scripts, significantly reducing manual intervention and improving system reliability.

Technologies Used: AWS (S3, EMR, Redshift, Glue), Linux, Shell Scripting, Oracle, PySpark, Python, Snowflake, Airflow, Jenkins, Git
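The database load/extract automation described above can be sketched in Python. This is an illustrative example, not the candidate's actual code: sqlite3 stands in for Oracle, and the table and column names (staging_orders, dw_orders) are hypothetical. The pattern shown is watermark-based incremental extraction, a common way to make repeated loads idempotent.

```python
import sqlite3

def incremental_extract(conn, watermark):
    """Extract only rows newer than the last processed id (the watermark)."""
    rows = conn.execute(
        "SELECT id, amount FROM staging_orders WHERE id > ? ORDER BY id",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else watermark
    return rows, new_watermark

def load(conn, rows):
    """Load extracted rows into the warehouse table; doubling amount is a sample transform."""
    conn.executemany(
        "INSERT INTO dw_orders (id, amount) VALUES (?, ?)",
        [(r[0], r[1] * 2) for r in rows],
    )
    conn.commit()

# Demo on an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE dw_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO staging_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

rows, wm = incremental_extract(conn, watermark=0)
load(conn, rows)
print(wm)  # 2
```

Persisting the watermark between runs (e.g. in a control table) is what lets the job re-run safely without reprocessing rows.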

Data Engineer @ Epic Systems — Verona, WI Jul 2024 – Dec 2025

Established and maintained Linux environment setups for data processing clusters within the Azure cloud ecosystem, ensuring stable operations.

Developed and enhanced critical shell scripts to manage data extraction, loading, and transformation processes for sensitive healthcare data.

Designed and implemented data pipelines using Azure Data Factory and Databricks, focusing on Oracle database integrations for data warehousing.

Utilized Python for developing custom data processing modules and automating various ETL workflows on the Azure platform.

Contributed to system and architecture improvements by optimizing data flow and enhancing existing Linux-based processes for efficiency.

Managed and optimized large datasets within Azure Data Lake and Synapse Analytics, ensuring efficient querying and data integrity.

Implemented robust data quality checks and validation frameworks using PySpark and SQL within the Azure environment for accuracy.

Collaborated with cross-functional teams to integrate new data sources and improve existing ETL/database load processes.

Technologies Used: Azure (ADLS, ADF, Synapse, Databricks), Linux, Shell Scripting, Python, PySpark, Oracle, SQL Server, Power BI, GitHub
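A data quality validation framework of the kind described above can be sketched as rule functions applied per record. This is an illustrative stand-in for the PySpark/SQL version; the record fields and rule names are hypothetical.

```python
# Minimal rule-based data quality checker: each rule is a (name, predicate) pair
# applied to every record; failures are collected into a report.

def run_checks(records, rules):
    failures = []
    for i, rec in enumerate(records):
        for name, predicate in rules:
            if not predicate(rec):
                failures.append((i, name))
    return failures

# Hypothetical patient-encounter records and rules
records = [
    {"patient_id": "P1", "charge": 120.0},
    {"patient_id": "",   "charge": 80.0},
    {"patient_id": "P3", "charge": -5.0},
]
rules = [
    ("patient_id_present", lambda r: bool(r["patient_id"])),
    ("charge_non_negative", lambda r: r["charge"] >= 0),
]

report = run_checks(records, rules)
print(report)  # [(1, 'patient_id_present'), (2, 'charge_non_negative')]
```

In a Spark setting the same predicates would typically become filter or withColumn expressions so the checks run distributed rather than row-by-row in the driver.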

Data Engineer @ Allstate — Chicago, IL Mar 2021 – Aug 2023

Developed and maintained robust ETL workflows using AWS Glue and Python for processing large-scale insurance datasets accurately.

Orchestrated complex data pipelines and job scheduling using Apache Airflow with Python, ensuring timely data delivery for reporting.

Processed and transformed diverse datasets using Apache Spark on AWS EMR clusters, handling structured and unstructured data formats.

Designed and optimized Hive tables for data warehousing, significantly improving query performance and reporting capabilities.

Implemented data aggregation pipelines to support business reporting, utilizing Kafka streams for real-time data ingestion.

Developed data validation frameworks, ensuring data quality and integrity for critical business applications and analytics.

Worked on data migration initiatives from legacy systems to the AWS cloud, ensuring seamless transition and data fidelity.

Contributed to CI/CD practices using Jenkins, integrating automated testing and deployment for data engineering solutions.
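The aggregation pipelines mentioned above can be sketched in plain Python as a group-by rollup; this is an illustrative stand-in for the Spark/Kafka version, and the claim records and field names are hypothetical.

```python
from collections import defaultdict

def aggregate_claims(claims):
    """Group claim amounts by policy type and sum them, as a reporting rollup."""
    totals = defaultdict(float)
    for claim in claims:
        totals[claim["policy_type"]] += claim["amount"]
    return dict(totals)

claims = [
    {"policy_type": "auto", "amount": 1200.0},
    {"policy_type": "home", "amount": 3400.0},
    {"policy_type": "auto", "amount": 800.0},
]
print(aggregate_claims(claims))  # {'auto': 2000.0, 'home': 3400.0}
```

In Spark the same shape is a groupBy on the key column followed by a sum aggregation, with Kafka feeding the records in micro-batches rather than as a list.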

Technologies Used: AWS (Glue, S3, EMR, DynamoDB), Spark, Kafka, Hive, Airflow, Python, Tableau, Jenkins

Data Engineer @ Target Corporation — Minneapolis, MN Dec 2019 – Feb 2021

Developed comprehensive ETL processes using Informatica PowerCenter for extracting and transforming data from diverse sources efficiently.

Extracted, transformed, and loaded data from Oracle databases and flat files into the data warehouse using advanced ETL techniques.

Designed and implemented data models and mappings for the enterprise data warehouse, ensuring data consistency and accuracy.

Wrote complex SQL queries and PL/SQL procedures for data manipulation, validation, and reporting purposes in Oracle.

Managed and optimized Unix environments, utilizing shell scripting for batch processing and automated job execution.

Performed rigorous data validation and reconciliation activities to maintain high data quality and integrity across systems.

Collaborated with database administrators to optimize Oracle database performance and ensure efficient data operations.

Maintained comprehensive documentation for ETL processes and data flows, adhering to SDLC methodologies and best practices.
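A source-to-target reconciliation of the kind described above can be sketched as comparing row counts and a column checksum between the two sides. This is illustrative only: sqlite3 stands in for Oracle, and the table names (src_sales, dw_sales) are hypothetical. Table names are interpolated directly, so they must come from trusted configuration, not user input.

```python
import sqlite3

def reconcile(conn, source_table, target_table, sum_col):
    """Compare row count and column sum between a source and a target table."""
    def stats(table):
        # COUNT(*) catches dropped rows; SUM catches value-level drift.
        return conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({sum_col}), 0) FROM {table}"
        ).fetchone()
    src, tgt = stats(source_table), stats(target_table)
    return {"match": src == tgt, "source": src, "target": tgt}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_sales (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE dw_sales (id INTEGER, amount REAL)")
rows = [(1, 10.0), (2, 25.5)]
conn.executemany("INSERT INTO src_sales VALUES (?, ?)", rows)
conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", rows)

result = reconcile(conn, "src_sales", "dw_sales", "amount")
print(result["match"])  # True
```

Count-plus-checksum is a cheap first pass; mismatches would then be drilled into with key-level set differences.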

Technologies Used: Informatica PowerCenter, Oracle, SQL, PL/SQL, Unix, Shell Scripting, GitHub

EDUCATION:

Master of Science in Computer Science @ New Jersey Institute of Technology


