
Senior Data Engineer - Linux & ETL Automation

Location:
Plano, TX, 75025
Posted:
April 30, 2026


Rajinikanth Boini — Senior Data Engineer

203-***-**** ******.***@*****.***

PROFESSIONAL SUMMARY:

A seasoned Data Warehouse Engineer with around 5 years of experience, specializing in Linux-based infrastructure and robust data warehousing solutions.

Expertly implements, configures, and manages Linux-based processes, ensuring optimal performance for complex data pipelines and ETL operations.

Proficient in designing and enhancing ETL/database load/extract processes, with a strong focus on Shell Scripting and Oracle development.

Demonstrates practical working experience in Linux environment setup, including Unix file systems, permissions, and standard scripting tools.

Skilled in Python, with working knowledge of Perl; applies both languages to automate and enhance data warehouse operations.

Extensive practical experience with relational databases, particularly Oracle Exadata, for high-performance data storage and retrieval.

Passionate about automation and continuous process improvement, consistently identifying and implementing system and architecture enhancements.

Experienced with orchestration tools such as Apache Airflow (with Python) and ETL tools such as Informatica, streamlining data flows.

Adept at working within Agile methodologies, contributing to efficient project delivery and collaborative data engineering development environments.

EDUCATION:

Master of Science in Data Science @ University of New Haven

WORK EXPERIENCE:

Senior Data Engineer @ Molina Healthcare, Long Beach, CA | Apr 2025 – Present

Designed and implemented robust Linux-based infrastructure and processes for a scalable healthcare data warehousing system.

Enhanced critical ETL and database load/extract processes using advanced Shell Scripting and Python for optimized data ingestion and transformation.

Administered and managed Oracle Exadata databases, performing advanced SQL tuning and optimization for high-performance data retrieval.

Developed scalable data pipelines on Linux platforms, integrating with AWS S3 and Glue, to process diverse healthcare datasets efficiently.
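
For illustration only, a minimal Python sketch of this kind of S3-and-Glue integration; the bucket, key, and job names are hypothetical, not the actual pipeline:

    import boto3

    s3 = boto3.client("s3")
    glue = boto3.client("glue")

    # Land a curated extract in S3 (bucket and key are placeholders).
    s3.upload_file("claims_extract.csv", "example-healthcare-bucket",
                   "landing/claims/claims_extract.csv")

    # Trigger a Glue ETL job against the new file and check its state.
    run = glue.start_job_run(JobName="example-claims-transform")
    state = glue.get_job_run(JobName="example-claims-transform",
                             RunId=run["JobRunId"])
    print(state["JobRun"]["JobRunState"])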

Implemented automated workflows using Apache Airflow with Python, orchestrating complex data transformations and report generation on Linux servers.
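
A minimal sketch of such an Airflow DAG, assuming a daily schedule; the DAG ID and the transform callable are illustrative:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def transform_claims():
        # Placeholder for the actual transformation logic.
        print("transforming daily claims extract")

    with DAG(
        dag_id="example_claims_pipeline",   # hypothetical DAG name
        start_date=datetime(2025, 4, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="transform_claims",
            python_callable=transform_claims,
        )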

Enhanced various Linux-based toolsets, scripts, and cron jobs, improving data processing efficiency and reducing manual intervention by 30%.

Ensured secure data handling and compliance with healthcare regulations like HIPAA within the Linux environment, maintaining data integrity.

Applied practical experience with Unix file systems, including mount types and permissions, to manage large-scale data storage for analytics.

Collaborated with data scientists and analysts to design and optimize data models, supporting advanced analytics and reporting requirements.

Implemented data validation frameworks within Linux scripts to ensure data quality and consistency across all data warehousing layers.
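
As a sketch of what such a validation layer might look like in Python (the file and column names are assumptions, not the actual framework):

    import csv
    import sys

    REQUIRED = {"member_id", "claim_id", "service_date"}  # assumed columns

    def validate(path):
        failures = 0
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh)
            missing = REQUIRED - set(reader.fieldnames or [])
            if missing:
                sys.exit(f"missing columns: {missing}")
            for row in reader:
                if not all(row[col] for col in REQUIRED):
                    failures += 1
        print(f"{failures} rows failed validation")

    validate("claims_extract.csv")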

Provided expert production support for Linux-based data processes, resolving issues swiftly to maintain continuous data availability and reliability.

Contributed to an Agile development environment, consistently identifying and implementing system and architecture improvements for data warehousing.

Developed and maintained data pipelines using Informatica for large-scale data integration, streamlining critical healthcare data flows.

Technologies Used: Linux, Oracle Exadata, Shell Scripting, Python, Apache Airflow, SQL, AWS S3, AWS Glue, ETL, Informatica

Data Engineer @ Cisco, San Jose, CA | Aug 2023 – Jul 2024

Implemented and managed Linux-based data processing environments for large-scale network telemetry and log data within Azure infrastructure.

Developed and enhanced complex Shell Scripts for automated data extraction, transformation, and loading into Azure Data Lake Storage.

Designed and optimized ETL processes involving Oracle databases, migrating and synchronizing critical on-premise data with Azure Synapse Analytics.

Applied working knowledge of Unix file systems, permissions, and standard tools to manage data pipelines on Linux VMs in Azure.

Built and deployed Spark jobs within Azure Databricks, leveraging Python and Scala for efficient data processing on Linux clusters.

Automated data integration tasks from various sources to Azure using a combination of Azure Data Factory and robust Shell Scripting.

Configured and maintained Linux-based tools and applications, ensuring seamless operation of data warehousing components and services.

Implemented CI/CD pipelines using Jenkins for automated deployments of Linux scripts and data solutions, enhancing development agility.

Performed advanced performance tuning on ETL and database load processes, significantly improving data ingestion rates for critical systems.

Collaborated on system and architecture improvements, redesigning Linux-based processes to enhance scalability and reliability of data platforms.

Ensured data quality and consistency by developing comprehensive validation routines using Python and Shell Scripting within the Linux ecosystem.

Worked within an Agile methodology, actively contributing to sprint planning and delivering incremental enhancements to the data warehouse.

Technologies Used: Linux, Oracle, Shell Scripting, Python, Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Jenkins, SQL

Data Engineer @ American Express, New York, NY | Aug 2021 – Jul 2023

Built scalable ETL pipelines using Google Cloud Dataflow, ensuring efficient processing of high-volume financial transaction data.
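
For context, a minimal Apache Beam pipeline of the kind Dataflow executes; the bucket paths and the amount column position are assumptions:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions()) as pipeline:
        (pipeline
         | "Read" >> beam.io.ReadFromText("gs://example-bucket/transactions.csv")
         | "Parse" >> beam.Map(lambda line: line.split(","))
         # Field index 2 is assumed to hold the transaction amount.
         | "FilterLarge" >> beam.Filter(lambda tx: float(tx[2]) > 10000.0)
         | "Write" >> beam.io.WriteToText("gs://example-bucket/large_transactions"))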

Implemented real-time streaming data ingestion pipelines using Pub/Sub for immediate availability of critical financial information.
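
A streaming-pull sketch using the google-cloud-pubsub client; the project and subscription IDs are placeholders:

    from concurrent.futures import TimeoutError
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path("example-project", "transactions-sub")

    def handle(message):
        print(message.data)  # hand the payload to the ingestion layer here
        message.ack()

    future = subscriber.subscribe(path, callback=handle)
    try:
        future.result(timeout=60)  # stream messages for up to a minute
    except TimeoutError:
        future.cancel()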

Designed and optimized BigQuery data warehouses for financial analytics, enhancing query performance through partitioning and clustering.
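
DDL of the kind used for a partitioned, clustered warehouse table, issued here through the BigQuery Python client; the dataset, table, and column names are assumptions:

    from google.cloud import bigquery

    client = bigquery.Client()
    client.query("""
        CREATE TABLE IF NOT EXISTS analytics.card_transactions (
          txn_id  STRING,
          card_id STRING,
          amount  NUMERIC,
          txn_ts  TIMESTAMP
        )
        PARTITION BY DATE(txn_ts)   -- prune scans by day
        CLUSTER BY card_id          -- co-locate rows for per-card queries
    """).result()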

Developed sophisticated SQL-based transformations for credit card fraud detection datasets, ensuring data accuracy and consistency.

Orchestrated complex data workflows using Cloud Composer (Apache Airflow), automating data pipeline execution and monitoring on GCP.

Integrated external APIs for seamless ingestion of diverse financial data sources into the Google Cloud Platform environment.

Developed Python scripts for data reconciliation and validation, ensuring the integrity of financial datasets across the data platform.
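
A reconciliation sketch comparing row counts and totals between source and target; sqlite3 stands in for the actual database drivers, and the table and column names are assumptions:

    import sqlite3  # stand-in for the real source/target drivers

    def totals(conn, table):
        cur = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}")
        return cur.fetchone()

    source = sqlite3.connect("source.db")
    target = sqlite3.connect("target.db")

    if totals(source, "transactions") != totals(target, "transactions"):
        raise RuntimeError("source and target are out of sync")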

Implemented secure data access using IAM roles and encryption within GCP, adhering to stringent financial data security standards.

Supported fraud analytics and risk modeling teams by providing clean, well-structured datasets for their critical decision-making processes.

Utilized practical experience with relational databases and data warehousing concepts to design robust and scalable data solutions.

Collaborated with cross-functional teams to identify and implement system and architecture improvements for data analytics platforms.

Maintained comprehensive documentation for all data pipelines and processes, ensuring transparency and ease of maintenance.

Technologies Used: Google Cloud Platform (GCP), BigQuery, Dataflow, Pub/Sub, Cloud Composer, Python, SQL

Junior Data Engineer @ Morgan Stanley, New York, NY | Sep 2020 – Jul 2021

Developed ETL pipelines for high-frequency financial trading and market data, ensuring timely delivery for critical analysis.

Built data warehouse solutions using AWS Redshift, optimizing schema design and query performance for financial reporting.

Migrated on-premise Oracle database environments to a cloud-based setup, facilitating scalable and resilient data operations.

Implemented robust batch processing frameworks using Hadoop and Hive on Linux clusters for large-scale financial data volumes.

Designed and implemented Kafka-based ingestion pipelines for real-time market data, ensuring low-latency data availability.
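
A consumer-side sketch using the kafka-python client; the broker address and topic name are illustrative:

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "market-data",                        # hypothetical topic
        bootstrap_servers=["localhost:9092"],
        auto_offset_reset="latest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for record in consumer:
        tick = record.value  # one market-data event
        print(tick.get("symbol"), tick.get("price"))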

Developed Spark jobs for large-scale financial data transformation, leveraging Scala and Python on distributed computing platforms.
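
A PySpark aggregation sketch in that vein; the input path and schema (symbol, qty, price) are assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("trade-rollup").getOrCreate()

    trades = spark.read.parquet("hdfs:///data/trades")
    daily = (trades
             .withColumn("notional", F.col("qty") * F.col("price"))
             .groupBy("symbol")
             .agg(F.sum("notional").alias("total_notional")))
    daily.write.mode("overwrite").parquet("hdfs:///data/trade_rollups")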

Utilized Sqoop for efficient data transfer between relational databases (like Oracle) and Hadoop ecosystems on Linux servers.

Optimized complex SQL and PL/SQL queries for financial reporting, improving execution times by over 25%.

Implemented CI/CD pipelines using Jenkins for automated deployments of data solutions and scripts, enhancing development workflows.

Designed comprehensive data models for risk and trading analytics, ensuring data accuracy and consistency for financial instruments.

Ensured high performance and data accuracy for critical financial reporting systems through rigorous testing and validation procedures.

Applied knowledge of Agile methodologies to deliver incremental data pipeline enhancements and support business objectives.

Technologies Used: Oracle, Linux, Hadoop, Hive, Spark, Kafka, AWS Redshift, Python, Scala, SQL, Jenkins

TECHNICAL SKILLS:

Programming Languages: Python, Shell Scripting, SQL, Scala, Perl

Databases: Oracle (Exadata), MySQL, PostgreSQL, Hive, Cassandra

Data Warehousing: ETL, Data Lakes, Data Marts, Informatica, Data Modeling, Data Flows

Operating Systems & Tools: Linux, Unix File Systems, Standard Unix Tools, Pipes, Cron

Orchestration & Automation: Apache Airflow (with Python), Jenkins, Automation Scripting

Big Data Technologies: Spark, Hadoop, Kafka, Databricks

Cloud Platforms: AWS (S3, Glue, Redshift), Azure (ADF, ADLS, Synapse), Google Cloud Platform (BigQuery, Dataflow)

Methodologies & Version Control: Agile, SDLC, GitHub


