Venkatesh Kunchapu
Email: *****************@*****.*** Contact: 612-***-****
PROFESSIONAL SUMMARY
Senior Data Engineer with 5 years of experience designing, implementing, and managing robust Linux-based data warehousing processes and infrastructure.
Adept at optimizing complex ETL and database load/extract processes using Shell scripting, Python, and advanced Oracle development in high-volume enterprise environments.
Demonstrated success in enhancing Linux-based toolsets, scripts, and scheduled jobs to drive automation and deliver significant process improvements for data pipelines.
Skilled in leveraging relational databases, including Oracle Exadata, alongside powerful ETL tools like Informatica and orchestration platforms such as Apache Airflow.
Passionate about data-driven solutions, with practical knowledge of Unix file systems and a commitment to delivering scalable, secure, and analytics-ready data assets.
SKILLS
Programming Languages: Python, SQL, Scala, Perl
Operating Systems & Scripting: Linux, Unix, Shell Scripting
Databases & Data Warehousing: Oracle Exadata, Oracle, Snowflake, PostgreSQL, MySQL, Teradata
ETL & Orchestration: Informatica, Apache Airflow, Apache Spark, Databricks, AWS Glue
Cloud Platforms: AWS (S3, Redshift, EMR, Athena), Azure (Data Factory, Synapse), Google Cloud Platform (BigQuery)
DevOps & Version Control: Git, Jenkins, Docker, Kubernetes
BI & Data Visualization: Tableau, Power BI, Looker
Other Tools & Methodologies: Data Modeling, Data Governance, Agile, Performance Optimization
EXPERIENCE
Global Payments Jan 2024 – Present
Senior Data Engineer Atlanta, Georgia
Designed and implemented advanced Linux-based processes for critical financial data warehousing, ensuring high availability and robust data integrity.
Engineered and optimized complex ETL and database load/extract processes using Python and Shell scripting, markedly improving data ingestion efficiency for financial transactions.
Managed and configured Oracle Exadata environments to support terabyte-scale financial analytics, ensuring optimal performance for real-time data processing.
Enhanced various Linux-based toolsets and scheduled jobs, integrating Informatica to automate data flows and reduce manual intervention in payment processing by 40%.
Developed sophisticated data pipelines using Apache Airflow with Python to orchestrate real-time fraud detection and payment monitoring workflows securely.
Implemented system and architecture improvements for data warehousing solutions, leading to a 45% reduction in reporting latency for sensitive cardholder data.
Collaborated with risk analytics teams to provide high-quality datasets for credit scoring and anomaly detection models, leveraging secure Unix file systems.
Performed comprehensive data reconciliation across diverse payment gateways and APIs, ensuring end-to-end accuracy within a PCI-DSS compliant data environment.
Optimized PySpark jobs and SQL queries on Snowflake and AWS Redshift for processing large-scale transaction datasets, enhancing data access for analytical reporting.
Provided curated datasets for machine learning teams, contributing to the development of predictive fraud detection and customer churn models using modern data warehousing techniques.
Technologies Used: Python, SQL, Shell Scripting, Linux, Oracle Exadata, Informatica, Apache Airflow, Databricks, Apache Spark, Snowflake, AWS (S3, Glue, Redshift)
Humana Feb 2021 – Jul 2023
Data Engineer Louisville, Kentucky
Configured and managed Linux-based infrastructure to support large-scale insurance data warehousing, ensuring compliance with strict regulatory requirements.
Enhanced ETL/database load/extract processes for policy, claims, and risk data using Shell Scripting and Python, improving data flow efficiency by 30%.
Implemented system and architecture improvements for relational databases, specifically leveraging Oracle Exadata for high-performance processing of billions of records.
Developed and automated underwriting and premium calculation workflows by enhancing various Linux-based toolsets, scripts, and jobs for data transformation.
Utilized Informatica to integrate diverse insurance data sources into a centralized Snowflake data warehouse, accelerating data availability for actuarial analysis.
Designed and deployed real-time streaming pipelines using Apache Airflow and Python to monitor claim status and proactively detect anomalies across vast datasets.
Ensured data governance and compliance with HIPAA and SOC2 regulations through meticulous management of Unix file systems and secure data handling protocols.
Collaborated with actuarial teams to create comprehensive risk scoring datasets for pricing models and portfolio analysis, emphasizing data quality and integrity.
Migrated critical insurance workloads from Teradata to Snowflake, achieving a 40% improvement in processing speeds and overall data warehousing performance.
Optimized Spark jobs and SQL queries to efficiently process large volumes of healthcare data, enabling faster insights and supporting strategic decision-making.
Technologies Used: Python, SQL, Shell Scripting, Linux, Oracle Exadata, Informatica, Apache Airflow, Databricks, Apache Spark, Snowflake, AWS (S3, Glue)
Amae Health Nov 2019 – Jan 2021
Data Engineer Los Angeles, California
Developed robust Linux-based ETL pipelines to process patient engagement, EMR, and clinical data, ensuring secure and compliant data flow within the healthcare sector.
Implemented and enhanced data load/extract processes using Python and Shell Scripting for integrating diverse healthcare data sources, including FHIR and HL7 feeds.
Managed and optimized relational databases like Oracle to support critical clinical data warehousing, ensuring high performance and data availability.
Built HIPAA-compliant frameworks by enhancing various Linux-based toolsets and scripts, ensuring secure handling of sensitive patient health information.
Integrated multiple data sources into a unified Snowflake data warehouse using Informatica, reducing processing times by 50% for clinical analytics.
Designed and automated ingestion pipelines for EMR systems using Apache Airflow, improving data freshness and reliability for patient outcome tracking.
Performed root cause analysis on data quality issues within Unix file systems, improving reporting accuracy by 25% for patient adherence and therapy success rates.
Collaborated with clinicians and researchers to deliver analytics-ready datasets for patient outcome tracking and predictive modeling initiatives.
Built predictive data pipelines supporting machine learning models for patient risk scoring, utilizing data processed through robust ETL processes.
Partnered with compliance teams to ensure HIPAA-driven security protocols were consistently applied, maintaining stringent data governance across all data assets.
Technologies Used: Python, SQL, Shell Scripting, Linux, Oracle, Informatica, Apache Airflow, Databricks, Apache Spark, Snowflake, AWS (S3, Glue, Redshift)
EDUCATION
Concordia University, St. Paul
MS in Information Technology Management