Senior Data Engineer - ETL, Linux Infra, Oracle Exadata

Location:
Hyderabad, Telangana, India
Salary:
110000
Posted:
April 30, 2026

Resume:

Sudeep Kakarla — Senior Data Engineer

814-***-**** **************@*****.***

PROFESSIONAL SUMMARY:

Results-oriented Senior Data Engineer with around 5 years of experience specializing in data warehousing, ETL/database load/extract processes, and infrastructure management.

Profound expertise in implementing, configuring, and managing Linux-based processes and infrastructure critical for robust data warehousing solutions.

Adept at identifying and implementing system and architecture improvements, enhancing various Linux-based toolsets, scripts, jobs, and processes for optimal performance.

Highly skilled in Shell Scripting and Oracle development, with hands-on experience in relational databases, including Oracle Exadata environments.

Experienced in developing scalable data pipelines in Python, with working knowledge of Perl and a strong understanding of Unix file systems and standard tools.

Proven track record with ETL tools, specifically Informatica, and orchestration platforms like Apache Airflow with Python for complex data flow management.

Demonstrates a passion for automation and continual process improvement within data warehousing, ensuring high efficiency and data accuracy.

Comprehensive knowledge of Agile methodologies, facilitating collaborative development and swift delivery of data engineering solutions.

Possesses excellent written and oral communication skills, effectively conveying technical insights and collaborating with cross-functional teams on data initiatives.

WORK EXPERIENCE:

Senior Data Engineer @ Syniti, Boston, MA (Feb 2025 – Present)

Implemented and managed complex Linux-based processes and infrastructure vital for the enterprise data warehousing environment, ensuring high availability and performance.

Designed and developed robust data warehouse pipelines leveraging advanced Shell Scripting and Python for efficient data extraction, transformation, and loading.

Engineered scalable ETL workflows using Informatica PowerCenter to ingest data from diverse structured and semi-structured sources into Oracle Exadata.

Utilized Oracle Exadata for high-performance data storage and retrieval, optimizing complex SQL queries for significant performance gains in data warehouse operations.

Performed extensive data transformation and aggregation using PySpark on Databricks, integrating seamlessly with Linux-based data processing frameworks.
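
For illustration, a minimal PySpark sketch of the kind of transformation-and-aggregation step described above; the paths, table, and column names are hypothetical, not taken from the engagement:

    # Hypothetical PySpark job: aggregate completed orders into daily revenue
    # before loading the result into the warehouse.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_revenue_agg").getOrCreate()

    orders = spark.read.parquet("/mnt/raw/orders")  # placeholder source path

    daily_revenue = (
        orders
        .filter(F.col("status") == "COMPLETE")
        .groupBy("order_date", "region")
        .agg(
            F.sum("amount").alias("total_revenue"),
            F.countDistinct("customer_id").alias("unique_customers"),
        )
    )

    # Partitioning by date keeps downstream loads and backfills incremental.
    daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
        "/mnt/curated/daily_revenue"
    )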

Built sophisticated data ingestion pipelines from APIs, JSON, and CSV files, ensuring clean and consistent data flows into the data warehouse.
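
A small Python sketch of the API-to-staging pattern referenced here; the endpoint, field layout, and output path are illustrative assumptions (flat JSON objects are presumed):

    # Hypothetical ingestion helper: pull a JSON array from an API and land it
    # as a CSV staging file for the downstream warehouse load.
    import csv
    import requests

    API_URL = "https://example.com/api/orders"  # placeholder endpoint

    def land_orders(out_path: str) -> int:
        resp = requests.get(API_URL, timeout=30)
        resp.raise_for_status()
        rows = resp.json()  # assumed: a list of flat JSON objects
        if not rows:
            return 0
        with open(out_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=sorted(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        return len(rows)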

Migrated legacy on-premises data systems to modern cloud data warehousing solutions, ensuring data integrity, high availability, and scalability through architectural improvements.

Implemented rigorous data quality checks and validation frameworks using Shell Scripting and SQL to ensure utmost data accuracy within the data warehouse.
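
The checks above combined shell and SQL; a comparable source-versus-target count check, sketched here in Python with the python-oracledb driver for brevity (the schemas, tables, and the check itself are hypothetical):

    # Hypothetical post-load reconciliation: staging vs. warehouse row counts.
    import oracledb

    CHECKS = {
        "orders_rowcount": (
            "SELECT COUNT(*) FROM stg.orders",
            "SELECT COUNT(*) FROM dw.fact_orders WHERE load_date = TRUNC(SYSDATE)",
        ),
    }

    def run_checks(dsn: str, user: str, password: str) -> None:
        with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
            cur = conn.cursor()
            for name, (src_sql, tgt_sql) in CHECKS.items():
                cur.execute(src_sql)
                src = cur.fetchone()[0]
                cur.execute(tgt_sql)
                tgt = cur.fetchone()[0]
                if src != tgt:
                    raise RuntimeError(f"{name}: source={src} target={tgt}")
                print(f"{name}: OK ({src} rows)")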

Leveraged Apache Airflow with Python for scheduling, monitoring, and orchestrating intricate data warehouse jobs, enhancing overall process efficiency.
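
A minimal Airflow 2.x DAG showing the extract-transform-validate scheduling pattern described; the DAG id, scripts, and schedule are assumptions for illustration:

    # Hypothetical nightly warehouse-load DAG.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def validate_load(**context):
        # Placeholder hook; a real check would query the warehouse.
        print("row counts validated")

    with DAG(
        dag_id="nightly_warehouse_load",  # hypothetical name
        start_date=datetime(2025, 1, 1),
        schedule_interval="0 2 * * *",  # 02:00 daily
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        extract = BashOperator(
            task_id="extract",
            # trailing space stops Airflow treating the .sh path as a Jinja template
            bash_command="bash /opt/etl/extract_oracle.sh ",
        )
        transform = BashOperator(
            task_id="transform",
            bash_command="spark-submit /opt/etl/transform.py",
        )
        validate = PythonOperator(task_id="validate", python_callable=validate_load)

        extract >> transform >> validate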

Optimized data load/extract processes for large datasets, significantly reducing execution times and improving the responsiveness of analytical reporting.

Implemented IAM policies and robust security protocols for secure data access and governance within the Linux-based data warehousing environment.

Collaborated within an Agile framework, utilizing JIRA for sprint planning and tracking to deliver timely and impactful data solutions.

Technologies Used: Linux, Shell Scripting, Oracle Exadata, SQL, Informatica PowerCenter, Python, PySpark, Databricks, Apache Airflow, AWS S3, Git, Jenkins, JIRA

Data Engineer @ Stryker, Kalamazoo, MI (Aug 2024 – Feb 2025)

Managed Linux server environments to support critical data warehousing operations, including deploying and configuring data processing applications and utilities.

Developed and enhanced various Linux-based toolsets, scripts, and jobs using Shell Scripting to automate data load/extract processes and system maintenance tasks.

Designed and implemented robust ETL pipelines using Informatica PowerCenter for efficient data ingestion and complex transformations into the data warehouse.

Worked extensively with Oracle databases, designing data models and optimizing SQL and PL/SQL procedures for enhanced data warehousing performance and reporting.

Engineered scalable data processing solutions using Azure Databricks and PySpark, ensuring seamless integration with existing Linux and Oracle data flows.

Ingested high volumes of data from various sources including SQL Server, APIs, and flat files into the enterprise data warehouse, maintaining data quality.

Performed in-depth data transformations and aggregations using Spark and SQL, preparing data for comprehensive analytics and business intelligence reporting.

Implemented data migration strategies from on-premises SQL Server to modernized Oracle and cloud-based data warehouses, ensuring data consistency and security.

Built data pipelines to process data in Parquet and Avro formats, optimizing storage and retrieval for large-scale data warehousing requirements.
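
A short PySpark sketch of the Avro-in, Parquet-out pattern mentioned above; it assumes the spark-avro package is on the cluster classpath, and the paths and columns are placeholders:

    # Hypothetical step: normalize Avro landing data, persist as partitioned Parquet.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("avro_to_parquet").getOrCreate()

    events = spark.read.format("avro").load("/mnt/landing/events")  # placeholder

    cleaned = (
        events
        .withColumn("event_date", F.to_date("event_ts"))
        .dropDuplicates(["event_id"])
    )

    cleaned.write.mode("append").partitionBy("event_date").parquet(
        "/mnt/warehouse/events"
    )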

Applied row-level security and encryption techniques for sensitive healthcare data within the Oracle data warehouse, adhering to strict compliance standards.

Scheduled and monitored complex ETL workflows using Apache Airflow and Azure Data Factory triggers, ensuring timely and reliable data delivery.

Utilized Git for version control and actively collaborated with cross-functional teams within an Agile environment to achieve project objectives.

Technologies Used: Linux, Shell Scripting, Oracle, SQL Server, Informatica PowerCenter, Apache Airflow, Python, PySpark, Azure Databricks, Azure Data Factory, Git, Agile

Data Engineer @ CISCO, San Jose, CA (Jan 2024 – Aug 2024)

Built scalable data pipelines using Spark-Scala on a Hadoop ecosystem, processing massive datasets for analytics and reporting, contributing to robust data flows.

Processed large datasets using Hive and HDFS for distributed storage, optimizing data access patterns for improved query performance in a data lake environment.

Developed intricate ETL jobs to ingest diverse data from relational databases into the Hadoop ecosystem, ensuring data integrity and consistency.

Designed and optimized complex Hive queries for performance improvements, enabling faster data retrieval for business intelligence applications.
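
One standard technique behind such gains is partition pruning: filtering on the table's partition column so only the needed partitions are scanned. A hedged illustration via Spark SQL against a Hive table (the table, partition column, and date are hypothetical):

    # Hypothetical partition-pruned aggregation over a Hive table.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.enableHiveSupport().appName("hive_prune").getOrCreate()
    )

    fast = spark.sql("""
        SELECT region, SUM(amount) AS revenue
        FROM sales.orders               -- assumed partitioned by load_dt
        WHERE load_dt = '2024-06-01'    -- partition filter: one partition scanned
        GROUP BY region
    """)
    fast.show()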

Implemented Kafka-based streaming pipelines for real-time data ingestion and processing, supporting immediate analytical insights from rapidly changing data.
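
The streaming work here used Spark-Scala; this Python Structured Streaming sketch shows the same Kafka ingestion pattern, assuming the spark-sql-kafka connector is available (broker, topic, and schema are illustrative):

    # Hypothetical Kafka-to-Parquet streaming ingestion.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka_ingest").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "orders")  # placeholder topic
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("j"))
        .select("j.*")
    )

    query = (
        stream.writeStream.format("parquet")
        .option("path", "/mnt/stream/orders")
        .option("checkpointLocation", "/mnt/chk/orders")
        .start()
    )
    query.awaitTermination()  # blocks until the stream is stopped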

Stored processed data in optimized ORC and Parquet formats, enhancing efficiency for subsequent analytics and data warehousing tasks.

Implemented robust data validation and error handling mechanisms using Log4j, ensuring high data quality throughout the data processing lifecycle.

Utilized Oozie scheduler for comprehensive workflow orchestration, automating complex multi-step data processing jobs and dependencies.

Integrated processed data into various reporting tools for business insights, providing stakeholders with critical information for decision-making.

Ensured data security using Ranger access control policies across the Hadoop ecosystem, safeguarding sensitive information effectively.

Collaborated closely with business stakeholders for requirement gathering and solution design, aligning data engineering efforts with business needs.

Followed Agile methodology using JIRA for development cycles, ensuring iterative progress and adaptability to evolving project requirements.

Technologies Used: Hadoop, Spark-Scala, Hive, HDFS, Kafka, Oozie, Ranger, Git, Agile

Data Engineer @ Phoenix Global, Hyderabad, India (Dec 2021 – Jul 2022)

Developed robust ETL workflows using Informatica PowerCenter for efficient data integration across various source and target systems.

Extracted and transformed critical data from Oracle databases and flat files into target data warehousing systems, ensuring data accuracy.

Designed and implemented comprehensive data models, performing complex transformations using SQL to prepare data for analytical consumption.

Developed sophisticated PL/SQL procedures and stored functions to automate data manipulation and business logic within Oracle databases.
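
As a sketch of how such a procedure might be invoked from an orchestration script, using the python-oracledb driver; the package, procedure, and parameter names are hypothetical:

    # Hypothetical call into a PL/SQL procedure of the kind described above.
    import oracledb

    def refresh_daily_sales(dsn: str, user: str, password: str, load_date: str) -> None:
        with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
            cur = conn.cursor()
            # Assumes: PROCEDURE dw_pkg.refresh_daily_sales(p_load_date IN VARCHAR2)
            cur.callproc("dw_pkg.refresh_daily_sales", [load_date])
            conn.commit()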

Loaded transformed and validated data into a PostgreSQL database, ensuring consistent data structures and referential integrity.

Implemented rigorous data validation and quality checks within ETL processes, significantly reducing data errors and improving reliability.

Worked on batch processing pipelines for daily data loads, optimizing job scheduling and monitoring for timely data availability.

Optimized complex SQL queries to significantly improve performance of data extraction and loading operations.

Managed version control using Git, ensuring collaborative development and robust tracking of all code changes.

Assisted in basic Hadoop data ingestion activities, gaining exposure to distributed data processing environments and contributing to data flows.

Collaborated with data analysts to understand reporting requirements and ensure the ETL processes delivered accurate data.

Participated in system documentation and knowledge transfer sessions to maintain high operational standards and process improvements.

Technologies Used: Informatica PowerCenter, Oracle, PostgreSQL, SQL, PL/SQL, Git

Junior Data Engineer @ Wipro Ltd, Hyderabad, India (Apr 2019 – Nov 2021)

Designed and developed foundational ETL workflows using Informatica, facilitating data movement and transformation for business intelligence.

Extracted data from multiple sources including flat files and relational databases, ensuring comprehensive data collection for analytical systems.

Performed data transformations and loaded cleansed data into a MySQL database, supporting various reporting and application needs.

Developed SQL queries for efficient data extraction and reporting, contributing to accurate and timely business insights.

Created data mappings and workflows for batch processing, ensuring regular and automated updates of data stores.

Worked on data cleansing and validation techniques, improving the overall quality and reliability of business data assets.

Used GitHub for version control, collaborating with team members on code development and maintaining code integrity.

Supported deployment activities using Jenkins, assisting in the automation of software release cycles.

Participated in daily stand-up meetings following Agile methodology, contributing to team planning and task assignments.

Assisted in monitoring ETL jobs and troubleshooting data load failures, ensuring continuous data availability.

Gained foundational experience in relational database management systems and data manipulation techniques.

Documented ETL processes and data flow diagrams, contributing to knowledge sharing and operational transparency.

Technologies Used: Informatica PowerCenter, MySQL, SQL, GitHub, Jenkins, Agile

TECHNICAL SKILLS:

Programming Languages: Python, Scala, Perl, PL/SQL, SQL

Operating Systems & Scripting: Linux, Shell Scripting, Unix, Bash

Databases: Oracle Exadata, Oracle, PostgreSQL, MySQL, Hive, MS SQL Server, NoSQL

Data Warehousing & ETL: Informatica PowerCenter, Azure Data Factory, Data Modeling, Data Lake, Data Mart

Orchestration & Workflow: Apache Airflow, AWS Step Functions, Oozie

Cloud Platforms: AWS (S3, Glue, EMR, Lambda), Azure (ADLS, Synapse, Databricks)

Big Data Technologies: Spark (PySpark, Spark-Scala), Hadoop, Databricks, Kafka, Snowflake

Version Control & DevOps: Git, GitHub, Jenkins, Docker

BI & Reporting: Tableau, Power BI

Methodologies: Agile (Scrum)

EDUCATION:

Master of Science in Business Analytics and Information Systems @ University of South Florida


