Sai Kiran Komarisetty — Data Engineer
734-***-**** *************@*****.***
PROFESSIONAL SUMMARY:
Data Warehouse Engineer with over 6 years of progressive experience designing, implementing, and optimizing data solutions. Extensive hands-on experience configuring and managing Linux-based infrastructure, developing Shell scripts for automation, and performing advanced Oracle development. Proficient in enhancing ETL and database load/extract processes, with a strong background in large-scale data processing using Spark, Databricks, and Snowflake. Skilled in Python and SQL for complex data transformation, validation, and automation, with practical working knowledge of Unix file systems. Focused on identifying and implementing system and architecture improvements, with an emphasis on automation and continual process improvement within Agile methodologies. Experience spans both AWS and Azure cloud platforms, delivering scalable, high-performance data warehousing and analytics solutions.
EDUCATION:
Master of Science in Information Systems @ Central Michigan University
TECHNICAL SKILLS:
Programming Languages: Python, SQL, Shell Scripting, Perl
Operating Systems: Linux, Unix
Data Warehousing & ETL: Oracle Exadata, Informatica, Databricks, Spark, Hadoop, Snowflake, Azure Synapse Analytics, Azure Data Factory, AWS Glue
Databases: Oracle, MySQL, PostgreSQL, Azure SQL
Orchestration: Airflow, Jenkins, Azure Data Factory
Cloud Platforms: Azure (ADLS Gen2, Databricks, Synapse), AWS (S3, EMR, Redshift, Glue)
Version Control & CI/CD: GitHub
Business Intelligence: Power BI
Methodologies: Agile, SDLC
WORK EXPERIENCE:
Data Engineer @ Baylor Scott & White Health Dallas, TX Jan 2025 – Present
Implemented and managed robust Linux-based infrastructure to support large-scale data warehousing processes and critical analytics solutions.
Developed sophisticated Shell scripts for automating data ingestion, ETL workflows, and system administration tasks across diverse Linux environments.
Configured and optimized Oracle Exadata database instances, ensuring high performance and availability for critical healthcare data warehousing needs.
Engineered and enhanced ETL pipelines using Azure Data Factory and Spark on Databricks, integrating with Linux-based data sources (a minimal PySpark sketch follows this role's technology list).
Designed and implemented significant system architecture improvements for data warehousing, focusing on scalability and operational efficiency within Linux.
Orchestrated complex data loads and extracts from Oracle databases to Azure Data Lake Storage Gen2, utilizing optimized Shell scripts for efficiency.
Developed and maintained Python scripts for comprehensive data transformation and validation, operating effectively within Linux server environments.
Monitored and managed Unix file systems, maintaining correct mount types and permissions and using standard tools and pipes for secure, efficient data handling.
Migrated and integrated on-prem Oracle databases into Azure Synapse, performing comprehensive data validation and reconciliation processes.
Collaborated with cross-functional teams to define data requirements and deliver actionable insights through Power BI dashboards, backed by Oracle data.
Implemented CI/CD pipelines using GitHub for version control and automated deployments of Linux-based scripts and data warehousing solutions.
Ensured data security and HIPAA compliance for regulated healthcare data, applying robust access controls and encryption within Oracle and Azure.
Technologies Used: Linux, Shell Scripting, Oracle Exadata, Azure Data Factory, ADLS Gen2, Azure Databricks, PySpark, Azure Synapse, Azure SQL, Power BI, GitHub, Parquet, Python
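Illustrative example: a minimal sketch of the transform-and-validate pattern described above, assuming hypothetical paths, table, and column names; a production Databricks job would read from and publish to abfss:// paths on ADLS Gen2 rather than local paths.

    # Minimal PySpark sketch; paths and columns are illustrative assumptions,
    # not the production pipeline.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("encounter_load").getOrCreate()

    # Hypothetical staging data previously extracted from Oracle.
    raw = spark.read.parquet("/staging/encounters/")

    # Normalize identifiers, stamp the load date, and drop duplicate keys.
    cleaned = (
        raw.withColumn("patient_id", F.upper(F.trim(F.col("patient_id"))))
           .withColumn("load_date", F.current_date())
           .dropDuplicates(["encounter_id"])
    )

    # Validation gate: abort the publish rather than load rows missing keys.
    bad_rows = cleaned.filter(
        F.col("encounter_id").isNull() | F.col("patient_id").isNull()
    ).count()
    if bad_rows:
        raise ValueError(f"{bad_rows} rows failed key validation; aborting")

    # Publish as partitioned Parquet for downstream Synapse/Power BI use.
    cleaned.write.mode("overwrite").partitionBy("load_date").parquet(
        "/curated/encounters/"
    )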
Data Engineer @ Fifth Third Bank Cincinnati, OH Aug 2021 – Jun 2023
Managed and configured robust Linux environments for complex data processing and warehousing, optimizing resource utilization and system performance.
Developed and maintained sophisticated Shell scripts to automate data ingestion, ETL operations, and monitoring for large-scale financial data.
Engineered high-performance ETL pipelines leveraging Oracle databases for critical financial transaction processing and regulatory reporting.
Implemented data migration strategies from on-prem Oracle systems to AWS S3 and Amazon Redshift, ensuring data integrity and security compliance.
Designed and optimized data warehousing solutions using Snowflake, processing vast datasets with exceptional efficiency and scalability.
Developed Spark ETL jobs for transforming and loading financial data, with a strong focus on enhancing database load and extract processes.
Utilized Python extensively for data manipulation, validation, and API integrations within complex Linux-based data ecosystems.
Applied practical knowledge of Unix file systems, managing permissions and using standard tools and pipes for secure data handling.
Automated pipeline orchestration using Airflow with Python DAGs, significantly enhancing the reliability and efficiency of data workflows (a minimal DAG sketch follows this role's technology list).
Collaborated actively within an Agile environment, contributing to significant system and architecture improvements for data warehousing solutions.
Ensured stringent data security and compliance for sensitive financial data, implementing robust access controls and encryption mechanisms.
Optimized SQL queries and Spark transformations, significantly improving pipeline performance and reducing processing times for financial analytics.
Technologies Used: Linux, Shell Scripting, Oracle, AWS (S3, EMR, Glue, Redshift), Spark, Snowflake, Python, SQL, Airflow, GitHub, Agile
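Illustrative example: a minimal Airflow 2.x DAG sketch of the orchestration pattern described above; the dag_id, schedule, and task callables are illustrative assumptions, not the production workflow.

    # Minimal Airflow 2.x DAG sketch; names and schedule are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Placeholder: pull the day's transactions from the source system.
        print("extracting batch for", context["ds"])

    def load(**context):
        # Placeholder: load validated records into the warehouse.
        print("loading batch for", context["ds"])

    with DAG(
        dag_id="daily_transactions",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task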
Data Engineer @ Old Navy San Francisco, CA Jul 2019 – Jul 2021
Supported and managed Linux-based environments crucial for operating ETL tools and database systems within the enterprise data warehouse.
Developed and refined Shell scripts for automating daily data processing tasks, including file transfers and critical job scheduling.
Enhanced existing ETL processes and database load/extract operations using Informatica PowerCenter for comprehensive retail sales and inventory data.
Designed and implemented robust batch ETL pipelines utilizing Oracle databases as source and target systems, ensuring data consistency.
Migrated legacy ETL workflows to Spark-based processing frameworks, optimizing for superior performance and scalability on Linux servers.
Applied practical knowledge of Unix file systems to effectively manage data storage, access, and permissions across various platforms.
Developed advanced SQL queries and stored procedures within Oracle databases to support complex data transformations and analytical reporting.
Implemented robust data quality checks and reconciliation reports, ensuring high data integrity for critical retail analytics (a reconciliation sketch follows this role's technology list).
Integrated Hadoop and Hive for processing large-scale historical data, significantly contributing to the overall data warehousing architecture.
Utilized Python for custom scripting and data manipulation tasks, improving automation across various established data pipelines.
Collaborated with business analysts to model fact and dimension tables, delivering comprehensive data structures for Power BI reporting.
Managed version control for all ETL code and scripts using GitHub, strictly adhering to Agile SDLC methodologies for project delivery.
Technologies Used: Linux, Shell Scripting, Oracle, Informatica PowerCenter, Spark, Hadoop, Hive, Python, SQL, Power BI, GitHub, Agile
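Illustrative example: a minimal sketch of the row-count reconciliation pattern described above; the table name and connections are illustrative assumptions, with sqlite3 standing in for the Oracle source and target drivers.

    # Minimal reconciliation sketch; sqlite3 stands in for Oracle connections
    # (e.g. via oracledb), and the table name is a placeholder.
    import sqlite3

    def row_count(conn, table):
        # Table names come from a trusted config, never from user input.
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    def reconcile(source_conn, target_conn, table):
        src, tgt = row_count(source_conn, table), row_count(target_conn, table)
        print(f"{table}: source={src} target={tgt} "
              f"{'OK' if src == tgt else 'MISMATCH'}")
        if src != tgt:
            raise RuntimeError(f"Reconciliation failed for {table}")

    if __name__ == "__main__":
        # Demo with in-memory databases; real runs connect to Oracle.
        src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
        for conn in (src, tgt):
            conn.execute("CREATE TABLE sales (id INTEGER)")
            conn.executemany("INSERT INTO sales VALUES (?)",
                             [(i,) for i in range(5)])
        reconcile(src, tgt, "sales")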