Bala Rama Krishna Reddy Naredla — Senior Data Engineer
872-***-**** ******************@*****.***
PROFESSIONAL SUMMARY:
Seasoned Data Warehouse Engineer with 5 years of experience implementing, configuring, and managing Linux-based infrastructure for robust data warehousing solutions.
Adept at enhancing complex ETL processes and database load/extract operations, ensuring high performance and data integrity across enterprise systems.
Proficient in designing and optimizing data architecture, with a strong focus on system and architecture improvements for scalability and efficiency.
Expert in Shell Scripting for automating Linux-based tools, jobs, and processes, driving significant operational improvements.
Deep hands-on experience with Linux environment setup, including Unix file systems, permissions, and standard command-line tools.
Skilled in Oracle development, with practical experience across relational databases, including high-performance environments such as Oracle Exadata.
Strong programming capabilities in Python for data processing, automation, and ETL enhancement within data warehousing environments.
Experienced with leading ETL tools, specifically Informatica, for building sophisticated data transformation and integration pipelines.
Proven expertise in orchestration tools, particularly Apache Airflow with Python, for efficient workflow scheduling and management.
Committed to the Agile methodology, contributing to continuous process improvement and rapid project delivery in dynamic teams.
Passionate about automation and continually seeking opportunities to enhance existing processes, reducing manual effort and improving system reliability.
Demonstrated ability to identify and resolve complex data warehousing challenges, ensuring optimal data flow and accessibility for analytical insights.
EDUCATION:
Master of Science in Computer Science @ Lewis University
TECHNICAL SKILLS:
Programming Languages: Python, SQL, PL/SQL
Operating Systems: Linux, Unix
Databases: Oracle (Exadata), MySQL, PostgreSQL, SQL Server, DynamoDB, Hive, Snowflake
Data Warehousing & ETL: Informatica PowerCenter, Azure Data Factory, AWS Glue, Hadoop, Spark (PySpark), Databricks, Delta Lake
Orchestration & Automation: Apache Airflow, Azure Data Factory Triggers, AWS Step Functions, Shell Scripting, Jenkins
Cloud Platforms: Azure (ADLS Gen2, Synapse Analytics, Databricks, SQL Database), AWS (S3, EMR, Redshift, Athena, RDS)
Version Control & Collaboration: GitHub, GitLab, Docker, JIRA, Confluence
BI & Visualization: Tableau, Power BI, SSAS
WORK EXPERIENCE:
Senior Data Engineer @ HCA Healthcare, Nashville, TN | Feb 2025 – Present
Designed and implemented robust Linux-based data warehousing infrastructure on Azure for comprehensive healthcare data integration and analytics.
Developed scalable ETL pipelines using Azure Data Factory and Shell Scripting to ingest critical data from Oracle Exadata and REST APIs into ADLS Gen2.
Built PySpark-based data transformation workflows on Azure Databricks for processing large volumes of claims and patient data, optimizing for performance.
Created and managed Delta Lake tables, enhancing data lake capabilities and tuning Spark jobs for significant cost savings (a PySpark-to-Delta sketch follows this list).
Loaded curated datasets into Azure Synapse Analytics, leveraging Oracle SQL and PL/SQL expertise for enterprise reporting and advanced analytics.
Implemented secure data ingestion processes from JSON, CSV, and HL7 formatted files into cloud storage using Linux utilities and Python scripts.
Executed complex on-prem to Azure cloud data migration strategies for legacy Oracle databases, ensuring data integrity and minimal downtime.
Enforced stringent data security using role-based access control and advanced column-level masking within Synapse, meeting compliance standards.
Developed and deployed data quality validation frameworks using Python and integrated comprehensive logging mechanisms for proactive issue detection.
Orchestrated complex data workflows using Apache Airflow with Python and scheduled pipelines with Azure Data Factory triggers (a minimal DAG sketch follows this list).
Containerized Spark applications using Docker and streamlined deployments through Jenkins CI/CD pipelines within a Linux environment.
Championed Agile methodologies, actively participating in sprint planning and tracking tasks using JIRA for transparent project management.
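The PySpark and Delta Lake bullets above describe a common Databricks pattern; the following is a minimal sketch of that pattern, assuming hypothetical paths, table names, and a simplified claims schema (claim_id, claim_status, and claim_date are placeholders, not the actual HCA model).

    # Minimal PySpark-to-Delta sketch (Databricks); all names are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("curate_claims").getOrCreate()

    # Read raw claims landed in ADLS Gen2 (storage account name is hypothetical).
    raw = spark.read.json("abfss://raw@<storage-account>.dfs.core.windows.net/claims/")

    # Basic curation: drop null statuses, normalize dates, dedupe on the key.
    curated = (
        raw.filter(F.col("claim_status").isNotNull())
           .withColumn("claim_date", F.to_date("claim_date", "yyyy-MM-dd"))
           .dropDuplicates(["claim_id"])
    )

    # Persist as a partitioned Delta table.
    (curated.write.format("delta")
            .mode("overwrite")
            .partitionBy("claim_date")
            .saveAsTable("curated.claims"))

    # Databricks-specific compaction/clustering for cheaper downstream scans.
    spark.sql("OPTIMIZE curated.claims ZORDER BY (claim_id)")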
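Similarly, the Airflow orchestration bullet maps onto a short DAG. This sketch uses Airflow 2.x APIs; the DAG id, schedule, and task callables are illustrative stand-ins for the real extraction and transformation steps.

    # Hedged Airflow 2.x DAG sketch; ids and callables are hypothetical.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        """Placeholder for the Oracle Exadata / REST API extraction step."""

    def transform():
        """Placeholder for the Databricks PySpark transformation step."""

    with DAG(
        dag_id="claims_daily",
        start_date=datetime(2025, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task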
Technologies Used: Azure (ADLS, ADF, Synapse, Databricks, Azure SQL), Oracle, SQL Server, PySpark, Python, Shell Scripting, Linux, Airflow, Docker, Jenkins, GitHub
Data Engineer @ Goldman Sachs, New York, NY | Apr 2022 – Nov 2023
Designed and optimized an AWS-based data lake architecture for high-volume financial trade and risk data processing leveraging Linux servers.
Built data ingestion pipelines using AWS Glue and Shell Scripting to extract data from Oracle RDS into Amazon S3, ensuring robust data capture (a Glue job sketch follows this list).
Developed PySpark applications on AWS EMR for large-scale batch data transformations, focusing on performance tuning and resource optimization.
Created and managed external tables in Hive on EMR, optimizing partitioning strategies for faster queries against massive datasets (see the table DDL sketch after this list).
Implemented streaming data ingestion using Amazon Kinesis for near real-time analytics on critical financial transactions (a producer sketch follows this list).
Stored processed data in Parquet format in S3, optimizing query performance using Athena for efficient data exploration and reporting.
Loaded aggregated datasets into Amazon Redshift for comprehensive reporting and advanced analytics consumption by business users.
Integrated DynamoDB for storing semi-structured trade metadata, ensuring rapid access and flexible schema management.
Developed advanced Oracle SQL queries and PL/SQL procedures for crucial data reconciliation processes, maintaining financial accuracy.
Implemented stringent IAM policies and encryption mechanisms to secure sensitive financial data across all AWS services.
Built and maintained CI/CD pipelines using Jenkins for automated deployments, ensuring smooth transitions within a Linux environment.
Actively participated in Agile ceremonies, maintained user stories in JIRA, and contributed to continuous process improvement initiatives.
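The Glue ingestion bullet above corresponds to a standard Glue PySpark job shape. This is a minimal sketch assuming the Oracle RDS source is registered in the Glue Data Catalog; the catalog database, table, and bucket names are all hypothetical.

    # Hedged AWS Glue job sketch; catalog and S3 names are placeholders.
    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the Oracle RDS table via its Glue Data Catalog entry.
    trades = glue_context.create_dynamic_frame.from_catalog(
        database="finance_raw", table_name="oracle_trades"
    )

    # Land the extract in S3 as Parquet for downstream EMR/Athena use.
    glue_context.write_dynamic_frame.from_options(
        frame=trades,
        connection_type="s3",
        connection_options={"path": "s3://<bucket>/raw/trades/"},
        format="parquet",
    )
    job.commit()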
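The Hive external-table bullet follows the usual partitioned-external-table pattern on EMR. This sketch issues the DDL through Spark with Hive support enabled; the schema, database, and S3 location are illustrative only.

    # Hedged sketch of a partitioned external Hive table on EMR.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS finance.trades (
            trade_id STRING,
            symbol   STRING,
            quantity BIGINT,
            price    DOUBLE
        )
        PARTITIONED BY (trade_date STRING)
        STORED AS PARQUET
        LOCATION 's3://<bucket>/curated/trades/'
    """)

    # Register partitions already present under the S3 prefix.
    spark.sql("MSCK REPAIR TABLE finance.trades")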
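For the Kinesis bullet, a producer is the simplest end of that pipeline. This boto3 sketch assumes a hypothetical stream name and event payload, with the partition key chosen so all events for one trade stay ordered on one shard.

    # Hedged Kinesis producer sketch; stream and payload are hypothetical.
    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    def publish_trade(trade: dict) -> None:
        # Partitioning by trade_id keeps a trade's events on one shard, in order.
        kinesis.put_record(
            StreamName="trade-events",
            Data=json.dumps(trade).encode("utf-8"),
            PartitionKey=trade["trade_id"],
        )

    publish_trade({"trade_id": "T-1001", "symbol": "XYZ", "quantity": 100, "price": 25.5})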
Technologies Used: AWS (S3, EMR, Glue, Redshift, Athena, RDS, Kinesis, DynamoDB), Oracle, Hadoop, Hive, PySpark, PostgreSQL, Python, Shell Scripting, Linux, Jenkins, GitLab
Junior Data Engineer @ Big Lots, Columbus, OH | Nov 2019 – Mar 2022
Designed and developed enterprise data warehouse solutions using Informatica PowerCenter for retail sales and inventory data.
Built robust ETL workflows to extract data from Oracle and MySQL databases into a centralized data warehouse, ensuring data consistency.
Developed complex SQL queries, stored procedures, and implemented performance tuning for critical reporting datasets.
Implemented Slowly Changing Dimensions (SCD Type 1 & 2) in dimensional data models to track historical changes accurately (an SCD Type 2 sketch follows this list).
Integrated data from flat files (CSV, TXT) and XML feeds into staging tables, preparing data for downstream processing.
Performed comprehensive data cleansing and transformation using Informatica mappings and transformations, enhancing data quality.
Migrated selected workloads from on-prem Oracle databases to AWS S3 for archival purposes, reducing on-prem footprint.
Created and managed Hive tables on Hadoop for large historical sales data analysis, supporting business intelligence initiatives.
Implemented rigorous data quality checks and reconciliation reports to ensure overall data consistency and accuracy.
Developed SSAS cubes and enabled connectivity from Excel and Power BI for executive dashboard reporting and data visualization.
Managed code repositories using GitHub and automated deployments using Jenkins, ensuring efficient release cycles.
Collaborated effectively in an Agile environment with business analysts and reporting teams to meet evolving business requirements.
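The SCD Type 2 bullet above follows the classic expire-then-insert pattern. In practice this logic lived in Informatica mappings; here it is sketched as SQL driven from Python with cx_Oracle, with every table, column, and credential a hypothetical placeholder.

    # Hedged SCD Type 2 sketch (expire changed rows, insert new versions).
    import cx_Oracle

    # Credentials/DSN are placeholders; real jobs would pull these from a vault.
    conn = cx_Oracle.connect("etl_user", "etl_password", "dwh-host/ORCLPDB")
    cur = conn.cursor()

    # 1) Close out current dimension rows whose tracked attributes changed.
    cur.execute("""
        UPDATE dim_product d
           SET d.effective_to = SYSDATE, d.is_current = 'N'
         WHERE d.is_current = 'Y'
           AND EXISTS (SELECT 1 FROM stg_product s
                        WHERE s.product_id = d.product_id
                          AND s.product_name <> d.product_name)
    """)

    # 2) Insert a fresh current version for new or changed products.
    cur.execute("""
        INSERT INTO dim_product (product_id, product_name,
                                 effective_from, effective_to, is_current)
        SELECT s.product_id, s.product_name, SYSDATE, NULL, 'Y'
          FROM stg_product s
         WHERE NOT EXISTS (SELECT 1 FROM dim_product d
                            WHERE d.product_id = s.product_id
                              AND d.is_current = 'Y')
    """)
    conn.commit()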
Technologies Used: Informatica PowerCenter, Oracle, MySQL, Hadoop, Hive, SSAS, AWS S3, SQL, Python, GitHub, Jenkins, Windows Server