Bhavana Kothoju — Senior Data Engineer
302-***-**** ************@*****.***
PROFESSIONAL SUMMARY:
Highly experienced Senior Data Engineer with 5 years of proven expertise in designing and optimizing robust data warehousing solutions.
Adept at implementing, configuring, and managing Linux-based processes and infrastructure for complex data environments.
Skilled in developing sophisticated Shell Scripts to automate critical data operations, enhancing efficiency and reliability.
Extensive experience with Oracle development, including database load/extract processes and performance tuning within data warehouses.
Proficient in Python for data processing, ETL pipeline development, and orchestrating advanced data workflows.
Expertise in enhancing ETL and database load/extract processes, ensuring high data quality and timely delivery for analytics.
Strong practical experience with data warehousing concepts, data flows, and implementing system/architecture improvements.
Hands-on experience with orchestration tools like Apache Airflow, utilizing Python for efficient workflow management.
Committed to automation and continuous process improvement, collaborating effectively within Agile methodology frameworks.
WORK EXPERIENCE:
Senior Data Engineer @ Tenet Healthcare, Dallas, TX Sep 2023 – Present
Architected and managed robust Linux-based data infrastructure, implementing system improvements to support large-scale data warehousing operations.
Developed and enhanced complex Shell Scripts for automating critical data ingestion, transformation, and database load/extract processes.
Designed and optimized scalable data architectures on AWS, focusing on efficient ETL pipelines for healthcare datasets.
Utilized Oracle development practices to integrate external data sources, significantly enhancing enterprise data warehouse capabilities.
Engineered advanced ETL pipelines using AWS Glue and PySpark, processing extensive healthcare datasets for analytical insights.
Built and optimized high-performance Spark jobs on EMR, improving data processing efficiency and reducing execution times.
Ingested diverse data from APIs, CSV, and JSON into S3, ensuring data quality and availability for downstream systems.
Performed intricate data transformations and cleansing using PySpark and SQL, preparing data for the Amazon Redshift data warehouse.
Orchestrated complex data workflows using Apache Airflow with Python, achieving reliable scheduling and operational efficiency (see the illustrative sketch below).
Implemented secure data access and governance through AWS IAM policies, upholding strict compliance standards for sensitive data.
Integrated advanced Tableau dashboards for comprehensive reporting, providing actionable business insights and data visualization.
Collaborated within an Agile framework, leveraging JIRA for sprint planning, task tracking, and continuous process improvement.
Technologies Used: AWS (S3, Glue, EMR, Redshift, IAM), Apache Airflow, Python, PySpark, SQL, Linux, Shell Scripting, Oracle, Tableau, Parquet, Git, Jenkins, JIRA
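A minimal, illustrative sketch of the Airflow-with-Python orchestration pattern referenced above; the DAG id, schedule, and task bodies are hypothetical placeholders, not actual Tenet pipeline details:

# Minimal Airflow 2.x DAG sketch: stage raw extracts to S3, then transform to Parquet.
# All names below (dag_id, task ids, callables) are illustrative assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_to_s3(**context):
    # Pull API/CSV/JSON extracts and stage them in S3 unchanged.
    ...

def transform_to_parquet(**context):
    # Kick off the PySpark cleansing job; write partitioned Parquet back to S3
    # for a downstream Redshift COPY.
    ...

with DAG(
    dag_id="healthcare_etl",          # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_to_s3", python_callable=ingest_to_s3)
    transform = PythonOperator(task_id="transform_to_parquet", python_callable=transform_to_parquet)
    ingest >> transform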
Data Engineer @ U.S. Bank, Minneapolis, MN May 2021 – Jul 2022
Designed and developed robust data pipelines within a Linux environment, enhancing ETL processes for financial data warehousing.
Implemented comprehensive Shell Scripts for managing and automating data ingestion from diverse sources into Azure Data Lake Storage.
Developed and optimized data transformations using Azure Databricks with PySpark, handling large-scale financial datasets effectively (see the illustrative sketch below).
Integrated data from Oracle and SQL Server databases into the data lake, facilitating extensive analytical capabilities.
Engineered efficient ETL processes using Azure Data Factory, ensuring timely and accurate data delivery to Azure Synapse Analytics.
Built optimized Spark workflows for large-scale financial data processing, significantly improving data throughput and performance.
Loaded transformed financial data into Azure Synapse Analytics, serving as the core for enterprise-wide reporting and business intelligence.
Implemented advanced data quality checks and validation frameworks, maintaining high integrity of critical financial information.
Utilized Apache Airflow with Python for orchestrating complex data workflows, improving reliability and scheduling efficiency.
Applied robust role-based access control (RBAC) mechanisms for secure data access within the Azure ecosystem.
Developed impactful dashboards using Power BI, providing key financial reporting and enabling data-driven decision-making.
Participated actively in Agile teams, utilizing JIRA and GitHub for collaborative development and version control processes.
Technologies Used: Azure Data Factory, ADLS, Databricks, Synapse Analytics, PySpark, Python, Shell Scripting, Oracle, SQL Server, Apache Airflow, Linux, Power BI, GitHub, JIRA
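A minimal, illustrative sketch of the Databricks/PySpark cleansing step described above; the column names, dedupe key, and ADLS paths are hypothetical stand-ins, not actual U.S. Bank details:

# Minimal PySpark sketch: dedupe, cleanse, and type financial transactions,
# then write curated Parquet for the downstream Synapse load.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("txn_cleansing").getOrCreate()

raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/transactions/")  # hypothetical path

clean = (
    raw.dropDuplicates(["txn_id"])                        # hypothetical business key
       .filter(F.col("amount").isNotNull())
       .withColumn("txn_date", F.to_date("txn_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

clean.write.mode("overwrite").partitionBy("txn_date").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/transactions/"  # hypothetical path
)

Data Engineer @ Chewy, Plantation, FL Nov 2019 – Apr 2021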
Developed and maintained critical ETL pipelines using Informatica PowerCenter, ensuring reliable data flow for the data warehouse.
Extracted and transformed data from Oracle and MySQL databases, supporting comprehensive data integration initiatives.
Designed sophisticated data transformation logic using Informatica mappings and workflows, adhering to data modeling best practices.
Wrote complex SQL queries and stored procedures for efficient data processing, optimizing database performance.
Performed rigorous data validation and implemented quality checks, guaranteeing data accuracy and consistency within the warehouse (see the illustrative sketch below).
Integrated processed data into data warehouse systems, enabling advanced reporting and business intelligence capabilities.
Utilized Unix shell scripting for automating ETL processes and managing file systems, enhancing operational efficiency.
Implemented robust scheduling using Control-M for ETL jobs, ensuring timely execution and dependency management.
Assisted with Hadoop data ingestion using Hive, broadening hands-on exposure to big data technologies.
Employed Git for comprehensive version control, facilitating collaborative development and code management.
Identified and implemented system/architecture improvements, contributing to the overall stability and scalability of data platforms.
Participated actively in Agile methodology, leveraging JIRA for task tracking, sprint planning, and continuous improvement.
Technologies Used: Informatica PowerCenter, Oracle, MySQL, SQL, Unix, Linux, Shell Scripting, Python, Control-M, Hive, Git, JIRA
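A minimal, illustrative sketch of the post-load validation pattern described above, assuming the cx_Oracle driver; the table names, check queries, and connection details are hypothetical:

# Minimal validation sketch: count "offending rows" after a load and exit
# nonzero on failure so a Control-M job or shell wrapper can halt the chain.
import sys
import cx_Oracle

CHECKS = {
    # check name -> SQL returning a count of offending rows (0 means pass);
    # both the table and the rules here are hypothetical examples.
    "null_customer_ids": "SELECT COUNT(*) FROM stg_orders WHERE customer_id IS NULL",
    "negative_amounts": "SELECT COUNT(*) FROM stg_orders WHERE amount < 0",
}

def run_checks(dsn: str, user: str, password: str) -> bool:
    ok = True
    with cx_Oracle.connect(user=user, password=password, dsn=dsn) as conn:
        cur = conn.cursor()
        for name, sql in CHECKS.items():
            cur.execute(sql)
            bad_rows = cur.fetchone()[0]
            if bad_rows:
                print(f"FAIL {name}: {bad_rows} offending rows")
                ok = False
    return ok

if __name__ == "__main__":
    # Placeholder credentials; real values would come from a vault or env vars.
    sys.exit(0 if run_checks("db-host/ORCLPDB", "etl_user", "***") else 1)

TECHNICAL SKILLS: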
Programming Languages: Python, Shell Scripting, SQL, Perl
Databases: Oracle Exadata, Oracle, MySQL, PostgreSQL, MS SQL Server, Amazon Redshift, Azure Synapse Analytics
ETL & Data Warehousing: Informatica PowerCenter, AWS Glue, Azure Data Factory, Databricks, Snowflake, Apache Spark, Hadoop, Hive, Data Warehouse Architecture
Cloud Platforms: AWS (S3, EMR, Redshift, Glue, Lambda, Athena), Azure (ADLS, Databricks, Synapse)
Orchestration & Automation: Apache Airflow, Control-M, Jenkins, Docker
Version Control & Collaboration: Git, GitHub, JIRA, Confluence
Operating Systems: Linux, Unix
BI & Reporting: Tableau, Power BI
EDUCATION:
Master of Science in Computer and Information Systems @ Wilmington University