Hema Konduru — Senior Data Engineer
443-***-**** ****.*******.*******@*****.***
PROFESSIONAL SUMMARY:
Data Warehouse Engineer with 5 years of comprehensive experience, with a strong emphasis on Shell Scripting and Oracle development.
Adept at implementing, configuring, and managing robust Linux-based processes and infrastructure critical for data warehousing operations.
Proven ability to identify and implement strategic system and architecture improvements enhancing overall data platform efficiency.
Skilled in enhancing various Linux-based toolsets, scripts, jobs, and processes to streamline data pipelines and operations.
Expertise in optimizing ETL, database load, and extract processes ensuring high performance and data integrity across systems.
Hands-on experience with Linux environment setup and the development of complex shell scripts for automation.
Extensive practical knowledge of Unix file systems, including mount types, permissions, standard tools, and effective pipe utilization.
Proficient in Python for data processing and automation, with a strong understanding of relational databases like Oracle Exadata.
Solid understanding and application of Agile methodology, fostering collaborative environments and iterative development cycles.
Passionate about automation and continually implementing process improvements to drive operational excellence and reduce manual effort.
Experienced with ETL tools such as Informatica and proficient in using orchestration tools like Airflow with Python for complex workflows.
Comprehensive knowledge of data warehouses and intricate data flows, ensuring efficient data transformation and availability for reporting.
EDUCATION:
Master of Science in Data Science @ University of Maryland, Baltimore County
TECHNICAL SKILLS:
Programming Languages: Python, SQL, PL/SQL, Java
Operating Systems & Scripting: Linux, Unix, Shell Scripting
Databases & Data Warehousing: Oracle Exadata, Oracle, SQL Server, MySQL, Snowflake, DynamoDB, Hive, SSAS, Data Modeling
ETL & Orchestration: Informatica PowerCenter, Apache Airflow, AWS Glue, Azure Data Factory, Control-M, Jenkins
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, RDS, Lambda, DynamoDB), Azure (ADLS, ADF, Synapse, Azure SQL Database)
Big Data Technologies: Hadoop, Spark (PySpark), Databricks
Version Control & Tools: GitHub, JIRA, Confluence, Docker, Kubernetes
WORK EXPERIENCE:
Senior Data Engineer @ CommonSpirit Health, Chicago, IL | Sep 2024 – Present
Designed and developed robust ETL workflows using Informatica PowerCenter for comprehensive retail sales and inventory analytics.
Implemented, configured, and managed Linux-based processes and infrastructure to support efficient data warehousing operations.
Developed complex PL/SQL procedures and advanced SQL queries in Oracle for sophisticated data transformation and aggregation tasks.
Built dimensional data models, including fact and dimension tables, optimizing the Oracle data warehouse for sales reporting.
Integrated data from flat files (CSV, TXT) and various relational sources into the centralized Oracle data warehouse.
Performed data migration from on-premises Oracle to AWS S3 as part of a strategic cloud transition initiative, ensuring data integrity.
Developed Hive external tables on Cloudera Hadoop for analyzing extensive transaction logs, improving data accessibility for business intelligence.
Developed shell scripts to automate data ingestion, transformation, and job scheduling within the Linux environment.
Utilized SSAS cubes to enable multidimensional reporting for finance teams, providing enhanced analytical capabilities.
Implemented comprehensive data validation and reconciliation checks, ensuring high data accuracy and reliability across all datasets (see the sketch below).
Developed shell scripts for job automation and scheduling using Control-M, improving operational efficiency and reducing manual intervention.
Participated actively in requirement analysis, development, unit testing, and production support activities within an Agile framework.
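Illustrative sketch of the reconciliation checks referenced above, assuming python-oracledb connectivity; the DSN, credentials, and STG_*/DW_* table names are hypothetical placeholders, not the actual production objects:

```python
# Minimal source-to-target row-count reconciliation sketch.
# Assumes python-oracledb; connection details and table names are
# hypothetical placeholders (real values would come from a vault/config).
import sys
import oracledb

CHECKS = [
    # (source staging table, target warehouse table) pairs to reconcile
    ("STG_SALES", "DW_SALES_FACT"),
    ("STG_INVENTORY", "DW_INVENTORY_FACT"),
]

def row_count(cursor, table: str) -> int:
    # Table names are trusted constants from CHECKS, not user input.
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

def main() -> int:
    conn = oracledb.connect(user="etl_user", password="***", dsn="exadata_dsn")
    failures = 0
    with conn.cursor() as cur:
        for src, tgt in CHECKS:
            src_n, tgt_n = row_count(cur, src), row_count(cur, tgt)
            status = "OK" if src_n == tgt_n else "MISMATCH"
            print(f"{src} -> {tgt}: {src_n} vs {tgt_n} [{status}]")
            failures += src_n != tgt_n
    conn.close()
    return 1 if failures else 0  # non-zero exit lets Control-M flag the job

if __name__ == "__main__":
    sys.exit(main())
```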
Technologies Used: Informatica PowerCenter, Oracle, SQL, PL/SQL, Linux, Shell Scripting, Cloudera Hadoop, Hive, Spark, SSAS, AWS S3, GitHub, Jenkins, Control-M
Data Engineer @ Wells Fargo, San Francisco, CA | Apr 2022 – Jul 2023
Designed and implemented a scalable Azure-based data platform utilizing ADLS Gen2, Azure Data Factory, and Synapse Analytics for financial risk reporting.
Developed PySpark jobs within Azure Databricks, operating on Linux-based clusters, to transform and cleanse structured and semi-structured financial datasets.
Built robust Azure Data Factory pipelines for ingesting data from SQL Server, Oracle databases, and external APIs into ADLS in various formats.
Implemented sophisticated data modeling using Star schema principles within Synapse dedicated SQL pools for optimized analytical queries.
Created partitioned Parquet datasets in ADLS to significantly optimize large-scale reporting queries and improve data access performance (see the sketch below).
Integrated Power BI with Synapse for interactive dashboards, implementing row-level security to ensure data governance and compliance.
Implemented advanced data quality checks and reconciliation processes using custom validation frameworks to maintain data integrity.
Enhanced ETL processes by developing custom shell scripts to manage data extracts and loads for various financial applications.
Enabled secure data access using Azure Active Directory roles and column-level masking, adhering to strict financial data security standards.
Scheduled and monitored complex data pipelines using Azure Data Factory triggers, ensuring timely and reliable data delivery.
Version controlled all Databricks notebooks and Azure Data Factory code using GitHub, promoting collaborative development and auditing.
Worked within an Agile Scrum model, collaborating extensively with cross-functional teams for thorough requirement gathering and design.
Technologies Used: Azure (ADLS, ADF, Synapse, Azure SQL Database), Databricks (PySpark), Linux, Shell Scripting, SQL Server, Power BI, Jenkins, GitHub
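A minimal PySpark sketch of the cleanse-and-partition pattern from this role, assuming ADLS Gen2 access via abfss:// paths; the storage account, container, and column names are hypothetical (on Databricks a SparkSession is already provided as `spark`):

```python
# Sketch: cleanse semi-structured input and write partitioned Parquet.
# Paths, columns, and casts are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("risk_positions_cleanse").getOrCreate()

raw = (
    spark.read.option("header", True)
    .csv("abfss://raw@examplelake.dfs.core.windows.net/positions/")
)

cleansed = (
    raw.dropDuplicates(["trade_id"])                    # de-duplicate on key
    .withColumn("trade_date", F.to_date("trade_date"))  # normalize types
    .withColumn("notional", F.col("notional").cast("decimal(18,2)"))
    .filter(F.col("trade_id").isNotNull())              # drop unusable rows
)

# Partitioning by trade_date lets reporting queries prune to a few files.
(
    cleansed.write.mode("overwrite")
    .partitionBy("trade_date")
    .parquet("abfss://curated@examplelake.dfs.core.windows.net/positions/")
)
```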
Junior Data Engineer @ Dollar Tree, Chesapeake, VA | Nov 2019 – Mar 2022
Designed and implemented scalable data lake architecture on AWS using S3, EMR, and Glue for processing diverse healthcare claims and EHR data.
Developed PySpark-based ETL pipelines on EMR, leveraging Python for complex data transformations of large volumes of JSON and CSV files.
Implemented efficient data ingestion workflows from Oracle databases to AWS S3 using AWS Glue, incorporating incremental load strategies.
Created external Hive tables on EMR and optimized queries using partitioning and bucketing techniques to enhance data retrieval performance.
Migrated legacy Informatica workflows to AWS Glue jobs, significantly reducing operational overhead and improving scalability.
Designed robust Snowflake schemas and loaded curated datasets into Redshift, preparing them for advanced analytics and reporting.
Integrated DynamoDB for storing semi-structured patient activity logs, ensuring high availability and low-latency data access.
Implemented robust IAM policies and encryption mechanisms to ensure HIPAA-compliant data security and privacy.
Developed comprehensive Apache Airflow DAGs to orchestrate end-to-end ETL workflows with proper logging and alerting mechanisms (see the sketch below).
Enabled seamless data consumption through Tableau dashboards and Athena queries for diverse business users and stakeholders.
Containerized ETL applications using Docker and deployed them through Jenkins CI/CD pipelines, streamlining deployment processes.
Actively participated in Agile ceremonies and managed project tasks efficiently using JIRA, contributing to timely project delivery.
Technologies Used: AWS (S3, EMR, Glue, Redshift, RDS, DynamoDB, Lambda), PySpark, Hive, Snowflake, Apache Airflow, Python, Docker, Jenkins, Tableau, GitHub
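A minimal Apache Airflow sketch of the extract-transform-load orchestration with failure alerting described above; the DAG id, task bodies, and alerting callback are hypothetical placeholders:

```python
# Sketch: three-stage ETL DAG with retries and a failure callback.
# Task bodies only log; real tasks would call Glue/EMR/Redshift steps.
import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

log = logging.getLogger(__name__)

def notify_failure(context):
    # In practice this might page via SNS or Slack; here we just log.
    log.error("Task %s failed in DAG %s",
              context["task_instance"].task_id, context["dag"].dag_id)

def extract(**_):
    log.info("Pulling incremental claims extract into S3 staging...")

def transform(**_):
    log.info("Running PySpark transform on EMR...")

def load(**_):
    log.info("Loading curated data into Redshift...")

default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_failure,
}

with DAG(
    dag_id="claims_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```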