Sree Harrsha Singara — Senior Data Engineer
469-***-**** *********@*****.***
PROFESSIONAL SUMMARY:
Senior Data Engineer with 5 years of experience designing, implementing, and managing robust data warehousing solutions.
Expertise in configuring and maintaining Linux-based infrastructure, optimizing processes, and enhancing system architecture for data management.
Proficient in developing advanced Shell Scripts for automation, job scheduling, and efficient data management operations across various platforms.
Proven ability in Oracle development, including Exadata environments, for complex data warehousing, ETL processes, and database loads and extracts.
Skilled in Python and Perl scripting, delivering automated solutions for data extraction, transformation, and loading into data warehouses.
Comprehensive experience with ETL tools, specifically Informatica, to enhance database load and extract functionalities for critical datasets.
Adept at leveraging orchestration tools like Airflow with Python for streamlined, end-to-end data pipeline management and automation.
Strong understanding of Unix file systems, permissions, and standard tools, ensuring secure and efficient data handling in Linux environments.
Committed to Agile methodologies, driving continuous process improvement and automation in data engineering workflows to deliver high-quality solutions.
WORK EXPERIENCE:
Senior Data Engineer @ Kaiser Permanente, Oakland, CA | Sep 2024 – Present
Designed and implemented scalable data warehousing solutions, leveraging Linux-based infrastructure for comprehensive healthcare analytics.
Developed robust Shell Scripts to automate critical data ingestion, transformation, and Oracle database load processes on Linux environments.
Managed and configured Oracle Exadata databases, optimizing performance for large-scale healthcare data warehousing needs and complex queries.
Enhanced existing ETL processes using Informatica PowerCenter, significantly improving data load and extract efficiencies for patient records.
Implemented system and architecture improvements, ensuring high availability and performance of data warehouse components and related tools.
Developed Python scripts to streamline data quality checks and integrate various data sources into the central data warehouse effectively.
Utilized Apache Airflow with Python for orchestrating complex data pipelines, ensuring timely and accurate delivery of analytical datasets (illustrative sketch below).
Maintained Unix file systems, managed permissions, and configured standard tools for secure and efficient data operations and storage.
Collaborated with cross-functional teams to define data flows and implement robust data models within the enterprise data warehouse.
Applied Agile methodologies to iterative development and deployment of new features and enhancements for critical data systems.
Conducted performance tuning on Oracle queries and Informatica mappings to accelerate data processing cycles and optimize resource usage.
Ensured compliance with stringent data governance and security standards, including HIPAA, within the data warehousing environment.
Technologies Used: Oracle Exadata, Linux, Shell Scripting, Python, Informatica PowerCenter, Apache Airflow, SQL, Data Warehousing, Agile, Git, Unix, AWS S3
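A minimal sketch of the kind of Airflow orchestration described above, assuming Airflow 2.x; the DAG id, schedule, and task commands are hypothetical placeholders, not Kaiser Permanente artifacts:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def run_quality_checks(**context):
    # Placeholder for row-count / null-rate checks against the warehouse.
    print("quality checks passed for run", context["ds"])

default_args = {
    "owner": "data-eng",           # hypothetical owner
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_warehouse_load",  # hypothetical pipeline name
    start_date=datetime(2024, 9, 1),
    schedule="0 2 * * *",           # nightly at 02:00
    catchup=False,
    default_args=default_args,
) as dag:
    extract = BashOperator(
        task_id="extract_source_files",
        bash_command="echo 'pull flat files from landing zone'",  # stand-in command
    )
    quality = PythonOperator(
        task_id="data_quality_checks",
        python_callable=run_quality_checks,
    )
    load = BashOperator(
        task_id="load_oracle_target",
        bash_command="echo 'sqlldr / external-table load step'",  # stand-in command
    )
    # Extract, validate, then load, so bad data never reaches the target.
    extract >> quality >> load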
Data Engineer @ Mastercard, Purchase, NY | Oct 2021 – Jul 2023
Engineered and managed Linux-based environments, providing robust infrastructure for critical financial data warehousing initiatives.
Developed intricate Shell Scripts for automating data extraction, transformation, and loading (ETL) into Oracle databases.
Implemented significant system and architecture improvements, enhancing the scalability and reliability of data warehouse components.
Enhanced ETL processes and database load/extract operations using Informatica PowerCenter for large transactional datasets.
Utilized practical knowledge of Unix file systems, including mount types and permissions, to secure sensitive financial data.
Developed Python and Perl scripts for data validation, metadata management, and comprehensive audit reporting (illustrative validation sketch below).
Collaborated with database administrators to optimize Oracle database performance and schema design for robust data warehousing.
Automated workflow orchestration using Apache Airflow with Python, improving the efficiency of daily data refreshes and reporting.
Applied Agile methodology to rapidly develop and deploy data solutions, adapting to evolving business requirements effectively.
Provided expert guidance on data flows, ensuring seamless integration of diverse data sources into the centralized data warehouse.
Managed version control for scripts and ETL mappings using GitLab, ensuring collaborative development and deployment practices.
Implemented comprehensive monitoring and logging for all Linux-based data processes, ensuring operational stability and integrity.
Technologies Used: Linux, Shell Scripting, Oracle, Python, Perl, Informatica PowerCenter, Apache Airflow, SQL, Data Warehousing, Agile, GitLab, Unix
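A minimal sketch of the kind of pre-load validation script described above; the pipe-delimited layout, column names, and rules are illustrative assumptions, not Mastercard specifics:

import csv
import sys
from collections import Counter

REQUIRED = ["txn_id", "merchant_id", "amount", "txn_date"]  # hypothetical layout

def validate_extract(path: str) -> int:
    # Return the number of rule violations found in a pipe-delimited extract.
    violations = 0
    keys = Counter()
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="|")
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            print(f"missing required columns: {missing}")
            return 1
        for lineno, row in enumerate(reader, start=2):
            # Flag nulls in required fields; lineno counts from the header.
            if any(not row.get(c) for c in REQUIRED):
                print(f"line {lineno}: null in required field")
                violations += 1
            keys[row["txn_id"]] += 1
    dupes = [k for k, n in keys.items() if n > 1]
    if dupes:
        print(f"duplicate txn_id values: {len(dupes)}")
        violations += len(dupes)
    return violations

if __name__ == "__main__":
    # Non-zero exit halts the downstream load in a scheduled pipeline.
    sys.exit(1 if validate_extract(sys.argv[1]) else 0)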
Data Engineer @ Walmart, Bentonville, AR | Nov 2019 – Sep 2021
Developed and optimized data warehouse solutions on AWS Redshift, incorporating Linux-based processing for comprehensive retail analytics.
Created and maintained Shell Scripts for automating daily ETL jobs, data archival, and system health checks within the Linux environment.
Enhanced ETL processes for ingesting large volumes of sales, inventory, and customer behavior data from diverse source systems.
Utilized Python to develop advanced data transformation routines and validate data quality within the data warehousing framework (illustrative reconciliation sketch below).
Applied practical knowledge of Unix file systems for managing data directories, access controls, and operational scripts effectively.
Implemented system improvements to optimize data ingestion and querying performance within the AWS Redshift data warehouse.
Orchestrated complex batch jobs using Apache Airflow, scheduling various data processing and reporting tasks efficiently.
Worked extensively with relational databases, including Oracle, for source data extraction and complex query generation.
Collaborated in an Agile SDLC environment, delivering high-impact data solutions for critical business intelligence needs.
Managed data flows from source systems to the data warehouse, ensuring accuracy and consistency across all datasets.
Ensured data security and compliance for sensitive retail data by implementing strict access controls and data masking techniques.
Conducted performance tuning for Spark and Hive queries, improving the efficiency of large-scale data processing operations.
Technologies Used: AWS (Redshift, S3), Linux, Shell Scripting, Python, Apache Airflow, SQL, Oracle, Hadoop, Spark, ETL, Data Warehousing, Agile, Git, Unix
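A minimal sketch of a post-load reconciliation check of the kind described above, assuming psycopg2 against Redshift; the endpoint, credentials, and table names are placeholders, not Walmart systems:

import psycopg2

def rowcount(cur, table: str) -> int:
    # Table names here are trusted placeholders, never user input.
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

def reconcile(conn_kwargs: dict, staging: str, target: str) -> bool:
    # Compare staging vs. target counts after a COPY/INSERT cycle.
    with psycopg2.connect(**conn_kwargs) as conn:
        with conn.cursor() as cur:
            staged, loaded = rowcount(cur, staging), rowcount(cur, target)
    print(f"staged={staged} loaded={loaded}")
    return staged == loaded

if __name__ == "__main__":
    ok = reconcile(
        {"host": "example-cluster.redshift.amazonaws.com",  # placeholder endpoint
         "port": 5439, "dbname": "analytics", "user": "etl", "password": "..."},
        staging="stg.daily_sales",   # hypothetical staging table
        target="dw.fact_sales",      # hypothetical target table
    )
    raise SystemExit(0 if ok else 1)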
TECHNICAL SKILLS:
Programming & Scripting: Python, Shell Scripting, SQL, Perl, Java
Data Warehousing & ETL: Oracle Exadata, Informatica PowerCenter, Data Warehouses, ETL Processes, Data Flows, Data Modeling
Operating Systems: Linux, Unix (file systems, mount types, permissions, standard tools, pipes)
Cloud & Big Data Platforms: AWS (Redshift, S3), Snowflake, Azure Data Factory, Apache Spark, Hadoop, Hive
Orchestration & DevOps: Apache Airflow, Jenkins, Git, GitLab, Agile Methodology
Databases: Oracle, MySQL, PostgreSQL, DynamoDB
Business Intelligence: Tableau, Power BI
EDUCATION:
Master of Science in Computer Science @ Missouri University of Science and Technology