Akshaya K — Senior Data Engineer
470-***-**** *************@*****.***
PROFESSIONAL SUMMARY
Accomplished Data Engineer with around 5 years of experience designing and optimizing robust data warehousing solutions and processes.
Expertise in implementing, configuring, and managing Linux-based processes and infrastructure crucial for high-performance data platform operations.
Proficient in developing and enhancing ETL processes, database load/extract operations, and complex data pipelines using Python, Spark, and SQL.
Strong command of Shell Scripting for automating complex tasks, implementing system/architecture improvements, and managing Linux environments effectively.
Extensive practical experience with relational databases, including Oracle Exadata, ensuring high performance and data integrity for warehousing initiatives.
Adept at leveraging Python for data processing, scripting, and developing scalable solutions within diverse data ecosystems and analytical frameworks.
Experienced with orchestration tools like Apache Airflow, utilizing Python to design and manage complex data workflows and dependencies efficiently.
Proven ability to identify and implement system and architecture improvements, enhancing data quality, performance, and operational efficiency across platforms.
Committed to Agile methodologies and continual process improvement, consistently delivering high-quality, maintainable data solutions for business intelligence.
EDUCATION
Master of Science in AI and Computer Science @ The University of Texas at Arlington
TECHNICAL SKILLS
Programming Languages: Python, Scala, SQL
Operating Systems & Scripting: Linux, Unix, Shell Scripting
Cloud Platforms: AWS (S3, EMR, Redshift, Glue, Lambda), Azure (ADF, ADLS Gen2, Synapse Analytics)
Data Warehousing & ETL: Hadoop, Spark, Hive, Snowflake, Informatica PowerCenter, Azure Data Factory, AWS Glue, ETL Design
Orchestration & Automation: Apache Airflow, Jenkins, Control-M
Database Management: Oracle, Oracle Exadata, MySQL, PostgreSQL, MS SQL Server
DevOps & Version Control: GitHub, Docker
Methodologies: Agile Scrum
WORK EXPERIENCE
Senior Data Engineer @ Goldman Sachs, New York, NY (Aug 2024 – Present)
Implemented and managed Linux-based processes and infrastructure, ensuring robust and scalable operations for enterprise data warehousing solutions.
Designed and developed advanced ETL pipelines using PySpark on EMR, efficiently processing large-scale financial data for critical reporting needs.
Leveraged extensive Shell Scripting to automate system/architecture improvements, streamline data load/extract processes, and enhance operational efficiency.
Collaborated with database teams to integrate Oracle Exadata into data pipelines, optimizing performance for high-volume transactional data warehousing.
Built and optimized Spark jobs to handle structured and semi-structured data, ensuring data quality and consistency across various financial sources.
Orchestrated complex data workflows and dependencies using Apache Airflow with Python, maintaining reliable and automated data delivery.
Applied deep understanding of data warehousing principles to integrate data from diverse sources into Snowflake, facilitating advanced analytics.
Utilized Jenkins for CI/CD automation of data pipelines, contributing to Agile development practices and continuous integration strategies.
Implemented data quality checks and validation frameworks (see the sketch below).
Technologies Used: AWS (S3, EMR, Redshift, Lambda, Glue), PySpark, Kafka, Snowflake, Apache Airflow, Python, Linux, Shell Scripting, Oracle, Power BI, GitHub, Jenkins
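The following is a minimal, hypothetical sketch of the Airflow-plus-Python pattern described above: a daily DAG that runs a load step followed by a row-count validation gate. The DAG id, task names, and the validation rule itself are illustrative assumptions, not production code.

# Hypothetical sketch: a daily Airflow DAG with a load step and a
# row-count validation gate. Names and the check are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_trades(**context):
    # Stand-in for submitting the real PySpark/EMR job; the return value
    # is pushed to XCom so the validation task can read it.
    rows_loaded = 42  # would come from the Spark job's output in practice
    return rows_loaded


def validate_row_counts(**context):
    # Fail the run if the load step reported no rows.
    rows = context["ti"].xcom_pull(task_ids="load_trades")
    if not rows or rows <= 0:
        raise ValueError("Row-count validation failed: no rows loaded")


with DAG(
    dag_id="daily_trades_load",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # the `schedule` argument assumes Airflow 2.4+
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    load = PythonOperator(task_id="load_trades", python_callable=load_trades)
    validate = PythonOperator(task_id="validate", python_callable=validate_row_counts)
    load >> validate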
Data Engineer @ GEICO, Chevy Chase, MD (Feb 2021 – Aug 2023)
Engineered and managed Linux-based data processing environments within Azure, enhancing efficiency for complex data warehousing operations.
Designed and developed robust ETL pipelines using Azure Data Factory and Azure Databricks, processing diverse data for analytical insights.
Developed and enhanced Shell Scripts to automate routine tasks, implement system improvements, and manage data ingestion and transformation flows.
Integrated relational databases, including Oracle, as source systems for data extraction, ensuring seamless data flow into Azure Synapse Analytics.
Built scalable solutions using Azure Data Lake Storage (ADLS Gen2) and Spark in Scala, performing complex transformations for data warehousing.
Applied practical working knowledge of Unix file systems and permissions for secure and efficient data management within the cloud environment.
Implemented data partitioning and optimization strategies for performance, enhancing ETL/database load processes in Azure Synapse Analytics (see the sketch below).
Utilized an Agile approach for project delivery, working closely with stakeholders to refine data models and reporting requirements.
Built reporting datasets for Power BI dashboards.
Technologies Used: Azure Data Factory, ADLS Gen2, Azure Databricks (Scala), Synapse Analytics, SQL Server, Linux, Shell Scripting, Oracle, Power BI, GitHub
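A minimal sketch of the partitioned-write pattern referenced above. The project itself used Scala on Databricks; PySpark is used here for consistency with the other examples, and the storage account, container paths, and column names are illustrative assumptions.

# Hypothetical sketch: read raw claims from ADLS Gen2, derive partition
# columns, and write Parquet partitioned by year/month so date-bounded
# loads avoid scanning full history. Paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_partitioned_load").getOrCreate()

claims = (
    spark.read.option("header", True)
    .csv("abfss://raw@exampleaccount.dfs.core.windows.net/claims/")  # assumed path
    .withColumn("claim_date", F.to_date("claim_date"))
    .withColumn("year", F.year("claim_date"))
    .withColumn("month", F.month("claim_date"))
)

(
    claims.write.mode("overwrite")
    .partitionBy("year", "month")  # enables partition pruning on date-bounded reads
    .parquet("abfss://curated@exampleaccount.dfs.core.windows.net/claims/")  # assumed path
)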
Junior Data Engineer @ UnitedHealth Group, Minneapolis, MN (Jul 2019 – Feb 2021)
Designed and developed intricate ETL workflows using Informatica PowerCenter, extracting and transforming critical healthcare data from source systems.
Implemented comprehensive Shell Scripts for job automation and data validation, ensuring smooth operation of ETL and data loading processes (see the sketch below).
Developed complex SQL queries and stored procedures, performing data cleansing and transformation within Oracle databases for data warehousing.
Managed and optimized large datasets within Hadoop (Hive), applying data warehousing principles for efficient storage and retrieval.
Worked with diverse data formats and integrated data from multiple sources into the data warehouse, supporting robust reporting requirements.
Gained practical working experience with Unix file systems, including standard tools and permissions, essential for managing data environments.
Supported reporting teams by providing accurate data extraction and transformation services, improving data availability for business intelligence.
Participated actively in Agile SDLC activities, contributing to sprint planning and delivering incremental enhancements to data solutions.
Maintained version control using GitHub.
Technologies Used: Informatica PowerCenter, Oracle, Hive, Hadoop, SQL, Unix Shell Scripting, Control-M, GitHub
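A hypothetical Python rendering of the pre-load validation described above. The original automation was shell-based; Python is used here for consistency with the other examples, and the file layout, delimiter, and key columns are illustrative assumptions.

# Hypothetical sketch: validate a pipe-delimited extract before the
# warehouse load runs: check expected columns and non-empty key fields.
import csv
import sys

EXPECTED_COLUMNS = ["member_id", "claim_id", "service_date", "amount"]  # assumed layout


def validate_extract(path: str) -> int:
    """Return the number of valid rows; raise on a malformed file."""
    rows = 0
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="|")
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"extract missing columns: {sorted(missing)}")
        for record in reader:
            if not record["member_id"] or not record["claim_id"]:
                raise ValueError(f"empty key on row {rows + 1}")
            rows += 1
    return rows


if __name__ == "__main__":
    print(f"validated {validate_extract(sys.argv[1])} rows")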