Venkateshwar Reddy Kodiganti — Senior Data Engineer
209-***-**** *******@*****.***
PROFESSIONAL SUMMARY:
Data Warehouse Engineer with around 5 years of proven expertise in implementing and managing robust data warehousing solutions.
Adept at designing and optimizing Linux-based processes and infrastructure crucial for high-performance data operations and ETL workflows.
Skilled in Shell Scripting and Oracle development, including advanced SQL and database load/extract processes for complex datasets (a minimal load sketch follows this summary).
Proficient in enhancing ETL processes and database load/extract mechanisms to improve data ingestion and transformation efficiency across systems.
Expertise in Python for data pipeline development and Apache Airflow for sophisticated workflow orchestration, ensuring reliable data delivery.
Committed to identifying and implementing system and architecture improvements, driving automation and continual process enhancements within data environments.
Experienced with various Linux-based toolsets, scripts, jobs, and processes, ensuring stable and scalable data warehousing operations.
Strong understanding of Agile methodologies, contributing effectively to sprint planning, task management, and project delivery.
Passionate about automation and continual process improvement, consistently optimizing data infrastructure and operational efficiency.
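Illustrative sketch (not production code): a minimal Python load step using the python-oracledb driver, of the kind of database load process described above; the connection details, table, and file layout are hypothetical placeholders, and a reachable Oracle instance is assumed.

```python
import csv
import os

import oracledb  # python-oracledb driver for Oracle Database


def load_staging_table(csv_path: str) -> int:
    """Bulk-insert rows from a delimited file into a staging table."""
    # Connection details come from the environment; names are placeholders.
    conn = oracledb.connect(
        user=os.environ["ORA_USER"],
        password=os.environ["ORA_PASSWORD"],
        dsn=os.environ.get("ORA_DSN", "dbhost/ORCLPDB1"),  # hypothetical DSN
    )
    with conn:
        with open(csv_path, newline="") as fh:
            rows = [(r["id"], r["amount"], r["loaded_at"]) for r in csv.DictReader(fh)]
        cursor = conn.cursor()
        # executemany batches all bind values into one bulk insert.
        cursor.executemany(
            "INSERT INTO stg_transactions (id, amount, loaded_at) "
            "VALUES (:1, :2, TO_TIMESTAMP(:3, 'YYYY-MM-DD HH24:MI:SS'))",
            rows,
        )
        conn.commit()
    return len(rows)


if __name__ == "__main__":
    print(f"Loaded {load_staging_table('transactions.csv')} rows")
```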
EDUCATION:
Master of Science in Business Analytics @ University of the Pacific
WORK EXPERIENCE:
Senior Data Engineer @ UnitedHealth Group Minnetonka, MN Feb 2024 – Present
Implemented and managed robust Linux-based processes and infrastructure critical for high-volume data warehousing operations.
Designed and developed scalable data pipelines using Python and PySpark, optimizing performance for large-scale healthcare datasets.
Enhanced ETL and database load/extract processes, integrating data from diverse sources into enterprise data warehouses.
Developed advanced Shell Scripts to automate routine data operations, system monitoring, and administrative tasks on Linux servers.
Optimized Oracle database performance by tuning complex SQL queries, stored procedures, and data loading mechanisms for efficiency.
Built and maintained sophisticated data warehousing solutions leveraging Amazon Redshift, ensuring high availability and data integrity.
Utilized Apache Airflow with Python for orchestrating intricate data pipelines, enhancing reliability and scheduling of data flows (see the DAG sketch below).
Ensured secure data access and governance through carefully scoped IAM roles and policies within the Linux environment.
Implemented system and architecture improvements to enhance data processing capabilities and overall data warehouse scalability.
Integrated various Linux-based toolsets and scripts for efficient management of data ingestion and transformation workflows.
Collaborated within an Agile framework, utilizing JIRA for tracking tasks and facilitating seamless sprint planning and project delivery.
Maintained comprehensive documentation for all Linux-based processes, ETL workflows, and data models to ensure clarity and knowledge transfer.
Technologies Used: Linux, Oracle, AWS (S3, Glue, Redshift, Lambda), PySpark, Airflow, Python, Shell Scripting, PostgreSQL, Docker, Jenkins, Git
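Illustrative sketch (not UnitedHealth Group's actual pipeline): a minimal Airflow 2.4+ DAG in Python showing the extract-then-load orchestration pattern described above; the DAG id, schedule, callable, and shell script path are hypothetical.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def _extract_to_staging(**context):
    # Placeholder extract step; a real task would pull from source systems.
    print("extracting for", context["ds"])


with DAG(
    dag_id="warehouse_daily_load",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",           # nightly at 02:00
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(
        task_id="extract_to_staging",
        python_callable=_extract_to_staging,
    )
    # Shell step mirrors the Linux/Shell Scripting work described above.
    load = BashOperator(
        task_id="load_warehouse",
        bash_command="/opt/etl/load_warehouse.sh {{ ds }}",  # hypothetical script
    )
    extract >> load
```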
Data Engineer @ Mastercard Purchase, NY Jul 2021 – Dec 2022
Managed Linux-based environments and infrastructure, ensuring optimal performance for critical data warehousing applications and processes.
Developed and optimized advanced Shell Scripts for automating data extraction, transformation, and loading (ETL) tasks in a production environment.
Designed and implemented robust ETL processes using Azure Data Factory (ADF) to ingest and transform large financial datasets from various sources.
Performed extensive Oracle development, including crafting complex SQL queries and optimizing procedures for data manipulation on Exadata platforms.
Utilized Azure Synapse Analytics for scalable data warehousing and analytical processing, integrating seamlessly with Linux-based systems.
Enhanced ETL and database load/extract processes, significantly reducing data processing times and improving data availability.
Contributed to system and architecture improvements by implementing efficient data partitioning and indexing strategies within Oracle Exadata.
Processed and transformed financial transaction data using Spark-Scala, ensuring data quality and consistency for reporting systems (an equivalent PySpark sketch appears below).
Managed data flows from on-premise databases to Azure Data Lake Storage (ADLS), leveraging Linux-based tools for secure transfers.
Implemented role-based access controls and data security policies consistent with enterprise data warehousing best practices.
Automated pipeline scheduling using ADF triggers and integrated with Shell Scripts for pre/post-processing tasks.
Maintained code using Git and adhered strictly to Agile methodologies for efficient development and deployment cycles.
Technologies Used: Linux, Oracle Exadata, Azure (ADF, ADLS, Synapse), Spark-Scala, Python, Shell Scripting, SQL, Git, Agile
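Illustrative sketch: the Spark work in this role was in Scala; for consistency with the other sketches, an equivalent transaction-cleansing transformation is shown in PySpark. Input paths, column names, and the partition column are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn_cleanse").getOrCreate()

# Hypothetical input: raw transaction extracts landed as parquet.
raw = spark.read.parquet("/data/raw/transactions")

clean = (
    raw
    .dropDuplicates(["txn_id"])                   # de-duplicate on business key
    .filter(F.col("amount").isNotNull())          # drop incomplete records
    .withColumn("txn_date", F.to_date("txn_ts"))  # derive partition column
)

# Partitioned write reflects the partitioning strategies described above.
clean.write.mode("overwrite").partitionBy("txn_date").parquet("/data/curated/transactions")

spark.stop()
```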
Data Engineer @ Etsy Brooklyn, NY Feb 2020 – Jun 2021
Developed and maintained ETL workflows using Informatica to extract, transform, and load e-commerce data into various data targets.
Designed and implemented relational data models utilizing MySQL, providing robust foundations for reporting and analytical systems.
Wrote and optimized complex SQL queries and stored procedures to perform intricate data transformations and aggregations.
Ingested diverse data from flat files and relational databases into staging environments, ensuring data integrity at ingress.
Performed thorough data cleansing and transformation using Informatica ETL processes, enhancing data quality for business insights.
Assisted in the strategic migration of legacy data systems to a scalable Hadoop ecosystem, ensuring minimal disruption.
Utilized Hive for efficiently querying and analyzing large datasets stored within the Hadoop Distributed File System (HDFS).
Implemented foundational Spark jobs for distributed data processing, contributing to the modernization of data pipelines.
Ensured data quality through rigorous validation and reconciliation processes across all ETL stages (see the simplified reconciliation sketch below).
Collaborated closely with data analysts to understand reporting requirements and provide optimized datasets.
Participated in agile ceremonies, contributing to continuous improvement of data engineering practices and workflows.
Maintained detailed documentation for ETL mappings, data dictionaries, and data flow diagrams, ensuring knowledge accessibility.
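Illustrative sketch: a simplified row-count and sum reconciliation of the kind described above, using the sqlite3 standard library as a stand-in for the MySQL source and target; the table and columns are hypothetical.

```python
import sqlite3

# Stand-in databases; in practice these would be MySQL source/target connections.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")

for db in (src, tgt):
    db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total REAL)")

src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 24.50), (3, 5.00)])
tgt.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 24.50)])  # one row missing


def reconcile(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> bool:
    """Compare row counts and summed totals between source and target."""
    q = f"SELECT COUNT(*), COALESCE(SUM(total), 0) FROM {table}"
    s_count, s_sum = source.execute(q).fetchone()
    t_count, t_sum = target.execute(q).fetchone()
    if (s_count, s_sum) != (t_count, t_sum):
        print(f"{table}: mismatch (source {s_count}/{s_sum}, target {t_count}/{t_sum})")
        return False
    print(f"{table}: reconciled")
    return True


reconcile(src, tgt, "orders")  # flags the missing row
```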
Technologies Used: Informatica, MySQL, SQL, Hive, Hadoop, Data Warehousing, Data Flows
TECHNICAL SKILLS:
Programming Languages: Python, Scala, SQL, Perl (foundational)
Data Warehousing: Oracle Exadata, Snowflake, Hadoop, Hive, Data Lake Storage
Operating Systems & Scripting: Linux, Unix File Systems, Shell Scripting
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, Lambda), Azure (ADF, ADLS, Synapse)
ETL & Orchestration: Informatica, Apache Airflow, AWS Glue, Azure Data Factory
Database Management: Oracle, PostgreSQL, MySQL
Version Control & DevOps: Git, Jenkins, Docker
BI & Reporting: Tableau, Power BI
Methodologies: Agile, Scrum