Venkata Vikas Nadella — Senior Data Engineer
331-***-**** *********@*****.***
PROFESSIONAL SUMMARY:
Around 7 years of comprehensive experience in Data Engineering and advanced Data Warehousing technologies, delivering robust solutions.
Demonstrated expertise in designing, implementing, and managing Linux-based processes and infrastructure critical for data warehousing operations.
Proficient in enhancing complex ETL and database load/extract processes, ensuring optimal performance and data integrity across various platforms.
Adept at identifying and implementing strategic system and architecture improvements to enhance data flow and operational efficiency.
Skilled in optimizing Linux-based toolsets, shell scripts, automated jobs, and processes to streamline data management workflows.
Extensive practical experience with relational databases, specifically Oracle Exadata, for high-performance data storage and retrieval.
Expert in shell scripting and Python for automating data pipelines, system tasks, and intricate data transformations.
Strong command of orchestration tools like Apache Airflow, utilizing Python for managing complex data warehouse workflows efficiently.
Committed to agile methodologies and continuous process improvement, ensuring high-quality data solutions and collaborative team environments.
EDUCATION:
Master of Science in Computer Information Technology @ Elmhurst University
WORK EXPERIENCE:
Senior Data Engineer @ American Express Phoenix, AZ May 2024 – Present
Implemented and configured robust Linux-based infrastructure to support critical data warehousing operations and high-volume data processing.
Developed and enhanced advanced shell scripts for automating data ingestion, transformation, and database load/extract processes within the data warehouse.
Designed scalable data pipelines using PySpark, focusing on processing large-scale financial datasets efficiently on Linux environments.
Integrated data from diverse sources including Oracle Exadata, APIs, and JSON, ensuring seamless loading into the core data warehouse.
Optimized existing Linux-based toolsets, scripts, and scheduled jobs to significantly improve data flow and system performance for data warehousing.
Engineered sophisticated ETL workflows using AWS Glue and custom Python scripts for transforming and loading data into Redshift and Snowflake.
Managed and maintained data models in Snowflake, specifically tailoring them for advanced analytics and comprehensive reporting requirements.
Implemented rigorous data quality checks and validation frameworks across Linux environments, ensuring data accuracy and consistency within the warehouse.
Utilized Apache Airflow with Python to schedule, monitor, and manage complex ETL workflows, enhancing automation and reliability of data processes (a sketch of such a DAG follows this role's technology list).
Provided critical insights for system and architecture improvements, focusing on enhancing the resilience and scalability of the Linux-based data platform.
Developed robust dashboards using Tableau for business stakeholders, providing actionable insights derived from the streamlined data warehouse.
Integrated CI/CD pipelines using Jenkins for automated deployment of data engineering artifacts, ensuring rapid and consistent delivery.
Technologies Used: Linux, Shell Scripting, Oracle, Python, Apache Airflow, Snowflake, PySpark, AWS (S3, Redshift, Glue), Git, Jenkins
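The following is a minimal sketch of the kind of Airflow orchestration referenced in this role; the DAG id, schedule, task names, and script paths are hypothetical placeholders, not details of the actual pipelines.

```python
# Minimal Airflow 2.x DAG sketch; dag_id, task names, and the invoked
# scripts are hypothetical placeholders, not from the actual pipelines.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_warehouse_load",        # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2024, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage raw extracts, run the PySpark transform, then load the warehouse.
    # Trailing spaces keep Airflow from treating *.sh paths as Jinja templates.
    extract = BashOperator(
        task_id="extract_sources",
        bash_command="sh /opt/etl/extract_sources.sh ",     # hypothetical script
    )
    transform = BashOperator(
        task_id="transform_pyspark",
        bash_command="spark-submit /opt/etl/transform.py ",  # hypothetical job
    )
    load = BashOperator(
        task_id="load_warehouse",
        bash_command="sh /opt/etl/load_warehouse.sh ",      # hypothetical script
    )
    extract >> transform >> load
```

BashOperator is used here only for brevity; provider-specific operators (e.g., for Spark, Snowflake, or AWS Glue) would be natural substitutes in a real deployment.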
Data Engineer @ LTI Mindtree / Marsh & McLennan Hyderabad, India Jun 2022 – Mar 2023
Managed and maintained Linux environments for big data processing, optimizing resource utilization and system stability for data warehousing tasks.
Developed extensive shell scripts for automating data ingestion, processing, and transformation tasks within Hadoop and Spark ecosystems.
Designed and implemented robust ETL pipelines using Spark (Scala) for processing complex insurance data, enhancing data quality and accessibility.
Ingested diverse datasets from CSV, JSON, and Oracle Exadata into Hadoop HDFS, ensuring efficient data capture for analytical purposes.
Built sophisticated data transformation logic using Hive and Spark SQL, optimizing queries for improved performance within the data warehouse.
Implemented advanced batch processing pipelines using Hadoop and Spark frameworks, integrating seamlessly with Linux-based scheduling tools.
Utilized Apache Kafka for real-time data ingestion and streaming pipelines, ensuring timely availability of critical business information.
Enhanced ETL and database load/extract processes, focusing on performance tuning and reliability within the Linux operating environment.
Stored processed data in optimized Hive tables using ORC format, facilitating faster querying and reduced storage footprint for reporting (a sketch follows this list).
Implemented comprehensive data validation and reconciliation processes, ensuring accuracy across all stages of the data pipeline.
Scheduled and orchestrated complex workflows using Apache Airflow with Python, improving the consistency and efficiency of data loads.
Contributed to system/architecture improvements, focusing on enhancing Linux-based toolsets and processes for better data management.
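As a sketch of the Spark SQL transformation and ORC-backed Hive storage pattern described above; the database, table, column names, and HDFS path are hypothetical placeholders:

```python
# PySpark sketch, assuming a Spark session with Hive support enabled;
# database, table, column names, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("policy_batch_transform")   # hypothetical app name
    .enableHiveSupport()
    .getOrCreate()
)

# Read a raw dataset landed in HDFS by upstream ingestion jobs.
raw = spark.read.option("header", "true").csv("hdfs:///landing/policies/")

# Apply transformation logic with Spark SQL functions.
cleaned = (
    raw.dropDuplicates(["policy_id"])
       .withColumn("premium", F.col("premium").cast("double"))
       .filter(F.col("premium") > 0)
)

# Persist to an ORC-backed Hive table for fast downstream querying.
(
    cleaned.write
    .mode("overwrite")
    .format("orc")
    .saveAsTable("analytics.policies_cleaned")   # hypothetical table
)
```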
Technologies Used: Linux, Shell Scripting, Python, Apache Airflow, Hadoop, Spark (Scala), Hive, Kafka, Oracle, Git, JIRA
Data Engineer @ Hudson Infotech / HDFC Bank Hyderabad, India July 2018 – May 2022
Administered and configured Unix/Linux environments to support critical ETL and data warehousing operations for banking data processing.
Developed robust ETL workflows using Informatica PowerCenter for efficient extraction, transformation, and loading of banking data.
Extracted complex financial data from Oracle Exadata and MySQL databases, preparing it for integration into the enterprise data warehouse.
Designed and implemented intricate data transformation mappings within Informatica, ensuring compliance with business rules and data standards.
Wrote and optimized complex SQL queries and stored procedures for data processing and validation in Oracle and MySQL environments.
Performed comprehensive data cleansing and validation routines to ensure high data integrity and accuracy within the data warehouse (a validation sketch follows this list).
Worked extensively on data modeling and schema design for reporting systems, ensuring efficient data retrieval and analytical capabilities.
Implemented error handling and logging mechanisms in ETL processes, providing proactive identification and resolution of data issues.
Developed and maintained shell scripts for job automation, scheduling, and system monitoring within the Unix/Linux operating system.
Managed version control for all code and scripts using Git, ensuring collaborative development and traceability of changes.
Collaborated closely with business teams for requirement gathering and analysis, translating needs into effective data warehouse solutions.
Provided critical production support and resolved data-related issues promptly, minimizing impact on business operations.
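A minimal sketch of the kind of load validation described above, written here as a standalone Python check against Oracle via cx_Oracle; the DSN, credentials, and table names are hypothetical placeholders, and the role's production ETL itself ran in Informatica PowerCenter:

```python
# Row-count reconciliation sketch using cx_Oracle; the DSN, credentials,
# and table names are hypothetical placeholders.
import sys

import cx_Oracle

SOURCE_SQL = "SELECT COUNT(*) FROM stg.transactions"       # hypothetical staging table
TARGET_SQL = "SELECT COUNT(*) FROM dw.fact_transactions"   # hypothetical warehouse table


def row_count(conn, sql):
    # Run a scalar COUNT(*) query and return the single result value.
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchone()[0]
    finally:
        cur.close()


def main():
    conn = cx_Oracle.connect(user="etl_user", password="***", dsn="exadata-scan/dwh")
    try:
        src = row_count(conn, SOURCE_SQL)
        tgt = row_count(conn, TARGET_SQL)
    finally:
        conn.close()
    if src != tgt:
        print(f"RECONCILIATION FAILED: source={src} target={tgt}")
        sys.exit(1)   # non-zero exit lets the job scheduler flag the load
    print(f"Reconciliation OK: {src} rows")


if __name__ == "__main__":
    main()
```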
Technologies Used: Linux, Shell Scripting, Oracle Exadata, Informatica PowerCenter, SQL, MySQL, Unix, Git
TECHNICAL SKILLS:
Programming Languages: Python, Scala, SQL, Perl
Operating Systems & Scripting: Linux, Unix, Shell Scripting (Bash, KornShell)
Databases: Oracle Exadata, PostgreSQL, MySQL, Hive, Snowflake, Redshift
Data Warehousing & ETL: Informatica PowerCenter, Apache Spark, AWS Glue, Hadoop, Snowflake, Data Modeling, ETL Design
Orchestration & Workflow: Apache Airflow, Jenkins
Cloud Platforms: AWS (S3, EMR, Glue, Redshift, Lambda, Athena)
Version Control & DevOps: Git, Docker
Methodologies: Agile, Scrum