Nikith Reddy Palugulla — Senior Data Engineer
610-***-**** ****************@*****.***
PROFESSIONAL SUMMARY:
Senior Data Engineer with 7 years of experience specializing in robust Linux-based data warehousing solutions and advanced ETL processes.
Implemented and managed critical Linux infrastructure, strengthening system architecture for efficient and reliable data processing.
Demonstrated expertise in Shell Scripting and Oracle development, designing and optimizing complex relational database systems including Oracle Exadata.
Enhanced ETL and database load/extract processes using Informatica and advanced Python scripting for reliable data integration and improved performance.
Skilled in developing and maintaining various Linux-based toolsets, automated scripts, and critical batch jobs to streamline operational workflows effectively.
Hands-on working experience with Unix file systems, including mount types, permissions, standard tools, and process piping for secure data handling.
Orchestrated intricate data pipelines using Apache Airflow with Python, ensuring reliable and scalable data delivery across diverse platforms.
Adept at applying Agile methodologies to drive continuous process improvement and deliver high-quality data warehousing solutions efficiently.
Committed to automation and passionate about continually improving data warehouse architectures to meet evolving business requirements.
EDUCATION:
Master of Science in Computer Science @ SUNY at New Paltz
WORK EXPERIENCE:
Senior Data Engineer @ GAF Materials Corporation, Parsippany, NJ, Mar 2024 – Present
Architected and managed scalable Linux-based infrastructure on AWS, optimizing data warehousing solutions for high performance and reliability.
Developed intricate Shell Scripts to automate routine tasks, including data extracts, loads, and system health checks across Linux environments.
Designed and implemented robust ETL pipelines using PySpark on AWS EMR, enhancing data load/extract processes from various source systems.
Interfaced directly with Oracle and other relational databases on RDS, performing advanced SQL development and query optimization for data warehousing.
Utilized AWS Glue for serverless ETL operations, developing Python-based scripts to cleanse and transform large datasets efficiently.
Orchestrated complex data workflows using Apache Airflow with Python, ensuring timely and dependent execution of data ingestion and processing jobs.
Managed data storage in Amazon S3 using Parquet and Avro formats, implementing efficient partitioning strategies for improved query performance.
Implemented data warehousing solutions using Amazon Redshift, optimizing schema designs and leveraging columnar storage for analytical queries.
Monitored and managed Linux-based processes within AWS, implementing system improvements to enhance stability and resource utilization.
Performed rigorous data quality validations and reconciliation, ensuring data integrity across all warehousing layers.
Collaborated with cross-functional teams, applying Agile methodologies to iteratively develop and deploy data solutions.
Implemented CI/CD pipelines using Jenkins for automated deployment of Python scripts and infrastructure changes.
Technologies Used: AWS (S3, EMR, Redshift, Glue, Lambda, RDS), Linux, Oracle, PySpark, Shell Scripting, Python, Airflow, Tableau, Git, Jenkins
Data Engineer @ CVS Health, Irving, TX, Sep 2023 – Mar 2024
Implemented and configured Linux-based processes within Azure environments, optimizing infrastructure for data warehousing initiatives.
Developed sophisticated Shell Scripts for managing data movement, file system operations, and automated job execution on Azure Linux VMs.
Designed and executed robust ETL pipelines using Azure Data Factory, enhancing database load/extract processes from diverse sources.
Performed complex data transformations and modeling using Databricks with PySpark, operating efficiently within a Linux-centric compute environment.
Migrated large volumes of on-premise data to Azure Data Lake Storage (ADLS), ensuring data integrity and optimizing storage structures.
Integrated with relational databases such as Azure SQL Database, applying the same advanced SQL development and tuning practices used on Oracle systems.
Built and optimized data models using Azure Synapse Analytics, providing scalable solutions for large-scale data warehousing requirements.
Applied data cleansing and transformation logic using Python, contributing to system improvements for data quality and consistency.
Orchestrated data pipelines using Azure Data Factory triggers and scheduled events, managing complex dependencies effectively.
Implemented robust security controls including Role-Based Access Control (RBAC) to secure sensitive data assets.
Maintained version control using Git, collaborating with development teams to ensure efficient code management.
Collaborated closely with business stakeholders and utilized Agile methodologies to gather requirements and deliver data solutions.
Technologies Used: Azure (ADLS, Data Factory, Databricks, Synapse Analytics), Linux, Shell Scripting, Azure SQL Database, PySpark, Python, SQL, Git
Data Engineer @ UPS, Atlanta, GA, Dec 2021 – Sep 2023
Developed and optimized complex ETL pipelines using Spark (Scala and Python) on a Hadoop ecosystem, enhancing data processing capabilities.
Managed and configured Linux-based environments for Hadoop clusters, ensuring optimal performance and resource allocation.
Wrote intricate Shell Scripts to manage cluster operations, automate data transfers, and monitor system health for large data volumes.
Designed and implemented data warehousing solutions using Hive, applying advanced partitioning and bucketing strategies for efficient querying.
Processed large datasets on Hadoop Distributed File System (HDFS), ensuring data integrity and high availability.
Integrated real-time streaming data using Kafka, developing robust consumers for continuous data ingestion.
Developed aggregation and transformation logic using Spark, significantly enhancing data quality and preparing data for analytical consumption.
Orchestrated complex data workflows and dependencies using Apache Airflow with Python, improving pipeline reliability and transparency.
Supported analytics teams by providing cleansed and structured data, facilitating robust reporting and business intelligence initiatives.
Implemented logging mechanisms using Log4j, enabling comprehensive monitoring and troubleshooting of data pipelines.
Followed Agile and Scrum methodologies rigorously, contributing to sprint planning and daily stand-ups for project success.
Utilized JIRA for efficient task tracking, backlog management, and transparent project communication within the team.
Technologies Used: Hadoop, Spark (Scala, Python), Hive, Kafka, Airflow, Linux, Shell Scripting, Git, JIRA
Junior Data Engineer @ Wizard Tech Solutions, Piscataway, NJ, Aug 2020 – Dec 2021
Developed and maintained complex ETL workflows using Informatica PowerCenter, significantly enhancing data load/extract processes.
Designed and implemented data pipelines for integrating information from various source systems into target data warehouses.
Worked extensively with Oracle and SQL Server databases, performing advanced SQL development and query tuning.
Wrote comprehensive Shell Scripts on Unix/Linux platforms to automate job scheduling, file transfers, and system monitoring tasks.
Performed rigorous data validation and reconciliation processes to ensure the accuracy and consistency of migrated data.
Created detailed Informatica mappings, transformations, and workflows to achieve precise data manipulation requirements.
Implemented robust data quality checks within ETL processes, identifying and resolving discrepancies proactively.
Supported batch processing operations, ensuring timely delivery of critical data for business reporting.
Documented all ETL processes, data flows, and technical specifications, maintaining clear and concise project documentation.
Collaborated with senior engineers to gather and analyze business requirements, translating them into technical specifications.
Utilized Git for version control, managing code repositories and facilitating collaborative development efforts.
Contributed to system architecture improvements, identifying opportunities for automation and process optimization.
Technologies Used: Informatica, Oracle, SQL Server, Unix, Shell Scripting, SQL, Git
ETL Developer @ TCS, Hyderabad, India, May 2018 – Dec 2019
Developed and enhanced ETL processes using Informatica PowerCenter, ensuring efficient data extraction and loading.
Worked extensively with Oracle databases, writing advanced SQL queries, stored procedures, and functions.
Created detailed mappings and workflows within Informatica to support data integration and migration projects.
Performed comprehensive testing and validation of ETL processes, ensuring data accuracy and compliance with business rules.
Supported critical data migration activities, contributing to successful system transitions and upgrades.
Resolved production issues promptly, analyzing logs and data discrepancies to minimize system downtime.
Contributed to basic Shell Scripting tasks for job scheduling and file manipulation on Unix environments.
Utilized Jenkins for automated deployments of Informatica workflows and database scripts, improving release efficiency.
Collaborated with team members to understand data requirements and translate them into technical solutions.
Maintained detailed documentation for all developed ETL components and operational procedures.
Assisted in performance tuning of SQL queries and Informatica mappings to optimize data processing times.
Participated in team meetings, contributing to project planning and problem-solving discussions.
Technologies Used: Informatica, Oracle, SQL, Unix, Shell Scripting, Jenkins
TECHNICAL SKILLS:
Programming & Scripting: Python, Scala, SQL, Shell Scripting, Perl
Databases & Data Warehousing: Oracle, Oracle Exadata, PostgreSQL, MySQL, Hive, Snowflake, Amazon Redshift, Azure Synapse Analytics
ETL & Orchestration Tools: Informatica PowerCenter, Apache Airflow, AWS Glue, Azure Data Factory, Apache Spark, Databricks
Operating Systems & Containerization: Linux, Unix, Docker
Cloud Platforms: AWS (S3, EMR, Redshift, Glue, Lambda, RDS), Azure (ADLS, Data Factory, Synapse Analytics)
Version Control & DevOps: Git, Jenkins
Methodologies & Tools: Agile, Scrum, JIRA, Confluence, Tableau