Naga Sai Nikhil Ramisetty
Data Engineer
682-***-**** ******.*@***********.*** LinkedIn
SUMMARY
Data Engineer with 3+ years of experience architecting robust ETL pipelines and optimizing cloud-based data ecosystems. Expert in Python, SQL, and distributed processing frameworks, transforming complex datasets into actionable intelligence. Proficient in implementing data governance protocols, automating monitoring solutions, and reducing processing times through performance optimization. Skilled in CI/CD integration and cross-functional collaboration to deliver business-critical analytics infrastructure.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Programming Languages: Python, SQL, R
Libraries: TensorFlow, PyTorch, Scikit-learn, Keras, Pandas, NumPy, SciPy
Big Data and ETL Tools: PySpark, Apache Kafka, Hadoop (HDFS, Hive), Apache Airflow, dbt
Databases & Warehousing: MySQL, PostgreSQL, SQL Server, Snowflake, MongoDB
Data Visualization: Tableau, Power BI, Advanced Excel, Matplotlib
Cloud Platforms: AWS (S3, Glue, Redshift), Azure (Data Factory, Databricks, Blob Storage)
Version Control and CI/CD: Git, GitHub, Bitbucket, Azure DevOps, Jenkins
Infrastructure as Code and Containerization Tools: Terraform, Docker, Kubernetes
Soft Skills: Problem-Solving, Effective Communication, Cross-Functional Collaboration, Attention to Detail, Stakeholder Management
EDUCATION
Master of Science in Data Science, University of Texas at Arlington, May 2024
Bachelor of Technology, Anil Neerukonda Institute of Technology and Sciences, India, May 2022
WORK EXPERIENCE
Data Engineer, PNC, Texas, January 2024 - Present
• Developed and maintained ETL pipelines using Python, SQL, and Apache Airflow, achieving 99.8% uptime for critical financial data workflows. Automated data processing tasks using AWS Glue to reduce manual intervention and ensure reliability.
• Collaborated with cross-functional teams within an Agile framework to deliver structured datasets for fraud detection, using dbt for SQL-based transformations in Snowflake while implementing data quality checks with Python testing frameworks.
• Implemented data security protocols for sensitive information using role-based access controls and encryption standards, maintaining compliance with financial regulations and documenting security processes using Git version control.
• Optimized SQL Server and PostgreSQL databases, reducing query execution times and lowering cloud infrastructure costs through performance tuning and AWS Redshift partition strategies.
• Architected serverless data processing workflows using AWS Lambda for event-driven transformations and S3 for cost-effective data lake storage, implementing automated data lifecycle policies and cross-region replication for disaster recovery.
• Built a real-time financial market data pipeline utilizing Apache Kafka for ingestion and PySpark on Databricks for distributed processing, enabling near-real-time analytics visualized through Tableau dashboards.
• Automated data quality monitoring with Python and Airflow, implementing CI/CD pipelines via GitHub for version control and deployment, reducing data incidents by 60%.
Data Engineer, Sage Softtech, India, May 2021 - December 2022
• Engineered ETL/ELT pipelines using Apache Airflow and SQL, reducing data processing time while leveraging Python scripts for data transformation tasks and ensuring consistent delivery for critical reporting systems.
• Migrated legacy Oracle databases to Azure Synapse Analytics with zero downtime, implementing optimized schema design and SQL performance tuning that improved query response time by 45%.
• Built scalable data ingestion pipelines with Python and Azure Data Factory, automating integration of structured and unstructured data from 10+ sources into centralized warehouses, boosting downstream data availability.
• Established data quality validation frameworks using dbt tests and custom Python scripts, reducing data inconsistencies by 70% and ensuring reliable data feeds for Power BI dashboards, while implementing Git for version control of all validation code.
• Collaborated within an Agile environment to translate business requirements into technical specifications, creating PostgreSQL data models and documentation that shortened development cycles.
• Implemented scalable data architectures using Azure Blob Storage and PySpark for distributed processing, enabling efficient handling of growing datasets while using GitHub Actions for CI/CD automation of pipeline deployments.
PROJECTS
Multiple Linear Regression on Life Expectancy of People (Python, Apache Airflow, Matplotlib, Seaborn, Pandas, Scikit-learn): Developed a life expectancy prediction model with an automated ETL pipeline processing WHO health data. Implemented PostgreSQL data modeling, comprehensive preprocessing, and Multiple Linear Regression with ANOVA validation for improved accuracy.
Railway Reservation System (Python, Flask, SQLite, MySQL, PostgreSQL, Docker, Apache Kafka): Developed a train reservation system with real-time streaming, ETL pipelines, star schema warehousing, Flask APIs, secure authentication, database optimization, and automated batch processing, improving search time and transaction speed.
CERTIFICATIONS
• Programming for Everybody (Getting Started with Python), Coursera
• Python for Data Science, IBM
• Data Science using Python Programming, 360DigiTMG
• Algorithmic Toolbox, Coursera