Data Engineer

Location:

Bayonne, NJ

Posted:

March 30, 2025

Resume:

Akula Nithish

Jersey City, NJ 201-***-**** ****************.*****@*****.***

SUMMARY

Data Engineer with 5+ years of experience designing and implementing scalable data solutions in finance and e-commerce, handling datasets exceeding 10TB. Expertise in ETL development using PySpark and Python, optimizing data pipelines to reduce processing time by 40%. Strong background in data governance, cloud-based data processing, and real-time data replication in AWS, improving system reliability for thousands of users.

EDUCATION

Pace University New York, NY

M.S. in Data Science, Machine Learning Aug 2023 – Dec 2024

PROJECTS

Pace University New York, NY

Amazon Customer Reviews Analysis Aug 2024 – Dec 2024

Conducted sentiment analysis on 130M+ Amazon reviews using NLP and machine learning.

Achieved 85% accuracy in sentiment classification using Logistic Regression & DistilBERT.

Performed EDA, tokenization, and feature engineering to preprocess text data, reducing noise by 30% and improving model accuracy by 15%.

Analyzed sentiment distribution (67% Positive, 23.7% Negative, 9.3% Neutral) and visualized insights.

Big Data Flight Analytics Pipeline Jan 2024 – May 2024

Built a scalable data pipeline using HDFS and Hive to process 10TB+ flight data, improving efficiency by 40%.

Ingested structured data from Harvard Dataverse into Hive tables, reducing query response time by 30%.

Analyzed flight delays via Hive SQL, identifying top 3 airports and carriers with highest delays, enhancing insights by 20%.

Optimized ETL workflows with HDFS and Hive, enabling 50% faster data retrieval and improved reporting.

EXPERIENCE

AWS Data Engineer May 2021 – Aug 2023

Tata Consultancy Services Hyderabad, India

Designed and implemented end-to-end data pipelines to migrate on-premises databases and file-based sources to AWS S3, leveraging Attunity Replicate for real-time data replication.

Developed PySpark-based ETL processes to standardize and transform high-volume datasets for analytics and reporting.

Automated data filtering mechanisms using Python, reducing S3 storage costs by 20%, saving $480M and optimizing EMR processing efficiency.

Built and maintained Hive tables on AWS S3, enabling efficient querying and analytics for business users.

Built PySpark ETL pipelines to standardize and transform 1TB+ daily data, improving analytics efficiency by 35%.

Tableau Developer Nov 2018 – May 2021

Edge Solutions Pvt. Ltd. Bangalore, India

Optimized Tableau Server on AWS, managing 500+ users and enforcing security, improving access control by 30%.

Monitored Tableau performance using AWS CloudWatch, reducing dashboard load times by 30%.

Automated rehydration across dev, SAT, and prod, cutting downtime by 50% and enhancing data reliability.

Supported Tableau Data Management, optimizing Tableau Catalog & Prep Conductor, improving data processing by 25%.

TECHNICAL SKILLS

Programming Languages: Python, PySpark, SQL, Bash

Cloud Technologies: AWS (S3, EMR, EC2, RDS, CloudWatch, Service Catalog, Kafka)

Big Data & Data Processing: Apache Hive, HDFS, Spark, Qlik Replicate

Machine Learning & Deep Learning: TensorFlow, PyTorch, Keras, Scikit-learn

Data Science Libraries: NumPy, Pandas, NLTK

Data Visualization & BI Tools: Tableau Server, Tableau Prep

DevOps & CI/CD: Docker, Git, Bitbucket, Control-M, CI/CD

CERTIFICATIONS

Google Analytics (Credential ID: 205157802 — Expired: Sep 2024)