Akula Nithish
Jersey City, NJ 201-***-**** ****************.*****@*****.***
SUMMARY
Data Engineer with 5+ years of experience designing and implementing scalable data solutions in finance and e-commerce, handling datasets exceeding 10TB. Expertise in ETL development using PySpark and Python, optimizing data pipelines to reduce processing time by 40%. Strong background in data governance, cloud-based data processing, and real-time data replication in AWS, improving system reliability for thousands of users.
EDUCATION
Pace University New York, NY
M.S. in Data Science, Machine Learning Aug 2023 – Dec 2024
PROJECTS
Pace University New York, NY
Amazon Customer Reviews Analysis Aug 2024 – Dec 2024
Conducted sentiment analysis on 130M+ Amazon reviews using NLP and machine learning.
Achieved 85% accuracy in sentiment classification using Logistic Regression & DistilBERT.
Performed EDA, tokenization, and feature engineering to preprocess text data, reducing noise by 30% and improving model accuracy by 15%.
Analyzed sentiment distribution (67% Positive, 23.7% Negative, 9.3% Neutral) and visualized insights.
Big Data Flight Analytics Pipeline Jan 2024 – May 2024
Built a scalable data pipeline using HDFS and Hive to process 10TB+ flight data, improving efficiency by 40%.
Ingested structured data from Harvard Dataverse into Hive tables, reducing query response time by 30%.
Analyzed flight delays via Hive SQL, identifying top 3 airports and carriers with highest delays, enhancing insights by 20%.
Optimized ETL workflows with HDFS and Hive, enabling 50% faster data retrieval and improved reporting.
EXPERIENCE
AWS Data Engineer May 2021 – Aug 2023
Tata Consultancy Services Hyderabad, India
Designed and implemented end-to-end data pipelines to migrate on-premises databases and file-based sources to AWS S3, leveraging Attunity Replicate for real-time data replication.
Developed PySpark-based ETL processes to standardize and transform high-volume datasets for analytics and reporting.
Automated data filtering mechanisms using Python, reducing S3 storage costs by 20%, saving $480M and optimizing EMR processing efficiency.
Built and maintained Hive tables on AWS S3, enabling efficient querying and analytics for business users.
Built PySpark ETL pipelines to standardize and transform 1TB+ daily data, improving analytics efficiency by 35%.
Tableau Developer Nov 2018 – May 2021
Edge Solutions Pvt. Ltd. Bangalore, India
Optimized Tableau Server on AWS, managing 500+ users and enforcing security, improving access control by 30%.
Monitored Tableau performance using AWS CloudWatch, reducing dashboard load times by 30%.
Automated rehydration across dev, SAT, and prod, cutting downtime by 50% and enhancing data reliability.
Supported Tableau Data Management, optimizing Tableau Catalog & Prep Conductor, improving data processing by 25%.
TECHNICAL SKILLS
Programming Languages: Python, PySpark, SQL, Bash
Cloud Technologies: AWS (S3, EMR, EC2, RDS, CloudWatch, Service Catalog, Kafka)
Big Data & Data Processing: Apache Hive, HDFS, Spark, Qlik Replicate
Machine Learning & Deep Learning: TensorFlow, PyTorch, Keras, Scikit-learn
Data Science Libraries: NumPy, Pandas, NLTK
Data Visualization & BI Tools: Tableau Server, Tableau Prep
DevOps & CI/CD: Docker, Git, Bitbucket, Control-M, CI/CD
CERTIFICATIONS
Google Analytics (Credential ID: 205157802 — Expired: Sep 2024)