Data Analyst Machine Learning

Location:

India

Posted:

May 27, 2025


Resume:

DANIEL NEHEMIAH PETER KATAM

DATA ANALYST

NY +1-929-***-**** *************@*****.*** LinkedIn eportfolio

SUMMARY

A skilled Data Analyst with around 3 years of experience in ETL pipeline development, data modeling, and predictive analytics across finance and industrial domains. Proficient in SQL, Python, and Apache Spark, optimizing data processing with Snowflake, PostgreSQL, and AWS Redshift for enhanced business intelligence. Experienced in building efficient data pipelines using SSIS, Talend, and Apache NiFi, ensuring data accuracy and compliance. Well-versed in machine learning techniques for predictive maintenance, fraud detection, and financial forecasting. Expertise in Power BI and Tableau for interactive dashboards, enabling real-time KPI monitoring, anomaly detection, and trend analysis.

EXPERIENCE

BNY Mellon Feb 2025 - Current

Data Analyst NY

Developed AI-driven automation solutions, leveraging NLP and LLMs to enhance trade monitoring and anomaly detection, reducing financial transaction discrepancies by 28%.

Preprocessed large datasets using Microsoft Excel (VLOOKUP, Pivot Tables, Power Query) to remove duplicates, standardize formats, and handle missing values, improving data integrity by 35%.

Implemented Retrieval-Augmented Generation (RAG) techniques to streamline data retrieval processes, improving real-time risk assessment and decision-making.

Pioneered interactive Tableau dashboards, translating raw data into actionable insights, enabling stakeholders to track engagement trends and improve customer experience strategies by 20%.

KPMG Jul 2021 - Dec 2022

Data Analyst India

Developed ETL pipelines using SQL Server Integration Services (SSIS) and Apache NiFi, processing 1M+ transactional records monthly, improving data ingestion efficiency by 35%.

Pioneered predictive analytics models using machine learning (Regression, K-Means Clustering, Random Forest) to forecast revenue fluctuations, improving business decision-making accuracy by 20%.

Designed Power BI dashboards to track key financial KPIs such as revenue growth, risk exposure, and policy churn rates, automating manual reporting and reducing turnaround time by 50%.

Optimized data modeling using Snowflake and PostgreSQL, restructuring schemas and implementing indexing strategies, decreasing query execution time by 30%.

Led a data cleaning initiative using Pandas and PySpark, eliminating anomalies and ensuring 88% data accuracy across financial reporting systems.

Orchestrated data governance compliance with GDPR and internal policies, ensuring 100% adherence to regulatory standards through automated validation scripts.

Conducted exploratory data analysis (EDA) with Matplotlib and Seaborn, identifying fraudulent transaction patterns, enhancing fraud detection by 15%.

Led A/B testing initiatives using R and Python, analyzing customer behavior and optimizing marketing strategies, leading to 12% improvement in customer retention rates.

Integrated AWS Redshift and Glue for scalable data storage and transformation, optimizing real-time processing pipelines and reducing latency by 25%.

Collaborated cross-functionally with business stakeholders to translate business needs into analytical solutions, leading to 3X improvement in KPI tracking.

Vedanta Jul 2020 - Jun 2021

Technical Data Analyst India

Engineered ETL pipelines using Talend and MySQL, processing 50K+ sensor readings per day, ensuring accurate tracking of equipment health and operational performance.

Conducted in-depth data mining using Python (NumPy, Scikit-learn, PySpark) to analyze thermal efficiency fluctuations in metallurgical processes, reducing energy waste by 10%.

Conducted Root Cause Analysis (RCA) and Failure Mode Effect Analysis (FMEA), identifying key process inefficiencies and reducing failure rates by 15%.

Optimized industrial process monitoring dashboards in Tableau, tracking temperature, pressure, and production efficiency, reducing system lag and enabling faster anomaly detection.

Implemented data wrangling techniques in TensorFlow, standardizing unstructured sensor data from multiple manufacturing units, increasing data accuracy by 30%.

Deployed SQL-based anomaly detection models to identify inconsistencies in supply chain logistics, enhancing procurement efficiency and reducing wastage by 18%.

Supported mechanical engineering teams by leveraging Python-based automation scripts to streamline equipment performance testing, reducing evaluation time by 30%.

TECHNICAL SKILLS

Programming Languages: Python, SQL, R

Data Processing & Libraries: NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, Apache Spark, Apache NiFi, Matplotlib, Seaborn, SciPy, PySpark

Machine Learning & Predictive Analytics: Linear Regression, Logistic Regression, Decision Tree, SVM, Naïve Bayes, KNN, K-Means, Random Forest, Gradient Boosting, Data Augmentation Techniques, Exploratory Data Analysis (EDA)

Big Data & Databases: Snowflake, MySQL, MongoDB, PostgreSQL, AWS Redshift, SQL Server, SSIS, SSAS, ETL, Talend

Data Visualization & Reporting: Tableau, Power BI, MS Excel (VLOOKUP, Pivot Tables, Macros, VBA)

Cloud Technologies: AWS (S3, EC2, Lambda, Glue), Azure

Data Management & Automation: Data Warehousing, Data Governance, Data Cleaning, Data Wrangling, Data Modeling, Data Mining, Automation Scripts, Root Cause Analysis (RCA), Failure Mode Effect Analysis (FMEA)

Methodologies: SDLC, Agile, Jira

EDUCATION

Pace University, Seidenberg School of Computer Science and Information Systems New York, NY

Master of Science (MS) in Data Science Concentration: Data Science GPA: 3.9 Jan 2023 – Dec 2024

University of Texas, McCombs School of Business Austin, TX

Post-graduation Certification Concentration: Data Science and Business Analytics GPA: 3.17

Jawaharlal Nehru Technological University India

Bachelor of Technology in Mechanical Engineering May 2016 – Dec 2020

ACADEMIC PROJECTS

Heart Failure Prediction in Clinical Patients

Engineered machine learning models using Python (Logistic Regression, Random Forest) to predict heart failure outcomes with 78% accuracy, integrating a Power BI dashboard with ELI5 and SHAP to analyze 10K+ patient records, enhancing clinical decision-making efficiency by 30%.

Stock Market Analysis

Built a cloud-based stock prediction app using Apache Spark, Scrapy, and AWS/OCI, deploying Random Forest and Neural Networks for real-time analytics, while implementing automated web scraping to analyze market trends and news sentiment, improving predictive accuracy by 45%.

Visualizing Mental Health Insurance Claims and Reimbursements Using Tableau

Developed interactive Tableau dashboards and processed healthcare data using Python to analyze claim frequency, reimbursement rates, and service utilization, uncovering trends in mental health service accessibility and insurance coverage.

Pneumonia Detection in Chest X-Rays

Built a CNN-based deep learning pipeline for chest X-ray classification, achieving a 0.98 AUC, enhancing early pneumonia detection and improving diagnostic accuracy in healthcare.
