Data Analyst Machine Learning

Location:

Jersey City, NJ

Posted:

June 16, 2025

Resume:

Snehith Chelamallu

Data Analyst

+1-940-***-**** **********@*****.***

PROFESSIONAL SUMMARY

Data Analyst with four years of experience in data analysis, data science, machine learning, and data visualization. Demonstrated success in building scalable data pipelines, developing predictive models, and deriving insights from large datasets. Proficient in statistical analysis, data preprocessing, and advanced SQL scripting to improve business operations and efficiency. Looking to apply Python skills, machine learning knowledge, and data visualization expertise in a progressive organization.

TECHNICAL SKILLS

Programming Languages: Python, SQL (advanced), SAS, C++, Java

Machine Learning: Regression, Classification, Clustering, Ensemble Learning, Neural Networks, Deep Learning, NLP, Generative AI

Data Manipulation: NumPy, Pandas, PySpark

Visualization Tools: Tableau, Power BI, QuickSight, Matplotlib, Seaborn, Plotly, Excel (PivotTables, VLOOKUP, INDEX/MATCH, HLOOKUP)

Big Data & Cloud: AWS (S3, Redshift, Glue, SageMaker, IAM), Hadoop, Hive, Spark, Databricks, Snowflake

Databases: MySQL, PostgreSQL, MongoDB, Cassandra, SQLite

DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git

Build Tools and IDEs: VS Code, Visual Studio, Eclipse

API Documentation Tools: Postman

Other Skills: Data structures and algorithms, Linux, A/B Testing, Statistical Hypothesis Testing, Time Series Analysis, Feature Engineering, Data Wrangling, Predictive Modeling, Dimensionality Reduction, Text Analytics

EDUCATION

Master of Science in Information Systems and Technologies, University of North Texas, Jan 2022 – Dec 2023, GPA: 3.9

Coursework: Big Data Analytics, Data Visualization for Analytics, Enterprise Data Warehousing, Data Mining, Enterprise Applications of Business Intelligence, Predictive Analytics & Business Forecasting, Foundations of Database Management, Programming Languages for Business Analytics.

PROFESSIONAL EXPERIENCE

Data Analyst JP Morgan Chase Feb 2024 – Present

Spearheaded the development of ETL pipelines using Python, AWS Redshift, Glue, and S3, ensuring seamless integration of loan repayment data across systems.

Developed advanced SQL queries and views for real-time reporting in Tableau, enabling the creation of dynamic dashboards that reduced stakeholder interpretation time by 20%.

Applied machine learning algorithms (Random Forest, Logistic Regression) for customer behavior prediction, achieving a 15% improvement in accuracy and increasing quarterly revenue by $50,000.

Extracted and transformed data from SQL databases, AWS S3, and Redshift, enabling advanced analysis for loan repayment strategies and improving collection rates by 12%.

Executed predictive modeling for loan default risks using Python (Scikit-learn) and Tableau, achieving a precision score of 92%.

Tech Stack: Python, SQL, Tableau, AWS (Redshift, Glue, S3), Scikit-learn, Pandas, NumPy, Plotly, Seaborn, Matplotlib, PySpark

Data Analyst Sun Micro Info solutions Jul 2020 – Oct 2021

Conducted detailed analysis of sales trends, inventory data, and regional performance using Python libraries (Pandas, NumPy, Matplotlib), enabling a 15% increase in annual revenue.

Built advanced models using Random Forests to predict CI/CD pipeline behavior, improving release reliability by 20%.

Implemented predictive models to refine pricing strategies, boosting revenue growth by 12%.

Produced reports on inventory levels and customer satisfaction scores, using Tableau and Python for visualization and SQL for querying databases.

Spearheaded a predictive maintenance system with Random Forest models and AWS SageMaker, improving on-time maintenance execution by 25% and reducing downtime by 20%.

Tech Stack: Python, SQL, Tableau, Scikit-learn, AWS SageMaker, Pandas, Matplotlib

Data Analyst Ample technologies Jun 2018 – Jul 2020

Designed and implemented a modular solution to identify erratic test cases within the CI/CD pipeline, improving testing reliability by 20%.

Developed advanced Python scripts to analyze test case execution history, leading to a 20% increase in testing accuracy and reliability.

Created informative reports and dashboards in Tableau, providing actionable insights to project teams and stakeholders.

Developed machine learning models to analyze test case behavior and predict intermittency, leading to more efficient CI/CD pipelines.

Led data cleaning and feature scaling efforts, ensuring data integrity and accuracy in predictive modeling.

Optimized database performance, resulting in a 15% reduction in query execution time and improved data accessibility.

PERSONAL PROJECTS

Chronic Kidney Disease Prediction

Developed a predictive model to assist in diagnosing chronic kidney disease using supervised learning algorithms, including Logistic Regression, Random Forest, and SVM.

Carried out extensive data preprocessing with Pandas to handle missing values, normalize data, and encode categorical variables.

Optimized the model's performance using feature engineering, hyperparameter tuning with GridSearchCV, and cross-validation to achieve an accuracy improvement of 25%.

Combined the top-performing models into an ensemble using a single-neuron perceptron, further boosting prediction efficiency by 20%.

Visualized insights into the dataset and feature importance using Matplotlib and Seaborn.

Environment: Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Jupyter Notebook

IPL Score and Performance Analysis Dashboard

Launched an interactive Tableau dashboard of IPL match statistics, integrating data points that supported more informed discussions in team strategy meetings and performance reviews.

Aggregated and cleaned data using SQL and Python, ensuring accuracy and consistency for effective visualization.

Generated advanced filters and interactive charts, allowing users to explore data by teams, players, venues, and match types.

Highlighted metrics like points tables, player contributions, and dismissal types, enabling stakeholders to identify key insights and patterns.

Environment: Tableau, Python, Pandas, SQL

Customer Churn Prediction

Engineered a machine learning model to predict customer churn for subscription-based services using Logistic Regression and Random Forest.

Executed comprehensive exploratory data analysis (EDA) to uncover trends in customer behaviors and their correlation with churn.

Enhanced prediction accuracy by implementing feature selection, data normalization, and hyperparameter tuning.

Created a user-friendly Tableau dashboard to visualize customer segments and risk scores, providing actionable insights for churn reduction strategies.

Environment: Python, Pandas, NumPy, Scikit-learn, Seaborn, Tableau
