
Data Analyst Machine Learning

Location:
Jersey City, NJ, 07306
Salary:
80000
Posted:
September 10, 2025


Resume:

Siddarth Reddy

+* (***) *** - **** ****************@*****.*** Folsom, CA https://www.linkedin.com/in/siddarth-reddy-45693637a/

PROFESSIONAL SUMMARY

Data Analyst with 3+ years of experience in data engineering, advanced analytics, and machine learning, with a proven ability to transform complex datasets into actionable insights that deliver measurable business impact. Skilled in Python, R, SQL, and advanced ETL pipeline design using tools such as Talend, Apache Spark, and Kafka, I have delivered scalable, real-time analytics solutions that optimize operations and improve forecasting accuracy. My experience spans developing interactive dashboards in Power BI and Tableau, building predictive models with Scikit-learn, and ensuring data quality through governance tools such as Collibra and Great Expectations. Having worked on large-scale projects at FedEx, Accenture, and Q Info Solutions, I thrive in fast-paced environments where innovation meets execution. Backed by a Master’s in Computer Science and multiple industry certifications, I am passionate about leveraging data to solve problems, enhance decision-making, and deliver measurable business value.

TECHNICAL SKILLS

Programming & Scripting: Python, R, SQL, Bash, DAX, Pandas

Data Engineering: Talend, Apache Spark, Apache Kafka, dbt, ETL Pipelines

Data Visualization: Power BI, Tableau, Matplotlib, KPI Dashboards

Machine Learning & AI: Scikit-learn, NLTK, Predictive Modeling, Clustering Algorithms

Cloud Platforms: Azure Data Lake, Google Cloud Platform, Snowflake

Data Governance & Quality: Collibra, Alation, Great Expectations

Tools & Collaboration: Git, Jira, Postman, Docker, GitHub CI/CD

Databases & APIs: SQL Server, Snowflake, SharePoint, API Integration

EXPERIENCE

FedEx May 2025 – Present

Data Analyst Memphis, TN

Developed and optimized complex SQL Server queries to merge tracking, delivery, and customer datasets, enabling faster generation of Power BI performance dashboards.

Automated ETL workflows in Talend to extract shipment data, transform it with Python Pandas for cleaning and deduplication, and load it into SQL Server for analytics.

Configured Apache Kafka pipelines to capture real-time package scan events and store them in Azure Data Lake for historical trend and anomaly analysis.

Built predictive delivery delay models using Scikit-learn on Apache Spark-processed datasets enriched with weather and customs clearance data (see the illustrative sketch after this role's bullets).

Created Tableau executive-level reports using R-based statistical analysis to identify seasonal shipping patterns and optimize staffing during peak demand.

Embedded Great Expectations into ETL processes to detect missing or inconsistent shipment data before loading into the central data warehouse.

Implemented Collibra and Alation for enterprise data governance and data cataloging, improving dataset discoverability and compliance management.

Designed real-time Power BI dashboards integrating SQL Server live queries and Kafka streams to monitor delivery performance by geography.

Conducted exploratory analysis in Jupyter Notebook using Python and R to identify delivery route inefficiencies and propose optimization strategies.

Processed billions of tracking logs in Apache Spark during high-volume shipping periods, ensuring timely reporting to operations teams.

Integrated external APIs via Postman to pull weather and customs data, enhancing the accuracy of delivery delay prediction models.

Managed sprint planning and progress tracking in Jira while using Git for collaborative version control across the analytics team.

Utilized Azure Data Lake to store and retrieve raw and curated shipment datasets, enabling scalable historical analysis and model training.

Partnered with operations managers to translate dashboard insights into route and capacity adjustments, improving on-time delivery rates.

Established end-to-end data quality monitoring using Collibra and Great Expectations to maintain accuracy, completeness, and compliance across all datasets.
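To illustrate the Pandas cleaning/deduplication and Scikit-learn delay-modeling bullets above, here is a minimal, hypothetical sketch. The file name, column names, and features (planned transit time, weather severity, customs hold time) are assumptions for illustration only, not FedEx's actual schema.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical shipment extract (e.g., produced by a Talend job); columns are illustrative.
shipments = pd.read_csv(
    "shipments_extract.csv",
    parse_dates=["ship_date", "promised_date", "delivery_date"],
)

# Cleaning: drop duplicate scan records and rows missing key fields.
shipments = shipments.drop_duplicates(subset=["tracking_id", "scan_event_ts"])
shipments = shipments.dropna(subset=["ship_date", "promised_date", "delivery_date"])

# Label each shipment as delayed if it arrived after its promised date.
shipments["delayed"] = (shipments["delivery_date"] > shipments["promised_date"]).astype(int)

# Example features assumed to be enriched upstream with weather and customs data.
features = ["planned_transit_days", "weather_severity", "customs_hold_hours", "package_weight"]
X_train, X_test, y_train, y_test = train_test_split(
    shipments[features], shipments["delayed"], test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

In practice the heavy feature preparation could run in Spark, with only the modeling step handled in Scikit-learn, as the bullets above describe.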

Accenture Jul 2021 – Jun 2023

SQL Developer India

Designed and implemented an end-to-end sales data pipeline using Talend, Azure Data Lake, and SQL Server, ensuring automated daily ingestion, cleaning, and integration of multi-source retail data for instant accessibility.

Developed advanced analytical scripts in Python integrated with Apache Spark to process and analyze 500M+ sales records, reducing big data processing time from hours to minutes (see the illustrative sketch after this role's bullets).

Created interactive Power BI dashboards connected to Snowflake and Azure Data Lake, enabling real-time sales, inventory, and promotion performance monitoring across multiple retail regions.

Built predictive sales forecasting models in Scikit-learn using cleaned and transformed datasets from dbt and SQL Server, achieving 90%+ accuracy in seasonal demand predictions.

Orchestrated large-scale data transformation workflows using dbt and Python, standardizing raw supplier and loyalty program data for advanced trend and customer segmentation analysis.

Leveraged R and Apache Spark to perform statistical analysis on nationwide sales trends, uncovering high-impact promotional strategies that boosted weekend sales by 25%.

Integrated weather API data using Postman-tested endpoints into Python analytics pipelines, enabling correlation analysis between climate conditions and regional purchasing patterns.

Containerized Python-based analytics environments with Docker and GitHub CI/CD pipelines, ensuring consistent deployment of machine learning and visualization solutions across teams.

Automated data quality checks and database backup processes using Bash scripts and SQL Server stored procedures, maintaining 99.9% data availability and integrity.

Designed and managed a Snowflake-based retail data warehouse, combining in-store, online, and third-party datasets for unified executive reporting in Power BI.

Collaborated cross-functionally via Jira to track project milestones, manage data engineering tasks, and coordinate machine learning model deployments, ensuring on-time delivery of analytics solutions.

Applied Apache Spark with Scikit-learn to train and deploy large-scale predictive models for inventory optimization, reducing overstock by 18% and minimizing stockouts during peak demand periods.
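As a rough illustration of the Spark-based processing of 500M+ sales records described above, the following PySpark sketch aggregates daily revenue by region and store. The storage paths, table layout, and column names are illustrative assumptions, not Accenture's actual pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("retail-sales-analysis").getOrCreate()

# Read curated sales records (e.g., landed in Azure Data Lake as Parquet); path is hypothetical.
sales = spark.read.parquet("abfss://curated@datalake.dfs.core.windows.net/sales/")

# Aggregate daily revenue and order counts by region and store for dashboarding.
daily_sales = (
    sales
    .filter(F.col("order_status") == "COMPLETED")
    .groupBy("region", "store_id", F.to_date("order_ts").alias("order_date"))
    .agg(
        F.sum("net_amount").alias("daily_revenue"),
        F.countDistinct("order_id").alias("order_count"),
    )
)

# Persist the aggregate for downstream Power BI reporting and forecasting models.
daily_sales.write.mode("overwrite").parquet("abfss://curated@datalake.dfs.core.windows.net/agg/daily_sales/")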

HCL Dec 2020 – May 2021

Data Analyst Intern India

Developed an automated Power BI dashboard connected directly to SharePoint, enabling real-time branch performance tracking and eliminating the need for manual file consolidation.

Integrated Power Apps with SharePoint and SQL databases to streamline daily branch data submissions, reducing manual entry effort by 50% and improving data accuracy.

Designed interactive Power BI visualizations with KPI tracking, pulling live data via SQL queries to provide executives with instant insights into customer volumes and service performance.

Built Python-based ETL scripts to clean, transform, and prepare large datasets from SharePoint and SQL sources for Power BI dashboards, ensuring high data quality and consistency.

Applied Python clustering algorithms on transaction data hosted in Google Cloud Platform to segment customers, enabling targeted marketing strategies for different customer groups (see the illustrative sketch after this role's bullets).

Deployed Power BI dashboards and machine learning models on Google Cloud App Engine, ensuring secure, cloud-based access for both on-site and remote bank management teams.

Automated weekly and monthly performance reporting by integrating Power BI scheduled refresh with GCP-hosted services, delivering PDF and interactive reports directly to stakeholders without manual intervention.

Collaborated with stakeholders to design KPI-driven dashboards that combined Power BI visualizations, SQL data extraction, and machine learning insights, improving decision-making speed by 40%.
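A minimal sketch of the customer-segmentation step mentioned above, assuming K-Means clustering on a few per-customer transaction features; the input file and column names are hypothetical.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Transaction-level extract pulled from GCP-hosted storage or SQL; file and columns are illustrative.
txns = pd.read_csv("customer_transactions.csv")

# Aggregate simple per-customer behaviour features.
profile = txns.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    txn_count=("amount", "count"),
    avg_ticket=("amount", "mean"),
).reset_index()

# Scale the features and fit K-Means with an assumed four segments.
X = StandardScaler().fit_transform(profile[["total_spend", "txn_count", "avg_ticket"]])
profile["segment"] = KMeans(n_clusters=4, random_state=42, n_init=10).fit_predict(X)

# Inspect the average behaviour of each segment to guide targeted marketing.
print(profile.groupby("segment")[["total_spend", "txn_count", "avg_ticket"]].mean())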

ACHIEVEMENTS

Reduced average delivery delays by 18% through the implementation of real-time Power BI dashboards, predictive delay models in Scikit-learn, and optimized ETL pipelines using Talend and Python, enabling faster operational decision-making across FedEx’s global operations.

Delivered an automated retail analytics platform that improved sales forecast accuracy to 90%+ and reduced overstock by 18%.

ACADEMIC PROJECT

Project Title: Sentiment Analysis in Customer Reviews Using Machine Learning (Tech Stack: Python, Scikit-learn, Pandas, NLTK, Jupyter Notebook, Matplotlib)

Project Description:

Built a machine learning model to analyze sentiment in Amazon customer reviews using both supervised and unsupervised learning, enabling businesses to gain insights for improving products and services.
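A minimal sketch of the supervised portion of this project, assuming labelled review text and a TF-IDF plus logistic-regression pipeline; the file and column names are illustrative.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical labelled dataset with columns: review_text, sentiment (pos/neg).
reviews = pd.read_csv("amazon_reviews.csv")

X_train, X_test, y_train, y_test = train_test_split(
    reviews["review_text"], reviews["sentiment"], test_size=0.2, random_state=42
)

# Vectorize the text with TF-IDF and train a linear classifier.
clf = make_pipeline(
    TfidfVectorizer(stop_words="english", max_features=20000),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))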

CERTIFICATIONS

Certified in Database Programming with SQL by Oracle Academy

Certified in Data Analytics by Deloitte Australia

Software Engineering Job Simulation offered by JP Morgan Chase & Co.

Certified in Python 101 for Data Science by CognitiveClass.ai

EDUCATION

Master of Science in Computer Science, Sacred Heart University

Bachelor of Technology in Computer Science, Osmania University


