David Pruthvi Raj Chepuri
Data Analyst
201-***-**** **********************@*****.***
EDUCATION
Master’s in Business Analytics, The University of Scranton, Pennsylvania, USA.

PROFILE OVERVIEW
4+ years of industry experience in big data analysis, data mining, statistical inference, A/B testing, machine learning, data visualization, and ETL data pipelines.
Proficient in complex formulas, Pivot Tables, VLOOKUP, Power Query, and Macros for efficient data manipulation. Automated workflows using Excel VBA to streamline reporting and reduce manual effort.
Skilled in writing complex SQL queries, optimizing database performance, and handling MySQL, PostgreSQL, SQL Server, and Snowflake. Developed efficient indexing strategies to enhance query performance and data retrieval.
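For illustration, a minimal sketch of the indexing-plus-query pattern described above, using SQLite from the Python standard library; the orders table and column names are hypothetical, and the same idea carries over to MySQL, PostgreSQL, or SQL Server:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "total REAL, placed_at TEXT)"
    )
    # Composite index covering the date filter and customer grouping below.
    conn.execute("CREATE INDEX idx_orders_date_cust ON orders (placed_at, customer_id)")
    rows = conn.execute(
        """
        SELECT customer_id, SUM(total) AS revenue
        FROM orders
        WHERE placed_at >= '2024-01-01'
        GROUP BY customer_id
        ORDER BY revenue DESC
        LIMIT 10
        """
    ).fetchall()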
Experienced in Python and R, leveraging libraries like pandas, NumPy, Matplotlib, Seaborn, and SciPy for statistical analysis. Built data models and visualizations to uncover insights and drive business decisions.
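As a concrete example of this kind of statistical inference (and of the A/B testing noted above), a self-contained Welch's t-test on synthetic data; the effect size and sample sizes are illustrative only:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    control = rng.normal(loc=10.0, scale=2.0, size=500)    # variant A, e.g. basket value
    treatment = rng.normal(loc=10.4, scale=2.0, size=500)  # variant B

    # Welch's t-test: does variant B shift the mean?
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")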
Hands-on expertise in Tableau, Power BI, and Looker, creating interactive dashboards and dynamic reports. Designed data-driven visualizations to present actionable insights for business stakeholders.
Familiar with Hadoop, Apache Spark, and Kafka for large-scale data processing and real-time streaming analytics. Utilized Spark SQL and PySpark for distributed data processing across clusters.
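A minimal PySpark sketch of the distributed aggregation pattern described above; it assumes a local pyspark installation, and the store-level sales rows are synthetic:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sales-agg").getOrCreate()
    df = spark.createDataFrame(
        [("store_1", 120.0), ("store_1", 80.0), ("store_2", 200.0)],
        ["store", "amount"],
    )
    # The aggregation runs in parallel across the DataFrame's partitions.
    df.groupBy("store").agg(F.sum("amount").alias("revenue")).show()
    spark.stop()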
Experienced with AWS (S3, Redshift), Google BigQuery, and Azure Data Lake for scalable cloud data storage. Implemented cost-effective data retrieval and storage strategies for optimal performance.
Proficient in ETL tools like Informatica and Alteryx, along with custom ETL pipelines for structured data integration. Automated data ingestion from multiple sources to improve data accuracy and availability.
Expertise in ARIMA, Prophet, and LSTM models for trend analysis and demand forecasting. Developed time-series models to predict business metrics and optimize resource allocation.
Skilled in Snowflake for cloud-based data warehousing, query optimization, and secure data sharing. Designed schema structures to maximize performance and reduce storage costs.
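A hedged sketch of the ARIMA-style forecasting mentioned above, using statsmodels on a synthetic monthly series; the (1, 1, 1) order is illustrative, not a tuned model:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    idx = pd.date_range("2021-01-01", periods=36, freq="MS")
    series = pd.Series(
        100 + 2.0 * np.arange(36) + np.random.default_rng(0).normal(0, 3, 36),
        index=idx,
    )
    fit = ARIMA(series, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=6))  # forecast the next six months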
Experience in text analytics using NLTK, SpaCy, and Hugging Face Transformers for sentiment analysis and chatbot development. Built NLP pipelines to extract insights from unstructured text data.
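For example, a minimal sentiment-analysis sketch using the Hugging Face pipeline API; it downloads a default English sentiment model on first run, and the review texts are invented:

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    print(classifier([
        "The delivery was fast and the product works great.",
        "Support never answered my ticket.",
    ]))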
Implemented Apache Kafka for real-time data streaming, ensuring low-latency event processing. Developed streaming applications using Kafka Streams and Apache Flink for anomaly detection.
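A minimal sketch of that produce/consume pattern with the kafka-python client; it assumes a broker at localhost:9092 and an illustrative events topic:

    import json
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("events", {"user_id": 42, "action": "checkout"})
    producer.flush()

    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=10000,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)  # handle each event as it arrives
        break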
Experience in deploying models using AWS SageMaker, GCP Vertex AI, and Azure Machine Learning Studio.
Ensured data quality, integrity, and compliance using AWS IAM, Azure Purview, and GCP Data Catalog.
Automated ML model deployment and ETL workflows using Docker, Kubernetes, and Apache Airflow.
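As a sketch of the Airflow side of that automation (Airflow 2.x API; the task bodies are placeholders standing in for real extract/load logic):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull from source systems")

    def load():
        print("write to the warehouse")

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task  # load runs only after extract succeeds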
Experienced in Agile and Scrum methodologies, using JIRA and Confluence for project tracking and collaboration. Actively participated in sprint planning, retrospectives, and daily stand-ups.

TECHNICAL SKILLS
Data Analysis Tools: Excel, Power BI, Tableau, Looker, Google Data Studio
Programming Languages: Python, R, SQL
Big Data Technologies: Apache Hadoop, Apache Spark, Apache Kafka, Apache Flink
Cloud Platforms: AWS, Google Cloud Platform, Microsoft Azure
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery
ETL Tools: Informatica, Alteryx, Apache Airflow, Talend
Machine Learning & Statistical Analysis: scikit-learn, TensorFlow, Keras, Statsmodels, SciPy
Data Manipulation & Transformation: pandas, NumPy, dplyr, SQL-based transformations
Version Control & Collaboration: Git, GitHub, JIRA, Confluence
Data Modeling & Database Management: SQL Server, MySQL, PostgreSQL, MongoDB
Data Governance & Quality: Informatica Data Quality, Collibra, Talend Data Quality
Data Security: Encryption, Data Masking, IAM (Identity and Access Management)

WORK EXPERIENCE
Client: Weis Markets, Sunbury, Pennsylvania, USA
Role: Senior Data Analyst Feb 2025 - Present
Description: Weis Markets, Inc. is a regional supermarket chain operating stores across the Mid-Atlantic region, offering groceries and household products. As a Senior Data Analyst, I employed SQL, Python, and data visualization tools such as Power BI and Tableau to analyze large datasets, identify consumer trends, and drive business decisions, applying advanced techniques such as predictive modeling, machine learning, and A/B testing.
Responsibilities:
Utilized SQL to query and retrieve data from various databases, ensuring accurate and timely data extraction from retail systems for analytics and reporting. Optimized SQL queries for improved database performance and data integrity.
Performed exploratory data analysis (EDA) using Python (Matplotlib, Seaborn) to identify trends, patterns, and outliers in retail data. Generated visual insights to support data-driven decision-making.
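Illustrative of that EDA workflow, a short sketch on synthetic weekly-sales data using a simple IQR outlier rule; the column name and threshold are hypothetical:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"weekly_sales": rng.gamma(shape=5, scale=200, size=500)})

    # Flag values beyond 1.5 * IQR above the third quartile.
    q1, q3 = df["weekly_sales"].quantile([0.25, 0.75])
    outliers = df[df["weekly_sales"] > q3 + 1.5 * (q3 - q1)]
    print(f"{len(outliers)} potential outliers")

    sns.histplot(df["weekly_sales"], kde=True)
    plt.savefig("weekly_sales_dist.png")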
Created interactive dashboards and visual reports using Power BI, Tableau, and Looker to present key retail insights. Ensured stakeholder engagement with dynamic filtering and drill-down capabilities.
Applied machine learning algorithms using Python (scikit-learn) for sales trend prediction, demand forecasting, and inventory management. Improved retail operations by optimizing stock levels and reducing waste.
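A hedged sketch of that forecasting approach: lag features feeding a scikit-learn regressor on synthetic demand data (the lags and model settings are illustrative, not a production configuration):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    demand = pd.Series(50 + 10 * np.sin(np.arange(200) / 7) + rng.normal(0, 2, 200))

    # Yesterday's and last week's demand as predictors for today's.
    X = pd.DataFrame({"lag_1": demand.shift(1), "lag_7": demand.shift(7)}).dropna()
    y = demand.loc[X.index]

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X.iloc[:-28], y.iloc[:-28])  # train on all but the last four weeks
    preds = model.predict(X.iloc[-28:])    # predict the held-out tail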
Worked with Snowflake for cloud data warehousing, ensuring efficient data integration and high-performance analytics. Implemented role-based security and query optimization for large-scale retail data.
Integrated Apache Kafka for real-time data streaming and event-driven architectures to analyze and respond to customer activities in real-time. Enabled real-time fraud detection and personalized recommendations.
Utilized Apache Spark and Hadoop for processing large volumes of retail transaction data, enabling faster and more efficient analytics. Scaled distributed computing for big data processing in cloud environments.
Managed retail data pipelines in Azure Data Lake and Azure Synapse, ensuring efficient data storage, integration, and analysis in the cloud. Implemented data partitioning and indexing for optimal performance.
Deployed machine learning models on Azure Machine Learning to operationalize insights and enhance decision-making in retail operations. Automated model retraining and deployment for continuous improvement.
Environment: SQL, Python (pandas, NumPy), Matplotlib, Seaborn, Power BI, Tableau, Looker, scikit-learn, Informatica, Alteryx, Snowflake, Apache Kafka, Apache Spark, Hadoop, Azure Data Lake, Azure Synapse, SciPy, Azure Machine Learning.

Client: Comcast Corporation, Philadelphia, Pennsylvania, USA
Role: Senior Data Analyst Jun 2024 – Jan 2025
Description: Comcast Corporation is a global media and technology company offering broadband, cable, streaming, and entertainment services. I worked as a Senior Data Analyst, utilizing SQL and Python (pandas, NumPy) for data analysis and transformation to support business decisions.
Responsibilities:
Utilized advanced Excel for complex data analysis, manipulating large datasets and generating financial and operational reports for key metrics. Ensured data accuracy using Pivot Tables, VLOOKUP, Power Query, and Macros.
Wrote and optimized SQL queries to extract, analyze, and validate data from MySQL, PostgreSQL, SQL Server, and Snowflake. Improved query performance and database efficiency for reporting and decision-making.
Used Tableau, Power BI, and Looker to design and develop interactive dashboards, enabling real-time insights into customer behavior, claims data, and risk management. Enhanced reporting with dynamic filtering.
Utilized Apache Spark and Hadoop to process large volumes of unstructured data, enhancing data integration, reporting, and analytical capabilities. Leveraged PySpark for distributed data processing and transformation.
Managed data storage and processing with AWS S3 and AWS Redshift, ensuring scalability and high-performance analytics in the cloud. Optimized storage costs and query performance using partitioning and compression.
Built and maintained ETL pipelines using Informatica, Alteryx, and custom-built pipelines to automate data extraction, transformation, and loading. Streamlined data workflows for seamless integration across multiple sources.
Developed and implemented machine learning models using Python (scikit-learn, TensorFlow) to predict claims, assess risk, and detect fraud. Applied feature engineering and model tuning for accuracy.
Environment: Excel (advanced level), SQL, Python (pandas, NumPy), Tableau, Power BI, Looker, Apache Kafka, Apache Spark, Hadoop, AWS S3, AWS Redshift, Informatica, Alteryx, scikit-learn, SciPy, Matplotlib, Seaborn, Snowflake.

Client: Amazon, Bangalore, India
Role: Data Analyst Aug 2022 – Jul 2023
Description: Amazon is a global technology and e-commerce company providing online retail, cloud computing, digital streaming, and artificial intelligence services. I served as a Data Analyst, utilizing SQL and R to analyze operational data for development and market strategies, and created interactive dashboards and reports using Power BI and Tableau to monitor key performance indicators (KPIs) for trials and sales performance.
Responsibilities:
Utilized SQL to query, extract, and manipulate data from EHR databases and systems, ensuring accurate analysis and reporting. Leveraged complex joins, window functions, and indexing to optimize queries.
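For instance, a runnable sketch of the window-function pattern, using SQLite (3.25+) from the Python standard library; the visits table is a hypothetical stand-in for EHR data:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (patient_id INT, visit_date TEXT, cost REAL)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?)",
        [(1, "2023-01-05", 120.0), (1, "2023-02-10", 90.0), (2, "2023-01-20", 300.0)],
    )
    rows = conn.execute(
        """
        SELECT patient_id, visit_date, cost,
               ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY visit_date) AS visit_seq,
               SUM(cost) OVER (PARTITION BY patient_id) AS total_cost
        FROM visits
        """
    ).fetchall()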
Created and maintained interactive dashboards using Tableau, Power BI, and Looker to visualize key healthcare metrics such as outcomes, readmission rates, and operational efficiency. Implemented dynamic filters and drill-down capabilities.
Designed and implemented ETL processes using Informatica, Alteryx, and custom-built pipelines to centralize data from multiple sources.
Performed statistical analysis and predictive modeling with R and Python (scikit-learn, TensorFlow), applying regression, clustering, and classification techniques to uncover valuable insights.
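As one concrete instance of those techniques, a small k-means clustering sketch; the four features are synthetic stand-ins for utilization metrics:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    X = rng.normal(size=(300, 4))  # e.g. visits, length of stay, age, cost

    # Standardize first so no single feature dominates the distance metric.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
        StandardScaler().fit_transform(X)
    )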
Worked with platforms such as Epic, Cerner, and other EHR systems to extract and analyze patient data while ensuring compliance with HIPAA and other regulatory standards. Collaborated with medical professionals.
Optimized data storage and performance on AWS, ensuring scalable and cost-effective management of large datasets. Implemented partitioning, indexing, and caching techniques to enhance query performance.
Ensured data quality and integrity by implementing automated data validation and conducting regular audits for industry-compliant reporting. Utilized data profiling, anomaly detection, and reconciliation processes.
Environment: SQL, Python (pandas, NumPy), Tableau, Power BI, Looker, Informatica, scikit-learn, R, Epic, Cerner, Azure.

Client: Rabobank, Bangalore, India
Role: Programmer Analyst Jun 2020 – Jul 2022
Description: Rabobank is a leading international bank. I worked as a Programmer Analyst, leveraging Java and SQL to develop and optimize banking applications and backend systems, and designed and implemented data integration solutions using ETL tools, ensuring seamless data flow between core banking platforms and external systems.
Responsibilities:
Designed and implemented ETL processes using Informatica and custom scripts to extract, transform, and load data from multiple banking systems into centralized data warehouses for enhanced reporting and analytics.
Collaborated with business analysts and stakeholders to gather banking requirements and convert them into technical solutions, leveraging Agile methodologies and SQL for iterative development and process optimization.
Performed database design, indexing, and query optimization using SQL and Oracle to ensure high-performance data storage and efficient retrieval for core banking transactions and financial reporting.
Integrated third-party applications, payment gateways, and financial services with core banking platforms using REST APIs, SOAP, and Web Services, ensuring secure and seamless data flow between systems.
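Sketched in Python for brevity (the production integrations were in Java), a hedged example of such a REST call; the gateway URL, payload, and token are hypothetical:

    import requests

    resp = requests.post(
        "https://api.example-gateway.com/v1/payments",  # hypothetical endpoint
        json={"account": "NL00RABO0000000000", "amount": 25.00, "currency": "EUR"},
        headers={"Authorization": "Bearer <token>"},
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())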
Troubleshot and resolved complex technical issues in banking applications using Java and SQL, ensuring high availability, minimal downtime, and a smooth customer banking experience.
Environment: Java, SQL, Spring Framework, Informatica, Agile, Oracle, Python, REST APIs, SOAP, Web Services, SSL/TLS