
Data Analyst Machine Learning

Location:
Wichita, KS
Posted:
February 21, 2025


Suma Peddireddy

Data Analyst | Missouri, Kansas

+1-832-***-**** ****************@*****.*** https://www.linkedin.com/in/suma-peddireddy/

SUMMARY

Detail-oriented Data Analyst with 5+ years of experience in analyzing complex datasets, deriving actionable insights, and optimizing business processes. Proficient in leveraging data visualization tools, SQL, Python, and advanced analytical techniques to drive strategic decision-making. Seeking a challenging role in a dynamic organization where I can utilize my skills to solve data-driven problems and contribute to organizational success.

• Highly skilled in Advanced Excel, utilizing complex formulas, Pivot Tables, Macros, and data analysis functions to handle large datasets efficiently. Capable of automating repetitive tasks and enhancing decision-making.

• Experienced in writing and optimizing complex SQL queries for data extraction, transformation, and reporting. Adept at working with relational databases like MySQL, PostgreSQL, and SQL Server to ensure seamless data management.

• Proficient in Python and R, leveraging libraries such as pandas, NumPy, Matplotlib, Seaborn, and SciPy for statistical analysis and data visualization. Skilled in handling structured and unstructured data to derive meaningful insights.

• Adept at creating interactive and insightful dashboards using Tableau, Power BI, and Looker. Experienced in presenting complex data in a visually appealing manner, enabling stakeholders to make data-driven decisions.

• Hands-on experience with Hadoop, Apache Spark, and Kafka for processing and analyzing large-scale datasets. Familiar with distributed computing techniques to optimize performance in data-intensive environments.

• Expertise in working with cloud platforms such as AWS (S3, Redshift), Google BigQuery, and Azure Data Lake for scalable data storage, analysis, and management. Skilled in designing cloud-based data solutions for business intelligence.

• Proficient in building and deploying Machine Learning models using Scikit-learn, TensorFlow, and PyTorch. Experienced in feature engineering, model evaluation, and applying predictive analytics to business problems.

• Skilled in using Amazon SageMaker, Google AI Platform, and Azure Machine Learning Studio for building, training, and deploying machine learning models. Familiar with cloud-native solutions for real-time inference.

• Experienced in designing and maintaining ETL workflows using tools like Informatica, Alteryx, and Apache NiFi. Capable of automating data extraction, transformation, and loading processes for seamless data integration.

• Strong knowledge of Data Warehousing concepts, including schema design (Star Schema, Snowflake Schema) and data modeling. Hands-on experience with Snowflake for scalable and efficient analytics.

• Certified in Power BI, with experience in designing complex dashboards and integrating real-time data. Proficient in DAX and Power Query for creating advanced calculations, custom columns, and efficient data transformations.

• Well-versed in Statistical Analysis techniques, including Regression Analysis, Hypothesis Testing, A/B Testing, and Time Series Forecasting. Skilled in using Python and R to derive data-driven insights (a brief A/B-test sketch appears after this summary list).

• Experienced in Natural Language Processing (NLP) techniques such as text classification, sentiment analysis, and topic modeling. Proficient in using NLTK, spaCy, and Transformers to analyze and interpret textual data.

• Committed to maintaining data accuracy and reliability through data cleansing, validation, and transformation. Knowledgeable in regulatory frameworks such as GDPR and HIPAA to ensure data privacy and compliance.

• Skilled in automating repetitive tasks using Python scripts, improving efficiency and streamlining workflows. Experienced in writing scripts for data extraction, API integration, and workflow automation.

• Dedicated to staying updated on the latest advancements in Data Analytics, Data Engineering, and Machine Learning. Passionate about improving technical skills and implementing best practices in data science.

• Experienced in Data Governance and Metadata Management, ensuring proper documentation, cataloging, and lineage tracking for enterprise-wide data assets. Skilled in using Alation, Collibra, and Informatica Data Governance.

• Knowledgeable in ABAC (Attribute-Based Access Control) and RBAC (Role-Based Access Control) for securing sensitive data. Experienced in implementing AWS IAM Policies, Azure RBAC, and Google Cloud IAM to enforce data security and compliance.

• Demonstrated strong problem-solving skills by developing and optimizing predictive models for healthcare and retail analytics, improving operational efficiency and decision-making through advanced statistical analysis and machine learning techniques. Successfully managed multiple projects simultaneously while ensuring accuracy and timeliness of insights.
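
For illustration, a minimal sketch of the kind of A/B test named in the summary above, using SciPy on synthetic data; the metric, group sizes, and effect size are hypothetical stand-ins:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    # Hypothetical per-user conversion metrics for a control and a variant group
    control = rng.normal(loc=0.10, scale=0.02, size=500)
    variant = rng.normal(loc=0.11, scale=0.02, size=500)

    # Two-sample t-test: is the difference in group means statistically significant?
    t_stat, p_value = stats.ttest_ind(variant, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # small p suggests a real lift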

TECHNICAL SKILLS

Data Analysis Tools: Excel, Power BI, Tableau, Looker, Google Data Studio
Programming Languages: Python, R, SQL
Big Data Technologies: Apache Hadoop, Apache Spark, Apache Kafka, Apache Flink
Cloud Platforms: AWS, Google Cloud Platform, Microsoft Azure
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery
ETL Tools: Informatica, Alteryx, Apache Airflow, Talend
Machine Learning & Statistical Analysis: Scikit-learn, TensorFlow and Keras, Statsmodels, SciPy
Data Manipulation & Transformation: Pandas, NumPy, dplyr, SQL-based transformations
Version Control & Collaboration: Git, GitHub, JIRA, Confluence
Data Modeling & Database Management: SQL Server, MySQL and PostgreSQL, MongoDB
Data Governance & Quality: Informatica Data Quality, Collibra, Talend Data Quality
Data Security: Encryption, Data Masking, IAM (Identity and Access Management)

EDUCATION

Wichita State University

Master's, Business Analytics in Management, 2023 – 2024

EXPERIENCE

Cerner Corporation, Kansas, USA May 2024 - Present
Senior Data Analyst

Description: Cerner Corporation, a global healthcare leader, specializes in electronic health records (EHR) and data analytics. As a Senior Data Analyst, I used SQL, Python, and Tableau to analyze data, build predictive models, and provide insights, leveraging ETL, data visualization, and AWS for strategic decision-making.

Responsibilities:

• Developed predictive models for healthcare datasets, leveraging statistical modeling, machine learning, and deep learning to improve patient care, resource allocation, and disease progression analysis.

• Built and optimized machine learning algorithms for disease prediction, patient risk stratification, and hospital readmission analysis, using tools like Scikit-learn, TensorFlow, and Keras to enhance clinical decision support.

• Designed interactive dashboards using Power BI, Tableau, and Chartio to track key healthcare metrics, patient outcomes, cost efficiency, and hospital performance, enabling doctors, administrators, and policy-makers to make informed decisions.

• Engineered and managed ETL pipelines to integrate structured and unstructured healthcare data from diverse sources into AWS RDS, S3, and Snowflake, ensuring seamless data extraction, transformation, and loading.

• Processed and analyzed large-scale healthcare data using Hadoop, HDFS, MapReduce, Hive, Pig, and Spark within the AWS EMR environment, enhancing medical research, operational workflows, and real-time analytics for hospitals.

• Crafted and optimized complex SQL queries in AWS RDS to support clinical data analysis, patient record retrieval, and healthcare performance tracking, ensuring efficient query execution and accuracy.

• Excelled in organizational skills and attention to detail by designing and maintaining complex ETL workflows, automating data extraction, transformation, and reporting processes across cloud-based platforms (AWS, Snowflake, Google BigQuery). Ensured data integrity and compliance with industry regulations (HIPAA, GDPR) while handling large-scale datasets efficiently.

• Performed sentiment analysis and NLP research on patient feedback, medical literature, and clinical notes, using Python, PySpark, and Scikit-learn to extract meaningful insights for improving patient experience and healthcare service quality (see the sketch after this role's Environment line).

• Applied data manipulation techniques with Pandas, NumPy, Seaborn, Matplotlib, and R programming to conduct disease trend analysis, mortality rate forecasting, and treatment effectiveness studies, helping healthcare organizations make evidence-based decisions.

• Utilized Advanced Excel features, including Pivot Tables, Macros, VLOOKUP, INDEX-MATCH, Power Query, and DAX to perform data cleaning, trend analysis, financial forecasting, and healthcare cost optimization, enabling accurate reporting and data-driven decision-making.

Environment: Scikit-learn, TensorFlow, Keras, Pandas, NumPy, Seaborn, Matplotlib, R programming, Power BI, Tableau, Chartio, AWS RDS, AWS S3, Advanced Excel, Hadoop, HDFS, MapReduce, Hive, Pig, Spark, Google BigQuery, Informatica, Snowflake, Python, PySpark, Google Cloud Functions.
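
As an illustration of the sentiment-analysis work above, a minimal sketch using NLTK's VADER analyzer (NLTK appears in the skills list; the actual pipeline used Python, PySpark, and Scikit-learn). The feedback strings are hypothetical:

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    # Hypothetical patient-feedback snippets
    feedback = [
        "The nurses were attentive and discharge was quick.",
        "Waited three hours in the ER with no updates.",
    ]
    for text in feedback:
        scores = sia.polarity_scores(text)  # returns neg/neu/pos/compound scores
        print(f"{scores['compound']:+.2f}  {text}")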

Associated Wholesale Grocers, Kansas, USA Nov 2023 – April 2024
Data Analyst

Description: Associated Wholesale Grocers (AWG), the largest U.S. cooperative food wholesaler, supplies independent supermarkets. As a Data Analyst, I used SQL, Python, and BigQuery to optimize supply chain efficiency and demand forecasting. Leveraging Power BI and Tableau, I built interactive dashboards for real-time insights into retail performance and customer trends.

Responsibilities:

• Applied advanced Excel techniques, including Pivot Tables, Macros, and VBA, to automate data analysis, streamline reporting, and enhance decision-making efficiency for sales, inventory, and financial data.

• Integrated and optimized data pipelines by extracting and transforming data from SQL Server DB, BigQuery, and Azure SQL DB, enabling seamless data consolidation and improving reporting accuracy in Power BI and Looker dashboards.

• Leveraged Azure Data Lake and Azure SQL Database for efficient storage, retrieval, and processing of large datasets, ensuring scalable and high-performance data management across the organization.

• Developed and deployed data analysis models using Python and R, utilizing Pandas, NumPy, Matplotlib, Seaborn, and SciPy to perform in-depth statistical analysis, trend identification, and forecasting.

• Applied Scikit-learn and NLTK to build predictive models and natural language processing (NLP) solutions, improving customer sentiment analysis, demand forecasting, and anomaly detection.

• Created interactive dashboards and insightful visualizations using Power BI, Tableau, and Looker, providing real-time business intelligence on product performance, operational efficiency, and market trends.

• Managed large-scale datasets and real-time data streaming using Hadoop, Apache Spark, and Kafka, optimizing data ingestion, processing, and analytics for enhanced decision-making.

• Built and automated ETL pipelines using Informatica and Alteryx, ensuring seamless data transformation and integration from multiple sources into cloud-based storage solutions like Snowflake for efficient querying and analytics.

• Utilized Snowflake for scalable and high-performance querying of structured and unstructured data, enabling fast insights generation and improved analytics capabilities for strategic business initiatives.

• Designed and optimized complex SQL queries for data extraction, transformation, and reporting, improving query performance and reducing processing time for large datasets in SQL Server DB, BigQuery, and Azure SQL DB.

• Developed and automated forecasting models using Python, R, and Scikit-learn, enabling accurate sales trend predictions, inventory optimization, and demand planning for improved supply chain efficiency (see the forecasting sketch after this role's Environment line).

• Streamlined data workflows by integrating Kafka for real-time event-driven data streaming, ensuring seamless data processing between various systems and enhancing operational responsiveness.

• Implemented advanced Power BI and Tableau functionalities, including DAX and Power Query, to create dynamic, self-service reports and interactive dashboards, enabling business users to derive insights with minimal technical dependency.

Environment: Excel, Pivot Tables, Macros, VBA, SQL Server DB, BigQuery, Azure SQL DB, Power BI, Azure Data Lake, Azure SQL Database, Python, R, Pandas, NumPy, Matplotlib, Seaborn, SciPy, Scikit-learn, NLTK, Tableau, Looker, Hadoop, Apache Spark, Kafka, Informatica, Alteryx, Snowflake.
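
A minimal sketch of the forecasting approach referenced above, assuming a simple lag-feature regression in Scikit-learn; the weekly sales figures are invented stand-ins for warehouse data:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical weekly unit sales for one product
    sales = np.array([120, 135, 128, 150, 162, 158, 175, 190, 185, 210], dtype=float)

    # Lag features: predict each week's sales from the previous two weeks
    X = np.column_stack([sales[:-2], sales[1:-1]])
    y = sales[2:]

    model = LinearRegression().fit(X, y)
    forecast = model.predict([[sales[-2], sales[-1]]])
    print(f"Next-week forecast: {forecast[0]:.0f} units")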

Marathon Petroleum Company (Accenture), Hyderabad, India Aug 2021 - July 2023
Data Analyst

Description: Marathon Petroleum Corporation is a leading integrated downstream energy company specializing in refining, marketing, and transportation of petroleum products. As a Data Analyst, I utilized SQL, Python, and R to analyze large datasets, build predictive models, and create actionable insights. I leveraged tools like Tableau, Power BI, and Hadoop for data visualization, reporting, and real-time data processing, optimizing decision-making and operational efficiency.

Responsibilities:

• Built ad-hoc and standard dashboards in Power BI and Tableau, providing real-time insights into claim trends, customer demographics, and outcomes. These dashboards helped stakeholders monitor key metrics.

• Conducted in-depth data exploration and developed interactive visualizations using Python and Apache Spark, analyzing customer and claims data to uncover hidden patterns. These insights enabled risk assessment and fraud detection.

• Scaled machine learning algorithms in Scikit-learn with Apache Spark to process large-scale datasets, improving predictive accuracy for claims approvals and fraud detection. Leveraging distributed computing substantially reduced model execution times.

• Predicted customer behavior and outcomes by splitting data into training and test datasets, applying cross-validation in Scikit-learn to enhance model generalization. These predictive insights supported personalized recommendations (see the cross-validation sketch after this role's Environment line).

• Created insightful visualizations, including histograms, time-series charts, scatterplots, and correlation tables, using Matplotlib in Python to analyze trends. These visualizations helped in tracking outcomes and identifying patterns.

• Worked with data warehouse technologies like Google BigQuery, Amazon Redshift, and Snowflake, managing and querying large datasets for efficient reporting. These platforms provided scalable storage solutions for structured and unstructured data.

• Processed and analyzed large datasets, including customer demographics and claims history, using NumPy, SciPy, Pandas, PySQL, and PySpark. This enabled efficient data cleansing, transformation, and feature engineering for advanced analytics.

• Developed and monitored Key Performance Indicators (KPIs) for metrics such as claims processing times and customer satisfaction using Qlik Sense and Excel (Pivot Tables, VLOOKUP, and Macros). These KPIs were used to track performance against business targets.

• Built statistical models in R, leveraging regression, probability analysis, and feature selection to identify key factors influencing claims approval or denial. These models provided actionable insights for decision-makers.

• Implemented scalable data processing solutions using Hadoop, HBase, Hive, and Sqoop, ensuring efficient ingestion, storage, and retrieval of extensive datasets. This architecture facilitated batch processing of historical data.

• Streamlined the processing of claims and customer information by loading data into HBase tables using Java and MapReduce, improving data accessibility and computational efficiency. These optimizations accelerated processing times.

Environment: Power BI, Tableau, Python, Apache Spark, Scikit-learn, Matplotlib, Google BigQuery, Redshift, Snowflake, NumPy, SciPy, Pandas, PySQL, PySpark, Qlik Sense, Excel (Pivot Tables, VLOOKUP, Macros), R, Hadoop, HBase, Hive, Sqoop, Java, MapReduce.
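
A minimal sketch of the train/test split and cross-validation workflow described above, using Scikit-learn with synthetic data standing in for the actual customer and claims records:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, train_test_split

    # Synthetic stand-in for customer/claims features and an approve/deny label
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = RandomForestClassifier(random_state=0)
    # 5-fold cross-validation on the training split to gauge generalization
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

    model.fit(X_train, y_train)
    print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")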

Gap Inc, Bangalore, India Aug 2019 - July 2021
Programmer Analyst

Description: Gap Inc. is a leading global retailer offering clothing, accessories, and personal care products. As a Programmer Analyst, I worked with Java, SQL, and Spring Framework to design, develop, and maintain software applications. I utilized Oracle and MySQL databases for data management, integrated RESTful APIs, and optimized performance through Agile methodologies and version control tools like Git.

Responsibilities:

• Designed and implemented ETL pipelines for data extraction, transformation, and loading from multiple data sources into AWS RDS and S3, ensuring data consistency and availability for analysis.

• Developed and implemented predictive models for user behavior data on websites, including URL categorization, social network analysis, and search content using machine learning techniques.

• Defined, executed, and interpreted complex SQL queries in AWS RDS, involving subqueries, joins, aggregations, and window functions, to track and analyze product sales in AWS EC2 (see the query sketch after this role's Environment line).

• Led the development of Natural Language Processing (NLP) initiatives, including chatbots and virtual assistants for customer support. Managed data processing using Hadoop, HDFS, MapReduce, Hive, Pig, and Spark in AWS EMR.

• Wrote Python scripts with Apache Spark and Elastic Search to create real-time dashboards visualized in Grafana.

• Stored and retrieved large datasets in AWS S3 for the company's personalized marketing website.

• Automated routine data processing tasks using Python and AWS Lambda, optimizing workflow efficiency.

• Defined business questions and created databases based on schema of key features and metrics using SQL in AWS RDS.

• Developed data analysis prototypes using Power BI and Power Pivot, and visualized reports with Power View and Power Map. Explored and visualized data using Power BI and Tableau, providing insights across various dimensions.

Environment: AWS RDS, AWS S3, AWS EC2, Hadoop, HDFS, MapReduce, Hive, Pig, Spark, Apache Spark, Elastic Search, Grafana, Python, AWS Lambda, SQL, Power BI, Power Pivot, Power View, Power Map, Tableau.
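
An illustrative sketch of the query patterns named above (aggregation plus a window function), run here against an in-memory SQLite table; the table, columns, and rows are hypothetical, and the production queries ran on AWS RDS:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sales (product TEXT, region TEXT, amount REAL);
        INSERT INTO sales VALUES
            ('jeans', 'east', 120), ('jeans', 'west', 90),
            ('tees',  'east', 60),  ('tees',  'west', 75);
    """)

    # Rank products by total sales; RANK() needs SQLite 3.25+ (window functions)
    query = """
        SELECT product,
               SUM(amount) AS total,
               RANK() OVER (ORDER BY SUM(amount) DESC) AS sales_rank
        FROM sales
        GROUP BY product
    """
    for product, total, rank in con.execute(query):
        print(rank, product, total)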


