RUQUIYA FATIMA
SENIOR DATA ANALYST
EMAIL: ***************@*****.*** PHONE: 773-***-****
LINKEDIN: linkedin.com/in/ruquiya-fatimaa
PROFESSIONAL SUMMARY:
9+ years of overall experience as a Data Analyst, leveraging a deep understanding of data analysis, machine learning, and ETL processes to drive actionable insights and optimize business decision-making.
Expertise in designing and implementing automated ETL pipelines using Python, R, and SQL to streamline data extraction, transformation, and loading processes, reducing manual effort and increasing efficiency.
Proficient in data wrangling, cleaning, and preparing large, complex datasets for analysis using tools such as Pandas, SQL, and SAS, ensuring data accuracy and integrity across all stages.
Skilled in developing and maintaining interactive dashboards and visualizations using Power BI, Tableau, and Excel, providing stakeholders with real-time insights to drive data-driven decisions.
Extensive experience in applying statistical analysis and machine learning models to uncover business trends and improve forecasting accuracy, leveraging tools such as Scikit-Learn and TensorFlow.
Strong proficiency in implementing predictive analytics models for business forecasting and customer segmentation using Azure Machine Learning and AWS (see the sketch following this summary).
Led initiatives to integrate and streamline data sources, providing unified data access for stakeholders through advanced database architectures and real-time reporting systems.
Proven ability to apply advanced analytics tools such as IBM SPSS and SAS Enterprise Miner to conduct in-depth business analysis, uncovering key patterns and insights to optimize operational efficiency.
Led large-scale data migrations, managing seamless transitions of complex datasets across on-premises and cloud-based environments, ensuring minimal downtime and disruption.
Experience in utilizing Hadoop, Spark, and AWS Redshift for big data analytics, handling petabytes of data and optimizing distributed data processing systems.
Expertise in time-series forecasting and trend analysis using R and Python, enabling strategic business planning and optimization of sales, operations, and financial forecasting models.
Hands-on leadership in implementing end-to-end machine learning pipelines, from data extraction and preprocessing to model deployment and evaluation, to automate business processes.
Successfully automated report generation and data processing workflows in Python and R, reducing manual interventions and improving data analysis timelines.
Proven ability to design and implement data governance frameworks, ensuring data quality, consistency, and compliance across large datasets and multiple platforms.
Expertise in SQL optimization and query performance tuning, using advanced techniques such as indexing, partitioning, and query refactoring to enhance performance in MySQL, PostgreSQL, and Oracle databases.
Deep understanding of cloud-based data storage and management, including experience with Azure Data Lake Storage, Azure SQL Database, and AWS S3, enabling secure and scalable data solutions.
Proficient in building high-performance business intelligence solutions using Power BI and Tableau, with a focus on optimizing dashboard performance, improving data retrieval speed, and enhancing user experience.
Applied advanced data analysis techniques such as regression modeling, clustering, and data profiling to support decision-making and process improvement across various business domains.
Built highly available real-time data ingestion pipelines using AWS Lambda, Azure Data Factory, and Hadoop, ensuring smooth integration between relational and non-relational data sources.
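A minimal sketch of the customer-segmentation modeling referenced above, using Pandas and Scikit-Learn from the skills list; the input file, feature columns, and choice of four segments are hypothetical assumptions, not details taken from this resume.

```python
# Customer-segmentation sketch with Pandas and Scikit-Learn.
# File name, column names, and k=4 are hypothetical placeholders.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load customer features (hypothetical input).
df = pd.read_csv("customers.csv")
features = df[["annual_spend", "visit_frequency", "tenure_months"]]

# Standardize so no single feature dominates the distance metric.
X = StandardScaler().fit_transform(features)

# Fit k-means with an assumed four segments and label each customer.
model = KMeans(n_clusters=4, n_init=10, random_state=42)
df["segment"] = model.fit_predict(X)

# Profile the segments for business review.
print(df.groupby("segment")[["annual_spend", "visit_frequency"]].mean())
```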
TECHNICAL SKILLS:
Programming Languages: Python, R, SQL, PL/SQL
Data Analysis & ETL: Pandas, NumPy, SAS, SAS Enterprise Miner, IBM SPSS, Power Query, Azure Data Factory
Databases: MySQL, PostgreSQL, Oracle, SQL Server, AWS RDS, Azure SQL Database
Big Data Tools: Hadoop, HDFS, MapReduce, Hive, Pig, Spark, AWS Redshift, Azure Synapse Analytics
Data Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly, Excel, Google Analytics
Machine Learning: Scikit-Learn, TensorFlow, Azure Machine Learning, MLlib, Mahout, caret, randomForest
Cloud Platforms: Azure, AWS
Statistical Analysis: R, SAS, IBM SPSS
Data Wrangling: Pandas, OpenPyXL, Jupyter Notebook, SAS, Power Query
Reporting & Automation: Tableau, Power BI, Excel, SAS Stored Processes, Google Analytics
Data Governance: Data Profiling, Anomaly Detection, Data Cleansing, Data Validation
PROFESSIONAL EXPERIENCE:
Client: Ameriprise Financial Services, LLC, Minneapolis, MN Feb 2023 – Present
Role: Sr. Data Analyst
Responsibilities:
Designed and implemented automated Python scripts for database queries and report generation, reducing manual effort and errors.
Developed and optimized ETL processes in R, integrating data from multiple sources to ensure clean and structured datasets.
Utilized MySQL and SQL to design, develop, and optimize complex queries for extracting and analyzing large datasets, ensuring data accuracy and integrity.
Automated recurring SQL workflows and scheduled jobs using database management tools and scripting languages, improving data pipeline efficiency and reducing manual interventions.
Built and maintained large SAS datasets by merging, sorting, and filtering data from multiple sources, ensuring consistency and accuracy in reporting.
Designed and implemented forecasting models in Tableau using trend lines, moving averages, and regression analysis to predict future business outcomes.
Implemented Azure Machine Learning (AML) models for predictive analytics by integrating Azure AI services, enabling automated customer segmentation, forecasting, and anomaly detection.
Conducted exploratory data analysis (EDA) using R to uncover hidden trends and correlations within large datasets.
Utilized Python for time-series forecasting and statistical modeling to analyze trends and patterns in financial, sales, and operational data (see the sketch below).
Developed interactive dashboards using Hadoop-powered data sources, providing business leaders with real-time analytics and insights.
Conducted data profiling and anomaly detection within Oracle Database, identifying data inconsistencies and implementing cleansing techniques to enhance data accuracy and reliability.
Created stored procedures and triggers in MySQL to automate business logic execution, reducing manual intervention and improving data processing efficiency.
Implemented data partitioning strategies to improve data retrieval efficiency in Hadoop-based data warehouses.
Conducted ad-hoc analysis and exploratory data analysis (EDA) using SAS to identify key insights and support business initiatives.
Automated data preprocessing and transformation tasks in IBM SPSS, cleaning raw datasets, handling missing values, and standardizing data formats to enhance analysis efficiency.
Designed Excel-based inventory tracking and supply chain optimization tools that improved stock management, minimized waste, and enhanced operational efficiency.
Developed machine learning models in Jupyter Notebook using Scikit-Learn and TensorFlow, supporting predictive analytics initiatives and improving forecasting accuracy.
Optimized Tableau workbooks by reducing query execution time, implementing indexing strategies, and improving dashboard performance through efficient data modeling techniques.
Configured event tracking and enhanced eCommerce tracking in Google Analytics to analyze shopping behavior, checkout performance, and sales funnels.
Integrated PostgreSQL with cloud-based data warehouses and business intelligence tools, enabling real-time reporting and cross-platform analytics for stakeholders.
Managed large-scale data migrations and database upgrades within Oracle environments, ensuring minimal downtime and seamless transition to newer database versions.
Developed and automated ETL workflows using Azure Data Factory (ADF) and Azure Databricks to support seamless data integration between relational databases, NoSQL databases, and cloud storage solutions.
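A minimal sketch of the Python time-series trend forecasting flagged in the list above, using Pandas, NumPy, and Scikit-Learn from this engagement's environment; the data file, column names, and six-period horizon are hypothetical assumptions.

```python
# Time-series trend-forecast sketch with Pandas and Scikit-Learn.
# "monthly_sales.csv" and its columns are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load and order the monthly revenue series.
sales = pd.read_csv("monthly_sales.csv", parse_dates=["month"])
sales = sales.sort_values("month").reset_index(drop=True)

# Encode time as an ordinal step and fit a simple linear trend.
X = np.arange(len(sales)).reshape(-1, 1)
y = sales["revenue"].to_numpy()
trend = LinearRegression().fit(X, y)

# Project the fitted trend six periods ahead.
future = np.arange(len(sales), len(sales) + 6).reshape(-1, 1)
print(trend.predict(future))
```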
Environment: Python, R, MySQL, SQL, SAS, Tableau, Azure Machine Learning (AML), Azure AI, Hadoop, Oracle Database, PostgreSQL, IBM SPSS, Excel, Jupyter Notebook, Scikit-Learn, TensorFlow, Google Analytics, Azure Data Factory (ADF), Azure Databricks.
Client: Caterpillar Inc., Irving, TX Oct 2019 – Jan 2023
Role: Sr. Data Analyst
Responsibilities:
Developed interactive dashboards and reports using Python-based visualization libraries such as Matplotlib, Seaborn, and Plotly.
Leveraged Python and R to perform data wrangling, cleaning, and transformation on large, complex datasets, applying machine learning algorithms, statistical tests, and visualizations to uncover trends and patterns that contributed to improved decision-making.
Conducted data mining and predictive modeling using SAS Enterprise Miner, uncovering patterns and relationships within large datasets.
Developed scalable machine learning models on Hadoop using MLlib and Mahout for predictive analytics.
Performed advanced regression and clustering techniques using R’s statistical modeling functions to identify data patterns.
Configured and maintained AWS RDS instances for relational database workloads, ensuring optimal performance through query optimization, indexing strategies, and automated backup configurations.
Conducted detailed MySQL data analysis, using advanced SQL functions such as windowing, ranking, and pivoting to generate insights that support data-driven business decisions.
Developed AWS Lambda functions in Python with embedded SQL to automate data processing, file ingestion from Amazon S3, and real-time data transformations for business intelligence applications (see the sketch below).
Utilized Excel’s data validation and conditional formatting features to ensure data integrity, highlighting anomalies and inconsistencies for proactive issue resolution.
Conducted data modeling in Power BI, creating star and snowflake schemas with fact and dimension tables to support high-performance analytical queries.
Developed predictive analytics models leveraging IBM SPSS, utilizing regression analysis, clustering, and decision trees to improve forecasting accuracy and optimize operational strategies.
Designed relational database schemas in Oracle to support analytical needs, ensuring data integrity, normalization, and scalability for long-term data storage and retrieval.
Conducted deep-dive analysis of website traffic and user behavior using Google Analytics reports, identifying optimization opportunities to improve conversion rates.
Developed and optimized Power Query transformations to clean, shape, and preprocess large datasets before loading them into Power BI models for efficient reporting.
Created and maintained SAS Stored Processes for real-time execution of analytical models and data-driven decision-making.
Worked with cross-functional teams to implement ETL pipelines in Hadoop, supporting real-time analytics and machine learning use cases.
Implemented indexing strategies, query optimization techniques, and performance tuning in MySQL to enhance query execution speed, reduce resource consumption, and support real-time data analysis.
Automated data cleaning and transformation processes in Jupyter Notebook using Pandas and OpenPyXL, improving data processing efficiency and accuracy.
Conducted performance tuning of Oracle SQL queries, indexes, and database structures, reducing query execution times and optimizing database resource utilization.
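A sketch of the S3-triggered AWS Lambda ingestion pattern flagged in the list above; it assumes the boto3 SDK (not named in this resume) and a hypothetical bucket layout with a processed/ output prefix.

```python
# AWS Lambda handler sketch: clean a CSV landing in S3 with Pandas
# and write the result back under a processed/ prefix (hypothetical).
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")

def handler(event, context):
    # S3 put-event records carry the source bucket and object key.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Read the new object and apply a basic cleaning pass.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_csv(io.BytesIO(body)).dropna().drop_duplicates()

    # Write the cleaned file back under a processed/ prefix.
    out = df.to_csv(index=False).encode("utf-8")
    s3.put_object(Bucket=bucket, Key=f"processed/{key}", Body=out)
    return {"rows": len(df)}
```

In practice the handler would be wired to an S3 ObjectCreated event notification; that configuration is omitted here.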
Environment: Python, Matplotlib, Seaborn, Plotly, R, SAS Enterprise Miner, Hadoop, MLlib, Mahout, R Statistical Modeling, AWS RDS, MySQL, SQL, Lambda, Amazon S3, Excel, Power BI, IBM SPSS, Oracle, Google Analytics, PostgreSQL, Power Query, SAS Stored Processes, ETL, Jupyter Notebook, Pandas.
Client: Max Healthcare, New Delhi, India Mar 2018 – Aug 2019
Role: Data Analyst
Responsibilities:
Developed and maintained ETL pipelines using MySQL, extracting data from various structured and unstructured sources, transforming it with SQL logic, and loading it into data warehouses for analytics and decision-making.
Developed and maintained Tableau metadata layers, calculated fields, and extract files to optimize performance and enhance the end-user experience.
Built data ingestion pipelines using Sqoop and Flume to extract, transform, and load data from relational databases and streaming sources into Hadoop.
Developed and maintained interactive Jupyter Notebooks for exploratory data analysis, leveraging Python libraries such as Pandas, NumPy, and Matplotlib to generate insights from complex datasets.
Integrated Tableau with multiple data sources, including SQL Server, Oracle, and cloud-based data warehouses, ensuring seamless data connectivity and real-time reporting capabilities.
Developed ETL processes using SAS to cleanse, validate, and transform raw data into structured formats for analytical and reporting purposes.
Wrote custom Python scripts for data cleansing, feature engineering, and outlier detection, ensuring high data quality and accuracy (see the sketch below).
Created automated Excel macros using VBA to streamline repetitive data processing tasks, reducing manual effort and improving workflow efficiency across departments.
Created and maintained Oracle PL/SQL stored procedures, functions, and triggers to automate data validation, transformation, and reporting workflows for enhanced efficiency.
Conducted time-series analysis and forecasting using R’s forecast and TTR packages for sales and operational planning.
Designed and optimized Hive queries to analyze structured and semi-structured data stored in Hadoop clusters, improving query performance.
Implemented Azure Synapse Analytics to enable data exploration, real-time querying, and predictive analytics by integrating on-premises and cloud-based data sources while optimizing query performance through partitioning and indexing.
Created and customized Google Analytics dashboards to provide real-time insights into website performance, user acquisition, and retention trends.
Designed and implemented complex statistical models using IBM SPSS to analyze large datasets, identify patterns, and generate insights for data-driven decision-making across multiple business functions.
Designed and developed Azure Data Factory (ADF) pipelines to ingest, transform, and process large-scale structured and unstructured datasets from various sources into Azure Data Lake Storage and Azure SQL Database for business analytics and reporting.
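A minimal sketch of the outlier-detection scripting flagged in the list above, using the interquartile-range rule with Pandas; the input file, column name, and 1.5 * IQR threshold are hypothetical assumptions rather than details from this resume.

```python
# IQR-based outlier detection sketch with Pandas.
# "lab_results.csv" and the "value" column are hypothetical.
import pandas as pd

df = pd.read_csv("lab_results.csv")

# Values more than 1.5 * IQR beyond the quartiles are flagged.
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

clean = df[mask]       # rows kept for analysis
outliers = df[~mask]   # rows flagged for review
print(f"kept {len(clean)} rows, flagged {len(outliers)} outliers")
```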
Environment: MySQL, PostgreSQL, Tableau, Sqoop, Flume, Hadoop, Python, Pandas, NumPy, Matplotlib, SQL Server, Oracle, SAS, VBA, R, caret, randomForest, Google Analytics, Hive, Azure Synapse Analytics, IBM SPSS, Oracle PL/SQL, Azure Data Factory (ADF), Azure Data Lake Storage, Azure SQL Database.
Client: Canara HSBC Life Insurance, Gurgaon, India Jul 2016 – Feb 2018
Role: Data Analyst
Responsibilities:
Developed and optimized Python scripts to automate data extraction, transformation, and loading (ETL) processes, enhancing data processing efficiency (see the sketch below).
Developed scalable data processing pipelines using Hadoop components such as HDFS, MapReduce, Hive, and Pig to handle large datasets efficiently.
Developed statistical models and hypothesis testing in R to drive data-driven insights and support strategic decision-making.
Designed and implemented MySQL-based ETL workflows, extracting raw data from APIs, logs, and external sources, transforming it with SQL logic, and loading it into structured databases.
Created complex DAX measures and calculated columns to enable advanced data analysis, trend identification, and performance measurement in Power BI reports.
Created custom SAS reports and dashboards to present key insights and performance metrics to business stakeholders and executive teams.
Built and optimized AWS Redshift queries, schemas, and columnar storage configurations to ensure high-performance data retrieval, aggregation, and processing for large-scale analytical workloads.
Designed and developed highly interactive Power BI dashboards incorporating drill-through functionality, slicers, and filters to provide actionable insights for business users.
Managed Hadoop clusters, ensuring high availability, resource optimization, and efficient data distribution across nodes.
Developed complex Excel dashboards and reports utilizing advanced formulas, pivot tables, and Power Query to analyze large datasets and extract actionable business insights for data-driven decision-making.
Designed and managed AWS ETL pipelines to extract, transform, and load large volumes of structured and unstructured data from diverse sources into AWS Redshift, S3, and RDS for analytical reporting and data warehousing.
Developed and optimized complex SQL queries in Oracle Database to extract, transform, and analyze large datasets, improving data retrieval efficiency for business intelligence applications.
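A minimal end-to-end sketch of the Python ETL automation flagged in the list above; the API endpoint, schema, credentials, and table name are hypothetical, and the requests and SQLAlchemy libraries it uses are assumptions not named in this resume.

```python
# Extract-transform-load sketch: API -> Pandas -> MySQL.
# URL, credentials, columns, and table are hypothetical placeholders.
import pandas as pd
import requests
from sqlalchemy import create_engine

# Extract: pull raw records from a source API.
rows = requests.get("https://api.example.com/policies", timeout=30).json()
df = pd.DataFrame(rows)

# Transform: normalize types and drop incomplete records.
df["issue_date"] = pd.to_datetime(df["issue_date"], errors="coerce")
df = df.dropna(subset=["policy_id", "issue_date"])

# Load: append the cleaned frame into a MySQL reporting table.
engine = create_engine("mysql+pymysql://user:pass@localhost/analytics")
df.to_sql("policies", engine, if_exists="append", index=False)
```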
Environment: Python, Hadoop, HDFS, MapReduce, Hive, Pig, R, MySQL, DAX, SAS, Power BI, AWS Redshift, S3, RDS, Oracle Database, Excel, Power Query.
EDUCATION: B.Tech in Computer Science and Engineering, JNTUH, India, June 2012 – May 2016