Data Analyst
Shravani
Email: *********@*****.*** Contact: 786-***-****
Summary:
Data Engineer with 5+ years of experience building and managing data and machine learning solutions on cloud platforms.
Expert in designing data pipelines with AWS tools such as Glue, Redshift, Lambda, and SageMaker for efficient data processing.
Skilled in real-time data processing with tools like Kinesis and Step Functions, ideal for tasks like fraud detection.
Proficient in Python (Pandas, PySpark, Scikit-learn) and SQL for data analysis, model building, and database management.
Knowledgeable in data security, ensuring sensitive data is protected using AWS encryption, IAM, and KMS.
Experienced in managing master data using SAP and ERP systems, ensuring consistency, compliance, and alignment with data governance standards.
Experienced in applying machine learning techniques to detect fraud and assess risk.
Creates insightful dashboards with Power BI, Tableau, and Python visualization libraries to support business decisions.
Automates workflows and optimizes data processes using Power Automate/Klik Automate, enhancing efficiency and business operations.
Familiar with Agile workflows and tools like Jira, collaborating effectively with teams to meet business goals.
Communicates technical concepts clearly to non-technical audiences, helping improve strategies like fraud prevention.
Experience in migrating and optimizing data pipelines from Snowflake to cloud-native platforms, ensuring seamless data integration and minimal downtime.
Optimizes performance in AWS Glue and Redshift using techniques like partitioning and caching.
Skilled in automating data workflows to save time and increase efficiency.
Hands-on experience with AWS cloud technology for building scalable, reliable systems.
Proficient in Databricks and Azure Data Factory for big data processing and analytics, enhancing pipeline scalability and performance.
Improved data query speed and reduced costs in Azure Synapse Analytics with performance optimizations.
Built real-time data streaming systems with Azure Event Hubs and Stream Analytics for detecting anomalies.
Technical Skills:
Cloud Platforms: AWS Glue, Amazon Redshift, AWS Lambda, Amazon SageMaker, Amazon Kinesis, AWS Step Functions, S3, IAM, Azure Data Factory, Azure App Service
Data Engineering: SQL, Python (Pandas, PySpark), AWS Lake Formation, AWS Key Management Service (KMS), Pentaho, DBT
Data Processing: AWS Glue, SQL, PySpark, Data Pipelines, Data Cleansing, Data Transformation, Spark (Optimization, Troubleshooting)
Machine Learning/AI: SAS, Python (Scikit-learn, XGBoost), Logistic Regression, Random Forest, Hyperparameter Tuning, Isolation Forest, DBSCAN
Real-time Processing: Amazon Kinesis, AWS Lambda, AWS Step Functions
Data Visualization: Power BI, Tableau, Python (Matplotlib, Seaborn)
ERP & Master Data Management: SAP, Master Data Management (MDM), ERP Systems, Microsoft Dynamics 365
Databases: MySQL, PostgreSQL, MS SQL Server, Teradata, Oracle, Azure SQL, Snowflake
Security & Compliance: AWS IAM, Data Masking, Encryption (AWS KMS), AML/KYC Compliance
Development: Flask (API Development), Git, CI/CD (Azure DevOps, Jenkins)
Business Intelligence: Power BI, Tableau, Custom SQL, Interactive Dashboards
Other: Scrum, Agile, Jira, User Acceptance Testing (UAT), Cross-functional Team Collaboration
Professional Experience:
Client: Citibank, Fort Lauderdale, FL June 2024 – Present
Role: Data Engineer
Project: Customer Risk Profiling and Fraud Detection System
Objective: To build a robust data-driven system for identifying high-risk customers and detecting fraudulent activities across Citibank’s diverse banking operations. This system aims to enhance regulatory compliance, reduce financial fraud, and improve customer trust.
Responsibilities:
Aggregated data from multiple sources (credit card transactions, loans, savings, CRM) using AWS Glue for ETL workflows.
Developed data pipelines to load data into Amazon Redshift, ensuring consistency via data cleansing and transformation in Python (Pandas, PySpark) and AWS Glue.
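A minimal sketch of the kind of PySpark cleansing and transformation step described above; the S3 paths, column names, and rules are illustrative placeholders, not the actual project code:

```python
# Illustrative PySpark cleansing step; paths, columns, and rules are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("txn-cleansing").getOrCreate()

# Read raw transaction extracts landed in S3 by the Glue job (placeholder path).
raw = spark.read.parquet("s3://bucket/raw/transactions/")

clean = (
    raw.dropDuplicates(["transaction_id"])                  # drop duplicate loads
       .filter(F.col("amount").isNotNull())                 # drop incomplete records
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("txn_date", F.to_date("txn_timestamp"))  # normalize timestamps
)

# Write a cleansed, partitioned copy for the downstream Redshift load.
clean.write.mode("overwrite").partitionBy("txn_date").parquet("s3://bucket/clean/transactions/")
```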
Managed data quality with AWS Lake Formation for metadata and lineage tracking.
Built predictive models for customer risk using Python (Scikit-learn, XGBoost) and trained them on Amazon SageMaker.
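A hedged sketch of the risk-model training described above; the feature file, target column, and hyperparameters are hypothetical:

```python
# Hypothetical risk-scoring model; feature set and target column are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

df = pd.read_parquet("customer_features.parquet")  # placeholder feature set
X, y = df.drop(columns=["is_high_risk"]), df["is_high_risk"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1, eval_metric="auc")
model.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```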
Stored data in Amazon S3 and performed analytics in Amazon Redshift.
Built modular, reusable SQL models with DBT to manage transformations within Redshift and Azure Synapse.
Designed and implemented end-to-end workflow automation using Klik Automate/Microsoft Power Automate, integrating cloud services and business applications to reduce manual effort and improve operational efficiency.
Visualized risk scores in real-time Power BI dashboards connected to Amazon Redshift.
Developed anomaly detection models (Isolation Forest, DBSCAN) using AWS Glue and Python.
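The anomaly-detection approach above, sketched with Scikit-learn's IsolationForest and DBSCAN on synthetic stand-in data:

```python
# Unsupervised anomaly detection sketch; the data is a synthetic stand-in.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                # stand-in for transaction features
X_scaled = StandardScaler().fit_transform(X)

# Isolation Forest flags points that are easy to isolate (-1 = anomaly).
iso_labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(X_scaled)

# DBSCAN marks points in low-density regions as noise (label -1).
db_labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(X_scaled)

suspicious = (iso_labels == -1) | (db_labels == -1)
print(f"{suspicious.sum()} candidate anomalies out of {len(X)}")
```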
Built real-time streaming pipelines with Amazon Kinesis and AWS Lambda for suspicious transaction detection.
Deployed APIs with Flask on AWS Lambda and API Gateway for integration with detection and alert systems.
Implemented real-time alerts via email and SMS using Amazon Simple Notification Service (SNS).
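A minimal sketch tying the three bullets above together: a Kinesis-triggered Lambda handler that publishes SNS alerts. The topic ARN and the simple threshold rule are placeholders; the production pipeline scored records with the deployed models:

```python
# Hypothetical Kinesis-triggered Lambda handler; ARN and rule are placeholders.
import base64
import json
import boto3

sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:fraud-alerts"  # placeholder

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded inside the event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Stand-in rule; the real system used the trained fraud models.
        if payload.get("amount", 0) > 10_000:
            sns.publish(
                TopicArn=ALERT_TOPIC_ARN,
                Subject="Suspicious transaction",
                Message=json.dumps(payload),
            )
```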
Automated fraud analysis reports using Python (Matplotlib, Seaborn) and AWS Step Functions.
Monitored data pipelines and dashboards with Amazon CloudWatch.
Ensured AML/KYC compliance with Amazon Redshift IAM policies and data masking.
Encrypted sensitive data with AWS KMS, ensuring protection at rest and in transit.
Conducted compliance audits using AWS Config and Security Hub.
Optimized large-scale data processing in AWS Glue, reducing query times by 35%.
Enhanced query performance in Amazon Redshift with partitioning, caching, and indexing.
Designed self-service analytics tools for business teams to query data and generate custom reports.
Developed and maintained end-to-end workflows for fraud detection using AWS Step Functions.
Integrated external data sources like credit reports and behavioral insights to enhance model accuracy.
Enabled dynamic role-based access for analytics teams using AWS IAM and Lake Formation permissions.
Migrated and optimized data pipelines from Snowflake to cloud-native platforms, achieving seamless integration with minimal downtime.
Built data pipelines using Azure Data Factory to collect and process data from various sources.
Processed large datasets with Azure Databricks and PySpark for advanced analytics.
Improved query speed in Azure Synapse Analytics by optimizing performance settings.
Set up real-time data streaming with Azure Event Hubs and Azure Stream Analytics for quick insights.
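A minimal sketch of publishing events to Azure Event Hubs with the azure-eventhub SDK, as in the bullet above; the connection string and hub name are placeholders:

```python
# Event Hubs producer sketch; connection string and hub name are placeholders.
import json
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;...",  # placeholder
    eventhub_name="telemetry",  # hypothetical hub name
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps({"customer_id": 42, "event": "login"})))
    producer.send_batch(batch)  # Stream Analytics consumes these downstream
```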
Managed data quality and tracking using Azure Purview to ensure reliable analytics.
Created easy-to-understand dashboards in Power BI connected to Azure SQL Database.
Environment:
AWS Glue, Amazon Redshift, Amazon S3, Amazon SageMaker, Python (Pandas, PySpark, Scikit-learn, XGBoost), SQL, Amazon Kinesis, AWS Lambda, Power BI, Flask, AWS Key Management Service (KMS), AWS Lake Formation, AWS Config, Security Hub, Amazon CloudWatch, Amazon SNS, AWS Step Functions, Azure Data Factory, Azure DevOps, Azure App Service, Azure SQL, Azure Databricks, Azure Synapse Analytics, Azure Purview, Azure Event Hubs, Azure Stream Analytics, Matplotlib, Seaborn
Client: Verizon Communications Inc, Atlanta, GA April 2023 – May 2024
Role: Data Analyst
Project: Customer Churn Prediction and Retention Strategy Analysis
Overview: This project aimed to develop and implement a predictive model to identify customers at risk of churning (leaving the service) and suggest targeted retention strategies. The primary goal was to enhance Verizon's customer retention rates by leveraging historical customer data and machine learning techniques.
Responsibilities:
Collected large-scale datasets from multiple internal sources, including CRM, customer billing systems, and network performance logs, using SQL to query relational databases (MySQL, PostgreSQL).
Utilized Python (Pandas, NumPy) for data manipulation and preprocessing, cleaning missing values, handling outliers, and transforming raw data into a structured format for analysis.
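An illustrative Pandas preprocessing step of the kind described above; the file and column names are hypothetical:

```python
# Illustrative cleaning step; file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("churn_extract.csv")  # placeholder extract
df["monthly_charges"] = pd.to_numeric(df["monthly_charges"], errors="coerce")
df = df.dropna(subset=["customer_id"]).drop_duplicates("customer_id")

# Cap outliers at the 1st/99th percentiles instead of dropping rows.
lo, hi = df["monthly_charges"].quantile([0.01, 0.99])
df["monthly_charges"] = df["monthly_charges"].clip(lo, hi)
```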
Automated the data ingestion pipeline using Python scripts to ensure real-time data updates for continuous analysis.
Conducted detailed exploratory data analysis using Python (Matplotlib, Seaborn) to visualize patterns, trends, and outliers within customer behaviors (e.g., churn rate, service usage).
Applied statistical tests (e.g., Chi-square, t-tests, ANOVA) to understand relationships between customer attributes (e.g., age, service type) and churn probability.
Performed correlation analysis and feature importance assessments using Pearson’s correlation and Spearman rank correlation methods to identify the most relevant predictors for churn.
Applied advanced feature engineering techniques using Python (Pandas, Scikit-learn) to generate new features like customer tenure, usage frequency, and service disruptions from raw data.
Performed data transformation techniques such as log transformations and scaling (StandardScaler, MinMaxScaler) to improve the quality of input data for predictive modeling.
Employed One-Hot Encoding and Label Encoding for categorical data, ensuring that the machine learning models could interpret the data effectively.
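The feature-engineering, scaling, and encoding steps above, sketched as a single Scikit-learn pipeline; the column names and toy data are hypothetical:

```python
# Feature pipeline sketch; column names and toy data are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["tenure_months", "monthly_usage", "service_disruptions"]
categorical = ["service_type", "contract_type"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),                            # scale numerics
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),  # encode categoricals
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

df = pd.DataFrame({
    "tenure_months": [3, 48, 12, 30],
    "monthly_usage": [120.0, 55.5, 300.2, 80.1],
    "service_disruptions": [4, 0, 1, 2],
    "service_type": ["fiber", "dsl", "fiber", "dsl"],
    "contract_type": ["monthly", "two_year", "monthly", "one_year"],
    "churned": [1, 0, 1, 0],
})
model.fit(df[numeric + categorical], df["churned"])
```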
Developed and trained several machine learning models using Scikit-learn, including logistic regression, random forest, gradient boosting, and decision trees, to predict customer churn.
Utilized XGBoost and LightGBM for efficient and high-performance modeling, particularly when dealing with imbalanced data or large datasets.
Leveraged Hyperparameter Tuning techniques, including GridSearchCV and RandomizedSearchCV, to optimize model performance and avoid overfitting.
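A sketch of the tuning step above with GridSearchCV; the estimator, grid, and synthetic data are illustrative:

```python
# GridSearchCV sketch; grid and synthetic data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8], random_state=42)

param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [6, 10, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # matches the ROC-AUC criterion used in evaluation
    cv=5,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```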
Assessed model performance using multiple evaluation metrics such as accuracy, precision, recall, F1-score, and ROC-AUC to select the most appropriate model.
Applied Cross-Validation (K-fold) to ensure robustness of the model’s performance across different subsets of data, reducing the risk of overfitting.
Utilized Confusion Matrices and Precision-Recall curves to further understand model strengths and weaknesses, ensuring actionable outcomes for churn mitigation strategies.
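A minimal evaluation sketch covering the metrics listed above; the data and model are synthetic stand-ins:

```python
# Evaluation sketch; synthetic, imbalanced stand-in data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=2000, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(confusion_matrix(y_te, pred))        # error breakdown per class
print(classification_report(y_te, pred))   # precision, recall, F1
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
print("5-fold F1:", cross_val_score(clf, X, y, cv=5, scoring="f1").mean())
```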
Designed interactive and informative Power BI dashboards to visualize key performance indicators (KPIs) such as churn probability, customer segments, and retention strategies.
Created detailed, visual reports using Tableau to showcase model predictions, customer behavior insights, and retention campaign results to senior stakeholders.
Developed custom Python visualizations using Matplotlib and Seaborn to communicate complex trends and model results effectively to non-technical teams.
Worked closely with the marketing and customer service teams to design targeted customer retention strategies based on model predictions, such as offering personalized discounts and enhancing service quality for at-risk customers.
Presented findings in regular meetings with senior management, offering strategic insights and recommendations that drove actionable decision-making in customer retention programs.
Utilized Jupyter Notebooks to document and communicate the analytical process, ensuring transparency and reproducibility of the data analysis steps.
Environment: MySQL, PostgreSQL, Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, XGBoost, LightGBM), Power BI, Tableau, Jupyter Notebooks, Apache Airflow, Git.
Client: Aditya Birla Sun Life Insurance, India June 2018 – Mar 2022
Role: Data Analyst
Description: Analyzed large datasets to derive actionable insights, supporting business decisions in sales, marketing, and risk management. Cleaned, transformed, and visualized data using SQL, Excel, and Python (Pandas). Developed dashboards and reports in Power BI and Tableau to track key performance metrics. Applied statistical and machine learning models to predict customer churn, fraud, and claims. Collaborated with cross-functional teams to optimize operations and enhance customer experience.
Responsibilities:
Developed business architecture using requirements such as scope, processes, alternatives, and risks.
Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps, and Gantt charts via the Show Me functionality during the POC.
Created side-by-side bars, scatter plots, stacked bars, heat maps, filled maps, and symbol maps according to deliverable specifications.
Created customized Tableau Dashboards, integrating Custom SQL from Teradata and Oracle and performing data blending in reports.
Generated interactive dashboards with quick filters, parameters, and actions to handle views more efficiently; created bullet graphs to assess profit generation using measures and dimensions data from Oracle and MS Excel.
Worked with Google BigQuery tables, in which individual records are organized in rows and each record is composed of columns (also called fields).
Developed and reviewed SQL queries using join clauses (inner, left, right) in Tableau Desktop to validate static and dynamic data.
Transformed business requirements into analytical models, designed algorithms, built models, and developed data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
Maintained material master data in SAP and conducted regular data health checks to support business operations.
Applied strong SQL skills to optimize, analyze, and improve database T-SQL performance.
Created Tableau views with complex calculations and hierarchies, making it possible to analyze and obtain insights from large data sets.
Utilized Tableau server to publish and share the reports with the business users.
Responsible for ad-hoc reporting, consisting of client and account data reports.
Environment: MS SQL Server 2012, SQL, Jenkins, HTML, JavaScript, CSS, Oracle, Agile, PL/SQL, Git, UNIX, Salesforce, Windows XP, User Acceptance Testing (UAT), Rational ClearQuest, MS Access, GitHub, Rational Test Manager, MS Office.
Education:
Master's: Florida Atlantic University – Boca Raton, Florida