
Data Analyst Machine Learning

Location:
Owings Mills, MD
Salary:
$100k
Posted:
April 24, 2025


Vadde Anne Keziah

Data Analyst

About Me

Highly skilled and detail-oriented Data Analyst with 5+ years of experience in analyzing complex datasets, creating data-driven insights, and delivering actionable recommendations to drive business growth. Proficient in statistical analysis, data visualization, and using tools like SQL, Python, R, and Tableau to uncover trends and optimize processes.

Profile Summary

5+ years of industry experience with a solid understanding of Data Modeling, Evaluating Data Sources, Data Warehouse/Data Mart Design, ETL, BI, Data Visualization, OLAP, and Client/Server applications.

Accomplished various tasks in a big data environment involving Microsoft Azure Data Factory, Data Lake, and SQL Server. Experience with R (dplyr, ggplot2, tidyverse) for statistical analysis.

Excellent in Data Analysis, Data Profiling, Data Validation, Data Cleansing, Data Verification, and Data Mismatch Identification. Experience with Apache Airflow, Prefect, and Luigi for workflow automation.

Proficient in Power BI, Tableau, Looker, QlikView, Google Data Studio for creating insightful dashboards and reports.

Strong experience with Excel (Advanced), Pivot Tables, Power Query, VBA, and Google Sheets for data manipulation.

Expertise in SQL (MySQL, PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, Redshift) for querying large datasets. Experience with different project management methodologies such as traditional Waterfall and Agile.

Experience with NoSQL databases (MongoDB, Cassandra, DynamoDB) for handling unstructured data.

Proficient in Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn) for data analysis and automation.

Hands-on experience with Azure Data Factory, AWS Glue, Google Cloud Dataflow for ETL pipelines.

Strong knowledge of Informatica, Talend, Alteryx, SSIS, DBT (Data Build Tool) for data transformation.

Expertise in creating KPI dashboards, ad-hoc reports, and data storytelling to drive business decisions.

Experience with A/B testing, forecasting models, regression analysis, hypothesis testing using statistical tools.

Knowledge of machine learning techniques for predictive analytics. Version control with Git, GitHub, and Bitbucket for collaboration. Adept at designing advanced statistical models and hypothesis testing to support business strategy using tools like SciPy. Familiar with Spark (PySpark), Hadoop, and Databricks for big data processing.

Strong experience in creating interactive dashboards and reports using Looker for data exploration and decision-making.

Extensive experience in working with Snowflake for data warehousing, including building scalable data models, optimizing queries, and implementing secure data sharing across multiple teams.

Proficient in applying machine learning algorithms such as regression, classification, clustering, and time series forecasting to analyze trends, predict outcomes, and support data-driven business decisions.

Used Python to select features with recursive feature elimination (RFE), build and train models, evaluate performance with MSE or accuracy, and generate predictions with the final models (a sketch follows at the end of this summary).

Used Python regression models (Linear Regression, Decision Tree, Random Forest) to predict values, and classification models (SVM, K-NN) to predict classes.

Solid experience in data visualization using Tableau and Google Data Studio in the business intelligence industry, connecting to MySQL databases to develop reports, tables, charts, and dashboards. Used Excel (Pivot Tables, VLOOKUP, VBA) to statistically analyze and visualize structured data, and PowerPoint for presentations.
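
The modeling workflow from the items above, sketched in Python with scikit-learn. Synthetic data stands in for the project datasets, so every name and parameter here is an illustrative assumption, not the original code.

from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data
X, y = make_regression(n_samples=400, n_features=12, n_informative=5,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Keep the five strongest attributes via recursive feature elimination (RFE)
selector = RFE(LinearRegression(), n_features_to_select=5).fit(X_train, y_train)
X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

# Train the final model and measure how well it performs with MSE
model = LinearRegression().fit(X_train_sel, y_train)
print("MSE:", mean_squared_error(y_test, model.predict(X_test_sel)))
# Classifiers (SVM, K-NN) follow the same pattern, scored with accuracy_score.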

Key Skills

Power BI – DAX, Power Query, Data Modeling, Report Building

Tableau – Dashboards, Calculated Fields, Tableau Prep

Looker – LookML, Data Exploration, Looker Studio

QlikView/Qlik Sense – Data Load Scripting, Set Analysis

Google Data Studio – Report Creation, Data Blending

Excel (Advanced) – Pivot Tables, Power Query, VBA, Macros

Database & Querying – SQL (MySQL, PostgreSQL, SQL Server, Oracle, Snowflake, BigQuery, Redshift)

NoSQL Databases – MongoDB, Cassandra, DynamoDB

Programming & Scripting – Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, Scikit-learn, R, JavaScript

Shell Scripting – Bash, PowerShell for automation

Cloud Platforms – AWS (S3, Redshift, Athena), Azure (Synapse, Data Factory), GCP (BigQuery, Dataflow)

Big Data Processing – Apache Spark (PySpark), Hadoop, Databricks

Data Warehousing – Snowflake, Amazon Redshift, Google BigQuery

ETL & Data Integration Tools – Informatica, Talend, Alteryx, SSIS, DBT (Data Build Tool), Apache Airflow, RESTful APIs, Web Scraping, Kafka

Statistical & Machine Learning Skills – Statistical Analysis (Hypothesis Testing, A/B Testing, ANOVA), Predictive Modeling (Regression, Classification, Clustering)

Version Control – Git, GitHub, Bitbucket

Project Management – JIRA, Confluence, Agile/Scrum methodologies

CI/CD for Data Pipelines – GitHub Actions, Jenkins

Work Experience

Oct 2023 – Present

Senior Data Analyst, CareFirst BlueCross BlueShield, Baltimore, Maryland, USA

CareFirst BlueCross BlueShield is a not-for-profit health insurance provider that offers a range of health plans and services. I analyze large datasets using tools such as SQL, Python, R, and Excel to identify trends, patterns, and insights, and create dashboards and reports in business intelligence (BI) tools such as Tableau, Power BI, and Looker.

Responsibilities:

Used Hadoop, HDFS, MapReduce, Hive, Pig, and Spark to manage data processing in AWS EMR, and stored large datasets in AWS S3 for the marketing website.

Provided scalable solutions for bug requests and resolved SQL performance issues. Ran SQL queries in AWS RDS to fulfill data requests.

Used AWS (S3, EC2, RDS, EMR) to report data to marketing teams.

Set database triggers for constraints; defined, executed, and interpreted simple to complex SQL queries in AWS RDS, involving correlated subqueries, non-trivial joins, self-joins, grouping, aggregations, and window functions, to track and identify a component failure rate metric based on component test results.
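
An illustration of the kind of windowed SQL behind such a failure-rate metric; the table and column names are hypothetical, and Python's built-in sqlite3 stands in for the RDS instance.

import sqlite3

# Tiny in-memory table standing in for the component test results
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE component_tests (component_id TEXT, test_date TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO component_tests VALUES (?, ?, ?)",
    [("C1", "2023-01-01", "PASS"), ("C1", "2023-01-02", "FAIL"), ("C1", "2023-01-03", "PASS")],
)

# Rolling failure rate per component over its last 30 test results
query = """
SELECT component_id,
       test_date,
       AVG(CASE WHEN result = 'FAIL' THEN 1.0 ELSE 0.0 END)
         OVER (PARTITION BY component_id
               ORDER BY test_date
               ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS fail_rate
FROM component_tests
"""
for row in conn.execute(query):
    print(row)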

Deployed AWS (S3, EC2, RDS, EMR) to assemble, store, manage, and report data across multiple sources.

Used MySQL to load data from the database and extract summary statistics of key metrics for sentiment analysis in AWS RDS. Published Power BI reports to the required organizations and made Power BI dashboards available in web clients, mobile apps, and Power Apps.

Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports. Used Dataflows and Power Automate features to work efficiently on further dashboards.

Involved in creating new stored procedures and optimizing existing queries and stored procedures.

Applied advanced classification algorithms such as Decision Tree, Random Forest, SVM, and Gradient Boosting to training data using the Scikit-learn and NLTK packages in Python. Conducted feature engineering, model diagnosis, and validation, tuning parameters by cross-validation and grid search in Python.
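
A minimal sketch of cross-validated grid search as described here, using scikit-learn on a synthetic dataset; the parameter grid is an illustrative assumption, not the project's actual settings.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification data in place of the project training set
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 5-fold cross-validated search over a small, illustrative parameter grid
param_grid = {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)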

Used RMSE in Python to evaluate how well models performed on these datasets. Combined Python and Tableau to create and modify Tableau worksheets and dashboards, performing table-level calculations.

Created Docker images based on Dockerfiles and pushed them to Docker Hub repositories. Used Kubernetes (kubectl) to deploy the images to Google Cloud, describe pod information, and get deployment information.

Used MySQL to clean, transform, join, and merge data to meet business requirements, and exploded text data to calculate and store the sentiment of each tweet in AWS RDS.

Used Python with NumPy, SciPy, Pandas, PySQL, and PySpark to load and process large-scale data about customers of a credit card company for credit-score classification.
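
A sketch of this kind of large-scale load-and-prepare step in PySpark; the S3 path, schema, and aggregations are hypothetical placeholders, not the actual pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("credit-scoring").getOrCreate()

# Hypothetical customer data; the real job read from the company's sources
customers = spark.read.parquet("s3://bucket/customers/")

# Aggregate per-customer features for downstream credit-score classification
features = (customers
            .filter(F.col("account_age_months") >= 6)
            .groupBy("customer_id")
            .agg(F.avg("balance").alias("avg_balance"),
                 F.sum("late_payments").alias("late_payments")))
features.write.mode("overwrite").parquet("s3://bucket/features/")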

Worked with time series data and performed various statistical, artificial intelligence (AI), and machine learning (ML) algorithms such as LMS, regression, filtering, correlation, and neural networks.

Designed the artificial intelligence (AI) and machine learning (ML) data pipeline for regular monitoring and performance evaluation of the deployed ML models. Created DAX queries to generate computed columns in Power BI. Created row-level security in Power BI and integrated it with the Power BI service portal.

Understood business requirements and developed data models accordingly while managing resources.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Spark, AWS, artificial intelligence, machine learning, DAX queries, Power BI, Power Pivot, dashboards, Google Cloud, Docker, Kubernetes

May 2022 - Sep 2023

Data Analyst, University of Maryland Medical System, Baltimore, Maryland, USA

The University of Maryland Medical System is a network of academic and community hospitals providing care across Maryland. I provided actionable insights to stakeholders across departments (marketing, sales, operations) to support decision-making, identified key performance indicators (KPIs), and created data visualizations to track business performance.

Responsibilities:

Worked with Azure Data Explorer and used Kusto Query Language (KQL) for querying. Imported data from SQL Server and Azure SQL databases into Power BI to generate reports.

Created reports using dynamic column selection, conditional formatting, sorting, and grouping.

Extensively used SSIS transformations such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task. Regularly updated staging and dimensional databases and rebuilt the dimensions and cubes in Analysis Services (SSAS).

Combined Matplotlib in Python with Google Data Studio to generate plots, histograms, power spectra, bar charts, time-series charts, error charts, scatterplots, and correlation tables for reports and presentations.
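
Two of the chart types listed above, sketched with Matplotlib on synthetic data; the figure layout, labels, and output file name are illustrative assumptions.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.normal(size=1000)                            # synthetic metric values
dates = np.arange("2023-01", "2023-04", dtype="datetime64[D]")
series = rng.normal(size=dates.size).cumsum()             # synthetic daily trend

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(values, bins=30)        # histogram of a metric's distribution
ax1.set_title("Distribution")
ax2.plot(dates, series)          # time-series chart of a daily trend
ax2.set_title("Trend over time")
fig.autofmt_xdate()              # tilt the date labels for readability
fig.savefig("report_charts.png")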

Created and joined multiple tables using MySQL, and exploded text data to calculate and store the sentiment of each tweet.

Set database triggers for constraints; defined, executed, and interpreted simple to complex SQL queries in MySQL, involving correlated subqueries, non-trivial joins, self-joins, grouping, and aggregations.

Used SAS to plot histograms, scatterplots, box plots, and correlation tables, and used MS SQL Server to pivot summary statistics. Categorized data in SAS for hypothesis tests, including t-test, F-test, ANOVA, and MANOVA, and for statistical modeling. Developed reports and dashboards from multiple data sources using data blending.
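
The same hypothesis tests, shown here as a Python analog with scipy.stats rather than SAS (a deliberate swap for illustration); the group samples are synthetic.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, 50)   # synthetic samples for three groups
group_b = rng.normal(10.5, 2.0, 50)
group_c = rng.normal(11.0, 2.0, 50)

t_stat, p_t = stats.ttest_ind(group_a, group_b)              # two-sample t-test
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)  # one-way ANOVA
print(f"t-test p={p_t:.3f}, ANOVA p={p_anova:.3f}")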

Used Power BI Gateways to keep the dashboards and reports up to date.

Represented data using cross tabs, scatter plots, geographic maps, pie charts, bar charts, and tree maps.

Combined the visualizations into interactive dashboards and published them to the web.

Worked with all transformation types available in the Power BI Query Editor.

Environment: Azure Data Explorer, Kusto Query Language, Cross Tabs, Scatter Plots, Geographic Maps, Pie Charts, Bar Charts, Tree Maps, MySQL, Lookup, Power BI

Jan 2021 - Jan 2022

Data Analyst, Nomura Holdings, Mumbai, India

Nomura Holdings, Inc. is a financial services company that provides investment banking, wealth management, and other services globally. Gathered, cleaned, and managed large datasets from various sources, ensuring accuracy and integrity. Performed data extraction, transformation, and loading (ETL) for streamlined data processing.

Responsibilities:

Designed and worked on the extraction, transformation, and loading (ETL) process, pulling large volumes of data from various sources using SSIS. Connected to Excel and used Pivot Tables to get summary statistics, and implemented a statistical analysis plan to compare product performance horizontally and vertically.

Built a data pipeline for daily reporting to track product performance from databases, and set up automated systems using VBA to pull data smoothly into the BI platform to create reports. Strong experience in data mining and machine learning with Python, including Linear Regression, Logistic Regression, SVM, Decision Tree, Random Forest, PCA, XGBoost, and K-NN.

Created DAX queries to generate computed columns in Power BI.

Designed SSIS packages to transfer data from flat files, Access databases, and Excel documents to the staging area in SQL Server 2012. Established business questions and created databases based on the schema of main features and key metrics using SQL in AWS RDS.

Used Hadoop, HDFS, MapReduce, Hive, Pig and Spark to manage data processing in AWS (EMR).

Stored and retrieved vast amounts of data in AWS S3 for the marketing website.

Defined, executed, and interpreted simple to complex SQL queries in AWS RDS, involving correlated subqueries, non-trivial joins, self-joins, grouping, aggregations, and window functions, to securely track and analyze product sales in AWS (EMR, EC2). Provided scalable solutions for bug requests and resolved SQL performance issues in AWS RDS.

Used the Power BI interface to create powerful visualizations and generated computed tables in Power BI using DAX. Designed cubes with star schemas using SQL Server Analysis Services (SSAS).

Understood OLAP processing for changing and maintaining the warehouse, optimizing dimensions and hierarchies, and adding aggregations to the cube.

Supported pricing strategy optimization through analysis of competitive intelligence data, and collaborated with DevOps teams to deploy analytics solutions via CI/CD pipelines.

Built real-time dashboards for claims adjudication tracking using Kafka or Stream Analytics, and conducted cost-benefit analysis of insurance campaigns using ROI metrics. Optimized and managed Snowflake data warehouses for storing and analyzing large-scale insurance data, ensuring high performance and cost efficiency.
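
A hypothetical sketch of the real-time feed behind such a dashboard, using the kafka-python client; the topic name, broker address, and message schema are all assumptions.

import json
from collections import Counter
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "claims-adjudication",                  # hypothetical topic name
    bootstrap_servers="localhost:9092",     # placeholder broker
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

# Tally adjudication statuses as events arrive; a dashboard tile would
# read these counts instead of printing them.
status_counts = Counter()
for message in consumer:
    status_counts[message.value["status"]] += 1   # assumed 'status' field
    print(dict(status_counts))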

Created row-level security in Power BI and integrated it with the Power BI service portal.

Environment: ETL, VBA, OLAP, SQL queries, AWS, SSAS, SQL, Tableau, Power BI, Python, GDPR, HIPAA, machine learning, Pandas, Python scripting, NoSQL databases (MongoDB, Cassandra), Hadoop, Spark, CI/CD pipelines, Kafka, Snowflake

Mar 2019 - Dec 2020

Data Analyst, Transasia Bio-Medicals Ltd., Mumbai, India

Transasia Bio-Medicals Ltd. is an in-vitro diagnostics company that manufactures blood analyzers and reagents. Automated data collection and reporting processes using scripting languages (Python, SQL) and BI tool functionalities. Optimized and streamlined data processes to enhance data processing efficiency.

Responsibilities:

Created different chart objects like Bar Chart, Line Chart, Text Tables, Tree Maps and Scatter Plot.

Coordinated with source system owners, monitored day-to-day ETL progress, and handled data warehouse target schema design (star schema) and maintenance. Developed and maintained interactive dashboards and reports using tools like Power BI, Tableau, and Excel to present key performance indicators (KPIs) for insurance management.

Created interactive and visually compelling dashboards using Power BI and Tableau to monitor key healthcare metrics, providing real-time insights for decision-makers and improving patient care outcomes.

Utilized NumPy for efficient data manipulation, Matplotlib and Seaborn for visualizing healthcare trends, and SciPy for conducting statistical analyses to identify patterns and correlations in patient data.

Integrated data from multiple sources, including policy and claims data, ensuring accuracy and consistency using SQL, Python, and ETL tools. Used sets, bins, parameters, aggregate functions, and filters.

Conducted predictive modeling and statistical analysis to forecast patient outcomes, readmission rates, and insurance trends using R, Python, and SAS. Used Tableau Desktop to analyze and obtain insights into large data sets.

Managed large datasets and databases (e.g., MySQL, SQL Server, Oracle) to ensure timely and accurate data retrieval for decision-making. Prepared dashboards using calculations and parameters in Tableau.

Automated routine data reporting tasks and processes using SQL queries, Python, or PowerShell scripts to increase efficiency (a minimal example is sketched below). Developed and maintained various cubes using complex MDX queries in SSAS.
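
A minimal sketch of such an automated reporting job in Python; an in-memory SQLite table stands in for the production database, and the query, table, and columns are placeholders. A scheduler (cron or Windows Task Scheduler) would run the script daily.

import datetime as dt
import pandas as pd
import sqlalchemy

# In-memory SQLite stands in for the production MySQL/SQL Server source
engine = sqlalchemy.create_engine("sqlite:///:memory:")
pd.DataFrame({"region": ["East", "West", "East"],
              "revenue": [100, 250, 175]}).to_sql("sales", engine, index=False)

# Summary query the scheduled job would run, written out as a dated CSV
df = pd.read_sql("SELECT region, SUM(revenue) AS revenue FROM sales GROUP BY region", engine)
df.to_csv(f"daily_report_{dt.date.today():%Y%m%d}.csv", index=False)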

Environment: Power BI, Tableau, Excel, SQL, Python, ETL, SAS, SQL Server, Oracle, PowerShell, scikit-learn, Informatica, Alteryx, Hadoop, Apache Spark, Kafka, Snowflake.

Contact

************@*****.***

443-***-****

Education

Master's in Computer Science from Saint Peter's University


