Data Analyst Machine Learning

Location:

Fairfax, VA

Salary:

$45/hr

Posted:

September 10, 2025

Resume:

Venkata Naga Sanjay Reddy Narra

DATA ANALYST

Mobile: 551-***-**** Email: ***********.*****@*****.*** Location: Fairfax, Virginia LinkedIn

SUMMARY

Data Analyst with 4 years of experience in analyzing large datasets, developing predictive models, and providing data-driven insights to support business decisions in areas such as healthcare and customer retention.

Proficient in Python, SQL, Tableau, and machine learning techniques, skilled in data cleaning, analysis, and visualization to enhance data accuracy and optimize business processes, improving efficiency.

Expertise in building ETL pipelines using Python, Apache Airflow, and cloud platforms such as AWS S3 and Redshift, ensuring scalable and secure data storage and management.

Strong background in applying statistical analysis and predictive modeling (regression, classification models, machine learning) to forecast trends, improve operational efficiency, and drive business outcomes, including reducing churn and improving predictions.

Effective team collaborator with experience in Agile environments, maintaining high standards in data governance, and ensuring data accuracy and project delivery within timelines.

TECH - SKILLS

Methodologies: SDLC, Agile, Waterfall

Programming Languages: Python, SQL, R

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn, dplyr, ggplot2

Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP)

IDEs: Visual Studio Code, PyCharm, Jupyter Notebook, IntelliJ

Database: MySQL, PostgreSQL, MongoDB, SQL Server

Cloud Platform: Amazon Web Services (AWS), Microsoft Azure

Other Technical Skills: SSIS, SSRS, Machine Learning Algorithms, ETL/ELT Tools, Statistics, ServiceNow, Hadoop, Spark, MapReduce, Alteryx, Google BigQuery, Power Query, Probability Distributions, Mathematics, Confidence Intervals, ANOVA, Hypothesis Testing, Regression Analysis, Linear Algebra, Advanced Analytics, Data Mining, Big Data, Data Integration, Data Interpretation, Data Pipeline, Data Visualization, Data Warehousing, Data Transformation, Data Governance, Data Association Rules, Clustering, Classification, Regression, A/B Testing, Forecasting & Modelling, Data Cleaning, Data Wrangling, Descriptive Analytics, Git, GitHub, JIRA

Operating Systems: Windows, Linux

EDUCATION

George Mason University August 2023 - May 2025

Master of Science, Data Analytics and Engineering Fairfax, Virginia

Chandigarh University June 2018 - April 2022

Bachelor of Engineering, Mechanical Engineering Chandigarh, Punjab

CERTIFICATION

• Acquired Skill Mastery Certification in SQL from Scaler Academy

• AWS Cloud Practitioner (CLF-C02) and Snowflake Foundation Certification from DataCamp

• Completed SQL, Python, R, Power BI, Tableau certifications from DataCamp

PROJECT

Database management system for Public Library

Designed MySQL database with 8+ relational tables to manage library users, materials, and borrowing operations.

Built 15+ optimized SQL queries using joins, subqueries, and window functions, increasing data retrieval speed by 40 percent.

Automated overdue tracking and deactivation with SQL triggers, reducing manual effort by 60 percent.

Predicting Credit Card Account Cancellations

Built and compared three ML models; Decision Tree achieved 90.49 percent accuracy and 95.05 percent AUC.

Identified churn-prone customers based on demographics, credit limits, spending behavior, and employment type.

Recommended incentives using spending/utilization data to reduce churn and retain revenue-generating customer segments.

Credit Card Default prediction using machine learning

Applied advanced classification algorithms (Random Forest, Gradient Boosting) to predict customer default using 20+ financial and behavioral features.

Applied advanced analytics and PCA to synthesize complex data, enhancing interpretability and model efficiency.

Used SMOTE to balance the training data, improving minority-class recall (model sensitivity) and fairness.

EXPERIENCE

Aug 2024 - Present DATA ANALYST Inova Health System, USA

Designed, developed, and implemented advanced machine learning models using Scikit-learn and TensorFlow for anomaly detection to identify and flag potentially fraudulent healthcare claims, improving fraud detection accuracy by 35% while reducing false positive rates by 25%.

Extracted, transformed, and loaded (ETL) large-scale healthcare claims data from multiple sources using PostgreSQL and Apache Spark, ensuring efficient data processing and enabling timely analytics, resulting in a 40% increase in data throughput.

Created dynamic, interactive dashboards and detailed data visualizations with Power BI and Advanced Excel to present actionable fraud insights and trends to cross-functional teams and senior management, enhancing decision-making efficiency by 30%.

Leveraged Amazon Web Services (AWS) tools including S3 for secure data storage and SageMaker for scalable machine learning model training, validation, and deployment in a cloud-based environment, reducing model deployment time by 20%.

Applied machine learning algorithms including Random Forest, Gradient Boosting, and Support Vector Machines (SVM), resulting in a significant increase in model precision and recall for detecting complex fraudulent claim patterns.

Applied advanced statistical analysis methods such as ANOVA and A/B testing to rigorously evaluate model performance, optimize detection thresholds, and measure the impact of fraud detection initiatives, contributing to operational cost savings of 45%.

Automated complex data workflows and preprocessing tasks using Alteryx, improving data pipeline efficiency by 50%, increasing repeatability, and reducing manual errors by 30%.

May 2020 - Jun 2023 DATA ANALYST Wipro, India

Partnered with cross-functional Agile teams (Finance, IT, and Business stakeholders) to gather, document, and prioritize data and reporting requirements using JIRA, ensuring alignment with business objectives and smooth sprint delivery.

Designed automated ETL pipelines using Python (pandas, NumPy) and Apache Airflow, enabling seamless extraction, transformation, and loading of financial data from disparate sources such as Excel and SQL databases. Automation reduced manual processing time.

Authored complex SQL queries in SQL Server to join, filter, and aggregate multi-source financial data for dashboards and ad hoc analysis. Utilized window functions, CTEs, and subqueries to improve query performance by 20% and handle large datasets efficiently.

Built dynamic financial dashboards and summary reports using Tableau, integrating slicers, filters, and drill-through capabilities. Empowered stakeholders to explore trends in revenue, expenses, and variances independently, significantly improving reporting turnaround time.

Consolidated operational and financial datasets into a unified analytics layer using Snowflake and Power Query, enabling a single source of truth for enterprise-wide reporting and analysis.

Implemented data governance and compliance protocols through role-based access controls and row-level security, securing sensitive audit and financial data using Azure Blob Storage and SharePoint, ensuring alignment with SOX and GDPR standards.

Established data validation workflows using Great Expectations and custom Python-based quality checks, performing source-to-target reconciliations to maintain data integrity.

Employed unsupervised learning techniques like K-means clustering to segment business units based on performance indicators, guiding targeted strategic initiatives.
