Arun Kumar Dara
IL +* (***) ***- **** **************@*****.*** Linkedin
SUMMARY
Data Analyst with 3+ years of experience designing and implementing end-to-end data pipelines, predictive models, and interactive dashboards across insurance, banking, healthcare, and manufacturing domains. Proficient in Python, SQL, Power BI, Looker, Snowflake, Databricks, and cloud platforms (AWS, GCP, Azure), with expertise in fraud detection, anomaly detection, customer analytics, and data governance. Skilled at leveraging machine learning, FHIR/HL7, and advanced analytics frameworks to drive actionable business insights, optimize operations, and ensure regulatory compliance.
SKILLS
Programming & Data Analysis: Python, R, SQL, Pandas, NumPy, Dask, Data Preprocessing, Feature Engineering
Machine Learning & AI: Scikit-learn, XGBoost, LightGBM, CatBoost, SHAP, TensorFlow, PyTorch, Predictive Modeling, Anomaly Detection, Dimensionality Reduction
Statistical & Analytical Techniques: A/B Testing, Scenario Analysis, Data Storytelling, Model Evaluation
Data Visualization & BI Tools: Power BI, Tableau, Plotly Dash, R Shiny, Looker, Interactive Dashboards, Drill-down Analysis
Databases, Big Data & ETL: Snowflake, Delta Lake, BigQuery, Azure Synapse, Redis, PySpark, Apache Beam, Databricks, ETL Pipeline Design, Apache Airflow, dbt
Cloud & Deployment Platforms: AWS SageMaker, AWS Lake Formation, Google Cloud, Azure
Methodologies: Agile/Scrum, Statistical Modeling, Predictive Analytics, Experimental Design
Data Governance & Compliance: Great Expectations, Collibra, Soda Core, FHIR/HL7 Standards, SOX Compliance
Collaboration & Other Tools: Neo4j, GraphQL APIs, Slack Webhooks, Notion, Confluence, JIRA
PROFESSIONAL EXPERIENCE
Data Analyst AIG, USA Jun 2025 – Present
Engineer real-time claims ingestion pipelines using Google Pub/Sub, Dataflow, and BigQuery. Secure sensitive FNOL data with Google Cloud DLP to ensure regulatory compliance.
Design and maintain a claims data warehouse in BigQuery, applying advanced SQL transformations and Soda Core validations to maintain high data accuracy and support actuarial reporting.
Develop and refine anomaly detection models in Python (CatBoost) incorporating policy, exposure, and FNOL text features. Improve fraud detection precision while reducing false positives.
Automate real-time anomaly scoring through Vertex AI Endpoints and integrate alerts with Slack Webhooks, enhancing fraud detection efficiency for SIU teams.
Build and update interactive dashboards in Power BI, embedding row-level security and drill-down functionality to deliver actionable insights and streamline triage processes.
Establish and monitor data governance practices by documenting lineage in Data Catalog and tracking model drift using Evidently AI, ensuring sustainable performance across business units.
Data Analyst Intern AIG, USA Jan 2025 – May 2025
Assisted in developing a real-time claims ingestion pipeline using Google Pub/Sub, Dataflow, and BigQuery, ensuring secure capture of FNOL data with Google Cloud DLP.
Supported the creation of a claims data warehouse in BigQuery, performing SQL transformations and Soda Core data quality checks to improve reporting accuracy and reliability.
Contributed to an anomaly detection model in Python (CatBoost) by preparing datasets, engineering features, and evaluating model performance, helping identify potentially fraudulent claims.
Collaborated on real-time scoring deployment in Vertex AI, integrating alerts through Slack Webhooks, which reduced manual monitoring efforts for SIU investigators.
Built interactive dashboards in Looker under mentorship. Visualized claims trends and anomalies to support faster triage and decision-making for SIU teams.
Data Analyst Mphasis, India Aug 2022 – Jul 2023
Engineered scalable data pipelines using SQL, Apache Airflow, and dbt to integrate cross-border banking transactions into Snowflake, enhancing AML anomaly detection and reducing compliance risks.
Developed predictive models for customer lifetime value (CLV) in Python (LightGBM, SHAP) on Databricks, consolidating structured and semi-structured CRM and transactional data to improve upsell conversions and insurance policy renewals.
Standardized healthcare claims data by designing normalization workflows with PySpark and FHIR/HL7 frameworks, shortening patient risk scoring cycles from 2 weeks to 3 days and enabling precision-medicine analytics.
Created interactive dashboards in Tableau using multimodal logistics datasets from GraphQL APIs and Neo4j, enabling inventory rebalancing and reducing warehouse stockouts.
Implemented data quality frameworks with Great Expectations and AWS Lake Formation, automating validation checks for sensitive banking data and cutting manual exception handling by 40%.
Designed and evaluated scenario-based A/B experiments in R (ggplot2) and Python (statsmodels), presenting insights via dashboards that optimized digital loan origination funnels and boosted conversions.
Data Analyst Hexaware Technologies, India Jan 2021 – Jul 2022
Developed churn prediction models using Python (Scikit-learn, XGBoost) and AWS SageMaker, reducing customer attrition by 18% in the banking sector.
Transformed healthcare and retail datasets with advanced SQL in Azure Synapse, accelerating actuarial risk assessments, securing $2.5M in new business, and increasing cross-sell revenue by 22%.
Engineered healthcare data lakes with Delta Lake (FHIR/HL7) and centralized Snowflake marts, streamlining compliance audits and enabling faster, data-driven underwriting.
Built interactive dashboards by integrating manufacturing telemetry (PySpark, GraphQL APIs) and financial risk models (R Shiny, Monte Carlo simulations, ggplot2) into Power BI and Plotly Dash, enabling real-time defect tracking and safeguarding ₹50+ crore in AUM.
Strengthened data governance using Great Expectations and Collibra, automating financial data quality checks for SOX compliance and reducing reconciliation workloads.
Enhanced credit scoring models by optimizing Databricks feature stores (UMAP, PCA), improving predictive precision by 15%, and led workshops on reusable analytics frameworks via Notion/Confluence, accelerating deployment of data science solutions.
EDUCATION & CERTIFICATIONS
Master of Science in Management – Data Analytics Specialization, Indiana Wesleyan University, IL May 2025
Bachelor of Technology in Engineering, TKR College of Engineering & Technology, India Aug 2022
Power BI Data Analyst Associate
PROJECTS
Patient Condition Classification Using Drug Reviews
Designed an end-to-end NLP pipeline using Python (Pandas, NumPy, and Scikit-learn) to process 10,000+ patient drug reviews, extracting key side effects and sentiment indicators.
Built and optimized classification models (Logistic Regression, Random Forest, XGBoost) with GridSearchCV/RandomizedSearchCV, achieving 85% accuracy in predicting patient conditions.
Automated preprocessing workflows (tokenization, lemmatization, TF-IDF, sentiment scoring), cutting manual data cleaning by 40% and enabling scalable clinical insights.
Integrated demographic and health data to enrich predictions and deployed REST APIs via Flask for real-time drug effectiveness checks by healthcare practitioners.
Delivered actionable insights through interactive dashboards (Power BI, Tableau) and structured reports, improving treatment personalization and clinical decision-making.
Stock Market Analysis for Reliance Industries
Analyzed 8 years of stock data via Python & Yahoo Finance API, uncovering historical trends to guide trading strategies.
Developed ARIMA, LSTM, and Random Forest forecasting models enriched with technical indicators (MACD, RSI, and Bollinger Bands), producing robust short-term price forecasts.
Automated data pipelines for continuous updates and integrated real-time dashboards in Streamlit, enabling investors to act on live forecasts.
Designed portfolio optimization and quantitative trading algorithms, balancing returns with risk in volatile markets.
Delivered detailed risk & performance reports, helping stakeholders evaluate scenarios and make data-driven investment decisions.
Book Recommendation System with Chatbot Integration
Built a hybrid recommendation engine combining collaborative filtering and NLP-based content models to deliver personalized book suggestions.
Integrated a conversational AI chatbot for seamless discovery, with A/B tested dialogue flows that boosted user satisfaction.
Deployed the solution via Flask & Streamlit with a PostgreSQL-backed preference database for adaptive learning of user tastes.
Leveraged external APIs for real-time metadata enrichment and presented engagement insights through interactive dashboards.
Enhanced UX by analyzing behavior metrics and implementing improvements, increasing user retention by 30%.