Meghana G
*******************@*****.*** 832-***-**** LinkedIn Texas, USA (Open to Relocate)
Professional Summary
Data Analyst and Engineer with 4+ years of experience building ETL/ELT pipelines, cloud data warehouses, and BI dashboards across insurance, healthcare, and technology domains. Proficient in Python, SQL, Snowflake, Apache Spark, and dbt across AWS and Azure environments. Proven ability to translate large-scale datasets into predictive models and executive dashboards that directly improve operational efficiency and business decisions.
Technical Skills
• Languages: Python, SQL, PySpark
• Data Engineering: Apache Spark, ETL/ELT Pipelines, dbt, Apache Airflow, Apache Kafka, Data Modeling, Data Warehousing
• Cloud Platforms: Snowflake, BigQuery, AWS (S3, Lambda, Redshift), Azure Data Lake, Azure Databricks
• Databases: MySQL, SQL Server, PostgreSQL
• BI & Visualization: Power BI, Tableau, Advanced Excel (VBA, Power Query)
• Machine Learning: Scikit-learn, Predictive Modeling, A/B Testing
• Tools: Git, Docker, JIRA
Experience
Hartford Financial Services Group, USA Jan 2025 – Present
Data Analyst
• Processed 5M+ insurance policy and claims records using Python, SQL, and Snowflake, building a unified risk segmentation model that improved segmentation accuracy by 18%.
• Orchestrated ETL workflows using AWS S3, Lambda, and SQL Server to consolidate structured and semi-structured claims data into Snowflake and BigQuery, reducing manual reporting effort by 35% and freeing 8+ analyst hours weekly.
• Implemented Random Forest and Decision Tree models using Python and Scikit-learn to forecast claim volumes and premium trends, lowering quarterly forecast variance by 22%.
• Designed Tableau and Power BI dashboards tracking underwriting KPIs, loss ratios, and policy performance, enabling leadership to identify risk trends and make decisions 30% faster.
• Applied NLP techniques using Python and spaCy on 200K+ customer complaints and adjuster notes, classifying claims into 12 risk categories and improving categorization efficiency by 28%.
Alephys LLC, USA Jan 2023 – Jul 2023
Data Engineer
• Migrated legacy Hadoop-based batch pipelines to Cloudera Data Platform (CDP) using Apache Spark and Python, reducing infrastructure costs by 30% and cutting average batch processing time from 6 hours to under 2 hours.
• Engineered ingestion pipelines on the migrated Cloudera and Snowflake stack using Python, Apache Spark, Apache Kafka, and AWS Lambda, processing 100M+ transactional records daily and reducing data processing costs by 25% over the legacy Hadoop infrastructure.
• Reduced end-to-end pipeline latency by 40% post-migration, enabling analytics teams to access same-day data instead of waiting on overnight Hadoop batch cycles.
• Optimized dbt transformation models and SQL queries across the Snowflake reporting stack, reducing BI dashboard load time from 8 seconds to under 3 seconds and cutting compute costs by 32%.
• Built microservices-based ingestion APIs using Python and AWS Lambda, serving 500+ daily API requests and reducing end-to-end data latency by 40%, enabling product and marketing teams to launch campaigns 30% faster.
• Delivered Tableau dashboards on standardized Snowflake data models tracking product adoption, funnel drop-off, and marketing ROI, reducing team decision cycle time by 25% and enabling weekly revenue planning.
Fusion Software Technologies, India Jul 2020 – Dec 2022
Data Analyst
• Analyzed 2M+ EHR records, patient demographics, and treatment histories using Python and SQL to surface clinical patterns, improving readmission prediction accuracy by 18%.
• Constructed ETL frameworks using Python, MySQL, Azure Data Lake, and Snowflake to consolidate data from 5+ disconnected hospital source systems, reducing integration time by 40% and enabling clinical teams to access unified patient data for care decisions.
• Developed Power BI dashboards with Snowflake direct query for 10+ hospital departments tracking bed occupancy, readmission rates, and discharge KPIs, cutting staffing decision time by 50% for 200+ clinical staff.
• Reduced Power BI reporting turnaround from 4 days to 2 days by scheduling automated Snowflake refresh jobs via Python, enabling department leads to adjust staffing and resource allocation same-day.
• Automated 20+ weekly reports across 5 departments using Python and Advanced Excel (VBA, Power Query), cutting turnaround from 10 hours to under 7 and saving 15+ analyst hours weekly across departments.
Projects
Hospital Readmission Risk Pipeline Python Airflow Snowflake dbt Power BI Docker 2024
• Identified that hospital care teams had no automated early-warning system for high-risk patients, leading to reactive interventions and avoidable readmissions; built an end-to-end pipeline ingesting daily EHR batch feeds into Snowflake via Airflow DAGs with automated failure alerting and retry logic.
• Transformed raw clinical records using dbt into a clean analytics mart across diagnosis, vitals, and discharge tables, then trained a Gradient Boosting classifier on 15 engineered features achieving 82% prediction accuracy.
• Surfaced daily risk scores through a Power BI dashboard showing 30-day readmission trends per department, enabling care teams to prioritize high-risk patients for intervention and reducing avoidable readmissions by 12%.
E-Commerce Sales Analytics Pipeline Python SQL dbt Tableau 2023
• Solved a fragmented data problem where 5+ disconnected sales sources gave leadership no unified view of product performance; built automated ELT pipelines using Python, SQL, and dbt, consolidating 1M+ monthly records into a single Snowflake analytics layer.
• Conducted A/B analysis using Python and pandas on promotional campaign data, identifying 2 high-converting customer segments and integrating findings into Tableau dashboards used weekly by the marketing team to guide spend allocation.
• Enabled leadership to reallocate budget to top-performing product lines based on real-time revenue insights, improving sales forecast accuracy by 18% and reducing manual reporting workload by 40%.
Credit Risk Prediction & Analytics Dashboard Python SQL Scikit-learn Tableau 2023
• Addressed the challenge of inconsistent manual credit reviews, where loan officers relied on subjective judgment with no data-driven scoring model; evaluated 50K+ loan applications using Python and SQL to engineer risk features across credit history, income bands, and delinquency patterns.
• Built a Logistic Regression classifier achieving 85% prediction accuracy, validated on a held-out test set, and integrated risk scores into a Tableau dashboard enabling loan officers to benchmark applicants against portfolio risk thresholds.
• Reduced average credit review time by 60% and cut high-risk loan approvals by 20%, giving the lending team a repeatable, auditable framework for credit decisions.
Education
Master of Science – Computer and Information Sciences University of North Texas, USA May 2025
Bachelor of Technology – Computer Science and Engineering Vignan Institute of Technology and Science, India May 2020