Yogendra Chandra Hasan Karri
+* (***) *** - **** | **************@*****.*** | Jersey City, NJ | LinkedIn
PROFESSIONAL SUMMARY
I am a results-driven, detail-oriented Data Analyst with a strong foundation in data engineering, cloud platforms, and advanced analytics, with hands-on experience from roles at Cigna and HDFC ERGO General Insurance. I build end-to-end ETL pipelines, real-time streaming solutions, and predictive models. I have worked with tools such as Talend, Apache Kafka, Azure Data Factory, Snowflake, and Python to deliver data-driven insights that improve operational efficiency, detect fraud, and strengthen decision-making. My expertise spans data governance, visualization with Tableau and Power BI, and applying machine learning techniques to produce actionable outcomes. With a proven track record of optimizing healthcare and insurance analytics, I combine technical proficiency with a problem-solving mindset, ensuring data solutions are accurate, secure, and aligned with business goals. Certified in Azure Data Engineering and Power BI, I bring the technical depth and business acumen needed to bridge the gap between complex data systems and strategic insights.
TECHNICAL SKILLS
Data Engineering & ETL: Talend, Apache Kafka, Azure Data Factory, SSIS, HL7 Parsing, Alteryx, REST API Integration, Apache Atlas, Collibra
Databases & Data Warehousing: SQL Server, Snowflake, Azure SQL Database, Amazon S3, Azure Blob Storage
Programming & Scripting: Python (Pandas, NumPy, Scikit-learn, NLTK), R, Jupyter Notebook, Power Query
Big Data & Cloud Computing: Apache Spark, AWS Lambda, Azure ML Studio, Amazon S3, Azure Cloud Services
Business Intelligence & Visualization: Tableau, Power BI, Excel, SSRS
Machine Learning & Analytics: Predictive Modeling, Statistical Analysis, XGBoost, CNN, LSTM, Sentiment Analysis
Data Quality & Governance: Great Expectations, Data Cataloging, HIPAA Compliance, Data Validation
Collaboration & DevOps: Git, GitHub, Jira, Confluence
EXPERIENCE
Cigna Jan 2024 – Present
Data Analyst New York City
• Designed and optimized complex SQL Server queries integrated with Talend ETL workflows to consolidate patient, hospital, and insurance claim data into Snowflake for unified analytics.
• Leveraged Python (Pandas) with Apache Spark to process millions of claim records in near real time, enabling faster fraud detection and cost analysis.
• Built interactive Tableau and Power BI dashboards connected to Snowflake, delivering executive-level KPIs on patient outcomes, hospital efficiency, and cost reduction.
• Developed HL7-compliant data ingestion pipelines using Talend and Apache Kafka, ensuring standardized and secure transfer of healthcare data across multiple hospital systems.
• Utilized Scikit-learn alongside R statistical models to predict patient readmission probabilities and forecast disease trends, driving proactive care management.
• Implemented AWS Lambda with Amazon S3 for serverless processing and secure storage of large-scale medical reports and lab results, improving data retrieval efficiency.
• Automated repetitive data preparation tasks in Alteryx and integrated outputs into Tableau for real-time visualization of emergency admissions and claim approvals.
• Applied Great Expectations within Talend ETL flows to validate incoming hospital data, ensuring 100% compliance with quality and format standards before loading into Snowflake.
• Integrated REST APIs with Apache Kafka to stream live patient vitals into dashboards, enabling care teams to respond to critical cases instantly.
• Created scalable big data pipelines in Apache Spark with AWS Lambda triggers to handle spikes in claim processing without impacting system performance.
• Designed and maintained a centralized data catalog in Apache Atlas, linking datasets from SQL Server, Amazon S3, and Snowflake for faster analyst self-service.
• Implemented Collibra governance policies to control user access to sensitive health data, ensuring HIPAA compliance and audit readiness.
• Developed multi-source ETL frameworks in Talend to merge historical claims data from SQL Server with live Kafka streams for comprehensive trend analysis.
• Built predictive fraud detection models in Scikit-learn and deployed outputs to Power BI dashboards for the insurance risk team.
• Collaborated through GitHub for code version control and Jira for tracking dashboard iterations and cross-team task dependencies, reducing delivery delays.
• Orchestrated end-to-end healthcare data processing combining HL7 parsing, Talend transformation, and Snowflake warehousing for consistent analytics.
• Conducted advanced statistical analysis in R and visualized findings in Tableau to help hospital administrators improve recovery times and service quality.
• Designed streaming analytics architecture using Apache Kafka, AWS Lambda, and Tableau to provide near real-time monitoring of ICU patient metrics.
HDFC ERGO General Insurance Aug 2021 – Mar 2023
Data Analyst Mumbai, India
• Developed and maintained ETL workflows using Azure Data Factory and SSIS to automate data ingestion from multiple sources into Azure SQL Database for claim analysis.
• Cleaned and transformed large insurance datasets using Python (Pandas, NumPy) and Power Query, producing 98% clean, structured data for downstream analytics.
• Built interactive dashboards in Power BI and Excel to visualize claim trends, fraud hotspots, and customer demographics, helping reduce reporting time by 40%.
• Analyzed claim patterns and customer behavior using Python and Jupyter Notebook, surfacing unusual claim spikes across regions.
• Supported fraud detection efforts by preparing training and testing datasets for Scikit-learn models in Azure ML Studio, improving model accuracy to 87%.
• Integrated structured and unstructured data sources using Azure Data Factory and stored them in Azure Blob Storage for scalable and secure access.
• Created real-time reporting dashboards in Power BI using live connections to Azure SQL Database, enabling proactive monitoring of high-value claims.
• Performed statistical analysis using Python and R (basic level) to identify correlations between policyholder behavior and claim submission frequency.
• Developed rule-based and predictive analytics using Python and Scikit-learn to flag potential duplicate or suspicious claims for manual review.
• Utilized Git and GitHub for version control and collaboration, ensuring seamless teamwork on data pipelines and analytics scripts.
• Documented analytical processes, findings, and fraud detection patterns in Confluence while managing project deliverables and sprint tasks in Jira.
• Automated the extraction and transformation of policyholder and agent data using SSIS and Azure Data Factory, reducing manual processing effort by 60%.
• Conducted deep-dive analysis of customer segmentation and claim types in Jupyter Notebook and visualized key metrics in Power BI for strategic planning.
• Collaborated with cross-functional teams to identify fraud indicators by combining insights from Python analytics with business rules defined in SQL.
• Created custom SSRS reports and Excel dashboards for claims operations teams to highlight high-risk profiles and claim frequency trends.
• Enabled large-scale data storage and access by designing Azure Blob Storage structures integrated with analytics workflows in Azure ML and Power BI.
ACHIEVEMENTS
• Predicted high-risk hospitals with 90% accuracy using machine learning models built in Python with Scikit-learn and XGBoost, enabling Stryker to proactively address equipment issues, reduce maintenance costs by $2 million, and improve client satisfaction.
• Played a pivotal role in reducing fraudulent insurance claim approvals by 35% by preparing high-quality datasets using Python and SQL, enabling the successful deployment of predictive fraud detection models in Azure ML Studio.
ACADEMIC PROJECT
Project Title: Sentiment Analysis in Customer Reviews Using Machine Learning (Tech Stack: Python, Scikit-learn, Pandas, NLTK, Jupyter Notebook, Matplotlib)
Project Description:
• Built a machine learning model to analyze sentiment in Amazon customer reviews using both supervised and unsupervised learning, enabling businesses to gain insights for improving products and services.
Project Title: Enhancing Visual Question Answering with Hybrid Deep Learning Models (Tech Stack: Python, TensorFlow, Keras, OpenCV, CNN, LSTM)
Project Description:
• Developed a Visual Question Answering system combining CNNs for image understanding and LSTMs for question processing to generate accurate, context-aware responses.
CERTIFICATIONS
• Microsoft Certified: Azure Data Engineer Associate
• Microsoft Certified: Power BI Data Analyst Associate (PL-300)
EDUCATION
Master of Science in Data Science, New Jersey Institute of Technology
Bachelor of Technology in Computer Science, SRM Institute of Technology