Evaluating LLM Reliability: STED Framework for Structured Outputs
This research evaluates the reliability of structured outputs generated by Large Language Models (LLMs), addressing key challenges such as JSON output evaluation and scoring consistency. It introduces STED, a novel framework that combines tree structures, the Hungarian algorithm, and normalization techniques to produce multi-level consistency scores. The framework is validated experimentally on synthetic datasets across diverse models, analyzing both schema and content variations. Findings highlight the...
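The core idea of tree-based scoring with optimal element matching can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the STED implementation: it compares two parsed JSON values recursively, normalizing dict scores by the union of keys, and matches list elements one-to-one. For brevity it brute-forces the optimal matching over permutations, which for small lists yields the same optimum that the Hungarian algorithm computes in polynomial time.

```python
import itertools
import json

def similarity(a, b):
    """Recursive similarity score in [0, 1] between two parsed-JSON values."""
    if isinstance(a, dict) and isinstance(b, dict):
        keys = set(a) | set(b)
        if not keys:
            return 1.0
        # Keys missing on either side contribute 0; score is normalized
        # by the size of the key union.
        shared = sum(similarity(a[k], b[k]) for k in keys if k in a and k in b)
        return shared / len(keys)
    if isinstance(a, list) and isinstance(b, list):
        if not a and not b:
            return 1.0
        n = max(len(a), len(b))
        shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
        # Optimal one-to-one matching of list elements: brute force here,
        # where the Hungarian algorithm would be used at scale.
        best = 0.0
        for perm in itertools.permutations(range(len(longer)), len(shorter)):
            score = sum(similarity(shorter[i], longer[j]) for i, j in enumerate(perm))
            best = max(best, score)
        return best / n
    return 1.0 if a == b else 0.0

# Example: list order differs, but optimal matching still gives a perfect score.
ref = json.loads('{"name": "Ada", "tags": ["math", "logic"]}')
out = json.loads('{"name": "Ada", "tags": ["logic", "math"]}')
print(similarity(ref, out))  # -> 1.0
```

Because scores are normalized at every level, partial credit propagates upward: a mismatched leaf lowers its parent's score proportionally rather than failing the whole comparison.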