Evaluating LLM Reliability: STED Framework for Structured Outputs