AWS AI Practitioner
A company is evaluating several large language models (LLMs) for a text summarization task. The company needs to select a metric to evaluate the quality of the summaries that the LLMs generate. Which metric will meet this requirement?
A
Recall
B
Area under the ROC curve (AUC)
C
Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
✓ Correcta
D
Mean squared error (MSE)