Which metric is used to evaluate the performance of foundation models (FMs) for text summarization tasks?