Large language models (LLMs) have achieved remarkable performance on a wide range of natural language processing tasks. Scientific text summarization is a particularly complex task because of the specialized nature of scientific content. Evaluating LLMs on this task requires carefully constructed benchmarks and assessment tools. Several studies