Monday, August 26, 2019 - 13:30

Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study

Erion Çano (ÚFAL MFF UK)

Data-driven models are now widely used for text summarization, title generation, and similar NLP problems, as well as for many complex problems in other domains. However, most studies in the literature benchmark their proposed models using accuracy scores only, and nothing is known about the data efficiency of those models. I define three data efficiency metrics: data score efficiency, data time deficiency, and overall data efficiency. I also propose a simple scheme that uses these metrics for a more comprehensive evaluation of popular data-driven methods applied to NLP or other types of tasks. The experimental results indicate that, among the tested models, the Transformer is the most data efficient on both text summarization and title generation.