Authors: Alberto Baccini, Giuseppe De Nicolao
Publish Date: 2016/03/23
Volume: 108, Issue: 3, Pages: 1651-1671
Abstract
During the Italian research assessment exercise, the national agency ANVUR performed an experiment to assess the agreement between grades attributed to journal articles by informed peer review (IR) and by bibliometrics. A sample of articles was evaluated by using both methods, and agreement was analyzed by weighted Cohen's kappas. ANVUR presented the results as indicating an overall "good" or "more than adequate" agreement. This paper re-examines the experiment results according to the available statistical guidelines for interpreting kappa values, showing that the degree of agreement (always in the range 0.09–0.42) has to be interpreted, for all research fields, as unacceptable, poor, or, in a few cases, at most fair. The only notable exception, confirmed also by a statistical meta-analysis, was a moderate agreement for economics and statistics (Area 13) and its subfields. We show that the experiment protocol adopted in Area 13 was substantially modified with respect to all the other research fields, to the point that the results for economics and statistics have to be considered fatally flawed. The evidence of a poor agreement supports the conclusion that IR and bibliometrics do not produce similar results, and that the adoption of both methods in the Italian research assessment possibly introduced systematic and unknown biases in its final results. The conclusion reached by ANVUR must be reversed: the available evidence does not justify the joint use of IR and bibliometrics within the same research assessment exercise.
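For readers unfamiliar with the statistic, the sketch below (ours, not ANVUR's or the authors') illustrates how a weighted Cohen's kappa between two sets of ordinal grades can be computed and then read against one widely cited guideline scale (Landis and Koch, 1977). The grade labels and data are hypothetical, and scikit-learn's cohen_kappa_score is assumed to be available.

    # Minimal sketch: weighted Cohen's kappa between two graders of the
    # same articles. Grades and data below are illustrative assumptions,
    # not the experiment's actual data.
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical merit grades assigned to ten articles by informed
    # peer review and by the bibliometric algorithm.
    peer_review  = ["A", "B", "B", "C", "D", "A", "C", "B", "D", "C"]
    bibliometric = ["A", "C", "B", "D", "D", "B", "C", "C", "D", "B"]

    # Linear weights penalize disagreements in proportion to their
    # distance on the ordinal scale fixed by the labels order.
    kappa = cohen_kappa_score(peer_review, bibliometric,
                              labels=["A", "B", "C", "D"],
                              weights="linear")

    # Landis & Koch (1977) interpretation bands, one of the published
    # guidelines for reading kappa values.
    if kappa < 0.00:
        verdict = "poor"
    elif kappa <= 0.20:
        verdict = "slight"
    elif kappa <= 0.40:
        verdict = "fair"
    elif kappa <= 0.60:
        verdict = "moderate"
    elif kappa <= 0.80:
        verdict = "substantial"
    else:
        verdict = "almost perfect"

    print(f"weighted kappa = {kappa:.2f} ({verdict})")

Under such a scale, the kappas reported in the paper (0.09–0.42) fall almost entirely in the "slight" to "fair" bands, which is the basis for the re-interpretation argued in the abstract.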