The strange case of reproducibility versus represe

Authors: Thaer Samar Alejandro Bellogín Arjen P de Vries

Publish Date: 2015/12/28

Volume: 19, Issue: 3, Pages: 230-255


Abstract

The most common approach to measuring the effectiveness of Information Retrieval systems is by using test collections. The Contextual Suggestion (CS) TREC track provides an evaluation framework for systems that recommend items to users given their geographical context. The specific nature of this track allows the participating teams to identify candidate documents either from the Open Web or from the ClueWeb12 collection, a static version of the web. In the judging pool, the documents from the Open Web and the ClueWeb12 collection are distinguished. Hence, each system submission should be based on only one resource: either the Open Web (identified by URLs) or ClueWeb12 (identified by ids). To achieve reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of CS systems, but it has been found that the systems that build their suggestion algorithms on top of input taken from the Open Web consistently achieve higher effectiveness. Because most of the systems take a rather similar approach to making contextual suggestions, this raises the question of whether systems built by researchers on top of ClueWeb12 are still representative of those that would work directly on industry-strength web search engines. Do we need to sacrifice reproducibility for the sake of representativeness? We study the difference in effectiveness between Open Web systems and ClueWeb12 systems by analyzing the relevance assessments of documents identified from both the Open Web and ClueWeb12. Then, we identify documents that overlap between the relevance assessments of the Open Web and ClueWeb12, observing a dependency between relevance assessments and the source of the document being taken from the Open Web or from ClueWeb12. After that, we identify documents from the relevance assessments of the Open Web which exist in the ClueWeb12 collection but do not exist in the ClueWeb12 relevance assessments. We use these documents to expand the ClueWeb12 relevance assessments. Our main findings are twofold. First, our empirical analysis of the relevance assessments of 2 years of the CS track shows that Open Web documents receive better ratings than ClueWeb12 documents, especially if we look at the documents in the overlap. Second, our approach for selecting candidate documents from the ClueWeb12 collection based on information obtained from the Open Web makes an improvement step towards partially bridging the gap in effectiveness between Open Web and ClueWeb12 systems, while at the same time we achieve reproducible results on a well-known, representative sample of the web.

Recommender systems aim to help people find items of interest from a large pool of potentially interesting items. The users’ preferences may change depending on their current context, such as the time of the day, the device they use, or their location. Hence, those recommendations or suggestions should be tailored to the context of the user. Typically, recommender systems suggest a list of items based on users’ preferences. However, awareness of the importance of context as a third dimension beyond users and items has increased, for recommendation (Adomavicius and Tuzhilin 2011) and search (Melucci 2012) alike. The goal is to anticipate users’ context without asking them, as stated in The Second Strategic Workshop on Information Retrieval (SWIRL 2012) (Allan et al. 2012): “Future information retrieval systems must anticipate user needs and respond with information appropriate to the current context without the user having to enter a query”. This problem is known as contextual suggestion in Information Retrieval (IR) and context-aware recommendation in the Recommender Systems (RS) community.

The TREC Contextual Suggestion (CS) track, introduced in 2012, provides a common evaluation framework for investigating this task (Dean-Hall et al. 2012). The aim of the CS task is to provide a list of ranked suggestions, given a location as the current user context and past preferences as the user profile. The public Open Web was the only source for collecting candidate documents in 2012. Using APIs based on the Open Web, either for search or recommendation, has the disadvantage that the end-to-end contextual suggestion process cannot be examined in all detail, and that reproducibility of results is at risk (Hawking et al. 2001, 1999). To address this problem, starting from 2013, participating teams were allowed to collect candidate documents either from the Open Web or from the ClueWeb12 collection.

In the 2013 and 2014 editions of the CS track, there were more submissions based on the Open Web than on the ClueWeb12 collection. However, to achieve reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of contextual suggestion systems. It has been found that the systems that build their suggestion algorithms on top of input taken from the Open Web consistently achieve higher effectiveness than systems based on the ClueWeb12 collection. Most of the existing works have relied on public tourist APIs to address the contextual suggestion problem. These tourist sites (such as Yelp and Foursquare) are specialized in providing tourist suggestions; hence, those works are focused on re-ranking the resulting candidate suggestions based on user preferences. Gathering suggestions (potential venues) from the ClueWeb12 collection has indeed proven a challenging task. First, suggestions have to be selected from a very large collection. Second, these documents should be geographically relevant (the attraction should be located as close as possible to the target context), and they should be of interest for the user.
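
The expansion step summarized in the abstract (taking Open Web judgments for pages that also exist in ClueWeb12 but are missing from the ClueWeb12 relevance assessments) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the in-memory layout of the assessments, the `normalize_url` rule, the `url_to_clueweb_id` lookup table, and the function names are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout: (profile id, context id, document id) -> rating.
# Open Web assessments use URLs as document ids; ClueWeb12 ones use ClueWeb12 ids.
Assessments = Dict[Tuple[str, str, str], int]


def normalize_url(url: str) -> str:
    """Crude URL normalization (an assumption, not the paper's rule)."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def expand_clueweb_assessments(
    open_web: Assessments,
    clueweb: Assessments,
    url_to_clueweb_id: Dict[str, str],
) -> Assessments:
    """Add Open Web judgments for pages that exist in ClueWeb12 but are
    absent from the ClueWeb12 assessments. Existing ClueWeb12 judgments
    are never overwritten. Keys of url_to_clueweb_id are assumed to be
    already normalized with normalize_url."""
    expanded = dict(clueweb)
    for (profile, context, url), rating in open_web.items():
        clueweb_id = url_to_clueweb_id.get(normalize_url(url))
        if clueweb_id is None:
            continue  # page is not part of the ClueWeb12 collection
        key = (profile, context, clueweb_id)
        if key not in expanded:
            expanded[key] = rating
    return expanded


if __name__ == "__main__":
    # Made-up ids and ratings, for illustration only.
    open_web = {("u1", "c1", "http://Example.com/a/"): 3}
    clueweb: Assessments = {}
    mapping = {"example.com/a": "doc-A"}
    print(expand_clueweb_assessments(open_web, clueweb, mapping))
```

Keeping existing ClueWeb12 judgments untouched is a deliberate choice in this sketch, since the abstract reports that ratings depend on which source a document was taken from.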
