ADCS 2009 Programme


9:00-10:00 Plenary: David Traum
10:00-10:30 Coffee & posters
10:30-12:30
Huizhi Liang, Yue Xu, Yuefeng Li and Richi Nayak. Collaborative Filtering Recommender Systems based on Popular Tags
David Newman, Sarvnaz Karimi and Lawrence Cavedon. External Evaluation of Topic Models
Nicholas Sherlock and Andrew Trotman. Id - Dynamic Views on Static and Dynamic Disassembly Listings
Gavin Shaw, Yue Xu and Shlomo Geva. Interestingness Measures for Multi-Level Association Rules
Hilal Al Maqbali, Falk Scholer, James A. Thom and Mingfang Wu. Do Users Find Looking at Text More Useful than Visual Representations? A Comparison of Three Search Result Interfaces
12:30-2:00 Lunch
2:00-3:00 Plenary: Mark Sanderson. Is this document relevant? Errr it'll do
3:00-3:30 Coffee & posters
3:30-5:30
Chris De Vries, Lance De Vine and Shlomo Geva. Random Indexing K-tree
Andrew Turpin and Falk Scholer. Modelling Disagreement Between Judges for Information Retrieval System Evaluation
Andrew Trotman and David Alexander. University Student Use of the Wikipedia
Tim O'Keefe and Irena Koprinska. Feature Selection and Weighting in Sentiment Analysis
Su Nam Kim, Timothy Baldwin and Min-Yen Kan. The Use of Topic Representative Words in Text Categorization
5:30 Close


Mark Sanderson

"Is this document relevant? Errr it'll do"

Evaluation of search engines is a critical topic in the field of information retrieval. Doing evaluation well allows researchers to quickly and efficiently understand whether their new algorithms are a valuable contribution or whether they need to go back to the drawing board. The modern evaluation methods developed by organizations such as TREC in the US have their origins in research that started in the early 1950s. Almost all of the core components of modern testing environments, known as test collections, were present in that early work. Potential problems with the design of these collections were described in a series of publications in the 1960s, but the criticisms were largely ignored. In the past decade, however, a series of results were published showing potentially catastrophic problems with a test collection's "ability" to predict the way that users will work with searching systems. A number of research teams showed that users given a good system (as measured on a test collection) searched no more effectively than users given a bad one.

In this talk, I will briefly outline the history of search evaluation, before detailing the work that found problems with test collections. I will then describe some pioneering but relatively overlooked research which pointed out that the key problem for researchers isn't how to measure searching systems accurately, but how to accurately measure people.