The SubSift Services project will decompose SubSift, an innovative “submission sifting” application developed at the University of Bristol, into a collection of lightweight REST services designed to support not only peer review but also personalised data discovery and mashups. As a demonstrator of the utility of these services, the full functionality of the original SubSift software will be recreated and extended as a simple mashup in Yahoo! Pipes. Importantly, while the services and demonstrator will be useful in their own right, they will also allow the project to promote a community dialogue on open standards-based integration of such services into existing peer review support tools, e.g. conference management systems.
Peer review of written works is an essential pillar of the academic process, providing the central quality control and feedback mechanism for submissions to conferences, journals and funding bodies across a wide range of disciplines. However, from the perspective of a busy conference chair, journal editor or funding manager, identifying the most appropriate reviewer for a given submission is a non-trivial and time-consuming task. Effective assignment, first and foremost, requires a good match to be made between the subject of the submission and the expertise of reviewers drawn from a potentially large pool of candidates. In the case of conferences, a recent trend transfers much of this allocation work to the reviewers themselves, giving them access to the full range of submissions and asking them to bid on submissions they would like to review. Their bids are then compared to inform the allocation decisions of the programme committee chair. It was the challenge of just such a bidding process that motivated Professor Flach, co-chair of a major international data mining conference (KDD’09), to develop the original SubSift software upon which our own proposal is based.
During the peer review process for KDD’09, SubSift aided in the allocation of over 500 submitted research papers to about 200 reviewers, so that each paper had 3 reviewers and each reviewer had approximately 9 papers to review. The software supported the following three sub-tasks of the allocation process.
- **Matching Submissions to Reviewers:** Each reviewer’s bids were initialised based on subject areas as well as a textual comparison between each paper’s abstract and the titles and abstracts of the reviewer’s publications as listed in the DBLP bibliographic database. The comparison employed techniques from information retrieval, data search and data mining. The initial bids were exported from SubSift and imported into the conference management system adopted by KDD’09.
- **Ranking Potential Assignments:** Each reviewer was sent an email containing a link to a personalised, SubSift-generated web page listing details of all 500 papers, ordered by a choice of initial bid allocation, keyword matches, or similarity to the reviewer’s own published works. The page also listed the keywords extracted from their own publications and from each of the submitted papers. Guided by this personalised perspective, plus the usual titles and abstracts, reviewers affirmed or revised the bids recorded in the conference management system.
- **Allocation Decision-Making:** After all reviewer bids were submitted, SubSift produced a proposed assignment of papers to reviewers as a starting point for the programme committee chair in making the final allocation decisions. In addition to constraints such as 3 reviewers per paper and declared conflicts of interest, the method took into account the relative distribution of bids per reviewer and bids per paper in an effort to increase fairness.
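The textual comparison underpinning the matching step can be sketched, under the common assumption of TF-IDF term weighting and cosine similarity, as follows. This is a minimal illustration with toy token lists standing in for DBLP abstracts, not SubSift's actual implementation.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """TF-IDF vectors (as sparse dicts) for a list of token lists."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n / df[term])
                        for term, count in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm = (math.sqrt(sum(w * w for w in u.values())) *
            math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Toy data: one reviewer profile and two submission abstracts, tokenised.
reviewer = ["graph", "mining", "kernels"]
submissions = {"paper1": ["frequent", "graph", "mining"],
               "paper2": ["neural", "text", "classification"]}

vecs = tf_idf_vectors([reviewer] + list(submissions.values()))
scores = {pid: cosine(vecs[0], v) for pid, v in zip(submissions, vecs[1:])}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Ranking submissions by this score for each reviewer yields the initial bid list; in practice the reviewer profile would be built from the titles and abstracts harvested from DBLP.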
Given the difficulty of manually assigning submissions to reviewers, the automation of this process has itself been the subject of prior research in various communities, but the application of SubSift in KDD’09 moves this work from the lab and into practice. To quantitatively evaluate SubSift’s performance, the bids made by KDD’09 reviewers were considered to be the “correct assignments” against which SubSift’s automated assignments were compared. On that basis, an average of 58% of the papers recommended by SubSift were subsequently included in the reviewers’ own bids (precision). Furthermore, an average of 69% of the papers that reviewers bid on were ones initially recommended to them by SubSift (recall). These precision and recall figures are comparable with those published for simulated problems in the literature, and are all the more impressive for being obtained on real-world data in a practical setting.
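The two figures can be made concrete with a hypothetical single reviewer; the paper sets below are invented for illustration, and the 58%/69% values reported above are averages over all ~200 KDD’09 reviewers.

```python
# Hypothetical data for one reviewer, for illustration only.
recommended = {"p1", "p2", "p3", "p4", "p5"}  # papers SubSift suggested
bids = {"p2", "p3", "p5", "p8"}               # papers the reviewer bid on

overlap = recommended & bids
precision = len(overlap) / len(recommended)  # suggestions the reviewer took up
recall = len(overlap) / len(bids)            # bids that had been suggested
```

Here precision is 3/5 = 0.6 and recall is 3/4 = 0.75; averaging these per-reviewer ratios across the whole programme committee gives the headline figures.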
Despite this clear success, there are a number of obstacles that prevent SubSift being re-used for other conferences, let alone for novel peer review applications or for personalised mashups.
- The software is not packaged up as an Open Source application.
- Usage is via the command line only; there is no web or graphical user interface.
- Customisation currently requires Java programming skills.
- The list of potential reviewers must be supplied in a bespoke file format.
- DBLP is the only bibliographic data source supported.
- Submission text acquisition is hard-coded to come from a single data source.
- The software enforces a specific workflow; it is not possible to “pick and mix” the most useful features on an application-by-application basis.
- In its current form, dissemination beyond the immediate research community is unlikely.
The SubSift Services project will remove these obstacles to wider application of the innovative components currently locked inside the original SubSift software. Removing them will also contribute to a community dialogue on integrating such technology into peer review support tools, such as conference management systems.
The following project deliverables will be produced within the duration of the funding:
- Software implementing SubSift Services as a collection of lightweight REST services, usable individually or in mashups, to support: matching reviewers with submissions, ranking potential assignments, and allocation decision-making.
- Publicly accessible deployment of SubSift Services hosted on the ILRT web site for a period of at least one year after the end of the project.
- Demonstrator conference peer review support workflow on Yahoo! Pipes using the above services in combination with other data sources, such as CiteSeer, Google Scholar, EPrints, news feeds and blogs.
- Demonstrator personalised research interests FOAF metadata generator on Yahoo! Pipes using the above services in combination with a bibliographic data source.
- Source code, developer and user documentation publicly available under an open source license on the Google Code site.
- Evaluation report on the use of the conference peer review demonstrator at a forthcoming workshop being organised by the Department of Computer Science, University of Bristol.
- Promotion of SubSift Services at national events and by direct communication with relevant organisations, such as the National Centre for Text Mining (NaCTeM).
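Since the deliverables centre on lightweight REST services consumable from mashup tools such as Yahoo! Pipes, the sketch below shows the kind of HTTP request a client workflow might construct. The host, resource names and parameters are hypothetical placeholders; the actual API will be defined and documented by the project.

```python
from urllib.parse import urlencode

# Hypothetical request a mashup might issue against a SubSift-style
# REST service: match a folder of reviewer profiles against a folder
# of submissions and return the top matches as JSON. All names below
# are illustrative placeholders, not the project's published API.
base = "http://subsift.example.org/match.json"
params = {"profiles": "kdd09-reviewers",
          "documents": "kdd09-submissions",
          "limit": "3"}
url = base + "?" + urlencode(params)
```

Returning results in a web-friendly format such as JSON or XML is what would let a Pipes workflow chain the matching service together with other data sources without any bespoke client code.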
The SubSift Services project is being undertaken by the Institute for Learning and Research Technology in collaboration with the Machine Learning and Biological Computation group at the Department of Computer Science, University of Bristol. The project will run from June-November 2009.
Update: The project is now running through to January 2010.