A Sample SubSift Workflow

The original SubSift software was used to support the allocation of papers to reviewers for KDD09, a large data mining conference. An important deliverable of the JISCRI SubSift Services project is a recreation of this original SubSift workflow using the new software. For pragmatic reasons, the original workflow relied on locally installed applications, command-line utilities and various manual processes. However, after consulting potential users, it became clear that the workflow needed to be migrated to a hosted web-based application. Doing so would avoid the need for local software installation and allow the creation of an easy-to-use, self-documenting, wizard-like user interface.

subsift tools

So, with formative feedback from Peter Flach, KDD09 programme co-chair, and from Bart Goethals, SDM10 programme co-chair, a simple web interface was created to implement the workflow and this was then used to support SDM10. This sample web-based workflow, which we call SubSift Tools, is described below. A later post will describe user feedback and suggested improvements arising from its use supporting the peer review process of SDM10.

SubSift Tools currently consists of three tools, listed on the site’s landing page under three headings:

  • A. Reviewer Profile Builder
  • B. Paper Profile Builder
  • C. Profile Matcher

The user is instructed to use tools A and B to build profiles of reviewers and papers respectively, and then to use tool C to compare pairs of these profiles. Tool C generates personalised web pages listing papers ranked by similarity to each reviewer’s profile and, optionally, initial bid assignments for each of the reviewers.

A. Reviewer Profile Builder is a five-step process that allows the user to build collections of reviewer profiles based on the reviewers’ DBLP author bibliography pages. DBLP is a computer science bibliographic database website that is used in this sample SubSift workflow as the source of information about a reviewer’s published works. All the user needs to get started is a list of programme committee (PC) member names. SubSift Tools then searches DBLP for matching authors, presents the user with a page allowing them to disambiguate between authors with similar names, fetches the DBLP author page for each PC member and finally builds a profile for each author.

subsift tools reviewer profile builder

Optionally, SubSift Tools can display the profile details: a list of keywords extracted from author publication titles ordered by their discriminating power – i.e. terms that most closely identify the works of this PC member, as compared to the works of the other PC members.

subsift tools reviewer-paper match report

The above example shows this optional profile details view for reviewer Luc De Raedt. It is not necessary to view these details in order to use SubSift Tools, but the profile information can be useful in itself. At present the details shown are the term itself (i.e. keyword), the term count (#, i.e. how many times the term occurs within this reviewer’s bibliography), term frequency (tf, i.e. how frequent the term is within this reviewer’s bibliography), inverse document frequency (idf, i.e. how infrequent the term is within the bibliographies of all the reviewers), and “tfidf” (i.e. the product of tf and idf, which scores most highly the terms that are most discriminating for this reviewer with respect to the other reviewers). For the mathematically minded, further details of the calculation of these commonly used information retrieval statistics are available here. Again, it should be noted that there is no need to understand the calculation of, or even look at, these statistics in order to use SubSift Tools.
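For readers who do want to see how the four columns relate, here is a minimal sketch in Python. It assumes one common formulation (tf as the term count divided by the total number of terms in the bibliography, and idf as the natural logarithm of the number of reviewers divided by the number of reviewers whose bibliography contains the term). The exact formulas used by SubSift may differ, and the function name and the toy reviewer data are purely illustrative.

```python
import math
from collections import Counter


def tfidf_profiles(docs):
    """Map each name to {term: (count, tf, idf, tfidf)}.

    `docs` maps a reviewer name to the list of terms in their bibliography.
    The tf and idf formulas are assumptions, not SubSift's confirmed ones.
    """
    n_docs = len(docs)
    counts = {name: Counter(terms) for name, terms in docs.items()}

    # Document frequency: in how many bibliographies does each term occur?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    profiles = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        rows = {}
        for term, count in term_counts.items():
            tf = count / total
            idf = math.log(n_docs / df[term])
            rows[term] = (count, tf, idf, tf * idf)
        profiles[name] = rows
    return profiles


# Hypothetical toy bibliographies, already reduced to keywords.
reviewers = {
    "reviewer_a": ["logic", "learning", "logic", "mining"],
    "reviewer_b": ["pattern", "mining", "mining", "learning"],
}
profiles = tfidf_profiles(reviewers)

# List reviewer_a's terms by descending tfidf, as in the profile view.
for term, row in sorted(profiles["reviewer_a"].items(),
                        key=lambda item: item[1][3], reverse=True):
    print(term, row)
```

In this toy data "mining" appears in both bibliographies, so its idf (and hence its tfidf) is zero: a term shared by every reviewer does nothing to distinguish one reviewer from another.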

B. Paper Profile Builder is a two-step process that allows the user to build collections of paper profiles based on the titles and abstracts of submitted papers. The submitted paper details must be uploaded as a CSV file consisting of a unique identifier, title and abstract for each paper. The user must construct this file themselves or export it from their conference management system. Once the CSV file is uploaded, SubSift Tools builds a profile of each paper and, as with the reviewer profiles, can optionally display the profile details.
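As an illustration of the kind of file involved, the following Python sketch reads a three-column CSV and reduces each paper to a list of lowercase terms. The column order (identifier, title, abstract), the absence of a header row, and the simple letters-only tokenisation are all assumptions made for this example; the precise format SubSift Tools expects is not specified here.

```python
import csv
import io
import re


def read_papers(csv_text):
    """Return {paper_id: [terms]} from CSV rows of (identifier, title, abstract)."""
    papers = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        paper_id, title, abstract = (field.strip() for field in row)
        if paper_id in papers:
            raise ValueError(f"duplicate paper identifier: {paper_id}")
        # Naive tokenisation: runs of letters, lowercased (an assumption).
        papers[paper_id] = re.findall(r"[a-z]+", f"{title} {abstract}".lower())
    return papers


# Hypothetical sample data; quoted fields may contain commas.
sample = (
    '17,"Mining Frequent Patterns","We study pattern mining, at scale."\n'
    '42,"Inductive Logic","Learning logic programs from examples."\n'
)
papers = read_papers(sample)
print(papers["17"])
```

Checking for duplicate identifiers up front is worthwhile because each paper profile is keyed on its identifier.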

C. Profile Matcher allows the user to choose a pair of profile collections, one of reviewers and one of papers. SubSift Tools then compares each reviewer profile against each paper profile to calculate a similarity score (between 0 and 1) for each reviewer-paper pair. The results of the profile matching may be viewed for a single reviewer, listing papers in descending similarity order, or for a single paper, listing reviewers in descending similarity order.
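The similarity measure is not named above. One widely used choice that yields scores between 0 and 1 for non-negative term weights is cosine similarity, and the sketch below assumes it purely for illustration. The profiles are represented as hypothetical term-to-weight dictionaries, and the function names are not part of SubSift.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def rank_papers(reviewer, papers):
    """Return (paper_id, score) pairs in descending similarity order."""
    scored = [(pid, cosine(reviewer, vec)) for pid, vec in papers.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weighted profiles.
reviewer = {"logic": 0.8, "learning": 0.3}
papers = {
    "p1": {"mining": 0.9, "pattern": 0.4},
    "p2": {"logic": 0.7, "programs": 0.2},
    "p3": {"learning": 0.5, "logic": 0.1},
}
ranking = rank_papers(reviewer, papers)
for paper_id, score in ranking:
    print(paper_id, round(score, 3))
```

Ranking reviewers for a single paper is the same computation with the roles swapped: score every reviewer profile against the one paper profile and sort.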

subsift tools view reports form

SubSift Tools also builds static HTML pages for each reviewer and for each paper. Links to these can then be circulated by the programme chair to the PC members to assist them in choosing papers to bid to review. The ranked list of reviewers for each paper is useful to the programme chair when allocating papers on which no reviewers, or relatively few, have bid.

subsift tools reviewer-paper match report

Initial default bid assignments can also be exported from SubSift Tools as a CSV file for importing into a conference management system. In addition, a CSV file containing a single similarity matrix of all reviewers against all papers can be downloaded for convenient viewing in a spreadsheet or for subsequent processing using other applications.
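To give a feel for what such a similarity matrix might look like, the sketch below writes one as CSV with reviewers as rows and papers as columns. The layout, the header label, the three-decimal formatting and the function name are assumptions for illustration only; the actual file produced by SubSift Tools may be organised differently.

```python
import csv
import io


def similarity_matrix_csv(scores, reviewers, papers):
    """Render {(reviewer, paper): score} as CSV text, one row per reviewer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["reviewer"] + list(papers))
    for reviewer in reviewers:
        # Missing pairs are written as 0.000 (an assumption for this sketch).
        cells = [f"{scores.get((reviewer, paper), 0.0):.3f}" for paper in papers]
        writer.writerow([reviewer] + cells)
    return buffer.getvalue()


# Hypothetical scores for two reviewers and two papers.
scores = {
    ("Reviewer A", "p1"): 0.9,
    ("Reviewer A", "p2"): 0.125,
    ("Reviewer B", "p1"): 0.0,
}
text = similarity_matrix_csv(scores, ["Reviewer A", "Reviewer B"], ["p1", "p2"])
print(text)
```

A matrix in this shape opens directly in a spreadsheet, where conditional formatting on the score cells makes strong reviewer-paper matches easy to spot.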