User Feedback on SubSift Tools

SubSift Tools have been used or are being used to help match submitted conference papers to reviewers for the ACM Conference on Knowledge Discovery and Data Mining (KDD10), the SIAM International Conference on Data Mining (SDM10), and the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD10). Previously, the original pre-JISCRI SubSift software was also used to support KDD09. For each of these conferences, the primary user of SubSift Tools is one of the programme committee chairs.

Prof. Qiang Yang – PC Chair, KDD10
Prof. Bart Goethals – PC Chair, SDM10
Prof. Mohammed Zaki – PC Chair, PAKDD10
Prof. Peter Flach – PC Chair, KDD09

In this article we present qualitative feedback and suggestions from these PC chairs, annotated with our own thoughts on each topic raised. In some cases this feedback has led to actual or planned changes to the tools; in other cases, it points to possible directions for further development of SubSift in future projects.

We present each topic, beginning with a quote from one of the PC chairs, in the order in which the comments were received and grouped by person. Our thoughts follow each quote. Occasionally there is a response or follow-on quote.

1. Bart Goethals – Data export

Bart Goethals: this is great, but working with it is fairly slow. Would it be possible to see, for every paper, the top 10-15 reviewers in a CSV file, ordered? Would it be possible to generate one big matrix (Excel file) where the columns are the reviewers, and the rows are the papers, and the content are your similarity scores?

The “this” Bart refers to is the PC chair’s task of allocating papers to reviewers once bids are submitted and, in particular, allocating papers with no or too few bids. In the early version of SubSift Tools that Bart was using there was no caching of reviewer-paper similarity lists, so the similarity calculations were repeated for each reviewer or paper page visited. From the perspective of a single reviewer, who only needed to see their own personal page, this was not a problem; but for the poor programme chair, who had to visit numerous pages to make their allocation decisions, this delay was unacceptable. For this reason we made the reviewer-paper similarity values available as a single CSV file which could be imported into Excel.
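As an illustration of this export format, a minimal sketch of writing a paper-by-reviewer similarity matrix to CSV might look like the following. The paper identifiers, reviewer names, and the `scores` structure are all hypothetical; SubSift's actual export is generated server-side from its own data model.

```python
import csv

# Hypothetical similarity scores: scores[paper][reviewer] -> float in [0, 1]
scores = {
    "paper-001": {"alice": 0.82, "bob": 0.15},
    "paper-002": {"alice": 0.05, "bob": 0.64},
}

# Collect the full set of reviewer columns across all papers.
reviewers = sorted({r for row in scores.values() for r in row})

with open("similarity_matrix.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["paper"] + reviewers)  # header row: one column per reviewer
    for paper in sorted(scores):
        # Missing reviewer-paper pairs default to a similarity of 0.0.
        writer.writerow([paper] + [scores[paper].get(r, 0.0) for r in reviewers])
```

The resulting file opens directly in Excel, with papers as rows and reviewers as columns, matching the matrix layout Bart asked for.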

[Figure: scores spreadsheet]

We also implemented caching of similarity calculations and generated static versions of the reviewer and paper web pages that could be independently published.

Bart Goethals: This is really cool!!! (and fast!) Thank you very much.

Bart was clearly pleased with the combined result.

2. Bart Goethals – Similarity versus importance

Bart Goethals: Thanks again for all this wonderful work! I think you would be able to improve the usability of your system tremendously, if you would be able to add one more feature. That is, for every reviewer, you currently show the similarity to every paper. It would be nice if you could add a second column, where you show the rank of that reviewer w.r.t. all other reviewers for that paper.

For example, if you look at this paper: (link removed for confidentiality) you can see that Charu Aggarwal has the highest similarity. Now, the problem is that Charu will never bid for this paper as within his personal ranking this paper is ranked extremely low, as can be seen here: (link removed for confidentiality)

Therefore, it would be nice if they (and I) would also be able to sort the papers on this second attribute as this is even more useful for me than the similarity measure!

Obviously, the same could be done for each paper, i.e. add a second column next to similarity that states where the paper is ranked in the authors’ list as compared to all other papers.

We had not previously considered this, but it would add so much to the utility of SubSift that we have decided to implement it. Peter Flach’s response below develops the idea further.

3. Peter Flach – On “Similarity versus importance”

Peter Flach: Thanks Bart, that’s a brilliant suggestion!

So, on a “Reviewers sorted by similarity to paper” page we indicate, for each reviewer, where that paper ranks in their “Papers sorted by similarity to reviewer” page. (We need to take care of ties: my suggestion is to use the average rank in a bin as the rank of all the papers in that bin.)

Another cool thing is that we can then calculate the average of this rank over all reviewers, which gives an indication of how easy or hard it is to find a reviewer for this paper. This is extremely useful for a PC chair, because they should start with the “hard” papers.
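The ranking scheme Peter describes can be sketched briefly. In this toy example the `sim` matrix, reviewer names, and paper identifiers are hypothetical; the point is the tie handling (papers in a tied bin share the average of the ranks in that bin) and the derived "hardness" score (a paper's mean rank across all reviewers' lists, where a high mean rank signals a hard-to-place paper).

```python
# Hypothetical similarity scores: sim[reviewer][paper] -> float
sim = {
    "r1": {"p1": 0.9, "p2": 0.9, "p3": 0.1},
    "r2": {"p1": 0.2, "p2": 0.8, "p3": 0.8},
}

def average_ranks(scores):
    """Rank papers by descending similarity; tied scores share their average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks, i = {}, 0
    while i < len(ordered):
        # Find the end of the bin of papers tied with ordered[i].
        j = i
        while j < len(ordered) and scores[ordered[j]] == scores[ordered[i]]:
            j += 1
        avg = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[ordered[k]] = avg
        i = j
    return ranks

# Each reviewer's personal ranking of all papers.
per_reviewer = {r: average_ranks(s) for r, s in sim.items()}

# Paper "hardness": mean rank across reviewers (assumes every reviewer
# has a score for every paper).
papers = next(iter(sim.values()))
hardness = {p: sum(pr[p] for pr in per_reviewer.values()) / len(per_reviewer)
            for p in papers}
```

Here p2 is ranked highly by both reviewers (low mean rank), so it is an "easy" paper; p1 and p3 each rank poorly for one reviewer, making them harder to place.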

The symmetric situation is also very useful: you also want to fix the assignments for the “hard” reviewers early on.

This in turn elicits another idea from Bart: this time one that directly changes the bidding and review process itself.

4. Bart Goethals – Automatic assignment of papers to reviewers

Bart Goethals: The next step, of course, is to also immediately do the automatic assignment of papers to reviewers and skip the bidding phase. Then, you could use that time to allow people to look at their assigned set of papers and ‘unbid’ some of them. (You could for example assign them too many papers, such that the ‘unbidding’ doesn’t get you into troubles).

Ideally, such a feature should be closely integrated with the Conference Management System (CMS) being used by the conference. Without integration it would be necessary to provide logins for reviewers, and while that can be achieved relatively easily using OpenID or one of several established mechanisms, this would be a different login from the one used by the reviewer for the CMS – inevitably resulting in confusion and irritation. So, although we feel this is an interesting idea, it is not something we will be able to pursue in the scope of this JISCRI project; it is, however, something we would like to investigate further in a possible follow-on project.

5. Bart Goethals – Finding reviewers for journal submissions

Bart Goethals: it would also be extremely useful for finding reviewers for journals. I am currently an editor of two journals and I am always having troubles for finding good reviewers for a new submission.
If you would be able to build your index for the complete DBLP db, then one should be able to enter the abstract (or introduction) from the paper after which you could suggest some reviewers. This would be extremely cool and all journal editors would be forever grateful to you!

Scaling SubSift up from hundreds to tens of thousands of reviewers (and even more terms) would require more development time than we have in this JISCRI project. However, this is a clear next step for a follow-on project and very much something we would like to do in the future.

6. Bart Goethals – Finding potential collaborators

Bart Goethals: You should also be able to make some cool social networking application from your tool (maybe this already exists).

i.e. compute the similarity between authors and suggest potential collaborators.

Provided that the number of potential collaborators is in the hundreds rather than the thousands, this is something we would hope to enable within this JISCRI project; for example, finding potential collaborators within one’s own university.

7. Qiang Yang – Generating keywords from publications

Qiang Yang: does your system produce a set of keywords for each author based on the titles of publications?

SubSift generates a list of terms (or keywords) from supplied text that is assumed, in some way, to represent the author. In SubSift Tools this text is currently drawn from the publication titles on an author’s DBLP page, but we will shortly be generalising this to allow arbitrary text and web pages to be supplied.
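A toy sketch of this kind of term extraction is shown below. SubSift's actual term weighting is more sophisticated than raw frequency counting; this sketch, with an illustrative stop-word set and made-up titles, only conveys the general idea of profiling an author by the terms in their publication titles.

```python
import re
from collections import Counter

# Illustrative (far from complete) stop-word set.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "in", "on", "with", "to"}

def profile_terms(titles, top_n=10):
    """Extract the most frequent non-stop-word terms from a list of titles."""
    counts = Counter()
    for title in titles:
        for term in re.findall(r"[a-z]+", title.lower()):
            if term not in STOP_WORDS:
                counts[term] += 1
    return [term for term, _ in counts.most_common(top_n)]

titles = [
    "Mining Frequent Patterns in Data Streams",
    "Frequent Itemset Mining with Constraints",
]
keywords = profile_terms(titles)
```

For these two titles, "mining" and "frequent" head the keyword list, which is a reasonable two-word profile of the hypothetical author.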

8. Qiang Yang – Using only recent publications for reviewer profiling

Qiang Yang: Can we limit the pub[lication]s to the most recent 3 years, say?

Once SubSift Tools are able to accept arbitrary text or web pages, rather than just DBLP author pages, a simple Yahoo! Pipe would allow such constraints to be easily applied. For example, for Qiang’s suggestion, we already have a demonstration pipe that fetches all titles for a given author, and it would be trivial to modify this pipe to fetch only those from the most recent N years.
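The same constraint could equally be applied outside Yahoo! Pipes, in whatever pre-processing step feeds titles to SubSift. A toy sketch, assuming publication entries are available as hypothetical (year, title) pairs:

```python
def recent_titles(entries, n_years, current_year):
    """Keep only titles published within the most recent n_years."""
    return [title for year, title in entries
            if year > current_year - n_years]

# Hypothetical publication list for one author.
entries = [(2005, "Old paper"), (2008, "Mid paper"), (2010, "New paper")]
recent = recent_titles(entries, n_years=3, current_year=2010)
```

With a 3-year window ending in 2010, only the 2008 and 2010 titles survive, so the resulting profile reflects the author's recent interests.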

9. Qiang Yang – Restricting keywords to come from a supplied ontology

Qiang Yang: Can we generate those keywords based on an ontology we provide (that is, expertise keywords)?

The current implementation of SubSift Tools already filters terms (or keywords) to remove “stop words” (i.e. words such as “the”, “a”, “some”, etc., which are so common that they do not typically contribute to the matching process). However, our intent is to make this filtering process both optional and configurable. Once implemented, it should be possible to filter the text representing reviewers, and the abstracts too if required, through a Yahoo! Pipe in order to remove terms that do not appear in a specific list (or, with more complexity, a taxonomy or ontology) supplied by the user.
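A minimal sketch of what this optional, configurable filtering might look like is given below. The stop-word set and expertise vocabulary here are illustrative placeholders for lists a PC chair would supply; a real taxonomy or ontology would need richer matching than simple set membership.

```python
# Illustrative stop-word set; in practice this would be configurable.
STOP_WORDS = {"the", "a", "some", "of", "and"}

def filter_terms(terms, remove_stop_words=True, vocabulary=None):
    """Optionally drop stop words, then restrict terms to a user-supplied
    keyword list (a stand-in for an expertise taxonomy or ontology)."""
    out = [t for t in terms if not (remove_stop_words and t in STOP_WORDS)]
    if vocabulary is not None:
        out = [t for t in out if t in vocabulary]
    return out

# Hypothetical expertise keywords supplied by the PC chair.
expertise = {"transfer", "learning", "clustering", "graphs"}
kept = filter_terms(["the", "deep", "transfer", "learning"], vocabulary=expertise)
```

Passing `vocabulary=None` gives today's behaviour (stop-word removal only), while supplying a keyword list implements Qiang's suggestion of constraining keywords to a provided vocabulary.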

In conclusion, we would like to thank all the PC chairs mentioned in this article for their interest in, adoption of, and feedback on SubSift Tools. This early adoption and feedback have arrived in time to have a formative effect on the software and continue to prove invaluable.