Blog

This is an archive of the blog of the JISC grant-funded SubSift project. The project and this blog finished in January 2010, although the software is still actively maintained.


Simon Price talking about SubSift at WAPA2010

September 1st, 2010

In this VideoLectures.net video, recorded at the 2010 Workshop on Applications of Pattern Analysis (WAPA2010), Simon Price presents the paper "SubSift: a novel application of the vector space model to support the academic research process" and gives a demonstration of the software.

Simon Price at WAPA2010


SubSift Final Progress Post

January 3rd, 2010

This is a mandatory blog post required by the JISC Rapid Innovation (JISCRI) programme and follows a prescribed format. SubSift is a six-month project funded under JISCRI and is due to finish at the end of January 2010.

Title of Primary Project Output:

SubSift: matches submitted conference or journal papers to potential peer reviewers based on similarity to published works.

Screenshots or diagram of prototype:

SubSift workflow

Description of Prototype:

SubSift is an innovative “submission sifting” application developed by the Intelligent Systems group at the University of Bristol to support academic peer review. SubSift matches submitted conference or journal papers to potential peer reviewers based on the papers’ similarity to the reviewers’ published works, as listed in online bibliographic databases such as Google Scholar. In the JISCRI SubSift Services project, the ILRT redeveloped SubSift into a collection of web services designed to support not only peer review but also personalised data discovery and mashups in tools like Yahoo! Pipes.


SubSift supports peer review for PAKDD10

December 30th, 2009

SubSift Tools have been used to support the peer review of research papers submitted for the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD10), which will be taking place from 21-24 June 2010 in Hyderabad, India.

Programme committee (PC) chair, Professor Mohammed Zaki of Rensselaer Polytechnic Institute, New York, used SubSift to produce a custom web page for each of the 151 PC members, ranking the 352 submitted papers in order of their similarity to the reviewer’s published works. The PC members were then able to use this information to assist them in bidding for papers to review. Once reviewers’ bids were submitted, Zaki was able to use SubSift alongside existing tools to allocate the required number of reviewers for each paper.
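To give a feel for what "ranking papers by similarity to a reviewer's published works" can look like, here is a minimal sketch. It assumes tf-idf weighting and cosine similarity, a common vector space model setup; this excerpt does not spell out the exact weighting SubSift used, so treat it as an illustration rather than the project's code. The function names, paper ids and term lists are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenised document to a {term: tf-idf weight} dictionary."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_papers(reviewer_terms, papers):
    """Return paper ids ordered from most to least similar to the reviewer."""
    ids = list(papers)
    vectors = tfidf_vectors([reviewer_terms] + [papers[i] for i in ids])
    reviewer_vec, paper_vecs = vectors[0], vectors[1:]
    scores = {i: cosine(reviewer_vec, v) for i, v in zip(ids, paper_vecs)}
    return sorted(ids, key=lambda i: scores[i], reverse=True)

# Hypothetical data: terms from one reviewer's publications and three submissions.
reviewer = ["graph", "mining", "frequent", "patterns", "graph"]
papers = {
    "paper-1": ["frequent", "graph", "mining"],
    "paper-2": ["image", "segmentation", "networks"],
    "paper-3": ["mining", "text", "streams"],
}
ranking = rank_papers(reviewer, papers)
```

With this toy data, the submission sharing the most distinctive vocabulary with the reviewer comes first and the one with no shared terms comes last. Repeating the ranking for every PC member would give one ordered list per reviewer, as in the per-reviewer pages described above.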


Fun with Pipes for SubSift

December 21st, 2009

pipe_overview image

Currently, SubSift uses the DBLP website to feed its term frequency algorithms. Ignoring disambiguation issues for a moment, once each researcher is known, the system fetches their publication listing from DBLP:

http://dblp.uni-trier.de/db/indices/a-tree/b/Bailey:Christopher.html

This returns a chronologically ordered list of publications. SubSift internally parses the contents of this page (a technique called screen scraping), strips out the unnecessary information (e.g. HTML tags and stop words) and extracts the title of each publication. These titles form the basis of the bag-of-words: the collection of terms that represents each researcher.
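The clean-up step can be sketched in a few lines of Python. This is a simplified illustration, not SubSift's actual code: it assumes the titles have already been pulled out of the page, the stop-word list is a tiny stand-in for a real one, and the function name and sample titles are hypothetical.

```python
import re
from collections import Counter
from html import unescape

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def titles_to_bag_of_words(titles):
    """Count the non-stop-word terms across a researcher's paper titles."""
    bag = Counter()
    for title in titles:
        # Strip HTML tags, then decode entities such as &amp;.
        text = unescape(re.sub(r"<[^>]+>", " ", title))
        # Lower-case and split into alphanumeric tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        bag.update(t for t in tokens if t not in STOP_WORDS)
    return bag

# Hypothetical titles, as they might look straight after scraping.
sample_titles = [
    "<i>Mining</i> the Web for Research Profiles",
    "Profiles of Research Data in the Web",
]
bag = titles_to_bag_of_words(sample_titles)
```

The resulting counter maps each remaining term to the number of times it occurs across the titles, which is the kind of term-frequency profile that a similarity measure can then compare.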

One of the goals of the SubSift project has been to modularise the system and, in particular, to extract this DBLP-fetching component and replace it with a call to Yahoo! Pipes.


User Feedback on SubSift Tools

November 27th, 2009

SubSift Tools have been used or are being used to help match submitted conference papers to reviewers for the ACM Conference on Knowledge Discovery and Data Mining (KDD10), the SIAM International Conference on Data Mining (SDM10), and the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD10). Previously, the original pre-JISCRI SubSift software was also used to support KDD09. For each of these conferences, the primary user of SubSift Tools is one of the programme committee chairs.

Prof. Qiang Yang, PC Chair KDD10
Prof. Bart Goethals, PC Chair SDM10
Prof. Mohammed Zaki, PC Chair PAKDD10
Prof. Peter Flach, PC Chair KDD09

In this article we present qualitative feedback and suggestions from these PC chairs, annotated with our own thoughts on each topic raised. In some cases these lead to actual or planned changes to the tools; in other cases, to possible directions for further development of SubSift in future projects.


SubSift: “How It Works” Publication

November 10th, 2009

An article to appear in the December 2009 issue of ACM SIGKDD Explorations explains in detail how the SubSift software matches submitted conference papers to potential peer reviewers based on similarity to published works. This “how it works” article, titled “Novel Tools To Streamline the Conference Review Process: Experiences from SIGKDD’09”, has been co-authored by the developers of SubSift at the University of Bristol, in collaboration with Microsoft Research, Cambridge and Rensselaer Polytechnic Institute, New York.


A Sample SubSift Workflow

November 5th, 2009

The original SubSift software was used to support the allocation of papers to reviewers for KDD09, a large data mining conference. An important deliverable of the JISCRI SubSift Services project is a recreation of this original SubSift workflow using the new software. For pragmatic reasons, the original workflow relied on locally installed applications, command line utilities and various manual processes. However, after consulting potential users, it became clear that the workflow needed migrating to a hosted web-based application. Doing so would avoid the need for local software installation and allow the creation of an easy-to-use, self-documenting, wizard-like user interface.

subsift tools


Baptism of fire for new SubSift at SDM10

October 27th, 2009

The new JISCRI SubSift has been used to support the paper reviewing process for the 2010 SIAM International Conference on Data Mining (SDM10), which will take place 29 April – 1 May, 2010 in Columbus, Ohio.

2010 SIAM International Conference on Data Mining


Whether to Refactor or Rewrite SubSift

October 22nd, 2009

We have recently wrestled with the question of whether to refactor the original SubSift software upon which the intended functionality of JISCRI SubSift Services is based, or whether to completely rewrite this functionality from scratch. On balance we opted to rewrite for the reasons set out below. Hopefully our analysis may prove useful to other projects that aim to re-use software produced as a “side-effect” of academic research.

language soup


Piping Hot (or not) for SubSift

October 13th, 2009

One of the ideas behind the SubSift project is to create a number of Yahoo! Pipes that may be combined as a mash-up to support peer review of conference and journal papers, matching submitted publications with their most appropriate reviewers. So it will come as no surprise that we’ve been experimenting with Yahoo! Pipes recently, with mixed success.

Yahoo! Pipe to extract publication titles from DBLP


Peter Flach talking about SubSift at KDD-09

October 1st, 2009

In a five-minute video taken from the opening session of the 2009 ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (SIGKDD-09), conference co-chair Professor Peter Flach explains how the SubSift software was used to support the paper reviewing, bidding and allocation process.

Peter Flach speaking at KDD09


Initial SubSift on Google Code

September 28th, 2009

The original version of the SubSift software, written specifically for the KDD 2009 conference to explore a potential application of machine learning, data mining and information retrieval theory, is now on the Google Code website. The software was written as an in-house toolset and is not yet parameterised for re-use in other contexts. The SubSift Services project will develop, parameterise and document this software to enable its re-use.

SubSift on Google Code


SubSift Services project funded under JISC Rapid Innovations

June 24th, 2009

The SubSift Services project will decompose SubSift, an innovative “submission sifting” application developed at the University of Bristol, into a collection of lightweight REST services designed to support not only peer review but also personalised data discovery and mashups. As a demonstrator of the utility of these services, the full functionality of the original SubSift software will be recreated and extended as a simple mash-up in Yahoo! Pipes. Importantly, while the services and demonstrator will be useful in their own right, they will also allow the project to promote a community dialogue on open standards-based integration of such services into existing peer review support tools, e.g. conference management systems.

Original developers of SubSift
Bruno Golenia, Professor Peter Flach and Sebastian Spiegler, developers of the original SubSift upon which the SubSift Services project is based.
