The SubSift REST API is organised into six functional areas described in the following sections. Each section begins with a description which is then followed by a list of API methods.

  1. Documents
  2. Profiles
  3. Matches
  4. Bookmarks
  5. Reports
  6. System

TIP: You can use the SubSift API Explorer web page to interactively experiment with the REST API methods online. Alternatively, use the widely available curl command line tool to experiment interactively from your own computer.


Throughout the API, data is organised as folders in which data items are stored, modelled on the familiar filing-system concept of folders and files. The diagram below shows a typical usage of SubSift: two document collections are compared by first producing a profile of each document and then pairwise cross-comparing these profiles to calculate similarity "match" data for each pair. Notice how the document collections are represented in SubSift as Documents Folders containing Documents (aka document items). Likewise, the Profiles (aka profile items) created to summarise the features of these documents are grouped into Profiles Folders. The Match data (aka match items) produced by comparing the profiles of a pair of Profiles Folders is then grouped into a Matches Folder.

Although the diagram depicts a sequence of transformations from a pair of document collections through to a matrix of matching statistics, a variety of useful data and metadata may be obtained at each step in the process via the REST API methods. For example, a Profiles Folder makes available the full vocabulary of the Document Items in its associated Documents Folder, along with statistics on the frequency of terms (words) in that vocabulary.
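
The pipeline in the diagram can be summarised as an ordered list of REST calls. The following Python sketch only assembles (method, path, parameters) tuples from the URI schemas in the method tables of Sections 1 to 3 and sends nothing. The user and folder ids are hypothetical, as is the convention of giving each profiles folder the same id as the documents folder it is built from; the step that adds document items is deliberately left out.

```python
def comparison_pipeline(user_id, docs_a, docs_b, matches_id):
    """Return the (method, path, params) calls that compare two document collections.

    Hypothetical convention: each profiles folder reuses the id of the
    documents folder it is built from. Adding document items to each
    documents folder (Section 1.2) must happen between steps 1 and 2.
    """
    calls = []
    # Step 1: create one documents folder per collection.
    for docs in (docs_a, docs_b):
        calls.append(("POST", f"/{user_id}/documents/{docs}", {}))
    # Step 2: profile each documents folder into a profiles folder.
    for docs in (docs_a, docs_b):
        calls.append(("POST", f"/{user_id}/profiles/{docs}/from/{docs}", {}))
    # Step 3: cross-compare the two profiles folders into one matches folder.
    calls.append(("POST", f"/{user_id}/matches/{matches_id}",
                  {"profiles_id1": docs_a, "profiles_id2": docs_b}))
    return calls


for method, path, params in comparison_pipeline("alice", "abstracts", "pc_pages", "abstracts_vs_pc"):
    print(method, path, params)
```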

The remainder of this page is divided into sections describing the main areas of the SubSift REST API and their available methods. The first three sections cover documents, profiles and matches, as seen in the above diagram; the remaining sections cover bookmarks, reports and system methods.

1. Documents

In SubSift, a document is a piece of text to be profiled and matched. A document will usually be text from some external source, such as the text of a web page (with or without the HTML markup) or a conference paper abstract. The text of a document is stored in SubSift as a document item, and document items are organised into documents folders. Before any document items can be added, the documents folder must first be created using the documents create method. Document items may then be added to the folder individually, in bulk as a comma-separated list, or by automatically fetching and importing all the URLs from a specified bookmarks folder (described later). A typical usage of documents folders in SubSift is to hold a corpus of documents, for example the published works of a conference programme committee or the abstracts of papers submitted to a conference. SubSift can analyse a documents folder to produce a profiles folder (described later).

1.1 Documents Folders

API Method | HTTP | URI Schema | Parameters
documents list | GET | /:user_id/documents | items
documents show | GET | /:user_id/documents/:folder_id |
documents exists | HEAD | /:user_id/documents/:folder_id |
documents create | POST | /:user_id/documents/:folder_id | description, mode
documents update | PUT | /:user_id/documents/:folder_id | description, mode
documents destroy | DELETE | /:user_id/documents/:folder_id |
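
To make the URI schemas concrete, the following Python sketch builds, but does not send, documents folder requests using only the standard library. The server address, the user id alice and the folder id abstracts are hypothetical. The parameter encoding (query string for GET, HEAD and DELETE; form-encoded body for POST and PUT) is also an assumption, since this section does not specify it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical server address: replace with the SubSift instance you use.
BASE_URL = "https://subsift.example.org"


def build_request(method, *segments, params=None):
    """Build, but do not send, a request for a SubSift URI schema.

    Assumption: parameters go in the query string for GET, HEAD and DELETE,
    and in a form-encoded body for POST and PUT.
    """
    url = BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)
    data = None
    if params:
        encoded = urlencode(params)
        if method in ("POST", "PUT"):
            data = encoded.encode("utf-8")
        else:
            url += "?" + encoded
    return Request(url, data=data, method=method)


# documents create: POST /:user_id/documents/:folder_id
create = build_request("POST", "alice", "documents", "abstracts",
                       params={"description": "Submitted paper abstracts"})
# documents exists: HEAD /:user_id/documents/:folder_id
exists = build_request("HEAD", "alice", "documents", "abstracts")
# documents destroy: DELETE /:user_id/documents/:folder_id
destroy = build_request("DELETE", "alice", "documents", "abstracts")
```

A built request can be sent with urllib.request.urlopen(create); any authentication your SubSift server requires is not shown here.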

1.2 Document Items

API Method | HTTP | URI Schema | Parameters
document items list | GET | /:user_id/documents/:folder_id/items | full
document items show | GET | /:user_id/documents/:folder_id/items/:item_id | full
document items exists | HEAD | /:user_id/documents/:folder_id/items/:item_id |
document items import | POST | /:user_id/documents/:folder_id/import/:bookmarks_id | depth, breadth, same_domain, same_stem, threshold, remove_html
document items importing | HEAD | /:user_id/documents/:folder_id/import/:bookmarks_id |
document items create | POST | /:user_id/documents/:folder_id/items | items_list, full
document items create | POST | /:user_id/documents/:folder_id/items/:item_id | text, description, full
document items update | PUT | /:user_id/documents/:folder_id/items | items_list, full
document items update | PUT | /:user_id/documents/:folder_id/items/:item_id | text, description, full
document items destroy | DELETE | /:user_id/documents/:folder_id/items | full
document items destroy | DELETE | /:user_id/documents/:folder_id/items/:item_id | full
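
The sketch below builds a single-item create request (using the documented text and description parameters) and the import/importing pair for harvesting a bookmarks folder. It is a sketch under assumptions: the server address and all ids are hypothetical, the parameter encoding is assumed, and treating the HEAD importing call as the one a client would poll is inferred from its name, since this section does not say how its reply signals completion. The bulk items_list form is omitted because its exact layout is not specified here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def build_request(method, *segments, params=None):
    """Build, but do not send, a SubSift request (assumed parameter encoding)."""
    url = BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)
    data = None
    if params:
        encoded = urlencode(params)
        if method in ("POST", "PUT"):
            data = encoded.encode("utf-8")  # form-encoded body
        else:
            url += "?" + encoded  # query string
    return Request(url, data=data, method=method)


# document items create (one item): POST /:user_id/documents/:folder_id/items/:item_id
add_item = build_request("POST", "alice", "documents", "abstracts", "items", "paper42",
                         params={"text": "Automatic reviewer assignment from term profiles.",
                                 "description": "Paper 42"})
# document items import: POST /:user_id/documents/:folder_id/import/:bookmarks_id
start_import = build_request("POST", "alice", "documents", "pc_pages", "import", "pc_bookmarks")
# document items importing: HEAD on the same URI, presumably polled while harvesting runs
check_import = build_request("HEAD", "alice", "documents", "pc_pages", "import", "pc_bookmarks")
```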

2. Profiles

In SubSift, a profile item is a summary representation of the features of a document item, with respect to the other document items in the same documents folder. Each profiles folder is a container holding a set of profile items, all of which are created by analysing the individual document items from a single documents folder. A profile item represents the relative uniqueness of each term (word) appearing in the profiled document item, compared with all the other document items in that documents folder. A typical usage of profiles in SubSift is to obtain a list of distinguishing terms, or keywords, for a document item: for example, automatically extracting keywords from the abstracts of papers submitted to a conference. SubSift also allows a pair of profiles folders to be compared against each other to produce a matches folder (described later).

2.1 Profiles Folders

API Method | HTTP | URI Schema | Parameters
profiles list | GET | /:user_id/profiles | sort, full, items
profiles show | GET | /:user_id/profiles/:folder_id | sort, full
profiles exists | HEAD | /:user_id/profiles/:folder_id |
profiles create | POST | /:user_id/profiles/:folder_id/from/:document_id | description, mode, ignore_case, remove_html, remove_stopwords, stopwords, stem, ngrams, restrict_vocabulary, vocabulary, limit, term_weights, term_weight_default, length, threshold, sort, full
profiles create | POST | /:user_id/profiles/:folder_id | document_id, description, mode, ignore_case, remove_html, remove_stopwords, stopwords, stem, ngrams, restrict_vocabulary, vocabulary, limit, term_weights, term_weight_default, length, threshold, sort, full
profiles update | PUT | /:user_id/profiles/:folder_id/from/:document_id | description, mode, ignore_case, remove_html, remove_stopwords, stopwords, stem, ngrams, restrict_vocabulary, vocabulary, limit, term_weights, term_weight_default, length, threshold, recalculate, sort, full
profiles update | PUT | /:user_id/profiles/:folder_id | document_id, description, mode, ignore_case, remove_html, remove_stopwords, stopwords, stem, ngrams, restrict_vocabulary, vocabulary, limit, term_weights, term_weight_default, length, threshold, recalculate, sort, full
profiles destroy | DELETE | /:user_id/profiles/:folder_id | sort, full
profiles recalculate | POST | /:user_id/profiles/:folder_id/recalculate | sort, full
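
The profiles create method has two URI forms: one carries the source documents folder in the path (from/:document_id), the other passes it as the document_id parameter. The sketch below builds both, plus a recalculate request. The server address, ids and parameter encoding are assumptions, and tuning options such as remove_stopwords, stem or ngrams are omitted because their accepted values are not given in this section.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def build_request(method, *segments, params=None):
    """Build, but do not send, a SubSift request (assumed parameter encoding)."""
    url = BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)
    data = None
    if params:
        encoded = urlencode(params)
        if method in ("POST", "PUT"):
            data = encoded.encode("utf-8")  # form-encoded body
        else:
            url += "?" + encoded  # query string
    return Request(url, data=data, method=method)


# profiles create, path form: POST /:user_id/profiles/:folder_id/from/:document_id
by_path = build_request("POST", "alice", "profiles", "abstracts", "from", "abstracts",
                        params={"description": "Profiles of paper abstracts"})
# profiles create, parameter form: POST /:user_id/profiles/:folder_id with document_id
by_param = build_request("POST", "alice", "profiles", "abstracts",
                         params={"document_id": "abstracts",
                                 "description": "Profiles of paper abstracts"})
# profiles recalculate: POST /:user_id/profiles/:folder_id/recalculate
recalc = build_request("POST", "alice", "profiles", "abstracts", "recalculate")
```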

2.2 Profile Items

API Method | HTTP | URI Schema | Parameters
profile items list | GET | /:user_id/profiles/:folder_id/items | sort, full
profile items show | GET | /:user_id/profiles/:folder_id/items/:item_id | sort, full
profile items exists | HEAD | /:user_id/profiles/:folder_id/items/:item_id |

3. Matches

In SubSift, a match item is a similarity score (and supporting statistics) representing how alike a specific pair of profile items are. Each matches folder is a container holding a list of match items. A matches folder is created by analysing every pairing of profile items drawn from a pair of profiles folders. Each match item scores the similarity of a single profile from the first profiles folder against every profile from the second profiles folder. A typical usage of such a comparison is to match submitted conference abstracts with the bibliography pages of programme committee members in order to rank potential reviewers for each paper, and vice versa.

3.1 Matches Folders

API Method | HTTP | URI Schema | Parameters
matches list | GET | /:user_id/matches | sort, full
matches show | GET | /:user_id/matches/:folder_id | sort, full
matches exists | HEAD | /:user_id/matches/:folder_id |
matches create | POST | /:user_id/matches/:folder_id/profiles/:profiles_id1/ | description, mode, limit, threshold, sort, full
matches create | POST | /:user_id/matches/:folder_id | profiles_id1, profiles_id2, description, mode, limit, threshold, sort, full
matches update | PUT | /:user_id/matches/:folder_id/profiles/:profiles_id1/ | description, mode, limit, threshold, sort, full
matches update | PUT | /:user_id/matches/:folder_id | profiles_id1, profiles_id2, description, mode, limit, threshold, sort, full
matches destroy | DELETE | /:user_id/matches/:folder_id | sort, full
matches recalculate | POST | /:user_id/matches/:folder_id/recalculate | sort, full
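
A matches folder is built from two existing profiles folders. The following sketch uses the create form that passes both folders as the profiles_id1 and profiles_id2 parameters, then builds recalculate and destroy requests for the same folder. The server address, ids and parameter encoding are assumptions; nothing is sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def build_request(method, *segments, params=None):
    """Build, but do not send, a SubSift request (assumed parameter encoding)."""
    url = BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)
    data = None
    if params:
        encoded = urlencode(params)
        if method in ("POST", "PUT"):
            data = encoded.encode("utf-8")  # form-encoded body
        else:
            url += "?" + encoded  # query string
    return Request(url, data=data, method=method)


# matches create: POST /:user_id/matches/:folder_id with both profiles folders as parameters
create = build_request("POST", "alice", "matches", "abstracts_vs_pc",
                       params={"profiles_id1": "abstracts",
                               "profiles_id2": "pc_pages",
                               "description": "Papers against reviewers"})
# matches recalculate: POST /:user_id/matches/:folder_id/recalculate
recalc = build_request("POST", "alice", "matches", "abstracts_vs_pc", "recalculate")
# matches destroy: DELETE /:user_id/matches/:folder_id
destroy = build_request("DELETE", "alice", "matches", "abstracts_vs_pc")
```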

3.2 Match Items

API Method | HTTP | URI Schema | Parameters
match items list | GET | /:user_id/matches/:folder_id/items | profiles_id, sort, threshold3, threshold2, threshold1, full
match items show | GET | /:user_id/matches/:folder_id/items/:item_id | profiles_id, sort, threshold3, threshold2, threshold1, full
match items exists | HEAD | /:user_id/matches/:folder_id/items/:item_id | profiles_id
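
Because a matches folder relates two profiles folders, the match items methods take a profiles_id parameter. The sketch below assumes, based on the description of match items above, that profiles_id names which of the two profiles folders the item ids refer to: abstracts would then list each paper's scores against every reviewer, and pc_pages the reverse. The server address, ids and query-string encoding are assumptions, and the threshold1 to threshold3 parameters are omitted because their meaning is not described in this section.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def build_request(method, *segments, params=None):
    """Build, but do not send, a SubSift request (assumed parameter encoding)."""
    url = BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)
    data = None
    if params:
        encoded = urlencode(params)
        if method in ("POST", "PUT"):
            data = encoded.encode("utf-8")  # form-encoded body
        else:
            url += "?" + encoded  # query string
    return Request(url, data=data, method=method)


# match items list, one item per profile in the "abstracts" folder (assumed meaning)
rows = build_request("GET", "alice", "matches", "abstracts_vs_pc", "items",
                     params={"profiles_id": "abstracts"})
# match items show for a single paper's profile item
paper42 = build_request("GET", "alice", "matches", "abstracts_vs_pc", "items", "paper42",
                        params={"profiles_id": "abstracts"})
```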

3.3 Match Matrix

Each matches folder can be viewed as a matrix of match items, as shown in the diagram below. The match matrix may be retrieved via the API methods below and saved as a file, to be subsequently imported into other tools such as Matlab or incorporated into application software.

API Method | HTTP | URI Schema | Parameters
match matrix show | GET | /:user_id/matches/:folder_id/matrix/:type | separator
match matrix show pairs | GET | /:user_id/matches/:folder_id/pairs | profiles_id
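
For importing the matrix into other software, the sketch below builds a match matrix show request with a chosen separator and parses delimited text into rows. It rests on assumptions: the server address is hypothetical, the valid values of :type are not listed in this section so the type is left as a caller-supplied argument, and the response layout (one matrix row per line, cells split by the requested single-character separator) is assumed rather than documented here.

```python
import csv
import io
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def matrix_request(user_id, folder_id, matrix_type, separator=","):
    """Build GET /:user_id/matches/:folder_id/matrix/:type?separator=... (not sent)."""
    segments = (user_id, "matches", folder_id, "matrix", matrix_type)
    path = "/".join(quote(s, safe="") for s in segments)
    query = urlencode({"separator": separator})
    return Request(f"{BASE_URL}/{path}?{query}", method="GET")


def parse_matrix(text, separator=","):
    """Split separator-delimited matrix text into rows of cell strings.

    Assumed layout: one matrix row per line; blank lines are skipped.
    csv.reader requires the separator to be a single character.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    return [row for row in reader if row]
```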


4. Bookmarks

For convenience, SubSift provides bookmarks as a way of building lists of URLs that can then be used to add document items to a documents folder. Each bookmarks folder is a container holding a list of bookmarks (i.e. URLs), analogous to the bookmarks that web browsers let you build up as you surf the web. In SubSift's terminology these URLs are called bookmark items. To make use of a bookmarks folder, it must be imported into a documents folder using the document items import API method described earlier. Doing so adds the bookmarks to SubSift's web harvester queue. The web harvester robot then (gradually) fetches each of the web pages referred to by the bookmarks and adds its page source (typically HTML text) to the documents folder verbatim, each page creating a single document item. A typical usage of bookmarks folders in SubSift is to specify a list of bibliography web pages for all members of a conference programme committee.

4.1 Bookmarks Folders

API Method | HTTP | URI Schema | Parameters
bookmarks folders list | GET | /:user_id/bookmarks |
bookmarks folder show | GET | /:user_id/bookmarks/:folder_id |
bookmarks folder exists | HEAD | /:user_id/bookmarks/:folder_id |
bookmarks folder create | POST | /:user_id/bookmarks/:folder_id | description, mode
bookmarks folder update | PUT | /:user_id/bookmarks/:folder_id | description, mode
bookmarks folder destroy | DELETE | /:user_id/bookmarks/:folder_id |

4.2 Bookmark Items

API Method | HTTP | URI Schema | Parameters
bookmark items list | GET | /:user_id/bookmarks/:folder_id/items |
bookmark items show | GET | /:user_id/bookmarks/:folder_id/items/:item_id |
bookmark items exists | HEAD | /:user_id/bookmarks/:folder_id/items/:item_id |
bookmark items create | POST | /:user_id/bookmarks/:folder_id/items | items_list
bookmark items create | POST | /:user_id/bookmarks/:folder_id/items/:item_id | item_url, description
bookmark items update | PUT | /:user_id/bookmarks/:folder_id/items | items_list
bookmark items update | PUT | /:user_id/bookmarks/:folder_id/items/:item_id | item_url, description
bookmark items destroy | DELETE | /:user_id/bookmarks/:folder_id/items |
bookmark items destroy | DELETE | /:user_id/bookmarks/:folder_id/items/:item_id |
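
Putting the bookmark methods together with the document items import method, the sketch below creates a bookmarks folder, adds one bookmark with the documented item_url and description parameters, and queues the folder for harvesting into a documents folder (which must already exist). The server address, ids, example URL and parameter encoding are all hypothetical; nothing is sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def build_request(method, *segments, params=None):
    """Build, but do not send, a SubSift request (assumed parameter encoding)."""
    url = BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)
    data = None
    if params:
        encoded = urlencode(params)
        if method in ("POST", "PUT"):
            data = encoded.encode("utf-8")  # form-encoded body
        else:
            url += "?" + encoded  # query string
    return Request(url, data=data, method=method)


# bookmarks folder create: POST /:user_id/bookmarks/:folder_id
folder = build_request("POST", "alice", "bookmarks", "pc_bookmarks",
                       params={"description": "Programme committee bibliography pages"})
# bookmark items create (one item): POST /:user_id/bookmarks/:folder_id/items/:item_id
bookmark = build_request("POST", "alice", "bookmarks", "pc_bookmarks", "items", "member01",
                         params={"item_url": "https://example.org/~member01/publications.html",
                                 "description": "Member 01 bibliography"})
# document items import: queue every bookmark in the folder for the web harvester
harvest = build_request("POST", "alice", "documents", "pc_pages", "import", "pc_bookmarks")
```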

5. Reports

For convenience, SubSift provides report generation to present data created in profiles and matches folders in human-centric formats, such as HTML, rather than machine-centric formats, such as XML, JSON, etc. Reports are generated as files in report folders, which have the same creation and access control methods supported by the other folder types in SubSift's REST API. Unlike other SubSift folders, report folders do not contain API addressable data items; instead they contain files which may be viewed or downloaded as ordinary web pages.

5.1 Reports Folders

SubSift reports folders provide an API controllable way of generating HTML reports detailing specific profiles and matches. A report folder may be published to the web with either public or restricted access. Alternatively, reports folders may be downloaded as a zip archive for distribution or publishing to third-party web servers or content management systems.

API Method | HTTP | URI Schema | Parameters
reports folders list | GET | /:user_id/reports |
reports folder show | GET | /:user_id/reports/:folder_id.:format |
reports folder exists | HEAD | /:user_id/reports/:folder_id |
reports folder create | POST | /:user_id/reports/:folder_id/profiles/:profiles_id | description, mode, type
reports folder create | POST | /:user_id/reports/:folder_id/matches/:matches_id | description, mode, type
reports folder destroy | DELETE | /:user_id/reports/:folder_id |

5.2 Report Files

Each reports folder contains report files generated when the folder was created or last updated. The exact names and types of the files generated depend on the report type. However, there will always be an index.html file representing the home page of the report, and an index.zip file containing all the files in the report as a downloadable archive. If the reports folder is set to public, it functions as a self-contained website which may be publicised via its URL or linked to in the normal way from other web pages.

API Method | HTTP | URI Schema | Parameters
report homepage | GET | /:user_id/reports/:folder_id[.html] |
report download | GET | /:user_id/reports/:folder_id.zip |
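
The sketch below creates a report from a matches folder and builds the homepage and zip download requests, with a small helper that streams a response to a local file. The server address and ids are hypothetical, the parameter encoding is assumed, and the type parameter is omitted because its accepted values are not listed in this section. The download helper is defined but not called.

```python
import shutil
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def build_request(method, *segments, params=None):
    """Build, but do not send, a SubSift request (assumed parameter encoding)."""
    url = BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)
    data = None
    if params:
        encoded = urlencode(params)
        if method in ("POST", "PUT"):
            data = encoded.encode("utf-8")  # form-encoded body
        else:
            url += "?" + encoded  # query string
    return Request(url, data=data, method=method)


def download(request, filename):
    """Send the request and stream the response body into filename."""
    with urlopen(request) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)


# reports folder create: POST /:user_id/reports/:folder_id/matches/:matches_id
create = build_request("POST", "alice", "reports", "reviewer_report", "matches", "abstracts_vs_pc",
                       params={"description": "Reviewer rankings"})
# report homepage and report download: GETs on the folder id with a suffix
homepage = build_request("GET", "alice", "reports", "reviewer_report.html")
archive = build_request("GET", "alice", "reports", "reviewer_report.zip")
```

Calling download(archive, "reviewer_report.zip") would save the whole report for publishing elsewhere.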

6. System

System methods provide functionality that does not fit into the folders and items model employed by the rest of the API.

6.1 System: Status

The status/test methods can be used to check, for the service overall or for an individual user, that the SubSift REST API is up and running as normal.

API Method | HTTP | URI Schema | Parameters
status test | GET | /status/test |
status test | POST | /status/test/:user_id |
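
A minimal health check could look like the sketch below, which builds both status test requests and a helper that sends one. The server address and user id are hypothetical, and so is treating an HTTP 200 reply as "up as normal", because this section does not describe the body of a successful reply. Any network or HTTP error is reported as not up.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://subsift.example.org"  # hypothetical server address

# status test for the service overall: GET /status/test
service_check = Request(f"{BASE_URL}/status/test", method="GET")
# status test for one user: POST /status/test/:user_id
user_check = Request(f"{BASE_URL}/status/test/{quote('alice', safe='')}", method="POST")


def is_up(request, timeout=10.0):
    """Send a status test request and treat an HTTP 200 reply as healthy (assumption)."""
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False
```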

6.2 System: Workflow

SubSift has a minimalist workflow engine that enables the creation and batch execution of sequences of API method calls specified in a simple workflow language (documented on the workflow create method below). Creating a workflow adds its sequence of commands to the workflow engine's enactment queue, eventually resulting in each method call in the workflow being executed one after the other. Workflows greatly simplify the use of the SubSift API by enabling batch execution of method calls without the need for multiple consecutive HTTP requests from the client program. All the client program needs to do is wait for the workflow to complete, which can be detected by periodically calling the workflow enacting method. The downside is that workflows run as background tasks on the server and will not execute as quickly as the same sequence of methods issued as multiple HTTP requests by the client. However, for non-interactive applications, this is the simplest way of using SubSift.

API Method | HTTP | URI Schema | Parameters
workflow create | POST | /:user_id/workflow/:workflow_id | commands
workflow enacting | HEAD | /:user_id/workflow/:workflow_id |
workflow destroy | DELETE | /:user_id/workflow/:workflow_id |
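
The create-then-poll pattern described above can be sketched as follows. The workflow language is not documented in this section, so the commands value is treated as an opaque string supplied by the caller. Likewise, how the enacting (HEAD) reply signals completion is not stated here, so the polling loop takes a caller-supplied is_enacting function that issues that request and interprets the reply. The server address, ids and form encoding are hypothetical.

```python
import time
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://subsift.example.org"  # hypothetical server address


def workflow_request(method, user_id, workflow_id, commands=None):
    """Build (not send) a request for /:user_id/workflow/:workflow_id.

    commands, when given, is sent as a form-encoded body (assumed encoding).
    """
    url = f"{BASE_URL}/{quote(user_id, safe='')}/workflow/{quote(workflow_id, safe='')}"
    data = urlencode({"commands": commands}).encode("utf-8") if commands is not None else None
    return Request(url, data=data, method=method)


def wait_for_workflow(is_enacting, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until is_enacting() returns False; return the number of polls made.

    Raises TimeoutError if the workflow is still enacting after max_polls checks.
    """
    for polls in range(1, max_polls + 1):
        if not is_enacting():
            return polls
        sleep(poll_seconds)
    raise TimeoutError(f"workflow still enacting after {max_polls} polls")
```

Passing sleep as a parameter keeps the loop easy to test without real delays; in normal use the default time.sleep applies.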