What would science look like if it were invented today? - Part II: knowledge structuring
Part II: What would knowledge structuring look like if it were invented today?
Science is already a wiki if you look at it a certain way. It’s just a highly inefficient one -- the incremental edits are made in papers instead of wikispace, and significant effort is expended to recapitulate existing knowledge in a paper in order to support the one to three new assertions made in any one paper. (John Wilbanks)
There are many ways to structure knowledge. One is via coordinated cellular activity in your brain. Others may involve spatial arrangements of sheets of paper or numeric arrangements of digital documents. Here, we will focus on the difference between the latter two, building on a previous outline.
Structuring scientific knowledge online
Let us first consider some practical aspects of organizing scientific knowledge in online environments:
- New information can be inserted at any later time, independent of press runs (some call this micropublication). For example, part I of this post has already been "published" on the blog, but its wiki version can still be updated with references that were not available at the time. This may not matter much for blog posts, but consider it a proof of principle for writings in general, including scholarly reviews on a topic.
- In sharp contrast to current practice in paper-based scholarly journals, online platforms like public wikis make the whole article openly accessible right from the start. Detailed explanations of keywords and key concepts can be linked from within the article, and the article itself can be put in linked context via a variety of mechanisms, e.g. categories (Wikipedia), Related Articles (Citizendium), or links to other ontological frameworks (like MeSH terms).
- Documents can be edited simultaneously by multiple authors. Google Docs has allowed this for years, Etherpad improves on it, and Google Wave (scheduled to be released later this week) is going to have truly real-time simultaneous editing as well, while wikis are currently somewhat limited in this regard.
- Suitably designed schemes for identifying authors, their individual contributions, and versions of the whole document provide for bug tracking, permanent availability of text (or code) snippets, and attribution. For example, Wikibu tells you that at this point, the users RokerHRO, Proxima, Saperaud, 18.104.22.168, and BirgitLachner have been the main contributors to the Aggregatzustand (state of matter) entry in the German Wikipedia, and the individual contributions to the blog post you are reading can be viewed via its version history.
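To make the attribution idea in the last point more concrete, here is a minimal sketch in Python. It assumes that a version history is available as a list of (author, full text) pairs in chronological order. The function name and the sample history are hypothetical, and counting inserted characters is only a crude proxy for contribution; it is not meant to describe how Wikibu or any wiki engine actually computes its figures.

```python
from collections import Counter
from difflib import SequenceMatcher


def contribution_by_author(revisions):
    """Credit each author with the characters their revisions added.

    `revisions` is a chronological list of (author, full_text) pairs
    (an assumed, simplified model of a wiki version history).
    """
    totals = Counter()
    previous = ""
    for author, text in revisions:
        matcher = SequenceMatcher(None, previous, text, autojunk=False)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            # Only new or rewritten text counts; deletions earn no credit here.
            if tag in ("insert", "replace"):
                totals[author] += j2 - j1
        previous = text
    return totals


# Hypothetical three-revision history of a short article.
history = [
    ("alice", "Matter exists in states."),
    ("bob", "Matter exists in several states."),
    ("alice", "Matter exists in several states, e.g. solid and liquid."),
]

# Authors sorted by the number of characters they added, largest first.
ranking = contribution_by_author(history).most_common()
for author, added in ranking:
    print(author, added)
```

A real scheme would also have to deal with reverts, moved text, and vandalism, which is one reason why stable author identification matters for this kind of metric.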
These are certainly not all the aspects of online environments relevant to science (for instance, we have left out data management issues), but let us content ourselves with these four for the moment and consider what their combination implies for the structuring of knowledge:
It is technically possible that all researchers currently investigating a given topic could coordinate their efforts by collaboratively creating, editing, and maintaining a central set of interlinked knowledge elements (be these wiki articles, knols, or other structures) that explain what is known about their topic in detail and embed it in a wider context.
As implied by the introductory quote, it is probably fair to say that this could make research on that particular topic (as well as teaching and outreach) much more efficient. Just imagine you had a time slider and could watch the history of research on general relativity, plate tectonics, self-replication, or cell division unfold from the earliest ideas of their earliest proponents (and opponents) onwards up to you, your colleagues, and those with whom you compete for grants. So why don't we do it?
Structuring scientific knowledge on paper
Traditionally, given the scope of a particular journal, knowledge about specialist terms (which may describe completely non-congruent concepts in different fields), methodologies, notations, mainstream opinions, trends, or major controversies could reasonably be expected to be widespread amongst the audience. This reduced the need to repeat the same things over and over again (in cross-disciplinary environments, there is a higher demand for proper disambiguation of the various meanings of a term). Nonetheless, redundancy is still quite visible in journal articles, especially in the introduction, methods, and discussion sections and in the abstracts, often in a way characteristic of the authors. Services like eTBLAST and JANE can thus make qualified guesses about the authors of a particular piece of text, with good results if some of the authors have many papers in the respective database (mainly PubMed) and if they have not changed their individual research scope too often in between.
Of course, there would be side effects: A manuscript well-adapted to the scope of one particular journal is often not very intelligible to someone outside its intended audience, which hampers cross-fertilization with other research fields (we will get back to this below). When using paper as the sole medium of communication there is not much to be done about this limitation. Indeed, we have become so used to it that some do not perceive it as a limitation at all. Similar thoughts apply to manuscript formatting. However, the times when paper alone reigned over scholarly communication have certainly passed, as discussed in part I. The relative merits of paper-based and wiki-based scholarly communication are covered in more detail at a dedicated Wikiversity page.
Cross-field fertilization is crucial with respect to interdisciplinary research projects, digital libraries and multi-journal (or indeed cross-disciplinary) bibliographic search engines (e.g. Google Scholar), since these dramatically increase the likelihood of, say, a biologist stumbling upon a not primarily biological source relevant to her research (think shape quantification or growth curves, for instance). What options do we have to systematically integrate such cross-disciplinary hidden treasures with the traditional intra-disciplinary background knowledge and with new insights resulting from research?
As a sidenote, lack of context is also a consistent feature of most "Facebooks for scientists". In fact, the whole set of scholarly pages on the web is the appropriate network for researchers, but so far it is not optimally connected, particularly because formal scholarly communication has not yet fully hatched from the structures of the paper-based era (see also this nice overview of the current situation). If it had, this would shift the focus away from periodicals (and, in passing, render things like a journal's scope and Journal Impact Factor superfluous; see part I), which is likely to meet resistance from the publishing establishment. Yet, authors might just act on their needs by moving their "content" to grow in better production and exchange surroundings like the ones discussed here. Without good authors, no established publisher will be able to keep their grip on anyone's research habits and thinking.
Wikis as an example of public knowledge environments online
Groupware comes to mind in this regard, and wikis in particular (another example would be collaboratively edited mindmaps, like the one embedded above that represents the topics covered by this blog post series): They allow us to aggregate and inter-link diverse sets of knowledge in an online-accessible manner, basically for free. The by now classical example is Wikipedia, and one scientific journal, RNA Biology, has already announced that it requires an introductory Wikipedia article for papers it is to publish on RNA families. This idea recently spurred a debate on the merits of such an initiative and of doing it with Wikipedia, where basically anyone can edit any page, regardless of subject matter expertise.
An investigation (video lecture by Bill Wedemeyer here, a brief annotation here) of the quality of a set of science articles in the English Wikipedia is currently being written up for classical paper-style publication but the preliminary results indicate that "[t]here is a subset of reliably helpful science articles on the English Wikipedia for outreach, teacher training, and general science education" (slide shown at 29:35min in the video). However, the distribution of the set of articles was skewed towards the Good Article and Featured Article classes, which constituted only 2% of the English Wikipedia at the time of investigation, and it did not include articles in the humanities (scheduled to come next). Further information on academic studies about Wikipedia is available via these two Wikipedia pages.
The larger Wikipedias have a serious problem with vandalism: take an article of your choice and look in its history page for reverts - most of them will be about changes like this or worse. This is less of an issue with more popular topics for which large numbers of volunteers may be available to correct "spammy" entries but it is probably fair to assume that most researchers value their time too much to spend it on repeatedly correcting such information if it had already been correctly entered once. Other problems with covering scientific topics at Wikipedia include the notability criteria which have to be fulfilled to avoid an article being deleted, and the rejection of "original research" in the sense of not having been peer reviewed before publication. Peer review is indeed an important aspect of scholarly communication, as it paves the way towards the reproducibility that forms one of the foundations of modern science. Yet we know of no compelling reason to believe that it works better before than after publication (doing it beforehand was just a practical decision in times when journal space was measured in paper pages).
Fortunately, the Wikipedias are not the only wikis around, and amongst the more scholarly inclined alternatives, there are even a number of wiki-based journals, though usually with a very narrow scope and/or a low number of articles. In contrast, Scholarpedia (which has classical peer review and an ISSN and may thus be counted as a wiki journal, too), OpenWetWare, Citizendium and the Wikiversities are cross-disciplinary and structured (and of a size, for the moment) such that vandalism and notability are not really a problem. With minor exceptions, real names are required at the first three, and anybody can contribute to entries about anything, particularly in their fields of expertise. None of these is even close to providing the vast amount of context existing in the English Wikipedia, but the difference would be much less dramatic if the latter were broken down to scholarly useful content, as discussed above. Out of these four wikis, only OpenWetWare and some Wikiversities (here counted as one) currently allow for original research to be published on their site; in the case of OpenWetWare, this is indeed the main purpose. Furthermore, a number of more specialized scholarly wikis exist (e.g. WikiGenes, the Encyclopedia of Earth, the Encyclopedia of the Cosmos, or the Dispersive PDE Wiki) which can teach us about the usefulness of wikis within specific academic fields.
We will not dwell on any details here, but since new suggestions about combining elements of wiki and scholarly environments keep coming in, e.g. in the form of a Wikipedia journal, we will list a number of features we deem desirable for future scholarly wikis, derived from experience with existing ones. These include, in no particular order:
- Some system of peer review (basically, any wiki allows comments, annotations or formal reviews on talk pages of users or articles, but these ratings should be featured more prominently; templates like those visualizing article status at Citizendium may help with that). This may be as simple as not allowing individuals to add information to Citizendium when the only available support is their own non-reviewed research published at OpenWetWare; the real name policy will minimize misuse.
- Uploadability of all kinds of media that traditionally (if you can call a habit that barely is a decade old a tradition already) went along with paper-based publications as "supporting online information" (which would be easily integrated in an all-online non-printable article with no sharp space limitations).
- Stable versions for content that has undergone peer review (like the Approved Articles at Citizendium, or the results of the double phase review model at the OA journal ACPD/ACP), along with draft versions for anything else (including improvements to and updates of previous stable versions); like any non-protected page at the Wikipedias, these draft versions can serve as a playground, though a real-name policy would probably make it a more educational one
- Search engines that integrate or otherwise compare favourably with major scholarly search engines on the web (the already mentioned Google Scholar and PubMed as well as, say, the BioText Search Engine that searches Open Access text and images), also in terms of the updating frequency.
- Pan-disciplinary scope, with consistent disambiguation of specialist terms (mainly but not fully achieved at Citizendium)
- Separate namespaces for references (already in use at the Dispersive PDE Wiki and the French Wikipedia, in test at Citizendium); as an aside, this would open up ways for new citation metrics, via the What links here function
- Separate namespaces for original research. Encyclopedic endeavours need expert input. This is most likely to be achievable if the encyclopedic activities can be integrated with the experts' workflow, e.g. via platforms like OpenWetWare.
- Attributability of contributions (automatically realized, though not in the traditional scholarly way, in any wiki with a real name policy like that at Citizendium, via the User contributions function; special arrangements exist at Scholarpedia and WikiGenes; OpenWetWare does allow nicknames but real names prevail; the Wikiversities have basically the same user name policy as the Wikipedias)
- Easy download of selected sets of pages for local archiving or analysis.
- Licenses that allow unrestricted reuse and derivative work if the original source is properly acknowledged (typically CC-BY-SA or the older GFDL, both of which have been made compatible now)
- Resource-effective design (see also discussions on the energy use of the internet and individual websites). This overview may also help in working out an ecological footprint scheme applicable to research, as described previously.
- Integration with the non-scholarly world (certainly achieved in the Wikipedias and Citizendium), particularly with students (cf. the Eduzendium initiative at Citizendium) and non-English contents
- Automation of the formatting, as already common in non-wiki environments, e.g. with LaTeX templates, for which collaborative editing environments exist too. None of the wikis we know comes close to that, although templates are heavily used at the various Wikipedias and, to a lesser extent but in a more consistent manner, at Citizendium; they seem to be rather rarely used on smaller or more specialized wikis. The same applies to references, though automated wikification has already progressed considerably here, despite the lack of wiki export functions at publishers' sites (or of suitable XML-to-wiki converters for those who provide XML)
- Integration with mind maps (which structure knowledge) and databases (which harbour bits of knowledge that are hard to interpret without a broader context).
- The article's main page is a stable version, approved by an author with expertise in that field
- Next comes the Talk tab that leads to the discussion page, as per default in any wiki
- The Draft tab leads to the editable version (this only applies for articles that have already been approved; in others, the main page is editable)
- The Related Articles tab roughly corresponds to "see also" in the Wikipedias but is more usefully structured for navigation and somewhat replaces the categories which are heavily used in Wikipedia but only to a limited extent at Citizendium
It is interesting to see that these and other individual subpages largely complement existing social networking tools and have thus the potential to replace them (or to be replaced by them), at least for scholarly purposes:
- The Bibliography subpage is a context-based alternative to CiteULike, Zotero, BibSonomy and other reference managers, possibly in conjunction with Open Library, scholarly search engines and tools like Scribd, Mendeley or Papers. One problem wikis cannot solve is that of access to paper-based research publications, but due to the current spread of Green and Gold Open Access initiatives, this is likely to change in the next few years anyway if authors decide to follow suit in a consistent manner and act accordingly for their own contributions.
- The External Links subpage is a context-based alternative to conventional social bookmarking as known from delicious and simpy
- Additional subpages could be tailored to meet the needs of individual categories of articles (e.g., properties of chemical elements, genes, stellar constellations etc.) or more general scholarly needs (e.g., peer review, slides, code, protocols, or bot-generated transcripts from video lectures)
Besides, User pages may provide context-based alternatives to individual pages at different networking sites, and possibly even to blogs like this one, while the Recent changes page could turn into an alternative for friendfeed, with items on your Watchlist (if you are logged in) equivalent to friendfeed rooms or personal feeds you are subscribed to.
For the record, this social networking component of Citizendium was already discussed three years ago, prior to its official launch and thus at a time when many of its current structures and their implications were not known yet.
Finally, and importantly, the easy availability of context (once the system is reasonably well adopted by scholarly communities, and the encyclopedic corpus thus reasonably complete) would make it easier to guide expert attention and thus to identify obvious gaps in current knowledge (e.g., by means of an expert evaluation of items listed on the Most Wanted page). Science funders (or indeed anyone) could then put forward research proposals on such topics (e.g., via a Calls subpage, FundScience, InnoCentive, Mechanical Turk or by more traditional means). And while we are at it, we think science funders, job committees and review panels would profit from familiarizing themselves with the workings of collaborative platforms like wikis, particularly the aspects relevant to reliability, attribution, and outreach. Your organization, company, university, research subject or methodology probably has a page on some of the wikis described here. Take a look at it, along with its history and talk pages, and you will almost certainly find something that needs improvement.
To sum up, the still fledgling Citizendium currently seems to be the closest match for a cross-disciplinary scholarly wiki anchored in the real world. Independent of whether it will allow original research to be posted in the future or not, this essential function in scholarly communication can be fulfilled by OpenWetWare (indeed, a similar separation of powers is one of the healthiest elements of most democracies). If widely adopted, this would entail a major shift in the way research is being done and communicated, towards what has come to be known as open science. As a side effect, commercial publishers would have to look for new things to publish, other than original research (non-commercial publishers like scholarly societies may, after the usual period of resistance, see more advantages than disadvantages in the groupware model). Reviews at different levels of expertise may be one option, as may tutorials or other learning tools. All of this could be undertaken via some intelligently structured sets of groupware, too, depending on the incentives involved (in fact, such reviews are the scope of Scholarpedia). A side effect for researchers would be that they could use the author fees, page and figure charges and all other sums currently spent on publishing a paper for other purposes, including the maintenance of shared public knowledge environments of the kind described here.
Of course, there are potential problems with such an enormous concentration of knowledge (e.g. for attacks and misuse, especially in relation to an international author identification that is currently being discussed). The obvious solutions are appropriate mirroring and, beyond that, transparency. Similar concerns would apply to a journal like PLoS ONE that does not have a scope in the traditional paper-limited sense mentioned above, yet one year after launch, it is doing pretty well. If it were to adopt a symbiosis with a suitable wiki in a way similar to the RNA Biology initiative (which requires authors to submit "a short manuscript, a high quality Stockholm alignment and at least one Wikipedia article"; emphasis added), it might do even better. The first steps in this direction have already been taken.
This blog post was written and structured collaboratively by Daniel Mietchen, Claudia Koltzenburg and François Dongier, with further input received via the FriendFeed thread embedded below. As you can infer from the mindmap, the originally two-part series is now going to be continued, and as always, you are warmly invited to join the drafting of the next part, which will deal with the implications of the paper-to-digital transition for research funding.
This text and the associated mindmap are available under a CC-BY license.