OpenWetWare

Slides are up - on the wiki, that is

Later today, I will attend the Semantic Web Day in Leipzig and give a talk on Integrating wikis with scientific workflows. Perhaps appropriately for the topic, I prepared not slides but a set of wiki pages.


Today was a strategic day, it seems


 

In particular, I came across the Wellcome Trust's Strategic Plan 2010-20 and ICSU's Strategic Plan 2012-2017.

Furthermore, as a follow-up to our previous conversations, Janet Haven from the Open Society Institute's Information Initiative sent me some supplementary questions about their strategy (in which open science may or may not play a role, though it is now more or less short-listed as a potential major strategic element). I will briefly reflect on them here before passing the ball to you.

Finally, a major scientific society asked me for input about the likely advantages and drawbacks of allowing, by default, all content of the scientific sessions of their conferences to be broadcast live in any medium, and about whether it would be sensible to make this a standard requirement in their contracts with the organizers of upcoming conferences.

I find the last item a bit daunting for tonight, so I will just link to a blog post on a related discussion (that of how to signal which way of broadcasting a conference is OK) and invite your comments, so that, hopefully, I can send them a useful reply within a few days.

Image via Phil, depicting a game of Shogi (将棋).


Three avenues to support open approaches to science - the cases of funding, data acquisition and knowledge curation

Today, I received an email from the Open Society Institute's Information Initiative:

We'd like to ask you to think about two to three emerging opportunities for--or threats to--open society institutions and values that you are aware of which are not receiving sufficient attention and where a funder like OSI could usefully intervene. We encourage you to suggest issues that are still very much on the horizon; there need not be an obvious solution to the points you raise.

I know that the OSI has had, and still has, many interesting projects running (also in regions and cultures normally off the radar, including some of those dear to me), but I have often (not just jokingly) taken its abbreviation to stand for "Open Science Institute", and so I take the liberty here of narrowing the space of possible replies by concentrating on openness in science, which is anyway the most prominent topic of my blog.

My intuitive response would be that several inefficiencies in our current systems of knowledge creation and curation cry out for a test run of open approaches. I am not sure whether I can distill this down to three issues, but let's get started by listing some of the ideas, and I hope that you can then help me structure and adapt them appropriately. To facilitate the discussion, I will resort to Cameron's depiction of the research cycle:

The journal scope in focus -- putting scholarly communication in context

Since the advent of printed scholarly periodicals in the 17th century, context in scientific communication has mainly been established by providing each of these publication venues (now collectively referred to as journals) with a scope, typically in terms of topics or methods covered, or with respect to a perceived threshold in newsworthiness.

Besides establishing context, the scope also defined the audience -- and thus indirectly the number of printed copies, their pricing and their distribution amongst individuals and institutions -- as well as the criteria manuscripts had to meet in order to be considered for publication. Consequently, given the scope of a particular journal, knowledge about specialist terms (which may describe completely non-congruent concepts in different fields), methodologies, notations, mainstream opinions, trends or major controversies could reasonably be expected to be widespread amongst its audience, which reduced the need to repeat the same things over and over again. Interestingly, redundancy is nonetheless still quite visible, especially in the abstracts and in the introduction, methods and discussion sections, often in a way characteristic of the authors. This is why services like eTBLAST and JANE can make qualified guesses about the authors of a particular piece of text, with good results if some of the authors have many papers in the respective database (mainly PubMed) and if they have not changed their individual research scope too often in between.

Of course, there were side effects: a manuscript well adapted to the scope of one particular journal is often not very intelligible to someone outside its intended audience, which hampers cross-fertilization with other research fields (we will get back to this below). When paper is the sole medium of communication, not much can be done about this limitation, and we got so used to it that few would perceive it as a limitation at all. However, the times when paper alone reigned over scholarly communication have certainly passed.

So, in principle, the online version of a manuscript could link directly to any appropriate source of information (even blogs, for that matter, if no better source is available or accessible; see here for an example). In current practice, however, linking is usually achieved paper-style, i.e. indirectly, via a list of references which itself is often not linked to online versions (let alone openly accessible ones) of the references in question, even though identifiers like DOI and SRef have been around for about a decade now, and International Standard Book Numbers for longer still.

The hampering of cross-field fertilization mentioned above is crucial with respect to interdisciplinary research projects, digital libraries and multi-journal (or indeed cross-disciplinary) bibliographic search engines (e.g. Google Scholar), since these have dramatically increased the likelihood of, say, a biologist stumbling upon a not primarily biological source relevant to her research (think shape quantification or growth curves, for instance). What options do we have to integrate these cross-disciplinary hidden treasures with the traditional intra-disciplinary background knowledge?

Interestingly, lack of context is also a consistent feature of most "Facebooks for scientists" (including ways.org which hosts this blog) - in fact, the whole set of scholarly pages on the www is the appropriate network for researchers but so far it is not optimally connected, particularly because formal scholarly communication has not yet fully hatched from the structures it had during the paper-based era (see also this nice overview of the current situation). Just imagine if all authors currently writing up manuscripts about a subject were instead to coordinate their efforts by collaborating on a single but detailed and balanced citable reference in which the topic would be described in and linked to all relevant contexts, updated as new research results pass peer review. Of course, this would shift the focus away from periodicals (and, in passing, render things like a journal's scope and Impact Factor superfluous), which is likely to meet resistance from the publishing establishment.

Groupware comes to mind in this regard, and wikis in particular: they allow one to aggregate and interlink diverse sets of knowledge in an online-accessible manner, basically for free. By now, the classic examples are the Wikipedias, and one scientific journal, RNA Biology, has already announced that it requires an introductory Wikipedia article for papers it is to publish on RNA families, an idea that has recently spurred an ongoing debate on the merits of such an initiative and of doing it with Wikipedia.

An investigation (video lecture by Bill Wedemeyer here, my brief annotation here) of the quality of a set of science articles in the English Wikipedia is currently being written up for classical paper-style publication, but the preliminary results indicate that "[t]here is a subset of reliably helpful science articles on the English Wikipedia for outreach, teacher training, and general science education" (slide shown at 29:35min in the video). However, the set of articles investigated was skewed towards the Good Article and Featured Article classes, which constituted only 2% of the English Wikipedia at the time, and it did not include articles in the humanities (they come next).

Furthermore, the larger Wikipedias have a serious problem with vandalism: take an article of your choice and look in its history page for reverts - most of them will be about changes like this or worse. This is less of an issue with more popular topics, for which large numbers of volunteers may be available to correct spammy entries, but it is probably fair to assume that most researchers value their time too much to spend it on repeatedly correcting information that had already been entered correctly once. Other problems with covering scientific topics at Wikipedia include the notability criteria, which have to be fulfilled to keep an article from being deleted, and the rejection of "original research" in the sense of not having been peer reviewed before publication. Peer review is indeed an important aspect of scholarly communication, as it paves the way towards the reproducibility that forms one of the foundations of modern science. Yet I know of no compelling reason to believe that it works better before than after publication (doing it beforehand was just a practical decision in times when journal space was measured in paper pages).

Fortunately, the Wikipedias are not the only wikis around, and amongst the more scholarly inclined alternatives there are even a number of wiki-based journals, though usually with a very narrow scope and/or a low number of articles. By contrast, Citizendium, Scholarpedia (which has classical peer review and an ISSN and may thus be counted as a wiki journal, too), OpenWetWare and the Wikiversities are cross-disciplinary and structured (as well as sized, for the moment) such that vandalism and notability are not really a problem (with minor exceptions, real names are required at the first three, and anybody can write about anything, particularly their fields of expertise). None of these comes even close to providing the vast amount of context available in the English Wikipedia, but they might, perhaps, if the latter were pared down to its scholarly useful content, as discussed above. Of these four wikis, only OpenWetWare and some Wikiversities (here counted as one) currently allow original research to be published on their sites; in the case of OpenWetWare, this is indeed the main purpose.

Further, a number of more specialized scholarly wikis exist (e.g. WikiGenes, the Encyclopedia of Earth, the Encyclopedia of the Cosmos, or the Dispersive PDE Wiki) which can teach us about the usefulness of wikis within specific academic fields. I will not dwell on details here but instead list a number of features I deem desirable for future scholarly wikis, derived from experience with existing ones. These include, in no particular order:

  • search engines that integrate or otherwise compare favourably with major scholarly search engines on the web (the already mentioned Google Scholar and PubMed as well as, say, the BioText Search Engine that searches Open Access text and images)
  • pan-disciplinary scope, with consistent disambiguation of specialist terms (mainly but not fully achieved at Citizendium)
  • some system of peer review (basically, any wiki allows users to leave comments, annotations or formal reviews on the talk pages of users or articles, but these ratings should be featured more prominently; templates like those visualizing article status at Citizendium may help with that); this may be as simple as not allowing individuals to add information to Citizendium when the only available support is their own non-reviewed research published at OpenWetWare, and the real name policy will minimize misuse
  • the uploadability of all kinds of media (including videos, which are blocked at the Wikipedias but allowed at Citizendium, for instance, and form the scope of the Journal of Visualized Experiments) that traditionally (if you can call a habit that is barely a decade old a tradition) went along with paper-based publications as "supporting online information" (which would be easily integrated into an all-online article with no strict space limitations)
  • stable versions for content that has undergone peer review (like the Approved Articles at Citizendium), along with draft versions for anything else (including improvements to and updates of previous stable versions); like any non-protected page at the Wikipedias, these draft versions can serve as a playground, though a real-name policy would probably make it a more educational one
  • a separate namespace for references (already in use at the Dispersive PDE Wiki and the French Wikipedia, in test at Citizendium); as a side benefit, this would open up ways for new citation metrics, via the What links here function
  • attributability of contributions (automatically realized, though not in the traditional scholarly way, in any wiki with a real name policy like that at Citizendium, via the User contributions function; special arrangements exist at Scholarpedia and WikiGenes; OpenWetWare does allow nicknames but real names prevail; the Wikiversities have basically the same user name policy as the Wikipedias)
  • easy download of selected sets of pages for local archiving by individual researchers
  • licenses that allow unrestricted reuse and derivative work if the original source is properly acknowledged (typically CC-by-SA or the older GFDL, both of which are hopefully going to be compatible soon)
  • resource-effective design (see also discussions on the energy use of the internet and individual websites)
  • integration with the non-scholarly world (certainly achieved in the Wikipedias and Citizendium), particularly with students (cf. the Eduzendium initiative at Citizendium) and non-English contents
  • automation of formatting, as already common in non-wiki environments, e.g. with LaTeX templates (none of the wikis I know comes close to that, although templates are heavily used at the various Wikipedias and, to a lesser extent but in a more consistent manner, at Citizendium; they seem to be rather rarely used on smaller or more specialized wikis); the same applies to references, though automated wikification has already progressed considerably here, despite the lack of wiki export functions at publishers' sites (or of suitable XML-to-wiki converters for those who provide XML)
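To make the "What links here" idea from the reference-namespace item a bit more concrete, here is a minimal Python sketch. It assumes that page texts are available as plain wikitext and that reference pages live in a namespace called `Reference:`; that prefix, the function names and the example pages are all hypothetical, and the wikis mentioned above may use different conventions. It simply counts, for each reference page, how many distinct articles link to it, which is one conceivable citation metric rather than an established one.

```python
import re
from collections import defaultdict

# Hypothetical namespace prefix for reference pages; real wikis may differ.
# Matches [[Reference:Title]] and [[Reference:Title|display text]].
REF_LINK = re.compile(r"\[\[\s*Reference:([^\]|]+?)\s*(?:\|[^\]]*)?\]\]")


def incoming_reference_links(pages):
    """Map each reference page to the set of article titles linking to it,
    i.e. a toy version of a wiki's 'What links here' view."""
    citing = defaultdict(set)
    for title, wikitext in pages.items():
        for target in REF_LINK.findall(wikitext):
            citing[target.strip()].add(title)
    return citing


def citation_counts(pages):
    """Count distinct citing articles per reference page
    (an article linking to the same reference twice counts once)."""
    return {ref: len(titles) for ref, titles in incoming_reference_links(pages).items()}


if __name__ == "__main__":
    # Hypothetical example pages, for illustration only.
    pages = {
        "Biology": "See [[Reference:Darwin 1859]] and [[Reference:Mendel 1866|Mendel]].",
        "Evolution": "Classic: [[Reference:Darwin 1859]]. Again: [[Reference:Darwin 1859]].",
    }
    print(citation_counts(pages))  # {'Darwin 1859': 2, 'Mendel 1866': 1}
```

Counting distinct citing pages, rather than raw links, keeps a single article from inflating a reference's score by linking to it repeatedly.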

One of the most useful templates in use at Citizendium is that for subpages (open the Biology article in a separate window to see what this is about):

  • The article's main page is a stable version, approved by an author with expertise in that field
  • Next comes the Talk tab that leads to the discussion page, as per default in any wiki
  • The Draft tab leads to the editable version (this only applies to articles that have already been approved; in others, the main page is editable)
  • The Related Articles tab roughly corresponds to "see also" in the Wikipedias but is more usefully structured for navigation and somewhat replaces the categories, which are heavily used in Wikipedia but only to a limited extent at Citizendium
  • There are further subpages: Bibliography for further reading, External Links, Gallery, Video and so on

It is interesting to see that these individual subpages largely complement existing social networking tools and thus have the potential to replace them (or to be replaced by them), at least for scholarly purposes:

  • the Bibliography subpage is a context-based alternative to CiteULike, Zotero, BibSonomy and other reference managers, possibly in conjunction with Open Library, scholarly search engines and tools like Scribd or Papers. One problem wikis cannot solve is that of access to paper-based research publications, but due to the current spread of Green and Gold Open Access initiatives, this is likely to change in the next few years anyway.
  • the External Links subpage is a context-based alternative to conventional social bookmarking as known from delicious and simpy 
  • Additional subpages could be tailored to meet the needs of individual categories of articles (e.g. properties of chemical elements, genes, stellar constellations etc.) or more general scholarly needs (e.g. peer review, slides, code, protocols, or bot-generated transcripts from video lectures)

Besides, User pages may provide context-based alternatives to individual pages at different networking sites, and possibly even to blogs like this one, while the Recent changes page could turn into an alternative to friendfeed, with items on your Watchlist (if you are logged in) equivalent to friendfeed rooms or personal feeds you are subscribed to. For the record, this social networking component of Citizendium was already discussed two years ago, prior to its official launch, and thus at a time when many of its current structures and their implications were not yet known.

Finally, and importantly, the easy availability of context (once the system were reasonably well adopted by scholarly communities, and the encyclopedic corpus thus reasonably complete) would make it easier to guide expert attention and thus to identify obvious gaps in current knowledge (e.g. by means of an expert evaluation of items listed on the Most Wanted page), and science funders could then issue calls for research proposals on such topics (e.g. via a Calls subpage, InnoCentive, Mechanical Turk or by more traditional means). And while we are at it, I think science funders, job committees and review panels would profit from familiarizing themselves with the workings of wikis, particularly the aspects relevant to reliability, attribution and outreach (your organization, company or university probably has a page on Wikipedia - take a look at it, along with its history and talk pages, and you will almost certainly find something to improve).

To sum up, the still fledgling Citizendium currently seems to be the closest match for a cross-disciplinary scholarly wiki anchored in the real world, and regardless of whether it will allow original research to be posted in the future, this essential function in scholarly communication can be fulfilled by OpenWetWare (indeed, a similar separation of powers is one of the healthiest elements of most democracies). If widely adopted, this would entail a major shift in the way research is done and communicated, towards what has come to be known as open science. As a side effect, commercial publishers would have to look for new things to publish, other than original research (non-commercial publishers like scholarly societies may, after the usual period of resistance, see more advantages than disadvantages in the groupware model). Reviews at different levels of expertise may be one option, and tutorials or other learning tools another, but all this could be done via some intelligently structured set of groupware, too, depending on the incentives involved (in fact, such reviews are the scope of Scholarpedia). A further consequence for researchers would be that they could use the author fees, page and figure charges and all the other money currently required to publish a paper for other purposes.

Of course, there are potential problems with such an enormous concentration of knowledge (e.g. attacks and misuse, especially in relation to the international author identification scheme that is currently being discussed). The obvious remedies are appropriate mirroring and, more generally, transparency. Similar concerns would apply to a journal like PLoS ONE, which does not have a scope in the traditional paper-limited sense mentioned above; yet two years after its launch, it is doing pretty well, and my guess is that if it were to adopt a symbiosis with a suitable wiki, in a way similar to the RNA Biology initiative, it might do even better.

As a next step, I wish to go into more detail concerning the relative merits of paper-based and wiki-based scholarly communication. So I started a Wikiversity page on wikis in scholarly communication and invite you to add to it (I chose Wikiversity such that those who object to real name policies may make their voice heard, too, and I think I can deal with spam should it arise there). This overview may also help in working out an ecological footprint scheme applicable to research, as described previously.

I dedicate this post to my granny who passed away last week.


Open science featured in Boston Globe article

The Boston Globe recently published an article on the spreading of open science activities.

Why did you do your PhD - an interview with Larry Sanger, co-founder of Wikipedia and founder of Citizendium

The German network of PhD candidates and Postdocs, Thesis, publishes (in German) a quarterly journal, THESE, on doctoral and postdoctoral matters, mainly in Germany. For the autumn 2008 issue, I conducted an interview with philosopher Larry Sanger, whose postdoctoral activities on the organization of knowledge in projects like Wikipedia, Citizendium and WatchKnow will certainly be of interest to knowledge workers beyond Germany, and so an advance online version of the interview is given here.

Why did you do your PhD, what does this have to do with your current activities, would you do it again?

I did my Ph.D. because I have wanted to earn a living as a philosopher since I was about 17 years old. Until I finished my M.A. I thought I would become a philosophy professor; then I became disillusioned with academia in a way that I imagine is pretty typical. I decided to finish my Ph.D. simply because I was so close to doing so, and just in case I changed my mind.

Having gone through the entire academic credentialing process has helped my career and current activities in many ways. It has acquainted me with the nature and justification of editorial and academic standards, and that has proven to be invaluable in leading reference work projects. I think having specialized in philosophy and in epistemology in particular, as well as in philosophy of law, also has helped me to articulate and defend the particular approach I have to collaborative knowledge production. Of course, the mere degree itself has opened doors and made me seem more credible to some of the people I've been trying to organize.

I would certainly do it again. But I might also have taken a few years off and gotten a B.A. or M.S. in Computer Science as well.

Why should PhD candidates and PhD holders contribute to Citizendium, as opposed to other online encyclopedic projects (Wikipedias, Knol, Encyclopedia of Earth, Scholarpedia, Larousse etc)?

There are many potential reasons why an academic might want to contribute to the Citizendium. I believe most do so because they find its unique mission compelling. What do I mean by that? It is the only project in existence with its configuration of qualities. On the one hand, it is a general, open content encyclopedia, fully collaborative, and open to public contribution. On the other hand, we give experts a general oversight role, and we require real names. This unique combination of policies appeals to those who understand and appreciate the benefits and potential of Wikipedia, but who also understand the drawbacks of Wikipedia's particular system.

In short, the Citizendium may be, currently, the world's best hope for summing up knowledge both freely and credibly in one place. Other projects, such as Knol, Encyclopedia of Earth, and so forth, all have their good points, but they also all have a variety of drawbacks. Perhaps the largest drawback of the other academic-led projects is that very few of them are robustly collaborative. While I can't take the time here to explain my arguments for this, I think that collaboratively produced encyclopedia articles can be far superior to what is produced by individuals. So, while time will tell, I think the Citizendium holds the greatest promise; insofar as others agree with me, they naturally want to be part of something that is so world-changing and so important to spreading knowledge of their fields.

What about non-English sections? And would Citizendium be affected by the recently revised peer review policy at the German Wikipedia?

I'm not able to speak to the revised peer review policy at the German Wikipedia. My understanding is that they are not really engaging in peer review, but making sure that there is not abuse in revisions made by the newest contributors. Simply checking that edits are not vandalism addresses a different problem. It is obviously very far from anything like robust expert involvement or credible peer review.

We do hope to expand into other languages, including German, but it is more than a big enough challenge to get the English Citizendium off the ground very well at this time. The real difficulty will be to find people who will lead the new projects, in other languages, on a full-time basis. We might end up simply announcing a sort of rough franchise of the idea of the Citizendium.

What are the long-term perspectives of integrating encyclopedic projects (which generally operate a "no original research" policy) with scholarly wikis, e.g. of the OpenWetWare type?

I have given that quite a bit of thought, and for a long time I thought that it would be both possible and desirable to pool forces, somehow. Having tried to start the Citizendium as a fork of Wikipedia, however, has given me insight into the special difficulties of incompatible editorial policies and very different communities, or editorial processes. The most profound discovery I think I have made is that content deeply encodes editorial policy, and for that reason it is extremely difficult if not impossible to merge projects that have very different or incompatible editorial policies. But even small differences in editorial policies can have huge effects. So my hopes are not high for usefully combining content projects online, generally. One has only to look at answers.com and other search and reference aggregators, and one gets a sense of what the problem is.

We are, of course, open to people porting content from dormant content projects, but, as you can well imagine, we are not really interested in changing our own editorial policies to make a wholesale merger happen. So any proposal we (or others) might make about a content merger would simply be an invitation to close down their shop and adapt their content to our system. Few if any people will be able to take such an invitation seriously, at least not until we are more credible.

What seems more possible is that content sources might populate Citizendium subpages--pages where you can find different kinds of reference information about a topic.

Citizendium has also launched an educational initiative, Eduzendium. Considering that young researchers near the completion of their PhD are often involved, in overlapping or adjacent periods, with both the student and the teaching side of coursework, is there something special that they might gain from or offer to this project?

Eduzendium has already been successfully demonstrated to be a very innovative, interesting assignment for university students. The task of crafting an excellent, broad introduction to a topic might be easy and boring to the instructor, but to students--especially advanced students--it presents exactly the sort of challenge from which they can learn most. In addition, students whose work is displayed publicly tend to do their best; and they are also sometimes helped by Citizendium authors and editors. You might have heard of instructors assigning work on Wikipedia for college credit. Eduzendium is similar, but we have many, many more topics that are completely open; and our community is far better behaved. In some ways it is a superb venue for public, collaborative writing by advanced students.

Instructors use Eduzendium in a few different ways. For example, you can assign students specific topics, or you can assign groups (or the whole class) to work on a topic. It is quite adaptable and I would strongly encourage your readers to give it a try! You will benefit, and by giving free content to the whole world, many others will benefit along with you.

A good occasion for that would be our Write-a-thons on the first Wednesday of each month or the Workgroup Weeks, starting with Biology Week from September 22-28.

How open can science and research be?

I am wondering how far we can get with "Open X" movements in science and research, and I will combine my musings about this with a recommendation to attend a satellite event at the Euroscience Open Forum 2008 in Barcelona.

First, let's consider how far we have come in terms of opening up the research process:
* Open Access in the narrow sense, i.e. to published or at least peer-accepted research results, is real for a substantial share of research output and rapidly gaining ground (for the most recent updates, click here).
* Open Access to the scholarly review process is gaining ground (public or interactive peer review, e.g. here).
* Open Access to empirical data (Open Data) is moving forward, too.
* Open Access to software (Open Source) is driving many aspects of society, including wikis and many research projects.
* Open Access to encyclopedic knowledge is becoming real on the heels of Wikipedia and Citizendium.
* Open Access to lab notebooks is being experimented with at OpenWetWare.

To sum up, there are not too many aspects of research that currently remain entirely in the dark. They basically boil down to grant writing (an attempt is here) as well as the associated review and grant allocation procedures, bookkeeping (which is partly open in much of Scandinavia, within the wider framework of Open Government), the actual research and data analysis, and writing up the results for publication.

I do not see any technical issues prohibiting complete openness of the whole research cycle, and so I deem it a valid target to aim at, even at the current stage of technology. However, people more involved with the practical implementation of these things may have more complex views on these matters, and so I am glad to see that such topics found their way into the program of ESOF 2008, in the form of a satellite event entitled Collaborating for the future of open science, where experts will discuss them.