Three avenues to support open approaches to science - the cases of funding, data acquisition and knowledge curation


Today, I received an email from the Open Society Institute's Information Initiative:

We'd like to ask you to think about two to three emerging opportunities for--or threats to--open society institutions and values that you are aware of which are not receiving sufficient attention and where a funder like OSI could usefully intervene. We encourage you to suggest issues that are still very much on the horizon; there need not be an obvious solution to the points you raise.

I know that the OSI has run, and still runs, many interesting projects (including in regions and cultures normally off the radar, some of them dear to me), but I have often (and not just jokingly) read its abbreviation as "Open Science Institute". So I take the liberty of narrowing the space of possible replies by concentrating on openness in science, which is in any case the most prominent topic on my blog.

My intuitive response would be that several inefficiencies in our current systems of knowledge creation and curation cry out for a test run of open approaches. I am not sure whether I can distill this down to three issues, but let's get started by listing some ideas, and I hope that you can then help me structure and adapt them appropriately. To facilitate the discussion, I will use Cameron's depiction of the research cycle:

Normally, none of these steps is performed in the open, not even publishing (most scientific journals are still behind subscription barriers, though big changes are ahead). So let's start at the idea stage: any idea that is neither obvious nor obviously stupid needs time and resources (particularly for experimental studies) to be developed and tested. Where do the resources come from? Ideally, from a mixture of

  1. Some sort of no-strings-attached baseline grants for established researchers (with no peer review before the money is awarded, but with public peer review from then onwards) or for winners of scientific competitions (which require some form of review, best done fully in the open), to try out ideas at their very early stages and collect preliminary data
  2. The classical "calls for proposals" schemes, in which funders define the rules and scientists bend and squeeze their research proposals to fit in (ideally in public, so as to avoid multiple reinventions of the wheel), to develop ideas through to their realization
  3. Some not-so-classical "calls for funders" schemes, in which scientists lay out their best ideas (obviously in public), and funders (possibly including scientists with baseline grants) can choose which ones to fund, and to what extent.

While non-open (and lengthy) variants of schemes 2 and 1 are currently the norm, and open approaches would no doubt make these schemes considerably more efficient (e.g. by avoiding multiple reinventions of the wheel, but also through quicker error correction and enhanced interaction among participants), I think the highest impact can currently be achieved by actively supporting developments in the direction of scheme 3. That scheme is only beginning to be explored, though public peer review of manuscripts (at the opposite side of our research cycle) is gaining ground and may serve as a model. (To avoid confusion: in most contexts of the research cycle, "open" means "in public", while in a peer review context, "public" is used instead, because "open" has kept its pre-web meaning of "revealing the reviewer's identity", which is not necessary for the web-public system to function.) In summary, my first recommendation would be to specifically support collaborative open research funding environments, in which research proposals and funding decisions, along with their discussions, take place in public. If that is too far-fetched, then a scientifically rigorous test of the efficiency of the current peer review system (especially for grants, but also for manuscripts) would already be an important step forward. A proposal in this direction was sketched out here a year ago, ready to be fleshed out for the upcoming deadline of April 30. Anyone interested?

OK, suppose we now have both a developed idea and sufficient funds to put it into practice. The next step in which open approaches can make a difference is then recording data and making them available. Coincidentally, a one-day symposium dedicated to precisely that took place on Saturday, attended by about 60 people in person and about 400 via live streaming (another form of open sharing). One of the presentations there, by Jean-Claude, focused on the use of free (but not always open) tools for recording, visualizing, analyzing and archiving data in public.

I have recently started to move some parts of my research notes online, using yet another tool: OpenWetWare, which is free and based on open-source software, though not all of its customizations seem to have been made public. Via its Recent Changes page, I got to know another newcomer, and we are currently exploring to what extent we could join forces where our research overlaps. Whereas Jean-Claude bundled together many different tools to handle his data, OpenWetWare is closer to a one-stop shop that can be integrated with some typical research workflows in the biomedical sciences. Still, it is not quite there yet. Along these lines, a research proposal was drafted in public and submitted to the DFG nine months ago, so a decision is hopefully not far off. If such data-centric aspects of research could be linked with people-centric aspects by means of social networks, this would allow us to move toward more collaborative modes of research and away from ill-suited journal-level metrics for evaluating individual scientists, departments or institutions (and even beyond article-level metrics, which are an important but in themselves insufficient step in this direction). So far, though, few of the so-called "Facebooks for scientists" (including Mendeley) provide integration with scientific workflows, few of them (including the platform that hosts this blog) are entirely open source, none meets both criteria, and none is widely used to discuss research as it happens.

My second recommendation is thus to support collaborative open research environments in which data (however defined, even if they are equations) are recorded publicly and in a citable way (under a license compatible with the Panton Principles) as soon as possible after they have been gathered. This would shorten the time between data acquisition and formal publication and drastically increase the amount of data, and with it reproducibility, available for scholarly communication. Ideally, such environments would integrate standardized public tools for processing public data and allow for social filtering of information on an open platform (open also in the sense that non-scientists are welcome).

Supposing we now have some relevant data processed in such an open research environment, the next step would be to inform the world about it. Current practice is to write up an article (still called a "paper", and a PDF is not much different from one), and in many disciplines, communication of results before formal publication is kept to a minimum (usually limited to conferences). This means that by the time the results are "published" (which closes one turn of the cycle), they are already outdated in fast-moving fields. It also means that new methods or findings are reported in a container format that is poorly suited to establishing relevant context, because it integrates very inefficiently with the rest of existing human knowledge; wikis, for instance, achieve this much more readily. My third recommendation, hence, is to support collaborative open knowledge environments: collaborative efforts to collect, structure and update knowledge and to make it conveniently accessible to the public for free. If these knowledge environments are reasonably complete and accurate (as well as appropriately licensed), gaps in them (and cases of duplication or ambiguity) could be identified more easily than in the current literature and thus serve as seeds for new research proposals, possibly even in an automated fashion. Again, such environments with version control would allow log information to be reused for evaluation purposes, in a way far more fine-grained and scalable (and more open to newcomers, especially young scientists) than anything possible at the journal level, particularly when contributions to open knowledge environments can be mashed up (transparently, in public) with contributions to the other open environments.
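To illustrate what automated gap detection might look like in a wiki-style knowledge environment, here is a minimal Python sketch. It assumes a hypothetical set of pages stored as a title-to-text mapping with [[double-bracket]] links (the page titles and contents are made up for the example). It lists link targets for which no page exists yet (candidate gaps) and titles that collapse to the same normalized form (candidate duplicates). A real platform would need far more careful normalization and human disambiguation.

```python
import re

# Hypothetical wiki-style knowledge environment: page title -> page text.
# Links to other pages are written as [[Target]] or [[Target|label]].
pages = {
    "Protein folding": "Driven by the [[Hydrophobic effect]] and assisted by [[Chaperones]].",
    "Chaperones": "Proteins such as [[Hsp70]] that assist [[Protein folding|folding]].",
    "chaperones ": "A near-duplicate stub of another page.",
}

# Captures the link target, ignoring an optional "|label" part.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def normalize(title: str) -> str:
    """Collapse whitespace and case so trivially different titles compare equal."""
    return " ".join(title.split()).casefold()


def wanted_pages(pages: dict[str, str]) -> set[str]:
    """Link targets for which no page exists yet: candidate knowledge gaps."""
    existing = {normalize(title) for title in pages}
    return {
        target.strip()
        for text in pages.values()
        for target in LINK.findall(text)
        if normalize(target) not in existing
    }


def duplicate_titles(pages: dict[str, str]) -> list[list[str]]:
    """Groups of titles that normalize to the same form: candidate duplicates."""
    groups: dict[str, list[str]] = {}
    for title in pages:
        groups.setdefault(normalize(title), []).append(title)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(sorted(wanted_pages(pages)))  # ['Hsp70', 'Hydrophobic effect']
print(duplicate_titles(pages))      # [['Chaperones', 'chaperones ']]
```

Each wanted page could then be proposed as a seed for new work, and each duplicate group flagged for merging or disambiguation.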

What would you suggest if you had to single out three major avenues in which open institutions or values could be supported with a long-term perspective, starting soon?

The Friendfeed part of the discussion is embedded below:



philacour


Nice post

First, Cameron's image is enlightening, but some extra links could be added, from one element to another, besides the circular movement.
Second, the term "data" is overrated and should be replaced by the concept of the document, in particular when one takes semantics seriously. The idea of separating data from their "interpretation" (e.g. the questions you ask of them, the way you use them) is more than questionable. I'll post more about this point, which is crucial, especially for the cultural sciences (human and social sciences plus humanities).
Finally, and more importantly concerning the OSI, what do we mean by "Open"? Stallman, for instance, draws a clear distinction between free software and open source, but what is at stake with Open Knowledge (a term I would prefer to use instead of the not-so-open-to-human-and-social-sciences and maybe-a-bit-positivist "Open Science")?

Philippe Lacour
Postdoctoral researcher (Marie Curie Fellowship), Université Libre de Bruxelles
Associate researcher, Centre Marc Bloch (Berlin)

Klaus-Peter Speidel

One big issue for Open

One big issue for Open Science (as for all content on the internet) is
the large number of things being published and the risk of missing the
most interesting ones.

There's a big risk of losing sight of what's really interesting if
there's no curation on a website.

So one issue is curation mechanisms. They should eventually be reviewed
and classified.

It would be interesting to have case studies on how Open Communities
(have) react(ed) to clearly valuable posts. This would help to define
best practices in terms of (community) curation.

A danger for online journals with edited content (but maybe you don't
call this open?) is that the power is simply shifted from renowned
scientists in journals to less well-trained but highly motivated and
tech-savvy curators on websites.

Are voting processes efficient and effective to make sure that the
best posts emerge and get more attention than less interesting ones?

One anecdotal but communicable way to start this would be to get a
scientist whose paper has been accepted by an authenticating journal
to publish it under another name on an Open Science site, to see how
the community reacts, whether it earns approval, what returns (s)he
gets, etc.

To tell you the truth: I'm very scared of committees of online editors,
but also scared of community decisions (public voting on the notability
of publications). I think both imply risks of bias (look, for example,
at what sort of stories get featured on Slashdot).

It might turn out that quality prevails on scientific websites
even though this doesn't happen on more general websites.

It might be that people are naturally more concerned with keeping the
quality up on this sort of site.

Something that could indicate that this is the case: on our site, we
never get any stupid or spam questions or submissions for our R&D
problem-solving competitions.

We think that three things account for this:

1. The high technicality of the problems,
2. The fact that you need to create an account and log in (but this is
also the case on Slashdot)
3. The fact that solutions are, sorry to say that in this context, not
open, and that your question or solution has no chance of being seen
by too many people

Klaus-Peter Speidel, VP Concepts & Communications

daniel

Re: Reflections

1) The post (and also the presentation in which the image originally appeared) was meant to simplify the matter, and I think the image does a good job of helping with that.
2) I agree that data are not really data when their provenance is not attached to them; that's why open notebooks would be so cool. In the semantic context, however, "object" may be better than "document".
3) I agree that Open Research is a better label, though at the moment, there is little difference between the two, given that Open approaches are even less common outside science in the narrow sense. For a discussion of the matter from that perspective, see here.

daniel

Re: One big issue

Regarding social filtering, see this post.

There are several karma schemes around (with StackOverflow providing a good example) that already go in the direction of establishing trust in online communities.

Not publishing the solutions may well be central to the business model of your site, but publishing them (perhaps after some spam check) would be a greater service to the communities concerned.

philacour

What is it that is open?

Hi Daniel,
Could you expand on "object" being better than "document"?
Also, I wouldn't speak of Open Research, but of Open Culture: research, education, artistic creation... You claim that open approaches have less relevance outside research, but on what grounds, exactly? See, for instance, the artistic use of free licenses (Creative Commons and the like) for images, music, video...

Philippe Lacour
Postdoctoral researcher (Marie Curie Fellowship), Université Libre de Bruxelles
Associate researcher, Centre Marc Bloch (Berlin)

daniel

Open are the doors to knowledge and its creation


To allow knowledge to advance, claims must be referenceable, so I was thinking of systems like the Digital Object Identifier as the prototypical way in which we ought to handle not just formally published articles, but basically anything that researchers do and that can be recorded: give everything a persistent identifier (or Uniform Resource Identifier), and it won't really matter whether you call that resource an "object" or a "document". As a side effect, multiple ways to assess the performance of individuals or groups of researchers will become possible, especially by taking into account the relationships between such resources.
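A toy illustration of this idea follows as a minimal Python sketch. The `Registry` class, its methods and the `urn:example:research` prefix are hypothetical and do not correspond to the DOI system or to any real registration agency. The sketch mints an identifier for any kind of research output, records typed links between identified resources, and derives one simple relationship-based measure: how often each creator's outputs are referenced, whether they are articles, datasets or notebook entries.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Registry:
    """Toy registry (hypothetical): mints persistent identifiers and records typed links."""
    prefix: str = "urn:example:research"
    resources: dict = field(default_factory=dict)  # identifier -> metadata
    links: list = field(default_factory=list)      # (source, relation, target) triples
    _serial: count = field(default_factory=lambda: count(1), repr=False)

    def mint(self, kind: str, creators: list[str], title: str) -> str:
        """Assign a new identifier to any recordable research output."""
        pid = f"{self.prefix}:{next(self._serial)}"
        self.resources[pid] = {"kind": kind, "creators": tuple(creators), "title": title}
        return pid

    def link(self, source: str, relation: str, target: str) -> None:
        """Record that one identified resource refers to another."""
        if source not in self.resources or target not in self.resources:
            raise KeyError("both ends of a link need an identifier first")
        self.links.append((source, relation, target))

    def references_received(self) -> Counter:
        """Count incoming links per creator, whatever kind of resource was referenced."""
        tally = Counter()
        for _source, _relation, target in self.links:
            for person in self.resources[target]["creators"]:
                tally[person] += 1
        return tally


reg = Registry()
data = reg.mint("dataset", ["alice"], "Raw spectra")
note = reg.mint("notebook entry", ["alice", "bob"], "Baseline correction")
essay = reg.mint("article", ["carol"], "Review")
reg.link(note, "uses", data)
reg.link(essay, "cites", note)
reg.link(essay, "cites", data)
print(data)                       # urn:example:research:1
print(reg.references_received())  # Counter({'alice': 3, 'bob': 1})
```

Because every output gets an identifier, the same tally works for datasets and notebook entries as for articles; weighting links by relation type would be an obvious (and debatable) refinement.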

I did not claim (nor do I now) "that Open approaches have less relevance outside research"; I merely stated that open approaches within academia are more common in the (data-intensive) natural sciences than in other fields. I admit that I might be wrong about that (we lack data on the issue), and I agree that open approaches to science and research have to be put into the perspective of a society that is becoming more open in many different ways, including those you mention.