Introduction to Science Commons

John Wilbanks
Executive Director, Science Commons

James Boyle
William Neal Reynolds Professor of Law, Duke Law School

August 3, 2006

www.sciencecommons.org

Imagine a Brazilian postdoctoral student driven to cure malaria. She knew she would not be able to do her work in Brazil with the same impact she would have in the United States or Europe (she wouldn't have the resources, or the level of access to journals, tools, and collaborations) so she joined the legions of expatriate scientists in Boston. She is ridiculously talented, and very lucky. She gets a prestigious grant and finds a position at Harvard.

She is working on a protein called glycophorin A. It's a key part of the way malaria infects blood cells. She checks the major literature repository and finds nearly 2000 papers with a glycophorin A search. Her 50% overhead from her National Institutes of Health grant, combined with the grants of the other researchers there, is enough to pay for an elite library with subscriptions to all the journals. So at least she can read them. Yet behind her stand thousands of other scientists and potential scientists from around the world who cannot get access to this material and who thus are lost to her and to us as potential collaborators.

There are many problems other than access to scientific journals and research to be dealt with, some of them more fundamental. But even after the inequalities in access to basic and scientific education, and after eliminating research problems that require hugely expensive technical infrastructure, we still effectively "discard" minds we might need to solve problems because they do not have full access to the research texts they need. Given the rising cost of scientific publications and research services, this group is not confined to the developing world. It is a global problem.

Stay with our main character. If she reads all the papers at the rate of one a day, it will take her five years to process the relevant knowledge about her target, much less the dozens of related entities in the cell that are involved in malaria. And this is just the documents. There are hundreds of databases to access, and thousands of data sets. The digital knowledge is simply overwhelming. This too is a global problem, one that is faced by commercial and academic researchers from every country. It is one that is actually exacerbated for the "information rich."

© Robert Cudmore; licensed to the public under Attribution-ShareAlike 2.0.

Of course, in theory, computers could help us mine the wealth of data that computers have made available to us. Our researcher could use some advanced technology to help her. She could use software tools to extract the facts from the literature, to find new connections in the existing knowledge, to tie datasets and journals together and tag the information so that it could be found by others in the future. Unfortunately, the contracts that Harvard signed with the publishers often make that illegal, and digital rights management technologies enforce those contracts.

©3rd Coast Chick; licensed to the public under Attribution-NonCommercial-ShareAlike 2.0.

Introduction to Science Commons
www.sciencecommons.org
John Wilbanks and James Boyle
September 7, 2006

If she builds a collaboration with the inventor of the World Wide Web to try out his new "semantic web" technologies on the articles and data she needs, she puts Harvard at financial risk for breach of contract. [The semantic web is explained in more detail in the penultimate section of this paper.] And she's not allowed to email copies of key papers to her collaborators, either, so she can only really work with other scientists who have access to wealthy libraries. There are software companies that serve the pharmaceutical industry that might be able to help but their software costs $100,000 a year, more than twice her salary.

So she uses the services available to her - free text search, Google, the free digital resources published by the United States National Institutes of Health, some biology driven desktop applications, Microsoft Office - and she narrows down to a few key papers. By necessity she has thrown away the vast majority of information that might be relevant but is separated by the accident of an inapposite or unlikely keyword, or a source in an apparently unrelated scientific process.

She reads in a paper published by a prestigious journal that glycophorin A is a key mechanistic part of malaria. She needs to get some "research tools," actual physical stuff this time - cell lines with and without glycophorin A - to verify the published results and start looking at potential ways to understand the mechanism in the context of glycophorin A. Her grant covers this, to a certain extent.

So tools are available, but she needs the actual tools that were used in the paper she read to reproduce the result she is interested in. To get access to those is hard. She has to track down the contact information of the key authors, call the lab, discover that the tool actually came from the fourth author in another lab, call him, ask him to assign a student to create a supply of the cell line and mail it to her at Harvard. All this takes time. And even after finding the fourth author, and finding him willing to share the materials, there are more hurdles.

Sending the cell lines from his institution requires the execution of a contract called a Materials Transfer Agreement. And everyone involved in science agrees that these can be a problem. Wendy Streitz and Alan Bennett, of the University of California Office of Research Administration and Technology, capture the problem eloquently from the scientist's perspective: "One of your colleagues at BigAg, Inc. (or at BigAg University) says that she'd be happy to send you her transposon insertion lines that saturate the right arm of chromosome 9; you'll just need to have a material transfer agreement (MTA) signed by your institution. Six months later, the terms of the agreement are still under negotiation, you've missed the field season, your grant has expired and there is now a better resource that's been developed at LittleAg University--and if you start negotiating an MTA now..." (2)

Of course there are other reasons things might not go well. The scientist with the cells simply won't share at all, perhaps out of fear of being scooped, perhaps out of competitive spirit, perhaps because it is a diversion of his laboratory's scarce resources to generate materials for another researcher. The Journal of the American Medical Association published a study in 2002 describing a world where 47% of academic geneticists had been rejected in their efforts to secure access to data or materials related to research by other academics.

This represented an increase from 34% who had been rejected in a previous study in the mid 1990s. There were multiple causes involved in this pattern but the leading one was the effort required to produce and transfer the materials, effort to which the MTA negotiation process frequently adds. So our scientist is not alone when she
finds it exceedingly hard to verify the results claimed in the paper.

She presses on. She spends her grant money on the commercial tools, or tools that are similar to the tools she is looking for, and is able to verify some elements of the research, though it is a second-best approach. She decides to invest her postdoctoral time on looking at potential mechanisms for malaria, based on this glycophorin A work. One year in, the results are promising. Two years in she finds a paper, published years earlier, related to glycophorin A's activity in a totally unrelated field - cancer. It was published in an obscure journal and it wasn't very well indexed at the time, so it would have been very hard to find. Even if it had turned up in her searches she would probably have ignored it in her attempt to narrow the field. But it contained a nugget of knowledge that would have saved her a full year of money and a full year of progress towards the end goal of curing disease. And it means that her key result has already been published, though not in the context she was exploring, which makes her paper much less likely to help her get tenure, or another grant.

Sometimes, of course, that nugget of information is necessary for the science to progress and, though it is out there in the archives, it is never found. Sometimes it never even gets into the literature, because the materials necessary to do the experiment cannot be acquired. Sometimes the experiment is not even attempted because scientists with talents and good ideas do not have practicable access to the literature. And this holds true whether the research is on a drug for a neglected disease, like malaria, for which the commercial market is in doubt and which will probably need alternative sources of funding, or research on a drug for a disease that has a thriving commercial market, such as diabetes or heart disease.

We do not know how many cases like that there are. We do not know how much fuller our faltering drug pipeline would be if at every stage of the process described, we had managed to lower even a few of the economic, legal and technical barriers to scientific journal research, data mining and linking, materials acquisition and testing. We do not know what would happen if we could eliminate some of the legal and technical barriers to building a "semantic web" for science.

Perhaps the result would be dramatic; some fairly impressive scientists and computer scientists believe so. Perhaps it would be more modest. But where it is practicable to do so, lowering those barriers is clearly a good idea. It might be a great idea. That is the idea behind Science Commons and it would be surprisingly cheap - by the standards of science funding - to make the idea a reality.

History of Science Commons

Creative Commons was formed to deal with a problem of access to materials caused by the conjunction of technological developments - computers' increasing capability to store and process data vastly enhanced in effect by interconnection via the World Wide Web - and legal change. Creative Commons enables creators to select among various copyright license options to make their work available to the public on generous terms. The licenses are designed so that they can be understood not merely by lawyers, but also by ordinary people and even by computers - the license terms are expressed in an easy to understand "commons deed" complete with icons, but also in "metadata" so that one can search not only for the content of the work, but also for its degree of legal openness. (Give me calculus textbooks that are available for non commercial use and modification, say.)

Creative Commons' charge initially was entirely in the cultural and copyright realms - in the world of music, texts, blogs, pictures, films and so on. Nevertheless, at the first board meeting, the founding board members expressed strong interest in the possibilities of developing the creative commons model in the scientific area. Several times, in fact, board members expressed the feeling that the Creative Commons approach might be more of a "killer app" in science than in culture. Recognizing that developing open pathways for scientific research would be complex and contentious, the Creative Commons board did not feel that at that point we had the expertise or the technical capability to enter this field.

Creating an open regime of sharing and reuse in the sciences is a complicated proposition. Though copyrights guard the final published documents in peer reviewed journals, patents protect inventions (some more unique than others) and a web of handshakes and contracts guard the tools, materials, datasets, databases and informal knowledge transfer of day-to-day science. What works for a biologist will likely fail for a physicist, neither of whose solutions will perfectly solve the legal problems of the anthropologist.

In some fields, especially the life and health sciences, the commercial opportunities are as immense as the risks - hundreds of millions of dollars in research and testing costs bet against the possibility that a drug will make it through clinical trials and perhaps become a billion dollar blockbuster. Thus, any sharing regime for science must
be flexible, adaptable, contemplate copyrights, patents, and contracts and more. From the beginning, it must be compatible with commercial innovation as well as the academy. Ill-conceived intervention that makes commercial development more difficult will hurt rather than help.

Adding to the complexity of the pure legal and policy work is the sheer size and variability of the stakeholders. Science requires universities, funders, companies, researchers, publishers, consumers, technicians, librarians and more. Each stakeholder represents an opportunity to inject control into the scientific process, especially as each one moves into the networked culture - and some of that control is beneficial or necessary. To find which barriers to sharing are unnecessary is a problem that demands both interdisciplinary and practical investigation. To remove the unnecessary barriers requires an ability to produce consensus among disparate parties, and even more, a large degree of humility: neither the problems nor their solutions might be predicted by reigning academic theory.

Creative Commons returned to science in early 2005 with the launch of Science Commons. Millions of creative works were already on the Web under Creative Commons licenses (the current count is 140,000,000 - ranging from music, films and political blogs, to textbooks and MIT's Open Courseware) and we had gained significant experience in open licensing approaches, complex negotiations, and community building. We had the ambition of achieving for the world of science and data what Creative Commons had begun to achieve for the world of culture, art and educational material: to ease unnecessary legal and technical barriers to sharing, to promote innovation, to provide easy, high quality tools that let individuals and organizations specify the terms under which they wished to share their material.

Scientific American seemed to like the idea.

The first six months of Science Commons revolved around building the right set of people to run the project and conducting a broad survey of the various discipline-specific efforts in open science.

John Wilbanks, an entrepreneur and former bioinformatics CEO with experience at Harvard Law School's Berkman Center and the World Wide Web Consortium, came in to lead the effort as Executive Director.

We built an advisory board composed of two Nobel Laureates, Sir John Sulston and Joshua Lederberg, a Berkeley scientist and leading expert in "open access" publishing, Michael Eisen, the distinguished innovation economist Paul David, and the prominent intellectual property academic Arti Rai.

Four board members of Creative Commons join this group and act as a steering committee: James Boyle, from Duke, Mike Carroll, an expert on intellectual property and scholarly publishing from Villanova, Hal Abelson, a renowned MIT computer scientist, and Eric Saltzman, a lawyer, filmmaker and former Director of Harvard's Berkman Center.

Science Commons hosted a series of private meetings covering research funding, drug patent licensing, biological materials transfer, and access to scholarly literature. Wilbanks made a tour of different communities: biology, chemistry, archaeology, geospatial, physics, geography and more.

We reached out widely and formed relationships with key players in discipline-specific efforts in agriculture, neuroscience, anthropology, information technology, and
more. We forged working relations with funders of research, universities, technology managers, software companies, standards organizations, and libraries. We were delighted by the reception that we received.

From the beginning we were guided by a set of principles. Like Creative Commons, our proposals use coordinated private action, not public fiat, to lower barriers to research and sharing. This makes these proposals both much cheaper and faster to implement than solutions which require Congress or other regulators to act. Wherever possible our solutions were based on both empirical and interview-based investigation of the problems. We tried to discard preconceptions; when we formed the organization, for example, we expected to spend more time on patent pooling. While we do not rule that out, we found Materials Transfer Agreements to be a more important area on which to focus initially. We tried to come up with projects where success was not an all-or-nothing proposition - selecting issues where any alleviation of the problems we identified was a good thing. We picked projects that played to our strengths and to the considerable experience that Creative Commons had acquired in negotiating standard form agreements among disparate communities, merging legal and technical solutions, making deals comprehensible to non lawyers, and using metadata and the semantic web to produce "usable openness" and machine-readable contracts. Finally, we sought places where all sides agreed there was a problem and where many stakeholders would benefit from its removal.

Sample metadata from a Creative Commons license

Out of this research, we discovered a mix of legal, cultural, and technical controls - at least one of which bore down on the scientific process at each step, preventing the realization of the promise of new technologies like the Semantic Web. Some of these controls were necessary, of course, but many were not. Some of the problems came from fractured contract regimes which created high transaction costs and confusion, preventing the emergence of smooth electronic transfer systems for knowledge and research materials. Other problems were technical: digital controls designed to prevent widespread copying of entire articles, which also prevented the extraction of key facts from
papers for publication in new web languages. Some were
based on legal mistake; overbroad claims of copyright in
unoriginal databases, for example. Still others were a
matter of institutional policy, practical difficulty or
scientific culture. For example, commercial publishers
can hardly be blamed when even those scientists who
have the right to "self archive" their articles do not make
their work freely available online. Sharing between
laboratories is inhibited more by a complex mixture of
transactional, practical and prestige obstacles, than it is
by overbroad patents. And so on.
We realized that we could and should tackle each class of
problem individually, but that our overall goal should be to
bring the projects together so as to enable the true
possibilities of open science in a networked world.
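
As an illustration of the "machine-readable" layer of a license discussed in this section, here is a minimal Python sketch of searching a catalog by degree of legal openness, in the spirit of the "calculus textbooks available for non commercial use and modification" query. The field names (`allows_commercial_use`, `allows_modification`), the function `find_works`, and the sample records are hypothetical, invented for this example; they are not the actual Creative Commons metadata vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    title: str
    subject: str
    allows_commercial_use: bool  # hypothetical license flag
    allows_modification: bool    # hypothetical license flag


def find_works(catalog, subject, need_commercial=False, need_modification=False):
    """Return titles on `subject` whose license grants every requested permission."""
    return [
        w.title
        for w in catalog
        if w.subject == subject
        and (w.allows_commercial_use or not need_commercial)
        and (w.allows_modification or not need_modification)
    ]


# Made-up sample records, for illustration only.
CATALOG = [
    Work("Open Calculus", "calculus", False, True),
    Work("Calculus, All Rights Reserved", "calculus", False, False),
    Work("Free Algebra", "algebra", True, True),
]

# Calculus texts that may be modified; commercial use is not required.
print(find_works(CATALOG, "calculus", need_modification=True))
```

The point of the sketch is only that once permissions are recorded as data rather than prose, a search can filter on them as easily as on subject or title.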

Proposals

In 2006, we began to act on our conclusions. We targeted three areas: scholarly publishing, licensing policies, and the realization of the "semantic web" for science. In each we have been running "proof of concept" projects and we now have early-stage efforts in scholars' copyrights, biological materials transfer, and the intersection of semantic web with Open Access content in neuroscience. Our projects are designed to intersect to yield evidence of the benefits of the overarching Science Commons vision of open, networked science, but also to stand on their own as worthwhile efforts in their own right.

Scholarly Communication

Scholarly communication in the sciences primarily involves three kinds of information: (1) data generated by experimental research, (2) peer-reviewed journal articles explaining and interpreting the data, and (3) metadata that describes or interprets articles or their underlying data. At each of these levels, the Internet and associated digital networks create a range of opportunities and challenges for changing the nature of what information is gathered, stored and communicated as well as how and when such information is shared, identified and located.

The Science Commons Publishing Project promotes effective use of digital networks to broaden access to all three types of information. Science publishing is obviously an area that has attracted a great deal of attention. There are many stakeholders already engaged in attempts to make scholarly publishing more open, and a variety of strongly - some might say "religiously" - held beliefs about which approaches work best. Science Commons' approach has been extremely pragmatic and "non denominational." We have identified a series of places where opportunities were not being fully seized, where absence of collaboration was preventing innovation, or simply where our specific expertise could add value.

i.) Pragmatic Open Access Publishing: Some publishers of peer reviewed science journals are employing a new, Open Access business model where the authors grant generous rights in their articles to the public under Creative Commons licenses. These licenses make clear to the public the broad range of uses they may make of the articles, without further permission or fee.

The goal of open access is to broaden the dissemination of knowledge about the natural world to researchers and other readers who can put this knowledge to use. But for this goal to succeed it is vital that readers easily grasp what rights they are granted under the license, a traditional Creative Commons concern. Publishers that have adopted this approach, and who are using our licenses to implement it, include the Public Library of Science, BioMed Central and Springer's OpenChoice program. (It is notable that this group includes commercial, non-profit and government-funded publishing efforts.)

ii.) Enabling Self-Archiving: It is increasingly common for scholarly authors to be given rights to "self archive" their work in institutional repositories. Some journals explicitly give these rights, while others are willing to give them only if asked. The rights vary as to the versions of the paper that may be posted and the timing of the post, leading to confusion among researchers. Worst of all, perhaps, even where the rights do clearly exist, they are used only infrequently, at least partly because of the perceived practical difficulties involved in the process. Self-archiving could be an incredibly valuable way of achieving freer access to scholarly materials. Science Commons has analyzed all the impediments to it and is working to minimize or remove them.

We have already developed "Author Addenda" - a range of short amendments, with varying degrees of openness, that authors can attach to the copyright transfer form agreements from publishing companies. The Addenda ensure, at a minimum, that scholarly authors retain enough rights to archive their work on the public Internet. We are spreading the word in the scholarly community and finding that there is considerable interest from institutions who have good reasons to want to ensure that their researchers retain enough rights to enable self-archiving.

In the Fall of 2006 and Spring of 2007 we plan to release:

• A Web-based tool that will enable faculty authors to generate the Addendum of their choice with all form fields automatically filled in.

• Layperson-readable versions of the Addenda (similar to the Creative Commons "Commons Deed" copyright documents).

• Machine-readable versions of the Addenda to enable advanced software usage of the Addenda, database tracking, and empirical evidence gathering. This builds on our pre-existing metadata partnership with SPARC.

• We are also developing an application that will sit on the scientist's desktop and enable "drag and drop" self-archiving to an appropriate repository. The Internet Archive has agreed to host CC licensed material for free permanently, and numerous institutional repositories - using tools such as D-Space - are also available.

iii.) Facilitating the Use of Metadata:

Within its Publishing Project, Science Commons has convened a working group composed of publishers, librarians, and researchers to explore ways of better associating research articles with research data and for standardizing the metadata associated with both.

Licensing

Science Commons' Licensing Project aims to simplify licensing so as to speed science. We have been working on the creation of a "research commons" for neglected or orphan diseases (so that funders can simply specify that funded research must be available to all researchers in the field).

We have also been approached by one of the world's largest pharmaceutical companies with the idea of forming a "tox commons" that allows all researchers to pool toxicity data from failed commercial drug attempts, in a pre-competitive process of sharing. The idea is simple. While a successful drug application results in open data - the FDA requires publication and review - every failed drug results in secrets and obscurity. So a tempting target, tried again and again, can mean repetition of failure. It's as if each company has just a few pieces of the treasure map, and each company beaches on a different set of rocks on the way.

Enter Science Commons.

Take a drug target for which compound after compound has failed in the clinic due to toxicity concerns, a graveyard of over $10,000,000,000 in sunk costs and uncounted years of now-hidden research. Extract all the relevant facts about the target and its toxicity, its mechanisms, interactions, annotations, and more, from the literature and databases.

Attach annotations and data from the internal files of pharmaceutical companies who have tried, and failed, to get drugs to the market. Integrate the relevant descriptions of biological materials and public data sets.
Broker a set of contracts for access and recontribution to the data. Then let the scientists go after the combined knowledge, free of clickwraps and free to exploit $10,000,000,000 of previously private, unintegrated, inaccessible, invaluable knowledge.

In this introduction, though, we will concentrate on one project that seems to exemplify the Science Commons approach - the attempt to streamline the process of acquisition of research materials.

Biological Materials Transfer

Research materials are essential to the practice of modern life sciences' experimentation. Cell lines, model animals, DNA constructs, and screening assays each represent a tool for testing and validating hypotheses of biological function and human health. Each offers a perspective into biology that cannot be replicated without access to the material.

Research materials are developed in multiple environments: university laboratories, startup companies, biotechnology companies, hospitals, and non-profit research clinics. Some of the materials are patented; many are not. These tools are frequently licensed out to other institutions through "material transfer agreements" (MTAs). Thousands of MTAs are signed each year in the biological sciences, covering such diverse materials as genes, proteins, chemicals, tissues, model animals, software, databases, "know-how" and reagents.

Although "standard" material transfer agreements exist (the Uniform Biological Material Transfer Agreement, or UBMTA, was developed in 1995) empirical research confirms that the licensing of materials remains a problem. A complex set of interlocking licenses covering dozens of different materials imposes significant transaction costs simply to gain the opportunity to begin research.

The long-term impact of this complexity is severe. University technology transfer offices can become clogged with requests that ought to be routine. Scientists must waste time trying to negotiate agreements. Commercial researchers find it hard to obtain materials. The end result benefits no one - we get less research, less innovation, less diffusion of knowledge.

Discussions with stakeholders reveal a number of recurring problems. Supposedly uniform agreements are actually "customized" in time-consuming negotiations, although all players would benefit if they could bind themselves to restrict choices to a more limited set of standard options. Even the "short form" versions of agreements are perceived as too long and too complex. The agreements themselves are hard to interpret and scientists often find them mystifying (or ignore them altogether as a result). Finally, there is no connection between efforts to streamline the legal process for clearing materials, and efforts to streamline the practical process of actually fabricating and transferring the materials themselves.

It would be hard to find an area more perfectly suited to a Creative Commons-type solution. It is Creative Commons' raison d'être to analyze creative communities to find out which are the most common terms under which rightsholders are willing to make their works available, and to generate licenses through a simple and intuitive radio button interface that allows a range of those choices to be expressed (see Figure 1, page 5).

These licenses are expressed on three layers - lawyer readable contracts, human readable Commons Deeds (see Figure 2, page 5), and machine readable metadata (see Figure 1, page 6). This is exactly what is needed for a more rational Materials Transfer system, particularly if the process of building consensus around such a system can be led by a trustworthy third party - neither a funder,
nor a research unit, nor an academic institution nor a
for-profit company.

More ambitiously, in the relatively near future the
material that is now covered by some Materials Transfer
Agreements will be capable of being synthesized directly
by DNA synthesizers. One could literally "print out"
one's research material, or more likely order it from a
third party specializing in such work. The cost at the
moment is about $2 a base pair, but it is dropping. As
MIT Professor Drew Endy points out, this could
revolutionise the process of hypothesis formation, testing
and experiment. (Science Commons has been working
with Professor Endy on dealing with such issues in the
emerging field of synthetic biology.)

Science Commons is exploring the implications this
could have for the MTA process. The "blue sky" idea
beyond streamlining of the MTA process (itself hugely
valuable) is a simple procedure by which a researcher,
reading of a development in the literature, could merely
"click to get the cell line." Materials, or the information
that allows them to be synthesized, would be
automatically deposited with intermediaries or clearing
houses, accompanied by metadata-expressed licenses that
clearly expressed the uses to which those materials might
be put. Clear licenses with clear machine-readable terms
would allow quick, perhaps even automated, matching of
institutions or activities with the restrictions on a license.
At the very least, simple licenses with iconic
representations of their terms would allow researchers to
know which materials were available. It is hard to
overstate the advantages in streamlining that such a
process could offer. And in some areas at least, one might
be able to click right from the description in the literature
of an experiment using a DNA sequence, to a cheap
"print out" of that sequence ordered online from a
low-cost intermediary, applying the terms of the standard
MTA. This sounds like science fiction, but some of the
experts with whom we have talked argue that for some
materials it is scientifically practicable now. (MIT's
repository of standard biological parts is an example.)

The principal obstacles are not scientific, or a matter of
computer science, or metadata expression. They are a
matter of law, social engineering and institutional
commitment.

Of course, the difficulties of procuring MTAs are not the
only, nor even the main, reason that it can be hard to
procure research materials. Competitiveness, secrecy, and
the sheer hassle of producing and shipping the materials
all play a part. The prevalence of these tendencies also
varies from one scientific area to another. But the overall
problem of obtaining materials is a huge one. The
literature indicates it may be the single largest reason for
the abandonment of promising lines of research.

Even if the legal transaction costs made up only 15% of
the total impediments, it would be well worth reducing
them. If, in doing so, we could make it easier to set up
streamlined systems for obtaining "pre-cleared" materials
from institutional and commercial repositories, the effect
would be remarkable. And for some scientists it might
actually affect the increasing tendency towards
possessiveness and secrecy. Studies on social sharing
networks indicate that one's willingness to help others is
directly related to one's experience of receiving such help
in the near past.

These beneficial spillover effects are possible, but the
licensing effort does not depend for its success on the
achievement of such ambitious goals. If we could simply
streamline the MTA, or make it easier for Foundations
working on orphan or neglected diseases to create a
"research commons" for all such research, we would have
achieved something extraordinarily important.

Beyond that tangible set of metrics for success, the
licensing project does express a larger vision - one central
to Science Commons. The point is that the social and
legal engineering of science has largely lagged behind
the technical engineering and investigation that it
seeks to facilitate.

Science Commons' licensing project attempts to use some
of the developments in computers and metadata, together
with more traditional legal and consensus building skills,
to make the process of legal clearance and practical
availability move at a pace closer to that of science itself.

Data

Introduction to the Semantic Web

In the course of this paper, we have several times used
the term "semantic web." The phrase may be unfamiliar
to some, but the idea behind it is quite simple. "The
Semantic Web is about two things. It is about common
formats for interchange of data, where on the original
Web we only had interchange of documents. Also it is
about language for recording how the data relates to real
world objects. That allows a person, or a machine, to start
off in one database, and then move through an unending
set of databases which are connected not by wires but by
being about the same thing."

A different way to put it is that right now we mainly use
network searches that look for words - say the word
"glycophorin." But of course, such a search would pull up
this paper (of little use to our Brazilian researcher) as
well as a paper that was actually talking about the
biochemical process she wished to investigate.

The semantic web allows searches by function, or
meaning. "Show me all the statements in the literature
which deal with X interaction between glycophorin A and
the malaria disease process." This is accomplished by
"tagging" information with metadata. One simple
example, familiar from another context, is to tag a
bibliographic record with an "author," "title" and "date"
field. When I search for Bronte within the author field,
my time is not wasted with articles or books about the
Brontes, nor with maps of a place called Bronte. But
metadata tagging can do much more than this. The
semantic web holds extraordinary promise for science.

At its most ambitious, it would allow seamless
integration between scholarly articles, the data those
articles refer to, and cross references to other articles
dealing with similar processes in different areas of
science. But the process of mining, linking, tagging and
cross-referencing that the semantic web requires faces
extraordinary difficulties. Some of those difficulties are
financial. Tagging takes time and costs money. Some of
the difficulties involve the coordination of standards and
formats for metadata, something that Creative Commons
has considerable experience in.

Perhaps the single greatest obstacle to the semantic web,
however, is that the process of integration it requires is
now impeded by multiple barriers. The journal article is
copyrighted, and sits behind a digital fence. The data to
which the article refers cannot be integrated because it,
too, is protected by licensing agreements, assertions of
copyright (some of them unfounded), and technical
controls. These legal and technical restrictions may be
aimed at preventing very different activities than those
necessary for the semantic web. (Stopping wholesale
copying and transmission of the text of journal articles,
say.) But their negative effect is real.

To solve these problems, one needs an organization with
considerable experience in law, publishing, computer
architecture and metadata. And those, of course, are the
central foci of Creative Commons, and of the people
who run Science Commons. (John Wilbanks actually
came to us from the World Wide Web Consortium
(W3C) initiative on the semantic web for science, and
MIT computer science professor Hal Abelson serves on
the Creative Commons board and Science Commons
steering committee.)

Science Commons is pursuing a number of projects
aimed at enabling the semantic web for science. The most
fully developed at the moment is the Neurocommons.

Neurocommons

The Neurocommons project, a collaboration between
Science Commons and the Teranode Corporation, is
building on Open Access scientific knowledge to create a
semantic web for neurological research. The project has
three distinct goals.

- To demonstrate that scientific impact is
  significantly related to the freedom to reuse and
  technically transform scientific information
  without violating the law. In short, that a large
  degree of Open Access is an essential
  foundation for innovation.

- To establish a framework that increases the
  impact of investment in neurological research in
  a public and clearly measurable manner.

- To develop an open community of
  neuroscientists, funders of neurological
  research, technologists, physicians, and patients
  to extend the Neurocommons work in an open,
  collaborative, distributed manner.

Today's life scientist faces a dizzying array of knowledge
sources. Peer reviewed journal articles, online
repositories of sequences and pathways, robot-driven data
collection, all must be integrated into experimental design
and analysis. Many scientists spend as much time on
Google and PubMed as they do at the bench; the
difference between success and failure in the lab or clinic
can be the judicious and timely utilization of information.
But this is all local knowledge utilization. As we pointed
out earlier, the logarithmic explosion of information in
science overwhelms any one individual's ability to store
and model all the relevant science in her head.

The result is a "scalability problem" in life sciences:
while methods for generating information have gone
digital, methods for using that information remain
stolidly analog. Technology can help. Bandwidth,
processing and storage are cheap. Machines can
transmute from a string such as "aaattcaggagattacaggta"
to a physical molecule of DNA - and back again, making
genetic information truly fungible, something that can be
shared via the Web. Advances in language processing
and ontology development allow for the construction of
machine-readable and interpretable representations of
scientific information. Logic and reasoning engines can
crawl across massive data sets and come back with
suggestions on causation.

As we suggested in our introduction to the semantic web,
it is neither cheap nor easy to seize the moment and use
technological advances to solve these human and
scientific problems. Legal and economic factors have to
date muted the impact of new technologies on the life
sciences: copyrights and contracts intertwine with
software-enforced restrictions on reusing and
republishing knowledge in a more usable format.

The Neurocommons Project rests on the hypothesis that
there is enough information on the Web, in the form of
taxpayer-funded databases and openly licensed scientific
literature, to demonstrate the utility of a legally open,
technically standardized approach to knowledge. In doing
so, we wish to sow the seeds of a massive change in how
scientific knowledge is licensed and reused.

The life sciences represent an ideal test case for the
semantic web. Semantic Web technologies make the most
sense where there is a certain set of conditions.
1. A massive amount of data: We certainly have
   that: Clinical images, robot-arrayed "gene
   chips", machines that can sort materials
   cell-by-cell, gene sequencers and massively
   high-throughput chemical screens. There are
   hundreds of public databases, from flies to
   humans to plants, each potentially able to
   inform a decision or experimental design.

2. Rapidly changing knowledge: Every journal
   article, every paper, every experiment in the lab
   creates new knowledge about our bodies and the
   world we live in. This makes it very hard to
   apply traditional computational approaches or
   even integrate the data. We know what goes
   into a car - engine, tires, wheels, axles, fenders -
   and thus we can create a fairly fixed
   representation of a car for a computer, for
   model building and more. But we don't have
   anything resembling consensus on items as
   fundamental as "what is the role of the
   non-coding DNA in the human genome?"

3. Distributed knowledge and expertise: the nature
   of modern life science is specialization. One
   scientist is an expert on the genetics of
   Huntington's Disease (a rare neurodegenerative
   disease), another an expert on the impact of
   protein folding on Alzheimer's Disease. The two
   both work on the brain, on many of the same
   genes and proteins. But they attend different
   conferences and are pressed for time to study
   the refereed literature outside their own disease.
   Possible synchronicities between the researchers
   are at a minimum because their knowledge can't
   interoperate without distracting them from the
   lab.

For problems which have these features, the semantic
web is a natural fit. Like the Web itself, the semantic web
is intended to "scale" - to be capable of dramatic
expansion in size, mission, and reach - through a process
of decentralization rather than centralization, and an
emphasis on information reuse, not recreation. It is a
means to capture and network the relationships implicit in
high volume data sets, or the outputs of sophisticated
analytic software. It can relate anything to anything, as
long as that anything has a unique name. Data-driven
relationships can attach to the descriptions of related
genes and proteins, and to the knowledge about those
genes and proteins as described in the scientific literature.

The semantic web does not require that the picture be
complete. If the relationships between one gene and
another change as our knowledge changes, the technical
burden is no greater than adding another hyperlink
between web pages. And the concept of integration
around unique names makes it easy to create serendipity
between researchers: instead of bumping into a colleague
in the hall at the right time, a scientist can see the
ecosystem of knowledge around a particular gene
expression in the brain. Whether that knowledge comes
from her work on Alzheimer's or a distant colleague's
work on Huntington's makes no difference. It all gets
published to the semantic web.

The legal and economic problem

If these potential advantages are real, why do we not
already have a vast semantic web for life sciences? The
technological and standards problems are being solved.
The National Institutes of Health has invested in the
national centers for biomedical ontologies, language
processing technologies are evolving in leaps and bounds,
and public databases are investing in machine readability
and open licensing. The problem is simpler. Despite what
appears to be an information overload, the sparse
availability of truly machine-readable scientific
knowledge has prevented robust testing of the Semantic
Web. The barriers we described earlier - legal, technical
and digital - have prevented easy aggregation of data, and
thus have denied us the ability to test rigorously whether
this approach will indeed be as productive as it promises
to be.

The Strategy

Rather than complain about the problem that
machine-readable knowledge is sparse, the Neurocommons Project
is taking as its focus those areas where we do have truly
open access information and thus can build a test case.
We have formed an initial community of
neuroinformaticists, practicing neuroscientists, Semantic
Web experts and language experts to ensure our work is
accurate and scientifically valid. The first stage is
underway:

- Using automated technologies, we are
  extracting machine-readable representations of
  neuroscience-related knowledge as contained in
  full-text Open Access literature, free text such
  as the PubMed abstracts, and legally open
  databases.

- We then assemble those representations into a
  semantic web for neuroscience, publish the
  resulting "graph" freely, and

- assemble a standard software implementation
  to store, update, and manage the changes to
  the graph as knowledge evolves.

Plans for stage two of the project involve the deployment
of additional software infrastructure, the development of
operational manuals so that interested parties can "port"
the entire Neurocommons approach into new scientific
domains without involving Science Commons, new
publishing techniques to automatically add knowledge to
the Neurocommons graph, and active community
development.

Again, Science Commons has sought partners who can
provide credibility and competence. Teranode, a
for-profit company, provides direct financial support to the
Neurocommons project as well as in-kind donations of
software and services.

Jonathan Rees, formerly in charge of the curated
protein-protein interactions database at Millennium
Pharmaceuticals and a veteran of MIT's Project MAC,
leads the project on a day-to-day basis as a Science
Commons Fellow.

The project is deeply involved with the World Wide Web
Consortium's Health Care and Life Sciences Interest
Group as well as MIT's Computer Science and Artificial
Intelligence Laboratory (which hosts Science Commons).

Conclusion

We have tried to pick projects where "even if you fail a
bit, you still succeed." In our most heady and optimistic
moments, we imagine a very different landscape for
science. No longer would the price of access to scientific
literature act as such an impediment to research. Our
Brazilian researcher would be joined by potential
collaborators from poorer countries and institutions, and
could share information freely with them.

Whether the material was obtained from an invigorated
practice of self-archiving, from commercial or non-profit
open access journals, or from journals which make their
work openly available after a fixed period of time, the
world of scholarly literature would be more open - and
more open on standard and interoperable terms, easily
understood by all participants.

What's more, literature searches would be transformed, as
the technology of the semantic web cut through the
information glut that overwhelms scientists, allowing the
kind of cross-discipline and cross-disease insights we
can only imagine right now. When hypotheses were
formed, the researcher would be able to click to obtain
research tools and materials, according to truly standard,
machine- and human-readable Materials Transfer
Agreements. In many cases, those materials could be
obtained automatically and at low cost from depository
institutions.

The results of this research, in turn, would be fed back
into the web of scientific knowledge. The universe of
science would be enlarged, participation rendered more
egalitarian, commercial exploration of drug targets easier,
the drug pipeline fuller and so on.

Nature editorial, November 2005

That is the utopian vision, and we genuinely believe it
has a real chance of succeeding, at least in part. But assume
that we fall short of such lofty aspirations. What happens?
At worst, scientific publications are made more
accessible, on more easily comprehensible terms, more
researchers self-archive, and finding the result of that
self-archived material is easy. We form examples of
Semantic Webs for science and test their validity. We
help researchers on orphan and neglected diseases build
research commons, and help companies to pool their
knowledge of discarded drug candidates. We simplify the
process of Materials Transfer, and cut down on the truly
crushing burden that it imposes on all participants. And,
in the process, we learn from our mistakes and come back
with a better plan to try again.

A Final Note on Funding

Science Commons has achieved a remarkable amount
despite the relatively modest amount of start-up funding
it has received. In part that is because it has been able to
draw on Creative Commons' resources and on massive
amounts of highly skilled volunteer labor from its Board,
Advisory Board and pro bono lawyers. In part it is
because the problems are simply ripe for solution and we
have found many partners willing to work with us and
leverage our efforts. But now the first stage of our plan is
almost complete and we need significant new funding to
realize the promise of the projects we describe here, all of
which are already under way.

In each of these three areas we believe we
have a high probability of success. The
communities around the projects are the best
indicator. The major stakeholders agree
there is a problem and that we are a good
vehicle for discussing - and, we hope, creating - the
solution. Even in publishing, where the tension between
corporate publishers, universities and open access
advocates can break through the surface, we maintain
strong relations with major publishers such as Nature and
Springer. Indeed, Springer even uses Creative Commons
licenses as part of the Open Choice alternative for
authors. Also, as you can tell from this document, we are
particularly excited about the prospects for the licensing
and data areas.

In the short term (4-8 months), we need $500,000 to build
the staffing and institutional framework necessary to
continue the projects described here, to fund the meetings
and conferences that will attract new partners, and to pay
for the expensive legal advice that will be necessary even
beyond the generous pro bono contributions we already
receive.
In the longer term, we estimate that we need $5 million to
$6 million over the 3 years beyond that to bring all of
these projects to fruition, and to build on the benign
feedback they will generate for each other. (We would be
happy to provide a more detailed budget, of course,
explaining precisely where the money would go.)

The amount of work necessary is staggering, but the
goals are concrete, achievable and worthwhile.

We have tried in this document to describe accessibly and
for a general audience our goals, techniques, strategies
and institutional resources. We hope you found it of
interest and would be delighted to outline our projects
more precisely, and rigorously, as well as to go into
details about the projects not covered here. We ask for
your feedback and, in particular, your suggestions as to
possible funding sources.

1. John Wilbanks is Executive Director, Science
Commons.
James Boyle is William Neal Reynolds Professor of Law
at Duke Law School and faculty co-director, the Center
for the Study of the Public Domain.
This is a draft introduction aimed at readers with a wide
range of backgrounds prepared for the Science Commons
Funders Meeting, Duke Law School, Aug 3rd, 2006.
Please do not circulate without permission.
The projects described in these pages were made possible
by seed funding from the HighQ Foundation and Creative
Commons. Science Commons receives additional funding
from the Omidyar Network and the Teranode
Corporation; Edwards Angell Palmer & Dodge LLP
provides pro bono legal services.

2. Material Transfer Agreements: A University
Perspective, Plant Physiology 133:10-13 (2003).
http://www.plantphysiol.org/cgi/content/full/133/1/10

3. Campbell et al., Data Withholding in Academic
Genetics, JAMA Vol. 287 No. 4, January 23, 2002.

4. W3C statement on Semantic Web activity.
http://www.w3.org/2001/sw/