The Sybil Attack
The Sybil Attack
John R. Douceur
Microsoft Research
johndo@microsoft.com
“One can have, some claim, as many electronic personas as one has time and energy to create.”
– Judith S. Donath [12]
Abstract – Large-scale peer-to-peer systems face
If the local entity has no direct physical
security threats from faulty or hostile remote
knowledge of remote entities, it perceives them
computing elements. To resist these threats, many
only as informational abstractions that we call
such systems employ redundancy. However, if a
identities. The system must ensure that distinct
single faulty entity can present multiple identities,
identities refer to distinct entities; otherwise, when
it can control a substantial fraction of the system,
the local entity selects a subset of identities to
thereby undermining this redundancy.
One
redundantly perform a remote operation, it can be
approach to preventing these “Sybil attacks” is to
duped into selecting a single remote entity
have a trusted agency certify identities.
This
multiple times, thereby defeating the redundancy.
paper shows that, without a logically centralized
We term the forging of multiple identities a Sybil
authority, Sybil attacks are always possible except
attack [30] on the system.
under extreme and unrealistic assumptions of
It is tempting to envision a system in which
resource parity and coordination among entities.
established identities vouch for other identities, so
that an entity can accept new identities by trusting
1. Introduction
the collective assurance of multiple (presumably
independent) signatories, analogous to the PGP
We* argue that it is practically impossible, in
web of trust [37] for human entities. However,
a distributed computing environment, for initially
our results show that, in the absence of a trusted
unknown remote computing elements to present
identification authority (or unrealistic assumptions
convincingly distinct identities. With no logically
about the resources available to an attacker), a
central, trusted authority to vouch for a one-to-one
Sybil attack can severely compromise the initial
correspondence between entity and identity, it is
generation of identities, thereby undermining the
always possible for an unfamiliar entity to present
chain of vouchers.
more than one identity, except under conditions
Identification authorities can take various
that are not practically realizable for large-scale
forms, not merely that of an explicit certification
distributed systems.
agency such as VeriSign [33]. For example, the
Peer-to-peer systems commonly rely on the
CFS cooperative storage system [8] identifies
existence of multiple, independent remote entities
each node (in part) by a hash of its IP address.
to mitigate the threat of hostile peers.
Many
The SFS network file system [23] names remote
systems [3, 4, 8, 10, 17, 18, 29, 34, 36] replicate
paths by appending a host identifier to a DNS
computational or storage tasks among several
name.
The EMBASSY [22] platform binds
remote sites to protect against integrity violations
machines to cryptographic keys embedded in
(data loss). Others [5, 6, 7, 16, 28] fragment tasks
device hardware. These approaches may thwart
among several remote sites to protect against
Sybil attacks, but they implicitly rely on the
privacy violations (data leakage). In either case,
authority of a trusted agency (such as ICANN [19]
exploiting the redundancy in the system requires
or Wave Systems [35]) to establish identity.
the ability to determine whether two ostensibly
In the following section, we define a model of
different remote entities are actually different.
a distributed computing environment that lacks a
central authority. Building on this model, Section
*
3 proves a series of lemmas that severely limit the
Use of the plural pronoun is customary even in solely
ability of an entity to determine identity. Section
authored research papers; however, given the subject of
the present paper, its use herein is particularly ironic.
4 surveys related work, and Section 5 concludes.
1
This model has two noteworthy qualities:
2. Formal model
First, it is quite general. By leaving the internals
As a backdrop for our results, we construct a
of the cloud unspecified, this model includes
formal model of a generic distributed computing
virtually any interconnection topology of shared
environment.
Our model definition implicitly
segments, dedicated links, routers, switches, or
limits the obstructive power of corrupt entities,
other components.
Second, the environment in
thereby strengthening our negative results. The
this model is very friendly. In particular, in the
universe, shown schematically in Fig. 1, includes:
absence of resource constraints, denial-of-service
• A set E of infrastructural entities e
attacks are not possible.
A message from a
• A broadcast communication cloud
correctly functioning entity is guaranteed to reach
•
all other correctly functioning entities.
A pipe connecting each entity to the cloud
We place a minimal restriction on the relative
Set E is partitioned into two disjoint subsets,
computational resources available to each entity,
C and F. Each entity c in subset C is correct,
namely that there exists some security parameter n
abiding by the rules of any protocol we define.
for which all entities can perform operations
Each entity f in subset F is faulty, capable of
whose computational complexity is (low-order)
performing any arbitrary behavior except as
polynomial in n but for which no entity can
limited by explicit resource constraints.
(The
perform operations that are superpolynomial in n.
terms “correct” and “faulty” are standard in the
This restriction allows entities to use public-key
domain of Byzantine fault tolerance [21], even
cryptography [24] to establish virtual point-to-
though terms such as “honest” and “deceptive”
point communication paths that are private and
might be more appropriate.)
authenticated. Although these virtual paths are as
Entities communicate by means of messages.
secure as point-to-point physical links, they come
A message is an uninterrupted, finite-length bit
to exist only when created by pairs of entities that
string whose meaning is determined either by an
have acknowledged each other.
Our model
explicit protocol or by an implicit agreement
excludes direct links between entities because a
among a set of entities.
An entity can send a
physical link provides a form of centrally supplied
message through its pipe, thereby broadcasting it
identification of a distinct remote entity. Also, in
to all other entities. The message will be received
the real world, packets can be sniffed and spoofed,
by all entities within a bounded interval of time.
so the base assumption of a broadcast medium
Message delivery is guaranteed, but there is no
(augmented by cryptography) is not unrealistic.
assurance that all entities will hear messages in
An identity is an abstract representation that
the same order.
persists across multiple communication events.
Each entity e attempts to present an identity i to
entities
other entities in the system.
(Without loss of
generality, we state our results with respect to a
specific local entity l that is assumed to be
correct.) If e successfully presents identity i to l,
we say that l accepts identity i.
pipes
A straightforward form for an identity is a
communication
secure hash of a public key.
Under standard
cloud
cryptographic assumptions, such an identifier is
unforgeable. Furthermore, since it can generate a
symmetric key for a communication session, it is
also persistent in a useful way.
Each correct entity c will attempt to present
one legitimate identity. Each faulty entity f may
attempt to present a legitimate identity and one or
more counterfeit identities.
Ideally, the system
local entity
should accept all legitimate identities but no
counterfeit entities.
Fig. 1: Formal model of distributed environment
2
Lemma 1: If ρ is the ratio of the resources of a
3. Results
faulty entity f to the resources of a minimally
This section presents four simple lemmas,
capable entity, then f can present g = ρ distinct
with nearly trivial proofs, that collectively show
identities to local entity l.
the impracticality of establishing distinct identities
Proof: Define rM as the resources available to a
in a large-scale distributed system.
minimally capable entity.
By hypothesis, g
An entity has three potential sources of
entities can present g identities to l; therefore, g rM
information about other entities: a trusted agency,
resources are sufficient to present g identities.
itself, or other (untrusted) entities. In the absence
Since ρ ≥ g, f has at least g rM resources available,
of a trusted authority, either an entity accepts only
so it can present g identities to l.
identities that it has directly validated (by some
means) or it also accepts identities vouched for by
Lemma 1 states a lower bound on the damage
other identities it has already accepted.
achievable by a faulty entity. To show how this
For direct validation, we show:
can be enforced as an upper bound, we present
• Even when severely resource constrained,
three mechanisms that can (at least theoretically)
a faulty entity can counterfeit a constant
exploit limitations in three different resources:
number of multiple identities.
communication, storage, and computation.
• Each correct entity must simultaneously
If communication resources are restricted,
validate all the identities it is presented;
local entity l can broadcast a request for identities
otherwise, a faulty entity can counterfeit
and then only accept replies that occur within a
an unbounded number of identities.
given time interval.
Large-scale distributed systems are inevitably
If storage resources are restricted, entity l can
heterogeneous, leading to resource disparities that
challenge each identity to store a large amount of
exacerbate the former result.
The latter result
unique, uncompressible data. By keeping small
presents a direct impediment to scalability.
excerpts of this data, entity l can verify, with
For indirect validation, in which an entity
arbitrarily high probability, that all identities
accepts identities that are vouched for by already
simultaneously store the data they were sent.
accepted identities, we show:
If computation resources are restricted, entity
• A sufficiently large set of faulty entities
l can challenge each identity to solve a unique
can counterfeit an unbounded number of
computational puzzle.
For example, the local
identities.
entity can generate a large random value y and
•
challenge the identity to find, within a limited
All entities in the system must perform
time, a pair of values x, z such that the
their identity validations concurrently;
concatenation x | y | z, when run through a secure
otherwise, a faulty entity can counterfeit a
hash
function,
yields
a
value
whose
least
constant number of multiple identities.
significant n bits are all zero:
Since the number of faulty entities in the system is
likely to grow as the system size increases, the
given y, find x, z s.t. LSBn(hash(x | y | z)) = 0
former result places another limit on system scale.
The time to solve* such a puzzle is proportional to
The latter restriction becomes harder to satisfy as
2n–1.
The time to verify the result is constant.
system size increases.
(The reason for allowing the challenged entity to
find a prefix x and a suffix z, rather than merely
3.1. Direct identity validation
one or the other, will become clear in Section 3.2.)
The only direct means by which two entities
can convince a third entity that they are distinct is
*
by performing some task that a single entity could
measured in count of hash function evaluations. For a
random oracle [2] hash function, the only way to find a
not. If we assume that the resources of any two
solution is to iterate through candidate values of x
entities differ by at most a constant factor, a local
and/or z; compute the hash for each x | y | z triple; and
entity can demand proof of a remote entity’s
test the result. Actual implementation requires a hash
resources before accepting its identity. However,
function that is both preimage-resistant and resistant to
this leaves us with the following limitation:
non-brute-force attacks such as chaining attacks [24].
3
To be effective, these resource challenges
Lemma 3: If local entity l accepts any identity
must be issued to all identities simultaneously:
vouched for by q accepted identities, then a set F
of faulty entities can present an arbitrarily large
Lemma 2: If local entity l accepts entities that are
number of distinct identities to l if either |F| ≥ q or
not validated simultaneously, then a single faulty
the collective resources available to F at least
entity f can present an arbitrarily large number of
equal those of q + |F| minimally capable entities.
distinct identities to entity l.
Proof: Define r
Proof: Faulty entity f presents an arbitrarily long
F as the total resources available
to set F, r
succession of distinct identities to l.
The
k as the resources available to each
faulty entity f
resources required for each presentation are used
k, and rM as the resources available to
a minimally capable entity. Then:
and then freed for the subsequent presentation.
r
r
r
Lemma 2 is insurmountable for intrinsically
q + F
F
k
k
≤
= ∑ < ∑ + F
temporal resources, such as computation speed
rM
f
F
∈ r
∈
k
M
f
F
r
k
M
and communication bandwidth. However, since
By Lemma 1, entity f
≥
k can present rk / rM
1
storage is not inherently time-based, entity l can
identities to l, so F can present q identities to l.
indefinitely extend the challenge duration by
Thereafter, all of F’s identities vouch for an
periodically demanding to see newly specified
arbitrarily large number of counterfeit identities,
excerpts of the stored data. If an accepted identity
all of which will be accepted by l.
ever fails to meet a new challenge, the local entity
can discard it from its acceptance list, thereby
As in the case of direct identity validation,
eventually catching a Sybil attack that it might
indirect identity validation also has a concurrency
have initially missed. A major practical problem
requirement.
In particular, all entities must
with this extension is that (by assumption) the
perform their resource challenges concurrently:
challenge consumes the majority of an entity’s
Lemma 4: If the correct entities in set C do not
storage resources, so extending the challenge
coordinate time intervals during which they accept
duration greatly impedes the ability of the entity
identities, and if local entity l accepts any identity
to perform other work. (However, the challenge
vouched for by q accepted identities, then even a
data itself could be valuable data, compressed and
minimally capable faulty entity f can present
encrypted by the local entity before sending it to
g = |C| / q distinct identities to l.
the remote entities, using a different key for each
remote entity to maintain challenge uniqueness.)
Proof:
Define rM as the resources required to
present one identity. By assumption, entity f has
3.2. Indirect identity validation
rM resources available.
Partition set C into g
disjoint subsets Ck of minimum cardinality q.
As described in the introduction, the reason
Faulty entity f presents identity ik to each entity in
for establishing the distinctness of identities is to
Ck, using rM resources during time interval Tk.
allow the local entity to employ redundancy in
Since Tk need not overlap with Tk′ for k ≠ k′, rM
operations it delegates to remote entities.
One
resources are available during interval Tk′ to
such operation it could conceivably delegate is the
present identity ik′ ≠ ik to entities in set Ck′. At
validation of other identities. Thus, in addition to
least q entities in each set Ck will vouch for
accepting identities that it has directly validated
distinct identity ik, so l will accept all g identities.
using one of the challenge mechanisms described
above, an entity might also accept identities that
Lemma 4 shows the need for multiple entities
have been validated by a sufficient count of other
to issue challenges concurrently.
Whether it is
identities that it has already accepted.
possible for a correct entity to satisfy multiple
If an entity that has presented identity i
concurrent challenges depends upon the resource:
1
claims to have accepted another entity’s identity
In our formal model, all communication is
i
broadcast, so an entity can simultaneously reply to
2, we say that i1 vouches for i2.
An obvious
danger of accepting indirectly validated identities
communication challenges from arbitrarily many
is that a group of faulty entities can vouch for
entities. (However, this is rather less practical in
counterfeit identities:
actual networks than in our abstract model.)
4
It may not be possible to satisfy multiple
5. Summary and conclusions
concurrent storage challenges, and there are
information-theoretic reasons for believing that it
Peer-to-peer systems often rely on redundancy
is impossible, since every bit of data stored for
to diminish their dependence on potentially hostile
one challenger consumes one bit of storage space
peers. If distinct identities for remote entities are
that is thus unavailable to serve another challenger
not established either by an explicit certification
(and the data from all challengers is, of necessity,
authority (as in Farsite [3]) or by an implicit one
incompressible).
This may prevent storage
(as in CFS [8]), these systems are susceptible to
challenges from being used for indirect validation.
Sybil attacks, in which a small number of entities
For computation challenges, it is possible for
counterfeit multiple identities so as to compromise
an entity to solve multiple puzzles simultaneously
a disproportionate share of the system.
by combining them.
If an entity receives m
Systems that rely upon implicit certification
puzzles y1, y2, … ym, it can find a w such that:
should be acutely mindful of this reliance, since
LSB
apparently unrelated changes to the relied-upon
n(hash(0 | y1 | y2 | … ym | w)) = 0
mechanism can undermine the security of the
Then, the solution to each puzzle yk is:
system. For example, the proposed IPv6 privacy
x
extensions [26] obviate much of the central
k = 0 | y1 | y2 | … yk–1 and zk = yk+1 | … ym | w
allocation of IP addresses assumed by CFS.
An obvious danger here is that if a validating
In the absence of an identification authority, a
entity issues challenges to multiple identities that
local entity’s ability to discriminate among
have been counterfeited by a single faulty entity,
distinct remote entities depends on the assumption
the faulty entity could combine the challenges and
that an attacker’s resources are limited. Entities
solve them together. However, the challenger can
can thus issue resource-demanding challenges to
identify this attempted Sybil attack by checking
validate identities, and entities can collectively
whether x1 | y1 | z1 = x2 | y2 | z2 for any two
pool the identities they have separately validated.
solutions from putatively different identities.
This approach entails the following conditions:
Like Lemma 1, the result of Lemma 4 is that a
• All entities operate under nearly identical
faulty entity can amplify its influence. A system
resource constraints.
that can tolerate a fraction φ of all identities being
• All presented identities are validated
faulty can tolerate only φ/g of all entities being
simultaneously by all entities, coordinated
faulty. In some systems, this may be acceptable.
across the system.
• When accepting identities that are not
4. Related work
directly validated, the required number of
vouchers exceeds the number of system-
Most prior research on electronic identities
wide failures.
has focused on persistence and unforgeability [14,
We claim that in a large-scale distributed
15, 27, 31], rather than on distinctness.
system, these conditions are neither justifiable as
Computational puzzles are an old technique
assumptions nor practically realizable as system
[25] that has become popular recently for resisting
requirements.
denial-of-service attacks [1, 9, 20] by forcing the
attacker to perform more work than the victim.
Acknowledgements
Dingledine et al. [11] suggest using puzzles to
provide a degree of accountability in peer-to-peer
The author thanks Miguel Castro for issuing
systems, but this still allows a resourceful attacker
the challenge that led to this paper, Jon Howell
to launch a substantial attack, especially if the
and Sandro Forin for reviewing drafts, Dan Simon
potential for damage is disproportionate to the
for sanity checking the combinable computational
fraction of the system that is compromised.
puzzle described in Section 3.2, the anonymous
The issue of establishing on-line identities for
IPTPS reviewers for their helpful suggestions for
humans has been studied for some time [12, 32],
improving this presentation, and Brian Zill for
with solutions that generally depend on some
suggesting the term “Sybil attack.”
direct interaction in the physical world [13, 37].
5
[18] J. H. Hartman, I. Murdock, T. Spalink, “The
References
Swarm Scalable Storage System”, 19th ICDCS,
1999, pp. 74-81.
[1] T. Aura, P. Nikander, J. Leiwo, “DoS-Resistant
[19] ICANN, Internet Corporation for Assigned Names
Authentication with Client Puzzles”, Cambridge
and Numbers, 4676 Admiralty Way, Suite 330,
Security Protocols Workshop, Springer, 2000.
Marina del Rey, CA 90292-6601, www.icann.org.
[2] M. Bellare and P. Rogaway, “Random Oracles are
[20] A.
Juels,
J.
Brainard,
“Client
Puzzles:
A
Practical: A Paradigm for Designing Efficient
Cryptographic
Defense
against
Connection
Protocols”, 1st Conference on Computer and
Depletion Attacks”, NDSS ’99, ISOC, 1999, pp.
Communications Security, ACM, 1993, pp. 62-73.
151-165.
[3] W. J. Bolosky, J. R. Douceur, D. Ely, M. Theimer,
[21] L. Lamport, R. Shostak, M. Pease, “The Byzantine
“Feasibility of a Serverless Distributed File
Generals Problem”, TPLS 4(3), 1982.
System Deployed on an Existing Set of Desktop
[22] K. R. Lefebvre, “The Added Value of EMBASSY
PCs”, SIGMETRICS 2000, 2000, pp. 34-43.
in the Digital World”, Wave Systems Corp. white
[4] M. Castro, B. Liskov, “Practical Byzantine Fault
paper, www.wave.com, 2000.
Tolerance”, 3rd OSDI, 1999.
[23] D. Mazières, M. Kaminsky, M. F. Kaashoek, E.
[5] D. Chaum, “Untraceable Electronic Mail, Return
Witchel, “Separating Key Management from File
Addresses, and Digital Pseudonyms”, CACM 4
System Security”, 17th SOSP, 1999, pp. 124-139.
(2), 1982.
[24] A. J. Menezes, P. C. van Oorschot, S. A.
[6] B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan,
Vanstone.
Handbook of Applied Cryptography.
“Private Information Retrieval”, 36th FOCS, 1995.
CRC Press, 1997.
[7] I. Clarke, O. Sandberg, B. Wiley, T. Hong,
[25] R. C. Merkle, “Secure Communications over
“Freenet: A Distributed Anonymous Information
Insecure Channels”, CACM 21, 1978, pp. 294-299.
Storage and Retrieval System”, Design Issues in
[26] T. Narten, R. Draves, “Privacy Extensions for
Anonymity and Unobervability, ICSI, 2000.
Stateless Address Autoconfiguration in IPv6”,
[8] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris,
RFC 3041, 2001.
I. Stoica, “Wide-Area Cooperative Storage with
[27] K. Ohta, T. Okamoto, “A Modification to the Fiat-
CFS”, 18th SOSP, 2001, pp. 202-215.
Shamir Scheme”, Crypto ’88, 1990, pp. 232-243.
[9] D. Dean, A. Stubblefield, “Using Client Puzzles to
[28] M. K. Reiter, A. D. Rubin, “Crowds: Anonymous
Protect TLS”, 10th USENIX Security Symp., 2001.
Web Transactions”, Transactions on Information
[10] R. Dingledine, M. Freedman, D. Molnar “The Free
System Security 1 (1), ACM, 1998.
Haven Project: Distributed Anonymous Storage
[29] A. Rowstron, P. Druschel, “Storage Management
Service”,
Design
Issues
in
Anonymity
and
and Caching in PAST, a Large-Scale, Persistent
Unobservability, 2000.
Peer-to-Peer Storage Utility”, 18th SOSP, 2001,
[11] R. Dingledine, M. J. Freedman, D. Molnar
pp. 188-201.
“Accountability”, Peer-to-Peer: Harnessing the
[30] F. R. Schreiber, Sybil, Warner Books, 1973.
Power of Disruptive Technologies, O’Reilly, 2001.
[31] A. Shamir, “An Efficient Identification Scheme
[12] J. S. Donath, “Identity and Deception in the
Based on Permuted Kernels”, Crypto ’89, 1990,
Virtual Community”, Communities in Cyberspace,
pp. 606-609.
Routledge, 1998.
[32] S. Turkle, Life on the Screen: Identity in the Age of
[13] C.
Ellison,
“Establishing
Identity
Without
the Internet, Simon & Schuster, 1995.
Certification Authorities”, 6th USENIX Security
[33] VeriSign,
Inc.
487
East
Middlefield
Road,
Symposium, 1996, pp. 67-76.
Mountain View, CA 94043, www.verisign.com.
[14] U. Feige, A. Fiat, A. Shamir, “Zero-Knowledge
[34] M. Waldman, A. D. Rubin, L. F. Cranor, “Publius:
Proofs of Identity”, Journal of Cryptology 1 (2),
A Robust, Tamper-Evident Censorship-Resistant
1988, pp. 77-94.
Web Publishing System”, 9th USENIX Security
[15] A. Fiat, A. Shamir, “How to Prove Yourself:
Symposium, 2000, pp. 59-72.
Practical Solutions of Identification and Signature
[35] Wave Systems Corp. 480 Pleasant Street, Lee, MA
Problems”, Crypto ’86, 1987, pp. 186-194.
01238, www.wave.com
[16] Y. Gertner, S. Goldwasser, T. Malkin, “A Random
[36] J. J. Wylie, M. W. Bigrigg, J. D. Strunk, G. R.
Server Model for Private Information Retrieval”,
Ganger, H. Kilite, P. K. Khosla, “Survivable
RANDOM ’98, 1998.
Information Storage Systems”, IEEE Computer 33
[17] A. Goldberg, P. Yianilos, “Towards an Archival
(8), IEEE, 2000, pp. 61-68.
Intermemory”, International Forum on Research
[37] P. Zimmerman, PGP User’s Guide, MIT, 1994.
and Technology Advances in Digital Libraries,
IEEE, 1998, pp. 147-156.
6