It’s no secret
Measuring the security and reliability of authentication via ‘secret’ questions
Stuart Schechter, Microsoft Research, stus@microsoft.com
A. J. Bernheim Brush, Microsoft Research, ajbrush@microsoft.com
Serge Egelman, Carnegie Mellon University, egelman@cs.cmu.edu

Abstract

All four of the most popular webmail providers (AOL, Google, Microsoft, and Yahoo!) rely on personal questions as the secondary authentication secrets used to reset account passwords. The security of these questions has received limited formal scrutiny, almost all of which predates webmail. We ran a user study to measure the reliability and security of the questions used by all four webmail providers. We asked participants to answer these questions and then asked their acquaintances to guess their answers. Acquaintances with whom participants reported being unwilling to share their webmail passwords were able to guess 17% of their answers. Participants forgot 20% of their own answers within six months. What’s more, 13% of answers could be guessed within five attempts by guessing the most popular answers of other participants, though this weakness is partially attributable to the geographic homogeneity of our participant pool.

1. Introduction

The four largest webmail providers (AOL, Google, Microsoft, and Yahoo!) all use personal (a.k.a. ‘secret’) questions to authenticate account holders who are unable to log in using their passwords. While other web services may authenticate users who have forgotten their passwords via their email addresses, webmail services cannot always do so; many of their users employ their accounts as a primary email address and may not have another dependable email account for use as a backup authenticator. These same users often rely on their webmail addresses as backup authenticators for other services, raising the consequences should their webmail accounts’ authentication mechanisms fail.

Despite the consequences of authentication failures, the four largest webmail providers require only one question be answered in order to reset an account’s password. Concerns over the security of these questions abound, in part, because webmail is so popular; the top two webmail services each claim a quarter of a billion active users [7], [9]. Public awareness of the potential weaknesses of personal authentication questions reached new heights when 2008 Republican vice presidential nominee Sarah Palin’s Yahoo! Mail account was compromised via her question [2].

In fact, prior research suggests that a single personal question is not a sufficiently secure authenticator. In two different studies, one in 1990 [17] and another in 1996 [10], participants were asked personal authentication questions, and those they were close to (spouses, family members, or close friends) were able to guess 33%-39% of their answers. These studies also addressed the memorability of questions; participants forgot 20%-22% of their own answers within three months.

Given these statistics, why do webmail providers still authenticate users by asking a single personal question? Perhaps they have used the past twelve years to develop a generation of questions with answers that are easier to remember and harder for others to guess. Or maybe earlier research should be disregarded because, while answers were vulnerable to guessing by trusted significant others, less trusted acquaintances would be far less likely to guess the correct answers.

To quantify the security and reliability of personal authentication questions as they are used today, we examined the real-world questions in use as of March 2008 by the top four webmail providers. We invited participants to our laboratory in pairs, asked them these personal questions, and then asked them to guess their partners’ answers. We extend prior research by measuring the security of these questions against guessing not just by significant others, but by untrusted acquaintances as well. We also examine the vulnerability of these questions to statistical guessing attacks, which
identify the most popular answers to each question and try each one until no more guesses are allowed.

For those participants who brought partners who they would not trust with their Hotmail password, we found that these partners could still guess an alarming 17% of their answers. Many answers could be guessed without even knowing the participant. From the geographically homogeneous set of participants in our laboratory study, 13% of their answers could be guessed within five attempts using statistical guessing. For user-written question/answer pairs, we categorized roughly 25% as vulnerable to family members, friends, or coworkers and another 15% as guessable within five tries with no knowledge of the victim.

2. Background and Related Work

Personal authentication questions have received a great deal of attention from the popular press. The most recent burst in coverage came as the result of the revelation that 2008 Republican vice presidential nominee Sarah Palin’s Yahoo! account had been compromised by someone who researched the answer to the question “Where did you meet your spouse?” [2], [6]. Though the Palin story focused on the security of these questions, others have focused on their reliability as an authenticator and the plight of those who cannot get into their accounts [14].

The Palin incident came only months after coverage of a paper by Ariel Rabkin, who examined the questions used by twenty bank websites [11]. He manually categorized questions he believed to be ambiguous, not applicable to over 15% of the general public, not memorable, easily guessable with no knowledge of the victim, or easily guessable with minimal knowledge of the victim. However, he did not actually quantify the level of vulnerability that resulted from any given question.

The use of personal questions for authentication was studied by Zviran and Haga in 1990 [17]. They examined how well others might be able to guess the answer to users’ personal authentication questions, but focused on guessing by “significant others,” the great majority of whom were participants’ spouses (77%). The remainder were close friends (17%), siblings (4%), and parents (2%). Zviran and Haga did not report whether they collected their data electronically or on paper, or whether they compared answers manually or algorithmically. They only allowed one guess in both the recall and guessing phases. Partners in their study guessed 33% of participants’ answers, which was quite similar to the 34% of answers guessed by spouses in our study (see the first data column of Table 6). Participants in their study recalled 78% of their answers after three months, which was also similar to the figures in our study (80%, which is expressed as a recall failure rate of 20% in the second data column of Table 4).

Podd et al. conducted a similar study in 1996, and found similar recall rates (80%) and higher guessing rates (39.5%) [10]. They, too, focused on partners who were significant others.

Many of the 20 questions Zviran and Haga used were likely to have a small set of common answers, e.g., favorite color, favorite class in high school, and favorite flower. Such small answer spaces may have been acceptable to Zviran and Haga as they proposed the use of multiple questions to reduce false authentications and rejections. To our knowledge, no previous research has explored the vulnerability of personal questions to statistical attacks: those that walk down a list of the most popular answers for a target population. Statistical guessing attacks could be refined further by examining answer popularity as a function of the language of the account holder, geographic locale, or other traits discernible to an attacker.

Despite earlier findings and unanswered questions about their security, personal questions have been adopted for use as a backup authentication mechanism by all of the top four webmail providers (as identified by Hitwise [8]): AOL, Google (Gmail), Microsoft (Hotmail), and Yahoo!. All rely on a single question, though some may also verify the user’s zip code. Google will not allow a password to be reset until an account has been inactive for a period of time. While many other sites also use personal questions for backup authentication, webmail services are uniquely dependent on them because they cannot assume their users have an alternate email address as a backup authenticator.

Google also lets users opt to write their own security question, which some have speculated is more secure than relying on standardized questions [15]. While we refer to these as user-written questions, others have called them ‘open’ questions [5]. Toomim et al. have investigated using user-written questions for authenticating members of social groups, as motivated by a scenario in which an individual wants to share photos with friends who were at the same party [16]. They investigated the security of these questions by offering rewards for correct answers on Mechanical Turk. However, the simulated attackers in their study were at a disadvantage compared to real attackers: they had no contextual knowledge about who had written the question or what it was intended to protect.
Schechter, Brush, & Egelman

3. Study recruitment and methodology

To study the reliability and security of personal questions, we ran a laboratory study over four separate days between March 22 and June 23, 2008, with a follow-up study in September and October. The cohorts assigned to each day are shown in Table 2a. The study encompassed both the personal questions used by Windows Live’s password-reset workflow and the questions used by the top four webmail services.

3.1. Participant recruitment

Our recruiting team selected participants from a larger pool of potential participants they maintain for all studies at Microsoft. The pool contains members of the general public who had been recruited via public events, lotteries, and our website. We required that participants speak English as their primary language and not be employed by Microsoft.

Our recruiters selected a balance of men and women; 64 participants were male and 66 female. The recruiters also selected participants with a diversity of ages and professions. While the professions are too numerous to list, the age ranges are broken down in Table 2b.

Participants in the first three cohorts were required to be Hotmail users for at least three months and to access their account at least three times a week. The great majority of participants (83%) had been using their Hotmail account for at least four years, as detailed in Table 2d.

After reaching one qualified participant, our recruiters would ask if the participant had a coworker, friend, or family member who might also be qualified for the study. Recruiters then interviewed potential partners to ensure they met our requirements. All participants were required to have partners, and the categories of relationships between participants and their partners are broken down in Table 2c.

3.2. Initial laboratory visit

We scheduled participants for a two-hour visit to perform the tasks summarized in Table 1.

Table 1. Order of laboratory visit tasks
1) Move to room separate from partner
2) Answer demographic questions
3) Authenticate to Hotmail using personal question (cohorts 1-3)
4) Answer personal questions for top four webmail services
5) Describe relationship with partner
6) Guess partner’s answers to personal questions
7) Attempt to recall answers to own personal questions
8) Second chance to guess partner’s questions using online research (cohorts 2-4)

Participants in each session were split into groups and placed into different rooms such that no two partners were in the same room. Each partner was placed at a computer. We seated participants sufficiently far from each other to ensure that their screens, on which their answers might appear while being typed, could not be seen by others. All questions were asked using web survey software, though participants were required to be on-site to prevent collusion.

3.2.1. Authentication to Hotmail. We explained to participants how personal questions could be used to reset the passwords participants used to log in to Hotmail. We asked the 116 participants in the first three cohorts (those selected to be Hotmail users) to attempt to answer their personal question. We asked them only to authenticate (provide the answer to their question) and not to actually reset their password if successful.

3.2.2. Initial answers to personal questions. We then asked all 130 participants to answer all of the personal questions in use by the top four webmail services. We told participants that we would ask the same questions later to determine how well they remembered the answers. We offered two prizes (an Xbox 360 and a Zune digital music player) and gave participants a virtual lottery ticket for each question they both answered and later recalled.

We randomized the question order for each participant. We asked participants to mark questions they were either unable or unwilling to answer. We instructed participants that capitalization, punctuation, and spaces would be ignored when comparing answers.

We anticipated participants might try to increase their chance of recalling their answers by providing the same answer for all questions. We added a rule that eliminated rewards for recalling the same answer numerous times. We also feared that if participants anticipated being asked to recall their questions again at a future date, they might record their answers following the study session. We thus asked participants to recall their answers at the end of their session and ran the lottery for the laboratory session prizes based on these recollections. We did not inform participants that we would follow up to test their recollections in the future.

After participants had been asked all of the questions used by the top four webmail services, we asked them what they would choose if they could write their own question. We also asked them to answer the question they wrote.

Table 2. Demographics

(a) Cohorts
date of first visit   # ppts in main study   # ppts in recall study
March 22                       40                      15
April 26                       44                      20
May 31                         32                      14
June 23                        14                       0
Total                         130                      49

(b) Age groups
age group   participants
< 18          2  ( 2%)
18–25        28  (22%)
26–35        51  (39%)
36–55        31  (24%)
55+          18  (14%)

(c) Relationships
relationship to partner   participants
Spouse                     18  (14%)
Relative                   23  (18%)
Fiance/SO                   4  ( 3%)
Friend                     51  (39%)
Coworker                   32  (25%)
Other                       2  ( 2%)

(d) Webmail account ages
webmail account age   participants
< 6 months              6  ( 5%)
½–1 year                4  ( 3%)
1–4 years              10  ( 9%)
> 4 years              96  (83%)

3.2.3. Guessing by acquaintances. We asked participants to describe their relationship with their partner and asked them whether they would trust their partner with their Hotmail password. Then we asked them to guess their partners’ answers. As before, we presented the questions in random order and rewarded success with an increased opportunity to win one of our prizes, though we could not tell participants which answers were correct. We allowed participants to guess up to five times by placing guesses on separate lines. We restricted participants from communicating answers to each other by asking them to turn off their mobile devices (“as a courtesy to others”), isolating them in separate rooms, and monitoring their behavior.

After running the first cohort of the study (40 participants), we discovered that many participants weren’t guessing as hard as we had hoped. Most were providing at most one guess per answer and none appeared to be performing any online research. We thus gave the 90 participants in the three remaining cohorts (cohorts 2–4) a second opportunity to guess their partners’ answers. In this second guessing round, we encouraged them to use search engines and social networking sites to research the answers to their partners’ questions. We also told them that this was the last task of the study in hopes that they might feel less rushed.

3.3. Reliability (memorability) follow-up

To determine how well participants remembered the answers to the personal questions we had asked, we followed up with them between September 5 and October 31. Of the 116 participants in the first three cohorts, we contacted all 87 who had consented to receive emails from us, and 49 volunteered to participate.

We used a custom-built web tool to ask participants to recall the answers to the questions that they had chosen to answer in the laboratory study. For each question, we allowed them to respond as many times as they liked until they either correctly recalled their original answer or chose to move on to the next question. Answers were judged as correct recollections if they differed from the original only in the use of white space, punctuation, and capitalization. This was the strictest of the comparison algorithms we wanted to examine. By only acknowledging a participant’s answer as correct if it met the strictest requirements, we could later test how less strict algorithms would have increased recall rates and reduced the number of attempts required.

To encourage participants to do their best at recalling their original answers, we offered all participants a new incentive, again based on the percentage of answers they recalled. The top quartile received an Amazon.com gift card worth $15, the second quartile received one worth $10, the third $5, and the last quartile received no performance-based gratuity. In addition, all participants received some form of base gratuity just for participating; some participants were offered a software gratuity for completing the recall task along with a separate study, whereas others were offered a $10 Amazon.com gift certificate for completing the recall study alone.

3.4. Limitations

While we provided incentives for participants to answer questions as if they were setting up their account, and to guess their partners’ answers as best they could, they may not have done so.

Some individuals may be more invested in picking a memorable and secure question/answer pair when setting up a real account than when in the lab [13]. Others may discount the need for secure and reliable backup authentication when setting up a real account, but feel obligated to help researchers when in the lab.

While participants in the laboratory had to guess the answers to all questions during a limited amount of time, a real attacker need only answer one question to compromise an account and may invest as much time as he or she wishes. Thus, our estimates of the abilities of acquaintances to guess answers are likely to underestimate their true potential.

Table 3. Answer comparison algorithms

The last three columns break down "guessed by partner" by the answer to "would you trust your partner with your Hotmail password?"

algorithm   forgot within 3–6 months   guessed by partner   no                some circumstances   yes
equality    256/1070 (23.9%)           -                    -                 -                    -
substring   240/1070 (22.4%)           588/2870 (20.5%)     110/662 (16.6%)   146/942 (15.5%)      332/1266 (26.2%)
distance    213/1070 (19.9%)           628/2870 (21.9%)     115/662 (17.4%)   162/942 (17.2%)      351/1266 (27.7%)

The equality algorithm could not be run on partners’ guesses because our survey tool represented all guesses as a single concatenated string (see Section 4).
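
The three comparison rules that Table 3 evaluates (described in Section 4) can be illustrated with a minimal Python sketch. This is not the study's code: the function names are hypothetical, the distance rule is implemented here as the restricted "optimal string alignment" form of edit distance with free leading and trailing characters in the guess, and rounding the error allowance down (one error per five full characters) is an assumption, since the text does not say how partial groups of five are treated.

```python
def normalize(text: str) -> str:
    """Lower-case the text and drop every non-alphanumeric character."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def equality_match(guess: str, answer: str) -> bool:
    """Simple equality on the normalized strings."""
    return normalize(guess) == normalize(answer)


def substring_match(guess_blob: str, answer: str) -> bool:
    """True if the normalized answer occurs anywhere in the concatenated guesses."""
    target = normalize(answer)
    return bool(target) and target in normalize(guess_blob)


def approx_distance(guess_blob: str, answer: str) -> int:
    """Smallest edit cost between the answer and any substring of the guess.

    Substitutions, insertions, deletions, and adjacent transpositions each
    cost one; extra characters before or after the match cost nothing.
    """
    text, pattern = normalize(guess_blob), normalize(answer)
    n = len(text)
    prev2 = None
    prev = [0] * (n + 1)  # row 0 is all zeros: leading extras are free
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            best = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
            if (prev2 is not None and j > 1
                    and pattern[i - 1] == text[j - 2]
                    and pattern[i - 2] == text[j - 1]):
                best = min(best, prev2[j - 2] + 1)  # swapped neighbours
            cur[j] = best
        prev2, prev = prev, cur
    return min(prev)  # taking the minimum makes trailing extras free


def distance_match(guess_blob: str, answer: str) -> bool:
    """Allow one error per five characters of the answer (floor is assumed)."""
    target = normalize(answer)
    if not target:  # assumption: an empty answer never matches
        return False
    return approx_distance(guess_blob, answer) <= len(target) // 5


if __name__ == "__main__":
    print(distance_match("sawpped", "swapped"))
    print(substring_match("seattle portland denver", "Portland"))
```

Because leading and trailing characters are free, a concatenated string of several guesses matches if any one of them is close enough, which is the behavior the survey artifact required and also the reason the paper recommends truncating guesses in a real deployment.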
4. Answer comparison algorithms

In total, 130 participants initially provided 2,874 answers and 49 participated in the follow-up study and tried to recall 1,074 of those answers. We needed an algorithm for determining whether a recollection, or partner’s guess, sufficiently matched the original. We tested three different algorithms.

For all algorithms, we removed all non-alphanumeric characters and forced letters into lower case. When counting the number of attempts to recall an answer, we did not count repetitions of the same guess.¹ Attackers learn nothing by being able to repeat a guess, whereas account holders, who may repeat the same answer thinking they previously mistyped it, will not be penalized for this mistake.

¹ We first learned of this heuristic from Charlie Kaufman.

The first algorithm, simple equality, compares the resulting simplified strings character for character. This is the algorithm that was used, during the memorability follow-up study, to provide participants with feedback as to whether they had recalled their answers correctly.

Unfortunately, we could not use the equality algorithm for examining partners’ guesses due to an artifact of our study. The Illume survey software we used to collect the guesses participants provided for their partners’ answers fails to store carriage returns, which we had asked participants to use to separate their guesses.

To address this problem our second algorithm, the substring algorithm, treated a guess as valid if it contained a substring that matched the original answer, as suggested by Toomim et al. [16].

The final algorithm we tested was the Levenshtein edit distance algorithm with two modifications. First, we reduced the cost of transpositions of two characters (‘swapped’→‘sawpped’) from two to one. This reduces the cost of this very common typo to be equal to that of a single mistyped character. Second, we removed the cost of extra characters at the beginning or end of the guess, to adjust for the artifact that all guess strings were concatenated together. We allowed one error (an edit distance cost of one) for every five characters in the original answer.

Table 3 illustrates the performance of each algorithm over all of the questions.

Moving from the substring algorithm to the distance algorithm reduces the share of answers forgotten (not recalled within 5 attempts) by 2.5 percentage points, from 22.4% to 19.9% of total answers. This represents an 11.3% reduction from the answers deemed forgotten by the substring algorithm.

Alas, moving from the substring algorithm to the distance algorithm also increased the percentage of answers guessed by participants’ partners by 1.4 percentage points. That’s a 6.8% relative increase over the percent guessed using the substring algorithm. However, when we closely analyzed the answers that had been reclassified from not guessed to correctly guessed, we were convinced the trade-off was well worth it. In 34 of the forty cases where a guess was treated as incorrect by the substring algorithm but correct by the distance algorithm (85%), the guessing partner clearly knew the correct answer: the difference was a one-character typing error that an attacker could easily fix with a second guess. In four of the remaining six cases, it was clear that the partner knew the answer but excluded a few characters, such as entering a city but excluding a two-character state suffix. In only two cases did manual inspection fail to reveal convincingly that the partner knew the answer. Those two cases represent less than a 0.1% increase in total answers guessed over the substring algorithm.

Given that the benefit of the distance algorithm appeared to greatly outweigh its cost, we used it for the duration of the study and recommend a variant for real-world deployment. In such a deployment, the length of the guess should be truncated (as a function of the original answer length) so that an attacker cannot concatenate multiple guesses together.

5. Results

We briefly cover the results for participants who tried to authenticate to their Hotmail accounts using their personal question, then examine the results from our data on the top four webmail services’ questions.
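
As a check on the Section 4 comparison, the absolute and relative changes between the substring and distance rows of Table 3 can be recomputed from the raw counts. The sketch below is illustrative only; the variable and function names are not from the study, and the counts are copied from Table 3.

```python
# Counts copied from Table 3 as (numerator, denominator).
FORGOT = {"substring": (240, 1070), "distance": (213, 1070)}
GUESSED = {"substring": (588, 2870), "distance": (628, 2870)}


def rate(pair):
    """Convert a (numerator, denominator) pair into a fraction."""
    numerator, denominator = pair
    return numerator / denominator


def compare(rows):
    """Return (percentage-point change, relative change), substring -> distance."""
    before, after = rate(rows["substring"]), rate(rows["distance"])
    return (after - before) * 100, (after - before) / before


forgot_points, forgot_relative = compare(FORGOT)
guessed_points, guessed_relative = compare(GUESSED)

print(f"forgot:  {forgot_points:+.1f} points, {forgot_relative:+.1%} relative")
print(f"guessed: {guessed_points:+.1f} points, {guessed_relative:+.1%} relative")
```

The difference in guessed answers is 628 - 588 = 40 reclassified guesses, which is the set of forty cases inspected manually in Section 4.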

5.1. Real-world memorability results

While we asked all 116 participants in the first three cohorts to try to reset their password using their personal question, not all accounts had a question configured. Furthermore, an answer alone was not sufficient to authenticate: a zip code previously associated with the account was also required.

A total of 99 participants reported being asked to provide the answer to their personal question. Only 43 (43%) reported being able to successfully provide the correct answer and their zip code. The majority, 56 (57%), could not reset their password and reported being unable to remember either the answer or the zip code they had provided when they set up the account.

When asked why they had trouble authenticating, 75% of participants suspected they may have been unable to answer their personal question and 31% reported that they may have been unable to recall the zip code they had previously provided. A surprising 13% of participants suspected that the reason they could not answer their personal question was that they had intentionally provided a bogus answer when setting up their account.

5.2. Willingness to answer

The results for all questions used by the top four webmail services² (as of March 2008) are summarized in Table 4. The questions appear in the order in which the webmail services present them to the user.

² One question used by Microsoft, “Name of first pet”, is excluded due to a data collection error documented in Appendix A.

The first data column of Table 4 shows the number and percentage of participants who opted to answer each question. We excluded all answers in which participants expressed being uncomfortable, unwilling, or unable to provide an original answer. While we had prescribed a method of indicating a non-answer (n/a for not applicable and n/c for not comfortable), we manually identified numerous other indicators used by participants, such as “not willing”, “unknown”, and “don’t have one”, and treated them as non-answers as well.

For three of the four services (AOL, Microsoft, and Yahoo!), participants opted to answer their questions between 81% and 85% of the time. All participants opted to answer at least one of Yahoo!’s questions and only one participant opted not to answer any of AOL’s or Microsoft’s questions.

In contrast, participants opted to answer each of Google’s questions an average of 50% of the time and 14 (11%) opted not to answer any of them. Google lets users choose to write their own personal question, the implications of which are examined in Section 5.6.

5.3. Reliability (memorability)

The second data column of Table 4 shows the number and percentage of participants who answered each question, but who were unable to recall their answer within five guesses during the follow-up study. For those who did recall their answer within five guesses, 76% did so on the first guess. A detailed breakdown of the number of guesses required is in Table 8 in the Appendix.

One participant answered all questions with “password”, which he was able to remember when asked to recall his answers at the end of the laboratory session. However, during the follow-up study he had forgotten that he had done this and so he failed to answer any of the questions. This individual was responsible for one of the answers forgotten in every row of the ‘forgot’ column. We opted not to remove this contribution because this may be a real-world mechanism for coping with these questions, even if it proved ineffective in this case.

Among the questions with answers forgotten 25% of the time or more, which appear in boldface in the second column of Table 4, all but one fall into two categories: preferences and ID numbers. Preferences may be hard to remember because a participant’s choice of childhood hero, historical person, song, film, or pastime may be subject to whims of the moment. ID numbers, such as frequent flyer and library card numbers, may not have been stored in memory to start with. Remembering the correct frequent flyer number may be particularly difficult if one has many frequent flyer accounts or if one’s favorite airline goes bankrupt.

5.4. Security against statistical guessing

The third data column of Table 4 shows the vulnerability of answers to a statistical guessing attack. An answer is deemed vulnerable to this attack if it is among the five most popular answers provided by other participants (excluding the participant’s partner). In other words, we compute the five most popular answers for all participants except the participant who answered the question and that participant’s partner, break ties randomly, and then mark an answer as statistically guessable if it matches one of those five answers. We have highlighted in boldface those questions for which more than 10% of answers were statistically guessable.
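
The leave-out procedure described in Section 5.4 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's implementation: the function names, the dictionary-based data layout, and the seeded random generator used for tie-breaking are all hypothetical, and answers are assumed to be normalized already.

```python
import random
from collections import Counter


def top_answers(answers, exclude, k=5, rng=None):
    """Return the k most popular answers among participants not in `exclude`.

    `answers` maps a participant id to that participant's answer to one
    question. Ties are broken randomly by shuffling before a stable sort.
    """
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    counts = Counter(a for pid, a in answers.items() if pid not in exclude)
    items = list(counts.items())
    rng.shuffle(items)
    items.sort(key=lambda item: item[1], reverse=True)
    return [answer for answer, _ in items[:k]]


def statistically_guessable(answers, partners, k=5, rng=None):
    """Mark each answer that appears in the top-k list built from everyone
    except the answering participant and that participant's partner."""
    result = {}
    for pid, answer in answers.items():
        exclude = {pid, partners.get(pid)}
        result[pid] = answer in top_answers(answers, exclude, k, rng)
    return result


if __name__ == "__main__":
    answers = {"p1": "seattle", "p2": "seattle", "p3": "seattle",
               "p4": "tacoma", "p5": "seattle", "p6": "spokane"}
    partners = {"p1": "p2", "p2": "p1", "p3": "p4",
                "p4": "p3", "p5": "p6", "p6": "p5"}
    print(statistically_guessable(answers, partners, k=1))
```

Excluding the participant and the partner keeps the popularity list from being inflated by the answer under test or by a guess informed by personal knowledge, so the list reflects only what a stranger could learn from the population.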

Table 4. Per-question results for the personal questions used by AOL, Google, Microsoft, and Yahoo! (per-question rows omitted). Columns: answered (of 130 ppts); forgot within 3–6 months (of 49 ppts); statistically guessable (of 130 ppts); guessed by partner (of 130 ppts); guessed by partner broken down by "would you trust your partner with your Hotmail password?" (no, 32 ppts; some circumstances, 42 ppts; yes, 56 ppts); 2nd round guess improvement (of 90 ppts).

Totals over all questions: answered 2870; forgot 213/1070; statistically guessable 379/2870; guessed by partner 628/2870 (no 115/662; some circumstances 162/942; yes 351/1266); 2nd round guess improvement 62/2124.
f
a
v
f
a
v
f
ather’
as
pet’
name
f
a
v
your
your
s
you
w
the
with
your
your
you
the
all
bold.
results
your
your
e
successfully
is
your
your
the
your
your
your
your
as
your
your
birthplace
teacher
historical
your
your
your
your
column
in
highlighted
guesses
were
your
w
did
as
as
s
did
as
as
as
is
is
is
ather’
f
or
is
is
is
is
is
is
is
w
w
childhood
w
w
is
is
is
w
mak
is
The
during
are
are
trusted
The
round
guess
What
orite
orite
OL
otal
otal
otal
ahoo!
otal
otal
A
a
v
a
v
What
Where
What
What
Who
What
What
What
What
Where
Where
T
Google
What
What
What
What
T
Microsoft
Mother’
Best
F
F
Grandf
T
Y
Where
What
Who
What
What
What
What
What
What
T
T
For all answers to all questions, 13% were statistically guessable. An attacker with a larger sample of answers than we used might be able to do even better. Given that almost all participants were from the same metropolitan area, it’s not surprising that their favorite sports teams can be guessed more than half the time and many had the same favorite towns. However, other questions had popular answers that seem unlikely to change much within the United States. Favorite pastimes (e.g. travel, reading) are fairly geographically universal activities, and both childhood heroes (e.g. Superman) and historical persons (e.g. Jesus Christ) are often drawn from a culture that is growing ever more globalized.

5.5. Security against guessing by acquaintance

For all participants who answered a question, the fourth data column of Table 4 (labeled guessed by partner) shows the number and percentage of those participants whose partners were able to guess their answers. These figures include guesses made in the second guessing round, at the end of the laboratory session, though the 40 participants in the first cohort did not have this guessing opportunity. The fourth, fifth, and sixth columns show these figures broken down based on how participants responded to the question would you trust your partner with your Hotmail password?, to which they might answer no, yes, or under some circumstances. Highlighted are percentages above 20% in the no column.

Google’s questions performed the best by this metric. Nobody guessed their partner’s answers to the ID number questions, and the overall guess rate was just 4%. Microsoft’s questions came in a very distant second at 16%, and AOL and Yahoo! trailed at 26% and 27%, respectively.

Participants who were not trusted with their partners’ Hotmail password, or who were trusted only under some circumstances, had roughly equal success in guessing their partners’ answers. Both groups were able to guess the answers to roughly 17% of their partners’ questions. In contrast, those who were trusted by their partners were able to guess their partners’ answers 28% of the time. We ran a t-test to measure the effect of this trust question on the percentage of answers guessed by participants’ partners. The difference between partners whom participants would trust with their password and those whom they would never trust did not meet the threshold of significance, t(88) = −1.750, p = .086. The difference between partners who were fully trusted and all others (both no and under some circumstances to the trust question) was strongly significant, t(130) = −3.096, p = .002.

Questions with answers that participants found easiest to recall appeared to be those that their partners found easiest to guess. A non-parametric Kendall τ test, examining the correlation between the fraction of answers recalled for each question and the fraction guessed by participants’ partners, indicates a strong correlation, τ(56) = .496, p < 0.001.

There are numerous reasons to believe these numbers may underestimate the ability of participants’ partners to guess their answers. A third of the questions received no guesses, and when participants did guess, most used only one of their five allotted guesses. Participants would likely have done better if they had received feedback on which guesses were correct. For example, in Table 7 (in the Appendix) we see that half of the participants whose partner was their spouse were unable to guess how their partner answered the question where did you meet your spouse?

Furthermore, the first cohort of 40 participants was not asked to perform online research and was not given a second guessing round at the end of the laboratory session. The rightmost column in Table 4 shows the influence of the second round on the results of those participants who had been given this second opportunity. Not surprisingly, questions like high school mascot become easier to guess with online research.

5.6. The security of user-written questions

A total of 127 of our 130 participants responded to our request to write their own question and provide the answer.

User-written questions are harder for us to analyze, and possibly harder to attack, in as automated a manner as site-written questions. This does not mean they are immune to attack. In our study, seven participants (6%) chose questions that matched, or were significantly similar to, site-written questions and are likely to have similar answer distributions. These questions are listed in Table 5.1.

Through manual analysis, we identified concrete problems with 63 of the 120 user-written question/answer pairs that remained, which constituted half of all user-written question/answer pairs. We broke these into two subcategories, each of which contained roughly a quarter of all question/answer pairs.

The first subcategory, in Table 5.2, contains 31 question/answer pairs (24% of all pairs) vulnerable to attacks that require no personal knowledge beyond the geographic location of the account holder. Using the techniques described in the table, nineteen of these
Table 5. If we allowed you to write your own personal question for your own use, what would it be?

1. Questions similar to those already used by webmail services (7 of 127, 6%)

Question written by participant | Similar question asked by webmail service
What is your first job | Where was your first job? (AOL)
What is my favorite thing to do? | What is your favorite pastime? (Yahoo!)
my childhood best friend | Best childhood friend (Microsoft)
First Car | What make was your first car or bike? (Yahoo!)
What is the name of your first pet as an adult? | What is your pet’s name? (AOL & Yahoo!)
First pet’s first name | What is your pet’s name? (AOL & Yahoo!)
What is your favorite pet’s name? | What is your pet’s name? (AOL & Yahoo!)

2. Vulnerable with no personal knowledge other than geographic region (31 of 127, 24%)

Question | Answer space size (categorized into 5, 10, 25) | How answer space is obtained

i. Answer can be found via simple web search (2, 2%)
The story of a Number | 5 | “the story of a number” (answer is top hit)
What’s your favorite cookie at Panera Bakery? | 5 | “panera cookies” (answer in five cookie names listed on menu)

ii. Answer space ≤ 5 (11, 8%), ≤ 10 (15, 12%) & ≤ 25 (18, 14%)
Water or Pop? | 5 | “water”, “pop”
How many children do I have? | 5 | Count up from 0, “zero”
Favorite kink | 5 | Four examples in Wikipedia definition
when did i graduate college? | 5 | Years backwards from 2008
What color are your eyes? | 5 | Four most common color names (“brown”, “hazel”, “green”, “blue”)
What is the color of your eyes | 5 | Four most common color names (“brown”, “hazel”, “green”, “blue”)
what color was your triumph | 5 | Three primary colors (“red”, “green”, “blue”)
Who should our next President be? | 5 | Presidential candidates (spring 2008)
What is my blood type | 5 | Four primary blood types (“o”, “a”, “b”, “ab”)
Where do I want to be living in 10 years? | 5 | Local city names (by size)
Where do you live? | 5 | Local city names (by size)
How tall am I? | 10 | Count up from 5’0” and “five foot zero”
how tall are you | 10 | Count up from 5’0” and “five foot zero”
What inseam do you wear? | 10 | Count up from 26, “26 inches”
number of times i got stitches? | 10 | Count up from 0, “zero”
What is your favorite number? | 25 | Count up from 0, “zero”
What was the year you graduated high school? | 25 | Years backwards from 2008
how many words I can type in on minute | 25 | Numbers around average typing speed in US

iii. Answer high on easily searchable popularity lists, top 5 (6, 5%), top 25 (11, 7%)
What is your have soda? [sic] | 5 | Best selling sodas in US
Favorite Food | 5 | Most popular foods in US
what sports team would you love to see lose | 5 | Most popular or top grossing sports teams
Which sports team do you love to hate? | 5 | Most popular or top grossing sports teams
Favorite TV show | 5 | Highest rated TV shows
First car | 5 | Top selling auto makers
Which is my favorite holiday? | 25 | Most popular holidays
Favorite Beer? | 25 | Top beer brands (US)
favorite beer | 25 | Top beer brands (US)
Who is your favorite actor? | 25 | Top grossing actors of all time
Best video game ever created? | 25 | Top selling games of all time

3. Vulnerable to coworkers, clients, or family members (32 of 127, 25%)

i. Vulnerable to family members (29, 23%)
mother’s maiden name? (10 occurrences)
mothers middle name (4 occurrences)
father’s first name (2 occurrences)
Daughter’s Middle Name
favorite relative’s name
First child’s middle name
first initial of your sisters names from oldest to youngest
mothers name?
name of my wife
place of child birth
place where you were married
significant other’s middle name?
StepFathers middle name
What is [child’s name]’s favorite toy?
when were you married
your birthdate
your children’s godparents

ii. Vulnerable to coworkers/clients (3, 2%)
Who is your current boss?
What is my line of work
What do you keep on your desk at all times?
pairs (15% of all pairs) would be guessed within 5 attempts. Two of these pairs had answers that could be easily identified via a simple web search. Another eleven drew from a small answer space, such as eye colors, with the answer falling within that space. Another six answers were within the top five results of easily searchable online popularity lists. For example, one participant’s question was Favorite TV show, and the answer was among the top five rated shows according to the first popularity list we searched, the Nielsen television ratings. Note that our statistics and the list of question/answer pairs in Table 5.2.iii include only those pairs with popular answers; we exclude pairs with the same or similar questions for which participants responded with less popular answers. For example, two pairs we excluded included the question favorite food but had answers that were not at the top of popularity lists.

The second subcategory, in Table 5.3, contains 32 question/answer pairs (25% of all pairs) vulnerable to attack by family members or others the participants knew. Fourteen participants asked for either their mother’s maiden name or their mother’s middle name. A total of 29 questions (23% of all user-written questions) were categorized as vulnerable to family members. Another three questions were clearly vulnerable to coworkers and clients, though many from the family category (especially name of my wife) would also be vulnerable.

Finally, a few participants may not have understood that while the answers they wrote would not be public, the questions would be. One proposed question, my sobriety date, would be a poor choice if the participant considered his history of alcoholism to be private. Another user-written question, what is my favorite kink?, might also have conveyed more information than intended by the participant who wrote it.

Advocates of user-written questions might argue that these questions would have worked better if we had only taught participants how to choose a strong question; we are skeptical. Users would have to take the time to read or view the instructions, learn and understand the different types of threats to their answers, generate a sufficient number of candidate question/answer pairs to come across one that is both memorable and secure against all threats, and reject all pairs that failed to meet these complex criteria.

6. Discussion

Our results do not give us confidence that today’s personal questions make adequate authentication secrets. Those that are hard to guess are less likely to be chosen by users in the first place, and when chosen they are less likely to be remembered.

While the most well publicized attacks on personal questions have been targeted at individuals, our results show that large scale attacks are also possible. Blackhats already have mailing lists containing large numbers of user accounts hosted by webmail services. Our results show that a significant fraction of these accounts could be compromised simply by providing the most popular answers to users’ personal authentication questions.

Furthermore, there are a number of other threats against personal questions that we did not address in our study. The two questions that went unguessed during the study asked for ID numbers: frequent flyer numbers and library card numbers. The accounts of users who choose the former, and who use their webmail account to communicate with airlines and travel agencies, can be compromised by anyone with access to these firms’ databases. Airlines, travel agencies, and libraries may not guard these ID numbers with the same vigilance with which a user would expect his or her password to be guarded. The answers to these questions may also be easier for an attacker to obtain than they were for our participants to guess in the laboratory. For example, an attacker might offer the owner of a target account a prize if she can prove she traveled in the last month: “just send an itinerary as proof of travel”.

6.1. Improving questions

Many shared secret authentication schemes, including those that use personal questions, limit the user to a fixed threshold of responses (answers). One way to make secret questions more secure and reliable would be to dynamically adjust the threshold based on the types of responses received. In other words, certain responses are penalized more (move a user closer to the threshold) than others.

To reduce vulnerability to statistical guessing attacks, responses could be penalized in proportion to their popularity. This could limit attackers to two or three popular answers. The size of the penalty would depend on the likelihood that a legitimate user would respond with multiple popular answers before guessing the correct one. Table 9 illustrates that of the 900 answers eventually recalled by participants in our follow-up study, 44 (5%) were preceded by a response that was both incorrect and that matched one of the five most popular answers for that question. Only one of the 900 correct answers (0.1%) was preceded by two incorrect but popular answers.
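
As a rough illustration of penalizing responses in proportion to their popularity, the following Python sketch charges each incorrect response one unit plus a surcharge proportional to the fraction of previous users who gave that same answer. The class name PenaltyBudget, the threshold of 5.0, and the popularity weight of 10.0 are hypothetical choices made for this example; they are not values derived from the study, and a deployed system would need to tune them against data such as that in Table 9.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(answer.lower().split())


class PenaltyBudget:
    """Tracks how close a run of incorrect responses is to the lockout threshold.

    answer_counts maps already-normalized answers to the number of users who
    chose them. threshold and popularity_weight are illustrative assumptions.
    """

    def __init__(self, answer_counts: Counter, threshold: float = 5.0,
                 popularity_weight: float = 10.0):
        self.counts = answer_counts
        self.total = sum(answer_counts.values())
        self.threshold = threshold
        self.popularity_weight = popularity_weight
        self.spent = 0.0

    def penalty(self, response: str) -> float:
        """One unit per incorrect response, plus a popularity surcharge."""
        if self.total == 0:
            return 1.0
        share = self.counts[normalize(response)] / self.total
        return 1.0 + self.popularity_weight * share

    def record_wrong(self, response: str) -> bool:
        """Charge an incorrect response; return True while attempts remain."""
        self.spent += self.penalty(response)
        return self.spent < self.threshold
```

Under these assumed parameters, an attacker who iterates through answers held by 12%, 10%, and 8% of users exhausts the budget on the third guess, while a legitimate user who cycles through unpopular answers still receives five attempts.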
Users who are trying to recall a correct answer may provide answers that are similar to each other. For example, a user who had no trouble remembering where her mother was born might still walk through numerous possible answers: ‘Coney Island’, ‘coney island’, ‘Brooklyn’, ‘Brooklyn, NY’, ‘Brooklyn, New York’, ‘New York’, ‘New York, NY’, and so on. A user should not be penalized for a response that is identical to a previous response for the purposes of authentication, and should be penalized less when responses are similar to each other or to the correct answer. For example, ‘Coney Island’ and ‘coney island’ are lexically identical with the exception of capitalization, and so the latter response should not be penalized. The responses ‘Brooklyn’ and ‘Brooklyn, NY’ are lexically similar: they share a common prefix. Other responses, such as ‘Coney Island’ and ‘Brooklyn’, are semantically similar, as one might infer using a geographic database. When similarity is not as easy to obtain as in this example, previous users’ responses might be mined to reveal commonly confused answers.

Another way to reduce vulnerability to statistical guessing attacks is to reduce the proportion of popular answers. We propose eliminating questions that are currently statistically guessable more than 10% of the time. For the remaining questions, we propose flagging and rejecting answers that exceed a certain threshold of popularity (e.g. 1%). Users would be asked to choose another question or a more specific answer.

Unpopular answers may also be harder for acquaintances to guess. Of the 379 answers deemed statistically guessable, 168 (44%) were guessed by participants’ partners. In contrast, only 460 of the 2491 answers that were not deemed statistically guessable (18%) were guessed by participants’ partners. A Fisher’s exact test shows the difference to be statistically significant, p < 0.0001.

Guiding users away from popular answers may also increase the likelihood that they will forget them. We suggest occasionally inserting a query for each user’s answer after login has completed. We would encourage those who have trouble recalling their answers to choose a new answer (or an entirely new question). The first such query should occur shortly after a question is configured (perhaps a few days) to ensure the answer was encoded to long-term memory. Additional queries could be separated by much longer periods (e.g. six months) and ensure the answers had not changed.

Some might be concerned that an authentication system that alerts users when they have chosen a popular answer could be used by attackers as an oracle to identify these popular answers, and it could. However, with little effort an attacker can collect answers from the public and derive more accurate popularity statistics than could be obtained from the authentication system.

Some websites’ backup authentication systems allow users to configure hints that will help them recall the correct answer in the future. While we did not examine this practice, our findings on user-written questions leave us concerned that users might be unable to sufficiently tune hints to remind them of their answers without revealing these answers to others.

Other websites’ backup authentication systems require users to configure multiple questions and answer a subset to authenticate. Designers of such systems must decide whether to reveal which answers a user got correct if he or she fails to provide a sufficient number of correct answers. This is likely to be a common case, as we found 24% of answers that were eventually recalled by our participants were not correctly recalled in the first guess. (For more detailed statistics, see Table 8 in the Appendix.) If users were asked all questions at once and not told which questions they answered correctly and which they had not, many users who would have been able to answer a sufficient number of questions asked individually would no longer be able to do so. On the other hand, if incorrect answers were individually identified, adversaries could determine how close they were to a sufficient number of answers and which they needed to research further. Furthermore, using multiple questions might lull users into believing that it is safe to reveal individual answers, thinking that the remaining questions are likely a sufficient defense.

One approach to backup authentication using multiple questions, proposed by Jakobsson et al. [4], relies on preference-based questions, similar to those on online dating websites, with answers rated on a scale. However, this approach requires both a large number of questions to be configured and a large number of responses during authentication.

6.2. Alternative backup authenticators

One barrier to the deployment of new, and potentially better, backup authentication options is that the comparative risks are unknown. We hope that by quantifying the risks of personal questions, we will help to catalyze the development of quantitatively-superior alternatives.

One current alternative, authentication via a code sent to an alternate email address, is often not viable for users’ primary email accounts. Even when users have alternate addresses they can provide, these addresses may expire when users change their ISP,
school, job, or other affiliation. Simultaneous credential loss could occur if a user stored her password on a work computer, used her work email address as her backup authenticator, and then lost her job.

Mobile phones are already in use as a second authentication factor by some banks [3], which send authentication codes to users in SMS messages. Authentication using mobile phones is attractive because of phones’ ubiquity. However, phones are also frequently shared, lost, and stolen. The security of SMS message transmission is also a concern.

Many users protect against memory loss by writing passwords down. Rather than admonish them for this practice, a backup authentication system could instead offer to print a list of single-use account-recovery codes and encourage users to store them in a locked filing cabinet, safe, or safe-deposit box. As with written passwords, a printed list might not be available when the user is away from the location(s) at which it is stored. Furthermore, simultaneous credential loss could occur if a user stored her password in her browser, stored her authentication list in a safe near the computer, and then lost both in a natural disaster.

In previous papers, Brainard et al. [1] and we [12] have proposed and tested systems in which user-selected trustees vouch for the identity of the user. While early reliability and security results from our work show promise, communicating with one or more trustees requires far more work than typing a simple answer to a question. For many users, the consequences of having an account lost or compromised may not be significant enough to justify the extra effort.

7. Conclusion

Backup authentication mechanisms should reliably enable account holders to regain access to accounts for which they have forgotten their passwords, and do so without significantly increasing the risk that the account can be compromised.

The secret questions employed by the top four webmail services are not sufficiently reliable authenticators. Even for the webmail service with the most memorable set of questions (Yahoo!), participants forgot an average of 16% of the answers to those questions within six months.

The security of personal questions appears significantly weaker than that of passwords. Acquaintances with whom participants reported being unwilling to share their Hotmail passwords were able to guess 17% of answers. For our geographically homogeneous sample, 13% of answers could be guessed by iterating through the five most popular answers of other users. User-written questions were no better: roughly half were vulnerable to guessing by either acquaintances or those who had never met the account holder.

Whatever options users are given for backup authentication, all have risks and users have the right to know about them. We hope this work helps users to choose whether and how to answer backup authentication questions. We also hope that by quantifying the bar over which new backup authentication mechanisms must pass, we will inspire the creation, measurement, and deployment of new alternatives to ‘secret’ questions.

Acknowledgments

This paper was inspired by the griping of Jon Howell. We are indebted to Will Ip, Maritza Johnson, and Arry Shin for their assistance in running our study. We are also grateful for the valuable feedback on earlier drafts provided by Robert W. Reeder and the anonymous reviewers.

Epilog

On November 12, 2008, we contacted AOL, Google, and Yahoo! to provide them with a draft of this paper and share our intent to publish at this symposium. We asked to be notified by the end of 2008 if they had concerns that might warrant the delay of publication, so as to provide ample time to discuss these concerns with them and, if necessary, withdraw the paper. AOL and Google sent email explicitly consenting to publication in advance of the deadline. Yahoo! made no request to delay publication. We learned in February 2009 that Yahoo! had replaced all nine of the personal authentication questions that its users may choose from when signing up for a new account.
References

[1] J. Brainard, A. Juels, R. L. Rivest, M. Szydlo, and M. Yung. Fourth-factor authentication: somebody you know. In CCS ’06: Proceedings of the 13th ACM Conference on Computer and Communications Security, pages 168–178, New York, NY, USA, 2006. ACM.

[2] T. Bridis. Hacker impersonated Palin, stole e-mail password, Sept. 18, 2008. Associated Press.

[3] CommonwealthBank. NetBank NetCode SMS, 2008. http://www.commbank.com.au/netbank/netcodesms/.

[4] M. Jakobsson, E. Stolterman, S. Wetzel, and L. Yang. Love and authentication. In CHI ’08: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, pages 197–200, New York, NY, USA, 2008. ACM.

[5] M. Just. Designing authentication systems with challenge questions. In L. F. Cranor and S. Garfinkel, editors, Security and Usability: Designing Secure Systems that People Can Use, pages 143–155, Sebastopol, CA, 2005. O’Reilly Media, Inc.

[6] G. Keizer. Yahoo, Hotmail, Gmail all vulnerable to Palin-style password-reset hack. Computerworld, Sept. 19, 2008. http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9115187.

[7] J. Kremer. Happy 10th birthday, Yahoo! Mail, Oct. 2007. http://ycorpblog.com/2007/10/08/happy-10th-birthday-yahoo-mail/.

[8] H. P. Ltd. Top 20 websites, 2008. http://www.hitwise.com/datacenter/rankings.php.

[9] Microsoft Corporation. Windows live hotmail fact sheet, May 2007. http://www.microsoft.com/presspass/newsroom/msn/factsheet/hotmail.mspx.

[10] J. Podd, J. Bunnell, and R. Henderson. Cost-effective computer security: Cognitive and associative passwords. In OZCHI ’96: Proceedings of the 6th Australian Conference on Computer-Human Interaction (OZCHI ’96), page 304, Washington, DC, USA, 1996. IEEE Computer Society.

[11] A. Rabkin. Personal knowledge questions for fallback authentication: security questions in the era of facebook. In SOUPS ’08: Proceedings of the 4th Symposium on Usable Privacy and Security, pages 13–23, New York, NY, USA, 2008. ACM.

[12] S. Schechter, S. Egelman, and R. W. Reeder. It’s not what you know, but who you know: A social approach to last-resort authentication. In CHI ’09: Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, Boston, MA, 2009. ACM.

[13] S. E. Schechter, R. Dhamija, A. Ozment, and I. Fischer. The emperor’s new security indicators: An evaluation of website authentication and the effect of role playing on usability studies. In Proceedings of the 2007 IEEE Symposium on Security and Privacy, pages 51–65, Washington, DC, USA, May 20–23 2007. IEEE Computer Society.

[14] R. Stross. What would you do if you logged onto your e-mail and received an unfamiliar message: ‘user name and password do not match’? The New York Times, Oct. 4, 2008. http://www.nytimes.com/2008/10/05/business/05digi.html.

[15] B. Sullivan. ‘forgot your password?’ may be weakest link. MSNBC Red Tape Chronicles, Aug. 26, 2008. http://redtape.msnbc.com/2008/08/almost-everyone.html.

[16] M. Toomim, X. Zhang, J. Fogarty, and J. A. Landay. Access control by testing for shared knowledge. In CHI ’08: Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, 2008. ACM.

[17] M. Zviran and W. J. Haga. User authentication by cognitive passwords: an empirical assessment. In JCIT: Proceedings of the Fifth Jerusalem Conference on Information technology, pages 137–144, Los Alamitos, CA, USA, 1990. IEEE Computer Society Press.

Appendix A. The missing question

One of the personal authentication questions used by Microsoft is name of first pet. Due to a clerical error, our original survey instead asked participants the question your first pet. The removal of “name of” had an important effect on the answers: many were simply dog or cat. Further confounding the problem, we asked the correct question (name of first pet) during the longitudinal recall study. Thus, we had no choice but to exclude these results from our findings. We audited all other questions and found no other errors of this type.

The excluded question was similar to a question shared by AOL and Yahoo!: what is your pet’s name? One important difference is that Microsoft’s question asks specifically about a first pet, which may be less well known to acquaintances than one’s current pet. If the results for Microsoft’s question were equivalent to those for what is your pet’s name, the aggregate vulnerability of all Microsoft questions to guessing by partners would have increased. However, aggregate statistics for both participant recall (memorability) and resilience to statistical guessing would have improved.
Table 6. Guesses broken down by partner relationship

Question | Spouse (18 ppts) | Relative (23 ppts) | Fiance/SO (4 ppts) | Friend (51 ppts) | Coworker (32 ppts) | Other (2 ppts)

AOL
What is your pet’s name? | 13/14 (93%) | 8/16 (50%) | 2/4 (50%) | 17/36 (47%) | 5/22 (23%) | 0/1 (0%)
Where were you born? | 13/18 (72%) | 15/23 (65%) | 2/4 (50%) | 20/51 (39%) | 10/31 (32%) | 0/2 (0%)
What is your favorite restaurant? | 5/17 (29%) | 2/19 (11%) | 0/3 (0%) | 12/48 (25%) | 1/29 (3%) | 0/1 (0%)
What is the name of your school? | 3/14 (21%) | 3/17 (18%) | 0/2 (0%) | 14/39 (36%) | 7/22 (32%) | 0/2 (0%)
Who is your favorite singer? | 3/15 (20%) | 3/15 (20%) | 0/2 (0%) | 5/42 (12%) | 1/27 (4%) | 0/0
What is your favorite town? | 6/17 (35%) | 5/21 (24%) | 0/2 (0%) | 13/47 (28%) | 9/28 (32%) | 0/0
What is your favorite song? | 1/14 (7%) | 0/13 (0%) | 0/2 (0%) | 1/40 (3%) | 2/24 (8%) | 0/1 (0%)
What is your favorite film? | 6/15 (40%) | 0/18 (0%) | 1/3 (33%) | 9/50 (18%) | 2/28 (7%) | 0/0
What is your favorite book? | 2/13 (15%) | 3/17 (18%) | 0/3 (0%) | 5/41 (12%) | 2/25 (8%) | 0/1 (0%)
Where was your first job? | 5/18 (28%) | 4/20 (20%) | 1/4 (25%) | 13/51 (25%) | 3/31 (10%) | 0/1 (0%)
Where did you grow up? | 9/18 (50%) | 11/22 (50%) | 0/4 (0%) | 26/50 (52%) | 15/31 (48%) | 0/2 (0%)
Total | 66/173 (38%) | 54/201 (27%) | 6/33 (18%) | 135/495 (27%) | 57/298 (19%) | 0/11 (0%)

Google
What is your primary frequent flyer number? | 0/4 (0%) | 0/6 (0%) | 0/2 (0%) | 0/8 (0%) | 0/8 (0%) | 0/1 (0%)
What is your library card number? | 0/4 (0%) | 0/4 (0%) | 0/3 (0%) | 0/19 (0%) | 0/9 (0%) | 0/0
What was your first phone number? | 3/13 (23%) | 3/13 (23%) | 0/2 (0%) | 2/38 (5%) | 1/27 (4%) | 0/0
What was your first teacher’s name? | 0/13 (0%) | 1/13 (8%) | 0/2 (0%) | 0/38 (0%) | 0/27 (0%) | 0/0
Total | 3/34 (9%) | 4/36 (11%) | 0/9 (0%) | 2/103 (2%) | 1/71 (1%) | 0/1 (0%)

Microsoft
Mother’s birthplace | 8/16 (50%) | 8/20 (40%) | 0/4 (0%) | 12/49 (24%) | 6/30 (20%) | 0/2 (0%)
Best childhood friend | 9/17 (53%) | 5/20 (25%) | 0/3 (0%) | 8/49 (16%) | 1/29 (3%) | 0/2 (0%)
Favorite teacher | 0/16 (0%) | 1/16 (6%) | 0/2 (0%) | 4/43 (9%) | 0/28 (0%) | 0/0
Favorite historical person | 4/17 (24%) | 1/16 (6%) | 0/2 (0%) | 6/47 (13%) | 2/24 (8%) | 0/0
Grandfather’s occupation | 3/15 (20%) | 3/15 (20%) | 1/3 (33%) | 3/39 (8%) | 2/25 (8%) | 0/2 (0%)
Total | 24/81 (30%) | 18/87 (21%) | 1/14 (7%) | 33/227 (15%) | 11/136 (8%) | 0/6 (0%)

Yahoo!
Where did you meet your spouse? | 9/18 (50%) | 7/14 (50%) | 2/3 (67%) | 4/29 (14%) | 1/15 (7%) | 0/1 (0%)
What was the name of your first school? | 4/18 (22%) | 8/17 (47%) | 0/2 (0%) | 8/49 (16%) | 3/30 (10%) | 0/0
Who was your childhood hero? | 1/13 (8%) | 1/19 (5%) | 0/1 (0%) | 3/37 (8%) | 5/27 (19%) | 0/0
What is your favorite pastime? | 6/18 (33%) | 6/22 (27%) | 2/4 (50%) | 10/45 (22%) | 4/28 (14%) | 0/1 (0%)
What is your favorite sports team? | 11/16 (69%) | 9/19 (47%) | 0/0 | 17/40 (43%) | 10/24 (42%) | 0/1 (0%)
What is your father’s middle name? | 8/16 (50%) | 6/16 (38%) | 0/4 (0%) | 3/43 (7%) | 0/27 (0%) | 0/2 (0%)
What was your high school mascot? | 7/14 (50%) | 8/20 (40%) | 0/2 (0%) | 14/44 (32%) | 5/29 (17%) | 0/0
What make was your first car or bike? | 5/18 (28%) | 7/20 (35%) | 0/4 (0%) | 13/51 (25%) | 5/31 (16%) | 1/2 (50%)
What is your pet’s name? | 13/14 (93%) | 8/16 (50%) | 2/4 (50%) | 17/36 (47%) | 5/22 (23%) | 0/1 (0%)
Total
64/145 (44%)
60/163 (37%)
6/24 (25%)
89/374 (24%)
38/233 (16%)
1/8 (13%)
Total for all webmail sites
144/419 (34%)
128/471 (27%)
11/76 (14%)
242/1163 (21%)
102/716 (14%)
1/25 ( 4%)
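
The per-site totals in Table 6 do not add up directly to the final row. The figures are consistent with “What is your pet’s name?”, which is listed under both AOL and Yahoo!, being counted only once in the overall total. This is an inference from the numbers, not something the table states. A minimal Python sketch of that check follows; the function name `combine_site_totals` is hypothetical.

```python
def combine_site_totals(site_totals, duplicated_rows):
    """Sum (correct, attempted) pairs across sites, counting shared questions once.

    site_totals:     one (correct, attempted) pair per webmail site.
    duplicated_rows: (correct, attempted) pairs for questions that appear
                     under more than one site and were therefore summed twice.
    Assumption: each duplicated question appears under exactly two sites.
    """
    correct = sum(c for c, _ in site_totals) - sum(c for c, _ in duplicated_rows)
    attempted = sum(n for _, n in site_totals) - sum(n for _, n in duplicated_rows)
    return correct, attempted, round(100 * correct / attempted)


# Spouse column of Table 6: AOL, Google, Microsoft, Yahoo! totals.
spouse_sites = [(66, 173), (3, 34), (24, 81), (64, 145)]
# "What is your pet's name?" is listed under both AOL and Yahoo!.
spouse_pet = [(13, 14)]

print(combine_site_totals(spouse_sites, spouse_pet))  # (144, 419, 34)
```

The same subtraction applied to the Relative column gives 128/471 (27%), matching the last row of the table.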
Table 7. Guesses broken down by how long partners knew each other

| Question | < 6 months (6 ppts) | 6 months–1 year (11 ppts) | 1–4 years (30 ppts) | > 4 years (83 ppts) |
|---|---|---|---|---|
| AOL | | | | |
| What is your pet’s name? | 0/3 (0%) | 0/4 (0%) | 7/23 (30%) | 38/63 (60%) |
| Where were you born? | 3/5 (60%) | 3/11 (27%) | 11/30 (37%) | 43/83 (52%) |
| What is your favorite restaurant? | 0/5 (0%) | 1/9 (11%) | 1/28 (4%) | 18/75 (24%) |
| What is the name of your school? | 2/3 (67%) | 3/9 (33%) | 7/21 (33%) | 15/63 (24%) |
| Who is your favorite singer? | 0/5 (0%) | 0/9 (0%) | 0/24 (0%) | 12/63 (19%) |
| What is your favorite town? | 3/5 (60%) | 3/9 (33%) | 5/26 (19%) | 22/75 (29%) |
| What is your favorite song? | 0/3 (0%) | 0/9 (0%) | 1/24 (4%) | 3/58 (5%) |
| What is your favorite film? | 0/5 (0%) | 2/10 (20%) | 3/30 (10%) | 13/69 (19%) |
| What is your favorite book? | 0/5 (0%) | 1/8 (13%) | 3/25 (12%) | 8/62 (13%) |
| Where was your first job? | 1/5 (20%) | 2/11 (18%) | 3/30 (10%) | 20/79 (25%) |
| Where did you grow up? | 2/5 (40%) | 3/11 (27%) | 13/30 (43%) | 43/81 (53%) |
| Total | 11/49 (22%) | 18/100 (18%) | 54/291 (19%) | 235/771 (30%) |
| Google | | | | |
| What is your primary frequent flyer number? | 0/0 | 0/1 (0%) | 0/8 (0%) | 0/20 (0%) |
| What is your library card number? | 0/1 (0%) | 0/3 (0%) | 0/8 (0%) | 0/27 (0%) |
| What was your first phone number? | 0/4 (0%) | 0/8 (0%) | 1/24 (4%) | 8/57 (14%) |
| What was your first teacher’s name? | 0/4 (0%) | 0/7 (0%) | 0/27 (0%) | 1/55 (2%) |
| Total | 0/9 (0%) | 0/19 (0%) | 1/67 (1%) | 9/159 (6%) |
| Microsoft | | | | |
| Mother’s birthplace | 1/5 (20%) | 1/10 (10%) | 5/30 (17%) | 27/76 (36%) |
| Best childhood friend | 0/4 (0%) | 0/10 (0%) | 2/29 (7%) | 21/77 (27%) |
| Favorite teacher | 0/5 (0%) | 0/10 (0%) | 1/28 (4%) | 4/62 (6%) |
| Favorite historical person | 0/3 (0%) | 2/9 (22%) | 1/24 (4%) | 10/70 (14%) |
| Grandfather’s occupation | 1/2 (50%) | 0/7 (0%) | 2/26 (8%) | 9/64 (14%) |
| Total | 2/19 (11%) | 3/46 (7%) | 11/137 (8%) | 71/349 (20%) |
| Yahoo! | | | | |
| Where did you meet your spouse? | 1/1 (100%) | 0/5 (0%) | 2/14 (14%) | 20/60 (33%) |
| What was the name of your first school? | 1/5 (20%) | 0/11 (0%) | 2/28 (7%) | 20/72 (28%) |
| Who was your childhood hero? | 1/4 (25%) | 1/9 (11%) | 3/24 (13%) | 5/60 (8%) |
| What is your favorite pastime? | 0/3 (0%) | 2/9 (22%) | 5/27 (19%) | 21/79 (27%) |
| What is your favorite sports team? | 3/4 (75%) | 1/7 (14%) | 8/21 (38%) | 35/68 (51%) |
| What is your father’s middle name? | 0/4 (0%) | 0/9 (0%) | 0/26 (0%) | 17/69 (25%) |
| What was your high school mascot? | 2/5 (40%) | 2/9 (22%) | 2/28 (7%) | 28/67 (42%) |
| What make was your first car or bike? | 2/5 (40%) | 2/11 (18%) | 3/30 (10%) | 24/80 (30%) |
| What is your pet’s name? | 0/3 (0%) | 0/4 (0%) | 7/23 (30%) | 38/63 (60%) |
| Total | 10/34 (29%) | 8/74 (11%) | 32/221 (14%) | 208/618 (34%) |
| Total for all webmail sites | 23/108 (21%) | 29/235 (12%) | 91/693 (13%) | 485/1834 (26%) |

Table 8. Guesses required to recall those answers that could eventually be recalled

| Question (columns: guess number) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| AOL | | | | | | | | | | |
| What is your pet’s name? | 89% | 97% | 97% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Where were you born? | 85% | 91% | 94% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What is your favorite restaurant? | 74% | 91% | 91% | 97% | 97% | 100% | 100% | 100% | 100% | 100% |
| What is the name of your school? | 67% | 77% | 87% | 97% | 97% | 97% | 97% | 100% | 100% | 100% |
| Who is your favorite singer? | 72% | 76% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What is your favorite town? | 71% | 94% | 94% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What is your favorite song? | 62% | 86% | 86% | 90% | 90% | 95% | 100% | 100% | 100% | 100% |
| What is your favorite film? | 61% | 75% | 82% | 96% | 96% | 100% | 100% | 100% | 100% | 100% |
| What is your favorite book? | 78% | 93% | 96% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Where was your first job? | 59% | 77% | 87% | 90% | 97% | 97% | 97% | 100% | 100% | 100% |
| Where did you grow up? | 76% | 93% | 96% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Total | 73% | 87% | 92% | 98% | 98% | 99% | 99% | 100% | 100% | 100% |
| Google | | | | | | | | | | |
| What is your primary frequent flyer number? | 60% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What is your library card number? | 88% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What was your first phone number? | 87% | 97% | 97% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What was your first teacher’s name? | 75% | 92% | 96% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Total | 81% | 96% | 97% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Microsoft | | | | | | | | | | |
| Mother’s birthplace | 77% | 90% | 95% | 95% | 97% | 100% | 100% | 100% | 100% | 100% |
| Best childhood friend | 79% | 92% | 92% | 92% | 92% | 92% | 95% | 97% | 97% | 97% |
| Favorite teacher | 61% | 87% | 90% | 94% | 97% | 97% | 97% | 97% | 97% | 100% |
| Favorite historical person | 68% | 80% | 92% | 96% | 100% | 100% | 100% | 100% | 100% | 100% |
| Grandfather’s occupation | 70% | 78% | 81% | 89% | 93% | 93% | 96% | 100% | 100% | 100% |
| Total | 72% | 86% | 91% | 93% | 96% | 96% | 98% | 99% | 99% | 99% |
| Yahoo! | | | | | | | | | | |
| Where did you meet your spouse? | 60% | 80% | 87% | 90% | 93% | 93% | 93% | 97% | 100% | 100% |
| What was the name of your first school? | 73% | 85% | 90% | 95% | 95% | 95% | 98% | 98% | 100% | 100% |
| Who was your childhood hero? | 50% | 78% | 89% | 94% | 94% | 94% | 94% | 94% | 94% | 94% |
| What is your favorite pastime? | 63% | 70% | 83% | 90% | 97% | 100% | 100% | 100% | 100% | 100% |
| What is your favorite sports team? | 71% | 94% | 97% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What is your father’s middle name? | 95% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What was your high school mascot? | 95% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| What make was your first car or bike? | 66% | 91% | 98% | 98% | 98% | 98% | 100% | 100% | 100% | 100% |
| What is your pet’s name? | 89% | 97% | 97% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Total | 76% | 90% | 94% | 97% | 98% | 98% | 99% | 99% | 100% | 100% |
| All questions | 74% | 89% | 93% | 97% | 98% | 98% | 99% | 99% | 100% | 100% |
Each column i represents the number of answers guessed within the first i tries as a percentage of the number of answers that could be recalled given
an unlimited number of attempts. (In most of this paper, participants are said to have forgotten their answer if they fail in the first five attempts.)
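
The column definition above amounts to a cumulative distribution over the attempt on which each answer was recalled. A minimal Python sketch of that computation follows. The function name `cumulative_recall` and its input format (one 1-based attempt number per eventually recalled answer) are assumptions for illustration, not the authors' analysis code.

```python
from collections import Counter


def cumulative_recall(attempts_needed, max_tries=10):
    """Percentage of eventually recalled answers produced within i tries.

    attempts_needed: for each answer that was eventually recalled, the
                     1-based attempt on which it was recalled (hypothetical
                     input format).
    Returns a list whose element i-1 is the percentage recalled within the
    first i tries, for i = 1..max_tries.
    """
    total = len(attempts_needed)
    if total == 0:
        raise ValueError("no recalled answers to summarize")
    per_attempt = Counter(attempts_needed)
    recalled_so_far = 0
    percentages = []
    for i in range(1, max_tries + 1):
        recalled_so_far += per_attempt.get(i, 0)
        percentages.append(100.0 * recalled_so_far / total)
    return percentages


# Four answers: two recalled on the first try, one on the second, one on the fifth.
print(cumulative_recall([1, 1, 2, 5], max_tries=5))
# [50.0, 75.0, 75.0, 75.0, 100.0]
```

Under the five-attempt cutoff used in most of the paper, the fifth entry is the share of eventually recalled answers that would count as remembered.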

Table 9. Statistical guessing and popular answers

Columns 0, 1, and 2 give the number of incorrect but popular (among top five) responses before the correct answer was recalled.

| Question | Answers among five most popular | Answers deemed statistically guessable | 0 | 1 | 2 |
|---|---|---|---|---|---|
| AOL | | | | | |
| What is your pet’s name? | 10/93 (11%) | 0/93 (0%) | 37/37 (100%) | 0/37 (0%) | 0/37 (0%) |
| Where were you born? | 27/129 (21%) | 19/129 (15%) | 43/45 (96%) | 2/45 (4%) | 0/45 (0%) |
| What is your favorite restaurant? | 19/117 (16%) | 7/117 (6%) | 32/34 (94%) | 2/34 (6%) | 0/34 (0%) |
| What is the name of your school? | 22/96 (23%) | 22/96 (23%) | 30/30 (100%) | 0/30 (0%) | 0/30 (0%) |
| Who is your favorite singer? | 11/101 (11%) | 1/101 (1%) | 28/28 (100%) | 0/28 (0%) | 0/28 (0%) |
| What is your favorite town? | 37/115 (32%) | 34/115 (30%) | 31/35 (89%) | 4/35 (11%) | 0/35 (0%) |
| What is your favorite song? | 3/94 (3%) | 3/94 (3%) | 21/21 (100%) | 0/21 (0%) | 0/21 (0%) |
| What is your favorite film? | 18/114 (16%) | 11/114 (10%) | 26/28 (93%) | 2/28 (7%) | 0/28 (0%) |
| What is your favorite book? | 21/100 (21%) | 19/100 (19%) | 27/27 (100%) | 0/27 (0%) | 0/27 (0%) |
| Where was your first job? | 16/125 (13%) | 10/125 (8%) | 34/37 (92%) | 3/37 (8%) | 0/37 (0%) |
| Where did you grow up? | 27/127 (21%) | 22/127 (17%) | 42/43 (98%) | 1/43 (2%) | 0/43 (0%) |
| Total | 211/1211 (17%) | 148/1211 (12%) | 351/365 (96%) | 14/365 (4%) | 0/365 (0%) |
| Google | | | | | |
| What is your primary frequent flyer number? | 0/29 (0%) | 0/29 (0%) | 5/5 (100%) | 0/5 (0%) | 0/5 (0%) |
| What is your library card number? | 0/39 (0%) | 0/39 (0%) | 7/7 (100%) | 0/7 (0%) | 0/7 (0%) |
| What was your first phone number? | 0/93 (0%) | 0/93 (0%) | 30/30 (100%) | 0/30 (0%) | 0/30 (0%) |
| What was your first teacher’s name? | 6/93 (6%) | 6/93 (6%) | 24/24 (100%) | 0/24 (0%) | 0/24 (0%) |
| Total | 6/254 (2%) | 6/254 (2%) | 66/66 (100%) | 0/66 (0%) | 0/66 (0%) |
| Microsoft | | | | | |
| Mother’s birthplace | 19/121 (16%) | 12/121 (10%) | 34/38 (89%) | 4/38 (11%) | 0/38 (0%) |
| Best childhood friend | 10/120 (8%) | 1/120 (1%) | 39/39 (100%) | 0/39 (0%) | 0/39 (0%) |
| Favorite teacher | 0/105 (0%) | 0/105 (0%) | 31/31 (100%) | 0/31 (0%) | 0/31 (0%) |
| Favorite historical person | 30/106 (28%) | 27/106 (25%) | 21/24 (88%) | 3/24 (13%) | 0/24 (0%) |
| Grandfather’s occupation | 20/99 (20%) | 13/99 (13%) | 25/27 (93%) | 2/27 (7%) | 0/27 (0%) |
| Total | 79/551 (14%) | 53/551 (10%) | 150/159 (94%) | 9/159 (6%) | 0/159 (0%) |
| Yahoo! | | | | | |
| Where did you meet your spouse? | 19/80 (24%) | 10/80 (13%) | 25/29 (86%) | 4/29 (14%) | 0/29 (0%) |
| What was the name of your first school? | 9/116 (8%) | 1/116 (1%) | 41/41 (100%) | 0/41 (0%) | 0/41 (0%) |
| Who was your childhood hero? | 34/97 (35%) | 27/97 (28%) | 15/18 (83%) | 3/18 (17%) | 0/18 (0%) |
| What is your favorite pastime? | 32/118 (27%) | 23/118 (19%) | 24/27 (89%) | 3/27 (11%) | 0/27 (0%) |
| What is your favorite sports team? | 59/100 (59%) | 57/100 (57%) | 30/35 (86%) | 5/35 (14%) | 0/35 (0%) |
| What is your father’s middle name? | 17/108 (16%) | 5/108 (5%) | 40/40 (100%) | 0/40 (0%) | 0/40 (0%) |
| What was your high school mascot? | 21/109 (19%) | 18/109 (17%) | 39/39 (100%) | 0/39 (0%) | 0/39 (0%) |
| What make was your first car or bike? | 36/126 (29%) | 31/126 (25%) | 37/44 (84%) | 6/44 (14%) | 1/44 (2%) |
| What is your pet’s name? | 10/93 (11%) | 0/93 (0%) | 37/37 (100%) | 0/37 (0%) | 0/37 (0%) |
| Total | 237/947 (25%) | 172/947 (18%) | 288/310 (93%) | 21/310 (7%) | 1/310 (0.3%) |
| All questions | 523/2870 (18%) | 379/2870 (13%) | 855/900 (95%) | 44/900 (5%) | 1/900 (0.1%) |
A participant’s answer was deemed statistically guessable if it was among the five most common answers chosen by all other participants (excluding
the participant’s partner).
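
The guessability criterion above can be sketched in a few lines of Python. This is an illustration only: the function name `is_statistically_guessable`, the dictionary input, and the assumption that answers have already been normalized to comparable strings are all hypothetical, and ties at the fifth rank are broken by insertion order here, which the paper does not specify.

```python
from collections import Counter


def is_statistically_guessable(answers, participant, partner, top_n=5):
    """Check whether a participant's answer is among the most common answers
    given by everyone else, excluding the participant's partner.

    answers:     dict mapping participant id -> normalized answer string
                 for a single question (hypothetical input format).
    participant: id of the participant whose answer is being tested.
    partner:     id of that participant's partner, excluded from the tally.
    """
    others = Counter(
        answer
        for pid, answer in answers.items()
        if pid != participant and pid != partner
    )
    # most_common breaks ties by first-seen order; this is an assumption.
    popular = {answer for answer, _ in others.most_common(top_n)}
    return answers[participant] in popular


answers = {"p1": "rex", "p2": "rex", "p3": "rex", "p4": "fido", "p5": "rex"}
print(is_statistically_guessable(answers, "p1", "p2"))  # True
```

Excluding the partner matters when partners share an answer, such as a couple's pet: without the exclusion, a pair's own shared answer could make each other's answers look popular.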