Original PDF Flash format fingerprinting-blank-paper-using-commodity-scanners  


Fingerprinting Blank Paper Using Commodity Scanners

To appear in Proc. IEEE Symposium on Security and Privacy, May 2009. For updates, visit http://citp.princeton.edu/paper/.
Fingerprinting Blank Paper Using Commodity Scanners
William Clarkson∗, Tim Weyrich∗,†, Adam Finkelstein∗, Nadia Heninger∗,
J. Alex Halderman‡ and Edward W. Felten∗
∗Department of Computer Science
†Department of Computer Science
‡Department of Electrical Engineering
Princeton University
University College London
and Computer Science
{wclarkso, af, nadiah, felten}
t.weyrich@ucl.ac.uk
The University of Michigan
@cs.princeton.edu
jhalderm@eecs.umich.edu
Abstract
surface
paper
This paper presents a novel technique for authenticating
physical documents based on random, naturally occurring
light

imperfections in paper texture. We introduce a new method
normal
for measuring the three-dimensional surface of a page
using only a commodity scanner and without modifying
sensor
the document in any way. From this physical feature, we
generate a concise fingerprint that uniquely identifies the
down
document. Our technique is secure against counterfeiting
to glass
and robust to harsh handling; it can be used even before
(a)
(b)
(c)
any content is printed on a page. It has a wide range of
applications, including detecting forged currency and tickets,
Figure 1. Since the surface of a sheet of paper is not
authenticating passports, and halting counterfeit goods.
perfectly flat, a scanner will produce a different image
Document identification could also be applied maliciously to
depending on the orientation of the page. The light
de-anonymize printed surveys and to compromise the secrecy
reaching the sensor depends on the relative angles of
of paper ballots.
the light source and surface normal, (a). A 10 mm tall
region of a document scanned from top to bottom, (b),
1. Introduction and Roadmap
appears different from the same region scanned from left
to right, (c). By combining (b) and (c) we can estimate
Viewed up close, the surface of a sheet of paper is not
the 3-D texture.
perfectly flat but is a tangled mat of wood fibers with a
rich three-dimensional texture that is highly random and
difficult to reproduce. In this study, we introduce a new
in Sections 7 and 8. Some of these applications may be
method for identifying a physical document—and verifying
harmful; for example, our method allows re-identification of
its authenticity—by measuring this unique natural structure.
supposedly anonymous surveys and paper ballots.
We show that paper texture can be estimated using only
In contrast with previous efforts, our technique measures
a flatbed scanner coupled with appropriate software, and
paper’s 3-D texture, requires no exotic equipment, produces
that this feature is robust against rough treatment—such as
a concise document fingerprint, does not require modifying
printing or scribbling on the document or soaking it in water—
the document, and may be applied to blank paper before
and adversarial counterfeiting. Under normal conditions, our
content is printed. Previous systems lack one or more of these
technique can identify documents with near-perfect accuracy
properties. For example, Laser Surface Authentication [1]
and a negligible false positive rate.
requires a costly laser microscope to image paper texture,
It has long been known how to authenticate the content
while the technique proposed by Zhu et al. [2], which focuses
printed on a page by using cryptographic methods such as
on ink splatter caused by randomness in the printing process,
digital signatures. We address a different problem: how to
requires the paper to be printed with known content prior to
authenticate the paper itself. For some kinds of documents,
fingerprinting. We discuss these and other related work in
such as currency and tickets, it matters not only that the
Section 2.
content is unaltered but also that the document is a genuine
The physical document authentication technique we pro-
original rather than a copy or forgery. Physical document
pose is a three-stage process culminating in a robust and
authentication has many applications, which we discuss
secure fingerprint. In the first stage, we scan the original

document and estimate its surface texture. Scanners normally
in order to extract other additional information. The feature
measure only the color of a document, but by scanning the
vector used by Cowburn and Buchanan is not concise, and
paper several times at different orientations, we can estimate
their fingerprint is not secure. An adversary with access to
the shape of the surface (see Figure 1). In the second stage,
the fingerprint is able to easily discover the surface texture
we condense the surface texture into a concise feature vector,
of the document, possibly making forgery less difficult.
which robustly identifies the page. The third and final stage
Zhu et al. focus on identifying “non-repeatable randomness
uses a secure sketch to generate a fingerprint that reveals little
existing in the printing process” [2]. They generate a
information about the feature vector. The fingerprint can be
fingerprint from the random ink splatter that occurs around
printed on the document (e.g., as a 2-D bar code) or stored
the edges of any features printed on a page. Unlike our
in a database. The verification procedure is similar to the
scheme, their method can only be applied after a document
fingerprinting process, with a different final stage that verifies
has been printed. Furthermore, their implementation requires
that the generated feature vector is correct. We describe each
modifying the original document by printing a known target
of these stages in detail in Section 3.
pattern.
We designed our technique to satisfy several security and
Our method is an improvement over previous work because
usability goals:
we measure the surface texture of a document without the
• Uniqueness
Every document should be identifiable
requirement of expensive equipment. We utilize the unique
and distinguishable from all others.
fiber structure as identified and relied upon by Metois et al.,
• Consistency
A fingerprint should be verifiable by
Cowburn and Buchanan, and Zhu et al. but do so without
multiple parties over the lifetime of the document.
modifying the document in any way. Our method allows
• Conciseness
Document fingerprints should be short
documents to be fingerprinted before or after content is
and easily computable.
printed. In fact, fingerprinting and tracking using our system
• Robustness
It should be possible to verify a finger-
can begin during the paper manufacturing process. We have
printed document even if it has been subjected to harsh
also developed methods for hiding the target feature vector
treatment.
through the use of a secure sketch. This means a potential
• Resistance to Forgery
It should be very difficult or
counterfeiter cannot learn what features he needs to reproduce
costly for an adversary to forge a document by coercing
from the fingerprint alone but would need access to the
a second document to express the same fingerprint as
original document to even attempt a forgery.
an original.
Sections 4–6 evaluate our system in terms of these goals.
3. Fingerprinting Process
The most recent version of this paper can be found on our
Our fingerprinting process allows for registration and
web site, http://citp.princeton.edu/paper/.
validation of a sheet of paper without the participation of a
central registration authority. Depending on the application, a
2. Related Work
document’s fingerprint can be stored in a database for future
verification, or it can be printed on the document along with
The Fiberfingerprint system of Metois et al. first introduced
a digital signature, creating a self-verifying original. The
the notion of using surface texture to uniquely identify a
fingerprint can be used to ascertain whether two documents
document [3]. Employing a custom device, Fiberfingerprint
share the same feature vector without revealing the registered
measures “inhomogeneities in the substrate” of a document,
feature vector itself.
from which a unique identifier is derived. The system employs
The registration and validation processes are quite similar,
alignment marks that are added to the document in order
as shown in Figure 2. In the registration process, we scan a
to orient the verification system, and requires a specialized
document, estimate its three-dimensional surface texture, and
hardware device rather than a commodity scanner.
generate a feature vector V that represents the document’s
Laser Surface Authentication is a technique that measures
unique texture. We consider two documents to be the same
the texture of a page using a high-powered laser micro-
if they have similar feature vectors. To protect the feature
scope [1]. Creating and verifying fingerprints in their system
vector and inhibit forgeries that might seek to reproduce
requires access to this specialized device, which may be
an exact feature vector, the fingerprint contains only a one-
prohibitively expensive for many users. Their system also
way hash H(V) of the extracted feature vector. To achieve
requires that the verifier be online, which may rule out
robustness against measurement errors in the feature vector,
applications such as third-party ticket verification.
the registration process derives error-correction information
A recent patent application by Cowburn and Buchanan de-
from V and stores it in the fingerprint in the form of a
scribes using a commodity scanner to identify documents [4].
secure sketch. The fingerprint also contains a random seed
This method does not measure the normal vector field of a
to initialize the pseudorandom number generator used to
document, but rather uses scans from multiple orientations
compute the feature vector, as described in Section 3.2.
2

Registration
Estimate
Determine
Compute Error
Apply Hash
Scan Document
Surface Texture
Feature Vector
Correction Bits
Function
Fingerprint
Random Seed
Error Correction Bits
Hash Value
Validation
Estimate
Determine
Apply Error
Apply Hash
?
Scan Document
Surface Texture
Feature Vector
Correction
Function
=
Figure 2. Registration and validation pipelines. Registration: Our method creates a fingerprint that consists of a hash
value, error correction information, and a random seed. Validation: A document is authenticated if a newly computed
hash value is identical to the one stored in the fingerprint. The stored error correction information is used to correct
potentially faulty bits in the feature vector.
The validation process has no access to the original
feature vector. Validating a document requires determining
a document’s feature vector anew, using the seed stored in
the fingerprint. Validation assumes a potentially flawed raw
feature vector V and uses the secure sketch to obtain an
error corrected V, as described in Section 3.3. The candidate
document is considered valid if this feature vector maps to
the same hash value stored in the fingerprint—that is, if
H(V) = H(V). The remainder of this section discusses the
Figure 3. Difference image between two 1200 DPI scans
registration and validation pipeline in detail.
showing the surface texture measured by the scanner in
the y direction. Actual size: “sum”.
3.1. Estimating document surface texture
To capture the surface texture of a document, we scan it
derive a novel photometric stereo solution for flatbed scanners,
at four orientations: 0◦, 90◦, 180◦, and 270◦. This allows
which provides us with information on surface orientation
recovery of the surface orientation for every sampled surface
without the need for dedicated calibration.
point. Our procedure assumes that paper is perfectly diffuse,
Let us define a coordinate system for the paper and the
which is an assumption that largely holds for near-orthogonal
scanner so that the paper lies in the xy-plane, the z-axis points
observation. Diffuse materials reflect a portion of the incident
away from the flatbed of the scanner, and the scanner’s linear
light that is proportional to the cosine of the angle between
light source is parallel to the x-axis. We approximate this light
the direction of incidence and the surface normal—that is,
source by a line segment extending from x1 to x2. We further
proportional to the dot product of these normalized directions.
assume that the light source is offset with respect to the line
This property is commonly exploited in photometric stereo
on the paper imaged by the CCD sensor (see Figure 1(a))
to reconstruct surface normals from multiple images under
by oy in the y-direction and by oz in the z-direction.
varying illumination by a point light source [5]. Similarly,
Each point on the paper has a normal n and a diffuse color,
we apply photometric stereo to the four captured scans.
or albedo, ρ. Without loss of generality, we concentrate on
Flatbed scanners, however, contain a linear light source,
a surface point at the origin of our coordinate system. The
rather than a point source, which disallows the application
observed intensity of such a surface point is then:
of traditional photometric stereo. Brown et al. recently
x2
(x, o
demonstrated how normals can be derived from flatbed scans
y, oz)
I = ρ
n,
dx ,
(1)
(x, o
under multiple orientations [6]. Their method, however, relies
x1
y, oz)
3
on an extensive calibration procedure, which would make
which is the integral over all light diffusely reflected off that
it impractical for authentication purposes. Instead, we will
surface point and originating from points (x, oy, oz) along
3

the linear light source. As every flatbed scanner is designed
spacing we draw these locations from a Voronoi distribution
for even illumination, any limiting effects near ends of the
[7]: we use the random seed stored in the fingerprint to
light source are negligible and we shall ignore the integral
initialize P pseudorandom start locations on the page and
limits in the remainder of this discussion.
use Lloyd’s Voronoi relaxation to obtain a set of locations
Scanning the same surface point a second time with the
distributed evenly across the document, as shown in Figure 4.
paper rotated by 180◦ displaces the light source from oy to
In principle one could now directly compare the patches
−oy. Subtracting the resulting two scans I0◦ and I180◦ from
of a document A to corresponding patches in a document
each other leads to:
B in order to verify two documents as being the same. The
disadvantages are that this requires access to the patches of B
dy = I0◦ − I180◦
when verifying A, which would require an amount of storage
(x, oy, oz)
(x, −oy, oz)
prohibitive for offline applications, and, more importantly,
= ρ
n,

dx
(x, oy, oz)
3
(x, −oy, oz)
3
that it would reveal the original document’s structure to a
(0, 2o
forger. Hence, we derive a compressed feature vector and
y, 0)
= ρ
n,
dx
store its hash along with a secure sketch to hide the feature
(x, oy, oz)
3
vector from an adversary.
2oy
= ny ρ
dx
Each patch contains 64 2-D samples di, i = 1, . . . , 64,
(x, oy, oz)
3
which we stack to create a patch vector p ∈ IR128. Each
= ny ρs .
(2)
patch contributes T bits to the feature vector. We compute
these feature bits f
That is, the difference d
i, i = 1, . . . , T , by subsequently comparing
y directly yields the y component ny
the patch vector to T template vectors t
of the surface normal n, multiplied by the albedo ρ and a
i. The template
vectors are a set of pseudorandomly chosen orthonormal
fixed constant s that is dependent on the scanner geometry
vectors in IR128 generated using the same seed that is used
only. Analogously, dx = I270◦ − I90◦ = nx ρs. With four scans
to determine patch locations: the t
we can determine each surface normal’s projection into to
i are initialized with vector
components drawn from a N(0, 1) distribution, followed by
xy-plane, n2 = (nx, ny), up to a scale. The factor s is assumed
Gram-Schmidt orthonormalization. Each template vector can
to be fairly constant across the page, and the remaining scale
be interpreted as a template patch of 8×8 2-vectors denoting
is given by the local surface reflectance ρ of the paper at
surface orientation.
any given location.
The comparison is performed by correlating the patch
Application of equation (2) requires precise alignment of
vector p and each template vector t
each surface point across all scans. To reduce the effect
i; i.e., by computing
the dot product
p, t
of alignment imprecision and to isolate frequencies of the
i . Positive correlation means that
surface orientations in the patch and the template patch
document that are stable across scans and different scanners,
we apply a low-pass filter to the document and down-sample
it. In our experiments we scanned each document at 1200
100 Locations Drawn from Voronoi Distribution
SPI (samples per inch) and down-sampled it by a factor of
1
eight, resulting in an effective resolution of 150 SPI.
0.9
After processing the four scans of a document, we recover
0.8
the surface texture as a two-dimensional vector field with d =
(dx, dy) = ρs n2 defined at each location of the document.
0.7
0.6
3.2. Computing the feature vector
0.5
0.4
From this vector field d we determine the feature vector
of the document. A good feature vector captures unique
Y-Axis of Unit Square 0.3
characteristics of a document, while at the same time being
0.2
concise. We model the feature vector V as an N-bit vector
0.1
of independent bits fi whose values are derived from the
surface normals of the document.
00
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
In contrast to previous approaches, we do not extract a
X-Axis of Unit Square
feature vector from a single region of the document, but we
compute the feature vector from a collection of representative
Figure 4. Sample Voronoi distribution of 100 points in the
subsections, patches, of the document. For documents down-
unit square. Voronoi distributions give relatively uniform
sampled to 150 SPI, we choose square patches of 8×8
coverage of a region, while simultaneously ensuring no
samples, centered at a series of random locations p
overlap of patches.
i. For even
4

agree; negative correlation denotes mostly opposing surface
to V, without revealing V to an adversary who does not have
orientations. The respective feature bit is determined by the
any information about V.
sign of the correlation:
Suppose the registered value for a document is an N-bit
1 + sign( p, t
value V, and we wish to accept any V within Hamming
i )
fi =
.
(3)
distance
2
δ N of V. The secure sketch proposed by Juels and
Wattenberg chooses a random codeword x from an error-
See Algorithm 1 for further illustration.
correcting code of length N that can correct δ N errors, and
stores ss(V) = V ⊕ x. To recover V from a candidate V, the
V = bool[PT ]
system calculates ˆ
x = ss(V) ⊕ V, corrects ˆ
x to the nearest
Retrieve surface orientation vectors of document
codeword, and verifies that H(V) = H(x ⊕ ss(V)). If V and
Extract P patches based on Voronoi distribution
V have Hamming distance less than δ N, the system correctly
for p = 1 to P do
outputs V.
template = generate new set of T pseudo-
Dodis et al. [9] show that the number of bits of information
random orthonormal template vectors
about the fingerprint revealed by the secure sketch is N − k,
for i = 1 to T do
where k = log K is the dimension of the error-correcting code
c =
patch[p], template[i]
used in the secure sketch when it has K codewords. Thus, in
f(p−1)P+i = TRUE if c > 0
order to maximize the security of the system for a fixed N,
end for
the error-correcting code should have as high a dimension k
end for
as possible.
Algorithm 1: Feature vector generation.
Low-Density Parity Check (LDPC) codes, along with turbo
codes, are among the strongest error-correcting codes in use,
The number of independent bits that can be extracted
thanks to efficient decoding algorithms developed in the last
from a patch in this way is limited and depends on the
two decades. In our implementation, we used the LDPC
amount of information contained in a patch. A standard
library written by Neal [11]. LDPC codes are well-suited
tool to characterize this amount of information is principal
to this application because they work well on large block
component analysis (PCA) [8]. We performed PCA on a large
sizes and in practice can often correctly decode beyond
set of randomly chosen patches from different documents.
the minimum distance of the code [12]. In addition, the
The results show that for 8×8-patches 75% of the information
LDPC decoding algorithm can take into account a confidence
can be expressed with only 32 principal components; that
level specified for each individual bit to further improve the
is, within a 32-dimensional subspace. We hence decided to
performance of the code. In our case, this confidence level
restrict ourselves to T = 32 of 128 possible orthonormal
can be calculated from the magnitude of the dot product of
template vectors, as additional template vectors are likely
to produce increasingly correlated feature bits. We further
choose 100 patches, P = 100, leading to a feature vector of
3,200 bits for each document.
Cumulative Energy of Eigenvalues of Template Basis Functions
1
0.9
3.3. Creating the document fingerprint
0.8
From the feature vector we can create a document
0.7
fingerprint that can be used to authenticate the document
0.6
without revealing information about the document. The
fingerprinting method should be both concise and robust to
0.5
errors. This situation is similar to that of biometrics, where a
0.4
user provides a value V which is close to, but not identical
0.3
to the registered value V (e.g., the Hamming distance is
% Information Accounted For
relatively small). Additionally, providing an adversary with
0.2
the full feature vector may not be desirable, as it provides a
0.1
blueprint for potential forgery.
A document fingerprint consists of a hash of the feature
00
20
40
60
80
100
120
Number of Templete Vectors
vector H(V), where H is a collision-resistant cryptographic
hash function, along with a secure sketch ss(V) following
the ideas of Dodis et al. [9] and Juels and Wattenberg [10].
Figure 5. Principal component analysis of a large num-
The secure sketch allows the system to correct any errors that
ber of 8×8-patches shows that 75% of the information
may occur in the candidate V, assuming V is close enough
has been extracted after 32 components.
5

% Error in Fingerprint vs. % Successful Decoding
Dot Product Magnitude vs. Error Probability of Bit
100
1000
800
90
500
400
300
80
200
70
lity
60
50
40
30
Bit Error Probabi
20
% Successful Decoding of Fingerprint
10
0
0
10
15
20
25
30
35
−5
−4
−3
−2
−1
0
1
2
3
4
5
% Error in Fingerprint
Dot Product Magnitude
Figure 6. Fraction of fingerprints successfully decoded
Figure 7. Correspondence between dot product magni-
for varying fingerprint error rates using LDPC codes of
tude and error probability during validation. (Fit to normal
different dimensions k.
distribution with µ=0 and σ =0.1314.)
the template vector with the patch vector. The correspondence
the fragility of a document fingerprint under various treatment
between the two is graphed in Figure 7.
conditions. Our goal is to test whether our technique typically
The length of our feature vector is N = 3200 bits. We
validates different observations of the same document (true
experimented with codes of suitable dimension to correct bit
positives) and rejects pairs of observations of different
error rates between δ = 10%, allowing correct identification
documents (true negatives), while rarely validating pairs
of all types of paper we experimented with under ideal
of observations of different documents (false positives) or
conditions (see Figure 8), and δ = 30%, suitable to identify
rejecting different observations of the same document (false
documents under less-ideal conditions such as soaking and
negatives).
scribbling (see Figure 9). For the case of decoding under
Our experiments show that a fingerprint can be found for
ideal conditions, a code of dimension k = 1000 and N = 3200
a variety of different types of paper. Unless otherwise noted,
is sufficient to correctly verify all test documents, with no
each experiment began with a document scanned at 1200
false positives. For the case of decoding under less ideal
DPI on an Epson Perfection v700 scanner. Each test focused
conditions, a code of dimension k = 300 and N = 3200
on a 3×3 inch square in the center of the page.1
sufficed to correctly verify 95% of all test documents, with
For each test, we captured five observations of a set of five
no false positives. See Figure 6 for a summary of these
documents, for a total of 25 observations. Each observation
results.
consisted of four scans taken at 0◦, 90◦, 180◦, and 270◦
The feature vector length can be adjusted to suit the needs
that we used to estimate surface normals. We expect no two
of the application (and the expected document treatment
scans of the same document to be exactly alike due to slight
conditions) by increasing or reducing the number of patches.
variations in placement on the scanner, random noise, and
Longer feature vectors provide a higher level of accuracy
other uncontrollable variables in the process.
when distinguishing two documents, especially under harsh
The amount of error tolerated in a matching fingerprint can
treatment, but require increased storage. We chose N = 3200
be adjusted by choosing an appropriate error-correcting code
bits as our feature vector length to ensure that it would fit
during the fingerprinting process described in Section 3. The
in a 2-D barcode.
number of bits that can be corrected by the code should be
determined by the needs of the application, as it establishes
4. Robustness
1. In our robustness experiments, we used a printed box on the test pages
Section 3 describes a process for registration and validation
to identify the region to be fingerprinted and to align the different scans.
of a document fingerprint. In this section we evaluate docu-
However, alignment could be accomplished by other means, such as by
ment fingerprints across different types of paper, including
relying on the boundaries of the page or other printed material, or by simply
recording a few patches at high resolution [13]. Different sets of patches
normal copy paper (Boise Aspen 50), university letterhead
should be used for alignment and verification, because using the same
with a visible watermark, and index cards. We also evaluate
patches could increase the false positive rate.
6

a tradeoff between the security of the system and the relative
4.2. Non-ideal handling conditions
likelihood (or harm) of a false positive or false negative.
The previous experiments were performed under ideal han-
4.1. Ideal handling conditions
dling conditions. We performed additional tests to ascertain
the robustness of fingerprints when the document is subjected
As a baseline test, we measured the frequency of correct
to less-than-ideal conditions. These tests included scribbling
and incorrect validation and rejection under ideal handling
on the paper, printing on it with ink, and soaking the page
conditions, when we expect no document degradation.
in water.
We began with 25 observations (5 from each of 5 docu-
ments). We chose 40 random seeds and sampled a fingerprint
Scribbling. We first scanned a set of five blank documents,
from each observation for each seed, following the process
then scanned them again after scribbling over them with a
described in Section 3. We made 25 = 300 comparisons
pen. In each document the scribble was unique, covering an
2
for each seed, yielding a total of 12, 000 comparisons.
average of 8% of the test region. In this test, 25 pre-scribble
For each comparison, we computed the Hamming distance
observations were compared against their 25 post-scribble
between the two fingerprints. These distances are summarized
counterparts, for a total of 625 pairs. We used 40 different
for the 12,000 comparisons by the histogram shown in the
fingerprint samples per document to yield a total of 25,000
top graph in Figure 8. Under non-adversarial conditions,
comparisons. The Hamming distances resulting from these
document fingerprints of normal copy paper differ, on average,
comparisons are plotted in the top graph in Figure 9. The
in only 99 (3.1%) of the 3200 bits. In contrast, as one would
sensitivity index in this case is lower (ds = 28.8), although the
expect, the average Hamming distance for fingerprints made
curves remain quite well-separated. With a decision threshold
from observations of different documents is 50% of the
of 1130 bit errors in the fingerprint, the chance of a false
bits. These distributions are well-separated; the maximum
positive or false negative is 1 in 1047.
Hamming distance between feature vectors from the same
document is 177, while the minimum distance between
Printing. In this experiment we printed single-spaced text
feature vectors of different documents is 1490. An error
in 12 pt. Times New Roman lettering over the test region,
tolerance anywhere in this range should give no false positives
covering approximately 13% of the area with ink. The
and no false negatives for these tests. We found similar results
distributions shown in the middle graph of Figure 9 were
for index cards and university letterhead; see Figure 8.
obtained as in the scribble test. Even in this experiment, in
The distributions for the “same” fingerprint comparison
which most patches used for the fingerprint were partially
tests and “different” fingerprint comparison tests seem to be
covered by ink, the sensitivity index is 26.1 and the chance
reasonably approximated by a normal distribution. Fitting
of a false positive or false negative at the crossover is 1 in
Gaussian curves to this data, we can find a summary statistic,
1038.
Egan’s sensitivity index for Gaussian distributions of signal
and noise with unequal variances, given by:
Wetting and drying. The bottom graph in Figure 9 shows
the resiliency of document fingerprints after the document
ds = 2(µ2 − µ1)/(σ1 + σ2)
(4)
was submerged in water for five minutes. We dried each test
where µ
document and ironed it until it was as flat as possible. Using
1, µ2, σ1 and σ2 are the means and standard
deviations of the distributions [14]. For this experiment,
the same evaluation protocol as for the scribble and printing
d
tests, we found that documents could still be validated with
s = 52.0. To give some intuition about the significance
of this statistic, the two Gaussians intersect at a Hamming
100% reliability, even with this fairly extreme mistreatment
distance of 731 bits; the heights of the curves are such that
of the page (ds = 33.3).
the chance of a single comparison resulting in either a false
These experiments demonstrate that our fingerprinting
positive or false negative is 1 in 10148. If we reduce the
method is robust when a document is handled under certain
feature vector length from N = 3200 bits to 1600, 800, or
rough conditions. The ability to identify a document before
400 bits, the probability of such errors is 1 in 1096, 1 in 1057,
and after it is printed on, scribbled on, or soaked in water
or 1 in 1035, respectively.
has many potential applications.
We repeated these experiments on different scanner models
and found similar results. When comparing a document
5. Security
fingerprinted on one model and verified on another, results
are slightly worse.2
The security of our method relies on the inability of an
attacker to reproduce the document’s surface, either because
2. We chose several of the parameters of our algorithm (e.g., the
he does not know what features to produce or because he
downsample factor and size of the patches) based on preliminary experiments
using the Epson v700 scanner. The optimal settings for verification of
cannot recreate the normal vectors at the required scale. The
documents using other scanner models may vary.
threat model of each application is determined by several
7

Histogram of Distances Beteween Fingerprints of Copy Paper
Histogram of Distances Between Fingerprints: After Scribbling
0.015
Mean: 99
Mean: 1600
0.015
Mean: 286
Mean: 1599
Std: 24.29
Std: 33.42
Std: 58.75
Std: 32.56

0.01
0.01
0.005
0.005
Probability of Hamming Distance
Probability of Hamming Distance
0
0
0
10
20
30
40
50
60
70
80
90
100
0
10
20
30
40
50
60
70
80
90
100
% Hamming distance between document fingerprints (length 3200 bits)
% Hamming distance between document fingerprints (length 3200 bits)
Histogram of Distances Between Fingerprints of Index Cards
Histogram of Distances Between Fingerprints: After Printing
0.018
Mean: 98
Mean: 1597
Mean: 562
Mean: 1600
Std: 35.45
Std: 28.16
0.016
Std: 49.77
Std: 29.84
0.016
0.014
0.014
0.012
0.012
0.01
0.01
0.008
0.008
0.006
0.006
Probability of Hamming Distance 0.004
0.004
Probability of Hamming Distance
0.002
0.002
0
0
0
10
20
30
40
50
60
70
80
90
100
0
10
20
30
40
50
60
70
80
90
100
% Hamming distance between document fingerprints (length 3200 bits)
% Hamming distance between document fingerprints (length 3200 bits)
Histogram of Distances Between Fingerprints: After Soaking
Histogram of Distances Between Fingerprints of Letterhead
Mean: 98
Mean: 1596
0.015
Mean: 570
Mean: 1604
0.016
Std: 31.93
Std: 29.86
Std: 31.36
Std: 32.64
0.014
0.012
0.01
0.01
0.008
0.006
0.005
0.004
Probability of Hamming Distance
Probability of Hamming Distance
0.002
0
0
0
10
20
30
40
50
60
70
80
90
100
0
10
20
30
40
50
60
70
80
90
100
% Hamming distance between document fingerprints (length 3200 bits)
% Hamming distance between document fingerprints (length 3200 bits)
Figure 8. Distributions of Hamming distances between
Figure 9.
Distributions of Hamming distances after
fingerprints for three paper types: copy paper (top), index
subjecting documents to non-ideal treatments: scribbling
cards (middle), and letterhead (bottom). In all graphs,
(top), printing (middle), and soaking in water (bottom).
the curve on the left depicts the distribution for scans of
The curves remain well separated even under these
the same document, while the curve on the right gives
adverse conditions.
the distribution for different documents.
8

factors: the availability of an original to the attacker, whether
“trusted” we mean that the device outputs a Boolean match/no-
verification is performed online or offline, and whether the
match result but does not leak any other information.
verification device is trusted. Under the most common threat
The secret information stored in the device could be the
models, our method should prevent an attacker from forging
public key of the registration entity. The seed stored in the
a copy of an original.
document fingerprint could be encrypted under the secret
Performing verification online or offline results in different
key of the registration entity. Therefore, knowledge of the
considerations. Here “online” means that the verification
fingerprint for a document does not reveal the patch locations.
device can communicate with a remote trusted server which
The hash of the feature vector could also be signed by the
can store data and perform computations; “offline” means
registration entity, preferably using a separate key. This allows
that the verification device cannot communicate, although it
only trusted devices to determine patch locations and verify
can be preprogrammed with a limited amount of information
the authenticity of a document. No access to the registration
such as a constant number of cryptographic keys. Online
entity is required, provided that the device has knowledge of
verification of a document has a straightforward solution,
the decryption and verification keys of the registration entity.
while offline verification requires security tradeoffs.
In this threat model the adversary does not know which
patches will be analyzed. This forces the attacker to recreate
5.1. Online verification
the surface normals across the entire document to ensure
verification of the document.
Online verification need not reveal in advance the patch
5.2.2. Offline: untrusted device, no access to original. In
locations that will be analyzed. This forces an attacker to
the next case, the verification device is offline and untrusted
reproduce the entire surface of a document before presenting
(i.e., it might leak everything it knows to the attacker) and
it for verification. In one approach, the verification server
the attacker has not seen the original document that he is
requests complete raw scans of the document at each of four
trying to forge. In this case, the attacker cannot forge the
orientations, which the server uses to perform the verification
document because he does not know anything useful about
algorithm. Under this construction, the verification server
the normal field that he must create. At most, he knows the
does not reveal the chosen patches.
fingerprint (if it is stored in the device) but this does not
In an alternative approach, the verification server provides
help him because the fingerprint is a secure sketch.
a fresh pseudorandom challenge to the client, and the client
uses the challenge to seed a pseudorandom generator which is
5.2.3. Offline: untrusted device, access to original. The
used to pick the patches and templates used in the verification
final case is the most challenging one, where the verification
algorithm. The client then computes the feature vector and
device is offline and untrusted, and the attacker has access to
sends it to the server. The server, having computed the same
an original document that he wants to copy. Because there are
feature vector on its stored scans of the original document,
no secrets from the attacker—he sees the original document
verifies that the two feature vectors are similar enough.
including anything printed on it, and he knows the full state
In this threat model an attacker does not know a priori
of the device—the attacker knows exactly which patches will
which patch locations on a document will be sampled. This
be used for verification and how the feature vector will be
forces an attacker to reproduce the surface texture of the
computed from those patches. The attacker’s only task is to
document at each sample point in order to pass a counterfeit
make a document that will generate a feature vector close
as an original.
enough to the original. Whether the attacker can do this is
the subject of the next section.
5.2. Offline verification
6. Forging a Document
The security of offline verification depends on whether
the verification client is trusted and on the availability of
Suppose an attacker has seen an original document and
an original to the attacker. In the offline case, we assume
wants to create a second document that will pass verification
that the fingerprint of the legitimate original document is
as the original. The attacker will start with an arbitrary piece
either pre-stored on the client or is printed onto the document
of paper which, by assumption, will have a very different
(perhaps as a 2-D barcode) along with the authority’s digital
feature vector from the original. He will then try to modify
signature of the fingerprint. In either case, the client device
the target paper so that its feature vector is close to that
checks the document against a known fingerprint.
of the original document. To do this, the attacker needs to
make fine-grained modifications to the document’s surface
5.2.1. Offline: trusted device. Currency and ticket coun-
normals. This might be done via lithography or photographic
terfeit detection at banks and concerts are two important
techniques, but these will be expensive and will probably
examples of offline verification with a trusted device. By
require special equipment. The most effective, economical
9

way to control the surface, we believe, is to print on the
document.
Equation (2) shows that the fingerprinted vector (dx, dy)
contains additional factors s and ρ. The scanner-dependent
factor s can be assumed to be fairly constant across the page
and hence has no influence on the sign of the correlation
results in the feature vector generation. The remaining scale
is given by the local surface reflectance ρ of the paper at a
given location, which should be stable across multiple scans.
On empty paper it is nearly constant; in the presence of
print, ρ is greatly attenuated, which lessens the influence of
the printed portion onto the correlation result. The adversary
can try to control bits of the feature vector by printing dark
ink at selected points in order to reduce their influence in
the correlation calculations. Besides reducing ρ, printing at
a point tends to flatten the document surface, as shown in
Figure 3.
Figure 10. The smallest dots that can be produced by
An adversary who aims at forging a document might
our test printer are 1/240 inch—20 samples wide in this
try to leverage these effects by printing a set of carefully
4800 SPI scan—despite the printer’s nominal 2400 DPI
placed dots, either to cause the surface texture of a candidate
positional accuracy.
document to express the same fingerprint as an original,
or to down-weight unfavorable contributions to the patch
correlation. To do this the forger must overcome two hurdles:
printing dots on the page at desirable locations and/or printing
caused by a dot. We performed an experiment where we
dots with favorable surface normal vectors. Dark ink on a
printed a series of black dots in a region of the document.
document would directly affect reflectivity, while light ink
We identified the black dots and measured the normal vectors
might solely change the normal vectors at a specific location.
in the surrounding region. For each printed dot, the desired
We assume that the adversary uses commercially available
normal vector of a location occurred on only one point of
equipment and is able to print dots in any color from black
the surface.
to white. He has less control over the exact shape of the
The bottom-line question is how many degrees of freedom
dots, which varies by printing technology and type of paper.
the adversary has in controllably modifying the normal vector
We conducted experiments to characterize the ability of a
field in a patch. Given the linear transformation used to
forger to precisely control the effect of a printed pattern. We
determine each feature vector bit, the adversary will likely
measured the effective resolution—the number of distinct
be able to achieve a desired set of feature vector values if
printable dots—for a high-end office printer, a Xerox Phaser
he has enough degrees of freedom.
8550, with a nominal resolution of 2400x2400 DPI. The
effective resolution is limited by dot gain, which causes
If there are N feature vector bits, and each bit is computed
printed dots to become larger than intended (see Figure
as the sign of the correlation of the normal field with a
10) due to factors such as the viscosity of the ink and the
random vector field, then a truly random normal field value
absorbency of the paper. The smallest dots that the test printer
would match all N feature vector bits with probability 2−N .
could produce on a normal piece of copy paper are 1/240
However, it is likely that the feature vector bits are not fully
inch, or 20x20 samples when scanned at 4800 SPI. This
independent. Although we have some evidence about the
limits the effective resolution to 240 DPI. On the other hand,
degree of independence (see, e.g., Figure 5), we do not have
the positional accuracy of the printer seems closer to the rated
a precise estimate of how much entropy is encoded in the
2400 DPI. We conclude that a forger could use commodity
feature vector.
printers to print dots with positional accuracy similar to what
We are thus left with an open question: does the amount
commodity scanners can measure but size much greater than
of information in a patch, as encoded in a feature vector,
the scanner’s pixels.
exceed the adversary’s ability to inject information into the
Because printed dots typically span more than one sample
patch? If we knew the answer to this question, we could
in a patch, printing a dot at a specific location affects the
state with some confidence whether an adversary could forge
neighboring surface normal vectors in unpredictable and
a document in the most favorable case (for the adversary),
uncontrollable ways. Due to paper variations as well as
where the adversary sees the original document and the
limited precision in the placement and viscosity of ink, the
verification device is offline and untrusted. Unfortunately, we
forger does not have precise control over the normal vectors
have to leave this question for future work.
10

7. Applications
painted. Thus art authenticity or forgery might be detectable
by applying a technique like ours to the canvas, most probably
There are a large number of applications that could
on the back side of the painting.
benefit from the ability to uniquely identify a document.
Lottery tickets are similar to currency except that players
Many situations where physical possession of an item must
need not be aware of a fingerprinting technique at all. In
be verified or authentication of an item is required could
order for a lottery winner to collect on their winnings, the
fruitfully employ our technique. Currency, ticket, and art
ticket must be verified by the lottery authority. The fingerprint
counterfeit detection, as well as verification of product
of a winning ticket need not be printed on the document at
packaging are some of the applications where physical
all. Fingerprints of all possible winning lottery tickets can
document authentication is desirable.
be privately maintained, and any claimants can be required
Counterfeit currency detection is one obvious application.
to produce the actual winning ticket, with correctly verified
The financial impact of counterfeit currency is large. Esti-
fingerprint, in order to collect their winnings.
mates of annual global revenue loss range from $250- to
The accurate identification of paper based product pack-
$500 billion [15], [16]. The ability to authenticate bills could
aging could benefit from this technique as well. When
change the way currency is produced and fraud is detected.
inspecting cargo, customs officials often inspect the contents
Such a system would begin during currency production. The
of packages to weed out counterfeit goods. We can increase
government would generate a fingerprint for each bill. This
confidence in package contents by authenticating a product’s
fingerprint could be stored in a database, along with the bill’s
packaging. If the packaging of a product is legitimate, then
serial number, or the government could digitally sign the
the contents of the package have a much higher likelihood
fingerprint and print the fingerprint and signature on the bill.
of being authentic.
Any party wishing to verify a particular bill would scan the
bill and verify that the fingerprint matched the one signed
8. Privacy Implications
by the government. The authentication of a bill could be
performed offline or online. Businesses and banks accepting
The feasibility of paper-based authentication demonstrates
large cash deposits could verify the currency was legitimate
that some undesirable attacks are possible. Because our
before completing the transaction. Offline authentication
results do not modify the paper in any way, there is no way to
could be performed provided that the verification device
detect, by inspecting a piece of paper, whether its fingerprint
had the public key of the currency issuer.
might have been recorded in advance by an adversary. This
Ticket forgery at major concerts and sporting events is
fact violates the traditional assumption that pieces of paper
another large black-market business. Counterfeit event passes
cannot easily be traced without the addition of distinguishing
were widespread at the 2008 Beijing Olympics [17], and
marks. Even unopened sheaves of blank printer paper might in
a British website recently sold more than $2.5 million in
principle have been fingerprinted at the factory. Applications
fake tickets [18]. The ability for purchasers to verify the
such as paper-based voting, in which the secrecy of individual
authenticity of tickets prior to purchase could greatly reduce
ballots is important, are challenged by our results.
the prevalence of online ticket fraud. Trust in ticket purchases
For example, consider an optical-scan voting system in
on websites such as Stub Hub and eBay could be dramatically
which voters fill out paper ballots. In such a system, the
increased if the seller had to prove access to the item being
secrecy of ballots relies on the assumption that individual
auctioned or sold. Ticket clearing houses such as Ticketmaster
paper ballots are indistinguishable. Our work shows that this
could maintain an online database of fingerprints for all
assumption may not be valid.
purchased tickets. Any party selling a ticket could scan and
A corrupt official could scan the blank ballots in advance
upload the ticket to Ticketmaster and receive verification of
and record a unique fingerprint for each ballot in the stack.
authenticity.
If ballots are given out to voters in a predictable order (e.g.,
Forgery of artwork is a black-market business where the
from the top of the stack down) and the order of voters is
application of our technique may not be initially obvious.
recorded, as it is in many polling places, or observable by the
European police estimate that over half of the works in
attacker, then ballots can be re-identified after the election.
international markets are forgeries [19]. One family of art
Even worse, because pre-scanning leaves no evidence on
forgers was able to make $2 million before they were
the ballots themselves, a mere rumor that someone might
caught. The ability of art forgers to reproduce the individual
have scanned the ballots in advance would be very difficult
brush strokes of a work makes authenticating paintings
to disprove, and such a rumor would make coercion and
increasingly difficult. In the best forgeries, art verifiers must
vote-buying more credible.
sometimes rely on the chain of custody of the work in order
More generally, the ability to re-identify ordinary sheets
to authenticate it [20]. However, we believe that it would
of paper casts doubt on any purportedly private information
be difficult to duplicate features of the canvas (down to the
gathering process that relies on paper forms. “Anonymous”
detailed arrangement of the weave) upon which the work is
surveys or reporting systems may not in fact be anonymous.
11

Though it has long been possible to track sheets of
[3] E. Metois, P. Yarin, N. Salzman, and J. R. Smith, “FiberFin-
paper using subtle chemical markers or “invisible ink,” these
gerprint identification,” in Proc. 3rd Workshop on Automatic
methods require some level of special expertise, and the
Identification, 2002, pp. 147–154.
presence of markers leaves evidence of the attack. Our
[4] R. P. Cowburn and J. D. R. Buchanan, “Verification of
research shows that an attacker armed with only ordinary
authenticity,” US patent application 2007/0028093, Jul. 2006.
equipment—a commodity scanner—is able to re-identify
paper reliably without leaving any telltale marks.
[5] R. Woodham, “Photometric stereo: A reflectance map tech-
nique for determining surface orientation from image intesity,”
in Proc. 22nd SPIE Annual Technical Symposium, vol. 155,
9. Conclusion and Future Work
1978, pp. 136–143.
[6] B. Brown, C. Toler-Franklin, D. Nehab, M. Burns, A. Vla-
Our work shows that ordinary pieces of paper can be
chopoulos, C. Doumas, D. Dobkin, S. Rusinkiewicz, and
fingerprinted and later identified using commodity desktop
T. Weyrich, “A system for high-volume acquisition and
scanners. The technique we developed functions like a
matching of fresco fragments: Reassembling Theran wall
“biometric” for paper and allows original documents to be
paintings,” ACM Trans. Graphics (Proc. SIGGRAPH 2008),
securely and reliably distinguished from copies or forgeries.
p. 84 (9 pp.), Aug. 2008.
At least two questions remain to be answered in future
[7] A. Okabe, B. Boots, K. Sugihara, and S. N. Chiu, Spatial
work. First, in the threat model where the adversary has
Tesselations: Concepts and Applications of Voronoi Diagrams.
access to the original document and the fingerprint, we do
Wiley, 2000.
not know for certain that a clever adversary cannot forge
a copy of the document with a high-resolution printer. Our
[8] J. E. Jackson, A User’s Guide to Principal Component Analysis.
Wiley-Interscience, 2003.
initial work could not determine conclusively whether an
adversary who can use a good printer will have enough
[9] Y. Dodis, R. Ostrovsky, L. Reyzin, and A. Smith, “Fuzzy
degrees of freedom in modifying a document to make the
extractors: How to generate strong keys from biometrics and
document match a known fingerprint. Second, while we
other noisy data,” SIAM Journal on Computing, vol. 38, no. 1,
conjecture that our method can be applied to other materials
pp. 97–137, 2008.
such as fabric, more testing is needed to verify this, and
[10] A. Juels and M. Wattenberg, “A fuzzy commitment scheme,” in
special methods might be needed for some materials. We
Proc. 6th ACM Conference on Computer and Communications
leave both of these questions for future work.
Security, 1999, pp. 28–36.
Our results are a tribute to the resolution of today’s
scanners. Future scanners will capture ever more detailed
[11] R. M. Neal. (2006, Feb.) Software for low density parity
check codes. [Online]. Available: http://www.cs.utoronto.ca/
images of paper documents, eventually allowing individual
∼radford/ldpc.software.html
wood fibers to be imaged clearly. The security of our methods
against forgery, in cases where the adversary has full access
[12] D. MacKay and R. Neal, “Near shannon limit performance of
to information, will depend ultimately on a race between the
low density parity check codes,” Electronics Letters, vol. 33,
resolution of the printers that can create patterns on a page,
no. 6, pp. 457–458, Mar. 1997.
and the resolution of the scanners that can observe patterns.
[13] C. Sorzano, P. Thevenaz, and M. Unser, “Elastic registration
of biological images using vector-spline regularization,” IEEE
10. Acknowledgments
Trans. Biomedical Engineering, vol. 52, no. 4, pp. 652–663,
Apr. 2005.
We thank Andrew Appel, Victoria Hill, Andrew Moore, N.
[14] D. McNicol, A Primer on Signal Detection Theory. Lawrence
J. A. Sloane, Joshua R. Smith, and the anonymous reviewers
Erlbaum Assoc., 2004.
for their invaluable suggestions and assistance.
[15] L. S. Amine and P. Magnusson, “Cost-benefit models of stake-
holders in the global counterfeiting industry and marketing
References
response strategies,” Multinational Business Review, vol. 15,
no. 2, pp. 1–23, 2007.
[1] J. D. R. Buchanan, R. P. Cowburn, A.-V. Jausovec, D. Petit,
[16] U.S. Department of Commerce. Top 10 ways to protect
P. Seem, G. Xiong, D. Atkinson, K. Fenton, D. A. Allwood,
yourself from counterfeiting and piracy. [Online]. Available:
and M. T. Bryan, “Forgery: ‘fingerprinting’ documents and
http://www.stopfakes.gov/pdf/Consumer Tips.pdf
packaging,” Nature, vol. 436, p. 475, 2005.
[17] C. Balmer and K. Wills, “Beijing games hit by internet ticket
[2] B. Zhu, J. Wu, and M. S. Kankanhalli, “Print signatures for
scam,” Reuters, Aug. 4, 2008.
document authentication,” in Proc. 10th ACM Conference on
Computer and Communications Security, 2003, pp. 145–154.
[18] “Ticket site closed on fraud fears,” BBC News, Oct. 21, 2008.
12

[19] J. L. Shreeve, “Art forgers: What lies beneath,” The Indepen-
Patch-pair comparisons. Recall from Algorithm 1 that the
dent, Sep. 3, 2008.
vector p contributes K bits to the overall fingerprint by taking
the signs of the dot product of p and a series of ortho-normal
[20] R. D. Spencer, The Expert versus the Object: Judging Fakes
template vectors. We have also considered (and implemented)
and False Attributions in the Visual Arts.
Oxford University
an alternate version of the algorithm where the bits of the
Press, 2004.
fingerprint are taken to be the signs of the dot products of
[21] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn,
pairs of patches p and q. The na¨ıve version divides the pool
“Information hiding—a survey,” Proc. IEEE, vol. 87, no. 7,
of patch positions into pairs and computes one bit of the
pp. 1062–1078, Jul. 1999.
feature vector from each pair. Unfortunately that approach
allows an attacker to tweak each pair in turn independently.
Appendix
A more robust version considers bits from all patch pairs
(p, q) where p = q. For example, for 64 patches each patch
would participate in 63 bits, and this scheme could generate
Section 3 introduces a process for fingerprinting and
64
= 2016 total bits.
verifying the fingerprint of a document. In this appendix
2
In the case where a forger has a copy of an original
we briefly outline some alternative strategies that might be
document and therefore knows the fingerprint he is trying to
desirable under different criteria for robustness or different
reproduce (Section 5.2.3), this formulation has the advantage
levels of concern about forgery.
that the bits of the fingerprint are more tightly bound than
those of the template vectors. Any attack on a single bit—for
Using albedo versus normals. Because the high-resolution
example, printing on a patch—is likely to impinge on the
paper scans shown in figures throughout this paper reveal
other (62) bits affected by that patch. Thus, a forger would
obvious color variation in addition to surface texture, perhaps
have to solve an optimization problem to figure out how best
a more straightforward approach would be to use the albedo
to perform the attack.
(color) of the page as the basis for a fingerprint, rather than, or
However, the bits of the fingerprint generated from all patch
in addition to, the shape. Indeed, our initial implementations
pairs seem to be less independent than the bits generated
explored this approach, using a single scan (which combines
by the template vectors. Preliminary experiments similar to
albedo and normal information) to construct the fingerprint
those described in Section 4 indicate that “all-pairs” bits are
of a document. This approach is simpler and offers the
mostly independent, but not as independent as the “template”
substantial benefit that the document can be fingerprinted or
bits. Since the arguments in Section 5.2.2 for security against
verified more quickly, through a single scan.
“blind” attackers rely on bit independence, we generally prefer
The intensities of most of the pixels in a scanned page are
the “template” scheme.
modeled well by a truncated normal distribution, centered
around the “white” color. To use this data as the basis
Short fingerprints with no error-correcting information.
for a fingerprint, we simply construct the vector p as the
Section 3.2 describes a process for generating fingerprints
concatenation of these intensities from a given patch. For
composed of a hash of 3200 or more bits concatenated
example, an 8 × 8 patch would yield a vector p ∈ IR64. The
with some error correction bits. For some applications
fingerprint is then extracted from a collection of patches as
requiring less security, fewer feature vector bits may be used.
described in Algorithm 1.
Suppose only 100 bits are used, and further suppose that the
We did not pursue this approach because we believe this
application tends to produce fewer bit errors (say 15% or less).
form of fingerprint may not resist forgers who use very
In such scenarios an alternate approach would be to simply
light ink to print a desired pattern on the page. Another
record the secure hash of those bits. An attacker, without
drawback is that any black ink on the page, which lies well
the benefit of the original, is forced to guess among 2100 bit
outside the roughly-normal distribution of intensities found in
sequences, checking guesses against the hash. Unfortunately,
blank paper, contributes to a very strong negative value in p,
this leaves the na¨ıve authentication process with no way to
introducing a bias in the dot products for the patch. Thus, any
do error correction other than to guess among the roughly
value outside the range of the truncated normal distribution
1017 strings within Hamming distance of 15 of the sequence
must be zeroed out before constructing the fingerprint. This
extracted from a page—easier, but also daunting.
provides another opportunity for a forger to deliberately zero
Fortunately, there is a better approach for the authentication
out regions of the patch with the goal of flipping bits towards
process. Recall that the bits of the fingerprint are taken as
a desired fingerprint. These attacks might be difficult to carry
the signs of a series of dot products (patches and templates).
out in practice, since they require excellent registration in the
We have observed that these dot products are well-modeled
printing process. Therefore, albedo-based fingerprints may
by samples from a truncated normal distribution. Moreover,
be suitable for applications where some added risk of forgery
we have also observed that the flipped bits mostly come
is an acceptable tradeoff for increased speed and simplicity.
from dot products near zero, and that the bit-flipping process
13

seems to be well-modeled by the addition of “noise” also
The benefits of this approach are that it is simple to
selected from a truncated normal distribution (with smaller
implement and provides no information to an attacker in
standard deviation than that of the “signal”). With this model
the form of error-correction bits. The main disadvantages
in hand, the verification process can search for bit strings
are that it does not scale well to longer bit sequences and
similar to the extracted fingerprint while taking into account
that the stochastic nature of the algorithm provides only
which bits are more likely to have flipped. Specifically, the
probabilistic guarantees of running time. Therefore, it would
process flips bits at random with probability relative to the
likely be used only in conjunction with other approaches.
likelihood that the bit has flipped, each time checking against
For example, an application might attempt this method for
the secure hash. We simulated this approach and found that
offline verification and fall back to an online method in cases
about 90% of the time it will find the correct string within
when it fails.
106 guesses for the example distribution described above.
14

Document Outline