Meme-tracking and the Dynamics of the News Cycle
∗†


Jure Leskovec
Lars Backstrom
Jon Kleinberg


Cornell University
Stanford University
jure@cs.cornell.edu
lars@cs.cornell.edu
kleinber@cs.cornell.edu
ABSTRACT
Tracking new topics, ideas, and “memes” across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events.
We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a meme-tracking approach can provide a coherent representation of the news cycle: the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. We tracked 1.6 million mainstream media sites and blogs over a period of three months, with a total of 90 million articles, and we find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs respectively, with divergent behavior around the overall peak and a “heartbeat”-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.

This research was supported in part by the MacArthur Foundation, a Google Research Grant, a Yahoo! Research Alliance Grant, and NSF grants CCF-0325453, CNS-0403340, BCS-0537606, and IIS-0705774.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD ’09 Paris, France
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

1. INTRODUCTION
A growing line of research has focused on the issues raised by the diffusion and evolution of highly dynamic on-line information, particularly the problem of tracking topics, ideas, and “memes” as they evolve over time and spread across the web. Prior work has identified two main approaches to this problem, which have been successful at two correspondingly different extremes of it. Probabilistic term mixtures have been successful at identifying long-range trends in general topics over time [5, 7, 16, 17, 30, 31]. At the other extreme, identifying hyperlinks between blogs and extracting rare named entities has been used to track short information cascades through the blogosphere [3, 14, 20, 23]. However, between these two extremes lies much of the temporal and textual range over which propagation on the web and between people typically occurs, through the continuous interaction of news, blogs, and websites on a daily basis. Intuitively, short units of text, short phrases, and “memes” that act as signatures of topics and events propagate and diffuse over the web, from mainstream media to blogs, and vice versa. This is exactly the focus of our study here.
Moreover, it is at this intermediate temporal and textual granularity of memes and phrases that people experience news and current events. A succession of story lines that evolve and compete for attention within a relatively stable set of broader topics collectively produces an effect that commentators refer to as the news cycle. Tracking dynamic information at this temporal and topical resolution has proved difficult, since the continuous appearance, growth, and decay of new story lines takes place without significant shifts in the overall vocabulary; in general, this process also cannot be closely aligned with the appearance and disappearance of specific named entities (or hyperlinks) in the text. As a result, while the dynamics of the news cycle have been a subject of intense interest to researchers in media and the political process, the focus has been mainly qualitative, with a corresponding lack of techniques for undertaking quantitative analysis of the news cycle as a whole.
Our approach to meme-tracking, with applications to the news cycle. Here we develop a method for tracking units of information as they spread over the web. Our approach is the first to scalably identify short distinctive phrases that travel relatively intact through on-line text as it evolves over time. Thus, for the first time at a large scale, we are able to automatically identify and actually “see” such textual elements and study them in a massive dataset providing essentially complete coverage of on-line mainstream and blog media.
Working with phrases naturally interpolates between the two extremes of topic models on the one hand and named entities on the other. First, the set of distinctive phrases shows significant diversity over short periods of time, even as the broader vocabulary remains relatively stable. As a result, they can be used to dissect a general topic into a large collection of threads or memes that vary from day to day. Second, such distinctive phrases are abundant, and therefore are rich enough to act as “tracers” for a large collection of memes; we therefore do not have to restrict attention to the much smaller collection of memes that happen to be associated with the appearance and disappearance of a single named entity.
From an algorithmic point of view, we consider these distinctive phrases to act as the analogue of “genetic signatures” for different

memes. And like genetic signatures, we find that while they remain recognizable as they appear in text over time, they also undergo significant mutation. As a result, a central computational challenge in this approach is to find robust ways of extracting and identifying all the mutational variants of each of these distinctive phrases, and to group them together. We develop scalable algorithms for this problem, so that memes end up corresponding to clusters containing all the mutational variants of a single phrase.
As an application of our technique, we use it to produce some of the first quantitative analyses of the global news cycle. To do this, we work with a massive set of 90 million news and blog articles that we collected over the final three months of the 2008 U.S. Presidential Election (starting August 1).¹ In this context, the collection of distinctive phrases that will act as tracers for memes is the set of quoted phrases and sentences that we find in articles, that is, quotations attributed to individuals. This is natural for the domain of news: quotes are an integral part of journalistic practice, and even if a news story is not specifically about a particular quote, quotes are deployed in essentially all articles, and they tend to follow iterations of a story as it evolves [28]. However, each individual quote tends to exhibit very high levels of variation across its occurrences in many articles, and so the aspects of our approach based on clustering mutational variants will be crucial.
¹ This is of course a period when news coverage was particularly high-intensity, but it gives us a chance to study the news cycle over precisely the kind of interval in which people’s general intuitions about it are formed, and in which it is enormously consequential. In the latter regard, studying the effect of communication technology on elections is a research tradition that goes back at least to the work of Lazarsfeld, Berelson, and Gaudet in the 1940s [22].
Thus, our analysis of the news cycle will consist of studying the most significant groups of mutational variants as they evolve over time. We perform this analysis both at a global level (understanding the temporal variation as a whole) and at a local level (identifying recurring patterns in the growth and decay of a meme around its period of peak intensity). At a global level, we find a structure in which individual memes compete with one another over short time periods, producing daily and weekly patterns of variation. We also show how the temporal patterns we observe arise naturally from a simple mathematical model in which news sources imitate each other’s decisions about what to cover, but subject to recency effects penalizing older content. This combination of imitation and recency can produce synthetic temporal patterns resembling the real data; neither ingredient alone is able to do this.
At a local level, we identify some of the fine-grained dynamics governing how the intensity of a meme behaves. We find a characteristic spike around the point of peak intensity; in both directions away from the peak the volume decreases exponentially with time, but in an 8-hour window of time around the median, we find that volume y as a function of time t behaves like y(t) ≈ a log(t). This function diverges at t = 0, indicating an explosive amount of activity right at the peak period. Further interesting dynamics emerge when one separates the websites under consideration into two distinct categories: news media and blogs. We find that the peak of news-media attention to a phrase typically comes 2.5 hours earlier than the peak attention of the blogosphere. Moreover, if we look at the proportion of phrase mentions in blogs in a few-hour window around the peak, it displays a characteristic “heartbeat”-type shape as the meme bounces between mainstream media and blogs. We further break down the analysis to the level of individual blogs and news sources, characterizing the typical amount by which each source leads or lags the overall peak. Among the “fastest” sources we find a number of popular political blogs; this measure thus suggests a way of identifying sites that are regularly far ahead of the bulk of media attention to a topic.
Further related work. In addition to the range of different approaches for tracking topics, ideas, and memes discussed above, there has been considerable work in computer science focused on news data in particular. Two dominant themes in this work to date have been the use of algorithmic tools for organizing and filtering news, and the role of blogging and the production of news by individuals rather than professional media organizations. Some of the key research issues here have been the identification of topics over time [5, 11, 16], the evolving practices of bloggers [25, 26], the cascading adoption of stories [3, 14, 20, 23], and the ideological divisions in the blogosphere [2, 12, 13]. This has led to the development of a number of interesting tools to help people better understand the news (e.g. [5, 11, 12, 13, 16]).
Outside of computer science, the interplay between technology, the news media, and the political process has been a focus of considerable research interest for much of the past century [6, 22]. This research tradition has included work by sociologists, communication scholars, and media theorists, usually at a qualitative level, exploring the political and economic contexts in which news is produced [19], its effect on public opinion, and its ability to facilitate either polarization or consensus [15].
An important recent theme within this literature has been the increasing intensity of the news cycle, and the increasing role it plays in the political process. In their influential book Warp Speed: America in the Age of the Mixed Media, Kovach and Rosenstiel discuss how the excesses of the news cycle have become intertwined with the fragmentation of the news audience, writing, “The classic function of journalism to sort out a true and reliable account of the day’s events is being undermined. It is being displaced by the continuous news cycle, the growing power of sources over reporters, varying standards of journalism, and a fascination with inexpensive, polarizing argument. The press is also increasingly fixated on finding the ‘big story’ that will temporarily reassemble the now-fragmented mass audience” [19]. In addition to illuminating their effect on the producers and consumers of news, researchers have also investigated the role these issues play in policy-making by government. As Jayson Harsin observes, over time the news cycle has grown from being a dominant aspect of election campaign season to a constant feature of the political landscape more generally; “not only,” he writes, “are campaign tactics normalized for governing but the communication tactics are themselves institutionally influenced by the twenty-four hour cable and internet news cycle” [15].
Moving beyond qualitative analysis has proven difficult here, and the intriguing assertions in the social-science work on this topic form a significant part of the motivation for our current approach. Specifically, the discussions in this area had largely left open the question of whether the “news cycle” is primarily a metaphorical construct that describes our perceptions of the news, or whether it is something that one could actually observe and measure. We show that by tracking essentially all news stories at the right level of granularity, it is indeed possible to build structures that closely match our intuitive picture of the news cycle, making it possible to begin a more formal and quantitative study of its basic properties.

2. ALGORITHMS FOR CLUSTERING MUTATIONAL VARIANTS OF PHRASES
We now discuss our algorithms for identifying and clustering textual variants of quotes, capable of scaling to our corpus of roughly a hundred million articles over a three-month period. These clusters will then form the basic objects in our subsequent analysis.

[Figure 1 image not reproduced: a portion of the phrase DAG whose nodes are textual variants of the quote, for example “pal around with terrorists who targeted their own country” and “palling around with terrorists who would target their own country”.]
Figure 1: A small portion of the full set of variants of Sarah Palin’s quote, “Our opponent is someone who sees America, it seems,
as being so imperfect, imperfect enough that he’s palling around with terrorists who would target their own country.” The arrows
indicate the (approximate) inclusion of one variant in another, as part of the methodology developed in Section 2.
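
The approximate inclusion that the arrows in Figure 1 depict can be illustrated with a short sketch. It assumes one plausible reading of the edge rule in Section 2: the directed edit distance from p to q is taken to be the fewest word-level insertions, deletions, and substitutions needed to turn p into some contiguous run of words in q, and the threshold δ is treated as inclusive (the text says "less than", so this is a guess at the intended meaning). The function names and these definitional details are assumptions made for illustration, not the authors' implementation.

```python
def directed_edit_distance(p_words, q_words):
    """Fewest word edits turning p into some contiguous run of q (assumed definition)."""
    # Approximate substring matching: the first row is all zeros, so a match
    # may begin at any position in q at no cost.
    prev = [0] * (len(q_words) + 1)
    for i, pw in enumerate(p_words, start=1):
        cur = [i] + [0] * len(q_words)
        for j, qw in enumerate(q_words, start=1):
            cur[j] = min(
                prev[j] + 1,                # drop pw from p
                cur[j - 1] + 1,             # insert qw into p
                prev[j - 1] + (pw != qw),   # match or substitute
            )
        prev = cur
    # A match may also end at any position in q.
    return min(prev)


def longest_common_run(p_words, q_words):
    """Length of the longest run of consecutive words shared by p and q."""
    best = 0
    prev = [0] * (len(q_words) + 1)
    for pw in p_words:
        cur = [0] * (len(q_words) + 1)
        for j, qw in enumerate(q_words, start=1):
            if pw == qw:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def has_edge(p, q, delta=1, k=10):
    """True if the phrase graph would contain an edge p -> q under the assumed rule."""
    p_words, q_words = p.split(), q.split()
    if len(p_words) >= len(q_words):
        return False  # edges only point from strictly shorter to longer phrases
    return (directed_edit_distance(p_words, q_words) <= delta  # inclusive threshold (assumption)
            or longest_common_run(p_words, q_words) >= k)
```

With the defaults δ = 1 and k = 10 reported in Section 2, the variant "palling around with terrorists who target their own country" links to "palling around with terrorists who would target their own country", since a single inserted word separates them.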
[Figure 2 image not reproduced: a phrase DAG on nodes numbered 1 through 15, with the edges to delete marked.]
Figure 2: Phrase graph. Each phrase is a node, and we want to delete the fewest edges so that each resulting connected component has a single root node (phrase), that is, a node with zero out-edges. By deleting the indicated edges we obtain the optimal solution.

To begin, we define some terminology. We will refer to each news article or blog post as an item, and refer to a quoted string that occurs in one or more items as a phrase. Our goal is to produce phrase clusters, which are collections of phrases deemed to be close textual variants of one another. We will do this by building a phrase graph where each phrase is represented by a node and directed edges connect related phrases. Then we partition this graph in such a way that its components will be the phrase clusters.
We first discuss how to construct the graph, and then how we partition it. The dominant way in which one finds textual variants in our quote data is excerpting: phrase p is a contiguous subsequence of the words in phrase q. Thus, we build the phrase graph to capture these kinds of inclusion relations, relaxing the notion of inclusion to allow for very small mismatches between phrases.
The phrase graph. First, to avoid spurious phrases, we set a lower bound L on the word-length of phrases we consider, and a lower bound M on their frequency, meaning the number of occurrences in the full corpus. We also eliminate phrases for which at least an ε fraction of occurrences are on a single domain; inspection reveals that frequent phrases with this property are exclusively produced by spammers. (We use ε = .25, L = 4, and M = 10 in our implementation.)
After this pre-processing, we build a graph G on the set of quoted phrases. The phrases constitute the nodes, and we include an edge (p, q) for every pair of phrases p and q such that p is strictly shorter than q, and either p has directed edit distance to q (treating words as tokens) that is less than a small threshold δ (δ = 1 in our implementation) or there is at least a k-word consecutive overlap between the phrases (we use k = 10). Since all edges (p, q) point from shorter phrases to longer phrases, we have a directed acyclic graph (DAG) G at this point. In general, one could use more complicated natural language processing techniques, or external data, to create the edges in the phrase graph. We experimented with various other techniques and found the current approach robust and scalable.
Thus, G encodes an approximate inclusion relationship or long consecutive overlap among all the quoted phrases in the data, allowing for small amounts of textual mutation. Figure 1 depicts a very small portion of the phrase DAG for our data, zoomed in on a few of the variants of a quote by Sarah Palin. Only edges with endpoints not connected by some other path in the DAG are shown.
We now add weights $w_{pq}$ to the edges (p, q) of G, reflecting the importance of each edge. The weight is defined so that it decreases in the directed edit distance from p to q, and increases in the frequency of q in the corpus. This latter dependence is important, since we particularly wish to preserve edges (p, q) when the inclusion of p in q is supported by many occurrences of q.
Partitioning the phrase graph. How should we recognize a good phrase cluster, given the structure of G? The central idea is that we are looking for a collection of phrases related closely enough that they can all be explained as “belonging” either to a single long phrase q, or to a single collection of phrases. The outgoing paths from all phrases in the cluster should flow into a single root node q, where we define a root in G to be a node with no outgoing edges.
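
The single-root requirement can be illustrated with a small sketch. If every non-root phrase keeps exactly one outgoing edge, following the kept edges from any phrase ends at exactly one root, and the phrases that reach the same root form a cluster. The rule used here for choosing which edge to keep (the heaviest one), along with the function name, the input format, and the toy graph in the usage note, is an assumption for illustration rather than the authors' implementation.

```python
def partition_by_single_out_edge(nodes, edges):
    """Cluster a weighted DAG by keeping one outgoing edge per non-root node.

    nodes: iterable of hashable node ids.
    edges: dict mapping (p, q) -> weight for each directed edge p -> q.
    Returns (clusters, deleted_weight); clusters maps each root to its node set.
    """
    # Assumed selection rule: keep the heaviest outgoing edge of each node
    # (ties go to the edge seen first).
    kept = {}
    for (p, q), w in edges.items():
        if p not in kept or w > edges[(p, kept[p])]:
            kept[p] = q

    def find_root(v):
        # Follow kept edges until reaching a node with no outgoing edge.
        # This terminates because the input graph is acyclic.
        while v in kept:
            v = kept[v]
        return v

    clusters = {}
    for v in nodes:
        clusters.setdefault(find_root(v), set()).add(v)

    kept_weight = sum(edges[(p, q)] for p, q in kept.items())
    deleted_weight = sum(edges.values()) - kept_weight
    return clusters, deleted_weight
```

On a hypothetical five-node graph with edges a→c (weight 1), a→d (3), b→d (2), and c→e (1), the sketch keeps a→d, b→d, and c→e, which gives one cluster rooted at d and one rooted at e, at the cost of deleting the single edge a→c.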

(For example, nodes 13, 14, and 15 in Fig. 2 are roots.) So, the phrase cluster should be a subgraph for which all paths terminate in a single root node.
Thus, informally, to identify phrase clusters, we would like to delete edges of small total weight from the phrase graph so that it falls apart into disjoint pieces, with the property that each piece “feeds into” a single root phrase that can serve as the exemplar for the phrase cluster. More precisely, we define a directed acyclic graph to be single-rooted if it contains exactly one root node. (Note that every DAG has at least one root.) We now define the following DAG partitioning problem:

DAG Partitioning: Given a directed acyclic graph with edge weights, delete a set of edges of minimum total weight so that each of the resulting components is single-rooted.

For example, Figure 2 shows a DAG with all edge weights equal to 1; deleting the indicated edges forms the unique optimal solution.
We now show that DAG Partitioning is computationally intractable to solve optimally. We then discuss the heuristic we use for the problem on our data, which we find to work well in practice.
PROPOSITION 1. DAG Partitioning is NP-hard.
Proof Sketch. We show that deciding whether an instance of DAG Partitioning has a solution of total edge weight at most W is NP-complete, using a reduction from an NP-complete problem in discrete optimization known as the Multiway Cut problem [9, 10]. In an instance of Multiway Cut, we are given a weighted undirected graph H in which a subset T of the nodes has been designated as the set of terminals. The goal is to decide whether we can delete a set of edges of total weight at most W so that each terminal in T belongs to a distinct component. Due to space constraints we give the details of this construction at the supporting website [1].
An alternate heuristic. Given the intractability of DAG Partitioning, we develop a class of heuristics for it that we find to scale well and to produce good phrase clusters in practice.
To motivate the heuristics, note first that in any optimal solution to DAG Partitioning, there is at least one outgoing edge from each non-root node that has not been deleted. (For if a non-root node had all its outgoing edges deleted, then we could put one back in and still preserve the validity of the solution.) Second, a subgraph of the DAG where each non-root node has only a single out-edge must necessarily have single-rooted components, since the edge sets of the components will all be in-branching trees. Finally, if (as a thought experiment) for each node v we happened to know just a single edge e that was not deleted in the optimal solution, then the subgraph consisting of all these edges e would have the same components (when viewed as node sets) as the components in the optimal solution of DAG Partitioning. In other words, it is enough to find a single edge out of each node that is included in the optimal solution to identify the optimal components.
With this in mind, our heuristics proceed by choosing for each non-root node a single outgoing edge. Thus each of the components will be single-rooted, as noted above, and we take these as the components of our solution. We evaluate the heuristics by the total amount of edge weight kept in the clusters, relative to a baseline in which a random edge out of each phrase is kept. We found that keeping an edge to the shortest phrase gives a 9% improvement over the baseline, while keeping an edge to the most frequent phrase gives a 12% improvement. Proceeding from the roots down the DAG and greedily assigning each node to the cluster to which it has the most edges gives a 13% improvement over the baseline. We also experimented with simulated annealing, but that did not improve the solution, which is further evidence for the effectiveness of our heuristics.

[Figure 3 plot not reproduced: log-log axes, x-axis “Volume, x”, y-axis “No. of items with volume ≥ x”; legend fits: Phrases ∝ $x^{-1.8}$, Clusters ∝ $x^{-2.1}$, Lipstick ∝ $x^{-0.85}$.]
Figure 3: Phrase volume distribution. We consider the volume of individual phrases, phrase-clusters, and the phrases that compose the “Lipstick on a pig” cluster. Notice that phrases and phrase-clusters have similar power-law distributions, while the “Lipstick on a pig” cluster has a much fatter tail, which means that popular phrases have unexpectedly high popularity.

Dataset description. Our dataset covers three months of online mainstream and social media activity from August 1 to October 31, 2008, with about 1 million documents per day. In total it consists of 90 million documents (blog posts and news articles) from 1.65 million different sites that we obtained through the Spinn3r API [27]. The total dataset size is 390GB and essentially includes complete online media coverage: we have all mainstream media sites that are part of Google News (20,000 different sites) plus 1.6 million blogs, forums and other media sites. From the dataset we extracted a total of 112 million quotes and discarded those with L < 4, M < 10, and those that fail our single-domain test with ε = .25. This left us with 47 million phrases, out of which 22 million were distinct. Clustering the phrases took 9 hours and produced a DAG with 35,800 non-trivial components (clusters with at least two phrases) that together included 94,700 nodes (phrases).
Figure 3 shows the complementary cumulative distribution function (CCDF) of the phrase volume. For each volume x, we plot the number of phrases with volume ≥ x. If the quantity of interest is power-law distributed with exponent γ, $p(x) \propto x^{-\gamma}$, then when plotted on log-log axes the CCDF will be a straight line with slope $-(\gamma - 1)$ (so the fitted slopes in the Figure 3 legend are smaller in magnitude by one than the exponents quoted below). In Figure 3 we superimpose three quantities of interest: the volume of individual phrases, phrase clusters (volume of all phrases in the cluster), and the individual phrases from the largest phrase-cluster in our dataset (the “lipstick on a pig” cluster). Notice that all quantities are power-law distributed. Moreover, the volume of individual phrases decays as $x^{-2.8}$, and of phrase-clusters as $x^{-3.1}$, which means that the tails are not very heavy, since power-law distributions start to have finite variance for γ > 3. However, notice that the volume of the “lipstick on a pig” cluster decays as $x^{-1.85}$, in which case the tail is much heavier. In fact, for γ < 2 power laws have infinite expectations. This means that variants of popular phrases, like “lipstick on a pig,” are much “stickier” than what would be expected from the overall phrase volume distribution. Popular phrases have many variants and each of them appears more frequently than an “average” phrase.

3. GLOBAL ANALYSIS: TEMPORAL VARIATION AND A PROBABILISTIC MODEL
Having produced phrase clusters, we now construct the individual elements of the news cycle. We define a thread associated with a given phrase cluster to be the set of all items (news articles or blog posts) containing some phrase from the cluster, and we then

Figure 4: Top 50 threads in the news cycle with highest volume for the period Aug. 1 – Oct. 31, 2008. Each thread consists of all news
articles and blog posts containing a textual variant of a particular quoted phrase. (Phrase variants for the two largest threads in
each week are shown as labels pointing to the corresponding thread.) The data is drawn as a stacked plot in which the thickness of the
strand corresponding to each thread indicates its volume over time. Interactive visualization is available at http://memetracker.org.
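
The quantity plotted in Figure 4 can be sketched as follows: a thread's volume in a time bin is the number of items in that bin that contain any phrase from the thread's cluster. The item format (a day label plus the set of quoted phrases found in the item), the daily binning, and all names below are hypothetical choices for illustration, not the authors' pipeline.

```python
from collections import Counter, defaultdict


def thread_volumes(items, clusters):
    """Count items per day for each thread.

    items: iterable of (day, phrases) pairs, where phrases is a set of strings.
    clusters: dict mapping a thread label -> set of phrase variants.
    Returns a dict: thread label -> Counter mapping day -> number of items.
    """
    # Invert the clusters so that each phrase points at its thread.
    phrase_to_thread = {
        phrase: label
        for label, variants in clusters.items()
        for phrase in variants
    }
    volumes = defaultdict(Counter)
    for day, phrases in items:
        # An item counts once per thread, even if it quotes several variants.
        threads = {phrase_to_thread[p] for p in phrases if p in phrase_to_thread}
        for label in threads:
            volumes[label][day] += 1
    return volumes


def top_threads(volumes, n=50):
    """Return the n thread labels with the largest total volume."""
    return sorted(
        volumes,
        key=lambda label: sum(volumes[label].values()),
        reverse=True,
    )[:n]
```

Stacking the per-day counts of the labels returned by `top_threads` would give strands whose thickness tracks volume over time, in the spirit of the stacked plot described in the caption.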
Figure 5: Temporal dynamics of top threads as generated by our model. Only two ingredients, namely imitation and a preference to
recent threads, are enough to qualitatively reproduce the observed dynamics of the news cycle.
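
The two ingredients named in the Figure 5 caption can be combined in a small simulation. This is a sketch under stated assumptions: at each time step one new thread appears, and a fixed number of sources each pick a thread j with probability proportional to f(n_j)·δ(t − t_j), where n_j is the number of items already in thread j and t_j is its creation time, with f a power law and δ an exponential decay. The product form, the one-new-thread-per-step rule, the starting count of one, and every parameter value are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def simulate_news_cycle(steps=200, sources=100, alpha=1.0, decay=0.1,
                        imitation=True, recency=True, seed=0):
    """Simulate thread growth; returns the final item count of every thread."""
    rng = random.Random(seed)
    created = []  # t_j: creation time of thread j
    counts = []   # n_j: number of items in thread j so far
    for t in range(steps):
        # Assumption: exactly one new thread appears per step, seeded with one item.
        created.append(t)
        counts.append(1)
        # Weights are computed once per step, from the counts at the start of the step.
        weights = []
        for t_j, n_j in zip(created, counts):
            f = n_j ** alpha if imitation else 1.0                  # power-law imitation term
            d = math.exp(-decay * (t - t_j)) if recency else 1.0   # exponential recency term
            weights.append(f * d)
        # Each source independently picks one thread to write about.
        for j in rng.choices(range(len(counts)), weights=weights, k=sources):
            counts[j] += 1
    return counts
```

Setting `imitation=False` or `recency=False` gives the single-ingredient variants, which can be compared against the combined model by looking at how the returned counts are distributed across threads.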
track all threads over time, considering both their individual temporal dynamics as well as their interactions with one another.
Using our approach we completely automatically created and also automatically labeled the plot in Figure 4, which depicts the 50 largest threads for the three-month period Aug. 1 – Oct. 31. It is drawn as a stacked plot, a style of visualization (see e.g. [16]) in which the thickness of each strand corresponds to the volume of the corresponding thread over time, with the total area equal to the total volume. We see that the rising and falling pattern does in fact tell us about the patterns by which blogs and the media successively focus and defocus on common story lines.
An important point to note at the outset is that the total number of articles and posts, as well as the total number of quotes, is approximately constant over all weekdays in our dataset. (Refer to [1] for the plots.) As a result, the temporal variation exhibited in Figure 4 is not the result of variations in the overall amount of global news and blogging activity from one day to the next. Rather, the periods when the upper envelope of the curve is high correspond to times when there is a greater degree of convergence on key stories, while the low periods indicate that attention is more diffuse, spread out over many stories. There is a clear weekly pattern in this (again, despite the relatively constant overall volume), with the five large peaks between late August and late September corresponding, respectively, to the Democratic and Republican National Conventions, the overwhelming volume of the “lipstick on a pig” thread, the beginning of peak public attention to the financial crisis, and the negotiations over the financial bailout plan. Notice how the plot captures the dynamics of the presidential campaign coverage at a very fine resolution. Spikes and the phrases pinpoint the exact events and moments that triggered large amounts of attention.
Moreover, we have evaluated competing baselines in which we produce topic clusters using standard methods based on probabilistic term mixtures (e.g. [7, 8]).² The clusters produced for this time period correspond to much coarser divisions of the content (politics, technology, movies, and a number of essentially unrecognizable clusters). This is consistent with our initial observation in Section 1 that topical clusters are working at a level of granularity different from what is needed to talk about the news cycle. Similarly, producing clusters from the most linked-to documents [23] in the dataset produces a much finer granularity, at the level of individual articles. For reasons of space, we refer the reader to the supporting website [1] for the full results of these baseline approaches.
² As these do not scale to the size of the data we have here, we could only use a subset of the 10,000 most highly linked-to articles.
Global models for temporal variation. From a modeling per-

spective, it is interesting to ask for a minimal set of dynamic behaviors that will produce this type of sustained temporal variation over time. Rather than trying to fit the curve in Figure 4 exactly, the question here is to find basic ingredients that can produce synthetic dynamics of a broadly similar structure.
To begin with, there are interesting potential analogies to natural systems that contain dynamics similar to what one sees in the news cycle. For example, one could imagine the news cycle as a kind of species interaction within an ecosystem [18], where threads play the role of species competing for resources (in this case media attention, which is constant over time), and selectively reproducing (by occupying future articles and posts). Similarly, one can see analogies to certain kinds of biological regulation mechanisms such as follicular development [21], in which threads play the role of cells in an environment with feedback where at most one or a few cells tend to be dominant at any point in time. However, the news cycle is distinct in that there is a constant influx of new threads on a time scale that is comparable to the rate at which competition and selective reproduction are taking place.³ A model for the dynamics of the news cycle must take this into account, as we now discuss.
Analysis and simulation results. We find through simulation that this model produces fluctuations that are similar to what is observed in real news-cycle data. Figure 5 shows the results of a simulation of the model with the function f taking a power-law functional form, and with an exponentially decaying form for the recency function δ. (The threads of highest volume are depicted by analogy with Figure 4.) We see that although the model introduces no exogenous sources of variability as time runs forward, the distribution of popular threads and their co-occurrence in time can be highly non-uniform, with periods lacking in high-volume threads punctuated by the appearance of popular threads close together in time.
In Figure 6, we illustrate the basic reasons why one cannot produce these effects with only one of the two ingredients. When there is only a recency effect but no imitation (so the probability of choosing thread j is proportional only to $\delta(t - t_j)$ for some function δ), we see that no thread ever achieves significant volume, since each is crowded out by newer ones. When there is only imitation but no recency effect (so the probability of choosing thread j is proportional only to $f(n_j)$ for some function f), then a single
thread becomes dominant essentially forever: there are no recency
We argue that in formulating a model for the news cycle, there
effects to drive it away, although its dominance shrinks over time
are two minimal ingredients that should be taken into account. The
simply because the total number of competing threads is increasing.
first is that different sources imitate one another, so that once a
Rigorous analysis of the proposed model appears to be quite
thread experiences significant volume, it is likely to persist and
complex. However, one can give an argument for the characteristic
grow through adoption by others. The second, counteracting the
shape of thread volume over time in Figure 5 through an approxi-
first, is that threads are governed by strong recency effects, in which
mation using differential equations. If we focus on a single thread
new threads are favored to older ones. (There are other effects that
j in isolation, and view all the competing threads as a single ag-
can be included as well, including the fact that threads differ in
gregate, then the volume X(t) of j at time t can be approximated
their initial attractiveness to media sources, with some threads hav-
by X(t + 1) = cf (X(t))δ(t), where for notational simplicity we
ing inherently more likelihood to succeed. However, we omit this
translate time so tj = 0, and we use c to denote a normalizing
and other features from the present discussion, which focuses on
constant for the full distribution. Subtracting X(t) from both sides
identifying a minimal set of ingredients necessary for producing
to view it as a difference equation, we can in turn approximate this
the patterns we observe.)
using the differential equation dx/dt = cf (x)δ(t) − x.
We seek to capture the two components of imitation and recency
For certain choices of f (·) and δ(·) we can solve this analytically
in a stylized fashion using the following model, whose dynamics
in closed-form, obtaining an expression for the volume x(t) as a
we can then study. The model can be viewed as incorporating a
function of time. For example, suppose we have f (x) = qx and
type of preferential attachment [4], but combined with factors re-
δ(t) = t−1; then
lated to the effects of novelty and attention [32]. Time runs in dis-
dx
crete periods t = 1, 2, 3, . . . , T , and there is a collection of N
= cqxt−1 − x = x(cqt−1 − 1).
media sources, each of which reports on a single thread in one time
dt
period. Simply for the sake of initialization, we will assume that
Dividing through by x we get
each source is reporting on a distinct thread at time 0. In each time
1 dx
step, a new thread j is produced.
dt =
(cqt−1 − 1)dt
x dt
Also in each time step t, each source must choose which thread
to report on. A given source chooses thread j with probability pro-
and hence x = Atcqe−t. This function has the type of “saw-tooth”
portional to the product f (nj )δ(t − tj ), where nj denotes the num-
shape of increase followed by exponential decrease as in Figure 5.
ber of stories previously written about thread j, time t is the current
Again, this functional form arises from a particular choice of
time, and time tj is the time when j was first produced. The func-
f (·) and δ(·) and is intended to give a sense for the behavior of
tion δ(·) is monotonically decreasing in t − tj . One could take this
volume over time. At a more general level, we feel that any model
decrease to be exponential in some polynomial function of t − tj
of the news cycle will need to incorporate, at least implicitly, some
based on research on novelty and attention [32]; or, following re-
version of these two ingredients; we view a more general and more
search on human response-time dynamics [24, 29], one could take
exact analysis of such models as an interesting open question.
it to be a heavy-tailed functional form. The function f (·) is mono-
tonically increasing in nj , with f (0) > 0 since otherwise no source
4.
LOCAL ANALYSIS: PEAK INTENSITY
would ever be the first to report on a thread. Based on considera-
tions of preferential attachment [4], it is natural to consider func-
AND NEWS/BLOG INTERACTIONS
tional forms for f (·) such as f (nj ) = (a + bnj ) or more generally
So far we have examined the dynamics of the news cycle at a
f (nj ) = (a + bnj )γ . Again, we note that while the imitative effect
global level, and proposed a simple model incorporating imitation
created by f (·) causes large threads to appear, they cannot persist
and recency. We now analyze the process at a more fine-grained
for very long due to the recency effects imposed by δ(·).
level, focusing on the temporal dynamics around the peak intensity
of a typical thread, as well as the interplay between the news media
and blogs in producing the structure of this peak.
3We thank Steve Strogatz for pointing out the analogies and con-
trasts with these models to us.
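The imitation-and-recency model of Section 3 can be made concrete with a short simulation. The following is a minimal sketch, not the authors' implementation: it assumes the power-law form f(n) = (a + bn)^γ and an exponential recency function δ(s) = e^(−λs), and the function name `simulate_news_cycle`, the parameter names, and all default values are illustrative choices. Because every source chooses independently with the same probabilities, one multinomial draw per time step is equivalent to N separate choices. Each initial thread starts with a count of one story, which is one reading of the time-0 initialization.

```python
import numpy as np

def simulate_news_cycle(num_sources=100, num_steps=200, a=1.0, b=1.0,
                        gamma=1.0, decay=0.1, imitation=True, recency=True,
                        seed=0):
    """Return volume[t, j]: number of sources reporting on thread j at time t.

    Hypothetical parameterization: f(n) = (a + b*n)**gamma and
    delta(s) = exp(-decay * s). Setting imitation=False or recency=False
    replaces the corresponding factor by 1 (the single-ingredient variants).
    """
    rng = np.random.default_rng(seed)
    total_threads = num_sources + num_steps
    # N initial threads are born at time 0; one new thread is born per step.
    birth = np.concatenate([np.zeros(num_sources),
                            np.arange(1, num_steps + 1)])
    counts = np.zeros(total_threads)   # n_j: stories written so far
    counts[:num_sources] = 1           # each source starts on its own thread
    volume = np.zeros((num_steps + 1, total_threads), dtype=int)
    volume[0, :num_sources] = 1
    for t in range(1, num_steps + 1):
        k = num_sources + t            # threads born so far, incl. the new one
        f = (a + b * counts[:k]) ** gamma if imitation else np.ones(k)
        d = np.exp(-decay * (t - birth[:k])) if recency else np.ones(k)
        weights = f * d
        # Every source picks thread j with probability proportional to f * delta.
        picks = rng.multinomial(num_sources, weights / weights.sum())
        volume[t, :k] = picks
        counts[:k] += picks
    return volume
```

Calling `simulate_news_cycle(recency=False)` or `simulate_news_cycle(imitation=False)` gives the two single-ingredient settings, and `volume.sum(axis=0)` gives the total volume per thread for comparing them.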
Figure 6: A single ingredient of the model alone does not reproduce the observed dynamic behavior. With only a preference for recency (left), no thread prevails, as at every time step the latest thread gets the attention. With only imitation (right), a single thread gains most of the attention.

Thread volume increase and decay. Recall that the volume of a thread at a time $t$ is simply the number of items it contains with timestamp $t$. First we examine how the volume of a thread changes over time. A natural conjecture here would be to assume an exponential form for the change in the popularity of a phrase over time. However, somewhat surprisingly, we show next that the exponential function does not increase fast enough to model the behavior.

Given a thread $p$, we define its peak time $t_p$ as the median of the times at which it occurred in the dataset. We find that threads tend to have particularly high volume right around this median time, and hence the value of $t_p$ is quite stable under the addition or deletion of moderate numbers of items to $p$. We focus on the 1,000 threads with the largest total volumes (i.e. the largest numbers of items). For each thread, we determine its volume as a function of time; we then normalize and align these curves so that $t_p = 0$ for each, and so that the volume of each at time 0 is equal to 1. Finally, for each time $t$ we plot the median volume at $t$ over all 1,000 phrase-clusters. This is depicted in Figure 7.

Figure 7: Thread volume increase and decay over time. Notice the symmetry, quicker decay than buildup, and lower baseline popularity after the peak. (Plot legend: Data, a·log(t)+c, exp(-b·t)+c; horizontal axis: Time [days], t.)

In general, one would expect the overall volume of a thread to be very low initially; then as the mass media begins joining in the volume would rise; and then as it percolates to blogs and other media it would slowly decay. However, it seems that the behavior tends to be quite different from this. First, notice that in Figure 7 the rise and drop in volume is surprisingly symmetric around the peak, which suggests little or no evidence for a quick build-up followed by a slow decay. We find that no one simple justifiable function fits the data well. Rather, it appears that there are two distinct types of behavior: the volume outside an 8-hour window centered at the peak can be well modeled by an exponential function, $e^{-bx}$, while the 8-hour time window around the peak is best modeled by a logarithmic function, $a\,|\log(|x|)|$. The exponential function increases too slowly to be able to fit the peak, while the logarithm diverges at $x = 0$ ($|\log(|x|)| \to \infty$ as $x \to 0$). This is surprising as it suggests that the peak is a point of “singularity” where the number of mentions effectively diverges. Another way to view this is as a form of Zeno’s paradox: as we approach time 0 from either side, the volume increases by a fixed increment each time we shrink our distance to time 0 by a constant factor.

Fitting the function $a\,|\log(|t|)|$ to the spike we find that from the left ($t \to 0^-$) we have $a = 0.076$, while as $t \to 0^+$ we have $a = 0.092$. This suggests that the peak builds up more slowly and declines faster. A similar contrast holds for the exponential decay parameter $b$. We fit $e^{-b|t|}$ and notice that from the left $b = 1.77$, while after the peak $b = 2.15$, which similarly suggests that the popularity slowly builds up, peaks and then decays somewhat more quickly. Finally, we also note that the background frequency before the peak is around 0.12, while after the peak it drops to around 0.09, which further suggests that threads are more popular before they reach their peak, and afterwards they decay very quickly.

Time lag between news media and blogs. A common assertion about the news cycle is that quoted phrases first appear in the news media, and then diffuse to the blogosphere, where they dwell for some time. However, the real question is, how often does this happen? What about the propagation in the opposite direction? What is the time lag? As we show next, using our approach we can determine the lag within a temporal resolution of less than an hour.

We labeled each of our 1.6 million sites as news media or blogs. To assign the label we used the following rule: if a site appears on Google News then we label it as news media, and otherwise we label it as a blog. Although this rule is not perfect we found it to work quite well in practice. There are 20,000 different news sites in Google News, which is a tiny number when compared to the 1.65 million sites that we track. However, these news media sites generate about 30% of the total number of documents in our dataset. Moreover, if we only consider documents that contain frequent phrases then the share of news media documents further rises to 44%.

Figure 8: Time lag for blogs and news media. Thread volume in blogs typically reaches its peak 2.5 hours after the peak thread volume in the news sources. Thread volume in news sources increases slowly but decreases quickly, while in blogs the increase is rapid and the decrease much slower. (Plot legend: Mainstream media, Blogs; horizontal axis: Time lag [hours], t.)

By analogy with the previous experiment we take the 1,000 highest-volume threads, align them so that each has an overall peak at $t_p = 0$, but now create two separate volume curves for each thread: one consisting of the subsequence of its blog items, and the other consisting of the subsequence of its news media items. We will refer to the sizes of these as the blog volume and news volume of the thread. Figure 8 plots the median news and blog volumes and reveals that this time our intuition was right.
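The alignment procedure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: each thread is assumed to be given as a pair of arrays (item timestamps in hours, and a flag marking news-media items), curves are binned hourly, and each curve is normalized by the thread's total number of items. The names `aligned_volume_curves` and `peak_lag`, the input layout, and the normalization are all hypothetical.

```python
import numpy as np

def aligned_volume_curves(threads, half_window=12):
    """Median news and blog volume curves, aligned at each thread's peak time.

    threads: iterable of (timestamps_in_hours, is_news) pairs, one per thread.
    Returns (bin_centers, median_news_curve, median_blog_curve).
    """
    edges = np.arange(-half_window, half_window + 1)   # hourly bin edges
    news_curves, blog_curves = [], []
    for times, is_news in threads:
        times = np.asarray(times, dtype=float)
        is_news = np.asarray(is_news, dtype=bool)
        # Peak time t_p is the median timestamp; shift so that t_p = 0.
        rel = times - np.median(times)
        news_counts, _ = np.histogram(rel[is_news], bins=edges)
        blog_counts, _ = np.histogram(rel[~is_news], bins=edges)
        # Assumed normalization: fraction of the thread's total volume.
        news_curves.append(news_counts / len(times))
        blog_curves.append(blog_counts / len(times))
    centers = (edges[:-1] + edges[1:]) / 2
    return (centers,
            np.median(news_curves, axis=0),
            np.median(blog_curves, axis=0))

def peak_lag(centers, news_curve, blog_curve):
    """Hours by which the blog curve's maximum trails the news curve's maximum."""
    return centers[np.argmax(blog_curve)] - centers[np.argmax(news_curve)]
```

With hourly bins the lag is only resolved to the nearest hour; finer bin edges would be needed to approach the sub-hour resolution mentioned in the text.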

The median for a thread in the news media typically occurs first, and the median for the thread among blogs follows a median of 2.5 hours later. Moreover, news volume both increases faster and rises higher than blog volume, but it also decreases more quickly. For news volume we make an observation similar to what we saw in Figure 7: the volume increases more slowly than it decays. However, in blogs this is exactly the opposite. Here the number of mentions first increases quickly, reaches its peak 2.5 hours after the news media peak, but then decays more slowly.

One interpretation is that a quoted phrase first becomes high-volume among news sources, and is then “handed off” to blogs. The news media are slower to heavily adopt a quoted phrase and subsequently quick in dropping it, as they move on to new content. On the other hand, bloggers rather quickly adopt phrases from the news media, with a 2.5-hour lag, and then discuss them for much longer. Thus we see a pattern in which a spike and then rapid drop in news volume feeds a later and more persistent increase in blog volume for the same thread.

Handoff of phrases from news media to blogs. To further investigate the dynamics and transitions of phrases from the news media to the blogosphere we perform the following experiment: we take the top 1,000 threads, align them so that they all peak at time $t_p = 0$, but now calculate the ratio of blog volume to total volume for each thread as a function of time.

Figure 9: Phrase handoff from news to blogs. Notice a heartbeat-like pulse when news media quickly takes over a phrase. (Vertical axis: Fraction of blog mentions; horizontal axis: Time [hours], t.)

Figure 9 shows “heartbeat”-like dynamics where the phrase “oscillates” between blogs and mainstream media. The fraction of blog volume is initially constant, but it turns upward about three hours before the peak as early bloggers mention the phrase. Once the news media joins in, around $t = -1$, the fraction of blog volume drops sharply; but it then jumps up after $t = 0$ once the news media begins dropping the thread and blogs continue adopting it. The fraction of blog mentions peaks around $t = 2.5$, and after 6-9 hours the hand-off is over and the fractions stabilize. It is interesting that the constant fraction before the peak ($t \le -6$) is 56%, while after the peak ($t \ge 9$) it is actually higher, which suggests a persistent effect in the blogosphere after the news media has moved on. This provides a picture of the very fine-scale temporal dynamics of the handoff of news from mainstream media to blogs, aggregated at the very large scale of 90 million news articles.

Lag of individual sites on mentioning a phrase. We also investigate how quickly different media sites mention a phrase. Thus, we define the lag of a site with respect to a given thread to be the time at which the site first mentions the associated quoted phrase, minus the phrase peak time. (Negative lags indicate that the site mentioned the quoted phrase before peak attention.) This measure gives us a sense for how early or late the site takes part in the thread, relative to the bulk of the coverage. Table 1 gives a list of sites with the minimum (i.e. most negative) lags. Notice that early mentioners are blogs and independent media sites; behind them, but still well ahead of the crowd, are large media organizations.

Rank  Lag [h]  Reported  Site
1     -26.5    42        hotair.com
2     -23      33        talkingpointsmemo.com
4     -19.5    56        politicalticker.blogs.cnn.com
5     -18      73        huffingtonpost.com
6     -17      49        digg.com
7     -16      89        breitbart.com
8     -15      31        thepoliticalcarnival.blogspot.com
9     -15      32        talkleft.com
10    -14.5    34        dailykos.com
16    -14      54        blogs.abcnews.com
30    -11      32        uk.reuters.com
34    -11      72        cnn.com
40    -10.5    78        washingtonpost.com
48    -10      53        online.wsj.com
49    -10      54        ap.org

Table 1: How quickly different media sites report a phrase. Lag: median time between the first mention of a phrase on a site and the time when its mentions peaked. Reported: percentage of top 100 phrases that the site mentioned.

Quotes migrating from blogs to news media. The majority of phrases first appear in the news media and then diffuse to blogs, where they are discussed for a longer time. However, there are also phrases that propagate in the opposite way, percolating in the blogosphere until they are picked up by the news media. Such cases are very important as they show the importance of independent media. While there has been anecdotal evidence of this phenomenon, our approach and the comprehensiveness and scale of our dataset make it possible to automatically find instances of it.

To extract phrases that acquired non-trivial volume earlier in the blogosphere, we use the following simple heuristic. Let $t_m$ denote the median time of news volume for a thread. Then let $f_b$ be the fraction of the total thread volume consisting of blog items dated at least a week before $t_m$. We look for threads for which $0.15 < f_b < 0.5$. Here the threshold of 0.15 ensures that the phrase was sufficiently mentioned on the blogosphere well before the news media peak, and 0.5 selects only phrases that also had a significant presence in the news media.

Table 2 lists the highest-volume threads as automatically returned by our rule. Manual inspection indicates that almost all correspond to intuitively natural cases of stories that were first “discovered” by bloggers. Moreover, out of 16,000 frequent phrases we considered in this experiment 760 passed the above filter. Interpreting this ratio in light of our heuristic, it suggests that about 3.5% of quoted phrases tend to percolate from blogs to news media, while diffusion in the other direction is much more common.

5. CONCLUSION

We have developed a framework for tracking short, distinctive phrases that travel relatively intact through on-line text and presented scalable algorithms for identifying and clustering textual variants of such phrases that scale to a collection of 90 million articles, which makes the present study one of the largest analyses of on-line news in terms of data scale. Our work offers some of the first quantitative analyses of the global news cycle and the dynamics of information propagation between mainstream and social media. In particular, we observed a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs,
with a “heartbeat”-like shape of the handoff between news and blogs. We also developed a mathematical model for the kinds of temporal variation that the system exhibits. As information mostly propagates from news to blogs, we also found that in only 3.5% of the cases stories first appear dominantly in the blogosphere and subsequently percolate into the mainstream media.

M      f_b   Phrase
2,141  .30   Well uh you know I think that whether you’re looking at it from a theological perspective or uh a scientific perspective uh answering that question with specificity uh you know is uh above my pay grade.
826    .18   A changing environment will affect Alaska more than any other state because of our location I’m not one though who would attribute it to being man-made.
763    .18   It was Ronald Reagan who said that freedom is always just one generation away from extinction we don’t pass it to our children in the bloodstream we have to fight for it and protect it and then hand it to them so that they shall do the same or we’re going to find ourselves spending our sunset years telling our children and our children’s children about a time in America back in the day when men and women were free.
745    .18   After trying to make experience the issue of this campaign John McCain celebrated his 72nd birthday by appointing a former small town mayor and brand new governor as his vice presidential nominee is this really who the republican party wants to be one heartbeat away from the presidency given Sarah Palin’s lack of experience on every front and on nearly every issue this vice presidential pick doesn’t show judgement it shows political panic.
670    .38   Clarion fund recently financed the distribution of some 28 million DVDs containing the film obsession radical islam’s war against the west in what many political analysts describe as swing states in the upcoming presidential elections.

Table 2: Phrases first discovered by blogs and only later adopted by the news media. M: total phrase volume; f_b: fraction of blog mentions dated at least 1 week before the news media peak.

Our approach to meme-tracking opens an opportunity to pursue long-standing questions that before were effectively impossible to tackle. For example, how can we characterize the dynamics of mutation within phrases? How does information change as it propagates? Over long enough time periods, it may be possible to model the way in which the essential “core” of a widespread quoted phrase emerges and enters popular discourse more generally. One could combine the approaches here with information about the political orientations of the different news media and blog sources [2, 12, 13], to see how particular threads move within and between opposed groups. Introducing such types of orientation is challenging, however, since it requires reliable methods of labeling significant fractions of sources at this scale of data. Finally, a deeper understanding of simple mathematical models for the dynamics of the news cycle would be useful for media analysts; temporal relationships such as we find in Figure 8 suggest the possibility of employing a type of two-species predator-prey model [18] with blogs and the news media as the two interacting participants. More generally, it will be useful to further understand the roles different participants play in the process, as their collective behavior leads directly to the ways in which all of us experience news and its consequences.

Acknowledgements. We thank David Strang and Steve Strogatz for valuable conversations and the creators of Flare and Spinn3r for resources that facilitated the research.

6. REFERENCES

[1] Supporting website: http://memetracker.org/supp
[2] L. Adamic and N. Glance. The political blogosphere and the 2004 U.S. election. Workshop on Link Discovery, 2005.
[3] E. Adar, L. Zhang, L. Adamic, R. Lukose. Implicit structure and dynamics of blogspace. Wks. Weblogging Ecosystem’04.
[4] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. of Modern Phys., 74:47–97, 2002.
[5] J. Allan (ed). Topic Detection and Tracking. Kluwer, 2002.
[6] L. Bennett. News: The Politics of Illusion. A. B. Longman (Classics in Political Science), seventh edition, 2006.
[7] D. Blei, J. Lafferty. Dynamic topic models. ICML, 2006.
[8] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993–1022, 2003.
[9] G. Calinescu, H. Karloff, Y. Rabani. An improved approximation algorithm for multiway cut. JCSS 60(2000).
[10] E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiterminal cuts. SIAM J. Comput., 23(4):864–894, 1994.
[11] E. Gabrilovich, S. Dumais, and E. Horvitz. Newsjunkie: Providing personalized newsfeeds via analysis of information novelty. In WWW ’04, 2004.
[12] M. Gamon, S. Basu, D. Belenko, D. Fisher, M. Hurst, and A. C. Kanig. Blews: Using blogs to provide context for news articles. In ICWSM ’08, 2008.
[13] N. Godbole, M. Srinivasaiah, and S. Skiena. Large-scale sentiment analysis for news and blogs. In ICWSM ’07, 2007.
[14] D. Gruhl, D. Liben-Nowell, R. V. Guha, and A. Tomkins. Information diffusion through blogspace. In WWW ’04, 2004.
[15] J. Harsin. The rumour bomb: Theorising the convergence of new and old trends in mediated U.S. politics. Southern Review: Communication, Politics and Culture, 39(2006).
[16] S. Havre, B. Hetzler, L. Nowell. ThemeRiver: Visualizing theme changes over time. IEEE Symp. Info. Vis. 2000.
[17] J. Kleinberg. Bursty and hierarchical structure in streams. In KDD ’02, pages 91–101, 2002.
[18] M. Kot. Elements of Mathematical Ecology. Cambridge University Press, 2001.
[19] B. Kovach and T. Rosenstiel. Warp Speed: America in the Age of Mixed Media. Century Foundation Press, 1999.
[20] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. Structure and evolution of blogspace. CACM, 47(12):35–39, 2004.
[21] M. Lacker and C. Peskin. Control of ovulation number in a model of ovarian follicular maturation. In AMS Symposium on Mathematical Biology, pages 21–32, 1981.
[22] P.F. Lazarsfeld, B. Berelson, and H. Gaudet. The People’s Choice. Duell, Sloan, and Pearce, 1944.
[23] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading behavior in large blog graphs. SDM’07.
[24] R. D. Malmgren, D. B. Stouffer, A. Motter, and L. A. N. Amaral. A poissonian explanation for heavy tails in e-mail communication. PNAS, to appear, 2008.
[25] J. Schmidt. Blogging practices: An analytical framework. Journal of Computer-Mediated Communication, 12(4), 2007.
[26] J. Singer. The political j-blogger. Journalism, 6(2005).
[27] Spinn3r API. http://www.spinn3r.com. 2008.
[28] M. L. Stein, S. Paterno, and R. C. Burnett. Newswriter’s Handbook: An Introduction to Journalism. Blackwell, 2006.
[29] A. Vazquez, J. G. Oliveira, Z. Deszo, K.-I. Goh, I. Kondor, and A.-L. Barabasi. Modeling bursts and heavy tails in human dynamics. Physical Review E, 73(036127), 2006.
[30] X. Wang and A. McCallum. Topics over time: a non-markov continuous-time model of topical trends. Proc. KDD, 2006.
[31] X. Wang, C. Zhai, X. Hu, R. Sproat. Mining correlated bursty topic patterns from coordinated text streams. KDD, 2007.
[32] F. Wu and B. Huberman. Novelty and collective attention. Proc. Natl. Acad. Sci. USA, 104, 2007.