Correction: Limits on reliable information flows through stochastic populations

[This corrects the article DOI: 10.1371/journal.pcbi.1006195.]


Initial configuration
The initial configuration is described in several layers. First, the neutral initial configuration corresponds to the initial states of the agents, before the sources and the desired opinion to converge to are set. (The term neutral is motivated by a physical analogy, as opposed to a charged initial configuration.) A random initialization is then applied to the given neutral initial configuration, which determines the set of sources and the opinion that agents need to converge to. This results in what we call the charged initial configuration. It can represent, for example, an external event that was identified by a few agents, which now need to deliver their knowledge to the rest of the population.

Neutral initial configuration x(0)
Each agent v starts the execution with an input that contains, in addition to its identity:
• an initial state taken from some discrete set of states, and
• a binary opinion variable λ_v ∈ {0, 1}.
(The opinion of an agent could have been considered part of the state of the agent. We separate these two notions merely for presentation purposes.) The neutral initial configuration x(0) is the vector whose i-th entry x_i(0), for i ∈ {1, 2, . . . , n}, is the input of the agent with identity i.

Charged initial configuration and correct opinion
The charged initial configuration is determined in three stages. The first corresponds to the random selection of sources, the second to the selection of the correct opinion, and the third to a possible update of states of sources, as a result of being selected as sources with a particular opinion.
1st stage - Random selection of sources. Given an integer s ≤ n, a set S of size s is chosen uniformly at random (u.a.r.) among the agents. The agents in S are called sources. Note that every agent has the same probability of being a source. We assume that each source knows it is a source and, conversely, each non-source knows it is not a source.
2nd stage - Random selection of correct opinion. In the main model we consider, after the sources have been determined in the first stage, the sources are randomly initialized with an opinion, called the correct opinion. That is, a fair coin is flipped to determine an opinion in {0, 1}, and all sources are assigned this opinion.
3rd stage - Update of initial states of sources. To capture a change in behavior as a result of being selected as a source with a particular opinion, we assume that once the opinion of a source u has been determined, the initial state of u may change according to some distribution f_source-state that depends on (1) its identity, (2) its opinion, and (3) the neutral configuration. Each source samples its new state independently.
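The three stages above can be sketched in code. The following is a minimal illustration, not the paper's formalism: the helper name charge_configuration, the placeholder states standing in for f_source-state, and the arbitrary neutral opinions of non-sources are all our own assumptions.

```python
import random

def charge_configuration(n, s, seed=0):
    """Produce a charged initial configuration for n agents and s sources.

    Illustrative sketch of the three stages described above; names and the
    state representation are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    # 1st stage: choose the set S of s sources uniformly at random.
    sources = set(rng.sample(range(1, n + 1), s))
    # 2nd stage: a fair coin fixes the correct opinion, shared by all sources.
    correct_opinion = rng.randint(0, 1)
    # 3rd stage: each source independently resamples its initial state;
    # the two placeholder states below stand in for f_source-state.
    states = {}
    for v in range(1, n + 1):
        states[v] = rng.choice(["alert-low", "alert-high"]) if v in sources else "neutral"
    # Non-source opinions come from the neutral configuration; here they are
    # arbitrary bits, while every source holds the correct opinion.
    opinions = {v: correct_opinion if v in sources else rng.randint(0, 1)
                for v in range(1, n + 1)}
    return sources, correct_opinion, states, opinions
```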

Alphabet and messages
Agents communicate by observing each other according to some random pattern (for details, see the Section Random interaction patterns). To improve communication, agents may choose which content, called a message, they wish to reveal to the agents that observe them. Importantly, however, such messages are subject to noise.
More specifically, at any given time, each agent v (including sources) displays a message m ∈ Σ, where Σ is some finite alphabet. The alphabet Σ agents use to communicate may be richer than the actual information content they seek to disseminate, namely, their opinions. This, for instance, gives them the possibility to express several levels of certainty [15]. We can safely assume that the size of Σ is at least two, and that Σ includes both symbols 0 and 1. We are mostly concerned with the case where Σ is of constant size (i.e., independent of the number of agents), but note that our results hold for any size of the alphabet Σ, as long as the noise criterion is satisfied (see below).

δ-uniform noise
When an agent u observes some agent v, it receives a sample of the message currently held by v. The noise in the sample is characterized by a noise parameter 0 < δ ≤ 1/2. One important aspect of our theorems is that they are general enough to hold for any distribution governing the noise, as long as it satisfies the following noise criterion.
Definition 1 (The noise criterion with parameter δ). Any time some agent u observes an agent v holding some message m ∈ Σ, the probability that u actually receives a message m′ is at least δ, for any m′ ∈ Σ. We assume that all noisy samples are independent.
Observe that the aforementioned criterion implies that δ ≤ 1/|Σ|, and that the case δ = 1/|Σ| corresponds to messages being completely random, in which case the rumor spreading problem is unsolvable.
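As a concrete illustration, here is a minimal channel satisfying the δ-uniform noise criterion: with probability δ|Σ| the displayed symbol is replaced by a uniform one, so every symbol of Σ is received with probability at least δ. This is one possible channel among many (the theorems hold for any channel meeting the criterion); the function name is our own.

```python
import random

def noisy_sample(message, alphabet, delta, rng):
    """Sample the received message when `message` is displayed.

    Sketch of one channel satisfying the delta-uniform noise criterion:
    any symbol m' is received with probability at least delta, since with
    probability delta*|Sigma| the output is uniform over the alphabet.
    """
    assert 0 < delta <= 1 / len(alphabet)  # the criterion forces delta <= 1/|Sigma|
    if rng.random() < delta * len(alphabet):
        return rng.choice(alphabet)        # uniform replacement
    return message                         # otherwise the message survives intact
```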
We next define a weaker criterion, which is particularly meaningful in cases where sources are more restricted in their message repertoire than general agents. This may be the case, for example, if sources always choose to display their opinion as their message (possibly together with some extra symbol indicating that they are sources). Formally, we define Σ′ ⊆ Σ as the set of possible messages that a source can hold, together with the set of messages that can be observed when viewing a source (i.e., after noise is applied). Our theorems actually apply to the following criterion, which requires only that messages in Σ′ are attained due to noise with some sufficient probability.
Definition 2 (The relaxed noise criterion with parameter δ). Any time some agent u observes an agent v holding some message m ∈ Σ, the probability that u actually receives a message m′ is at least δ, for any m′ ∈ Σ′.

Random interaction patterns
We consider several basic interaction patterns. Our main model is the parallel-PULL model. In this model, time is divided into rounds, where at each round i ∈ N+, each agent u independently selects an agent v (possibly u = v) u.a.r. from the population, and then u observes the message held by v. The parallel-PULL model should be contrasted with the parallel-PUSH model, in which u can choose between sending a message to the selected agent v or doing nothing. We shall also consider the following variants of the PULL model.
• parallel-PULL(k). Generalizing parallel-PULL for an integer 1 ≤ k ≤ n, the parallel-PULL(k) model allows agents to observe k other agents in each round. That is, at each round i ∈ N+, each agent independently selects a set of k agents (possibly including itself) u.a.r. from the population and observes each of them.
• sequential-PULL. In each time step t ∈ N+, two agents u and v are selected uniformly at random (u.a.r.) among the population, and agent u observes v.
• broadcast-PULL. At each time step t ∈ N+, one agent is chosen u.a.r. from the population and all agents observe it, receiving the same noisy sample of its message.
The broadcast-PULL model is mainly used for technical purposes. We use it in our proofs as it simplifies our arguments without harming their generality. Nevertheless, this broadcast model can also capture situations in which agents can be seen simultaneously by many other agents, where the fact that all agents observe the same sample can be viewed as noise originating from the observed agent.
Regarding the difference in time units between the models: since interactions occur in parallel in the parallel-PULL model, one round in that model should informally be thought of as roughly n time steps in the sequential-PULL or broadcast-PULL models.
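The three observation patterns can be sketched as lists of (observer, observed) pairs per time step or round. The function names are illustrative, and parallel-PULL(k) is sampled with replacement for simplicity (the text allows a set of k agents).

```python
import random

def sequential_pull_step(n, rng):
    """One time step of sequential-PULL: one random ordered pair (u observes v)."""
    return [(rng.randrange(n), rng.randrange(n))]

def broadcast_pull_step(n, rng):
    """One time step of broadcast-PULL: everyone observes one random agent,
    and all observers receive the SAME noisy sample of its message."""
    v = rng.randrange(n)
    return [(u, v) for u in range(n)]

def parallel_pull_step(n, k, rng):
    """One round of parallel-PULL(k): each agent draws k agents u.a.r.
    (here with replacement, as a simplification)."""
    return [(u, rng.randrange(n)) for u in range(n) for _ in range(k)]
```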

Liberal assumptions
As mentioned, we shall assume that agents have abilities that surpass their realistic ones. This not only increases the generality of our lower bounds, but also simplifies their proofs. Specifically, we make the following liberal assumptions.
• Unique identities. Each agent is equipped with a unique identity id(v) ∈ {1, 2, . . . , n}; that is, for every two agents u and v, we have id(u) ≠ id(v). Moreover, whenever an agent u observes some agent v, we assume that u can infer the identity of v. In other words, we provide agents with the ability to reliably distinguish between different agents at no cost.
• Unlimited internal computational power. We allow agents to have unlimited computational abilities including infinite memory capacity. Therefore, agents can potentially perform arbitrarily complex computations based on their knowledge (and their id).
• Complete knowledge of the system. Informally, we assume that agents have access to the complete description of the system, except for who the sources are and what their opinion is. More formally, we assume that each agent has access to the neutral initial configuration x(0) and to all the system's parameters, including the number of agents n, the noise parameter δ, the number of sources s, and the distribution f_source-state governing the update of the states of sources in the third stage of the charged initial configuration.
• Full synchronization. We assume that all agents are equipped with clocks that can count time steps (in sequential-PULL or broadcast-PULL) or rounds (in parallel-PULL(k)). The clocks are synchronized, ticking at the same pace, and initialized to 0 at the beginning of the execution. This means, in particular, that if they wish, the agents can share a notion of time that is incremented at each time step.
• Shared randomness. We assume that algorithms can be randomized. That is, to determine the next action, agents can internally toss coins and base their decision on the outcome of these coin tosses. Being liberal, we shall assume that randomness is shared in the following sense. At the outset, an arbitrarily long sequence r of random bits is generated and the very same sequence r is written in each agent's memory before the protocol execution starts. Each agent can then deterministically choose (depending on its state) which random bits in r to use as the outcome of its own random bits. In particular, since agents are allowed to have distinct initial states (e.g. by having unique identity labels), they can choose to make use of disjoint sets of random bits, thus making use of independent random variables. On the other hand, the shared randomness also implies that, for example, two agents can possibly make use of the very same random bits or merely observe the outcome of the random bits used by the other agents. Furthermore, the above implies that, conditioning on an agent u being a non-source agent, all the random bits used by u during the execution are accessible to all other agents.
• Coordinated sources. Even though non-source agents do not know who the sources are, we assume that the sources do know who the other sources are. This means, in particular, that the sources can coordinate their actions.

Algorithm
Upon observation, each agent can alter its internal state (and in particular, its message to be seen by others) as well as its opinion. In reality, the updates of these variables may follow different constraints.
In the case of ants, for example, it may take a long time to change the displayed message even if the internal state changes. As part of our liberal approach, we allow agents to change any part of their internal state instantaneously.
The strategy by which agents update these variables is called an "algorithm". As mentioned, algorithms can be randomized; that is, to determine the next action, agents can use the outcome of coin tosses in the sequence r (see the shared randomness assumption in Liberal assumptions). Overall, the action of an agent u at time t depends on its identity, its knowledge of the system (including the neutral initial configuration and the system's parameters), the history of its observations up to time t, and the shared random sequence r.
Definition 3. We say that convergence has been achieved if one can specify a particular non-source agent v for which it is guaranteed that its opinion is the correct opinion with probability at least 2/3. The time complexity is the number of time steps (respectively, rounds) required to achieve convergence.
We remark that the latter definition encompasses all three models considered.
Remark 1 (Different sampling rates of sources). We consider sources as agents in the population, but remark that they can also be thought of as representing the environment. In this case, one may consider a different rate for sampling a source (environment) vs. sampling a typical agent. For example, the probability to observe any given source (or the environment) may be x times larger than the probability to observe any given non-source agent. This scenario can also be captured by a slight adaptation of our analysis. When x is an integer, we can alternatively obtain such a generalization by considering additional artificial sources in the system. Specifically, we replace each source u_i with a set U_i consisting of x sources that coordinate their actions and behave identically (recall that we assume that sources know who the other sources are and can coordinate their actions), simulating the original behavior of u_i. Since the number of sources increases by a multiplicative factor of x, our lower bounds (see Theorem 4 and Corollary 14.1) decrease by a multiplicative factor of x².

Related works in computer science
In Rumor Spreading problems (also referred to as Broadcast), a piece of information typically held by a single designated agent is to be disseminated to the rest of the population. This is the subject of a vast literature in theoretical computer science, and more specifically in the distributed computing community; see, e.g., [3,4,6,7,8,11,13,14,16]. While some works assume a fixed topology, the canonical setting does not assume a network. Instead, agents communicate through uniform PUSH/PULL-based interactions (including the phone call model), in which agents interact in pairs with other agents chosen independently at each time step uniformly at random from all agents in the population. The success of such protocols is largely due to their inherent simplicity and fault-tolerance [10,14]. In particular, it has been shown that under the PUSH model, there exists an efficient rumor spreading protocol that uses a single bit per message and can overcome flips in messages (noise) [11].
The line of research initiated by El-Gamal [9] also studies a broadcast problem with noisy interactions. The regime, however, is rather different from ours: all n agents hold a bit they wish to transmit to a single receiver. This line of research culminated in the Ω(n log log n) lower bound on the number of messages shown in [13], matching the upper bound shown many years earlier in [12].
Several works have investigated algorithmic properties of networks with unstable topological structure, such as ephemeral networks, evolving graphs, and edge-Markovian evolving graphs [2,5,1]. Such works prove analytical results assuming that the evolution of the topology satisfies certain constraints, but they did not consider noise affecting communication in conjunction with the dynamics of the topology.
Theorem 4. Assume that the relaxed δ-uniform noise criterion is satisfied.
• Let k be an integer. Any rumor spreading protocol in the parallel-PULL(k) model cannot converge in fewer rounds than Ω( nδ / (k s²(1 − δ|Σ|)²) ).
• Consider either the sequential-PULL or the broadcast-PULL model. Any rumor spreading protocol cannot converge in fewer time steps than Ω( n²δ / (s²(1 − δ|Σ|)²) ).
To prove the theorem, we first prove (in Reducing to the broadcast-PULL Model) that an efficient rumor spreading algorithm in either the noisy sequential-PULL model or the parallel-PULL(k) model can be used to construct an efficient algorithm in the broadcast-PULL model. The resulting algorithm has the same time complexity as the original one in the context of sequential-PULL, and adds a multiplicative factor of kn in the context of parallel-PULL(k).
We then show how to relate the rumor spreading problem in broadcast-PULL to a statistical inference test (Rumor Spreading and hypothesis testing). A lower bound on the latter setting is then achieved by adapting techniques from mathematical statistics (Proof of Theorem 7).
Remark 2. The lower bound of Theorem 4 loses relevance when s is of order greater than √n. Indeed, the following simple protocol in the broadcast-PULL model turns out to match the lower bound. For simplicity's sake, let us consider the case of a binary alphabet Σ = {0, 1}, and assume without loss of generality that the sources' opinion is 1. Each non-source agent, at each time step, displays a message chosen u.a.r. in Σ, while each source agent always displays the correct message. After n time steps, the agents have collected n observations. If s ≥ √(10n), a straightforward application of the Chernoff bound shows that, with high probability, at least s/2 of the n observations come from source agents. Thus, at most n − s/2 of the observations have distribution Bernoulli(1/2), while at least s/2 of them are identically 1 before the effect of noise is taken into account. Since s/2 is of the same order as the standard deviation of the number of non-source messages equal to 1, the agents have a good probability of correctly inferring the correct opinion by choosing the most frequent message among the n observations.
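The simple protocol of Remark 2 can be sketched as follows. As an illustration we instantiate the noise with a symmetric binary channel of parameter δ (uniform replacement with probability 2δ when |Σ| = 2); the function name and parameter choices are our own.

```python
import random

def majority_guess(n, s, delta, t, rng):
    """Simulate the simple broadcast-PULL protocol of Remark 2.

    Sources always display 1 (the correct opinion); every other agent
    displays a fresh uniform bit. After t observations the agents output
    the majority symbol. Sketch only, with one particular noise channel.
    """
    ones = 0
    for _ in range(t):
        v_is_source = rng.random() < s / n       # observed agent drawn u.a.r.
        shown = 1 if v_is_source else rng.randint(0, 1)
        if rng.random() < delta:                  # delta-uniform noise, |Sigma| = 2:
            shown = rng.randint(0, 1)             # replace by a uniform symbol
        ones += shown
    return 1 if 2 * ones > t else 0
```

With s of order √(10n) or larger, the expected fraction of ones per step is 1/2 + (s/n)(1 − δ)/2, so the majority is correct with good probability.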

Reducing to the broadcast-PULL Model
The following lemma establishes a formal relation between the convergence times of the models we consider. We assume all models are subject to the same noise distribution.
Lemma 5. Any protocol operating in sequential-PULL can be simulated by a protocol operating in broadcast-PULL with the same time complexity. Moreover, for any integer 1 ≤ k ≤ n, any protocol P operating in parallel-PULL(k) can be simulated by a protocol operating in broadcast-PULL with a time complexity that is kn times that of P in parallel-PULL(k).
Proof. Let us first show how to simulate a time step of sequential-PULL in the broadcast-PULL model. Recall that in broadcast-PULL, in each time step, all agents receive the same observation, sampled u.a.r. from the population. Upon drawing such an observation, all agents use their shared randomness to generate a (shared) uniform random integer X between 1 and n. Then, the agent whose unique identity corresponds to X is the one processing the observation, while all other agents ignore it. This reduces the situation to a scenario in sequential-PULL, and the agents can safely execute the original algorithm designed for that model.
As for simulating a round of parallel-PULL(k) in the broadcast-PULL model, agents divide time steps in the latter model into rounds, each composed of precisely kn time steps. Recall that the model assumes that agents share clocks that start when the execution starts and tick at each time step. This implies that the agents can agree on the division of time into rounds, and can further agree on the round number. For an integer i, where 1 ≤ i ≤ kn, during the i-th step of each round, only the agent whose identity is (i mod n) + 1 receives the observation, while all other agents ignore it. Observe that receiving the observation does not imply that the agent processes it. In fact, it stores it in its memory until the round is completed, and processes it only then. The aforementioned rule for receiving a message ensures that when a round is completed in the broadcast-PULL model, each agent has received precisely k independent uniform samples, as it would in a round of parallel-PULL(k). Therefore, at the end of each round j ∈ N+ in the broadcast-PULL model, all agents can safely execute their actions in the j-th round of the original protocol designed for parallel-PULL(k). This draws a precise bijection between rounds in parallel-PULL(k) and rounds in broadcast-PULL. The multiplicative overhead of kn simply follows from the fact that each round in broadcast-PULL consists of kn time steps.
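The scheduling rule in the simulation above can be made concrete. This small sketch (the function name is our own) checks that the rule "agent (i mod n) + 1 receives step i" hands each agent exactly k observations per round of kn time steps.

```python
def receiver_of_step(i, n):
    """Receiver of the i-th observation (1 <= i <= k*n) within a round,
    following the rule in the proof of Lemma 5: agent (i mod n) + 1."""
    return (i % n) + 1
```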
Thanks to Lemma 5, Theorem 4 directly follows from the next theorem.
Theorem 6. Consider the broadcast-PULL model and assume that the relaxed δ-uniform noise criterion is satisfied. Any rumor spreading protocol cannot converge in fewer time steps than Ω( n²δ / (s²(1 − δ|Σ|)²) ).
The remainder of the section is dedicated to proving Theorem 6. Towards achieving this, we view the task of guessing the correct opinion in the broadcast-PULL model, given access to noisy samples, within the more general framework of distinguishing between two types of stochastic processes which obey some specific assumptions.

Rumor Spreading and hypothesis testing
Consider the following class of problems.
Adaptive Coin Distinguishing Task (ACDT). A distinguisher is presented with a sequence of observations taken from a coin of type η, where η ∈ {0, 1}. We can think of the type η as initially set to 0 or 1 with probability 1/2 (independently of everything else). The goal of the distinguisher is to determine the type η based on the observations. More specifically, for a given time step t, denote the sequence of previous observations (up to, and including, time t − 1) by x^(<t) = (x^(1), . . . , x^(t−1)). For each time t, we denote the probability distribution of the observation X_η^(t) ∈ Σ that the distinguisher receives, given the type η ∈ {0, 1} and the history of previous observations x^(<t), as

p_η(m | x^(<t)) = P(X_η^(t) = m | x^(<t)).     (1)

We follow the common practice of using uppercase letters to denote random variables and lowercase letters to denote particular realizations, e.g., X^(≤t) for the sequence of observations up to time t, and x^(≤t) for a corresponding realization.
Remark 3. We note that ACDT generalizes in several respects the canonical problem of distinguishing between an unbiased coin and a coin with fixed bias ε (see, e.g., Chapter 5 in [17]). It is more general because
1. the probabilities of observations may vary adaptively as a function of the outcome of the previous samples, since the probabilities p_η(m | x^(<t)) in (1) actually depend on x^(<t), the history of observations up to time t − 1, and
2. instead of binary random variables we consider Σ-valued random variables.
The bounded family ACDT(ε, δ). We consider a family of instances of ACDT, called ACDT(ε, δ), governed by parameters ε and δ. Writing ε(m, x^(<t)) = p_1(m | x^(<t)) − p_0(m | x^(<t)), this family contains all instances of ACDT such that for every t and every history x^(<t), we have:
• Σ_{m∈Σ} |ε(m, x^(<t))| ≤ ε, and
• δ ≤ p_η(m | x^(<t)) for every m ∈ Σ with ε(m, x^(<t)) ≠ 0 and every η ∈ {0, 1}.
In the rest of the current section, we show how Theorem 6, which deals with the broadcast-PULL model, follows directly from the next theorem concerning the adaptive coin distinguishing task, by setting ε = 2s(1 − δ|Σ|)/n. The actual proof of Theorem 7 appears in Proof of Theorem 7.
Theorem 7. Consider any protocol for any instance of ACDT(ε, δ). The number of samples required to distinguish between a process of type 0 and a process of type 1 with probability of error less than 1/3 is at least 2/(9 ln 2 · W(ε, δ)), where W(ε, δ) is the per-sample divergence bound defined in (11). In particular, if 10ε < δ, then the number of necessary samples is Ω(δ/ε²).

Proof of Theorem 6 assuming Theorem 7
Consider a rumor spreading protocol P in the broadcast-PULL model. Fix a node u. We first show that when all agents run P, the perspective of node u corresponds to a specific instance of ACDT(2s(1 − δ|Σ|)/n, δ), called Π(P, u). We break down the proof of this correspondence into two claims.
The ACDT instance Π(P, u). Recall that we assume that each agent knows the complete neutral initial configuration, the number of sources s, and the shared sequence of random bits r. We avoid writing such parameters as explicit arguments of Π(P, u) in order to simplify notation; however, we stress that what follows assumes that these parameters are fixed. The bounds we show hold for any fixed value of r, and hence also when r is randomized.
Each agent is interested in discriminating between two families of charged initial configurations: those in which the correct opinion is 0 and those in which it is 1 (each of these possibilities occurs with probability 1/2). Recall that the correct opinion is determined in the 2nd stage of the charged initial configuration, and is independent from the choice of sources (1st stage).
We next consider the perspective of a generic non-source agent u, and define the instance Π(P, u) as follows. Given the history x^(<t), we set p_η(m | x^(<t)), for η ∈ {0, 1}, to be equal to the probability that u observes message m ∈ Σ at time step t of the execution of P. For clarity's sake, we remark that the latter probability is conditional on:
• the history of observations being x^(<t),
• the sequence of random bits r,
• the correct opinion being η ∈ {0, 1},
• the neutral initial configuration,
• the identity of u,
• the algorithm P, and
• the system's parameters (including the distribution f_source-state and the number of sources s).
Claim 8. Let P be a correct protocol for the rumor spreading problem in broadcast-PULL, and let u be an agent for which the protocol is guaranteed to produce the correct opinion with probability at least p by some time T (if one exists), for any fixed constant p ∈ (0, 1). Then Π(P, u) can be solved in time T with correctness guaranteed with probability at least p.
Proof. Conditioning on η ∈ {0, 1} and on the random seed r, the distribution of observations in the Π(P, u) instance follows precisely the distribution of observations as perceived from the perspective of u in broadcast-PULL. Hence, if the protocol P at u terminates with output j ∈ {0, 1} at round T, after the T-th observation in Π(P, u) we can set Π(P, u)'s output to j as well. Given that the two stochastic processes have the same law, the correctness guarantees are the same.

Claim 9. The process Π(P, u) is an instance of ACDT(2s(1 − δ|Σ|)/n, δ).

Proof. Since the noise in broadcast-PULL turns each displayed message into any given message of Σ′ with probability at least δ, regardless of the previous history and of η ∈ {0, 1}, at all times t we have δ ≤ p_η(m | x^(<t)) for every m ∈ Σ′. Consider a message m ∈ Σ \ Σ′ (if such a message exists). By definition, such a message could only be received by observing a non-source agent. But given the same history x^(<t), the same sequence of random bits r, and the same initial knowledge, the behavior of a non-source agent is the same, no matter what the correct opinion η is. Hence, for m ∈ Σ \ Σ′ we have ε(m, x^(<t)) = 0. It remains to show that Σ_{m∈Σ} |ε(m, x^(<t))| ≤ 2(1 − δ|Σ|)s/n. Let us consider two executions of the rumor spreading protocol, with the same neutral initial configuration, the same shared sequence of random bits r, and the same set of sources, except that in the first the correct opinion is 0 while in the other it is 1. Let us condition on the history of observations x^(<t) being the same in both processes.
As mentioned, given the same history x^(<t), the behavior of a non-source agent is the same, regardless of the correct opinion η. It follows that the difference in the probability of observing any given message is only due to the event that a source is observed. Recall that the number of sources is s. Therefore, the probability of observing a source is s/n, and we may write as a first approximation |ε(m, x^(<t))| ≤ s/n. However, we can be more precise: ε(m, x^(<t)) is in fact slightly smaller than s/n in absolute value, because the noise can still affect the message of a source.
We may interpret ε(m, x^(<t)) as the following difference. For a source v ∈ S, let m_η^v be the message displayed by v assuming the given history x^(<t) and that the correct opinion is η ∈ {0, 1} (the message m_η^v is deterministically determined given the sequence r of random bits, the neutral initial configuration, the parameters of the system, and the identity of v). Let α_{m′,m} be the probability that the noise transforms a message m′ into a message m. Then

ε(m, x^(<t)) = (1/n) Σ_{v∈S} (α_{m_1^v, m} − α_{m_0^v, m}).     (2)

By the definition of the noise criterion, each of the distributions α_{m_η^v, ·} assigns probability at least δ to every message it can produce. Thus, to bound the right hand side in (2), we can use the following claim (proven in Proof of Claim 10).

Claim 10. Let P and Q be two distributions over a universe Σ such that for any element m ∈ Σ, δ ≤ P(m), Q(m) ≤ 1 − δ. Then Σ_{m∈Σ} |P(m) − Q(m)| ≤ 2(1 − δ|Σ|).

Applying Claim 10 to the distributions α_{m_1^v, ·} and α_{m_0^v, ·} of each source v, and summing over the s sources in (2), yields Σ_{m∈Σ} |ε(m, x^(<t))| ≤ 2(1 − δ|Σ|)s/n, as desired.
Thanks to Claims 8 and 9, Theorem 6 regarding the broadcast-PULL model becomes a direct consequence of Theorem 7 on the adaptive coin distinguishing task, taking ε = 2s(1 − δ|Σ|)/n. More precisely, the assumption that s(1 − δ|Σ|)/(δn) is at most a small enough constant c ensures that ε/δ ≤ 2c < 1/10, as required by Theorem 7. The lower bound Ω(δ/ε²) then corresponds to Ω( n²δ / (s²(1 − δ|Σ|)²) ). This concludes the proof of Theorem 6. To establish our results it remains to prove Theorem 7.

Proof of Theorem 7
We start by recalling some facts from Hypothesis Testing.
We use the notation log(·) to denote the base-2 logarithm, i.e., log_2(·), and for a probability distribution P, we use the notation P(x) as a shorthand for P(X = x). First, let us recall two standard notions of (pseudo) distance between probability distributions. Given two discrete distributions P_0, P_1 over a probability space Ω with the same support, the total variation distance is defined as

d_TV(P_0, P_1) = (1/2) Σ_{x∈Ω} |P_0(x) − P_1(x)|,

and the Kullback-Leibler divergence is defined as

KL(P_0, P_1) = Σ_{x∈Ω} P_0(x) log( P_0(x) / P_1(x) ).

The assumption that the support is the same is not necessary, but it is sufficient for our purposes and is thus made for simplicity's sake.
The following lemma shows that, when trying to discriminate between distributions P 0 , P 1 , the total variation relates to the smallest error probability we can hope for.
Lemma 11 (Neyman-Pearson [17, Lemma 5.3 and Proposition 5.4]). Let P_0, P_1 be two distributions. Let X ∈ Ω be a random variable of distribution either P_0 or P_1. Consider a (possibly probabilistic) mapping f : Ω → {0, 1} that attempts to "guess" whether the observation X was drawn from P_0 (in which case it outputs 0) or from P_1 (in which case it outputs 1). Then, we have the following lower bound:

(1/2) (P_0(f(X) = 1) + P_1(f(X) = 0)) ≥ (1 − d_TV(P_0, P_1))/2.

The total variation distance is related to the KL divergence by the following inequality.

Lemma 12 (Pinsker's inequality). For any two distributions P_0, P_1 with the same support,

d_TV(P_0, P_1) ≤ sqrt( (ln 2 / 2) · KL(P_0, P_1) ).
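For small discrete distributions both quantities are straightforward to compute, and the Pinsker-type inequality d_TV ≤ sqrt((ln 2 / 2) · KL) (with KL in bits) can be checked numerically; the distributions below are illustrative choices of ours.

```python
import math

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl(p, q):
    """Kullback-Leibler divergence in bits (base-2 logarithm)."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

# Numeric check of d_TV <= sqrt((ln 2 / 2) * KL) on a pair of slightly
# perturbed three-symbol distributions.
p = [0.3, 0.3, 0.4]
q = [0.32, 0.29, 0.39]
assert tv(p, q) <= math.sqrt(math.log(2) / 2 * kl(p, q))
```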
We are now ready to prove the theorem.
Proof of Theorem 7. Let us define P_η(·) = P(· | "correct type is η") for η ∈ {0, 1}. We denote by P_η^(≤t), η ∈ {0, 1}, the two possible distributions of X^(≤t). We refer to P_0^(≤t) as the distribution of type 0 and to P_1^(≤t) as the distribution of type 1. Furthermore, we define the correct type of a sequence of observations X^(≤t) to be 0 if the observations are sampled from P_0^(≤t), and 1 if they are sampled from P_1^(≤t).
After t observations x^(≤t) = (x^(1), . . . , x^(t)) we have to decide whether the distribution is of type 0 or of type 1. Our goal is to maximize the probability of guessing the type of the distribution after observing X^(≤t), which means that we want to minimize

P( f(X^(≤t)) ≠ "correct type" ) = Σ_{η∈{0,1}} P_η( f(X^(≤t)) = 1 − η ) · P("correct type is η").     (3)

Recall that the correct type is either 0 or 1 with probability 1/2. Thus, the error probability described in (3) becomes

(1/2) ( P_0( f(X^(≤t)) = 1 ) + P_1( f(X^(≤t)) = 0 ) ).     (4)

By combining Lemmas 11 and 12 with X = X^(≤t) and P_η = P_η^(≤t) for η = 0, 1, we get the following theorem. Although for convenience we think of f as a deterministic function, it could in principle be randomized (see A remark about random guess functions, in the Appendix).
Theorem 13. Let f be any guess function. Then

P( f(X^(≤t)) ≠ "correct type" ) ≥ (1/2) ( 1 − sqrt( (ln 2 / 2) · KL(P_0^(≤t), P_1^(≤t)) ) ).     (5)

Theorem 13 implies that for the probability of error to be small, the term KL(P_0^(≤t), P_1^(≤t)) must be large. Our next goal is therefore to show that in order to make this term large, t must be large.
Note that P_η^(≤T) for η ∈ {0, 1} cannot be written as the mere product of the marginal distributions of the X^(t)'s, since the observations at different times may not necessarily be independent. Nevertheless, by the chain rule for the KL divergence, we can still express KL(P_0^(≤T), P_1^(≤T)) as a sum of conditional divergences:

KL(P_0^(≤T), P_1^(≤T)) = Σ_{t=1}^{T} Σ_{x^(<t)} P_0(x^(<t)) Σ_{m∈Σ} p_0(m | x^(<t)) log( p_0(m | x^(<t)) / p_1(m | x^(<t)) ).     (6)

Since we are considering an instance of ACDT(ε, δ), we have
• for every m ∈ Σ such that ε(m, x^(<t)) ≠ 0, it holds that δ ≤ p_η(m | x^(<t)), and
• Σ_{m∈Σ} |ε(m, x^(<t))| ≤ ε.
We make use of these facts to upper bound the KL divergence terms on the right hand side of (6), as follows (in the interest of readability, we omit the dependency of p_η and ε on the past observations x^(<t) ∈ Σ^(t−1)).
Recall that we assume ε/δ ≤ 1/10 in the last part of the statement. We make use of the following claim, which follows from the Taylor expansion of log(1 + u) around 0. More details can be found in the Appendix, Proof of Claim 14.

Claim 14. Let x ∈ [−a, a] for some a ∈ (0, 1). Then |log(1 + x) − x/ln 2| ≤ x²/(2(1 − a) ln 2).

Using Claim 14 with a = ε/δ, we can bound the inner sum appearing in (6). Writing p(m) = p_0(m | x^(<t)) and ε(m) = ε(m, x^(<t)) for short, and noting that |ε(m)/p(m)| ≤ ε/δ whenever ε(m) ≠ 0, we get

Σ_{m∈Σ} p(m) log( p(m) / (p(m) + ε(m)) ) ≤ −(1/ln 2) Σ_{m∈Σ} ε(m) + (1/(2(1 − ε/δ) ln 2)) Σ_{m∈Σ} ε(m)²/p(m).     (8)

Since Σ_m |ε(m)| ≤ ε, we also have Σ_m ε(m)² ≤ ε². The latter bound, together with the fact that p(m) ≥ δ whenever ε(m) ≠ 0, yields

Σ_{m∈Σ} ε(m)²/p(m) ≤ ε²/δ.     (9)

Recall that Σ_m ε(m) = 0, since p_0(· | x^(<t)) and p_1(· | x^(<t)) are both probability distributions; thus the first term in (8) disappears. Hence, substituting the bound (9) in (8), the inner sum in (6) is at most

W(ε, δ) := ε² / (2δ(1 − ε/δ) ln 2).     (11)

Substituting this bound in (6), we get KL(P_0^(≤T), P_1^(≤T)) ≤ T · W(ε, δ), and combining with (5), we can finally conclude that for any integer T, the error under a uniform prior on the type, as defined in (4), is at least

(1/2) ( 1 − sqrt( (ln 2 / 2) · T · W(ε, δ) ) ).

Hence, the number of samples T needs to be greater than 2/(9 ln 2 · W(ε, δ)) to allow the possibility that the error be less than 1/3.
In particular, if we assume that 10ε < δ, then 1 − ε/δ ≥ 9/10, and we can bound W(ε, δ) ≤ 5ε²/(9δ ln 2).
It follows that the number of necessary samples is at least 2δ/(5ε²) = Ω(δ/ε²). This completes the proof of Theorem 7 and hence of Theorem 6.
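The heart of the argument above is that one noisy sample contributes at most on the order of ε²/δ bits of divergence. This can be checked numerically on a small example; the distributions below are illustrative choices of ours, with all relevant entries at least δ and total mass difference ε.

```python
import math

def kl_bits(p, q):
    """KL divergence in bits between two discrete distributions."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

# Numeric sanity check of the per-sample bound KL = O(eps^2 / delta):
# perturb a distribution whose entries are all >= delta by total mass eps.
delta, eps = 0.2, 0.02
p0 = [0.2, 0.3, 0.5]
p1 = [0.2 + eps / 2, 0.3 - eps / 2, 0.5]   # sum of |p1 - p0| equals eps
assert kl_bits(p0, p1) <= eps ** 2 / delta
```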

Detectable sources
In this section, we aim to prove the following result. Set T = ( n²δ / (s²(1 − δ|Σ|)²) )^(1/3).
• Consider the sequential-PULL model. Assume that sT ≥ C log n, for a large enough constant C. Any rumor spreading protocol cannot converge in fewer than Ω(T) time steps.
• Let k be an integer. Assume that sT/k ≥ C log n, for a large enough constant C. Any rumor spreading protocol in the parallel-PULL(k) model cannot converge in fewer than Ω(T/k) rounds.
Proof. Let us start with the first item, namely the lower bound in the sequential-PULL model. For any step t, let S(t) denote the set of sources together with the agents that have directly observed at least one of the sources at some point up to time t. We have S = S(0) ⊆ S(1) ⊆ S(2) ⊆ . . .. The size of the set S(t) is a random variable which is expected to grow at a moderate speed. Specifically, letting s′ = (11/10) · s · T, we obtain the following.

Claim 15. With probability at least 1 − n^(−10), we have |S(t)| ≤ s′ for every t ≤ T.

Indeed, |S(T)| − s is dominated by the number of observations of sources, which is a sum of n · T independent Bernoulli variables with parameter s/n; in other terms, it is a binomial variable with probability s/n and T · n trials. By a standard Chernoff bound, the probability that it deviates by a multiplicative factor 11/10 from its mean s · T is less than exp(−Ω(sT)) ≤ n^(−10). The last bound holds because we assume sT ≥ C log n for some large enough constant C.
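The concentration step in Claim 15 can be checked empirically; the parameters below are small illustrative values of ours, not those of the statement.

```python
import random

# Empirical check: the number of source observations among n*T samples is
# Binomial(n*T, s/n), and rarely exceeds (11/10) * s * T.
rng = random.Random(1)
n, s, T = 1000, 20, 50
p = s / n
trials, exceed = 30, 0
for _ in range(trials):
    hits = sum(1 for _ in range(n * T) if rng.random() < p)  # binomial draw
    exceed += hits > 1.1 * s * T
assert exceed / trials < 0.05
```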
Denote by E the event that |S(t)| ≤ s′ for every t ≤ T. Using Claim 15, we know that P(E) ≥ 1 − n^(−10). Our next goal is to prove that the probability ρ that a given agent correctly guesses the correct opinion is low for any given time t ≤ cT, where c is a small constant. For this purpose, we condition on the highly likely event E. Removing this conditioning will amount to adding a negligible term (of order at most n^(−10)) to ρ.
In order to bound ρ, we would like to invoke Theorem 6 with the number of sources upper bounded by s′. Let us explain why it applies in this context. To begin with, we may adversarially assume (from the perspective of the lower bound) that all agents in S(t) learn the value of the correct bit to spread. Thus, they essentially become "sources" themselves. In this case the number of sources varies with time, but the proof of Theorem 6 can easily be shown to cover this case, as long as the parameter s there (i.e., s′ here) is an upper bound on the number of sources at all times. We can therefore safely apply Theorem 6 with s′. By the choice of T,

n²δ / (s′²(1 − δ|Σ|)²) = Θ(T).

Hence, we can set c to be a sufficiently small constant such that for all times t ≤ cT, the probability of guessing correctly, even in this adversarial scenario, is less than 1/3. In other words, we have ρ ≤ 1/3. All together, this yields a lower bound of Ω(T) on the convergence time.
As for the parallel-PULL(k) model, the argument is similar. After T′ = T/k parallel rounds, using a claim similar to Claim 15, we have that with high probability, at most O(ksT′) agents have directly observed one of the s sources by round T′. Applying Theorem 6 with s′ = O(ksT′) = O(sT) yields a lower bound (in terms of samples in the broadcast model) of

Θ( n²δ / (s′²(1 − δ|Σ|)²) ) = Θ( n²δ / (s²T²(1 − δ|Σ|)²) ) = Θ(T).

The last equality follows by the choice of T. Hence T is a lower bound on the number of samples, which is attained in T′ = T/k rounds of the parallel-PULL(k) model.