
## Abstract

The neuronal code arising from the coordinated activity of grid cells in the rodent entorhinal cortex can uniquely represent space across a large range of distances, but the precise conditions for optimal coding capacity are known only for environments with finite size. Here we consider a coding scheme that is suitable for unbounded environments, and present a novel, number theoretic approach to derive the grid parameters that maximise the coding range in the presence of noise. We derive an analytic upper bound on the coding range and provide examples for grid scales that achieve this bound and hence are optimal for encoding in unbounded environments. We show that in the absence of neuronal noise, the capacity of the system is extremely sensitive to the choice of the grid periods. However, when the accuracy of the representation is limited by neuronal noise, the capacity quickly becomes more robust against the choice of grid scales as the number of modules increases. Importantly, we found that the capacity of the system is near optimal even for random scale choices, already for a realistic number of grid modules. Our study demonstrates that robust and efficient coding can be achieved without parameter tuning in the case of grid cell representation and provides a solid theoretical explanation for the large diversity of the grid scales observed in experimental studies. Moreover, we suggest that having multiple grid modules in the entorhinal cortex is not only required for the exponentially large coding capacity, but is also a prerequisite for the robustness of the system.

## Author summary

Navigation in natural, open environments poses serious challenges to animals, as the distances to be represented may span several orders of magnitude and are potentially unbounded. The recently discovered grid cells in the rodent brain are thought to play a crucial role in generating unique representations for a large number of spatial locations. However, it is unknown how to choose the parameters of the grid cells to achieve maximal capacity, i.e., to uniquely encode the largest possible number of locations in an open environment. In our manuscript, we demonstrate the surprising robustness of the grid cell coding system: the population code realised by grid cells is close to optimal for unique space representation irrespective of the choices of grid parameters. Thus, our study reveals a remarkable robustness of the grid cell coding scheme and provides a solid theoretical explanation for the large diversity of the grid scales observed in experimental studies.

**Citation:** Vágó L, Ujfalussy BB (2018) Robust and efficient coding with grid cells. PLoS Comput Biol 14(1): e1005922. https://doi.org/10.1371/journal.pcbi.1005922

**Editor:** Jakob H. Macke, Stiftung caesar, GERMANY

**Received:** March 13, 2017; **Accepted:** December 8, 2017; **Published:** January 8, 2018

**Copyright:** © 2018 Vágó, Ujfalussy. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the paper.

**Funding:** B.B.U. and L.V. were supported by the National Brain Research Program of Hungary (KTIA-NAP-12-2-201). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Optimising neuronal systems for efficient processing and representation of information is a key principle for both understanding and designing neuronal circuits [1], but deciding whether a particular neuronal phenomenon reflects an optimisation process is often difficult. Grid cells in the medial entorhinal cortex have been suggested to represent the spatial location of the animal near optimally through their spatially periodic firing fields [2, 3, 4, 5]. However, it has remained controversial whether the efficiency of the grid cell code is the result of the precise tuning of the grid parameters [6, 7, 8] or whether the performance of the system is relatively insensitive to the actual parameter settings [4, 5, 9].

Grid cells are spatially tuned neurons with multiple firing fields organised along the vertices of a triangular grid (Fig 1a; [10, 11]). Grid cells of any particular animal are organised into functional modules [12, 13]: cells within a module share the same grid scale and orientation, but differ in the location of their firing fields, i.e., their preferred firing phase within the grid period (Fig 1a). Modules form the functional units of the grid representation: the joint activity of all (possibly hundreds of) cells within each module is captured by the (two-dimensional) phase of the given module (Fig 1b; [14, 15]), and the relationship between different cells from the same module remains stable across different environments [16], during sleep [17, 18] or after environmental distortions [13]. A given spatial location is represented by the phases of the different modules (‘phase vector’). The representations are unique up to a critical distance above which the coding becomes ambiguous: the phase vectors, and hence the firing rates of all grid cells, become (nearly) identical at two separate physical locations (Fig 1c).

**(a)** Schematic firing fields (circles) of two-dimensional grid cells as a function of spatial position. Grid cells are organised into modules: cells from the same module share the orientation and scale parameters but differ in their spatial phase (top, shades of purple). Different modules have different scales and orientations (top to bottom). **(b)** Grid cell spikes encode the phase of a module. Spiking of grid cells (black ticks; each spike is shown three times, at the maxima of the cells’ firing rate) from a single module represents the movement of the animal (light-blue line) in a one-dimensional environment. Since the firing rate of the cells (right, olive) is periodic, the position (left: colormap, right: black) represented by the phase of the module is also periodic. The uncertainty of the representation fluctuates over time around a typical value, *δ* (right). **(c)** Grid cell coding schemes. The location of the animal (origin, filled arrow) is jointly encoded by the phases of the different modules in both nested (top) and modulo arithmetic (bottom) codes. Grey and empty arrowheads indicate locations with large or catastrophic interference between the modules, respectively.

Depending on the magnitude of the critical distance compared to the largest grid scale, two complementary coding schemes have been proposed for grid cells (Fig 1c): In nested coding [4, 6, 8] smaller grid modules iteratively refine the position coding of larger modules and the modules span a wide range of scales. The capacity of nested codes, defined as the ratio of the coding range and the resolution, is exponential in the number of modules. Maximal capacity can be achieved by setting the coding range equal to the maximal grid period and then optimising the resolution by a geometric progression of the grid scales [6]. When the total capacity is utilised to encode locations within the maximal grid scale, catastrophic interference will cause ambiguity in the grid code beyond this distance (Fig 1c).

When the coding is not optimised for a fixed range, the unique combination of the activity of grid modules can encode a potentially unbounded range that can be substantially larger than the scale of the largest module using a modulo arithmetic (MA) code [2, 3, 14] (Fig 1c). In this case the grid periods can be similar in magnitude (e.g., co-prime integers or a geometric progression with a relatively small ratio). However, it is not known under what conditions the MA coding system can achieve exponential capacity [3, 14], and how robust the capacity is against the choice of the grid periods or neuronal noise.

Here we develop a novel approach to study the capacity of the grid coding system that is based on Diophantine approximations, i.e., approximation of real numbers by rational numbers. First, we apply the technique to study coding with two grid modules. We show that the capacity of the system is extremely sensitive to the number theoretic properties of the scale ratio between the modules. Next, we generalise our approach to the case of multiple modules, and show both analytically and numerically that the exponential capacity of the grid cell coding system can be achieved using the MA coding scheme. Finally, we demonstrate that when the coding range is constrained by neuronal noise, the capacity of the system is extremely robust against the choices of the scaling of the modules.

## Results

In the first section of the Results we briefly introduce the terminology and define the concepts used throughout the paper. We include this section for completeness, although several ideas presented in this section have been described before, e.g., in [3, 4].

We investigate grid cell population codes along a linear trajectory, as the one-dimensional results extend to two (or higher) dimensions without difficulty, at least for axis-aligned grid modules (Methods) [3, 4]. The periodic population activity of module *i* can be summarised by its spatial phase

$$\psi_i(x) = \frac{x}{\alpha_i} \bmod 1,$$

which depends on the position (*x*) and the scale of the module (*α*_{i}, Methods) with *α*_{0} = 1, which means that distances are expressed in units of the smallest grid period. We assume, without loss of generality, that at the spatial origin all modules are in their 0 phase. Spikes of the neurons in module *i* represent the spatial location of the animal with a maximum error *δ* *α*_{i} (i.e., with the same phase error, 0.01 ≤ *δ* ≤ 0.2, for all modules; Methods; [8]). Ambiguity occurs if the phase difference *ϵ* between the modules is smaller than *δ* at integer distance *ℓ* from the origin (Figs 1c and 2b, inset, Methods):

$$\epsilon(\ell) = \max_{i} \left\| \frac{\ell}{\alpha_i} \right\| < \delta, \tag{1}$$

where ||*ψ*|| denotes the distance of *ψ* from the nearest integer.
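The phase code and the ambiguity test of Eq 1 are easy to state in code. The sketch below is a minimal illustration, not the authors' implementation: it assumes a one-dimensional track, scales given in units of the smallest period (*α*_{0} = 1), phases *ψ*_{i}(*x*) = (*x*/*α*_{i}) mod 1, and the simplified threshold of Eq 1. All helper names are hypothetical.

```python
import numpy as np

def module_phases(x, scales):
    """Phase psi_i(x) = (x / alpha_i) mod 1 of every module at position x."""
    return np.mod(x / np.asarray(scales, dtype=float), 1.0)

def dist_to_int(y):
    """||y||: distance of y from the nearest integer."""
    y = np.asarray(y, dtype=float)
    return np.abs(y - np.round(y))

def phase_difference(ell, scales):
    """epsilon(ell) of Eq 1: the largest ||ell / alpha_i|| over all modules.
    The reference module (alpha_0 = 1) is exactly in phase 0 at integer ell,
    so ell is confused with the origin only if every module is near phase 0."""
    return float(np.max(dist_to_int(ell / np.asarray(scales, dtype=float))))

def is_ambiguous(ell, scales, delta):
    """Simplified ambiguity criterion of Eq 1: epsilon(ell) < delta."""
    return phase_difference(ell, scales) < delta

# Fig 2a example: with scales 1 and 3/2 both modules return to phase 0 at distance 3.
print(module_phases(3.0, [1.0, 1.5]), is_ambiguous(3, [1.0, 1.5], delta=0.01))
```

Using the maximum over modules encodes the fact that a single module far from phase 0 is enough to tell the two locations apart.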

**(a)** Interference with rational scale ratio. Left: Representative posteriors (*P*(*x*|**s**)) for two modules with scale 1 and *α* = 3/2. Encoding becomes ambiguous at distance 3 from the origin, where perfect interference occurs (3 = 2*α*). Right: Phase plot of the two modules, with the colour (red to blue) encoding the distance from the origin (see the coloured line below the left panel). Perfect interference occurs when the phase curve overlaps with itself. **(b)** Interference with *α* = 1.76 …, which is close to 7/4 and therefore leads to strong interference at distance 7. Right: Interference occurs when the distance between two neighbouring segments of the phase curve becomes smaller than the limit set by the neuronal noise (grey squares of side *δ* around the origin, see inset). Note that both grids are around phase 0.3 at distance 2.3 without interference. **(c)** Interference with *α* = *σ* ≈ 1.618, the golden ratio. Interference still becomes stronger at larger distances (e.g., at distance 5, since 5/*σ* ≈ 3.09 is close to an integer). Interference in grid codes is related to the approximation of irrationals with rational numbers having small denominators (see text for further details). Right: Interference is inevitable since the phase space has a limited volume.

To analyse the coding properties of the grid cell system, we follow the same three logical steps both in the two module and in the multi-module case (Fig 3). First, we show the existence of an upper bound on how the maximal phase difference *ϵ*(*ℓ*) between the modules decreases with the distance. Intuitively, this upper bound expresses the fact that interference between the modules necessarily becomes stronger at larger distances. Second, we demonstrate that for appropriately chosen scale ratios a lower bound on the phase difference also exists and is parallel with the upper bound (Fig 3). For these scale choices catastrophic interference is avoided until a critical distance, that depends on the noise level in the system. Importantly, the slope of the bounds depends only on the number of modules, but not on the choice of the scale parameters. Therefore the efficiency of the scale choices (the magnitude of the critical distance) can be characterised by the offset parameter, *c*_{α} (defined below), associated with the lower bound. Thus, our third step is to estimate *c*_{α} for various choices of the scale parameter *α*.

First, we provide an absolute upper bound on the phase difference as a function of the distance, which is linear on a log-log scale (left). Second, we show that for certain scales a lower bound also exists (middle). Third, we characterise the efficiency of the scales (*α*) by their offset, *c*_{α} (right).

Our analytic derivations provide an estimate for the asymptotic performance of the system that is valid in the low-noise limit. The main advantage of our approach is that it provides strict bounds on the achievable coding efficiency that can be used as a metric to evaluate the efficiency of different scale choices at realistic noise levels, two or more modules alike. As we found using numerical simulations, these bounds can be approached with random scale choices at realistic levels of noise and number of modules.

### Coding is extremely sensitive to the scale ratio with two modules

We can formalise the problem of interference between two modules as having a pair of integers *k* and *ℓ* with *ℓ* ≈ *kα*, meaning that module 2 (with scale *α*) is close to being in phase 0 at distance *ℓ*, which would cause ambiguity between the coding of the spatial point *ℓ* and the origin. This is formally identical to the number theoretic question of the approximability of the scale *α* ≈ *ℓ*/*k* with rationals having numerator *ℓ*, also known as Diophantine approximations (Fig 2b and 2c).
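Diophantine approximations are classically computed with continued fractions. The sketch below uses that standard tool as an illustration (it is not necessarily the technique of the Methods; all names are hypothetical): it expands 1/*α* into a continued fraction, lists the denominators of its convergents, and compares them with a brute-force search for the integer distances where ||*ℓ*/*α*|| reaches a new minimum, i.e., where interference between the two modules becomes the strongest so far. For the examples tested (√2 and the golden ratio) the two lists coincide.

```python
import math

def continued_fraction(x, max_terms=40):
    """Partial quotients [a0; a1, a2, ...] of x. Floating point limits accuracy,
    so only the leading terms are reliable."""
    terms = []
    for _ in range(max_terms):
        a = math.floor(x)
        terms.append(a)
        frac = x - a
        if frac < 1e-12:              # numerically rational: the expansion ends
            break
        x = 1.0 / frac
    return terms

def convergent_denominators(x, ell_max):
    """Distinct denominators q_n <= ell_max of the convergents of x,
    via q_n = a_n * q_{n-1} + q_{n-2} with q_{-1} = 0 and q_{-2} = 1."""
    q_prev2, q_prev1 = 1, 0
    denoms = []
    for a in continued_fraction(x):
        q = a * q_prev1 + q_prev2
        if q > ell_max:
            break
        if not denoms or q != denoms[-1]:
            denoms.append(q)
        q_prev2, q_prev1 = q_prev1, q
    return denoms

def record_low_distances(alpha, ell_max):
    """Integer distances at which ||ell / alpha|| reaches a new minimum (brute force)."""
    best, records = math.inf, []
    for ell in range(1, ell_max + 1):
        y = ell / alpha
        err = abs(y - round(y))
        if err < best:
            best = err
            records.append(ell)
    return records

print(convergent_denominators(1 / math.sqrt(2), 100))
print(record_low_distances(math.sqrt(2), 100))
```

The continued fraction finds the dangerous distances directly, without scanning every integer, which is why it is a natural tool for this kind of analysis.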

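Hurwitz’s theorem [19, 20] states that for all irrational numbers *α* > 1 there are infinitely many pairs of relatively prime integers *k*, *ℓ* such that the error of the approximation, defined as

$$\epsilon(\ell) = \left| \frac{\ell}{\alpha} - k \right|, \tag{2}$$

is smaller than the upper bound:

$$\epsilon(\ell) < \frac{1}{\sqrt{5}\,\ell}. \tag{3}$$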
Note that the approximation error *ϵ*(*ℓ*) (Eq 2) is the same as the phase difference between the modules, defined in Eq 1, since ||*ψ*_{2}(*ℓ*)|| = ||*ℓ*/*α*|| = |*k* − *ℓ*/*α*| for an appropriately chosen integer *k*, namely the integer nearest to *ℓ*/*α* (Fig 2b and 2c). We call *ϵ*(*ℓ*) the ‘approximation error’ only when we are talking about approximating irrationals with integer ratios, while in the context of grid cells we call *ϵ*(*ℓ*) the ‘phase difference’.

Applied to grid cells, Hurwitz’s theorem provides an upper bound on how the phase difference between the modules shrinks with the distance. Specifically, the theorem states that there are infinitely many integer distances *ℓ* where the phase difference is smaller than 1/(√5 *ℓ*) ∝ 1/*ℓ* (Fig 4a–4c, dashed lines), implying that in the long run interference cannot be avoided no matter how carefully we choose *α*. This is a fundamental upper bound on the efficiency of coding with grid cells.

**(a-c)** The phase difference as a function of distance for different values of *α*. The zigzag line indicates the phase difference (PD) at all integer distances; circles indicate record low PD. Dashed line shows the theoretical upper bound of the PD, solid line shows the numerical fit on the lower bound (allowing finitely many exceptions at low *ℓ*). Note that the lower and the upper bound coincide in b. Also note the 1/*ℓ* scaling of PD for algebraic scale ratios (b-c). Grey shading indicates the range of PD smaller than the neuronal noise, (1 + *α*)*δ*. **(d)-(f)** The scaled phase difference ($\tilde{\epsilon}(\ell) = \alpha\ell\,\epsilon(\ell)/(1+\alpha)$, cf. Eq 7) for different distances and scale ratios. The highest constant below which only finitely many values of $\tilde{\epsilon}(\ell)$ fall, all at small distances, estimates *ĉ*_{α}. The value of *ĉ*_{α} is slightly higher for the golden ratio (e) than for the algebraic ratio in (f), and much larger than for the non-algebraic number in (d).

The critical distance where the phase difference *ϵ*(*ℓ*) leads to interference, and the representation of the position becomes ambiguous, depends on the noisiness of the two modules, *δ* and *αδ*. Interference occurs if there is a spatial point *x* for which both |*x* − *kα*| < *αδ* and |*x* − *ℓ*| < *δ* for integers *k*, *ℓ*, or equivalently, if |*kα* − *ℓ*| < (1 + *α*)*δ*. Hence, by the definition of *ϵ*(*ℓ*) (Eq 2) the coding is ambiguous near *ℓ* if and only if
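$$\epsilon(\ell) < \frac{(1+\alpha)\,\delta}{\alpha}. \tag{4}$$

Therefore, no matter how we choose *α*, we can expect ambiguity at distances *ℓ* from the origin where the noise in the system exceeds the upper bound on efficiency provided by Eq 3, i.e.

$$\frac{(1+\alpha)\,\delta}{\alpha} > \frac{1}{\sqrt{5}\,\ell} \quad\Longleftrightarrow\quad \ell > \frac{\alpha}{\sqrt{5}\,(1+\alpha)\,\delta}, \tag{5}$$

that is, at distances of order 1/*δ*. Consequently, it is impossible to code position with two modules better than this bound.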

The question arises then whether the above theoretical bound is achievable, at least for some appropriately chosen *α*. The answer is yes: the upper bound in Eq 3 is sharp for the golden ratio *σ* = (1 + √5)/2 ≈ 1.618. Practically, this also introduces a limit on *ϵ*(*ℓ*), saying that the phase difference between the modules always remains larger than a specific lower bound:

$$\epsilon(\ell) > \frac{1}{(\sqrt{5} + \varepsilon)\,\ell}, \tag{6}$$

except for a couple of small distances, even for arbitrarily small *ε* > 0 [19] (Fig 4b).

It may sound strange that there are finitely many exceptions, but in our simulations we found only a few instances with *ℓ* being small (Fig 4e). Therefore, if the ratio of the two grid modules equals the golden ratio then the phase difference between the two modules is guaranteed to be larger than the lower bound defined by Eq 6. Since *ε* can be arbitrarily small, the lower bound for the golden ratio approaches the theoretical upper bound (Eq 3) and *σ* is an optimal choice for the scale ratio to avoid interference in case of two modules. To give a geometric picture, the golden ratio guarantees approximately uniform coverage of the phase space for both short and arbitrarily long distances (Fig 2c, right).
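The golden-ratio statements (Eqs 3 and 6) can be checked numerically. In this minimal sketch (hypothetical helper names), *ℓ*·||*ℓ*/*σ*|| is computed for all integer distances up to 1000. Values below 1/√5 occur only at every other Fibonacci number (where the Hurwitz bound of Eq 3 is met), while apart from the exceptional distance *ℓ* = 1 the product stays above 1/(√5 + *ε*), here for the illustrative choice *ε* = 0.1.

```python
import math

SIGMA = (1 + math.sqrt(5)) / 2       # golden ratio
HURWITZ = 1 / math.sqrt(5)           # constant in Eq 3

def scaled_error(ell, alpha):
    """ell * ||ell / alpha||, the phase difference of Eq 2 multiplied by ell."""
    y = ell / alpha
    return ell * abs(y - round(y))

def hurwitz_distances(alpha, ell_max):
    """Distances ell <= ell_max at which the Hurwitz bound of Eq 3 holds."""
    return [ell for ell in range(1, ell_max + 1)
            if scaled_error(ell, alpha) < HURWITZ]

def min_scaled_error(alpha, ell_min, ell_max):
    """Smallest ell * ||ell / alpha|| for ell_min <= ell <= ell_max (cf. Eq 6)."""
    return min(scaled_error(ell, alpha) for ell in range(ell_min, ell_max + 1))

print(hurwitz_distances(SIGMA, 1000))
print(min_scaled_error(SIGMA, 2, 1000), 1 / (math.sqrt(5) + 0.1))
```

Because the Hurwitz hits keep occurring while the minimum stays bounded away from zero, the upper and lower bounds pinch the golden ratio from both sides, which is the sense in which *σ* is optimal for two modules.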

However, it turns out that there are many good choices [20]: for any algebraic number *α* of order 2 (i.e., an irrational which is a root of a polynomial of degree 2 with integer coefficients, see Methods) there exists a maximal positive constant *c*_{α} > 0 such that

$$\epsilon(\ell) > \frac{c_\alpha\,(1+\alpha)}{\alpha\,\ell} \tag{7}$$

holds except for a couple of small distances (Fig 4c–4f, [21]). Hence, from Eqs 4 and 7 we see that the representation is unambiguous whenever *c*_{α}(1 + *α*)/(*α* *ℓ*) > *δ*(1 + *α*)/*α*, that is, up to

$$L_{\max} = \frac{c_\alpha}{\delta} \tag{8}$$

for all sufficiently small *δ*. This last condition on the magnitude of the noise is only needed to exclude the possible exceptional small distances *ℓ* in Eqs 6 and 7, which in practice is not a crucial condition (Fig 4d and 4e).
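To make Eqs 4 and 8 concrete, here is a minimal sketch (hypothetical helper names, brute-force search over integer distances) that returns the first integer distance at which the two-module code becomes ambiguous, i.e., the first *ℓ* with ||*ℓ*/*α*|| < (1 + *α*)*δ*/*α*. It assumes, as in the text, that ambiguity is only checked at integer distances from the origin.

```python
import numpy as np

def phase_difference(ell, alpha):
    """Eq 2: distance of ell / alpha from the nearest integer."""
    y = np.asarray(ell, dtype=float) / alpha
    return np.abs(y - np.round(y))

def critical_distance(alpha, delta, ell_max=10**6):
    """First integer distance at which Eq 4 signals ambiguity, i.e.
    phase_difference(ell) < (1 + alpha) * delta / alpha; None if none <= ell_max."""
    ell = np.arange(1, ell_max + 1)
    hits = np.flatnonzero(phase_difference(ell, alpha) < (1 + alpha) * delta / alpha)
    return int(ell[hits[0]]) if hits.size else None

golden = (1 + np.sqrt(5)) / 2
for alpha in (1.5, golden):
    L = critical_distance(alpha, 0.01)
    print(f"alpha={alpha:.4f}  L_max={L}  L_max*delta={L * 0.01:.3f}")
```

For *δ* = 0.01 this reports *L*_{max} = 3 for *α* = 3/2 (perfect interference at 3 = 2*α*) and *L*_{max} = 34 for the golden ratio, the first Fibonacci number whose phase difference drops below the noise threshold.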

The constant *c*_{α} is the single parameter that determines the critical distance up to which encoding is unique (coding range) as well as the information rate of the system (Methods). Therefore, we use *c*_{α} to compare the efficiency of different choices of *α* (Fig 4d–4f). We have already noted that for the golden ratio the lower and the upper bounds (Eqs 3 and 7) coincide (Fig 4b), but the critical distance may be larger for some *α* even if the corresponding lower bound on the phase difference is weaker, since the upper bound also depends on *α* (Eq 5).

We estimated the value of *c*_{α} for various scale ratios at different noise levels (Methods). Unlike for algebraic numbers, the estimate *ĉ*_{α} for other real numbers depends on the distance range used for the estimation, which we controlled by setting different intervals for *δ* in the simulations.

Our simulations confirmed that *σ* is the best scale ratio choice in the case of two modules, but also showed that, both in the short and in the long run, *ĉ*_{α} is extremely sensitive to the choice of *α* (Fig 5): in case of a small error in the tuning of *α*, the efficiency can drop substantially and become practically 0, implying that in the immediate neighbourhood of the optimal *α* there are close to pessimal grid cell configurations. This is because the lower bound on the phase difference (Eq 7) requires *α* to be an algebraic number of order 2, and in an arbitrarily small neighbourhood of any such number there are (infinitely) many numbers without this property, e.g., transcendental numbers (Fig 4a) or rational numbers (*α* = 3/2, *c*_{α} → 0, Fig 2a). As these numbers can be approximated by rationals much better than algebraic numbers of order 2, such grid scale ratios lead to much stronger interference between the two modules, but only at distances moderately large compared to the scale of the modules (Fig 5).

The values of *ĉ*_{α} are shown for 1000 *α* randomly selected from the interval (1, 2). The *ĉ*_{α} values of algebraic (including *σ* and *σ* − 1/2) and non-algebraic irrationals are also shown in red and black, respectively. We estimated *c*_{α}, based on Eq 8, as *ĉ*_{α} = inf{*L*_{max} *δ* ∣ *δ*_{min} < *δ* < *δ*_{max}}, for different noise ranges [*δ*_{min}, *δ*_{max}] in the three panels: (a) 0.05 < *δ* < 0.2, (b) 0.005 < *δ* < 0.05, (c) 0.001 < *δ* < 0.01. Approximate *L*_{max} values are indicated on the top of the panels for *α* = *σ*.
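The estimator used for Fig 5 can be sketched in the same spirit. The snippet below is an illustration rather than the authors' code: it approximates the infimum over the open noise interval by a minimum over a uniform grid of *δ* values (an assumption), caps the search at a finite distance `ell_max`, and returns min *L*_{max}(*δ*)·*δ* as an estimate of *c*_{α}, with *L*_{max} defined through the two-module criterion of Eq 4.

```python
import numpy as np

def c_hat(alpha, deltas, ell_max=10**4):
    """Estimate c_alpha = inf over delta of L_max(delta) * delta (Eq 8), with the
    infimum over the open noise interval replaced by a minimum over a grid."""
    ell = np.arange(1, ell_max + 1)
    y = ell / alpha
    eps = np.abs(y - np.round(y))                      # phase difference, Eq 2
    products = []
    for delta in deltas:
        ambiguous = eps < (1 + alpha) * delta / alpha  # Eq 4
        if ambiguous.any():                            # otherwise L_max > ell_max: skip
            products.append(ell[np.argmax(ambiguous)] * delta)
    return min(products) if products else float("inf")

deltas = np.linspace(0.05, 0.2, 1001)                  # noise range of Fig 5a
golden = (1 + np.sqrt(5)) / 2
rng = np.random.default_rng(0)
for alpha in [golden, 1.5, *rng.uniform(1, 2, 5)]:
    print(f"alpha = {alpha:.4f}   c_hat = {c_hat(alpha, deltas):.3f}")
```

Sweeping many random *α* with smaller noise intervals (as in Fig 5b and 5c) requires a larger `ell_max`; the grid resolution and the cap used here are illustrative choices only.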

The extremely rough landscape of *c*_{α} renders optimisation for *α* an especially difficult problem: it is very unlikely that a biological system would be able to find the global optimum for the scale ratio of two grid modules and a relatively small mistuning from a local optimum could significantly deteriorate the efficiency of the system. Therefore, at least in the case of two modules, it seems to be impossible to achieve asymptotically optimal scale ratio for the grid cells.

### Generalisation to multiple modules

To derive the general solution for *M* grid modules, we focus on a set of 1-dimensional grids with scales *α*_{0} = 1 < *α*_{1} < ⋯ < *α*_{M−1}. Spatial representation is unambiguous up to a distance *L* from the origin if there is at least one module for which the phase is significantly different from 0 (Fig 6). Interestingly, avoiding interference between adjacent modules (giving *α* = *σ*) is not a good solution, since it leads to interference between the distant modules wherever the adjacent modules are in close apposition (Methods). The logic of the general solution for multiple modules is the same as in the case of two modules. Here we only state the main results and the technical details of the analysis can be found in the Methods.

**(a)** Top: Posterior densities for three modules with rational scale ratios. The overlap between the modules is shown in black; its height indicates the interference of the three modules as a function of distance from the origin. The representation becomes ambiguous only if all 3 modules interfere, as at distance 6. Bottom: Ambiguity in position coding quantified by the multi-modality of the combined posterior. **(b)** Posterior densities for three modules with pairwise optimal scale ratios. The scales are 1 (blue), *σ* (red), and *σ*^{2} (olive), where *σ* is the golden ratio. As we have more modules (3) than the order of *σ* (2), wherever any two modules interfere with each other, they interfere with the third as well: at distance 8 the three peaks almost coincide. **(c)** The same as in (b) for scales 1, 2^{1/3}, 2^{2/3}, powers of a third order algebraic number. Although pairwise interference can be very strong between any pair (e.g., at distances 5, 6.2 and 8), the total interference is substantially lower than in panel b (bottom).
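The failure of pairwise-optimal scales in Fig 6b has a short algebraic reason: since *σ*² = *σ* + 1, we have 1/*σ*² = 1 − 1/*σ*, so ||*ℓ*/*σ*²|| = ||*ℓ*/*σ*|| at every integer *ℓ*, and the third module never breaks an interference between the first two. The sketch below (hypothetical names; brute force over the first 10^4 integer distances) checks this numerically and contrasts it with the cube-root-of-2 scales of Fig 6c.

```python
import numpy as np

SIGMA = (1 + np.sqrt(5)) / 2

def phase_difference(ell, scales):
    """epsilon(ell) = max_i ||ell / alpha_i|| for an array of integer distances."""
    ratios = np.asarray(ell, dtype=float)[:, None] / np.asarray(scales, dtype=float)[None, :]
    return np.max(np.abs(ratios - np.round(ratios)), axis=1)

ell = np.arange(1, 10**4 + 1)
golden_pair = phase_difference(ell, [1.0, SIGMA])                # two modules
golden = phase_difference(ell, [1.0, SIGMA, SIGMA**2])           # Fig 6b scales
cube_root = phase_difference(ell, [1.0, 2 ** (1 / 3), 2 ** (2 / 3)])  # Fig 6c scales

# Count near-coincidences of all modules (phase difference below 0.02).
print("golden triple:", int((golden < 0.02).sum()),
      " cube-root triple:", int((cube_root < 0.02).sum()))
```

For the golden-ratio triple the third module adds no information, so near-coincidences are as frequent as with two modules; for the cubic scales both non-reference phases must be small at once, which happens far less often.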

First, we show that a similar upper bound exists for the maximal phase difference between the modules. Compared to the two-module case, the bound is weaker when *M* > 2 as the phase difference scales only with 1/*ℓ*^{1/(M−1)} ≫ 1/*ℓ* meaning that it ensures simultaneous interference between all modules only at much larger distances.

Second, we found that the upper bound can be attained, up to a constant multiplier *c*_{**α**}, for algebraic scale ratios (Methods). Specifically, if the scales of *M* modules form a geometric series with common ratio *α* being an algebraic number of degree *M*, the upper bound is tight, meaning that the phase difference does not shrink faster than 1/*ℓ*^{1/(M−1)}. Intuitively, this scaling indicates that there is always at least one pair of modules for which the phase difference at the integer distance *ℓ* from the origin is larger than the lower bound.

The critical distance *L*_{max} up to which coding is unambiguous can be expressed as (cf. Eq 8):

$$L_{\max} = \left(\frac{c_{\boldsymbol{\alpha}}}{\delta}\right)^{M-1} \tag{9}$$

for all sufficiently small *δ*, where *c*_{**α**} and its estimate *ĉ*_{**α**} are defined analogously to the two-module case (see Methods for the definition). Intuitively, Eq 9 expresses an exponential scaling of the maximal distance uniquely represented by a population of grid cells with the number of grid modules, *M*. The coding range of a particular set of grid scales, *L*_{max}, depends on both the noise in the system and on the base of the exponential, *c*_{**α**}.
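The multi-module critical distance can be explored numerically along the same lines. This is a hedged sketch, not the simultaneous Diophantine approximation method of the Methods: it assumes geometric scales *α*_{i} = *α*^{i}, uses the simplified criterion of Eq 1 (ambiguity when max_{i} ||*ℓ*/*α*_{i}|| < *δ*) instead of the exact noise threshold, and inverts the scaling *L*_{max} = (*c*_{**α**}/*δ*)^{M−1} to obtain an estimate of *c*_{**α**}. Function names are hypothetical.

```python
import numpy as np

def critical_distance(alpha, M, delta, ell_max=10**6, chunk=10**5):
    """First integer ell with max_i ||ell / alpha^i|| < delta (simplified Eq 1).
    The reference module alpha_0 = 1 is always in phase 0 at integer ell,
    so only alpha_1 .. alpha_{M-1} are checked. Returns None if none <= ell_max."""
    scales = alpha ** np.arange(1, M)
    for start in range(1, ell_max + 1, chunk):          # chunks bound memory use
        ell = np.arange(start, min(start + chunk, ell_max + 1), dtype=float)
        ratios = ell[:, None] / scales[None, :]
        eps = np.max(np.abs(ratios - np.round(ratios)), axis=1)
        hits = np.flatnonzero(eps < delta)
        if hits.size:
            return int(ell[hits[0]])
    return None

def c_hat_multi(alpha, M, deltas, **kwargs):
    """Estimate c by inverting L_max = (c / delta)^(M - 1), minimised over deltas."""
    values = []
    for delta in deltas:
        L = critical_distance(alpha, M, delta, **kwargs)
        if L is not None:
            values.append(delta * L ** (1.0 / (M - 1)))
    return min(values) if values else float("inf")

print(critical_distance(1.5, 3, 0.01))   # rational ratio: modules 1, 1.5, 2.25 realign at 9
```

For example, `c_hat_multi(2 ** (1 / 3), 3, np.linspace(0.05, 0.2, 16))` estimates the constant for the scales of Fig 6c. The brute-force search grows linearly with *L*_{max}, so it is only practical for moderate *M* and *δ*.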

Interestingly, the geometric sequences of algebraic numbers described above are, to the best of our knowledge, the only known explicit examples of badly approximable vectors. However, it is known that there are many more such vectors which do not form geometric sequences [22]; therefore, the scale ratio of a well-tuned MA grid cell system does not have to be constant.

The expression for the exponential scaling (Eq 9) is similar to the capacity estimates of Fiete et al. [3] (see their Eq 6) obtained using a combinatorial upper bound and numerical simulations. Importantly, our analytic derivation also provides insight into why certain grid systems are more efficient than others and gives examples of efficient grid scales. Moreover, when *c*_{**α**} reaches its theoretical maximum of 1/2, our formula for the capacity of the grid code becomes identical to the theoretical maximum capacity found in the case of nested coding [4, 6].

In the next sections we first numerically estimate the value of *c*_{**α**} for various choices of the grid scales, and then we show that with a sufficiently large number of modules *c*_{**α**} is guaranteed to approach its theoretical maximum for randomly chosen grid periods.

### Numerical estimation of *c*_{**α**}

We developed an efficient method to numerically estimate the value of *c*_{**α**} for various parameter settings that is based on the simultaneous Diophantine approximations of a set of irrational numbers (Methods). Using realistic noise levels we found that, in contrast to the case of two modules, the sensitivity of the coding efficiency to the choice of *α* gradually vanishes when the number of grid modules is increased (Fig 7a and 7b). In particular, with *M* = 10 modules, *ĉ*_{**α**} varies little across almost all choices of the grid scales (Fig 7b), both when the scales follow a geometric series with a common scale ratio *α* (Fig 7b) and when all the *M* scales are chosen from the bounded interval (1, 2). We also found that *ĉ*_{**α**} vanishes only for pathological examples such as rational numbers or powers of the second order algebraic number *α* = *σ* − 1/2 ≈ 1.118 (Fig 7a and 7b, red). The only random scale choice that significantly degrades the performance is *α* ≈ 1 (Fig 7a and 7b), in which case all grid modules have nearly identical spatial scales.

**(a)-(b)**: Values of *ĉ*_{**α**} estimated for 100 *α* randomly selected from the interval (1, 2) with *M* = 5 (a) and *M* = 10 (b) (see also Fig 5a for *M* = 2). The scales form a geometric series, i.e., *α*_{i} = *α*^{i}. Red circle indicates *ĉ*_{**α**} for the second order algebraic number *α* = *σ* − 1/2 ≈ 1.118. Green (cyan) shows *ĉ*_{**α**} for 5th (10th) order algebraic numbers, respectively. Noise level is the same as in Fig 5a (0.05 < *δ* < 0.2). **(c)-(d)**: Mean (c) and coefficient of variation (d) of *ĉ*_{**α**} evaluated on the range 1.1 < *α* < 1.9. For *M* = 10, *ĉ*_{**α**} is also shown for two alternative selections of the scales: when all 10 scales are selected randomly from the interval [1, 2] (olive) and when the *α*_{i} form a geometric series perturbed by *ε*_{i}, where *ε*_{i} are i.i.d. uniform random variables on the range [−0.01, 0.01] (purple). **(e)-(f)** Phase difference (*ϵ*(*ℓ*), Eq 25) as a function of the distance, *ℓ*. (e) Effect of number theoretic properties with *M* = 10. When *α* is the root of the 10th order polynomial *x*^{10} − *x*^{7} − 1, *α* ≈ 1.12725, *ϵ*(*ℓ*) decays as 1/*ℓ*^{1/9} (black). When *α* is second order, *α* = *σ* − 1/2 ≈ 1.11803, the initial decay is similar, but after a critical distance at *ℓ* ≈ 10^{6} the decay becomes 1/*ℓ* (yellow). **(f)** The critical distance grows with the number of modules (*α*_{i} = (*σ* − 1/2)^{i}). Grey shading in (e-f) indicates the range of phase difference smaller than noise (Eq 26).

To quantify the sensitivity of the grid system to the choice of the scale parameters we calculated the mean and the coefficient of variation of *ĉ*_{**α**} for random choices of *α* (Fig 7c and 7d). We found that the average *ĉ*_{**α**} increased monotonically with the number of grid modules, indicating that the system’s performance approaches the ideal value as the number of modules increases (Fig 7c). Moreover, the variability of *ĉ*_{**α**} consistently decreased with the number of modules, reflecting the improved robustness of the system to the choice of grid periods (Fig 7d). Therefore, not only does the maximal coding distance increase exponentially with the number of modules, but the base of the exponential, *c*_{**α**}, also increases.

To further investigate the mechanisms responsible for the robustness of the system, we numerically evaluated the minimal phase difference between the modules, *ϵ*(*ℓ*), as a function of the distance (Fig 7e and 7f). In line with the predictions of the theory (Eq 25), we found that the phase difference decreased as *ℓ*^{−1/(M−1)}, i.e., with a small negative power of the distance, for *α* being an order *M* algebraic number (Fig 7e, black). For suboptimal *α*-s, the scaling of the phase difference was nearly optimal up to a critical point beyond which the scaling followed the algebraic rank of *α* (i.e., for second order *α* it scales with 1/*ℓ*, Fig 7e, yellow). Importantly, this critical point, where the transition occurs between ideal and number theoretical scaling, is located at increasingly larger distances when the number of modules is increased (Fig 7f). Therefore, the asymptotic, number theoretical properties of the grid periods have a gradually lower impact on the performance of the system in the distance range limited by the intrinsic variability of neuronal spiking (Fig 7e and 7f, background shading).

These observations suggest that even random scale choices might achieve optimal performance as the number of modules grows. In the next section we make this statement mathematically precise and demonstrate that, indeed, *c*_{**α**} approaches its theoretical maximum, 0.5, when the number of modules grows and the scales are chosen uniformly at random from a bounded interval.

### Capacity of non-geometric grid scales

Our number theoretic argument (Eq 9) alone does not imply exponential capacity, since it does not exclude the possibility that the base of the exponential, *c*_{**α**}, converges to 0 as *M* increases (although we observed the opposite trend, see Fig 7c). In this section we investigate the asymptotic properties of the grid code when the number of modules increases and the relative uncertainty *δ* of the modules remains fixed. Here we only state these results informally, and leave the precise statements and the slightly technical mathematical proof to the Methods.

The main idea behind the proof is that the phase of a given module at a particular distance *x* from the origin depends only on the scale of that module, *α*_{i}. If the scale is randomly chosen from a bounded interval [1, *α*_{max}], then the phase is also a random variable whose probability distribution approaches the uniform distribution as the distance increases. Then, the probability of simultaneous interference between *M* modules, that is, the probability of all modules being near phase 0 at some distance *x*, is proportional to the volume of an *M*-dimensional hypercube with side 2*δ*, *V* = (2*δ*)^{M}. This volume, relative to that of the unit cube, diminishes exponentially with *M*, so the number of distinguishable phase combinations, and hence the total distance (expressed in units of *α*_{0} = 1) covered without ambiguity, is of order 1/*V* = (2*δ*)^{−M}. Specifically, our statement is, roughly speaking, that if 0 < *δ* < 1/2 is fixed, *M* is large enough, and the module scales are drawn uniformly at random from a not too narrow bounded interval, e.g., from (1, 2), then the representation is unambiguous up to the exponential distance
(10)
with probability approaching 1, and with the capacity constant approaching 1/2. Although the above statement applies only for *M* → ∞ and does not provide examples of efficient scale choices for finite *M*, we emphasise that this result is stronger than our previous derivation (Eq 9) in four respects. First, our previous derivation (Eq 9) allowed the capacity constant to tend to 0 as *M* increased. Here we show that this does not happen for random scale choices; rather, the constant tends to its theoretical maximum, 1/2 [6], for large *M* with high probability, confirming our previous numerical results (Fig 7c). Second, this nearly optimal performance can be achieved without increasing the scales exponentially, with the scales chosen from a bounded interval. Third, this almost optimal efficiency is reached not only for some appropriately chosen scales, but for almost all choices. Fourth, near-optimal performance is guaranteed for two or higher dimensional grid codes even if the modules are randomly rotated relative to each other or in the absence of long-range coherence within the modules.
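The hypercube argument can be checked with a short simulation. The sketch below assumes, as in the idea of the proof, that at large distances the *M* module phases behave as independent uniform random variables on [0, 1); `prob_all_near_zero` is a hypothetical helper name, and this is a Monte Carlo illustration rather than the proof given in the Methods.

```python
import random

def prob_all_near_zero(M, delta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(all M independent uniform phases lie within delta of 0).

    Assumes the phases are i.i.d. uniform on [0, 1), the large-distance limit of the argument.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # on the unit circle, a phase p is within delta of 0 if p < delta or p > 1 - delta
        if all(min(p, 1.0 - p) < delta for p in (rng.random() for _ in range(M))):
            hits += 1
    return hits / n_samples

M, delta = 3, 0.2
est = prob_all_near_zero(M, delta)
print(f"Monte Carlo estimate: {est:.4f}, hypercube volume (2*delta)**M: {(2 * delta) ** M:.4f}")
```

The estimate should match (2*δ*)^{M} up to sampling error; its reciprocal indicates how many distinguishable phase combinations are available.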

Thus, our results demonstrate that no meticulous tuning of the grid scales is required for close to optimal grid system performance.

## Discussion

In this paper we developed a novel analytic technique to investigate the coding properties of grid cells. Using this technique, which is based on Diophantine approximation of real numbers by fractions of integers, we were able to derive several novel and non-trivial properties of the grid cell code. First, we demonstrated that in the long run the capacity of the system depends heavily and chaotically on the number theoretic properties of the scale ratio between successive modules. To achieve optimal performance in a system with *M* modules, the scale ratio has to be an algebraic number of order *M*. Second, we showed that in the presence of neuronal noise the capacity of the grid code becomes increasingly robust to the choice of the scale parameters as the number of modules increases: when *M* > 2, randomly chosen scales perform nearly as well as the optimal scales. Finally, we demonstrated that the capacities of MA and nested grid codes are asymptotically identical (in the large *M* limit), even for randomly chosen scale parameters for the MA codes.

### Exponential coding range

Previous works used specific assumptions to derive an exponential coding range for the grid cell coding system: they assumed either a nested coding scheme [5, 6], or that the phase space is covered evenly and that the readout noise in a given module decreases when the number of modules increases [3, 14]. Here we generalised these findings and demonstrated that nested and MA codes have asymptotically equal capacity.

When we studied the capacity of MA codes we realised that achieving uniform coverage of the phase space is not trivial in the case of two modules, and can only be attained with appropriately chosen scales. Specifically, we recognised that approximately uniform coverage of the phase space by the phase curve at arbitrary distances is guaranteed if the scale ratio between the two modules is an algebraic number of order 2. Our formalism allowed us to generalise this intuition to an arbitrary number of grid modules and to demonstrate that even a random choice of grid scales guarantees uniform coverage of the phase space when the number of modules is high.

We also relaxed the assumption of an earlier study [14] that the total amount of the noise remains constant in the grid system even when the number of modules is increased, i.e., the readout error of each module decreases with *M*. Here we derived these results using the more general assumption that the coding precision of each module is independent of *M* and proportional to the scale of the module.

We confirmed our analytical results by extensive numerical simulations of the simultaneous interference between grid systems with various choices of the scale parameters. In line with previous results [3, 9], our simulations showed that the grid system is robust to the choice of the scale parameter and that the coding range is exponential in the number of modules.

### Nested coding versus MA code

Although the efficiency of the coding investigated in this paper is slightly worse than that of optimal nested coding [5, 6], MA codes also have several advantages. First, they use scales that are orders of magnitude smaller than the maximal distance up to which the coding works properly. The largest grid scales measured experimentally are ∼3 m [23, 24], and extrapolations based on the dorso-ventral location of the recording electrodes within the entorhinal cortex extend to ∼10 m [13], a period still substantially smaller than the typical distances travelled daily by rodents (several hundreds of metres [25]) or bats (several kilometres [26]; see also [27]).

Second, while a module failure simply decreases the capacity of the system in the case of MA coding, it can have a more dramatic effect in nested codes: although malfunction of the largest or smallest module reduces either the capacity or the resolution of nested codes, respectively, the loss of intermediate modules functionally breaks the interaction between the remaining modules, decreasing both the resolution and the capacity of the system disproportionately.

Third, once the scales are optimised for a given noise level, the coding range of nested grid codes does not depend on *δ*. Therefore, unlike in MA codes, it is not possible to increase the capacity by adding more neurons to the same modules or by observing more grid cells from the same set of modules. Moreover, the functioning of nested codes critically depends on accurate decoding of each module: if the readout neuron does not have access to enough presynaptic neurons from a given module, then the corresponding posterior becomes too wide, leading to interference between the modules. This has consequences similar to the absence of the given module in nested codes. In contrast, in MA codes the coding properties remain similar for postsynaptic neurons receiving different numbers of synapses from different modules, although the coding range is a function of the precision available to the observer (Eq 9).

When encoding dynamic trajectories instead of static locations, the number of neurons required in a given module decreases quadratically with the scale of the module, i.e., in inverse proportion to the square of the scale [8]. For example, if representing the position in 2D space with some fixed accuracy requires ∼4000 neurons in a module with *α*_{i} = 0.2 m, then a module with *α*_{j} = 2 m needs only ∼40 neurons. This scaling implies that the coding range of the nested grid system can be easily and parsimoniously extended by adding a new module with a larger scale but containing relatively few neurons. Although the same relationship between the number of neurons in a module and its scale also holds for MA codes, the total number of neurons required to achieve a similar coding range can be substantially smaller in nested codes.
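As a worked check of the example above, the sketch below assumes that the number of cells scales as the inverse square of the module scale and calibrates it to the ∼4000 cells at 0.2 m quoted in the text; `cells_needed` and its parameter names are illustrative, not part of the original model.

```python
def cells_needed(scale, ref_scale=0.2, ref_cells=4000, exponent=2):
    """Cells needed for a module with the given scale (m), assuming N is proportional to scale**(-exponent).

    The defaults reproduce the example in the text: ~4000 cells at 0.2 m.
    """
    return ref_cells * (ref_scale / scale) ** exponent

for s in (0.2, 0.5, 1.0, 2.0):
    print(f"scale {s:.1f} m -> ~{cells_needed(s):.0f} cells")
```

A tenfold larger scale needs a hundredfold fewer cells, which is why adding a large-scale module is cheap.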

Another consequence of dynamical coding is that the time constant of the readout has to be matched to the scale of the grid modules [8]. As the grid scale varies over a large range in the case of nested codes, the postsynaptic neuron has to integrate inputs from different grid cells with time constants ranging from 1 ms to 1 second [8]. In MA codes, the modules have similar scales and their outputs can be integrated with similar time constants.

Finally, we note that nested coding and MA coding are not mutually exclusive: although they imply fundamentally different ways of decoding the same positional information [7, 14], both can be present in the same system. The MA code has the larger coding range, and is therefore favoured, when *α* is small (small differences between scales) and *δ* is small (high accuracy). Even in this case, locations within the largest grid scale can be decoded as in nested coding, while an MA decoder is required beyond this distance.

### Planar grid cells

In the Methods we show that the coding capacity of two or higher dimensional grid cells depends on the same number theoretic properties, and therefore the results obtained in dimension one extend to planar or cubic grid cells as well [28], provided that the main axes of the different modules remain aligned with each other.

If the two dimensional grid modules are rotated relative to each other, then the scale choices that perform well will be different from the scales that are optimal for axis-aligned modules. Consider, for example, the scale ratio marked by the rightmost red circle in Fig 5a–5c: although it is a relatively good choice for *M* = 2, it leads to catastrophic interference at *ℓ* = *α* when the grids are rotated by 30°. Consequently, the incoherent reorientation of the grid modules during global remapping [13] renders the optimisation of the grid scales infeasible. However, the central result of this paper is our analytical proof that almost all scale choices perform near optimally if the number of modules is high enough, which also applies to grid cells rotated uniformly at random relative to each other (Methods).

Moreover, 2D grids do not need to show perfect triangular symmetry to achieve high capacity: environmental boundaries [29, 24] or non-Euclidean geometry [30, 31] can distort the grid pattern, but as long as the distortion is coherent among modules, our theory applies unchanged. If the scales vary slightly over long distances, then our derivation based on Diophantine approximations does not apply. However, our derivation of exponential capacity for grid systems with many random scales (Methods) still remains valid.

### Optimization and robustness

The highly organised, regular patterns formed by the firing fields of grid cells suggest that the characteristics of the grids must be closely related to the computational function of these neurons: optimally representing and processing information about the spatial location of the animal [32, 33, 14, 4, 11]. Besides the general optimality of triangular grid-like firing fields for representing unbounded 2D space [28], recent theoretical work derived the optimal scale ratio of successive grid modules in the case of nested coding [6, 7, 8].

These studies, using different assumptions, arrived at slightly different conclusions regarding the optimal value of *α*. Stemmler et al. [7] fixed the coding range to *L*_{max} = 3 for a pair of grid modules with scales {1, *α*} and found that *α* = 3/2 minimises the ambiguity errors within that range. Mathis et al. [4] and Wei et al. [6] also fixed the coding range, minimised the number of neurons required to achieve a given resolution, and provided both estimates of the maximal capacity of the grid cell coding system and a specific architecture (i.e., optimised nested codes) that achieves maximal efficiency. The optimal scale ratio for nested codes was found to depend both on the magnitude of the noise in the system and on the type of decoder [4, 6]. Rather than fixing the coding range, we were interested in grid codes that work in potentially unbounded environments, and found a similar asymptotic capacity for MA codes using random grid scales. Although predictions derived from nested coding roughly agree with the average scale ratio observed in the entorhinal cortex [12, 13, 29], they do not explain the substantial variability that characterises the data.

In our derivations we assumed that the decoding error of a given grid module exceeds *δ* only with some small probability. Inaccurate decoding of a single grid module can lead to a disproportionately high error in the position representation if subsequent time frames are decoded independently [14, 9]. However, the chance of catastrophic ambiguity errors can be substantially reduced if a dynamical decoder combines prior information representing the predicted spatial position with the location encoded by the incoming grid cell spikes [34, 14, 8].

Our results based on Diophantine approximations require that the scales of the modules are set precisely, so that the phases of the different modules do not drift relative to each other. Although theoretical considerations suggest that drift cannot be completely suppressed in a noisy neuronal system [35, 36], it is not known whether different grid modules respond coherently to distortions caused by environmental manipulations [15, 29, 24]. The remarkable robustness of the grid system's efficiency against the choice of the scale ratio suggests that grids with loosely set scale parameters could achieve similar performance. Indeed, our derivation using randomly selected grid scales does not require precisely set scale parameters, yet it still yields asymptotically exponential capacity for the grid system.

The optimization principle assumes that a substantial improvement in the performance of the system can be achieved by precise tuning of its parameters. In the present study we demonstrated that this is indeed the case in the absence of noise. However, even in this case, optimization would be almost infeasible for three reasons. First, the coding range is an extremely irregular, discontinuous function of the scale parameter, making optimisation essentially a game of trial and error. Second, a scale parameter that is optimal for a given number of modules is guaranteed to be inefficient when the number of modules is increased, precluding pairwise or modular optimization. Finally, the optimal grid scales depend on the rotation of the modules relative to each other, which can change independently during changes in the environment [13].

However, taking the variability of neuronal firing into account changes the picture dramatically. We demonstrated that when the coding accuracy of grid modules is limited by neuronal noise, the capacity of the system becomes surprisingly robust to the choice of the scale parameters, making its optimization unnecessary. Note that even if the grid periods are not optimized across modules, generating the regular, periodic firing fields of grid cells demands accurate integration of velocity inputs [37, 36] and repeated error correction [38, 35], both requiring precise tuning of single neuron and network parameters within a given module. In conclusion, our study demonstrates that the capacity of the grid cell system is nearly optimal with randomly chosen grid scales, and that, instead of accurate parameter tuning, the experimentally observed scales could reflect the combined effect of random fluctuations and a gradient in cellular properties along the dorso-ventral axis of the entorhinal cortex [39, 40].

### Predictions

Our finding that grid cells have an exponentially large coding range even with randomly chosen grid scales of similar magnitudes leads to several important predictions. First, MA coding predicts that the coding range is substantially larger than the largest grid period. Since grid cells are likely to be involved in path integration [32, 41], this prediction could be tested by probing the path integration abilities of rodents beyond distances of the largest grid period [42].

Second, in the case of MA coding, different modules have similar contributions to the coding range of the system. Therefore, the effect of targeted dMEC lesion (inactivating a single module, as in [43]) on the rat’s navigation behaviour would be largely independent of the actual location of the lesion (i.e., which module is inactivated).

Third, since the performance of the system is independent of the precise choice of the grid scales, we expect a large variability in the scale ratio of successive grid modules both within and across animals. This prediction is consistent with the experimental data available [12, 13, 29], although further statistical analysis would be required to specifically determine the distribution of scale ratios.

Finally, we predict that the performance of the system is not particularly sensitive to incoherent changes in the scale parameter of a subset of modules during e.g., global remapping induced by environmental changes [16]. It has been shown that under certain conditions simultaneously recorded grid cells respond coherently within a module and independently across modules to environmental distortions [13]. To test the prediction of our theory, the behavioural consequences of incoherent realignment across modules should be assessed and compared with the effects of environmental manipulations inducing coherent realignment [29] or coherent distortion in the shape of the grid pattern [24, 29, 30, 31].

## Methods

### Grid cells in the 2D plane

Consider a system *G*^{2} of planar grid cells with a given set of scales. Suppose that the axes of all modules are aligned, and use the coordinate system
(11)
which is naturally generated by the triangular lattice. For comparison, consider the one dimensional grid cell system *G*^{1} with the same number of modules and the same set of scales, in which each module represents the position of the animal with the same relative precision. To achieve this, a two dimensional module needs roughly the square of the number of cells of the corresponding one dimensional module, but it can also distinguish the square of the number of spatial positions within one period.

If *G*^{2} represents a planar position (*x*, *y*) (in the coordinates of Eq 11) ambiguously, i.e., confuses it with the origin, then clearly the planar positions (*x*, 0) and (0, *y*) are also represented ambiguously. Therefore, the corresponding one dimensional positions *x* and *y* are represented ambiguously by *G*^{1} as well. Conversely, if *z* is represented ambiguously by *G*^{1}, then (*z*, 0) and (0, *z*) will be ambiguous in *G*^{2}. Therefore, an ambiguity of position at a given distance from the origin for planar cells can be matched to an ambiguity at the same order of magnitude of distance in the one dimensional grid system, and vice versa. The above argument also shows that, when the axes are aligned with each other, the same scale choices perform best for one dimensional grid cells and for two or higher dimensional ones.

### Estimating the precision of a single module

We choose *α*_{0} = 1, fix the resolution of the system to *δ* < *α*_{0} (defined below), and investigate its coding range. A formally identical system with a fixed coding range and optimised resolution can be obtained by appropriately rescaling the grid scales.

We numerically estimated the precision of position coding by a single module by first simulating the motion of the animal as a one dimensional Gaussian random walk:
(12)
with Δ*t* = 1 ms temporal resolution and *D* = 0.005 m^{2}/s, which gives a typical displacement of ≈5 cm in 0.5 s [8]. We simulated the activity of *N* = 10 to 300 grid cells from a single module. Grid cells had a circular tuning curve:
(13)
with the following parameters: *r*_{max} = 15 Hz, *r*_{0} = 0.1 Hz, λ = 0.25 m and *ϕ*^{k} chosen to uniformly cover the interval [0, 2*π*]. The power *n* = 22 was set to match the mean firing rate of the grid cells, 〈*r*(*x*)〉 = 2.5 Hz, to experimental data [16]. Larger (λ = 2.5 m) grid spacing was modelled by decreasing the speed of the animal by a factor of 10 (*D* = 0.00005 m^{2}/s). The firing rate is shown in Fig 1b, right (olive).

Spike trains were generated as an inhomogeneous Poisson process with neurons conditionally independent given the simulated location: (14)
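A minimal sketch of this generative model for a single module, under stated assumptions: the per-step displacement is drawn from a Gaussian with variance *D*Δ*t* (this reproduces the quoted ≈5 cm typical displacement in 0.5 s, since √(0.005 · 0.5) m ≈ 0.05 m); the tuning curve is taken to be a power of a raised cosine, *r*_{0} + *r*_{max}((1 + cos(2π*x*/λ − *ϕ*^{k}))/2)^{n}, which is our stand-in for Eq 13 rather than its exact form (so with *n* = 22 it need not reproduce the 2.5 Hz mean rate quoted above); and spike counts in each 1 ms bin are Poisson. Function names are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_module(T=5.0, dt=1e-3, D=0.005, lam=0.25, N=50,
                    r_max=15.0, r_0=0.1, n=22, seed=1):
    """Simulate a 1D Gaussian random walk and the spike counts of one grid module.

    Returns (xs, counts): xs[t] is the position (m) and counts[t][k] the spike count
    of neuron k in time bin t. The tuning curve is an assumed stand-in for Eq 13.
    """
    rng = random.Random(seed)
    phis = [2 * math.pi * k / N for k in range(N)]   # preferred phases evenly cover [0, 2*pi)
    x, xs, counts = 0.0, [], []
    for _ in range(int(round(T / dt))):
        x += rng.gauss(0.0, math.sqrt(D * dt))       # random-walk step with variance D*dt
        xs.append(x)
        theta = 2 * math.pi * x / lam
        rates = [r_0 + r_max * ((1 + math.cos(theta - phi)) / 2) ** n for phi in phis]
        counts.append([poisson(rng, r * dt) for r in rates])  # conditionally independent Poisson counts
    return xs, counts

xs, counts = simulate_module(T=1.0)
print(f"{sum(map(sum, counts))} spikes from {len(counts[0])} cells in {len(xs)} steps")
```

Modelling the larger λ = 2.5 m spacing as described in the text only requires calling `simulate_module` with `D=0.00005`.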

Spikes of the neurons in module *i*, **s**_{0:t,i}, represent the spatial location of the animal with error *δ α*_{i} (i.e., with the same *δ* phase error for all modules) which can be interpreted as the width of the (periodic) posterior probability distribution *P*(*x*|**s**_{0:t,i}). For an ideal observer this posterior distribution quantifies how much a given spatial location is consistent with the observed spike pattern. The posterior distribution of the position was numerically calculated by recursive Bayesian filtering:
(15)
The colormap in Fig 1b shows this posterior distribution with *N* = 50 cells and λ = 0.25 m.
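The recursive Bayesian filtering step can be sketched on a discretised phase grid. This sketch assumes a raised-cosine stand-in tuning curve *r*_{0} + *r*_{max}((1 + cos 2π(*p* − *k*/*N*))/2)^{n} (preferred phase *k*/*N* for neuron *k*), a Gaussian random-walk transition with variance *D*Δ*t* converted to phase units by dividing by λ, a flat initial prior, and Poisson spike counts per bin. `bayes_filter` is a hypothetical name and this is not the authors' code.

```python
import math

def bayes_filter(counts, dt=1e-3, D=0.005, lam=0.25, r_max=15.0, r_0=0.1, n=22, n_bins=100):
    """Posterior over the phase (n_bins equally spaced bins on [0, 1)) after each time step.

    counts[t][k] is the spike count of neuron k in time bin t.
    """
    N = len(counts[0])
    phases = [b / n_bins for b in range(n_bins)]
    # assumed tuning curve evaluated on the phase grid, and its logarithm
    rate = [[r_0 + r_max * ((1 + math.cos(2 * math.pi * (p - k / N))) / 2) ** n
             for p in phases] for k in range(N)]
    log_rate = [[math.log(r) for r in row] for row in rate]
    # spike-independent part of the Poisson log-likelihood: -dt * sum_k r_k(phase)
    base = [-dt * sum(rate[k][b] for k in range(N)) for b in range(n_bins)]
    # wrapped Gaussian transition kernel for one random-walk step, in units of bins
    sd = math.sqrt(D * dt) / lam
    K = max(1, math.ceil(4 * sd * n_bins))
    kern = [math.exp(-0.5 * (j / (n_bins * sd)) ** 2) for j in range(-K, K + 1)]
    z = sum(kern)
    kern = [w / z for w in kern]

    post = [1.0 / n_bins] * n_bins                  # flat prior over the phase
    history = []
    for c in counts:
        # prediction: circular convolution of the posterior with the transition kernel
        pred = [sum(kern[j + K] * post[(b - j) % n_bins] for j in range(-K, K + 1))
                for b in range(n_bins)]
        # update: add c_k * log r_k(phase) for every neuron that spiked
        loglik = base[:]
        for k, ck in enumerate(c):
            if ck:
                for b in range(n_bins):
                    loglik[b] += ck * log_rate[k][b]
        m = max(loglik)                             # subtract the maximum for numerical stability
        post = [p * math.exp(ll - m) for p, ll in zip(pred, loglik)]
        s = sum(post)
        post = [p / s for p in post]
        history.append(post)
    return history

N, T = 20, 200
# synthetic data: only neuron 5 (preferred phase 0.25) fires, once every 10 ms
counts = [[1 if (k == 5 and t % 10 == 0) else 0 for k in range(N)] for t in range(T)]
post = bayes_filter(counts)[-1]
print("posterior mode at phase", max(range(len(post)), key=post.__getitem__) / len(post))
```

Each step first diffuses the posterior (prediction) and then multiplies it by the Poisson likelihood of the observed counts (update); only neurons that spiked contribute a rate-dependent term, which keeps the update cheap.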

Naturally, the width of the posterior depends on several factors, most importantly on the number of neurons observed in a given module and on the scale of the modules relative to the typical speed of the animal [8]. At each timestep the posterior distribution was fitted with a von Mises distribution with a location *μ*_{t} and a concentration parameter *κ*_{t}. The width of the posterior relative to the grid scale was estimated as:
(16)
For analytic tractability, we use a bounded noise model in the derivations, assuming that the location decoded from the spikes of a module is within distance *δ* *α*_{i} of the true location. To be conservative, we chose *δ* to be the 99th percentile of the empirical distribution of *δ*_{t}. The largest value, *δ* = 0.12, was found with λ = 0.25 m and *N* = 10 cells; the smallest, *δ* = 0.01, corresponds to λ = 2.5 m and *N* = 300 cells.
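The von Mises fit and the choice of *δ* can be sketched as follows. The concentration is estimated here by moment matching, using the mean resultant length *R* of the posterior and a widely used piecewise approximation to the inverse of *I*_{1}(*κ*)/*I*_{0}(*κ*); this may differ from the fitting procedure actually used. Because Eq 16 is not reproduced here, `width_from_kappa` uses the large-*κ* normal approximation (standard deviation 1/√*κ* radians) as a hypothetical stand-in, and the 99% level is taken as a nearest-rank percentile. All function names are illustrative.

```python
import math

def fit_von_mises(post):
    """Moment-based von Mises fit to a posterior given on equally spaced phase bins on [0, 1).

    Returns (mu, kappa), with mu in phase units.
    """
    n, total = len(post), sum(post)
    C = sum(p * math.cos(2 * math.pi * b / n) for b, p in enumerate(post)) / total
    S = sum(p * math.sin(2 * math.pi * b / n) for b, p in enumerate(post)) / total
    R = math.hypot(C, S)                       # mean resultant length
    mu = (math.atan2(S, C) / (2 * math.pi)) % 1.0
    if R >= 1.0:                               # degenerate, point-mass posterior
        return mu, math.inf
    if R < 0.53:                               # piecewise approximation of A^{-1}(R)
        kappa = 2 * R + R ** 3 + 5 * R ** 5 / 6
    elif R < 0.85:
        kappa = -0.4 + 1.39 * R + 0.43 / (1 - R)
    else:
        kappa = 1 / (R ** 3 - 4 * R ** 2 + 3 * R)
    return mu, kappa

def width_from_kappa(kappa):
    """Hypothetical stand-in for Eq 16: SD ~ 1/sqrt(kappa) radians, as a fraction of the period."""
    return 1.0 / (2 * math.pi * math.sqrt(kappa))

def delta_from_widths(widths, q=0.99):
    """Nearest-rank q-quantile of the per-time-step relative widths."""
    s = sorted(widths)
    rank = max(1, math.ceil(q * len(s) - 1e-9))
    return s[rank - 1]

n = 1000
post = [math.exp(20 * math.cos(2 * math.pi * (b / n - 0.3))) for b in range(n)]  # von Mises, kappa = 20
mu, kappa = fit_von_mises(post)
print(round(mu, 4), round(kappa, 2), round(width_from_kappa(kappa), 4))
```

In use, `width_from_kappa` would be applied to the fitted *κ*_{t} at every time step and `delta_from_widths` to the resulting sequence.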

We assume that the modules are conditionally independent given the location of the animal, and hence position decoding, or representation, can be implemented by an ideal observer independently reading out the spikes, **s**_{i}, emitted by the different modules: *P*(*x*|**s**) = ∏_{i} *P*(*x*|**s**_{i}). When loosely talking about interference between the grid modules at a spatial point we refer to the interference between these periodic posterior distributions *P*(*x*|**s**_{i}), i.e., all module posteriors being larger than 0 at a location different from the origin (Fig 1c).

### Interference at integer distances

Since we measure distance in units of the smallest grid scale (*α*_{0} = 1), avoiding interference at integer distances from the origin also guarantees the absence of interference elsewhere, i.e., all positions in the interval [0, *L*] will be distinguishable by the grid code. Hence we loosely call *ϵ*(*ℓ*) defined in Eq 1 the phase difference, but note that it is the phase difference at integer distance *ℓ*. Indeed, if the grid code were ambiguous, confusing spatial locations *x*_{1} and *x*_{2}, then it would also confuse the origin with |*x*_{1} − *x*_{2}|, since the phase differences of each module are the same between 0 and |*x*_{1} − *x*_{2}| as between *x*_{1} and *x*_{2} (Fig 2b and 2c, right). But |*x*_{1} − *x*_{2}| can be confused with the origin only if |*x*_{1} − *x*_{2}| is an integer, that is, a multiple of the smallest scale, 1. Note that this argument is correct only if the phase representation ambiguity of each module is independent of the actual position, which holds if the firing fields of cells from the same module are spaced evenly, as we assume here.
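The interval-intersection test behind this argument can be written as a short search over multiples of the smallest scale. The sketch assumes the bounded noise model (module *i* cannot distinguish positions closer than *δα*_{i}) and 0 < *δ* < 1/4, which guarantees that for each candidate only one integer *k*_{i} per module can produce an overlap; `first_ambiguity` is a hypothetical helper name.

```python
import math

def first_ambiguity(scales, delta, max_k=10_000):
    """First distance (a multiple of the smallest scale) confused with the origin, or None.

    Position x is confused with 0 when every module i admits an integer k_i with
    |x - k_i * a_i| < delta * a_i, i.e. when the intervals ((k_i - delta) a_i, (k_i + delta) a_i)
    share a common point. With delta < 1/4, only k_i = round(k * a_min / a_i) can overlap the
    interval of the smallest-scale module around k * a_min.
    """
    if not 0 < delta < 0.25:
        raise ValueError("this sketch assumes 0 < delta < 1/4")
    a_min = min(scales)
    for k in range(1, max_k + 1):
        lo, hi = -math.inf, math.inf
        for a in scales:
            k_i = round(k * a_min / a)
            lo, hi = max(lo, (k_i - delta) * a), min(hi, (k_i + delta) * a)
        if lo < hi:                    # non-empty common intersection: ambiguity
            return k * a_min
    return None

golden = (1 + math.sqrt(5)) / 2
print(first_ambiguity([1, 2], 0.05), first_ambiguity([1, golden], 0.05))
```

For instance, with *δ* = 0.05 the scales {1, 2} are first confused at distance 2, while the golden-ratio pair {1, *σ*} survives until distance 8, where 8/5 is a Fibonacci-ratio approximation of *σ*.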

Graphically, interference between locations occurs when two segments of the phase curve come close to each other. Since the segments of the phase curve are parallel (Fig 2), and we started the phase curve at the origin, interference first occurs at the origin. Avoiding interference at the origin as much as possible at arbitrary distances therefore also guarantees that the segments of the phase curve are separated from each other as much as possible, leading to a uniform coverage of the phase space [14].

### Definition of algebraic numbers

We call a real number *α* algebraic of order *n* (a positive integer) if *n* is the least integer such that *α* is the root of a polynomial of degree *n* with integer coefficients. Algebraic numbers of order one are exactly the rational numbers. Another example is the golden ratio, *σ*, which is irrational and is a root of *x*^{2} − *x* − 1, an integer polynomial of degree two. Therefore, *σ* is an algebraic number of order two.

### Information rate

Since we fixed the resolution, the capacity of the code is proportional to the coding range. Moreover, as the coding precision of the modules was the same, we assume that the population size of each module is approximately *N* for grid scales chosen randomly from a bounded interval. The information rate of the grid system, defined as the ratio of the logarithm of the capacity and the total number of conveyed bits [14] is
(17)
(18)
(19)
where is the average firing rate of a grid cell and in the third line we used that *δ* = *k*/log(*N*) [14]. Thus, the information rate is independent of the number of modules and increases with log *c*_{α}.

For a geometric code with scale ratio *α*, the optimal population size for dynamical decoding and constant *δ* decreases with the module scale [8]; writing λ_{i} = *α*^{i} for the scale of module *i* and *n*_{0} for the number of neurons in the first module, the total number of neurons in the population is
(20)
Since the total number of neurons does not grow linearly with the number of modules, the information rate becomes proportional to *M*:
(21)
In practice, a lower limit on the number of cells per module keeps the information rate finite; nevertheless, Eq 21 emphasises that adding further modules with larger periods increases the efficiency of the grid system if the number of cells per module is set optimally for dynamical decoding [8]. Although a geometric progression of scales is consistent with both nested and MA codes, the information rate is higher for optimal nested codes, since they maximise *α*.
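Under the assumption, taken from the quadratic scaling quoted in the Discussion and not necessarily the exact expression of [8], that module *i* with scale *α*^{i} needs *n*_{0}*α*^{−2i} cells, the total population is a geometric series that stays below *n*_{0}/(1 − *α*^{−2}) however many modules are added. The sketch checks this numerically; `total_cells` and the example values are hypothetical.

```python
def total_cells(n0, alpha, M, exponent=2):
    """Total cells in M modules with scales alpha**i, assuming n_i = n0 * alpha**(-exponent * i)."""
    return sum(n0 * alpha ** (-exponent * i) for i in range(M))

n0, alpha = 4000, 1.5
limit = n0 / (1 - alpha ** -2)      # sum of the infinite geometric series for exponent = 2
for M in (1, 2, 5, 10, 20):
    print(M, round(total_cells(n0, alpha, M)), "bounded by", round(limit))
```

Because the denominator of the information rate saturates while the capacity keeps growing with *M*, the rate grows with the number of modules, which is the point of Eq 21.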

### Interference with *M* modules I: Golden ratio is suboptimal

In this section we demonstrate that a set of grid cells with the scale ratio (*α*) optimally chosen between pairs of successive grid modules is close to being pessimal for efficient space representation when *M* > 2. Such pairwise optimisation leads to a set of scales showing a geometric progression with scale ratio *α*, i.e., [1, *α*, *α*^{2}, …], which is consistent with the experimental data [10, 12, 23, 13]. The representation of the position becomes ambiguous if all modules show interference at the same location, i.e., the phases of all modules are very close to 0 at distance *ℓ* from the origin.

Consider for example the golden ratio *α* = *σ*, which is a second order algebraic number, i.e., it is the root of the integer coefficient polynomial *x*^{2} − *x* − 1. Therefore, the phase *ψ*_{2}(*x*) = (*x* mod *σ*^{2})/*σ*^{2} of any spatial point *x* according to the third module can be simply expressed with the phase of the first two modules as
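$$\psi_2(x) = \big(\psi_0(x) - \psi_1(x)\big) \bmod 1. \qquad (22)$$
To see this, consider that by the definition of the phases *ψ*_{i}(*x*), when the animal is at distance *x* from the origin there are some integers *ℓ*, *k*_{1}, *k*_{2} such that
$$x = \ell + \psi_0(x) = \sigma\,\big(k_1 + \psi_1(x)\big) = \sigma^2\,\big(k_2 + \psi_2(x)\big).$$
Using that *σ*^{2} − *σ* − 1 = 0, i.e., 1/*σ*^{2} = 1 − 1/*σ*, we get that
$$k_2 + \psi_2(x) = \frac{x}{\sigma^2} = x - \frac{x}{\sigma} = \big(\ell + \psi_0(x)\big) - \big(k_1 + \psi_1(x)\big).$$
Rearranging terms yields
$$\psi_2(x) = \psi_0(x) - \psi_1(x) + (\ell - k_1 - k_2),$$
and since 0 ≤ *ψ*_{2}(*x*) < 1 and *ℓ* − *k*_{1} − *k*_{2} is an integer, this is exactly Eq 22.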

In other words, the phase of the third module provides no additional information given the phase of the other two modules. In particular, if both *ψ*_{0}(*x*) and *ψ*_{1}(*x*) are close to 0 (Fig 6b), then so is *ψ*_{2}(*x*) and hence the third module fails to resolve the ambiguity when the two first modules interfere. Similarly, if we have *n* grid cell modules with scales 1, *α*, …, *α*^{n−1} with *α* being an algebraic number of order *k* < *n*, then all of the *n* phases can be expressed by any *k* of them, leading to redundant and inefficient representation.

Clearly the same argument works not only for the powers of the golden ratio, but for powers of any algebraic number of order lower than the number of modules.
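The redundancy of the third module can be verified numerically. Because 1/*σ*^{2} = 1 − 1/*σ*, we have *x*/*σ*^{2} = *x* − *x*/*σ*, so the phase of the module with scale *σ*^{2} should equal (*ψ*_{0}(*x*) − *ψ*_{1}(*x*)) mod 1. The sketch checks this identity at random positions; the helper names are illustrative.

```python
import math
import random

SIGMA = (1 + math.sqrt(5)) / 2          # golden ratio, root of x**2 - x - 1

def phase(x, scale):
    """Phase of position x in a module with the given scale, in [0, 1)."""
    return (x % scale) / scale

def circ_dist(a, b):
    """Distance between two phases on the unit circle."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

rng = random.Random(0)
worst = 0.0
for _ in range(10_000):
    x = rng.uniform(0, 1000)
    psi0, psi1, psi2 = phase(x, 1.0), phase(x, SIGMA), phase(x, SIGMA ** 2)
    # the third phase predicted from the first two: psi2 = (psi0 - psi1) mod 1
    worst = max(worst, circ_dist(psi2, (psi0 - psi1) % 1.0))
print(f"largest discrepancy: {worst:.2e}")
```

The discrepancy is at the level of floating-point rounding, so the third module adds no information beyond the first two.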

### Interference with *M* modules II

To derive the general solution for *M* grid modules, we consider a set of one dimensional grids with scales *α*_{0} = 1 < *α*_{1} < ⋯ < *α*_{M −1}. Again, the interference between the modules can be expressed by the simultaneous Diophantine approximation of the vector (*α*_{1}, …, *α*_{M −1}) by fractions of integers with the common numerator *ℓ*, i.e., *α*_{i} ≈ *ℓ*/*k*_{i}. Importantly, a theorem by Dirichlet provides an upper bound on the efficiency of the approximation. Namely, for every (*M* − 1)-tuple of irrational numbers *α*_{1}, …, *α*_{M −1} there are infinitely many collections of integers *k*_{0}, *k*_{1}, …, *k*_{M −1} (with *k*_{0} = *ℓ*) such that the approximation error defined as
(23)
is simultaneously smaller than the upper bound for all items in the tuple:
(24)
Note that this approximation error differs from *ϵ* defined for two modules (Eq 2), as it is not normalised by *α*.

*Proof of* Eq 24. First we prove that any vector of irrationals can be approximated to the claimed order by rationals having the same denominator. Let $\boldsymbol{\theta} = (\theta_1, \dots, \theta_{n-1})$ be such a vector. To approximate $\boldsymbol{\theta}$ by rationals of denominator at most *Q*, let us define the vectors $\mathbf{a}_j = j\boldsymbol{\theta} - \lfloor j\boldsymbol{\theta} \rfloor$, *j* = 0, …, *Q*, where the floor is understood coordinate-wise. Let us partition the unit cube [0, 1]^{n−1} into small cubes of side length *Q*^{−1/(n−1)}, so that altogether we have *Q* of them. Since the *Q* + 1 vectors **a**_{j} all fall into [0, 1]^{n−1}, at least two of them, say **a**_{k} and **a**_{l}, fall into the same small cube. Then
$$\big|(k - l)\,\boldsymbol{\theta} - \big(\lfloor k\boldsymbol{\theta} \rfloor - \lfloor l\boldsymbol{\theta} \rfloor\big)\big| = |\mathbf{a}_k - \mathbf{a}_l| \le Q^{-1/(n-1)},$$
with the inequalities holding coordinate-wise. Therefore, because |*k* − *l*| ≤ *Q*, $\boldsymbol{\theta}$ is approximable with denominator |*k* − *l*| and numerator (vector) $\pm\big(\lfloor k\boldsymbol{\theta} \rfloor - \lfloor l\boldsymbol{\theta} \rfloor\big)$ with error not exceeding |*k* − *l*|^{−(1+1/(n−1))}. The desired statement then follows by simultaneously approximating the numbers 1/*α*_{i} with a common denominator, which is also a simultaneous approximation of the *α*_{i} with a common numerator. This completes the proof.
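For *d* real numbers *θ*_{1}, …, *θ*_{d} and *Q* = *N*^{d}, the pigeonhole argument guarantees some 1 ≤ *q* ≤ *Q* such that every *qθ*_{i} lies within 1/*N* = *Q*^{−1/d} of an integer. A brute-force search makes this concrete; the scales {1, √2, √3} are chosen purely as an example and `best_common_denominator` is a hypothetical helper. Setting *θ*_{i} = 1/*α*_{i} turns the common denominator *q* into the common numerator *ℓ* of the approximations *α*_{i} ≈ *ℓ*/*k*_{i}.

```python
import math

def best_common_denominator(thetas, Q):
    """Brute-force search for 1 <= q <= Q minimising max_i ||q * theta_i|| (distance to nearest integer)."""
    best_q, best_err = 1, math.inf
    for q in range(1, Q + 1):
        err = max(abs(q * t - round(q * t)) for t in thetas)
        if err < best_err:
            best_q, best_err = q, err
    return best_q, best_err

# grid scales 1 < a_1 < a_2: approximating 1/a_i with a common denominator l
# is the same as approximating a_i with a common numerator l (a_i ~ l / k_i)
scales = [math.sqrt(2), math.sqrt(3)]
Q = 10 ** 2                        # Q = N**d with N = 10 and d = 2 numbers
l, err = best_common_denominator([1 / a for a in scales], Q)
k = [round(l / a) for a in scales]
print(l, k, err, "<=", Q ** (-1 / len(scales)))
```

The search is exhaustive and therefore only practical for small *Q*; the proof above guarantees the bound without any search.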

For a set of grid scales *α*_{i} = *α*^{i} (*i* = 0, …, *M* − 1), where *α* is an algebraic number of order *M*, there exists a maximal positive constant such that
(25)
holds, except for at most finitely many integers *ℓ*.

To see that Eq 25 holds, we start from the work of [44] (see also [45]), which states that powers of an algebraic number are badly simultaneously approximable with a common denominator in the following sense. Let *β* be an algebraic number of order *M*. There exists *c*_{β} > 0 such that for all integers *ℓ* ≥ 1 and *k*_{i} there is *i* ∈ {1, …, *M* − 1} for which
$$|\ell\,\beta^{i} - k_i| \ge c_\beta\, \ell^{-1/(M-1)}.$$

*Derivation of* Eq 25. Our goal is to give a lower bound on |*α*^{i} *k*_{i} − *α*^{j} *k*_{j}|, where *α* is algebraic of order *M*, 0 ≤ *i*, *j* ≤ *M* − 1. Without loss of generality suppose that *i* < *j*.
Now the fact that *k*_{i} ∼ *ℓ*/*α*^{i} implies Eq 25 if the constant is chosen appropriately.
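The contrast between algebraic and rational scale ratios can be illustrated numerically. The sketch computes the minimum over 1 ≤ *ℓ* ≤ *L* of *ℓ*^{1/(M−1)} max_{i} |*ℓ* − *k*_{i}*α*^{i}|, with *k*_{i} the nearest integer to *ℓ*/*α*^{i}. By the Dirichlet bound the unscaled error can be as small as about *ℓ*^{−1/(M−1)} infinitely often, and for powers of an algebraic number of order *M* (note that 1/*α* is again algebraic of order *M*) the result of [44] says it cannot be much smaller, so the scaled quantity should stay bounded away from 0. We use *α* = 2^{1/3} (order 3) and the rational *α* = 3/2 as examples; `scaled_min_error` is a hypothetical name, and a finite search is only an illustration, not a proof.

```python
def scaled_min_error(scales, L):
    """min over 1 <= l <= L of l**(1/(M-1)) * max_i |l - k_i * a_i|, with k_i = round(l / a_i).

    scales holds a_1, ..., a_{M-1}; the unit module a_0 = 1 is implicit.
    """
    d = len(scales)                # d = M - 1
    return min(l ** (1 / d) * max(abs(l - round(l / a) * a) for a in scales)
               for l in range(1, L + 1))

alpha_alg = 2 ** (1 / 3)           # algebraic of order 3
alpha_rat = 1.5                    # rational, i.e. algebraic of order 1
print("algebraic:", scaled_min_error([alpha_alg, alpha_alg ** 2], 20_000))
print("rational: ", scaled_min_error([alpha_rat, alpha_rat ** 2], 20_000))
```

For the rational ratio the error vanishes exactly (all three modules return to phase 0 at *ℓ* = 9), whereas for the cubic irrational it stays positive over the whole search range.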

The position representation is unambiguous if there is at least one pair of modules for which the phase difference is larger than the threshold set by the noise, which holds if
(26)
From here, the critical distance *L*_{max} up to which coding is unambiguous can be expressed as (cf. Eq 9):
(27)
for all sufficiently small *δ*.

To directly compare the capacity of the MA grid cell system derived in Eq 9 with previous estimates for nested coding [4, 6], we also calculate *N*_{max}, the number of distinguishable spatial phases:
(28)
Efficient coding with nested modules requires that *α*_{i} = *r*^{i} with 0 ≤ *i* ≤ *M* − 1, where *r* is the scale ratio and the relative uncertainty of the modules is fixed at 2*δ* = 1/*r* [6]. The position of the animal can be determined with precision 1/*r* without ambiguity if the animal is restricted to move in an environment whose size equals the scale of the largest module, *r*^{M−1}. In this case the number of distinguishable spatial phases is *r*^{M−1}/(1/*r*) = *r*^{M} = (2*δ*)^{−M}, which is identical to the capacity we found for non-nested coding (Eq 28) when the capacity constant takes its theoretical maximum, 1/2.
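The counting for optimal nested codes is simple arithmetic: with *r* = 1/(2*δ*), an environment of size *r*^{M−1} resolved at precision 1/*r* contains *r*^{M} = (2*δ*)^{−M} distinguishable phases. A sketch, with a hypothetical function name, tabulating this for the extreme values of *δ* from the simulations (0.12 and 0.01) and an intermediate value:

```python
def nested_capacity(delta, M):
    """Distinguishable phases of an optimal nested code with M modules.

    Scale ratio r = 1/(2*delta); range r**(M-1) (largest scale) divided by precision 1/r gives r**M.
    """
    r = 1.0 / (2.0 * delta)
    return r ** M

for delta in (0.12, 0.05, 0.01):
    print(delta, [f"{nested_capacity(delta, M):.3g}" for M in (2, 4, 6)])
```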

### Coding is unambiguous up to exponential distance in the number of modules

To derive Eq 27 we first show that interference of the grid representation is equivalent to pairwise interference between all pairs of modules. To test whether coding is unambiguous, note that the position at distance *x* from the origin is confusable with 0 if, for all *i* = 0, …, *M* − 1, there exists an integer *k*_{i} such that
(29)
where *δ* is the relative uncertainty of the modules. It turns out that, as for *M* = 2, there is no need to consider all *x* ∈ [0, *L*_{max}]; it is enough to consider integers:

**Claim**. *There exists x* ∈ [0, *L*_{max}] *for which* Eq 29 *holds for all i exactly when the following pairwise interference occurs between all modules*:
$$|k_i\alpha_i - k_j\alpha_j| < \delta\,(\alpha_i + \alpha_j) \qquad (30)$$
*for all i, j, with some integers k*_{i} (*i* = 0, …, *M* − 1) *such that* 0 < *k*_{i} *α*_{i} ≤ *L*_{max}.

*Proof*. Let us fix *k*_{i}, *i* = 0, …, *M* − 1. Pairwise interference means that for every pair *i*, *j* there is a point *x*_{i,j} in the intersection of (*k*_{i} *α*_{i} − *α*_{i} *δ*, *k*_{i} *α*_{i} + *α*_{i} *δ*) = (*a*_{i}, *b*_{i}) and (*k*_{j} *α*_{j} − *α*_{j} *δ*, *k*_{j} *α*_{j} + *α*_{j} *δ*) = (*a*_{j}, *b*_{j}). Due to the topology of the line, it is easy to see by induction that the intersection of all such intervals is nonempty, and hence one can choose *x*_{i,j} = *x* for all pairs. The statement is obvious for *M* = 2. Now suppose that the intersection $\bigcap_{i=0}^{n} (a_i, b_i)$ is nonempty. Then it is the interval (*a*, *b*) with
$$a = \max_{0 \le i \le n} a_i, \qquad b = \min_{0 \le i \le n} b_i.$$
If (*a*_{n+1}, *b*_{n+1}) intersects (*a*_{i}, *b*_{i}) for every *i* ≤ *n*, then both *a*_{n+1} < *b*_{i} and *b*_{n+1} > *a*_{i} for all such *i*, and therefore *a*_{n+1} < *b* and *b*_{n+1} > *a*, i.e., (*a*_{n+1}, *b*_{n+1}) intersects (*a*, *b*), which completes the induction. Therefore Eq 30 implies Eq 29. The other direction is immediate.
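The Claim is the one-dimensional case of Helly's theorem: open intervals on a line that overlap pairwise have a common point. The following minimal Python sketch checks this equivalence on random configurations. It compares the common-point condition |x − k_iα_i| < δα_i for all i with the pairwise condition |k_iα_i − k_jα_j| < δ(α_i + α_j) for all pairs; the sampling ranges and function names are illustrative assumptions, not part of the original analysis.

```python
import random

def common_point_exists(centres, radii):
    """Common-point form: do all open intervals (c - r, c + r) share a point?"""
    lo = max(c - r for c, r in zip(centres, radii))
    hi = min(c + r for c, r in zip(centres, radii))
    return lo < hi

def pairwise_overlap(centres, radii):
    """Pairwise form: |c_i - c_j| < r_i + r_j for every pair of intervals."""
    n = len(centres)
    return all(abs(centres[i] - centres[j]) < radii[i] + radii[j]
               for i in range(n) for j in range(i + 1, n))

def check_claim(trials=10_000, seed=0):
    """Compare both conditions on random grid-cell configurations.
    Returns (number of disagreements, number of interfering configurations)."""
    rng = random.Random(seed)
    disagreements = interfering = 0
    for _ in range(trials):
        m = rng.randint(2, 5)                      # number of modules
        delta = rng.uniform(0.0, 0.5)              # relative uncertainty
        scales = [rng.uniform(1.0, 3.0) for _ in range(m)]
        x = rng.uniform(1.0, 100.0)
        # k_i near x / alpha_i, occasionally shifted by one to create mismatches
        ks = [max(1, round(x / a) + rng.choice((0, 0, 0, 1, -1))) for a in scales]
        centres = [k * a for k, a in zip(ks, scales)]
        radii = [delta * a for a in scales]
        common = common_point_exists(centres, radii)
        disagreements += common != pairwise_overlap(centres, radii)
        interfering += common
    return disagreements, interfering
```

On such random samples the two conditions never disagree, in line with the Claim; only exact floating-point ties could in principle produce a mismatch.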

Using the Claim above, Eq 27 follows by rearranging Eq 25.
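Combining the Claim with the confusability condition gives a direct way to locate the first ambiguous position numerically for a concrete set of scales. The Python sketch below is a hypothetical helper, not the authors' code. It assumes δ < 1/2, excludes the trivial noise neighbourhood of the origin by requiring k_i ≥ 1, uses module 0 to enumerate candidate locations k_0α_0, and intersects the intervals (k_iα_i − δα_i, k_iα_i + δα_i) module by module, branching whenever two intervals of the same module overlap the current intersection.

```python
import math

def first_ambiguous_point(scales, delta, x_limit):
    """Infimum of the first region x > 0 that is confusable with 0, i.e. where
    |x - k_i * a_i| < delta * a_i holds for every module i with some integer k_i >= 1.
    Returns None if no such region starts below x_limit."""
    a0, rest = scales[0], scales[1:]

    def refine(lo, hi, remaining):
        # Intersect (lo, hi) with every interval of the next module that overlaps it.
        if not remaining:
            return lo
        a = remaining[0]
        best = None
        k_min = max(1, math.floor((lo - delta * a) / a))
        k_max = math.ceil((hi + delta * a) / a)
        for k in range(k_min, k_max + 1):
            new_lo = max(lo, k * a - delta * a)
            new_hi = min(hi, k * a + delta * a)
            if new_lo < new_hi:  # nonempty intersection: continue with the next module
                found = refine(new_lo, new_hi, remaining[1:])
                if found is not None and (best is None or found < best):
                    best = found
        return best

    # Intervals of module 0 are disjoint and ordered, so the first hit is the smallest.
    for k0 in range(1, int(x_limit / a0) + 1):
        found = refine(k0 * a0 - delta * a0, k0 * a0 + delta * a0, rest)
        if found is not None:
            return found
    return None
```

For example, with scales (1, 1.5) and δ = 0.1 the first ambiguity starts at x = 2.9, next to the common multiple 3, and adding a third module with scale 2.2 pushes it to 8.9. Working with interval endpoints rather than scanning x on a grid avoids missing very narrow intersections.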

### Asymptotic capacity of the random grid cell system

Let us fix the relative uncertainty of modules *δ* < 1/2 and a number *α*_{max} > (1 + *δ*)/(1 − *δ*). We show that if scales *α*_{1}, *α*_{2}, … are drawn uniformly at random from [1, *α*_{max}], independently of each other, then for any *δ* < *ζ* < 1/2 the representation with *M* modules having scales *α*_{1}, *α*_{2}, …, *α*_{M} is unambiguous in every spatial position *x* > 0 up to
(31)
with probability of order 1 − (2*ζ*)^{M} as *M* → ∞.

Here *ζ* plays the role of the constant that characterises the capacity of a particular grid cell system. As we will see, the convergence holds for any *ζ* < 1/2, but its speed depends on *ζ*: higher efficiency is guaranteed only for a larger number of modules.

*Proof:* Let *α*_{1}, *α*_{2}, … be independent random variables distributed uniformly on [1, *α*_{max}]. Let *x* be a spatial point and let *φ*_{i}(*x*) denote the phase of module *i* (with scale *α*_{i}) at *x*, that is
$$\varphi_i(x) = \frac{x}{\alpha_i} - \left\lfloor \frac{x}{\alpha_i} \right\rfloor \in [0, 1).$$
Note that for fixed *x* the phases *φ*_{i}(*x*) are independent of each other, since the *α*-s are independent. We also use the notation *p*_{1}(*x*) for the probability that the phase is (almost) indistinguishable from 0, defined as
$$p_1(x) = \mathbb{P}\Big(\varphi_i(x) \in [0, (1+\varepsilon)\delta] \cup [1 - (1+\varepsilon)\delta, 1)\Big),$$
where *ε* > 0 is determined later. It is easy to see that *p*_{1}(*x*) does not depend on *i*, i.e., it is the same for all modules. Moreover, the distribution of *φ*_{i}(*x*) converges to the uniform distribution as the distance increases; in particular, lim_{x → ∞} *p*_{1}(*x*) = 2(1 + *ε*)*δ*. The convergence of this distribution to the uniform one is a key observation that remains true even in higher dimensions with uniform random rotations, or when the grid scales vary slightly over long distances. Hence there exists a critical distance *x*_{0} = *x*_{0}(*δ*, *ε*) such that for all *x* > *x*_{0} we have |*p*_{1}(*x*) − 2(1 + *ε*)*δ*| ≤ *δε*. Therefore, for *x* > *x*_{0} we have
(32)

It also implies a bound on the probability of interference of many modules at a given point *x*. If we consider *M* modules with scales drawn uniformly at random from [1, *α*_{max}] and independently of each other, then by Eq 32 for *x* > *x*_{0} the probability of all phases being close to 0 is
(33)
that is, *p*_{M}(*x*) is exponentially small in *M*.
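Both ingredients used so far, that *p*_{1}(*x*) approaches 2(1 + *ε*)*δ* because the phase distribution becomes uniform, and that for independent scales the joint probability *p*_{M}(*x*) factorises into *p*_{1}(*x*)^{M}, can be checked by simulation. Below is a minimal Monte Carlo sketch in Python. It assumes that "(almost) indistinguishable from 0" means a phase within (1 + *ε*)*δ* of 0 on the unit circle, the window consistent with the stated limit; the function names are hypothetical.

```python
import random

def near_zero(phase, width):
    """True if a phase in [0, 1) lies within `width` of 0 on the unit circle."""
    return phase <= width or phase >= 1.0 - width

def estimate_joint(x, delta, eps, alpha_max, m, n_samples=100_000, seed=0):
    """Monte Carlo estimate of p_M(x): the probability that m modules with independent
    scales drawn uniformly from [1, alpha_max] all have phase within (1 + eps) * delta
    of 0 at position x. With m = 1 this estimates p_1(x)."""
    rng = random.Random(seed)
    width = (1.0 + eps) * delta
    hits = 0
    for _sample in range(n_samples):
        # draw m fresh scales; the phase of module i at x is (x / alpha_i) mod 1
        if all(near_zero((x / rng.uniform(1.0, alpha_max)) % 1.0, width)
               for _ in range(m)):
            hits += 1
    return hits / n_samples
```

With *δ* = *ε* = 0.1 and *α*_{max} = 3, the estimate of *p*_{1} at *x* = 1 is far from the limit (about 0.06), while at *x* = 10^{4} it is close to 2(1 + *ε*)*δ* = 0.22, and three modules interfere with probability close to 0.22^{3}.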

It remains to estimate the probability of interference of many modules anywhere up to a maximal allowed spatial distance. Our goal is to show that
(34)
as *M* → ∞, where *X*_{max} is defined as in Eq 31. Note that satisfying Eq 34 is not trivial, since *X*_{max} increases exponentially with *M*.

There is no need to investigate all *x* < *X*_{max}: it is enough to show that there is no interference, in the stronger sense of Eq 33, on a sufficiently dense subset of [0, *X*_{max}]. Indeed, let *Y* be an *ε*-dense set in [0, *X*_{max}] with at most 2*X*_{max}/*ε* elements. Then
where we used the fact that the phases at two points at distance at most *ε* differ by at most *ε*/*α*_{i} ≤ *ε*, since *α*_{i} was chosen from the interval [1, *α*_{max}]. The corresponding inequality for the probabilities of these events is
Now for these finitely many points *x* ∈ *Y* we can use Eq 33 one by one, if *x* > *x*_{0}:
if , which we assume. In the first inequality we used Eq 33 and the union bound, and in the second one that . Note that interference at different spatial points is not independent, but the union bound holds even in that case.

It remains to show that the grid cell representation is unambiguous up to *x*_{0}. Clearly there is no ambiguity up to *x* = 1 + *δ*. To estimate the probability
(35)
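we first observe that the cardinality of *Y* ∩ [1 + *δ*, *x*_{0}] is independent of *M*. Therefore, to guarantee that the probability in Eq 35 goes to 0, we need to show that for all 1 + *δ* ≤ *x* ≤ *x*_{0} there is a scale *α* ∈ [1, *α*_{max}] that distinguishes *x* from the origin, that is, an *α* such that
$$\frac{x}{\alpha} - \left\lfloor \frac{x}{\alpha} \right\rfloor \in (\delta,\, 1 - \delta).$$
This is so because *x*/*α* is continuous and monotonically decreasing in *α*, and because
$$x - \frac{x}{\alpha_{\max}} = x\left(1 - \frac{1}{\alpha_{\max}}\right) > (1 + \delta)\left(1 - \frac{1 - \delta}{1 + \delta}\right) = 2\delta,$$
where we used that *α*_{max} > (1 + *δ*)/(1 − *δ*) and *x* ≥ 1 + *δ*. Hence, as *α* runs over [1, *α*_{max}], *x*/*α* sweeps an interval of length greater than 2*δ*, so the phase of *x* cannot lie in [0, *δ*] ∪ [1 − *δ*, 1] for all *α* ∈ [1, *α*_{max}].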
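The last step can be checked numerically for concrete parameters. The Python sketch below is a hypothetical helper that scans *α* on an evenly spaced grid over [1, *α*_{max}], which approximates the continuous argument above, and reports whether some scale places the phase of *x* strictly inside (*δ*, 1 − *δ*).

```python
def has_distinguishing_scale(x, delta, alpha_max, n_grid=2001):
    """Scan scales alpha on an evenly spaced grid over [1, alpha_max] and report whether
    some alpha puts the phase of x, i.e. (x / alpha) mod 1, strictly inside
    (delta, 1 - delta), so that x is clearly distinguishable from the origin."""
    for step in range(n_grid):
        alpha = 1.0 + (alpha_max - 1.0) * step / (n_grid - 1)
        phase = (x / alpha) % 1.0
        if delta < phase < 1.0 - delta:
            return True
    return False
```

With *δ* = 0.2 and *α*_{max} = 3 (above (1 + *δ*)/(1 − *δ*) = 1.5), every point of a 0.01-spaced grid from 1.2 to 50 has a distinguishing scale, whereas with a single scale (*α*_{max} = 1) the point *x* = 2 does not. The grid over *α* must be fine enough that *x*/*α* moves by less than the width of the admissible phase range between neighbouring grid points.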

### Numerical estimation of the constant with *M* modules

The constant (and *c*_{α}) is well defined only for algebraic numbers, but it can also be estimated for real numbers from the scaling of the phase difference with distance using numerical simulations. As the constant is defined asymptotically (Eq 25), estimating it numerically requires an approximation at finite distances. An alternative definition, equivalent to Eq 25, is
(36)
where is defined by
(37)
where . Intuitively, to find the magnitude of interference at location *ℓ*, for every admissible choice of we first select the largest pairwise phase difference and then take the choice with the smallest maximum. Figs 4 and 7 show that naively approximating by for some large *ℓ* is not a good idea, as may vary heavily with *ℓ*, especially for non-algebraic scale ratios. Note that the calculation of is a special case of with *M* = 2.

To estimate coding efficiency in the presence of noise we are mostly interested in the above infimum when *ℓ* is such that the phase difference is close to the precision *δ* of the modules. This motivates investigating the (numerically computable) minimum
for some pair *δ*_{1} < *δ*_{2}, where *ℓ*_{2} is such that for all *ℓ* ≥ *ℓ*_{2} we have , and *ℓ*_{1} is the smallest *ℓ* such that .
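The min-max notion of interference can be illustrated with a brute-force sketch. Because the defining equations are not reproduced here, the Python code below uses an assumed finite-distance version: location *ℓ* corresponds to *k*_{0} ≤ *ℓ* for a module with scale *α*_{0} = 1, and the phase difference between modules *i* and *j* is measured as |*k*_{i}*α*_{i} − *k*_{j}*α*_{j}|/(*α*_{i} + *α*_{j}), the normalisation that appears in the pairwise interference condition. For the other modules only the two integers nearest to *k*_{0}/*α*_{i} are tried, which is exact for *M* = 2 and a heuristic restriction otherwise; all names are hypothetical.

```python
import math
from itertools import product

def pair_difference(k, scales, i, j):
    """Normalised mismatch |k_i a_i - k_j a_j| / (a_i + a_j) between modules i and j."""
    return abs(k[i] * scales[i] - k[j] * scales[j]) / (scales[i] + scales[j])

def interference_profile(ell_max, scales):
    """profile[ell - 1] = min over k_0 = 1..ell of the max pairwise mismatch.
    Assumes scales[0] == 1 (so k_0 is the location) and all scales >= 1."""
    m = len(scales)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    running_min, profile = math.inf, []
    for k0 in range(1, ell_max + 1):
        # candidate integers for the other modules: floor and ceil of k_0 / a_i, at least 1
        candidates = [{c for c in (math.floor(k0 / a), math.ceil(k0 / a)) if c >= 1}
                      for a in scales[1:]]
        for rest in product(*candidates):
            k = (k0,) + rest
            worst = max(pair_difference(k, scales, i, j) for i, j in pairs)
            running_min = min(running_min, worst)  # min over locations of the max mismatch
        profile.append(running_min)
    return profile
```

For the golden ratio, a badly approximable number, *ℓ* times the interference stays bounded away from 0 (above 0.2 for *ℓ* ≤ 500), whereas for the rational ratio 3/2 the interference drops to exactly 0 at *ℓ* = 3, where the two grids coincide.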

### Tools for the numerical investigation of Diophantine approximation

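A common and natural way to investigate Diophantine approximation numerically is lattice reduction [46]. By a lattice we mean a subset Λ of ℝ^{d} defined by some linearly independent vectors *v*_{1}, …, *v*_{m} ∈ ℝ^{d}, *m* ≤ *d*, so that
$$\Lambda = \left\{ \sum_{i=1}^{m} z_i v_i \;:\; z_1, \dots, z_m \in \mathbb{Z} \right\}.$$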

Given a lattice, a classical computational problem is to find its shortest non-zero vector (Fig 8). In the following we show how Diophantine approximation of a vector (*α*_{1}, …, *α*_{n}) can be investigated by finding shortest vectors of appropriately chosen lattices.

Fig 8. Which element of the lattice generated by the two blue-headed vectors is closest to the origin? In other words, what is the shortest nonzero vector that can be obtained as a linear combination of these vectors with integer coefficients?

Let us first consider a simple example. Let the lattice be defined by the rows of the matrix
where *ε* > 0. For all sufficiently small *ε*, the shortest vector of this lattice corresponds to a simultaneous Diophantine approximation of (*α*_{1}, …, *α*_{n}) with the common numerator *b*_{n+1} and denominators *b*_{i}, *i* = 1, …, *n*. The parameter *ε* acts as a penalty term: the smaller it is, the larger the numerator can be.
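For *n* = 1 the idea can be tried out on a two-dimensional lattice, where the shortest vector in the Euclidean norm is found exactly by Lagrange-Gauss reduction. The basis rows *v*_{1} = (*ε*, *α*) and *v*_{2} = (0, 1) used in this Python sketch are an assumed textbook-style choice for illustration, not necessarily the matrix *V* of the text: the combination *b**v*_{1} − *a**v*_{2} = (*bε*, *bα* − *a*) is short exactly when *α* ≈ *a*/*b* with a small *b*, and *ε* penalises large *b*. Note that the sketch uses the Euclidean norm, not the supremum norm used for Eq 25.

```python
def shortest_vector_2d(alpha, eps):
    """Lagrange-Gauss reduction of the lattice spanned by the (assumed) rows
    v1 = (eps, alpha) and v2 = (0, 1). Returns (b, a, vec) such that
    vec = b * v1 - a * v2 = (b * eps, b * alpha - a) is a shortest nonzero vector."""
    def dot(x, y):
        return x[0] * y[0] + x[1] * y[1]

    # each entry pairs a lattice vector with its integer coefficients w.r.t. (v1, v2)
    u = ((eps, alpha), (1, 0))
    v = ((0.0, 1.0), (0, 1))
    if dot(u[0], u[0]) > dot(v[0], v[0]):
        u, v = v, u
    while True:
        m = round(dot(u[0], v[0]) / dot(u[0], u[0]))  # nearest-integer projection
        v = ((v[0][0] - m * u[0][0], v[0][1] - m * u[0][1]),
             (v[1][0] - m * u[1][0], v[1][1] - m * u[1][1]))
        if dot(v[0], v[0]) >= dot(u[0], u[0]):
            break  # u is now a shortest nonzero vector
        u, v = v, u
    vec, (b, c) = u
    return b, -c, vec
```

With *ε* = 10^{−4} and *α* equal to the golden ratio, the reduction returns *b* = 55, *a* = 89 (up to sign), a ratio of consecutive Fibonacci numbers; increasing *ε* to 10^{−2} yields the smaller pair *b* = 8, *a* = 13, illustrating how *ε* trades accuracy against the size of the integers.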

When speaking about shortest vectors we need to specify the norm with respect to which vectors are compared. Here we are looking for the largest phase difference between the modules, so we use the supremum norm (Eq 25). The shortest vector in supremum norm of the lattice defined by *V* is an approximation for which
is as small as possible. In this way we can compute the maximal phase difference between the module with scale 1 and all other modules up to distance *b*_{n+1}.

Recall that, according to Eq 25, we are searching for an approximation minimizing
(38)
As in the previous example, this can be done simply by dividing columns *i*, *i* = 1, …, *n* of *V* by (1 + *α*_{i}) and by adding further columns of similar form that describe interference between modules *i* and *j*. For example, for *n* = 3 the shortest (in sup norm) element of the lattice generated by the rows of the following matrix gives an approximation minimizing Eq 38:

In this way maximal interference in the grid cell system can be computed numerically as shortest vectors of certain lattices in supremum norm. Finding this shortest vector is an integer linear programming (ILP) problem, which is NP-hard in general and can be solved, e.g., by a branch-and-bound algorithm [47]. There are also efficient methods that find approximate solutions in polynomial time, such as the LLL algorithm due to Lenstra, Lenstra and Lovász [46].
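To make the supremum-norm formulation concrete, the Python sketch below builds a hypothetical lattice basis (the matrix of the text is not reproduced here, so this structure is an assumption) and finds its shortest vector by exhaustive enumeration, a naive stand-in for an ILP or branch-and-bound solver. Each row belongs to one coefficient *k*_{r}; column 0 is a penalty *εα*_{0}*k*_{0}, and each further column corresponds to a pair of modules, so that the supremum norm of the vector with coefficients *k* is the larger of the penalty and the maximal normalised mismatch |*k*_{i}*α*_{i} − *k*_{j}*α*_{j}|/(*α*_{i} + *α*_{j}).

```python
from itertools import product

def lattice_basis(scales, eps):
    """Hypothetical basis: row r belongs to coefficient k_r. Column 0 is the penalty
    eps * scales[0] * k_0; each further column is one module pair (i, j) and yields
    (k_i * a_i - k_j * a_j) / (a_i + a_j) for a coefficient vector k."""
    m = len(scales)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    rows = []
    for r in range(m):
        row = [eps * scales[0] if r == 0 else 0.0]
        for i, j in pairs:
            s = scales[i] + scales[j]
            if r == i:
                row.append(scales[i] / s)
            elif r == j:
                row.append(-scales[j] / s)
            else:
                row.append(0.0)
        rows.append(row)
    return rows

def shortest_sup_norm(rows, bound):
    """Exhaustive search over integer coefficients |k_r| <= bound for the nonzero
    lattice vector with the smallest supremum norm; cost grows as (2*bound+1)**len(rows)."""
    best_norm, best_k = float("inf"), None
    for k in product(range(-bound, bound + 1), repeat=len(rows)):
        if not any(k):
            continue  # skip the zero vector
        vec = [sum(k[r] * rows[r][c] for r in range(len(rows)))
               for c in range(len(rows[0]))]
        norm = max(abs(x) for x in vec)
        if norm < best_norm:
            best_norm, best_k = norm, k
    return best_norm, best_k
```

For the commensurate scales (1, 1.5, 2) and *ε* = 10^{−3}, the shortest vector has coefficients ±(6, 4, 3): all three grids coincide at distance 6, and the norm 6*ε* comes entirely from the penalty column. The exponential growth of the search box is why exact solvers or lattice reduction are used in practice.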

The LLL algorithm finds not only a short vector of a lattice, but also another basis of it that consists of short and nearly orthogonal vectors in the *L*^{2} norm, a so-called LLL-reduced basis. The approximation error of the LLL algorithm is too large to compute the constant terms in Eq 27 precisely, so we could not rely on this algorithm alone. Nevertheless, compared with the plain ILP solution, we could significantly speed up our computations by first applying the LLL algorithm to find an approximate solution (and a reduced lattice) and then running an ILP solver on this LLL-reduced basis, which could find nontrivial optimal solutions very efficiently when started from this input.
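The LLL step of this pipeline can be prototyped with a textbook implementation. The Python sketch below is not the authors' code: it uses exact rational arithmetic, the usual Lovász parameter 3/4 (named `lovasz_delta` to keep it apart from the module uncertainty *δ*), recomputes the Gram-Schmidt data after every update for clarity rather than speed, and reduces in the Euclidean norm, so its output would only serve as a starting basis for an exact supremum-norm solver.

```python
from fractions import Fraction

def dot(u, v):
    """Exact inner product of two vectors (ints or Fractions)."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def gram_schmidt(basis):
    """Exact Gram-Schmidt orthogonalisation: returns (b_star, mu)."""
    n = len(basis)
    b_star = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], b_star[j]) / dot(b_star[j], b_star[j])
            v = [x - mu[i][j] * y for x, y in zip(v, b_star[j])]
        b_star.append(v)
    return b_star, mu

def lovasz_holds(b_star, mu, k, lovasz_delta):
    """Lovasz condition between positions k - 1 and k."""
    return (dot(b_star[k], b_star[k])
            >= (lovasz_delta - mu[k][k - 1] ** 2) * dot(b_star[k - 1], b_star[k - 1]))

def lll_reduce(basis, lovasz_delta=Fraction(3, 4)):
    """Textbook LLL reduction (Euclidean norm) of linearly independent integer rows."""
    b = [list(row) for row in basis]
    n = len(b)
    b_star, mu = gram_schmidt(b)
    k = 1
    while k < n:
        # size reduction: enforce |mu[k][j]| <= 1/2 for all j < k
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q != 0:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                b_star, mu = gram_schmidt(b)  # recompute for clarity, not speed
        if lovasz_holds(b_star, mu, k, lovasz_delta):
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]  # swap and step back
            b_star, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b

def is_lll_reduced(basis, lovasz_delta=Fraction(3, 4)):
    """Check the size-reduction and Lovasz conditions."""
    b_star, mu = gram_schmidt(basis)
    n = len(basis)
    size_ok = all(abs(mu[i][j]) <= Fraction(1, 2) for i in range(n) for j in range(i))
    return size_ok and all(lovasz_holds(b_star, mu, k, lovasz_delta) for k in range(1, n))

def squared_volume(basis):
    """Product of squared Gram-Schmidt norms, det(B B^T); invariant under LLL."""
    b_star, _ = gram_schmidt(basis)
    vol = Fraction(1)
    for v in b_star:
        vol *= dot(v, v)
    return vol
```

On a small integer basis such as ((1, 1, 1), (−1, 0, 2), (3, 5, 6)), the reduced basis satisfies both LLL conditions, spans a lattice of the same volume, and its first vector is within the standard approximation factor 2^{(n−1)/2} of the shortest one.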

## References

- 1. Sterling P, Laughlin S. Principles of Neural Design. MIT Press; 2015.
- 2. Burak Y, Brookings T, Fiete I. Triangular lattice neurons may implement an advanced numeral system to precisely encode rat position over large ranges; 2006.
- 3. Fiete IR, Burak Y, Brookings T. What Grid Cells Convey about Rat Location. The Journal of Neuroscience. 2008;28(27):6858–6871. pmid:18596161
- 4. Mathis A, Herz AVM, Stemmler M. Optimal population codes for space: grid cells outperform place cells. Neural Comput. 2012;24(9):2280–317. pmid:22594833
- 5. Mathis A, Herz AVM, Stemmler MB. Resolution of nested neuronal representations can be exponential in the number of neurons. Phys Rev Lett. 2012;109(1):018103. pmid:23031134
- 6. Wei XX, Prentice J, Balasubramanian V. A principle of economy predicts the functional architecture of grid cells. Elife. 2015;4:e08362. pmid:26335200
- 7. Stemmler M, Mathis A, Herz AVM. Connecting multiple spatial scales to decode the population activity of grid cells. Sci Adv. 2015;1(11):e1500816. pmid:26824061
- 8. Mosheiff N, Agmon H, Moriel A, Burak Y. An efficient coding theory for a dynamic trajectory predicts non-uniform allocation of entorhinal grid cells to modules. PLoS Comput Biol. 2017;13(6):e1005597. pmid:28628647
- 9. Towse BW, Barry C, Bush D, Burgess N. Optimal configurations of spatial scale for grid cell firing under noise and uncertainty. Philos Trans R Soc Lond B Biol Sci. 2014;369(1635):20130290. pmid:24366144
- 10. Hafting T, Fyhn M, Molden S, Moser MB, Moser EI. Microstructure of a spatial map in the entorhinal cortex. Nature. 2005;436:801–806. pmid:15965463
- 11. Moser EI, Roudi Y, Witter MP, Kentros C, Bonhoeffer T, Moser MB. Grid cells and cortical representation. Nat Rev Neurosci. 2014;15(7):466–81. pmid:24917300
- 12. Barry C, Hayman R, Burgess N, Jeffery KJ. Experience-dependent rescaling of entorhinal grids. Nature Neuroscience. 2007;10:682–684. pmid:17486102
- 13. Stensola H, Stensola T, Solstad T, Frøland K, Moser MB, Moser EI. The entorhinal grid map is discretized. Nature. 2012;492(7427):72–78. pmid:23222610
- 14. Sreenivasan S, Fiete I. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nat Neurosci. 2011;14(10):1330–7. pmid:21909090
- 15. Yoon K, Buice MA, Barry C, Hayman R, Burgess N, Fiete IR. Specific evidence of low-dimensional continuous attractor dynamics in grid cells. Nat Neurosci. 2013;16(8):1077–84. pmid:23852111
- 16. Fyhn M, Hafting T, Treves A, Moser MB, Moser EI. Hippocampal remapping and grid realignment in entorhinal cortex. Nature. 2007;446:190–194. pmid:17322902
- 17. Gardner RJ, Lu L, Wernle T, Moser MB, Moser EI. Correlation structure of grid cells is preserved during sleep. bioRxiv. 2017. http://dx.doi.org/10.1101/198499.
- 18. Trettel SG, Trimper JB, Hwaun E, Fiete IR, Colgin LL. Grid cell co-activity patterns during sleep reflect spatial overlap of grid fields during active behaviors. bioRxiv. 2017. https://doi.org/10.1101/198671.
- 19. Hurwitz A. Ueber die angenäherte Darstellung der Irrationalzahlen durch rationale Brüche (On the approximate representation of irrational numbers by rational fractions). Mathematische Annalen. 1891;39(2):279–284.
- 20. Perron O. Die Lehre von den Kettenbrüchen [The Theory of Continued Fractions] (in German), Chapter 2. Leipzig: B. G. Teubner; 1913.
- 21. Oxtoby JC. Measure and Category. New York-Berlin: Springer-Verlag; 1980.
- 22. Broderick R, Fishman L, Kleinbock D, Reich A, Weiss B. The set of badly approximable vectors is strongly C1 incompressible. Mathematical Proceedings of the Cambridge Philosophical Society. 2012;153(2):319–339.
- 23. Brun VH, Solstad T, Kjelstrup KB, Fyhn M, Witter MP, Moser EI, et al. Progressive increase in grid scale from dorsal to ventral medial entorhinal cortex. Hippocampus. 2008;18(12):1200–12. pmid:19021257
- 24. Stensola T, Stensola H, Moser MB, Moser EI. Shearing-induced asymmetry in entorhinal grid cells. Nature. 2015;518(7538):207–12. pmid:25673414
- 25. Taylor KD. Range of Movement and Activity of Common Rats (Rattus norvegicus) on Agricultural Land. Journal of Applied Ecology. 1978;15(3):663–677.
- 26. Tsoar A, Nathan R, Bartan Y, Vyssotski A, Dell’Omo G, Ulanovsky N. Large-scale navigational map in a mammal. Proc Natl Acad Sci U S A. 2011;108(37):E718–24. pmid:21844350
- 27. Geva-Sagiv M, Las L, Yovel Y, Ulanovsky N. Spatial cognition in bats and rats: from sensory acquisition to multiscale maps and navigation. Nat Rev Neurosci. 2015;16(2):94–108. pmid:25601780
- 28. Mathis A, Stemmler MB, Herz AV. Probable nature of higher-dimensional symmetries underlying mammalian grid-cell activity patterns. Elife. 2015;4. pmid:25910055
- 29. Krupic J, Bauza M, Burton S, Barry C, O’Keefe J. Grid cell symmetry is shaped by environmental geometry. Nature. 2015;518(7538):232–5. pmid:25673417
- 30. Stella F, Si B, Kropff E, Treves A. Grid cells on the ball. Journal of Statistical Mechanics: Theory and Experiment. 2013;2013(03):P03013.
- 31. Urdapilleta E, Troiani F, Stella F, Treves A. Can rodents conceive hyperbolic spaces? J R Soc Interface. 2015;12(107).
- 32. McNaughton BL, Battaglia FP, Jensen O, Moser EI, Moser MB. Path integration and the neural basis of the "cognitive map". Nature Reviews Neurosci. 2006;7(8):663–678.
- 33. Kropff E, Treves A. The emergence of grid cells: Intelligent design or just adaptation? Hippocampus. 2008;18(12):1256–69. pmid:19021261
- 34. Brown EN, Frank LM, Tang D, Quirk MC, Wilson MA. A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. J Neurosci. 1998;18(18):7411–25. pmid:9736661
- 35. Samu D, Eros P, Ujfalussy B, Kiss T. Robust path integration in the entorhinal grid cell system with hippocampal feed-back. Biol Cybern. 2009;101(1):19–34. pmid:19381679
- 36. Burak Y, Fiete IR. Fundamental limits on persistent activity in networks of noisy neurons. Proc Natl Acad Sci U S A. 2012;109(43):17645–50. pmid:23047704
- 37. Issa JB, Zhang K. Universal conditions for exact path integration in neural systems. Proc Natl Acad Sci U S A. 2012;109(17):6716–20. pmid:22493275
- 38. Burak Y, Fiete IR. Accurate path integration in continuous attractor network models of grid cells. PLoS Comput Biol. 2009;5(2):e1000291. pmid:19229307
- 39. Giocomo LM, Hussaini SA, Zheng F, Kandel ER, Moser MB, Moser EI. Grid cells use HCN1 channels for spatial scaling. Cell. 2011;147(5):1159–70. pmid:22100643
- 40. Urdapilleta E, Si B, Treves A. Self-organization of modular activity of grid cells. Hippocampus. 2017;27(11):1204–1213. pmid:28768062
- 41. Moser EI, Kropff E, Moser MB. Place cells, grid cells, and the brain’s spatial representation system. Annu Rev Neurosci. 2008;31:69–89. pmid:18284371
- 42. Etienne AS, Jeffery KJ. Path integration in mammals. Hippocampus. 2004;14:180–192. pmid:15098724
- 43. Ormond J, McNaughton BL. Place field expansion after focal MEC inactivations is consistent with loss of Fourier components and path integrator gain reduction. Proc Natl Acad Sci U S A. 2015;112(13):4116–21. pmid:25733884
- 44. Drmota M, Tichy RF. Sequences, Discrepancies and Applications. Lecture Notes in Mathematics, vol. 1651. Berlin: Springer-Verlag; 1997.
- 45. Cassels J. Simultaneous Diophantine Approximation II. Proc London Math Soc (3). 1955;5:435–448.
- 46. Lenstra AK, Lenstra HW, Lovász L. Factoring polynomials with rational coefficients. Mathematische Annalen. 1982;261(4):515–534.
- 47. Land AH, Doig AG. An Automatic Method of Solving Discrete Programming Problems. Econometrica. 1960;28(3):497–520.