Figure 1.
(a) Word frequency (per million) across Dutch, English, French, German, and Spanish.
Equating corpus sizes (left) resulted in average word frequencies that were comparable across languages; size-equated corpora were thus used in all further analyses. If, instead, corpus size was defined only by a frequency threshold (right), differences in average word frequency emerged. (b) Word frequency distributions for each language, using equivalent corpus sizes.
Figure 2.
Distribution of orthographic word lengths for Dutch, English, French, German, and Spanish.
Figure 3.
Mean orthographic neighborhood sizes for words in Dutch, English, French, German, and Spanish.
Total mean neighborhood size (left group) includes single-letter substitutions (e.g., ‘log’ for ‘hog’), deletions (e.g., ‘end’ for ‘bend’) and additions (e.g., ‘hand’ for ‘and’).
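The neighbor definition above (one substitution, deletion, or addition) can be illustrated with a minimal Python sketch. The function names `is_neighbor` and `neighbors` and the toy lexicon are hypothetical, introduced only for illustration; this is not the code used to build the database.

```python
def is_neighbor(a, b):
    """Return True if b differs from a by exactly one substitution,
    deletion, or addition of a single symbol (assumed definition)."""
    if a == b:
        return False
    la, lb = len(a), len(b)
    if la == lb:
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(la - lb) != 1:
        return False
    # Deletion/addition: removing one symbol from the longer form
    # must yield the shorter form.
    shorter, longer = (a, b) if la < lb else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighbors(word, lexicon):
    """List all neighbors of `word` in a (hypothetical) lexicon."""
    return sorted(w for w in lexicon if is_neighbor(word, w))


# Toy lexicon, for illustration only.
toy_lexicon = ["log", "hog", "ho", "hogs", "dig"]
print(neighbors("hog", toy_lexicon))
```

Because the comparison uses only slicing and equality, the same sketch would also apply to phonological forms if each word were represented as a tuple of phonemes rather than a string of letters.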
Figure 4.
Distribution of orthographic neighborhood densities across Dutch, English, French, German, and Spanish (log-log scale).
Figure 5.
Average orthographic neighborhood size of words in Dutch, English, French, German, and Spanish at each word length.
Figure 6.
Average orthographic neighborhood size as a function of word frequency.
Frequency bins are evenly spaced divisions of words in 5% increments. Bin one represents the average orthographic neighborhood size of the 5% most frequent words in the language; bin twenty represents the average orthographic neighborhood size of the 5% least frequent words.
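The binning described above might be computed along the following lines. This is a sketch under stated assumptions: the function name `mean_size_by_frequency_bin` and the input layout (a list of `(frequency, neighborhood_size)` pairs) are hypothetical, and the handling of tied frequencies at bin boundaries is an arbitrary choice made here, not something the caption specifies.

```python
def mean_size_by_frequency_bin(words, n_bins=20):
    """words: list of (frequency, neighborhood_size) pairs.
    Returns the mean neighborhood size per bin, with bin 1 (index 0)
    holding the most frequent words and the last bin the least frequent."""
    # Rank from most to least frequent (ties keep their input order).
    ranked = sorted(words, key=lambda w: w[0], reverse=True)
    n = len(ranked)
    means = []
    for b in range(n_bins):
        # Integer boundaries give bins of (nearly) equal size, 5% each
        # when n_bins is 20.
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        chunk = ranked[start:end]
        if chunk:
            means.append(sum(size for _, size in chunk) / len(chunk))
        else:
            means.append(float("nan"))  # fewer words than bins
    return means


# Made-up data: 40 words whose neighborhood size grows as frequency falls.
toy = [(100 - i, i) for i in range(40)]
bin_means = mean_size_by_frequency_bin(toy)
print(bin_means[0], bin_means[-1])
```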
Figure 7.
Ratio of within-language and foreign orthographic neighbors as part of total neighborhood size for each word with at least one neighbor.
The top row compares the proportion of English within-language neighbors (blue) to foreign neighbors in each other language. The bottom row compares the proportion of within-language neighbors in each language to foreign (i.e., English) neighbors (blue).
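As a rough illustration of the quantity plotted, the within-language proportion for a single word could be computed as below. This assumes that total neighborhood size is the sum of within-language and foreign neighbors; the function name is hypothetical.

```python
def within_language_proportion(n_within, n_foreign):
    """Share of a word's neighbors that are within-language.
    Returns None for words with no neighbors, which the figure excludes."""
    total = n_within + n_foreign  # assumed definition of total size
    if total == 0:
        return None
    return n_within / total


# Made-up counts: 3 within-language neighbors and 1 foreign neighbor.
print(within_language_proportion(3, 1))
```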
Table 1.
Mean orthographic within-language neighborhood size and foreign neighborhood size.
Figure 8.
Distributions of phonological word lengths for Dutch, English, French, German, and Spanish.
Figure 9.
Mean phonological neighborhood sizes for words in Dutch, English, French, German, and Spanish.
Total mean neighborhood size (left group) includes single-phoneme substitutions (e.g., ‘show’ for ‘dough’), deletions (e.g., ‘owe’ for ‘dough’) and additions (e.g., ‘dome’ for ‘dough’).
Figure 10.
Distribution of phonological neighborhood densities across Dutch, English, French, German, and Spanish (log-log scale).
Figure 11.
Average phonological neighborhood size of words in Dutch, English, French, German, and Spanish at each word length.
Figure 12.
Average phonological neighborhood size as a function of word frequency.
Frequency bins are evenly spaced divisions of words in 5% increments. Bin one represents the average phonological neighborhood size of the 5% most frequent words in the language; bin twenty represents the average phonological neighborhood size of the 5% least frequent words.
Table 2.
Mean phonological within-language and foreign neighborhood size.
Figure 13.
Ratio of within-language and foreign phonological neighbors as part of total neighborhood size for each word.
The top row compares the proportion of English within-language neighbors (blue) to foreign neighbors in each other language. The bottom row compares the proportion of within-language neighbors in each language to foreign (i.e., English) neighbors (blue).
Figure 14.
Comparisons of orthographic and phonological word lengths for Dutch, English, French, German, and Spanish.
Figure 15.
Screenshot of the EnglishPOND portion of the CLEARPOND website, accessible at
http://clearpond.northwestern.edu. CLEARPOND provides a user-friendly, web-based interface for obtaining Dutch, English, French, German, and Spanish phonological and orthographic neighborhood densities (PONDs). Users can search for POND information in any of the five languages, either with single-word queries or by providing full lists of words. CLEARPOND provides a number of important psycholinguistic measures, such as neighborhood density and neighborhood frequency, for both within-language and foreign-language neighbors. With user-controlled output selection, researchers can choose the output parameters that are most relevant. In addition to returning data for specific words, CLEARPOND can search by features, so that researchers can generate new lists of words that meet precise criteria, such as a specific range of neighborhood sizes or lexical frequencies (as provided by the SUBTLEX databases). Multiple filters can be applied simultaneously, providing greater control over stimulus creation. Users can also export their results directly to a text file, making it easy to create downloadable documents containing the pertinent psycholinguistic measures for all of their stimuli. Beyond the web-based interface, more comprehensive lists containing all of the information provided by the database are available for download, so that the entire CLEARPOND database can be accessed offline.