Fig 1.
Distribution of the NILC by source of origin.
Distribution of the different textual genres and written materials that composed the original source of the NILC [38].
Table 1.
Numbers of word tokens, word types, and lemmas by grammatical category before and after data processing for the Brazilian Portuguese Lexicon.
Table 2.
Numbers, columns, and descriptions of the Brazilian Portuguese Lexicon.
Table 3.
Conventions used in the grammatical category and grammatical information columns of the search engines and results of the Brazilian Portuguese Lexicon.
Fig 2.
The simple and the complex input search engines of the Brazilian Portuguese Lexicon.
The simple search takes a list of words as input; the complex search lets the user specify search criteria.
Table 4.
Symbols used as wildcards in the search engines in the Brazilian Portuguese Lexicon.
Fig 3.
Results of the complex search shown in the Fig 2 example.
The top-left area presents the general search information, the top-right area provides basic statistics, and the bottom area displays the search results.
Fig 4.
General distributions of the Brazilian Portuguese Lexicon corpus.
a) number of words by grammatical category; b) number of words according to the number of letters for each grammatical category; c) log10 frequency by word rank distribution; and d) Zipf’s law (i.e., log10 frequency by log10 rank) for each grammatical category.
Fig 5.
General interactions between variables in the Brazilian Portuguese Lexicon.
a) log10 number of words by log10 orthographic neighborhood for each grammatical category; b) log10 number of words by OLD20 for each grammatical category; c) mean orthographic neighborhood by number of letters for each grammatical category; and d) mean OLD20 by the number of letters for each grammatical category.
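The two measures plotted in Fig 5 can be illustrated with a minimal sketch. It assumes the usual definitions: orthographic neighborhood as the number of same-length words that differ by exactly one letter, and OLD20 as the mean Levenshtein distance to the 20 closest words in the lexicon. The function names and the toy word list are hypothetical, and the Lexicon's own computation may differ in its details.

```python
from statistics import mean
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def orthographic_neighborhood(word: str, lexicon: Sequence[str]) -> int:
    """Count same-length words that differ from `word` in exactly one position."""
    return sum(
        1
        for other in lexicon
        if len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def old_n(word: str, lexicon: Sequence[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest other words.

    With n=20 this is the OLD20 measure; a smaller n is used below only
    because the toy lexicon (hypothetical data) is tiny.
    """
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    return mean(distances[:n])


# Hypothetical toy lexicon, not taken from the Brazilian Portuguese Lexicon.
toy_lexicon = ["casa", "cara", "cama", "capa", "caso", "carro", "gato"]
print(orthographic_neighborhood("casa", toy_lexicon))
print(old_n("casa", toy_lexicon, n=3))
```

Because OLD20 averages over insertions and deletions as well as substitutions, it still gives a graded value for long words, which often have no same-length neighbors at all.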
Table 5.
General means, with standard deviations in parentheses, by grammatical category.
Fig 6.
Correlations between the different current Brazilian Portuguese corpora.
LexPorBR: Brazilian Portuguese Lexicon; SubtlexBR: SUBTLEX-PT-BR [15]; WlBlog, WlTwitter, and WlNews: the three Worldlex (Portuguese Brazil) corpora [16]. Correlations were calculated using the Zipf scale frequency [12]. Pearson correlations are shown above the diagonal, histograms of each corpus distribution on the diagonal, and bivariate scatter plots with loess smooth fits and ellipses below the diagonal [42].
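As a rough illustration of the comparison in Fig 6, the sketch below converts raw counts to the Zipf scale and correlates two corpora over the words they share. It assumes the simplified form of the Zipf scale, log10 of the frequency per million words plus 3, without any smoothing correction. The counts, corpus sizes, and function names are hypothetical.

```python
import math
from typing import Sequence


def zipf_scale(count: int, corpus_tokens: int) -> float:
    """Simplified Zipf value: log10(frequency per million words) + 3."""
    return math.log10(count * 1_000_000 / corpus_tokens) + 3


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical raw counts and corpus sizes, for illustration only.
counts_a = {"de": 50_000, "casa": 400, "gato": 30, "zebra": 2}
counts_b = {"de": 80_000, "casa": 900, "gato": 90, "lua": 5}
tokens_a, tokens_b = 1_000_000, 2_000_000

# Only words present in both corpora can be compared.
shared = sorted(counts_a.keys() & counts_b.keys())
zipf_a = [zipf_scale(counts_a[w], tokens_a) for w in shared]
zipf_b = [zipf_scale(counts_b[w], tokens_b) for w in shared]

r = pearson(zipf_a, zipf_b)
print(shared, round(r, 3))
```

Working on the Zipf scale rather than on raw counts puts corpora of different sizes on a common logarithmic unit before they are correlated.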
Table 6.
Relative percentage (%) of word types contained in the LexPorBR, SubtlexBR [15], and Worldlex (Portuguese Brazil) [16] corpora.
Each cell gives the percentage of the word types of the left (row) corpus that are also contained in the head (column) corpus.
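One way to read the percentages in Table 6 is as a directed overlap between sets of word types. The sketch below follows that reading: the share of the left corpus's word types that also occur in the head corpus. The function name and the toy word sets are hypothetical.

```python
from typing import Set


def coverage_percent(left_types: Set[str], head_types: Set[str]) -> float:
    """Percentage of the left corpus's word types also found in the head corpus."""
    return 100 * len(left_types & head_types) / len(left_types)


# Hypothetical word-type sets, for illustration only.
left = {"casa", "gato", "lua", "zebra"}
head = {"casa", "gato", "sol"}

print(coverage_percent(left, head))  # share of `left` contained in `head`
print(coverage_percent(head, left))  # the reverse direction generally differs
```

The measure is asymmetric, so a table built this way carries a different value on each side of the diagonal.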
Table 7.
Words overestimated and underestimated by the Brazilian Portuguese Lexicon compared with the SUBTLEX-PT-BR [15] and Worldlex (Portuguese Brazil) [16] corpora.
In parentheses is the number of most frequent words examined to obtain the 10 words presented in each list; the Zipf scale range of the words found is indicated under the column heads.