Italian norms and naming latencies for 357 high quality color images

In the domain of cognitive studies on the lexico-semantic representational system, one of the most important means of ensuring effective experimental designs is using ecological stimulus sets accompanied by normative data on the most relevant variables affecting the processing of their items. In the context of image sets, color photographs are particularly suited to this purpose as they reduce the difficulty of visual decoding processes that may emerge with traditional image sets of line drawings. This is especially so in clinical populations. In this study we provide Italian norms for a set of 357 high quality image-items belonging to 23 semantic subcategories from the Moreno-Martínez and Montoro database. Data from several variables affecting image processing were collected from a sample of 255 Italian-speaking participants: age of acquisition, familiarity, lexical frequency, manipulability, name agreement, typicality and visual complexity. Lexical frequency data were derived from the CoLFIS corpus. Furthermore, we collected data on image oral naming latencies to explore how the variance in these latencies could be explained by these critical variables. Multiple regression analyses on the naming latencies show classical psycholinguistic phenomena, such as the effects of age of acquisition and name agreement. In addition, manipulability was also a significant predictor. The described Italian normative data and naming latencies are available for download as supplementary material.


Introduction
Object naming is perhaps the most widely exploited task for studying lexical access during speech production. Decades of research using this paradigm have allowed researchers to identify some of the variables that influence the speed and the accuracy with which words are retrieved from the mental lexicon. It is undeniable that this advance in psycholinguistic knowledge of speech production processes has been closely linked to the appearance of standardized sets of stimuli to be named.
In this respect, one of the first and most influential normative data sets is the battery of Snodgrass and Vanderwart [1]. This set consists of 260 black and white line drawings containing values for four relevant variables affecting cognitive processing during object naming: familiarity, image agreement, name agreement and visual complexity (for a color version of the battery see [2]). Normative data studies have recently started to use a more ecologically valid type of stimuli, where line-drawings are being replaced by photographs. Under the assumption that photographs provide more surface and texture details than line-drawings, it has been hypothesized that photographs would accelerate visual processing and that this, in turn, could accelerate the lexicalization process. Congruent with this hypothesis, Salmon, Mateshon and McMullen [3] showed, for instance, that photographs of tools are named faster than their corresponding line-drawings. Salmon and colleagues interpreted these results as congruent with the notion that the automatic activation of the motor cortex areas associated with the use of tools (e.g., [4]) is facilitated by photographic stimuli in comparison to line-drawing stimuli. These results indicate the importance of controlling for visual features of the items to be named, at least for this specific semantic category. At the same time, visual features are likely to play an important role in other semantic categories beyond that of tools [5]. Perceptual characteristics of items also influence other cognitive domains besides word production. For instance, recent memory studies reported that the perceptual characteristics of the items to be learned constitute a cue that impacts memory predictions for those items. Specifically, items that are easier to perceive during encoding generate higher judgments of learning (a.k.a. JOLs), despite the fact that ease of perception does not generally influence subsequent recall performance. Such a phenomenon is observed with word stimuli [6] as well as with picture stimuli [7,8].
In addition, color is an essential attribute of objects and therefore provides a more realistic representation. The greater richness provided by color photographs compared to black and white photographs has been shown to improve object perception (e.g., [9,10]), although it does not seem to ameliorate semantic processing [11]. Consequently, normative studies have started to use more ecological color photographs instead of black and white stimuli (e.g., [12][13][14]).
Normative data studies have not only been concerned with the representation modality of the stimuli (black and white line-drawings or color photographs), but also with the number of critical variables included in these studies as predictors of lexical access during speech production. Apart from the four variables presented in the original study by Snodgrass and Vanderwart [1], compelling evidence shows that other variables affect the lexicalization process, such as the age at which a word is first learned. Specifically, early acquired words tend to be named faster and more accurately than late acquired words. This phenomenon, known as the age of acquisition (AoA) effect, is not exclusive to object naming but is widespread across several lexico-semantic processes such as semantic categorization [15], reading [16], or the probability of retrieving a word from the mental lexicon during language production [17] (for a review, see [18]). Another variable determining the speed and accuracy of word retrieval is word frequency. In object naming tasks, high frequency words are named faster and more accurately than low frequency words [19,20]. Critically, such an advantage is absent when the task does not require lexical access, as for instance when participants are asked to indicate whether previously presented words denote the objects depicted in target pictures (i.e., old/new decision task; [21]). This evidence suggests that the phenomenon is mainly ascribed to a lexico-phonological level of processing [22]. However, it is debatable whether word frequency is still a reliable predictor of naming latencies once AoA is taken into account [23,24]. Indeed, a recent Bayesian meta-analysis indicates that the influence of word frequency in picture naming latencies is less relevant than traditionally thought [25]. A third critical predictor for object naming is manipulability. Broadly speaking, manipulability refers to the possibility of manually interacting with a specific object. It has been recently reported that items with a high level of manipulability are named faster than items with a low level of manipulability [3,26]. Although this variable remains vaguely defined (for discussion, see [13]) and it is still unclear at which level of processing the advantage takes place (for discussion, see [26]), the phenomenon has been replicated in different languages.
In sum, since the original study of Snodgrass and Vanderwart [1] researchers have focused on more ecological stimuli such as color photographs and, at the same time, have identified a number of variables that crucially affect performance in lexical tasks. The objective of the present study is to offer researchers working with Italian-speaking participants a standardized set of 357 high quality color photographs belonging to a large number of subcategories, together with norms for eight variables affecting image processing: AoA, familiarity, lexical frequency, manipulability, two name agreement measures (see below), typicality and visual complexity. To this end, we standardized in the Italian language the set of images provided by Moreno-Martínez and Montoro [27]. In addition to the cross-linguistic validation, we conducted an oral naming study in order to identify the most relevant predictors of naming latencies for the set of images.

Participants
A total of 255 healthy Italian native speakers (198 females, 57 males; mean age: 21.29; sd: ± 3.54; 238 right-handed, 17 left-handed) participated in the rating study. In addition, twenty Italian native speakers (15 females, 5 males; mean age: 20.6; sd: ± 1.39; 19 right-handed, 1 left-handed) took part in the oral naming study. All 275 participants provided their written consent, had normal or corrected-to-normal vision and were students at Padova University who attended a degree course in Psychology and participated to obtain university credits.
Ethical statement. The procedures were approved by the Ethical Committee for Psychological Research of the University of Padova before the study began (Protocol number: 1395-CB7EFAF01EE7652929D155AFEE6552FF; Title: Mechanisms of Word Retrieval in Spoken Language Production). Additionally, participation was voluntary and participants were advised that they were free to suspend their participation in the experiment at any time and for any reason.

Stimuli
The stimuli were the freely available set of color photographs by Moreno-Martínez and Montoro [27]. This set is composed of 360 high-quality color photographs belonging to 23 semantic subcategories. Specifically, ten subcategories were selected from the living domain: animals, birds, body parts, dried fruits, insects, flowers, fruits, sea creatures, trees, vegetables. Twelve subcategories were selected from the nonliving domain: buildings, clothing, foodstuff, furniture, jewelry, kitchen utensils, musical instruments, office material, sports/games, tools, vehicles, weapons; and finally, nonliving natural things (e.g., containing items like 'mountain', 'stone', etc.). Because three items of the original database (i.e., "zarajo" and "porra" from the foodstuff category and "churrera" from kitchen utensils) were typical items of the Spanish culture and were impossible to translate into Italian, they were not included in the Italian set of images. Therefore, we collected norms for a total of 357 items. As described by Moreno-Martínez and Montoro [27], the color photographs were taken by the authors, the images were then modified to remove their original backgrounds (except for the nonliving natural things) and placed on a plain white background. Images have a mean dimension of 265x223 pixels and, for each category susceptible to being oriented, half of the items were left-facing and the other half were right-facing. Some examples of items are presented in Fig 1. Italian lexical frequency values were retrieved from the CoLFIS database (which comprises 3,798,275 lexical occurrences; [28]).

Procedure-rating tasks
To guarantee uniformity across studies, the experimental procedure was kept as similar as possible to that used in Moreno-Martínez and Montoro's study [27]. The images were shown to a sample of 255 participants. As in the original study, the 357 images to be evaluated were divided into three lists (A, B, C), each containing a similar number of exemplars from each of the 23 subcategories and totaling 119 items. Participants were randomly assigned to one of the three lists. Specifically, 81 participants were assigned to list A (64 females and 17 males; age: 21.46±2.61), 90 participants were assigned to list B (69 females and 21 males; age: 21.47±5.24) and 84 were assigned to list C (65 females and 19 males; age: 20.94±1.84). Without any time pressure, they had to type on the computer keyboard the name of the object represented in each image. Subsequently, they were required to rate the five psycholinguistic variables included in the study: visual complexity, AoA, familiarity, manipulability and typicality. Visual complexity was always the first rating assessed, and typicality was always the last. The order of the other three rating tasks (AoA, familiarity, manipulability) was randomized across participants. Participants performed the task individually in one session lasting about 90 minutes (with self-administered rest periods), and were tested simultaneously in groups of about 30 people in the same room.
The task was preceded by a practice phase in which participants were required to typewrite the name of and subsequently rate 10 items not belonging to the main dataset for the same described variables. This practice phase was aimed at enabling participants to become familiar with the task and to develop anchor points useful for rating the subsequent stimulus material. Delivery of images and participants' responses was controlled by E-Prime 2.0 software (Psychology Software Tools, Inc., Pittsburgh, PA). Pictures were displayed on computers with Dell monitor 21.52''; participants' distance from the screen was approximately 60 cm. Each image was preceded by a 500 ms-fixation cross and remained visible for 3000 ms for the naming task or until a response was given for the rating task.
In the typewriting (written naming) task participants were asked to type the name of the object represented in each image, trying to select the most precise and specific name rather than a general name indicating the category it belonged to (e.g., "rose", instead of general names like "flower" or "plant"). Participants typed the name of each image on the computer keyboard without time pressure. They were also instructed to type the initials NC for "I don't know" (NC = "non conosco" in Italian) if they did not recognize the object of the image, to type PL for "tip of the tongue" (PL = "punta della lingua" in Italian) if they knew what the object of the image represented but were momentarily unable to remember its name, and to type NR for "don't remember" (NR = "non ricordo" in Italian) if they recognized the object of the image but did not know if there was a word to name it. Typed responses were saved by the program. Two measures of name agreement were calculated: the percentage of participants who gave the dominant name to each specific item and the H statistic. The H statistic is a logarithmic function of the number of different names that an item received and the proportion of participants giving each name [1]; it captures the dispersion of the names. It has been shown that the H statistic captures more information about the variability of names across participants than the simple percentage-of-agreement measure [1,29,30]. For example, if only one name is given to a photograph, H equals zero; if two names occur with equal frequencies, H equals 1. Thus, H increases with the number of names given for the same item, and it is higher if the alternatives have similar probabilities.
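The two name agreement measures can be illustrated with a short sketch. It assumes the standard definition from Snodgrass and Vanderwart [1], H = Σ p_i log2(1/p_i), where p_i is the proportion of participants giving the i-th name. The function names and the example responses are hypothetical, and the sketch assumes that non-naming responses (NC, PL, NR) have already been removed from the input.

```python
import math
from collections import Counter


def h_statistic(names):
    """Name-agreement entropy: H = sum_i p_i * log2(1 / p_i)."""
    counts = Counter(names)
    total = sum(counts.values())
    h = 0.0
    for count in counts.values():
        p = count / total  # proportion of participants giving this name
        h += p * math.log2(1.0 / p)
    return h


def percent_agreement(names):
    """Percentage of responses that match the dominant (most frequent) name."""
    dominant_count = Counter(names).most_common(1)[0][1]
    return 100.0 * dominant_count / len(names)


# Hypothetical typed responses for two items (illustration only).
one_name = ["rosa"] * 10                    # everyone gives the same name
two_names = ["nave"] * 5 + ["barca"] * 5    # two names, equal frequencies

print(h_statistic(one_name))         # 0.0
print(h_statistic(two_names))        # 1.0
print(percent_agreement(two_names))  # 50.0
```

The two printed H values reproduce the boundary cases described in the text: a single name gives H = 0, and two equally frequent names give H = 1.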
The rating tasks were performed by pressing the number on the keyboard that corresponded to the participant's evaluation. In line with the original study [27], in the visual complexity ratings participants were required to "rate the visual complexity of the image itself, rather than that of the object it represents", evaluating "the amount of details, intricacy of lines, pattern and quantity of colors presented in the image" using a 5-point scale (1 = very simple, 5 = very complex). For the AoA, familiarity, manipulability and typicality ratings, the image was presented together with the name of the item (i.e., the expected name). Additionally, in the typicality rating task, the category name of the item was also provided (e.g., "fruit", for the item "lemon").
For AoA participants were instructed to rate the age at which they thought they had first learned each word using a seven-point scale (1 = 0-2 years, 7 = 13 years or more). In the familiarity rating task, they were instructed to rate each item by assessing how often they thought they had come across each of them and how frequently they came into contact with the concept (both directly through real-life exemplars and in a mediated way, as represented in the media), using a 5-point scale (1 = very unfamiliar, 5 = very familiar). In the manipulability rating task, participants were instructed to rate each item by assessing "the degree to which using a human hand is necessary for this object to perform its function", using a 5-point scale (1 = never necessary, 5 = totally indispensable). The typicality test aims at measuring the degree at which a concept is a representative exemplar of its category. Participants scored how representative of its category they thought an exemplar was (e.g., "ship" for "vehicle") using a 5-point scale (1 = not at all prototypical, 5 = very prototypical). Table 1 reports summary statistics for all the variables, and Table 2 the same statistics separately for all the subcategories. Table 3 shows Pearson's correlations among the variables.

Reliability
To determine the reliability of our data, we correlated the variables among items sharing the same dominant name in the present study and other studies in the literature [27].  [14], and 80 with the Italian version of Viggiano et al. [14]. Pearson's correlations are reported in Table 4. Correlations fluctuated between .28 and .98.

Results-oral naming study
All answers classified as incorrect names (e.g., semantic superordinates such as "tool" instead of "pliers"; semantic coordinates such as "boat" instead of "ship"), missing responses and verbal dysfluencies were excluded from the analysis. Following those criteria, a total of 30.2% of the data were excluded (15.2% of incorrect responses; 15% of no-responses and 0.03% of voice key problems). Fourteen items did not elicit correct responses (i.e., all responses were incorrect, missing responses or verbal dysfluencies) and were excluded from the analyses. Correct responses had a mean latency of 1203 ms with a standard deviation of 361 ms. Preliminary analyses were performed to check whether phonemic properties of the word-initial phonemes affected voice key activation differently, influencing response latencies [37]. Following previous research in Italian [38], we divided the items into five phonetic categories. ANOVA results did not show any significant effect of the word-initial phoneme on naming latencies (p = .255). Three different kinds of analysis were performed on naming latencies. In a first descriptive level of analysis (Analysis Type-1), we computed correlations between naming latencies and the eight variables of the normative study.
In a second type of analysis (Analysis Type-2), we assessed how much of the variance in the naming latencies was explained by each of the above variables. Analysis was performed on the average of the naming latencies of each stimulus. Given the high level of correlation among some of the variables, we adopted the following approach in order to avoid problems of collinearity. In a first step, we assessed the correlations among the variables through a hierarchical clustering analysis using the varclus function of the "Hmisc" package [39] with the R statistical software [40]. This allowed us to identify clusters of variables (i.e., variables with a Spearman similarity coefficient > .35). This kind of analysis separates variables into clusters that can be scored as a single variable, thus resulting in data reduction. In a second step, in order to select the most important variable within each cluster, we performed likelihood ratio tests among the models that each contained one of the variables identified in that cluster. For model comparison we took into consideration the Bayesian information criterion (BIC) [41] using the compareLM function of the "rcompanion" package [42] with R. Once the most important variable for each cluster was selected, in a third and final step, we conducted a multiple regression analysis in order to explore how much of the variance in the naming latencies was explained by the selected variables. This second kind of analysis was performed on a subset of items which, on the normative typewriting task, received a name agreement value equal to or above 50% (i.e., items that elicited the expected name from at least half of participants). This criterion was selected in order to exclude spurious influences, such as poor visual structural descriptions of the photos or the impact of idiosyncratic linguistic characteristics of the target words in Italian ([43]; for an example of the influence of name agreement on picture naming latencies see [44]). Following this criterion, the analysis was performed on 196 items (see S1 Text for a further multiple regression analysis including all the variables).
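The second step, choosing one variable per cluster, can be sketched as a BIC comparison between single-predictor models. The sketch below is a hand-rolled stand-in, not the compareLM routine used in the study: it fits each model by ordinary least squares and computes the BIC from the Gaussian log-likelihood. All function names and data values are invented for illustration and are not taken from the norms.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def gaussian_bic(rss, n, n_params):
    """BIC = k * ln(n) - 2 * logLik, with logLik evaluated at the ML variance rss / n."""
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return n_params * math.log(n) - 2 * log_lik


# Toy item-level data (hypothetical values, illustration only).
latency = [820, 905, 1010, 1150, 1290, 1400]   # mean naming latency in ms
h_stat = [0.0, 0.3, 0.6, 1.0, 1.5, 1.8]        # H statistic
agreement = [100, 95, 80, 85, 60, 65]          # percentage name agreement

n = len(latency)
k = 3  # intercept, slope, residual variance
bic_h = gaussian_bic(fit_simple_ols(h_stat, latency), n, k)
bic_agreement = gaussian_bic(fit_simple_ols(agreement, latency), n, k)

# The model with the lower BIC is preferred within the cluster.
preferred = "H statistic" if bic_h < bic_agreement else "agreement"
print(preferred)
```

Because both candidate models have the same number of parameters, the BIC ranking here reduces to a ranking by residual variance; the penalty term only matters when models of different size are compared.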
In a third type of analysis (Analysis Type-3), the influence of the variables was explored in those experimental trials in which participants used the expected name to denominate the photo stimuli. That is, correct alternative responses were not considered in the analysis, for instance responses that contained a detailed description of the photo (e.g., "Indian elephant" instead of "elephant") or abbreviations (e.g., "auto" instead of "automobile"; for a similar procedure see [45]). In this manner, we ensured that the analysis was performed on those oral responses that were identical to the name used in the normative study. Collinearity was reduced following steps 1 and 2 of Analysis Type-2. Naming latencies were analyzed using a mixed-effects regression model fitted at the single trial level, which allowed us to test the influence of the variables considering both by-participant and by-item variability [46]. In addition, this approach allowed us to exclude from the analysis each single response that did not match the expected name. Analyses were performed on 3979 data points. As the data were not normally distributed, we used the Box-Cox test [47], with the function boxcox in the package "MASS" [48], to estimate the most appropriate transformation for the data to reduce skewness and approximate a normal distribution. The test indicated that the reciprocal transformation was the most appropriate transformation (we used -1000/RT to facilitate reading of the results). Latencies of correct responses were analyzed with linear mixed models (LMM) using the package "lme4" [49]. Analyses were performed with the R statistical software [40] (see S1 Text for a further mixed effects regression model including all the variables).
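The reciprocal transformation can be illustrated with a minimal sketch. It assumes the usual Box-Cox family, (y^λ - 1)/λ for λ ≠ 0 and ln(y) for λ = 0, in which λ = -1 corresponds to the reciprocal; the -1000/RT form is a linear rescaling of that case which preserves the order of the latencies. The function names and the example latencies are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def transform_rt(rt_ms):
    """Reciprocal transform used for readability: -1000 / RT (RT in ms)."""
    if rt_ms <= 0:
        raise ValueError("latency must be positive")
    return -1000.0 / rt_ms


# Hypothetical latencies in ms (illustration only).
latencies = [600.0, 1000.0, 1200.0, 3000.0]
transformed = [transform_rt(rt) for rt in latencies]

# Slower responses still receive larger values, so effect directions are unchanged,
# while the long right tail is compressed (3000 ms ends up close to 1200 ms).
for rt, value in zip(latencies, transformed):
    print(rt, round(value, 3))
```

The negative sign is what keeps the direction of effects intuitive: a positive coefficient on the transformed scale still means slower naming.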

Analysis Type-1
As can be seen in Table 5, naming latencies correlated positively with H statistic, AoA and visual complexity and negatively with name agreement, lexical frequency, familiarity and typicality. No significant correlation was obtained between naming latencies and manipulability.

Analysis Type-2
Fig 2 shows the hierarchical clustering structure among the variables. Two clusters of highly correlated variables emerged. Agreement and H statistic formed a cluster and typicality, AoA and familiarity formed another. In the first cluster, the likelihood ratio test indicated that, compared to agreement, H statistic produced a significant increase in the explained variance (χ² = 27.75, p < .001). In the second cluster, the likelihood ratio tests indicated that AoA produced a significant increase in the explained variance compared to typicality (χ² = 58.47, p < .001) and familiarity (χ² = 63.41, p < .001). Thus, the multiple regression analysis was conducted with five variables: H statistic, AoA, frequency, manipulability and visual complexity. Partial effects of the model are illustrated in Fig 3. As can be seen in Table 6, H statistic, AoA and manipulability were significant predictors of naming latencies (R² = 0.611). Specifically, faster naming latencies were obtained for items with lower H statistic values, acquired early in life and with higher manipulability rating. No effects of visual complexity and lexical frequency were obtained. Tolerance statistics were all above .5 and the average of the variance inflation factor (VIF) was 1.38, suggesting that collinearity was not a problem of the regression model [34].
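The collinearity diagnostics reported above are related by simple arithmetic, sketched here under the standard definitions: for predictor j, tolerance = 1 - R²_j and VIF = 1 / tolerance, where R²_j is obtained by regressing predictor j on the remaining predictors. The function names and the R² values are hypothetical; they are not the values from the study.

```python
def tolerance_from_r2(r2):
    """Tolerance of a predictor given the R^2 of regressing it on the other predictors."""
    return 1.0 - r2


def vif_from_r2(r2):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1.0 / tolerance_from_r2(r2)


# Hypothetical R^2 values for five predictors (illustration only).
r2_values = [0.10, 0.25, 0.30, 0.40, 0.20]
vifs = [vif_from_r2(r2) for r2 in r2_values]
mean_vif = sum(vifs) / len(vifs)

# A tolerance above .5 is equivalent to a VIF below 2.
all_below_two = all(v < 2.0 for v in vifs)
print(all_below_two, round(mean_vif, 2))
```

Read this way, the statement that all tolerance values exceeded .5 implies that no single predictor had a VIF of 2 or more.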

Discussion
The present study provides Italian norms for 357 high quality color photographs from the set of Moreno-Martínez and Montoro [27]. Several psycholinguistic variables that have been shown to affect latency and accuracy during object naming are included: agreement, H statistic, lexical frequency, age of acquisition, visual complexity, familiarity, manipulability and typicality. As reported in other normative studies (e.g., [50,51]), the variables are highly correlated. It is noteworthy that the correlation pattern we obtained matches to a large extent the one reported in Moreno-Martínez and Montoro's original study in Spanish.
A second aim of this study, and its most original contribution beyond the Italian validation of Moreno-Martínez and Montoro's database, was to explore how much of the variance in the latencies of the oral naming task could be explained by the above-mentioned crucial variables. A first descriptive analysis showed that all the variables, except manipulability, correlated with naming latencies. In a second analysis, in order to reduce multicollinearity problems, we separated the variables into clusters through a hierarchical cluster analysis and then we selected the most relevant variable for each cluster. The regression analysis using the five remaining variables as predictors showed a significant effect of H statistic, AoA and manipulability. In particular, the items that tended to elicit a similar name from participants in the typewriting naming normative study (i.e., lower H index) were named faster (e.g., [44]). At the same time, items acquired early in life were named faster than items acquired late in life, replicating the well-known effect of age of acquisition [18]. In addition, manipulability also modulated naming latencies once the control variables of name agreement (i.e., H statistic), age of acquisition and visual complexity were taken into account. Specifically, items that were rated as highly manipulable in the rating study were named faster in the oral naming task. This result suggests that manipulability is a critical variable affecting speech production in picture naming, replicating recent findings [13,26,52]. An apparently unexpected outcome is the lack of a significant effect of lexical frequency, since traditionally this variable has been demonstrated to be a very reliable predictor of naming latencies [19,22]. In order to exclude the possibility that the lack of frequency effect might be due to the specific properties of the Italian corpus we used (i.e., CoLFIS), further analyses were performed with Worldlex, a more up-to-date corpus [53]. The Worldlex corpus provides three different frequency measures based on Twitter posts, internet blogs and newspapers. Within the 357 items of our experimental set, Worldlex frequency measures were highly correlated with the frequency measures provided by the CoLFIS corpus (0.89, 0.89 and 0.91, respectively). Three new multiple regressions with Twitter, blogs and newspapers Worldlex measures were performed. Again, no significant frequency effects were found (p > .14). On the other hand, the lack of frequency effect appears congruent with recent findings in naming tasks which showed no lexical frequency effects when AoA was also included in the statistical analysis as a predictor. This suggests that AoA is a more reliable predictor of naming latencies and that it absorbs part of the effect tied to frequency [23,24,54] (for a different approach see also [17,55]). In line with that, when the variable AoA is excluded from our multiple regression model, the effect of frequency turns out to be significant (t = -4.375, p < .001), with faster naming latencies for more frequent words (for further discussion see [37,56,57]). Furthermore, the pattern of results we obtained in the naming task matches the main conclusions of a recent Bayesian meta-analysis [25]. In this meta-analysis, AoA and name agreement measures have a strong influence on naming latencies, while the influence of lexical frequency is unclear and visual complexity yields null effects.
Other studies have provided psycholinguistic indexes in Italian (e.g., [14,51]). For instance, Duñabeitia and colleagues provided a normative study with name agreement and visual complexity data for 750 color drawings in six different European languages, including Italian [58]. However, to our knowledge, our study is the first to provide eight psycholinguistic indexes in Italian for such a high number of very ecological items (i.e., 357 high quality color photographs). Examining all these variables in detail is of critical relevance in object naming research, as well as in other cognitive research domains, such as memory or object perception. Having well-controlled and ecological stimulus sets is just as important in clinical and neuropsychological domains [59], both to improve assessment procedures and to disclose which processing level can be the most impaired in patients' failures [11]. This normative study could help item selection for the design of experimental work and clinical trials.
Supporting information
S1 Table. Normative psycholinguistic ratings for each item.