The impact of linguistic features on CTR in Instagram ads: A study of supplement and cosmetic products

Kenjiro Inoue; Mitsuo Yoshida

doi:10.1371/journal.pone.0338313

Abstract

This study analyzes linguistic features impacting click-through rate (CTR) in Japanese Instagram ads (21,692 ads; July 2021-June 2023, Meta’s Marketing API). CTR was computed as link clicks/impressions from Meta’s Ads Manager. Using J-LIWC2015, we quantified psycholinguistic dimensions of ad captions written predominantly in Japanese. Multivariate regression models, controlling for caption length, log-transformed impressions, and product-level fixed effects, identified distinct linguistic patterns predicting CTR by product category. For supplement ads, “risk” and “discrepancy” positively impacted CTR; “motion” and “negative emotion” decreased it. For cosmetic ads, “see”, “positive emotion”, and “motion” were positive predictors, while “body” and “negative emotion” were negative predictors. These findings underscore the critical role of linguistic features in enhancing advertising impact when aligned with the psychological needs of target audiences. By leveraging these insights, marketers can develop data-driven communication strategies to optimize engagement on Instagram.

Citation: Inoue K, Yoshida M (2026) The impact of linguistic features on CTR in Instagram ads: A study of supplement and cosmetic products. PLoS One 21(4): e0338313. https://doi.org/10.1371/journal.pone.0338313

Editor: Ali Haider Mohammed, Universiti Monash Malaysia: Monash University Malaysia, MALAYSIA

Received: April 30, 2025; Accepted: November 20, 2025; Published: April 15, 2026

Copyright: © 2026 Inoue, Yoshida. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Our complete dataset has been made publicly available. You can access the data directly via the following URL/DOI: https://doi.org/10.5281/zenodo.18038706.

Funding: This work was supported by JST-Mirai Program Grant Number JPMJMI23B1, Japan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: Kenjiro Inoue is an employee of Onestar, Inc. There are no patents, products in development or marketed products associated with this research to declare. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Introduction

In the contemporary digital economy, online advertising has become a primary channel for reaching consumers, with social media platforms like Instagram commanding a significant share of the market [1]. The visual-centric nature of Instagram has led to a substantial body of research focused on the impact of image and video content on user engagement [2]. Similarly, the effectiveness of algorithmic targeting and personalization has been a major area of academic and industry focus [3–5]. However, a critical component of advertising effectiveness, the linguistic content of the ad copy itself, remains comparatively underexplored. This research gap is particularly pronounced for non-English languages and in non-Western cultural contexts, where linguistic norms and persuasive appeals may differ significantly. This study aims to bridge this gap by systematically investigating how specific linguistic features within Japanese-language Instagram ad copy impact a key performance metric: the click-through rate (CTR). We anchor our investigation in established theories of persuasion, primarily the Elaboration Likelihood Model (ELM) [6] and Framing Theory [7]. Recent work has explored ELM’s applicability in emerging contexts like virtual influencers and high/low-involvement products [8]. ELM posits that persuasion occurs via two distinct routes: a central route, characterized by careful and thoughtful consideration of the argument’s merits, and a peripheral route, which relies on heuristic cues such as source credibility or emotional appeals. Framing Theory suggests that the way information is presented—as either a potential gain or a potential loss—can significantly alter its persuasive impact. We hypothesize that the impact of these routes and frames is not universal but is contingent on the product category. Specifically, we examine two distinct but prominent product categories in online advertising: health supplements and cosmetics. 
We propose that “supplement” products, which relate to health and are often considered higher-involvement purchases, may be more susceptible to persuasion via the central route and loss-framing (e.g., avoiding a health risk). In contrast, “cosmetic” products, which are often tied to aspirational goals and aesthetics, may be more effectively promoted through the peripheral route and gain-framing (e.g., achieving a more beautiful appearance). To test these propositions, this study pursues three primary objectives:

1. To quantify the prevalence of various psycholinguistic features, as defined by the Japanese Linguistic Inquiry and Word Count (J-LIWC2015) dictionary [9], in a large corpus of Japanese Instagram ads for supplement and cosmetic products.
2. To determine which of these linguistic features significantly predict ad performance (CTR) after controlling for potential confounding factors like ad length and brand-level effects.
3. To formally test whether the impact of these linguistic features on CTR differs significantly between the two product categories.

By analyzing a large-scale, real-world dataset of 21,692 ads, this study makes a significant contribution to the literature on computational advertising and consumer psychology. It is one of the first studies to empirically link J-LIWC2015 categories to a behavioral outcome metric (CTR) in the Japanese digital advertising ecosystem. The findings provide actionable, data-driven insights for marketers seeking to optimize their ad copy and offer empirical evidence for the context-dependent nature of persuasion in online environments. The relationship between these theoretical frameworks and our linguistic variables is further illustrated in Figure 1. This conceptual model guides our investigation by positing that specific psycholinguistic features (LIWC categories) impact CTR through distinct ELM processing routes and framing orientations, depending on the product category.

Fig 1. Conceptual model of linguistic feature impact on CTR.

This diagram illustrates how specific LIWC categories are hypothesized to impact Click-Through Rate (CTR) via Elaboration Likelihood Model (ELM) routes and Framing Theory, contingent on product category.

https://doi.org/10.1371/journal.pone.0338313.g001

Background

Linguistic features in advertising

The power of language in advertising is a foundational concept in marketing, dating back to early work on the psychology of selling [10]. While modern digital advertising research has often prioritized visual elements and targeting technology, the textual component remains a crucial interface for communicating value, evoking emotion, and prompting action [11]. Recent work highlights how influencer characteristics and live content impact impulsive buying in e-commerce [12]. Prior research has shown that specific word choices can significantly impact consumer attitudes and behaviors. For example, the use of emotional language can enhance ad effectiveness by creating a positive affective response, which then transfers to the brand itself [13]. Similarly, cognitive language that prompts analytical thought can be effective for products requiring more deliberate consideration [14]. However, much of this research has been conducted in controlled experimental settings or has focused on English-language advertising, leaving a gap in our understanding of these dynamics in large-scale, real-world, non-English environments. Table 1 summarizes seminal linguistic studies on online advertising.

Table 1. Prior linguistic studies on online advertising and their relevance. This table summarizes seminal studies investigating the impact of linguistic features on online ad performance. It outlines the authors, platforms, specific linguistic variables analyzed, key findings related to CTR or other performance metrics, and their direct relevance to the current study. The table highlights diverse approaches to text analysis in digital advertising, emphasizing how various linguistic elements contribute to consumer engagement across different contexts.

https://doi.org/10.1371/journal.pone.0338313.t001

Japanese cultural context and linguistic nuances in advertising

The effectiveness of linguistic strategies in advertising is deeply intertwined with cultural context. In Japan, communication often emphasizes subtlety, indirectness, and a high-context approach, contrasting with more direct communication styles prevalent in Western cultures [15]. This manifests in advertising through the nuanced use of language and non-verbal cues. For instance, while Western ads might employ direct emotional appeals, Japanese advertising frequently leverages rhetorical expressions like metaphor and metonymy to evoke feelings and convey product value indirectly through imagery and association, particularly in beauty and health sectors [15]. These approaches align with Japanese cultural values of harmony and implicit understanding.

Furthermore, the Japanese language has a rich vocabulary of onomatopoeic and mimetic expressions (known as Embodied Emotional Expressions, EEEs), which are used intuitively to convey nuanced emotions and sensations [16]. The pervasive “kawaii” (cute) culture also serves as a distinct persuasive strategy, where adorable characters and gentle language can bypass direct argumentation to create affective appeal and build trust [17]. Such culturally specific linguistic preferences underscore the importance of employing tools like J-LIWC2015, which is optimized for Japanese text, to capture these psycholinguistic dimensions accurately. Moreover, in highly regulated sectors like health and cosmetics, Japanese advertising must navigate strict legal frameworks (e.g., the Pharmaceutical and Medical Device Act and the Act against Unjustifiable Premiums and Misleading Representations) that constrain permissible linguistic claims, especially regarding health benefits and exaggerated expressions. Understanding these cultural and regulatory nuances is crucial for developing effective advertising strategies in the Japanese market.

LIWC as a tool for psychological text analysis

To systematically analyze the psychological dimensions of language at scale, researchers have increasingly turned to computational text analysis tools. One of the most established and validated tools is the Linguistic Inquiry and Word Count (LIWC) [18,19]. LIWC is dictionary-based software that analyzes text by calculating the percentage of words that fall into psychologically meaningful categories. These categories include affective processes (e.g., positive and negative emotion), cognitive processes (e.g., insight, causation), social processes, and various content-related themes (e.g., health, money, risk). Unlike complex machine learning models that can be difficult to interpret, LIWC’s dictionary approach provides transparent and theoretically grounded metrics. It has been successfully applied across numerous domains to link language patterns to psychological states and real-world outcomes, such as predicting political elections from tweets [20] or assessing public opinion [21]. This study utilizes the officially validated Japanese version, J-LIWC2015 [9], to apply this robust methodology to Japanese advertising copy.

Persuasion theories in advertising contexts

Our analysis is guided by two complementary theories of persuasion: the Elaboration Likelihood Model (ELM) [6] and Framing Theory [7]. ELM posits that persuasion occurs via two distinct routes: a central route, characterized by careful and thoughtful consideration of the argument’s merits, and a peripheral route, which relies on heuristic cues such as source credibility or emotional appeals. Framing Theory complements ELM by focusing on how information is presented [7]. A gain-frame emphasizes the positive outcomes of taking an action (e.g., “achieve radiant skin”), while a loss-frame emphasizes the negative consequences of not taking an action (e.g., “don’t let tired skin hold you back”). Research, including Protection Motivation Theory [22], suggests that for health-related behaviors, loss-framing can be particularly effective because it activates a desire to mitigate risk. Therefore, we expect that “supplement” ads may benefit from loss-framed language (e.g., LIWC categories like “risk”). Conversely, aspirational “cosmetic” products may benefit more from gain-framed language that highlights positive outcomes (e.g., LIWC categories like “posemo” and “achieve”). By integrating these theoretical frameworks, this study moves beyond simply identifying which words work, aiming to understand why they work in different product contexts.

Methods

Data collection

The dataset comprises 21,692 Japanese Instagram ads delivered between July 2021 and June 2023. All data were collected from Meta’s Marketing API under standard developer agreements, ensuring compliance with its Terms of Service. Advertisements included were publicly accessible posts from Meta Business accounts. The anonymized dataset used in this study is available in the S1 File.

Inclusion and exclusion criteria

To ensure the relevance and consistency of our dataset, we applied specific inclusion and exclusion criteria. Data were collected from Meta’s Marketing API, a repository of advertisements run on Meta’s platforms. The collection period spanned two years, from July 1, 2021, to June 30, 2023. Our initial query targeted ads delivered to users in Japan. From this initial collection, we implemented a multi-step filtering process to construct our final analytical dataset. We first identified and isolated ads belonging to two of the most frequently advertised product categories: “supplement” and “cosmetic”. The selection of “supplement” and “cosmetic” categories was based on their high frequency within our dataset and their theoretical relevance to the Elaboration Likelihood Model and Framing Theory. These two categories also represent psychologically and commercially distinct segments—health-related versus beauty-oriented products—making them a theoretically meaningful and practically relevant comparison for testing differential persuasion mechanisms.

As shown in Table 2, these two categories accounted for the largest number of total ads and products, making them ideal for a focused comparative analysis. We focused our analysis on ads predominantly written in Japanese, ensuring their relevance to J-LIWC2015. We defined ‘predominantly written in Japanese’ as ad captions where at least 30% of the tokens (after preprocessing) were matched to the J-LIWC2015 dictionary, which is optimized for Japanese text. The 30% threshold was chosen to balance linguistic coverage and sample retention, ensuring sufficient representation of Japanese text without excessively excluding ads containing mixed or code-switched language. Beyond this dictionary-coverage criterion, we did not apply any additional token-based language filter, so as to retain the real-world linguistic diversity of Instagram ad copy. To ensure that the ad copy was substantial enough for meaningful linguistic analysis, we applied an inclusion criterion requiring a minimum caption length of 5 words. To ensure CTR reliability, ads with missing click data and those with fewer than 50 clicks were excluded from the analysis. While the exclusion of ads with fewer than 50 clicks improves the stability of CTR estimates, it may also introduce bias by disproportionately filtering out smaller advertisers, which is noted as a limitation of this dataset. We did not exclude duplicate ad creatives. This rigorous filtering process resulted in a final dataset of 21,692 unique advertisements. Each ad was assigned a unique product ID. The dataset was composed of 12,206 “supplement” ads and 9,486 “cosmetic” ads. For each ad, we retained the full text of the primary caption, along with the total number of impressions and clicks. The distribution of advertisements across all collected product categories is summarized in Table 2.
This table clearly indicates that “supplement” and “cosmetic” categories represent the largest proportion of advertisements within our collected data, justifying their selection for focused analysis in this study. It should be noted that this study focuses solely on the linguistic content of advertisements and does not include audience-level or targeting-related variables such as demographic segmentation, device type, or delivery optimization, which may also influence CTR.
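The inclusion and exclusion criteria described above can be sketched as a simple per-ad filter. This is a minimal illustration, not the authors' pipeline: the `Ad` record, the function names, and the toy dictionary are hypothetical, and only the three thresholds (5 words, 50 clicks, 30% dictionary coverage) are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

MIN_TOKENS = 5            # minimum caption length stated in the text
MIN_CLICKS = 50           # ads with fewer clicks are excluded
MIN_DICT_COVERAGE = 0.30  # share of tokens matched to the dictionary


@dataclass
class Ad:
    """Hypothetical record for one ad (field names are assumptions)."""
    caption_tokens: List[str]  # tokens after preprocessing
    impressions: int
    clicks: Optional[int]      # None models missing click data


def dictionary_coverage(tokens: List[str], dictionary: Set[str]) -> float:
    """Fraction of tokens that appear in the dictionary."""
    if not tokens:
        return 0.0
    return sum(t in dictionary for t in tokens) / len(tokens)


def keep_ad(ad: Ad, dictionary: Set[str]) -> bool:
    """Apply the click, length, and coverage criteria in turn."""
    if ad.clicks is None or ad.clicks < MIN_CLICKS:
        return False
    if len(ad.caption_tokens) < MIN_TOKENS:
        return False
    return dictionary_coverage(ad.caption_tokens, dictionary) >= MIN_DICT_COVERAGE


# Toy usage with romanized placeholder tokens (not real J-LIWC2015 entries).
toy_dictionary = {"kenko", "risuku", "hada"}
example = Ad(["kenko", "risuku", "mainichi", "sapuri", "osusume"], 10000, 80)
kept = keep_ad(example, toy_dictionary)  # coverage is 2/5 = 0.4
```

Checking clicks first means ads with missing click data are dropped before any text processing is attempted, which mirrors the order in which the criteria matter for CTR reliability.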

Table 2. Distribution of advertisements across product categories on Instagram. This table shows the total number of advertisements and the average, median, and maximum number of ads per product for each category. The data indicates that “supplement” and “cosmetic” categories account for the largest proportion of advertisements.

https://doi.org/10.1371/journal.pone.0338313.t002

Ethical considerations

This study analyzed secondary, non-identifiable advertising data obtained under a standard developer agreement with Meta. Under the authors’ institutional guidelines, research that involves no interaction with human subjects and uses only de-identified data is exempt from Institutional Review Board oversight; therefore, no ethical approval was required. All data were collected from Meta’s Marketing API. The dataset used in this study is anonymized and contains no personally identifiable information. A critical requirement for this study is compliance with data source policies. All data collection and analysis strictly adhered to Meta’s Marketing API Terms of Service and Platform Policies, ensuring the proper use and handling of publicly accessible advertising data.

Linguistic feature extraction

We conducted linguistic preprocessing and feature extraction using the J-LIWC2015 dictionary. All ad texts were tokenized using MeCab with the IPA dictionary, and the frequency of each LIWC category was computed. Stopwords and platform-specific tokens such as hashtags and URLs were excluded prior to analysis. An overview of the preprocessing and feature extraction workflow is shown in Figure 2. We used word count as a control variable to account for message length. Full preprocessing details, including normalization schemes and illustrative tokenization examples, are provided in S1 Text (see S2 File).

Fig 2. Overview of the J-LIWC2015 preprocessing pipeline for ad texts.

This flowchart illustrates the four key steps involved in preparing Instagram ad texts for J-LIWC2015 analysis: 1) Notation normalization to unify linguistic variations, 2) Morphological analysis to segment Japanese sentences into morphemes, 3) Lemmatization to convert words to their base forms for accurate dictionary matching, and 4) Lexical quantification to compute the proportion of words belonging to each LIWC category.

https://doi.org/10.1371/journal.pone.0338313.g002
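The lexical quantification step of this pipeline can be illustrated with a short sketch. It assumes the caption has already been normalized, segmented, and lemmatized, and it uses a hypothetical toy dictionary with exact-match English placeholders rather than real J-LIWC2015 entries (which may also involve stem matching that is not modeled here). It computes both the percentage score per category and a binary presence indicator of the kind used for threshold effects.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical toy dictionary: lemma -> categories it belongs to.
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "risk": {"risk"},
    "lack": {"discrepancy"},
    "should": {"discrepancy"},
    "shine": {"see", "posemo"},
    "walk": {"motion"},
}


def category_percentages(
    lemmas: List[str], dictionary: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Percentage of lemmas in the caption that fall into each category."""
    total = len(lemmas)
    if total == 0:
        return {}
    counts: Counter = Counter()
    for lemma in lemmas:
        # A lemma may belong to several categories; count it in each.
        for category in dictionary.get(lemma, ()):
            counts[category] += 1
    return {cat: 100.0 * n / total for cat, n in counts.items()}


def category_presence(
    percentages: Dict[str, float], categories: Iterable[str]
) -> Dict[str, int]:
    """Binary indicator: 1 if the category occurs at least once, else 0."""
    return {cat: int(percentages.get(cat, 0.0) > 0.0) for cat in categories}


scores = category_percentages(["risk", "lack", "daily", "supplement"], TOY_DICTIONARY)
flags = category_presence(scores, ["risk", "see"])
```

Dividing by the total number of lemmas, not only the matched ones, keeps the scores comparable across captions of different length, which is also why caption word count is still worth controlling for separately.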

Measures

The primary outcome variable was the click-through rate (CTR), a standard indicator of user engagement in digital advertising. CTR was calculated for each advertisement as (Total Clicks / Total Impressions) × 100. CTR was treated as a continuous variable because it represents proportional engagement rather than a binary outcome, and modeling it continuously preserves more information about the variance in advertising performance. Given the typically skewed distribution of CTR, a logarithmic transformation was applied to approximate normality for linear regression, with a small constant added to CTR before taking the logarithm so that ads with zero CTR were retained in the analysis. The main predictor variables were the J-LIWC2015 category scores, representing the percentage of caption words matching each psycholinguistic category. All LIWC scores were standardized as z-scores (mean = 0, standard deviation = 1) to enable comparability of effect sizes across categories. Two covariates were included to isolate the linguistic effects. First, caption word count was controlled for to account for variation in message length. Second, product-level fixed effects were implemented using dummy variables for each product ID, controlling for unobserved, time-invariant characteristics such as brand equity, pricing, visual design, and targeting configuration.
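The outcome and predictor transformations described here can be sketched in a few lines. The value of the small constant is not stated in the text, so `EPSILON` below is an assumed placeholder, and the use of the population standard deviation for z-scoring is likewise an assumption.

```python
import math
from statistics import mean, pstdev
from typing import List

EPSILON = 1e-6  # assumed value; the text does not state the constant used


def ctr_percent(clicks: int, impressions: int) -> float:
    """CTR as (clicks / impressions) x 100."""
    return 100.0 * clicks / impressions


def log_ctr(clicks: int, impressions: int, epsilon: float = EPSILON) -> float:
    """Log-transformed CTR; epsilon keeps zero-CTR ads finite."""
    return math.log(ctr_percent(clicks, impressions) + epsilon)


def z_scores(values: List[float]) -> List[float]:
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


example_ctr = ctr_percent(50, 10000)
standardized = z_scores([1.0, 2.0, 3.0])
```

Because every LIWC score is on the same standardized scale after this step, a regression coefficient can be read as the change in log CTR associated with a one standard deviation increase in that category's usage.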

Research design and statistical analysis

This study employs a quantitative, cross-sectional design to examine the relationship between linguistic features and advertising performance. Specifically, we address two primary research questions for Instagram advertisements within the “supplement” and “cosmetic” product categories:

RQ1: Do psycholinguistic features in Japanese Instagram ad copy predict click-through rates (CTR)?
RQ2: Does the predictive power of these psycholinguistic features differ between “supplement” and “cosmetic” product categories?

To answer these questions, we constructed a series of multivariate Ordinary Least Squares (OLS) regression models. Model 1 assessed the continuous effect of linguistic intensity by regressing log-transformed CTR on the standardized proportion of words in each J-LIWC2015 category. Separate models were estimated for each product type. The specification was:

$$y_i = \beta_0 + \sum_{k} \beta_k \,\mathrm{LIWC}_{ik} + \gamma \,\mathrm{WordCount}_i + \alpha_{p(i)} + \varepsilon_i$$

where $y_i$ is the log-transformed CTR, $i$ indexes the advertisement, $k$ indexes LIWC categories, $\alpha_{p(i)}$ denotes product-level fixed effects, and $\varepsilon_i$ is the error term. Model 2 focused on threshold effects, using binary indicators of LIWC category presence as predictors, while retaining the same structure as Model 1. To investigate RQ2 directly, we also estimated models using the full dataset (N = 21,692) that included interaction terms between each LIWC feature and a product category dummy (1 = “cosmetic”, 0 = “supplement”):

$$y_i = \beta_0 + \sum_{k} \beta_k \,\mathrm{LIWC}_{ik} + \sum_{k} \delta_k \left(\mathrm{LIWC}_{ik} \times \mathrm{Cosmetic}_i\right) + \gamma \,\mathrm{WordCount}_i + \alpha_{p(i)} + \varepsilon_i$$

A significant interaction coefficient ($\delta_k$) indicates differential effects of linguistic features by category.
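To make the role of product-level fixed effects concrete, the sketch below estimates a slope after demeaning both the predictor and the outcome within each product. For a single predictor this within-product estimator gives the same slope as OLS with one dummy variable per product. It is a deliberately simplified illustration with one predictor and made-up numbers, not the multi-predictor models estimated in the study; all names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def demean_by_group(values: Sequence[float], groups: Sequence[Hashable]) -> List[float]:
    """Subtract each group's mean from its members."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(
    x: Sequence[float], y: Sequence[float], groups: Sequence[Hashable]
) -> float:
    """Slope of y on x using only within-group (within-product) variation."""
    x_t = demean_by_group(x, groups)
    y_t = demean_by_group(y, groups)
    sxx = sum(a * a for a in x_t)
    if sxx == 0:
        raise ValueError("no within-group variation in x")
    return sum(a * b for a, b in zip(x_t, y_t)) / sxx


# Made-up data: two products with very different baselines, same slope of 2.
products = ["A", "A", "A", "B", "B", "B"]
liwc_score = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
log_ctr_values = [10.0, 12.0, 14.0, -5.0, -3.0, -1.0]
slope = within_slope(liwc_score, log_ctr_values, products)
```

The two products differ by 15 units in baseline outcome, yet the estimated slope is unaffected, which is the sense in which fixed effects absorb stable product characteristics such as brand equity or pricing.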

To assess multicollinearity, we calculated Variance Inflation Factors (VIFs) for all predictors. To avoid redundancy from hierarchical LIWC categories, we excluded parent-level constructs and retained only subcategories. Maximum VIFs were 2.78 for Models 1 and 2 and 4.12 for interaction models, all below the threshold of 5. Robustness was verified through additional logistic regressions using a dichotomized CTR: ads in the top quartile of CTR (above the 75th percentile; CTR > 0.985%) were coded as 1 (“high CTR”), others as 0 (“low CTR”). This analysis serves as a robustness check, confirming that the key findings are not dependent on the choice of the regression model. Results largely confirmed the direction and statistical significance of the coefficients for the primary linguistic predictors observed in the OLS models. For supplements, “risk” and “discrepancy” retained positive significance; for cosmetics, “see”, “positive emotion”, and “motion” were positive predictors, while “body” and “negative emotion” had negative effects. These results, including coefficients and p-values, are detailed in S1 Appendix (Tables 5 and 6). Effects remained stable after adjusting for word count and log-impressions. Finally, to correct for multiple testing across LIWC categories, we applied the Benjamini-Hochberg procedure with a 5% false discovery rate. This adjustment mitigates the risk of Type I errors and improves the reliability of the reported findings.
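The Benjamini-Hochberg correction mentioned above can be sketched as follows. The function converts raw p-values into adjusted q-values and flags those at or below the chosen false discovery rate. The p-values in the usage lines are made up for illustration and are not results from this study.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(
    p_values: Sequence[float], fdr: float = 0.05
) -> Tuple[List[float], List[bool]]:
    """Return BH-adjusted q-values and rejection flags, in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    rejected = [q <= fdr for q in q_values]
    return q_values, rejected


# Made-up p-values for four hypothetical category coefficients.
q, significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this toy input, two raw p-values below 0.05 (0.04 and 0.03) are no longer significant after adjustment, which illustrates how the procedure guards against false positives when many LIWC categories are tested at once.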

Results

Descriptive statistics

A descriptive analysis of the J-LIWC2015 categories showed notable differences in the language typically used for each product type (Table 3; see also Figure 3). Supplement advertisements, on average, contained a higher proportion of words related to “Drives” (e.g., achievement, reward), “Relativity” (e.g., time, space), and, most notably, “Risk”. In contrast, cosmetic advertisements featured a significantly higher prevalence of words related to “Affective Processes” (especially “Positive Emotion”) and “Perceptual Processes” (especially “See”), indicating a greater focus on emotional appeal and visual outcomes.

Table 3. Descriptive statistics of key LIWC categories. This table presents the mean and standard deviation of the percentage of words per ad for key J-LIWC2015 categories, separated by product category (Supplement and Cosmetic).

https://doi.org/10.1371/journal.pone.0338313.t003

Fig 3. Violin plots showing the distribution of CTR for “supplement” and “cosmetic” categories.

The figure demonstrates that “supplement” advertisements exhibit broader variability in CTR compared to “cosmetic” advertisements.

https://doi.org/10.1371/journal.pone.0338313.g003

RQ1: Linguistic predictors of CTR by category

The regression analyses identified several significant linguistic predictors of CTR for each product category, after controlling for word count and product-level fixed effects (Table 4 and Figure 4). For supplement advertisements (Model 1), the strongest positive predictor of CTR was the “Risk” category (), indicating that ad copy emphasizing health risks or concerns was highly effective at driving engagement. The “Discrepancy” category, which includes words that highlight a gap between a current and desired state (e.g., “lack”, “should”), was also a significant positive predictor (). Conversely, words related to “Motion” () and “Negative Emotion” () were negatively associated with CTR.

Table 4. OLS regression results: Linguistic features impacting log-transformed CTR. This table presents standardized coefficients (β) from three OLS regression models. Model 1 (Supplement) and Model 2 (Cosmetic) show category-specific effects. The pooled interaction model identifies significant differences in linguistic effects by product category. All models control for word count and product-level fixed effects. Significance (Benjamini-Hochberg FDR corrected q-values) is indicated. Standard errors are in parentheses for the pooled interaction model. Full model fit statistics (R², Adj R², AIC, BIC) are provided, including null model benchmarks for comparison.

https://doi.org/10.1371/journal.pone.0338313.t004

Fig 4. Standardized coefficients (Beta) of key linguistic features on CTR.

This forest plot displays the standardized regression coefficients (β) and 95% confidence intervals for the effect of each J-LIWC2015 category on log-transformed CTR, separated by product category (Supplement vs. Cosmetic).

https://doi.org/10.1371/journal.pone.0338313.g004

For cosmetic advertisements (Model 2), a different set of features emerged as dominant. The strongest positive predictor was the “See” category (), which includes words related to visual appearance (e.g., “shine”, “bright”, “flawless”), strongly supporting the idea that visual and aesthetic benefits are key drivers of engagement. The “Positive Emotion” category () also showed a statistically significant positive association. Interestingly, and in direct contrast to supplements, “Motion”-related words were positively associated with CTR in this context (). Language that directly referenced the “Body” () was negatively associated with CTR. The “Negative Emotion” category () was also negatively associated with CTR. These results provide a direct answer to RQ1, demonstrating that specific psycholinguistic features in Japanese Instagram ad copy significantly predict CTR, and these effects vary distinctly across product categories. The results highlight a distinct contrast in persuasive strategies: while supplements benefit from language addressing risks and discrepancies, cosmetics thrive on visual and emotional appeals. This stark difference underscores the category-specific nature of effective ad copy.

RQ2: Interaction between language and product category

The pooled regression model (referred to as the interaction model in Table 4) confirmed that the impacts of several key linguistic features were statistically different across the two product categories, as supported by a joint Wald test (, df = 7, p < 0.001). The interaction term “Risk x Cosmetic” was significantly negative (), indicating that the positive impact of risk-related language on CTR was significantly weaker for cosmetics compared to supplements. Conversely, the interaction term “See x Cosmetic” was significantly positive (), confirming that the positive impact of visual, perceptual language was significantly stronger for cosmetics. Most strikingly, the “Motion x Cosmetic” interaction was large and highly significant (), formally demonstrating that the impact of motion-related words reverses direction between the two categories: it is detrimental for supplement ads but beneficial for cosmetic ads. The interaction term “Body x Cosmetic” was also significant and negative (), indicating that the impact of “Body” language observed in cosmetic ads was significantly weaker (or negative) when compared to supplements (where “Body” was not a strong predictor). These significant interaction effects provide strong evidence that the optimal linguistic strategy for driving user engagement is highly dependent on the product being advertised. The significant interaction effects observed in this model directly address RQ2, providing empirical evidence that the impact of linguistic features on CTR is contingent on the product category. This finding reveals a key reversal in linguistic effectiveness: language that is negatively associated with CTR for supplement ads can be highly beneficial for cosmetic ads, and vice versa. This provides strong empirical support for the theoretical proposition that optimal linguistic strategies are context-dependent.

Discussion

This study’s findings suggest that the linguistic content of Instagram advertisements is a significant predictor of user engagement, with its impact notably moderated by product category. The results indicate two potentially distinct persuasive pathways for “supplement” and “cosmetic” products, which can be interpreted through the theoretical lenses of the Elaboration Likelihood Model (ELM) and Framing Theory. Our work extends these established frameworks by providing empirical evidence from a large-scale, non-Western context, demonstrating how specific psycholinguistic features activate distinct central/peripheral processing routes and loss/gain framing orientations depending on the product category.

For supplement products, the data appear to be consistent with a persuasion model based on central-route processing and loss-framing. The prominence of “Risk” as a positive predictor of CTR aligns with Protection Motivation Theory. The “Discrepancy” category also showed a statistically significant positive association, suggesting a persuasive strategy that highlights gaps between current and ideal states. This type of messaging encourages a deliberate, cognitive evaluation of the product as a solution, characteristic of the central route to persuasion. In contrast, for cosmetic products, results suggest a more reward-seeking and aspirational persuasion model. The strong positive impact of “See” (perceptual) is consistent with an emphasis on aesthetic appeal, while “Positive Emotion” also showed a significant positive association. These findings highlight a clear divergence in persuasive appeals, with supplements leveraging cognitive, problem-solving language and cosmetics relying on visual and emotional cues to drive engagement. Language directly referencing the “Body” was negatively associated with CTR, possibly indicating that explicit mentions of physical flaws or discomfort may deter engagement. Interestingly, and reversing its effect in supplements, “Motion” words were positively associated with CTR for cosmetics; this could support an interpretation where action-oriented language frames the product as part of a dynamic, aspirational lifestyle, serving as an attractive peripheral cue. This reversal may reflect a difference in the underlying motivational orientation of consumers. For cosmetics, motion-related language may resonate with approach-oriented goals such as self-enhancement and transformation, whereas for supplements, similar expressions might imply physical strain or risk, leading to avoidance-oriented interpretations. The “Negative Emotion” category was also negatively associated with CTR, implying it can deter clicks in this context as well.

The significant interaction effects provide empirical support for the core theoretical argument of this paper: that a one-size-fits-all approach to crafting effective ad copy may be suboptimal. The findings suggest that the optimal linguistic strategy is likely contingent on the consumer’s psychological mindset, which seems to be intrinsically linked to the product category. Supplements appear to appeal to a problem-solving, risk-averse mindset, whereas cosmetics may resonate with an aspirational, reward-seeking one.
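To make the interaction logic concrete, a generic moderated regression can be written as follows. This is an illustrative sketch, not the exact specification estimated in this study: the symbols L_i (the score of ad i on one linguistic category), C_i (an indicator assumed here to be coded 1 for cosmetic and 0 for supplement ads), and all coefficient names are assumptions introduced for illustration, and the product-level fixed effects used as controls are omitted for brevity.

```latex
\[
\mathrm{CTR}_i = \beta_0 + \beta_1 L_i + \beta_2 C_i + \beta_3 (L_i \times C_i)
  + \gamma_1 \mathrm{Length}_i + \gamma_2 \log(\mathrm{Impressions}_i) + \varepsilon_i
\]
\[
\frac{\partial\, \mathrm{CTR}_i}{\partial L_i} =
\begin{cases}
\beta_1 & \text{supplement } (C_i = 0) \\
\beta_1 + \beta_3 & \text{cosmetic } (C_i = 1)
\end{cases}
\]
```

Under this sketch, a sign reversal such as the one observed for “Motion” would correspond to a negative \beta_1 together with a positive \beta_1 + \beta_3.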

From a managerial perspective, these findings offer actionable, data-driven insights that can inform ad-copy optimization for digital marketers and copywriters. To improve effectiveness, advertising language could be tailored to the psychological motivations associated with each product category. For supplement products, ad copy could be framed to present the product as a concrete solution to health-related concerns or as a tool to bridge a wellness gap. Language that evokes a sense of urgency or protection appears to be effective. For example, rather than using generic phrases such as “Feel great,” more impactful language might be “Don’t let fatigue compromise your health; reduce your risk with our daily supplement.” The observed positive effect of discrepancy-related language further supports this strategy.

For cosmetic products, marketers could benefit from prioritizing sensory and visually descriptive language, as words associated with perceptual processes (e.g., “See”) were among the strongest predictors of CTR in this category. Positive emotional language also showed a significant effect, but its impact was smaller, suggesting that it may serve a complementary role. Additionally, dynamic and action-oriented language (e.g., “Motion”) can help frame the product as part of an aspirational lifestyle. In contrast, language that directly references the physical body was negatively associated with CTR, indicating that overt references to physical flaws may deter engagement. Instead, messages should focus on positive and aspirational outcomes.

Finally, negatively valenced language should be used with caution. For supplements, narrowly framed risk language can be effective, but our findings suggest that broader expressions of negative emotion may reduce CTR. Similarly, for cosmetics, negative emotional cues appear to decrease engagement. Together, these principles suggest the value of A/B testing different linguistic frames and highlight the potential need for context-specific experimentation to optimize ad performance.
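As one way to operationalize the A/B testing suggested above, the following minimal sketch compares the CTR of two caption variants with a pooled two-proportion z-test. It is an illustration only and not part of this study's analysis: the function name, the argument names, and the example counts are hypothetical, and a real campaign test would also need to consider sample-size planning and repeated looks at the data.

```python
import math


def two_proportion_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Two-sided pooled z-test for a CTR difference between ad variants A and B.

    All inputs are hypothetical counts supplied by the caller.
    Returns (ctr_a, ctr_b, z, p_value).
    """
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    # Pooled CTR under the null hypothesis that both variants share one CTR
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    z = (ctr_b - ctr_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value


if __name__ == "__main__":
    # Made-up example: variant A uses a generic frame, variant B a risk frame
    ctr_a, ctr_b, z, p = two_proportion_z_test(100, 10_000, 150, 10_000)
    print(f"CTR A={ctr_a:.4f}, CTR B={ctr_b:.4f}, z={z:.2f}, p={p:.4f}")
```

The pooled form is appropriate when the null hypothesis is that both frames have the same underlying CTR; with small impression counts an exact test may be preferable.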

The potential power of persuasive language, particularly fear- and risk-based appeals, warrants a discussion of ethical responsibilities. While our findings indicate that risk-framing can increase engagement for supplement ads, marketers have a responsibility to avoid overstating risks or making unsubstantiated health claims, which can mislead consumers and generate undue anxiety. Regulatory bodies and platform policies already place restrictions on health-related advertising, and our findings reinforce the importance of these guidelines. We believe effective marketing should be built on transparency and accuracy, using persuasive frames to highlight genuine benefits rather than to exploit consumer vulnerabilities. In this regard, ethical advertising practice requires aligning persuasive strategies with public health interests, ensuring that data-driven insights are used to inform responsible communication rather than to manipulate consumer behavior.

Unlike prior LIWC-based advertising studies conducted primarily in English and on Western platforms, this research extends psycholinguistic text analysis to a large-scale, non-English corpus within the Japanese advertising ecosystem. In doing so, it supports the cross-cultural applicability of LIWC constructs while also revealing culturally contingent linguistic effects that have not been documented in Western contexts. Methodologically, it demonstrates the feasibility of scalable psycholinguistic analysis in a non-English context, offering a framework that can be extended to other languages and markets. Furthermore, as digital platforms increasingly rely on algorithmic targeting, our findings underscore the importance of consumer protection and ethical data use in optimizing persuasive communication.

In sum, this study provides large-scale empirical evidence that bridges computational psycholinguistics with digital marketing strategy, demonstrating that effective ad language depends on both psychological appeal and product category. By applying J-LIWC2015 to a corpus of over 21,000 Japanese ads, our work offers a scalable and theoretically grounded methodology for optimizing ad copy in diverse cultural contexts.

Limitations and future research

This study is subject to several limitations that also present opportunities for future research. The most significant limitation is that our analysis relied solely on J-LIWC2015, which quantifies psycholinguistic dimensions based on tokenized words. As a dictionary-based approach, it can miss contextual nuance (e.g., irony, sarcasm), which may attenuate or obscure certain effects. This lexical scope means our models inherently exclude persuasive cues conveyed through multimodal signals, such as emojis and images, as well as instances of English code-switching within Japanese captions. Future studies should integrate computer-vision features or cross-lingual embeddings to enrich the understanding of multimodal persuasion. Another key limitation is that our current analysis does not account for specific audience characteristics, such as age, gender, or interests. Future studies would benefit from incorporating audience-level data to test for interaction effects between linguistic features and user profiles. Lastly, the cross-sectional nature of our data limits our ability to infer causality. Longitudinal studies or controlled experiments would be valuable for establishing causal relationships between linguistic features and advertising effectiveness over time.
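The dictionary-based limitation described above can be illustrated with a minimal sketch. It assumes pre-tokenized input and uses a hypothetical toy dictionary with English words; it does not reproduce the J-LIWC2015 lexicon or the Japanese morphological analysis used in this study. Each category score is the percentage of tokens that match that category's word list.

```python
from collections import Counter

# Hypothetical toy dictionary for illustration; NOT the J-LIWC2015 lexicon.
TOY_DICTIONARY = {
    "risk": {"risk", "danger", "protect"},
    "posemo": {"great", "love", "beautiful"},
}


def category_percentages(tokens, dictionary=TOY_DICTIONARY):
    """Return, per category, the percentage of tokens found in its word list."""
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token.lower() in words:
                counts[category] += 1
    # Guard against empty captions to avoid division by zero
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in dictionary}


if __name__ == "__main__":
    sincere = "this is a great product".split()
    sarcastic = "oh great another useless product".split()
    print(category_percentages(sincere))
    print(category_percentages(sarcastic))
```

In this toy example the sincere and the sarcastic caption receive the same positive-emotion score, because word counting alone cannot see the context in which “great” is used.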

Conclusion

This study provides large-scale empirical evidence that the linguistic features of ad copy are an important predictor of user engagement on Instagram, and that the path to persuasion is not universal but contingent on the product category. It addresses both of our research questions: for health supplements, language that addresses risk or discrepancy is most effective, whereas for cosmetics, language that appeals to visual outcomes and evokes positive emotions is associated with the highest engagement. These findings not only contribute to our theoretical understanding of persuasion in digital environments but also offer actionable guidance for marketers to create more resonant and effective advertising in a crowded digital marketplace.

Supporting information

S1 Appendix. Appendix A: Robustness analyses.

https://doi.org/10.1371/journal.pone.0338313.s001

(DOCX)

S1 File. Instagram advertising dataset.

This dataset contains anonymized caption texts, product category labels (supplement or cosmetic), impressions, clicks, and computed CTR values used in the analyses. Brand identifiers and sensitive information have been masked to comply with privacy requirements.

https://doi.org/10.1371/journal.pone.0338313.s002

(CSV)

S2 File. Text preprocessing details.

This document provides full details of the preprocessing pipeline applied to Japanese Instagram ad captions, including normalization, morphological analysis, lemmatization, removal of non-linguistic elements, and lexical quantification using J-LIWC2015. The procedures correspond to the linguistic feature extraction methods described in the main text.

https://doi.org/10.1371/journal.pone.0338313.s003

(DOCX)

References

1. Choi H, Mela CF, Balseiro SR, Leary A. Online display advertising markets: a literature review and future directions. ISR. 2020;31(2):556–75.
2. Rizomyliotis I, Zafeiriadis D, Konstantoulaki K, Giovanis A. Optimal Instagram advertising design features. A study on brand image and millennials’ purchase intention. IJIMA. 2021;15(4):394.
3. Bleier A, Eisenbeiss M. Personalized online advertising effectiveness: the interplay of what, when, and where. Mark Sci. 2015;34(5):669–88.
4. Bleier A, Eisenbeiss M. The importance of trust for personalized online advertising. J Retail. 2015;91(3):390–409.
5. Goldfarb A, Tucker CE. Privacy regulation and online advertising. Manag Sci. 2011;57(1):57–71.
6. Petty RE, Cacioppo JT. The elaboration likelihood model of persuasion. Elsevier; 1986. p. 123–205.
7. Tversky A, Kahneman D. The framing of decisions and the psychology of choice. Science. 1981;211(4481):453–8. pmid:7455683
8. Park S, Wei X, Lee H. Revisiting the elaboration likelihood model in the context of a virtual influencer: a comparison between high‐ and low‐involvement products. J Consum Behav. 2023;23(4):1638–52.
9. Sayinzoga F, Lundeen T, Musange SF, Butrick E, Nzeyimana D, Murindahabi N, et al. Assessing the impact of group antenatal care on gestational length in Rwanda: a cluster-randomized trial. PLoS One. 2021;16(2):e0246442. pmid:33529256
10. Strong EK. The psychology of selling and advertising. New York: McGraw-Hill; 1925.
11. Kotler P, Armstrong G. Principles of marketing. 17th ed. Harlow, England: Pearson Education; 2018.
12. Shao Z. How the characteristics of social media influencers and live content influence consumers’ impulsive buying in live streaming commerce? The role of congruence and attachment. JRIM. 2023;18(3):506–27.
13. Brown SP, Homer PM, Inman JJ. A meta-analysis of relationships between ad-evoked feelings and advertising responses. J Mark Res. 1998;35(1):114.
14. Morris JD, Woo C, Singh AJ. Elaboration likelihood model: a missing intrinsic emotional implication. J Target Meas Anal Mark. 2005;14(1):79–98.
15. Shen Z, Pang B, Li X, Chen Y. An exploration of Japanese cultural dynamics communication practices through social pragmatics. JPDA. 2024;3(1):60–72.
16. Murata A, Zhou Y, Watanabe J. Embodied emotional expressions for intuitive experience sampling methods: a demographic investigation with Japanese speakers. Int J Wellbeing. 2024;14(1):1–17.
17. Bîrlea OM. Soft power: ‘Cute Culture’, a persuasive strategy in Japanese advertising. Trames J Humanit Soc Sci. 2023;27(3):311.
18. Pennebaker JW, Boyd RL, Jordan K, Blackburn K. The development and psychometric properties of LIWC2015. University of Texas at Austin; 2015. https://doi.org/10.15781/T29G6Z
19. Boyd RL, Ashokkumar A, Seraj S, Pennebaker JW. The development and psychometric properties of LIWC-22. 2022.
20. Tumasjan A, Sprenger T, Sandner P, Welpe I. Predicting elections with Twitter: what 140 characters reveal about political sentiment. ICWSM. 2010;4(1):178–85.
21. Frimer JA, Aquino K, Gebauer JE, Zhu LL, Oakes H. A decline in prosocial language helps explain public disapproval of the US Congress. Proc Natl Acad Sci U S A. 2015;112(21):6591–4. pmid:25964358
22. Rogers RW. A protection motivation theory of fear appeals and attitude change. J Psychol. 1975;91(1):93–114. pmid:28136248
