Table 1.
Summary statistics for product innovators, process innovators, innovators, and firms with innovation expenditures.
Table 2.
Features related to text, meta information and network measures.
Table 3.
Descriptive statistics for selected variables.
Fig 1.
Average occurrence of different emerging technology terms on the websites of firms with and without product innovations.
For instance, the emerging technology term "virtual reality" appears on nearly 2 percent of all product innovator websites, but on only approximately 0.75 percent of all non-product innovator websites. Emerging technology terms that do not appear on firm websites are not shown. The y-axis has a scale break at 2 percent.
Table 4.
Content of the LDA topics with the strongest relationship to MIP-based innovation indicators.
Fig 2.
Differences in topic share for the 10 topics with the strongest average correlation with MIP-based innovation indicators.
For instance, LDA topic 98 has an average share of 10 percent in a document if a firm has innovation expenditures, compared with only 6 percent if it does not.
Table 5.
Results for Random Forest classification models using different feature sets and target variables.
Evaluation metrics are presented for the test sample.
Fig 3.
Feature importance values for the models using the ‘all’ feature set.
For instance, a value that is two times larger implies that the mean decrease in impurity of the related feature is twice as high. Target variables: product innovators (top left), process innovators (top right), innovators (bottom left) and firms with innovation expenditures (bottom right).
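As a reading aid for the importance values in Fig 3, the following minimal sketch shows how the Gini impurity decrease of a single tree split can be computed. In a Random Forest, such decreases are accumulated per feature and averaged over the trees. The toy labels and the two features A and B are hypothetical and are not taken from the study's data, and the sketch is not the authors' implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Gini decrease achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical toy node: 1 = innovator, 0 = non-innovator.
parent = [1, 1, 1, 1, 0, 0, 0, 0]

# A split on hypothetical feature A separates the two classes perfectly.
decrease_a = impurity_decrease(parent, [1, 1, 1, 1], [0, 0, 0, 0])

# A split on hypothetical feature B leaves both child nodes mixed.
decrease_b = impurity_decrease(parent, [1, 1, 1, 0], [1, 0, 0, 0])

# Ratio of the two impurity decreases, read in the same way as the
# relative feature importance values described in the caption.
ratio = decrease_a / decrease_b
print(decrease_a, decrease_b, ratio)
```

In this toy case, feature A removes all impurity from the node while feature B removes only part of it, so A would receive the larger importance value.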