The Language of Innovation

doi:10.1371/journal.pone.0230107

Fig 1.

Skip-gram structure.

At each training step, a batch of random sets of technological codes is extracted from the patents of the training corpus. In each set, one code is taken out and becomes the input passed to the neural network, while the remaining codes form the target context that the network learns to predict. The embedding matrix E maps the input code to the hidden layer, and the decoding matrix D is used to calculate the probability of the context through a softmax normalization. The network is trained to maximize this probability for each input-context pair in the batch at each step, which makes the optimization stochastic. More details can be found in the Results section.
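
The forward pass described in this caption can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the code labels, the embedding dimension, the random initialisation and the choice to treat context codes as independent given the input (so that their log-probabilities add up) are all assumptions made for the example.

```python
import math
import random

def init_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix stored as a list of rows (illustrative initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def split_input_context(codes, rng):
    """Take one code out as the input; the remaining codes form the context."""
    codes = list(codes)
    i = rng.randrange(len(codes))
    return codes[i], codes[:i] + codes[i + 1:]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_log_prob(E, D, input_idx, context_idxs):
    """Log-probability of the context codes given the input code.

    E is V x d (embedding) and D is d x V (decoding). Summing the
    log-probabilities of the context codes is an assumption of this sketch.
    """
    hidden = E[input_idx]  # row lookup, equivalent to a one-hot input times E
    vocab_size = len(D[0])
    scores = [sum(hidden[k] * D[k][j] for k in range(len(hidden)))
              for j in range(vocab_size)]
    probs = softmax(scores)
    return sum(math.log(probs[j]) for j in context_idxs)

rng = random.Random(0)
vocab = ["A01B", "B60R", "C09D", "G06Q", "H01B"]  # hypothetical code labels
index = {code: i for i, code in enumerate(vocab)}
E = init_matrix(len(vocab), 3, rng)  # embedding matrix, d = 3 chosen arbitrarily
D = init_matrix(3, len(vocab), rng)  # decoding matrix

patent_codes = ["B60R", "C09D", "H01B"]  # one hypothetical set of codes
inp, ctx = split_input_context(patent_codes, rng)
log_p = context_log_prob(E, D, index[inp], [index[c] for c in ctx])
```

Training would then adjust E and D by gradient ascent on `log_p`, averaged over the input-context pairs of each batch.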

Fig 2.

Relation between CS and patenting activity: examples from three different sectors.

The top panel displays the CS for three pairs of codes. The shaded gray area represents the one-standard-deviation interval around the average CS computed over all possible pairs of codes. The bottom panel shows a typical pattern of rise and fall in the popularity of innovative pairs of codes. In both panels, the red line indicates the first year in which the two codes were used together. A strong rise in CS is a precursor of patenting activity.
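
The shaded band in the top panel can be reproduced with a short computation. The caption does not define CS, so this sketch assumes it is the cosine similarity between the embedding vectors of two codes; the toy embeddings and the names `embeddings`, `all_cs` and `band` are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumed definition of CS)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy two-dimensional embeddings for three hypothetical codes.
embeddings = {"X": [1.0, 0.0], "Y": [0.0, 1.0], "Z": [1.0, 1.0]}

# CS for every possible pair of codes.
all_cs = {
    pair: cosine_similarity(embeddings[pair[0]], embeddings[pair[1]])
    for pair in combinations(sorted(embeddings), 2)
}

# One-standard-deviation interval around the average CS (the shaded band).
avg = mean(all_cs.values())
sd = pstdev(all_cs.values())
band = (avg - sd, avg + sd)

# Pairs whose CS lies above the band in this snapshot.
above_band = [pair for pair, cs in all_cs.items() if cs > band[1]]
```

Repeating the calculation on embeddings trained in successive time windows would give one CS value per pair per window, which is the kind of time series the top panel plots.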

Fig 3.

Future co-occurrences mean value distribution.

Potential innovations are classified and ranked according to their CS. Similar pairs of codes are more likely to be patented together in the near future.

Fig 4.

Predictive power for radical innovation events, measured by the ROC AUC at two class imbalance (CI) ratios: 0.05% (left) and 0.25% (right).

The DP classifier is shown in blue, the CS classifier in green and the SS classifier in red. The classifiers are trained on 5-year windows and tested out of sample on a 5-year testing set. In the top panel, the testing set immediately follows the training set. CS performs systematically better than DP, and the SS classifier performs better than CS or DP alone, demonstrating that CS captures a semantic structure that is uncorrelated with the popularity of the codes. In the bottom panel, we focus on the embeddings learnt on the 1990-1994 training set and move the start of the testing window into the future with an increasing delay, to test the performance of the CS and DP predictors in the far future. The results show that CS performs better the further into the future it is tested, while the predictive power of DP drops.
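
For readers unfamiliar with the metric, the ROC AUC of a score-based classifier can be computed directly as the probability that a randomly chosen positive example outscores a randomly chosen negative one. The sketch below uses that pairwise definition; the scores and labels are invented for illustration and do not come from the paper.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for five candidate pairs of codes in the training
# window, and whether each pair actually co-occurred in the testing window.
scores = [0.9, 0.2, 0.4, 0.1, 0.3]
labels = [1, 0, 0, 0, 1]
auc = roc_auc(scores, labels)
```

Because the AUC depends only on the ranking of the scores, it stays meaningful when positives are as rare as in the class imbalance ratios considered here. The quadratic loop is fine for a toy example; a rank-based formula would be preferable on large test sets.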

Table 1.

Indirect measures performance.

We show the performance of the most common indirect measures in the sliding windows 1990-1999, evaluated through the ROC AUC and the best F1-score at the two class imbalance ratios discussed in Fig 4.
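
The "best F1-score" of a scoring measure is commonly taken as the maximum F1 obtained over all decision thresholds. The sketch below follows that reading, which is an assumption about how the table was produced; the data are invented.

```python
def best_f1(scores, labels):
    """Return (best F1, threshold), predicting positive when score >= threshold."""
    best, best_threshold = 0.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # F1 is zero when there are no true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best:
            best, best_threshold = f1, t
    return best, best_threshold

# Hypothetical indirect-measure scores and ground-truth labels.
scores = [0.9, 0.2, 0.4, 0.1, 0.3]
labels = [1, 0, 0, 0, 1]
f1, threshold = best_f1(scores, labels)
```

Unlike the ROC AUC, the F1-score is sensitive to the share of positives, which is one reason to report it at each class imbalance ratio separately.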

Fig 5.

Velocity field of couples of technological codes.

The left panel shows the velocity field integrated from all trajectories in the similarity-popularity plane. The right panel shows the same velocity field restricted to a positive (top) or negative (bottom) trend, for which only trajectories in a positive (top) or negative (bottom) trend have been integrated. Highlighted in the left panel are some examples of real trajectories of pairs of codes. Trajectory 1: B60R0021-C09D0007, automotive technology. Trajectory 2: B41J0002-H01C0007, typewriters. Trajectory 3: C04B0035-H01B0012, superconductors. Trajectory 4: C01G0001-H01B0012, superconductors. Trajectory 5: G06Q0020-G06Q0030, e-commerce.
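
One plausible way to build such a field, offered here as an assumption and not as the authors' procedure, is to divide the similarity-popularity plane into grid cells and average the year-to-year displacement of every trajectory step that starts in each cell. The cell size and the toy trajectories are hypothetical.

```python
import math
from collections import defaultdict

def velocity_field(trajectories, cell=0.5):
    """Average displacement per grid cell.

    Each trajectory is a list of (similarity, popularity) points, one per year.
    A step is assigned to the cell containing its starting point.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum dx, sum dy, step count]
    for traj in trajectories:
        for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
            key = (math.floor(x0 / cell), math.floor(y0 / cell))
            acc = sums[key]
            acc[0] += x1 - x0
            acc[1] += y1 - y0
            acc[2] += 1
    return {key: (sx / n, sy / n) for key, (sx, sy, n) in sums.items()}

# Two toy trajectories in the similarity-popularity plane.
trajectories = [
    [(0.1, 0.1), (0.3, 0.2), (0.6, 0.4)],
    [(0.2, 0.2), (0.2, 0.4)],
]
field = velocity_field(trajectories)
```

Restricting the input to steps with increasing or decreasing popularity before averaging would give separate fields for positive and negative trends.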

Fig 6.

Probability of not being decommissioned as a function of the starting point.

The figure shows the probability that a trajectory avoids the decommissioning region as a function of its starting area. We focus only on trajectories for which a value of CS and popularity is available every year (namely, those present in all sliding windows), to reduce the noise due to mixing trajectories that end in different years.
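
The quantity plotted can be estimated as a simple frequency: group the complete trajectories by the grid cell in which they start, then take the fraction that never enters the decommissioning region. The caption does not specify that region, so in this sketch it is supplied by the caller as a predicate; the predicate, cell size and trajectories used below are hypothetical.

```python
import math
from collections import defaultdict

def survival_by_start(trajectories, in_region, n_years, cell=0.5):
    """Fraction of trajectories per starting cell that avoid the region.

    Only trajectories with a point for every year (length n_years) are kept.
    Points after the starting point are checked against in_region.
    """
    counts = defaultdict(lambda: [0, 0])  # [survivors, total]
    for traj in trajectories:
        if len(traj) != n_years:
            continue  # incomplete trajectory, dropped to reduce noise
        x0, y0 = traj[0]
        key = (math.floor(x0 / cell), math.floor(y0 / cell))
        survived = not any(in_region(x, y) for x, y in traj[1:])
        counts[key][0] += int(survived)
        counts[key][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}

def low_corner(x, y):
    """Hypothetical decommissioning region: low similarity and low popularity."""
    return x < 0.2 and y < 0.2

trajectories = [
    [(0.6, 0.6), (0.7, 0.8), (0.8, 0.9)],    # stays away from the region
    [(0.6, 0.7), (0.3, 0.3), (0.1, 0.1)],    # ends inside the region
    [(0.3, 0.3), (0.1, 0.05), (0.1, 0.0)],   # enters the region
    [(0.9, 0.9), (0.1, 0.1)],                # incomplete, ignored
]
survival = survival_by_start(trajectories, low_corner, n_years=3)
```

With real data, cells containing few trajectories would give noisy estimates, so a minimum count per cell may be worth enforcing.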
