Patterns of co-occurrent skills in UK job adverts

doi:10.1371/journal.pcsy.0000028

Fig 1.

Closeness centrality and betweenness centrality in the skills network.

Skills network with nodes colored by (a) closeness centrality rank and (b) betweenness centrality rank. Yellow indicates high centrality rank, while green indicates low centrality rank. (c) Scatter plot of the two centralities, showing a moderate correlation between them. Some highly mentioned skills with high (red dots) and low (green dots) centrality are indicated.
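As a rough illustration of the closeness centrality used to rank nodes in panel (a), the sketch below computes it on an unweighted graph with breadth-first search. It is a minimal sketch, not the authors' implementation: the function name `closeness_centrality`, the toy graph, and the assumption of an unweighted, connected graph are all illustrative choices, and betweenness centrality is left out for brevity.

```python
from collections import deque


def closeness_centrality(adj):
    """Closeness of each node in an unweighted, connected graph.

    adj maps each node to a list of its neighbours (hypothetical input format).
    """
    scores = {}
    for source in adj:
        # Breadth-first search gives shortest-path lengths from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        # (number of other reachable nodes) / (sum of distances to them);
        # an isolated node gets 0.0.
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical toy graph (a path a-b-c-d), not data from the paper.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = closeness_centrality(toy)
# Nodes ordered from highest to lowest closeness, as used for rank colouring.
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy path the two inner nodes rank above the two end nodes, which mirrors the idea that central skills are, on average, few steps away from all other skills.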

Fig 2.

Optimal skill clusters at different resolutions as obtained by Markov Stability (MS) on the sparsified skills graph.

MS identifies five robust graph partitions of increasing coarseness, from 189 clusters to 4 clusters, as indicated by minima of the Block NVI (points on the purple line) [36–39]. The partitions of the skills network into 21, 7 and 4 skill clusters (with nodes coloured according to their cluster) are shown at the top. The corresponding skill clusters, and their quasi-hierarchical structure, are summarised in the Sankey diagram in Fig 3. For further details on MS, see S1A Appendix.

Fig 3.

Sankey diagram capturing the multiscale structure of the skill clusters at different levels of resolution.

The optimal MS partitions into 21, 7 and 4 skill clusters (MS21, MS7, MS4) are presented together using a Sankey diagram, with summary labels obtained from the skills using Llama 2. Note that the quasi-hierarchical structure of skill co-occurrences is not imposed by the clustering method, but emerges naturally from the intrinsic co-occurrence patterns in the data, thus revealing the consistency of broader categories of skill requirements within adverts.

Fig 4.

Data-driven skill clusters.

(a) Co-occurrence skills network coloured according to the MS21 partition into 21 skill clusters. (b) Heatmap summarising four properties (rows) for each of the 21 skill clusters (columns). Each row is normalised by its maximum. (c) For each of the 21 clusters in MS21, we show: the cluster in the skills network; a word cloud with all the skills in the cluster, where font size reflects the eigenvector centrality of each skill; and the list of the top 5 most frequent skills in the cluster.
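The row normalisation mentioned for the heatmap in panel (b) can be sketched as follows. This is an illustrative example only: the function name `normalise_rows_by_max` and the numbers are hypothetical, and the values are assumed to be non-negative.

```python
def normalise_rows_by_max(matrix):
    """Divide every row by its own maximum, so each row peaks at 1.

    Assumes non-negative entries (an assumption of this sketch).
    """
    normalised = []
    for row in matrix:
        peak = max(row)
        # An all-zero row is returned as zeros to avoid dividing by zero.
        normalised.append([v / peak if peak else 0.0 for v in row])
    return normalised


# Hypothetical values: 2 properties (rows) x 3 clusters (columns).
properties = [[10.0, 40.0, 20.0], [0.2, 0.1, 0.4]]
scaled = normalise_rows_by_max(properties)
```

Scaling each row separately makes properties measured on different scales comparable across clusters within one heatmap, at the cost of hiding their absolute magnitudes.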

Fig 5.

Characterisation of the skill clusters.

Boxplots for the skill cluster distributions of: (a) closeness centrality, (b) containment, and (c) within-cluster semantic similarity. The scatter plots represent (for each cluster): (d) median closeness centrality vs. containment, (e) semantic similarity vs. containment, and (f) semantic similarity vs. closeness centrality.

Fig 6.

Coverage of the co-occurrence matrix K for the MS21 clusters.

Coverage and containment have opposite meanings: the ‘Software Development Technologies’ cluster has high self-containment (i.e., low values of its coverage), and is especially unlikely to co-occur with ‘Sales and Customer Relationship’ or ‘Hospitality and Food Industry’. The high values of the coverage for most skill clusters underscore the absence of skill silos.

From job postings to the clustering of a network of co-occurrent skills.

(a) Data preparation, including the extraction of skill co-occurrence from metadata, skill matching to the Lightcast taxonomy, and dimensionality reduction using MCA; (b) graph-based clustering, including the sparsification of the complete cosine similarity graph and multiscale clustering with Markov Stability; and (c) descriptive analysis of the optimal partition into 21 clusters using an LLM, nodal containment and closeness centrality.
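To make step (b) more concrete, the sketch below builds cosine similarities between skill vectors and then sparsifies the complete graph. The caption does not say which sparsification rule is used, so the k-nearest-neighbour rule here is a stand-in chosen for illustration, not the authors' method. The function names, the 2-dimensional vectors and the skill labels are likewise hypothetical, and the MCA step that would produce the vectors is omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_sparsify(vectors, k):
    """Keep, for each node, the edges to its k most similar nodes.

    Assumption of this sketch: a k-nearest-neighbour rule, symmetrised by
    storing each edge as an unordered pair.
    """
    names = list(vectors)
    edges = {}
    for i in names:
        sims = [(cosine_similarity(vectors[i], vectors[j]), j)
                for j in names if j != i]
        sims.sort(reverse=True)  # most similar first
        for sim, j in sims[:k]:
            edges[frozenset((i, j))] = sim
    return edges


# Hypothetical low-dimensional skill embeddings, not data from the paper.
embeddings = {
    "python": [1.0, 0.1],
    "sql": [0.9, 0.2],
    "cooking": [0.1, 1.0],
    "baking": [0.2, 0.9],
}
graph = knn_sparsify(embeddings, k=1)
```

With `k=1` only the strongest link of each skill survives, so the toy graph splits into two pairs; a larger `k` keeps more of the weaker cross-domain links that a multiscale clustering method could then exploit.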
