Identification of pattern mining algorithm for rugby league players positional groups separation based on movement patterns

  • Victor Elijah Adeyemo,

    Roles Conceptualization, Data curation, Investigation, Methodology, Project administration, Visualization, Writing – original draft

    v.adeyemo@leedsbeckett.ac.uk

    Affiliations School of Built Environment, Engineering and Computing, Leeds Beckett University, Leeds, United Kingdom, Carnegie School of Sport, Leeds Beckett University, Leeds, United Kingdom, England Performance Unit, Rugby Football League, Manchester, United Kingdom, Leeds Rhinos Rugby League Club, Leeds, United Kingdom

  • Anna Palczewska,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation School of Built Environment, Engineering and Computing, Leeds Beckett University, Leeds, United Kingdom

  • Ben Jones,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliations Carnegie School of Sport, Leeds Beckett University, Leeds, United Kingdom, England Performance Unit, Rugby Football League, Manchester, United Kingdom, Leeds Rhinos Rugby League Club, Leeds, United Kingdom, School of Behavioural and Health Science, Faculty of Health Sciences, Australian Catholic University, Brisbane, QLD, Australia, Division of Physiological Sciences and Health through Physical Activity, Lifestyle and Sport Research Centre, Department of Human Biology, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa

  • Dan Weaving

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation Carnegie School of Sport, Leeds Beckett University, Leeds, United Kingdom

Abstract

The application of pattern mining algorithms to extract movement patterns from sports big data can improve training specificity by facilitating a more granular evaluation of movement. Since movement patterns can occur only as consecutive, non-consecutive or non-sequential groups of activities, this study aimed to identify the best set of movement patterns for player movement profiling in professional rugby league and to quantify the similarity among the distinct types of movement pattern. Three pattern mining algorithms (l-length Closed Contiguous [LCCspm], Longest Common Subsequence [LCS] and AprioriClose) were used to extract patterns profiling the match movements of elite Rugby Football League hookers (n = 22 players) and wingers (n = 28 players) across 319 matches. The Jaccard similarity score was used to quantify the similarity between the algorithms' movement patterns, and machine learning classification modelling identified which algorithm's movement patterns best separated the playing positions. LCCspm and LCS movement patterns had a Jaccard similarity score of 0.19. AprioriClose movement patterns showed negligible Jaccard similarity with LCCspm (0.008) and LCS (0.009) patterns. The closed contiguous movement patterns profiled by LCCspm best separated players into playing positions. The Multi-layered Perceptron classification algorithm achieved the highest accuracy (91.02%), with precision, recall and F1 scores of 0.91 each. Therefore, we recommend extracting closed contiguous (consecutive) rather than non-consecutive or non-sequential movement patterns for separating groups of players.

Introduction

Big data in sport are often gathered through wearable sensors such as Global Positioning Systems (GPS) [1] and as video-sourced match events coded by expert analysts [2]. Importantly, wearable and video-based forms of sport-related big data usually exist as collections of ordered sequences of match activities [3, 4]. These provide information about “when”, “who”, “what” and “where” activities occurred [3]. Wearable sensors are worn by players to record on-field activities (e.g., positional data, speeds) during training and/or competition. Data collected via wearable sensors or video-sourced match events facilitate the creation of performance indicators for understanding the physical, technical and tactical demands placed on players [5]. Performance indicators have been used to prepare athletes for the transition between levels [6], identify skills for talent development [7], support injury prevention and recovery [8], analyse opposition [3] and classify players into competition levels based on playing positions [9], and they are widely used to derive actionable insights for data-driven decisions. However, the insights provided by these performance indicators are currently based on physical, technical and tactical demands aggregated either across a whole match or over specific periods within it. This can be inadequate because each performance indicator typically accounts for a single activity or event (e.g., passes, shots, total distance covered, average match speed) without providing context on, or an explanation of, how that activity was performed.

Frequent pattern mining algorithms have been applied in sport in various contexts, such as automatic tactics detection in soccer matches [10], athlete performance monitoring [11] and discrimination between scoring and non-scoring outcomes for attacking and defending rugby union teams [12]. More recently, sequential pattern mining algorithms [13, 14] have been applied to sports data to profile players’ movements. Player movement profiling has become an active research area because it offers an alternative view of match demands through the concurrent evaluation of the speeds, changes in speed and turning angles completed by players at any point in time. It helps to identify the groups of movements that players perform frequently and to quantify how often those groups of movements were performed. Sweeting et al. [13] proposed the first framework for player movement profiling. The authors profiled players’ movements by finding frequently occurring movement sequences in the Radio Frequency (RF) data of elite international-level female netball players during four competitive matches. The method groups similar movement strings into 25 clusters using hierarchical clustering and then applies the longest common subsequence (LCS) [14] algorithm to extract the longest common movement pattern from each cluster.
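The LCS step can be illustrated with the textbook dynamic-programming recurrence for two strings. The following is a minimal Python sketch, not the SMP implementation of [15]: how the framework combines LCS results across all the strings in a cluster is not reproduced, the function name is illustrative, and the second movement string is made up for the example.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of movement strings a and b.

    Classic O(len(a) * len(b)) dynamic programme: characters may be skipped
    (omissions), but their relative order must be preserved.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forwards to recover one optimal subsequence.
    i = j = 0
    out = []
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(longest_common_subsequence("ijfeikhddb", "ijeeikddb"))  # prints ijeikddb
```

The unmatched “f” and “h” of the first string and one “e” of the second are simply skipped, which is the omission behaviour that distinguishes LCS patterns from contiguous ones.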

White et al. [15] suggested that the Sweeting et al. [13] framework is not stable because it produces different movement patterns for the same set of movement strings in consecutive runs. They addressed this by developing the Sequential Movement Pattern-mining (SMP) framework and ran stability tests of both frameworks on the same set of elite rugby league players’ movement strings; the SMP framework was the more stable of the two. Collins et al. [16] applied the SMP framework to quantify movement patterns and identify differences among three rugby league competitions (i.e., International Rugby League, Super League (semi-)Finals and Super League regular season). The study [16] reported that no movement pattern was unique to a single competition level. Linear discriminant analysis (LDA) of the decomposed extracted movement patterns (i.e., movement units) revealed that low velocities with mixed turning angles and accelerations characterized the movement units that most differentiated the competitions. Despite the robustness and stability of the SMP framework for discovering movement sequences, the total number of patterns it can extract is limited to the number of identified clusters. More importantly, only the longest common pattern per cluster is output, while other potentially interesting patterns are discarded. This may have influenced the movement patterns and units found to differentiate the competition levels in the study by Collins et al. [16].

The study of Adeyemo et al. [17] addressed these limitations of the SMP framework [15] by proposing and developing a new algorithm, the l-length closed contiguous sequential pattern mining algorithm (LCCspm). We developed the algorithm because the existing Closed Contiguous Sequential Pattern mining (CCSpan) algorithm [18] may not produce usable movement patterns in sporting contexts and did not scale well to large sets of lengthy player movement sequences. The study [17] used LCCspm to find frequently occurring closed contiguous movement patterns (with a specified maximal length) in sets of movement sequences from five England Rugby Football League (RFL) Super League team matches, and closed contiguous match-event patterns in a set of match-event sequences of players from the national teams that participated in the men’s 2018 FIFA World Cup. The experimental results demonstrated that LCCspm scaled better, ran faster and used less memory than CCSpan for mining closed contiguous patterns [17]. Moreover, LCCspm identified a large number of frequent movement patterns based on a user-defined pattern length and a user-specified support threshold.

Extracting movement patterns [15, 16] from discretized time-series physical data is important because it reveals the groups of movement activities performed by players, unlike physical, technical and tactical performance indicators, which only account for accumulations of single activities. Extracting movement patterns to quantify players’ completed movement activities provides granular information about match-based activities more easily than the laborious process of expertly coded video analysis. Extracted movement patterns provide the context lacking in the accumulated activities reported by physical, technical and tactical indicators [17]. Moreover, extracted movement patterns can enhance the specificity of training programmes [16]. However, which type of movement pattern (and hence which mining algorithm) is best for profiling rugby league players’ movements has yet to be explored.

Since the pattern mining algorithms available for player movement profiling extract either sequential (i.e., consecutive or non-consecutive) or non-sequential groups of movement activities (i.e., movement patterns), this study investigates which pattern mining algorithm provides the best type of movement pattern for profiling players’ movements by playing position, and how similar these patterns are. For instance, the LCS algorithm of the SMP framework [13, 15] identifies the longest common movement patterns, allowing performed activities to be omitted while retaining the sequential order of movement occurrences, and allowing movement activities to repeat within a pattern. In contrast, the LCCspm algorithm [17] identifies frequent, closed contiguous movement patterns of user-defined length, in which movement activities are strictly adjacent and none is omitted. Another pattern mining algorithm, AprioriClose [19], discovers frequent closed patterns (itemsets) that can serve as movement patterns. Frequent closed movement patterns allow performed activities to be omitted, do not allow repetition of movement activities within a pattern, and list the activities not in sequential order but in lexicographical order.
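The non-sequential pattern type can be made concrete with a brute-force enumeration of frequent closed itemsets. This sketch is not AprioriClose [19], which generates candidates level by level with pruning; it only reproduces the kind of output such an algorithm returns, assuming that each movement sequence is treated as a transaction (a set of movement unit characters) and that support is the fraction of sequences containing the itemset. The function name and toy sequences are hypothetical.

```python
from itertools import combinations


def frequent_closed_itemsets(sequences, min_support):
    """Brute-force frequent closed itemsets (exponential; toy inputs only).

    Each sequence becomes a transaction (set of movement-unit characters).
    Returns {itemset written in lexicographical order: supporting sequence count}.
    """
    transactions = [frozenset(seq) for seq in sequences]
    items = sorted(set().union(*transactions))
    n = len(transactions)
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(itemset <= t for t in transactions)
            if count / n >= min_support:
                support[itemset] = count
    # Closed: no proper superset has the same support. Such a superset would
    # itself be frequent, so checking only the frequent itemsets suffices.
    return {
        "".join(sorted(s)): c
        for s, c in support.items()
        if not any(s < other and c == oc for other, oc in support.items())
    }


print(frequent_closed_itemsets(["uuv", "vu", "uvij", "ji"], min_support=0.5))
# prints {'ij': 2, 'uv': 3}
```

Because each sequence is reduced to a set, “uuv” and “vu” both support the itemset “uv”, and “ji” is reported as “ij”: the loss of order and repetition described above.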

The context of separating players into playing positions was chosen because such separation is reported to help with talent identification and recruitment [20], customized training [21] and player performance profiling [22], among others. Two rugby league playing positions (i.e., hookers and wingers) were selected because of their known differences in tactical roles during matches; a previous study [23] also reported that hookers and wingers have similar average body weights but perform distinct tactical roles and different on-field activities (e.g., 15-m and 40-m sprints). Separating rugby league players into these two playing positions using various types of profiled movement patterns will identify the best type of movement pattern for profiling players by position. It will also help identify the specific movement patterns performed by players in each playing position and how often those movement patterns were performed.

Therefore, this study aimed to identify which pattern mining algorithm extracts the best type of movement pattern for profiling players into two rugby league playing positions (i.e., hookers and wingers). To achieve this aim, (1) the sets of unique movement patterns extracted by each algorithm were compared for similarity; (2) overlapping movement patterns among the pattern mining algorithms (and between playing positions) were investigated; and (3) classification modelling and evaluation were conducted to measure the extent of separation each type of movement pattern provides, and thus to identify the best pattern mining algorithm for profiling rugby league players into playing positions.

Method

Overview

An observational repeated-measures design was used in which 10 Hz GPS data were collected via wearable sensors (Catapult S5, Catapult Innovations, Melbourne, Australia) worn during matches by 50 elite male Rugby Football League players from 12 teams that participated in 319 fixtures across the 2019 and 2020 seasons. Two playing positions were selected: hookers (n = 22) and wingers (n = 28). A total of 1,036 observations (hookers = 500, wingers = 536) were included, each representing one player’s movement sequence in one fixture. The three types of obtainable movement patterns were extracted from the processed GPS data by applying three pattern mining algorithms. Five machine learning classification algorithms were implemented and evaluated on each set of patterns, and the evaluation results were used to decide which set gives the best separation of the playing positions. The movement patterns extracted by each algorithm were also analysed for similarity and overlap. The experimental framework depicted in Fig 1 gives an overview of the data collection, processing and analysis methods. This study received the approval of the University Ethics Committee, and written informed consent was obtained from the organisation representing all participants. The GPS data were obtained from clubs through the Rugby Football League and are not publicly available.

Data and processing

Sets of movement sequences were obtained following the method for generating movement sequences from GPS data published by [15]. An example of GPS data and the result of the discretization is depicted in Table 1. Micro-sensor units including GPS sampling at 10 Hz [24] captured the physical demands (i.e., acceleration and velocity) and tracking data (i.e., latitude and longitude) of 50 elite male Rugby Football League players. GPS data of players with no recorded velocity, acceleration or tracking values were excluded. Velocity, acceleration and turning angle were derived from the GPS data and then discretized using the thresholds presented in Table 2, as published by [15].
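The discretization into movement unit characters can be sketched as follows. All threshold values, the fourth velocity category (“run”) and the category ordering are hypothetical placeholders rather than the values in Table 2 or the lookup in Table 1; the ordering was chosen only so that the letters this article quotes for walk, jog and sprint units (e.g., “e” for walk neutral straight, “u” for jog acceleration straight, “V” for sprint acceleration backwards) are reproduced. The turning angle is assumed to be pre-computed in degrees, with the direction of the turn ignored.

```python
import string

import numpy as np

# HYPOTHETICAL cut-offs; the real thresholds are those in Table 2 / [15].
VELOCITY_EDGES = [2.0, 4.0, 5.5]      # m/s: walk | jog | run | sprint
ACCELERATION_EDGES = [-0.5, 0.5]      # m/s^2: deceleration | neutral | acceleration
TURN_EDGES = [30.0, 90.0, 150.0]      # degrees: straight | acute | large | backwards

# 52 single-character symbols, enough for 4 x 3 x 4 = 48 movement units.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase


def movement_units(velocity, acceleration, turn_angle):
    """Encode each 0.1 s sample as one movement-unit character."""
    v = np.digitize(velocity, VELOCITY_EDGES)            # bin 0..3
    a = np.digitize(acceleration, ACCELERATION_EDGES)    # bin 0..2
    t = np.digitize(np.abs(turn_angle), TURN_EDGES)      # bin 0..3, turn direction ignored
    codes = v * 12 + a * 4 + t                           # velocity-major ordering (assumed)
    return "".join(ALPHABET[c] for c in codes)


print(movement_units([1.0, 1.0, 3.0, 3.0, 6.0],
                     [0.0, 1.0, 1.0, -1.0, 1.0],
                     [10, 45, 10, 45, 170]))  # prints ejunV
```

Under these assumptions “n” stands for jog deceleration with acute-change of direction, and concatenating the characters sample by sample yields a movement string like the one in Table 1.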

Table 1. Example of processing GPS data into movement sequence.

https://doi.org/10.1371/journal.pone.0301608.t001

Table 2. The movement descriptors and threshold assignment values.

https://doi.org/10.1371/journal.pone.0301608.t002

Concatenating the velocity, acceleration and turning-angle descriptors created a movement unit every 0.1 s (10 Hz) (Table 1, column “MovementUnit”), and each movement unit was assigned a movement unit character (Table 1, column “A”). An example of a movement sequence from Table 1 is the sequential concatenation of the movement unit characters in column “A”, i.e., “ijfeikhddb”. Inactive periods, as proposed by [15], were filtered out of each player’s continuous movement sequence, yielding a set of discrete movement sequences for each player per match. A total of 1,036 sets of discrete movement sequences were created, one for each player per fixture (i.e., player-per-fixture granularity). Movement patterns were extracted from each set of discrete movement sequences and represent the frequently recurring movement patterns of each player within a match.
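Splitting a continuous movement string into discrete active sequences can be sketched as below. The inactivity criterion of [15] is not reproduced: the function assumes a per-sample boolean activity mask supplied by the caller, and both the function name and the minimum run length `min_length` are hypothetical.

```python
from itertools import groupby


def active_sequences(units, active, min_length=2):
    """Split a continuous movement string into discrete active sequences.

    units:  one movement-unit character per 0.1 s sample.
    active: per-sample booleans (True = active); how inactivity is defined
            in [15] is left to the caller.
    Active runs shorter than `min_length` samples are dropped (hypothetical rule).
    """
    if len(units) != len(active):
        raise ValueError("units and active must have the same length")
    sequences = []
    position = 0
    for is_active, group in groupby(active):
        run_length = len(list(group))
        if is_active and run_length >= min_length:
            sequences.append(units[position:position + run_length])
        position += run_length
    return sequences


print(active_sequences("ijfeikhddb", [True] * 3 + [False] * 2 + [True] * 4 + [False]))
# prints ['ijf', 'khdd']
```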

Pattern mining algorithms

To extract frequent movement patterns for the considered playing positions, three pattern mining algorithms were used: (1) LCS, the existing algorithm used within the SMP framework [15]; (2) LCCspm, a new algorithm that outperforms other closed contiguous pattern mining algorithms [17]; and (3) AprioriClose, a frequent closed pattern mining algorithm [19]. The LCS algorithm of the SMP framework has no parameters. The LCCspm algorithm [17] has two parameters: support (which determines the patterns’ frequency) and length (which determines the patterns’ maximum length). AprioriClose [19] has a single support parameter.

The support parameter of both the LCCspm and AprioriClose algorithms was set to 5 percent so that a large number of frequent movement patterns would be extracted from each player’s discrete movement sequences (i.e., active periods within a match), because a high support threshold identifies few frequent patterns [25]. The LCCspm length parameter was set to 20 (i.e., a 2-second time frame at 10 Hz) so that more and longer patterns were extracted. The movement patterns extracted by the AprioriClose and LCS algorithms were subsequently filtered to exclude patterns containing more than 20 items. Previous studies [12, 17] used similar parameter values to extract large numbers of longer frequent patterns from rugby union and rugby league data.
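To make the closed contiguous pattern type concrete, the following brute-force Python sketch enumerates frequent closed contiguous patterns up to a maximum length. It is not LCCspm [17], which is designed to scale to large sequence sets; it assumes that support is the fraction of sequences containing the pattern as a contiguous substring, and that a pattern is closed when no longer frequent pattern containing it has the same support. The defaults mirror the 5 percent support and length of 20 stated above; the function name is illustrative.

```python
from collections import defaultdict


def closed_contiguous_patterns(sequences, min_support=0.05, max_length=20):
    """Brute-force frequent closed contiguous patterns (illustration, not LCCspm).

    Returns {pattern: number of sequences containing it}.
    """
    # Record which sequences contain each contiguous substring of length <= max_length.
    containing = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + 1, min(start + max_length, len(seq)) + 1):
                containing[seq[start:end]].add(idx)
    n = len(sequences)
    frequent = {p: len(ids) for p, ids in containing.items() if len(ids) / n >= min_support}
    # Closed: no longer frequent pattern contains p with the same support.
    return {
        p: c for p, c in frequent.items()
        if not any(len(q) > len(p) and p in q and cq == c for q, cq in frequent.items())
    }


print(closed_contiguous_patterns(["uuvij", "uuvee", "jiuuv"], min_support=0.6))
# prints {'uuv': 3, 'i': 2, 'j': 2}
```

Here “u”, “uu”, “uv” and “v” are absorbed into the closed pattern “uuv”, which has the same support, while “ij” and “ji” are each found in only one sequence and fall below the threshold. The later filtering of AprioriClose and LCS patterns would correspond to a simple length check such as `len(p) <= 20`.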

After sets of frequent movement patterns of user-defined length were extracted from each of the 1,036 sets of discrete movement sequences, a set of unique movement patterns was derived for each pattern mining algorithm by taking the union of all its sets of extracted movement patterns. These sets of unique movement patterns (one per algorithm) were subjected to the further analyses described in the section below.

Selection analysis of (movement) pattern mining algorithms

Three steps were taken to identify the best pattern mining algorithm for extracting movement patterns to profile rugby league players into playing positions. First, the similarity of the sets of unique movement patterns obtained with each pattern mining algorithm was analysed. Second, overlapping movement patterns among the pattern mining algorithms, and within each playing position, were analysed. Lastly, the separation of players into playing positions (hookers and wingers) based on each set of extracted movement patterns was performed and measured.

Jaccard analysis.

The Jaccard similarity measure [26] enables exact matching of patterns between two sets and was used to quantify the similarity among the groups of extracted patterns. For two sets of movement patterns A and B, it is computed as:

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{1} $$

Jaccard similarity values range from 0 to 1, where 0 indicates no similarity and 1 indicates an exact match. The Jaccard similarity measure was used to quantify the similarity among the sets of unique movement patterns (i.e., the union of all extracted movement patterns across all player-match observations) identified by each pattern mining algorithm.
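The Jaccard formula translates directly into a set computation on exact pattern strings. A minimal sketch; the function name is illustrative, and returning 0.0 for two empty sets is a convention chosen here rather than one stated in this article.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| using exact string matching."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


print(jaccard({"uuv", "ji", "eef"}, {"uuv", "ji", "ef", "mn"}))  # 2 shared / 5 distinct = 0.4
```

Note that “eef” and “ef” count as different patterns: exact matching gives no partial credit for shared sub-patterns.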

Overlap movement patterns.

Overlapping movement patterns between two sets of movement patterns were identified using exact matching. Overlapping unique movement patterns were first identified between each pair of pattern mining algorithms. The top and bottom patterns of each pair were then checked for overlap by comparing the 50 most frequent and the 50 least frequent patterns from each algorithm (50 patterns being roughly one-third of the smallest set of extracted movement patterns), and the overlaps were visualized. This was done to identify where in the frequency distribution the overlapping movement patterns are located. The identified overlapping movement patterns were further analysed to determine which of them were performed by players in each playing position. The overlapping movement patterns between playing positions were also explored for each pattern mining algorithm.
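The top/bottom-50 comparison can be sketched with `collections.Counter`, assuming each algorithm's output is available as a flat list in which a pattern appears once for every player-match in which it was extracted. Ties in frequency are broken by first occurrence, which may differ from the original analysis; the function name is illustrative, and the toy inputs use k = 2 instead of 50.

```python
from collections import Counter


def top_bottom_overlap(patterns_a, patterns_b, k=50):
    """Return (overlap of the k most frequent, overlap of the k least frequent) patterns.

    patterns_a / patterns_b: flat lists with one entry per extraction
    (e.g. per player-match), so Counter yields each pattern's frequency count.
    """
    freq_a, freq_b = Counter(patterns_a), Counter(patterns_b)
    top_a = {p for p, _ in freq_a.most_common(k)}
    top_b = {p for p, _ in freq_b.most_common(k)}
    # most_common() with no argument lists all patterns from most to least frequent.
    bottom_a = {p for p, _ in freq_a.most_common()[-k:]}
    bottom_b = {p for p, _ in freq_b.most_common()[-k:]}
    return top_a & top_b, bottom_a & bottom_b


a = ["uv"] * 5 + ["ij"] * 4 + ["ef"] * 2 + ["mn"]
b = ["uv"] * 6 + ["ef"] * 5 + ["ij"] * 2 + ["a"]
print(top_bottom_overlap(a, b, k=2))  # prints ({'uv'}, set())
```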

Separation of players into playing positions.

The separation of players into playing positions was achieved through machine learning classification modelling. Classification algorithms are fitted on a dataset of independent and dependent variables to develop a model that can assign previously unseen observations to their correct group (i.e., dependent variable value). In this study, one classification input dataset was generated per pattern mining algorithm: the set of unique movement patterns derived from that algorithm’s extracted movement patterns formed the independent variables, and each observation represented one player in one match. Each independent variable took the value 1 if the player performed the corresponding unique movement pattern in that fixture and 0 otherwise. The dependent variable was either hooker or winger, depending on the player’s playing position.
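The construction of each classification input dataset can be sketched as follows, assuming the extracted patterns are held as one Python set per player-match observation. The function name, the column ordering (sorted pattern strings) and the `int8` dtype are arbitrary choices made here.

```python
import numpy as np


def build_dataset(player_match_patterns, positions):
    """Build the binary observation x pattern matrix and the label vector.

    player_match_patterns: one set of extracted patterns per player-match.
    positions: the matching labels ("hooker" or "winger").
    """
    # Independent variables: the union of all extracted patterns.
    columns = sorted(set().union(*player_match_patterns))
    column_index = {p: j for j, p in enumerate(columns)}
    X = np.zeros((len(player_match_patterns), len(columns)), dtype=np.int8)
    for i, patterns in enumerate(player_match_patterns):
        for p in patterns:
            X[i, column_index[p]] = 1  # 1 = pattern performed in this fixture
    return X, np.asarray(positions), columns


X, y, columns = build_dataset([{"uuv", "ji"}, {"ji", "eef"}, {"uuv"}],
                              ["winger", "hooker", "winger"])
print(columns)     # prints ['eef', 'ji', 'uuv']
print(X.tolist())  # prints [[0, 1, 1], [1, 1, 0], [0, 0, 1]]
```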

Five machine learning classification algorithms were considered: Decision Tree [27], Gaussian Naive Bayes [28], Random Forest [29], Logistic Regression [30] and Multi-Layered Perceptron (MLP) [31]. They were selected because they use different learning methods and fit distinct models. The classification algorithms were implemented with the scikit-learn (version 1.1.3) Python library [32]. The parameters used to fit each classification algorithm are presented in Table 3. The classification models were fitted using k-fold cross-validation [33], with the n_splits parameter set to 10, random_state set to 10 and shuffle set to True. 10-fold cross-validation divides the data into ten folds; in each of ten iterations, nine folds are used for training and the remaining fold for testing. The cross-validated models’ performance was evaluated with aggregated accuracy, precision, recall and F1-score metrics.
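The cross-validation setup described above maps onto scikit-learn's `KFold` and `cross_validate`. In the sketch below only `n_splits=10`, `shuffle=True` and `random_state=10` come from the text; the classifier hyperparameters are placeholders (largely scikit-learn defaults) because Table 3 is not reproduced here, macro-averaged precision, recall and F1 are an assumption about how the metrics were aggregated, and the `evaluate` helper is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder hyperparameters (the values actually used are listed in Table 3).
CLASSIFIERS = {
    "Decision Tree": DecisionTreeClassifier(random_state=10),
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=10),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(max_iter=1000, random_state=10),
}

# 10 shuffled folds with random_state=10, as stated in the text.
CV = KFold(n_splits=10, shuffle=True, random_state=10)

# Macro averaging over the two positions is an assumption.
SCORING = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]


def evaluate(X, y):
    """Return the mean 10-fold test score of every metric for every classifier."""
    results = {}
    for name, clf in CLASSIFIERS.items():
        scores = cross_validate(clf, X, y, cv=CV, scoring=SCORING)
        results[name] = {m: float(scores[f"test_{m}"].mean()) for m in SCORING}
    return results
```

Given a binary matrix `X` and position labels `y` such as those built from the player-match pattern sets, `evaluate(X, y)` returns one dictionary of mean scores per classifier, which corresponds to one column of a results table like Table 5.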

Additionally, the feature importance scores [30] of the movement patterns used by the best classification models were analysed to identify the 20 most important movement patterns (per pattern mining algorithm) for classification model development. The source code for computing the Jaccard similarity of unique movement patterns, visualizing overlapping frequent patterns between pattern mining algorithms and playing positions, developing and evaluating the machine learning classification models, and analysing feature importance scores is publicly available in a GitHub repository [34].
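One common way to obtain importance scores from a fitted logistic regression is to rank features by the absolute value of their coefficients. The sketch below assumes that definition, which may differ from how the scores in Table 6 were computed; it also assumes the model is fitted on all observations, and the function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def top_patterns_by_coefficient(X, y, columns, k=20):
    """Return the k patterns with the largest absolute logistic regression coefficients.

    Treating |coefficient| as the importance score is an assumption. For a binary
    target, positive coefficients favour model.classes_[1] (the label that sorts
    second, e.g. "winger" over "hooker").
    """
    model = LogisticRegression(max_iter=1000).fit(X, y)
    coefs = model.coef_[0]
    order = np.argsort(-np.abs(coefs))[:k]
    return [(columns[j], float(coefs[j])) for j in order]
```

Keeping the sign alongside the score shows which position each highly ranked pattern points towards.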

Results

Jaccard analysis

The LCCspm algorithm extracted a set of 3,881 unique frequent closed contiguous movement patterns. The LCS algorithm (of the SMP framework) extracted a set of 2,513 unique frequent longest common subsequence movement patterns. The AprioriClose algorithm extracted a set of 155 unique frequent closed itemset movement patterns.

Table 4 reports the Jaccard similarity scores quantifying the similarity between the unique sets of movement patterns extracted by each pair of pattern mining algorithms. Overall, Jaccard scores ranged from 0.008 to 0.19, suggesting limited similarity among the movement patterns extracted by the three algorithms.
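Because |A ∪ B| = |A| + |B| − |A ∩ B|, the Jaccard scores can be cross-checked against the set sizes above and the overlap counts reported in the next subsection (1,022, 32 and 25 shared patterns). A short Python check, with illustrative names:

```python
SIZES = {"LCCspm": 3881, "LCS": 2513, "AprioriClose": 155}
OVERLAPS = {
    ("LCCspm", "LCS"): 1022,
    ("LCCspm", "AprioriClose"): 32,
    ("LCS", "AprioriClose"): 25,
}


def jaccard_from_counts(size_a, size_b, overlap):
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    return overlap / (size_a + size_b - overlap)


for (a, b), overlap in OVERLAPS.items():
    print(f"{a} vs {b}: {jaccard_from_counts(SIZES[a], SIZES[b], overlap):.3f}")
```

This prints 0.190, 0.008 and 0.009, consistent with the scores reported in the Abstract.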

Table 4. Jaccard similarity scores of extracted movement patterns per algorithm.

https://doi.org/10.1371/journal.pone.0301608.t004

Overlap movement patterns

Overlapping movement patterns between algorithms.

LCCspm vs. LCS. A total of 1,022 unique movement patterns overlapped between the LCCspm (26% of total) and LCS (40% of total) algorithms. Among the 50 most frequent movement patterns extracted by each pattern mining algorithm, 32 overlapped between LCCspm and LCS. Fig 2 visualizes these overlapping movement patterns based on their LCCspm frequency count. The movement patterns “VU” (sprint acceleration backwards and sprint acceleration with large-change of direction), “uuv” (jog acceleration straight [x2] and jog acceleration with acute-change of direction), “ji” (walk acceleration with acute-change of direction and walk acceleration straight) and “eef” (walk neutral straight [x2] and walk neutral with acute-change of direction) were among the most frequent extracted on-field activities that overlapped between the LCCspm and LCS movement patterns (Fig 2). However, no movement patterns overlapped among the 50 least frequent movement patterns extracted by LCCspm and LCS.

Fig 2. Overlapped movement patterns between the most frequent-50 LCCspm and LCS patterns.

https://doi.org/10.1371/journal.pone.0301608.g002

LCCspm vs. AprioriClose. A total of 32 movement patterns overlapped between the unique sets of movement patterns identified by the LCCspm (0.83% of total) and AprioriClose (20.65% of total) algorithms. Among the 50 most frequent movement patterns extracted by each pattern mining algorithm, 3 overlapped between AprioriClose and LCCspm. Fig 3 visualizes these overlapping movement patterns based on their AprioriClose frequency count. The movement pattern “uv” (jog acceleration straight and jog acceleration with acute-change of direction) is the most frequent, followed by “ij” (walk acceleration straight and walk acceleration with acute-change of direction) (Fig 3). However, no movement patterns overlapped among the 50 least frequent movement patterns extracted by AprioriClose and LCCspm.

Fig 3. Overlapped movement patterns between the most frequent-50 AprioriClose and LCCspm patterns.

https://doi.org/10.1371/journal.pone.0301608.g003

LCS vs. AprioriClose. A total of 25 movement patterns overlapped between the LCS (1% of total) and AprioriClose (16.13% of total) algorithms. Among the 50 most frequent movement patterns extracted by each pattern mining algorithm, 3 overlapped between LCS and AprioriClose. Fig 4 visualizes these overlapping movement patterns based on their LCS frequency count. The movement pattern “ef” (walk neutral straight and walk neutral with acute-change of direction) was the second most frequent overlapping pattern (Fig 4). However, no movement patterns overlapped among the 50 least frequent movement patterns extracted by LCS and AprioriClose.

Fig 4. Overlapped movement patterns between the most frequent-50 LCS and AprioriClose patterns.

https://doi.org/10.1371/journal.pone.0301608.g004

Overlapped frequent-50 movement patterns between positions.

Further analysis, by playing position, of the overlapping movement patterns among the 50 most frequent LCCspm and LCS patterns (Fig 2) revealed that hookers performed twenty-nine (29) of these overlapping patterns (Fig 5) and wingers performed thirty-one (31) (Fig 6).

Fig 5. LCCspm and LCS overlap movement patterns performed by hookers.

https://doi.org/10.1371/journal.pone.0301608.g005

Fig 6. LCCspm and LCS overlap movement patterns performed by wingers.

https://doi.org/10.1371/journal.pone.0301608.g006

Among the overlapping movement patterns, “ji” (walk acceleration with acute-change of direction and walk acceleration straight) and “fee” (walk neutral with acute-change of direction and walk neutral straight [x2]) were mainly performed by wingers, whereas “uuuuv” (jog acceleration straight [x4] and jog acceleration with acute-change of direction) and “mn” (jog deceleration straight and jog deceleration with acute-change of direction) were mainly performed by hookers.

All overlapping movement patterns (“ef”, “uv” and “ij”) among the 50 most frequent LCCspm and AprioriClose patterns (Fig 3) were performed by both hookers and wingers. Similarly, both hookers and wingers performed the overlapping movement patterns (“ef”, “uv” and “a”) among the 50 most frequent movement patterns extracted by the LCS and AprioriClose algorithms (Fig 4).

Overlapped movement patterns between positions per algorithm.

LCCspm. LCCspm identified 2,282 and 3,174 frequent closed contiguous movement patterns profiling hookers and wingers, respectively. A total of 1,575 of these movement patterns overlapped between the two playing positions (visualized in Fig 7 by how often hookers performed them and in Fig 8 by how often wingers performed them).

Fig 7. LCCspm overlapped movement patterns as performed by hookers.

https://doi.org/10.1371/journal.pone.0301608.g007

Fig 8. LCCspm overlapped movement patterns as performed by wingers.

https://doi.org/10.1371/journal.pone.0301608.g008

LCCspm also profiled 707 closed contiguous movement patterns performed only by hookers and 1,599 movement patterns performed only by wingers.

LCS. The LCS algorithm of the SMP framework identified 1,534 and 1,632 longest common movement patterns profiling hookers and wingers, respectively. A total of 653 of these movement patterns overlapped between hookers and wingers (visualized in Fig 9 by how often hookers performed them and in Fig 10 by how often wingers performed them).

Fig 9. LCS overlapped movement patterns as performed by hookers.

https://doi.org/10.1371/journal.pone.0301608.g009

Fig 10. LCS overlapped movement patterns as performed by wingers.

https://doi.org/10.1371/journal.pone.0301608.g010

The LCS algorithm of the SMP framework profiled 881 longest common movement patterns performed only by hookers and 979 movement patterns performed only by wingers.

AprioriClose. The AprioriClose algorithm identified 142 and 136 non-sequential movement patterns profiling hookers and wingers, respectively. A total of 123 of these movement patterns overlapped between hookers and wingers (visualized in Fig 11 by how often hookers performed them and in Fig 12 by how often wingers performed them).

Fig 11. AprioriClose overlapped movement patterns as performed by hookers.

https://doi.org/10.1371/journal.pone.0301608.g011

Fig 12. AprioriClose overlapped movement patterns as performed by wingers.

https://doi.org/10.1371/journal.pone.0301608.g012

AprioriClose algorithm profiled a total of 19 non-sequential movement patterns performed only by hookers and another set of 13 non-sequential movement patterns performed only by wingers.
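The per-position counts follow from basic set operations. In this minimal sketch (function names illustrative), `position_breakdown` works on the actual pattern sets, while `counts_from_sizes` applies inclusion-exclusion to the reported totals; the check below uses the LCCspm and AprioriClose figures given above.

```python
def position_breakdown(hooker_patterns, winger_patterns):
    """Shared and position-specific pattern counts from the two pattern sets."""
    hookers, wingers = set(hooker_patterns), set(winger_patterns)
    return {
        "shared": len(hookers & wingers),
        "hookers_only": len(hookers - wingers),
        "wingers_only": len(wingers - hookers),
        "total_unique": len(hookers | wingers),
    }


def counts_from_sizes(n_hookers, n_wingers, n_shared):
    """Same breakdown from set sizes alone (inclusion-exclusion)."""
    return {
        "hookers_only": n_hookers - n_shared,
        "wingers_only": n_wingers - n_shared,
        "total_unique": n_hookers + n_wingers - n_shared,
    }


print(counts_from_sizes(2282, 3174, 1575))  # LCCspm
print(counts_from_sizes(142, 136, 123))     # AprioriClose
```

Both calls reproduce the reported figures (707, 1,599 and 3,881 for LCCspm; 19, 13 and 155 for AprioriClose).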

Separation of players into playing positions

Three datasets were generated for classification modelling. The first dataset (representing LCCspm algorithm) contained 3,881 independent variables. The second dataset (representing LCS algorithm) contained 2,849 independent variables. The third dataset (representing AprioriClose algorithm) contained 155 independent variables.

The accuracies of the five machine learning classification algorithms on all three datasets are reported in Table 5. Every classifier fitted on the LCCspm dataset achieved a higher accuracy than the same classifier fitted on the LCS and AprioriClose datasets. For example, the Decision Tree classifier achieved an accuracy of 82.83% on the LCCspm dataset, compared with 56.36% on the LCS dataset and 73.56% on the AprioriClose dataset.

Table 5. Classifiers’ separation accuracies using sets of extracted movement patterns.

https://doi.org/10.1371/journal.pone.0301608.t005

The MLP classifier fitted on the dataset with LCCspm movement patterns as its independent variables achieved the highest accuracy (91.02%) of all classifiers on any of the three datasets. The MLP classifier achieved accuracies of 61.78% and 80.9% on the LCS and AprioriClose datasets, respectively. Meanwhile, the Gaussian Naive Bayes classifier had the lowest accuracy of the five classification algorithms on all three datasets.

Consequently, based on the classification models’ performances, the LCCspm algorithm used for mining closed contiguous movement patterns provided the most discriminative movement patterns for separating players into playing positions. The AprioriClose algorithm used for mining closed itemset movement patterns ranked second among the three selected pattern mining algorithms, while the LCS algorithm of the “SMP” framework provided the least discriminative movement patterns for separating players into playing positions.

From Table 5, taking the most accurate classification model for each pattern mining algorithm’s dataset, Logistic Regression produced two of these three models: the most accurate models on the AprioriClose and LCS datasets, with accuracies of 82.95% and 65.83%, respectively. It also produced the second most accurate model on the LCCspm dataset (89.77%). Therefore, the top-20 feature importance scores of the movement patterns used by the Logistic Regression model for each pattern mining algorithm were analysed further and are reported in Table 6.
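How the importance scores in Table 6 were computed is not stated in this section. One common choice for a linear model is the magnitude of its coefficients; the sketch below ranks patterns that way, with hypothetical toy names and coefficients, and the function name `top_k_patterns` is illustrative only.

```python
# Hypothetical sketch: rank movement patterns by the magnitude of their
# Logistic Regression coefficients and keep the top k. This is one possible
# importance measure, not necessarily the one used for Table 6.


def top_k_patterns(pattern_names, coefficients, k=20):
    """Return (pattern, coefficient) pairs sorted by |coefficient|, largest first."""
    if len(pattern_names) != len(coefficients):
        raise ValueError("one coefficient is needed per pattern")
    ranked = sorted(
        zip(pattern_names, coefficients),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return ranked[:k]


# Toy values. With scikit-learn, a fitted binary LogisticRegression exposes
# one coefficient per feature column in model.coef_[0].
names = ["SS", "HH", "fe", "GG"]
coefs = [1.2, -2.5, 0.1, 0.7]
print(top_k_patterns(names, coefs, k=2))  # [('HH', -2.5), ('SS', 1.2)]
```

Coefficient magnitudes are only comparable when the features are on similar scales (for example, after standardization). The sign indicates which class a pattern pushes the prediction towards; in scikit-learn, positive coefficients favour `classes_[1]`.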

Table 6. Logistic Regression Top 20 important patterns and scores per algorithm.

(a) Top 20 APR Patterns Importance Score. (b) Top 20 SMP Patterns Importance Score. (c) Top 20 LCC Patterns Importance Score.

https://doi.org/10.1371/journal.pone.0301608.t006

Discussion

This study is the first to identify which of three pattern mining algorithms (LCCspm, LCS, AprioriClose) provides the set of movement patterns best able to classify professional rugby league players as hookers or wingers. A secondary aim was to quantify the similarity of the extracted movement patterns among all three algorithms and between the two playing positions (hookers and wingers). Hookers and wingers were chosen as the criterion positions for comparing algorithms given their distinct tactical and physical roles in professional rugby league. Overall, the findings suggest that the LCCspm pattern mining algorithm provided the best set of movement patterns for separating hookers and wingers, and that the movement patterns extracted by the different algorithms share little similarity.
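The three algorithms differ mainly in what counts as a pattern occurring in a movement sequence: as an unbroken run (consecutive), in order with gaps allowed (non-consecutive), or as an unordered set of units (non-sequential). A minimal sketch of these three occurrence tests on letter-coded sequences, plus the textbook dynamic-programming longest common subsequence; the function names are illustrative, and the exact LCS variant used by the SMP framework is not given in this section.

```python
# Sketch contrasting the three kinds of pattern occurrence compared in this
# study, on movement sequences written as strings of movement-unit letters.


def occurs_contiguously(pattern: str, sequence: str) -> bool:
    """Consecutive: the pattern appears as an unbroken run (LCCspm-style)."""
    return pattern in sequence


def occurs_as_subsequence(pattern: str, sequence: str) -> bool:
    """Non-consecutive: same order, gaps allowed (LCS-style)."""
    remaining = iter(sequence)
    # `in` on an iterator consumes it up to the match, so order is enforced.
    return all(unit in remaining for unit in pattern)


def occurs_as_itemset(items, sequence: str) -> bool:
    """Non-sequential: all units present, order and repeats ignored (itemset)."""
    return set(items) <= set(sequence)


def longest_common_subsequence(a: str, b: str) -> str:
    """Textbook dynamic-programming LCS of two sequences."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one longest subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


sequence = "GGSTSS"
print(occurs_contiguously("GSS", sequence))             # False
print(occurs_as_subsequence("GSS", sequence))           # True
print(occurs_as_itemset("SG", sequence))                # True
print(longest_common_subsequence(sequence, "GSTTS"))    # GSTS
```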

The classification results of this study show how well players can be separated into playing positions using each set of frequent movement patterns, all extracted from the same movement sequences under the same parameter conditions. Table 5 shows that elite rugby league players are best separated into playing positions (i.e., hookers and wingers) using their closed contiguous movement patterns, profiled by the LCCspm algorithm. The Multi-Layered Perceptron classifier fitted on LCCspm closed contiguous movement patterns performed best at classifying hookers and wingers in professional rugby league, with an overall accuracy of 91.02%. The AprioriClose closed itemset (non-sequential) movement patterns offered better separation accuracy than the longest common subsequence movement patterns of the LCS algorithm, and provided a reasonable separation (Logistic Regression accuracy of 82.05%). Their lower accuracy relative to LCCspm may be attributable to the nature of itemset patterns, which are non-consecutive, non-sequential and contain no repeated movement activity. The results also indicate that, from the same movement sequences, the LCCspm algorithm discovers more movement patterns for profiling players than the AprioriClose and LCS algorithms (3,881 versus 155 and 2,849 independent variables, respectively). This implies that there are more discoverable consecutive movement patterns than non-consecutive and non-sequential ones.

Moreover, Jaccard similarity scores (where 1 indicates identical sets) ranged from 0.008 to 0.19 between the algorithms’ sets of movement patterns (Table 4), suggesting a lack of similarity in the extracted patterns overall. This lack of similarity can be attributed to the pattern mining algorithms themselves, which extract consecutive (LCCspm), non-consecutive (LCS) and non-sequential (AprioriClose) movement patterns, respectively. The LCCspm and LCS sets of movement patterns shared a relatively higher similarity because both algorithms extract some form of sequential movement pattern, as opposed to the non-sequential patterns of AprioriClose. Based on these results, the LCCspm algorithm is identified as the best of the three for profiling the movement patterns of rugby league players in order to separate hookers and wingers.
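The Jaccard similarity between two pattern sets A and B is |A ∩ B| / |A ∪ B|, so a score of 0.19 means that roughly one in five distinct patterns across both sets is shared. A minimal sketch with hypothetical toy pattern sets; how patterns from different algorithms (for example, an itemset and a string) were matched is not described in this section, so here every pattern is a plain string.

```python
# Sketch: Jaccard similarity between two sets of mined movement patterns.
# Pattern strings are hypothetical toy values; duplicates are ignored.


def jaccard(a, b) -> float:
    """|A ∩ B| / |A ∪ B| for two collections of patterns."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


lcc_patterns = {"GG", "SS", "fe", "uv", "ji"}
lcs_patterns = {"SS", "fe", "GSG"}
print(round(jaccard(lcc_patterns, lcs_patterns), 3))  # 0.333
```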

Applying the LCCspm algorithm to profile the movement of hookers and wingers revealed that wingers performed 892 more movement patterns than hookers, suggesting that wingers have a more variable movement profile. Some movement patterns were performed by both hookers and wingers (overlapped patterns; Fig 6), but the frequency at which these movement patterns were performed differed by playing position.

Overlapped movement patterns combining movement units u and v, which indicate accelerated jogs with some acute changes of direction or on straights, were mostly performed by hookers (Fig 7). Wingers, on the other hand, performed overlapped movement patterns that included accelerated walks with some acute changes of direction, as indicated by movement units j and i (Fig 8).

This study also identified groups of movement activities performed uniquely by hookers or by wingers. For example, the LCCspm algorithm identified hookers as the only positional group that performed the sequential movement pattern “GGGGGGGGGGGGSSSSSSS” (Run-Acceleration-Straight [x12] and Sprint-Acceleration-Straight [x7]). Equally, only wingers completed the sequential movement pattern “TSSTSTTSST” (Sprint-Acceleration-Acute change, Sprint-Acceleration-Straight [x2], Sprint-Acceleration-Acute change, Sprint-Acceleration-Straight, Sprint-Acceleration-Acute change [x2], Sprint-Acceleration-Straight [x2], Sprint-Acceleration-Acute change). It is well established that wingers complete more high-speed (>5 m·s⁻¹) activity during matches than hookers (wingers: 626 m vs. hookers: 285 m) [35], although these differences are less pronounced with global acceleration-based measures (e.g., average acceleration over a period of time). These differences are likely due to the very different tactical roles of wingers (e.g., returning kicks in attack, which leads to open space in which to move at high speed) and hookers (e.g., repositioning behind the play-the-ball to distribute possession). Applying pattern mining algorithms to uncover the order in which activities occur improves the ability to classify positional groups and may help enhance training specificity.
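The bracketed descriptions above are a run-length reading of each pattern string. A minimal sketch that produces them automatically; only the unit letters whose meanings appear in this section (G, H, S, T) are mapped, and any other letter is left as-is because its label is not defined here. The names `UNIT_LABELS` and `describe_pattern` are illustrative.

```python
from itertools import groupby

# Unit letters whose meanings are given in the text; other letters are
# left unchanged because their labels are not defined in this section.
UNIT_LABELS = {
    "G": "Run-Acceleration-Straight",
    "H": "Run-Acceleration-Acute change",
    "S": "Sprint-Acceleration-Straight",
    "T": "Sprint-Acceleration-Acute change",
}


def describe_pattern(pattern: str, labels=UNIT_LABELS) -> str:
    """Run-length encode a pattern string into readable movement labels."""
    parts = []
    for unit, run in groupby(pattern):
        count = sum(1 for _ in run)          # length of this run of one unit
        label = labels.get(unit, unit)
        parts.append(f"{label} [x{count}]" if count > 1 else label)
    return ", ".join(parts)


print(describe_pattern("G" * 12 + "S" * 7))
# Run-Acceleration-Straight [x12], Sprint-Acceleration-Straight [x7]
print(describe_pattern("TSSTSTTSST"))
```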

It is also noteworthy that the most important variables used by the Logistic Regression classification model are mostly not among the 50 most frequent overlapping patterns profiled by the LCCspm algorithm. This indicates that less frequent movement patterns, and those performed uniquely by players of each playing position, provided the information used to separate players by playing position. The twenty most important LCCspm movement patterns used to fit the Logistic Regression classifier consist of on-field movement activities of length 2 to 6 (Table 6C). The second most important variable, “SS” ([Sprint-Acceleration-Straight] x2), and the ninth most important variable, “HH” ([Run-Acceleration-Acute change] x2), discovered by the LCCspm algorithm are the only patterns, across the most important movement patterns of all three pattern mining algorithms, to include the on-field activities “S” and “H”. The nineteenth most important movement pattern, “fe” (Table 6C), is the only one of these twenty that also appears among the most frequent overlapped LCCspm and LCS movement patterns (Fig 2). Given that the movement patterns extracted by the LCCspm algorithm achieved the highest player position classification accuracy, we conclude that LCCspm and closed contiguous movement patterns are the most suitable of the three approaches compared for profiling rugby league players. In practice, sports performance analysts and/or data scientists are encouraged to extract players’ closed contiguous movement patterns when conducting player profiling analysis, because the consecutive order of the activities players perform is vital for profiling and distinguishing players of different playing positions, and LCCspm closed contiguous patterns capture this consecutiveness effectively and efficiently.
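To make the notion of a closed contiguous pattern concrete: a contiguous pattern is frequent if enough sequences contain it as an unbroken run, and closed if no longer frequent pattern containing it has the same support. The sketch below is a naive brute-force illustration of that definition (as used for closed contiguous sequential patterns, e.g. CCSpan [18]), not the LCCspm implementation [34]. Counting support as the number of sequences containing a pattern, and the `min_len` parameter, are assumptions; the enumeration is quadratic in sequence length and unsuitable for large data.

```python
from collections import Counter


def contiguous_support(sequences, min_len):
    """For each substring of length >= min_len, count the sequences containing it."""
    support = Counter()
    for seq in sequences:
        # A set, so each sequence adds at most 1 to a pattern's support.
        substrings = {
            seq[i:j]
            for i in range(len(seq))
            for j in range(i + min_len, len(seq) + 1)
        }
        support.update(substrings)
    return support


def closed_contiguous_patterns(sequences, min_sup, min_len=2):
    """Brute-force closed contiguous patterns: frequent, and no longer
    frequent pattern containing them has the same support."""
    frequent = {
        p: s
        for p, s in contiguous_support(sequences, min_len).items()
        if s >= min_sup
    }
    return {
        p: s
        for p, s in frequent.items()
        if not any(len(q) > len(p) and p in q and sq == s for q, sq in frequent.items())
    }


result = closed_contiguous_patterns(["GGSS", "GGST", "TGGS", "SSGG"], min_sup=2)
print(sorted(result.items()))  # [('GG', 4), ('GGS', 3), ('SS', 2)]
```

Here "GS" (support 3) is frequent but not closed, because the longer pattern "GGS" has the same support; "GG" stays closed because its extension "GGS" is less frequent.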

Conclusions and future works

This study is the first to identify the best pattern mining algorithm, and its set of movement patterns, for player movement profiling. Closed contiguous movement patterns best separate rugby league hookers and wingers into playing positions, because every classification model was most accurate on the dataset generated with distinct LCCspm movement patterns as independent variables. Therefore, mining closed contiguous movement patterns is recommended for profiling the on-field activities of players. The LCCspm and LCS algorithms extracted movement patterns that shared some similarity, whereas AprioriClose movement patterns shared almost none (Jaccard similarity of 0.008 with LCCspm and 0.009 with LCS) because itemsets do not consider the order in which items appear. Given that one of the cores of sports analytics is the ability to predict sports outcomes or groups [36], the LCCspm algorithm for mining movement patterns for player position classification is recommended as a useful predictive analytics tool for sports analytics. Additionally, this study’s method can be replicated for other use cases, such as the extraction of match event patterns. Future work will identify the minimum number of patterns needed to reliably separate groups, and will use the patterns extracted from condensed sequences of the SMP framework, together with those extracted by the LCCspm algorithm, to understand the locomotor and match demands on players and teams.

Acknowledgments

The authors would like to acknowledge The Rugby Football League (RFL) for the access to GPS data.

References

1. Ray T, Choi J, Reeder J, Lee SP, Aranyosi AJ, Ghaffari R, et al. Soft, skin-interfaced wearable systems for sports science and analytics. Current Opinion in Biomedical Engineering. 2019;9:47–56.
2. Papic C, Sanders RH, Naemi R, Elipot M, Andersen J. Improving data acquisition speed and accuracy in sport using neural networks. Journal of Sports Sciences. 2021;39(5):513–522. pmid:33140693
3. Colomer CM, Pyne DB, Mooney M, McKune A, Serpell BG. Performance analysis in rugby Union: a critical systematic review. Sports Medicine-Open. 2020;6(1):1–15.
4. O’Donoghue P. An introduction to performance analysis of sport. Routledge; 2014.
5. Chambers R, Gabbett TJ, Cole MH, Beard A. The use of wearable microsensors to quantify sport-specific movements. Sports Medicine. 2015;45(7):1065–1081. pmid:25834998
6. Bradley PS, Ade JD. Are current physical match performance metrics in elite soccer fit for purpose or is the adoption of an integrated approach needed? International Journal of Sports Physiology and Performance. 2018;13(5):656–664. pmid:29345547
7. Hughes M, Franks IM. The essentials of performance analysis. London: E & FN Spon; 2008.
8. Gabbett T, Jenkins D, Abernethy B. Physical collisions and injury during professional rugby league skills training. Journal of Science and Medicine in Sport. 2010;13(6):578–583. pmid:20483661
9. Whitehead S, Till K, Jones B, Beggs C, Dalton-Barron N, Weaving D. The use of technical-tactical and physical performance indicators to classify between levels of match-play in elite rugby league. Science and Medicine in Football. 2021;5(2):121–127. pmid:35077338
10. Decroos T, Van Haaren J, Davis J. Automatic discovery of tactics in spatio-temporal soccer match data. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2018. p. 223–232.
11. Hrovat G, Fister I Jr, Yermak K, Stiglic G, Fister I. Interestingness measure for mining sequential patterns in sports. Journal of Intelligent & Fuzzy Systems. 2015;29(5):1981–1994.
12. Bunker R, Fujii K, Hanada H, Takeuchi I. Supervised sequential pattern mining of event sequences in sport to identify important patterns of play: an application to rugby union. PLoS ONE. 2021;16(9):e0256329. pmid:34555042
13. Sweeting AJ, Aughey RJ, Cormack SJ, Morgan S. Discovering frequently recurring movement sequences in team-sport athlete spatiotemporal data. Journal of Sports Sciences. 2017;35(24):2439–2445. pmid:28282752
14. Kuo S, Cross GR. An improved algorithm to find the length of the longest common subsequence of two strings. In: ACM SIGIR Forum. vol. 23. ACM New York, NY, USA; 1989. p. 89–99.
15. White R, Palczewska A, Weaving D, Collins N, Jones B. Sequential movement pattern-mining (SMP) in field-based team-sport: A framework for quantifying spatiotemporal data and improve training specificity? Journal of Sports Sciences. 2021; p. 1–10. pmid:34565294
16. Collins N, White R, Palczewska A, Weaving D, Dalton-Barron N, Jones B. Moving beyond velocity derivatives; using global positioning system data to extract sequential movement patterns at different levels of rugby league match-play. European Journal of Sport Science. 2022; p. 1–9.
17. Adeyemo VE, Palczewska A, Jones B. LCCspm: l-Length Closed Contiguous Sequential Patterns Mining Algorithm to Find Frequent Athlete Movement Patterns from GPS. In: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE; 2021. p. 455–460.
18. Zhang J, Wang Y, Yang D. CCSpan: Mining closed contiguous sequential patterns. Knowledge-Based Systems. 2015;89:1–13.
19. Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: International Conference on Database Theory. Springer; 1999. p. 398–416.
20. Kempton T, Sirotic AC, Coutts AJ. A comparison of physical and technical performance profiles between successful and less-successful professional rugby league teams. International Journal of Sports Physiology and Performance. 2017;12(4):520–526. pmid:27617478
21. Woods CT, Leicht AS, Jones B, Till K. Game-play characteristics differ between the European Super League and the National Rugby League: implications for coaching and talent recruitment. International Journal of Sports Science & Coaching. 2018;13(6):1171–1176.
22. Wedding C, Woods C, Sinclair W, Gomez M, Leicht A. Examining the evolution and classification of player position using performance indicators in the National Rugby League during the 2015–2019 seasons. Journal of Science and Medicine in Sport. 2020;23(9):891–896. pmid:32146082
23. Meir R, Newton R, Curtis E, Fardell M, Butler B. Physical fitness qualities of professional rugby league football players: determination of positional differences. The Journal of Strength & Conditioning Research. 2001;15(4):450–458. pmid:11726256
24. Rennie G, Hart B, Dalton-Barron N, Weaving D, Williams S, Jones B. Longitudinal changes in Super League match locomotor and event characteristics: A league-wide investigation over three seasons in rugby league. PLoS ONE. 2021;16(12):e0260711. pmid:34855846
25. Wu J, Liu D, Guo Z, Xu Q, Wu Y. TacticFlow: Visual analytics of ever-changing tactics in racket sports. IEEE Transactions on Visualization and Computer Graphics. 2021;28(1):835–845. pmid:34587062
26. Wang Z, Long C, Cong G, Ju C. Effective and efficient sports play retrieval with deep representation learning. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019. p. 499–509.
27. Balogun AO, Basri S, Mahamad S, Abdulkadir SJ, Capretz LF, Imam AA, et al. Empirical analysis of rank aggregation-based multi-filter feature selection methods in software defect prediction. Electronics. 2021;10(2):179.
28. Balogun A, Balogun A, Sadiku P, Adeyemo V. Heterogeneous ensemble models for generic classification. Scientific Annals of Computer Science. 2017;15(1):92–98.
29. Elijah AV, Abdullah A, JhanJhi N, Supramaniam M, Abdullateef B. Ensemble and deep-learning methods for two-class and multi-attack anomaly intrusion detection: an empirical study. International Journal of Advanced Computer Science and Applications. 2019;10(9).
30. Saarela M, Jauhiainen S. Comparison of feature importance measures as explanations for classification models. SN Applied Sciences. 2021;3(2):1–12.
31. Mabayoje M, Balogun A, Ameen A, Adeyemo V. Influence of Feature Selection on Multi-Layer Perceptron Classifier for Intrusion Detection System. Computing, Information System Development Informatics & Allied Research Journals. 2016;7(4):87–94.
32. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.
33. Adeyemo VE, Balogun AO, Mojeed HA, Akande NO, Adewole KS. Ensemble-based logistic model trees for website phishing detection. In: International Conference on Advances in Cyber Security. Springer; 2020. p. 627–641.
34. Adeyemo VE. l-Length Closed Contiguous Sequential Pattern Mining Algorithm; 2021. https://github.com/arhvel/MovementPatternsForClassification.
35. Dalton-Barron N, Whitehead S, Roe G, Cummins C, Beggs C, Jones B. Time to embrace the complexity when analysing GPS data? A systematic review of contextual factors on match running in rugby league. Journal of Sports Sciences. 2020;38(10):1161–1180. pmid:32295471
36. Watanabe NM, Shapiro S, Drayer J. Big data and analytics in sport management. Journal of Sport Management. 2021;35(3):197–202.