An Automated Summarization Assessment Algorithm for Identifying Summarizing Strategies

Background Summarization is a process to select important information from a source text. Summarizing strategies are the core cognitive processes in summarization activity. Since summarization can be important as a tool to improve comprehension, it has attracted interest of teachers for teaching summary writing through direct instruction. To do this, they need to review and assess the students' summaries and these tasks are very time-consuming. Thus, a computer-assisted assessment can be used to help teachers to conduct this task more effectively. Design/Results This paper aims to propose an algorithm based on the combination of semantic relations between words and their syntactic composition to identify summarizing strategies employed by students in summary writing. An innovative aspect of our algorithm lies in its ability to identify summarizing strategies at the syntactic and semantic levels. The efficiency of the algorithm is measured in terms of Precision, Recall and F-measure. We then implemented the algorithm for the automated summarization assessment system that can be used to identify the summarizing strategies used by students in summary writing.


Introduction
Reading skills are essential for success in society. Reading affects many aspects of our lives, especially at school. The aim of reading is to elicit meaning from written text, and a lack of capacity in this area may impair comprehension. Comprehension involves inferential and evaluative thinking, not just a reproduction of the author's words. It can be taught and improved by instructing students during their learning process. Summarizing strategies help the summarizer to generate an appropriate summary. Different researchers use different terminology to describe the summarizing strategies, which are fundamentally similar processes. Several authors [11,12,13,14,15] suggest a number of summarizing strategies involved in producing appropriate summaries. These strategies are explained in detail as follows:

Deletion
To produce a summary sentence, a deletion strategy is used to remove unnecessary information in the sentence of the source text. Unnecessary information includes trivial details about the topics such as examples and scenarios or redundant information containing the rewording of some of the important information.

Sentence Combination
To produce a summary sentence, sentence combination is used to combine two or more sentences/phrases from the source text. In other words, phrases from more than one sentence are merged into a summary sentence. These sentences are usually combined using conjunction words, such as for, but, and, after, since, and before.

Generalization
The generalization rule substitutes a general term for a list. There are two kinds of substitution. One is the substitution of a general word for a list of similar items, e.g. 'pineapple, banana, star fruit and pear' can be replaced by 'fruits'. The other is the substitution of a general statement for a list of similar actions, e.g. the sentences 'Yang eats a pear' and 'Chen eats a banana' can be replaced by 'The boys eat fruits'.

Paraphrasing
In the paraphrasing process, a word in the source sentence is replaced with a synonymous word (a different word with the same meaning) in the summary sentence.

Topic Sentence Selection (TSS)
To produce a summary sentence, the topic sentence selection strategy is used to extract an important sentence from the original text to represent the main idea of a paragraph. There are four methods to identify the important sentence:
Key method. The most frequent words in a text are the most representative of its content, thus a segment of text containing them is more relevant [16]. Word frequency is a method used to identify keywords, that is, non-stop-words which occur frequently in a document [17,18]. According to [19], sentences having keywords or content words have a greater chance of being included in the summary.
Location method. Important sentences are normally at the beginning and the end of a document or paragraphs, as well as immediately below section headings [20,21]. Paragraphs at the beginning and end of a document are more likely to contain material that is useful for a summary, especially the first and last sentences of the paragraphs [19,22].
Title method. Important sentences normally contain words that are presented in the title and major headings of a document [20]. Thus, words occurring in the title are good candidates for document specific concepts [23].
Cue method. Cue phrases are words and phrases that directly signal the structure of a discourse. They are also known as discourse markers, discourse connectives, and discourse particles in computational linguistics [24]. Cue phrases, such as "conclusion" or "in particular" are often followed by important information. Thus, sentences that contain one or more of these cue phrases are considered more important than sentences without cue phrases [25]. These cue words are context dependent. However, due to the existence of different types of text, such as scientific articles and newspaper articles, it is difficult to collect these cue words as a unique list. Hence, since discourse markers can be used as an indicator of important content in a text and are more generic [26], we provide the list using discourse markers. These discourse markers are collected from the previous works [16]. Table 1 shows some of these cue words that may appear in a sentence.

Invention
A summary sentence is created using invention rule if one makes explicit topic sentences by using his or her own words to state the implicit main idea of the paragraphs. Thus, the invention rule requires that students "add information rather than just delete, select or manipulate sentences already provided for them" [13,15].

Copy-verbatim
In the copy-verbatim process, a summary sentence is produced from the source sentence without any changes. This strategy is not part of the summarizing strategies but it is used by students.
In this work, we consider five basic summarizing strategies (sentence combination, deletion, paraphrase, copy-verbatim and topic sentence selection) and four methods (key method, title method, cue method and location method). Since summarizing strategies are general rules and quite ambiguous for the computer to process, we need to transform these general rules into a set of comprehensible rules for processing. For example, an explanation of the deletion strategy is as follows:

Rule: Deletion
Process: Remove unnecessary information from the original text.

The term "unnecessary information" in the example above is very subjective and quite ambiguous for the computer to process and execute. To develop a system that can identify summarizing strategies in summary writing automatically, we need to produce more measurable and precise rules for each summarizing strategy. For this purpose, an analysis was carried out on human-written summaries. The results of the analysis are used to formulate more detailed and precise rules on how to identify each strategy. In this study, we used the same dataset as described in section "Experimental evaluations".
Two experts were asked to identify the summarizing strategies used by the summarizer in each summary sentence: a) an English teacher with good reading skills and understanding ability in the English language as well as experience in teaching summary writing; and b) a lecturer with experience in using these skills in their teaching. The human experts split the summary text into sentences, and then compared each summary sentence with all sentences from the original text to determine whether two sentences are semantically identical. Semantically identical sentences contain the same information or express a similar idea. The sentence(s) from the original text that is/are semantically equivalent to the current summary sentence can then be considered the source sentence(s) used to produce it. Given the summary sentence and its source sentence(s), the experts determine the summarizing strategies employed by the summarizer to produce the current summary sentence. Table 2 displays an overview of the analysis that we conducted on the summary texts and illustrates the results achieved over the summaries.
In particular, for each summary text, the number of each sentence of summary text is shown in the first column; while the second column presents the summary sentences, the third column displays the most relevant sentences which are extracted from the source text and have been used to produce summary sentences; and finally the last column shows the summarizing strategies that have been employed to produce each summary sentence. This study aims to determine most relevant sentences from the original text for each summary sentence and identify the summarizing strategies used to construct the summary sentence.
Each strategy must have a unique or specific characteristic which can be used to identify the strategies. The steps to identify the characteristics of each strategy are explained as follows.

Deletion strategy
The main role of the deletion strategy is to remove unimportant words or phrases from a sentence. It deletes a phrase from the sentence if the phrase is irrelevant to the main idea. To identify the deletion strategy, we use the following four rules:
Sentence length. It indicates the number of words in a sentence. The main task of the deletion strategy is to eliminate unimportant information such as stop-words and explanations, so the summary sentence is shorter than its source sentence. Let $S_s$ be a summary sentence and $O_s$ the corresponding original sentence. Then:
$\mathrm{Length}(S_s) < \mathrm{Length}(O_s)$ (1)
Even though the deletion strategy removes some phrases from a sentence, it should keep the meaning of the original sentence in the newly produced sentence. Hence, two additional rules are considered in order to identify the deletion strategy:
Word overlapping. It considers the set of words (only non-stop words) occurring in both sentences. Given two sentences, let $S_{summary} = \{W_1, W_2, \dots, W_N\}$ be a sentence of the summary text, where $N$ is the number of words in $S_{summary}$, and let $S_{original} = \{W_1, W_2, \dots, W_M\}$ be a sentence of the original text, where $M$ is the number of words in $S_{original}$. For each word of $S_{summary}$, the same word or a synonymous word must be restated in $S_{original}$. Hence, the following statement can be made:
$\forall\, W \in S_{summary},\ \exists\, W_o \in S_{original}$ (2)
where $W$ is a word of $S_{summary}$ and $W_o$ is either the same word or a synonymous word.
Syntactic composition. It checks whether the syntactic composition of the two sentences is equal. For example, suppose we select three words from sentence $S_{summary}$: A, B and C. If the word B occurs after A and the word C occurs after B, this composition should also occur in sentence $S_{original}$. That is, the word B must appear after the word A and the word C must appear after the word B in $S_{original}$. Thus, the following statement can be made:
$(A \xrightarrow{S_s} B) \wedge (B \xrightarrow{S_s} C) \Rightarrow (A \xrightarrow{S_o} B) \wedge (B \xrightarrow{S_o} C)$ (3)
where
$A \xrightarrow{S_s} B$: B appears after A in sentence $S_{summary}$.
$B \xrightarrow{S_s} C$: C appears after B in sentence $S_{summary}$.
$A \xrightarrow{S_o} B$: B appears after A in sentence $S_{original}$.
$B \xrightarrow{S_o} C$: C appears after B in sentence $S_{original}$.
Besides these rules, in this study we also consider the similarity measure between two sentences as a rule to identify the deletion strategy. The similarity measure between two sentences is computed based on their semantic similarity and syntactic similarity. We used Eqs (16) and (17) to calculate the similarity measure between two sentences.
In this study we collected 163 summary sentences produced by the deletion strategy and their corresponding sentences from the source text. We then calculated the similarity measure between the sentence pairs by using Eqs (16)-(20). Fig 1 presents the results obtained in this study. Based on the analysis of the results, we found that the similarity measure between two sentences in the deletion strategy was between 0 and 1, as shown in Fig 1. Thus, the following statement can be used as the fourth rule to identify the deletion strategy:
$0 < \mathrm{Sim}(S_s, O_s) < 1$ (4)
From this study, we also found that in the deletion strategy only one sentence from the original text was used to create a summary sentence. Hence, we also consider this feature to identify the deletion strategy. If $N$ is the number of sentences that have been used for creating a summary sentence, then in the deletion strategy we have the following statement:
The number of sentences ($N$) is equal to 1. (5)
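As an illustration only, the sentence-length, word-overlap, word-order and single-source rules can be sketched in Python as below. The stop-word list, the `synonyms` dictionary and all function names are hypothetical stand-ins (the paper uses WordNet for synonyms), and the similarity rule is left out because Eqs (16)-(20) are not given in this section.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "was"}


def tokens(sentence):
    """Lower-cased word tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def content_words(sentence):
    """Tokens with stop-words removed, in their original order."""
    return [w for w in tokens(sentence) if w not in STOP_WORDS]


def in_same_order(summary_words, source_words, same):
    """True if every summary word is restated in the source, in the same order."""
    pos = 0
    for word in summary_words:
        # Advance through the source until a matching (or synonymous) word is found.
        while pos < len(source_words) and not same(word, source_words[pos]):
            pos += 1
        if pos == len(source_words):
            return False
        pos += 1
    return True


def looks_like_deletion(summary, sources, synonyms=None):
    """Apply the length, overlap, order and single-source rules (similarity rule omitted)."""
    synonyms = synonyms or {}
    if len(sources) != 1:  # exactly one source sentence
        return False
    source = sources[0]
    if len(tokens(summary)) >= len(tokens(source)):  # summary must be shorter
        return False

    def same(w, w_o):
        # Same word, or a synonym according to the hypothetical dictionary.
        return w == w_o or w_o in synonyms.get(w, ())

    return in_same_order(content_words(summary), content_words(source), same)
```

Checking overlap and order together as one in-order scan is a design shortcut: a summary word that is missing from the source, or that appears only in a different order, fails the same test.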

Topic Sentence Selection (TSS) Strategy
The main objective of this strategy is to determine a sentence from a paragraph, which represents the main idea of the paragraph. To identify topic sentence selection strategy, we consider 4 methods which are key method, location method, cue method and title method. The methods are explained as follows. Location method. This method assumes that sentences at the beginning as well as at the end of a document or a paragraph indicate the important information.
In this study, we investigated the use of location method to produce a summary sentence. For this purpose, we examined 560 summary sentences. We found that topic sentences tend to appear at the beginning or at the end of a paragraph. As shown in Fig 2, 49% and 51% of the topic sentences appeared at the beginning and the end of paragraphs, respectively. These findings are in agreement with the previous studies of Fattah and Ren [21] and Bawakid and Oussalah [27].
The following steps are used to identify topic sentence selection using the location method:
1. Select all sentences from the source text that appear at the beginning or at the end of a paragraph.
2. Add the selected sentences from step 1 to the Sentence Location List (SLL).
3. For each summary sentence, find the corresponding sentence from the source text. Let $S_{summary}$ be a sentence of the summary text and $S_{original}$ the corresponding sentence of the original text that is used to produce $S_{summary}$.
4. Check the following statement to identify topic sentence selection:
$TSS = \begin{cases} 1 & \text{if } X \in SLL \\ 0 & \text{otherwise} \end{cases}$ (6)
where $X$ indicates the sentence $S_{original}$.
Key word method. The assumption made by the key word method is that the important sentences of a source text include one or more key words. Key words are non-stop words which occur frequently in the source text. We used the term frequency (Tf) method to identify words with high frequency in the source text, and these words were then selected as the keywords. The words with high frequency in this study are shown in Fig 3. We identified the sentences from the source text that were used to produce summary sentences, and from the analysis of these sentences we found that all of them include keywords. The result of our study is presented in Fig 4. It shows the percentage use of keywords in summaries for identifying the topic sentence selection strategy.
The following steps are used to identify topic sentence selection using keyword method: 1. Remove all stop-words from the source sentences.
2. Identify the frequency of each word of the source text.
3. Select the top N words with high frequency, and then add them to the Keywords List (KL).
4. Find the corresponding sentence from the source text for each summary sentence. Let $S_{summary}$ be a summary sentence and $S_{original}$ the corresponding sentence of the original text that is used to produce $S_{summary}$.
5. Check the following statement to identify topic sentence selection:
$TSS = \begin{cases} 1 & \text{if } \exists\, Y \in S_{original} : Y \in KL \\ 0 & \text{otherwise} \end{cases}$ (7)
where $Y$ indicates a word of $S_{original}$.
Title method. In the title method, if a sentence of the original text contains one or more of the words that appear in the title, the sentence can be considered a topic sentence. In this study, we identified the sentences from the source text that were used to produce summary sentences and that contain title words. The result of our study is presented in Fig 5. It shows the percentage use of each word from the text title that has been used to select an important sentence in the topic sentence selection strategy.
The following steps are used to identify topic sentence selection using the title method:
1. Add all non-stop words of the title to the Title List (TL).
2. Find the corresponding sentence from the source text for each sentence of the summary text. Let $S_{summary}$ be a sentence of the summary text and $S_{original}$ the corresponding sentence of the original text that is used to produce $S_{summary}$.
3. Check the following statement to identify topic sentence selection:
$TSS = \begin{cases} 1 & \text{if } \exists\, Z \in S_{original} : Z \in TL \\ 0 & \text{otherwise} \end{cases}$ (8)
where $Z$ indicates a word of $S_{original}$.
Cue method. The cue method relies on cue words or phrases such as "in conclusion", "in this paper", "our investigation has shown", and "a major result is". The presence of these words in a sentence signals important information in the source text. These cue words are context dependent. However, due to the existence of different types of text, such as scientific articles and newspaper articles, it is difficult to collect these cue words into a single list. Hence, since discourse markers can be used as an indicator of important content in a text and are more generic, a list of cue words has been built using discourse markers. In this study, we found some discourse markers that were used to indicate the significance of a sentence. Fig 6 presents some of these cue words.
The following steps are used to identify topic sentence selection using cue method: 1. Construct a Cue word list (CWL) using the discourse marker.
2. Find the corresponding sentence from source text for each summary sentence. Let S summary be a summary sentence, S original be a corresponding sentence of the original text that is used to produce the sentence S summary .
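3. Check the following statement to identify topic sentence selection:
$TSS = \begin{cases} 1 & \text{if } \exists\, W \in S_{original} : W \in CWL \\ 0 & \text{otherwise} \end{cases}$ (9)
where $W$ indicates a word or phrase of $S_{original}$ and CWL is the cue word list.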
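The four checks above (location, keyword, title and cue) can be sketched together as follows. This is a minimal illustration under several assumptions: the stop-word list, the regular-expression sentence splitter, the `top_n` cut-off and every function name are hypothetical, and cue phrases are matched as whole words in lower case.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "are", "is", "every"}


def tokens(text):
    """Lower-cased word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(paragraph):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def build_lists(title, paragraphs, cue_phrases, top_n=3):
    """Build the SLL, KL, TL and CWL lists from the source text."""
    sll, counts = set(), Counter()
    for paragraph in paragraphs:
        sentences = split_sentences(paragraph)
        if sentences:
            sll.update({sentences[0], sentences[-1]})  # first and last sentence
        counts.update(w for w in tokens(paragraph) if w not in STOP_WORDS)
    kl = {w for w, _ in counts.most_common(top_n)}  # most frequent non-stop words
    tl = {w for w in tokens(title) if w not in STOP_WORDS}
    cwl = {c.lower() for c in cue_phrases}
    return sll, kl, tl, cwl


def tss_methods(source_sentence, sll, kl, tl, cwl):
    """Report which of the four methods flags the source sentence as a topic sentence."""
    words = set(tokens(source_sentence))
    lowered = source_sentence.lower()
    return {
        "location": source_sentence in sll,
        "keyword": bool(words & kl),
        "title": bool(words & tl),
        "cue": any(re.search(r"\b" + re.escape(c) + r"\b", lowered) for c in cwl),
    }
```

A summary sentence would then be attributed to topic sentence selection when its corresponding source sentence is flagged by at least one method.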

Paraphrasing strategy
The paraphrasing strategy replaces a word in the source sentence with a synonym or similar word in the summary sentence. For example, consider two sentences, A: "I plunged into the ocean and swam back to shore." and B: "I dived into the ocean and swam back to shore." The word "plunged" in sentence A has been replaced by the synonymous word "dived" in sentence B.
The following steps are used to identify the paraphrasing strategy:
1. Let $S_{summary} = \{W_1, W_2, \dots, W_N\}$ be a summary sentence and $S_{original} = \{W_1, W_2, \dots, W_M\}$ be the corresponding sentence of the original text that is used to produce $S_{summary}$, where $N$ and $M$ are the numbers of words.
2. Get the root of each word of S original using WordNet, and then add to Array Root (AR).
3. Get the synonym of each word of S original using WordNet, and then add to Array Synonym (AS).
4. For each word of $S_{summary}$, get the root of the word using WordNet. Let RW be the root of the word, then check the following conditions:
i. If RW is in AR, set paraphrase to "0" and continue with the next word (step 4); otherwise continue with the following step.
ii. If RW is in AS, set paraphrase to "1" and stop the current loop; otherwise go to (iii).
iii. Calculate the semantic similarity between RW and every word of $S_{original}$ using Eqs (16) and (17).
iv. If there is a similar word, set paraphrase to "1" and stop the current loop; otherwise continue with the next word (step 4).
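A minimal sketch of this loop is given below. To keep it self-contained, the `ROOTS` and `SYNONYMS` dictionaries are hypothetical stand-ins for the WordNet lookups, and the optional `similar` callable stands in for the semantic similarity of Eqs (16) and (17), which are not reproduced in this section.

```python
import re

# Hypothetical stand-ins for WordNet root and synonym lookups.
ROOTS = {"plunged": "plunge", "dived": "dive", "swam": "swim"}
SYNONYMS = {"plunge": {"dive", "jump"}}


def tokens(sentence):
    """Lower-cased word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def root(word):
    """Root form of a word (falls back to the word itself)."""
    return ROOTS.get(word, word)


def uses_paraphrase(summary, original, similar=None):
    """Return True if some summary word is a synonym of (or similar to) a source word."""
    array_root = {root(w) for w in tokens(original)}  # AR
    array_synonym = set()  # AS
    for r in array_root:
        array_synonym |= SYNONYMS.get(r, set())

    for word in tokens(summary):
        rw = root(word)
        if rw in array_root:  # step i: same word, not a paraphrase
            continue
        if rw in array_synonym:  # step ii: synonym of a source word
            return True
        # steps iii-iv: optional semantic similarity check
        if similar is not None and any(similar(rw, o) for o in array_root):
            return True
    return False
```

With the example sentences from the text, `uses_paraphrase(B, A)` flags the pair because the root "dive" is listed as a synonym of "plunge".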

Sentence Combination Strategy
The main objective of the sentence combination strategy is to combine two or more sentences from the source text to construct a summary sentence. It uses conjunction words such as and, or, and so to merge sentences into a single sentence. In this study, we examined two features: the number of source sentences combined in each summary sentence, and the similarity measure between the summary sentence and the corresponding sentences of the source text. For this purpose, we collected 105 summary sentences produced using the sentence combination strategy.
To examine how many sentences are normally merged in a summary sentence, we analysed the number of source sentences that were used to create each summary sentence. From the analysis, we found that most summary sentences are generated from two or three sentences of the source text. Fig 7 presents the number of source sentences included in summary sentences. As we can see in Fig 7, out of 105 summary sentences created using the sentence combination strategy, 70 were a combination of two source sentences, 28 were produced from three source sentences and 7 were generated from four source sentences. As a result of this study, the following statement can be used as a rule to identify the sentence combination strategy:
The number of sentences ($N$) is greater than 1. (10)
where $N$ is the number of source sentences which have been used to produce a summary sentence.
Besides the aforementioned rule, in this study we also consider the similarity measure between a summary sentence and the source sentences involved in it as a rule to identify this strategy. The similarity measure is computed based on the semantic similarity and syntactic similarity between two sentences.
The following steps are used to calculate the similarity measure in the sentence combination strategy:
1. Given a Summary Sentence $SS = \{P_1, P_2, \dots, P_N\}$, where $P_1, P_2, \dots, P_N$ are phrases of the summary sentence that came from $T_1, T_2, \dots, T_M$ respectively. $T_1, T_2, \dots, T_M$ are the source sentences that are used to produce the summary sentence.
2. Calculate the similarity measure between each pair of sentences, $(T_1, SS), (T_2, SS), \dots, (T_M, SS)$, using the following steps:
a. Create a "word set".
b. Calculate semantic similarity between two sentences using Eq 18.
c. Calculate syntactic similarity between two sentences using Eq 19.
d. Calculate similarity measure between two sentences based on the semantic similarity and syntactic similarity using Eq 20.
3. Calculate the average similarity measure between sentences using the following equation:
$\mathrm{Sim}_{avg} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{Sim}(T_i, SS)$ (11)
where $M$ is the number of source sentences.
In this study, we collected 100 summary sentences produced by the sentence combination strategy and the corresponding sentences from the source text. Then, we calculated the similarity measure between sentence pairs by using Eqs (16) and (17). From the analysis of the results, we found that the similarity measure between sentences in the sentence combination strategy is between 0 and 1, as shown in Fig 8. Therefore, the following statement can be used as a rule to identify the sentence combination strategy:
$0 < \mathrm{Sim}_{avg} < 1$ (12)
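The averaging step and the resulting rule can be sketched as follows. Because the paper's similarity measure (Eqs 18-20) is not reproduced in this section, the sketch plugs in Jaccard word overlap as a clearly different, hypothetical stand-in; any function returning a value in [0, 1] could be passed as `sim` instead.

```python
import re


def tokens(sentence):
    """Lower-cased word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def jaccard(a, b):
    """Stand-in similarity (NOT the paper's Eq 20): shared words / all words."""
    wa, wb = set(tokens(a)), set(tokens(b))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def average_similarity(summary, sources, sim=jaccard):
    """Mean similarity between the summary sentence and each source sentence."""
    return sum(sim(t, summary) for t in sources) / len(sources)


def looks_like_combination(summary, sources, sim=jaccard):
    """More than one source sentence, and an average similarity strictly between 0 and 1."""
    if len(sources) <= 1:
        return False
    return 0 < average_similarity(summary, sources, sim) < 1
```

For the two source sentences "Yang eats a pear." and "Chen eats a banana." merged with "and", each pair shares 4 of 7 distinct words, so the stand-in average is 4/7.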

Copy-verbatim
In the copy-verbatim process, a summary sentence is created from the source sentence without any changes. This strategy is not part of the summarizing strategies, but it is commonly used by students. To identify the copy-verbatim strategy, we use the following three rules:
Sentence length. Sentence length is the number of words in a sentence. The main task of the copy-verbatim strategy is to produce a summary sentence from a source sentence without any changes. Therefore, the length of a summary sentence is always equal to the length of the corresponding sentence in the source text. Given two sentences, let $S_s$ be a summary sentence and $O_s$ an original sentence, and let $\mathrm{Length}(O_s)$ and $\mathrm{Length}(S_s)$ denote their lengths. The first rule is the following statement:
$\mathrm{Length}(S_s) = \mathrm{Length}(O_s)$ (13)
Similarity measure between sentences. To identify the copy-verbatim strategy, we also consider the similarity measure between two sentences as a rule. The following steps are used to calculate the similarity measure between two sentences:
1. Create a "word set".
2. Calculate semantic similarity between two sentences using Eq 18.
3. Calculate syntactic similarity between two sentences using Eq 19.
4. Calculate similarity measure between two sentences based on the semantic similarity and syntactic similarity using Eq 20.
We collected 80 summary sentences produced by the copy-verbatim strategy and the corresponding sentences from the source text. Then, we calculated the similarity measure between sentence pairs. We found that the similarity measure between two sentences in the copy-verbatim strategy is equal to 1. Thus, the following statement can be used as a second rule to identify the copy-verbatim strategy:
$\mathrm{Sim}(S_1, S_2) = 1$ (14)
Total number of sentences. In the copy-verbatim strategy we found that only one sentence from the original text is used to produce a summary sentence. Hence, we also consider this feature to identify this strategy. If $N$ is the number of sentences that have been used to produce a summary sentence, then in the copy-verbatim strategy we have the following statement, which can be used as a third rule:
The number of sentences ($N$) is equal to 1. (15)
The summarizing strategies found from the decomposition of summary text were analyzed and formalized into a set of heuristic rules on how to identify the summarizing strategies. These rules are given in Table 3.
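The three copy-verbatim rules can be illustrated with a short sketch. As an assumption, an identical token sequence is used here as a stand-in for "similarity equal to 1", since the similarity equations are not reproduced in this section; the function names are hypothetical.

```python
import re


def tokens(sentence):
    """Lower-cased word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def looks_like_copy_verbatim(summary, sources):
    """One source sentence, equal length, and the same words in the same positions."""
    if len(sources) != 1:  # exactly one source sentence
        return False
    s_words, o_words = tokens(summary), tokens(sources[0])
    if len(s_words) != len(o_words):  # equal sentence length
        return False
    return s_words == o_words  # stand-in for a similarity of 1
```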

Related Works
There is a large body of research on how the computer can help with writing summaries, either by carrying out summarization or by evaluating students' summaries. However, computer models of the methods employed by instructors to evaluate students' summaries are still lacking. Implementing such models is difficult, since several complicated goals must be met: a model has to identify the important information or main idea of a source text (i.e., sentences or paragraphs), and then determine which summarizing strategy was applied to these sentences or paragraphs. Despite this difficulty, researchers have recently developed a few systems for summary assessment.
In this section, the summary assessment systems that focus on content and style are introduced first. Then, the summary assessment systems that focus on identifying summarizing strategies are introduced.
Laburpen Ebaluaka Automatikoa (LEA) [28], which is based on Latent Semantic Analysis (LSA) and cosine similarity measure, has been proposed to evaluate the output of the summarizing process. It is designed for both teachers and students, and enables teachers to examine the student-written summary, as well as allows students to produce a summary text using their own words. The summaries are evaluated based on certain features, such as cohesion, coherence, the use of language, and the adequacy of the summary.
Summary Street [29], which is based on LSA, is a computer-based assessment system that is used to evaluate the content of the summary text. Summary Street ranks a student-written summary by comparing the summary text and source text. It creates an environment to give appropriate feedback to the students, such as content coverage, length, redundancy and plagiarism.
Lin [30] proposed an automatic summary assessment system named Recall-Oriented Understudy for Gisting Evaluation (ROUGE). It is used to assess the quality of the summary text. The system includes various automatic assessment approaches, such as ROUGE-N, ROUGE-L and ROUGE-S. ROUGE-N compares two summaries based on the number of matching n-grams. ROUGE-L calculates the similarity between a reference text and a candidate text based on the Longest Common Subsequence (LCS). ROUGE-S is based on skip-bigram co-occurrence, where a skip-bigram is any pair of words in their sentence order, allowing for arbitrary gaps.

Table 3. The rules to identify summarizing strategies and methods.
Summarizing strategy, followed by the heuristic rules to identify it:
Deletion
1. Words of summary sentence are found in source sentence.
2. The syntactic composition of the words in the summary sentence and in the corresponding source sentence is the same.
3. The number of words in summary sentence is less than the number of words in the corresponding source sentence.
Sentence combination
1. The summary sentence contains a combination of phrases from two or more sentences in the original text.
2. $TN > 1$ and $\frac{1}{TN} \sum_{i=1}^{TN} \mathrm{Sim}(S_r, S_s) < 1$
Paraphrase
1. A word in the source sentence is replaced with a synonym word in the summary sentence.
Topic Sentence Selection (TSS)
A summary sentence is created by TSS if it used:
1. Title method: The sentence includes one or more of Title words.
2. Location method: The sentence should be the first or last sentence of paragraph.
3. Cue method: The sentence includes one or more of cue phrases.
4. Keyword method: The sentence includes one or more of Key words.
Copy-verbatim
1. All words of summary sentence are found in source sentence.
2. The position of the words in the summary sentence and in the corresponding source sentence is the same.
3. The number of words in summary sentence is equal to the number of words in the source sentence.

FRESA (Framework for Evaluating Summaries Automatically) [31], which is based on Jensen-Shannon divergence and ROUGE, is a framework used to evaluate multilingual summarization without human references. It uses the ROUGE package, with unigrams, bigrams, and skip-bigrams with a maximum skip distance of 4 (ROUGE-1, ROUGE-2 and ROUGE-SU4), to compute various statistics.
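To make the ROUGE variants mentioned above more concrete, the following sketch computes the recall form of ROUGE-N and an LCS-based recall in the spirit of ROUGE-L. It is a simplified illustration with whitespace tokenization and hypothetical function names, not the official ROUGE package.

```python
from collections import Counter


def ngrams(words, n):
    """Multiset of n-grams in a word list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Matching n-grams (clipped by count) divided by the n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum((cand & ref).values()) / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0
```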
Mohler, Bunescu [32] introduced an Answer Grading System, which combines a graph alignment model and a text similarity model. This system aims to improve the existing approaches that automatically assign a grade to an answer provided by a student, using the dependency parse structure of a text and machine learning techniques. The current system uses the Stanford Dependency Parser [33] to create the dependency graphs for both the student (A 1 ) and teacher (A 2 ) answers. For each node in the student's dependency graph the system computes a similarity score for each node in the teacher's dependency graph using a set of lexical, semantic, and syntactic features. The similarity scores are used to weight the edges that connect the nodes in A 1 on one side and the nodes in A 2 on the other. The system then applies the Hungarian algorithm to determine both an optimal matching and the score associated with such a matching for the answer pair. Finally, the system produces a total grade based on the alignment scores and semantic similarity measures.
Although previous systems [28,29,30,31] have been developed to assess summary writing, they focus on the content of the summary. Only a few summarization assessment systems have been developed to identify the summarizing strategies used by students in writing a summary. To the best of our knowledge, two systems have been developed for this purpose. We explain each of them as follows.
Modelling summarization assessment strategies (MSAS) [14], which is based on LSA, has been developed. This model is based on the identification of 5 types of strategies:
1. Copy, a sentence from a summary text is semantically very close to a sentence in a source text.
2. Paraphrase, a sentence from a summary text is close to only one sentence in a source text.
3. Generalization, a sentence from a summary text is close to several sentences in a source text.
4. Construction, if no sentences of the original text are close to the summary sentence but at least one of them is related.
5. Off-the-subject, if all sentences of the original text are not related to the summary sentence.
Using LSA and cosine similarity, each sentence from summary text is semantically compared with all sentences in a source text to identify the summarizing strategies. Three similarity thresholds have been used to create four categories: not enough similarity (cosine is less than 0.2), low similarity (cosine is greater than 0.2 and less than 0.5), good similarity (cosine is greater than 0.5 and less than 0.8), too high similarity (cosine is greater than 0.8). The comparison between each sentence from summary text and each sentence from source text results in a distribution of similarities among these four categories which lead to the identification of the student strategy.
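The three thresholds can be read as a simple binning of cosine values, sketched below. The handling of values that fall exactly on a threshold is an assumption, since the text only gives strict inequalities, and the function names are hypothetical; how MSAS maps the resulting distribution to a strategy is not detailed here and is therefore not implemented.

```python
from collections import Counter


def similarity_category(cosine):
    """Bin a cosine similarity into one of the four MSAS categories."""
    if cosine < 0.2:
        return "not enough similarity"
    if cosine < 0.5:  # boundary value 0.2 assigned here by assumption
        return "low similarity"
    if cosine < 0.8:  # boundary value 0.5 assigned here by assumption
        return "good similarity"
    return "too high similarity"  # boundary value 0.8 assigned here by assumption


def category_distribution(cosines):
    """Count how many source sentences fall into each category for one summary sentence."""
    return Counter(similarity_category(c) for c in cosines)
```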
Summary Sentence Decomposition Algorithm (SSDA) [15], which is based on word-position, has been proposed to identify the summarizing strategies used by students in summary writing. In this system, the summary text is syntactically compared with the source text to identify the summarizing strategies such as deletion, sentence combination and copy-verbatim. It does not use the semantic relationships between words in comparison between two sentences; hence, it cannot find summarizing strategies at the semantic level, such as paraphrasing, generalization, and invention.

Focusing on Main Problem
Conceptually, the process of identifying summarizing strategies involves two sub-processes, as shown in Fig 9: 1) identifying the sentences from the source text that were used to create the summary sentences; and 2) identifying the summarizing strategies based on the sentences identified in the first sub-process. Before the summarizing strategies can be identified, the Text Relevance Detection Component (TRDC) must determine the relevant sentences from the source text for each summary sentence. If the relevant sentences cannot be determined, the summarizing strategies will not be identified, no matter how well the other components in the system perform. The text relevance detection component is therefore an important engine in identifying summarizing strategies. It provides a list of sentences, which are then processed using a variety of techniques to identify the summarizing strategies that have been used in summary writing.
In the context of text relevance, linguistic knowledge, such as semantic relations between words and their syntactic composition, plays a key role in sentence understanding. This is particularly important when two sentences are compared using a single word token as the basic lexical unit.
Syntactic information, such as word order, can help to distinguish the meaning of two sentences when they share the same bag of words. For example, "student helps teacher" and "teacher helps student" will be judged as similar sentences by a bag-of-words comparison because they have the same surface text. However, these sentences convey different meanings. On the other hand, two sentences are considered to be similar if most of their words are the same or if they are paraphrases of each other. However, sentences with similar meaning do not necessarily share many words. Hence, semantic information, such as the semantic similarity between words and synonymous words, is useful when two sentences have a similar meaning but use different words.
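The word-order example can be checked with a few lines of code. This is a minimal sketch with whitespace tokenization as a simplifying assumption; the helper name is hypothetical.

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Word counts of a sentence, ignoring case and word order."""
    return Counter(sentence.lower().split())

a = "student helps teacher"
b = "teacher helps student"

# The two sentences have identical bags of words ...
same_bag = bag_of_words(a) == bag_of_words(b)
# ... but their word sequences differ, which is what word-order information captures.
same_order = a.lower().split() == b.lower().split()

print(same_bag, same_order)
```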
While both semantic information and syntactic information contribute to sentence understanding [34,35,36,37,38], the current systems that have been proposed to identify summarizing strategies do not combine semantic relations between words with their syntactic composition to identify text relevancy. This drawback is likely to reduce the performance of the previous systems.
As shown in Fig 9, there are two levels of summarizing strategies: semantic and syntactic. The strategies at the semantic level include paraphrase, generalization, topic sentence selection and invention. The strategies at the syntactic level include deletion, copy-verbatim and sentence combination. A few systems have been proposed to identify summarizing strategies [14,15]. However, these systems identify strategies either at the semantic level or at the syntactic level, but not both. Moreover, they do not use the combination of semantic and syntactic information to determine the relevant sentences from the source text for each summary sentence. These disadvantages are likely to reduce the performance of the current systems.

ISSLK Algorithm
The ISSLK combines semantic information and syntactic information to identify relevant sentences and summarizing strategies. The ISSLK algorithm is developed to:
1. Determine whether a sentence in the summary text is from the original text. Let S_s represent a sentence of the summary text.
2. Identify all sentences from the original text that have relations with S_s. Let R_relations include these sentences.
3. Identify all sentences from R_relations that are used to produce sentence S_s. Let P_Relevant Sentences include these sentences.
4. Identify the summarizing strategies and methods used to produce a summary sentence using the sentences from P_Relevant Sentences.
This algorithm includes two sub-algorithms, which are:

Sentences Relevance Identification Algorithm
The sentences relevance identification algorithm is a process for identifying sentences from the source text, which are used to produce a sentence in the summary text. It uses the combination of semantic similarity and syntactic similarity to identify these sentences. The steps to determine these sentences are presented in the intermediate-processing stage.

Summarizing Strategies Identification Algorithm
After identifying the relevant sentences for each sentence of summary text, the summarizing strategies that have been used to produce a summary sentence are identified. This process involves the use of the rules, as shown in Table 3, in which the rules are transformed into an algorithm as presented in the post-processing stage. Fig 10 displays the general architecture of the ISSLK algorithm, which consists of three main stages: a) Pre-processing, b) Intermediate-processing, and, c) Post-processing.

Pre-processing
This stage performs a basic linguistic analysis on both the source text and the students' summaries, and thus prepares them for further processing. External tools and resources are used to perform this analysis. The pre-processing module provides text pre-processing functions, such as sentence segmentation, tokenization, part-of-speech tagging, stemming, stop word removal, finding sentence location (FSL), keyword extraction (KE) and title word extraction (TWE). The FSL finds the location of each sentence in a source text and determines whether it is the first or the last sentence of a paragraph or document. The TWE extracts all the nouns and verbs from the title of a document. The KE uses the Term Frequency (TF) method to identify words with high frequency.
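As an illustration of the keyword extraction (KE) step, the following sketch ranks words by raw term frequency after stop word removal. The regular-expression tokenizer, the tiny stop word list and the function name are assumptions made for the example; they are not the tools used in the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for the example.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}

def extract_keywords(text: str, top_k: int = 3) -> list[str]:
    """Return the top_k most frequent non-stop words (Term Frequency method)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_k)]

text = "Summary writing helps reading. Summary writing is a skill. A summary is short."
print(extract_keywords(text, top_k=2))
```

A fuller version would also apply stemming before counting, so that inflected forms of the same word are counted together.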

Intermediate-processing
Intermediate processing is the core of the ISSLK algorithm. It determines whether the summary sentence is generated from the source text and, if so, identifies all the relevant sentences from the original text that are used to produce the summary sentence. To do so, it uses the Sentence Similarity Computation Component (SSCC) and the Sentences Relevance Detection Component (SRDC), which are described below.
Sentence Similarity Computation Component (SSCC). The sentence similarity computation component includes a computation model to calculate the sentence similarity measure. The Sentence Similarity Computation Model (SSCM) is presented in Fig 11. It shows the overall process of applying the semantic and syntactic information to determine the similarity measure between two sentences. The main task of the SSCM is to identify all the sentences from the original text that have relations with a sentence of the summary text. The model includes several components: word set, semantic similarity between words, semantic similarity between sentences, syntactic similarity between sentences, and sentence similarity measurement. The task of each component is as follows.
The word set. Given two sentences S_1 and S_2, a "word set" is created using the distinct words from the pair of sentences. Let WS = {W_1, W_2, ..., W_N} denote the word set, where N is the number of distinct words in the word set. The word set is obtained through the following steps:
1. Two sentences are taken as input.
2. For each word W from S_1, the following tasks are undertaken:
i. the root of W, denoted RW, is determined;
ii. if RW already appears in the WS, the loop continues with the next word from S_1;
iii. if RW does not appear in the WS, RW is added to the WS and the loop continues with the next word from S_1.
3. The same process is conducted for S_2.
Semantic Similarity between Words (SSW). Semantic word similarity [39,40] plays an important role in this method. It is used to create the word order vector and the semantic vector. The semantic similarity between two words is determined through these steps:
1. Two words, W_1 and W_2, are taken as input.
2. The root of each word is obtained using the lexical database WordNet.
3. The synonyms of each word are obtained using WordNet.
4. The number of synonyms of each word is determined.
5. The Least Common Subsumer (LCS) of the two words and their length are determined.
6. The similarity score between the words is computed using Eqs (16) and (17).
In these equations, LCS stands for the least common subsumer, max_w is the number of words in WordNet, Synset(w) is the number of synonyms of word w, and IC(w) is the information content of word w based on the lexical database WordNet.
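As a rough illustration of steps 4 to 6, the sketch below assumes that the information content follows an intrinsic, count-based form and that the word similarity is a Lin-style ratio, which is what the symbols defined above suggest. Both formulas are assumptions rather than the authors' confirmed equations, and the function names and toy counts are hypothetical; in practice the counts and the LCS would come from WordNet.

```python
import math

def information_content(n_synonyms: int, max_w: int) -> float:
    """Assumed form: IC(w) = 1 - log(Synset(w) + 1) / log(max_w)."""
    return 1.0 - math.log(n_synonyms + 1) / math.log(max_w)

def word_similarity(w1: str, w2: str, ic_w1: float, ic_w2: float, ic_lcs: float) -> float:
    """Assumed form: 1 for identical words, else 2 * IC(LCS) / (IC(w1) + IC(w2))."""
    if w1 == w2:
        return 1.0
    denom = ic_w1 + ic_w2
    return 2.0 * ic_lcs / denom if denom > 0 else 0.0

# Toy numbers: a lexicon of 1000 words, two words with 9 synonyms each,
# and a least common subsumer with 99 synonyms.
ic_word = information_content(9, 1000)
ic_lcs = information_content(99, 1000)
print(round(word_similarity("car", "bus", ic_word, ic_word, ic_lcs), 3))
```

Under these assumptions, a more specific (higher-IC) common subsumer yields a higher similarity between the two words.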
Semantic similarity between sentences. We use the semantic-vector approach [1,45,46] to measure the semantic similarity between sentences. The following tasks are performed to measure the semantic similarity between two sentences.
1. Create the semantic-vector. The semantic-vector is created using the word set and the corresponding sentence. Each cell of the semantic-vector corresponds to a word in the word set, so its dimension equals the number of words in the word set.
2. Weight each cell of the semantic-vector. Each cell is weighted using the semantic similarity between the words of the word set and the words of the corresponding sentence. For sentence S_1, for example:
a. If the word w from the word set appears in S_1, the weight of w in the semantic-vector is set to 1. Otherwise, go to the next step.
b. If S_1 does not contain w, compute the similarity score between w and the words of S_1 using the SSW method.
c. If similarity values exist, the weight of w in the semantic-vector is set to the highest similarity value. Otherwise, go to the next step.
d. If there is no similarity value, the weight of w in the semantic-vector is set to 0.
3. Compute the similarity. A semantic-vector is created for each of the two sentences, and the semantic similarity measure is computed from the two semantic-vectors using the cosine similarity:

$$Sim_{semantic}(S_1, S_2) = \frac{\sum_{j=1}^{m} w_{1j}\, w_{2j}}{\sqrt{\sum_{j=1}^{m} w_{1j}^{2}}\;\sqrt{\sum_{j=1}^{m} w_{2j}^{2}}} \quad (18)$$

where S_1 = (w_11, w_12, ..., w_1m) and S_2 = (w_21, w_22, ..., w_2m) are the semantic vectors of sentences S_1 and S_2, respectively; w_pj is the weight of the j-th word in vector S_p; and m is the number of words.
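The word set, the semantic-vector weighting and the cosine step can be sketched together as follows. This is a minimal illustration, not the authors' implementation: lower-casing stands in for WordNet root lookup, and `word_sim` is a hypothetical callable standing in for the SSW method.

```python
import math

def build_word_set(s1, s2, root=str.lower):
    """Distinct roots of the words of S1, then S2, in first-seen order."""
    word_set = []
    for w in list(s1) + list(s2):
        rw = root(w)
        if rw not in word_set:
            word_set.append(rw)
    return word_set

def semantic_vector(word_set, sentence, word_sim, root=str.lower):
    """Weight 1 for words present in the sentence, else the highest word similarity (0 if none)."""
    roots = [root(w) for w in sentence]
    vector = []
    for w in word_set:
        if w in roots:
            vector.append(1.0)
        else:
            vector.append(max((word_sim(w, r) for r in roots), default=0.0))
    return vector

def cosine(v1, v2):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

# Toy word similarity: only the pair ("b", "c") is considered related.
toy_sim = lambda x, y: 0.5 if {x, y} == {"b", "c"} else 0.0

ws = build_word_set(["a", "b"], ["a", "c"])
v1 = semantic_vector(ws, ["a", "b"], toy_sim)
v2 = semantic_vector(ws, ["a", "c"], toy_sim)
print(ws, v1, v2, round(cosine(v1, v2), 4))
```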
Word order similarity between sentences. We use the syntactic-vector approach [47,48] to measure the word-order similarity between sentences. The following tasks are performed to measure the word-order similarity between two sentences.
1. Create the syntactic-vector. The syntactic-vector is created using the word set and the corresponding sentence. The dimension of this vector is equal to the number of words in the word set.
2. Weight each cell of the syntactic-vector. Unlike the semantic-vector, each cell of the syntactic-vector is weighted using a unique index, namely the index position of the word in the corresponding sentence. For sentence S_1, the weight of each cell is determined by the following steps:
i. For each word w from the word set, if w appears in S_1, the cell in the syntactic-vector is set to the index position of the corresponding word in S_1. Otherwise, go to the next step.
ii. If w does not appear in S_1, compute the similarity score between w and the words of S_1 using the SSW method.
iii. If similarity values exist, the value of the cell is set to the index position of the word from S_1 with the highest similarity measure.
iv. If there is no similarity value between w and the words in S_1, the weight of the cell in the syntactic-vector is set to 0.
3. Compute the similarity. A syntactic-vector is created for each of the two sentences, and the word-order similarity measure is then computed from the two syntactic-vectors, where O_1 = (d_11, d_12, ..., d_1m) and O_2 = (d_21, d_22, ..., d_2m) are the syntactic vectors of sentences S_1 and S_2, respectively, and d_pj is the weight of the j-th cell in vector O_p.
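The syntactic-vector steps can be sketched as below. Two points are assumptions: positions are 1-based so that 0 can unambiguously mean "no match", and the word-order similarity uses a normalized-difference form, 1 - ||O_1 - O_2|| / ||O_1 + O_2||, which is a common choice for index vectors of this kind but is not confirmed by the text here. `word_sim` is again a hypothetical stand-in for the SSW method.

```python
import math

def syntactic_vector(word_set, sentence, word_sim):
    """Cell = 1-based position of the word (or of its most similar word) in the sentence, else 0."""
    vector = []
    for w in word_set:
        if w in sentence:
            vector.append(sentence.index(w) + 1)
            continue
        scored = [(word_sim(w, s), i + 1) for i, s in enumerate(sentence)]
        best_sim, best_pos = max(scored, key=lambda t: t[0], default=(0.0, 0))
        vector.append(best_pos if best_sim > 0 else 0)
    return vector

def word_order_similarity(o1, o2):
    """Assumed form: 1 - ||O1 - O2|| / ||O1 + O2||."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(o1, o2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(o1, o2)))
    return 1.0 - diff / total if total else 0.0

no_sim = lambda x, y: 0.0
word_set = ["student", "helps", "teacher"]
o1 = syntactic_vector(word_set, ["student", "helps", "teacher"], no_sim)
o2 = syntactic_vector(word_set, ["teacher", "helps", "student"], no_sim)
print(o1, o2, round(word_order_similarity(o1, o2), 4))
```

With this form, the two example sentences, which share a bag of words, receive a word-order similarity below 1, while identical word orders score exactly 1.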
Sentence similarity measurement. The similarity measure between two sentences is calculated using a linear equation that combines the semantic and word-order similarity measures:

$$Sim(S_1, S_2) = \alpha \cdot Sim_{semantic}(S_1, S_2) + (1 - \alpha) \cdot Sim_{word\ order}(S_1, S_2) \quad (20)$$

where Sim_semantic and Sim_word order are the semantic and word-order similarity measures between the two sentences, and alpha is the weighting parameter that specifies their relative contributions to the overall similarity measure. The larger the alpha, the heavier the weight of the semantic similarity. If alpha equals 0.5, the semantic and syntactic similarity measures are assumed to be equally important.
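The linear combination can be written directly as a small function. This is a sketch with hypothetical names; restricting alpha to the interval [0, 1] is an assumption that keeps the result a weighted average.

```python
def sentence_similarity(semantic_sim: float, word_order_sim: float, alpha: float = 0.5) -> float:
    """Weighted combination: alpha * semantic + (1 - alpha) * word-order."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * semantic_sim + (1.0 - alpha) * word_order_sim

# Hypothetical scores: same words (semantic 1.0) but a different order (word-order 0.5).
print(sentence_similarity(1.0, 0.5, alpha=0.7))
```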
Sentences Relevance Detection Component (SRDC). Let T_Original Text = {S_1, S_2, ..., S_N} represent all sentences from the original text, where N is the number of sentences, and let S_s denote a summary sentence.
Let Arr_Relations = {(S_1, S_s, Value_sim(S1,Ss)), (S_2, S_s, Value_sim(S2,Ss)), ..., (S_M, S_s, Value_sim(SM,Ss))} represent all the sentences from the original text that have relations with S_s, where M is less than or equal to N and Value_sim(SM,Ss) is the similarity measure between the two sentences S_M and S_s.
Based on the previous section (Intermediate-processing), a summary sentence is related to a sentence of the original text if the two sentences share at least one word. Hence, a set of sentences from the original text is found to have relations with a sentence of the summary text, and it is important to determine which of these sentences have been used to create the summary sentence. In other words, we attempt to find the subset of Arr_Relations that is used to produce S_s. Let Brr_Relevant sentences (Brr_RS) represent this subset. The steps to determine these sentences are as follows:
Step 1. The relation with the greatest similarity score is selected from Arr_Relations. Let S_1 be the sentence of Arr_Relations that is related to S_s with the greatest similarity score, Value_sim(S1,Ss). This pair of sentences is taken to the next step.
Step 2. All the words common to S_1 and S_s are eliminated, and the length of sentence S_s is then checked. If it is equal to zero, sentence S_s consists of a phrase from one sentence in the original text, and sentence S_1 was used to create S_s. In this case, S_1 is assigned to Brr_RS, the cell (S_1, S_s, Value_sim(S1,Ss)) is removed from Arr_Relations, and the algorithm stops the current process. If the length of S_s is not equal to zero, the algorithm continues to the next step.
Step 3. Let S_1' represent sentence S_1 with its remaining words and S_s' represent sentence S_s with its remaining words. Using the SSW method, the semantic similarity measure between the words of S_s' and S_1' is calculated, and words found to be similar are removed. The length of S_s' is then checked. If it is equal to zero, sentence S_s contains a phrase from one sentence in the original text, and sentence S_1 was used to create S_s. Thus, S_1 is assigned to Brr_RS, the cell (S_1, S_s, Value_sim(S1,Ss)) is removed from Arr_Relations, and the algorithm stops the current process. If the length of S_s' is still not equal to zero, sentence S_s contains a combination of phrases from two or more sentences in the original text. In this case, S_1 is likewise assigned to Brr_RS and the cell (S_1, S_s, Value_sim(S1,Ss)) is removed from Arr_Relations, but the algorithm continues to the final step.
Step 4. Arr_Relations', containing the remaining elements, and sentence S_s'', containing the remaining words of S_s', are sent to the SSCC to recompute the sentence similarity and to find the other sentences that were used to create sentence S_s.
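Steps 1 to 4 amount to a greedy loop, which can be sketched as follows. This is a simplified illustration under stated assumptions: sentences are lists of already pre-processed tokens, Jaccard overlap stands in for the SSCC similarity measure, and `similar` is a hypothetical predicate standing in for the SSW word comparison. All function names are made up for the example.

```python
def jaccard(a, b):
    """Stand-in sentence similarity; the SSCC measure would be used in practice."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def find_relevant_sentences(summary, sources, sentence_sim=jaccard,
                            similar=lambda w1, w2: False):
    """Return indices of the source sentences used to produce the summary sentence."""
    remaining = list(summary)
    # Arr_Relations: source sentences sharing at least one word with S_s.
    candidates = [i for i, src in enumerate(sources) if set(src) & set(remaining)]
    relevant = []  # Brr_RS, as indices into `sources`
    while remaining and candidates:
        # Steps 1 and 4: pick the candidate most similar to the remaining words.
        best = max(candidates, key=lambda i: sentence_sim(sources[i], remaining))
        if sentence_sim(sources[best], remaining) <= 0:
            break
        src = sources[best]
        # Step 2: remove the words the two sentences have in common.
        src_rest = [w for w in src if w not in remaining]
        remaining = [w for w in remaining if w not in src]
        # Step 3: remove summary words similar to a leftover source word.
        remaining = [w for w in remaining if not any(similar(w, s) for s in src_rest)]
        relevant.append(best)
        candidates.remove(best)
    return relevant

sources = [
    "the cat sat on the mat".split(),
    "a dog barked loudly".split(),
    "birds fly".split(),
]
print(find_relevant_sentences("the cat sat and the dog barked".split(), sources))
```

In this toy run the summary sentence draws on the first two source sentences, so both indices are returned; the third sentence shares no word with the summary and is never considered.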

Post-processing
The final step of ISSLK is to support the automatic assessment of summaries by identifying summarizing strategies. It aims to answer the following questions:
1. What summarizing strategies have been used to create a summary sentence?
2. How can a topic sentence selection strategy be identified?
3. What are the methods used to identify a topic sentence selection strategy?
Table 3 summarizes the rules to identify each summarizing strategy and method. The overall processes for applying these rules to identify the summarizing strategies and methods are described as follows.
Identifying summarizing strategies used in summary writing. Deletion, sentence combination and copy-verbatim strategies. Given two texts, the summary text and the original text, let S_s = {W_1, W_2, ..., W_K} be a sentence of the summary text and Brr_RS = {(T_1, S_s, P_1), (T_2, S_s, P_2), ..., (T_N, S_s, P_M)} represent all the sentences from the original text that are used to produce sentence S_s, where K is the number of words in S_s, M is the number of phrases in S_s, T_N is the N-th sentence from the original text and (T_N, S_s, P_M) indicates that the M-th phrase of S_s comes from the N-th sentence of the original text. The steps for identifying the deletion, copy-verbatim and sentence combination strategies are as follows:
Step 1. The algorithm checks the value of N. If it is equal to 1, the algorithm attempts to find the deletion and copy-verbatim strategies using Step 2; otherwise, it attempts to identify the sentence combination strategy using Step 3.
Step 2. Given two sentences, T and S_s, the algorithm computes the length of each sentence. Let Len(T) denote the length of sentence T and Len(S_s) denote the length of sentence S_s. It also calculates the similarity measure between the two sentences. Using Len(T), Len(S_s) and Sim(T, S_s), the following statements can be made:

$$State_{CP}: \; N = 1,\quad Len(T) = Len(S_s),\quad 0 < Sim(T, S_s) \le 1$$
$$State_{Del}: \; N = 1,\quad Len(S_s) < Len(T),\quad 0 < Sim(T, S_s) < 1$$

where T is a sentence of Brr_RS and Sim(T, S_s) denotes the sentence similarity measure between T and S_s.
State_CP indicates that sentence S_s used the copy-verbatim strategy: one sentence is used to produce S_s, the two sentences have equal length, and the similarity measure between them is greater than 0 and at most 1.
State_Del indicates that sentence S_s used the deletion strategy: one sentence is used to produce S_s, the length of S_s is less than the length of T, and the similarity measure between the two sentences is strictly between 0 and 1. The algorithm also considers two further rules to identify the deletion strategy.
In the first rule, W is a word of S_s and W_o can be either a similar word or a synonymous word.
In the second rule, the following notation is used. W_1 S_s W_2: W_2 appears after W_1 in sentence S_s. W_2 S_s W_3: W_3 appears after W_2 in sentence S_s. W_1 S_o W_2: W_2 appears after W_1 in sentence T. W_2 S_o W_3: W_3 appears after W_2 in sentence T.
Step 3. If the value of N is greater than 1, more than one sentence from the original text is used to produce the sentence S_s. Hence, S_s used the sentence combination strategy if N is greater than 1 and the average of the semantic similarity measures is greater than 0 and at most 1:

$$N > 1 \quad \text{and} \quad 0 < \frac{1}{N}\sum_{i=1}^{N} Sim(T_i, S_s) \le 1$$

Since the summary sentence S_s contains a combination of phrases from two or more sentences in the original text, each phrase of S_s can be analyzed to identify other summarizing strategies, such as deletion, copy-verbatim, topic sentence selection and paraphrasing.
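The length and similarity conditions of Steps 1 to 3 can be sketched as a small rule-based classifier. This is an illustration only: Jaccard overlap is a stand-in for the sentence similarity measure, the two additional word-level deletion rules are omitted, and the function names and return labels are hypothetical.

```python
def jaccard(a, b):
    """Stand-in for Sim(T, S_s); the combined sentence similarity would be used in practice."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def classify_syntactic_strategy(relevant_sources, summary, sim=jaccard):
    """Apply the copy-verbatim, deletion and sentence combination conditions."""
    n = len(relevant_sources)
    if n == 0:
        return "none"
    if n == 1:  # Step 2: exactly one source sentence
        t = relevant_sources[0]
        s = sim(t, summary)
        if 0 < s <= 1 and len(t) == len(summary):
            return "copy-verbatim"
        if 0 < s < 1 and len(summary) < len(t):
            return "deletion"
        return "other"
    # Step 3: several source sentences, average similarity in (0, 1]
    avg = sum(sim(t, summary) for t in relevant_sources) / n
    return "sentence combination" if 0 < avg <= 1 else "other"

print(classify_syntactic_strategy([["the", "big", "cat", "sat"]], ["the", "cat", "sat"]))
```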
Paraphrase strategy. Given two sentences, let S_summary = {W_1, W_2, ..., W_N} be a sentence of the summary text, where N is the number of words in S_summary, and let S_RS = {W_1, W_2, ..., W_M} be a sentence of Brr_RS that is used to create S_summary, where M is the number of words in S_RS. A_Root = {W_R1, W_R2, ..., W_RM} includes the root of each word of sentence S_RS, where W_Rj is the root of the j-th word in S_RS.
B_Synonym = {W_1, W_2, ..., W_K} includes the synonyms of each word of sentence S_RS. In the first step, the algorithm loops over each word of S_RS, obtains its root and its synonyms using WordNet, and assigns them to A_Root and B_Synonym, respectively.
In the second step, the algorithm loops over each word of S_summary and determines the root of the word using WordNet. Let RW be the root of the word. If RW is in A_Root, the loop continues with the next word; otherwise, the algorithm searches for RW in B_Synonym. If RW is found there, sentence S_summary used the paraphrase strategy, and the loop stops.
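The two-step paraphrase check can be sketched as follows. This is a minimal illustration in which a small hand-written dictionary and lower-casing stand in for the WordNet synonym and root lookups; these stand-ins and the function name are assumptions for the example.

```python
# Hypothetical toy lexicon standing in for WordNet synonym lookup.
TOY_SYNONYMS = {"car": {"automobile"}, "fast": {"quick", "rapid"}}

def uses_paraphrase(summary_words, source_words,
                    root=str.lower,
                    synonyms=lambda w: TOY_SYNONYMS.get(w, set())):
    """True if some summary word is absent from the source roots but matches a source synonym."""
    # Step 1: collect roots (A_Root) and synonyms (B_Synonym) of the source sentence S_RS.
    a_root = {root(w) for w in source_words}
    b_synonym = {s for w in source_words for s in synonyms(root(w))}
    # Step 2: look up each summary word's root, first in A_Root, then in B_Synonym.
    for w in summary_words:
        rw = root(w)
        if rw in a_root:
            continue
        if rw in b_synonym:
            return True
    return False

print(uses_paraphrase("the automobile is quick".split(), "the car is fast".split()))
```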
Topic sentence selection strategy: cue, title, keyword and location methods. Given two sentences, let S_summary be a sentence of the summary text and S_RS be a sentence of Brr_Relevant sentences that is used to produce S_summary. Let L_title word, L_key word and L_cue word be the lists of title words, keywords and cue words, respectively, and let L_sentence location indicate whether the j-th sentence, S, from the source text is the first or the last sentence of a paragraph. Topic sentences are usually found at the beginning and end of a document, in the first and last sentences of paragraphs, and immediately below section headings. The steps for identifying the topic sentence selection (TSS) strategy using the four methods (cue, title, location and keyword) are as follows:
Title method. In the first step, the algorithm checks sentence S_RS for the title method. If a word of L_title word is in S_RS, sentence S_summary used the title method; otherwise, it did not use this method.
Keyword method. In the second step, the algorithm checks sentence S_RS for the keyword method. If a word of L_key word is in S_RS, sentence S_summary used the keyword method; otherwise, it did not use this method.
Location method. In the third step, the algorithm checks sentence S_RS for the location method. If S_RS is in L_sentence location, sentence S_summary used the location method; otherwise, it did not use this method.
Cue method. In the fourth step, the algorithm checks sentence S_RS for the cue method. If a word of L_cue word is in S_RS, sentence S_summary used the cue method; otherwise, it did not use this method.
Finally, sentence S_summary used the topic sentence selection strategy if it used at least one of these methods: keyword, cue, title or location.
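The four checks can be sketched as set-membership tests on the relevant source sentence. This is an illustration with hypothetical names and toy word lists; it also assumes single-token cue words, whereas multi-word cues such as "in conclusion" would need phrase matching.

```python
def topic_sentence_methods(source_words, is_first_or_last,
                           title_words, keywords, cue_words):
    """Return the set of TSS methods that the relevant source sentence satisfies."""
    words = set(source_words)
    methods = set()
    if words & set(title_words):
        methods.add("title")
    if words & set(keywords):
        methods.add("keyword")
    if is_first_or_last:
        methods.add("location")
    if words & set(cue_words):
        methods.add("cue")
    return methods

def uses_topic_sentence_selection(methods):
    """TSS is detected if at least one method applies."""
    return bool(methods)

found = topic_sentence_methods(
    ["summarization", "improves", "comprehension"],
    is_first_or_last=True,
    title_words={"summarization"},
    keywords={"comprehension"},
    cue_words={"conclusion"},
)
print(sorted(found), uses_topic_sentence_selection(found))
```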

Experimental Evaluations
To evaluate the ISSLK algorithm, we carried out two experiments. In the first experiment, we measured the performance of the algorithm against human judgment in identifying the summarizing strategies. In the second experiment, we compared the performance of the algorithm with existing methods. We now describe our experiments on the single-document summarization datasets provided by the Document Understanding Conference (DUC) (http://duc.nist.gov).

Data set
In this section, we describe the data used throughout our experiments. To assess the performance of the proposed method, we used the DUC 2002 document dataset and the corresponding 100-word summaries generated for each document. DUC 2002 contains 567 document-summary pairs from the Document Understanding Conference. Each document of DUC 2002 is referred to as the original text or source text, and the corresponding summary is referred to as the candidate summary. We also used a set of students' summaries. In our experiments, the documents and corresponding summaries were randomly divided into two separate datasets. Table 4 gives a brief description of the datasets.

Evaluation Metric
To evaluate the performance of the ISSLK, an evaluation metric is required. Various evaluation metrics are widely used in different natural language processing applications. In our experiment, the evaluation is performed using precision, recall and F-measure.
Precision, Recall and F-score. Precision, recall and F-score are the prevalent measures for evaluating a system [49]. Precision is the fraction of selected items that are correct, and recall is the fraction of correct items that are selected. In this work, the summarizing strategies identified by a human form the set of ideal items, and the strategies identified by the algorithm form the set of system items. Precision assesses the fraction of the system items that the algorithm correctly identified, and recall assesses the fraction of the ideal items that the algorithm identified. Precision is computed using Eq 26: the number of summarizing strategies identified by both ISSLK and the human expert, divided by the total number of strategies identified by the algorithm. Recall is computed using Eq 27: the number of summarizing strategies identified by both ISSLK and the human expert, divided by the total number of strategies identified by the human expert.

$$Precision = \frac{A}{A + B} \quad (26)$$

$$Recall = \frac{A}{A + C} \quad (27)$$

where A is the number of summarizing strategies identified by both the algorithm and the human expert, B is the number of summarizing strategies identified by the algorithm only, and C is the number of summarizing strategies identified by the human expert only.
There is an anti-correlation between precision and recall (Manning et al., 2008): recall tends to drop as precision rises, and vice versa. To take the two metrics into consideration together, a single measure, called the F-score, is used. The F-score is a statistical measure that merges both precision and recall. It is calculated as follows:

$$F_{\beta} = \frac{(1 + \beta^{2}) \cdot P \cdot R}{\beta^{2} \cdot P + R} \quad (28)$$

With this formulation, a beta greater than 1 gives recall more priority, and a beta smaller than 1 gives precision more priority. If beta is equal to 1, precision and recall are assumed to have equal priority in computing the F-score, which is then computed as follows:

$$F = \frac{2 \cdot P \cdot R}{P + R} \quad (29)$$

where P is precision and R is recall.
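The three metrics can be computed from the counts A, B and C as sketched below. The function names are hypothetical, and returning 0 when a denominator is zero is an assumption made to keep the example total.

```python
def precision_recall(a: int, b: int, c: int):
    """Precision = A / (A + B); Recall = A / (A + C)."""
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    return precision, recall

def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0

# Hypothetical counts: 6 strategies found by both, 2 by the algorithm only, 4 by the human only.
p, r = precision_recall(6, 2, 4)
print(p, r, round(f_score(p, r), 4))

# Precision 0.8126 and recall 0.6818 give an F-measure of about 0.7415.
print(round(f_score(0.8126, 0.6818), 4))
```

The second call uses the precision and recall reported for alpha = 0.7 in the parameter-setting experiment and recovers the F-measure reported there.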

Experiment 1-Evaluation of the algorithm with the human judgment
Procedure. Method H_0 (Summary Text-Source Text). One method that can be used to identify the strategies employed by the summarizer is as follows. First, split the summary text into sentences. Second, for each summary sentence, determine all the relevant sentences from the source text that were used to produce it. Finally, compare the summary sentence with all of its relevant source sentences to identify the strategies used to produce it.
To evaluate the algorithm, we need gold standard data, that is, a set of all correct results. Based on this dataset, also known as judgment data, we can decide whether the output of the algorithm is correct or not. For this purpose, two experts were asked to identify the summarizing strategies used by the summarizer in each summary sentence: a) an English teacher with good reading skills and understanding ability in the English language as well as experience in teaching summary writing; and b) a lecturer with experience in using these skills in their teaching. Once the experts had completed the task using method H_0, we compared the summarizing strategies identified by ISSLK with those identified by the experts. Table 5 shows an example of the summarizing strategies identified by ISSLK and by a human expert.
We used Cohen's Kappa [50,51] as a measure of agreement between the two raters. The Kappa coefficient for inter-rater agreement was 0.61. This value indicates that our assessors had good agreement [52] in grading each student summary.
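For reference, Cohen's Kappa for two raters can be computed as sketched below, using the standard definition (observed agreement corrected for chance agreement). The labels are toy data, not the study's annotations, and the function name is hypothetical.

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Kappa = (p_o - p_e) / (1 - p_e) for two raters labelling the same items."""
    n = len(rater1)
    # Observed agreement: share of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Toy labels for four summary sentences ("del" = deletion, "par" = paraphrase).
r1 = ["del", "del", "par", "par"]
r2 = ["del", "par", "par", "par"]
print(cohen_kappa(r1, r2))
```

Note that the function is undefined when chance agreement equals 1, for example when both raters always assign the same single label.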
Parameter setting. The proposed algorithm requires one parameter to be determined before use: a weighting parameter, alpha (refer to Eq 20), which balances the significance of semantic information against syntactic information. In the current experiment, this parameter was found using the training data. We ran our proposed algorithm, ISSLK, on the training dataset and evaluated it for each alpha between 0.1 and 0.9 with a step of 0.1. Table 6 presents the experimental results obtained using the various alpha values, in terms of precision, recall and F-measure. The best performance is achieved with an alpha value of 0.7, which produced the following scores: 0.8126 (precision), 0.6818 (recall) and 0.7415 (F-measure). The best values in Table 6 are marked in boldface. Since 0.7 gives the best result on the current data set, we use this alpha value on the testing data.
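The parameter search amounts to a one-dimensional grid search, which can be sketched as follows. Here `evaluate` is a hypothetical callable that would run the algorithm on the training data with a given alpha and return its F-measure; the toy scoring function below only makes the example runnable and does not reproduce Table 6.

```python
def select_alpha(evaluate, alphas=None):
    """Return the alpha with the highest score according to `evaluate`."""
    if alphas is None:
        # 0.1, 0.2, ..., 0.9 (rounded to avoid floating-point drift).
        alphas = [round(0.1 * k, 1) for k in range(1, 10)]
    return max(alphas, key=evaluate)

# Toy stand-in for "F-measure on the training set", peaking at alpha = 0.7.
toy_f_measure = lambda alpha: 0.74 - abs(alpha - 0.7)

print(select_alpha(toy_f_measure))
```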
Performance analysis. To confirm the aforementioned results, we validated our proposed algorithm, ISSLK, by measuring its performance against human judgment on the held-out testing data. We applied ISSLK to the testing data set with the alpha value of 0.7 only. To compute precision, recall and F-measure, we determined the values of A, B and C: the number of summarizing strategies identified by both the algorithm and the human (A), the number identified by the algorithm only (B), and the number identified by the human only (C). The equations for precision, recall and F-measure were then applied to obtain the values for each summary.

Results and Discussion
According to the results presented in Table 7, the algorithm obtained an average of 77% precision, 66% recall and 70% F-score for summaries. It did not attain higher percentages in comparison to human judgment for several reasons:
1. The algorithm failed to identify some of the summarizing strategies identified by the expert, namely generalization and invention. This affected the result of the algorithm and is the reason why we did not achieve a high percentage for recall and, consequently, F-score. However, this limitation is understandable because the algorithm was designed to identify the following summarizing strategies and methods: paraphrase, topic sentence selection, sentence combination, copy-verbatim, keyword method, title method, location method and cue method. It is not able to identify strategies such as invention and generalization.
2. Another reason concerns the identification of the topic sentence selection strategy using the cue method. The cue method uses cue words, such as "in conclusion" and "as a result", to signal the important sentences in a text. These cue words depend on the content of the text, so it is difficult to derive a list of cue words, since different types of text may generate different lists. Hence, there is no standard list of cue words, and the lack of such a list affects the results of the algorithm.
3. The algorithm used WordNet as the main semantic knowledge base for the calculation of semantic similarity between words. The comprehensiveness of WordNet is determined by the proportion of words in the text that are covered by its knowledge base. However, the main criticism of WordNet concerns its limited word coverage to calculate semantic similarity between words. Obviously, this disadvantage has a negative effect on the performance of our proposed algorithm.
4. The algorithm is not able to distinguish between an active sentence and a passive sentence. Consider a summary sentence (A: 'Father likes his child.') and two original sentences (B: 'child likes his father.'; C: 'child is liked by his father.'). Although the similarity measures for the pairs (A, B) and (A, C) are the same, the meaning of sentence A is closer to that of sentence C. Hence, it is important to determine whether a sentence is active or passive before comparisons are drawn.
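The active/passive limitation in the last point can be illustrated with a deliberately simple stand-in measure. The sketch below is not ISSLK's similarity measure: it assumes a Jaccard overlap on content-word lemmas, with a hand-written stop-word set and lemma table that are hypothetical and cover only these three sentences. It shows how a comparison that ignores voice can score both pairs identically.

```python
# Hypothetical, hand-written resources covering only the three example sentences.
STOPWORDS = {"his", "is", "by"}
LEMMAS = {"likes": "like", "liked": "like"}


def content_lemmas(sentence: str) -> set[str]:
    """Lower-case, drop the final period, remove stop words and lemmatize."""
    tokens = sentence.lower().strip(".").split()
    return {LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS}


def jaccard(x: set[str], y: set[str]) -> float:
    """Order-insensitive overlap between two sets of lemmas."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


a = content_lemmas("Father likes his child.")        # summary sentence A
b = content_lemmas("child likes his father.")        # original sentence B
c = content_lemmas("child is liked by his father.")  # original sentence C

sim_ab = jaccard(a, b)
sim_ac = jaccard(a, c)
print(sim_ab, sim_ac)
```

Both pairs receive the same score under this stand-in, even though only C expresses the meaning of A, which is why voice needs to be detected before the comparison.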

Experiment 2-Comparison with related methods
In this section, the performance of our algorithm is compared with other well-known or recently proposed methods. In particular, we select the following methods for evaluation on our data set: SSDA [15] and MSAS [14]. The values of the evaluation metrics are presented in Tables 8 and 9. In Table 9, "---" means that the method could not identify the corresponding summarizing strategies. The above-mentioned approaches use different data sources in their experiments, which makes a direct comparison between their evaluation results impossible. In addition, they use different evaluation measures. Therefore, we re-examined these approaches on the same data set.
Detailed comparison. Compared with the precision and F-score values of the other methods, our proposed method achieved a significant improvement. Table 10 shows the improvement of ISSLK for both metrics. ISSLK obtains the highest F-measure values and outperforms all the other methods. We use the relative improvement, Eq 30, for comparison. In Table 10, "+" means that the proposed method improves on the related method. Table 10 also shows that, among the other methods, MSAS gives better results than SSDA. Compared with MSAS, our method improves performance by 6.1728% and 4.9746% in terms of the precision and F-score metrics, respectively.
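The relative improvement used for the comparison can be sketched as below. Eq 30 is not reproduced in this section, so the sketch assumes the common form (our score minus the other method's score, divided by the other method's score, times 100); the function name and the input scores are hypothetical placeholders, not values read from Tables 8 to 10.

```python
def relative_improvement(ours: float, other: float) -> float:
    """Relative improvement of our score over a baseline score, in percent.

    Assumed form: (ours - other) / other * 100.
    A positive result means our method improves on the baseline.
    """
    if other == 0:
        raise ValueError("baseline score must be non-zero")
    return (ours - other) / other * 100.0


# Hypothetical scores (percentages) for our method and one baseline method.
gain = relative_improvement(ours=90.0, other=75.0)
print(f"relative improvement: {gain:+.2f}%")
```

Because the measure is relative to the baseline, the same absolute gain counts for more against a weaker baseline than against a stronger one.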

Conclusion
Summarizing strategies are the core of the cognitive processes involved in the summarization activity. In this paper, we propose an algorithm based on linguistic measures to identify the summarizing strategies used by a summarizer in summary writing. The algorithm employs three similarity metrics to calculate the similarity between two sentences: a) semantic similarity between sentences; b) word-order similarity between sentences; and c) semantic similarity between words. The main feature of the proposed algorithm is its ability to capture meaning when comparing a source text sentence with a summary text sentence, whether the two sentences share the same surface text or use different words. The algorithm is also able to identify summarizing strategies at both the semantic and syntactic levels. It is able to identify summarizing strategies and methods such as deletion, sentence combination, paraphrase, copy-verbatim, topic sentence selection, cue method, title method, keyword method and location method. ISSLK was evaluated on the DUC dataset. The proposed algorithm is very easy to follow and requires minimal text processing cost. First, the parameter of ISSLK was optimized on the training dataset; the actual evaluation of summarizing strategy identification was then carried out on the test dataset. The first experiment evaluated the performance of the algorithm by comparing its output with human judgments. The result demonstrates that the algorithm obtained an average of 77% precision, 66% recall and 70% F-score. ISSLK was also compared with well-known existing systems proposed for identifying summarizing strategies. The experimental results show that the performance of the proposed algorithm is very competitive with that of the other systems and that ISSLK improved on the performance of the existing systems. We observed that ISSLK is able to obtain an average of 86% precision, 81% recall and 83% F-score.
This paper offers the following suggestions for future work. First, the algorithm failed to identify some strategies, such as generalization and invention. Extending the algorithm to identify these strategies can improve its precision, recall, F-measure and, ultimately, its accuracy. Second, we are confident that the idea of incorporating semantic and syntactic information can be further explored by using a combination of more complex techniques and modules for text analysis. This is because, when a passive or active sentence is used in writing, it is important to determine which form it takes before comparisons can be drawn. Finally, our method uses WordNet as the main semantic knowledge base for calculating semantic similarity between words. The comprehensiveness of WordNet is determined by the proportion of words in the text that are covered by its knowledge base. The main criticism of WordNet concerns its limited word coverage for calculating semantic similarity between words, and this limitation has a negative effect on the performance of our proposed method. One solution is to use other knowledge resources in addition to WordNet.
In future work, we also aim to examine other methods for computing semantic similarity between words, which may be useful for increasing the overall performance of the proposed method.
Supporting Information
S1 Dataset. Used to evaluate the proposed algorithm. (XML)