An Empirical Analysis of Rough Set Categorical Clustering Techniques

Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed their predecessors, such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on special types of data sets. On these data sets, both techniques fail to select, or have difficulty selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) is proposed for clustering categorical data using rough set indiscernibility relations. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with the number of clusters. To show the significance of the proposed approach, the effect of the number of clusters on rough accuracy, purity and entropy is described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, presented in tabular and graphical form, show that the proposed MIA technique performs better in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.


Introduction
The key objective of clustering is to group objects with similar characteristics into the same cluster and dissimilar objects into different clusters. Moreover, clustering can segment large heterogeneous data sets into smaller homogeneous subsets that are easily managed, separately modeled and analyzed [1]. Clustering has been utilized for various data mining tasks such as data summarization and classification. Clustering techniques are used in many areas, including research and development [2], marketing [3], medicine [4], nuclear science [5], software engineering [6] and radar scanning [7]. Mathieu and Gibson [2] used cluster analysis as part of a decision support tool for large-scale research and development planning, to identify and determine resource allocation. Wu et al. [4] developed a clustering algorithm specifically designed to handle the complexity of gene data. Wong et al. [5] presented an approach for positron emission tomography (PET) that is used to segment tissues in nuclear medical imaging. Haimov et al. [7] used cluster analysis to segment radar signals in scanning land and marine objects.
All of the algorithms mentioned above deal only with databases whose attributes have numeric domains. Categorical data, unlike numerical data, have multi-valued attributes. For these attributes, both the horizontal co-occurrences (common values for the objects) and the vertical co-occurrences (common values for the attributes) need to be examined [4]. Thus, similarity can be defined in terms of common objects, common values and the association between the two. Huang [1], Gibson et al. [8], Guha et al. [4] and Dempster et al. [9] addressed categorical data clustering to some extent, but their techniques cannot deal with uncertainty [10]. Uncertainty arises when there is no sharp boundary between clusters. It has become an integral part of most real-world applications.
Rough set theory, proposed by Pawlak in 1982 [11], can be seen as a reliable mathematical approach to uncertainty. The first attempt at a rough set based technique for selecting a clustering attribute was made by Mazlack et al. [12], who proposed two techniques: Bi-Clustering (BC) and Total Roughness (TR). In 2007, Parmar et al. [13] proposed the Min-Min Roughness (MMR) algorithm, one of the most successful pioneering rough clustering techniques. However, the generalizability and cluster purity of these techniques remain open issues. These techniques can be applied only to very special data sets, and objects of different classes can appear in the same cluster [14]. Hence, in 2010, Herawan et al. [15] proposed a technique for selecting the clustering attribute called maximum dependency of attributes (MDA). It takes into account the dependency of attributes in an information system using rough set theory. In 2013, Hassanein and Elmelegy [16] proposed a newer and better approach for selecting the clustering attribute called maximum significance attribute (MSA). It is based on the significance of attributes in an information system using rough set theory. Both MDA and MSA outperformed their predecessors BC, TR and MMR in terms of purity, computational complexity and rough accuracy, up to a certain level. However, MDA and MSA have some limitations when selecting the best clustering attribute for some special data sets. This study explores these limitations, together with the pros and cons of both techniques.
It is well known that the MDA and MSA approaches find their best clustering attribute on the basis of the maximum dependency and significance degrees, respectively. Accordingly, three research questions may arise when MDA and MSA are applied to an arbitrary data set. The first two questions concern the limitations of MDA and MSA, namely cases in which they have difficulty selecting, or fail to select, the best clustering attribute. The last question explores some useful pros and cons of both approaches. In light of these research questions and the limitations of existing techniques, a new rough clustering approach called Maximum Indiscernible Attribute (MIA) is proposed. It uses rough set indiscernibility relations to find the best clustering attribute. A set of objects can be characterized in terms of attribute values using the rough set approach [17], and the partitions induced by the indiscernibility relation of an attribute represent the clusters obtained. Therefore, the number of clusters can be computed as the cardinality of the indiscernibility relation of an attribute, i.e., the number of its equivalence classes. The number of clusters has also been used for internal cluster evaluation in [18] and [19]. Moreover, this paper uses propositions to explore the effect of the number of clusters on purity and entropy, in order to validate the proposed approach.
The MIA technique selects as the best clustering attribute the one with the maximum cardinality of its indiscernibility relation. It takes into account only the domain knowledge of a data set, so it has lower computational complexity than the MDA and MSA techniques. Experimental results also reveal that MIA outperforms MDA and MSA on all evaluation measures, namely purity, entropy, accuracy and rough accuracy.
The rest of this paper is organized as follows. Section 2 describes rough set theory. Section 3 illustrates the proposed MIA technique and related propositions. Section 4 presents the analysis of the MDA and MSA techniques, together with some useful propositions and evaluation measures. Section 5 presents the experimentation and compares the techniques in light of each research question. Section 6 discusses the experimental results. Finally, Section 7 concludes the study.

Pawlak's Rough Set
In the early 1980s, Zdzislaw Pawlak introduced rough set theory as a new mathematical tool to deal with vagueness and uncertainty [20]. In the presence of uncertainty, rough set theory aids decision making [15]. Rough set theory does not need any preliminary or additional information about the data [20]. Examples of such information are a probability distribution in statistics, a basic probability assignment in Dempster-Shafer theory, and a grade of membership or value of possibility in fuzzy set theory. The founding assumption of rough set theory is that some information (data, knowledge) is associated with every object of the universe of discourse. For example, consider a group of patients suffering from a specific disease. Each patient's data file contains information such as name, age, address, temperature and blood pressure. Patients with the same symptoms are indiscernible (similar) in view of the available information and can be grouped into blocks. These blocks can be understood as elementary granules of knowledge about patients (or types of patients). They are called elementary sets or concepts, and can be considered the initial blocks of knowledge about patients.
This motivates the central idea of rough set theory: subsets of a universe are represented in terms of the equivalence classes of a clustering (partition) of the universe. Here, rough set theory is applied to the data contained in an information system. The information system notation provides a convenient tool for representing objects in terms of their attribute values. A rough set information system is a 4-tuple (quadruple) $S = (U, A, V, \delta)$, defined as follows:
- $U$ is a non-empty finite set of objects.
- $A$ is a non-empty finite set of attributes.
- $V = \bigcup_{a \in A} V_a$, where $V_a$ is the domain (value set) of attribute $a$.
- $\delta: U \times A \to V$ is a function, called the information function, such that $\delta(u, a) \in V_a$ for every $(u, a) \in U \times A$ [11].
With every set $X \subseteq U$, two crisp sets can be associated, called the lower and the upper approximation of $X$. These notions can be defined as follows [11].
For $T \subseteq A$, the T-lower approximation and the T-upper approximation of $X$ are denoted by $\underline{T}(X)$ and $\overline{T}(X)$, respectively. The lower approximation of $X$ is the union of all elementary sets that are included in $X$, that is,

$$\underline{T}(X) = \{x \in U \mid [x]_T \subseteq X\} \qquad (1)$$
The upper approximation of $X$ is the union of all elementary sets that have a non-empty intersection with $X$, that is,

$$\overline{T}(X) = \{x \in U \mid [x]_T \cap X \neq \emptyset\} \qquad (2)$$

In other words, the lower approximation of a set is the set of all elements that surely belong to $X$. The upper approximation of $X$ is the set of all elements that possibly belong to $X$. The difference between the upper and the lower approximation of $X$ is its boundary region. A set is rough if it has a non-empty boundary region; otherwise it is crisp. The T-boundary region of $X$ is the set

$$BN_T(X) = \overline{T}(X) - \underline{T}(X) \qquad (3)$$

The accuracy of approximation (rough accuracy) of any subset $X \subseteq U$ with respect to $T \subseteq A$ is denoted $\eta_T(X)$ and is measured by

$$\eta_T(X) = \frac{|\underline{T}(X)|}{|\overline{T}(X)|} \qquad (4)$$

where $|X|$ denotes the cardinality of $X$. For the empty set $\emptyset$, we define $\eta_T(\emptyset) = 1$. Obviously, $0 \le \eta_T(X) \le 1$. If $X$ is a union of some equivalence classes of $U/T$, then $\eta_T(X) = 1$, and $X$ is crisp with respect to $T$. If $X$ is not a union of equivalence classes of $U/T$, then $\eta_T(X) < 1$, and $X$ is rough (imprecise) with respect to $T$ [11]. This means that the higher the accuracy of approximation of a subset $X \subseteq U$, the more precise (the less imprecise) it is [15].
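The definitions in Eqs (1) to (4) can be illustrated with a short Python sketch. This is a minimal illustration and not the authors' implementation. It assumes an information system stored as a dict that maps each object to its attribute values. The helper names (`partition`, `lower_approx`, `upper_approx`, `boundary`, `rough_accuracy`) and the toy table are hypothetical.

```python
# Minimal sketch of Pawlak's approximations (Eqs 1 to 4), assuming an
# information system stored as {object_id: {attribute: value}}.

def partition(rows, attrs):
    """U/T: group objects that have identical values on every attribute in attrs."""
    classes = {}
    for obj, values in rows.items():
        key = tuple(values[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())


def lower_approx(rows, attrs, X):
    """Eq (1): union of the equivalence classes [x]_T contained in X."""
    return set().union(*(c for c in partition(rows, attrs) if c <= X))


def upper_approx(rows, attrs, X):
    """Eq (2): union of the equivalence classes [x]_T that intersect X."""
    return set().union(*(c for c in partition(rows, attrs) if c & X))


def boundary(rows, attrs, X):
    """Eq (3): objects that possibly, but not surely, belong to X."""
    return upper_approx(rows, attrs, X) - lower_approx(rows, attrs, X)


def rough_accuracy(rows, attrs, X):
    """Eq (4): |lower| / |upper|, defined as 1 when the upper approximation is empty."""
    upper = upper_approx(rows, attrs, X)
    if not upper:
        return 1.0
    return len(lower_approx(rows, attrs, X)) / len(upper)


# Hypothetical toy table with a single attribute "t".
rows = {1: {"t": "a"}, 2: {"t": "a"}, 3: {"t": "b"}, 4: {"t": "c"}}
X = {1, 3}
print(lower_approx(rows, ["t"], X))    # {3}
print(upper_approx(rows, ["t"], X))    # {1, 2, 3}
print(rough_accuracy(rows, ["t"], X))  # 0.3333333333333333
```

In this example, object 1 shares its class $\{1, 2\}$ with object 2, which is outside $X$. Object 1 therefore lies only in the upper approximation, so $X$ is rough with respect to $t$.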

Maximum Indiscernible Attribute (MIA)
In this section, we present the proposed technique, which we refer to as maximum indiscernible attribute (MIA). The MIA technique takes into account the rough indiscernibility relation of attribute(s), which constitutes the domain knowledge of an information system. Let $T$ be any subset of $A$. Two elements $x, y \in U$ are said to be T-indiscernible (indiscernible by the set of attributes $T \subseteq A$ in $S$) if and only if $\delta(x, t) = \delta(y, t)$ for every $t \in T$. Every subset $T$ of $A$ induces a unique indiscernibility (equivalence) relation, denoted $IND(T)$, and a unique clustering. The clustering of $U$ induced by $IND(T)$ in $S$ is denoted by $U/T$. The equivalence class in $U/T$ that contains $x \in U$ is denoted by $[x]_T$. The cardinality of the indiscernibility relation of attribute(s) $T$ is the number of equivalence classes in $U/T$. It gives the number of clusters obtained by $T$ and can be evaluated as

$$|IND(T)| = |U/T| \qquad (5)$$

The pseudo-code of the MIA algorithm is illustrated in Fig 1. The algorithm comprises three main steps:
1. Compute the indiscernibility relation of each attribute.
2. Determine the cardinality of each attribute's indiscernibility relation using Eq (5).
3. Once every cardinality is computed, select the attribute with the maximum cardinality as the clustering attribute.

If two or more attributes share the highest cardinality, it is recommended to consider pairs of the tied attributes, and so on, until the tie is broken. The equivalence classes of the selected attribute(s) give the clusters obtained.
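The three MIA steps and the tie-breaking rule can be sketched in Python as follows. This is an illustration under stated assumptions and not the pseudo-code of Fig 1. The dict-based table format, the function names and the toy data are hypothetical. The tie-breaking loop is one assumed reading of the rule described above: it tries combinations of the tied attributes, in pairs, then triples, and so on.

```python
from itertools import combinations


def partition(rows, attrs):
    """U/T: equivalence classes of objects indiscernible on attrs."""
    classes = {}
    for obj, values in rows.items():
        key = tuple(values[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())


def mia_select(rows, attributes):
    """Return the attribute (or attribute combination) chosen by MIA."""
    # Steps 1 and 2: indiscernibility relation and its cardinality, Eq (5).
    card = {a: len(partition(rows, [a])) for a in attributes}
    # Step 3: keep the attribute(s) with maximum cardinality.
    best = max(card.values())
    tied = [a for a in attributes if card[a] == best]
    if len(tied) == 1:
        return (tied[0],)
    # Tie-break (assumed reading): combine tied attributes in pairs, then
    # triples, and so on, until a single combination has the top cardinality.
    for size in range(2, len(tied) + 1):
        combo_card = {c: len(partition(rows, list(c))) for c in combinations(tied, size)}
        top = max(combo_card.values())
        winners = [c for c, v in combo_card.items() if v == top]
        if len(winners) == 1:
            return winners[0]
    return tuple(tied)  # safeguard only: the full combination is always unique


# Hypothetical table in which b and c tie with 3 classes each.
rows = {
    1: {"a": "x", "b": "p", "c": "s"},
    2: {"a": "x", "b": "p", "c": "t"},
    3: {"a": "x", "b": "q", "c": "t"},
    4: {"a": "y", "b": "q", "c": "u"},
    5: {"a": "y", "b": "r", "c": "u"},
    6: {"a": "y", "b": "r", "c": "s"},
}
print(mia_select(rows, ["a", "b", "c"]))  # ('b', 'c')
```

The selection step never inspects dependency or significance between attributes. It only counts equivalence classes, which is why MIA can still decide when those degrees are all zero or all equal.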
The cardinality of the indiscernibility relation shows the number of clusters created. The idea of selecting the attribute with the maximum cardinality of the indiscernibility relation is based on a study which claims that high purity is easy to achieve when the number of clusters is large [21]. Similarly, the number of clusters obtained at each step of the clustering process is an indicator of the success of a clustering approach [22]. Moreover, a large number of clusters means that more cohesive and loosely coupled clusters are created [23]. Together, these observations support the premise that the higher the cardinality of the indiscernibility relation of attribute(s), the more accurate the selection of the clustering attribute.
We first present the relation between the roughness of a subset $X \subseteq U$ and the cardinality of the indiscernibility relation of two attributes, as stated in Proposition 1. A generalization of Proposition 1 is given in Proposition 2. Propositions 3 and 4 illustrate the effect of the number of clusters on the purity and entropy of a clustering, respectively.
for every $X \subseteq U$. Proof: Let $L_1, L_2, \ldots, L_n$ and $M$ be any subsets of $A$ in an information system $S = (U, A, V, \delta)$. From the hypothesis and Proposition 1, we have

Proposition 3. Increasing the number of clusters maximizes purity.
Proof: Purity is the extent to which a cluster contains objects of a single class [24]. First, the class distribution of the data in each cluster is calculated. For cluster $x$, we compute $P_{xy} = c_{xy}/c_x$, the probability that a member of cluster $x$ belongs to class $y$. Here $c_x$ is the number of objects in cluster $x$, and $c_{xy}$ is the number of objects of class $y$ in cluster $x$. The purity of cluster $x$ is $P_x = \max_y P_{xy}$. The overall purity of a clustering with $k$ clusters and $c$ objects in total is

$$\text{purity} = \sum_{x=1}^{k} \frac{c_x}{c} P_x \qquad (6)$$

Eq (6) can be simplified to

$$\text{purity} = \frac{1}{c} \sum_{x=1}^{k} \max_y c_{xy} \qquad (7)$$

Consider the worst possible case. Here, each cluster of the selected attribute has just a single object correctly classified to its class, i.e., $\max_y c_{xy} = 1$ for every cluster. Then Eq (7) gives $\text{purity} = k/c$. Reducing the number of clusters to $k - 1$ or $k - 2$ gives $(k-1)/c$ or $(k-2)/c$. Because $k/c > (k-1)/c > (k-2)/c$ always holds, reducing the number of clusters lowers the purity. Now consider the best possible case, in which all objects of every cluster of the selected attribute are correctly classified to a particular class, i.e., $\max_y c_{xy} = c_x$. Then Eq (7) gives

$$\text{purity} = \frac{1}{c} \sum_{x=1}^{k} c_x = 1$$

In this ideal case, reducing the number of clusters has no effect on purity. As long as all objects of each cluster are correctly classified to a particular class, the purity is always 100 percent.

Proposition 4. Increasing the number of clusters minimizes entropy.

Proof: Entropy reflects the degree to which each cluster consists of objects of a single class [24]. The smaller the entropy, the better the clustering performance. Using the class distribution and the terminology above, the entropy of each cluster $x$ is calculated using the standard formula

$$e_x = -\sum_{y=1}^{L} P_{xy} \log_2 P_{xy} \qquad (8)$$

where $L$ is the number of classes. The total entropy of a set of clusters is the sum of the entropies of the clusters, each weighted by the size of its cluster, that is,

$$e = \sum_{x=1}^{k} \frac{c_x}{m} e_x \qquad (9)$$

where $k$ is the number of clusters and $m$ is the total number of data points.
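Substituting Eq (8) into Eq (9), the entropy can be simplified to

$$e = -\frac{1}{m} \sum_{x=1}^{k} \sum_{y=1}^{L} c_{xy} \log_2 \frac{c_{xy}}{c_x} \qquad (10)$$

where terms with $c_{xy} = 0$ are taken as 0. Consider the worst possible case for any clustering solution. In this case, each cluster contains just a single object of each class present in it, i.e., $c_{xy} \in \{0, 1\}$. Then Eq (10) gives $e = \frac{1}{m} \sum_{x=1}^{k} c_x \log_2 c_x$. Reducing the number of clusters from $k$ to $k - 1$ to $k - 2$ increases the size of the clusters. Merging two clusters of sizes $a$ and $b$ gives $(a + b)\log_2(a + b) > a \log_2 a + b \log_2 b$. The corresponding entropies therefore satisfy $e_k < e_{k-1} < e_{k-2}$. Hence, the entropy is minimized by increasing the number of clusters.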
Ideally, if each cluster contains elements from only one class, the entropy is 0 [24]. Consider this best possible case for any clustering solution. Here, all objects inside each cluster are correctly classified to a particular expert cluster, i.e., $c_{xy} = c_x$ for the class $y$ of cluster $x$ and $c_{xy} = 0$ for every other class. Then Eq (10) simplifies to

$$e = -\frac{1}{m} \sum_{x=1}^{k} c_x \log_2 \frac{c_x}{c_x} = 0$$

In this case, the entropy is always 0. Reducing the number of clusters therefore has no effect on the entropy, as long as all objects of each cluster are correctly classified to only one expert cluster.
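The purity and entropy measures used in Propositions 3 and 4 can be checked numerically with a short Python sketch. Clusters are assumed to be given as lists of class labels, which is a hypothetical input format. The example compares a worst-case solution with one cluster against the same objects split into two clusters, mirroring the arguments above. It then checks the best case of pure clusters.

```python
from collections import Counter
from math import log2


def purity(clusters):
    """Eq (7): (1/c) * sum over clusters of max_y c_xy."""
    c = sum(len(cl) for cl in clusters)
    return sum(max(Counter(cl).values()) for cl in clusters) / c


def entropy(clusters):
    """Eqs (8) and (9): size-weighted sum of per-cluster entropies e_x."""
    m = sum(len(cl) for cl in clusters)
    total = 0.0
    for cl in clusters:
        c_x = len(cl)
        # e_x = -sum_y P_xy log2 P_xy over the classes present in cluster x
        e_x = -sum((n / c_x) * log2(n / c_x) for n in Counter(cl).values())
        total += (c_x / m) * e_x
    return total


# Worst case: every object in a cluster belongs to a different class.
one_cluster = [["A", "B", "C"]]
two_clusters = [["A"], ["B", "C"]]
print(purity(one_cluster), purity(two_clusters))    # about 0.333 and 0.667
print(entropy(one_cluster), entropy(two_clusters))  # about 1.585 and 0.667

# Best case: every cluster is pure, so purity is 1 and entropy is 0.
pure = [["A", "A"], ["B", "B", "B"]]
print(purity(pure), entropy(pure))  # 1.0 0.0
```

In the worst case, splitting the single cluster raises purity from $1/3$ to $2/3$ and lowers entropy from $\log_2 3$ to $2/3$. In the pure case, both measures are already at their ideal values.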

Performance Comparison
This section investigates the performance of two existing rough set techniques for clustering categorical data, namely MDA and MSA. The following subsections briefly illustrate the analysis of these techniques, the related propositions and the evaluation metrics employed in this study.

Maximum dependency attributes (MDA)
In 2010, Herawan et al. [15] proposed the MDA technique for selecting the clustering attribute, based on rough set theory and the dependency of attributes in information systems. Let $S = (U, A, V, \delta)$ be an information system and let $P$ and $Q$ be any subsets of $A$. The degree of dependency of attributes $Q$ on attributes $P$, denoted $P \Rightarrow_k Q$, is defined by

$$k = \frac{\sum_{X \in U/Q} |\underline{P}(X)|}{|U|} \qquad (11)$$

Obviously, $0 \le k \le 1$. $Q$ is said to depend totally on $P$ if $k = 1$; otherwise, $Q$ depends partially (in degree $k$) on $P$. The attribute with the maximum degree of dependency is considered the more accurate choice (higher accuracy of approximation) for the clustering attribute [25]. If the highest value of an attribute is the same as that of other attributes, it is recommended to look at the next highest dependency degree among the tied attributes. This is repeated until the tie is broken.
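Eq (11) and the MDA selection rule, including its next-highest-degree tie-breaking, can be sketched in Python as follows. The table format, the function names and the toy data are hypothetical. Each attribute $a_i$ is profiled by how strongly every other attribute depends on it ($a_i \Rightarrow_k a_j$). This direction is an assumed reading of [15] and should be checked against that source. The toy table reproduces the kind of tie on which MDA cannot decide.

```python
def partition(rows, attrs):
    """U/T: equivalence classes of objects indiscernible on attrs."""
    classes = {}
    for obj, values in rows.items():
        key = tuple(values[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())


def dependency(rows, P, Q):
    """Eq (11): k = sum over X in U/Q of |P-lower(X)|, divided by |U|."""
    p_classes = partition(rows, P)
    covered = 0
    for X in partition(rows, Q):
        # P-classes are disjoint, so their sizes add up to |P-lower(X)|.
        covered += sum(len(c) for c in p_classes if c <= X)
    return covered / len(rows)


def mda_select(rows, attributes):
    """Return every attribute whose dependency profile is maximal.

    Assumed profile: degrees a_i => a_j for all j != i, sorted from highest
    to lowest, so list comparison checks the highest degree first, then the
    next highest, and so on. More than one result means an unbroken tie.
    """
    profile = {
        a: sorted((dependency(rows, [a], [b]) for b in attributes if b != a), reverse=True)
        for a in attributes
    }
    best = max(profile.values())
    return [a for a in attributes if profile[a] == best]


# Hypothetical table: a and b induce the same partition; c is unrelated.
rows = {
    1: {"a": 1, "b": "x", "c": "p"},
    2: {"a": 1, "b": "x", "c": "q"},
    3: {"a": 2, "b": "y", "c": "q"},
    4: {"a": 2, "b": "y", "c": "p"},
}
print(mda_select(rows, ["a", "b", "c"]))  # ['a', 'b']: MDA cannot decide
```

Here attributes a and b have identical profiles, [1.0, 0.0], so comparing the next highest degree does not help. This is the kind of tie discussed in the experiments.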

Maximum significance of attributes (MSA)
In 2013, Hassanein and Elmelegy [16] proposed the MSA technique for selecting the clustering attribute. It uses the rough set theory concept of the significance of attributes in information systems. The significance of a single attribute $a_i \in A$ with respect to $a_j \in A$, denoted $\sigma_{a_j}(a_i)$, is given by Eq (12) [16]. The attribute with the maximum degree of significance is selected as the best clustering attribute. If the highest value of an attribute is the same as that of other attributes, it is recommended to look at the next highest significance degree among the tied attributes. This is repeated until the tie is broken [16].

Proposition 5. If attributes are not dependent on each other, then they are also not significant for each other.
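Proof: Eq (11) yields $k = 0$ if the attributes are not dependent on each other, that is, $|\underline{P}(X)| = 0$ for every $X \in U/Q$. Substituting this into Eq (12) of the significance of attributes then gives a significance degree of 0.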

Evaluation metrics
The evaluation metrics purity and entropy are already defined in Section 3. The remaining measures are presented in the following paragraphs.

Accuracy.
Accuracy is the ratio of the number of correctly clustered objects to the total number of objects [26]. After finding the true positives (TP), true negatives (TN), false negatives (FN) and false positives (FP), the accuracy is calculated as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Minimum Number of Iterations.
The number of iterations reflects the computational complexity of a technique. Fewer iterations to perform the clustering task therefore indicate a better technique. The minimum steps required to find the following quantities are counted as iterations: the value sets, the indiscernibility relations, the dependency and significance of each attribute, and the maximum values. For any technique, this measure reflects its simplicity and efficiency.

Response Time.
The CPU response time to perform the clustering task is examined by measuring the time in milliseconds. Like the minimum number of iterations, response time also reflects the computational complexity of a technique, so a shorter time indicates a better technique. This measure shows how fast and efficient a technique is.

Rough Accuracy.
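Mean roughness is used to measure the rough accuracy of the selected clustering attribute. The higher the mean roughness, the higher the accuracy of the selected clustering attribute. The mean roughness of attribute $a_x \in A$ with respect to attribute $a_y \in A$, where $x \neq y$, is denoted $Rough_{a_y}(a_x)$ and is evaluated as follows:

$$Rough_{a_y}(a_x) = \frac{\sum_{X \in U/a_x} \eta_{a_y}(X)}{|V_{a_x}|}$$

where $\eta_{a_y}(X)$ is the accuracy of approximation of Eq (4) and $|V_{a_x}|$ is the number of values of attribute $a_x$.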
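The mean roughness can be computed with a short Python sketch. It is a hypothetical illustration: the table format, the helper names and the toy data are assumptions. $|V_{a_x}|$ is approximated by the number of values of $a_x$ that actually occur in the table, i.e., the number of classes in $U/a_x$.

```python
def partition(rows, attrs):
    """U/T: equivalence classes of objects indiscernible on attrs."""
    classes = {}
    for obj, values in rows.items():
        key = tuple(values[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())


def rough_accuracy(rows, attrs, X):
    """Eq (4): eta_T(X) = |T-lower(X)| / |T-upper(X)|, and 1 for the empty set."""
    classes = partition(rows, attrs)
    lower = sum(len(c) for c in classes if c <= X)  # classes inside X
    upper = sum(len(c) for c in classes if c & X)   # classes touching X
    return 1.0 if upper == 0 else lower / upper


def mean_roughness(rows, a_x, a_y):
    """Rough_{a_y}(a_x): mean of eta_{a_y}(X) over the classes X of U/a_x.

    Each class of U/a_x corresponds to one value of a_x, so len(classes)
    equals |V_{a_x}| for the values that actually occur in the table.
    """
    classes = partition(rows, [a_x])
    return sum(rough_accuracy(rows, [a_y], X) for X in classes) / len(classes)


# Hypothetical table: a and b induce the same partition; c is unrelated.
rows = {
    1: {"a": 1, "b": "x", "c": "p"},
    2: {"a": 1, "b": "x", "c": "q"},
    3: {"a": 2, "b": "y", "c": "q"},
    4: {"a": 2, "b": "y", "c": "p"},
}
print(mean_roughness(rows, "b", "a"))  # 1.0
print(mean_roughness(rows, "c", "a"))  # 0.0
```

Attribute b is approximated exactly by a, so its mean roughness is 1. No class of c is contained in a class of a, so its mean roughness is 0.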

Experimentation and Comparison
An empirical study of categorical clustering is performed on various small cases and six UCI data sets using the MDA, MSA and MIA techniques. The UCI data sets are:
- Lenses (24 instances, 4 attributes)
- Hayes-Roth (132 instances, 5 attributes)
- Molecular Biology-Splice (3190 instances, 62 attributes)
- Balloons (16 instances, 5 attributes)
- Train (10 instances, 32 attributes)
- Soybean (47 instances, 35 attributes)

The comparison of these techniques in finding the best clustering attribute is organized in light of the research questions in Section 1. It uses different cluster evaluation parameters such as purity and entropy. The effect of the nature of the data sets on the performance of these clustering techniques is also analyzed. Various data sets are used in order to investigate the generalizability and performance of these techniques on different data. In the following subsections, the research questions are analyzed and discussed separately in detail. To illustrate how MDA, MSA and MIA select the best clustering attribute, the dependency, the significance degree and the indiscernibility relation cardinality of an information system are computed step by step. This is done only for the first example, in Section 5.1.1.

Zero Dependency and Significance Degree
When the attributes are not dependent on each other, the resulting dependency degree of each attribute is automatically zero. Similarly, if no attribute is significant for the remaining attributes, the significance degree of each attribute is also zero. In this situation, the MDA and MSA techniques have difficulty selecting, or fail to select, the best clustering attribute, because one cannot select a maximum among all zeros. In contrast, the MIA technique successfully selects the best clustering attribute even if the attributes are not dependent on, or not significant for, each other. Two cases are discussed here. Case 1 is the Dengue Diagnosis information system taken from [27], and Case 2 is the Lenses data set taken from the UCI repository. The results of these examples also help to validate Proposition 5.

Dengue Diagnosis.
Patients with possible dengue symptoms are presented in Table 1, which is taken from [27]. In this data set, twenty patients are considered, with three symptoms (categorical attributes): Symptom A (SYMP A), Symptom B (SYMP B) and Symptom C (SYMP C).
First, the equivalence classes induced by the indiscernibility relation of each singleton attribute are obtained. Next, the procedures for finding the dependency, significance and indiscernibility relation cardinality of each attribute are presented. The dependency of attributes is evaluated using Eq (11). The significance of each attribute with respect to the other attributes is computed via Eq (12). The cardinality of the indiscernibility relations is computed using Eq (5). From Table 1, based on Eq (11), the degree of dependency of attribute SYMP A on SYMP B, denoted SYMP B $\Rightarrow_k$ SYMP A, can be calculated as follows:

$$\text{SYMP B} \Rightarrow_k \text{SYMP A}, \quad k = \frac{\sum_{X \in U/\text{SYMP A}} |\underline{\text{SYMP B}}(X)|}{|U|} = \frac{|\emptyset|}{|\{a, b, \ldots, t\}|} = 0.$$

In the same way, we obtain

$$\text{SYMP C} \Rightarrow_k \text{SYMP A}, \quad k = \frac{\sum_{X \in U/\text{SYMP A}} |\underline{\text{SYMP C}}(X)|}{|U|} = \frac{|\emptyset|}{|\{a, b, \ldots, t\}|} = 0.$$

Table 2 summarizes the degrees of dependency of all attributes of the dengue diagnosis information system. It shows that the MDA technique is not able to select a clustering attribute, because the maximum dependency degree of each attribute is 0. Since all the values are equal, MDA runs into a problem. The significance of subsets of $U$ based on each attribute with respect to the other attributes can be obtained via Eq (12).
(i) The significance of attribute SYMP A with respect to attribute SYMP B, denoted $\sigma_{\text{SYMP B}}(\text{SYMP A})$, can be calculated as follows.
Let $C'$ represent all attributes except attribute SYMP C, that is, $C' = \{\text{SYMP A}, \text{SYMP B}\}$. The MIA technique, on the other hand, does not require dependency or significance among attributes, so it successfully selects the best clustering attribute for this data set. The cardinality of the indiscernibility relation of each attribute, computed using Eq (5), is presented in Table 4. According to this table, attribute SYMP C has the maximum cardinality of its indiscernibility relation. It is therefore the most indiscernible attribute, and MIA selects it as the best clustering attribute.

Lenses.
The UCI Lenses data set comprises 24 instances and 4 conditional attributes.
The degrees of dependency of all attributes of the Lenses data set are summarized in Table 5, and the significance of all attributes is presented in Table 6. Both tables show that, for each attribute, the maximum dependency and significance value is 0. Hence, both MDA and MSA fail to select the best clustering attribute based on the maximum dependency and significance degree. In contrast, the MIA technique selects the best clustering attribute on the basis of the maximum cardinality of the indiscernibility relations, irrespective of attribute dependency or significance. The cardinality of the indiscernibility relation of each attribute is presented in Table 7. It shows that attribute 1 has the highest cardinality, namely 3. Attribute 1 is therefore the most indiscernible attribute, and MIA selects it as the best clustering attribute.

Same Dependencies and Significance Degrees
In some data sets, two or more attributes are equally dependent on, or equally significant for, each other. For such data sets, the MDA and MSA techniques give the same maximum dependency and significance degrees, respectively. In this situation, both techniques have difficulty selecting, or are unable to select, an attribute as their best clustering attribute. The reason is simple: one cannot select a maximum among equal values. In contrast, the MIA technique successfully selects the best clustering attribute even if the attributes are equally dependent or equally significant. To explain this limitation of MDA and MSA, three example case studies are discussed here: the Suraj's LEMS [28], Pawlak's Car performance [29] and Grzymala's inconsistent [30] data sets.

Suraj's LEMS Data Set.
Let us consider the very simple information system shown in Table 8, taken from [28]. The set of objects $U$ consists of seven objects with two conditional attributes, Age and LEMS (Lower Extremity Motor Score), and one decision attribute (Walk). Table 9 summarizes the degree of dependency of all attributes of the LEMS data set for the MDA technique. According to this table, the maximum dependency degrees of the AGE and LEMS attributes are equal, namely 0.5714. Since it is impossible to select a maximum among equal values, the MDA technique runs into a problem in selecting the best clustering attribute. Similarly, the significance degrees of all attributes computed for the MSA technique are summarized in Table 10. The MSA technique also has difficulty selecting the best clustering attribute, because the maximum significance degrees of the AGE and LEMS attributes are equal, namely 0.5714. Thus, like MDA, the MSA technique is not able to select the best clustering attribute for this data set. In contrast, the MIA technique successfully selects the best clustering attribute. Table 11 presents the indiscernibility relation cardinality of each attribute of Suraj's LEMS data set. MIA selects the LEMS attribute as the best clustering attribute because it has the higher indiscernibility relation cardinality, namely 4.

Grzymala's Information System.
Consider Grzymala's information system from [30], shown in Table 12. This is a patient data set with a decision attribute and four conditional attributes, A, B, C and D, expressing certain symptoms of a disease. The MDA technique requires the degree of dependency of all attributes of Grzymala's data set. The computed degrees of dependency are summarized in Table 13. This table shows that attributes B, C and D have the same first (0.16), second (0) and last (0) highest dependency degrees. Thus, the MDA technique is unable to select the best clustering attribute among attributes B, C and D, because all their dependency values are equal. However, for this data set, the MIA technique can select the best clustering attribute on the basis of the maximum indiscernibility relation cardinality. The indiscernibility relation cardinality of each attribute of Grzymala's information system is illustrated in Table 14. Attribute A has the highest indiscernibility relation cardinality, namely 3, so MIA selects it as the best clustering attribute.

Pawlak's Car performance Data Set.
Pawlak's Car performance data set in Table 15 is taken from [29]. There are six cars ($m = 6$) with three ($n = 3$) conditional attributes: a = Terrain familiarity, b = Gasoline level and c = Distance. The significance degrees of all attributes for the MSA technique are summarized in Table 16. This table shows that attributes b and c have the same first (1) and second (0.67) highest significance degrees. Because these significance degrees are equal, the MSA technique cannot choose between attributes b and c. In contrast, the MIA technique successfully selects the best clustering attributes on the basis of the maximum indiscernibility relation cardinality. Table 17 presents the indiscernibility relation cardinality of each attribute of Pawlak's Car performance data set. According to this table, attributes b and c have the highest, but equal, indiscernibility relation cardinality, namely 3. Hence, according to the MIA technique, combinations of attributes b and c are considered. The only possible combination (b+c) has an indiscernibility relation cardinality of 4, which is the maximum. MIA therefore selects the combination b+c as the best clustering option.

Selecting Different Attributes as Best
In some cases, the MIA, MSA and MDA techniques select different attributes as their best clustering attribute, and the evaluation measures then yield different results. From Proposition 6, it can be concluded that the MIA technique takes fewer iterations and less time to select its best clustering attribute than the MDA and MSA techniques. The results also show that MIA outperforms the other techniques on evaluation measures such as accuracy, purity, rough accuracy and entropy. Compared with MSA, the MDA technique selects its best clustering attribute in fewer iterations and less response time. MSA, however, shows better results than MDA on the remaining evaluation measures. Five UCI data sets are used here for experimentation, namely Hayes-Roth, Splice, Balloons, Train and Soybean. MIA selects its best attribute on the basis of the maximum indiscernibility relation cardinality, MDA on the maximum dependency degree and MSA on the maximum significance degree. The results in terms of response time and minimum iterations are illustrated in Tables 18 and 19. MIA performs better than the other techniques on all data sets, requiring fewer iterations and less response time. The number of iterations includes the following steps: finding the maximum dependency among attributes for MDA, the significance of attributes for MSA and the cardinality of the indiscernibility relations of attributes for MIA.

Discussion
This section answers and discusses the research questions posed at the start of this article, based on the results of the experiments in Section 5. Compared with MSA, the MDA technique requires fewer iterations to find its best clustering attribute. The MSA technique requires more computational steps, but it performs better in terms of accuracy, purity, rough accuracy and entropy. Meanwhile, the proposed MIA technique not only performs better in terms of the number of iterations and response time. It also proves to be more effective than MDA and MSA in terms of purity, entropy, rough accuracy and accuracy.

Conclusion
This study proposes MIA, an alternative rough categorical clustering technique, in light of the limitations, pros and cons of existing techniques such as MDA and MSA. The study analyzes the effect of the nature of special data sets on the performance of the MDA and MSA clustering techniques. This analysis explores the difficulties and issues that both techniques face in selecting their best clustering attributes, and this work illustrates how MIA resolves those issues. The MIA technique uses the rough indiscernibility relation of each attribute to select its best clustering attribute. MIA is shown to be better, more efficient, simpler and more general than the MDA and MSA clustering techniques. This holds in terms of number of iterations, response time, purity, entropy, rough accuracy and accuracy. Ten different data sets from the UCI repository and previously used research cases are used in the experiments. This study thus provides users with an alternative, effective rough clustering technique for selecting the best clustering attribute. The performance of the proposed MIA technique suggests that it can be extended to other real and large categorical data sets.