KMS: K-Means/K-Medians Support
Parameter Information
Sample Selection
The sample selection option indicates whether to cluster genes or experiments.
Distance Metric Selection
This area allows the selection of the metric to be used to assess gene-to-gene
or sample-to-sample distances. The metric initially selected corresponds to the global
setting in the Multiple Array Viewer's 'Metrics' menu. Alterations to the
chosen metric in this dialog will only alter the metric used for the current
algorithm run. The global setting in the main 'Metrics' menu will remain unchanged.
An appendix in the MeV manual describes the distance metrics offered in MeV.
Means/Medians option
The Means or Medians option indicates whether each cluster's centroid vector
should be calculated as the mean or the median of the member expression patterns.
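The difference between the two options can be illustrated with a minimal Python sketch. It is not MeV's code; the function name centroid and the sample values are hypothetical, and the centroid is assumed to be computed column by column across the member patterns.

```python
from statistics import mean, median

def centroid(patterns, use_median=False):
    """Return the per-column mean (or median) of equal-length expression patterns."""
    reduce = median if use_median else mean
    # zip(*patterns) walks the member patterns one column at a time.
    return [reduce(column) for column in zip(*patterns)]

# Hypothetical cluster of three expression patterns.
cluster = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 4.0],
    [9.0, 2.0, 5.0],  # unusually high value in the first column
]

mean_centroid = centroid(cluster)                     # [4.0, 2.0, 4.0]
median_centroid = centroid(cluster, use_median=True)  # [2.0, 2.0, 4.0]
```

In this example the single high value pulls the mean centroid's first component to 4.0, while the median centroid stays at 2.0.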
Number of k-means/k-medians runs
This integer value indicates how many times KMC should be run.
Threshold % of occurrence in same cluster
This parameter indicates the minimum percentage of runs in which two elements
must cluster together for the pair to be considered part of the same cluster.
For instance, if 10 KMC runs were performed and the threshold was 80%, then a pair of
expression elements found together at least 8 times would
pass the criterion for inclusion in a cluster.
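The pairwise test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not MeV's implementation: the function name pairs_passing_threshold, the representation of each run as a dictionary from element name to cluster label, and the gene names are all hypothetical. The sketch covers only the pairwise co-occurrence check, not how passing pairs are assembled into final clusters.

```python
from itertools import combinations

def pairs_passing_threshold(runs, threshold_percent):
    """Return element pairs that share a cluster in enough runs.

    runs: list of dicts, one per KMC run, mapping element -> cluster label.
    threshold_percent: minimum percentage of runs in which a pair must co-cluster.
    """
    elements = sorted(runs[0])
    passing = []
    for a, b in combinations(elements, 2):
        # Count the runs in which both elements received the same label.
        together = sum(1 for labels in runs if labels[a] == labels[b])
        # Integer comparison avoids floating-point rounding at the boundary.
        if together * 100 >= threshold_percent * len(runs):
            passing.append((a, b))
    return passing

# Hypothetical results of 10 runs: g1 and g2 co-cluster in 8 of them.
runs = [{"g1": 0, "g2": 0, "g3": 1}] * 8 + [{"g1": 0, "g2": 1, "g3": 0}] * 2

passing_at_80 = pairs_passing_threshold(runs, 80)  # [("g1", "g2")]
passing_at_90 = pairs_passing_threshold(runs, 90)  # []
```

With the threshold at 80%, the pair found together 8 times out of 10 passes; raising the threshold to 90% excludes it.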
Number of Clusters (K)
This positive integer value indicates the number of clusters to be created during
each KMC run. Note that for K-Means support the final number of clusters may turn out to
be slightly smaller or larger than the entered value, depending on the nature
of the input data and on how well the chosen K suits that data.
Note that FOM can be used to estimate an appropriate value for K.
Number of Iterations
This positive integer value is the maximum number of times that all the elements in the data set
will be tested for cluster fit. On each iteration each element
is associated with the cluster with the closest mean (or median).
Note that a KMC run will terminate when either no elements
require migration (reassignment) to new clusters or when the maximum number of
iterations has been reached.
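The two stopping conditions can be seen in a minimal Python sketch of a single run. It is not MeV's code and rests on several assumptions: squared Euclidean distance stands in for whichever metric is selected, initial centroids are supplied by the caller, centroids are recomputed once per iteration after all elements have been assigned, and the names kmc_run and squared_euclidean are hypothetical.

```python
from statistics import mean, median

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmc_run(data, initial_centroids, max_iterations, use_median=False):
    """Run one clustering pass; return (assignment, iterations_used)."""
    reduce = median if use_median else mean
    centroids = [list(c) for c in initial_centroids]
    assignment = [None] * len(data)
    for iteration in range(1, max_iterations + 1):
        moved = 0
        for i, point in enumerate(data):
            # Associate the element with the cluster whose centroid is closest.
            nearest = min(range(len(centroids)),
                          key=lambda k: squared_euclidean(point, centroids[k]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                moved += 1
        if moved == 0:
            # No element required reassignment, so the run terminates early.
            return assignment, iteration
        for k in range(len(centroids)):
            members = [data[i] for i in range(len(data)) if assignment[i] == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [reduce(col) for col in zip(*members)]
    # The maximum number of iterations was reached.
    return assignment, max_iterations

# Hypothetical data: two well-separated groups of two elements each.
data = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
start = [[0.0, 0.0], [6.0, 6.0]]

assignment, used = kmc_run(data, start, max_iterations=50)
capped_assignment, capped_used = kmc_run(data, start, max_iterations=1)
```

In the first call the run stops on the second iteration because no element changes cluster; in the second call it stops because the iteration limit of 1 is reached.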
Hierarchical Clustering
This check box selects whether to perform hierarchical clustering on the elements in each cluster
created.