PC(r): the number of pairs of points in the dataset that lie within a distance r of each other (e.g. schools and libraries).
Plotted on log-log axes, PC(r) vs. r follows a power law, i.e. a straight line.
The correlation fractal dimension of the dataset, D2 (the slope of that line), is used as the metric for the fractal clustering.
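The pair-count definition can be made concrete with a small sketch. The following is a minimal illustration, not the presenters' implementation: the function names `pair_count` and `correlation_dimension`, the Euclidean distance, and the least-squares fit are all assumptions chosen for the example. It counts pairs by brute force, so it only suits small datasets.

```python
import numpy as np

def pair_count(points, r):
    """PC(r): number of unordered pairs of points within distance r of each other."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)  # each pair counted once
    return int((dists[i, j] <= r).sum())

def correlation_dimension(points, radii):
    """Estimate D2 as the slope of log PC(r) vs. log r (least-squares line fit)."""
    radii = np.asarray(radii, dtype=float)
    counts = np.array([pair_count(points, r) for r in radii], dtype=float)
    mask = counts > 0  # log is undefined for empty counts
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(counts[mask]), 1)
    return slope

# Points on a straight line embedded in 2-D: the estimate should be close to 1.
x = np.linspace(0.0, 1.0, 400)
line = np.column_stack([x, 2.0 * x])
d2 = correlation_dimension(line, np.geomspace(0.05, 0.5, 8))
```

The choice of radii matters: the power law only holds over a limited range of scales, so the fit should avoid radii smaller than the typical point spacing or close to the dataset's diameter.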
Some attributes do not contribute to D2.
These attributes do not need to be considered in the fractal clustering.
The order in which attributes are removed (left to right) ranks the attributes by relevance.
The removed attributes fall in one of two cases:
i) They are null or constant throughout all tuples.
ii) They are correlated with one or more attributes in the dataset and can be derived from them.
We can then divide the dataset into two partitions.
D(r): the relevant attributes
D(i): the removed attributes
Each attribute in D(i), if not null or constant, is correlated with one or more attributes in D(r).
Fractal clustering (FC) then needs to consider only the attributes in D(r).
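One way to picture the partition into D(r) and D(i) is a backward-elimination loop: repeatedly drop the attribute whose removal changes D2 the least, and stop once every remaining removal would change it noticeably. The sketch below is an assumption-laden illustration, not the procedure from the slides: `split_attributes`, the `tolerance` parameter, and the brute-force D2 estimator are hypothetical names and choices.

```python
import numpy as np

def correlation_dimension(points, radii):
    """Slope of log PC(r) vs. log r, with PC(r) counted by brute force."""
    radii = np.asarray(radii, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_dists = dists[np.triu_indices(len(points), k=1)]
    counts = np.array([(pair_dists <= r).sum() for r in radii], dtype=float)
    mask = counts > 0
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(counts[mask]), 1)
    return slope

def split_attributes(data, radii, tolerance=0.3):
    """Return (relevant, removed) column indices; `removed` is in removal order,
    so the least relevant attribute comes first."""
    relevant = list(range(data.shape[1]))
    removed = []
    full_d2 = correlation_dimension(data, radii)
    while len(relevant) > 1:
        trials = []
        for attr in relevant:
            kept = [c for c in relevant if c != attr]
            change = abs(full_d2 - correlation_dimension(data[:, kept], radii))
            trials.append((change, attr))
        change, attr = min(trials)  # attribute whose removal disturbs D2 least
        if change > tolerance:      # every remaining attribute contributes to D2
            break
        relevant.remove(attr)
        removed.append(attr)
    return relevant, removed

# Synthetic example: column 1 is derived from column 0, column 2 is constant.
rng = np.random.default_rng(0)
x, w = rng.random(600), rng.random(600)
data = np.column_stack([x, 2.0 * x, np.full(600, 7.0), w])
relevant, removed = split_attributes(data, np.geomspace(0.03, 0.3, 8))
```

In this synthetic dataset the constant column and one of the two correlated columns are expected to land in the removed set, matching the two cases listed above. The tolerance is a tuning knob with no value given in the source.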
FC uses the correlation fractal dimension as the similarity metric to cluster elements.
It starts with a set of initial clusters built from a sample of the dataset.
It then assigns each remaining point to the cluster whose fractal dimension the point disturbs the least.
Points that cause a large perturbation in all clusters are considered outliers.
If the final fractal dimension of a cluster has increased by more than a certain threshold:
Divide the cluster and submit it to one more FC step.
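The assignment, outlier, and split rules above can be sketched as follows. This is a simplified illustration under stated assumptions, not the FC algorithm as published: it estimates D2 by pair counting with radii scaled to each point set's diameter (an assumption made here so that a distant point visibly perturbs the estimate), and `assign_point`, `needs_split`, `outlier_threshold`, and `split_threshold` are hypothetical names and parameters.

```python
import numpy as np

def correlation_dimension(points):
    """Slope of log PC(r) vs. log r, with radii tied to the set's diameter."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_dists = dists[np.triu_indices(len(points), k=1)]
    diameter = pair_dists.max()
    radii = np.geomspace(diameter / 32, diameter / 2, 8)
    counts = np.array([(pair_dists <= r).sum() for r in radii], dtype=float)
    mask = counts > 0
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(counts[mask]), 1)
    return slope

def assign_point(point, clusters, outlier_threshold=0.2):
    """Index of the cluster whose D2 changes least when `point` joins it,
    or None when the point perturbs every cluster too much (an outlier)."""
    changes = []
    for members in clusters:
        before = correlation_dimension(members)
        after = correlation_dimension(np.vstack([members, point]))
        changes.append(abs(after - before))
    best = int(np.argmin(changes))
    return None if changes[best] > outlier_threshold else best

def needs_split(initial_d2, final_d2, split_threshold=0.3):
    """True when a cluster's D2 grew by more than the threshold during FC."""
    return final_d2 - initial_d2 > split_threshold

# Two line-shaped initial clusters, far apart from each other.
t = np.linspace(0.0, 1.0, 100)
clusters = [np.column_stack([t, np.zeros(100)]),
            np.column_stack([np.full(100, 10.0), 10.0 + t])]
near = assign_point(np.array([0.503, 0.0]), clusters)     # lies on the first line
far = assign_point(np.array([100.0, 100.0]), clusters)    # far from both clusters
```

Recomputing D2 from scratch for every candidate assignment is quadratic in cluster size; it is used here only to keep the sketch short. Both thresholds are illustrative values, not ones taken from the source.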
Clusters are not restricted to any shape, which improves the quality of the clusters.