Nearest in space

Nearest neighbor relationships underlie methods such as Cuzick and Edwards' method and Jacquez's k-Nearest Neighbor (k-NN) method. These methods consider whether events neighbor each other in space or in space and time, respectively. Using nearest neighbors avoids having to set a threshold distance to decide whether cases are near or far from each other. Other methods use threshold distances, but a single threshold may not suit every dataset. For example, if your dataset includes both urban and rural locations, distances to neighbors will be longer in the rural areas than in the urban ones. Thus, no single threshold distance will capture the types of neighborhoods you wish to consider.
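The urban and rural contrast can be made concrete with a small sketch. The coordinates, the two threshold values, and the helper names below are hypothetical and chosen only for illustration; this is not ClusterSeer code.

```python
import math

# Hypothetical data: urban events are 1 unit apart, rural events 10 units apart.
urban = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
rural = [(100.0, 0.0), (110.0, 0.0), (120.0, 0.0)]
points = urban + rural

def neighbors_within(points, i, threshold):
    """Indices of all points within `threshold` of points[i], excluding i."""
    return [j for j in range(len(points))
            if j != i and math.dist(points[i], points[j]) <= threshold]

def nearest_neighbor_distance(points, i):
    """Distance from points[i] to its closest other point."""
    return min(math.dist(points[i], points[j])
               for j in range(len(points)) if j != i)

# Number of neighbors each event gets under a small and a large threshold.
small = [len(neighbors_within(points, i, 2.0)) for i in range(len(points))]
large = [len(neighbors_within(points, i, 10.0)) for i in range(len(points))]

# Nearest neighbor distances adapt to the local spacing of events.
nn_dist = [nearest_neighbor_distance(points, i) for i in range(len(points))]

print(small)    # rural events have no neighbors at all
print(large)    # every urban event neighbors every other urban event
print(nn_dist)
```

With the small threshold the rural events have no neighbors, while the large threshold treats the entire urban group as one neighborhood. The nearest neighbor distances simply follow the local spacing in each region.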

Spatial nearest neighbors

ClusterSeer calculates nearest neighbor relationships from the distances between events submitted as a file of point locations (as in Cuzick and Edwards' method).

Nearest neighbor relationships in space may or may not be reciprocal. A point may be its nearest neighbor's nearest neighbor, or its nearest neighbor may be closer to some other point. In the figure below, the two ones are each other's nearest neighbors, but the zero's nearest neighbor is a one, so that relationship is not reciprocal.
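A minimal sketch of the reciprocity check, assuming Euclidean distance and three hypothetical point locations (two ones and a zero). The function name and the coordinates are illustrative and are not taken from ClusterSeer.

```python
import math

def nearest_neighbor(points, i):
    """Return the index of the point closest to points[i], excluding i itself."""
    best_j, best_d = None, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], p)
        if d < best_d:
            best_j, best_d = j, d
    return best_j

# Hypothetical layout: two ones close together, one zero farther away.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
labels = [1, 1, 0]

# nn[i] is the index of the nearest neighbor of point i.
nn = [nearest_neighbor(points, i) for i in range(len(points))]

# A relationship is reciprocal when i is the nearest neighbor of its own nearest neighbor.
reciprocal = [nn[nn[i]] == i for i in range(len(points))]

for i in range(len(points)):
    print(labels[i], "-> nearest label", labels[nn[i]], "reciprocal:", reciprocal[i])
```

Here the two ones point at each other, while the zero points at a one that does not point back.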

k-NN

Every point has a first nearest neighbor, but nearest neighbor relationships can also be considered at higher levels. The nearest neighbor methods in ClusterSeer are flexible and can consider several levels of neighbors (first nearest neighbor, first and second nearest neighbors, etc.). The parameter k defines the number of neighbors to consider in the analysis. In the illustration below, the zero has both ones as its first and second nearest neighbors (k = 2).
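The role of k can be sketched as follows, again assuming Euclidean distance and hypothetical coordinates for two ones and a zero. The helper `k_nearest_neighbors` is an illustrative name, not a ClusterSeer function.

```python
import math

def k_nearest_neighbors(points, i, k):
    """Return the indices of the k points closest to points[i], nearest first."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

# Hypothetical layout: two ones and one zero along a line.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
labels = [1, 1, 0]

zero_index = 2
neighbors = k_nearest_neighbors(points, zero_index, k=2)
neighbor_labels = [labels[j] for j in neighbors]

print(neighbors)        # first and second nearest neighbors of the zero
print(neighbor_labels)  # both are ones
```

With k = 1 only the closer one would be returned; with k = 2 both ones count as neighbors of the zero.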

Ties

A problem with nearest neighbor methods is how to resolve ties. If two neighbors are the same distance from the event being considered, which one should be scored? ClusterSeer resolves the tie arbitrarily, choosing only one of the tied events.
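One way to picture an arbitrary tie-break is sketched below. The text above says only that one tied event is chosen arbitrarily; the seeded random choice, the function name, and the coordinates here are assumptions made for illustration and may differ from what ClusterSeer actually does.

```python
import math
import random

def nearest_with_ties(points, i, rng):
    """Return (chosen index, all tied nearest indices) for points[i]."""
    dists = {j: math.dist(points[i], points[j])
             for j in range(len(points)) if j != i}
    d_min = min(dists.values())
    # Collect every point whose distance matches the minimum.
    tied = [j for j, d in dists.items() if math.isclose(d, d_min)]
    # Assumed tie-break: pick one tied point at random.
    return rng.choice(tied), tied

# Hypothetical layout: points 1 and 2 are equally far from point 0.
points = [(0.0, 0.0), (-1.0, 0.0), (1.0, 0.0)]
rng = random.Random(0)  # fixed seed so the arbitrary choice is repeatable

chosen, tied = nearest_with_ties(points, 0, rng)
print("tied candidates:", tied, "chosen:", chosen)
```

Because only one tied event is scored, results for datasets with many ties (for example, locations recorded on a coarse grid) can depend on which event happens to be picked.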
