Statistical Distance Test Statistic

This statistic is used to evaluate the significance of multiple sets of Monte Carlo simulations in Jacquez's k-Nearest Neighbor and Cuzick & Edwards' methods.

It combines the P-values across the number of tests you specify (k). Like the Bonferroni and Simes combined P-values, this statistic gives an overall probability that accounts for multiple comparisons. It is based on the statistical distance between a single data point, i, and the mean (centroid) of a cloud of points.

Let J denote a 1 × 10 vector of the test statistics (J1, …, J10). For each randomization, ClusterSeer computes a J vector, which can be represented as a point in 10 dimensions. The results under randomization form a cloud of "Number of runs" points in this 10-D space, and the center of the cloud is the centroid. You can evaluate significance by comparing the statistical distance from the observed vector J to the centroid with the statistical distances from the randomized J vectors to the centroid. The statistical distance from each point to the centroid is as follows:

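$$ d_i = \sqrt{\left(\frac{J_{i1} - \bar{J}_1}{s_1}\right)^2 + \left(\frac{J_{i2} - \bar{J}_2}{s_2}\right)^2 + \cdots + \left(\frac{J_{i10} - \bar{J}_{10}}{s_{10}}\right)^2} $$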

Here d_i is the distance from point i to the centroid, and J_i1 is the value of the statistic J_k for k = 1 calculated from the data of the i-th randomization. s_1 is the standard deviation of J_1 under randomization, and J̄_1 is the mean of J_1. The terms for k = 2, …, 10 are defined the same way.
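The distance formula can be sketched in Python as follows. This is an illustrative sketch, not ClusterSeer's implementation: the function names are hypothetical, and it assumes the sample standard deviation (n − 1 denominator) for s_k, which the description above does not specify. Each randomized J vector is one row of `runs`.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def centroid_and_sd(runs: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-component mean (the centroid) and standard deviation of the randomized J vectors.

    Each element of `runs` is one randomization's vector (J_1, ..., J_K).
    Assumption: the sample standard deviation (n - 1 denominator) is used for s_k.
    """
    columns = list(zip(*runs))  # columns[k] holds J_k from every randomization
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


def statistical_distance(j: Sequence[float], means: Sequence[float], sds: Sequence[float]) -> float:
    """Distance from vector j to the centroid, each component scaled by its standard deviation."""
    return math.sqrt(sum(((jk - mk) / sk) ** 2 for jk, mk, sk in zip(j, means, sds)))


# Toy example with K = 2 statistics and three randomizations (made-up numbers).
runs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
means, sds = centroid_and_sd(runs)
print(statistical_distance([5.0, 6.0], means, sds))
```

Dividing each component by s_k keeps a statistic with a naturally large spread from dominating the distance. A component with zero variance under randomization would make s_k zero, so a real implementation would need to handle that case.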

ClusterSeer calculates an upper-tail P-value from the Monte Carlo simulations by counting how many of the randomized distances to the centroid are greater than or equal to the distance from the observed J to the centroid. This P-value is the probability, under the null hypothesis, of observing a vector of Jk (or ΔJk) as extreme as or more extreme than the observed vector. If the combined P-value is smaller than 0.05, you can reject the null hypothesis of no spatial clustering.
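The counting step can be sketched as below. The function name is hypothetical, and dividing the count by the number of runs is an assumption: the description above specifies only the count, and some Monte Carlo conventions use (count + 1) / (runs + 1) instead. The distances in the example are made up for illustration.

```python
from typing import Sequence


def upper_tail_p_value(observed_distance: float, randomized_distances: Sequence[float]) -> float:
    """Fraction of randomized distances that are >= the observed distance.

    Assumption: the count is divided by the number of runs.
    """
    if not randomized_distances:
        raise ValueError("need at least one randomization")
    count = sum(1 for d in randomized_distances if d >= observed_distance)
    return count / len(randomized_distances)


# Hypothetical distances from 10 randomizations, for illustration only.
p = upper_tail_p_value(2.5, [0.4, 1.1, 2.5, 0.9, 3.0, 1.7, 0.2, 0.8, 1.3, 0.6])
print(p)  # 0.2
reject_null = p < 0.05  # False here: not significant at the 0.05 level
```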
