
Supplementary material to “Careful Construction of Hypothesis Tests: Comment on 'Lies, Damned Lies, and Statistics (in Geology)'”

22 February 2011

Steven R. Taylor, Rocky Mountain Geophysics, Inc., Los Alamos, New Mexico

Dale N. Anderson, Los Alamos National Laboratory, Los Alamos, New Mexico

Citation:

Taylor, S. R., and D. N. Anderson (2011), Careful construction of hypothesis tests: Comment on “Lies, damned lies, and statistics (in geology),” Eos Trans. AGU, 92(8), 65, doi:10.1029/2011EO080009.

This comment is stimulated by the interesting article “Lies, Damned Lies, and Statistics (in Geology)” by P. Vermeesch, which recently appeared in the American Geophysical Union publication Eos. In that article, Pearson's chi-square test is applied to a large catalog of earthquakes to test the hypothesis that earthquakes are uniformly distributed across day of week. In the Vermeesch analysis this hypothesis is rejected, leading to the conclusion that earthquakes are unevenly distributed by weekday, with seismic activity being particularly high on Sunday. Vermeesch then concludes that the strong dependence of p-values on sample size makes them uninterpretable. However, this is a well-known property of classical tests of hypothesis: the power of a classical test grows with the amount of data (loosely, the degrees of freedom available to the test), so that a test on a sufficiently large sample will have the resolution to reject any null hypothesis that is not exactly true. Vermeesch correctly notes that caution should be exercised when applying classical tests to voluminous data. With proper cautions and attention to the application setting, reporting hypothesis test analysis with p-values is a technically sound approach to inferential analysis (see Anderson et al., 2007; Taylor, 2010; Taylor et al., 2010). We were therefore concerned that the conclusions of Vermeesch give p-values a bad reputation, and we offer a second note of caution.

In the Vermeesch analysis, the original random variable, the origin time of an earthquake, is mapped to a new random variable, earthquake counts by day of week. This new random variable is motivated by the null hypothesis that earthquakes are uniformly distributed across day of week. It is the rejection of this null hypothesis that causes inferential problems. There is no scientific basis for the claim that earthquake occurrence is correlated with day of week, and yet rejection of the null hypothesis forces this as a possible conclusion. The null hypothesis can instead be analyzed with the original random variable. As we discuss in the following, the null is still rejected. However, with the original random variable, this rejection has a plausible explanation.
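The sample-size dependence of the p-value discussed above can be illustrated with a small calculation. The following is a minimal sketch, not the analysis performed on the catalog: it assumes a hypothetical set of weekday proportions in which Sunday is 1% more likely than each other day, uses idealized noise-free counts, and relies on the closed-form upper tail of the chi-square distribution with 6 degrees of freedom (7 weekday bins). The function name `chi2_uniform_test` and the chosen sample sizes are illustrative assumptions.

```python
import math

def chi2_uniform_test(counts):
    """Pearson chi-square test of uniformity over 7 bins (6 degrees of freedom)."""
    if len(counts) != 7:
        raise ValueError("this sketch only handles 7 bins (6 degrees of freedom)")
    n = sum(counts)
    expected = n / 7.0
    stat = sum((c - expected) ** 2 / expected for c in counts)
    # Exact upper tail for 6 degrees of freedom:
    # P(X > x) = exp(-x/2) * (1 + x/2 + (x/2)^2 / 2)
    h = stat / 2.0
    p_value = math.exp(-h) * (1.0 + h + h * h / 2.0)
    return stat, p_value

# Hypothetical proportions (assumption): Sunday 1% more likely than other days.
weights = [1.0] * 6 + [1.01]
proportions = [w / sum(weights) for w in weights]

results = {}
for n in (10_000, 1_000_000, 100_000_000):
    # Idealized counts with no sampling noise, so only n changes between runs.
    counts = [n * p for p in proportions]
    results[n] = chi2_uniform_test(counts)
    print(n, results[n])
```

With the proportions held fixed, the test statistic scales linearly with the number of events, so the same small departure from uniformity that is invisible at ten thousand events is decisively rejected at one hundred million.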

As noted above, it is well known that a chi-square test for agreement of data with a hypothesized probability distribution is extremely powerful with voluminous data. For example, when the number of bins used in the chi-square test is small and the number of events is large, it is easy to find a binning that will reject a null hypothesis that the data agree with a probability distribution. Originally we thought that the test for uniformity of an earthquake catalog as a function of day of week might be better addressed using a more balanced test such as the Kolmogorov-Smirnov (KS) test or possibly the Rayleigh test of uniformity used in circular statistics (e.g., Fisher, 1995). In preparing to address these issues, we acquired the same dataset used by P. Vermeesch (events greater than magnitude 4 between 1999 and 2009, obtained from the U.S. Geological Survey Web site) and noticed some peculiarities that are expanded upon below.

We first binned the data by day of week and reproduced the result of Vermeesch using the same chi-square test (Figure S1). The largest number of events falls on Sunday, prompting Vermeesch to speculate on the possibility of false triggers caused by the tolling of church bells and to conclude that the apparent non-uniformity of events versus weekday is not due to geological reasons. For the KS test, the data are first ordered (in our case by epoch time), and this is where we began to notice some peculiarities. Figure S2a shows a plot of the catalog event number as a function of epoch time in seconds. We noticed a large kink and change in slope late in 2004 and remembered that this corresponds to the time of the magnitude 9.1 Great Sumatra Earthquake of December 26, 2004, which, interestingly, is the third largest earthquake in the world since 1900 and occurred early on a Sunday, at 00:58:53 GMT. Could it be that the catalog is contaminated by large aftershocks of this devastating earthquake?
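The mapping from origin time to day of week described at the start of this paragraph can be sketched in a few lines. This is a minimal illustration, assuming origin times are available as epoch seconds and that weekdays are assigned in UTC (the choice of time zone is an assumption here, and it matters for events near midnight). The function `weekday_counts` and the three-event toy catalog are hypothetical; only the mainshock origin time is taken from the text.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_counts(epoch_seconds):
    """Count events per UTC weekday, returned in Monday..Sunday order."""
    counts = Counter()
    for t in epoch_seconds:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        day = datetime.fromtimestamp(t, tz=timezone.utc).weekday()
        counts[WEEKDAYS[day]] += 1
    return [counts[name] for name in WEEKDAYS]

# Origin time of the Sumatra mainshock quoted in the text, as epoch seconds.
sumatra = datetime(2004, 12, 26, 0, 58, 53, tzinfo=timezone.utc).timestamp()

# Hypothetical toy catalog: the mainshock plus events one hour and one day later.
toy_catalog = [sumatra, sumatra + 3600, sumatra + 86400]
toy_counts = weekday_counts(toy_catalog)
print(dict(zip(WEEKDAYS, toy_counts)))
```

A burst of events in the hours after a mainshock that occurs shortly after midnight lands almost entirely on the same weekday, which is how a single aftershock sequence can tilt a weekday histogram.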


Figure S1. Histogram of global earthquakes greater than magnitude 4 occurring between 1999 and 2009, grouped by weekday, along with the results of the chi-square test for uniformity. The null hypothesis of uniformity is clearly rejected, as indicated by the extremely small p-value.


Figure S2. (a) Plot of event number versus epoch time in seconds from catalog data used by Vermeesch (2009). (b) Number of events per day as a function of epoch time. Vertical magenta line corresponds to day of magnitude 9.1 Sumatra Earthquake of December 26, 2004.

Figure S2b shows the catalog data grouped by number of events per day. Note the large number of events immediately following the Great Sumatra Earthquake of December 26, 2004. There also appears to be a distinct increase in the number of events per day in the years following this event. Finally, Figure S3 shows the result of the Kolmogorov-Smirnov test for uniformity on the events ordered by time. The null hypothesis of temporal uniformity is clearly rejected, as indicated by the small p-value. Figure S3 appears to indicate a slow trend of global seismicity back toward uniformity over a period of many years.
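A KS test of temporal uniformity of the kind described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used to produce Figure S3: the p-value uses the asymptotic Kolmogorov series, which is a large-sample approximation, and the two toy catalogs (one evenly spaced, one whose event rate doubles halfway through the time window) are hypothetical stand-ins for real origin times. The name `ks_uniform` and its arguments are likewise assumptions.

```python
import math

def ks_uniform(times, t_start, t_end):
    """One-sample KS test of event times against Uniform(t_start, t_end)."""
    # Rescale times to [0, 1], where the uniform CDF is simply F(u) = u.
    x = sorted((t - t_start) / (t_end - t_start) for t in times)
    n = len(x)
    # D is the largest distance between the empirical CDF and F(u) = u,
    # checked just after and just before each jump of the empirical CDF.
    d = max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(x))
    # Asymptotic Kolmogorov tail probability (large-n approximation).
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return d, 1.0  # the tail probability is indistinguishable from 1 here
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)

# Hypothetical catalog 1: 3000 evenly spaced events on the window [0, 1].
even_times = [(i + 0.5) / 3000 for i in range(3000)]
d_even, p_even = ks_uniform(even_times, 0.0, 1.0)

# Hypothetical catalog 2: the event rate doubles at the midpoint of the window.
step_times = ([i * 0.5 / 1000 for i in range(1000)] +
              [0.5 + i * 0.5 / 2000 for i in range(2000)])
d_step, p_step = ks_uniform(step_times, 0.0, 1.0)

print(d_even, p_even)
print(d_step, p_step)
```

Because the statistic is computed from the ordered origin times rather than from weekday counts, a rejection points directly at where in time the empirical distribution departs from the uniform one, such as a sustained change in event rate.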

The statistical tests are clearly doing their job: the catalog data are not uniform in time (by day of week or otherwise), and p-values obtained from careful hypothesis testing are not necessarily uninterpretable, contrary to the suggestion of Vermeesch. The apparent non-uniformity appears to result from contamination of the earthquake catalog used by Vermeesch by aftershocks from a large earthquake (or earthquakes). Therefore, the explanation of this non-uniformity may indeed be a non-geological one, as suggested by Vermeesch. Or is it? Would the data appear uniform if we attempted to remove all aftershocks of the Great Sumatra earthquake and of other large earthquakes that occurred over this ten-year time span? We do not know the answer, and we suspect that they would not. This also points to the importance of exploratory data analysis prior to performing any statistical hypothesis testing.


Figure S3. Results of Kolmogorov-Smirnov test for temporal uniformity of earthquake catalog dataset. The blue line is the empirical (observed) cumulative distribution from the catalog data, and the red line is what would be predicted given a distribution that is uniform in epoch time in seconds. The null hypothesis of uniformity is rejected. Vertical magenta line corresponds to day of magnitude 9.1 Sumatra Earthquake of December 26, 2004.

References

Anderson, D.N., D.K. Fagan, M.A. Tinker, G.D. Kraft, and K.D. Hutchenson, A mathematical statistics formulation of the teleseismic explosion identification problem with multiple discriminants, Bull. Seism. Soc. Am., 97, 1730–1741, 2007.

Fisher, N.I., Statistical Analysis of Circular Data, Cambridge University Press, Cambridge, UK, 1995.

Rohatgi, V.K., An Introduction to Probability Theory and Mathematical Statistics, Wiley, New York, NY, 684pp, 1976.

Taylor, S.R., S. Arrowsmith, and D.N. Anderson, Detection of short-time transients from spectrograms using scan statistics, accepted by Bull. Seism. Soc. Am., 2010.

Taylor, S.R., p-value discriminants from two-dimensional MDAC misfit functions, submitted to Bull. Seism. Soc. Am., 2010.
