Theom-Lk15-WLA

In a study of the Theomatic factor 90 relating to references to either one of the sons in the story of the Prodigal Son (Luke 15), the author states at the beginning of section 7: "In the most conservative way possible, we need to find out what the actual odds are for this event occurring." (p. 33)

L	H	S	P	1:N
4	46	765	.3115	3
3	39	467	.0081	123
2	25	226	.0009	1135

The author observed 46 successes (hits, H) in a sample (S) of 765 phrases of at most 4 words in length. Of these 765 phrases, 467 were 3 words or less, and 39 of those were successes. Of these 467 phrases, 226 were 2 words or less, and 25 of those were successful Theomatic events. The author's task is to compute the statistical probability of each of these occurrences as if there were no design in Theomatics, that is, as if it were merely a random phenomenon.

In calculating the p-values (P) for the above results and the associated odds 1:N, the author does not use the sample sizes given, which are the actual sample sizes, but instead reduces each sample size using a Word Length Average (WLA) statistic. The WLA is the average number of base words per phrase (discounting articles and beginning conjunctions). He claims that it is appropriate to remove phrases from the sample until the WLA of the sample, WLA_S, is the same as the WLA of the successful phrases (hits) obtained from it, WLA_H. This claim rests on an assumption that the expected value of WLA_S, M_S, is not more than M_H, the expected value of WLA_H.

To support this claim, the author states (in uncorrected figures): "Theomatics discovered a total of 46 hits of 2.37 WLA. There are 434 possible phrase combinations with the same 2.37 WLA (the number pool) from which Theomatics could have (emphasis mine) derived its data. Our objective here is to find out what the probability is of finding 46 hits out of a number pool of 434 numbers." (Lk15, p. 37) He states further: "Calculating the probability according to the length of the phrase -- matching the WLA of both the Theomatic hits and the 434 number pool -- is a totally fair, honest, and objective way to figure the p-value." (p. 44) Evidently, no justification is actually proposed by the author; he merely states that he is correct. A formal analysis of the validity of this assumption is warranted. First, we will complete our explanation of the author's methodology.

It so happens that WLA_S is larger than WLA_H, contrary to the way the author expects Theomatics to occur normally in such an experiment, so his procedure is simply to remove the longest phrases from the 4-word sample, regardless of whether they were successful trials or not, until WLA_S equals WLA_H. Eliminating all of the 4-word phrases reduces the sample from 765 to 467, giving 2.413 for WLA_S. Since this is still greater than his target of 2.37, he continues removing 3-word phrases from the sample until it yields the appropriate WLA, at a sample of 434 (p. 35). The author considers this "par for the course," and states that it is now possible to test his hypothesis that God designed Theomatics (p. 35, emphasis his): "This figure of 434 now constitutes the number pool -- it will now give us an accurate comparison of Theomatics against the null hypothesis -- and enable us to come up with an accurate p factor."
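The reduction can be replayed from the sample totals in the word-count table later in this section (226 phrases and 404 words at 2 words or less, 467 and 1,127 at 3 or less, 765 and 2,319 at 4 or less). Splitting those cumulative totals by exact length gives 48 one-word, 178 two-word, 241 three-word and 298 four-word phrases; that split is our own arithmetic, not a figure the author reports. The sketch below assumes the rule is "drop the longest remaining phrase until WLA_S is no larger than WLA_H"; the helper `reduce_pool` is hypothetical, and exact fractions are used so the stopping test is not disturbed by rounding.

```python
from fractions import Fraction

# Phrase counts by exact length (our derivation from the cumulative totals:
# 226/404, 467/1127 and 765/2319 phrases/words at lengths <= 2, <= 3, <= 4).
PHRASES_BY_LENGTH = {1: 48, 2: 178, 3: 241, 4: 298}

# WLA of the 46 hits: 109 base words over 46 phrases.
WLA_HITS = Fraction(109, 46)


def reduce_pool(counts, target):
    """Hypothetical helper: remove the longest remaining phrase, one at a time,
    until the pool's word length average is no larger than `target`.
    Returns the reduced counts and the trail of (pool size, WLA) steps."""
    counts = dict(counts)
    trail = []
    while True:
        size = sum(counts.values())
        words = sum(length * n for length, n in counts.items())
        wla = Fraction(words, size)
        trail.append((size, wla))
        if wla <= target:
            return counts, trail
        longest = max(length for length, n in counts.items() if n > 0)
        counts[longest] -= 1


reduced, trail = reduce_pool(PHRASES_BY_LENGTH, WLA_HITS)
print(sum(reduced.values()), float(trail[-1][1]))  # final pool size and its WLA
print(reduced)                                     # what survives, by length
```

Once the last 4-word phrase is gone the pool stands at 467 phrases with a WLA of 2.413; 33 further 3-word phrases must go before the WLA drops to 2.37 or below, leaving the 434-phrase pool.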

The author then recalculates the p-values without using the WLA (p. 44), attempting to demonstrate that his conclusions do not ultimately depend upon this procedure; he states that the more obvious statistics, those that any practicing statistician would naturally use, still yield significant results.

Though the (uncorrected) odds for the 4-word phrases, as noted above, are 1 in 3, the author claims that the p-value for the 4-word phrases without considering WLA is 0.00014856, or odds of 1 in 6,731 (p. 44). He apparently arrived at this figure by mistakenly using the sample size for the 3-word phrases (467) instead of the original sample of 765 4-word phrases. This appears to be an oversight, since he subsequently uses the 3-word sample size correctly in determining the p-value for the hits of phrases 3 words or less in length. In any case, he states that the final odds for the 4-word phrases without using WLA to adjust the sample size are 1 in 1,101,684 (p. 44). They should have been reported as 1 in 525.
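The odds arithmetic in this paragraph can be checked directly. The sketch assumes, as the uncorrected table further on indicates, that the final figure is P_H × P_C with P_C = 0.00611 for the 4-word phrases; under that assumption the author's 1 in 1,101,684 is what 0.00014856 × 0.00611 gives when rounded, while the same product with P_H = 0.3115 gives 1 in 525. The function name `odds` is ours.

```python
# Odds arithmetic for the 4-word phrases (figures from the text and table).
P_C = 0.00611               # clustering probability, 4-word phrases
P_H_REPORTED = 0.00014856   # author's hit p-value (apparently computed with S = 467)
P_H_CORRECT = 0.3115        # hit p-value for the actual sample, S = 765


def odds(p):
    """Return N for odds of 1 in N, i.e. N = 1/p."""
    return 1.0 / p


print(round(odds(P_H_REPORTED)))        # 6731: hits alone, as reported
print(round(odds(P_H_REPORTED * P_C)))  # 1101684: author's final figure
print(round(odds(P_H_CORRECT * P_C)))   # 525: final figure for S = 765
```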

The (uncorrected) final p-values without WLA are shown below for each phrase length, together with the number of hits (H), the phrase sums (S), the actual clustering results (0, 1, 2), the p-values of the hits (P_H) and of the clustering (P_C), the final p-value P = P_H × P_C, the odds 1:N, and the general significance O (the average number of random trials needed to get this kind of result if Theomatics does not exist):

L	H	S	0	1	2	P_H	P_C	P	1:N	O
4	46	765	16	21	9	.3115	.00611	.00190	525	1.06
3	39	467	15	16	8	.0081	.00563	.00005	21,840	11.24
2	25	226	11	10	4	.0009	.00452	4.0E-6	251,340	616.96

In concluding his analysis, the author uses this WLA sample-reduction technique to determine the statistical odds of obtaining the 3-word and 2-word hits in his experiment. He thus finds the odds of the 3-word experiment to be 1:261,205,726 (p. 47), which is the result he formally publishes as the result of his experiment, and he finds the 2-word result to have odds of one in billions (p. 48, also detailed in reporting errors).

In light of the above errors in analysis, the use of the WLA is clearly essential to the author's published conclusion. Without considering WLA, the 4-word and 3-word results of the experiment fall within the range of the random benchmark derived from the maximum order statistic O that would apply if our context were random (see our Methodology analysis). The 2-word phrase result would be considered very unusual, but not as sensational as the author claims. Without this sample-reduction procedure, the author is unable to solidly reject his null hypothesis. The billions-to-one odds are impressively published, but they are not quite so apparent from analysis of the experimental results apart from the author's use of the WLA. Though the author's analysis stands or falls with this use of the WLA, he offers no explicit justification for his approach.

As stated earlier, the author's claim that it is valid to adjust the sample size based on WLA implies that he expects the mean WLA of the hits to be at least as large as the WLA of the relevant sample. That is, if many like tests were conducted in similar contexts, the WLAs of the resulting phrase samples would on average be no larger than the WLAs of the hits obtained from them.

The reasoning is implied from the very procedure the author employs. Reducing the sample size to something the author would "expect" to be a "normal" or "reasonable" sample size implies that the sample size observed in the experiment is somehow known to be unusual or inappropriate. Instead of discarding the experimental context as inappropriate to display the evident statistical significance of Theomatics, and looking for another scenario that fits more nicely with his expectation, the author conveniently prefers to adjust the sample size.

He expects that, on average, the WLA of the sample (WLA_S) should not be larger than that of the hits (WLA_H), and should in fact be smaller. Naturally, then, he feels the sample obtained in this particular Theomatic experiment is highly unusual and can be adjusted without adversely affecting the validity of the resulting conclusion. This implies he believes the expected WLA of the sample (M_S) is no larger than the expected WLA of the hits (M_H), which lets him justify a presumed sample size that "could have" occurred, with a WLA_S upper-bounded by the WLA_H of the hits. Being "conservative" so as not to corrupt his conclusion, he reduces the sample only until its WLA_S equals the WLA_H of the hits, rather than making WLA_S arbitrarily smaller than this bound as he expects it to be.

This reasoning clearly implies the author feels that Theomatic events tend to occur more frequently in longer phrases than in shorter ones, resulting in a WLA_H for the hits that exceeds the WLA_S of the sample on average. This comprises the author's "justification" for using the WLA to reduce the sample size. A summary review of the logic follows.

Is this assumption borne out? To answer this question, we simply observe, as the author notes, that the (uncorrected) data in this particular experiment indicate otherwise:

L	Sample Words	Sample Size	Sample WLA	Hit Words	Hits	Hit WLA
4	2319	765	3.031	109	46	2.370
3	1127	467	2.413	81	39	2.077
2	404	226	1.788	39	25	1.560

Clearly, in each case the WLA of the sample is considerably larger than the WLA of the hits. The author therefore reduces the sample size in each case to something he expects "could have" been observed, and thus obtains incredible statistical significance for the Theomatic phenomenon. Again, the author's entire analysis stands or falls based upon this reasoning.
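The WLA columns in the table above are simply base words divided by phrases. The short check below recomputes them, along with the difference D = WLA_S - WLA_H used later in this section; nothing beyond the tabulated counts is assumed.

```python
# max length L: (sample words, sample size, hit words, hits), from the table
ROWS = {
    4: (2319, 765, 109, 46),
    3: (1127, 467, 81, 39),
    2: (404, 226, 39, 25),
}

for length, (s_words, s_size, h_words, hits) in ROWS.items():
    wla_s = s_words / s_size   # WLA of the sample
    wla_h = h_words / hits     # WLA of the hits
    print(f"L={length}: WLA_S={wla_s:.3f}  WLA_H={wla_h:.3f}  D={wla_s - wla_h:.3f}")
```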

Further, we observe the following: the author plainly emphasizes the fact, based upon thousands of Theomatic instances he has carefully observed, collected, and analyzed over the last two decades, that Theomatic occurrences clearly favor the smaller phrases, obtaining a higher percentage of hits from them: "The one major factor that makes Theomatics stand tall -- is the shortness of the phrases that produce the Theomatic hits." (p. 35) Again (p. 44), "The real power of Theomatics is the shortness and explicitness of the theomatic phrases and hits. After all, if some sort of Intelligence factor is at work here, then we would expect short and explicit -- one, two, and three word phrases, to produce the most significant results. It is when we look at that aspect, that the p-values literally go ballistic. This is true across the board -- from hundreds of individual studies in my files consisting of thousands of features. The reason for this, is that as one expands outwardly, the patterns dissipate."

In making such statements, the author implies that WLA_H will generally be smaller than WLA_S (or that M_S > M_H): the shorter the phrases being considered, the higher the proportion of hits and the resulting statistical significance. This implies that the percentage of hits (%) will be larger for the shorter phrases than for the longer ones, so the mean of WLA_H is less than the mean of WLA_S. We can easily observe that this conclusion appears valid here.

L	Hits	S	%	WLA_S	WLA_H	D
2	25	226	11	1.788	1.560	.228
3	39	467	8	2.413	2.077	.336
4	46	765	6	3.031	2.370	.662

The larger the allowed phrase length, the larger the actual difference D between the WLA of the sample and that of its hits (D = WLA_S - WLA_H). The author's observation that Theomatics "dissipates" in WLA performance when moving toward longer phrases appears correct. What is observed in this experiment therefore seems an appropriate general assumption about the behavior of Theomatics, since it is fully consistent with the thousands of instances the author reports having observed.
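The "dissipation" can be seen more directly by splitting the cumulative counts into exact phrase lengths. The 3-word and 4-word classes come out as whole numbers of phrases (723 = 3 × 241 and 1,192 = 4 × 298 words; 42 = 3 × 14 and 28 = 4 × 7 hit words), which supports the split; the 1-word/2-word division is our own solution of two simple equations and assumes every phrase of 2 words or less has exactly 1 or 2 base words. The helper name `split_one_two` is hypothetical.

```python
# Cumulative totals at lengths <= 2, <= 3, <= 4:
#   phrases 226 / 467 / 765 with 404 / 1127 / 2319 base words
#   hits     25 /  39 /  46 with  39 /   81 /  109 base words


def split_one_two(count, words):
    """Solve ones + twos = count and ones + 2 * twos = words."""
    twos = words - count
    return count - twos, twos


phr1, phr2 = split_one_two(226, 404)
hit1, hit2 = split_one_two(25, 39)
phrases = {1: phr1, 2: phr2, 3: 467 - 226, 4: 765 - 467}
hits = {1: hit1, 2: hit2, 3: 39 - 25, 4: 46 - 39}

# The exact-length classes must reproduce the total word counts.
assert sum(length * n for length, n in phrases.items()) == 2319
assert sum(length * n for length, n in hits.items()) == 109

for length in sorted(phrases):
    rate = hits[length] / phrases[length]
    print(f"{length}-word: {hits[length]:>2} hits / {phrases[length]:>3} phrases = {rate:.1%}")
```

On these figures the hit rate falls steadily, from about 23% for one-word phrases to about 2% for four-word phrases.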

These facts flatly contradict the author's implicit assumption that WLA_S can legitimately be upper-bounded by WLA_H. In fact, if WLA_S ever did fall below WLA_H, this would certainly be an oddity... not the norm... based upon the author's own data and claims.
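Why a hit rate that falls with phrase length forces the hits' WLA below the sample's can be written as a short derivation. Let n_ℓ be the number of sample phrases of exact length ℓ and r_ℓ the chance that such a phrase is a hit. Treating the expected WLA of the hits as the ratio of expected hit words to expected hits is a first-order approximation we introduce here, not something the author states:

```latex
\[
\mathrm{WLA}_S = \frac{\sum_\ell \ell\, n_\ell}{\sum_\ell n_\ell},
\qquad
M_H \approx \frac{\sum_\ell \ell\, n_\ell r_\ell}{\sum_\ell n_\ell r_\ell}.
\]
With weights $w_\ell = n_\ell / \sum_k n_k$, so that $\mathbb{E}_w[\ell] = \mathrm{WLA}_S$,
\[
M_H - \mathrm{WLA}_S
  \approx \frac{\mathbb{E}_w[\ell\, r] - \mathbb{E}_w[\ell]\,\mathbb{E}_w[r]}{\mathbb{E}_w[r]}
  = \frac{\operatorname{Cov}_w(\ell, r)}{\mathbb{E}_w[r]}.
\]
```

If r_ℓ decreases as ℓ increases, Chebyshev's sum inequality gives Cov_w(ℓ, r) ≤ 0, with strict inequality whenever the rate genuinely differs between lengths present in the sample. The hits' expected WLA then sits below the sample's WLA, the opposite of the bound the sample reduction relies on.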

In addition to granting him the sensational claim of unbelievable statistical odds, the author's sample-reduction procedure conveniently permits him to count, in his calculation of the p-value, successful trials that are not retained in his sample, which contradicts the very definition of a sample. Such mathematics is quite creative, to say the least.
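The point can be made concrete for the 434-phrase pool. From the hit totals, 46 - 39 = 7 hits are exactly 4 words long, yet the pool contains no 4-word phrases at all. Replaying the author's rule (dropping the longest phrases until the WLA reaches 2.37) also removes 33 of the 241 three-word phrases; that figure is our own arithmetic, and since we do not know which 3-word phrases were dropped, the 3-word count only gives an upper bound.

```python
# Hits by exact phrase length (derived from the cumulative hit totals 25/39/46).
HITS_BY_LENGTH = {1: 11, 2: 14, 3: 14, 4: 7}

# Composition of the 434-phrase pool under the WLA reduction (our replay):
# every 4-word phrase is gone, and 33 of the 241 three-word phrases with them.
POOL = {1: 48, 2: 178, 3: 208, 4: 0}

# Hits of a length absent from the pool are certainly outside it.
outside_for_certain = sum(h for length, h in HITS_BY_LENGTH.items() if POOL[length] == 0)

# Any removed 3-word phrase might also have been a hit.
removed_3word = 241 - POOL[3]
worst_case_outside = outside_for_certain + min(HITS_BY_LENGTH[3], removed_3word)

print(sum(POOL.values()), outside_for_certain, worst_case_outside)
```

So at least 7, and possibly as many as 21, of the 46 hits counted against the 434-phrase pool are not members of it.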

The author's reduction of the sample based on WLA implies a contradiction in the definition of his experiment: his assumption is inconsistent with his own claims and with the experimental data. His reasoning and its implied conclusion, the billions-to-one odds of Theomatics, cannot be accepted. One simply cannot correctly say that 46 hits were obtained from a sample of 434 phrases (p. 38), or that 39 hits were obtained in 170 phrases (p. 46), or that 25 hits were obtained in a sample of 117 (p. 48). While such results "could have" occurred, however unlikely, in a hypothetical test of Theomatics, this fact is irrelevant: these results did not occur in the test that was conducted, apparently would not normally occur in any such experiment, and should not be considered in determining p-values in Theomatics.
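Why the choice of pool matters so much can be illustrated with a simple chance model. The sketch assumes, hypothetically, that each phrase independently becomes a hit with a fixed probability p0 and computes the binomial upper tail P(X ≥ 46). The value p0 = 1/18 is purely illustrative; it is our assumption, not the author's stated rate, and the printed values are not offered as reproductions of his figures. Whatever p0 is, the same 46 hits look far less likely by chance as the denominator shrinks.

```python
from math import comb


def upper_tail(hits, size, p0):
    """P(X >= hits) for X ~ Binomial(size, p0): the chance of at least `hits`
    successes if each phrase independently hits with probability p0."""
    return sum(comb(size, k) * p0**k * (1 - p0) ** (size - k)
               for k in range(hits, size + 1))


P0 = 1 / 18  # HYPOTHETICAL per-phrase chance rate, for illustration only

for pool in (765, 467, 434):
    print(f"46 hits in {pool} phrases: P = {upper_tail(46, pool, P0):.3g}")
```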

In order to determine the probability of the Theomatic phenomenon observed by the author, one must correct the errors the author made in his analysis. The correct results are:

L	H	S	0	1	2	P_H	1:N_H	P_C	P	1:N	O
4	53	683	12	22	19	.0100	100	.80116	.00797	125	1.002
3	35	412	10	16	9	.0091	110	.18664	.00170	589	1.078
2	19	195	6	10	3	.0128	78	.09000	.00116	864	1.150

The chart shows the phrase length L, the number of hits H, the sample size S, the clustering results (0, 1, 2), the probability of the hits P_H, the odds of this number of hits occurring in such a sample 1:N_H (N_H = 1/P_H), the probability of the cluster distribution P_C, the total probability of the hits and clustering P (P = P_H × P_C), the final odds 1:N (N = 1/P), and the representative statistic O (the average number of random trials needed to get this kind of result if Theomatics does not exist).
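The corrected table can be checked against its own definitions, P = P_H × P_C and N = 1/P. The sketch recomputes the odds from the tabulated P_H and P_C; the small gaps (all under 1%) are what one would expect from P_H being printed to only four decimal places. For L = 2 the product is about 0.00115, in line with the reported odds of 1 in 864.

```python
# Corrected table: L -> (P_H, P_C, reported N for odds 1:N)
ROWS = {4: (0.0100, 0.80116, 125), 3: (0.0091, 0.18664, 589), 2: (0.0128, 0.09000, 864)}

for length, (p_h, p_c, reported_n) in ROWS.items():
    p = p_h * p_c   # P = P_H x P_C
    n = 1 / p       # odds 1:N
    rel_err = abs(n - reported_n) / reported_n
    print(f"L={length}: P={p:.5f}  1:N={n:,.0f}  (table: {reported_n}, off by {rel_err:.1%})")
```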

The correct way to determine the statistical significance of the results of any experiment is to look carefully at what actually occurred in the experiment. Since the author has formally stated that the experiment is to consider results for all phrases of 4 words or less (p. 22), the O statistic obtained from the 4-word phrases, 1.002, is technically the result of his experiment. A result of this kind would be expected, on average, in nearly every single test if Theomatics were random.

The author's published claim of odds of 1 in 261 million corresponds to the final p-value for the 3-word phrases, which reduces to odds of 1 in 589 when his errors are corrected. The comparable O statistic is 1.078, so one would expect this result in about 93% of random tests. Clearly, both results are well below the MOS (maximum order statistic) benchmark of O = 2, a result expected in every other test, which represents odds of 1 in 3,466.
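The percentages quoted here follow from reading O, "the average number of random trials needed to get this kind of result," as the mean of a geometric waiting time, so that 1/O is the chance that a single random test produces such a result. That reading is our assumption, but it reproduces the 93% figure and the "every other test" description of the benchmark O = 2.

```python
def share_of_random_tests(o):
    """Read O as the mean of a geometric waiting time: the chance that one
    random test yields a result at least this strong is 1/O."""
    return 1.0 / o


for label, o in [("4-word", 1.002), ("3-word", 1.078), ("2-word", 1.150), ("MOS benchmark", 2.0)]:
    print(f"{label}: O = {o}, expected in about {share_of_random_tests(o):.1%} of random tests")
```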

The last result, that of the 2-word phrases (O = 1.150), is similarly insignificant, as observed in extensive testing.

The above results are all certainly well within what might be expected in a random context: the null hypothesis cannot be rejected... doing so is not even a consideration. The correct conclusion to draw in this Luke 15 analysis is that no Theomatic significance is evident at all. No other conclusion may be deemed correct, much less "conservative."