the Threshold 0.0.2.1 PHP class

last update : may 4, 2005 16:59.

summary :

a search result may be represented as a set (e.g. a PHP associative array).
the array key is generally a MySQL primary key (or PID), but it could be anything.
the array value is generally a weight (an unsigned integer between 0 and ???) or a fuzzy value (between 0.0 and 1.0), but it could be any numerical value.
the maximum weight value is sometimes known, sometimes unknown.

the first aim of the Threshold class consists of converting weights to fuzzy values, eliminating negative or zero values, sorting the array, etc.: not really a big job!
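
the first aim can be illustrated with a minimal sketch. it is not the source of the Threshold class: the function name weightsToFuzzy is hypothetical, and dividing by the largest observed weight is an assumption (it is, however, consistent with the sample output of case 1, where the weight 1108 becomes 1 and 1039 becomes 0.9377...). the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the Threshold class.
// Assumption: weights are normalised by dividing by the largest weight found.
function weightsToFuzzy(array $weights): array
{
    // Eliminate negative or zero values; array_filter preserves the keys.
    $positive = array_filter($weights, function ($w) {
        return $w > 0;
    });
    if (count($positive) === 0) {
        return [];
    }

    $max = max($positive);
    $fuzzy = [];
    foreach ($positive as $key => $w) {
        $fuzzy[$key] = $w / $max; // value in ]0, 1]
    }

    // Sort by descending fuzzy value while keeping the original keys.
    arsort($fuzzy);
    return $fuzzy;
}

// A few weights taken from case 1, plus two values that must be dropped.
$fuzzy = weightsToFuzzy([30 => 3, 17 => 1108, 31 => 1039, 98 => 0, 99 => -5]);
print_r($fuzzy);
```

when the maximum weight is known in advance, it could be passed in instead of being computed with max(); that variant is left out to keep the sketch short.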

on most sites, results are then presented in pages, each page holding a fixed number of items.
the second aim of our class is, however, to divide the result set into pages in an appropriate way, taking into account a few user preferences.

the idea is that, if possible, a reasonable number of items will appear on the first page; that, if possible, all relevant items will show up on this very page; and, last but not least, that, if possible, only relevant items will appear on this very page.

A constraint is that the visitor must be able to provide his/her preference with a single choice (namely, this will be our fzNarrow value) from a limited number of options. Of course, another constraint is that the method must use as few resources as possible.
As you may guess, the problem is, in the general case and necessarily for the pages other than the first, a bit more complex than just comparing the fuzzy value to a fixed limit.
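
for comparison, here is what "just comparing the fuzzy value to a fixed limit" could look like. this is a deliberately naive baseline, not the algorithm used by the Threshold class; the function name naiveFirstPage and its signature are hypothetical, and the way uMin and uMax clamp the page size is an assumption made for the sketch (PHP 7 or later).

```php
<?php
// Hypothetical naive baseline, NOT the Threshold algorithm.
// $fuzzy : key => fuzzy value (0.0 .. 1.0)
// $limit : fixed cut-off compared to each fuzzy value
// $uMin, $uMax : assumed lower and upper bounds on the page size
function naiveFirstPage(array $fuzzy, float $limit, int $uMin, int $uMax): array
{
    // Best answers first, keys preserved.
    arsort($fuzzy);

    // Count the items that reach the fixed limit.
    $above = count(array_filter($fuzzy, function ($v) use ($limit) {
        return $v >= $limit;
    }));

    // Clamp the page size between uMin and uMax.
    $size = max($uMin, min($uMax, $above));

    // array_slice stops at the end of the array if $size is too large.
    return array_slice($fuzzy, 0, $size, true);
}

$sample = [17 => 1.0, 31 => 0.94, 35 => 0.55, 32 => 0.45, 19 => 0.32];
$page = naiveFirstPage($sample, 0.5, 1, 10);
print_r($page);
```

such a fixed limit says nothing about how the following pages should be cut, which is where the other parameters described further down come into play.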

the technology was developed by marc.meurrens@ACM.org and nathan.meurrens@cassiopea.org using fuzzy logic (or so-called "artificial intelligence"). feel free to contact the authors.

this page provides a way to test how the user's preferences will modify the output.

available documentation :


View license : gpl-license.html

list the source (approx. 300 lines) of the class Threshold
download the class Threshold
list the source (approx. 550 lines) of the class FuzOp related to Threshold
(unused, but provided for documentation purposes)
download the class FuzOp
list the source (approx. 220 lines) of this very file threshold.php (tests/demos/html, etc)
download the file threshold.php
list the source (approx. 150 lines) of the class DemoThreshold used by this page.
download the class DemoThreshold
list the source (approx. 300 lines) of the included file threshold.inc.php
download the file threshold.inc.php
list the source (approx. 80 lines) of the included file w3c.inc.php
download the file w3c.inc.php

   

fzNarrow :


This fuzzy value is the only preference that a user should provide.

suggested value : the value looks fine in most cases.
a smaller value will produce a larger number of items on each page.
of course, still smaller values will produce still larger pages.
a larger value may be fine for well-targeted requests on rich collections.

mind that really large values (expressing a wish for a really narrow output) should be reserved for large databases and/or precise questions and/or a wish to seriously reduce the volume of the output (possibly to a single page with just the few best answers). such values are probably not suitable in most cases. on the other hand, values that are too small may produce many irrelevant answers on the first pages.


it's probably a good idea to offer a site visitor a selection of (about 5?) options, ranging from 'very precise' to 'very large' and returning corresponding values (e.g. 0.7654321, 0.62831854, 0.5, 0.42, 0.31415927). these values are of course somewhat arbitrary, but they are suitable in most cases.
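
such a selection could be sketched as a simple lookup table. only the labels 'very precise' and 'very large' and the five values come from the text above; the three intermediate labels, the variable $narrowOptions and the function fzNarrowFromChoice are hypothetical names chosen for the illustration (PHP 7 or later).

```php
<?php
// The five example values from the text, from narrowest to widest output.
// Assumption: the labels 'precise', 'normal' and 'large' are made up here.
$narrowOptions = [
    'very precise' => 0.7654321,
    'precise'      => 0.62831854,
    'normal'       => 0.5,
    'large'        => 0.42,
    'very large'   => 0.31415927,
];

// Hypothetical helper: map the visitor's single choice to an fzNarrow value.
// Anything that is not a known label falls back to the default option,
// so raw user input never reaches the paging code.
function fzNarrowFromChoice(array $options, $choice, string $default = 'normal'): float
{
    if (is_string($choice) && isset($options[$choice])) {
        return $options[$choice];
    }
    return $options[$default];
}

$fzNarrow = fzNarrowFromChoice($narrowOptions, 'very precise');
echo $fzNarrow, "\n";
```

the visitor only ever sees the labels; the numeric values stay under the control of the site owner.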

other parameters

the following parameters must be tuned by the site owner or developer. they depend, among other things, on the size of the database, the layout of the pages, etc.
however, parameters such as uMin or uMax may possibly be modified by users. some parameters may possibly be modified in expert mode.

uMin :


unsigned integer value : the minimal number of items on a page.
suggested value : , or even (a.k.a. Google's "I'm Feeling Lucky").
anyway, we recommend a small value.
if you decide to use a larger value, insignificant answers will certainly appear on the first page whenever there are fewer correct answers than that value (e.g. fewer than 15 correct answers)...

uMax :


unsigned integer value : the maximal number of items on a page.
suggested value :
don't hesitate to use a larger value to make sure (or at least to try...) that all significant answers will appear on the first page.

fzSize :


suggested value :
Use a high value if you want to reduce the number of items per page as much as possible, on the basis of the uMax value.
Use a low value if you do not want to exceed the uMax value but accept a large number of items as long as it stays below uMax.

fzOnSkip :


suggested value :
Use if you want that the same fzNarrow value should be used on all pages.
Use a lower value if you want to seriously increase the number of items on the second page, still more on the third, etc.
Use a really low value if you want to use our fuzzy mechanism for the first page, but let the uMax value alone decide for the other pages.

random seed :


keep the same seed to test several sets of parameters on exactly the same data.
empty the field to obtain a new seed (and thus new data).

   

case 1 :

the maximum weight cannot be predicted...

THE INPUT (key_2_val, by asc key) : the first array below.
THE OUTPUT (divided in pages) : the three arrays that follow, one array per page.
Array
(
    [0] => 24
    [1] => 18
    [2] => 29
    [3] => 121
    [4] => 25
    [5] => 864
    [6] => 171
    [7] => 188
    [8] => 25
    [9] => 12
    [10] => 771
    [11] => 27
    [12] => 9
    [13] => 213
    [14] => 34
    [15] => 29
    [16] => 878
    [17] => 1108
    [18] => 643
    [19] => 358
    [20] => 194
    [21] => 10
    [22] => 28
    [23] => 146
    [24] => 35
    [25] => 215
    [26] => 20
    [27] => 245
    [28] => 4
    [29] => 934
    [30] => 3
    [31] => 1039
    [32] => 502
    [33] => 21
    [34] => 925
    [35] => 609
    [36] => 22
    [37] => 3
    [38] => 123
    [39] => 42
)
Array
(
    [17] => 1
    [31] => 0.937725631769
    [29] => 0.842960288809
    [34] => 0.834837545126
    [16] => 0.792418772563
    [5] => 0.779783393502
    [10] => 0.695848375451
    [18] => 0.580324909747
    [35] => 0.54963898917
)
Array
(
    [32] => 0.453068592058
    [19] => 0.323104693141
    [27] => 0.221119133574
    [25] => 0.1940433213
    [13] => 0.192238267148
    [20] => 0.175090252708
    [7] => 0.169675090253
    [6] => 0.154332129964
    [23] => 0.131768953069
    [38] => 0.111010830325
    [3] => 0.109205776173
    [39] => 0.0379061371841
    [24] => 0.0315884476534
    [14] => 0.0306859205776
    [2] => 0.0261732851986
    [15] => 0.0261732851986
    [22] => 0.0252707581227
    [11] => 0.0243682310469
    [8] => 0.0225631768953
    [4] => 0.0225631768953
)
Array
(
    [0] => 0.0216606498195
    [36] => 0.0198555956679
    [33] => 0.0189530685921
    [26] => 0.0180505415162
    [1] => 0.0162454873646
    [9] => 0.0108303249097
    [21] => 0.00902527075812
    [12] => 0.00812274368231
    [28] => 0.00361010830325
    [37] => 0.00270758122744
    [30] => 0.00270758122744
)
 


G-2004 Cassiopea asbl