- summary :
a search result may be represented as a set of key/value pairs (e.g. a PHP associative array).
the array key is generally a MySQL primary key (or PID), but it could be anything.
the array value is generally a weight (an unsigned integer between 0 and ???)
or a fuzzy value (between 0.0 and 1.0), but it could be any numerical value.
the maximum weight value is sometimes known, sometimes unknown.
the first aim of the Threshold class consists of converting weights to fuzzy values, eliminating
negative or zero values, sorting the array, etc.: not really a big job!
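The first aim described above can be sketched in a few lines. This is a minimal Python illustration, not the code of the PHP Threshold class: the function name `to_fuzzy`, the parameter `max_weight` and the fallback to the largest observed weight when the maximum is unknown are all assumptions made for the example.

```python
def to_fuzzy(results, max_weight=None):
    """Turn {key: weight} into (key, fuzzy) pairs sorted by decreasing fuzzy value."""
    # Eliminate zero and negative weights first.
    positive = {key: w for key, w in results.items() if w > 0}
    if not positive:
        return []
    # Assumption: when the maximum weight is unknown, use the largest weight seen.
    top = max_weight if max_weight is not None else max(positive.values())
    # Scale to the range 0.0 .. 1.0, capping anything above the stated maximum.
    fuzzy = {key: min(w / top, 1.0) for key, w in positive.items()}
    # Best answers first.
    return sorted(fuzzy.items(), key=lambda pair: pair[1], reverse=True)


# Keys play the role of primary keys, values the role of weights.
ranked = to_fuzzy({101: 40, 102: 0, 103: 10, 104: -3, 105: 20})
print(ranked)
```

With the sample data, the entries for keys 102 and 104 are dropped and the remaining three are scaled against the largest weight, 40.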
on most sites, results are then presented in pages, each page holding a fixed number of items.
the second aim of our class is, however, to divide the result set into pages
in an appropriate way, taking into account
a few user preferences.
the idea is that, if possible, a reasonable number of items
will appear on the first page, that, if possible, all relevant items will
show up on this very page and, last but not least,
that, if possible, only relevant items will appear on this very page.
A constraint is that the visitor must be able to provide his/her preference
with a single choice (namely, this will be our fzNarrow value)
in a limited number of options such as
Of course, another constraint is that the method must use as few resources as possible.
As you may guess, the problem is, in the general case and necessarily for the pages other than the first,
a bit more complex than just comparing the fuzzy value to a fixed limit.
the technology is developed by
marc.meurrens@ACM.org
and nathan.meurrens@cassiopea.org
using fuzzy logic (or so-called "artificial intelligence"). feel free to contact the authors.
this page provides a way to test how the user's preferences will modify the output.
- available documentation :
test case 1
test case 2
test case 3
test case 4
display debug information
View license :
gpl-license.html
list the source (approx. 300 lines) of the class Threshold
download the class Threshold
list the source (approx. 550 lines) of the class FuzOp related to Threshold
(unused, but provided for documentation purposes)
download the class FuzOp
list the source (approx. 220 lines) of this very file threshold.php (tests/demos/html, etc)
download the file threshold.php
list the source (approx. 150 lines) of the class DemoThreshold
used by this page.
download the class DemoThreshold
list the source (approx. 300 lines) of the included file threshold.inc.php
download the file threshold.inc.php
list the source (approx. 80 lines) of the included file w3c.inc.php
download the file w3c.inc.php
- fzNarrow :
This fuzzy value is the only preference that a user should provide.
suggested value : the value looks fine in most cases.
a smaller value such as will produce a larger number of items on each page.
of course, still smaller values such as
or
will produce still larger pages.
a larger value such as may be fine for well targeted requests on rich collections.
mind that really large values (expressing a wish for a really narrow output) such as
and above
should be reserved for large databases and/or precise questions
and/or a wish to seriously reduce the volume of the output
(possibly to a single page with just the few best answers).
such values are probably not suitable in most cases.
on the other hand, values that are too small,
such as
and below may produce many irrelevant answers on the first pages.
it's probably a good idea to offer a site visitor a selection of (about 5 ?) options,
varying from 'very precise' to 'very large' and returning corresponding values
(e.g. 0.7654321,
0.62831854,
0.5,
0.42,
0.31415927)
these values are of course a little bit arbitrary, but are suitable in most cases.
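A selection of this kind could be wired up as a simple lookup. The sketch below is in Python for illustration only; the five values are the ones listed above, while the three middle labels, the name `NARROW_OPTIONS` and the helper `narrow_for` are hypothetical.

```python
# Values come from the list above; only the first and last labels do.
# The labels "precise", "normal" and "large" are made up for the example.
NARROW_OPTIONS = [
    ("very precise", 0.7654321),
    ("precise", 0.62831854),
    ("normal", 0.5),
    ("large", 0.42),
    ("very large", 0.31415927),
]


def narrow_for(label, default="normal"):
    """Return the fzNarrow value for a visitor's choice, or the default one."""
    options = dict(NARROW_OPTIONS)
    return options.get(label, options[default])


print(narrow_for("very precise"))
print(narrow_for("something else"))
```

Keeping the options in an ordered list makes it easy to build the visitor's selection in the same order, from narrowest to broadest.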
- other parameters
the following parameters must be tuned by the site owner or developer. they are,
among other things, related to the size of the database, to the layout of the pages, etc.
however, parameters such as uMin or uMax
may possibly be modified by users.
some parameters may possibly be modified in expert mode.
- uMin :
unsigned integer value : the minimal number of items on a page.
suggested value : , or even (a.k.a. Google's "I'm Feeling Lucky").
anyway, we recommend a small value.
if you decide to use a larger value, such as , insignificant answers will certainly appear on the first page if there are fewer than 15 correct answers...
- uMax :
unsigned integer value : the maximal number of items on a page.
suggested value :
don't hesitate to use a larger value, such as , to make sure (or to try...) that all significant answers will appear on the first page.
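To see how uMin and uMax bound a page, here is the naive baseline that the summary says the class goes beyond: a fixed cut-off on the fuzzy value, clamped between the two limits. It is a Python sketch under that stated simplification, not the Threshold algorithm; the function name `first_page` and its signature are assumptions.

```python
def first_page(ranked, fz_narrow, u_min, u_max):
    """ranked: list of (key, fuzzy) pairs sorted by decreasing fuzzy value."""
    # Count the items whose fuzzy value reaches the cut-off.
    wanted = sum(1 for _, fz in ranked if fz >= fz_narrow)
    # At least u_min items, at most u_max, never more than what is available.
    size = min(max(wanted, u_min), u_max, len(ranked))
    return ranked[:size]


ranked = [("a", 0.9), ("b", 0.8), ("c", 0.45), ("d", 0.2), ("e", 0.1)]

# Only two items reach 0.5, but u_min = 3 pulls a weaker third one in.
print(first_page(ranked, 0.5, 3, 10))

# All five items reach 0.05, but u_max = 4 cuts the page short.
print(first_page(ranked, 0.05, 1, 4))
```

The first call illustrates the warning given for uMin: a large minimum forces less relevant answers onto the page when there are too few good ones.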
- fzSize :
suggested value :
Use a high value, such as if you want to reduce the number of items per page as much as possible,
on the basis of the uMax value.
Use a low value, such as if you do not want to exceed the uMax value but accept a large number of items as long as it stays below uMax.
- fzOnSkip :
suggested value :
Use if you want the same fzNarrow value to be used on all pages.
Use a lower value, such as , if you want to seriously increase the number of items on the second page, still more on the third, etc.
Use a really low value, such as , if you want to use our fuzzy mechanism for the first page, but let the uMax value alone decide
for the other pages.
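One possible reading of this behaviour is a cut-off that is multiplied by fzOnSkip after each page. The Python sketch below is purely hypothetical: the source does not give the formula or the values, so the multiplicative rule, the use of 1.0 for "unchanged", and the names `paginate`, `fz_on_skip`, `u_min` and `u_max` are all assumptions made for illustration.

```python
def paginate(ranked, fz_narrow, fz_on_skip, u_min, u_max):
    """Split (key, fuzzy) pairs, sorted by decreasing fuzzy value, into pages."""
    pages, start, cutoff = [], 0, fz_narrow
    while start < len(ranked):
        rest = ranked[start:]
        # Items on this page: those reaching the current cut-off ...
        wanted = sum(1 for _, fz in rest if fz >= cutoff)
        # ... clamped by u_min and u_max, with at least one item so the loop ends.
        size = max(1, min(max(wanted, u_min), u_max, len(rest)))
        pages.append(rest[:size])
        start += size
        # Assumed rule: relax the cut-off for the next page.
        cutoff *= fz_on_skip
    return pages


values = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
ranked = list(enumerate(values, start=1))

for skip in (1.0, 0.5, 0.0):
    sizes = [len(page) for page in paginate(ranked, 0.5, skip, 1, 5)]
    print(skip, sizes)
```

Under these assumptions, a factor of 1.0 keeps the same cut-off on every page, a smaller factor lets later pages grow, and a factor of 0.0 leaves uMax alone to decide the size of every page after the first.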
- random seed :
keep the same seed to test several sets of parameters on exactly the same data.
the field to obtain a new seed (and thus new data).
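The effect of keeping the seed can be reproduced with any seeded generator. This is a small Python sketch, not the generator used by the demo page; the function name `demo_results`, the item count and the weight range are assumptions.

```python
import random


def demo_results(seed, count=20, max_weight=100):
    """Build a reproducible {key: weight} test set from a seed."""
    # A private generator, so the global random state is left alone.
    rng = random.Random(seed)
    # A few negative weights are allowed on purpose, to exercise the clean-up step.
    return {pid: rng.randint(-10, max_weight) for pid in range(1, count + 1)}


first = demo_results(42)
second = demo_results(42)
print(first == second)  # same seed, same data
```

Because the data depends only on the seed, several sets of parameters can be compared on exactly the same result set.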