Search usage:

Assuming that we have a linked list of strings to search, the query class first compiles the user's query, then we repeatedly search each of the strings in the list until one is found. The user's query can be of the form: (black & white) | (!green & !(red | pink))

So, as you can see, both conditional and bracketed queries are possible.

The interface to this extremely small and clever search engine is very simple.

The entire C++ class is just 239 lines of code and is a masterpiece?! of compact design :-)

You can use it for many types of searching, from local websites to databases and text files.

============================================================

Example usage:

/*where pQuery points to, say "black!red"and pNode is the node in a linked list of strings, each of which being something like perhaps: "abcdefgblackand whitetext that isred difficult to find"*/

bool bResp;

QueryCompileAndGo *pSEngine = new QueryCompileAndGo();

pSEngine->CompileQuery(pQuery);

while(pNode)

{

bResp = pSEngine->EvaluateCompiledQuery(pNode->pcDesc)

if(bResp) break;

pNode->Next();

}

delete pSEngine;

============================================================

Search header, with some descriptive text

#ifndef __QUERYCAG_HPP
#define __QUERYCAG_HPP
#ifndef NULL
#define NULL 0
#endif
#include <string.h>

//#include <Classes.hpp>
/*
Free-Form Queries
-----------------
Eg (a & b) | (!c & !(d | e))

where a, b etc are usually boolean expressions (eg 'x > y')

There is a Compilation phase & an Execution phase. Execution is interpretive.
The target structure is:-
try -> BeginPhrase
try -> BeginPhrase
try a
try b
EndPhrase
orelse
try -> BeginPhrase
try !c
try !-> BeginPhrase
try d
orelse
try e
EndPhrase
EndPhrase
EndPhrase
EndQuery

Although, usually, a,b, etc are full boolean expressions, (eg 'x >= y'),
in our case, 'a',etc represents a string (eg "aaa") & the boolean expression is
'string "aaa" exists in the supplied buffer'.

Notes on above:
1. In the following, 'OpCode' (Operation Code) means any of the above symbols:
ie BeginPhrase/EndPhrase/try/orelse/EndQuery.
1. 'try' means:
if RHS (right hand side) is true, continue on down the phrase (ie vertically
down). If next OpCode is 'orelse' or 'EndPhrase', whole phrase is true, so
return 'true' to calling 'try' (ie leftwards). So, within a Phrase, there is
an implied 'and' relation between the successive 'try's in each 'or(else)'
branch, of which there can be any number.
if RHS is false, skip down phrase to next 'orelse' or 'EndPhrase'
if 'orelse' just start as though starting phrase (in our case, do nothing,
in many cases BeginPhrase does some kind of push(data-pointer) & EndPhrase
has to pop & OrElse has to pop&push)
if EndPhrase, return 'false' to calling 'try'
2. There are 2 types of 'try' arguments, each of which can have a 'not' cover
1. try expression (in our case: try 'text contains value')
2. try lower-level Phrase
('not' merely reverses boolean result of 'try', (before deciding whether to
continue on down or break-out to OrElse/EndPhrase))
The order of assembling the 'compiled' query mirrors the source. In the above
query, the assembly is left-t-right within top-to-bottom.
A new phrase is started for each '('. A hidden '(..)' surrounds the whole query.
Each new phrase is housed in a new memalloc block. At compile-time, each
BeginPhrase points back to its calling block so that at ')' the compilation can
continue. It is possible, at the end of the Compilation phase, to tidy-up into a
single memory-block.
Data values (eg "aaa") are all stored in a single block, for efficiency
(otherwise 'try' Ops would be variable length & it would be slower to skip to
the next 'orelse' or 'EndPhrase' in the execution phase).
Try Ops contain pointer to either data ("aaa") or to a lower-level phrase.

2. A simpler approach, slightly less efficient, but requiring only one memory
block for the 'code' & 'data'(WITHOUT any end-of-compile clean-up (used in
the old days when it was difficult, if not impossible, to get extra memory)),
is to have an extra 'Go-to' operation. So if you have
a | (b | (c & d)) | e
...
try -> BeginPhrase
try a
orelse
try -> BeginPhrase
try b
orelse
try -> BeginPhrase
try c
try d
EndPhrase
EndPhrase
orelse
try e
EndPhrase

'Code' would be
-----
p1 tBegin
tTry (ttValue,Not,&aaa)
tGoTo p1a
aaa "aaa"
p1a tOrelse
tTry (ttPhrase,Not,&p2)
tGoTo p1b
p2 tBegin
tTry (ttValue, &bbb)
tGoTo p2a
bbb "bbb"
p2a tOrelse
tTry (ttPhrase, &p3)
tGoTo &p2b
p3 tBegin
tTry (ttValue, &ccc)
tGoTo p3a
ccc "ccc"
p3a tTry (ttValue, &ddd)
tGoTo p3b
ddd "ddd"
p3b tEnd
p2b tEnd
p1b tTry (ttValue, &eee)
tGoTo p1c
eee "eee"
p1c tEnd


Notice that, in fact, the BeginPhrase OpCode is unnecessary in this
implementation: a phrase starts with its first 'try'. It is shown above for
clarity.

CASE SENSITIVITY
----------------
This class only performs case sensitive matching. If case insensitivity is
required, it is up to the using application to convert the query & each record
to be searched to the same case (conventionally, upper case).

ENHANCED ARGUMENT MATCHING
--------------------------
If string matching other than simple equalty is required (eg "ABxxxCD", where
xxx is any character, or "ABnnn.nnCD" where n is any number), the virtual
method FindString should be overloaded. Maybe the first char of each argument
could [optionally] be a special 'type' marker. (If optional, probably a non-
printable char (NOT 0x00, as this is the terminator).)
Of course, it is also possible to invent several new syntactic operators which,
like !, & and |, can appear in the text. If so, in AddPhrase method, under
'default:', don't reject single non space/!/&/| chars, store them exactly as for
! & present them as a separate param in the FindString method (0x00 means no
special operator for this match). The TryOp struct would have to be enlarged to
include this char. It is easiest if, as for !, the such operators only apply to
the next argument. In which case, don't forget to clear the field (to 0) after
the AddTry method (NB just search for 'Not' & mimic (except for just before a
phrase (ie just before a '(')).)
*/


class QueryCompileAndGo
{
public:
enum OP_CODE {tBeginPhrase=1,tEndPhrase,tTry,tOrElse,tGoTo};
enum TRY_TYPE {ttValue,ttPhrase};

struct SimpleInstruction
{
OP_CODE OpCode;
};

struct TryOp
{
OP_CODE OpCode;
TRY_TYPE Type;
bool Not;
int ArgumentIndex; // from start of Compiled Query
};

struct GoToOp
{
OP_CODE OpCode;
int DestIndex; // from start of Compiled Query
};
/*

IF ABOVE FAIL WITH YOUR COMPILER, TRY THESE

struct SimpleInstruction
{
OP_CODE OpCode;
TRY_TYPE Type; // padding
bool Not; // ,,
int Index; // ,,
};

struct TryOp
{
OP_CODE OpCode;
TRY_TYPE Type;
bool Not;
int ArgumentIndex; // from start of Compiled Query
};

struct GoToOp
{
OP_CODE OpCode;
TRY_TYPE Type; // padding
bool Not; // ,,
int DestIndex; // from start of Compiled Query
};
*/
QueryCompileAndGo();
~QueryCompileAndGo();
bool CompileQuery(char* apQuery, char* apCompiled=NULL);
bool EvaluateCompiledQuery(char* apText, char* apCompiledQuery = NULL);
virtual bool FindString(char* apText, char* apString)
{return (strstr(apText,apString) != NULL);}

private:
// Compilation phase
void AddGoTo();
void UpdateGoTo(int aCallerGoToIndex);
void SkipGoTo() {CIndex += sizeof(GoToOp);}
int AddTryPhrase() {return AddTry(ttPhrase);}
void AddPhrase(int aCallerGoToIndex);
int AddTryValue() {return AddTry(ttValue);}
void AddValue(int aCallerGoToIndex);
int AddTry(TRY_TYPE aType);// returns Index of GoTo it has added after Try Op
void SkipTry() {CIndex += sizeof(TryOp);}
void AddOrElse() {AddSimpleInstruction(tOrElse);}
void SkipOrElse() {SkipSimpleInstruction();}
void AddEndPhrase() {AddSimpleInstruction(tEndPhrase);}
void AddSimpleInstruction(OP_CODE aOpCode)
{((SimpleInstruction*)&pCompiled[CIndex])->OpCode = aOpCode;
SkipSimpleInstruction();}
void SkipSimpleInstruction() {CIndex+=sizeof(SimpleInstruction);}
void CheckRoom(int aForLen);
// Execution phase
bool EvaluateTryPhrase(TryOp* apTry);
bool EvaluateTryValue(TryOp* apTry);
bool EvaluateTry();

// data
int CompiledChunkLen;
int CompiledBufLen;
char* pSource;
int SIndex; /* Source Index */
char* pCompiled;
int CIndex; /* Compiled Index */
bool CompileError;
bool Not;
// Execution phase
bool EvaluateResult;
int BracketLevel;
int MaxBracketLevel;
char* pSearchText;
TryOp EvalTryPhrase;
};
#endif