/*
 *				c s e t . h
 */

/*)LIBRARY
*/

#ifdef	DOCUMENTATION

title	cset	Header file for character set functions
index		Header file for character set functions

synopsis

	 #ifdef vms
	 #include "c:cset.h"
	 #else
	 #include <cset.h>
	 #endif

description

	The character set functions provide a set of routines for describing
	and manipulating sets of characters.  The character sets, called
	"csets", created in this way can be manipulated quickly, and require
	relatively little storage.  They are meant to be used as arguments
	to pattern-matching functions like span() (which see).

	For these purposes, functions are provided to create csets, to
	produce the complement of a cset (with respect to the set of all
	8-bit characters), and to form the join (union), meet (intersection)
	and difference of two csets; see cset(), cscomp(), csjoin(),
	csmeet(), and csdiff().

	csets can also be used more generally as representations of sets -
	i.e., the name can be read as "C sets".  In this case, the universe is
	the set of numbers 0...(cssize-1), where cssize is a global parameter
	defined in cset.c; it is normally 256 for character work.  The
	functions provided for this kind of application include csmember(),
	which checks membership, and cswith() and csless(), which add
	elements to and remove elements from sets, respectively.

	When csets are used in this way, it is important to understand that
	a cset is a data object with an internal structure, and that different
	csets may share internal data - i.e., csets are not normally "atomic"
	objects and care must be taken in manipulating them.  A look at the
	representation of csets should help clarify this point.

	The only object you normally manipulate directly in your code is
	a cset pointer, type (CSET *).  This pointer points to a cset
	header, which contains a mask and a pointer to a table of cssize
	bytes.  A character is in the cset if any of the bits in its mask
	is on in the corresponding table entry.  Csets created by cset()
	always have a one-bit mask; however, csjoin() and friends avoid
	using up a bit position where possible, by creating a header whose
	mask contains more than one bit.  Hence, the join of two csets can
	often be represented very cheaply.
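
	The mask-and-table test and the cheap join described above can be
	sketched as follows.  This is only an illustration under stated
	assumptions, not the code in cset.c: struct sk_set, SK_SIZE,
	sk_member(), sk_fill() and sk_join_cheap() are hypothetical names,
	and the table is assumed to hold 256 unsigned bytes.

```c
#define SK_SIZE 256			/* stands in for cssize (assumed)  */

/* Hypothetical mirror of the header described above; not the real CSET. */
struct sk_set {
	unsigned char mask;		/* bits that mean "member"	   */
	unsigned char *table;		/* SK_SIZE bytes, possibly shared  */
};

/* A character is a member if any mask bit is on in its table entry. */
int sk_member(const struct sk_set *s, int c)
{
	return (s->table[c & (SK_SIZE - 1)] & s->mask) != 0;
}

/* Turn on the mask bits of s for every character in str. */
void sk_fill(struct sk_set *s, const char *str)
{
	for (; *str != '\0'; str++)
		s->table[(unsigned char)*str] |= s->mask;
}

/*
 * Cheap join: possible only when both sets use the same table.
 * The result reuses that table and ORs the two masks together, so no
 * new bit position is used up.  Returns 1 on success, 0 if the tables
 * differ and a more expensive join would be needed.
 */
int sk_join_cheap(const struct sk_set *a, const struct sk_set *b,
		  struct sk_set *out)
{
	if (a->table != b->table)
		return 0;
	out->mask = a->mask | b->mask;
	out->table = a->table;
	return 1;
}
```

	With two sets holding masks 0x01 and 0x02 in one table, the joined
	header has mask 0x03 and costs no table space at all.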

	Complements of csets are represented still more efficiently; a cset
	and its complement share even the header.  Only the pointer is
	changed - its bit pattern is complemented.
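
	The complemented-pointer idea can be sketched as below.  This is
	an illustration only, not the code in cset.c.  It assumes that
	header addresses are even, so a set bottom bit marks a complement,
	and it keeps the tagged value in a uintptr_t (C99) rather than in
	a pointer.  The names sk_hdr, sk_ref, sk_make(), sk_comp(),
	sk_is_comp(), sk_header() and sk_member() are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header: a mask plus a pointer to a 256-byte table. */
struct sk_hdr {
	unsigned char mask;
	unsigned char *table;
};

typedef uintptr_t sk_ref;	/* header address, possibly complemented */

/* Allocate a header; malloc() is assumed to return an even address. */
sk_ref sk_make(unsigned char mask, unsigned char *table)
{
	struct sk_hdr *h = malloc(sizeof *h);

	if (h == NULL)
		return 0;
	h->mask = mask;
	h->table = table;
	return (sk_ref)(void *)h;
}

/* Complementing a set flips every bit of the reference; nothing is copied. */
sk_ref sk_comp(sk_ref r)
{
	return ~r;
}

/* An odd value can only be the complement of an even header address. */
int sk_is_comp(sk_ref r)
{
	return (int)(r & 1u);
}

/* Recover the real header address from either form of the reference. */
struct sk_hdr *sk_header(sk_ref r)
{
	return (struct sk_hdr *)(void *)(sk_is_comp(r) ? ~r : r);
}

/* Membership: do the usual mask test, then invert it for a complement. */
int sk_member(sk_ref r, int c)
{
	struct sk_hdr *h = sk_header(r);
	int in = (h->table[c & 0xFF] & h->mask) != 0;

	return sk_is_comp(r) ? !in : in;
}
```

	Applying sk_comp() twice gives back the original reference, so a
	double complement costs nothing either.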

	A consequence of this representation is that a great deal of data is
	often shared between csets.  When manipulating csets as arbitrary
	sets, it is important to understand that applying csless() or cswith()
	to a cset may cause any related csets to be changed.  Thus, after the
	sequence of calls:

		uvowels = cset("AEIOU");
		lvowels = cset("aeiou");
		vowels  = csjoin(uvowels,lvowels);
		lvowels = cswith(lvowels,'y');

	'y' is probably a member of vowels.  (Only "probably" because it
	is impossible to predict whether uvowels and lvowels happen to get
	the same table; csjoin() cannot use the "cheap" representation if
	they don't.)

	Two methods are available to avoid this problem.  First, cscopy()
	returns a guaranteed-"unique" copy of a cset.  Second, the global
	csunique (in cset.c) can be set, forcing functions such as csjoin()
	to avoid space-saving shortcuts.
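
	The sharing hazard, and the way a private copy avoids it, can be
	sketched as below.  This is a sketch under assumptions and does
	not call the real library: struct sk_set, sk_member(), sk_with()
	and sk_copy() are hypothetical stand-ins (sk_copy() plays the role
	described for cscopy()), and the table is assumed to hold 256
	unsigned bytes.

```c
#include <stdlib.h>

#define SK_SIZE 256			/* stands in for cssize (assumed)  */

/* Hypothetical mirror of the cset header; not the real CSET. */
struct sk_set {
	unsigned char mask;
	unsigned char *table;		/* may be shared with other sets   */
};

int sk_member(const struct sk_set *s, int c)
{
	return (s->table[c & (SK_SIZE - 1)] & s->mask) != 0;
}

/* Add c in place by turning on the mask bits of s in its table entry. */
void sk_with(struct sk_set *s, int c)
{
	s->table[c & (SK_SIZE - 1)] |= s->mask;
}

/*
 * Make dst an independent copy of src: a fresh table and a one-bit
 * mask, sharing nothing.  Returns 0 on success, -1 if out of memory.
 */
int sk_copy(struct sk_set *dst, const struct sk_set *src)
{
	unsigned char *t = calloc(SK_SIZE, 1);
	int c;

	if (t == NULL)
		return -1;
	for (c = 0; c < SK_SIZE; c++)
		if (sk_member(src, c))
			t[c] = 0x01;
	dst->mask = 0x01;
	dst->table = t;
	return 0;
}
```

	If lower has mask 0x02 and vowels has mask 0x03 in the same table,
	sk_with(&lower, 'y') also makes 'y' a member of vowels, while a
	set produced earlier by sk_copy(&safe, &vowels) is unaffected.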

internal

	The exact form of the cset header structure was chosen to be identical
	to the character set pointer structure used for the PDP-11 CIS
	instructions.  Any ambitious programmers are encouraged to make use
	of those instructions to produce fast versions of span() etc.

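	Note that the use of a complemented pointer to the header to
	represent a complemented cset relies on malloc() always returning
	memory pointers with a 0 in the bottom bit of their representation.
	This holds wherever malloc() returns memory aligned to at least two
	bytes, which is true of most implementations.  The cscomp macro
	also casts the pointer to an int, so it further assumes that an
	int is wide enough to hold a pointer.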

bugs

author

	Jerry Leichter

#endif

/*
)EDITLEVEL=10
 * Edit history
 * 0.0 12-Jul-82 JSL	Invention
 */

#ifndef _CSET_				/* Don't do this twice		 */
#define _CSET_

typedef struct cset
	{ char	mask;			/* Mask for chars in set	*/
	  char	_fill_;			/* For CIS compatibility	*/
	  char	*table;			/* Character table		*/
	} CSET;

extern CSET *cset();			/* Make a cset			*/
extern CSET *cset_t();			/* Make a temporary cset	*/
extern CSET *cscopy();			/* Copy a cset			*/
extern CSET *csdiff();			/* Difference of csets		*/
extern CSET *csjoin();			/* Union of csets		*/
extern CSET *csless();			/* Remove element from cset	*/
extern CSET *csmeet();			/* Intersection of csets	*/
extern int   csmember();		/* Test for membership		*/
extern CSET *cswith();			/* Add element to cset		*/
extern CSET *_cscomp();			/* Real, callable complement	*/

/* The character set matching functions */
extern char *any();
extern char *ospan();
extern char *span();
extern char *upto();

#define cscomp (CSET *)~(int)		/* Macro complement		*/

extern int csmask;			/* Mask to apply to chars	*/
extern int cssize;			/* Size of a cset		*/
extern int csunique;			/* Make unique copies of csets	*/

#endif
