LBL-4683 (Revised.)

BDMS
Berkeley Database Management System
User's Manual
(Version 2.2)

David R. Richards
Computer Science and Applied Mathematics Department
Lawrence Berkeley Laboratory
University of California
Berkeley, California 94720

ACKNOWLEDGEMENTS

BDMS had its origin in the joint development of a Particle Physics Data System by the Berkeley Particle Data Group and the Caltech Data Compilation Group. The diversity of databases that make up that system necessitated the use of a common database management system, while the complexity of the data demanded capabilities not available in any existing DBMS. The new system that was developed to meet this need has proven to be of general usefulness, and responsibility for its continued support and further development has been assumed by the Computer Science and Applied Mathematics Department at LBL.

As the system evolved, many people contributed ideas and assisted with programming. They include Tricia Coffeen, Paul Chan, Geoffrey Fox, Marge Hutchinson, Tom Lasinski, Deane Merrill, Gill Ringland, Alan Rittenberg, Silvia Sorell, Paul Stevens, Tom Trippe, Vicky White and George Yost.

The users of BDMS also deserve thanks for contending with the seemingly endless change to which a developing system is subject. Their comments, criticism, and unfailing ability to flush out even the most reclusive bugs have been and continue to be invaluable.

This work was supported by the US Department of Energy (formerly US Energy Research and Development Administration and US Atomic Energy Commission), the National Science Foundation, and the National Bureau of Standards.

**********

Table of Contents                                        06 APR 78

1. System overview
   1.1 Introduction
   1.2 Database structure
   1.3 Example database
   1.4 System organization
   1.5 User exits
2. Database definition
   2.1 Introduction
   2.2 Database definition language
   2.3 Compiler commands
   2.4 Compiler error messages
3. Executive
   3.1 Introduction
   3.2 Executive commands
   3.3 Executive error messages
4. Editor
   4.1 Introduction
   4.2 Assignment statements
       data element names
       numeric values
       character string values
       null values
   4.3 Editor commands
       commands that set operating mode
       commands that cause immediate action
   4.4 Append mode
       order of assignment statements
       automatic parent node generation
       automatic record generation
       multiple assignment statements
   4.5 Edit mode
       naming data element occurrences
       replace command (*R)
       substitute command (*S)
       delete command (*D)
       insert command (*I)
       order of edits
       path memory
   4.6 Editor error messages
   4.7 List error messages
5. Retrieval
   5.1 Introduction
   5.2 Simple conditions
   5.3 Complex conditions
   5.4 Record ID
   5.5 Truncated search
   5.6 Retrieval error messages
6. Utilities
   6.1 Introduction
   6.2 Load
   6.3 Dump
   6.4 Clean
   6.5 Balance

1.1 Introduction

BDMS is a general-purpose database management and information retrieval system with a broad range of capabilities for creating, maintaining, and accessing computer databases. It frees programmer and end user alike from concern with the physical storage of data, instead making it possible to deal with data at a logical level. In other words, it allows one to deal with information rather than data. Many applications that previously would have required extensive special program development can be handled by the system with little, if any, extra software. Its capabilities include:

1. A natural and easy-to-use database definition language.

2. A powerful editor that operates directly on the database. Data are entered in a simple, natural way, and many short-cuts are available to expedite the entry of large amounts of data. Modifications to existing data are immediately effective.

3. Extensive retrieval facilities including controlled database inversion, boolean and relational operators, nested parentheses in search expressions, truncated and range search, and to previously retrieved sets.

4. A standard listing format that makes the structure of the data readily apparent. When working interactively, retrieved records may be listed at the terminal or printed off-line. They also may be dumped in a format suitable for data exchange and database reloading.

5. A common command and data language for both batch and interactive use.

6. Exits to user-supplied routines at several places in the system to allow input data validation, data transformation on input, output, indexing, searching, and automatic data element generation.

7. Utilities for loading and dumping databases and database space maintenance.

A designer wishing to base a special-purpose system on BDMS will find it relatively easy to use just those modules needed. All system capabilities are available via Fortran subroutine calls. For example:

1. Special software might be written to build a database from data initially stored in some special format, e.g. fixed field card images. This could be done by writing a program that reads data in the special format and calls appropriate subroutines to load it directly into the database, bypassing the standard data input language. In fact, such a routine could be coded as a replacement for the standard input routine with the system behaving normally in all other respects.

2. The system could be modified through the provision of a replacement for the standard data output routine so that retrieved data would be listed in a format better suited to its end use.

3. A very ambitious system designer might choose to replace the entire user interface with one providing extensive prompting and data validation tailored to his specific application.
In all of these cases, the designer/programmer is freed from concern with physical storage mechanisms for his data. Instead, he can use existing software to access and modify the data, referring to it by name.

BDMS is highly modular and coded for the most part in machine-independent FORTRAN IV. The operating system interface and machine-dependent code are isolated in a few modules so that the system is easily transportable.

1.2 Database Structure

A BDMS database is structured into records, the units in which data pass between the system and disc storage. Normally, a record will have some significance to the user, e.g. a record in a bibliographic database would be a description of a single document, but this is not always necessary or desirable.

The individual data items within a record are called data elements; they are the smallest units of data with any meaning to the system, although an individual data element might have some internal structure known to an application program. A data element has a unique name and is normally referenced by name (or a synonym). There is essentially no limit on the number of data elements that may be defined for a database.

A hierarchical structure may be imposed on the records when a database is defined. This means that some data elements are declared to be subordinate to other, parent, data elements. Those data elements for which no parent is declared are called record-level data elements; it is often useful to consider the record itself to be their parent. There is essentially no limit on the number of levels that may be defined in the record structure.

Within a given record, each record-level data element may occur once, several times, or not at all. Likewise, each occurrence of a data element at any lower level in the hierarchy may have linked to it one, several, or no occurrences of each of its subordinate data elements.
There is essentially no limit on the number of times any data element may occur in a single record. Furthermore, there is no storage overhead associated with data elements that do not occur at all in either the record or a particular occurrence of their parent.

Data elements are classified into six types according to the values they can assume: character or bit string, integer or real (single or double precision) vector, and pure node. Character or bit strings may be of any length with no limit beyond that imposed by run-time memory restrictions. Integer or real (floating point) data elements may be scalars (single numbers) or arbitrary length vectors (i.e. an ordered set of numbers, which are the components of the vector). Real data elements may be single or double precision. Pure node data elements carry no value; they may be used to link together subordinate data elements in the record hierarchy or as flags.

Any data element, regardless of type, may serve as a node in the hierarchical record structure. In general, if one of a group of related data elements may occur only once in each occurrence of the group, it should be made the parent of the rest of the data elements. However, if all of the data elements in such a group may occur multiply, they should all be linked to a pure node parent.

Any data element may be declared to be a record key. The system will then maintain an index for that data element to allow efficient retrieval. In an index, key values have a fixed length that is declared in the database definition. Data element values are truncated or padded as necessary when they are put into an index.

The system assigns a Record ID to each record as it is created. This guarantees that each record has a unique identifier by which it can be selectively retrieved even if none of its data elements is defined to be a key, so that no indices are maintained.
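The fixed-length key rule described above (values truncated or padded as they enter an index) can be illustrated with a minimal Python sketch. It is not the BDMS implementation: the function name make_index_key, the default of 10 characters per computer word, left justification, and padding with blanks are all assumptions made for illustration. The manual states only that key lengths are declared (in words) and that values are truncated or padded to fit.

```python
def make_index_key(value, length_words, chars_per_word=10):
    """Return `value` truncated or blank-padded to a fixed key length.

    Hypothetical helper: `chars_per_word` is machine dependent and the
    default here is an assumption, as is right-padding with blanks.
    """
    if length_words < 1:
        raise ValueError("key length must be at least one word")
    key_chars = length_words * chars_per_word
    # Truncate values that are too long, then pad short ones on the right.
    return value[:key_chars].ljust(key_chars)


if __name__ == "__main__":
    # With an assumed 4-character word and a key length of 1 word:
    print(repr(make_index_key("RICHARDS", 1, chars_per_word=4)))  # 'RICH'
    print(repr(make_index_key("FOX", 1, chars_per_word=4)))       # 'FOX '
```

Under these assumptions, two values that agree in their leading characters up to the key length produce the same index entry, which is one reason to choose the declared key length with care.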
Physically, a database is divided into a data file, which contains the database definition and data records, a directory file used to access records in the data file, and an inversion file comprising the indices (if any exist).

1.3 Example Database

Throughout this manual, we shall use for illustration a hypothetical database whose records are summaries of some kind of experiment reports and whose structure may be diagrammed as follows:

   (Record)
    |
    +-- AN
    +-- TABLE
    |    +-- XN
    |    +-- YN
    |    +-- DATUM
    |    |    +-- X
    |    |    +-- Y
    |    |    |    +-- DY
    |    |    +-- CD
    |    +-- CT
    +-- AUTHORS
    |    +-- A
    |    |    +-- PRINCIPAL
    |    +-- I
    +-- R
         +-- T
         +-- D

The data elements have the following meanings:

AN is a document accession number.

TABLE is the name of a collection of data. It links together definitions of the X and Y variables, XN and YN, possible comments on the TABLE, CT, and the actual data points in the TABLE, DATUM. DATUM is a pure node data element that links X and Y values and possible comments on the DATUM, CD. DY is the error on Y. In a given record TABLE might occur multiply; in a given TABLE, DATUM and CT might occur multiply; in a given DATUM, CD might occur multiply. The rest of the data elements, XN, YN, X, Y, and DY, would normally occur only once within a given occurrence of their parent data element.

AUTHORS is a pure node that links one or more authors, A, with their institutional affiliation(s), I. In general, all of these data elements might occur multiply. Note how PRINCIPAL, a pure node type data element, is used to flag a particular author as the principal author. Normally, only one author would be so flagged.

R is the document reference (e.g. report number) and T and D are its type and date.

A diagram of this type only shows the hierarchical relationships among data elements. In general, any of the data elements may occur several times or not at all in any particular record.
Thus, the pictorial representation of a real record would require the use of a third dimension to display the multiple occurrences.

1.4 System Organization

From the user's point of view, BDMS is divided into several functional modules.

The database definition compiler is a separate program used to create a new database before any data are loaded into it. The same editor used for data input and modification is used to construct a database definition that is converted by the compiler into tables that will drive the rest of the system when the new database is used. The database definition language and use of the compiler are the subject of Chapter 2.

The executive is the overall control program. Some executive commands are executed immediately while others cause control to be passed to other system modules, which will then interpret additional user input. The executive functions are discussed in Chapter 3.

The editor provides facilities for creating and modifying data records. It is controlled by a set of editor commands and understands the system's external data language. The use of the editor is described in detail in Chapter 4.

The query subsystem, described in Chapter 5, interprets user queries and selectively retrieves data records for subsequent display or modification.

The utilities, described in Chapter 6, are stand-alone programs for database maintenance operations such as efficient loading and dumping and removal of the dead space resulting from update activity.

1.5 User Exits

As mentioned in the introduction, exits to user routines are provided at several places in the system, allowing it to be tailored to specific databases and applications. The subroutines that may be supplied by the user are described briefly below. Details of their use and calling parameters may be found in the programmer's manual.

IPROC is passed a data element value before it is stored in the record.
IPROC may modify the value or check it for validity. If a value is found to be in error, IPROC can instruct the calling routine not to store it, and/or output an error message.

OPROC is passed a data element value before it is output. It may modify the value, check it for validity, or suppress its output altogether. Normally, IPROC and OPROC would be used as a pair to transform between the internal and external forms of a data element.

KPROC is passed the value of a key data element before it is stored in an index (or deleted from an index when restoring a modified record or deleting a record). It may modify the value or check it for validity. It can suppress the storage of a key in an index.

KMAP is passed the value of a CHAR type key data element before it is stored in or deleted from an index. It may perform an arbitrary character code mapping in order to enforce a desired collating sequence for values in the index.

QPROC is passed the value of a key data element before an index search. Normally it would transform the value in the same way as KPROC (it might consist of no more than a call to KPROC), but it is provided to allow the use of different forms of a key value during data input and retrieval.

SPROC is called before a record is stored in the database. It may perform an intra-record data integrity check, and generate or modify data element values. It can prevent the storage of the record if errors are detected.

FPROC is called just after a record is fetched from the database. Its primary use is to rematerialize any virtual data element occurrences that the database designer wishes to make visible to the user.

The IPROC, OPROC, KPROC, and QPROC routines are called only if the database definition instructs the system to do so. KMAP, SPROC and FPROC are always called but, of course, may be do-nothing routines.

2.1 Introduction

To set up a BDMS database, one must define the nature of the data, e.g.
names and types of data elements, their hierarchical relationships, what indices are to be maintained, etc. The system must be informed of any user-supplied routines for input processing and validation, output, key or query processing. All of this information is coded in a database definition language that is processed by the database definition compiler. The output of the compiler is a binary file definition table (FDT), which becomes the zeroth record of the newly defined database and controls the operation of the system while that database is being accessed.

Plans exist for extending the compiler to provide facilities for modifying the definition of an existing database, e.g. adding or deleting data elements and synonyms.

2.2 Database Definition Language

The database definition language of BDMS has the same syntax as the data language used by the editor (cf. Chapter 4). That is, it consists of a series of statements, each giving an attribute name, an equal sign, a value, and a terminating semicolon.

By way of illustration, a file definition for the example database described in Chapter 1 would be coded as follows, assuming that it is desired to index the records by the document accession number, AN, the individual AUTHORS, A, and the type of data reported, YN:

   FILE=EXPT-REPORTS;
   DE.=AN; SYN=ACCESSION-NO; TY=INTEGER; KEY; LENGTH=1;
   DE.=TABLE; TY=CHAR;
   DE.=DATUM; PAR=TABLE; TY=NODE;
   DE.=X; SYN=X-VALUE; PAR=DATUM; TY=REAL;
   DE.=Y; SYN=Y-VALUE; PAR=DATUM; TY=REAL;
   DE.=DY; SYN=Y-ERROR; PAR=Y; TY=REAL;
   DE.=CD; SYN=COMMENT-DATUM; PAR=DATUM; TY=CHAR;
   DE.=XN; SYN=X-NAME; PAR=TABLE; TY=CHAR;
   DE.=YN; SYN=Y-NAME; PAR=TABLE; TY=CHAR; KEY; LENGTH=1;
   DE.=CT; SYN=COMMENT-TABLE; PAR=TABLE; TY=CHAR;
   DE.=AUTHORS; TY=NODE;
   DE.=A; SYN=AUTHOR-NAME; PAR=AUTHORS; TY=CHAR; KEY; LENGTH=3;
   DE.=I; SYN=INSTITUTION; PAR=AUTHORS; TY=CHAR;
   DE.=PRINCIPAL; PAR=A; TY=NODE;
   DE.=R; SYN=REFERENCE; TY=CHAR;
   DE.=T; SYN=TITLE; PAR=R; TY=CHAR;
   DE.=D; SYN=DATE; PAR=R; TY=CHAR;

The meaning and allowed values of
the database definition attributes are:

FILE=;
   This is not yet used by the system.

DE=;
   This is the name that will be used when the data element is output. Names must be unique within a database. They may not contain embedded blanks, equal signs, or semicolons. The size of the FDT will be minimized and the editor's processing of input to the defined database will be slightly more efficient if the data element names all have similar lengths (i.e. all fit into the same number of computer words).

SYN=;
   Synonyms may be used interchangeably with the preferred data element name for input and in queries. They must satisfy the same rules of construction. Again, it is preferable for all synonyms to have similar lengths.

PAR=;
   This specifies the parent to which a data element will be linked in the hierarchical record structure. The parent data element must have been defined previously. PAR is omitted for record-level data elements.

TY=;
   Allowed values are:

      INTEger   integer (vector)
      REAL      real (vector)
      DOUBle    double precision real (vector)
      CHAR      character string
      BIT       bit string
      NODE      pure node

   Note that double precision and bit string types can be used procedurally but not in the editor or user query languages.

VIRTUAL;
   If this follows a DE specification, the data element will be discarded when a record is stored in the database. It may be input, however, and may be used as a key. It will still exist in the record when the SPROC routine is called. If it is desired that a virtual data element be rematerialized when a record is fetched from the database, this may be done by the FPROC routine.

KEY;
   If this follows a DE specification, an index will be maintained for the data element.

LENGTH=;
   This may follow a KEY attribute. If absent, the key length defaults to 1 word.

IPROC=;
   If this follows a DE specification, the IPROC routine will be called by the editor when the data element is input.
   Allowed format specifications are:

      EXTERNAL  the DE value is passed to and returned by IPROC in external format, i.e. unpacked character string.
      INTERNAL  the DE value is passed to and returned by IPROC in internal format, i.e. internal binary numeric representation, packed character string, or bit string.
      CONVERT   the DE value is passed to IPROC in external format and returned by IPROC in internal format.

OPROC=;
   If this follows a DE specification, the OPROC routine will be called by the list routine before the data element is output. Allowed format specifications are the same as for IPROC except that CONVERT has the opposite meaning, i.e.

      CONVERT   the DE value is passed to OPROC in internal format and returned by OPROC in external format.

KPROC=;
   If this follows a DE specification, the KPROC routine will be called prior to storing a DE value in an index. Allowed format specifications are the same as for IPROC.

QPROC=;
   If this follows a DE specification, the QPROC routine will be called by the query interpreter when it encounters the DE in a query. Allowed format specifications are the same as for IPROC.

IPROC, OPROC, KPROC, and QPROC are not valid attributes for a pure node type DE.

2.3 Compiler Commands

The commands recognized by the compiler are described in this section. Only the first four letters of a command are necessary for recognition.

DEFIne
   This command invokes the editor, which initially will be in append mode, so that a new database definition may be input. This creates a database definition record in core that has the following hierarchical structure:

      (Database Definition Record)
       |
       +-- FILE
       +-- DE
            +-- SYN
            +-- PAR
            +-- TY
            +-- VIRTUAL
            +-- KEY
            |    +-- LENGTH
            +-- IPROC
            +-- OPROC
            +-- KPROC
            +-- QPROC

   VIRTUAL and KEY are pure node type and LENGTH is integer type. All other data elements are CHAR type.
   As this record is being input, errors or omissions may be remedied using the edit mode commands *I, *R, *S, and *D just as if it were a data record. It may be listed at any time with the editor command *L. When the definition is complete, the editor must be terminated by a ** command.

COMPile
   This command initiates compilation of the database definition created by a preceding DEFINE command. The compiler responds with

      COMPILING DEFINITION

   followed by either

      COMPILATION SUCCESSFUL

   or a list of errors detected and

      COMPILATION TERMINATED

MODIfy
   This command invokes the editor to allow correction of the errors detected in a preceding compilation. When the changes have been made, the editor must be terminated by a ** command. It differs from a DEFINE command in that DEFINE will completely erase an existing definition record while MODIFY allows changes to an existing definition record.

LINE,
   This sets the input/output line length to the given number of characters. The default is 80 characters.

STOP
   This terminates execution of the compiler. If the definition has been compiled successfully, an empty database will exist ready for the addition of data records.

The AUDIT facility is always turned on when the database definition compiler is run; all input to the system from the input file (the terminal when running interactively) is echoed on the audit file. This provides a record of the definition that may be edited and used as batch-mode input to the compiler if the definition needs to be changed later. This copy of the definition might also be saved so that it could be used when reloading a dumped copy of the database, e.g. following transmittal to another site.

2.4 Compiler Error Messages

The error messages that may be generated by the database definition compiler are summarized in this section. The prefix ***ERROR***, which is common to all messages, has been omitted.
ILLEGAL RECURRENCE OF RECORD LEVEL ATTRIBUTE
   One of the attributes FILE or DE has recurred without a following period. It is ignored.

INVALID COMMAND
   A command has been input whose first four letters do not match any of those in Section 2.3.

INVALID PARENT SPECIFIED FOR DE.N
   The parent specified for DE.N has not been defined prior to defining DE.N or is misspelled.

INVALID TYPE SPECIFIED FOR DE.N
   Valid types are INTEger, REAL, DOUBle, CHAR, BIT, or NODE.

INVALID IPROC TYPE SPECIFIED FOR DE.N
INVALID OPROC TYPE SPECIFIED FOR DE.N
INVALID KPROC TYPE SPECIFIED FOR DE.N
INVALID QPROC TYPE SPECIFIED FOR DE.N
   Valid types are EXTERNAL, INTERNAL, or CONVERT.

NO DATA ELEMENTS DEFINED
   The database definition contains no DE attribute specification statements at all.

NO TYPE SPECIFIED FOR DE.N
   Every data element must have a type specification.

PROC SPECIFIED FOR PURE NODE DE.N
   Since a pure node type data element carries no value, it is meaningless for a processor to operate on it.

WORK SPACE EXCEEDED -- COMMAND SKIPPED
   The database definition is too large to compile in the available work space. The compiler must be recompiled with a larger work space before proceeding.

WORK SPACE EXCEEDED -- RUN ABORTED
   There was not even enough work space to initialize the system. This is an unlikely occurrence indicative of system malfunction.

3.1 Introduction

The BDMS executive is a self-contained facility that allows a user to access and maintain an existing database. Commands are provided to add new records to the database, to search for records satisfying stated criteria, and to subsequently display, print, dump, modify, and delete such records. In addition, several commands allow user control over certain aspects of the system's operation such as input/output, line length, and whether to maintain an audit trail.
An overview of the executive will be presented in this chapter, and the use of most commands will be described in detail. Two areas are sufficiently involved to warrant separate treatment in the following two chapters: the editor and the query language.

3.2 Executive Commands

All executive commands take the form of an English-language verb followed optionally by a comma and an integer identifying the record to be acted upon. The commands are terminated by a blank and hence may not contain embedded blanks. Whenever the executive is waiting for a user command, it issues the message

   MODE:DBname:EXEC>

The command verbs recognized by the executive are:

   ADD           add new records to the database.
   FIND          retrieve and make current a set of records.
   SET           make a previously created set current.
   PURGE         purge all previously created sets or the last set created.
   LIST          list at the terminal the current set or a selected record.
   PRINT         print off-line the current set or a selected record.
   DUMP          dump the current set or a selected record in external format.
   SIGNIFICANCE  set the number of significant figures to be printed for real values.
   PRECISION     set the number of decimal places to be printed for real values.
   MODIFY        modify a record selected from the current set.
   DELETE        delete from the database a record selected from the current set.
   LINE          set input/output line length.
   AUDIT         turn on audit facility (default).
   NOAUDIT       turn off audit facility.
   STOP          terminate program execution.

In somewhat more detail, these commands function as follows.

ADD
   This command informs the executive that one or more new records are to be added to the database. An empty record will be created and assigned the next available Record ID. Control is then transferred to the editor so that the user may enter data in the record(s). (See Chapter 4 for instructions on the use of the editor.)
   When the user exits from the editor (with a ** or *C editor command), the number of records added to the database is reported in the form

      RECORD(S) ADDED

FIND
   This command invokes the query subsystem and is followed by a condition terminated by a ** delimiter. The database will be searched for records satisfying the condition and, if any are found, they form a new current set on which subsequent LIST, PRINT, DUMP, MODIFY, or DELETE commands will act. The set identifier and number of records in the set are reported to the user in the form

      RECORD(S) IN SET

   The format of queries will be covered in detail in Chapter 5.

SET,
   This command makes a previously created set the current set on which subsequent LIST, PRINT, DUMP, MODIFY, and DELETE commands will act.

PURGE,
   This command purges a previously created set, freeing the disc and work space it uses. If the set number is omitted, all existing sets are purged. (At present, only the last-created set, or all sets, may be purged with this command.)

LIST,,,....
   This command lists the n'th record in the current set on the system file log (which is the terminal when running interactively). If the number is omitted, all records in the set will be listed. Each data element occurrence will begin on a new line and will be numbered if several occurrences of that data element are linked to the same parent occurrence, or if it is a record-level data element that occurs more than once. Subordinate data elements will be listed following the parent occurrence to which they are linked and indented according to their level in the record structure.

   If the list of data element names is omitted, the last display or suppress command issued controls which data elements are displayed. If no display or suppress command has been issued, all data elements are displayed.
   If a list of data element names separated by commas follows the command verb (and record number, if present), only those data elements in the list will be displayed. The entire command, including the list of data element names, must not contain embedded blanks.

PRINT,,,....
   This is identical to LIST, except that the record(s) is listed on the system file print, which can be disposed for printing off-line when the job is concluded.

DUMP,,,....
   This command is similar to PRINT, except that the record(s) is dumped on the system file dump in a format readable by the editor. This file may be used subsequently to load another database.

SIGNIFICANCE,
   This resets the number of significant figures to be displayed for real values to the value given. The default is 5 figures.

PRECISION,
   This sets the number of decimal places to be displayed for real values to the value given. Setting the precision overrides the default, which is 5 significant figures.

MODIFY,
   This command causes the n'th record of the current set to be fetched from the database into core and then passes control to the editor. The record can be modified as desired using editor commands. When all editing is complete, control is returned to the executive, which restores the record in the database and updates all indices to reflect the changes in the record. Successful completion of this operation generates the message

      RECORD MODIFIED

DELETE,
   This causes the n'th record of the current set to be removed from the database. All indices will be updated to reflect this action. Successful completion generates the message

      RECORD DELETED

LINE,
   This sets the input/output line length to the given number of characters. The default is 80 characters.

AUDIT
   This command turns on the audit facility (if it has been turned off previously with a NOAUDIT command). While it is on, all input to the system from the input file (the terminal when running interactively) is echoed on the audit file.
   Besides providing a record of activity, this file may be used to update a backup copy of the database if a run is terminated abnormally (e.g. a file-preserving system crash).

NOAUDIT
   This command turns off the audit facility. Normally, it should be the first command in a batch run, since in batch mode the input file may be saved and used in lieu of an audit file.

STOP
   This signals the end of a run. The executive will perform necessary housekeeping functions and terminate.

3.3 Executive Error Messages

The error messages that may be generated by the executive are summarized in this section. The prefix ***ERROR***, which is common to all messages, has been omitted.

INVALID COMMAND
   Either the command verb was unrecognized, or the command syntax was incorrect.

INVALID RECORD NUMBER
   The record number specified was larger than the number of records in the current set.

INVALID SET NUMBER
   The set number specified was larger than the number of existing sets.

RECORD NOT DELETED
   It was not possible to delete the specified record.

RECORD NOT FULLY DE-INDEXED
   If a record was being deleted, it was not possible to remove some or all of its index entries. If a record was being modified, it was not possible to remove some or all of the index entries for those key data elements whose values were changed. As a result, the record may satisfy some queries that it should not.

RECORD NOT FULLY INDEXED
   If a new record was being added, it was not possible to make some or all of its index entries. If a record was being modified, it was not possible to make some or all of the new index entries for those key data elements whose values were changed. As a result, the record will not satisfy some queries that it should.

RECORD NOT RETRIEVED
   It was not possible to retrieve a record specified in a LIST, PRINT, DUMP, MODIFY, or DELETE command.

RECORD NOT STORED
   It was not possible to store a new or modified record.
WORK SPACE EXCEEDED
There was not sufficient work space to carry out some requested operation. Before the operation can succeed, the system must be recompiled with a larger work space.

4.1 Editor 22 Introduction
4.1 Introduction

The EDITOR is the subsystem of BDMS which is invoked by the executive to create a new record in core prior to its addition to a database (add executive command) or to modify an existing record which has been fetched into core from the database (modify executive command). It is also invoked by the database definition compiler to create or modify the definition of a new database.

It has two operating modes: append mode, which is used to build or extend a record, and edit mode, which allows selective replacement or modification of data element values and insertion or deletion of data element occurrences. Normally, a new record will be created in append mode, but one might shift into edit mode to correct an error or omission before the record is stored. Likewise, most editing of an existing record probably will be done in edit mode, but append mode might be used to extend the record by simply adding data element occurrences.

The editor reads free format input comprising commands, data element names, and assignment statements. The commands tell it what operations to perform on the record, thus setting the operating mode. Data element names may occur alone or as part of an assignment statement, depending on the operation specified by the last command. In edit mode they usually are qualified by one or more occurrence numbers which identify the particular data element occurrence to be altered. Assignment statements assign values to data element occurrences.

Section 4.2 of this chapter will describe in detail the form of assignment statements to prepare the reader for the discussion of editor commands and operating modes in Sections 4.3-4.5. Section 4.6 is a summary of the error messages that might be encountered while using the editor.
4.2 Editor 23 Assignment Statements
4.2 Assignment statements

When a value is to be assigned to a data element occurrence, it is done with an assignment statement whose general form is -

<data element name> = <value>;

DATA ELEMENT NAMES

A data element name is either the preferred name or one of the synonyms specified in the database definition. In edit mode, it may be qualified by one or more occurrence numbers -- integers separated from the name and each other by periods. Occurrence numbers will be discussed in Section 4.5.

NUMERIC VALUES

The value of an integer type data element must be a series of digits preceded optionally by a + or - sign. The value may be expressed as a binary, octal, decimal, or hexadecimal number; the base is specified by a single letter following the number as follows -

B            binary
O or Q       octal
D or absent  decimal
H or Z       hexadecimal

The set of allowable digits is that subset of (0-9, A-F) appropriate for the base chosen.

The value of a real or double type data element must be a series of decimal digits preceded optionally by a + or - sign. It may, in addition, have a decimal point and/or a power-of-ten multiplier expressed as an e followed by the integer exponent with an optional sign.

Blanks anywhere within a numeric value are ignored and the value is terminated by a semicolon. Numeric values must not exceed the maximum size which can be stored as a single precision number (or double precision number for double type) by the computer being used.

A numeric vector value is represented as a series of component values constructed according to the above rules, separated by commas and terminated by a semicolon. For example,

X = 1.5,3.2,7.6;

CHARACTER STRING VALUES

The value of a char type data element is the character string between the relational operator and a terminating semicolon. Leading and trailing blanks are ignored but embedded blanks are considered to be part of the value.
If leading and/or trailing blanks are desired, they may be forced through use of the symbol for logical negation (a backward slash in the ASCII character set). Each leading or trailing negation symbol is converted into a blank when the value is stored. If the character string contains semicolons, they must be doubled to avoid confusion with the terminating semicolon; any semicolon immediately followed by another semicolon will be stored as a single semicolon and will not terminate the string value.

NULL VALUES

A data element occurrence of any type may be assigned a null (i.e. undefined) value by an assignment statement of the form

<data element name> = ;

This is occasionally useful if it is necessary to enter a value for a subordinate data element but no value is known for its parent.

4.3 Editor 25 Editor Commands
4.3 Editor Commands

The editor is controlled by commands which either set its operating mode and determine how subsequent data element names and assignment statements will be interpreted, or cause some immediate action to be taken. All commands are single letters preceded by an asterisk and followed by a blank. They will be summarized here and discussed at length in the following two sections.

COMMANDS THAT SET OPERATING MODE

APPEND MODE
*A  append data element occurrence(s). This is the default mode.

EDIT MODE
*R  replace data element value(s).
*S  substitute string(s) in data element value(s).
*D  delete data element occurrence(s).
*I  insert data element occurrence(s).

COMMANDS THAT CAUSE IMMEDIATE ACTION

**  exit from editor - record will be stored or replaced by edited version.

*C  cancel edit - do not store new or altered record. This command simply terminates the editor and returns control to the executive. If a new record was being input, it will not be stored in the database. If an existing record was being edited, no changes made will be reflected in the database.
It is primarily useful in an interactive session to terminate a hopelessly confounded edit or one which was mistakenly begun.

*L  list the present state of the record. This command allows a record to be visually checked anytime while it is being created or edited. It does not affect the operating mode or alter the edit command in effect when it is given.

*E  end of record. This marks the end of a record when multiple records are being added. It is usually not needed (cf. Section 4.4, subsection on automatic record generation). When modifying a record, it has the same effect as **.

4.4 Editor 27 Append Mode
4.4 Append Mode

When first entered, the editor is in append mode. It may be returned to append mode at any time with a *A command.

ORDER OF ASSIGNMENT STATEMENTS

In append mode the order in which assignment statements and pure node data element names are entered determines the order of the data element occurrences created and their linkage to parent occurrences. The occurrences of a data element linked to each occurrence of its parent data element (or the record level data element) form an ordered list. In general, each assignment statement or pure node name creates a new occurrence of a data element which will be added to the end of the list of occurrences that is linked to the last-created occurrence of its parent data element. For example, if data were being added to the database described in Chapter 1,

AUTHORS.; A.=Jones; A.=Smith; I.=LBL;
AUTHORS.; A.=Baker; I.=UCB;

would create two authors groups, the first with two authors at LBL and the second whose single author is affiliated with UCB. The periods following the data element names mean 'next'. A data element name without a period means 'first' or occurrence number 1. The next two subsections will elucidate the significance of these conventions.
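The linkage rule for names entered with a trailing period can be illustrated with a short Python sketch. This is not BDMS code: the function append_statements, the PARENT_OF table, and the dictionary layout are hypothetical, and the sketch assumes that every name is entered as 'next' and that AUTHORS is the parent of A and I, as in the example above.

```python
# Hypothetical schema for the example: AUTHORS is a record-level node,
# A (author) and I (institution) are its subordinates.
PARENT_OF = {"AUTHORS": None, "A": "AUTHORS", "I": "AUTHORS"}


def append_statements(statements):
    """Build a record from (name, value) pairs, all entered as 'next'."""
    record = []   # record-level occurrences, in order of creation
    last = {}     # data element name -> last-created occurrence
    for name, value in statements:
        occurrence = {"name": name, "value": value, "children": []}
        parent = PARENT_OF[name]
        if parent is None:
            record.append(occurrence)
        else:
            # Add to the end of the list linked to the last-created parent.
            last[parent]["children"].append(occurrence)
        last[name] = occurrence
    return record


# AUTHORS.; A.=Jones; A.=Smith; I.=LBL; AUTHORS.; A.=Baker; I.=UCB;
record = append_statements([
    ("AUTHORS", None), ("A", "Jones"), ("A", "Smith"), ("I", "LBL"),
    ("AUTHORS", None), ("A", "Baker"), ("I", "UCB"),
])
```

In this model each new occurrence is attached to whichever parent occurrence was created most recently, which is why the second AUTHORS. entry redirects the following A and I values to a new group.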
Other than having data element occurrences in the desired order and following the correct parent data element occurrence, the order of assignment statements in append mode is immaterial. An assignment statement for the reference, R, could have been inserted anywhere in the preceding example without affecting the creation of the authors groups.

AUTOMATIC PARENT NODE GENERATION

In append mode, node type parent data element names can usually be omitted from the input stream. The occurrences of such data elements necessary to link together subordinate data element occurrences will be generated automatically if

1) one of the subordinate data elements is encountered in the input stream and no parent occurrence yet exists, or

2) occurrence number 1 of one of the subordinate data elements is encountered, i.e. the data element name appears without a period following it, and the last-created parent occurrence already has at least one occurrence of that data element linked to it.

Thus, the preceding example could have been entered in the simpler form

A=Jones; A.=Smith; I=LBL; A=Baker; I=UCB;

or equivalently,

I=LBL; A=Jones; A.=Smith; I=UCB; A=Baker;

Each occurrence of A in the first case or I in the second causes the creation of an authors node.

Automatic parent node generation will also work if the parent data element is not node type. In this case, an automatically generated parent occurrence will have a null value.

AUTOMATIC RECORD GENERATION

When adding records, if a record-level data element name appears more than once without a following period, the second occurrence will trigger the storage of the previous record and begin a new one just as if a *E command had been encountered. The second occurrence of the record-level data element will become a part of the new record.
The record itself may be considered to be the pure node parent of all record-level data elements, so automatic record generation is completely analogous to the automatic parent node generation described in the preceding subsection.

If the editor has been invoked by a modify executive command, automatic record generation is meaningless, so in append mode a recurrence of a record-level data element name without a period will be flagged as an input error.

MULTIPLE ASSIGNMENT STATEMENTS

Several consecutive occurrences of a data element may be created without the necessity of repeating the data element name. This is done with a multiple assignment statement, whose most general form is

<data element name> = <value>; <value>; ...

Using a multiple assignment statement, the preceding example could have been input in the still simpler form

A=Jones; Smith; I=LBL; A=Baker; I=UCB;

or equivalently,

I=LBL; A=Jones; Smith; I=UCB; A=Baker;

The difference between the two assignment statements

X=1,2,3;

and

X=1;2;3;

should be clearly understood. The first creates a single occurrence of X whose value is a three-component vector (assuming X is defined to be a numeric type data element). The second creates three occurrences of X whose values are 1, 2, and 3, respectively.

In order to implement multiple assignment statements, the input processor will attempt to interpret anything which is not a recognizable data element name or assignment statement as another value in the preceding assignment statement. If a data element name is misspelled, the erroneous assignment statement cannot be distinguished from another value for a preceding char type data element. It will only be recognized as an error if the preceding data element is not char type.

4.5 Editor 30 Edit Mode
4.5 Edit Mode

Any of the commands *R, *S, *D, or *I places the editor in edit mode.
NAMING DATA ELEMENT OCCURRENCES

In edit mode, the data element occurrence to be altered is identified by specifying its position within the record structure. To completely identify a particular data element occurrence, one must specify its name and occurrence number, the number of the parent occurrence to which it is linked, the number of the grandparent occurrence to which its parent is linked, etc., up to the record-level ancestor. The series of occurrence numbers is appended to the data element name in this order, i.e., in order of increasing remoteness of the ancestor, as a series of integers separated from the name and each other by periods.

Referring to the example record structure of Chapter 1, A.4.2 would identify the fourth author, A.4, of the second authors group, AUTHORS.2. The (first and only) X value in the second datum in the first table would be identified as X.1.2.1, etc.

Under certain circumstances it is not necessary to specify all these occurrence numbers since the editor will supply default values. This will be discussed below in the subsection on path memory.

REPLACE COMMAND (*R)

This is followed by assignment statements which assign new values to existing data element occurrences. For example,

*R A.2.1=J.Doe;

would cause the value of A.2.1 to be replaced by J.Doe. It is not meaningful to replace a pure node type data element since it carries no value.

SUBSTITUTE COMMAND (*S)

This allows the replacement of a selected substring within a char type data element value. It works like the *R command except that the data element values within assignment statements which follow a *S command have the form

<delimiter><old substring><delimiter><new substring><delimiter>

The <delimiter> may be any character that does not occur in either the <old substring> or the <new substring>. All blanks within the substrings are significant. The <new substring> will replace the first substring in the data element value that matches the <old substring>. If the <old substring> is null, the <new substring> will be inserted at the beginning of the data element value.
If the <new substring> is null, the first substring in the data element value that matches the <old substring> will be deleted. For example, if A.2.1 has the value J.Deo, the command

*S A.2.1=/eo/oe/;

would change it to J.Doe. The command

*S A.2.1=//B/;

would change that value to BJ.Doe, while the command

*S A.2.1=/J//;

would leave as a final value B.Doe. The modified data element value is echoed following execution of a *S command so that the correctness of the substitution may be ascertained.

DELETE COMMAND (*D)

This is followed by the names of the data element occurrences to be deleted, each terminated by a semicolon. All occurrences of subordinate data elements linked to a deleted data element occurrence are also deleted. For example,

*D DATUM.2.1;

would delete DATUM.2 in TABLE.1 and any occurrences of X, Y, DY, and CD linked to it.

After a data element occurrence has been deleted, any occurrences of that data element linked to the same parent occurrence and following the deleted occurrence will be identified by occurrence numbers one smaller than originally.

INSERT COMMAND (*I)

This is followed by assignment statements (or occurrence names for pure node type data elements). Each of them will cause an occurrence of a data element to be inserted before the named occurrence. For example, if the author name which should have been A.2 in AUTHORS.1 had been mistakenly omitted from a record, it could be inserted with the command

*I A.2.1=J.Doe;

After a data element occurrence has been inserted, any occurrences of that data element linked to the same parent occurrence and following the inserted occurrence will be identified by occurrence numbers one larger than originally. Hence the inserted occurrence becomes the named occurrence.
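The renumbering caused by an insertion can be pictured by treating the occurrences linked to one parent as an ordered list with 1-based positions. The following Python sketch is only an illustration under that assumption; the helper names insert_before and number_of and the author names are made up, and only occurrence numbers that already exist are handled.

```python
def insert_before(occurrences, number, value):
    """Insert value so that it becomes occurrence `number` (1-based)."""
    if not 1 <= number <= len(occurrences):
        raise ValueError("sketch handles only existing occurrence numbers")
    occurrences.insert(number - 1, value)


def number_of(occurrences, value):
    """Return the current 1-based occurrence number of value."""
    return occurrences.index(value) + 1


# Occurrences of A linked to AUTHORS.1 (made-up names).
authors = ["A.Able", "C.Cole", "D.Dunn"]

# Corresponds to:  *I A.2.1=J.Doe;
insert_before(authors, 2, "J.Doe")
```

After the call, J.Doe is occurrence 2, and the occurrences that followed the insertion point (C.Cole and D.Dunn) have moved from 2 and 3 to 3 and 4.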
In order to add a data element occurrence after all existing occurrences linked to a given parent occurrence, one either specifies an occurrence number greater than the number of occurrences already existing, or specifies occurrence number 0, or simply omits the last qualifier, but not its preceding period. Hence, the command in the preceding example would have the desired effect even if AUTHORS.1 had only one subordinate A occurrence prior to the insertion.

A multiple assignment statement following an insert command causes several new occurrences to be inserted before the specified occurrence of the data element. They will occur in the modified record in the order of their appearance in the assignment statement. After insertion, any occurrences of that data element linked to the same parent occurrence and following the inserted occurrences will have their occurrence numbers incremented by the number of inserted occurrences. For example,

*I A.4.2 = J.Smith; B.Jones;

would leave AUTHORS.2 with A.4=J.Smith; and A.5=B.Jones;. If an A.4 existed prior to insertion, it would now be A.6, etc.

ORDER OF EDITS

One point made in the preceding discussion warrants further emphasis - data element occurrence numbers are relative list positions and can change during editing. After data element occurrences are inserted or deleted, any occurrences following the insertion or deletion point (linked to the same parent occurrence) must be identified by occurrence numbers larger or smaller than originally.

In order to avoid confusion, especially when working from a printed listing or doing batch updates, replacements and substitutions should be done first, and insertions and deletions should be done from the bottom up, i.e. those data element occurrences with the highest occurrence numbers should be edited first. Then successively modified occurrences will retain the occurrence numbers of the original record.
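The advantage of working from the bottom up can be seen in a small Python sketch that again models the occurrences under one parent as a 1-based list. The function delete_occurrences and the sample values are hypothetical and serve only to illustrate the renumbering effect; they are not part of BDMS.

```python
def delete_occurrences(values, numbers, bottom_up=True):
    """Delete the given 1-based occurrence numbers one at a time.

    The numbers are taken from a listing of the ORIGINAL record, as they
    would be when working from a printed listing.
    """
    result = list(values)
    for n in sorted(numbers, reverse=bottom_up):
        del result[n - 1]
    return result


original = ["a", "b", "c", "d", "e"]

# Intention: remove original occurrences 2 ("b") and 4 ("d").
bottom_up = delete_occurrences(original, [2, 4], bottom_up=True)
top_down = delete_occurrences(original, [2, 4], bottom_up=False)
```

Deleting occurrence 4 first leaves occurrence 2 where the listing says it is, so the bottom-up result is a, c, e. Deleting occurrence 2 first shifts the later occurrences down by one, so the second deletion removes the original occurrence 5 instead of 4.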
If one does become confused while working interactively, the current values of the occurrence numbers may be ascertained by listing the record with a *L command.

PATH MEMORY

In edit mode, the editor will attempt to supply a default parent hierarchy for a data element being modified if the user has not given complete position information. This is implemented with a path memory, which works as follows. The editor remembers the path taken through the record structure to arrive at the position of a modification. Then, if the position of the next modification is not completely specified, an attempt will be made to link whatever partial path is specified to the path remembered from the last modification. If this linkage can be achieved, the resulting path will be used as the position of the new modification and the path memory updated. The path memory is cleared when the editor enters append mode.

The best way to understand the effect of this general mechanism is through the consideration of some specific examples.

1. Suppose one wishes to replace the values of several A's within AUTHORS.2. This can be accomplished with the command

*R A.1.2=B.Jones; A.2=E.Smith; A.4=C.Baker;

The occurrence of the parent data element AUTHORS is not specified for A.2 and A.4 and so it is assumed to be AUTHORS.2, as specified for A.1.

2. Suppose that the entire second authors group was inadvertently skipped when a record was initially input. It could be inserted with the command

*I AUTHORS.2; A=B.Jones; E.Smith; J.Doe; C.Baker; I=LBL;

Since no parent occurrence is specified for A.1, A.2, A.3, A.4, and I.1, it is picked up from the path memory as AUTHORS.2.

3. Suppose that a new third DATUM is to be added to TABLE.2. It can be inserted with the command

*I DATUM.3.2; X=1.537; Y=3.206; DY=0.001; CD=new data points;

Since no parent occurrence is specified for X.1, Y.1, and CD.1, it is picked up from the path memory as DATUM.3 in TABLE.2.
For DY.1, it is assumed to be Y.1 in DATUM.3 in TABLE.2.

4. Suppose that after the edit in 3., one desires to replace the value of X.1 in DATUM.2 in TABLE.2. The command

*R X.1.2=2.372;

will accomplish this. It is not necessary to give a complete position specification for X, namely X.1.2.2, because the default occurrence of TABLE resulting from the previous edit is TABLE.2.

4.6 Editor 35 Editor Error Messages
4.6 Editor error messages

In this section, the editor error messages will be summarized and their meanings elaborated. It must be clearly recognized, however, that the editor is designed to make every effort to find a legal interpretation for all input it processes. The implications of this were discussed in Section 4.4 (subsection on multiple assignment statements).

In general, when an error is detected, that part of the input stream which caused the error will be skipped and processing will continue with the following input. In append mode, this means that some data element occurrence will not be created. In edit mode, some edit operation will not be performed. If one is working interactively, the error usually can be corrected immediately. In batch mode, one error will sometimes cause several others, e.g. a parent data element occurrence is not created due to a misspelling, causing several subordinate data element occurrences to be linked incorrectly to a previously created parent occurrence.

All error messages issued by the editor are preceded by ***ERROR***. This prefix has been omitted in the following list of messages.

DATA ELEMENT VALUE IN DELETE COMMAND
It is not meaningful to specify a new value for a data element occurrence being deleted. Only data element names terminated by semicolons may follow a delete command.

ILLEGAL RECURRENCE OF RECORD-LEVEL DATA ELEMENT
A record-level data element name without a period occurred in append mode while modifying a record which already has an occurrence of that data element.
While adding records, this is not an error but will cause the automatic generation of a new record.

INCOMPLETE ASSIGNMENT STATEMENT
In batch mode, an end-of-file condition has been sensed while processing a data element name or value. Probably the terminating semicolon was omitted.

INCORRECT QUALIFIER
The editor could not determine the path to a data element occurrence based on the occurrence numbers given and the current state of the path memory. This message is also given if too many occurrence numbers are specified for the depth of a data element within the hierarchical record structure.

INVALID LENGTH RETURNED BY IPROC
The user-supplied IPROC routine has returned a negative data element length.

INVALID NUMERIC DATA ELEMENT VALUE
The value specified for a numeric data element does not have a form valid for its type.

MISSING DATA ELEMENT VALUE
A non-node type data element name is followed immediately by a semicolon. This is only allowed in a delete command.

MULTIPLE ASSIGNMENT STATEMENT IN REPLACE OR SUBSTITUTE COMMAND
Only one data element occurrence at a time may be replaced. Multiple assignment statements are not meaningful following a replace or substitute command.

NO MATCHING SUBSTRING
The old substring specified in a substitute command does not occur in the string being scanned.

NON-CHAR DATA ELEMENT IN SUBSTITUTE COMMAND
A substitute command may only operate on char type data elements.

PARENT DATA ELEMENT MISSING
An edit operation cannot be performed because an indicated parent data element occurrence does not exist.

PURE NODE DATA ELEMENT IN REPLACE OR SUBSTITUTE COMMAND
Since a node type data element carries no value, it is meaningless to replace it. Only assignment statements may follow a replace or substitute command.

QUALIFIED DATA ELEMENT NAME IN APPEND MODE
Data element names may not be followed by occurrence numbers in append mode.
SPECIFIED OCCURRENCE DOES NOT EXIST
The data element occurrence specified in an edit command does not exist. An occurrence number probably was input incorrectly.

SUBSTITUTE COMMAND SYNTAX
The syntax of a substitute command is incorrect, e.g. the delimiter character does not occur exactly three times or there are non-blank characters between the third delimiter and the terminating semicolon.

UNDEFINED DATA ELEMENT NAME
The input stream contains a data element name which could not be recognized and could not be interpreted as another value in a preceding assignment statement.

WORK SPACE EXCEEDED
A data element name or value is longer than the available work space or there is no more space available for expansion of the path memory. In general, the system must be recompiled with a larger work space to guarantee that the error will not recur when accessing this database. Sometimes, storing the record and then retrieving it again will free enough work space to alleviate the problem and allow further editing. This is likely if extensive edits have resulted in a large amount of dead space in the record.

4.7 Editor 38 List Error Messages
4.7 List error messages

Certain error conditions that generate messages can arise while listing a record. They are summarized in this section. All error messages issued by the list routine are preceded by ***ERROR***. This prefix has been omitted in the following list of messages.

INVALID LENGTH RETURNED BY OPROC -- DATA ELEMENT SKIPPED
The user-supplied OPROC routine has returned a negative data element length.

INVALID MODE PARAMETER -- LIST ABORTED
This will only occur when the user calls list directly with an invalid mode parameter.

RECORD BUFFER EMPTY -- LIST ABORTED
There is no record in core to be listed, not even an empty one. This will only occur if a user calls list directly without properly initializing the record buffer.
WORK SPACE EXCEEDED -- DATA ELEMENT SKIPPED
WORK SPACE EXCEEDED -- LIST ABORTED
The record is so large that there is insufficient work space for list to operate. The system should probably be recompiled with a larger work space for use with this database.

5.1 Retrieval 39 Introduction
5.1 Introduction

The BDMS query language permits a user to search a database for those records satisfying an arbitrarily complex condition on key (indexed) data element values. The condition is constructed as a boolean combination of key value specifications, including inequalities and ranges. Furthermore, it is possible to search for records having an occurrence of a specified data element regardless of value, or for those having an occurrence of the data element with a null value. Truncated value specification for character string keys may be used to search for those records having an occurrence of the data element beginning in a particular way.

The retrieval facility is invoked by the find executive command. This must be followed by a condition that is terminated by a **, i.e.

FIND <condition> **

The set of records that satisfy the condition is assigned a set identification number, and the number of records in the set is reported in the form

<n> RECORD(S) IN SET <m>

If no records satisfy the condition, the response is

0 RECORD(S) IN SET

The number assigned to a non-empty set may be used in subsequent queries in combination with further conditions. The next two sections describe the format of conditions.

5.2 Retrieval 40 Simple Conditions
5.2 Simple conditions

A simple condition has the form -

<simple condition> = <data element name> followed by one of

   {= | > | >=} <value>; [TO [{= | < | <=}] <value>;]
   {< | <= | <>} <value>;
   ;

where curly brackets {} surround a set of options (separated here by |), one of which must be chosen, and square brackets [] surround a completely optional element. The relational operators <=, >=, <> stand for 'less than or equal', 'greater than or equal', and 'not equal', respectively.
Thus, an exact value, an inclusive or exclusive upper or lower bound, or a range of values may be specified. Examples of valid simple conditions are -

A = Jones;
A <> Smith;
X <= 7;
X = 5; TO 8;
X > 5; TO < 8;

The last two examples differ in that the second excludes both endpoints of the range.

The condition

<data element name> <> <value>;

will be satisfied by all records which

a) have at least one occurrence of the named data element, and
b) have no occurrence of that data element whose value matches that given.

If no value appears after the relational operators = or <>, a search is made for those records which respectively do or do not have an occurrence of that data element with a null value. Note that this corresponds to the syntax used to enter a null value using the editor (cf. Chapter 4). For example,

A =;

would result in a search for all records having a null value for A, and

A <>;

defines the complement set.

The last form allowed for a simple condition, e.g.

<data element name>;

defines the set of all records having any occurrence of the specified data element, regardless of value.

The format of a value depends on the data element type and is the same as that accepted by the editor (cf. Chapter 4). Key values in an index are fixed length, derived from the corresponding data element values by truncation or padding; data element values in a query will be truncated or padded in the same way to ensure valid comparison with index values.

5.3 Retrieval 42 Complex Conditions
5.3 Complex conditions

The most general condition that may appear in a query is constructed out of simple conditions and previously defined sets according to the following recursive definition -

<condition> = [NOT] {<simple condition> | <set number>} [{AND | OR} <condition>]

That is, simple conditions and previously-defined sets, identified by number, may be combined using the boolean operators NOT, AND, and OR. NOT has the highest precedence and OR the lowest; this ordering may be overridden through use of parentheses.
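The effect of operator precedence can be illustrated by modeling each set as a Python set of record IDs, with NOT taken as the complement relative to the whole database. The record IDs, the three sets, and the helper complement below are invented for illustration and do not describe how BDMS evaluates a query internally.

```python
# Hypothetical record IDs for a six-record database and three existing sets.
universe = {1, 2, 3, 4, 5, 6}
set1 = {1, 2, 3}
set2 = {3, 4, 5}
set3 = {1}


def complement(s):
    """NOT: every record in the database that is not in s."""
    return universe - s


# A condition of the shape  NOT s1 AND s2 OR s3  groups as
# ((NOT s1) AND s2) OR s3, since NOT binds tightest and OR loosest.
default_grouping = (complement(set1) & set2) | set3

# Parentheses override the ordering:  NOT s1 AND (s2 OR s3)
with_parentheses = complement(set1) & (set2 | set3)
```

With these made-up sets the default grouping selects records 1, 4, and 5, while the parenthesized form selects only records 4 and 5, so the placement of parentheses changes the result.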
Note that simple conditions and set numbers play equivalent roles in a complex query. This is because each simple condition may be viewed as defining an (intermediate) set. These sets, along with any existing sets appearing in the query, are then combined by union (OR), intersection (AND), and complement (NOT).

It should be noted that the two conditions

<data element name> <> <value>;

and

NOT <data element name> = <value>;

are not equivalent. The meaning of the first was elucidated in the preceding section. The second differs in that records having no occurrence of the named data element will also satisfy it. This happens because NOT complements the set defined by a following condition and the universal set in terms of which complementation is defined is the entire database. Thus, the effect of a NOT operator cannot simply be absorbed into the relational operator, and

NOT =     is not equivalent to    <>
NOT <     is not equivalent to    >=
NOT >     is not equivalent to    <=
NOT <=    is not equivalent to    >
NOT >=    is not equivalent to    <
NOT <>    is not equivalent to    =

5.4 Retrieval 43 Record ID
5.4 Record ID

As mentioned in Chapter 1, every record in a database is assigned a unique and permanent integer record ID when it is first created. The pseudo data element name REC-ID may be used in a simple condition to retrieve records by their ID's. Any form of the simple condition, including ranges and the bare form REC-ID; is allowed. The set of records satisfying the latter 'condition' is the entire database. Simple conditions on REC-ID may be used as components of a complex condition.

5.5 Retrieval 44 Truncated Search
5.5 Truncated search

Truncation provides a way of searching for records that contain occurrences of some data element whose value begins in a specified way, regardless of the rest of the value. For example, one might like to find all papers authored by Smith, regardless of how the first name had been entered in the database. If authors' names were stored last name first, one could just do a truncated search for Smith.

Truncation is indicated in the BDMS query language by a single / (slash) following the partial value.
Thus, the preceding example would be expressed as

FIND A=Smith/; **

Any record with a value for A whose first five characters are Smith would satisfy the condition.

If one desires to search for a value which happens to end with a /, the slash may be doubled to prevent its being interpreted as a truncation delimiter. For example,

FIND CT=ABC//; **

would find all records having a value ABC/ for CT. If one desires to truncate immediately following a /, then three slashes are required. For example,

FIND CT=ABC///; **

would find all records containing values of CT which begin with the characters ABC/.

Truncated values may be used in conjunction with any relational operator. Either or both of the endpoints of a range search may be expressed as truncated values in the same way. Since equality to a truncated value actually defines a range of values that will satisfy the condition, the meanings of the remaining relational operators will be affected in a corresponding way, elucidated by the following diagram -

         <                   =                   >
<------------------ ----------------- ------------------->

                 <=
<------------------------------------

                                      >=
                    -------------------------------------->

         <>
<------------------                   ------------------->

The range of record values satisfying an equality condition is represented by the middle segment of the top line while the ranges satisfying other conditions involving the same value but different relational operators are represented by the other line segments.

Simple conditions involving truncated values may be used as components of a complex condition.

5.6 Retrieval 45 Retrieval Error Messages
5.6 Retrieval error messages

The error messages that can be generated while processing a query are listed below. The entire query is scanned for errors and only if none are detected is the database actually searched. All error messages from the query processor are preceded by ***ERROR***. This prefix has been omitted in the following list of messages.

INCOMPLETE QUERY
    In batch mode, an end-of-file condition has been sensed while processing a query. Probably the query terminator (**) was omitted.

INVALID LENGTH RETURNED BY QPROC
    The user-supplied QPROC routine has returned a negative data element length.

INVALID NUMERIC DATA ELEMENT VALUE
    The value specified for a numeric data element does not have a form valid for its type.

INVALID RANGE SPECIFICATION
    A meaningless set of relational operators has been used in a range condition, e.g. X < 7; TO > 9;.

INVALID SET NUMBER
    A set number has been used that is less than or equal to 0, or larger than the last-created set number.

NON-KEY DATA ELEMENT IN QUERY
    A data element used in the query is not defined to be a key.

QUERY SYNTAX
    The query has not been properly constructed out of simple conditions and boolean operators, e.g. two or more simple conditions are not joined by a boolean operator.

TOO MANY LEFT PARENTHESES
TOO MANY RIGHT PARANTHESES
    The parentheses appearing in the query are not properly paired.

UNDEFINED DATA ELEMENT NAME
    A data element name appearing in the query cannot be recognized. It has probably been misspelled.

VALUE SPECIFIED FOR PURE NODE DATA ELEMENT
    The query includes a condition on the value of a pure node data element. This is meaningless since a pure node carries no value. Perhaps the data element name is misspelled.

WORK SPACE EXCEEDED
    The system has run out of work space while processing the query. It may be necessary to recompile the system with a larger work space for use with this database.

6.1 Introduction

Utility programs are provided with BDMS to perform four maintenance functions: initial loading of a database, dumping an entire database in external format for transmittal, compressing dead space out of the data file, and rebalancing the index trees. Use of these programs is explained in the following sections.

6.2 Load

The LOAD utility is used for initial batch loading of a database. At present, it cannot be used to add records to an existing database - that must be done with the executive ADD command. Loading is a much more efficient operation than adding because the index entries are saved until all records have been stored in the data file, and then the index trees are built from the bottom up in a single pass. Thus, the index trees are perfectly balanced, leading to maximum efficiency in query processing.

Input to the LOAD utility consists of one or more commands followed by the data to be loaded in external editor format, with records separated by *e and the last record terminated by **. This is the format produced by the DUMP utility or the executive DUMP command. The commands that may precede the input data are -

LINE,n
    This sets the input line length to n characters. If no LINE command is present, the line length defaults to 80 characters.

FREE,n
    This sets the amount of free space to be left on each index page to allow future database expansion without the possible consequence of unbalancing the index trees. If n <> 0, n percent free space will be left on each page. If n = 0, the pages will be completely filled to achieve minimum index size for a static database. The default fill is 50 percent.

LOAD
    This terminates the command stream and initiates the data stream. A LOAD command must precede the data even if no other commands are present in the input stream.

These commands are terminated by a blank and hence must not contain embedded blanks.

6.3 Dump

The DUMP utility is used to dump an entire database in a format suitable for subsequent reloading. This is primarily useful if it is necessary to transmit the database to another site. To achieve maximum efficiency, the records are output in the (random) order in which they occur in the data file, rather than in record ID or key sequence.
To dump only selected parts of a database, use the executive FIND and DUMP commands instead.

The DUMP utility reads from the input file commands specifying the format of the dump file. They are -

LINE,n
    This sets the output line length to n characters. If no LINE command is present, the line length defaults to 80 characters.

MODE,ll
    This selects the output format. For ll=1 or 2, a display format beginning with REC-ID is used. For ll=3 or 4, a format suitable for use as input to the LOAD utility is used. For a detailed description of the acceptable values of ll, see the description of the mode parameter of subroutine LIST in the programmer's manual. The default for ll is 4.

DUMP
    This command terminates the input stream and initiates the dumping process. A DUMP command must be present even if it is not preceded by either of the other commands.

These commands are terminated by a blank and hence must not contain embedded blanks. The output will be on local file TAPE4.

6.4 Clean

The CLEAN utility compresses out of the data file the dead space that results from update activity. How often this operation should be carried out depends on the nature and frequency of update activity for the database. This utility requires no input file.

6.5 Balance

The BALANCE utility is used to rebalance the index trees. This operation may be necessary after extensive update activity in order to optimize query processing. The BALANCE utility reads the following commands from the input file -

FREE,n
    This sets the amount of free space to be left on each index page to allow future database expansion without the possible consequence of unbalancing the index trees. If n <> 0, n percent free space will be left on each page. If n = 0, the pages will be completely filled to achieve minimum index size for a static database. The default is 50 percent fill.

BALANCE
    This command terminates the command stream and initiates the rebalancing process. The BALANCE command must be present even if it is not preceded by a FREE command.

These commands are terminated by a blank and therefore must not contain embedded blanks.
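
The command streams read by the load, dump, and balance utilities share one shape: optional parameter commands, then a terminating command, each ended by a blank. The Python sketch below illustrates that shape only and is not BDMS code. The names parse_command_stream, effective, DEFAULTS, and TERMINATORS are hypothetical, and the sample input FREE,25 BALANCE is an assumed example. The defaults 80, 4, and 50 are the ones stated in this chapter. For simplicity the sketch accepts every parameter command for every utility, which the real utilities may not do.

```python
# Illustrative sketch only: models the shape of a BDMS utility command stream.
# parse_command_stream, effective, DEFAULTS, and TERMINATORS are hypothetical
# names introduced here; they are not part of BDMS.

DEFAULTS = {"LINE": 80, "MODE": 4, "FREE": 50}   # defaults stated in the manual
TERMINATORS = {"LOAD", "DUMP", "BALANCE"}        # commands that end the stream


def parse_command_stream(text):
    """Return (terminator, settings) for a blank-delimited command stream."""
    settings = {}
    for token in text.split():             # a blank terminates each command
        name, comma, value = token.partition(",")
        name = name.upper()
        if name in TERMINATORS:
            if comma:
                raise ValueError(f"{name} takes no parameter")
            return name, settings          # anything after this point is data
        if name not in DEFAULTS:
            raise ValueError(f"unknown command: {token}")
        if not value.isdigit():
            raise ValueError(f"{name} needs a non-negative integer: {token}")
        settings[name] = int(value)
    raise ValueError("missing terminating command (LOAD, DUMP, or BALANCE)")


def effective(settings, name):
    """Look up a setting, falling back to the default stated in the manual."""
    return settings.get(name, DEFAULTS[name])


terminator, settings = parse_command_stream("FREE,25 BALANCE")
print(terminator, effective(settings, "FREE"))   # BALANCE 25
```

Splitting on blanks mirrors the rule that a command may not contain embedded blanks, and returning at the first terminating command mirrors the requirement that LOAD, DUMP, or BALANCE be present even when no parameter command precedes it.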