2 COBOL Fundamentals

This chapter describes the syntax, semantics and usage of the COBOL programming language as implemented by the current version of GnuCOBOL. For the rest of this document the language name is spelt COBOL to ease reading; the compiler name, however, retains the mixed case of GnuCOBOL.

This document is intended to serve as a full-function reference and user’s guide, suitable both as a training tool for readers learning COBOL for the first time and for those already familiar with some dialects of the COBOL language.

A separate manual exists that contains just the details of the COBOL grammar as implemented in GnuCOBOL. It is taken from this guide, is designed strictly for experienced COBOL programmers, and does NOT contain any training subject matter whatsoever.

The extra manuals are the GnuCOBOL Quick Reference, a short document containing just the COBOL semantics and grammar, and GnuCOBOL Sample Programs, which shows detailed example COBOL programs with an indication of the syntax used in each program.

For each implementation of the GnuCOBOL compiler, the supplied NEWS file should also be read for any last-minute updates, along with the README and INSTALL files for building the compiler.

2.1 The COBOL Language - The Basics

2.1.1 Language Reserved Words

COBOL programs consist of a sequence of words and symbols. Words, which consist of sequences of letters (upper- and/or lower-case), digits, dashes (‘-’) and/or underscores (‘_’), may have a pre-defined, specific meaning to the compiler or may be invented by the programmer for their own purposes.

The GnuCOBOL language specification defines well over 1130 Reserved Words, that is, words to which the compiler assigns a special meaning. This number applies to the default list, which covers many implementations. It is possible either to limit the list to a specific implementation via -std=xyz[-strict] or to manually unreserve words if they are used in existing sources as user-defined words.

Programmers may use a reserved word as part of a word they are creating themselves, but may not create their own word as an exact duplicate (without regard to case) of a COBOL reserved word. Note that the term reserved word covers all classes, such as intrinsic functions, mnemonic names and system routines. Using the option ‘FUNCTION ALL INTRINSIC’ will add another 100+ reserved words. The list of reserved words can be changed by adding or removing specific words for a given compile, or as a default, by use of the steering options -std= (dialect) and -conf= (user’s config file). See the specific config files that are, by default, held in /usr/local/share/gnucobol/config. These can be modified to match the requirements of a business or project team, but be warned that they are updated when a new version of the compiler is built, so it might be more prudent to create your own configuration based on an existing one but with a different name.

In addition, you can add or remove reserved words by passing one of these options to cobc: -freserved=<word> to add a word, -fnot-reserved=<word> to remove one, and -freserved=<word>:<alias> to create an alias for a word. Similarly, -fregister=<word> and -fnot-register=<word> add or remove a special register word.

See Appendix B for a complete list of GnuCOBOL reserved words and Appendices C - F for grouped word lists.

For any given version of GnuCOBOL you can also list the full current set of reserved words by running cobc with --list-reserved, --list-intrinsic, --list-system and --list-mnemonics. The output is again subject to variation depending on the -std= option used.

2.1.2 User-Defined Words

When you write GnuCOBOL programs, you’ll need to create a variety of words to represent various aspects of the program, the program’s data and the external environment in which the program will run. This will include internal names by which data files will be referenced, data item names and the names of executable logic procedures (section and paragraph names).

User-defined words may be composed from the characters ‘A’ through ‘Z’ (upper- and/or lower-case), ‘0’ through ‘9’, dash (‘-’) and underscore (‘_’). User-defined words may neither start nor end with hyphen or underscore characters.

Other programming languages provide the programmer with a similar capability of creating their own words (names) for parts of a program; COBOL is somewhat unusual when compared to other languages in that user-defined words may start with a digit.

With the exception of logic procedure names, which may consist entirely of nothing but digits, user-defined words must contain at least one letter.
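The naming rules above can be tried out with a small program. This is only an illustrative sketch: every data name in it is made up for the example, and the leading >>SOURCE FORMAT IS FREE directive is an assumption made so that the code can be written without fixed-format column positions.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. WordDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
*> A user-defined word may start with a digit, as long as it
*> contains at least one letter.
01  1st-quarter-sales   PIC 9(5)  VALUE 12345.
*> Hyphens and underscores are allowed inside a word,
*> but not as its first or last character.
01  customer_name       PIC X(10) VALUE "Smith".
*> Words such as -sales, sales- or _name would be rejected.
PROCEDURE DIVISION.
show-words.
    DISPLAY "Sales: " 1st-quarter-sales
    DISPLAY "Name:  " customer_name
    STOP RUN.
```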

The maximum length of a user-defined word in COBOL is 31 characters as per the COBOL 2014 standard (the COBOL85 and ibm-strict dialects allow 30). To help support other compilers, the limit can be raised to 63 characters through the -std option. It must be pointed out that exceeding the standard limit will seriously restrict the ability to transfer any code written for GnuCOBOL to another brand of compiler without shortening all such user-defined words to 30 or 31 characters. Much of the art of writing COBOL is to minimise the need to change any code over the years that your programs will be in use.

There are very many examples of programs written as far back as the 1960s that are still in operation around the world, and the number of lines of COBOL code in use is estimated at 200 billion.

For example, this author (Vincent Coen) has code going back to the early 60s that, admittedly with changes over the years, is still in full operation. Take a look at the Contrib area and check out cobxref (COBOL Cross Reference listing tool, also on SourceForge), dectrans (Decision Translator) and flightlog (Pilots Log Book, also on SourceForge), as well as ACAS (Applewood Computers Accounting System) on SourceForge, whose original code goes back to 1967.

Of course many, if not most, of these applications have had many changes and upgrades over the years, but it shows just how long programs written in COBOL have survived and stayed in full-time use, for some 60 years.

The point is that, when writing in COBOL, you should always consider whether the code will be transferable to another system or compiler in the years to come, but without going over the top!

2.1.3 Case Insensitivity

All COBOL implementations allow the use of both upper and lower case letters in program coding. GnuCOBOL is completely insensitive to the case used when writing reserved words or user-defined names. Thus, AAAAA, aaaaa, Aaaaa and AaAaA are all the same word as far as GnuCOBOL is concerned.

The only time the case used does matter is within quoted character strings, where character values will be exactly as coded.

By convention throughout this document, COBOL reserved words will be shown entirely in UPPER-CASE while those words that were created by a programmer will be represented by tokens in mixed or lower case.

This isn’t a bad practice to use in actual programs, as it leads to programs where it is much easier to distinguish reserved words from user-defined ones!
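A minimal sketch of the case rules described in this section follows. The program and data names are hypothetical, and the >>SOURCE FORMAT IS FREE directive is assumed so that column positions do not matter.

```cobol
       >>SOURCE FORMAT IS FREE
identification division.
program-id. CaseDemo.
data division.
working-storage section.
01  Greeting  PIC X(12) VALUE "Hello World!".
procedure division.
    *> GREETING, greeting and GrEeTiNg all name the same data item,
    *> and reserved words may be written in any mix of case.
    display GREETING
    Display greeting
    DISPLAY GrEeTiNg
    *> Inside a quoted literal, case is kept exactly as coded.
    DISPLAY "hello WORLD"
    stop run.
```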

2.1.4 Readability of Programs

Critics of COBOL frequently focus on the wordiness of the language, often citing the case of a so-called “Hello World” program as the “proof” that COBOL is so much more tedious to program in than more “modern” languages. This tedium is cited as such a significant impact to programmer productivity that, in their opinions, COBOL can’t go away quickly enough.

Here are two different “Hello World” applications, one written in Java and the second in GnuCOBOL. First, the Java version:

class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

And here is the same program, written in GnuCOBOL:

IDENTIFICATION DIVISION.
PROGRAM-ID. HelloWorld.
PROCEDURE DIVISION.
    DISPLAY "Hello World!".

Both of the above programs could have been written on a single line, if desired, and both languages allow a programmer to use (or not use) indentation as they see fit to improve program readability. Sounds like a tie so far.

Let’s look at how much more “wordy” COBOL is than Java. Count the characters in the two programs. The Java program has 95 (not counting carriage returns and any indentation). The COBOL program has 89 (again, not counting carriage returns and indentation)! Technically, it could have been only 65 because the IDENTIFICATION DIVISION. header is actually optional. Clearly, “Hello World” doesn’t look any more concise in Java than it does in COBOL.

Let’s look at a different problem. Surely a program that asks a user to input a positive integer, generates the sum of all positive integers from 1 to that number and then prints the result will be MUCH shorter and MUCH easier to understand when coded in Java than in COBOL, right?

You can be the judge. First, the Java version:

import java.util.Scanner;
public class sumofintegers {
    public static void main(String[] arg) {
        System.out.println("Enter a positive integer");
        Scanner scan=new Scanner(System.in);
        int n=scan.nextInt();
        int sum=0;
        for (int i=1;i<=n;i++) {
            sum+=i;
        }
        System.out.println("The sum is "+sum);
    }
}

And now for the COBOL version:

IDENTIFICATION DIVISION.
PROGRAM-ID. SumOfIntegers.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 n     BINARY-LONG.
01 i     BINARY-LONG.
01 total BINARY-LONG VALUE 0.
PROCEDURE DIVISION.
DISPLAY "Enter a positive integer"
ACCEPT n
PERFORM VARYING i FROM 1 BY 1 UNTIL i > n
    ADD i TO total
END-PERFORM
DISPLAY "The sum is " total.

Note that the accumulator cannot be named sum as in the Java version, because SUM is a COBOL reserved word.

My familiarity with COBOL may be prejudicing my opinion, but it doesn’t appear to me that the Java code is any simpler than the COBOL code. In case you’re interested in character counts, the Java code comes in at 278 (not counting indentation characters). The COBOL code is a little over 300 (under 300 without the IDENTIFICATION DIVISION. header).

Despite what you’ve seen here, the more complex the programming logic being implemented, the more concise the Java code will appear to be, even compared to 2002-standard COBOL. That conciseness comes with a price though — program code readability. Java (or C or C++ or C#) programs are generally intelligible only to trained programmers. COBOL programs can, however, be quite understandable by non-programmers. This is actually a side-effect of the “wordiness” of the language, where COBOL statements use natural English words to describe their actions. This inherent readability has come in handy many times throughout my career when I’ve had to learn obscure business (or legal) processes by reading the COBOL program code that supports them.

The “modern” languages, like Java, also have their own “boilerplate” infrastructure overhead that must be coded in order to write the logic that is necessary in the program. Take for example the public static void main(String[] arg) and import java.util.Scanner; statements. The critics tend to forget about this when they criticize COBOL for its structural “overhead”.

When it was first developed, COBOL’s easily-readable syntax made it profoundly different from anything that had been seen before. For the first time, it was possible to specify logic in a manner that was, at least to some extent, comprehensible even to non-programmers. Take, for example, the following code written in FORTRAN, a language developed only a few years before COBOL:

EXT = PRICE * IQTY
INVTOT = INVTOT + EXT

With its original limitation on the length of variable names (one- to six-character names comprised of a letter followed by up to five letters and/or digits), its implicit rule that variables were automatically created as real (floating-point) unless their name started with a letter in the range I-N, and its use of algebraic notation to express actions being taken, FORTRAN wasn’t a particularly readable language, even for programmers. Compare this with the equivalent COBOL code:

MULTIPLY price BY quantity GIVING extended-amount
ADD extended-amount TO invoice-total

Clearly, even a non-programmer could at least conceptually understand what was going on! Over time, languages like FORTRAN evolved more robust variable names, and COBOL introduced a more formula-based syntactical capability for arithmetic operations, but FORTRAN was never as readable as COBOL.
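To make the comparison concrete, the sketch below runs the same invoice calculation twice: once with the English-like MULTIPLY and ADD statements shown above, and once with the formula-based COMPUTE statement. The picture clauses and starting values are assumptions chosen purely for illustration.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. InvoiceDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  price            PIC 9(3)V99 VALUE 19.99.
01  quantity         PIC 9(3)    VALUE 3.
01  extended-amount  PIC 9(6)V99 VALUE 0.
01  invoice-total    PIC 9(7)V99 VALUE 0.
PROCEDURE DIVISION.
    *> English-like form, as in the text above.
    MULTIPLY price BY quantity GIVING extended-amount
    ADD extended-amount TO invoice-total
    DISPLAY "Using MULTIPLY/ADD: " invoice-total
    *> Formula-based form of the same calculation.
    MOVE 0 TO invoice-total
    COMPUTE extended-amount = price * quantity
    COMPUTE invoice-total = invoice-total + extended-amount
    DISPLAY "Using COMPUTE:      " invoice-total
    STOP RUN.
```

Both DISPLAY statements should show the same total; which form reads better is largely a matter of audience.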

Because of its inherent readability, I would MUCH rather be handed an assignment to make significant changes to a COBOL program about which I know nothing than to be asked to do the same with a C, C++, C# or Java program.

Those that argue that it is too boring / wasteful / time-consuming / insulting (pick one) to have to code a COBOL program “from scratch” are clearly ignorant of the following facts:

  • Many systems have program-development tools available to ease the task of coding programs; those tools that concentrate on COBOL are capable of providing templates for much of the “overhead” verbiage of any program…

  • Good programmers have — for decades — maintained their own skeleton “template” programs for a variety of program types; simply load a template into a text editor and you’ve got a good start to the program…

  • Legend has it that there’s actually only been ONE program ever written in COBOL, and all programs ever “written” thereafter were simply derivatives of that one. Although this is clearly intended as a (probably) bad joke, it is nevertheless close to the very simple truth that many programmers “reuse” existing COBOL programs when creating new ones. There’s certainly nothing preventing this from happening with programs written in other languages, but it does seem to happen more in COBOL shops. It’s ironic that “code re-usability” is one of the arguments used to justify the existence of the “modern” languages.

2.1.5 Divisions Organize Programs

COBOL programs are structured into four major areas of coding, each with its own purpose. These four areas are known as divisions.

Each division may consist of a variety of sections and each section consists of one or more paragraphs. A paragraph consists of sentences, each of which consists of one or more statements.

This hierarchical structure of program components standardises the composition of all COBOL programs. Much of this manual describes the various divisions, sections, paragraphs and statements that may comprise any COBOL program.
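As a rough illustration of this hierarchy, the following skeleton names each level in its comments. It is a sketch only: the section, paragraph and data names are invented for the example, and free source format is assumed via the first line.

```cobol
       >>SOURCE FORMAT IS FREE
*> Division 1: identifies the program.
IDENTIFICATION DIVISION.
PROGRAM-ID. StructureDemo.
*> Division 2: describes the external environment (empty here).
ENVIRONMENT DIVISION.
*> Division 3: describes the program's data.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  loop-count  PIC 9(2) VALUE 0.
*> Division 4: the executable logic.
PROCEDURE DIVISION.
*> A section containing two paragraphs.
main-logic SECTION.
start-up.
    *> One sentence made of two statements, ended by a period.
    ADD 1 TO loop-count
    DISPLAY "loop-count is " loop-count.
    *> Two one-statement sentences.
    PERFORM wrap-up.
    STOP RUN.
wrap-up.
    DISPLAY "done".
```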

2.1.6 Copybooks

A Copybook is a segment of program code that may be utilized by multiple programs simply by having those programs use the COPY statement ( 3.2 COPY) to import that code. This code may define files, data structures or procedural code.

Today’s programming languages have a statement (usually named “import”, “include” or “#include”) that performs this same function. What makes the COBOL copybook feature different from the “include” facility in newer languages, however, is the fact that the COPY statement can edit the imported source code as it is being copied. This capability makes copybook libraries extremely valuable in making code reusable. Also see the COPY and REPLACE commands in section 3, Compiler Directing Facility.

2.1.7 Structured Data

A contiguous area of storage within the memory space of a program that may be referenced, by name, in a COBOL program is referred to as a Data Item. Other programming languages use the term variable, property or attribute to describe the same thing.

COBOL introduced the concept of structured data. The principle of structured data in COBOL is based on the idea of being able to group related and contiguously-allocated data items together into a single aggregate data item, called a Group Item. For example, a 35-character ‘Employee-Name’ group item might consist of a 20-character ‘Last-Name’ followed by a 14-character ‘First-Name’ and a 1-character ‘Middle-Initial’.

A data item that isn’t itself formed from other data items is referred to in COBOL as an Elementary Item. In the previous example, ‘Last-Name’, ‘First-Name’ and ‘Middle-Initial’ are all elementary items.
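The Employee-Name example could be declared roughly as follows. The level numbers and the sample values moved into the fields are assumptions made for illustration; the field sizes are those given in the text.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. GroupDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
*> Group item: 20 + 14 + 1 = 35 characters in total.
01  Employee-Name.
    *> Elementary items: each has its own PICTURE.
    05  Last-Name       PIC X(20).
    05  First-Name      PIC X(14).
    05  Middle-Initial  PIC X(1).
PROCEDURE DIVISION.
    MOVE "Hopper" TO Last-Name
    MOVE "Grace"  TO First-Name
    MOVE "B"      TO Middle-Initial
    *> The group item is referenced as one contiguous area.
    DISPLAY "Group item: [" Employee-Name "]"
    DISPLAY "Group size: " FUNCTION LENGTH(Employee-Name)
    DISPLAY "Last name:  [" Last-Name "]"
    STOP RUN.
```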

2.1.8 Files

One of COBOL’s strengths is the wide variety of data files it is capable of accessing. GnuCOBOL programs, like those created with other COBOL implementations, need to have the structure of any files they will be reading and/or writing described to them. The highest-level characteristic of a file’s structure is defined by specifying the organization of the file, as follows:

  • ORGANIZATION LINE SEQUENTIAL

    These are files with the simplest of all internal structures. Their contents are structured simply as a series of identically- or differently-sized data records, each terminated by a special end-of-record delimiter character. An ASCII line-feed character (hexadecimal 0A) is the end-of-record delimiter character used by any UNIX or pseudo-UNIX (MinGW, Cygwin, OSX) GnuCOBOL build. A truly native Windows build would use a carriage-return, line-feed (hexadecimal 0D0A) sequence.

    Records must be read from or written to these files in a purely sequential manner. The only way to read (or write) record number 100 would be to have read (or written) records number 1 through 99 first.

    When the file is written to by a GnuCOBOL program, the delimiter sequence will be automatically appended to each data record as it is written to the file. A WRITE ( 7.8.52 WRITE) to this type of file will be done as if a BEFORE ADVANCING 1 LINE clause were specified on the WRITE, if no ADVANCING clause is coded.

    When the file is read, the GnuCOBOL runtime system will strip the trailing delimiter sequence from each record. The data will be padded (on the right) with spaces if the data just read is shorter than the area described for data records in the program. If the data is too long, it will be truncated and the excess will be lost.

    These files should not be defined to contain any exact binary data fields because the contents of those fields could inadvertently have the end-of-record sequence as part of their values — this would confuse the runtime system when reading the file, and it would interpret that value as an actual end-of-record sequence.

    The following environment variables can affect LINE SEQUENTIAL processing behaviour, so their settings may need to be changed:

    COB_LS_FIXED
    COB_LS_NULLS
    COB_LS_SPLIT

    When using Micro Focus compatible formats:
    COB_MF_LS_NULLS
    COB_MF_LS_INSTAB
    COB_MF_LS_SPLIT
    COB_MF_LS_VALIDATE

    See sections 10.2.3.4 and 10.2.3.7 for more information.

  • LINE ADVANCING

    These are files with an internal structure similar to that of a line sequential file. These files are defined (without an explicit ORGANIZATION specification) using the LINE ADVANCING clause on their SELECT statement ( 5.2.1 SELECT).

    When this kind of file is written to by a GnuCOBOL program, an end-of-record delimiter sequence will be automatically added to each data record as it is written to the file. A WRITE to this type of file will be done as if an AFTER ADVANCING 1 LINE clause were specified on the WRITE, if no ADVANCING clause is coded.

    Like line sequential files, these files should not be defined to contain any exact binary data fields because the contents of those fields could inadvertently have the end-of-record sequence as part of their values — this would confuse the runtime system when reading the file, and it would interpret that value as an actual end-of-record sequence.

  • ORGANIZATION SEQUENTIAL

    These files also have a simple internal structure. Their contents are structured simply as an arbitrarily-long sequence of data characters. This sequence of characters will be treated as a series of fixed-length records simply by logically splitting the sequence of characters up into fixed-length segments, each as long as the maximum record size defined in the program. There are no special end-of-record delimiter characters in the file and when the file is written to by a GnuCOBOL program, no delimiter sequence is appended to the data.

    Records in this type of file are all the same physical length, except possibly for the very last record in the file, which may be shorter than the others. If variable-length logical records are defined to the program, each physical record in the file will occupy the space described by the longest record description in the program.

    So, if a file contains 1275 characters of data, and a program defines the structure of that file as containing 100-character records, then the file contents will consist of twelve (12) 100-character records with a final record containing only 75 characters.

    It would appear that it should be possible to locate and process any record in the file directly, simply by calculating its starting character position based upon the program-defined record size. Even so, records must still be read from or written to these files in a purely sequential manner. The only way to read (or write) record number 100 would be to have read (or written) records number 1 through 99 first.

    When the file is read, the data is transferred into the program exactly as it exists in the file. In the event that a short record is read as the very last record, that record will be padded (to the right) with spaces.

    Care must be taken that programs reading such a file describe records whose length is exactly the same as that used by the program that created the file. For example, the following shows the contents of a SEQUENTIAL file created by a program that wrote five 6-character records to it. The ‘A’, ‘B’, … values reflect the records that were written to the file:

    AAAAAA

    BBBBBB

    CCCCCC

    DDDDDD

    EEEEEE

    Now, assume that another program reads this file, but describes 10-character records rather than 6. Here are the records that program will read:

    AAAAAABBBB

    BBCCCCCCDD

    DDDDEEEEEE

    There may be times where this is exactly what you were looking for. More often than not, however, this is not desirable behaviour. Suggestion: use a copybook to describe the record layouts of any file; this guarantees that multiple programs accessing that file will “see” the same record sizes and layouts by coding a COPY statement ( 3.2 COPY) to import the record layout(s) rather than hand-coding them.

    These files can contain exact binary data fields. Because there is no character sequence that constitutes an end-of-record delimiter, the contents of record fields are irrelevant to the reading process.

  • ORGANIZATION RELATIVE

    The contents of these files consist of a series of fixed-length data records prefixed with a four-byte record header. The record header contains the length of the data, in bytes. The byte-count does not include the four-byte record header.

    Records in this type of file are all the same physical length. If variable-length logical records are defined to the program, each physical record in the file will occupy the maximum possible space, and the logical record length field will contain the number of bytes of data in the record that are actually in use.

    This file organization was defined to accommodate either sequential or random processing. With a RELATIVE file, it is possible to read or write record 100 directly, without having to have first read or written records 1-99. The GnuCOBOL runtime system uses the program-defined maximum record size to calculate a relative byte position in the file where the record header and data begin, and then transfers the necessary data to or from the program.

    When the file is written by a GnuCOBOL program, no delimiter sequence is appended to the data, but a record-length field is added to the beginning of each physical record.

    When the file is read, the data is transferred into the program exactly as it exists in the file.

    Care must be taken that programs reading such a file describe records whose length is exactly the same as that used by the programs that created the file. It won’t end well if the GnuCOBOL runtime library interprets a four-byte ASCII character string as a record length when it transfers data from the file into the program!

    Suggestion: use a copybook to describe the record layouts of any file; this guarantees that multiple programs accessing that file will “see” the same record sizes and layouts by coding a COPY statement ( 3.2 COPY) to import the record layout(s) rather than hand-coding them.

    These files can contain exact binary data fields. The contents of record fields are irrelevant to the reading process as there is no end-of-record delimiter.

  • ORGANIZATION INDEXED

    This is the most advanced file structure available to GnuCOBOL programs. It’s not possible to describe the physical structure of such files because that structure will vary depending upon which advanced file-management facility was included into the GnuCOBOL build you will be using (Berkeley Database [BDB], VBISAM, etc.). We will — instead — discuss the logical structure of the file.

    There will be multiple structures stored for an INDEXED file. The first will be a data component, which may be thought of as being similar to the internal structure of a relative file. Data records may not, however, be directly accessed by their record number as would be the case with a relative file, nor may they be processed sequentially by their physical sequence in the file.

    The remaining structures will be one or more index components. An index component is a data structure that (somehow) enables the contents of a field, called a primary key, within each data record (a customer number, an employee number, a product code, a name, etc.) to be converted to a record number so that the data record for any given primary key value can be directly read, written and/or deleted. Additionally, the index data structure is defined in such a manner as to allow the file to be processed sequentially, record-by-record, in ascending sequence of the primary key field values. Whether this index structure exists as a binary-searchable tree structure (b-tree), an elaborate hash structure or something else is pretty much irrelevant to the programmer; the behaviour of the structure will be as it was just described. The actual mechanism used will depend upon which advanced file-management package was included in your GnuCOBOL implementation when it was built.

    The runtime system will not allow two records to be written to an indexed file with the same primary key value.

    The capability exists for an additional field to be defined as what is known as an alternate key. Alternate key fields behave just like primary keys, allowing both direct and sequential access to record data based upon the alternate key field values, with one exception. That exception is the fact that alternate keys may be allowed to have duplicate values, depending upon how the alternate key field is described to the GnuCOBOL compiler.

    There may be any number of alternate keys, but each key field comes with a disk space penalty as well as an execution time penalty. As the number of alternate key fields increases, it will take longer and longer to write and/or modify records in the file.

    These files can contain exact binary data fields. The contents of record fields are irrelevant to the reading process as there is no end-of-record delimiter.
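The direct access offered by ORGANIZATION RELATIVE, described earlier in this list, can be sketched as follows. The program writes record number 100 without writing records 1 through 99 first, then reads it back by its record number. The file name "slots.dat" and all data names are hypothetical, and free source format is assumed.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. RelativeDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT slot-file ASSIGN TO "slots.dat"
        ORGANIZATION IS RELATIVE
        ACCESS MODE IS RANDOM
        RELATIVE KEY IS slot-number
        FILE STATUS IS slot-status.
DATA DIVISION.
FILE SECTION.
FD  slot-file.
01  slot-record        PIC X(20).
WORKING-STORAGE SECTION.
01  slot-number        PIC 9(4).
01  slot-status        PIC X(2).
PROCEDURE DIVISION.
    *> Write record 100 directly.
    OPEN OUTPUT slot-file
    MOVE 100 TO slot-number
    MOVE "record one hundred" TO slot-record
    WRITE slot-record
        INVALID KEY DISPLAY "write failed, status " slot-status
    END-WRITE
    CLOSE slot-file
    *> Read record 100 directly.
    OPEN INPUT slot-file
    MOVE 100 TO slot-number
    READ slot-file
        INVALID KEY DISPLAY "read failed, status " slot-status
        NOT INVALID KEY DISPLAY "slot 100: " slot-record
    END-READ
    CLOSE slot-file
    STOP RUN.
```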

All files are initially described to a GnuCOBOL program using a SELECT statement ( 5.2.1 SELECT). In addition to defining a name by which the file will be referenced within the program, the SELECT statement will specify the name and path by which the file will be known to the operating system along with its organization, locking and sharing attributes.

A file description in the FILE SECTION ( 6.2 FILE SECTION) will define the structure of records within the file, including whether or not variable-length records are possible and, if so, what the minimum and maximum length might be. In addition, the file description entry can specify file I/O block sizes.
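Putting the SELECT statement and the file description together, a minimal sketch for a LINE SEQUENTIAL file might look like this. The file name "notes.txt", the 30-character record size and all data names are assumptions made for the example.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. LineSeqDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    *> Internal name, operating-system name and organization.
    SELECT note-file ASSIGN TO "notes.txt"
        ORGANIZATION IS LINE SEQUENTIAL
        FILE STATUS IS note-status.
DATA DIVISION.
FILE SECTION.
*> The file description defines the record structure.
FD  note-file.
01  note-record        PIC X(30).
WORKING-STORAGE SECTION.
01  note-status        PIC X(2).
01  end-of-file-flag   PIC X VALUE "N".
    88  end-of-file    VALUE "Y".
PROCEDURE DIVISION.
    *> The delimiter is appended automatically on each WRITE.
    OPEN OUTPUT note-file
    MOVE "first line"  TO note-record
    WRITE note-record
    MOVE "second line" TO note-record
    WRITE note-record
    CLOSE note-file
    *> On READ the delimiter is stripped and the record is
    *> padded on the right with spaces.
    OPEN INPUT note-file
    PERFORM UNTIL end-of-file
        READ note-file
            AT END SET end-of-file TO TRUE
            NOT AT END DISPLAY "[" note-record "]"
        END-READ
    END-PERFORM
    CLOSE note-file
    STOP RUN.
```

The square brackets in the DISPLAY make the space padding of each record visible.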

2.1.9 Table Handling

Other programming languages have arrays; COBOL has tables. They’re basically the same thing. There are two special statements that exist in the COBOL language — SEARCH and SEARCH ALL — that make finding data in a table easy.

SEARCH searches a table sequentially, stopping only when either a table entry matching one of any number of search conditions is found, or when all table entries have been checked against the search criteria and none matched any of those criteria.

SEARCH ALL performs an extremely fast search against a table sorted by a key field contained in each table entry. The algorithm used for such a search is a binary search. The algorithm ensures that only a small number of entries in the table need to be checked in order to find a desired entry or to determine that the desired entry doesn’t exist in the table. The larger the table, the more effective this search becomes. For example, a binary search of a table containing 32,768 entries will locate a particular entry or determine the entry doesn’t exist by looking at only about fifteen (15) entries! The algorithm is explained in detail in the documentation of the SEARCH ALL statement ( 7.8.40 SEARCH ALL).

Finally, COBOL has the ability to perform in-place sorts of the data that is found in a table.
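The two search statements can be compared on a small table. The sketch below uses a hypothetical four-entry month table that is already in key sequence; the table layout and names are assumptions for illustration only.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. SearchDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  month-values.
    05  FILLER  PIC X(5) VALUE "01JAN".
    05  FILLER  PIC X(5) VALUE "02FEB".
    05  FILLER  PIC X(5) VALUE "03MAR".
    05  FILLER  PIC X(5) VALUE "04APR".
01  month-table REDEFINES month-values.
    05  month-entry OCCURS 4 TIMES
            ASCENDING KEY IS month-number
            INDEXED BY month-idx.
        10  month-number  PIC 9(2).
        10  month-abbrev  PIC X(3).
PROCEDURE DIVISION.
    *> Sequential SEARCH: start at entry 1, test each in turn.
    SET month-idx TO 1
    SEARCH month-entry
        AT END DISPLAY "MAR not found"
        WHEN month-abbrev (month-idx) = "MAR"
            DISPLAY "MAR is month " month-number (month-idx)
    END-SEARCH
    *> Binary SEARCH ALL: relies on the declared key sequence.
    SEARCH ALL month-entry
        AT END DISPLAY "month 04 not found"
        WHEN month-number (month-idx) = 4
            DISPLAY "month 04 is " month-abbrev (month-idx)
    END-SEARCH
    STOP RUN.
```

Note that SEARCH ALL only gives correct results if the table really is in the sequence named by its KEY clause.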

2.1.10 Sorting and Merging Data

The COBOL language includes a powerful SORT statement that can sort large amounts of data according to arbitrarily complex key structures. This data may originate from within the program or may be contained in one or more external files. The sorted data may be written automatically to one or more output files or may be processed, record-by-record in the sorted sequence.
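As a sketch of a SORT whose data originates within the program and is processed record-by-record afterwards, the following uses an input procedure to RELEASE three records and an output procedure to RETURN them in key sequence. The work file name "sortwork.tmp" and every data and paragraph name are hypothetical.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. SortDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT work-file ASSIGN TO "sortwork.tmp".
DATA DIVISION.
FILE SECTION.
SD  work-file.
01  work-record.
    05  work-key    PIC 9(3).
    05  work-text   PIC X(10).
WORKING-STORAGE SECTION.
01  all-returned-flag  PIC X VALUE "N".
    88  all-returned   VALUE "Y".
PROCEDURE DIVISION.
main-logic.
    SORT work-file ON ASCENDING KEY work-key
        INPUT PROCEDURE IS load-records
        OUTPUT PROCEDURE IS show-records
    STOP RUN.
load-records.
    *> Hand unsorted records to the sort.
    MOVE 300 TO work-key  MOVE "third"  TO work-text
    RELEASE work-record
    MOVE 100 TO work-key  MOVE "first"  TO work-text
    RELEASE work-record
    MOVE 200 TO work-key  MOVE "second" TO work-text
    RELEASE work-record.
show-records.
    *> Take the records back in sorted sequence.
    PERFORM UNTIL all-returned
        RETURN work-file
            AT END SET all-returned TO TRUE
            NOT AT END DISPLAY work-key " " work-text
        END-RETURN
    END-PERFORM.
```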

A companion statement — MERGE — can combine the contents of multiple files together, provided those files are all pre-sorted in a similar manner according to the same key structure. The resulting output will consist of the contents of all of the input files, merged together and sequenced according to the common key structure(s). The output generated by a MERGE statement may be written automatically to one or more output files or may be processed internally by the program.

A special form of the SORT statement also exists just to sort the data that resides in a table. This is particularly useful if you wish to use SEARCH ALL against the table.
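A minimal sketch of the table form of SORT, followed by a SEARCH ALL against the freshly sorted table, is shown below. The table contents and names are invented for the example.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. TableSortDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
*> Deliberately out of sequence.
01  item-values.
    05  FILLER  PIC X(3) VALUE "PEA".
    05  FILLER  PIC X(3) VALUE "FIG".
    05  FILLER  PIC X(3) VALUE "YAM".
    05  FILLER  PIC X(3) VALUE "NUT".
01  item-table REDEFINES item-values.
    05  item-entry OCCURS 4 TIMES
            ASCENDING KEY IS item-name
            INDEXED BY item-idx.
        10  item-name  PIC X(3).
PROCEDURE DIVISION.
    *> Sort the table in place.
    SORT item-entry ON ASCENDING KEY item-name
    PERFORM VARYING item-idx FROM 1 BY 1 UNTIL item-idx > 4
        DISPLAY item-name (item-idx)
    END-PERFORM
    *> The table is now in key sequence, so SEARCH ALL is safe.
    SEARCH ALL item-entry
        AT END DISPLAY "NUT not found"
        WHEN item-name (item-idx) = "NUT"
            DISPLAY "NUT found"
    END-SEARCH
    STOP RUN.
```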

2.1.11 String Manipulation

There have been programming languages designed specifically for the processing of text strings, and there have been programming languages designed for the sole purpose of performing high-powered numerical computations. Most programming languages fall somewhere in the middle.

COBOL is no exception, although it does include some very powerful string manipulation capabilities; GnuCOBOL actually has even more string-manipulation capabilities than many other COBOL implementations. The following summarizes GnuCOBOL’s string-processing capabilities:

  • Concatenate two or more strings

  • Conversion of a numeric time or date to a formatted character string

  • Convert a binary value to its corresponding character in the program’s character set

    • CHAR intrinsic function ( 8.1.9 CHAR). Add 1 to argument before invoking the function; the description of the CHAR intrinsic function presents a technique utilizing the MOVE statement that will accomplish the same thing without the need of adding 1 to the numeric argument value first.

  • Convert a character string to lower-case

  • Convert a character string to upper-case

  • Convert a character string to only printable characters

  • Convert a character to its numeric value in the program’s character set

    • ORD intrinsic function ( 8.1.74 ORD). Subtract 1 from the result; the description of the ORD intrinsic function presents a technique utilizing the MOVE statement that will accomplish the same thing without the need of subtracting 1 from the result.

  • Count occurrences of sub strings in a larger string

  • Decode a formatted numeric string back to a numeric value

  • Determine the length of a string or data-item capable of storing strings

  • Extract a sub string from a string based on its starting character position and length

  • Format a numeric item for output, including thousands-separators (’,‘ in the USA), currency symbols (’$‘ in the USA), decimal points, credit/debit symbols, and leading or trailing sign characters

    • MOVE statement ( 7.8.28 MOVE) with picture-symbol editing applied to the receiving field.

  • Justification (left, right or centred) of a string field

  • Monoalphabetic substitution of one or more characters in a string with different characters

  • Parse a string, breaking it up into sub strings based upon one or more delimiting character sequences [1]

  • Removal of leading or trailing spaces from a string

  • Substitution of a single sub string with another of the same length, based upon the sub string’s starting character position and length

  • Substitution of one or more sub strings in a string with replacement sub strings of the same length, regardless of where they occur

  • Substitution of one or more sub strings in a string with replacement sub strings of a potentially different length, regardless of where they occur
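A few of the capabilities listed above can be sketched in one short program. This is an illustrative sample only; the data names and values are invented, and it covers concatenation, case conversion, parsing, counting, substitution, sub string extraction and numeric editing rather than every item in the list.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  FIRST-NAME   PIC X(10) VALUE "Grace".
       01  LAST-NAME    PIC X(10) VALUE "Hopper".
       01  FULL-NAME    PIC X(30) VALUE SPACES.
       01  CSV-LINE     PIC X(20) VALUE "red,green,blue".
       01  COLOR-1      PIC X(8).
       01  COLOR-2      PIC X(8).
       01  COLOR-3      PIC X(8).
       01  COMMA-COUNT  PIC 9(2)  VALUE ZERO.
       01  PRICE-OUT    PIC $$,$$9.99.
       PROCEDURE DIVISION.
      *> Concatenate, removing trailing spaces from each piece
           STRING FUNCTION TRIM (FIRST-NAME) DELIMITED BY SIZE
                  " "                        DELIMITED BY SIZE
                  FUNCTION TRIM (LAST-NAME)  DELIMITED BY SIZE
             INTO FULL-NAME
           END-STRING
      *> Convert to upper-case
           DISPLAY FUNCTION UPPER-CASE (FUNCTION TRIM (FULL-NAME))
      *> Parse into sub strings at each comma
           UNSTRING CSV-LINE DELIMITED BY ","
               INTO COLOR-1 COLOR-2 COLOR-3
           END-UNSTRING
           DISPLAY COLOR-1 COLOR-2 COLOR-3
      *> Count occurrences, then substitute one character for another
           INSPECT CSV-LINE TALLYING COMMA-COUNT FOR ALL ","
           INSPECT CSV-LINE REPLACING ALL "," BY ";"
           DISPLAY COMMA-COUNT " " CSV-LINE
      *> Extract by starting position and length
           DISPLAY CSV-LINE (1:3)
      *> Format a number using picture-symbol editing
           MOVE 1234.5 TO PRICE-OUT
           DISPLAY PRICE-OUT
           STOP RUN.
```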

2.1.12 Screen Formatting Features

The COBOL2002 standard formalizes extensions to the COBOL language that allow for the definition and processing of text-based screens, a typical requirement on mainframe and midframe computers as well as on many point-of-sale (i.e. “cash register”) systems. GnuCOBOL implements virtually all the screen-handling features described by COBOL2002.

These features allow fields to be displayed at specific row/column positions, various colors and video attributes to be assigned to screen fields and the pressing of specific function keys (F1, F2, …) to be detectable. All of this takes place through the auspices of the SCREEN SECTION ( 6.7 SCREEN SECTION) and special formats of the ACCEPT statement ( 7.8.1 ACCEPT) and the DISPLAY statement ( 7.8.12 DISPLAY).

The COBOL2002 standard, and therefore GnuCOBOL, only covers textual user interface (TUI) screens (those comprised of ASCII characters presented using a variety of visual attributes) and not the more-advanced graphical user interface (GUI) screen design and processing capabilities built into most modern operating systems. There are subroutine-based packages available that can do full GUI presentation — most of which may be called by GnuCOBOL programs, with a moderate research time investment (Tcl/Tk, for example) — but none are currently included with GnuCOBOL.

2.1.12.1 A Sample Screen

[Figure: A sample screen produced by a GnuCOBOL program.]

Screens are defined in the screen section of the data division. Once defined, screens are used at run-time via the ACCEPT and DISPLAY statements.
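As a rough sketch of that flow, the following program defines one screen and then uses DISPLAY and ACCEPT on it. The screen layout, data names and prompt text are invented for illustration; it is not the screen from the guide's original figure.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SCRDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  USER-NAME  PIC X(20) VALUE SPACES.
       SCREEN SECTION.
       01  NAME-SCREEN.
           05  BLANK SCREEN.
           05  LINE 3 COLUMN 5  VALUE "Enter your name:".
           05  LINE 3 COLUMN 22 PIC X(20) USING USER-NAME.
       PROCEDURE DIVISION.
      *> Paint the screen, then let the user fill in the field
           DISPLAY NAME-SCREEN
           ACCEPT NAME-SCREEN
           STOP RUN.
```

The USING clause ties the screen field to USER-NAME, so the value typed by the user is available in that data item once the ACCEPT completes.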

2.1.12.2 Color Palette and Video Attributes

GnuCOBOL supports the following visual attribute specifications in the SCREEN SECTION ( 6.7 SCREEN SECTION):

  • Color

    Eight (8) different colors may be specified for both the background (screen) and foreground (text) color of any row/column position on the screen. Colors are specified by number, although a copybook supplied with all GnuCOBOL distributions ( screenio.cpy) defines COB-COLOR-<xxxxxx> names for the various colors so they may be specified as a more meaningful name rather than a number. The eight colors, by number, with the constant names defined in screenio.cpy, are as follows:

    • 0: Black

      COB-COLOR-BLACK

    • 1: Blue

      COB-COLOR-BLUE

    • 2: Green

      COB-COLOR-GREEN

    • 3: Cyan

      COB-COLOR-CYAN

    • 4: Red

      COB-COLOR-RED

    • 5: Magenta

      COB-COLOR-MAGENTA

    • 6: Yellow

      COB-COLOR-YELLOW

    • 7: White

      COB-COLOR-WHITE

  • Extended Colors

    This is a video feature that is dependent upon the curses package built into your version of GnuCOBOL. The actual results of extended color codes depend on the terminal’s capabilities and the underlying screenio library (and, for PDCurses, also on the setting of the COB_LEGACY parameter).

    With the PDCurses library on a Windows operating system, you can use additional color codes, from 8 to 15, for both foreground and background. In that case a GnuCOBOL program can use a palette with all of the following color codes (classic color codes from 0 to 7 and extended color codes from 8 to 15):

    *> Classic Colors
    78 COB-COLOR-BLACK         VALUE  0.
    78 COB-COLOR-BLUE          VALUE  1.
    78 COB-COLOR-GREEN         VALUE  2.
    78 COB-COLOR-CYAN          VALUE  3.
    78 COB-COLOR-RED           VALUE  4.
    78 COB-COLOR-MAGENTA       VALUE  5.
    *> 78 COB-COLOR-YELLOW     VALUE  6.
    78 COB-COLOR-BROWN         VALUE  6. *> Previous value for YELLOW
    78 COB-COLOR-WHITE         VALUE  7.
    
    *> Extended colors
    78 COB-COLOR-GREY          VALUE  8.
    78 COB-COLOR-LIGHT-BLUE    VALUE  9.
    78 COB-COLOR-LIGHT-GREEN   VALUE 10.
    78 COB-COLOR-LIGHT-CYAN    VALUE 11.
    78 COB-COLOR-LIGHT-RED     VALUE 12.
    78 COB-COLOR-LIGHT-MAGENTA VALUE 13.
    78 COB-COLOR-PINK          VALUE 13. *> same code as LIGHT-MAGENTA
    78 COB-COLOR-YELLOW        VALUE 14.
    78 COB-COLOR-LIGHT-WHITE   VALUE 15.
    
  • Before using this palette with classic and extended color codes you need to set the parameter “COB_LEGACY = 1” in one of the following ways:

    1. In the runtime configuration file (see COB_LEGACY at 10.2.3.5 Screen I/O).
    2. By setting the environment variable at the operating system level.
    3. By setting the environment variable in the COBOL source file as follows:

        SET ENVIRONMENT "COB_LEGACY" TO "1"

    With this last method, the SET ENVIRONMENT statement must be executed at the beginning of the program and the setting cannot be changed while the program is running.

    Using an extended color code from 8 to 15 without specifying the parameter COB_LEGACY = 1 will give you the corresponding “classic” color (a color code from 0 to 7) plus the following attribute:

    1. Using FOREGROUND-COLOR and a color code from 8 to 15: the character will be HIGHLIGHT.
    2. Using BACKGROUND-COLOR and a color code from 8 to 15: the character will BLINK.

    The ncurses library on Unix operating systems does not support extended colors at this time.

  • Summarizing.

    Without the environment variable COB_LEGACY = 1 you get the following:

    1. Using FOREGROUND-COLOR and a color code from 8 to 15: the character will be HIGHLIGHT (and that is the new color).

    2. Using BACKGROUND-COLOR and a color code from 8 to 15: the character will be the corresponding color coded from 0 to 7 and BLINK (the character flashes on the screen).

    With the parameter COB_LEGACY = 1 you get the following:

    1. Using FOREGROUND-COLOR and a color code from 8 to 15: the foreground character will be the new color.

    2. Using BACKGROUND-COLOR and a color code from 8 to 15: the background of the character will be the new color.

  • Text Brightness

    There are three possible brightness levels supported for text: lowlight (dim), normal and highlight (bright). Not all GnuCOBOL implementations will support all three (some treat lowlight the same as normal). The deciding factor as to whether two or three levels are supported lies with the version of the curses package that is being used. This is a utility screen-IO package that is included in the GnuCOBOL run-time library when the GnuCOBOL software is built.

    As a general rule of thumb, Windows implementations support two levels while Unix ones support all three.

  • Blinking

    This too is a video feature that is dependent upon the curses package built into your version of GnuCOBOL. If blinking is enabled in that package, text displayed in fields defined in the screen section as being blinking will endlessly cycle between the brightest possible setting (highlight) and an “invisible” setting where the text color matches that of the field background color. A Windows build, which generally uses the “PDCurses” package, will use a brighter-than-normal background color to signify “blinking”.

  • Reverse Video

    This video attribute simply swaps the foreground and background colors and display options.

  • Field Outlining

    It is possible, if supported by the curses package being used, to draw borders on the top, left and/or bottom edges of a field.

  • Secure Input

    If desired, screen fields used as input fields may be defined as “secure” fields, where each input character (regardless of what was actually typed) will appear as an asterisk (*) character. The actual character whose key was pressed will still be stored into the field in the program, however. This is very useful for password or account number fields.

  • Prompt Character

    Input fields may have any character used as a fill character. These fill characters provide a visual indication of the size of the input field, and will automatically be transformed into spaces when the input field is processed by the program. If no such character is defined for an input field, an underscore (’_‘) will be assumed.
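Several of these attributes can be combined in a single screen definition. The following is a sketch under assumptions: the screen layout and data names are invented, the colors are given by number (1 is blue, 7 is white) rather than through screenio.cpy so that the sample stands alone, and the visible result depends on the curses package and terminal in use.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ATTRDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-USER    PIC X(10) VALUE SPACES.
       01  WS-SECRET  PIC X(10) VALUE SPACES.
       SCREEN SECTION.
      *> Colors given on the group apply to the items beneath it
       01  LOGIN-SCREEN
               BACKGROUND-COLOR 1 FOREGROUND-COLOR 7.
           05  BLANK SCREEN.
           05  LINE 2 COLUMN 5 VALUE "Sign in" HIGHLIGHT.
           05  LINE 4 COLUMN 5 VALUE "User:".
           05  LINE 4 COLUMN 15 PIC X(10) USING WS-USER
                   REVERSE-VIDEO PROMPT CHARACTER IS "_".
           05  LINE 5 COLUMN 5 VALUE "Password:".
           05  LINE 5 COLUMN 15 PIC X(10) USING WS-SECRET
                   SECURE.
       PROCEDURE DIVISION.
           DISPLAY LOGIN-SCREEN
           ACCEPT LOGIN-SCREEN
           STOP RUN.
```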

2.1.13 Report Writer Features

GnuCOBOL includes an implementation of the Report Writer Control System, or RWCS. The reportwriter module is now fully implemented as of version 3.0. This is a standardized, optional add-on feature to the COBOL language which automates much of the mechanics involved in the generation of printed reports by:

  1. Controlling the pagination of reports, including:

    1. The automatic production of a one-time notice on the first page of the report (report heading).

    2. The production of zero or more header lines at the top of every page of the report (page heading).

    3. The production of zero or more footer lines at the bottom of every page of the report (page footing).

    4. The automatic numbering of printed pages.

    5. The formatting of those report lines that make up the main body of the report (detail).

    6. Full awareness of where the “pen” is about to “write” on the current page, automatically forcing an eject to a new page, along with the automatic generation of a page footer to close the old page and/or a page header to begin the new one.

    7. The production of a one-time notice at the end of the last page of a report (report footing).

  2. Performing special reporting actions based upon the fact that the data being used to generate the report has been sorted according to one or more key fields:

    1. Automatically suppressing the presentation of one or more fields of data from the detail group when the value(s) of the field(s) duplicate those of the previously generated detail group. Fields such as these are referred to as group-indicate fields.

    2. Automatically causing suppressed detail group-indicate fields to re-appear should a detail group be printed on a new page.

    3. Recognizing when control fields on the report — fields tied to those that were used as SORT statement ( 7.8.42 SORT) keys — have changed. This is known as a control break. The RWCS can automatically perform the following reporting actions when a control break occurs:

      • Producing a footer, known as a control footing after the detail lines that shared the same old value for the control field.

      • Producing a header, known as a control heading before the detail lines that share the same new value for the control field.

  3. Performing data summarization, as follows:

    1. Automatically generating subtotals in control and/or report footings, summarizing values of any fields in the detail group.

    2. Automatically generating crossfoot totals in detail groups. These would be sums of two or more values presented in the detail group.

The REPORT SECTION ( 6.6 REPORT SECTION) documentation explores the description of reports and the PROCEDURE DIVISION ( 7 PROCEDURE DIVISION) chapter documents the various language statements that actually produce reports. Before reading these, you might find it helpful to read 9 Report Writer Usage, which is dedicated to putting the pieces together for you.
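To give a feel for how these pieces fit together, here is a small sketch of a report with a page heading, a detail line with a group-indicate field, and a control footing that subtotals by region. The file name, report layout and data are invented for illustration, and the sample is deliberately minimal; consult the sections referenced above for the full rules.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RPTDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT REPORT-FILE ASSIGN TO "sales.rpt"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  REPORT-FILE
           REPORT IS SALES-REPORT.
       WORKING-STORAGE SECTION.
       01  SALE-REGION  PIC X(5).
       01  SALE-AMOUNT  PIC 9(5).
       REPORT SECTION.
       RD  SALES-REPORT
           CONTROL IS SALE-REGION
           PAGE LIMIT 60 LINES
           HEADING 1
           FIRST DETAIL 3.
       01  TYPE PAGE HEADING.
           05  LINE 1.
               10  COLUMN 1 VALUE "Sales by Region".
       01  SALE-LINE TYPE DETAIL.
           05  LINE PLUS 1.
               10  COLUMN 1  PIC X(5) SOURCE SALE-REGION
                                      GROUP INDICATE.
               10  COLUMN 10 PIC ZZ,ZZ9 SOURCE SALE-AMOUNT.
       01  TYPE CONTROL FOOTING SALE-REGION.
           05  LINE PLUS 1.
               10  COLUMN 1  VALUE "Total".
               10  COLUMN 10 PIC ZZ,ZZ9 SUM SALE-AMOUNT.
       PROCEDURE DIVISION.
           OPEN OUTPUT REPORT-FILE
           INITIATE SALES-REPORT
      *> Each GENERATE presents one detail line; the RWCS handles
      *> headings, control breaks and subtotals on its own
           MOVE "EAST" TO SALE-REGION
           MOVE 100 TO SALE-AMOUNT
           GENERATE SALE-LINE
           MOVE 250 TO SALE-AMOUNT
           GENERATE SALE-LINE
           MOVE "WEST" TO SALE-REGION
           MOVE 75 TO SALE-AMOUNT
           GENERATE SALE-LINE
           TERMINATE SALES-REPORT
           CLOSE REPORT-FILE
           STOP RUN.
```

The change of SALE-REGION from "EAST" to "WEST" is the control break: no statement in the procedure division asks for the subtotal line explicitly.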

2.1.14 Data Initialization

There are three ways in which data division data gets initialized.

  1. When a program or subprogram is first executed, much of the data in its data division will be initialized as follows:

    • Alphanumeric and alphabetic (i.e. text) data items will be initialized to SPACES.

    • Numeric data items will be initialized to a value of ZERO.

    • Data items with an explicit VALUE ( 6.9.63 VALUE) clause in their definition will be initialized to that specific value.

    The various sections of the data division each have their own rules as to when the actions described above will occur — consult the documentation on those sections for additional information.

    These default initialization rules can vary quite substantially from one COBOL implementation to another. For example, it is quite common for data division storage to be initialized to all binary zeros except for those data items where VALUE clauses are present. Take care when working with applications originally developed for another COBOL implementation to ensure that GnuCOBOL’s default initialization rules won’t prove disruptive.

  2. A programmer may use the INITIALIZE statement ( 7.8.24 INITIALIZE) to initialize any group or elementary data item at any time. This statement provides far more initialization options than just the simple rules stated above.

  3. When the ALLOCATE statement ( 7.8.3 ALLOCATE) is used to allocate a data item or to simply allocate an area of storage of a size specified on the ALLOCATE, that allocation may occur with or without initialization, as per the programmer’s needs.
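The second and third mechanisms can be sketched as follows. The record layout and data names are invented for illustration; the sample shows INITIALIZE in its default form and with ALL TO VALUE, and an ALLOCATE of a BASED item with the INITIALIZED option.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INITDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CUSTOMER-REC.
           05  CUST-NAME     PIC X(20).
           05  CUST-BALANCE  PIC 9(5)V99.
           05  CUST-STATUS   PIC X VALUE "A".
       01  BUFFER-PTR        USAGE POINTER.
       01  WORK-BUFFER       PIC X(100) BASED.
       PROCEDURE DIVISION.
           MOVE "Smith" TO CUST-NAME
           MOVE 12.50   TO CUST-BALANCE
           MOVE "X"     TO CUST-STATUS
      *> Default form: text items to spaces, numeric items to zero
           INITIALIZE CUSTOMER-REC
      *> Re-apply VALUE clauses (CUST-STATUS becomes "A" again)
           INITIALIZE CUSTOMER-REC ALL TO VALUE
      *> Allocate storage for the BASED item and initialize it
           ALLOCATE WORK-BUFFER INITIALIZED RETURNING BUFFER-PTR
           MOVE "scratch data" TO WORK-BUFFER
           FREE BUFFER-PTR
           STOP RUN.
```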

2.1.15 Syntax Diagram Conventions

Syntax of the GnuCOBOL language will be described in special syntax diagrams using the following syntactical-description techniques:

  • MANDATORY-RESERVED-WORD

  • ~~~~~~~~~~~~~~~~~~~~~~~

    Reserved words of the COBOL language will appear in UPPER-CASE. When they appear underlined, as this one is, they are required reserved words.

  • OPTIONAL-RESERVED-WORD

    When reserved words appear without underlining, as this one is, they are optional; such reserved words are available in the language syntax merely to improve readability — their presence or absence has no effect upon the program.

  • ABBREVIATION
    ~~~~

    When only a portion of a reserved word is underlined, it indicates that the word may either be coded in its full form or may be abbreviated to the portion that is underlined.

  • substitutable-items

    Generic terms representing user-defined substitutable items will be shown entirely in lower-case in syntax diagrams. When such items are referenced in text, they will appear as <substitutable-items>.

  • Complex-Syntax-Clause

    Items appearing in Mixed Case within a syntax diagram represent complex clauses of other syntax elements that may appear in that position. Some COBOL syntax gets quite complicated, and using a convention such as this significantly reduces the complexity of a syntax diagram. When such items are referenced in text, they will appear as <Complex-Syntax-Clause>.

  • [ ]

    Square bracket meta characters on syntax diagrams document language syntax that is optional. The [] characters themselves should not be coded. If a syntax diagram contains ‘a [b] c‘, the ‘a‘ and ‘c‘ syntax elements are mandatory but the ‘b‘ element is optional.

  • |

    Vertical bar meta characters on syntax diagrams document simple choices. The | character itself should not be coded. If a syntax diagram contains ‘a|b|c‘, exactly one of the items ‘a‘, ‘b‘ or ‘c‘ must be selected.

  • { xxxxxx }
    { yyyyyy }
    { zzzzzz }

    A vertical list of items, bounded by multiple brace characters, is another way of signifying a choice between a series of items where exactly one item must be selected. This form is used to show choices when one or more of the selections is more complex than just a single word, or when there are too many choices to present horizontally with ‘|‘ meta characters.

  • | xxxxxx |
    | yyyyyy |
    | zzzzzz |

    A vertical list of items, bounded by multiple vertical bar characters, signifies a choice between a series of items where one or more of the choices could be selected.

  • ...

    The ‘...‘ meta character sequence signifies that the syntax element immediately preceding it may be repeated. The ‘...‘ sequence itself should not be coded. If a syntax diagram contains ‘a b... c‘, syntax element ‘a‘ must be followed by at least one ‘b‘ element (possibly more) and the entire sequence must be terminated by a ‘c‘ syntax element.

  • { }

    The braces (’{‘ and ‘}‘) meta characters may be used to group a sequence of syntax elements together so that they may be treated as a single entity. The {} characters themselves should not be coded. These are typically used in combination with the ‘|‘ or ‘...‘ meta characters.

  • $*^()-+=:"'<,>./

    Any of these characters appearing within a syntax diagram are to be interpreted literally, and are characters that must be coded — where allowed — in the statement whose format is being described. Note that a ‘.‘ character is a literal character that must be coded on a statement whereas a ‘...‘ symbol is the meta character sequence described above.

2.1.16 Format of Program Source Lines

Prior to the COBOL2002 standard, source statements in COBOL programs were structured around 80-column punched cards. This means that each source line in a COBOL program consisted of five different “areas”, defined by their column number(s).

As of the COBOL2002 standard, a second mode now exists for COBOL source code statements — in this mode of operation, COBOL statements may each be up to 255 characters long, with no specific requirements as to what should appear in which columns.

Of course, in keeping with the long-standing COBOL tradition of maintaining backwards compatibility with older standards, programmers (and, of course, compliant COBOL compilers) are capable of working in either mode. It is even possible to switch back and forth in the same program. The terms Fixed Format Mode and Free Format Mode are used to refer to these two modes of source code formatting.

The GnuCOBOL compiler (cobc) supports both of these source line format modes, defaulting to Fixed Format Mode lacking any other information.

The compiler can be instructed to operate in either mode in any of the following four ways:

  1. Using a compiler option switch — use the -fixed switch to start in Fixed Format Mode (remember that this is the default) or the -free switch to start in Free Format Mode.

  2. You may use the SOURCEFORMAT AS FIXED and SOURCEFORMAT AS FREE clauses of the >>SET CDF directive ( 3.6 >>SET) within your source code to switch to Fixed or Free Format Mode, respectively.

  3. You may use the >>FORMAT IS FIXED and FORMAT IS FREE clauses of the >>DEFINE CDF directive ( 3.4 >>DEFINE) within your source code to switch to Fixed or Free Format Mode, respectively.

  4. You may use the >>SOURCE CDF directive ( 3.7 >>SOURCE) to switch to Free Format Mode (>>SOURCE FORMAT IS FREE) or Fixed Format Mode (>>SOURCE FORMAT IS FIXED).

Using methods 2-4 above, you may switch back and forth between the two formats at will.

The last three options above are all equivalent; all three are supported by GnuCOBOL so that source code compatibility may be maintained with a wide variety of other COBOL implementations. With all three, if the compiler is currently in Fixed Format Mode, the >> must begin in column 8 or beyond, provided no part of the directive extends past column 72. If the compiler is currently in Free Format Mode, the >> may appear in any column, provided no part of the directive extends past column 255.
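Switching modes inside one source file might look like the following sketch, which starts in the default Fixed Format Mode, switches to Free Format Mode and then switches back. The program name and messages are invented for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FMTDEMO.
       PROCEDURE DIVISION.
           DISPLAY "Fixed format: statements start in Area B"
       >>SOURCE FORMAT IS FREE
DISPLAY "Free format: statements may start in any column"
>>SOURCE FORMAT IS FIXED
           STOP RUN.
```

The first directive begins in column 8 because the compiler is still in Fixed Format Mode when it is read; the second may begin in column 1 because Free Format Mode is in effect by then.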

Depending upon which source format mode the compiler is in, you will need to follow various rules for the format mode currently in effect. These rules are presented in the upcoming paragraphs.

The following discussion presents the various components of every GnuCOBOL source line record when the compiler is operating in Fixed Format Mode. Remember that this is the default mode for the GnuCOBOL compiler.

  • Columns 1-6: Sequence Number Area

    Historically, back in the days when punched-cards were used to submit COBOL program source to a COBOL compiler, this part of a COBOL statement was reserved for a six-digit sequence number. While the contents of this area are ignored by COBOL compilers, it existed so that a program actually punched on 80-character cards could — if the card deck were dropped on the floor — be run through a card sorter machine and restored to its proper sequence. Of course, this isn’t necessary today; if truth be told, it hasn’t been necessary for a long time.

    See 12.1 Marking Changes in Programs, for discussion of a valuable use to which the sequence number area may be put today.

  • Column 7: Indicator Area

    Column 7 serves as an indicator in which one of five possible values will appear — space, D (or d), - (dash), / or *. The meanings of these characters are as follows:

    • space

      No special meaning — this is the normal character that will appear in this area.

    • D/d

      The line contains a valid GnuCOBOL statement that is normally treated as a comment unless the program is being compiled in debugging mode.

    • *

      The line is a comment.

    • /

      The line is a comment that will also force a page eject in the compilation listing. While GnuCOBOL will honour such a line as a comment, it will not form-feed any generated listing.

    • -

      The line is a continuation of the previous line. These are needed only when an alphanumeric literal (quoted character string), reserved word or user-defined word is being split across lines.

  • Columns 8-11: Area A

    Division, section and paragraph headers must begin in Area A, as must the level numbers 01 and 77 in data description entries and the FD and SD (file and sort description) headers.

  • Columns 12-72: Area B

    All other COBOL programming language components are coded in these columns.

  • Columns 73-80: Program Name Area

    This is another obsolete area of COBOL statements. This part of every statement also hails back to the day when programs were punched on cards; it was expected that the name of the program (or at least the first 8 characters of it) would be punched here so that — if a dropped COBOL source deck contained more than one program — that handy card sorter machine could be used to first separate the cards by program name and then sort them by sequence number. Today’s COBOL compilers (including GnuCOBOL) simply ignore anything past column 72.

    See 12.1 Marking Changes in Programs, for discussion of a valuable use to which the program name area may be put today.
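The areas described above can be seen in a short Fixed Format Mode program. In this sketch the six-digit numbers are there only to make the Sequence Number Area visible (the compiler ignores them), and the program and data names are invented for illustration.

```cobol
000100 IDENTIFICATION DIVISION.
000200 PROGRAM-ID. AREADEMO.
000300* An asterisk in column 7 makes this whole line a comment
000400 DATA DIVISION.
000500 WORKING-STORAGE SECTION.
000600 01  GREETING   PIC X(12) VALUE "Hello, world".
000700 PROCEDURE DIVISION.
000800 MAIN-PARA.
000900     DISPLAY GREETING
001000     STOP RUN.
```

Division, section and paragraph headers and the 01 level number begin in column 8 (Area A), while the DISPLAY and STOP RUN statements begin in column 12 (Area B).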

2.1.17 Program Structure

Complete GnuCOBOL Program Syntax

[ IDENTIFICATION DIVISION. ]
  ~~~~~~~~~~~~~~~~~~~~~~~
  PROGRAM-ID|FUNCTION-ID.  name-1 [ Program-Options ] .
  ~~~~~~~~~~ ~~~~~~~~~~~
[ ENVIRONMENT DIVISION. ]
  ~~~~~~~~~~~ ~~~~~~~~
[ CONFIGURATION SECTION. ]
  ~~~~~~~~~~~~~ ~~~~~~~
[ SOURCE-COMPUTER.         Compilation-Computer-Specification . ]
  ~~~~~~~~~~~~~~~
[ OBJECT-COMPUTER.         Execution-Computer-Specification . ]
  ~~~~~~~~~~~~~~~
[ REPOSITORY.              Function-Specification... . ]
  ~~~~~~~~~~
[ SPECIAL-NAMES.           Program-Configuration-Specification . ]
  ~~~~~~~~~~~~~
[ INPUT-OUTPUT SECTION. ]
  ~~~~~~~~~~~~ ~~~~~~~
[ FILE-CONTROL.            General-File-Description... . ]
  ~~~~~~~~~~~~
[ I-O-CONTROL.             File-Buffering-Specification... . ]
  ~~~~~~~~~~~
[ DATA DIVISION. ]
  ~~~~~~~~~~~~~
[ FILE SECTION.            Detailed-File-Description... . ]
  ~~~~~~~~~~~~
[ WORKING-STORAGE SECTION. Permanent-Data-Definition... . ]
  ~~~~~~~~~~~~~~~ ~~~~~~~
[ LOCAL-STORAGE SECTION.   Temporary-Data-Definition... . ]
  ~~~~~~~~~~~~~ ~~~~~~~
[ LINKAGE SECTION.         Subprogram-Argument-Description... . ]
  ~~~~~~~ ~~~~~~~
[ REPORT SECTION.          Report-Description... . ]
  ~~~~~~ ~~~~~~~
[ SCREEN SECTION.          Screen-Layout-Definition... . ]
  ~~~~~~ ~~~~~~~
  PROCEDURE DIVISION [ { USING Subprogram-Argument...      } ]
  ~~~~~~~~~ ~~~~~~~~   { ~~~~~                             }
                       { CHAINING Main-Program-Argument... }
                         ~~~~~~~~
                     [   RETURNING identifier-1 ] .
[ DECLARATIVES. ]        ~~~~~~~~~
  ~~~~~~~~~~~~
[ Event-Handler-Routine... . ]
[ END DECLARATIVES. ]
  ~~~ ~~~~~~~~~~~~
  General-Program-Logic
[ Nested-Subprogram... ]
[ END PROGRAM|FUNCTION name-1 ]
  ~~~ ~~~~~~~ ~~~~~~~~

Each program consists of up to four Divisions (major groupings of sections, paragraphs and descriptive or procedural coding that all relate to a common purpose), named Identification, Environment, Data and Procedure.

  1. Not all divisions are needed in every program, but they must be specified in the order shown when they are used.

  2. The following points pertain to the identification division:

    • The IDENTIFICATION DIVISION. header is always optional.

  3. The following points pertain to the environment division:

    • If both optional sections of this division are coded, they must be coded in the sequence shown.

    • Each of these sections consists of a series of specific paragraphs (SOURCE-COMPUTER and OBJECT-COMPUTER, for example). Each of these paragraphs serves a specific purpose. If no code is required for the purpose one of the paragraphs serves, the entire paragraph may be omitted.

    • If none of the paragraphs within one of the sections are coded, the section header itself may be omitted.

    • The paragraphs within each section may only be coded in that section, but may be coded in any order.

    • If none of the sections within the environment division are coded, the ENVIRONMENT DIVISION. header itself may be omitted.

  4. The following points pertain to the data division:

    • The data division consists of six optional sections — when used, those sections must be coded in the order shown in the syntax diagram.

    • Each of these sections consists of code which serves a specific purpose. If no code is required for the purpose one of those sections serves, the entire section, including its header, may be omitted.

    • If none of the sections within the data division are coded (a highly unlikely, but theoretically possible circumstance), the DATA DIVISION. header itself may be omitted.

  5. The following points pertain to the procedure division:

    • As with the other divisions, the procedure division may consist of sections and those sections may — in turn — consist of paragraphs. Unlike the other divisions, however, section and paragraph names are defined by the programmer, and there may not be any defined at all if the programmer so wishes.

    • Each Event-Handler-Routine will be a separate section devoted to trapping a particular run-time event. If there are no such sections coded, the DECLARATIVES. and END DECLARATIVES. lines may be omitted.

  6. A single file of COBOL source code may contain:

    • A portion of a program; these files are known as copybooks

    • A single program. In this case, the END PROGRAM or END FUNCTION statement is optional.

    • Multiple programs, separated from one another by END PROGRAM or END FUNCTION statements. The final program in such a source code file need not have an END PROGRAM or END FUNCTION statement.

  7. Subprogram ‘B‘ may be nested inside program ‘A‘ by including program B’s source code at the end of program A’s procedure division without an intervening END PROGRAM A. or END FUNCTION A. statement. For now, that’s all that will be said about nesting. See 11.2 Independent vs Contained vs Nested Subprograms, for more information.

  8. Regardless of how many programs comprise a single GnuCOBOL source file, only a single output executable program will be generated from that source file when the file is compiled.
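A sketch of points 1 through 7: the source file below holds a main program with a nested subprogram. The environment division is omitted because nothing in it is needed, and the subprogram is nested because its source appears before the END PROGRAM of the outer program. All names are invented for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OUTERPGM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-COUNT  PIC 9(3) VALUE 41.
       PROCEDURE DIVISION.
           CALL "ADDONE" USING WS-COUNT
           DISPLAY WS-COUNT
           STOP RUN.

      *> No END PROGRAM OUTERPGM yet, so ADDONE is nested
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ADDONE.
       DATA DIVISION.
       LINKAGE SECTION.
       01  LK-COUNT  PIC 9(3).
       PROCEDURE DIVISION USING LK-COUNT.
           ADD 1 TO LK-COUNT
           GOBACK.
       END PROGRAM ADDONE.
       END PROGRAM OUTERPGM.
```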

2.1.18 Comments

The following information describes how comments may be embedded into GnuCOBOL program source to provide documentation.

  • Blank lines

    FIXED: Blank lines may be inserted as desired.

    FREE: Blank lines may be inserted as desired.

  • Full-line comments

    FIXED: An entire source line will be treated as a comment (and will be ignored by the compiler) by coding an asterisk (’*‘) in column seven (7).

    FREE: An entire source line will be treated as a comment (and will be ignored by the compiler) by coding the sequence ‘*>‘, starting in any column, as the first non-blank characters on the line.

  • Full-line comments with form-feed

    FIXED: An entire source line will be treated as a comment by coding a slash (’/‘) in column seven (7). Many COBOL compilers will also issue a form-feed in the program listing so that the ‘/‘ line is at the top of a new page. The GnuCOBOL compiler does not support this form-feed behaviour.

    The GnuCOBOL Interactive Compiler, or GCic, does support this form-feed behaviour when it generates program source listings. See GCic (in Sample Programs) for the source and cross-reference listing (produced by GCic) of this program; you can see the effect of ‘/‘ there.

    FREE: There is no Free Source Mode equivalent to ‘/‘.

  • Partial-line comments

    FIXED: Any text following the character sequence ‘*>‘ on a source line will be treated as a comment. The ‘*‘ must appear in column seven (7) or beyond.

    FREE: Any text following the character sequence ‘*>‘ on a source line will be treated as a comment. The ‘*‘ may appear in any column.

  • Comments that may be treated as code, typically for debugging purposes

    FIXED: By coding a ‘D‘ in column 7 (upper- or lower-case), an otherwise valid GnuCOBOL source line will be treated as a comment by the compiler.

    FREE: By specifying the character sequence ‘>>D‘ (upper- or lower-case) as the first non-blank characters on a source line, an otherwise valid GnuCOBOL source line will be treated as a comment by the compiler.

    Debugging statements may be compiled either by specifying the -fdebugging-line switch on the GnuCOBOL compiler or by adding the WITH DEBUGGING MODE clause to the SOURCE-COMPUTER paragraph.
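The Fixed Format Mode comment styles can be sketched in one small program. The program name and messages are invented for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CMTDEMO.
      * Full-line comment: asterisk in column 7

       PROCEDURE DIVISION.
           DISPLAY "Always shown"      *> partial-line comment
      D    DISPLAY "Shown only when debugging lines are compiled"
           STOP RUN.
```

Compiled normally, the ‘D‘ line is treated as a comment. Compiled with the -fdebugging-line switch mentioned above, it becomes an ordinary DISPLAY statement.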

2.1.19 Literals

Literals are constant values that will not change during the execution of a program. There are two fundamental types of literals — numeric and alphanumeric.

2.1.19.1 Numeric Literals

A numeric literal may take any of the following forms:

  • Integers such as 1, 56, 2192 or -54.

  • Non-integer fixed point values such as 1.317 or -2.95.

  • Floating-point values using ‘E<nn>‘ notation such as 9.92E25, representing \(9.92 \times 10^{25}\) (10 raised to the 25th power) or 5.7E-14, representing \(5.7 \times 10^{-14}\) (10 raised to the -14th power). Both the mantissa (the number before the ‘E‘) and the exponent (the number after the ‘E‘) may be explicitly specified as positive (with a ‘+‘), negative (with a ‘-‘) or unsigned (and therefore implicitly positive). A floating-point literal’s value must be within the range \(-1.7 \times 10^{308}\) to \(+1.7 \times 10^{308}\) with no more than 15 decimal digits of precision.

  • Hexadecimal numeric literals

  • Null terminated literals

  • Raw C string using L"<characters>".

  • Binary using B# followed by the digits 0 and 1.

  • Octal using O# followed by the digits 0 - 7. (That is the letter ‘O‘).

  • Hexadecimal number using H# or X# followed by the digits ‘0‘ - ‘F‘.

  • Boolean Literals (Standard) B"< character >".

  • Boolean Literals (Hexadecimal) BX"< hex character >".

  • National Literals (Standard) N"< character >" or NC"< character >".

  • National Literals (Hexadecimal) NX"< character >".
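A few of the numeric forms above can be sketched as follows. The data names are invented for illustration, and the B#, O# and H# forms are shown as the list above describes them; the comments give the decimal value each one denotes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LITDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-COUNT  PIC S9(4).
       01  WS-RATE   PIC S9(2)V9(3).
       01  WS-TINY   USAGE FLOAT-LONG.
       01  WS-MASK   PIC 9(3).
       PROCEDURE DIVISION.
           MOVE -54     TO WS-COUNT   *> integer literal
           MOVE 1.317   TO WS-RATE    *> fixed-point literal
           MOVE 5.7E-14 TO WS-TINY    *> floating-point literal
           MOVE B#1010  TO WS-MASK    *> binary: decimal 10
           MOVE O#17    TO WS-MASK    *> octal: decimal 15
           MOVE H#FF    TO WS-MASK    *> hexadecimal: decimal 255
           DISPLAY WS-MASK
           STOP RUN.
```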

2.1.19.2 Alphanumeric Literals

An alphanumeric literal is a character string and is not valid for use in arithmetic expressions unless it is first converted to its numeric computational equivalent. Three numeric conversion intrinsic functions built into GnuCOBOL can perform this conversion: NUMVAL ( 8.1.70 NUMVAL), NUMVAL-C ( 8.1.71 NUMVAL-C) and NUMVAL-F ( 8.1.73 NUMVAL-F).
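
A minimal sketch of such a conversion, assuming a hypothetical program NUMVALDEMO and a numeric-edited receiving item so that the decimal point is visible when displayed:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NUMVALDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DOUBLED    PIC ZZ9.99.
       PROCEDURE DIVISION.
      *> "12.50" is alphanumeric; NUMVAL yields its numeric value.
           COMPUTE WS-DOUBLED = FUNCTION NUMVAL ("12.50") * 2
           DISPLAY "Doubled: " WS-DOUBLED
           STOP RUN.
```

Without the NUMVAL call, the literal "12.50" could not appear in the arithmetic expression at all.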

Alphanumeric literals may take any of the following forms:

  • A sequence of characters enclosed by a pair of single-quote (’'‘) characters or a pair of double-quote (’"‘) characters constitutes a string literal.

  • A literal formed according to the same rules as for a string literal (above), but prefixed with the letter ‘Z‘ (upper- or lower-case), constitutes a zero-delimited string literal. These literals differ from ordinary string literals in that they are explicitly terminated with a byte of hexadecimal value 00.

  • A Hexadecimal Alphanumeric Literal
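
The forms above can be sketched as follows. The program name is hypothetical, and the X"..." notation for a hexadecimal alphanumeric literal is stated here as an assumption drawn from general COBOL usage rather than from the list itself:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ALNUMLITS.
       PROCEDURE DIVISION.
      *> Either quote character may delimit a string literal.
           DISPLAY 'Delimited by apostrophes'
           DISPLAY "Delimited by double-quotes"
      *> X"..." gives the characters as pairs of hexadecimal digits;
      *> on an ASCII system X"4142" is the two characters AB.
           DISPLAY X"4142"
           STOP RUN.
```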

Alphanumeric literals too long to fit on a single line may be continued to the next line in one of two ways:

  1. If you are using Fixed Format Mode, the alphanumeric literal can be run right up to and including column 72. The literal may then be continued on the next line anywhere after column 11 by coding another quote or apostrophe (whichever was used to begin the literal originally). The continuation line must also have a hyphen (-) in column 7.

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890123

       01  LONG-LITERAL-VALUE-DEMO     PIC X(60) VALUE "This is a long l
      -                                                "ong literal that
      -                                                " must be continu
      -                                                "ed.".
  2. Regardless of whether the compiler is operating in Fixed or Free Format Mode, GnuCOBOL allows alphanumeric literals to be broken up into separate fragments. These fragments have their own beginning and ending quote/apostrophe characters and are “glued together” at compilation time using ‘&‘ characters.

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890123

       01  LONG-LITERAL-VALUE-DEMO     PIC X(60) VALUE "This is a" &
                                        " long literal that must " &
                                                    "be continued.".

If your program is using Free Format Mode, there’s less need to continue long alphanumeric literals because statements may be as long as 255 characters.

Numeric literals may be split across lines just as alphanumeric literals are, using either of the above techniques; reserved and user-defined words can also be split across lines (using the first technique). The continuation of numeric literals and user-defined/reserved words exists only for compatibility with older COBOL versions and programs and should not be used in new programs, where it merely makes the code harder to read.

2.1.19.3 Figurative Constants

Figurative constants are reserved words that may be used as literals anywhere the figurative constant’s value could be interpreted as an arbitrarily long sequence of the characters in question. When a specific length is required, such as would be the case with an argument to a subprogram, a figurative constant may not be used. Thus, the following are valid uses of figurative constants:

05 FILLER                PIC 9(10) VALUE ZEROS.
   ...
MOVE SPACES TO Employee-Name

But this is not:

CALL "SUBPGM" USING SPACES

The following are the GnuCOBOL figurative constants and their respective equivalent values.

  • ZERO

    This figurative constant has a value of numeric 0 (zero). ZEROS and ZEROES are both synonyms of ZERO.

  • SPACE

    This figurative constant has a value of one or more space characters. SPACES is a synonym of SPACE.

  • QUOTE

    This figurative constant has a value of one or more double-quote characters ("). QUOTES is a synonym of QUOTE.

  • LOW-VALUE

    This figurative constant has a value of one or more of whatever character occupies the lowest position in the program’s collating sequence as defined in the OBJECT-COMPUTER ( 5.1.2 OBJECT-COMPUTER) paragraph or — if no such specification was made — in whatever default character set the program is using (typically, this is the ASCII character set). LOW-VALUES is a synonym of LOW-VALUE.

    When the character set in use is ASCII with no collating sequence modifications, the LOW-VALUES figurative constant value is the ASCII NUL character. Because character sets can be redefined, however, you should not rely on this fact. Use the NULL figurative constant instead.

  • HIGH-VALUE

    This figurative constant has a value of one or more of whatever character occupies the highest position in the program’s collating sequence as defined in the OBJECT-COMPUTER paragraph or — if no such specification was made — in whatever default character set the program is using (typically, this is the ASCII character set). HIGH-VALUES is a synonym of HIGH-VALUE.

  • NULL

    A character comprised entirely of zero-bits (regardless of the program’s collating sequence).

Programmers may create their own figurative constants via the SYMBOLIC CHARACTERS ( 5.1.3.4 Symbolic-Characters-Clause) clause of the SPECIAL-NAMES ( 5.1.3 SPECIAL-NAMES) paragraph.
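
The following minimal sketch (program and data names are hypothetical) shows several of the figurative constants above being used both as sending values and in comparisons. In each case the constant takes on the length of the item it is moved to or compared with:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FIGCONST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-NAME       PIC X(10)  VALUE "SMITH".
       01  WS-TOTAL      PIC 9(5)   VALUE 12345.
       01  WS-END-KEY    PIC X(4).
       PROCEDURE DIVISION.
      *> Each constant expands to fill the whole receiving item.
           MOVE SPACES      TO WS-NAME
           MOVE ZEROS       TO WS-TOTAL
           MOVE HIGH-VALUES TO WS-END-KEY
           IF WS-NAME = SPACES AND WS-TOTAL = ZERO
               DISPLAY "Name is blank and total is zero"
           END-IF
           IF WS-END-KEY = HIGH-VALUES
               DISPLAY "End-of-data key is set"
           END-IF
      *> QUOTE supplies the double-quote character itself.
           DISPLAY QUOTE "quoted text" QUOTE
           STOP RUN.
```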

2.1.20 Punctuation

A comma (’,‘) may be used as an optional separator character to improve readability.

The use of comma characters can cause confusion to a COBOL compiler if the DECIMAL-POINT IS COMMA clause is used in the SPECIAL-NAMES ( 5.1.3 SPECIAL-NAMES) paragraph, as might be the case in Europe. Consider the following statement, which calls a subroutine passing it two arguments (the numeric constants 1 and 2):

CALL "SUBROUTINE" USING 1,2

With DECIMAL-POINT IS COMMA in effect, this statement would actually be interpreted as a subroutine call with 1 argument (the non-integer numeric literal whose value is 1 and 2 tenths). For this reason, it is best to always follow a comma with a space.
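
The same effect can be sketched without a subprogram. This is a minimal illustration with hypothetical names, using ADD instead of CALL so that the difference shows up in the stored values:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMMADEMO.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
           DECIMAL-POINT IS COMMA.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ONE-OPERAND    PIC 9V9  VALUE ZERO.
       01  WS-TWO-OPERANDS   PIC 9V9  VALUE ZERO.
       PROCEDURE DIVISION.
      *> No space after the comma: one literal, 1 and 2 tenths.
           ADD 1,2 TO WS-ONE-OPERAND
      *> Space after the comma: two literals, 1 and 2.
           ADD 1, 2 TO WS-TWO-OPERANDS
           DISPLAY "ADD 1,2  gives " WS-ONE-OPERAND
           DISPLAY "ADD 1, 2 gives " WS-TWO-OPERANDS
           STOP RUN.
```

The two results should differ, which is exactly the kind of silent change in meaning that a space after every comma avoids.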

The rules for where and when period characters (’.‘) are needed in the procedure division are somewhat complicated. See 2.2.12 Use of Periods, for the details.

2.1.21 Interfacing to Other Environments

Through the CALL statement, COBOL programs may invoke other COBOL programs serving as subprograms. This is quite similar to cross-program linkage capabilities provided by other languages. In GnuCOBOL’s case, the CALL facility is powerful enough to be tailored to the point where a GnuCOBOL program can communicate with operating system, database management and run-time library APIs, even if they weren’t written in COBOL themselves. See 11.8.4 GnuCOBOL Main Programs CALLing C Subprograms, for an example of how a GnuCOBOL program could invoke a C-language subprogram, passing information back and forth between the two.

The fact that GnuCOBOL supports a full-featured two-way interface with C-language programs means that — even if you cannot access a library API directly — you could always do so via a small C “wrapper” program that is CALLed by a GnuCOBOL program.

2.2 Table References

COBOL uses parentheses to specify the subscripts used to reference table entries (tables in COBOL are what other programming languages refer to as arrays).

For example, observe the following data structure which defines a 4 column by 3 row grid of characters:

01  GRID.
     05 GRID-ROW OCCURS 3 TIMES.
        10 GRID-COLUMN OCCURS 4 TIMES.
            15 GRID-CHARACTER       PIC X(1).

If the structure contains the following grid of characters:

A B C D
E F G H
I J K L

Then GRID-CHARACTER (2, 3) references the ‘G‘ and GRID-CHARACTER (3, 2) references the ‘J‘.

Subscripts may be specified as numeric (integer) literals, numeric (integer) data items, data items created with any of the picture-less integer USAGE ( 6.9.61 USAGE) specifications, USAGE INDEX data items or arithmetic expressions resulting in a non-zero integer value.

In the above examples, a comma is used as a separator character between the two subscript values; semicolons (;) are also valid subscript separator characters, as are spaces! The use of a comma or semicolon separator in such a situation is technically optional, but by convention most COBOL programmers use one or the other. The use of no separator character (other than a space) is not recommended, even though it is syntactically correct, as this practice can lead to programmer-unfriendly code. It isn’t too difficult to read and understand GRID-CHARACTER(2 3), but it’s another story entirely when trying to comprehend GRID-CHARACTER(I + 1 J / 3) (instead of GRID-CHARACTER(I + 1, J / 3)). The compiler accepts it, but too much of this would make my head hurt.
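
A minimal runnable sketch of the grid discussed above. The program name and the ROW-NO and COL-NO items are hypothetical additions used to show subscripts given as data items and as an arithmetic expression:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRIDDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  GRID.
           05 GRID-ROW OCCURS 3 TIMES.
              10 GRID-COLUMN OCCURS 4 TIMES.
                 15 GRID-CHARACTER   PIC X(1).
       01  ROW-NO                    PIC 9 VALUE 3.
       01  COL-NO                    PIC 9 VALUE 1.
       PROCEDURE DIVISION.
      *> Rows are stored one after another: ABCD, EFGH, IJKL.
           MOVE "ABCDEFGHIJKL" TO GRID
      *> Row 2, column 3 is G; row 3, column 2 is J.
           DISPLAY GRID-CHARACTER (2, 3)
           DISPLAY GRID-CHARACTER (3, 2)
      *> Subscripts may also be data items or expressions.
           DISPLAY GRID-CHARACTER (ROW-NO, COL-NO + 1)
           STOP RUN.
```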

2.2.1 Qualification of Data Names

COBOL allows data names to be duplicated within a program, provided references to those data names may be made in such a manner as to make those references unique through a process known as qualification.

To see qualification at work, observe the following segments of two data records defined in a COBOL program:

01  EMPLOYEE.                     01  CUSTOMER.
    05 MAILING-ADDRESS.               05 MAILING-ADDRESS.
       10 STREET        PIC X(35).       10 STREET        PIC X(35).
       10 CITY          PIC X(15).       10 CITY          PIC X(15).
       10 STATE         PIC X(2).        10 STATE         PIC X(2).
       10 ZIP-CODE.                      10 ZIP-CODE.
          15 ZIP-CODE-5 PIC 9(5).           15 ZIP-CODE-5 PIC 9(5).
          15 FILLER     PIC X(4).           15 FILLER     PIC X(4).

Now, let’s deal with the problem of setting the CITY portion of an EMPLOYEE’s MAILING-ADDRESS to ‘Philadelphia‘. Clearly, MOVE 'Philadelphia' TO CITY cannot work because the compiler will be unable to determine which of the two CITY fields you are referring to.

In an attempt to correct the problem, we could qualify the reference to CITY as MOVE 'Philadelphia' TO CITY OF MAILING-ADDRESS.

Unfortunately, that too is insufficient because it still fails to identify which CITY is being referenced. To truly identify which specific CITY you want, you’d have to code MOVE 'Philadelphia' TO CITY OF MAILING-ADDRESS OF EMPLOYEE.

Now there can be no confusion as to which CITY is being changed. Fortunately, you don’t need to be quite so specific; COBOL allows intermediate and unnecessary qualification levels to be omitted. This allows MOVE 'Philadelphia' TO CITY OF EMPLOYEE to do the job nicely.

If you need to qualify a reference to a table, do so by coding something like <identifier-1> OF <identifier-2> ( <subscript(s)> ).

The reserved word IN may be used in lieu of OF.
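
The qualification rules above can be sketched with a cut-down version of the two records. The program name and the second city value are hypothetical:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. QUALDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  EMPLOYEE.
           05 MAILING-ADDRESS.
              10 STREET      PIC X(35).
              10 CITY        PIC X(15).
       01  CUSTOMER.
           05 MAILING-ADDRESS.
              10 STREET      PIC X(35).
              10 CITY        PIC X(15).
       PROCEDURE DIVISION.
      *> The intermediate MAILING-ADDRESS level may be omitted.
           MOVE "Philadelphia" TO CITY OF EMPLOYEE
      *> IN is interchangeable with OF.
           MOVE "Camden" TO CITY IN MAILING-ADDRESS IN CUSTOMER
           DISPLAY "Employee city: " CITY OF EMPLOYEE
           DISPLAY "Customer city: " CITY OF CUSTOMER
           STOP RUN.
```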

2.2.2 Reference Modifiers

Reference Modifier (Format 1) Syntax

identifier-1 [ OF|IN identifier-2 ] [ (subscript...) ] (start:[ length ])
               ~~ ~~

Reference Modifier (Format 2) Syntax

intrinsic-function-reference (start:[ length ])

The COBOL 1985 standard introduced the concept of a reference modifier to facilitate references to only a portion of a data item; GnuCOBOL fully supports reference modification.

The <start> value indicates the starting character position being referenced (character position values start with 1, not 0 as is the case in some programming languages) and <length> specifies how many characters are wanted.

If no <length> is specified, a value equivalent to the remaining character positions from <start> to the end of <identifier-1> or to the end of the value returned by the function will be assumed.

Both <start> and <length> may be specified as integer numeric literals, integer numeric data items or arithmetic expressions with an integer value.

Here are a few examples:

  • CUSTOMER-LAST-NAME (1:3)

    References the first three characters of CUSTOMER-LAST-NAME.

  • CUSTOMER-LAST-NAME (4:)

    References all character positions of CUSTOMER-LAST-NAME from the fourth onward.

  • FUNCTION CURRENT-DATE (5:2)

    References the current month as a 2-digit number in character form. See 8.1.17 CURRENT-DATE, for more information.

  • Hex-Digits (Nibble + 1:1)

    Assuming that Nibble is a numeric data item with a value in the range 0-15, and Hex-Digits is a PIC X(16) item with a value of 0123456789ABCDEF, this converts that numeric value to a hexadecimal digit.

  • Table-Entry (6) (7:5)

    References characters 7 through 11 (5 characters in total) in the 6th occurrence of Table-Entry.

Reference modification may be used anywhere an identifier is legal, including serving as the receiving field of statements like MOVE ( 7.8.28 MOVE), STRING ( 7.8.45 STRING) and ACCEPT ( 7.8.1 ACCEPT), to name a few.
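
A minimal sketch of several of the cases above, with a hypothetical program name and sample values chosen only for illustration:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFMODDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CUSTOMER-LAST-NAME   PIC X(10) VALUE "ROBINSON".
       01  HEX-DIGITS           PIC X(16) VALUE "0123456789ABCDEF".
       01  NIBBLE               PIC 99    VALUE 11.
       PROCEDURE DIVISION.
      *> First three characters: ROB.
           DISPLAY CUSTOMER-LAST-NAME (1:3)
      *> Position NIBBLE + 1 = 12 holds the hexadecimal digit B.
           DISPLAY HEX-DIGITS (NIBBLE + 1:1)
      *> A reference modifier may also name a receiving field.
           MOVE "XX" TO CUSTOMER-LAST-NAME (4:2)
           DISPLAY CUSTOMER-LAST-NAME
           STOP RUN.
```

After the MOVE, only character positions 4 and 5 of CUSTOMER-LAST-NAME are changed; the rest of the item is left untouched.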

2.2.3 Arithmetic Expressions

Arithmetic-Expression Syntax

Unary-Expression-1 { **|^ } Unary-Expression-2
                   {  *|/ }
                   {  +|- }

Unary-Expression Syntax

{ [ +|- ] { ( Arithmetic-Expression-1 )          } }
{         { [ LENGTH OF ] { identifier-1       } } }
{         {   ~~~~~~ ~~   { literal-1          } } }
{         {               { Function-Reference } } }
{ Arithmetic-Expression-2                          }

Arithmetic expressions are formed using four categories of operations — exponentiation, multiplication & division, addition & subtraction, and sign specification.

In complex expressions composed of multiple operators and operands, a precedence of operation applies whereby those operations having a higher precedence are computed first before operations with a lower precedence.

As is the case in almost any other programming language, the programmer is always free to use pairs of parentheses to enclose sub-expressions of complex expressions that are to be evaluated before other sub-expressions rather than let operator precedence dictate the sequence of evaluation.

In highest to lowest order of precedence, here is a discussion of each category of operation:

  • Level 1 (Highest) — Unary Sign Specification (+ and - with a single argument)

    The unary “minus” (-) operator returns the arithmetic negation of its single argument, effectively returning as its value the product of its argument and -1.

    The unary “plus” (+) operator returns the value of its single argument, effectively returning as its value the product of its argument and +1.

  • Level 2 — Exponentiation (** or ^)

    The value of the left argument is raised to the power indicated by the right argument. Non-integer powers are allowed. The ^ and ** operators are both supported to provide compatibility with programs written for other COBOL implementations.

  • Level 3 — Multiplication (*) and division (/)

    The * operator computes the product of the left and right arguments while the / operator computes the value of the left argument divided by the value of the right argument. If the right argument has a value of zero, expression evaluation will be prematurely terminated before a value is generated. This may cause program failure at run-time.

    A sequence of multiple 3rd-level operations (A * B / C, for example) will evaluate in strict left-to-right sequence if no parentheses are used to control the order of evaluation.

  • Level 4 — Addition (+) or subtraction (-)

    The + operator calculates the sum of the left and right arguments while the - operator computes the value of the right argument subtracted from that of the left argument.

    A sequence of multiple 4th-level operations (A - B + C, for example) will evaluate in strict left-to-right sequence if no parentheses are used to control the order of evaluation.

The syntactical rules of COBOL, allowing a dash (-) character in data item names, can lead to some ambiguity. Consider the following:

01  C        PIC 9 VALUE 5.
01  D        PIC 9 VALUE 2.
01  C-D      PIC 9 VALUE 7.
01  I        PIC 9 VALUE 0.
...
COMPUTE I=C-D+1

The COMPUTE ( 7.8.9 COMPUTE) statement will evaluate the arithmetic expression C-D+1 and then save that result in I.

What value will be stored in I? The number 4, which is the result of subtracting the value of D (2) from the value of C (5) and then adding 1? Or, will it be the number 8, which is the value of adding 1 to the value of data item C-D (7)?

The right answer is 8 — the value of data item C-D plus 1! Hopefully, that was the intended result.

The GnuCOBOL compiler actually went through the following decision-making logic when generating code for the COMPUTE Statement:

  1. Is there a data item named C-D defined? If so, use its value for the character sequence C-D.

  2. If there is no C-D data item, then are there C and D data items? If not, the COMPUTE statement is in error. If there are, however, then code will be generated to subtract the value of D from C and add 1 to the result.

Had there been at least one space to the left and/or the right of the -, there would have been no ambiguity — the compiler would have been forced to use the individual C and D data items.

To avoid any possible ambiguity, as well as to improve program readability, it’s considered good COBOL programming practice to always code at least one space to both the left and right of every operator in arithmetic expressions as well as the = sign on a COMPUTE.
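
The effect of the spacing can be sketched by evaluating both readings side by side. This minimal program reuses the data items from the example above; the program name is hypothetical:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DASHDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  C        PIC 9 VALUE 5.
       01  D        PIC 9 VALUE 2.
       01  C-D      PIC 9 VALUE 7.
       01  I        PIC 9 VALUE 0.
       PROCEDURE DIVISION.
      *> No spaces around the dash: the data item C-D (7) plus 1.
           COMPUTE I = C-D + 1
           DISPLAY "C-D + 1   = " I
      *> Spaces around the dash: C (5) minus D (2) plus 1.
           COMPUTE I = C - D + 1
           DISPLAY "C - D + 1 = " I
           STOP RUN.
```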

Here are some examples of how the precedence of operations affects the results of arithmetic expressions (all examples use numeric literals, to simplify the discussion).

  • 3 * 4 + 1

    Result: 13. * has precedence over +.

  • 4 * 2 ^ 3 - 10

    Result: 22. 2 ^ 3 is 8 (^ has precedence over *), times 4 is 32, minus 10 is 22.

  • (4 * 2) ^ 3 - 10

    Result: 502. Parentheses provide for a recursive application of the arithmetic expression rules, effectively allowing you to alter the precedence of operations. 4 times 2 is 8 (the use of parentheses “trumps” the exponentiation operator, so the multiplication happens first); 8 ^ 3 is 512, minus 10 is 502.

  • 5 / 2.5 + 7 * 2 - 1.15

    Result: 14.85. Integer and non-integer operands may be freely intermixed.

Of course, arithmetic expression operands may be numeric data items (any USAGE except POINTER or PROGRAM-POINTER) as well as numeric literals.
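
The precedence examples above can be checked with a minimal sketch such as the following. The names are hypothetical, and ** is used for exponentiation, which the text describes as equivalent to ^:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PRECDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RESULT    PIC S9(4)V99.
       01  WS-SHOW      PIC -(4)9.99.
       PROCEDURE DIVISION.
           COMPUTE WS-RESULT = 3 * 4 + 1
           PERFORM SHOW-RESULT
           COMPUTE WS-RESULT = 4 * 2 ** 3 - 10
           PERFORM SHOW-RESULT
           COMPUTE WS-RESULT = (4 * 2) ** 3 - 10
           PERFORM SHOW-RESULT
           COMPUTE WS-RESULT = 5 / 2.5 + 7 * 2 - 1.15
           PERFORM SHOW-RESULT
           STOP RUN.
       SHOW-RESULT.
      *> Move to an edited item so the decimal point is displayed.
           MOVE WS-RESULT TO WS-SHOW
           DISPLAY WS-SHOW.
```

The four values displayed should correspond to the results listed above: 13, 22, 502 and 14.85.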

2.2.4 Conditional Expressions

Conditional expressions are expressions which identify the circumstances under which a program may take an action or cease taking an action. As such, conditional expressions produce a value of TRUE or FALSE.

There are seven types of conditional expressions, as discussed in the following sections.

2.2.5 Condition Names

These are the simplest of all conditions. Observe the following code:

05  SHIRT-SIZE               PIC 99V9.
    88 TINY                  VALUE 0 THRU 12.5.
    88 XS                    VALUE 13 THRU 13.5.
    88 S                     VALUE 14, 14.5.
    88 M                     VALUE 15, 15.5.
    88 L                     VALUE 16, 16.5.
    88 XL                    VALUE 17, 17.5.
    88 XXL                   VALUE 18, 18.5.
    88 XXXL                  VALUE 19, 19.5.
    88 VERY-LARGE            VALUE 20 THRU 99.9.

The condition names TINY, XS, S, M, L, XL, XXL, XXXL and VERY-LARGE will have TRUE or FALSE values based upon the values within their parent data item (SHIRT-SIZE).

A program wanting to test whether or not the current SHIRT-SIZE value can be classified as XL could have that decision coded as a combined condition (the most complex type of conditional expression), as either:

IF SHIRT-SIZE = 17 OR SHIRT-SIZE = 17.5

- or -

IF SHIRT-SIZE = 17 OR 17.5

Or it could simply utilize the condition name XL as follows:

IF XL
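
A minimal runnable sketch of condition names, using a shortened version of the SHIRT-SIZE definition above (the program name is hypothetical):

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIZEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SHIRT-SIZE               PIC 99V9.
           88 XL                    VALUE 17, 17.5.
           88 VERY-LARGE            VALUE 20 THRU 99.9.
       PROCEDURE DIVISION.
           MOVE 17.5 TO SHIRT-SIZE
      *> XL is TRUE because SHIRT-SIZE holds one of its values.
           IF XL
               DISPLAY "Size is XL"
           END-IF
           MOVE 42 TO SHIRT-SIZE
      *> 42 falls inside the 20 THRU 99.9 range.
           IF VERY-LARGE
               DISPLAY "Size is VERY-LARGE"
           END-IF
           STOP RUN.
```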

2.2.6 Class Conditions

Class-Condition Syntax

identifier-1 IS [ NOT ] { NUMERIC          }
                  ~~~   { ~~~~~~~          }
                        { ALPHABETIC       }
                        { ~~~~~~~~~~       }
                        { ALPHABETIC-LOWER }
                        { ~~~~~~~~~~~~~~~~ }
                        { ALPHABETIC-UPPER }
                        { ~~~~~~~~~~~~~~~~ }
                        { OMITTED          }
                        { ~~~~~~~          }
                        { class-name-1     }

Class conditions evaluate the type of data that is currently stored in a data item.

  1. The NUMERIC class test considers only the characters ‘0‘, ‘1‘, … , ‘9‘ to be numeric; only a data item containing nothing but digits will pass a NUMERIC class test. Spaces, decimal points, commas, currency signs, plus signs, minus signs and any other characters except the digit characters will all fail NUMERIC class tests.

  2. The ALPHABETIC class test considers only upper-case letters, lower-case letters and spaces to be alphabetic in nature.

  3. The ALPHABETIC-LOWER and ALPHABETIC-UPPER class conditions consider only spaces and the respective type of letters to be acceptable in order to pass such a class test.

  4. The NOT option reverses the TRUE/FALSE value of the condition.

  5. Note that what constitutes a “letter” (or upper/lower case too, for that matter) may be influenced through the use of CHARACTER CLASSIFICATION specifications in the OBJECT-COMPUTER ( 5.1.2 OBJECT-COMPUTER) paragraph.

  6. Only data items whose USAGE ( 6.9.61 USAGE) is either explicitly or implicitly defined as DISPLAY may be used in NUMERIC or any of the ALPHABETIC class conditions.

  7. Some COBOL implementations disallow the use of group items or PIC A items with NUMERIC class conditions and the use of PIC 9 items with ALPHABETIC class conditions. GnuCOBOL has no such restrictions.

  8. The OMITTED class condition is used when it is necessary for a subprogram to determine whether or not a particular argument was passed to it. In such class conditions, <identifier-1> must be a linkage section item defined on the USING clause of the subprogram’s PROCEDURE DIVISION header. See 7.1 PROCEDURE DIVISION USING, for additional information.

The <class-name-1> option allows you to test for a user-defined class. Here’s an example. First, assume the following SPECIAL-NAMES ( 5.1.3 SPECIAL-NAMES) definition of the user-defined class ‘Hexadecimal‘:

SPECIAL-NAMES.
    CLASS Hexadecimal IS '0' THRU '9', 'A' THRU 'F', 'a' THRU 'f'.

Now observe the following code, which will execute the 150-Process-Hex-Value procedure if Entered-Value contains nothing but valid hexadecimal digits:

IF Entered-Value IS Hexadecimal
    PERFORM 150-Process-Hex-Value
END-IF
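
The built-in class tests described earlier in this section can be sketched in the same way. The program and data names below are hypothetical:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLASSDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DIGITS     PIC X(5)  VALUE "12345".
       01  WS-MIXED      PIC X(5)  VALUE "12 45".
       01  WS-WORDS      PIC X(5)  VALUE "Ab cd".
       PROCEDURE DIVISION.
           IF WS-DIGITS IS NUMERIC
               DISPLAY "WS-DIGITS is numeric"
           END-IF
      *> The embedded space makes this item fail the NUMERIC test.
           IF WS-MIXED IS NOT NUMERIC
               DISPLAY "WS-MIXED is not numeric"
           END-IF
      *> Letters and spaces together still pass ALPHABETIC.
           IF WS-WORDS IS ALPHABETIC
               DISPLAY "WS-WORDS is alphabetic"
           END-IF
           STOP RUN.
```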

2.2.7 Sign Conditions

Sign-Condition Syntax

identifier-1 IS [ NOT ] { POSITIVE }
                  ~~~   { ~~~~~~~~ }
                        { NEGATIVE }
                        { ~~~~~~~~ }
                        { ZERO     }
                        { ~~~~     }

Sign conditions evaluate the numeric state of a data item defined with a PICTURE ( 6.9.37 PICTURE) and/or USAGE ( 6.9.61 USAGE) that supports numeric values.

  1. A POSITIVE or NEGATIVE sign condition will be TRUE only if the value of <identifier-1> is strictly greater than or less than zero, respectively.

  2. A ZERO sign condition will be TRUE only if the value of <identifier-1> is exactly zero.

  3. The NOT option reverses the TRUE/FALSE value of the condition.
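
The rules above can be sketched as follows, assuming a hypothetical signed item WS-BALANCE:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIGNDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-BALANCE    PIC S9(5)V99  VALUE -12.50.
       PROCEDURE DIVISION.
           IF WS-BALANCE IS NEGATIVE
               DISPLAY "Balance is negative"
           END-IF
           MOVE ZERO TO WS-BALANCE
      *> Zero is neither POSITIVE nor NEGATIVE.
           IF WS-BALANCE IS NOT POSITIVE
              AND WS-BALANCE IS NOT NEGATIVE
               DISPLAY "Balance is neither positive nor negative"
           END-IF
           IF WS-BALANCE IS ZERO
               DISPLAY "Balance is zero"
           END-IF
           STOP RUN.
```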

2.2.8 Switch-Status Conditions

In the SPECIAL-NAMES paragraph, an external switch name can be associated with one or more condition names. These condition names may then be used to test the ON/OFF status of the external switch.

Here are the relevant sections of code in a program named testprog, which is designed to simply announce if SWITCH-1 is on:

...
ENVIRONMENT DIVISION.
SPECIAL-NAMES.
    SWITCH-1 ON STATUS IS Switch-1-Is-ON.
...
PROCEDURE DIVISION.
...
    IF Switch-1-Is-ON
        DISPLAY "Switch 1 Is On"
    END-IF
...

The following are two different command window sessions — the left on a Unix/Cygwin/OSX system and the right on a Windows system — that will set the switch on and then execute the testprog program. Notice how the message indicating that the program detected the switch was set is displayed in both examples:

$ COB_SWITCH_1=ON           C:>SET COB_SWITCH_1=ON
$ export COB_SWITCH_1       C:>testprog
$ ./testprog                Switch 1 Is On
Switch 1 Is On              C:>
$

2.2.9 Relation Conditions

Relation-Condition Syntax

{ identifier-1            } IS [ NOT ] RelOp { identifier-2            }
{ literal-1               }      ~~~         { literal-2               }
{ arithmetic-expression-1 }                  { arithmetic-expression-2 }
{ index-name-1            }                  { index-name-2            }

RelOp Syntax

{ EQUAL TO                 }
{ ~~~~~                    }
{ EQUALS                   }
{ ~~~~~~                   }
{ GREATER THAN             }
{ ~~~~~~~                  }
{ GREATER THAN OR EQUAL TO }
{ ~~~~~~~      ~~ ~~~~~    }
{ LESS THAN                }
{ ~~~~                     }
{ LESS THAN OR EQUAL TO    }
{ ~~~~      ~~ ~~~~~       }
{ =                        }
{ >                        }
{ >=                       }
{ <                        }
{ <=                       }

These conditions evaluate how two different values “relate” to each other.

  1. When comparing one numeric value to another, the USAGE ( 6.9.61 USAGE) and number of significant digits in either value are irrelevant as the comparison is performed using the actual algebraic values.

  2. When comparing strings, the comparison is made based upon the program’s collating sequence. When the two string arguments are of unequal length, the shorter is assumed to be padded (on the right) with a sufficient number of spaces as to make the two strings of equal length. String comparisons take place on a corresponding character-by-character basis, left to right, until the TRUE/FALSE value for the relation test can be established. Characters are compared according to their relative position in the program’s COLLATING SEQUENCE (as defined in SPECIAL-NAMES ( 5.1.3 SPECIAL-NAMES)), not according to the bit-pattern values the characters have in storage.

  3. By default, the program’s COLLATING SEQUENCE will, however, be based entirely on the bit-pattern values of the various characters.

  4. There is no functional difference between using the wordy version (IS EQUAL TO, IS LESS THAN, …) versus the symbolic version (=, <, …) of the actual relation operators.
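
A minimal sketch of the first two rules above, using hypothetical data items of deliberately different lengths and USAGEs:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RELDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SHORT      PIC X(3)              VALUE "ABC".
       01  WS-LONG       PIC X(6)              VALUE "ABC".
       01  WS-DISPLAY-N  PIC 9(3)              VALUE 5.
       01  WS-PACKED-N   PIC S9(5)V99 COMP-3   VALUE 5.00.
       PROCEDURE DIVISION.
      *> WS-SHORT is treated as if padded with spaces to 6 characters.
           IF WS-SHORT = WS-LONG
               DISPLAY "Strings are equal"
           END-IF
      *> Numeric comparison uses algebraic values, not USAGE or size.
           IF WS-DISPLAY-N IS EQUAL TO WS-PACKED-N
               DISPLAY "Numbers are equal"
           END-IF
      *> The wordy and symbolic operators are interchangeable.
           IF WS-DISPLAY-N IS LESS THAN 10 AND WS-DISPLAY-N < 10
               DISPLAY "Both forms agree"
           END-IF
           STOP RUN.
```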

2.2.10 Combined Conditions

Combined Condition Syntax

[ ( ] Condition-1 [ ) ] { AND } [ ( ] Condition-2 [ ) ]
                        { ~~~ }
                        { OR  }
                        { ~~  }

A combined condition is one that computes a TRUE/FALSE value from the TRUE/FALSE values of two other conditions (which could themselves be combined conditions).

  1. If either condition has a value of TRUE, ORing the two together yields a value of TRUE. ORing two FALSE conditions yields a value of FALSE.

  2. In order for AND to yield a value of TRUE, both conditions must have a value of TRUE. In all other circumstances, AND produces a FALSE value.

  3. When chaining multiple, similar conditions together with the same operator (OR/AND), and left or right arguments have common subjects, it is possible to abbreviate the program code. For example:

    IF ACCOUNT-STATUS = 1 OR ACCOUNT-STATUS = 2 OR ACCOUNT-STATUS = 7
    

    Could be abbreviated as:

    IF ACCOUNT-STATUS = 1 OR 2 OR 7
    
  4. Just as multiplication takes precedence over addition in arithmetic expressions, so does AND take precedence over OR in combined conditions. Use parentheses to change this precedence, if necessary. For example:

    • FALSE AND FALSE OR TRUE AND TRUE

      Evaluates to TRUE

    • (FALSE AND FALSE) OR (TRUE AND TRUE)

      Evaluates to TRUE (since AND has precedence over OR) - this is identical to the previous example

    • (FALSE AND (FALSE OR TRUE)) AND TRUE

      Evaluates to FALSE
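
The abbreviation and precedence rules above can be sketched with real data items. The names and values are hypothetical, chosen so that the two groupings of the same three tests give different answers:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMBDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ACCOUNT-STATUS   PIC 9      VALUE 1.
       01  WS-BALANCE       PIC S9(5)  VALUE 250.
       PROCEDURE DIVISION.
      *> Abbreviated form of three "ACCOUNT-STATUS = n" tests.
           IF ACCOUNT-STATUS = 1 OR 2 OR 7
               DISPLAY "Status is 1, 2 or 7"
           END-IF
      *> AND binds tighter than OR: A OR (B AND C), which is TRUE.
           IF ACCOUNT-STATUS = 1 OR ACCOUNT-STATUS = 2
              AND WS-BALANCE > 1000
               DISPLAY "Without parentheses: TRUE"
           END-IF
      *> (A OR B) AND C is FALSE because the balance is only 250.
           IF (ACCOUNT-STATUS = 1 OR ACCOUNT-STATUS = 2)
              AND WS-BALANCE > 1000
               DISPLAY "With parentheses: TRUE"
           ELSE
               DISPLAY "With parentheses: FALSE"
           END-IF
           STOP RUN.
```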

2.2.11 Negated Conditions

Negated Condition Syntax

NOT Condition-1
~~~

A condition may be negated by prefixing it with the NOT operator.

  1. The NOT operator has the highest precedence of all logical operators, just as a unary minus sign (which “negates” a numeric value) is the highest precedence arithmetic operator.

  2. Parentheses must be used to explicitly signify the sequence in which conditions are evaluated and processed if the default precedence isn’t desired. For example:

    • NOT TRUE AND FALSE AND NOT FALSE

      Evaluates to FALSE AND FALSE AND TRUE which evaluates to FALSE

    • NOT (TRUE AND FALSE AND NOT FALSE)

      Evaluates to NOT (FALSE) which evaluates to TRUE

    • NOT TRUE AND (FALSE AND NOT FALSE)

      Evaluates to FALSE AND (FALSE AND TRUE) which evaluates to FALSE
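
A minimal sketch of how the placement of parentheses changes what NOT applies to, assuming a hypothetical one-character item WS-CODE:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NOTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CODE    PIC X  VALUE "B".
       PROCEDURE DIVISION.
      *> NOT applies only to the first relation: (NOT A) AND B.
           IF NOT WS-CODE = "A" AND WS-CODE = "B"
               DISPLAY "First test is TRUE"
           END-IF
      *> Parentheses make NOT apply to the whole combined condition.
           IF NOT (WS-CODE = "A" OR WS-CODE = "B")
               DISPLAY "Second test is TRUE"
           ELSE
               DISPLAY "Second test is FALSE"
           END-IF
           STOP RUN.
```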

2.2.12 Use of Periods

All COBOL implementations distinguish between sentences and statements in the procedure division. A Statement is a single executable COBOL instruction. For example, these are all statements:

MOVE SPACES TO Employee-Address
ADD 1 TO Record-Counter
DISPLAY "Record-Counter=" Record-Counter

Some COBOL statements have a scope of applicability associated with them where one or more other statements can be considered to be part of or related to the statement in question. An example of such a situation might be the following, where the interest on a loan is being calculated and displayed at 4% interest if the loan balance is under $10,000, and 4.5% otherwise. (WARNING: the following code has an error!):

IF Loan-Balance < 10000
    MULTIPLY Loan-Balance BY 0.04 GIVING Interest
ELSE
    MULTIPLY Loan-Balance BY 0.045 GIVING Interest
DISPLAY "Interest Amount = " Interest

In this example, the IF statement actually has a scope that can include two sets of associated statements: one set to be executed when the IF ( 7.8.23 IF) condition is TRUE, and another if it is FALSE.

Unfortunately, there’s a problem with the above. A human being looking at that code would probably infer that the DISPLAY ( 7.8.12 DISPLAY) statement, because of its lack of indentation, is to be executed regardless of the TRUE/FALSE value of the IF condition. Unfortunately, the compiler (any COBOL compiler) won’t see it that way because it really couldn’t care less what sort of indentation, if any, is used. In fact, any COBOL compiler would be just as happy to see the code written like this:

IF Loan-Balance < 10000 MULTIPLY Loan-balance
BY 0.04 GIVING Interest ELSE MULTIPLY
Loan-Balance BY 0.045 GIVING Interest DISPLAY
"Interest Amount = " Interest

How then do we inform the compiler that the DISPLAY statement is outside the scope of the IF?

That’s where sentences come in.

A COBOL Sentence is defined as any arbitrarily long sequence of statements, followed by a period (.) character. The period character is what terminates the scope of a set of statements. Therefore, our example should have been coded like this:

IF Loan-Balance < 10000
    MULTIPLY Loan-Balance BY 0.04 GIVING Interest
ELSE
    MULTIPLY Loan-Balance BY 0.045 GIVING Interest.
DISPLAY "Interest Amount = " Interest

See the period at the end of the second MULTIPLY ( 7.8.29 MULTIPLY)? That is what terminates the scope of the IF, thus making the DISPLAY statement’s execution completely independent of the TRUE/FALSE status of the IF.

2.2.13 Use of VERB/END-VERB Constructs

Prior to the 1985 COBOL standard, using a period character was the only way to signal the end of a statement’s scope.

Unfortunately, this caused some problems. Take a look at this code:

IF A = 1
    IF B = 1
        DISPLAY "A & B = 1"
ELSE *> This ELSE has a problem!
    IF B = 1
        DISPLAY "A NOT = 1 BUT B = 1"
    ELSE
        DISPLAY "NEITHER A NOR B = 1".

The problem with this code is that indentation, so critical to improving the human-readability of a program, can provide an erroneous view of the logical flow. An ELSE is always associated with the most-recently encountered IF; this means the ELSE flagged by the comment will be associated with the IF B = 1 statement, not the IF A = 1 statement as the indentation would appear to imply.

This sort of problem led to a band-aid solution being added to the COBOL language: the NEXT SENTENCE clause:

IF A = 1
    IF B = 1
        DISPLAY "A & B = 1"
    ELSE
        NEXT SENTENCE
ELSE
    IF B = 1
        DISPLAY "A NOT = 1 BUT B = 1"
    ELSE
        DISPLAY "NEITHER A NOR B = 1".

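NEXT SENTENCE informs the compiler that if the B = 1 condition is false, control should fall into the first statement that follows the next period. Because the inner IF now has an ELSE of its own, the second ELSE is associated with the IF A = 1 statement, as intended.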

With the 1985 standard for COBOL, a much more elegant solution was introduced. Any COBOL Verb (the first reserved word of a statement) that needed such a thing was allowed to use an END-<verb> construct to end its scope without disrupting the scope of any other statement it might have been in. Any COBOL 85 compiler would have allowed the following solution to our problem:

IF A = 1
    IF B = 1
        DISPLAY "A & B = 1"
    END-IF
ELSE
    IF B = 1
        DISPLAY "A NOT = 1 BUT B = 1"
    ELSE
        DISPLAY "NEITHER A NOR B = 1".

This new facility made the period almost obsolete, as our program segment would probably be coded like this today:

IF A = 1
    IF B = 1
        DISPLAY "A & B = 1"
    END-IF
ELSE
    IF B = 1
        DISPLAY "A NOT = 1 BUT B = 1"
    ELSE
        DISPLAY "NEITHER A NOR B = 1"
    END-IF
END-IF

COBOL (GnuCOBOL included) still requires that each procedure division paragraph contain at least one sentence if there is any executable code in that paragraph, but a popular coding style is now to simply code a single period right before the end of each paragraph.
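A minimal sketch of that coding style follows. The program name ScopeDemo, the paragraph names and the initial values are assumptions for illustration; every statement is ended by its own scope terminator, and each paragraph carries a single period on a line of its own.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. ScopeDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  A  PIC 9 VALUE 2.
01  B  PIC 9 VALUE 1.
PROCEDURE DIVISION.
Main-Para.
    PERFORM Test-Values
    STOP RUN
    .

Test-Values.
    IF A = 1
        IF B = 1
            DISPLAY "A & B = 1"
        END-IF
    ELSE
        IF B = 1
            DISPLAY "A NOT = 1 BUT B = 1"
        ELSE
            DISPLAY "NEITHER A NOR B = 1"
        END-IF
    END-IF
    *> Still part of the same sentence, yet outside every IF
    *> because each IF was closed by its own END-IF.
    DISPLAY "Test complete"
    .
```

Placing the lone period on its own line makes it easy to add or remove statements at the end of a paragraph without accidentally changing where a sentence ends.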

The COBOL standard defines the various END-<verb> scope terminators as optional, because using a period as a scope terminator remains legal.

If you will be porting existing code over to GnuCOBOL, you’ll find it an accommodating facility capable of conforming to whatever language and coding standards that code is likely to use. If you are creating new GnuCOBOL programs, however, I would strongly counsel you to use the END-<verb> structures in those programs.

2.2.14 Concurrent Access to Files

The manipulation of data files is one of the COBOL language’s great strengths. There are features built into COBOL to deal with the possibility that multiple programs may be attempting to access the same file concurrently. Concurrent access by multiple programs is dealt with in two ways: file sharing and record locking.

Not all GnuCOBOL implementations support file sharing and record-locking options. Whether they do or not depends upon the operating system they were built for and the build options that were used when the specific GnuCOBOL implementation was generated.

2.2.15 File Sharing

GnuCOBOL controls concurrent file access at the highest level through the concept of file sharing, enforced when a program attempts to open a file. This is accomplished via a UNIX operating-system routine called fcntl. That routine is not currently supported by Windows and is not present in the MinGW Unix-emulation package. GnuCOBOL builds created using a MinGW environment will therefore be incapable of supporting file-sharing controls; files will always be shared in such environments. A GnuCOBOL build created using the Cygwin environment on Windows would have access to fcntl and therefore will support file sharing. Of course, actual Unix builds of GnuCOBOL, as well as OSX builds, should have no issues because fcntl should be available.

Any limitations imposed on a successful OPEN ( 7.8.30 OPEN) will remain in place until your program either issues a CLOSE ( 7.8.7 CLOSE) against the file or the program terminates.

File sharing is controlled through the use of a SHARING clause:

SHARING WITH { ALL OTHER }
~~~~~~~      { ~~~       }
             { NO OTHER  }
             { ~~        }
             { READ ONLY }
               ~~~~ ~~~~

This clause may be used in the file’s SELECT statement ( 5.2.1 SELECT), on the OPEN statement ( 7.8.30 OPEN) which initiates your program’s use of the file, or in both places. If a SHARING option is specified in both places, the specification made on the OPEN statement will take precedence over the one from the SELECT statement.

Here are the meanings of the three options:

  • ALL OTHER

    When your program opens a file with this sharing option in effect, no restrictions will be placed on other programs attempting to OPEN the file after your program did. This is the default sharing mode.

  • NO OTHER

    When your program opens a file with this sharing option in effect, your program announces that it is unwilling to allow any other program to have any access to the file as long as you are using that file; OPEN attempts made in other programs will fail with a file status of 37 (PERMISSION DENIED) until such time as you CLOSE ( 7.8.7 CLOSE) the file.

  • READ ONLY

    Opening a file with this sharing option indicates you are willing to allow other programs to OPEN the file for input while you have it open. If they attempt any other OPEN, theirs will fail with a file status of 37. Of course, your program may fail if someone else got to the file first and opened it with a sharing option that imposed file-sharing limitations.

If the SELECT of a file is coded with a FILE STATUS clause, OPEN failures — including those induced by sharing failures — will be detectable by the program and a graceful recovery (or at least a graceful termination) will be possible. If no such clause was coded, however, a runtime message will be issued and the program will be terminated.
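The following sketch combines the SHARING and FILE STATUS clauses described above. The program name ShareDemo, the file name loans.txt and all data names are hypothetical. The status value 37 is the one quoted above for sharing failures, and the program assumes a GnuCOBOL build that supports file sharing.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. ShareDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    *> Sharing mode requested at the SELECT level.
    SELECT Loan-File ASSIGN TO "loans.txt"
        ORGANIZATION IS LINE SEQUENTIAL
        SHARING WITH READ ONLY
        FILE STATUS IS Loan-Status.
DATA DIVISION.
FILE SECTION.
FD  Loan-File.
01  Loan-Line  PIC X(80).
WORKING-STORAGE SECTION.
01  Loan-Status  PIC XX.
PROCEDURE DIVISION.
Main-Para.
    *> The SHARING phrase on the OPEN takes precedence over
    *> the one coded on the SELECT.
    OPEN INPUT SHARING WITH NO OTHER Loan-File
    EVALUATE Loan-Status
        WHEN "00"
            DISPLAY "File opened; no other program may open it"
            CLOSE Loan-File
        WHEN "37"
            DISPLAY "OPEN refused, file status 37"
        WHEN OTHER
            DISPLAY "OPEN failed, file status " Loan-Status
    END-EVALUATE
    STOP RUN
    .
```

Because a FILE STATUS clause is coded, a failed OPEN hands control back to the program, which can report the problem and end cleanly rather than being terminated with a runtime message.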

2.2.16 Record Locking

Record locking is supported by advanced file-management software built into the GnuCOBOL implementation you are using. This software provides a single point of control for access to files, usually ORGANIZATION INDEXED files. One such runtime package capable of doing this is the Berkeley Database (BDB) package, which is frequently used in GnuCOBOL builds to support indexed files.

The various I/O statements your program can execute are capable of imposing limitations on access by other concurrently-executing programs to the file record they just accessed. These limitations are syntactically imposed by placing a lock on the record using a LOCK clause. Other records in the file remain available, assuming that file-sharing limitations imposed at the time the file was opened didn’t prevent access to the entire file.

  1. If the GnuCOBOL build you are using was configured to use the Berkeley Database (BDB) package for indexed file I/O, record locking will be available by using the run-time environment variable.

  2. If the SELECT ( 5.2.1 SELECT) statement or file OPEN ( 7.8.30 OPEN) specifies SHARING WITH NO OTHER, record locking will be disabled.

  3. If the file’s SELECT contains a LOCK MODE IS AUTOMATIC clause, every time a record is read from the file, that record is automatically locked. Other programs may access other records within the file, but not a locked record.

  4. If the file’s SELECT contains a LOCK MODE IS MANUAL clause, locks are placed on records only when a READ statement executed against the file includes a LOCK clause (this clause will be discussed shortly).

  5. If the LOCK ON clause is specified in the file’s SELECT, locks (either automatically or manually acquired) will continue to accumulate as more and more records are read, until they are explicitly released. This is referred to as multiple record locking.

    Locks acquired via multiple record locking remain in effect until the program holding the lock…

    • …terminates, or …

    • …executes a CLOSE statement ( 7.8.7 CLOSE) against the file, or …

    • …executes an UNLOCK statement ( 7.8.50 UNLOCK) against the file, or …

    • …executes a COMMIT statement ( 7.8.8 COMMIT) or …

    • …executes a ROLLBACK statement ( 7.8.38 ROLLBACK).

  6. If the LOCK ON clause is not specified, then the next I/O statement your program executes, except for START ( 7.8.43 START), will release the lock. This is referred to as single record locking.

  7. A LOCK clause, which may be coded on a READ ( 7.8.32 READ), REWRITE ( 7.8.37 REWRITE) or WRITE statement ( 7.8.52 WRITE), looks like this:

    { IGNORING LOCK    }
    { ~~~~~~~~ ~~~~    }
    { WITH [ NO ] LOCK }
    {        ~~   ~~~~ }
    { WITH KEPT LOCK   }
    {      ~~~~ ~~~~   }
    { WITH IGNORE LOCK }
    {      ~~~~~~ ~~~~ }
    { WITH WAIT        }
           ~~~~

    The WITH [ NO ] LOCK option is the only one available to REWRITE or WRITE statements.

    The meanings of the various record locking options are as follows:

    • IGNORING LOCK

    • WITH IGNORE LOCK

      These options (which are synonymous) inform GnuCOBOL that any locks held by other programs should be ignored.

    • WITH LOCK

      Access to the record by other programs will be denied.

    • WITH NO LOCK

      The record will not be locked. This is the default for all statements.

    • WITH KEPT LOCK

      When single record locking is in effect, as a new record is accessed, locks held for previous records are released. By using this option, not only is the newly accessed record locked (as WITH LOCK would do), but prior record locks will be retained as well. A subsequent READ without the KEPT LOCK option will release all “kept” locks, as will the UNLOCK statement.

    • WITH WAIT

      This option informs GnuCOBOL that the program is willing to wait for a lock held (by another program) on the record being read to be released.

      Without this option, an attempt to read a locked record will be immediately aborted and a file status of 51 will be returned.

      With this option, the program will wait for a preconfigured time for the lock to be released. If the lock is released within the preconfigured wait time, the read will be successful. If the preconfigured wait time expires before the lock is released, the read attempt will be aborted and a 51 file status will be issued.
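The record-locking rules above can be sketched as a short update program using manual, single record locking. The program name LockDemo, the file name customers.dat, the record layout and the key value are all hypothetical. The sketch assumes a GnuCOBOL build with indexed-file and record-locking support, and an existing file containing the record being read.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. LockDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT Cust-File ASSIGN TO "customers.dat"
        ORGANIZATION IS INDEXED
        ACCESS MODE IS RANDOM
        RECORD KEY IS Cust-Id
        LOCK MODE IS MANUAL
        FILE STATUS IS Cust-Status.
DATA DIVISION.
FILE SECTION.
FD  Cust-File.
01  Cust-Record.
    05  Cust-Id       PIC 9(6).
    05  Cust-Balance  PIC 9(7)V99.
WORKING-STORAGE SECTION.
01  Cust-Status  PIC XX.
PROCEDURE DIVISION.
Main-Para.
    OPEN I-O Cust-File
    IF Cust-Status NOT = "00"
        DISPLAY "OPEN failed, file status " Cust-Status
        STOP RUN
    END-IF
    MOVE 100001 TO Cust-Id
    *> With LOCK MODE IS MANUAL, a lock is taken only because
    *> this READ asks for one.
    READ Cust-File WITH LOCK
    EVALUATE Cust-Status
        WHEN "00"
            ADD 50 TO Cust-Balance
            REWRITE Cust-Record
            UNLOCK Cust-File
        WHEN "51"
            DISPLAY "Record is locked by another program"
        WHEN OTHER
            DISPLAY "READ failed, file status " Cust-Status
    END-EVALUATE
    CLOSE Cust-File
    STOP RUN
    .
```

Coding WITH WAIT on the READ instead would make the program wait for the preconfigured time before file status 51 is returned.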