13 Programming for XFD

This chapter deals with the initial support for ODBC and OCI as file-handlers (so far PostgrSQL, MySQL, SQLite, MSSQL) and OCI, along with new directory COB_SCHEMA_DIR containing the necessary internal schema files to match the file definition to the database table all though usage of $XDF.

13.1 GnuCobol use SQL for files

The ACUCOBOL compiler has a defined syntax for describing records that can be stored in SQL tables. This is referred to as XFD (extended file descriptor). The various XFD directives are intermixed in the record description of a data file as either $XFD (similar to $SET) or as comments *((XFD … )). Each directive precedes the data item it is affecting. When a program is compiled that has these directives and the compiler is directed by a compile option, then the record layout is written out to a text file which is later used at run-time to process the data records. i.e., facilitate mapping the COBOL data fields to/from SQL columns.

The goal of this project is for GnuCOBOL to accept directives similar in syntax to the ACUCOBOL XFD syntax1 (plus a few additions as required) and develop run-time modules to handle all of the COBOL I/O verbs while the data is stored in a SQL table.

The purpose for doing this is to allow legacy COBOL to have data stored in SQL tables without rewriting the COBOL code. If a COBOL based application is already using EXEC SQL this feature of having INDEXED files stored in SQL tables is really not of any use. Once the data is in SQL tables it may be accessed by other SQL tools. To get access to more advanced SQL features, the COBOL code will need to be recoded to use EXEC SQL, but this can be done at points which provide the most advantage.

At present both INDEXED and RELATIVE files are supported using either an ODBC or OCI (Oracle Call Interface) interface. RELATIVE files are supported by adding a special column that gets a unique number as the RELATIVE KEY. The intention is that you could store the data for the ‘master data files’ of an application in an SQL database. Often COBOL based applications have intermediate data files and it would be a waste of time to place these into the database. Also keep in mind that the performance of ISAM will be much better than using an SQL database. However, SQL provides the opportunity to use ‘ad hoc inquiry’ and other third party tools to process your data.

When directed by the XFD directives, the GnuCOBOL compiler will write out the CREATE TABLE statements that could be used to represent the file. The GnuCOBOL compiler will also write out a data description file which is read at runtime and used to manage the conversion of data between the COBOL record view and the SQL column view. The format of this data description file is not exactly the same as what ACUCOBOL had used since the file is easily regenerated. The more important feature is to accept similar XFD directives intermixed in the record description.

XFD Directives

The following directives are accepted intermixed in the COBOL record description. The KEY IS clauses from SELECT statements will be used to generate SQL INDEX statements.

$XFD Name XFD Directive Descriptions
ALL Indicates this is a special module defining all files in an application system so all files in this module should be defined for use via SQL
FILE Name to be used for SQL table, filename.ddl and filename.xd instead of guessing from the SELECT/ASSIGN. If FILE is not provided then if ASSIGN “filename” is used that is taken, other wise the SELECT name is used for the table name and the .ddl and .xd files.
ALPHA Next field is handled as a single CHAR type. If a group item the entire group is a single CHAR column. May optionally have alternate name.
BINARY Next field handled as BINARY (RAW) so arbitrary hex data is stored. If a group item the entire group becomes a single BINARY column. May optionally have alternate name
CHAR Next field is handled as a single CHAR type. If a group item the entire group is a single CHAR column. May optionally have alternate name.
DATE Next field is a date in the defined format
M - month (01 – 12), Y - year (2 or 4 digit), D - day of month (01-31)
H - hour (00 -23), N – minute, S – second, T - hundredths of a second
J - Julian day, E - day of year
YY%nn – ‘nn’ is the pivot year
YYY+nnnn – ‘nnnn’ is added to YYY to compute the actual year
GROUP type The next field or group item is taken as a single SQL column of the declared type
NAME Name to be used for the next SQL column
NUMERIC Next field is handled as NUMERIC, may optionally have alternate name
USE type The word ‘USE’ is essentially ignore and the next word defines the field type
VARCHAR Emit next field as VARCHAR instead of CHAR
VAR-LENGTH Emit next field as VARCHAR instead of CHAR
WHEN fld op val ‘fld’ is some previous field in the record, ‘val’ is a numeric value of alpha value in quotes, op is one of =, !=, >, >=, <, <=. ‘val’ may be ‘other’ to indicate other cases. If the condition is true then the following field/group format is used.
AND fld op val A continuation of the previous WHEN logically connect as an AND condition
OR fld op val A continuation of the previous WHEN logically connect as an OR condition

Name GnuCOBOL Compile time Option Descriptions
-fsqldb=name ’name’ indicates which database is to be used. Choose one of ODBC, MySQL, MSSQL, OCI, Oracle10, Oracle11, Oracle12.
-fsqlschema=name ’name’ is sub-directory into which the filename.ddl holding CREATE TABLE is written as well as the filename.xd holding the data mapping information.

If omitted then the ddl/xd files are written to the base ‘schema_dir’ directory.

Environment
Variable         runtime.cfg  GnuCOBOL Run time Option Descriptions

COB_SCHEMA_DIR   schema_dir   Define location where the filename.xd files are
                              stored. Default: $COB_CONFIG_DIR/schema
COB_SCHEMA_NAME  schema_name  Database schema name
COB_SCHEMA_UID   schema_uid   Database User-id for connecting to the database
COB_SCHEMA_PWD   schema_pwd   Password for database User-Id
COB_SCHEMA_DSN   schema_dsn   DSN name for database
COB_DEBUG_LOG                 Parameters for GnuCOBOL logging feature.
                              Module type is ‘db’

Implementation Plan - GnuCOBOL Developer Notes

To support $XFD directives changes were made to pplex.l, ppparse.y and scanner.l. Also codegen.c and field.c will need changes.

Additional flags may have to be added to cb_field to retain the XFD directive information.

In codegen.c (output_file_initialization) depending on use of XFD and compile options is the location to emit the CREATE TABLE and the data description file. The cob_file will get some new flags for io_routine to indicate the file is really handled by ODBC/OCI.

A new fileio routine (fodbc.c and later foci.c) were developed following the recently developed interface for multiple I/O handlers. The cob_load_xfd routine loads the data description file into internal structures. Multiple schemas could have the same table name/structure for development, QA, production, etc. This could be indicated via runtime.cfg options and/or environment variables. IO_asgname may have format=odbc or format=oci to indicate which method is used to access the SQL table for the file. You may also specify table=sql_table_name to override the default SQL table.

At run-time when the file is opened the file description is loaded into memory for fast access and conversion of the data between COBOL data type and SQL column data. Since SQL tables are usually collected under a database schema we’ll need a compile option to generate the external file description under either a default directory or under a given ‘schema’ name. At run-time an environment variable and/or runtime.cfg option could be used to define which database schema is to be used along with the database user-id and password. Often a database user-id has a default schema associated with it.

The module fsqlxfd.c contains routines common to both fodbc.c and foci.c such as loading of the filename.xd information and conversion of the data between SQL and COBOL data types. When the file is OPENed and the table is found to not exist, then cob_load_ddl loads the filename.ddl into memory for the OCI/ODBC routine to submit the CREATE TABLE/INDEX statements as required. As the table is loaded, if it contains column types not supported by the current database then there is a translation done during the load process.

Format of filename.xd

GnuCOBOL has its own format instead of using what ACUCOBOL had. The first character on each line indicates what the line is. Lines starting with # or * are comments. Each field is separated by a comma.

Name Header Line Description
H Is a header
1 Version number
table SQL table name
‘,’ For character used as decimal, default ‘,’
‘.’ For character used as comma, default ‘.’
Hex Indicate how numeric signs are handled
num File type, 3 for INDEXED, 2 for RELATIVE

Name Label Description
L Is a Label
Num Numeric label, may be referenced by ‘goto’ or ‘condition’

Name Goto Description
G Is a Goto
Num Numeric label to transfer to

Name Condition Description
C Is a Condition to be tested (Appear in postfix sequence) Complex WHEN clauses become multiple Condition tests
Label If condition test is true, then goto this label, 0 indicates more coming, non-zero indicates end of the WHEN condition to be tested
Opcode >=, >, <=, <, =, !=, &&, ||, !
Operand Usually the data field
Value String in quotes or numeric value

Name Date Format Description
D Date format definition
num Unique number for this format
‘date’ Date format string
num Total number of digits in this date field
num 1 if DATE present else 0
num 1 if TIME present else 0
yy rule + means add yyAdj to YY, % defines pivot point
num Value for adjusting the Year
P:L Year Position and Length within field
P:L Month Position and Length within field
P:L Day Position and Length within field
P:L Hours Position and Length within field
P:L Minutes Position and Length within field
P:L Seconds Position and Length within field
P:L Century date Position and Length within field

You may define how your application stores DATE information. The database will always expect dates to be in a full YYYYMMDD format and date/time to be in YYYYMMDDHHMISS format. Date fields could be defined like the following:

$XFD  DATE "YYYYMMDD"
$XFD  DATE "YY%60MMDD"
$XFD  DATE "YYY+1800MMDD"
$XFD  DATE "YYMMDDCC"

The Y is a place holder for YEAR, MM for month, DD for day, HH hour, MI minutes, SS seconds, CC for century. If the Ys are followed by ‘%’ then the digits after the ‘%’ defines a pivot year used to map the YY value into a 4 digit year. In the above example if the YY value is below 60 then it is 19YY else 20YY

If the Y’s are followed by ‘+’ then the digits after the ‘+’ are added to the Y value. In above example, the year is 1800 + YYY value. There is a limit of 16 different DATE formats per record.

If the day is defined like DDD (3 Ds) then it is taken to be the day of the year. For example:

$XFD DATE "YYYYDDD"

Name Data Description
F Define data Field
num Offset from record to start of COBOL data field
num Bytes occupied by COBOL data field
num Type of data field
1 Arbitrary Binary Data
2 PIC S9 COMP-5
3 PIC 9 COMP-5
4 PIC 9 COMP-6
5 PIC S9 BINARY/COMP/COMP-4
6 PIC 9 BINARY/COMP/COMP-4
7 PIC x COMP-X
8 COMP-1/COMP-2
9 PIC S9 COMP-3/PACKED DECIMAL
10 PIC 9 COMP-3/PACKED DECIMAL
11 PIC S9 SIGN LEADING
12 PIC S9 SIGN LEADING SEPARATE
13 PIC S9
14 PIC 9 SIGN TRAILING
15 PIC 9 SIGN TRAILING SEPARATE
16 PIC 9
17 PIC A
18 PIC X National characters
19 PIC X Wide characters
20 PIC X
21 PIC X - VARCHAR
num Bytes needed to store as SQL data
num Digits in numeric field
num Scale (or decimal places)
num Unique id of date format (see D)
num COBOL level number
name SQL Column name used for data field

Name Key Definitions Description
K Define an index
num Index number (1 – 16)
dup Y if duplicates allowed, else N
sup Y if key may be suppressed, else N
char Suppress character either as ‘c’ or 0xHH or “string”
columns Comma separate list of column names in order that make up the index

Example Records to Table With the following:

           SELECT OPTIONAL TSPFILE
           ASSIGN TO "testsql"
           ORGANIZATION INDEXED ACCESS DYNAMIC
           RECORD KEY IS PRIME-KEY
           SOURCE IS CM-CUST-NUM

           ALTERNATE RECORD KEY IS SPLIT-KEY2
           SOURCE IS CM-TELEPHONE,CM-MACHINE WITH DUPLICATES
                SUPPRESS WHEN "900"

           ALTERNATE RECORD KEY IS SPLIT-KEY3
           SOURCE IS CM-DISK,CM-DP-MGR,CM-MACHINE WITH DUPLICATES
                SUPPRESS WHEN ALL "*"
           FILE STATUS IS CUST-STAT
           .

           SELECT FLATFILE ASSIGN "relfile"
           ORGANIZATION RELATIVE
           ACCESS IS RANDOM RELATIVE KEY IS REC-NUM
           FILE STATUS IS CUST-STAT.

           SELECT FLATSEQ ASSIGN "relfile"
           ORGANIZATION RELATIVE
           ACCESS IS SEQUENTIAL RELATIVE KEY IS REC-NUM
           FILE STATUS IS CUST-STAT.

Note that FLATFILE and FLATSEQ are the same file called “relfile” with a different
ACCESS. Given the following record description:

$XFD ALL
 FD  FLATFILE
     BLOCK CONTAINS 5 RECORDS.

 01  FLAT-RECORD.
     10  RL-CUST-NUM                     PICTURE X(8).
     10  RL-COMPANY                      PICTURE X(25).
     10  RL-TRAILER                      PICTURE X(16).

 FD  FLATSEQ
     BLOCK CONTAINS 5 RECORDS.

 01  RS-RECORD.
     10  RS-CUST-NUM                     PICTURE X(8).
     10  RS-COMPANY                      PICTURE X(25).
     10  RS-TRAILER                      PICTURE X(16).

 FD  TSPFILE
     BLOCK CONTAINS 5 RECORDS.

$XFD NAME=tspfilex
 01  TSPFL-RECORD.
     05  TSPFL-REC.
$XFD USE GROUP CUSTNUM
     10  CM-CUST-NUM.
       15  CM-CUST-PRE                   PICTURE X(3).
       15  CM-CUST-NNN                   PICTURE X(5).
     10  CM-STATUS                       PICTURE X.
     10  CM-COMPANY                      PICTURE X(25).
$XFD USE GROUP VAR_LENGTH custaddr
     10  CM-ADDRESS.
       15  CM-ADDRESS-1                  PICTURE X(25).
       15  CM-ADDRESS-2                  PICTURE X(25).
       15  CM-ADDRESS-3                  PICTURE X(25).
     10  CM-TELEPHONE                    PICTURE 9(10).
     10  CM-DP-MGR                       PICTURE X(25).
     10  CM-MACHINE                      PICTURE X(8).
     10  CM-MEMORY                       PICTURE X(4).
$XFD WHEN (CM-STATUS = 'A' && CM-TELEPHONE > 100)
$XFD AND CM-MACHINE = 'B' || CM-COMPANY = ' '
     10  CM-MEMORYX REDEFINES CM-MEMORY.
       15  CM-MEMSZ                      PICTURE 9(2).
       15  CM-MEMUNIT                    PICTURE X(2).
     10  CM-DISK                         PICTURE X(8).
     10  CM-TAPE                         PICTURE X(8).
$XFD WHEN CM-STATUS = 'X'
     10  CM-TAPEX REDEFINES CM-TAPE      PICTURE 9(8).
$XFD WHEN CM-STATUS = 'Y'
     10  CM-TAPEY REDEFINES CM-TAPE      PICTURE 9(6)V99.
     10  CM-NO-TERMINALS                 PICTURE 9(5) BINARY.
     10  CM-COMPX                        PICTURE XXX COMP-X.
     10  CM-COMP5                        PICTURE 9(7) COMP-5.
     10  CM-COMP1                        COMP-1.
     10  CM-COMP2                        COMP-2.
     10  CM-PRICE                        PICTURE 9(3)V99 COMP-3.
     10  CM-PRICES                       PICTURE S9(5)V99.
$XFD DATE "MMDDYYYY"
     10  CM-DATE                        PICTURE 9(8) COMP-3.
$XFD DATE "YYMMDDCC"
     10  CM-DATE2                        PICTURE 9(8) COMP-3.

And  the resulting SQL table(s) would look like:

tspfilex.ddl --

CREATE TABLE tspfilex (
   custnum                                  CHAR(8) NOT NULL,
   status                                   CHAR(1),
   company                                  CHAR(25),
   custaddr                                 VARCHAR(75),
   telephone                                DECIMAL(10) NOT NULL,
   dp_mgr                                   CHAR(25) NOT NULL,
   machine                                  CHAR(8) NOT NULL,
   memory                                   CHAR(4),
   memsz                                    DECIMAL(2),
   memunit                                  CHAR(2),
   disk                                     CHAR(8) NOT NULL,
   tape                                     CHAR(8),
   tapex                                    DECIMAL(8),
   tapey                                    DECIMAL(8,2),
   no_terminals                             DECIMAL(5),
   compx                                    DECIMAL(8),
   comp5                                    DECIMAL(7),
   comp1                                    FLOAT(23),
   comp2                                    FLOAT(53),
   price                                    DECIMAL(5,2),
   prices                                   DECIMAL(7,2),
   date_x                                   DATE,
   date2                                    DATE
);
CREATE UNIQUE INDEX pk_tspfilex ON tspfilex (custnum);
CREATE INDEX k1_tspfilex ON tspfilex (telephone,machine);
CREATE INDEX k2_tspfilex ON tspfilex (disk,dp_mgr,machine);

relfile.ddl --

CREATE TABLE relfile (
   cust_num                                 CHAR(8),
   company                                  CHAR(25),
   trailer                                  CHAR(16),
   rid_relfile                              BIGINT PRIMARY KEY
);

The tspfilex.xd description for this follows:

H,1,tspfilex,2,',','.',0,3
D,1,'MMDDYYYY',8,1,0,,0,4:4,0:2,2:2,0:0,0:0,0:0,0:0
D,2,'YYMMDDCC',8,1,0,,0,0:2,2:2,4:2,0:0,0:0,0:0,6:2
F,0000,0008,20,0009,0,0,,10,custnum
F,0008,0001,20,0002,0,0,,10,status
F,0009,0025,20,0026,0,0,,10,company
F,0034,0075,21,0076,0,0,,10,custaddr
F,0109,0010,16,0013,10,0,,10,telephone
F,0119,0025,20,0026,0,0,,10,dp_mgr
F,0144,0008,20,0009,0,0,,10,machine
C,0,=,status,'A'
C,0,>,telephone,100
C,0,&&
C,0,=,machine,'B'
C,0,=,company,' '
C,0,||
C,2,&&
F,0152,0004,20,0005,0,0,,10,memory
G,3
L,2
F,0152,0002,16,0005,2,0,,15,memsz
F,0154,0002,20,0003,0,0,,15,memunit
L,3
F,0156,0008,20,0009,0,0,,10,disk
C,5,=,status,'X'
C,6,=,status,'Y'
F,0164,0008,20,0009,0,0,,10,tape
G,7
L,5
F,0164,0008,16,0011,8,0,,10,tapex
G,7
L,6
F,0164,0008,16,0011,8,2,,10,tapey
L,7
F,0172,0004,06,0013,5,0,,10,no_terminals
F,0176,0003,07,0011,8,0,,10,compx
F,0179,0004,03,0013,7,0,,10,comp5
F,0183,0004,08,0036,15,8,,10,comp1
F,0187,0008,08,0036,34,17,,10,comp2
F,0195,0003,10,0008,5,2,,10,price
F,0198,0007,13,0010,7,2,,10,prices
F,0205,0005,10,0032,8,0,1,10,date_x
F,0210,0005,10,0032,8,0,2,10,date2
K,0,N,N,,custnum
K,1,Y,Y,"900",telephone,machine
K,2,Y,Y,0x2A,disk,dp_mgr,machine

The relfile.xd description for this follows:

H,1,relfile,0,',','.',0,2
F,0000,0008,20,0009,0,0,,10,cust_num
F,0008,0025,20,0026,0,0,,10,company
F,0033,0016,20,0017,0,0,,10,trailer
F,0049,0004,03,0015,12,0,,00,rid_relfile
K,0,N,N,,rid_relfile

Application Schema

Sometimes COBOL programs will have local views of the data by copying a record to some other group item which redefines the data differently. In some cases these “hidden” redefines will cause problems for SQL columns. For example, if a field is defined as PIC 9(x) but sometimes the same location in the record contains non-numeric data, then SQL will reject the storing of such data as it violates the strict data typing. By using the $XFD WHEN you are able to define which a portion of the record has one view other another. The I/O logic will then only copy the valid column of data.

It may therefore be a good idea to collect all of the file definitions into a single module, verify that all possible redefines are identified. The $XFD ALL directives indicates that such a module is being compiled. Think of this module as defining all of the master data files in an application which will be migrated into SQL tables.

Check the DDL

It is a good idea to check the DDL generated by the GnuCOBOL compiler and make sure it suits your local database guidelines. You may also need to add local directives such as a STORAGE clause for Oracle. You should then use the appropriate tool (eg. Mysql or sqlplus) and manually create all of the tables to be used by an application. There may also be considerations for what database user-id has permission to create tables, read/write tables etc.

Automatic Table Creation

At OPEN time, the filename.xd is processed and if not present that is considered a serious error (30) is returned.

At OPEN time, a check is done to see if the table exists and if not, the run-time code then reads the filename.ddl and submits it to the database to create the table and indexes. If the filename.ddl is not present then OPEN is given an error (30). If runtime.cfg option create_table is ‘true’ then the runtime will attempt to recreate filename.ddl from the information in filename.xd.

Taken from Development Notes by Ron Norman February 2020.