CRG Type Mangler
================

This is an alternative to tools/standards such as XDR. It has a C-like
type definition language which can be used to create C type headers and
compilable C code to convert between those C types and a
platform-independent byte stream. I.e. it is important to use this if
you want to transfer typed values across the network.

Enhancements compared to XDR are:

* more C-like support of pointers (handling pointer == NULL).
* automatic handling of pointer-based trees, linked lists and meshes.
* support for crg-style linked lists which allow a struct to be in
  multiple lists.
* different switch and variable array representations to give flatter C
  structures (no extra levels of structs introduced, which XDR does for
  switches and variable arrays).
* defines a standard type to describe a type.
* read values from human-readable ascii files for any type (except
  lists and networks with loops) driven by a ctmd_p type description.

Planned future enhancements:

* add self-describing types.
* support for type-driven (interpreted) encoding & decoding for
  handling unseen types.
* support for compiling in self-extracting values of arbitrary defined
  types.
* improved reading of typed values in human-readable form.
* writing of typed values in human-readable form.

Known problems:

* "byte_t a[...];" is treated differently to
  "typedef byte_t a_t; a_t a[...];"
* "byte_t a[...];" has to be explicitly long-word aligned by the type
  designer.

Command-line usage
==================

The main command is "ctm". This reads type files and generates C
headers or body files with en/de-coding routines.

To generate a header file do "ctm -h -o OUTFILE TYPEFILE"; to generate
a body file do "ctm -c -o OUTFILE TYPEFILE". If the "-o OUTFILE" option
is omitted then a file name is generated from the TYPEFILE with a ".h"
or ".c" suffix as appropriate.

The command will not overwrite the output file - an existing output
file must be deleted before running ctm.

Type definition language
========================

The type definition file, like C, is insensitive to the number and
layout of whitespace (spaces, tabs and returns). Text within C-style
comments ("/* ... */") is ignored by the parser. Text within lex-style
delimiters ("%{ ... %}") is copied to the output file verbatim but must
only occur outside any top-level declarations.

The type language syntax is:

File
----

A file consists of one or more directives. Each directive may be either
a "#include", an interface, type, struct or enum definition or a
constant definition.

#include
--------

    #include "FILENAME.t"

A C-preprocessor-style "#include" statement is copied to the output,
converting any trailing ".t" suffix on the filename to ".h". The
"#include" file name should be in double quotes ("") rather than angle
brackets (<>). The file name should be the name of a readable ctm type
definition file. The current version of ctm does not use this
information directly, but a future version may for enhanced error
checking and optimisation.

    => #include "FILENAME.h"

constant definition
-------------------

A constant definition has the form:

    const NAME = VALUE ;

"const" is a reserved keyword. NAME can be any valid identifier. VALUE
should be a simple integer value (a number, another constant or an
enumeration value label). Constants are converted to "#define"
statements in the generated header file.

    => #define NAME VALUE

struct definition
-----------------

Named structs can be defined at the top level only:

    struct NAME { DECLARATION_LIST } ;

They are copied out to the header file. An encoding/decoding routine is
also generated for the struct with the name "ctm_struct__NAME".

enum definition
---------------

Named enums can be defined at the top level only, in the same manner as
for C.

type definition
---------------

Type definitions are signalled, as in C, by the "typedef" keyword. The
declaration following "typedef" defines the type of the identifier in
the declarator. E.g. "typedef int_t a_t" defines "a_t" to have the type
"int_t".

definitions
-----------

A definition comprises a type and a single declarator.

declarators
-----------

There are four kinds of declarator:

* simple - just the declarator name. This is the same as the C type.
* pointer - single "*" and the declarator name for a pointer to a
  single instance. This is the same as the C type.
* fixed size array (1D) - declarator name and "[" "]" with integer
  fixed size (constant, enum label, number). This is the same as the C
  type. Note that fixed size arrays of "byte_t" are handled differently
  to other arrays and are fast-marshalled - see other notes.
* variable size array (1D) - declarator name and "[" "]" with integer
  declaration inside, for the size. This unfolds to a double
  declaration such as:

      char_t array[int_t size];
      =>
      int_t size;
      char_t *array;

  Note that a variable size array can only occur inside a struct. Note
  also that variable size arrays of "byte_t" are handled differently to
  other arrays and are fast-marshalled - see other notes.

types
-----

A type can be a list type, a basic type, a struct type, an enum type or
a switch type.

switch type
-----------

The switch type replaces the undiscriminated "union" type in normal C.
It is of the form:

    switch ( DECLARATION ) {
        case CONSTANT:
        case CONSTANT: DECLARATION;
        default: DECLARATION;
    }

The declaration in the "switch ()" is the discriminator, which must be
a simple integer or boolean type. The declarations inside the switch
become elements in a union, chosen according to the specified ("case")
values of the discriminator. E.g.

    switch (bool_t ready) {
        case TRUE: int_t count;
        case FALSE: void dummy;
    } abc;

    =>

    bool_t ready;
    union {
        int_t count;
        void dummy;
    } abc;

Note that the discriminator is folded out into the containing struct -
a switch type, like a variable size array, can only occur inside a
struct.

basic types
-----------

The basic types, from crg_types, are: bool_t, byte_t, char_t, int8_t,
uint8_t, int16_t, uint16_t, int_t, uint_t, int32_t, uint32_t, int64_t,
uint64_t (if supported by the compiler), float32_t, float64_t.

list types
----------

There are two supplementary types to support doubly linked lists in the
style of crg/list.h.

The first, which is like a simple type, is the list element header,
"list_hdr_t". A "list_hdr_t" can only occur in the top level struct of
a named struct or a simple typedef. E.g.

    typedef struct { list_hdr_t hdr; .... ; } element_t;

It is treated as a struct with two pointers ("next" and "prev") to
items of the type of the containing struct; in the example this would
be "element_t*".

The type of a list is list_t. The syntax for a list is:

    list_t l;

The ELEMENT_TYPE is used to establish the nominal types of the "first"
and "last" pointers, i.e. the elements in the list. The
ELEMENT_HEADER_NAME should be the item name of the relevant
"list_hdr_t" item in the element type ("hdr" in the example above).

interface definition
--------------------

An interface definition is similar to a struct definition. It takes the
form:

    interface NAME { INTERFACE_DECLARATION_LIST } = "IDENTIFIER_STRING";

The IDENTIFIER_STRING is optional. The interface declaration list may
only include simple declarations (no pointers or arrays), but it may
include functions (and is the only place where they are valid).

Taking the example interface:

    interface if1 {
        int32_t a_number;
        bool_t a_function(user_type_t input);
    } = "test interface 1";

This generates a #define for the interface id:

    #define INTERFACE__IF1 "test interface 1"

It defines an enum for the items in the interface:

    enum { INTERFACE__IF1__A_NUMBER, INTERFACE__IF1__A_FUNCTION };

It defines a union big enough to take any value of the types in the
interface operations:

    typedef union {
        int32_t a_number;
        union {
            bool_t reply;
            user_type_t input;
        } a_function;
    } interface__if1_t;
    typedef interface__if1_t *interface__if1_p;

Finally, it defines an en/decoding function, similar to the other
ctm_... functions but with additional arguments:

    crg_bool_t ctm_interface__if1(ctms_p ctms, ctmi_operation_t *op,
                                  crg_uint32_t *id, crg_uint32_t *item,
                                  interface__if1_t *value);

The additional args are:

* op - the operation (CTMI_EVENT, CTMI_ATTRIBUTE,
  CTMI_ATTRIBUTE_OPTIONAL, CTMI_ACTION, CTMI_REPLY) concerned - this is
  not used by the function except to identify an action reply
  (CTMI_REPLY);
* id - invocation id, passed straight through; and
* item - the index of the item, as in the generated enum (elements
  numbered in ascending order from 0).

Clearly, an interface is not a regular type - it has no ctmd_...
function and cannot be used to read/write values to/from ascii files.
It also cannot be used in typedefs, etc.

ASCII VALUE FILES
=================

The function "ctmread_value" takes a type description (ctmd_p) and uses
it to read a value from an ascii file. The forms of the values are
outlined below. C-style comments are ignored. Whitespace is not
significant except to delimit tokens and values.

integer type
------------

Value is a simple integer or a character in single quotes, e.g. 'a'.

boolean
-------

Value is TRUE, FALSE, 0 or 1.

float type
----------

Currently the value is NNN.NNN - no exponential notation.

enum type
---------

Value is one of the labels of the enum (not an integer).

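Because an enum value in a value file is written as a label rather than a number, a reader has to map the label text back to the enum constant. The following is a minimal sketch of that mapping only. It does not show how "ctmread_value" or the ctmd_p description work internally; the "colour_t" enum, the "enum_label_t" table and "enum_lookup" are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum, as it might appear in a generated header. */
typedef enum { RED, GREEN, BLUE } colour_t;

/* Hypothetical label table pairing each label with its value. */
typedef struct {
    const char *label;
    int         value;
} enum_label_t;

static const enum_label_t colour_labels[] = {
    { "RED",   RED   },
    { "GREEN", GREEN },
    { "BLUE",  BLUE  },
};

#define COLOUR_LABEL_COUNT \
    (sizeof colour_labels / sizeof colour_labels[0])

/* Look up a token read from a value file. Only exact label matches are
 * accepted, so a numeric token such as "1" is rejected.
 * Returns 0 and stores the value on success, -1 otherwise (in which
 * case *out is left untouched). */
static int enum_lookup(const enum_label_t *table, size_t n,
                       const char *token, int *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(table[i].label, token) == 0) {
            *out = table[i].value;
            return 0;
        }
    }
    return -1;
}
```

With this table, the token GREEN resolves to the constant GREEN, while the token 1 is refused even though it is GREEN's numeric value.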
struct type
-----------

The value of a struct is '{', then a comma (',') separated list of the
elements in the struct, followed by '}'. For example, a valid value of
type

    struct { int a; string_t b; }

would be

    { 10, "hello" }

array types
-----------

An array value starts with '{', contains a comma separated list of
values and finishes with a '}' - this is the same for both fixed and
variable size arrays. E.g.

    int_t items[int_t num_items]

    e.g. => { 10, 20, 30, 40, 50 }

switch types
------------

A switch value starts with a valid value of the discriminator followed
by a colon (':') and a valid value for the case corresponding to that
discriminator value, e.g.

    switch (bool_t discrim) {
        case TRUE: string_t true_val;
        case FALSE: int_t false_val;
    }

    e.g. => TRUE : "hello"

pointer type
------------

A pointer indirection is invisible in the value file - a NULL pointer
cannot be specified. Multiple pointers to the same item cannot be
specified. The item will be allocated with malloc and a valid value
looked for in the file.

NOTES
=====

Notes compared to C:
--------------------

* each declaration statement has only one declarator (e.g. "int a;" but
  not "int a, b;").
* structs within structs cannot be named (tagged).
* union does not exist - use switch to generate a discriminated union.
* the encoding/decoding can only work off the information in the type -
  so void *s will always be assumed to point to a void and that's what
  you'll get out. Also, pointers to sub-parts of a struct (as in
  subclassing) are liable to confuse until or unless self-describing
  types are introduced.
* "#include" with angle bracketed filename ("<>") not supported.
* A declarator
* fixed size arrays of byte_t must be long word aligned by the type
  designer. This is because of the fast en/decode used.
* "enum NAME" is not a valid type - you have to typedef something to be
  the enum when it is defined.

Other notes:
------------

* variable size arrays and switches can only occur within structs.
* list_hdr_t's can only occur in a top-level struct which has a tag
  name or which is also a typedef for a simple user type (no pointer,
  etc.)
* only arrays of byte_t will be fast marshalled. Note that something
  that is typedef'd to byte_t will NOT be fast marshalled in the
  current version, though it might be in a future version: AVOID THIS.
  Although int8_t, uint8_t and char_t look a lot like byte_t, they are
  marshalled differently and one cannot be used to decode from the
  other.
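
The alignment requirement on byte_t arrays noted above can be checked at the C level with offsetof. The sketch below rests on two assumptions the text does not state: that a "long word" is 4 bytes, and that the requirement concerns the array's offset within its containing struct. The struct names, the explicit "pad" member and the stand-in typedef for byte_t are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t byte_t;      /* stand-in for the crg byte_t (assumption) */

#define LONG_WORD_BYTES 4    /* assumption: long word = 4 bytes */

/* Layout that would NOT satisfy the rule: a single byte precedes the
 * array, so the array starts at an offset that is not a multiple of 4. */
typedef struct {
    byte_t kind;
    byte_t digest[16];
} unpadded_t;

/* Layout with padding inserted by hand by the type designer so that
 * the array starts on a long-word boundary. */
typedef struct {
    byte_t kind;
    byte_t pad[3];           /* explicit filler, never interpreted */
    byte_t digest[16];
} padded_t;

/* Returns non-zero when an offset falls on a long-word boundary. */
static int is_long_word_aligned(size_t offset)
{
    return offset % LONG_WORD_BYTES == 0;
}
```

A designer could place a check such as "assert(is_long_word_aligned(offsetof(padded_t, digest)))" in a unit test, so that a later change to the struct does not silently break the alignment.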