DRAFT: Kueea Module Definition Language

**This document is a working draft.**

KMDL documents are textual documents that define a Kueea Module. Their syntax is designed to be human-readable and fairly easy to write.

Terminology

One KMDL document defines one Kueea Module. The phrase "the defined module" means the module being defined, i.e. the one defined by the currently parsed document.

Some classes are defined by Kueea System to be part of its ABI. Instances of these classes may be passed to a function via dedicated CPU registers. This document refers to these classes as "PCS types" or "PCS classes".

The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Document syntax

A KMDL document is a sequence of Unicode characters. A parser processes the document line by line. The maximum length of a line is 1024 octets, including the line separator.

Lines are separated with a sequence of two characters: U+000D CARRIAGE RETURN and U+000A LINE FEED.

Whitespace is either U+0020 SPACE or U+0009 HORIZONTAL TAB.
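
The line rules above can be sketched in Python as follows. This is a minimal illustration, not part of the format: the function name split_lines is hypothetical, and the sketch assumes the document is already available as UTF-8 encoded octets and that a final CRLF does not start another, empty line.

```python
MAX_LINE_OCTETS = 1024  # limit stated above, line separator included
CRLF = b"\r\n"


def split_lines(data: bytes) -> list[str]:
    """Split an encoded KMDL document into lines without their separators."""
    chunks = data.split(CRLF)
    # A document that ends with CRLF leaves one empty trailing chunk.
    if chunks and chunks[-1] == b"":
        chunks.pop()
    lines = []
    for number, chunk in enumerate(chunks, start=1):
        if len(chunk) + len(CRLF) > MAX_LINE_OCTETS:
            raise ValueError(f"line {number} exceeds {MAX_LINE_OCTETS} octets")
        lines.append(chunk.decode("utf-8"))
    return lines
```

The length is checked on octets rather than characters because the limit is expressed in octets.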

There are three kinds of lines: instructions, descriptions and comments.

Instructions

Instruction lines begin with a U+002E FULL STOP character. There MAY be whitespace at the beginning of the line.

.instruction param1 param2
   .indented-instruction param1

Item descriptions

Any other line is a human-readable textual description of the item declared by the most-recent item-declaration instruction.

.item ex1
Description of the ex1 item.

Description of the ex1 item.

.item ex2

Description of the ex2 item.

These lines are fed to another parser. Their syntax is not defined by this document.

Comments

One-line comments begin with a U+0023 NUMBER SIGN character. There MAY be whitespace at the beginning of the line.

# This is a one-line comment.
  # This is a one-line comment.
This is not a comment.
#     This is a one-line comment.

Comment sections begin and end with a line beginning with two consecutive U+0023 NUMBER SIGN characters. There MAY be whitespace at the beginning of the line.

# This is a one-line comment.
This is not a comment.
## This is the first line of a comment section.
This is a comment.
.This is a comment.
# This is inside a comment section.
    ## This is the last line of a comment section.
This is not a comment.
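
A line classifier that follows the three kinds of lines described so far might look like the sketch below. The function name classify and the kind labels are hypothetical; the only state it keeps is whether a comment section is currently open.

```python
def classify(lines):
    """Yield (kind, line) pairs; kind is 'comment', 'instruction' or 'text'."""
    in_section = False
    for line in lines:
        body = line.lstrip(" \t")  # whitespace is SPACE or HORIZONTAL TAB only
        if body.startswith("##"):
            # The same marker opens and closes a comment section.
            in_section = not in_section
            yield "comment", line
        elif in_section or body.startswith("#"):
            yield "comment", line
        elif body.startswith("."):
            yield "instruction", line
        else:
            yield "text", line
```

Note that the test for a comment section comes before the test for an instruction, so a line such as ".This is a comment." inside a section is never treated as an instruction.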

ABNF

The following KMDL rule expresses the syntax in ABNF. The encoding of string literals is UTF-8.

KMDL  = "." mbeg CRLF *( line CRLF ) ; mbeg defined later
line  = comm / inst / text

comm  = cmul / cone
cmul  = *WSP "##" *OCTET CRLF *WSP "##" *OCTET
cone  = *WSP "#"  *OCTET

inst  = *WSP "." name *( 1*WSP iarg ) *WSP
name  = 4LOALPHA
iarg  = 1*VCHAR

text  = *OCTET

The syntax and parameters of each instruction are defined in its own section, also using ABNF. Those rules match the name *( 1*WSP iarg ) fragment of the inst rule.

Instruction lines MUST NOT contain code points above U+007F. These code points MAY be used in comments and item descriptions.
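
As an illustration of the inst rule, the following Python sketch splits an instruction line into its name and arguments. The name split_instruction is hypothetical. The regular expression reads VCHAR as the visible ASCII range U+0021 to U+007E, which also enforces the restriction on code points above U+007F.

```python
import re

# name = four lowercase letters; each argument is a run of visible ASCII.
_INSTRUCTION = re.compile(r"[ \t]*\.([a-z]{4})((?:[ \t]+[\x21-\x7e]+)*)[ \t]*")


def split_instruction(line: str):
    """Return (name, [arguments]), or None if the line is not an instruction."""
    match = _INSTRUCTION.fullmatch(line)
    if match is None:
        return None
    return match.group(1), match.group(2).split()
```

The sketch checks only the generic shape of the line; whether the arguments are valid for a given instruction is decided by that instruction's own rule.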

Common instruction parameters

This section defines rules for parameters commonly used by instructions.

Integers

Integers are unsigned and may be 16, 32 or 64 bits long. They are written in decimal or hexadecimal notation.

u16    = u16dec / u16hex
u16dec = 1*5DIGIT
u16hex = "0x" 1*4HEXDIGIT

u32    = u32dec / u32hex
u32dec = 1*10DIGIT
u32hex = "0x" 1*8HEXDIGIT

u64    = u64dec / u64hex
u64dec = 1*20DIGIT
u64hex = "0x" 1*16HEXDIGIT
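
A sketch of how these rules could be checked in Python is given below; parse_uint is a hypothetical name. The decimal rules limit only the number of digits, so a token such as 99999 still matches u16dec. The explicit range check is therefore an assumption of this sketch, not something the grammar states.

```python
import re

_TOKEN = re.compile(r"0x([0-9A-Fa-f]+)|([0-9]+)")
_MAX_DECIMAL_DIGITS = {16: 5, 32: 10, 64: 20}


def parse_uint(token: str, bits: int) -> int:
    """Parse a decimal or 0x-prefixed hexadecimal unsigned integer."""
    if bits not in _MAX_DECIMAL_DIGITS:
        raise ValueError("bits must be 16, 32 or 64")
    match = _TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"not an integer: {token!r}")
    hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        # The digit limit alone bounds a hexadecimal value.
        if len(hex_digits) > bits // 4:
            raise ValueError(f"{token!r} has too many digits for u{bits}")
        return int(hex_digits, 16)
    if len(dec_digits) > _MAX_DECIMAL_DIGITS[bits]:
        raise ValueError(f"{token!r} has too many digits for u{bits}")
    value = int(dec_digits)
    if value >= 1 << bits:  # assumed check, see the note above
        raise ValueError(f"{token!r} does not fit in {bits} bits")
    return value
```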

Universally Unique Identifier

Modules, classes and interfaces are identified by 128-bit Universally Unique Identifiers. UUIDs are written in their textual notation as used in URNs. [RFC4122]

uuid = 8HEXDIGIT 3( "-" 4HEXDIGIT ) "-" 12HEXDIGIT
; example: 12345678-1234-1234-1234-abcdef012345
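
A parser could validate the textual form before converting it, as in this Python sketch. The name parse_uuid is hypothetical; the pattern follows the 8-4-4-4-12 grouping of the example above. The pattern is applied first because the standard uuid.UUID constructor on its own also accepts other spellings, such as a value without hyphens.

```python
import re
import uuid

# 8-4-4-4-12 hexadecimal digits, as in the example above.
_UUID_TEXT = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def parse_uuid(token: str) -> uuid.UUID:
    """Convert a UUID written in its textual notation."""
    if _UUID_TEXT.fullmatch(token) is None:
        raise ValueError(f"not a UUID in textual notation: {token!r}")
    return uuid.UUID(token)
```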

Item name

Items are given a human-readable name for reference. Names are case-sensitive and exist within the scope of the defined module. There cannot be two items within a module with the same name.

name = ( ALPHA / "_" ) *( ALPHA / DIGIT / "_" )

This document recommends that:

  • names use small Latin letters with words separated with U+005F LOW LINE;
  • names of functions are composed as if each word were an object, for example object_units_replace.

Item references

Items are referenced via their names. A reference begins with the module identifier, followed by a "path" to the item.

The initial module reference may be omitted. This is a shorthand notation for referencing the defined module.

mref = name / uuid                      ; module reference
iref = [ mref ] 1*( "." name )          ; item reference
cref = [ mref ] 1*( "." name ) ":" u16  ; class reference

Class references have a 16-bit unsigned integer at the end. This is the level of the referenced class.

References are resolved after the parsing phase finishes. A referenced item MAY be defined later in the document.
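
The reference rules can be illustrated with the Python sketch below, which splits a reference into its module part, its path and its optional class level. The name parse_reference and the shape of the returned tuple are hypothetical. The sketch does not resolve anything; it only takes the token apart.

```python
import re

_NAME = r"[A-Za-z_][A-Za-z0-9_]*"
_UUID = r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"
_LEVEL = r"0x[0-9A-Fa-f]{1,4}|[0-9]{1,5}"
_REFERENCE = re.compile(
    rf"(?P<module>{_UUID}|{_NAME})?(?P<path>(?:\.{_NAME})+)(?::(?P<level>{_LEVEL}))?"
)


def parse_reference(token: str):
    """Return (module or None, [path names], level or None)."""
    match = _REFERENCE.fullmatch(token)
    if match is None:
        raise ValueError(f"not an item or class reference: {token!r}")
    level = match.group("level")
    if level is not None:
        level = int(level[2:], 16) if level.startswith("0x") else int(level)
        if level > 0xFFFF:
            raise ValueError(f"class level out of range in {token!r}")
    # A missing module part means "the defined module".
    return match.group("module"), match.group("path")[1:].split("."), level
```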

Symbol identifier

Symbols are identified by a 64-bit unsigned integer. There cannot be two symbols within a module with the same value.

sid  = u64

Symbol identifiers are optional. By default, the value is the result of hashing a UTF-8 character string with the FNV-1a (Fowler-Noll-Vo) hash function with a 64-bit prime. These identifiers should be defined explicitly only in case of a collision.

For objects defined in the global scope of a module, the hashed string is the concatenation of: a U+005F LOW LINE character and the name of a symbol. For example: _global.

For function members of a class or class interface, the hashed string is the concatenation of: name of the class, a U+005F LOW LINE character and the name of a member. For example: class_member.
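
The default identifier could be computed as in the following Python sketch. The text names FNV-1a with a 64-bit prime but does not list the constants, so the sketch assumes the standard 64-bit FNV offset basis and prime. The function names fnv1a_64 and default_sid are hypothetical.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis (assumed)
FNV64_PRIME = 0x100000001B3              # standard 64-bit FNV prime (assumed)


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR each octet into the hash, then multiply by the prime."""
    value = FNV64_OFFSET_BASIS
    for octet in data:
        value ^= octet
        value = (value * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return value


def default_sid(name: str, owner: str = "") -> int:
    """Hash '<owner>_<name>'; owner stays empty for global objects."""
    return fnv1a_64(f"{owner}_{name}".encode("utf-8"))
```

With these helpers, default_sid("global") hashes the string _global and default_sid("member", "class") hashes class_member, matching the two examples in the text.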

Instructions / Parser definition

Instruction names are case-sensitive and written with small letters.

All instruction names are four characters long.

A KMDL parser is a stateful parser. Some instructions change the current state of the parser.

The parser state consists of:

  • load: list of KMDL documents to be parsed,
  • mbeg: current module,
  • mlvl: current module level,
  • cbeg: current class or class interface,
  • clvl: current class level,
  • item: current item,
  • text: current text format,
  • wslv: current line indentation,
  • skip: ignored line indentation.
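
One possible in-memory shape for this state is sketched below in Python. The class name ParserState and all field types are assumptions made for illustration; the document fixes only the field names and, in other sections, the defaults of zero for both levels and Markdown for the text format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParserState:
    load: list[str] = field(default_factory=list)  # documents still to be parsed
    mbeg: Optional[str] = None  # identifier of the current module
    mlvl: int = 0               # current module level
    cbeg: Optional[str] = None  # current class or class interface
    clvl: int = 0               # current class level
    item: Optional[str] = None  # item that receives description lines
    text: str = "markdown"      # current text format
    wslv: int = 0               # current line indentation (assumed to be a count)
    skip: int = 0               # ignored line indentation (assumed to be a count)
```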

The result of parsing a document is a tree of declared items:

module
|-mlvl
| |-class
| | |-clvl
| | | |-data
| | | |-data
| | | |-method
| | |-clvl
| |   |-data
| |   |-method
| |-interface
| | |-data
| | |-method
| |-object
| |-message
|-mlvl
  |-object
  |-event

Module

KMDL documents begin with the mbeg instruction. This instruction must not have any preceding whitespace. It declares the identifier of the module defined by the document.

This instruction cannot appear more than once in a KMDL document.

mbeg = %s"mbeg" 1*WSP uuid

Items defined by a module are grouped into levels. By default, the value of the module level is zero. The level can be changed with the mlvl instruction.

mlvl = %s"mlvl" 1*WSP u32

The first and only parameter is the new value of the module level. The level applies to all items defined after the instruction. Some instructions may override this value.

Modules reference items defined in other modules. In order to be able to use them, a module must import (load) another module.

load = %s"load" 1*WSP uuid [ 1*WSP name ]

The first parameter to the load instruction is the UUID of the imported module; the optional second is a name given to that module.

Item description

The syntax of the description text is Markdown by default and may be changed at any point with the text instruction.

text = %s"text" 1*WSP 1*VCHAR *( 1*WSP 1*VCHAR )

The first argument is the name of the new format. The name should be a subtype of a text media type.

All subsequent arguments are parameters of the format.

This will be interpreted as _Markdown_ text.
.text html
<p>This will be interpreted as <b>HTML</b> text.</p>

Enumerations

Enumerations are defined with the "enum" instruction.

enum = %s"enum" 1*WSP name 1*WSP type
eval = %s"eval" 1*WSP name 1*WSP 1*( VCHAR )

The first argument to the "enum" instruction is the name of the enumeration. The second is the type of enumerated values; it must be a PCS class.

Subsequent "eval" instructions define enumerated values. The first argument is the name for the value. The second is the named value; the syntax depends on the type.

Enumerations define another name for the class type.

Objects

Data objects can be either read-only (read) or read-write (rdwr). These are objects defined in the global scope of the module.

read = %s"read" 1*WSP ddef
rdwr = %s"rdwr" 1*WSP ddef
ddef = type 1*WSP dnam [ 1*WSP sid ]
dnam = name [ "[" u32 "]" ]

type = tpre / cref
tpre = %s"octet" / %s"boolean"

The first argument is the type of the object. The only predefined types are boolean and octet.

The second argument is the name of the object. If the object is an array, its length is fixed. The length is given in square brackets right after the name.

The third argument is an optional symbol identifier.

Functions

Functions are declared by three instructions: .ftsk, .fmod and .fkrn; respectively: task function, module function and kernel function.

ftsk = %s"ftsk" 1*WSP fsym
fmod = %s"fmod" 1*WSP fsym
fkrn = %s"fkrn" 1*WSP fsym

fsym = fval 1*WSP name [ 1*WSP sid ]
fval = type / hndt / %s"void" / %s"comp"

hndt = hndp "<" ( type / %s"undef" ) ">"
hndp = %s"none" / %s"read" / %s"rdex" / %s"rdwr" / %s"rwex"

The first argument is the type of the function's return value, which is one of: no value (void), a 2-bit result of a comparison (comp), an instance of a class, or a handle to an object stored in memory.

Handles begin with access rights associated with the handle followed by a type of referenced object in angle brackets. Handles may reference objects of undefined (undef) type. The access rights are one of:

  • none: no access rights,
  • read: read-only access,
  • rdex: read and execute rights,
  • rdwr: read and write rights,
  • rwex: read, write and execute rights.
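
The handle notation can be taken apart as in this Python sketch. The name parse_handle and the use of the letters r, w and x for the individual rights are hypothetical conveniences. The referenced type is returned as written and is not validated here.

```python
import re

# Access-right keywords mapped to the individual rights they grant.
RIGHTS = {
    "none": frozenset(),
    "read": frozenset("r"),
    "rdex": frozenset("rx"),
    "rdwr": frozenset("rw"),
    "rwex": frozenset("rwx"),
}
_HANDLE = re.compile(r"(none|read|rdex|rdwr|rwex)<([^<>]+)>")


def parse_handle(token: str):
    """Return (set of rights, referenced type) for a handle type."""
    match = _HANDLE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a handle type: {token!r}")
    return RIGHTS[match.group(1)], match.group(2)
```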

Each subsequent farg instruction declares the next in-order parameter. A variable argument list parameter (farg ...) must be the last one.

farg = %s"farg" 1*WSP ( argt 1*WSP name / "..." )
argt = type / hndt / tref
tref = "[" ( type / thnd ) "]"
thnd = hndt ":" hndp [ "<" ( type / %s"undef" ) ">" ]

The first parameter to the instruction is the type of the argument or three consecutive periods that indicate a variable argument list.

The second parameter is the name of the argument for reference.

The syntax of the argument type needs explanation. Arguments to functions are one of:

  • an object passed by value of length up to 128 octets (this limit applies to return values, too),
  • an object passed by handle (by memory reference),
  • an object passed by reference to a handle (bidirectional handle),
  • a reference to an instance of a PCS class that is guaranteed to be modified only by the function.

Objects passed by value are specified by simply writing a class reference.

.farg module.class:0 by_value

Objects passed by handle are written by specifying the access rights and then the type of the object enclosed in angle brackets.

.farg read<module.class:0> by_handle1 ; read-only access
.farg rdwr<module.class:0> by_handle2 ; read-write access

Objects passed by a bidirectional handle (one that gives access to the function and back to the caller) are written in square brackets. Inside the brackets are the handle passed to the function, then a colon and the handle passed back to the caller. If the type is the same, it can be omitted.

.farg [none<module.class:0>:rdwr]                 bidi_handle1
.farg [rdwr<module.class:0>:rdwr<module.class:1>] bidi_handle2

The bidi_handle1 parameter in the example is a reference to a handle. The function is not given any access rights to the referenced memory. Upon return, the handle contains an address to an instance of module.class, level 0, and the caller receives read-write access to this object.

The bidi_handle2 parameter in the example is a reference to a handle. The function is given read-write access rights to an instance of module.class, level 0. Upon return, the handle contains an address to an instance of module.class, level 1 and the caller has read-write access to this object.

The last category is written as a class reference in square brackets. This is a reference to an instance of a PCS class.

.farg [int.u8] u8_ref

Their primary use is returning a small object in a situation when passing a buffer by handle would expose too much data to the function. This is also faster than passing by handle (no handle processing).

The referenced object may be safely copied before the call is made and then copied back to the original buffer upon return.
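
The four argument categories can be told apart from the notation alone, as the Python sketch below suggests. The name classify_argument and the returned tuples are hypothetical. Anything that is not bracketed and not a handle is assumed to be a by-value type and is not checked further.

```python
import re

_RIGHTS = r"(?:none|read|rdex|rdwr|rwex)"
_HANDLE = re.compile(rf"({_RIGHTS})<([^<>]+)>")
_BIDI = re.compile(rf"\[({_RIGHTS})<([^<>]+)>:({_RIGHTS})(?:<([^<>]+)>)?\]")
_REFERENCE = re.compile(r"\[([^\[\]<>]+)\]")


def classify_argument(argt: str):
    """Classify the type part of a farg instruction."""
    if m := _BIDI.fullmatch(argt):
        rights_in, type_in, rights_out, type_out = m.groups()
        # An omitted outgoing type means "same as the incoming type".
        return ("bidirectional", (rights_in, type_in), (rights_out, type_out or type_in))
    if m := _REFERENCE.fullmatch(argt):
        return ("reference", m.group(1))
    if m := _HANDLE.fullmatch(argt):
        return ("handle", m.group(1), m.group(2))
    return ("value", argt)  # assumed to be a class reference; not validated
```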

Classes and interfaces

Class and interface declarations begin a new scope for function declarations. The scope ends at the declaration of another class or interface or when the cend instruction is encountered, whichever comes first.

cbeg = %s"cbeg" 1*WSP uuid 1*WSP name
ibeg = %s"ibeg" 1*WSP uuid 1*WSP name
cend = %s"cend"

The cbeg and ibeg instructions begin the definition of, respectively, a new class and a new class interface. The definition ends with the cend instruction or an instruction that implicitly ends the definition, such as the beginning of another class.

The first parameter is the UUID of the class or interface. These UUIDs are global in scope. This document recommends using a namespaced version.

The second parameter is a name for the item.

Data members

Data members of the class are defined by the data instruction. Members are defined in ascending order of their memory offset.

Unions of members are defined as an alternate of the previous member. The dalt instruction defines a member residing at the same memory offset as that of the previously defined data member.

data = %s"data" 1*WSP type 1*WSP memb [ 1*WSP u16 ]
dalt = %s"dalt" 1*WSP type 1*WSP memb [ 1*WSP u16 ]
memb = name [ "[" [ name ":" ] u32 "]" ]

These instructions differ from those for global objects: there is no symbol identifier, access is not specified and the length of an array has a different syntax.

The first parameter is the type of the object.

The second parameter is the name for the item. When the array length has the form [name:u32], the member is a variable-length array. The name references a previously defined member that must be a PCS integer class. This member holds the actual length of the array. The integer after the colon defines the maximum length of the array. The maximum is not for specifying how much memory is to be allocated. The value is used in calculation of the maximum length of an instance. Offsets of all subsequent members are calculated at runtime.

The optional third parameter is the memory address alignment requirement of the member, expressed in octets.

Class members are grouped into levels, similarly to a module. The default value of a class level is zero. The level is changed with the clvl instruction.

clvl = %s"clvl" 1*WSP u16 [ 1*WSP u16 ]

The first parameter is the new value of the class level.

The second parameter is a memory address alignment of the first member. This value overrides member-defined alignment, if any.

The alignment of an object is the largest alignment of its members.

For example, let us consider this class definition:

.cbeg 00000000-0000-0000-0000-000000000000 ex1
.clvl 0
.data octet m1
.clvl 1
.data octet m2 4
.clvl 2 8
.data octet m3
.cend

An instance of the class at level 0 has only the member m1. Alignment of the object is 1 (the alignment of an octet). Length of the object is 1 octet.

An instance at level 1 has the members m1 and m2. Since m2 has been given an alignment of 4, three octets of padding are inserted between m1 and m2. Alignment of the object is 4. Length of the object is 5 octets.

An instance at level 2 has the members m1, m2 and m3. Alignment of m1 is changed to 8. Alignment of the object is 8. Length of the object is 6 octets.

It is possible to reduce the alignment value of a member. If the class u32 has an alignment of 4, in the following example:

.cbeg 00000000-0000-0000-0000-000000000000 ex2
.clvl 0
.data octet two[2]
.data .u32  four   2
.cend

the alignment of the class is 2 and the four member occupies four octets, beginning at the third octet. Length of the object is 6 octets.
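
The numbers in both examples can be reproduced with the small Python sketch below. The function name layout and its input format are hypothetical. The sketch assumes, as the ex1 walk-through suggests, that the second clvl parameter replaces the alignment of the first member of the class and that no padding is added after the last member.

```python
def layout(members, first_alignment=None):
    """Return (offsets, length, alignment) for (name, size, alignment) members."""
    offsets = {}
    end = 0        # first octet past the members placed so far
    alignment = 1  # largest member alignment seen so far
    for index, (name, size, align) in enumerate(members):
        if index == 0 and first_alignment is not None:
            align = first_alignment  # assumed meaning of the clvl override
        offset = -(-end // align) * align  # round up to a multiple of align
        offsets[name] = offset
        end = offset + size
        alignment = max(alignment, align)
    return offsets, end, alignment


# ex1 at level 2: m2 carries alignment 4 and clvl sets the first member to 8.
ex1_level2 = layout([("m1", 1, 1), ("m2", 1, 4), ("m3", 1, 1)], first_alignment=8)
# ex2: a u32 (four octets) whose alignment is reduced to 2.
ex2 = layout([("two", 2, 1), ("four", 4, 2)])
```

Here ex1_level2 gives a length of 6 and an alignment of 8, and ex2 places the member four at offset 2 with a total length of 6, in line with the figures stated in the text.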

Function members

Any function declared while in the class or interface scope is defined as a method of that class or class interface.

Interfaces can declare function prototypes with the func instruction.

func = %s"func" 1*WSP fval 1*WSP name

Function parameters are declared in the same way as for other functions.

These prototypes may then be implemented by a class with the itsk, imod and ikrn instructions.

itsk = %s"itsk" 1*WSP fref
imod = %s"imod" 1*WSP fref
ikrn = %s"ikrn" 1*WSP fref

fref = iref 1*WSP name [ 1*WSP sid ]

Unlike ftsk, fmod and fkrn, these instructions do not begin a function definition. Each one defines a symbol for the implementation; the function itself is already defined by the func instruction.

Messages

Static messages are defined with the msgd instruction.

msgd = %s"msgd" 1*WSP name [ 1*WSP sid ]

Dynamic messages are defined with the msgf instruction.

msgf = %s"msgf" 1*WSP fsym

Parameters are defined in the same way as for other functions.

Events

Events are defined with the evnt instruction.

evnt = %s"evnt" 1*WSP name [ 1*WSP sid ]

This is also treated as a definition of the handler function for the event. Parameters are defined in the same way as for other functions.

The first argument to an event function is always a handle to an object passed to an event-handler registration function.

Event handlers have no return value.

Internet Media Type

Media type of these documents is text/prs.kueea.kmdl.

The charset parameter must be included with the value UTF-8.