DRAFT: Kueea Module Definition Language
**This document is a working draft.**
KMDL documents are textual documents that define a Kueea Module. Their syntax is designed to be human-readable and fairly easy to write.
One KMDL document defines one Kueea Module. The phrase "the defined module" means the module being defined, i.e. the one defined by the currently parsed document.
Some classes are defined by Kueea System to be part of its ABI. Instances of these classes may be passed to a function via dedicated CPU registers. This document refers to these classes as "PCS types" or "PCS classes".
The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
A KMDL document is a sequence of Unicode characters. A parser processes the document line by line. The maximum length of a line is 1024 octets, including the line separator.
Lines are separated with a sequence of two characters: U+000D CARRIAGE RETURN and U+000A LINE FEED.
Whitespace is either U+0020 SPACE or U+0009 HORIZONTAL TAB.
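A minimal Python sketch of the line rules above. It assumes the document is held as raw UTF-8 octets and that every line, including the last one, ends with CR LF (as the KMDL rule in this section requires). The function name split_lines and its error handling are illustrative only, and lone CR or LF octets inside a line are not checked.

```python
MAX_LINE = 1024  # maximum line length in octets, including CR LF


def split_lines(data):
    """Split a KMDL document (bytes) into lines without their CR LF."""
    if not data.endswith(b"\r\n"):
        raise ValueError("the last line must be terminated by CR LF")
    lines = data[:-2].split(b"\r\n")
    for number, line in enumerate(lines, start=1):
        # The limit counts the two separator octets as part of the line.
        if len(line) + 2 > MAX_LINE:
            raise ValueError(f"line {number} is longer than {MAX_LINE} octets")
    return lines
```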
There are three kinds of lines: instructions, descriptions and comments.
Instruction lines begin with a U+002E FULL STOP character. There MAY be whitespace at the beginning of the line.
.instruction param1 param2
    .indented-instruction param1
Any other line is a human-readable textual description of the item declared by the most-recent item-declaration instruction.
.item ex1
Description of the ex1 item.
Description of the ex1 item.
.item ex2
Description of the ex2 item.
These lines are fed to another parser. Their syntax is not defined by this document.
One-line comments begin with a U+0023 NUMBER SIGN character. There MAY be whitespace at the beginning of the line.
# This is a one-line comment.
    # This is a one-line comment.
This is not a comment.
# This is a one-line comment.
Comment sections begin and end with a line beginning with two consecutive U+0023 NUMBER SIGN characters. There MAY be whitespace at the beginning of the line.
# This is a one-line comment.
This is not a comment.
## This is the first line of a comment section.
This is a comment.
.This is a comment.
# This is inside a comment section.
## This is the last line of a comment section.
This is not a comment.
The KMDL rule below expresses the syntax in ABNF. String literals are encoded in UTF-8.
KMDL = "." mbeg CRLF *( line CRLF ) ; mbeg defined later
line = comm / inst / text
comm = cmul / cone
cmul = *WSP "##" *OCTET CRLF *( *OCTET CRLF ) *WSP "##" *OCTET
cone = *WSP "#" *OCTET
inst = *WSP "." name *( 1*WSP iarg ) *WSP
name = 4LOALPHA
iarg = 1*VCHAR
text = *OCTET
The syntax and parameters of each instruction are defined in its respective section, also in ABNF. These rules match the name *( 1*WSP iarg ) fragment of the inst rule.
Instruction lines MUST NOT contain code points above U+007F. These code points MAY be used in comments and item descriptions.
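The three kinds of lines can be told apart with a few prefix checks. The following Python sketch assumes the lines have already been split and decoded from UTF-8. It tracks whether the parser is inside a comment section and rejects instruction lines that contain code points above U+007F. The function name classify and the kind labels are illustrative, not part of KMDL.

```python
def classify(lines):
    """Yield (kind, line) pairs, where kind is "inst", "text" or "comment".

    `lines` are str objects already decoded from UTF-8, without CR LF.
    """
    in_section = False  # True between the two "##" lines of a comment section
    for line in lines:
        stripped = line.lstrip(" \t")  # leading whitespace is allowed
        if stripped.startswith("##"):
            # Both the opening and the closing "##" lines are comments.
            in_section = not in_section
            yield "comment", line
        elif in_section or stripped.startswith("#"):
            yield "comment", line
        elif stripped.startswith("."):
            if any(ord(ch) > 0x7F for ch in line):
                raise ValueError(f"non-ASCII code point in instruction: {line!r}")
            yield "inst", line
        else:
            # Anything else is description text for the most recent item.
            yield "text", line
```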
Common instruction parameters
This section defines rules for parameters commonly used by instructions.
Integers are unsigned and 16, 32 or 64 bits long. They are written in decimal or hexadecimal notation.
u16 = u16dec / u16hex
u16dec = 1*5DIGIT
u16hex = "0x" 1*4HEXDIGIT
u32 = u32dec / u32hex
u32dec = 1*10DIGIT
u32hex = "0x" 1*8HEXDIGIT
u64 = u64dec / u64hex
u64dec = 1*20DIGIT
u64hex = "0x" 1*16HEXDIGIT
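A Python sketch of integer parsing. The ABNF only bounds the number of digits, so a value such as 99999 matches u16dec. The range check below is therefore an assumption about the intended meaning. The sketch also accepts "0X" and upper-case hexadecimal digits, because ABNF quoted strings such as "0x" are case-insensitive. The name parse_uint is illustrative.

```python
import re

# Patterns mirroring the u16/u32/u64 rules. ABNF quoted strings are
# case-insensitive, so "0X" and upper-case hexadecimal digits also match.
_FORMS = {
    16: re.compile(r"[0-9]{1,5}|0[xX][0-9A-Fa-f]{1,4}"),
    32: re.compile(r"[0-9]{1,10}|0[xX][0-9A-Fa-f]{1,8}"),
    64: re.compile(r"[0-9]{1,20}|0[xX][0-9A-Fa-f]{1,16}"),
}


def parse_uint(text, bits):
    """Parse a u16, u32 or u64 parameter (bits is 16, 32 or 64)."""
    if not _FORMS[bits].fullmatch(text):
        raise ValueError(f"not a u{bits}: {text!r}")
    if text[:2] in ("0x", "0X"):
        value = int(text[2:], 16)
    else:
        value = int(text, 10)
    # Assumption: values must fit in the type; the digit counts alone
    # would accept e.g. 99999 as a u16.
    if value >= 1 << bits:
        raise ValueError(f"u{bits} out of range: {text!r}")
    return value
```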
Universally Unique Identifier
Modules, classes and interfaces are identified by 128-bit Universally Unique Identifiers. UUIDs are written in their textual notation as used in URNs. [RFC4122]
uuid = 8HEXDIGIT 3( "-" 4HEXDIGIT ) "-" 12HEXDIGIT ; example: 12345678-1234-1234-1234-abcdef012345
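Python's standard uuid.UUID constructor also accepts braces, a "urn:uuid:" prefix and undashed hexadecimal. A strict parser for the uuid rule can therefore check the 8-4-4-4-12 layout first. A minimal sketch, where the name parse_uuid is illustrative:

```python
import re
import uuid

# Exactly the textual layout required by the uuid rule: 8-4-4-4-12.
_UUID = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def parse_uuid(text):
    """Parse a UUID parameter, rejecting forms the uuid rule does not allow."""
    if not _UUID.fullmatch(text):
        raise ValueError(f"not a UUID: {text!r}")
    return uuid.UUID(text)
```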
Items are given a human-readable name for reference. Names are case-sensitive and exist within the scope of the defined module. There cannot be two items within a module with the same name.
name = ( ALPHA / "_" ) *( ALPHA / DIGIT / "_" )
This document recommends that:
- names use small Latin letters with words separated with U+005F LOW LINE;
- names of functions are composed as if each word were an object, for example
Items are referenced via their names. A reference begins with the module identifier, followed by a "path" to the item.
The initial module reference may be omitted. This is a shorthand notation for referencing the defined module.
mref = name / uuid ; module reference
iref = [ mref ] 1*( "." name ) ; item reference
cref = [ mref ] 1*( "." name ) ":" u16 ; class reference
Class references have a 16-bit unsigned integer at the end. This is the level of the referenced class.
References are resolved after the parsing phase finishes. A referenced item MAY be defined later in the document.
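The reference rules can be split apart with a single regular expression, because a UUID contains hyphens and a name cannot. This Python sketch returns None as the module for the shorthand that refers to the defined module, and None as the level for item references. The function name parse_ref is illustrative. Resolution against declared items, which happens after parsing, is not shown, and the level is not range-checked.

```python
import re

NAME = r"[A-Za-z_][A-Za-z0-9_]*"
UUID = r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"
U16 = r"0[xX][0-9A-Fa-f]{1,4}|[0-9]{1,5}"

REF = re.compile(
    rf"(?P<mref>{UUID}|{NAME})?"      # optional module reference
    rf"(?P<path>(?:\.{NAME})+)"       # one or more ".name" steps
    rf"(?::(?P<level>{U16}))?"        # class level, only in class references
)


def parse_ref(text):
    """Split a reference into (module, path, level).

    module is None for the shorthand referring to the defined module;
    level is None for an item reference without a ":u16" suffix.
    """
    m = REF.fullmatch(text)
    if m is None:
        raise ValueError(f"not a reference: {text!r}")
    path = m.group("path")[1:].split(".")
    level = m.group("level")
    if level is not None:
        level = int(level[2:], 16) if level[:2] in ("0x", "0X") else int(level)
    return m.group("mref"), path, level
```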
Symbols are identified by a 64-bit unsigned integer. There cannot be two symbols within a module with the same value.
sid = u64
Symbol identifiers are optional to define. By default, the value is the result of hashing a UTF-8 character string with the 64-bit variant of the FNV-1a (Fowler-Noll-Vo) hash function. These identifiers should only be explicitly defined in case of a collision.
For objects defined in the global scope of a module, the hashed string is the concatenation of a U+005F LOW LINE character and the name of the symbol. For example, the hashed string for an object named counter is _counter.
For function members of a class or class interface, the hashed string is the concatenation of the name of the class, a U+005F LOW LINE character and the name of the member. For example, the hashed string for a method read of a class named file is file_read.
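The default symbol identifier can be computed as follows. This is a Python sketch of the standard 64-bit FNV-1a function (offset basis 0xcbf29ce484222325, prime 0x100000001b3). The helper name default_sid is hypothetical.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data):
    """64-bit FNV-1a: XOR in each octet, then multiply by the FNV prime."""
    h = FNV64_OFFSET_BASIS
    for octet in data:
        h ^= octet
        h = (h * FNV64_PRIME) & MASK64  # arithmetic is modulo 2**64
    return h


def default_sid(name, class_name=None):
    """Default symbol identifier (hypothetical helper name).

    Global objects hash "_" + name; class or interface members hash
    class_name + "_" + name. Strings are encoded as UTF-8.
    """
    text = "_" + name if class_name is None else class_name + "_" + name
    return fnv1a_64(text.encode("utf-8"))
```

An explicit sid parameter is then only needed when two names in a module happen to produce the same 64-bit value.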
Instructions / Parser definition
Instruction names are case-sensitive and written with small letters.
All instructions are four characters long.
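A sketch of splitting an instruction line according to the inst rule, in Python. LOALPHA is not an ABNF core rule, so this sketch assumes it means the small Latin letters a to z. VCHAR is %x21-7E. The function name tokenize is illustrative.

```python
import re

# inst = *WSP "." name *( 1*WSP iarg ) *WSP, with name = 4LOALPHA
# (assumed to be a-z) and iarg = 1*VCHAR (%x21-7E).
INST = re.compile(r"[ \t]*\.([a-z]{4})((?:[ \t]+[\x21-\x7E]+)*)[ \t]*")


def tokenize(line):
    """Return (instruction name, list of arguments) for an instruction line."""
    m = INST.fullmatch(line)
    if m is None:
        raise ValueError(f"malformed instruction line: {line!r}")
    return m.group(1), m.group(2).split()
```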
A KMDL parser is a stateful parser. Some instructions change the current state of the parser.
The parser state consists of:
- load: list of KMDL documents to be parsed,
- mbeg: current module,
- mlvl: current module level,
- cbeg: current class or class interface,
- clvl: current class level,
- item: current item,
- text: current text format,
- wslv: current line indentation,
- skip: ignored line indentation.
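One possible in-memory representation of the parser state is a Python dataclass. The field types are assumptions, including holding load as module UUIDs and using None for "no current class" or "nothing skipped". The defaults for mlvl, clvl and text follow the defaults stated elsewhere in this document: level zero and Markdown.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParserState:
    load: List[uuid.UUID] = field(default_factory=list)  # modules still to be parsed
    mbeg: Optional[uuid.UUID] = None   # current module, set by mbeg
    mlvl: int = 0                      # current module level (zero by default)
    cbeg: Optional[str] = None         # current class or interface, if any
    clvl: int = 0                      # current class level (zero by default)
    item: Optional[str] = None         # item that description lines belong to
    text: str = "markdown"             # description format (Markdown by default)
    wslv: int = 0                      # indentation of the current line
    skip: Optional[int] = None         # indentation of ignored lines, if any
```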
The result of parsing a document is a tree of declared items:
module
|-mlvl
| |-class
| | |-clvl
| | | |-data
| | | |-data
| | | |-method
| | |-clvl
| |   |-data
| |   |-method
| |-interface
| | |-data
| | |-method
| |-object
| |-message
|-mlvl
  |-object
  |-event
KMDL documents begin with the mbeg instruction. This instruction must not have any preceding whitespace. It declares the identifier of the module defined by the document.
This instruction cannot appear more than once in a KMDL document.
mbeg = %s"mbeg" 1*WSP uuid
Items defined by a module are grouped into levels. By default, the value of the module level is zero. The level can be changed with the mlvl instruction.
mlvl = %s"mlvl" 1*WSP u32
The first and only parameter is the new value of the module level. The level applies to all items defined after the instruction. Some instructions may override this value.
Modules reference items defined in other modules. To use them, a module must import (load) the other module.
load = %s"load" 1*WSP uuid [ 1*WSP name ]
The first parameter to the load instruction is the UUID of the imported module; the optional second parameter is a name given to that module.
The syntax of the description text is Markdown by default and may be changed at any point with the text instruction.
text = %s"text" 1*WSP 1*VCHAR *( 1*WSP 1*VCHAR )
The first argument is the name of the new format. The name should be a subtype of the text media type, for example html for text/html.
All subsequent arguments are parameters of the format.
This will be interpreted as _Markdown_ text.
.text html
<p>This will be interpreted as <b>HTML</b> text.</p>
Enumerations are defined with the "enum" instruction.
enum = %s"enum" 1*WSP name 1*WSP type
eval = %s"eval" 1*WSP name 1*WSP 1*( VCHAR )
The first argument to the "enum" instruction is the name of the enumeration. The second is the type of enumerated values; it must be a PCS class.
Subsequent "eval" instructions define enumerated values. The first argument is the name for the value. The second is the named value; the syntax depends on the type.
Enumerations define another name for the class of the enumerated values.
Data objects can be either read-only (read) or read-write (rdwr). These are objects defined in the global scope of the module.
read = %s"read" 1*WSP ddef
rdwr = %s"rdwr" 1*WSP ddef
ddef = type 1*WSP dnam [ 1*WSP sid ]
dnam = name [ "[" u32 "]" ]
type = tpre / cref
tpre = %s"octet" / %s"boolean"
The first argument is the type of the object. The only predefined types are boolean and octet.
The second argument is the name of the object. If the object is an array, its length is fixed and is given in square brackets right after the name.
The third argument is an optional symbol identifier.
Functions are declared by three instructions, ftsk, fmod and fkrn, which declare a task function, a module function and a kernel function, respectively.
ftsk = %s"ftsk" 1*WSP fsym
fmod = %s"fmod" 1*WSP fsym
fkrn = %s"fkrn" 1*WSP fsym
fsym = fval 1*WSP name [ 1*WSP sid ]
fval = type / hndt / %s"void" / %s"comp"
hndt = hndp "<" ( type / %s"undef" ) ">"
hndp = %s"none" / %s"read" / %s"rdex" / %s"rdwr" / %s"rwex"
The first argument is the type of the function's return value, which is one of: no value (void), a 2-bit result of a comparison (comp), an instance of a class, or a handle to an object stored in memory.
Handles begin with the access rights associated with the handle, followed by the type of the referenced object in angle brackets. Handles may reference objects of undefined (undef) type. The access rights are one of:
- none: No access rights.
- read: Read-only access.
- rdex: Read and execute rights.
- rdwr: Read and write rights.
- rwex: Read, write and execute rights.
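The five keywords combine three independent rights: read, write and execute. A Python sketch models them as flags, for example to check whether a handle permits writing. The Access type, its bit values and the helper grants are assumptions for illustration, not part of the Kueea System ABI.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    WRITE = 2
    EXECUTE = 4


# Rights denoted by each keyword of the hndp rule.
RIGHTS = {
    "none": Access.NONE,
    "read": Access.READ,
    "rdex": Access.READ | Access.EXECUTE,
    "rdwr": Access.READ | Access.WRITE,
    "rwex": Access.READ | Access.WRITE | Access.EXECUTE,
}


def grants(keyword, needed):
    """True if the handle rights `keyword` include every right in `needed`."""
    return needed in RIGHTS[keyword]
```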
The farg instruction declares the next parameter in order. A variable argument list parameter (farg ...) must be the last one.
farg = %s"farg" 1*WSP ( argt 1*WSP name / "..." )
argt = type / hndt / tref
tref = "[" ( type / thnd ) "]"
thnd = hndt ":" hndp [ "<" ( type / %s"undef" ) ">" ]
The first parameter to the instruction is the type of the argument or three consecutive periods that indicate a variable argument list.
The second parameter is the name of the argument for reference.
The syntax of the argument type needs explanation. Arguments to functions are one of:
- an object passed by value of length up to 128 octets (this limit applies to return values, too),
- an object passed by handle (by memory reference),
- an object passed by reference to a handle (bidirectional handle),
- a reference to an instance of a PCS class that is guaranteed to be modified only by the function.
Objects passed by value are specified by simply writing a class reference.
.farg module.class:0 by_value
Objects passed by handle are written by specifying the access rights and then the type of the object surrounded by angle brackets.
# read-only access
.farg read<module.class:0> by_handle1
# read-write access
.farg rdwr<module.class:0> by_handle2
Objects passed by a bidirectional handle (one that gives access to the function and back to the caller) are written in square brackets. Inside the brackets are the handle passed to the function, then a colon and the handle passed back to the caller. If the type of the handle passed back is the same, it can be omitted.
.farg [none<module.class:0>:rdwr] bidi_handle1
.farg [rdwr<module.class:0>:rdwr<module.class:1>] bidi_handle2
The bidi_handle1 parameter in the example is a reference to a handle. The function is not given any access rights to the referenced memory. Upon return, the handle contains the address of an instance of module.class, level 0, and the caller receives read-write access to this object.
The bidi_handle2 parameter in the example is a reference to a handle. The function is given read-write access rights to an instance of module.class, level 0. Upon return, the handle contains the address of an instance of module.class, level 1, and the caller has read-write access to this object.
The last category is written as a class reference in square brackets. This is a reference to an instance of a PCS class.
.farg [int.u8] u8_ref
Such references are primarily used to return a small object when passing a buffer by handle would expose too much data to the function. They are also faster than passing by handle, since no handle processing is involved.
The referenced object may be safely copied before the call is made and then copied back to the original buffer upon return.
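The four argument categories can be recognised from the shape of the farg type. The following Python sketch assumes that type names contain no whitespace, angle brackets or square brackets, which holds for the type and cref rules. The category labels are illustrative, and the inner type strings are not validated.

```python
import re

RIGHTS = "none|read|rdex|rdwr|rwex"
# [rights<type>:rights] or [rights<type>:rights<type>] (thnd in brackets)
BIDI = re.compile(rf"\[({RIGHTS})<([^<>\s]+)>:({RIGHTS})(?:<([^<>\s]+)>)?\]")
# rights<type> (hndt)
HANDLE = re.compile(rf"({RIGHTS})<([^<>\s]+)>")
# [type] without any handle: reference to a PCS class instance
PCSREF = re.compile(r"\[([^\[\]<>\s]+)\]")


def classify_arg(argt):
    """Return a tuple describing how the argument is passed."""
    m = BIDI.fullmatch(argt)
    if m:
        rin, tin, rout, tout = m.groups()
        # An omitted return type means the same type as the passed handle.
        return ("bidi", (rin, tin), (rout, tout or tin))
    m = HANDLE.fullmatch(argt)
    if m:
        return ("handle", m.group(1), m.group(2))
    m = PCSREF.fullmatch(argt)
    if m:
        return ("pcs_ref", m.group(1))
    return ("value", argt)
```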
Classes and interfaces
Class and interface declarations begin a new scope for function declarations. The scope ends at the declaration of another class or interface or when the cend instruction is encountered, whichever comes first.
cbeg = %s"cbeg" 1*WSP uuid 1*WSP name
ibeg = %s"ibeg" 1*WSP uuid 1*WSP name
cend = %s"cend"
The cbeg and ibeg instructions begin the definition of, respectively, a new class and a new class interface. The definition ends with the cend instruction or with an instruction that implicitly ends the definition, such as the beginning of another class.
The first parameter is the UUID of the class or interface. These UUIDs are global in scope. This document recommends using a namespaced (name-based) UUID version.
The second parameter is a name for the item.
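If "namespaced version" is read as the name-based UUID version 5 of RFC 4122, a class UUID can be derived deterministically. The Python sketch below uses the standard uuid.uuid5 function. Choosing the module's UUID as the namespace and the class name as the name is purely an assumption for illustration, as is the example module UUID.

```python
import uuid

# Hypothetical module UUID used as the namespace (assumption).
MODULE_ID = uuid.UUID("12345678-1234-1234-1234-abcdef012345")


def class_uuid(namespace, class_name):
    """Name-based (version 5, SHA-1) UUID for a class or interface."""
    return uuid.uuid5(namespace, class_name)
```

The same namespace and name always give the same UUID, so regenerating a KMDL document does not change class identities.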
Data members of the class are defined by the data instruction. Members are defined in ascending order of their memory offset.
Unions of members are defined as alternates of the previous member. The dalt instruction defines a member residing at the same memory offset as the previously defined data member.
data = %s"data" 1*WSP type 1*WSP memb [ 1*WSP u16 ]
dalt = %s"dalt" 1*WSP type 1*WSP memb [ 1*WSP u16 ]
memb = name [ "[" [ name ":" ] u32 "]" ]
These instructions differ from those for global objects: there is no symbol identifier, access is not specified, and the length of an array has a different syntax.
The first parameter is the type of the object.
The second parameter is the name for the item. When the array length has the form [name:u32], the member is a variable-length array. The name references a previously defined member that must be of a PCS integer class; this member holds the actual length of the array. The integer after the colon defines the maximum length of the array. The maximum does not specify how much memory is to be allocated; it is used to calculate the maximum length of an instance. Offsets of all subsequent members are calculated at runtime.
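A Python sketch of parsing the memb parameter into its name, its length and, for variable-length arrays, the name of the member that holds the actual length. It does not check that the length member was defined earlier or that it is of a PCS integer class. The Memb type and parse_memb are illustrative names.

```python
import re
from typing import NamedTuple, Optional

NAME = r"[A-Za-z_][A-Za-z0-9_]*"
MEMB = re.compile(
    rf"(?P<name>{NAME})"
    rf"(?:\[(?:(?P<count>{NAME}):)?"                     # optional "name:"
    r"(?P<len>0[xX][0-9A-Fa-f]{1,8}|[0-9]{1,10})\])?"   # u32 length
)


class Memb(NamedTuple):
    name: str
    length: Optional[int]  # fixed length, or maximum length of a variable array
    count: Optional[str]   # member holding the actual length, if variable


def parse_memb(text):
    m = MEMB.fullmatch(text)
    if m is None:
        raise ValueError(f"malformed member name: {text!r}")
    raw = m.group("len")
    if raw is None:
        length = None
    elif raw[:2] in ("0x", "0X"):
        length = int(raw[2:], 16)
    else:
        length = int(raw, 10)
    return Memb(m.group("name"), length, m.group("count"))
```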
The optional third parameter is the memory address alignment requirement of the member, expressed in octets.
Class members are grouped into levels, similarly to a module. The default value of a class level is zero. The level is changed with the clvl instruction.
clvl = %s"clvl" 1*WSP u16 [ 1*WSP u16 ]
The first parameter is the new value of the class level.
The second parameter is the memory address alignment of the first member of the class. This value overrides the member-defined alignment, if any.
The alignment of an object is the largest alignment of its members.
For example, let us consider this class definition:
.cbeg 00000000-0000-0000-0000-000000000000 ex1
.clvl 0
.data octet m1
.clvl 1
.data octet m2 4
.clvl 2 8
.data octet m3
.cend
An instance of the class at level 0 has only the member m1. Alignment of the object is 1 (the alignment of an octet). Length of the object is 1 octet.
An instance at level 1 has the members m1 and m2. Because m2 has been given an alignment of 4, three octets of padding are inserted between m1 and m2. Alignment of the object is 4. Length of the object is 5 octets.
An instance at level 2 has the members m1, m2 and m3. Alignment of m1 is changed to 8. Alignment of the object is 8. Length of the object is 6 octets.
It is possible to reduce the alignment value of a member. If the class u32 has an alignment of 4, then in the following example:
.cbeg 00000000-0000-0000-0000-000000000000 ex2
.clvl 0
.data octet two
.data .u32 four 2
.cend
the alignment of the class is 2 and the four member occupies four octets, beginning at the third octet. Length of the object is 6 octets.
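The layout rules illustrated by the two examples can be expressed as a short algorithm:

- Members visible at a level are placed in declaration order.
- Each member goes at the next offset that is a multiple of its alignment. dalt members reuse the previous member's offset.
- The object's alignment is the largest member alignment.
- The length ends at the last octet used, without trailing padding, as in the examples.

A Python sketch follows, with several assumptions. The Member type is hypothetical. The clvl alignment override is assumed to apply only to instances of exactly that level. Variable-length arrays are not handled.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    size: int          # octets occupied by the member
    align: int         # type alignment, or the explicit data/dalt override
    level: int         # class level in effect when the member was declared
    alt: bool = False  # True for dalt: same offset as the previous member


def layout(members, level, first_align=None):
    """Return (offsets, alignment, length) of an instance at `level`.

    first_align is the optional second clvl parameter for this level; it
    overrides the alignment of the first member of the class.
    """
    offsets = {}
    end = 0          # first octet after the members placed so far
    prev = 0         # offset of the previously placed member
    obj_align = 1
    visible = [m for m in members if m.level <= level]
    for index, m in enumerate(visible):
        align = first_align if index == 0 and first_align else m.align
        if m.alt:
            start = prev
        else:
            start = -(-end // align) * align  # round up to a multiple of align
        offsets[m.name] = start
        prev = start
        end = max(end, start + m.size)
        obj_align = max(obj_align, align)
    return offsets, obj_align, end
```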
Any function declared while in the class or interface scope is defined as a method of that class or class interface.
Interfaces can declare function prototypes with the func instruction.
func = %s"func" 1*WSP fval 1*WSP name
Function parameters are declared in the same way as for other functions.
These prototypes may then be implemented by a class with an itsk, imod or ikrn instruction.
itsk = %s"itsk" 1*WSP fref
imod = %s"imod" 1*WSP fref
ikrn = %s"ikrn" 1*WSP fref
fref = iref 1*WSP name [ 1*WSP sid ]
Unlike ftsk, fmod and fkrn, these instructions do not begin a function definition; they define a symbol for the implementation. The function itself is already defined by the func instruction of the interface.
Static messages are defined with the msgd instruction.
msgd = %s"msgd" 1*WSP name [ 1*WSP sid ]
Dynamic messages are defined with the msgf instruction.
msgf = %s"msgf" 1*WSP fsym
Parameters are defined in the same way as for other functions.
Events are defined with the evnt instruction.
evnt = %s"evnt" 1*WSP name [ 1*WSP sid ]
This is also treated as a definition of the handler function for the event. Parameters are defined in the same way as for other functions.
The first argument to an event function is always a handle to an object passed to an event-handler registration function.
Event handlers have no return value.
Internet Media Type
Media type of these documents is
charset parameter must be included with the value