This technical report details my discoveries about NewtonScript bytecode. It is not an official Apple document and as the information was discovered by observation only, it may be neither accurate nor complete. This document is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
The discoveries detailed in this technical report cover how compiled NewtonScript is represented on the Apple MessagePad platform, how values are encoded, the general format of bytecode, and the behaviour of individual bytecodes.
The author is indebted to Jason Harper for his ViewFrame Demo program, which made me realise that bytecode might not be as impenetrable as I first thought. Any omissions or errors in this document are though, entirely my responsibility.
In order to understand fully the description of NewtonScript bytecode given in the second section of this report a number of things first need to be made clear.
A compiled NewtonScript function is represented on the MessagePad as a frame with the following fixed format:
{ class: 'CodeBlock, instructions: , literals: [...], argFrame: { _nextArgFrame: {} _Parent: {}, _implementor: {}, ... }, numArgs: }
To give a concrete example, here is a short NewtonScript function and the frame that represents it:
func( foo, bar ) { begin class: 'CodeBlock, local fred := '[]; instructions: , /* 18 A5 7D 02 is return fred */ literals: [ '[] ], end argFrame: { _nextArgFrame: {}, _Parent: {}, _implementor: {}, foo: NIL, bar: NIL, fred: NIL }, numArgs: 2 }
A knowledge of how NewtonScript encodes values is useful for understanding some of the bytecodes. Certain simple values such as integers, and the constants True and Nil are encoded as immediate values. More complex values such as floats, icons, frames and arrays are encoded as separate data blocks and then referred to via a pointer.
NewtonScript has a uniform way of encoding both the simple values and the pointers to more complex values within a 32 bit field. It achieves this by using the lowest two bits of the field as a type identifier. What the value means for each of the four possible types is as follows:
00 | The field represents a signed integer in the range -229 to +229-1. i.e. if the low two bits of the 32 bit value are zero, the integer value that it represents can be obtained by performing value >> 2. | |
01 | The field represents a pointer to a complex structure. The pointer value can be obtained by performing value & 0xFFFFFFFC. | |
10 | The field represents a unique system value. As far as I'm aware the possible values are: | |
0x00000002 | Nil | |
0x0000001A | True | |
0x000UUUU6 | The Unicode character \uUUUU, e.g. the NewtonScript constant $A translates to the immediate value 0x00000496. | |
11 |
The field represents a magic pointer. These are the constants
that are reperesented as @ |
All NewtonScript bytecodes are either one or three bytes long. If the low three bits of the first byte in the sequence are all 1 then the following two bytes are also part of the instruction, otherwise it is a single byte instruction. The following are all examples of valid byte codes (shown in hexadecimal):
Very often (but not always) the low three bits are used as a count value and if the required value exceeds 6 then all three bits are set and the following two bytes are used instead for the value. Hence this initially rather odd seeming format is actually quite useful for keeping bytecode compact. The use of this count field is explained in more detail below.
All of the bytecode descriptions given below follow the same format:
Name Octal Hex Binary Stack behaviour Longer description
The bytecodes are given as octal values first, because this makes it easier to show the low three bits of the code as a separate digit. As mentioned above, these three bits often act as a count field. When this is the case, the letter `N' will be used instead of a digit and the role played by the count field will be explained in the stack behaviour and description sections. Remember that any bytecode listed as xxN can be either one or three bytes long!
The hex value of the bytecode is also given as tools such as ViewFrame tend to show raw data in hex format rather than octal. In the case of bytecodes with a count field, the hex value given will be with the count set to zero.
Bytecode is executed on a virtual machine that uses a stack of NewtonScript values (see section 1.2), hence how each instruction modifies the stack important. The stack behaviour is shown as:
The items are listed in the order that they are pushed onto the stack, i.e. the leftmost is the first item pushed and the rightmost the last. Repetition of items is indicated by braces (...) followed by a repetition count. If an item on the stack must be of a certain type, the description says so, otherwise placeholders such as <#007F>X" are used. References to arrays, frames and symbols are indicated by [], {} and ' respectively. An empty stack is indicated by the symbol *.
Name | Octal | Hex | Binary |
Stack behaviour | |||
Longer description | |||
pop | 000 | 00 | 0000 0000 |
X -> * | |||
Removes the top item from the stack | |||
dup | 001 | 01 | 0000 0001 |
X -> X, X | |||
Duplicates the top item on the stack | |||
return | 002 | 02 | 0000 0010 |
* -> * | |||
Exits from a routine. By convention, the top item on the stack is the return value of the routine. It is possible to return from a routine with more items on the stack (or none at all), however the current Apple NewtonScript compiler ensures that the functions that it produces always leave one and only one value on the stack. | |||
self | 003 | 03 | 0000 0011 |
* -> self | |||
Pushes a reference to the frame containing the current function onto the stack. In other words this pushes the value represented by the NewtonScript value self. | |||
create closure(?) | 004 | 04 | 0000 0100 |
CodeBlock -> CodeBlock | |||
This bytecode is inserted by the compiler whenever it has just pushed a CodeBlock frame either as a parameter to a function call or message send, or as the target of a call...with statement. My guess is that this initialises the three reserved fields in the argFrame of the CodeBlock (see 1.1), but I haven't yet attempted to verify this. | |||
foreach next | 005 | 05 | 0000 0101 |
[iterator] -> * | |||
Steps to next item in iteration. See 307 000 021 - foreach for a full explanation. | |||
foreach complete | 006 | 06 | 0000 0110 |
[iterator] -> boolean | |||
Checks if iteration is complete. See 307 000 021 - foreach for a full explanation. | |||
end exception handling(?) | 007 000 007 | 07 00 07 | 0000 0111 0000 0000 0000 0111 |
* -> * | |||
See on exception for full explanation. | |||
unused(?) | 01N | 08 through 0F | 0000 1??? |
- | |||
Apparently unused. | |||
unused(?) | 02N | 10 through 17 | 0001 0??? |
- | |||
Apparently unused. | |||
push literal | 03N | 18 through 1F | 0001 1??? |
* -> literal N | |||
Pushes the literal which is element N of the literals array of this code block onto the stack. | |||
push immediate | 04N | 20 through 27 | 0010 0??? |
* -> N | |||
N here is treated as an immediate value as described in
section 1.2. This instruction is used to push any
immediate value that can be represented in 16 (sign
extended) bits or less. Values that require the full 32
bits are stored as literals and pushed using 03N.
Some examples of the use of this instruction: |
|||
call global | 05N | 28 through 2F | 0010 1??? |
argument x N, 'FunctionSymbol -> * | |||
N arguments to the function are expected on the stack
(pushed in the same order that they are listed in the
function parameter list), followed by a reference to the
symbol that represents the name of the function (this is
stored in the literals array and pushed using 03N).
Although the stack description indicates that the
instruction itself pushes nothing onto the stack, remember
that all NewtonScript functions conventionally push a
single item onto the stack before returning.
Example:
|
|||
call with | 06N | 30 through 37 | 0011 0??? |
argument x N, CodeBlock -> * | |||
This directly translates the NewtonScript
call ... with() construct. N arguments
to the function are expected on the stack (pushed in the
same order that they are listed in the function parameter
list), followed by a reference to a CodeBlock frame that
has had the create closure instruction called on
it. As before, remember that all NewtonScript functions
conventionally push a single item onto the stack before
returning.
Example:
|
|||
send message | 07N | 38 through 3F | 0011 1??? |
argument x N, {Receiver}, 'Message -> * | |||
This directly translates the NewtonScript ":" construct.
N arguments to the message are expected on the stack
(pushed in the same order that they are listed in the
function parameter list), followed by a reference to the
receiving frame and then the symbol of the message.
Again, remember that all NewtonScript functions
conventionally push a single item onto the stack before
returning.
Example:
|
|||
conditional send message | 10N | 40 through 47 | 0100 0??? |
argument x N, {Receiver}, 'Message -> * or Nil | |||
This directly translates the NewtonScript ":?" construct.
N arguments to the message are expected on the stack
(pushed in the same order that they are listed in the
function parameter list), followed by a reference to the
receiving frame and then the symbol of the message to be
sent if the given function exists. If the function
doesn't exist, Nil is placed on the stack else it's left
up to the function to put something there.
Example:
|
|||
inherited send message | 11N | 48 through 4F | 0100 1??? |
argument x N, 'Message -> * | |||
This directly translates the NewtonScript "inherited:"
construct. N arguments to the message are expected on the
stack (pushed in the same order that they are listed in
the function parameter list), followed by the symbol of
the message. Again, remember that all NewtonScript
functions conventionally push a single item onto the stack
before returning.
Example:
|
|||
conditional inherited send message | 12N | 50 through 57 | 0101 0??? |
argument x N, 'Message -> * or Nil | |||
This directly translates the NewtonScript "inherited:?"
construct. N arguments to the message are expected on the
stack (pushed in the same order that they are listed in
the function parameter list), followed by the symbol of
the message to be sent if the given function exists. If
the function doesn't exist, Nil is placed on the stack
else it's left up to the function to put something there.
Example:
|
|||
goto | 13N | 58 through 5F | 0101 1??? |
* -> * | |||
This instruction causes execution to pass to location N, where N is the offset from the start of the block of bytecode (i.e. gotos are absolute, not relative). | |||
goto if not Nil | 14N | 60 through 67 | 0110 0??? |
X -> * | |||
X is removed from the stack and examined. If it is Nil, execution continues with the next instruction in sequence. If it is not Nil, execution passes to location N where N is the offset from the start of the block of bytecode. | |||
goto if Nil | 15N | 68 through 6F | 0110 1??? |
X -> * | |||
X is removed from the stack and examined. If it is not Nil, execution continues with the next instruction in sequence. If it is Nil, execution passes to location N where N is the offset from the start of the block of bytecode. | |||
push external variable | 16N | 06 | 0111 0??? |
* -> literal index N | |||
This bytecode pushes onto the stack the value contained in
a variable or slot from outside of the function. The name
of the variable to push is given by literal N in the
function's literals array; this literal should be a symbol
reference.
Example:
|
|||
push local variable | 17N | 78 through 7F | 0111 1??? |
* -> value of slot N of argFrame | |||
This bytecode pushes onto the stack the value contained in
the Nth slot of the CodeBlock's argFrame frame. In other
words, this bytecode is commonly used to push either the
value of an argument to the function or that of a local
variable.
Example:
|
|||
create frame | 20N | 80 through 87 | 1000 0??? |
value x N, [frameMap] -> {frame} | |||
This bytecode creates a frame given N slot values on the
stack plus a frame map describing the slots in the frame.
The frame map is an array whose first item is Nil and then
each subsequent item is a reference to a symbol giving the
name of the corresponding slot in the frame being created.
The values are pushed onto the stack in the order of the
slot names in the frame map. If this is as clear as mud,
maybe the example will help:
Example:
|
|||
create array | 21N | 88 through 8F | 1000 1??? |
element x N, 'Class -> [Class: ...]
or integer, 'Class -> [Class: ...] |
|||
This bytecode creates an array given N element values on
the stack plus a reference to a symbol defining the class
of the array. The class of the array is specified in
NewtonScript with an initial " In the special case of N being 0xFFFF, the instruction takes a single NewtonScript integer from the stack instead of N element values and creates an empty array (i.e. each element is Nil) with the given number of elements. Example:
|
|||
push slot value | 221 | 91 | 1001 0001 |
{frame}, 'SlotName -> slot value | |||
This takes a frame and the name of a slot in that frame
and pushes the value contained in the slot. If the slot
name is a single name then it is represented by a symbol
reference. If the slot name is a path expression
(e.g. foo.bar) then it is represented by an array of class
pathExpr with one element per section of the path. Each
element is then a reference to the symbol for that part of
the path.
Example:
|
|||
assign to slot | 230 | 98 | 1001 1000 |
{X}, 'SlotName, Y -> * | |||
This assigns the value Y to the slot SlotName of frame X, i.e. X.SlotName := Y. As with push slot value the slot name is represented either by a simple symbol reference or a pathExpr array for a complex path expression. | |||
assign to slot and push result | 231 | 99 | 1001 1001 |
{X}, 'SlotName, Y -> Y | |||
Identical to assign to slot except that the value assigned to the slot is also pushed onto the stack. | |||
assign to local variable | 24N | 06 | 1010 0??? |
X -> * | |||
Value X is assigned to the Nth slot of the CodeBlock's
argFrame frame. In other words, this bytecode is usually
used to set the value of a local variable.
Example:
|
|||
assign to external variable | 25N | A8 through AF | 1010 1??? |
X -> * | |||
Value X is assigned to a variable or slot from outside of the function. The name of the variable to assign to is given by literal N in the function's literals array; this literal should be a symbol reference. | |||
increment local variable | 26N | 1011 0??? | |
Increment -> Increment, local variable N + Increment | |||
This bytecode adds the given integer increment to the local variable in the Nth slot of the CodeBlock's argFrame and the returns both the increment and the new value of the local variable. This instruction is used internally as part of the coding of for loops. | |||
for loop goto | 27N | B8 through BF | 1011 1??? |
Increment, value, limit -> * | |||
This instruction is placed at the end of a for loop construct. It takes the increment, current value of the counter (after having been incremented by increment local variable and the limit value of the loop. If the counter value now exceeds the loop limit, execution continues with the next instruction in sequence. If it does not, execution passes to location N where N is the offset from the start of the block of bytecode. This instruction is used internally as part of the coding of for loops. | |||
add | 300 | C0 | 1100 0000 |
X, Y -> X+Y | |||
The 30N bytecode series encodes a number of different low-level system function calls. Any NewtonScript operator or built-in function not explicitly mentioned here is implemented by using a standard function call (bytecode 05N). This first bytecode of the series implements addition (of integers or floats). | |||
subtract | 301 | C1 | 1100 0001 |
X, Y -> X-Y | |||
Subtraction of integers or floats. | |||
dereference array object | 302 | C2 | 1100 0010 |
[X], Y -> X[Y] | |||
This pushes the array element Y of array X. Y should be an integer. | |||
assign to array element and push | 303 | C3 | 1100 0011 |
[X], Y, Z -> Z | |||
This assignes value Z to the pushes the array element Y of array X i.e. X[Y] := Z. Y should be an integer. | |||
compare equal | 304 | C4 | 1100 0100 |
X, Y -> Boolean (X=Y) | |||
This compares X and Y for equality and pushes a Boolean result (True or Nil). | |||
not | 305 | C5 | 1100 0101 |
X -> Boolean (not X) | |||
This takes any value X and returns not X. Not Nil is True, not anything else is Nil. | |||
compare not equal | 306 | C6 | 1100 0110 |
X, Y -> Boolean (X<>Y) | |||
This compares X and Y for inequality and pushes a Boolean result (True or Nil). | |||
multiply | 307 000 007 | C7 00 07 | 1100 0111 0000 0000 0000 0111 |
X, Y -> X*Y | |||
This implements multiplication (of integers or floats. | |||
divide with float result | 307 000 010 | C7 00 08 | 1100 0111 0000 0000 0000 1000 |
X, Y -> Float (X/Y) | |||
This implements division (of integers or floats) with a float result, i.e. it corresponds to the "/" operator in NewtonScript. | |||
divide with integer result | 307 000 011 | C7 00 09 | 1100 0111 0000 0000 0000 1001 |
X, Y -> Integer (X div Y) | |||
This implements division of integers with an integer result, i.e. it corresponds to the <#007F>div" operator in NewtonScript. | |||
compare less than | 307 000 012 | C7 00 0A | 1100 0111 0000 0000 0000 1010 |
X, Y -> Boolean (X < Y) | |||
This checks if X is less than Y and pushes a Boolean result (True or Nil). | |||
compare greater than | 307 000 013 | C7 00 0B | 1100 0111 0000 0000 0000 1011 |
X, Y -> Boolean (X > Y) | |||
This checks if X is greater than Y and pushes a Boolean result (True or Nil). | |||
compare greater or equal | 307 000 014 | C7 00 0C | 1100 0111 0000 0000 0000 1100 |
X, Y -> Boolean (X >= Y) | |||
This checks if X is greater than or equal to Y and pushes a Boolean result (True or Nil). | |||
compare less or equal | 307 000 015 | C7 00 0D | 1100 0111 0000 0000 0000 1101 |
X, Y -> Boolean (X <= Y) | |||
This checks if X is less than or equal to Y and pushes a Boolean result (True or Nil). | |||
binary and | 307 000 016 | C7 00 0E | 1100 0111 0000 0000 0000 1110 |
Integer X, integer Y -> Integer band(X,Y) | |||
This performs a binary and of the two integers X and Y. | |||
binary or | 307 000 017 | C7 00 0F | 1100 0111 0000 0000 0000 1111 |
Integer X, integer Y -> Integer bor(X,Y) | |||
This performs a binary or of the two integers X and Y. | |||
binary not | 307 000 020 | C7 00 10 | 1100 0111 0000 0000 0001 0000 |
Integer X -> Integer bnot(X) | |||
This performs a binary not of the integer X. | |||
foreach | 307 000 021 | C7 00 11 | 1100 0111 0000 0000 0001 0001 |
X, Boolean -> [Iterator] | |||
This is the fundamental instruction used to implement all
NewtonScript foreach loops. If the Boolean is Nil it
begins a normal foreach loop and if it is True it begins a
foreach deeply loop. The iterator returned is then passed
to the other two related instructions 005 (foreach next)
and 006 (foreach complete). The iterator itself is an
array containing various useful state information about
what is being iterated over and how far the iteration has
progressed.
foreach next takes the iterator and modifies it so that the next item in the iteration becomes the current item. foreach complete checks the state of the iterator and returns True if all items have been iterated over and Nil if not. So, a foreach loop is implemented in bytecode as:
|
|||
length | 307 000 022 | C7 00 12 | 1100 0111 0000 0000 0001 0010 |
X -> Integer length(X) | |||
This implements the length function. | |||
clone | 307 000 023 | C7 00 13 | 1100 0111 0000 0000 0001 0011 |
X -> clone(X) | |||
This implements the clone function. | |||
set class | 307 000 024 | C7 00 14 | 1100 0111 0000 0000 0001 0100 |
X, 'Symbol -> modified X | |||
This implements the SetClass function e.g. SetClass( data, 'Binary ). X is a complex data structure (frame, array or data block); SetClass sets the class of that structure and then returns a reference to the now modified structure. | |||
add array slot | 307 000 025 | C7 00 15 | 1100 0111 0000 0000 0001 0101 |
[X], Y -> Y | |||
This implements the AddArraySlot function e.g. AddArraySlot( foo, 2 ). Y is added as the last element of the array X. | |||
make string | 307 000 026 | C7 00 16 | 1100 0111 0000 0000 0001 0110 |
[Array of strings] -> String | |||
This takes an array of strings and combines them into a
single string. This function is used to implement the
NewtonScript operators "&" and "&&". "&&" is currently
done by inserting an extra single space string into the
array of strings to be translated.
Example:
|
|||
slot exists | 307 000 027 | C7 00 17 | 1100 0111 0000 0000 0001 0111 |
{X}, 'SlotName -> Boolean (True if X contains the given slot) | |||
This implements the exists operator, but only when checking for the existance of a slot in a frame (e.g. foo.bar exists). When checking for the existance of a variable, a function call (bytecode 05N) to the function HasVar is used instead. | |||
class of | 307 000 030 | C7 00 18 | 1100 0111 0000 0000 0001 1000 |
X -> ClassOf(X) | |||
This implements the ClassOf function. | |||
on exception | 31N | C8 through CF | 1100 1??? |
('Exception, byte offset)N -> * | |||
This takes N pairs of an exception symbol and an offset
into the bytecode to jump to if that exception occurs.
This is used to encode the high level try ... onexception
construct. Note that unlike all the goto bytecodes, the
offset is encoded as a standard NewtonScript integer. At
the end of both the normal and the exception code, the
bytecode 007 000 007 is executed - I guess it clears
whatever state was set up.
Example:
|