Some days ago, I finished the first public version of my audiovisual virtual machine, IBNIZ. I also showed it off on YouTube with the following video:
As demonstrated by the video, IBNIZ (Ideally Bare Numeric Impression giZmo) is a virtual machine and a programming language that generates video and audio from very short strings of code. Technically, it is a two-stack machine somewhat similar to Forth, but with the major execption that the stack is cyclical and also used at an output buffer. Also, as every IBNIZ program is implicitly inside a loop that pushes a set of loop variables on the stack on every cycle, even an empty program outputs something (i.e. a changing gradient as video and a constant sawtooth wave as audio).
How does it work?
To illustrate how IBNIZ works, here's how the program ^xp is executed, step by step:
So, in short: on every loop cycle, the VM pushes the values T, Y and X. The operation ^ XORs the values Y and X and xp pops off the remaining value (T). Thus, the stack gets filled by color values where the Y coordinate is XORed by the X coordinate, resulting in the ill-famous "XOR texture".
The representation in the figure was somewhat simplified, however. In reality, IBNIZ uses 32-bit fixed-point arithmetic where the values for Y and X fall between -1 and +1. IBNIZ also runs the program in two separate contexts with separate stacks and internal registers: the video context and the audio context. To illustrate this, here's how an empty program is executed in the video context:
The colorspace is YUV, with the integer part of the pixel value interpreted as U and V (roughly corresponding to hue) and the fractional part interpreted as Y (brightness). The empty program runs in the so-called T-mode where all the loop variables -- T, Y and X -- are entered in the same word (16 bits of T in the integer part and 8+8 bits of Y and X in the fractional). In the audio context, the same program executes as follows:
Just like in the T-mode of the video context, the VM pushes one word per loop cycle. However, in this case, there is no Y or X; the whole word represents T. Also, when interpreting the stack contents as audio, the integer part is ignored altogether and the fractional part is taken as an unsigned 16-bit PCM value.
Also, in the audio context, T increments in steps of 0000.0040 while the step is only 0000.0001 in the video context. This is because we need to calculate 256x256 pixel values per frame (nearly 4 million pixels if there are 60 frames per second) but suffice with considerably fewer PCM samples. In the current implementation, we calculate 61440 audio samples per second (60*65536/64) which is then downscaled to 44100 Hz.
The scheduling and main-looping logic is the only somewhat complex thing in IBNIZ. All the rest is very elementary, something that can be found as instructions in the x86 architecture or as words in the core Forth vocabulary. Basic arithmetic and stack-shuffling. Memory load and store. An if/then/else structure, two kinds of loop structures and subroutine definition/calling. Also an instruction for retrieving user input from keyboard or pointing device. Everything needs to be built from these basic building blocks. And yes, it is Turing complete, and no, you are not restricted to the rendering order provided by the implicit main loop.
The full instruction set is described in the documentation. Feel free to check it out experiment with IBNIZ on your own!
So, what's the point?
The IBNIZ project started in 2007 with the codename "EDAM" (Extreme-Density Art Machine). My goal was to participate in the esoteric programming language competition at the same year's Alternative Party, but I didn't finish the VM at time. The project therefore fell to the background. Every now and then, I returned to the project for a short while, maybe revising the instruction set a little bit or experimenting with different colorspaces and loop variable formats. There was no great driving force to insppire me to finish the VM until mid-2011 after some quite succesful experiments with very short audiovisual programs. Once some of my musical experiments spawned a trend that eventually even got a name of its own, "bytebeat", I really had to push myself to finally finishing IBNIZ.
The main goal of IBNIZ, from the very beginning, was to provide a new platform for the demoscene. Something without the usual fallbacks of the real-world platforms when writing extremely small demos. No headers, no program size overhead in video/audio access, extremely high code density, enough processing power and preferrably a machine language that is fun to program with. Something that would have the potential to displace MS-DOS as the primary platform for sub-256-byte demoscene productions.
There are also other considerations. One of them is educational: modern computing platforms tend to be mind-bogglingly complex and highly abstracted and lack the immediacy and tangibility of the old-school home computers. I am somewhat concerned that young people whose mindset would have made them great programmers in the eighties find their mindset totally incompatible with today's mainstream technology and therefore get completely driven away from programming. IBNIZ will hopefully be able to serve as an "oldschool-style platform" in a way that is rewarding enough for today's beginninng programming hobbyists. Also, as the demoscene needs all the new blood it can get, I envision that IBNIZ could serve as a gateway to the demoscene.
I also see that IBNIZ has potential for glitch art and livecoding. By taking a nondeterministic approach to experimentation with IBNIZ, the user may encounter a lot of interesting visual and aural glitch patterns. As for livecoding, I suspect that the compactness of the code as well as the immediate visibility of the changes could make an IBNIZ programming performance quite enjoyable to watch. The live gigs of the chip music scene, for example, might also find use for IBNIZ.
About some design choices and future plans
IBNIZ was originally designed with an esoteric programming language competition in mind, and indeed, the language has already been likened to the classic esoteric language Brainfuck by several critical commentators. I'm not that sure about the similarity with Brainfuck, but it does have strong conceptual similarities with FALSE, the esoteric programming language that inspired Brainfuck. Both IBNIZ and FALSE are based on Forth and use one-character-long instructions, and the perceived awkwardness of both comes from unusual, punctuation-based syntax rather than deliberate attempts at making the language difficult.
When contrasting esotericity with usefulness, it should be noted that many useful, mature and well-liked languages, such as C and Perl, also tend to look like total "line noise" to the uninitiated. Forth, on the other hand, tends to look like mess of random unrelated strings to people unfamiliar with the RPN syntax. I therefore don't see how the esotericity of IBNIZ would hinder its usefulness any more than the usefulness of C, Perl or Forth is hindered by their syntaxes. A more relevant concern would be, for example, the lack of label and variable names in IBNIZ.
There are some design choices that often get questioned, so I'll perhaps explain the rationale for them:
- The colors: the color format has been chosen so that more sensible and neutral colors are more likely than "coder colors". YUV has been chosen over HSV because there is relatively universal hardware support for YUV buffers (and I also think it is easier to get richer gradients with YUV than with HSV).
- Trigonometric functions: I pondered for a long while whether to include SIN and ATAN2 and I finally decided to do so. A lot of demoscene tricks depend, including all kinds of rotating and bouncing things as well as more advanced stuff such as raycasting, depends on the availability of trigonometry. Both of these operations can be found in the FPU instruction set of the x86 and are relatively fundamental mathematical stuff, so we're not going into library bloat here.
- Floating point vs fixed point: I considered floating point for a long while as it would have simplified some advanced tricks. However, IBNIZ code is likely to use a lot of bitwise operations, modular bitwise arithmetic and indefinitely running counters which may end up being problematic with floating-point. Fixed point makes the arithmetic more concrete and also improves the implementability of IBNIZ on low-end platforms that lack FPU.
- Different coordinate formats: TYX-video uses signed coordinates because most effects look better when the origin is at the center of the screen. The 'U' opcode (userinput), on the other hand, gives the mouse coordinates in unsigned format to ease up pixel-plotting (you can directly use the mouse coordinates as part of the framebuffer memory address). T-video uses unsigned coordinates for making the values linear and also for easier coupling with the unsigned coordinates provided by 'U'.
Right now, all the existing implementations of IBNIZ are rather slow. The C implementation is completely interpretive without any optimization phase prior to execution. However, a faster implementation with some clever static analysis is quite high on the to-do list, and I expect a considerable performance boost once native-code JIT compilers come into use. After all, if we are ever planning to displace MS-DOS as a sizecoding platform, we will need to get IBNIZ to run at least faster than DOSBOX.
The use of externally-provided coordinate and time values will make it possible to scale a considerable portion of IBNIZ programs to a vast range of different resolutions from character-cell framebuffers on 8-bit platforms to today's highest higher-than-high-definition standards. I suspect that a lot of IBNIZ programs can be automatically compiled into shader code or fast C-64 machine language (yes, I've made some preliminary calculations for "Ibniz 64" as well). The currently implemented resolution, 256x256, however, will remain as the default resolution that will ensure compatibility. This resolution, by the way, has been chosen because it is in the same class with 320x200, the most popular resolution of tiny MS-DOS demos.
At some point of time, it will also become necessary to introduce a compact binary representation of IBNIZ code -- with variable bit lengths primarily based on the frequency of each instruction. The byte-per-character representation already has a higher code density than the 16-bit x86 machine language, and I expect that a bit-length-optimized representation will really break some boundaries for low size classes.