ⓘ Bytecode


ⓘ Bytecode

Bytecode, also termed portable code or p-code, is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

The name bytecode stems from instruction sets that have one-byte opcodes followed by optional parameters. Intermediate representations such as bytecode may be output by programming language implementations to ease interpretation, or it may be used to reduce hardware and operating system dependence by allowing the same code to run cross-platform, on different devices. Bytecode may often be either directly executed on a virtual machine a p-code machine i.e., interpreter, or it may be further compiled into machine code for better performance.

Since bytecode instructions are processed by software, they may be arbitrarily complex, but are nonetheless often akin to traditional hardware instructions: virtual stack machines are the most common, but virtual register machines have been built also. Different parts may often be stored in separate files, similar to object modules, but dynamically loaded during execution.


1. Execution

A bytecode program may be executed by parsing and directly executing the instructions, one at a time. This kind of bytecode interpreter is very portable. Some systems, called dynamic translators, or just-in-time JIT compilers, translate bytecode into machine code as necessary at runtime. This makes the virtual machine hardware-specific but does not lose the portability of the bytecode. For example, Java and Smalltalk code is typically stored in bytecode format, which is typically then JIT compiled to translate the bytecode to machine code before execution. This introduces a delay before a program is run, when the bytecode is compiled to native machine code, but improves execution speed considerably compared to interpreting source code directly, normally by around an order of magnitude 10x.

Because of its performance advantage, today many language implementations execute a program in two phases, first compiling the source code into bytecode, and then passing the bytecode to the virtual machine. There are bytecode based virtual machines of this sort for Java, Raku, Python, PHP, Tcl, mawk and Forth. The implementation of Perl and Ruby 1.8 instead work by walking an abstract syntax tree representation derived from the source code.

More recently, the authors of V8 and Dart have challenged the notion that intermediate bytecode is needed for fast and efficient VM implementation. Both of these language implementations currently do direct JIT compiling from source code to machine code with no bytecode intermediary.


2. Examples

> > > import dis # "dis" - Disassembler of Python byte code into mnemonics. > > > dis.disprint"Hello, World!") 1 0 LOAD_NAME 0 print 2 LOAD_CONST 0 Hello, World! 4 CALL_FUNCTION 1 6 RETURN_VALUE
  • Common Intermediate Language executed by Common Language Runtime, used by.NET Framework languages such as C#
  • Berkeley Packet Filter
  • BANCStar, originally bytecode for an interface-building tool but used also as a language
  • GNU Emacs is a text editor with most of its functions implemented by Emacs Lisp, its built-in dialect of Lisp. These features are compiled into bytecode. This architecture allows users to customize the editor with a high level language, which after compiling into bytecode yields reasonable performance.
  • Byte Code Engineering Library
  • EM, the Amsterdam Compiler Kit virtual machine used as an intermediate compiling language and as a modern bytecode language
  • EiffelStudio for the Eiffel programming language
  • CMUCL and Scieneer Common Lisp implementations of Common Lisp can compile either to native code or to bytecode, which is far more compact
  • Java bytecode, which is executed by the Java virtual machine
  • Ericsson implementation of Erlang uses BEAM bytecodes
  • BCEL
  • Dis bytecode, designed for the Inferno operating system, is executed by the Dis virtual machine
  • JMangler
  • CLISP implementation of Common Lisp used to compile only to bytecode for many years; however, now it also supports compiling to native code with the help of GNU lightning
  • Icon and Unicon programming languages
  • Infocom used the Z-machine to make its software applications more portable
  • ASM
  • Dalvik bytecode, designed for the Android platform, is executed by the Dalvik virtual machine
  • Javassist
  • ActionScript executes in the ActionScript Virtual Machine AVM, which is part of Flash Player and AIR. ActionScript code is typically transformed into bytecode format by a compiler. Examples of compilers include one built into Adobe Flash Professional and one built into Adobe Flash Builder and available in the Adobe Flex SDK.
  • Embeddable Common Lisp implementation of Common Lisp can compile to bytecode or C code
  • C to Java virtual machine compilers
  • Adobe Flash objects
  • Tcl
  • p-code of UCSD Pascal implementation of the Pascal language
  • The Spin interpreter built into the Parallax Propeller microcontroller
  • WebAssembly
  • Scheme 48 implementation of Scheme using bytecode interpreter
  • The SQLite database engine translates SQL statements into a bespoke byte-code format.
  • LSL, a scripting language used in virtual worlds compiles in bytecode running on a virtual machine. Second Life has the original Mono version, Inworldz developed the Phlox version.
  • O-code of the BCPL programming language
  • OCaml language optionally compiles to a compact bytecode form
  • Lua language uses a register-based bytecode virtual machine
  • The R environment for statistical computing offers a bytecode compiler through the compiler package, now standard with R version 2.13.0. It is possible to compile this version of R so that the base and recommended packages exploit this.
  • YARV and Rubinius for Ruby
  • Pyramid 2000 adventure game
  • Pick BASIC also referred to as Data BASIC or MultiValue BASIC
  • m-code of the MATLAB language
  • Bytecodes of many implementations of the Smalltalk language
  • SWEET16
  • Parrot virtual machine
  • Multiplan
  • KEYB, the MS-DOS/PC DOS keyboard driver with its resource file KEYBOARD.SYS containing layout information and short p-code sequences executed by an interpreter inside the resident driver.
  • Tiny BASIC
  • Visual FoxPro compiles to bytecode