Machine Language or Machine Code

Machine Language is the only language that can truly be run by a computer.  ALL other languages, no matter which ones they are, are eventually converted down to machine code that’s specific to the CPU of the machine the software is running on.  At this level, the differences between CPU models become critical.

CPUs are the brains and the workhorses of computers.  The CPU is the part that actually takes action on each of the instructions in a computer program.  The instructions that CPUs can execute are actually quite simple, regardless of the complexity of the source code or the language the original developer(s) wrote the software in.  Most CPUs have the following parts:

  • Program Counter (keeps track of what instruction is being executed).
  • Stack Pointer (keeps track of the top of the stack, which is where the program saved its place when it last jumped to another section of code).
  • Stack (holds the saved code locations from prior jumps, so the program can return to them).
  • Registers (hardware variables to hold temporary values for quick calculations).  There are various types of registers depending on the processor.

In addition to those parts, the CPU also has direct access to the system’s memory.
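To make the roles of the program counter, stack pointer, and stack a little more concrete, here is a minimal Python sketch of a pretend CPU.  The class name `ToyCPU`, its fields, and the addresses are all made up for illustration and don't match any real chip.  As a simplification, the stack here grows upward, whereas many real processors grow it downward.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCPU:
    """Hypothetical CPU state; names are illustrative, not a real chip's."""
    pc: int = 0    # program counter: address of the current instruction
    sp: int = 0    # stack pointer: index of the next free stack slot
    stack: list = field(default_factory=lambda: [0] * 16)
    registers: dict = field(default_factory=lambda: {"A": 0, "X": 0})

    def call(self, target: int, return_to: int) -> None:
        """Jump to another section of code, saving where to resume."""
        self.stack[self.sp] = return_to
        self.sp += 1
        self.pc = target

    def ret(self) -> None:
        """Return to the most recently saved location."""
        self.sp -= 1
        self.pc = self.stack[self.sp]


cpu = ToyCPU(pc=100)
cpu.call(target=500, return_to=103)   # pc becomes 500; 103 is saved
cpu.call(target=900, return_to=504)   # a nested jump saves a second location
cpu.ret()                             # back to 504
cpu.ret()                             # back to 103
print(cpu.pc, cpu.sp)
```

After the two returns, the program counter is back where the first jump left off and the stack is empty again.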

The things that a CPU can actually do:

  • Copy data from one memory location to another (usually one or a handful of bytes at a time).
  • Read bits, bytes, and words from any memory location or from registers.
  • Store bits, bytes, and words to any memory location or to registers.
  • Shift bits in a byte or word, in a register or a memory location.
  • Compare values between memory and/or registers and switch code execution to different parts in memory based on the result.
  • Add, subtract, increment, or decrement values in memory or registers.
  • Switch to another memory location for code execution.
  • Return from a particular section of code execution to a previous section that was executing.
  • Push and pop values to and from the stack.
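A few of the operations in the list above can be mimicked in Python to show how small each step is.  This is only a sketch: the `memory` array, the variable `a` standing in for a register, and the addresses `0x20` and `0x30` are assumptions made for the example, and the `& 0xFF` masks imitate an 8-bit processor.

```python
memory = bytearray(8)        # eight bytes of pretend RAM
memory[0] = 250

a = memory[0]                # read a byte from memory into a "register"
a = (a + 10) & 0xFF          # add: an 8-bit result wraps around (260 becomes 4)
a = (a << 1) & 0xFF          # shift the bits left by one place (4 becomes 8)
memory[1] = a                # store the register back into memory
memory[2] = memory[1]        # copy one memory location to another

# Compare, then pick where execution continues based on the result.
if a == memory[2]:
    next_address = 0x20
else:
    next_address = 0x30

print(memory[1], memory[2], hex(next_address))
```

Each line corresponds roughly to one machine instruction, which is why even a simple statement in a high level language can turn into many instructions.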

Depending on the processor, some also have the following capabilities:

  • Perform floating point math.
  • Automatically jump to a new memory location for code execution because of an external event (called an interrupt) such as keyboard input, mouse input, etc.

Various processors have varying capabilities, but the previous list contains the basic capabilities of the majority of processors.

Multi-Threading:

Up until about 2002 or so, most PC processors were capable of executing only one stream of instructions at a time.  (By PC, I mean “Personal Computer”, not “Windows”, so “PC” definitely includes Macs as well.)  When one instruction completed, the next instruction would be executed, and so on, ad infinitum, until the power was turned off.

In 2002, Intel introduced a partial multi-threading technology called “Hyper-Threading”.  This allowed one processor to appear as two processors to the operating system software.  With this feature enabled, the operating system could assign different processes to the two virtual processors, making it appear that two processes were running simultaneously.  Hyper-Threading wasn’t pure or true parallel processing because it didn’t have two complete cores.  The processor looked at code that was about to execute and decided whether or not it could handle that portion of 2 threads simultaneously, and if so, it would.  Code that was optimized for multiple processors did see a measurable speed increase when this technology was enabled, but certainly nowhere near the doubling you’d expect with true parallelism.  In fact, code that was not optimized for it would actually run slower in some situations.

Around 2005, Intel started making processors with 2 actual complete processing units (called multi-core CPUs).  These were true multi-processing units and could execute 2 things at once all the time.  Since then both Intel and AMD have released faster multi-core processors and have been increasing the number of “cores” in them.  As of this writing, you can get one with as many as 6 cores and an 8 core processor is just around the corner.

The way a processor executes machine code is this: when it powers on, it begins looking at a hard wired memory address for instructions.  Each processor type has a different place in memory to look for the first instruction.  Regardless of where it is, it looks at the byte value in that memory location.  Each byte value means something different to the processor.  One byte value could mean to load a register with a value at a memory address specified in the next few bytes, another byte value may mean to jump to another memory location to find the next instruction, etc.

Each type of processor has its own set of instructions that’s unique to that processor, so machine code written for one type of processor cannot be executed on another type.  For example, the old 8-bit Apple // computers used a MOS Technology 6502 8-bit processor.  That processor does not have the circuitry to execute machine code the way an Intel x86 compatible processor does, and vice versa.  This is a big part of why, for many years, you couldn’t take a program written for a Windows machine and run it on a Mac or vice versa: the two machines used completely different hardware (different types of CPUs that couldn’t understand each other’s machine code).  In 2006, Apple switched from PowerPC processors to Intel processors.  Since then, Intel-based Macs have been able to run Windows and its programs natively (without any kind of run time translation of the machine code).
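The look-at-a-byte-and-act cycle described above can be sketched as a short Python loop.  The five opcode values are real 6502 opcodes (LDA immediate, ADC immediate, STA absolute, JMP absolute, BRK), but everything else is a simplification assumed for this example: the function name `run` is made up, execution starts at address 0 rather than at the processor's real start-up address, status flags (including the carry that ADC normally adds in) are ignored, and BRK is treated as “stop”.

```python
def run(memory: bytearray, start: int, max_steps: int = 100) -> int:
    """Execute a tiny subset of 6502 opcodes; return the accumulator."""
    pc = start   # program counter
    a = 0        # accumulator register
    for _ in range(max_steps):
        opcode = memory[pc]           # look at the byte at the current address
        if opcode == 0xA9:            # LDA #imm: load A with the next byte
            a = memory[pc + 1]
            pc += 2
        elif opcode == 0x69:          # ADC #imm: add the next byte to A
            a = (a + memory[pc + 1]) & 0xFF   # carry flag ignored here
            pc += 2
        elif opcode == 0x8D:          # STA abs: store A at a 16-bit address
            address = memory[pc + 1] | (memory[pc + 2] << 8)  # low byte first
            memory[address] = a
            pc += 3
        elif opcode == 0x4C:          # JMP abs: continue at a new address
            pc = memory[pc + 1] | (memory[pc + 2] << 8)
        elif opcode == 0x00:          # BRK: treated as "stop" in this sketch
            break
        else:
            raise ValueError(f"unhandled opcode {opcode:#04x} at {pc:#06x}")
    return a


program = bytes([
    0xA9, 0x05,          # address 0:  LDA #$05
    0x4C, 0x08, 0x00,    # address 2:  JMP $0008 (skips the next three bytes)
    0xA9, 0x63,          # address 5:  LDA #$63 (never reached)
    0x00,                # address 7:  BRK
    0x69, 0x03,          # address 8:  ADC #$03
    0x8D, 0x80, 0x00,    # address 10: STA $0080
    0x00,                # address 13: BRK
])
memory = bytearray(0x100)
memory[0:len(program)] = program

result = run(memory, start=0)
print(result, memory[0x80])
```

Feeding these same bytes to an x86 processor would produce something entirely different, because the x86 assigns other meanings to those byte values.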

The set of instructions that a particular type of processor can execute is called that processor’s “Instruction Set”.  This is one of the primary things that makes a processor what it is.  Whenever a programmer writes a program in a high level language (machine code is called a low level language) like C, C++, C#, JavaScript, or whatever, the code is eventually converted down into machine code for the processor of the machine it runs on.  There are NO exceptions to this.  ALL code is eventually converted to machine code.

Few people (if any) write code directly in machine code.  This would entail writing actual bytes, not human readable instructions.  The lowest level people generally write code in is Assembly Language, which is just 1/2 a step above Machine Code.  Assembly language is still one machine instruction at a time, but instead of writing actual bytes, the programmer writes in mnemonics.  In other words, there’s an English-like word for each machine instruction available and the programmer writes those English-like words.  The programmer saves this source code into a text file, then runs a program called an assembler that converts the instructions in the text file directly into machine code, which can then be executed by the processor.
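As a rough illustration of what an assembler does, the Python sketch below turns a few 6502 mnemonics into bytes.  The opcode values in the table are real 6502 opcodes, but the function names `assemble_line` and `assemble` are made up for this example, and the parser understands only two operand forms (`#$nn` for a literal value and `$nnnn` for an address).  A real assembler handles far more.

```python
# (mnemonic, operand kind) -> opcode byte; a small subset of the 6502 set
OPCODES = {
    ("LDA", "imm"): 0xA9,
    ("ADC", "imm"): 0x69,
    ("STA", "abs"): 0x8D,
    ("JMP", "abs"): 0x4C,
    ("BRK", None): 0x00,
}


def assemble_line(line: str) -> bytes:
    """Convert one line of assembly source into its machine code bytes."""
    parts = line.split()
    mnemonic = parts[0].upper()
    if len(parts) == 1:                       # no operand, e.g. BRK
        return bytes([OPCODES[(mnemonic, None)]])
    operand = parts[1]
    if operand.startswith("#$"):              # literal value, e.g. #$05
        value = int(operand[2:], 16)
        return bytes([OPCODES[(mnemonic, "imm")], value])
    if operand.startswith("$"):               # 16-bit address, low byte first
        address = int(operand[1:], 16)
        return bytes([OPCODES[(mnemonic, "abs")], address & 0xFF, address >> 8])
    raise ValueError(f"cannot parse operand: {operand}")


def assemble(source: str) -> bytes:
    """Assemble every non-blank line, one instruction at a time."""
    lines = [line for line in source.splitlines() if line.strip()]
    return b"".join(assemble_line(line) for line in lines)


source = """
    LDA #$05
    ADC #$03
    STA $0200
    BRK
"""
machine_code = assemble(source)
print(machine_code.hex(" "))
```

Notice that each source line becomes exactly one machine instruction, with nothing clever happening in between.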

A note about terminology:

Machine Code or Machine Language:  These are the bytes that make up the instructions for a CPU to process.

Assembly Language:  This is a programming language.  Each processor with a different instruction set requires a completely different Assembly language with instructions (or mnemonics) specific to that processor.

Assembler:

  1. A program that converts an assembly language source file into machine code.  In a higher level language, we have programs called compilers that convert the high level language into machine code.  An assembler is different because it’s not interpreting high level concepts to form large machine code equivalents.  Instead, it just takes the instructions, one at a time, and converts each into one actual machine code instruction.  It’s a different and simpler process, so it is not called a compiler and is just an assembler.
  2. A language, not to be confused with Assembly!  This language is NOT Assembly Language, though the majority of programmers incorrectly use the terms “assembly” and “assembler” interchangeably.  Assembler is a language that’s specific to a particular vendor’s assembler.  For example, when you’re writing an Assembly Language program, you may need to leave an instruction in the source code that tells the Assembler (the program that converts your source code) how to convert some instructions.  You may want the assembler to generate 64 bit versions of your instructions rather than 32 bit versions.  This instruction left in the source code (commonly called a directive) is NOT a machine instruction.  It’s an instruction for the Assembler that only has meaning during assembly time.  It does NOT get translated into machine code.  Since this is a special instruction for the Assembler program, the language for those types of NON-MACHINE instructions is called Assembler Language.  As you can see, this is quite different from Assembly Language, whose statements are actual instructions for the hardware CPU.
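The difference between an instruction for the assembler and an instruction for the CPU can be seen in a small Python sketch.  The `.equ` directive syntax used here is made up for this example (real assemblers each define their own directives), and the function `assemble` handles only `LDA` with a literal value and `BRK`, using their real 6502 opcodes.

```python
def assemble(source: str) -> bytes:
    """Toy assembler: one directive (.equ) and two machine instructions."""
    constants = {}          # names defined by .equ; they exist only at assembly time
    output = bytearray()
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == ".equ":
            # Directive: an instruction for the assembler itself.
            # It records a name and emits NO machine code.
            constants[parts[1]] = int(parts[2], 16)
        elif parts[0] == "LDA":
            # Machine instruction: becomes opcode 0xA9 plus one operand byte.
            operand = parts[1].lstrip("#")
            if operand in constants:
                value = constants[operand]
            else:
                value = int(operand.lstrip("$"), 16)
            output += bytes([0xA9, value])
        elif parts[0] == "BRK":
            output.append(0x00)
        else:
            raise ValueError(f"unknown statement: {line.strip()}")
    return bytes(output)


source = """
    .equ LIVES 03
    LDA #LIVES
    BRK
"""
machine_code = assemble(source)
print(machine_code.hex(" "))
```

The source has three statements, but only two of them show up in the output: the `.equ` line changed what the assembler produced without becoming a machine instruction itself.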
