What’s a GUID or UUID?

A GUID (pronounced “gwid”, rhyming with “lid”, or “goo-id”, rhyming with “druid”) is a four-letter acronym for “Globally Unique IDentifier”.  It is also called a UUID (usually spelled out in speech as “you you eye dee”), a four-letter acronym for “Universally Unique IDentifier”.  Those are two labels for the same thing, which I’ll call a “GUID” for the remainder of this article.

A GUID is a 128-bit value that represents a unique ID.  GUIDs are used in data storage and computer programming to give a unique ID to a piece of data, whether it’s a file, a record in a database, or anything else a programmer can come up with.  The 128-bit (or 16-byte) value is typically generated by an algorithm designed to produce effectively random results.  With that many bits to fill, the results are almost guaranteed to be unique.  It’s extremely unlikely that a GUID generated at any point in time will be identical to another GUID generated on another computer, by a completely different program, at another point in time.  Strictly speaking, GUIDs are not guaranteed to be unique, but a collision is so improbable that you can safely assume that all the ones you generate are unique.
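
To make the numbers concrete, here is a minimal Python sketch using the standard `uuid` module.  It assumes the common random (“version 4”) kind of GUID, in which 122 of the 128 bits are random and the other 6 record the version and variant.  The helper names `new_guid` and `collision_probability` are my own, chosen for illustration, and the probability is only the usual birthday-style approximation.

```python
import uuid


def new_guid() -> uuid.UUID:
    """Return a random (version 4) GUID."""
    return uuid.uuid4()


def collision_probability(count: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among `count` random GUIDs.

    Uses the birthday-style estimate: (number of pairs) / (number of possible values).
    The default of 122 random bits is an assumption that holds for version 4 GUIDs.
    """
    pairs = count * (count - 1) / 2
    return pairs / 2 ** random_bits


guid = new_guid()
print(guid)             # 32 hex digits grouped 8-4-4-4-12
print(len(guid.bytes))  # 16 bytes, i.e. 128 bits
print(guid.version)     # 4 for a random GUID

# Estimated chance of any duplicate among one billion random GUIDs.
print(collision_probability(1_000_000_000))
```

For a billion random GUIDs the estimate comes out on the order of 1e-19, which is why treating them as unique is a safe working assumption.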

Here are some example GUIDs:

d9573e45-2ee6-4962-82a1-5c7a19159ed3
e75f65aa-68f9-4552-80b7-201978c72102
cc010ab1-a86e-4c59-a875-974bb29c90ba
b27a9e44-6e1d-4b7d-9b47-b87a00a35721
5b4ff12e-9c46-4f0e-ba45-eb6fc12b7d92
140aff9c-0d6b-420c-98d3-814d80f6967a
df143227-fb2e-40db-ba30-1fc988baa3bb
bc09e400-8319-4619-84f5-a6fa4174845c
7dc144f2-b0cb-4b80-9f16-cd5b086c343c
89b74727-b505-4cc1-bff2-2a55d02503c2
af4d7de1-6c9e-4191-9b1d-9d973b42f135
2cf6439d-aab7-49de-ba73-40dc467cfedc
e3a06dcc-27a1-45a3-9445-296de36533ed
b8650a2b-2a10-4825-ac9c-5ce003745141
ca70f6af-7916-4fe5-9097-dd5a62447dc0

Those GUIDs were generated with my Online GUID Generator.  Go ahead and try it.

On paper or on screen, GUIDs are usually displayed as groups of hexadecimal digits.  Hyphens are usually inserted when displaying a GUID, but are not part of the GUID.  They are only there to help humans keep their place while reading or transcribing it.  The placement of the hyphens is pretty standard, though: 8 hex characters, hyphen, 4 hex characters, hyphen, 4 more hex characters, hyphen, 4 more hex characters, hyphen, then 12 more hex characters.  It’s not uncommon to see a GUID with curly braces around it, like this:

{140aff9c-0d6b-420c-98d3-814d80f6967a}

The alphabetic characters may be displayed as uppercase, lowercase, or even mixed case.  The casing is not part of the uniqueness or of the identifying nature of a GUID.  If two GUIDs differ only in their casing, they are identical.  The casing, curly braces, and dashes are all optional and are only chosen for visual effect.  They are not part of the GUID.  They have no bearing whatsoever on the value of the GUID.
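
As a small illustration, Python’s standard `uuid.UUID` constructor accepts all of these spellings and yields the same value.  The `parse_guid` and `with_braces` helpers below are names I made up for this sketch; the GUID is one of the examples from the list above.

```python
import uuid


def parse_guid(text: str) -> uuid.UUID:
    """Parse a GUID written with or without braces, hyphens, or uppercase letters."""
    return uuid.UUID(text.strip())


def with_braces(guid: uuid.UUID) -> str:
    """Format a GUID in the curly-brace style."""
    return "{" + str(guid) + "}"


variants = [
    "140aff9c-0d6b-420c-98d3-814d80f6967a",    # lowercase with hyphens
    "{140AFF9C-0D6B-420C-98D3-814D80F6967A}",  # uppercase with braces
    "140aff9c0d6b420c98d3814d80f6967a",        # no hyphens at all
    "140AFF9C-0d6b-420C-98d3-814D80F6967A",    # mixed case
]

# All four spellings collapse to a single value.
distinct = {parse_guid(v) for v in variants}
print(len(distinct))  # 1

guid = parse_guid(variants[1])
print(str(guid))          # 140aff9c-0d6b-420c-98d3-814d80f6967a
print(with_braces(guid))  # {140aff9c-0d6b-420c-98d3-814d80f6967a}
```

Comparing the parsed values rather than the raw strings is what makes casing, braces, and hyphens irrelevant.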

Why are GUIDs used as opposed to other forms of ID?

In databases specifically, some alternative ways to uniquely identify a row in a table are:

  1. Auto-incrementing integers (Numbers that automatically increase with each new record).
  2. Picking one or more fields of the actual data to represent something unique to that record.
  3. Timestamps (usually with millisecond precision).

I’ll list each alternative again, this time with the advantages that GUIDs have over it:

  1. Auto-incrementing integers (Numbers that automatically increase with each new record).
    1. If the data in the database that keeps track of the next number becomes corrupted, then there may be duplicates or errors when attempting to create new records.  This does not occur with GUIDs.
    2. When a program writes a new record to the table, the database server is the one that generates the new incrementing ID.  Finding out what that ID is typically requires 2 calls to the database, which is inefficient and sometimes difficult.  GUIDs can be generated on the client and sent to the database with the rest of the data, and still be unique.  Integers cannot be generated on the client, unless unusually complex methods are employed.
    3. These have to be generated by the database itself, so the calling program doesn’t know what the next number is nor what was just generated (unless it makes that second call to the database).  Since GUIDs can be safely created on the client, a 2nd call is not necessary, because the client already knows what the ID is before it makes the 1st call.
    4. GUIDs can be generated by the database OR by the calling program before it calls the database to store the record, so only ONE call is needed to the database.
    5. Merging:  When merging two tables or two databases, since auto-incrementing integers usually start at the same value (typically 1), each table in each database will have a record with ID 1, one with ID 2, and so on.  Tables with GUIDs have IDs that are, for practical purposes, unique across all records in a table, across all tables in a database, and across all databases in existence.  Merging tables that use GUIDs as their record identifiers will not cause the conflicts that integers will.
    6. Replication:  Many databases support linking 2 or more database servers together to cause updates in one database to be reflected in another.  This is difficult, if not impossible, to do with auto-incrementing numbers.
  2. Picking one or more fields of the actual data to represent something unique to that record.
    1. Once one or more columns are selected as the unique key, those data values are locked and cannot change.  Using GUIDs gives you the flexibility to change those values without losing the key.  For example, how many web sites have you registered with where once you’ve chosen your logon name, you can NEVER change it again?  That’s most likely because the people who designed the database that stores the logon names use the actual logon name as the unique key.  If they instead generated a GUID and used THAT as the unique key, then letting users change their logon name would be as easy as letting them change their e-mail address or credit card number.
  3. Timestamps (usually with millisecond precision).
    1. Timestamps lose precision when transferred between two or more technologies.  For example, a C/C++, Java, or .NET program can generate a timestamp, then store it in a field in a table in a database… say an MS SQL database, or IBM DB2, or Oracle, or MySQL.  If the program then loads that timestamp back and compares it to the one it generated that’s still in memory, they will likely be unequal because one of the two technologies doesn’t use as much precision as the other.  So, looking up a record by a timestamp is likely to end in failure.
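
Several of the points above (client-side generation, a single database call, conflict-free merging, and changeable logon names) can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: it uses an in-memory SQLite database in place of a real database server, and the `users` table, its columns, and the helper functions are all hypothetical names invented for the example.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `users` table keyed by GUID."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, login TEXT NOT NULL)")
    return db


def add_user(db: sqlite3.Connection, login: str) -> str:
    """Generate the key on the client and store the row with a single INSERT."""
    user_id = str(uuid.uuid4())
    db.execute("INSERT INTO users (id, login) VALUES (?, ?)", (user_id, login))
    return user_id  # the caller already knows the ID; no second query is needed


def merge(target: sqlite3.Connection, source: sqlite3.Connection) -> None:
    """Copy every row from `source` into `target`, keeping the original keys."""
    rows = source.execute("SELECT id, login FROM users").fetchall()
    target.executemany("INSERT INTO users (id, login) VALUES (?, ?)", rows)


# Two independent databases, each assigning its own keys.
site_a, site_b = make_db(), make_db()
alice_id = add_user(site_a, "alice")
bob_id = add_user(site_b, "bob")

# Merging works without renumbering because the keys do not collide.
merge(site_a, site_b)
total = site_a.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(total)  # 2

# The logon name is ordinary data, so it can change while the key stays the same.
site_a.execute("UPDATE users SET login = ? WHERE id = ?", ("alice2", alice_id))
renamed = site_a.execute(
    "SELECT login FROM users WHERE id = ?", (alice_id,)
).fetchone()[0]
print(renamed)  # alice2
```

With auto-incrementing integer keys, both tables would most likely hold a row with the same ID, and the merge step would fail on the primary key constraint unless the rows were renumbered first.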
