​Re: UUID vs Longint primary key

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

​Re: UUID vs Longint primary key

4D Tech mailing list
On  Fri, 4 Aug 2017 12:52:28 -0700

David Adams <[hidden email]>
​ wrote:​

>
> ​[snip] ​
> 4D's UUIDs function as globally unique row *serial numbers*. That's great
> for backups and convenient for physical relations, but it has exactly zero
> to do with a real "primary key" or relational integrity.
> ​ [snip]​
>

​Care to elaborate on that statement? I don't get why you'd say that.​
_____________________
Steve Simpson
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
I am not David but I agree with his assessment. Relational model databases by definition are not supposed to use keys that have no meaning. They are supposed to create relations that have meaning. Even the use of numbers breaks the theoretical rule. The problem is that we have pushed all to use relational model rather that what he logical model proposes. Who out there remembers hierarchical, tree, or inverted structure models.

Regards
Chuck

Sent from my iPhone

> On Aug 6, 2017, at 10:47 AM, steve simpson via 4D_Tech <[hidden email]> wrote:
>
> On  Fri, 4 Aug 2017 12:52:28 -0700
> ​
> David Adams <[hidden email]>
> ​ wrote:​
>
>>
>> ​[snip] ​
>> 4D's UUIDs function as globally unique row *serial numbers*. That's great
>> for backups and convenient for physical relations, but it has exactly zero
>> to do with a real "primary key" or relational integrity.
>> ​ [snip]​
>>
>
> ​Care to elaborate on that statement? I don't get why you'd say that.​
> _____________________
> Steve Simpson
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:[hidden email]
> **********************************************************************

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
In reply to this post by 4D Tech mailing list
On Sun, Aug 6, 2017 at 7:47 AM, steve simpson via 4D_Tech <
[hidden email]> wrote:

> On  Fri, 4 Aug 2017 12:52:28 -0700
> ​
> David Adams <[hidden email]>
> ​ wrote:​
>
> >
> > ​[snip] ​
> > 4D's UUIDs function as globally unique row *serial numbers*. That's great
> > for backups and convenient for physical relations, but it has exactly
> zero
> > to do with a real "primary key" or relational integrity.
> > ​ [snip]​
>
>
Sure, but only briefly (collective sigh of relief, I'm sure.)

When you design a relational database, you sort out the "keys" by finding
the field or combination of fields in a table that make the row unique. The
next part of that design is that each field in the row should be about the
row and nothing but the row. What does a UUID or longint ID have to do with
the data in the row? Nothing at all. This kind of "synthetic key" is an
implementation-level accommodation. It has nothing to do with the data,
it's just something bolted on for practical, real-world reasons. For
example, you might have a unique customer name (just saying) and they might
want to change it later. Ugh! With a unique-but-meaningless ID, that's no
problem.

I use synthetic keys all of the time and have forever. So, no quarrel with
using them for links and so on. But they do bring on another problem: These
sorts of IDs not only do not maintain the uniqueness of your rows, they
very easily obscure duplicate rows.

Imagine that you've got an asset management system and it stores the
location of each desk in a facility. There's a table for every room in the
facility. So, you've got rooms and desks. Both are finite, both are real
and tangible. So, over in your room table youve got something like

Building     Keller
Room        123

Back in the real world, the building names are unique and, within a
building, room names or numbers are unique. So, you've got a unique row
here. Imagine using that data as a key:

Keller123

That's a sound key and the row design is also solid. But this kind of key
is a real pain down the row when things change. It may be that a new
numbering scheme is introduced or a building is renamed for some reason. (A
big donation, etc.) The actual collection of real-world buildings and rooms
hasn't changed (in this example), but you need to update the records to
reflect the new real-world scheme. So you have to go back and change the
[Room] record, and then any linked [Desk] records, or any other links. Ugh.

I'll be most, if not all of us, got burned by using a "real" key like this
early on. I did. So, longints and now UUIDs. Easier! So, now I'll use a
longint:

1  Keller 123

I link on 1, I can rename the building or room, no drama.

So what was I talking about? So far I'm a fan of using longints/UUIds (I
am.) The problem is this:

1  Keller 123
2  Keller 123
3  Keller 124

You've got unique *rows* but duplicated *data*. There are two Keller 123
records - but they've got different unique longints bolted on so they
aren't treated as duplicates. The "unique" IDs are masking the duplicates.
Notice that the "unique ID" has nothing at all to do with the rooms in the
world. It's just a unique number. Unique rows, duplicate data.

A UUID, longint, etc. is a practical way to link records (and the one I
use), but it then leaves the task of doing integrity checks and so on to
you.

As I understand it, UUIDs were added to provide globally unique serial
numbers to records for the sake of journaled backups. That's a good goal,
and 4D Backups seems to work really well now.  I never upgraded my main
source from V13 because, for quite some time, it was a bit of a mess to
sort out the UUIDs and I couldn't be blethered to pour the time into
figuring out. After a few years, I found a reliable set of SQL instructions
to do a retrofit. In any case, I'm working on something now that I believe
began in V15 so it's already been sorted out there from the start.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
In reply to this post by 4D Tech mailing list
> Relational model databases by definition are not supposed to use keys that have no meaning.

You may have stated that backwards… relational model databases by design should never, never, never use business data for keys. This was the number one rule drilled into me by my college relational database classes. I found this to be very wise advice in the real world. It came in handy when building a database for vehicles and the suggestion was to use VIN as a primary key since they were unique… it turns our that as unique as VIN should be, they are not… it was no problem for out database to handle vehicles with the same VIN because we followed the non-business-data-for-primary-key rule.

On the other hand, you do model the data after business relations, but the keys that tie that relation data need/should never be seen in a well designed system. If a user readable key is needed by business, then there should be another data piece that the user can read (like an MRN, medical record number, or an abbreviation that is unique and human readable) But these should never be used to link together data in a structure in primary key foreign key relation.

Just my two cents.

Neil

--
Neil Dennis
4D Developer since 1990


GreaText - Designing Software for the Way You Work
716 East 1850 N
North Ogden, UT 84414

mailto:[hidden email]
http://www.greatext.com/



**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
> Relational model databases by definition are not supposed to use keys
that have no meaning.

On Sun, Aug 6, 2017 at 12:08 PM, npdennis via 4D_Tech <[hidden email]>
wrote:
> You may have stated that backwards…

Nope, that's exactly what I meant. But apologies for not making the point
more carefully. I'll make up for that now.

> relational model databases by design should never, never, never use
business data for keys.
> This was the number one rule drilled into me by my college relational
database classes. I
> found this to be very wise advice in the real world.

I agree! But there are two different pieces to this. In the one part, no
synthetic keys and in the other, synthetic keys. When you did your RDBMS
classes, they probably discussed formal design versus system
implementation. When you're creating a normalized *design*, you don't have
artificial/synthetic keys - because your focus is on the pure data itself
and such keys have no meaning. (They're a database artifact, not part of
the real world entity you're modeling.) The "real" primary key gets sorted
out when you're putting your design into second or third "normal form."
(Even if you don't think of it in those terms. I mean the actual design
step, whatever name you use - if you even use a name.) When you go to
*implement* your tables, it's sensible to use a synthetic key. So, in my
example, I used

Building + Room

...because that's enough to uniquely identify each room with no spare
values in the field, no redundant rows, and nothing that belongs in another
table. (Well, you would have a unique set of Building records, as noted in
the earlier post.) So, that's a correct and sensible *design* and is how
I'd think about the actual data. But implementation? Another story. The
building is renamed and I have to go back and update all of the related
room records? PITA.

It's 100% normal to think about the formal table design and the physical
implementation as distinct steps with different rules. And your VIN
example, like Social Security Number, is an excellent example of a "natural
key" that turns out not to be a great idea. I use synthetic keys for *all*
of my tables (well, there might be a rare exception where I use a natural
key, but that's super rare...and I probably end up regretting that most of
the time.)

In my earlier post I was just assuming that the distinction between design
and implementation was clear, but those steps aren't distinct in 4D. (They
are distinct in design tools and used to be a pretty common concept - they
still are distinct in design-oriented RDBMS development tools.)

There's a natural tension here, and there are two competing tasks that
require work to handle. You only get one of them for free, the other you
have to take care of on your own.

* Natural keys and multi-field keys the truly define a row are a *pain* in
the real world where "unique" values aren't always entered
correctly/stable/etc. To avoid hassles with updates, synthetic keys are
super helpful. They give you _stable keys_. (Mutable keys are absolutely a
pain.)

     Synthetic keys take away the pain of related table changes on UPDATE.

* Synthetic keys have _nothing_ to do with the real data in the row. That's
kind of the point. Which means that you get *no help* in preventing
duplicate rows of data. Like my example of the duplicate building + room
with different IDs.

     Synthetic keys give you the pain of having to hand-check for
duplicates.

The problem shows up when you have "unique" rows with duplicate data. That
gets ugly in a big fat hurry when it's a 1 table and children link to
different rows in the 1 table that aren't really "different." Note that I'm
not talking here about the kind of messy data - like customer names and
addresses - where you need to merge records. That's a related but different
question. My building + room example is nice and clean. The data is
identical, it's just that somehow a duplicate got entered. (I don't want to
muddy the waters with messy data, it's not necessary or helpful for the
current discussion.)

Here's a possibly apocryphal tangent from my first years in the work world.
Back when things were typed. On typewriters. It was customary to put at the
bottom of each page 1/5, 2/5 and so on. I heard tell of a report that had a
sixth page added. Rather than retype all of it, here's what the boss got
back at the bottom of the six pages:

1/5
2/5
3/5
4/5
6/6
Not the same thing, but I always thought of it as a good example ;-)

In 4D, you get a benefit from using synthetic keys - you get stable keys
for update for free. I'll take that deal! But you get no automatic help
with duplicate control. You can add this by implementing a compound index
marked as unique. Or you can write custom scanner code to audit your data.
The exact options depend on the database, but the work is required either
way. In 4D databases, what I most often see is that people aren't checking
for duplicate rows either at the engine level (compound index with unique)
or via record audits. 4D itself has no referential integrity scanning tools
built in. (It could, for example, in MSC for relations with arrows drawn.
But then it would need a way of distinguishing required N:1 versus optional
N:1 relations. What's called a "dependent" versus "independent" entity on
the child side.) So, mostly I see creeping bad data.

Since I'm diving into Postgres these days (mostly on the import side - my
table designs aren't even right yet - pacing myself), I see that they have
a wide range of tools for enforcing row uniqueness and referential
integrity. Nice. They've also got UUIDs as a field type because, well,
they're pretty darn handy.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
In reply to this post by 4D Tech mailing list
yes,
BUT -
I find that by placing the internal linking value (non-editable) on an
entry form
GREATLY enhances the ability/simplicity of tracking down data issues.

ex (without viewable internal key):
user: "... customer John Smith does not show the correct invoice(s)
Dev/Sys admim: John Smith.. just a minute... hmm There are 43 John
Smith's in the system, which one do you mean?
user: you know JOHN SMITH!
......... [no need to continue  :) ]

ex: (with visible key):
user: "... customer John Smith does not show the correct invoice(s)
Dev/Sys admin: Can you please give me the number in the upper left hand
corner, just below the screen title 'Customer Information'...
user:  1234567
Dev/Sys admin: just a moment... Ok got it. Now what is the exact
problem....
......... [of course at this point getting the user to give useful
information is next to impossible, but at least there is now a
reference to the problem record/customer...]


On Sun, 6 Aug 2017 13:08:14 -0600, npdennis via 4D_Tech wrote:
>
> On the other hand, you do model the data after business relations,
> but the keys that tie that relation data need/should never be seen in
> a well designed system. If a user readable key is needed by business,
> then there should be another data piece that the user can read (like
> an MRN, medical record number, or an abbreviation that is unique and
> human readable) But these should never be used to link together data
> in a structure in primary key foreign key relation.
>
---------------
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
> I find that by placing the internal linking value (non-editable) on an
> entry form
> GREATLY enhances the ability/simplicity of tracking down data issues.

This same thing can also be done by adding the same field to the same table but not linking off of it internally :)



--
Neil Dennis
4D Developer since 1990


GreaText - Designing Software for the Way You Work
716 East 1850 N
North Ogden, UT 84414

mailto:[hidden email]
http://www.greatext.com/


**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
In reply to this post by 4D Tech mailing list
David,
how do you tell these two issues apart:
Customer name : John Smith
Customer Name : John Smyth

is this a typo (one should be Smyth and is not, or one should be Smith
and is not)?
is it real (2 John Smiths with different spellings)

I see this problem with a 'free form' entry inventory data. i.e. the
user can enter whatever in the item name/description field.

I do not see (for my inventory) a way to keep duplicate entries like:
Glove, latex, Medium
Medium Glove, latex
Latex glove size medium

yes, I can check part numbers, but... same glove from different
suppliers can/will have different part numbers
ex:
Joe's supply house: 12345
Fred's house of supplies: 56342-m

Worse, I've found that the same product, from the same vendor in
differing purchase amounts (1 vs case) is the same part number, but
different pricing! So.. even a check on part numbers is insufficient to
stop duplicate entries.


I have setup a means for the users to remove duplicates, but of course
they never do....

always welcome a way to improve this situation
Chip

On Sun, 6 Aug 2017 13:06:28 -0700, David Adams via 4D_Tech wrote:
> Since I'm diving into Postgres these days (mostly on the import side - my
> table designs aren't even right yet - pacing myself), I see that they have
> a wide range of tools for enforcing row uniqueness and referential
---------------
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ​Re: UUID vs Longint primary key

4D Tech mailing list
In reply to this post by 4D Tech mailing list
ok - I'm confused.
what is the difference between a display only field showing an internal
ID number,
and a duplicate, display only ID number showing on the entry form?

 
On Tue, 8 Aug 2017 08:07:00 -0600, npdennis wrote:

>> I find that by placing the internal linking value (non-editable) on an
>> entry form
>> GREATLY enhances the ability/simplicity of tracking down data issues.
>
> This same thing can also be done by adding the same field to the same
> table but not linking off of it internally :)
>
>
>
> --
> Neil Dennis
> 4D Developer since 1990
>
>
> GreaText - Designing Software for the Way You Work
> 716 East 1850 N
> North Ogden, UT 84414
>
> mailto:[hidden email]
> http://www.greatext.com/
>
>
---------------
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: ​Re: UUID vs Longint primary key

4D Tech mailing list
> what is the difference between a display only field showing an internal ID number,
> and a duplicate, display only ID number showing on the entry form?
 
It really comes down to preference, you can link how you wish. A read only numeric ID works in most cases.

The reason I don't use them is that if you expose a read only ID to the business, the business owns that ID not you or the application. At that point they can request things change around that ID that would/could break your linking. Most the time you will be OK, other times it can mess you up or cause you to say "No we can't do that" to a client.

As much as I like to think the business is constant and won't change, and that unique numbers stay unique, it does happen all of the time, even around read only keys. I have been around a long time and have been glad that I use internal IDs instead of business exposed keys on multiple occasions.

Using keys that the client never sees, fixes this potential (in a lot of cases unlikely) problem. As for me and my databases, I link my data with keys that I own and the business can't see.

I also use UUID over longint

 - UUID is faster (do to "random" data in the index)
 - UUID solves problems with distributed systems that sync
 - UUID fixes the home grown sequence problem with transactions
 - UUID is not easily readable by human and keeps me from being tempted to expose them :)


Neil




--

Privacy Disclaimer: This message contains confidential information and is intended only for the named addressee. If you are not the named addressee you should not disseminate, distribute or copy this email. Please delete this email from your system and notify the sender immediately by replying to this email.  If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.

The Alternative Investments division of UMB Fund Services provides a full range of services to hedge funds, funds of funds and private equity funds.  Any tax advice in this communication is not intended to be used, and cannot be used, by a client or any other person or entity for the purpose of (a) avoiding penalties that may be imposed on any taxpayer or (b) promoting, marketing, or recommending to another party any matter addressed herein.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Loading...