Preventing Duplicate Data WAS UUID vs Longint primary key

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Preventing Duplicate Data WAS UUID vs Longint primary key

4D Tech mailing list
> How do you deal with that problem (Preventing duplicate data)

When unique data is required because of a business need, I do implement one of your suggested methods: "Carefully program your system to detect and prevent duplicate rows."

I would suggest not doing this in a trigger, but instead on data entry (imports, user entry). The 4D command "Find in Field" works in many of these cases.

Neil




--


Privacy Disclaimer: This message contains confidential information and is intended only for the named addressee. If you are not the named addressee you should not disseminate, distribute or copy this email. Please delete this email from your system and notify the sender immediately by replying to this email.  If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.

The Alternative Investments division of UMB Fund Services provides a full range of services to hedge funds, funds of funds and private equity funds.  Any tax advice in this communication is not intended to be used, and cannot be used, by a client or any other person or entity for the purpose of (a) avoiding penalties that may be imposed on any taxpayer or (b) promoting, marketing, or recommending to another party any matter addressed herein.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

4D Tech mailing list
perhaps you can use sounded and store that in a hidden field.

Regards
Chuck
------------------------------------------------------------------------------------------------
 Chuck Miller Voice: (617) 739-0306
 Informed Solutions, Inc. Fax: (617) 232-1064      
 mailto:cjmiller<AT SIGN>informed-solutions.com
 Brookline, MA 02446 USA Registered 4D Developer                
       Providers of 4D and Sybase connectivity
          http://www.informed-solutions.com 
------------------------------------------------------------------------------------------------
This message and any attached documents contain information which may be confidential, subject to privilege or exempt from disclosure under applicable law.  These materials are intended only for the use of the intended recipient. If you are not the intended recipient of this transmission, you are hereby notified that any distribution, disclosure, printing, copying, storage, modification or the taking of any action in reliance upon this transmission is strictly prohibited.  Delivery of this message to any person other than the intended recipient shall not compromise or waive such confidentiality, privilege or exemption from disclosure as to this communication.

> On Aug 7, 2017, at 12:56 PM, Dennis, Neil via 4D_Tech <[hidden email]> wrote:
>
> When unique data is required because of a business need, I do implement one of your suggested methods: "Carefully program your system to detect and prevent duplicate rows."
>
> I would suggest not doing this in a trigger, but instead on data entry (imports, user entry). The 4D command "Find in Field" works in many of these cases.
>
> Neil

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

4D Tech mailing list
In reply to this post by 4D Tech mailing list
What I have done is to have a field in the table  that contains the keys that make the record unique with all spaces, special characters, and vowels removed, as well as eliminating any consecutive consonants . See my example below. So far this has worked pretty well for me and I guess would fall under David’s category of "Carefully program your system to detect and prevent duplicate rows”

John

—————— Example ------------
$firstName:=[Contacts]firstName (John)
$lastName:=[Contacts]lastName (Baughman)
$company:=[Contacts]company (BY’te DESIGN Hawwaii)  notice I have mistakenly put more than 1 w in Hawaii.
$DupeCheck:= $firstName+ $lastName+$company
[Contacts]DupeCheck:=AlphaOnlyNoVowels ($DupeCheck;"*”)  //AlphaOnlyNoVowels does the heavy lifting. The asterisk tells the method to remove consecutive consonants.

[Contacts]DupeCheck now contains  “JHNBGHMNBYTDSGNHW”

I wrap the above in a duplicate checking method for the [Contacts] table called ContactsDuplicateManager

Whenever a record is updated or created in the Contacts table…

$DupeCheck:= ContactsDuplicateManager ("isDuplcate”;[Contacts]ID)

The ContactsDuplicate method creates the check string as above and searches the contacts table for duplicates using the [Contacts]DupeCheck field. If no duplicates are found it returns the check string. If a duplicate is found it returns the check string with a prepended asterisk.  The contact ID if passed prevents the dupe check from finding the record being updated. If this is a new record 0 is passed for the Contact ID. So…

If ($DupeCheck =“*@“
   Handle the duplicate in context. If, for example this is a user updating or creating a contact record, warn the user of the possible duplicate with available options.

else
   [Contacts]DupeCheck:=$DupeCheck
   SAVE RECORD([Contacts)
 
end if

-------------------------------------





> On Aug 7, 2017, at 6:56 AM, Dennis, Neil via 4D_Tech <[hidden email]> wrote:
>
>> How do you deal with that problem (Preventing duplicate data)
>
> When unique data is required because of a business need, I do implement one of your suggested methods: "Carefully program your system to detect and prevent duplicate rows."
>
> I would suggest not doing this in a trigger, but instead on data entry (imports, user entry). The 4D command "Find in Field" works in many of these cases.
>
> Neil
>
>
>
>
> --
>
>
> Privacy Disclaimer: This message contains confidential information and is intended only for the named addressee. If you are not the named addressee you should not disseminate, distribute or copy this email. Please delete this email from your system and notify the sender immediately by replying to this email.  If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
>
> The Alternative Investments division of UMB Fund Services provides a full range of services to hedge funds, funds of funds and private equity funds.  Any tax advice in this communication is not intended to be used, and cannot be used, by a client or any other person or entity for the purpose of (a) avoiding penalties that may be imposed on any taxpayer or (b) promoting, marketing, or recommending to another party any matter addressed herein.
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:[hidden email]
> **********************************************************************

John Baughman
Kailua, Hawaii
(808) 262-0328
[hidden email]





**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

4D Tech mailing list
John:

Thanks for sharing this. I had not thought of this way after all these years. I will take the idea and apply where appropriate, in addition to my normal code to reduce duplicates.


Jody Bevan
ARGUS Productions Inc.
Developer

Argus Productions Inc. <https://www.facebook.com/ArgusProductions/>




> On Aug 7, 2017, at 11:50 AM, John Baughman via 4D_Tech <[hidden email]> wrote:
>
> What I have done is to have a field in the table  that contains the keys that make the record unique with all spaces, special characters, and vowels removed, as well as eliminating any consecutive consonants . See my example below. So far this has worked pretty well for me and I guess would fall under David’s category of "Carefully program your system to detect and prevent duplicate rows”
>
> John
>
> —————— Example ------------
> $firstName:=[Contacts]firstName (John)
> $lastName:=[Contacts]lastName (Baughman)
> $company:=[Contacts]company (BY’te DESIGN Hawwaii)  notice I have mistakenly put more than 1 w in Hawaii.
> $DupeCheck:= $firstName+ $lastName+$company
> [Contacts]DupeCheck:=AlphaOnlyNoVowels ($DupeCheck;"*”)  //AlphaOnlyNoVowels does the heavy lifting. The asterisk tells the method to remove consecutive consonants.
>
> [Contacts]DupeCheck now contains  “JHNBGHMNBYTDSGNHW”
>
> I wrap the above in a duplicate checking method for the [Contacts] table called ContactsDuplicateManager
>
> Whenever a record is updated or created in the Contacts table…
>
> $DupeCheck:= ContactsDuplicateManager ("isDuplcate”;[Contacts]ID)
>
> The ContactsDuplicate method creates the check string as above and searches the contacts table for duplicates using the [Contacts]DupeCheck field. If no duplicates are found it returns the check string. If a duplicate is found it returns the check string with a prepended asterisk.  The contact ID if passed prevents the dupe check from finding the record being updated. If this is a new record 0 is passed for the Contact ID. So…
>
> If ($DupeCheck =“*@“
>   Handle the duplicate in context. If, for example this is a user updating or creating a contact record, warn the user of the possible duplicate with available options.
>
> else
>   [Contacts]DupeCheck:=$DupeCheck
>   SAVE RECORD([Contacts)
>
> end if
>
> -------------------------------------
>

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

4D Tech mailing list
As a variant on John's technique, I'll combine fields into a single text
block that I then run through a fast hashing algorithm that returns a
longint.

What good is that longint? It helps in two cases:

* If you're comparing two copies of the same record during an update/sync,
etc., then you can hash the new copy and see if the hash differs from the
stored original. If they match, you can figure there's been no update.
[Subject to availability, limitations apply. See 'hashing' for complete
terms and details.]

* When you don't know if the row is a duplicate or not, hash the incoming
data and see if it matches something else. If it  does not match, you've
got a new row. If it does match one or more rows, you *might* have a
duplicate, but at least you only need to gets a small # of records to find
out.

Regarding hashing, I don't use SHA1, MD5, etc. Because I don't need them
and don't want the overhead. Instead I use some hashing functions from an
old (10+ years ago) tech notes. They still work great and are a good match
for exactly this sort of application.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

4D Tech mailing list
In reply to this post by 4D Tech mailing list
Jody,
what are your normal duplicate reducing/removal technic(s)?

On Mon, 7 Aug 2017 12:28:21 -0600, Jody Bevan via 4D_Tech wrote:

> John:
>
> Thanks for sharing this. I had not thought of this way after all
> these years. I will take the idea and apply where appropriate, in
> addition to my normal code to reduce duplicates.
>
>
> Jody Bevan
> ARGUS Productions Inc.
> Developer
>
> Argus Productions Inc. <https://www.facebook.com/ArgusProductions/>
>
>
>
>
>> On Aug 7, 2017, at 11:50 AM, John Baughman via 4D_Tech
>> <[hidden email]> wrote:
>>
>> What I have done is to have a field in the table  that contains the
>> keys that make the record unique with all spaces, special
>> characters, and vowels removed, as well as eliminating any
>> consecutive consonants . See my example below. So far this has
>> worked pretty well for me and I guess would fall under David’s
>> category of "Carefully program your system to detect and prevent
>> duplicate rows”
>>
>> John
>>
>> ―――――― Example ------------
>> $firstName:=[Contacts]firstName (John)
>> $lastName:=[Contacts]lastName (Baughman)
>> $company:=[Contacts]company (BY’te DESIGN Hawwaii)  notice I have
>> mistakenly put more than 1 w in Hawaii.
>> $DupeCheck:= $firstName+ $lastName+$company
>> [Contacts]DupeCheck:=AlphaOnlyNoVowels ($DupeCheck;"*”)  
>> //AlphaOnlyNoVowels does the heavy lifting. The asterisk tells the
>> method to remove consecutive consonants.
>>
>> [Contacts]DupeCheck now contains  “JHNBGHMNBYTDSGNHW”
>>
>> I wrap the above in a duplicate checking method for the [Contacts]
>> table called ContactsDuplicateManager
>>
>> Whenever a record is updated or created in the Contacts table…
>>
>> $DupeCheck:= ContactsDuplicateManager ("isDuplcate”;[Contacts]ID)
>>
>> The ContactsDuplicate method creates the check string as above and
>> searches the contacts table for duplicates using the
>> [Contacts]DupeCheck field. If no duplicates are found it returns the
>> check string. If a duplicate is found it returns the check string
>> with a prepended asterisk.  The contact ID if passed prevents the
>> dupe check from finding the record being updated. If this is a new
>> record 0 is passed for the Contact ID. So…
>>
>> If ($DupeCheck =“*@“
>>   Handle the duplicate in context. If, for example this is a user
>> updating or creating a contact record, warn the user of the possible
>> duplicate with available options.
>>
>> else
>>   [Contacts]DupeCheck:=$DupeCheck
>>   SAVE RECORD([Contacts)
>>
>> end if
>>
>> -------------------------------------
>>
>
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:[hidden email]
> **********************************************************************
---------------
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

4D Tech mailing list
Chip:

I have always tried doing at the point of data entry. Obviously no code is perfect, and either is the object between the keyboard and the chair. My experience is that few Administrators would take the time to filter through duplicates. Getting them to do system maintenance was almost impossible. They had too many other responsibilities. Typically they only worked on this if a problem was identified. For that we have a built in record merger.

Application: Medical

If a phone number is associated with the record for an individual - then I will use that. How it is implemented depends on the application (even within the application).

Lets say I am adding in a person. They enter in a name (which is not that good as people will call themselves by different names). I remember one patient that would present with a different name (About 6 personalities) in a walk in clinic. If a new record was going to be created they would try and get a phone number from the patient. When the phone number was put in a dialog would come up displaying all the names of people with the same phone number. It let the staff (if they ‘felt like it’) to try and filter for duplicates at that point.

For those staff that were diligent it prevented many duplicates in the medical data base. We had clinics that had 420,000 ‘regular’ patients. Then they had a walk in that had another 500,000 irregular patients. Their Administration was more eager to keep the duplicates down.

We also used the Health Insurance Number as another indicator. Not all patient’s would present with their health care card though. With ‘universal’ health care it is not as critical.

Application: National Order / Deliver Desk

We do the same in Canada for Postal Code and Street name

With a national order delivery system we had the addresses divided up so that we could look up Postal Codes (Canada) by address. This was also used for keeping duplicates down. This is when I learned that in Canada Postal Codes change weekly (nationally - there are postal codes changing somewhere).


Looking in your system for unique or even semi-unique data can be used to help keep duplicates down.
In the medical system we let them have a picture in the patient’s record that was presented (if security settings permitted) at the time of entry to the facility.

Those are the ones that I have used.

Jody



Jody Bevan
ARGUS Productions Inc.
Developer

Argus Productions Inc. <https://www.facebook.com/ArgusProductions/>




> On Aug 8, 2017, at 8:25 AM, Chip Scheide via 4D_Tech <[hidden email]> wrote:
>
> Jody,
> what are your normal duplicate reducing/removal technic(s)?

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Loading...