Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list
> Bernd,
>
> Thanks for the report, it's interesting and believable. In this case, I
> have to get the data into memory to send it... I guess I could write to a
> file, load that and then send it. Interesting concept, thanks!

No need to write to a file.  Appending text to a text variable is very slow, but using TEXT TO BLOB($textToAdd;$blob;UTF8 text without length;*)  is very fast.  Give that a try and see the huge speed difference.

Bart
**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************

Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list

> On 17 Sept 2018, at 16:15, Bart Davis via 4D_Tech <[hidden email]> wrote:
>
> No need to write to a file.  Appending text to a text variable is very slow, but using TEXT TO BLOB($textToAdd;$blob;UTF8 text without length;*)  is very fast.  Give that a try and see the huge speed difference.

You can also append to a text array, then implode that array using the same blob technique.
<http://forums.4d.com/Post/FR/15873353/2/17457534#17457534>
A huge amount of text can be held this way, and it is very fast.
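
For readers who want to try this, here is a minimal sketch of the array-then-blob idea. It is not the code from the forum post linked above: the array name $lines_at, the sample values and the "\r" separator are assumptions made for illustration.

```4d
  // Hypothetical sketch: implode a text array into one text via a BLOB
ARRAY TEXT($lines_at;0)
APPEND TO ARRAY($lines_at;"alpha")
APPEND TO ARRAY($lines_at;"beta")
APPEND TO ARRAY($lines_at;"gamma")

C_BLOB($blob)
C_TEXT($result)
C_LONGINT($i)
SET BLOB SIZE($blob;0)
For ($i;1;Size of array($lines_at))
	  // * appends at the end of the BLOB and grows it as needed
	TEXT TO BLOB($lines_at{$i}+"\r";$blob;UTF8 text without length;*)
End for
$result:=BLOB to text($blob;UTF8 text without length)
SET BLOB SIZE($blob;0)  // release the BLOB once the text exists
```

Emptying the BLOB right after the conversion keeps the period during which both copies are in memory as short as possible.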

About the 2 GB limit on text: I noticed that the limit in number of characters is about half of that.

--
Arnaud de Montard




Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list
This has been a very interesting discussion. I decided to test the various techniques using the code I pasted below.

Essentially, these are 6 different ways of building up a large text variable:
1) Simplest method: add text directly to the variable
2) Use a 2048-character text buffer: add text to the buffer, and once it exceeds 2048 characters append it to the text var, clear the buffer and repeat
3) Use a 4096-character text buffer: add text to the buffer, and once it exceeds 4096 characters append it to the text var, clear the buffer and repeat
4) Use TEXT TO BLOB, appending each piece directly, and convert the blob to the text var once at the end
5) Use TEXT TO BLOB with a 2048-character text buffer, copying to the blob only once the buffer exceeds 2048 characters
6) Use TEXT TO BLOB with a 4096-character text buffer, copying to the blob only once the buffer exceeds 4096 characters

The code loops through target text var sizes up to 10 MB in 128 KB increments.
Each technique is run 5 times, and the average is appended to an array. Once techniques 1-3 exceed 600 ms per run I stop running them, since they behave so badly.

At the end of the test the results are placed in the clipboard (for pasting into Excel and then graphed) and in an Alert.

Based on this test, techniques #5 and #6 are by far the fastest.
Techniques #4-#6 seem to have linear performance.

For a 10 MB text var, #4 came in at 1613 ms and #5 at 290.6 ms.

Very interesting. I am currently using technique #3 when I am creating exports and reports to disk. I am going to switch to #5.

The downside of the blob techniques is that you need double the amount of memory if you need to convert the blob back to a text variable, as I did in this experiment.

Dani Beaubien
Open Road Development



ARRAY LONGINT($sizes;0)
ARRAY REAL($r1;0)
ARRAY REAL($r2;0)
ARRAY REAL($r3;0)
ARRAY REAL($r4;0)
ARRAY REAL($r5;0)
ARRAY REAL($r6;0)

C_BLOB($blob)
C_TEXT($textToAdd;$result)
$textToAdd:="123456789 "
C_BOOLEAN($skip1;$skip3;$skip4)
C_LONGINT($i;$j)  // loop counters

C_LONGINT($targetVarSize;$maxSize;$outerLoopMax)
$targetVarSize:=0
$maxSize:=1024*1024*10  // 10 MB
$outerLoopMax:=5

C_LONGINT($vs1;$ve1)
C_LONGINT($vs2;$ve2)
C_LONGINT($vs3;$ve3)
C_LONGINT($vs4;$ve4)
C_LONGINT($vs5;$ve5)
C_LONGINT($vs6;$ve6)
C_TEXT($tmpTxt)
C_LONGINT($innerLoopMax)
Repeat
        $targetVarSize:=$targetVarSize+(1024*128)
        $innerLoopMax:=Int($targetVarSize/Length($textToAdd))
        MESSAGE(" $targetVarSize = "+String($targetVarSize/1024)+" Kb")

          // Technique 1: append directly to the text variable
        If (Not($skip1))
                $vs1:=Milliseconds
                For ($i;1;$outerLoopMax)
                        $result:=""
                        For ($j;1;$innerLoopMax)
                                $result:=$result+$textToAdd
                        End for
                End for
                $ve1:=Milliseconds
                If ((($ve1-$vs1)/$outerLoopMax)>600)
                        $skip1:=True
                End if
        End if


          // Technique 2: 2048-character text buffer (timed with $vs3/$ve3, stored in $r3)
        If (Not($skip3))
                $vs3:=Milliseconds
                For ($i;1;$outerLoopMax)
                        $result:=""
                        $tmpTxt:=""
                        For ($j;1;$innerLoopMax)
                                $tmpTxt:=$tmpTxt+$textToAdd
                                If (Length($tmpTxt)>2048)
                                        $result:=$result+$tmpTxt
                                        $tmpTxt:=""
                                End if
                        End for
                        $result:=$result+$tmpTxt
                End for
                $ve3:=Milliseconds
                If ((($ve3-$vs3)/$outerLoopMax)>600)
                        $skip3:=True
                End if
        End if


          // Technique 3: 4096-character text buffer (timed with $vs4/$ve4, stored in $r4)
        If (Not($skip4))
                $vs4:=Milliseconds
                For ($i;1;$outerLoopMax)
                        $result:=""
                        $tmpTxt:=""
                        For ($j;1;$innerLoopMax)
                                $tmpTxt:=$tmpTxt+$textToAdd
                                If (Length($tmpTxt)>4096)
                                        $result:=$result+$tmpTxt
                                        $tmpTxt:=""
                                End if
                        End for
                        $result:=$result+$tmpTxt
                End for
                $ve4:=Milliseconds
                If ((($ve4-$vs4)/$outerLoopMax)>600)
                        $skip4:=True
                End if
        End if


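          // Technique 4: append each piece to the BLOB, convert to text once at the end (timed with $vs2/$ve2, stored in $r2)
        $vs2:=Milliseconds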
        For ($i;1;$outerLoopMax)
                $result:=""
                SET BLOB SIZE($blob;0)
                For ($j;1;$innerLoopMax)
                        TEXT TO BLOB($textToAdd;$blob;UTF8 text without length;*)
                End for
                $result:=BLOB to text($blob;UTF8 text without length)
        End for
        $ve2:=Milliseconds


          // Technique 5: 2048-character text buffer flushed to the BLOB
        $vs5:=Milliseconds
        For ($i;1;$outerLoopMax)
                $result:=""
                $tmpTxt:=""
                SET BLOB SIZE($blob;0)
                For ($j;1;$innerLoopMax)
                        $tmpTxt:=$tmpTxt+$textToAdd
                        If (Length($tmpTxt)>2048)
                                TEXT TO BLOB($tmpTxt;$blob;UTF8 text without length;*)
                                $tmpTxt:=""
                        End if
                End for
                TEXT TO BLOB($tmpTxt;$blob;UTF8 text without length;*)
                $result:=BLOB to text($blob;UTF8 text without length)
        End for
        $ve5:=Milliseconds


          // Technique 6: 4096-character text buffer flushed to the BLOB
        $vs6:=Milliseconds
        For ($i;1;$outerLoopMax)
                $result:=""
                $tmpTxt:=""
                SET BLOB SIZE($blob;0)
                For ($j;1;$innerLoopMax)
                        $tmpTxt:=$tmpTxt+$textToAdd
                        If (Length($tmpTxt)>4096)
                                TEXT TO BLOB($tmpTxt;$blob;UTF8 text without length;*)
                                $tmpTxt:=""
                        End if
                End for
                TEXT TO BLOB($tmpTxt;$blob;UTF8 text without length;*)
                $result:=BLOB to text($blob;UTF8 text without length)
        End for
        $ve6:=Milliseconds

          // Note: once a text technique is skipped, its last measured average is repeated in later rows
        APPEND TO ARRAY($sizes;$targetVarSize)
        APPEND TO ARRAY($r1;Round(($ve1-$vs1)/$outerLoopMax;2))
        APPEND TO ARRAY($r2;Round(($ve2-$vs2)/$outerLoopMax;2))
        APPEND TO ARRAY($r3;Round(($ve3-$vs3)/$outerLoopMax;2))
        APPEND TO ARRAY($r4;Round(($ve4-$vs4)/$outerLoopMax;2))
        APPEND TO ARRAY($r5;Round(($ve5-$vs5)/$outerLoopMax;2))
        APPEND TO ARRAY($r6;Round(($ve6-$vs6)/$outerLoopMax;2))

Until ($targetVarSize>=$maxSize)  // stop once the 10 MB target has been measured

C_TEXT($msg)
$msg:="size"
$msg:=$msg+",text"
$msg:=$msg+",text 2048"
$msg:=$msg+",text 4096"
$msg:=$msg+",blob"
$msg:=$msg+",blob 2048"
$msg:=$msg+",blob 4096"
$msg:=$msg+"\r"
For ($i;1;Size of array($sizes))
        $msg:=$msg+String(Int($sizes{$i}/1024))+" Kb"
        $msg:=$msg+","+String($r1{$i})  // text simple
        $msg:=$msg+","+String($r3{$i})  // text 2048 buffer
        $msg:=$msg+","+String($r4{$i})  // text 4096 buffer
        $msg:=$msg+","+String($r2{$i})  // blob simple
        $msg:=$msg+","+String($r5{$i})  // blob 2048 buffer
        $msg:=$msg+","+String($r6{$i})  // blob 4096 buffer
        $msg:=$msg+"\r"
End for
SET TEXT TO PASTEBOARD($msg)




Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list
Dani,

Thank you, for a very large value of "thank you." I woke up this morning
all excited to write some tests, and you had already done the work. It's
really great that you took the time to do the tests, to share your
conclusions and your test method. Thank you!

In my case, I'm not so kind or generous ;-) I've got a specific method that
I'm using for testing. I pass in a method and it does the work. This is a
fixed data set, so the results are always identical. The only thing I'm
changing is how I'm building up the block of text. I posted about the four
methods I tried and they were all the same, about 6 minutes. I went back,
added a new test for *TEXT TO BLOB* in the loop and then one call to *BLOB
to text* at the end.

The results? What otherwise takes about 6 minutes now takes just over 2
seconds. I had to run it a few times to convince myself I wasn't imagining
things. I wasn't. I thought that I was past the point where I'd ever see a
100x (well, x160 in this case) improvement from a small code change.

So thank you all for this information, it's a game-changer for me!

On to the other point, memory. As Dani commented, if you need text then you
have to briefly double the RAM consumed. In my case, I need text. (The text
is passed to a plugin that doesn't accept BLOBs in this situation.) What do
people think about RAM these days? I'm still kind of conservative. Tim
Nevels (off-line) said to me the other day, "David, why are you worrying
about memory? Everyone has tons of memory." I'm already not worrying about
performance in advance, and I *don't* feel like spending more time worrying
about security, so can't I worry about memory a bit? Where do you guys
stand on this on modern gear? When was the last time you bumped into
harmful memory constraints?

Thanks

Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list
Dani,

Thanks for taking the time to test this and post your results. I looked at some document generation routines where I was caching some text before writing it out with SEND PACKET (to avoid lots of calls to SEND PACKET with small amounts of text). I changed to a blob cache and it seemed to be a little slower. I also tried pre-sizing the blob cache to avoid dynamic resizing with TEXT TO BLOB. Still a bit slower than just using text variable and writing it out when it is over 4K in size.

I compared your method 6 with just writing to a file, including opening and closing the file. The time was almost the same with writing to a file being slightly faster.

I'm sure there are cases where you can't use files, but if that is not a constraint, it seems you can avoid the memory question and still get good performance. I suspect SSD drives and disk caches on modern systems are a big help these days.
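
For anyone who wants to repeat the pre-sizing experiment, the following is a rough sketch of what a pre-sized blob cache can look like. It is an illustration under stated assumptions, not the routine described above: $count, $offset and the sample string are made up, and the size estimate is exact only because the sample is pure ASCII.

```4d
  // Hypothetical sketch: pre-size the BLOB, write at a tracked offset, trim at the end
C_BLOB($blob)
C_TEXT($textToAdd;$result)
C_LONGINT($offset;$i;$count)
$textToAdd:="123456789 "
$count:=1000
  // One byte per character holds for ASCII only; other text needs more bytes in UTF-8
SET BLOB SIZE($blob;$count*Length($textToAdd))
$offset:=0
For ($i;1;$count)
	  // Passing a variable instead of * writes at $offset and advances it
	TEXT TO BLOB($textToAdd;$blob;UTF8 text without length;$offset)
End for
SET BLOB SIZE($blob;$offset)  // drop any unused tail
$result:=BLOB to text($blob;UTF8 text without length)
```

If the estimate turns out too small, TEXT TO BLOB still extends the BLOB as needed, so an underestimate costs speed rather than data.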
 

John DeSoi, Ph.D.



$targetVarSize:=$maxSize
$innerLoopMax:=Int($targetVarSize/Length($textToAdd))

$vs6:=Milliseconds
For ($i;1;$outerLoopMax)
        $result:=""
        $tmpTxt:=""
        SET BLOB SIZE($blob;0)
        For ($j;1;$innerLoopMax)
                $tmpTxt:=$tmpTxt+$textToAdd
                If (Length($tmpTxt)>4096)
                        TEXT TO BLOB($tmpTxt;$blob;UTF8 text without length;*)
                        $tmpTxt:=""
                End if
        End for
        TEXT TO BLOB($tmpTxt;$blob;UTF8 text without length;*)
        $result:=BLOB to text($blob;UTF8 text without length)
End for
$ve6:=Milliseconds-$vs6

C_LONGINT($ms)
C_TIME($doc)

$ms:=Milliseconds
$doc:=Create document("test1.txt")
For ($i;1;$outerLoopMax)
        $result:=""
        $tmpTxt:=""
        SET BLOB SIZE($blob;0)
        For ($j;1;$innerLoopMax)
                $tmpTxt:=$tmpTxt+$textToAdd
                If (Length($tmpTxt)>4096)
                          //TEXT TO BLOB($tmpTxt;$blob;UTF8 text without length;*)
                        SEND PACKET($doc;$tmpTxt)
                        $tmpTxt:=""
                End if
        End for
        SEND PACKET($doc;$tmpTxt)  // flush the remaining partial buffer
End for
CLOSE DOCUMENT($doc)
$ms:=Milliseconds-$ms

ALERT(String($ve6)+" - "+String($ms))




Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list
Thanks to John for adding writing to disk to Dani's test set. You make some
good points about documents in the modern world. You can probably append to
a document all day long and never see any change in speed. Years ago, the
legend was that you didn't have to buffer before calling *SEND PACKET* because
4D was doing it already. Meaning also that you had to close/open the
document if you were writing a log to track down crashes. No idea if this
is true now.

And, yeah, then you can *Document to text* in one go, if that's what you're
after. But, and I don't know if this is just me or not, but I hate writing
scratch files. Just...don't like it. I can't back that up with any good
reasons, I just have a bit of an allergy to the file system. Permissions
issues, etc. Maybe I should rethink that. Another nice feature of files is
that you can use them in so many ways...they can then be read by another
program for dispatch, etc. So, hmm.
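
To make the scratch-file route concrete, here is a small sketch of the round trip: append with SEND PACKET, then load everything back with Document to text. The file name and sample string are hypothetical, the sample is ASCII without line breaks so that character-set and line-break handling cannot change the result, and the scratch file is deleted afterwards.

```4d
  // Hypothetical sketch: append to a scratch file, then load it back in one call
C_TIME($doc)
C_TEXT($path;$textToAdd;$result)
C_LONGINT($i)
$textToAdd:="123456789 "  // ASCII only, no line breaks
$doc:=Create document("scratch_test.txt")  // illustrative file name
If (OK=1)
	$path:=Document  // full pathname of the document just created
	For ($i;1;1000)
		SEND PACKET($doc;$textToAdd)
	End for
	CLOSE DOCUMENT($doc)
	  // Document unchanged: do not convert line breaks while reading
	$result:=Document to text($path;"UTF-8";Document unchanged)
	DELETE DOCUMENT($path)  // do not leave the scratch file behind
End if
```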

Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list

> On 18 Sept 2018, at 18:50, Dani Beaubien <[hidden email]> wrote:
>
> This has been a very interesting discussion. I decided to test the various techniques using the code I pasted below.
> [...]

Thanks for testing  :-)

Some thoughts…
- I had surprises with tests using a single string of constant length ($textToAdd:="123456789") vs "true" samples (of various lengths)
- the more you concatenate, the slower it is
- the longer the string you add, the slower it is too
 <https://screencast.com/t/W29IFFNs3>
- I have been using the "growing blob" in 32-bit for a while now, with no blob size problems
- I don't remember a situation where the strings to concatenate were not already in a text array; here is some code to test filling such an array:
 $textToAdd:="123456789"*100  //changing length here does not seem to change the speed; try random too…
 $end_l:=Tickcount+(60*10)  //run for 10 seconds (60 ticks per second)
 $i_l:=0
 $ms_l:=Milliseconds
 ARRAY TEXT($concat_at;0)
 ARRAY LONGINT($ms_al;0)
 Repeat
   $i_l:=$i_l+1
   APPEND TO ARRAY($concat_at;$textToAdd)
   If ($i_l%2000=0)  //every 2000 appends, record the elapsed time
     APPEND TO ARRAY($ms_al;Milliseconds-$ms_l)
     $ms_l:=Milliseconds
   End if
 Until (Tickcount>$end_l)
 $average_r:=Average($ms_al)
 $max_r:=Max($ms_al)
 $min_r:=Min($ms_al)
 TRACE  //inspect $average_r, $max_r and $min_r in the debugger
- SSD disks are great if memory can be a problem
 loop
   add to text array
   if(text array size is more than xxx)
     concatenate text array into text  //using blob of course  ;-)
     append text to file
     set text array to zero
   end if
 end loop
- see also <http://forums.4d.com/Post/FR/19466463/1/19466704#19466704>, which I didn't test
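
The array-then-flush loop outlined in the list above could be sketched roughly as follows. Everything specific here is an assumption for illustration: the file name, the row text, the totals and the 2000-element threshold. The sketch also departs from the outline in one respect: it sends the BLOB straight to SEND PACKET (which accepts a BLOB as well as text) instead of converting it back to text first.

```4d
  // Hypothetical sketch: collect rows in a text array, flush to disk in chunks
C_TIME($doc)
C_BLOB($blob)
C_LONGINT($i;$k;$total;$flushAt)
ARRAY TEXT($concat_at;0)
$total:=5000
$flushAt:=2000  // flush every 2000 elements (arbitrary)
$doc:=Create document("export_test.txt")  // illustrative file name
If (OK=1)
	For ($i;1;$total)
		APPEND TO ARRAY($concat_at;"row "+String($i)+"\r")
		  // Flush when the array is full enough, and once more on the last row
		If ((Size of array($concat_at)>=$flushAt) | ($i=$total))
			SET BLOB SIZE($blob;0)
			For ($k;1;Size of array($concat_at))
				TEXT TO BLOB($concat_at{$k};$blob;UTF8 text without length;*)
			End for
			SEND PACKET($doc;$blob)  // a BLOB packet is written as-is
			ARRAY TEXT($concat_at;0)  // set text array to zero
		End if
	End for
	CLOSE DOCUMENT($doc)
End if
```

Memory use stays bounded by the chunk size rather than by the size of the whole export.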

--
Arnaud de Montard




Re: Do I want to use SELECTION TO ARRAY instead of GOTO SELECTED RECORD server-side in V17

4D Tech mailing list
Just for the record, my 160x speed improvement came with pretty naive code.
I loop to a record, build the text I need for that row, append the block to
the BLOB, clear the text and hit the next row. All-in-all, only a couple of
lines of extra code.

After what everyone has contributed and suggested, along with my own local
testing, my summary goes like this:

*Build a text var*
Yeah, don't do that. In my case, the final block size is ~3MB, not
huge...but the resizing is pretty aggressive.

*Build a BLOB var*
Works great. The downside is that if you ultimately need text, you (even if
briefly) need to double the amount of RAM you're consuming. I'm defaulting
to a max BLOB of 5MB and change, so this should never be a problem.

*Write to a text file*
This should never slow down, no matter how big the file gets. This has
always been true, the limiting factor being the speed of the drive. Drives
are fast now. Memory is also fine because you take no RAM to store the
document and can load it into RAM in one go very easily.

I'm going with a BLOB instead of documents. Why? Old prejudice, and I like
avoiding the file system. Permissions, file contention issues in 4D (been
there not long ago...and after *more than a year*, there's reportedly a
fix), data leakage. Plus, old prejudice. In my specific case, a BLOB should
be fine as I'm not pushing the envelope on memory usage. Under other
circumstances, I can easily see documents as being the best bet.
Particularly if you have other reasons to want the data in documents for
dispatching, archiving, etc.

Thanks for the help!