Fastest way to get text into a BLOB?

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Fastest way to get text into a BLOB?

4D Tech mailing list
I've got a bunch of text in records that I need to bundle up into a BLOB to
hand to a plug-in. Has anyone every sorted out the fastest approach to this
sort of thing? Line-by-line directly adding to the BLOB? Building up row of
text and converting it all at once? Building up a big text block and
converting it all at once?

I hope that the question is clear!

P.S. I haven't tested any of this out, so, yeah, lazy. But if anyone has
delved into this and already has an answer, I'd be grateful.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
I have found the C_TEXT + ARRAY BLOB combo to be quite useful.

textual data are represented in the text which is a JSON catalog.

arbitrary data are referenced by a number in the JSON,
which is the element number for the BLOB array.

> 2017/08/01 10:44、David Adams via 4D_Tech <[hidden email]> のメール:
>
> I've got a bunch of text in records that I need to bundle up into a BLOB to
> hand to a plug-in. Has anyone every sorted out the fastest approach to this
> sort of thing? Line-by-line directly adding to the BLOB? Building up row of
> text and converting it all at once? Building up a big text block and
> converting it all at once?



**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
Thanks for the answer! I was not clear in my question :(  I'm calling a
command in the PostgreSQL plug-in like this:

       PgSQL Send Copy Data( connection id ; data in a blob ; rowCount ) →
result

The second argument contains the data you're sending, and it has to be
passed into the plug-in as a BLOB. Presumably, this is because the plug-in
supports very old versions of 4D that had a 32,000 character limit on text
values. My source data consists of strings and longints, but I need to get
it into a BLOB.

This plug-in command is making it easy to call the Postgres COPY IN
command. That's Postgres' native bulk loader. I'm likely to be transferring
millions of rows at a time, so high-speed bulk loading is of real interest.
These days it's rare that I find the need to worry about optimizing
operations but, in this case, it is necessary.

So, with that background, I'm asking about the fastest way to load up a
BLOB...if that's even a meaningful question and if anyone has advice.

Thanks!
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
I see only 2 alternatives there, either append text to a blob directly, or append to a text variable and then move it to a blob.

I ran a quick test using both alternatives, and a series of TEXT TO BLOB is more than 10x faster then appending to a text variable and then moving it to a blob.

Code is attached. Testing with smaller repeat counts yields basically no difference. With a large repeat count then we are talking about 20ms (blob) against 287ms (text). Running compiled.

hth
julio

> On Aug 1, 2017, at 7:22 AM, David Adams via 4D_Tech <[hidden email]> wrote:
>
> Thanks for the answer! I was not clear in my question :(  I'm calling a
> command in the PostgreSQL plug-in like this:
>
>       PgSQL Send Copy Data( connection id ; data in a blob ; rowCount ) →
> result
>
> The second argument contains the data you're sending, and it has to be
> passed into the plug-in as a BLOB. Presumably, this is because the plug-in
> supports very old versions of 4D that had a 32,000 character limit on text
> values. My source data consists of strings and longints, but I need to get
> it into a BLOB.
>
> This plug-in command is making it easy to call the Postgres COPY IN
> command. That's Postgres' native bulk loader. I'm likely to be transferring
> millions of rows at a time, so high-speed bulk loading is of real interest.
> These days it's rare that I find the need to worry about optimizing
> operations but, in this case, it is necessary.
>
> So, with that background, I'm asking about the fastest way to load up a
> BLOB...if that's even a meaningful question and if anyone has advice.
>
> Thanks!

--
Julio Carneiro
[hidden email]


===============================
C_LONGINT($start;$time1;$time2;$i)
C_BLOB($blob1;$blob2)
C_TEXT($text)

SET BLOB SIZE($blob;0)
$start:=Milliseconds
For ($i;1;10000)
        TEXT TO BLOB("abcdefghijklmnopqrstuvwxyz01234567890";$blob1;UTF8 text without length;*)
End for
$time1:=Milliseconds-$start

SET BLOB SIZE($blob2;0)
$text:=""
$start:=Milliseconds
For ($i;1;10000)
        $text:=$text+"abcdefghijklmnopqrstuvwxyz01234567890"
End for
TEXT TO BLOB($text;$blob;UTF8 text without length)
$time2:=Milliseconds-$start

ALERT("Add to Blob:"+String($time1)+"\nAdd to Text:"+String($time2))


**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
Hi David,

Try this method :

  //================================================================================
  //@4ddoc-start : en
  //@name : BLB_appendText
  //@scope : public
  //@deprecated : no
  //@description : This function will append some text to a blob
  //@parameter[1-INOUT-blobPtr-POINTER] :  blob pointer (modified)
  //@parameter[2-IN-text-TEXT] : text to append
  //@parameter[3-IN-encoding-TEXT] : encoding (optionnal, default value : "utf-8")
  //@notes : in an intensive loop building some text, 4D is not very efficient. This is since 4D v11 probably related to unicode
  // in 4D v12.5 on Mac OS 10.6 compiled :
  // text => nb loop : 46858, final length : 5 533 885 chars, duration : 328 seconds, throughtput  16 871 char/sec
  // blob => nb loop : 46858, final size   : 5 533 885 bytes, duration :  10 seconds, throughtput 553 388 char/sec
  // this is a 32 x speed factor
  //
  //  // slow :
  //C_TEXT($vt_buffer)
  //$vt_buffer:=""
  //For ($i;1;$vl_lines)
  //$vt_buffer:=$vt_buffer+$tt_line{$i}
  //End for
  //CONVERT FROM TEXT($vt_buffer;"utf-8";$vx_buffer)
  //
  //  // 32 x faster :
  //C_BLOB($vx_buffer)
  //SET BLOB SIZE($vx_buffer;0)
  //For ($i;1;$vl_lines)
  //BLB_appendText (->$vx_buffer;$tt_line{$i};"utf-8")
  //End for
  //
  //@example : BLB_appendText
  //@see :
  //@version : 1.00.00
  //@author : Bruno LEGAY (BLE) - Copyrights A&C Consulting - 2008
  //@history : CREATION : Bruno LEGAY (BLE) - 15/10/2013, 12:17:37 - v1.00.00
  //@4ddoc-end
  //================================================================================

C_POINTER($1;$vp_blobPtr)  //blobPtr
C_TEXT($2;$vt_buffer)  //buffer
C_TEXT($3;$vt_encoding)  //encoding

C_LONGINT($vl_nbParam)
$vl_nbParam:=Count parameters
If ($vl_nbParam>1)
$vp_blobPtr:=$1
$vt_buffer:=$2

Case of
: ($vl_nbParam=2)
$vt_encoding:="utf-8"

Else
$vt_encoding:=$3
End case

If (Type($vp_blobPtr->)=Is BLOB)

If (Length($vt_buffer)>0)

C_BLOB($vx_buffer)
SET BLOB SIZE($vx_buffer;0)
CONVERT FROM TEXT($vt_buffer;$vt_encoding;$vx_buffer)

C_LONGINT($vl_nbBytesToCopy)
$vl_nbBytesToCopy:=BLOB size($vx_buffer)
If ($vl_nbBytesToCopy>0)

C_LONGINT($vl_srcOffset)
$vl_srcOffset:=0

C_LONGINT($vl_dstOffset)
$vl_dstOffset:=BLOB size($vp_blobPtr->)

COPY BLOB($vx_buffer;$vp_blobPtr->;$vl_srcOffset;$vl_dstOffset;$vl_nbBytesToCopy)

End if   //copy buffer not empty

SET BLOB SIZE($vx_buffer;0)

End if   //text is not empty

End if   //type not blob

End if





HTH
Bruno


Cordialement,

Bruno LEGAY - A&C Consulting
Tél. : +33 (0)1 41 74 13 79
http://www.ac-consulting.fr 
     
 Des compétences réunies pour des applications réussies
   
       
      Merci de penser à l'environnement avant d'imprimer




**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
I use this model too with a C_OBJECT and ARRAY BLOB. Element 0 is the JSON manifest; any types not easily represented in JSON are other elements in the blob array. This makes it easy to pack all 4D data types into a single unit. I use it to may remote procedure calls from one 4D version to another over HTTP.

It would be great if the plugin API supported objects. Do you know if that is ever going to happen?

John DeSoi, Ph.D.


> On Aug 1, 2017, at 1:06 AM, Keisuke Miyako via 4D_Tech <[hidden email]> wrote:
>
> I have found the C_TEXT + ARRAY BLOB combo to be quite useful.
>
> textual data are represented in the text which is a JSON catalog.
>
> arbitrary data are referenced by a number in the JSON,
> which is the element number for the BLOB array.

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
care to expand on this?

I do not understand this part:
> arbitrary data are referenced by a number in the JSON,
> which is the element number for the BLOB array.

Thanks

On Tue, 1 Aug 2017 07:54:15 -0500, John DeSoi via 4D_Tech wrote:

> I use this model too with a C_OBJECT and ARRAY BLOB. Element 0 is the
> JSON manifest; any types not easily represented in JSON are other
> elements in the blob array. This makes it easy to pack all 4D data
> types into a single unit. I use it to may remote procedure calls from
> one 4D version to another over HTTP.
>
> It would be great if the plugin API supported objects. Do you know if
> that is ever going to happen?
>
> John DeSoi, Ph.D.
>
>
>> On Aug 1, 2017, at 1:06 AM, Keisuke Miyako via 4D_Tech
>> <[hidden email]> wrote:
>>
>> I have found the C_TEXT + ARRAY BLOB combo to be quite useful.
>>
>> textual data are represented in the text which is a JSON catalog.
>>
>> arbitrary data are referenced by a number in the JSON,
>> which is the element number for the BLOB array.
>
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:[hidden email]
> **********************************************************************
---------------
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
Thanks for the help everyone, you're very kind. Bruno and Julio, thanks for
posting detailed code - I'll steal shamelessly. Julio, thanks for going to
the trouble of testing this out...the speed difference you found is pretty
substantial.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
For example, if you want to represent any type of 4D variable in an object, each property ($oValue below) is an object with the (1) the type, and (2) the value OR the index of the value in the blob array. So if I pass a pointer to store a text property, it might be simply

OB SET($oValue;"type";Is Text";"value";$pValue->)

If the value is a 2D array or other non-simple type, I append it to the blob array (using VARIABLE TO BLOB) and then the value is something like

OB SET($oValue;"type";2D Array";"index";$lastIndex)

This provides a way to serialize and unserialize every 4D type using an object manifest. The blob array and object manifest can be reduced to a single (blob) value by putting the object in element 0 of the blob array and then using VARIABLE TO BLOB on the blob array. This makes it easy to store or send over the network as a message.

John DeSoi, Ph.D.



> On Aug 1, 2017, at 9:12 AM, Chip Scheide <[hidden email]> wrote:
>
> care to expand on this?
>
> I do not understand this part:
>> arbitrary data are referenced by a number in the JSON,
>> which is the element number for the BLOB array.

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
Lots of good answers, thanks for another one John.

I'm starting with standard rows with UUIDs, strings, longs, reals, and
perhaps text. I need to go for maximum speed...most of the work on tuning
is on the Postgres side. Their high-speed entry command is called COPY IN
and Rob's plug-in supports it with a special command that takes the data as
a BLOB. In this case, I don't have any big types, binary formats, or JSON
to move. Since you bring up wire formats, I'm just thinking - 4D natively
stores all text as UTF16 and my range of values all fit comfortably in
Latin1. Character encodings is one of those things that I just have never
take the time to get my head around. Does UTF16 take more space to store
Latin1 data? It seems like it needn't, and yet when I put text into BBEdit,
it takes 2x space as UTF16 than as UTF8.

If I can stuff values into a BLOB and not see them bloat, that will be
great. For example, UUIDs are longs of some kind internally but become much
longer strings when converted. Anyway, I am hoping for the most compact
wire format possible.

Note: Rob's tools also support the Postgres COPY OUT, for those following
along at home.

Note: COPY is the go-to bulk load optimization in Postgres, but multi-value
inserts are also supposed to be a big help. I'm working through it now and
can post some summary findings, if people are interested.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
If you want the best performance and don't have a requirement to use the plugin, I'd bet the fastest way to bulk load with Postgres is going to be using files. If you are on the same machine, you can COPY directly from a file if you have superuser access. If not, you can still launch psql and use the \copy command.

Assuming you are on v16 and the data could be segmented accordingly, you could even launch multiple pre-emptive workers in parallel to export and use multiple connections to import. Each postgres connection will use a separate CPU core, if available.

John DeSoi, Ph.D.

> On Aug 1, 2017, at 2:42 PM, David Adams via 4D_Tech <[hidden email]> wrote:
>
> I'm starting with standard rows with UUIDs, strings, longs, reals, and
> perhaps text. I need to go for maximum speed...most of the work on tuning
> is on the Postgres side.

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
On Aug 1, 2017, at 3:42 PM, David Adams via 4D_Tech <[hidden email]> wrote:

>
> I'm starting with standard rows with UUIDs, strings, longs, reals, and
> perhaps text. I need to go for maximum speed...most of the work on tuning
> is on the Postgres side. Their high-speed entry command is called COPY IN
> and Rob's plug-in supports it with a special command that takes the data as
> a BLOB. In this case, I don't have any big types, binary formats, or JSON
> to move. Since you bring up wire formats, I'm just thinking - 4D natively
> stores all text as UTF16 and my range of values all fit comfortably in
> Latin1. Character encodings is one of those things that I just have never
> take the time to get my head around. Does UTF16 take more space to store
> Latin1 data? It seems like it needn't, and yet when I put text into BBEdit,
> it takes 2x space as UTF16 than as UTF8.

Is the server on the same LAN as the client? If not, you may want to consider chunking your COPY data in case of network failure. Also, in this case, wire speed is likely to be the bottleneck. It may also be faster to build a UTF8 file and use LEP/psql to copy the data.

If this is a one-shot data transfer, as opposed to a scenario where a 4D app generates the data that will be regularly transferred to the Postgresql database, then the simplest/quickest way to generate data files that can be copied to the Postgresql server is probably the right answer.

I haven’t seen if anyone has tested yet whether expanding a blob becomes significantly slower as the blob expands into the gigabytes range, vs pre-sizing the blob.

> Note: COPY is the go-to bulk load optimization in Postgres, but multi-value
> inserts are also supposed to be a big help. I'm working through it now and
> can post some summary findings, if people are interested.

COPY is supposed to be many times faster than even multi-row inserts, this StackOverflow answer has a short explanation:

https://stackoverflow.com/a/32045034/980575

Jim Crate

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
John,

Great idea - I was planning on trying out files after going with the
plug-in. I hadn't thought of your idea of parallelizing the task via files,
that's genius. Unfortunately, I think that these will be cooperative
threads as both plug-ins and LEP aren't accessible from a pre-emptive
thread. But if feels like something should be possible. What about setting
up several files and then just making command-lien calls to \copy? At that
point, the client work is handled by the PG CLI. Hope that made sense....
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
Jim,

Thanks for the suggestions, they're excellent.

Partly, I'm using this current task as an opportunity to dig into the
subject of bulk imports in Postgres. So far, I've figured out:

* Prepared statements don't deliver any real speed gain.

* Yes, bundling operations into transactions with explicit BEGIN; and
COMMIT; statements helps a lot. Shipping versions of Postgres use a
transaction invisibly for each statement, which makes lots of individual
inserts a bit expensive.

* DROP index before import and CREATE index after import is a real
time-saver. It's not hard to automate the process. I have decided to leave
unique indexes in place as, well, they're important and otherwise you have
to drop table constraints. And as the docs say, "think twice."

I had to spend a bit of time to work out the full UPSERT (INSERT ON
CONFLICT) syntax in Postgres. At first my reactin was, eh? I was used to
MySQL where it's pretty easy. In Postgres? A lot more complicated-looking.
Once I understood the reasoning, I was pretty happy with what Postgres
offers in this respect. You can, for example, do something like this:

-- Add this row if it doesn't exist
-- If it does exist, update it with the new data
-- unless the new data is the same as the old data

It's pretty easy to do, particularly with some advanced planning. (I'm
using a checksum on the values to distinguish between changed and unchanged
rows.)

Next up was to retest multi-value inserts and COPY...and then look at
files, but you guys are getting me excited about files now.

There's likely to be a lot of importing on a regular basis, so it seems
fruitful to check out the details now. Ultimately, Postgres is just another
piece of infrastructure/plumbing that I just want to get out of the way.
But given the potential scale of the data, ignorant design choices could be
pretty painful to redo.

For the record
4D is for data grinding and prep - raw transactional/relational data.
Postgres is serving as a data warehouse with summary data and extracts
Analytics will be done by something else

So, it's a data pipeline with 4D at the first step (importing, cleaning and
collating source data) and Postgres at the next step (storing and exposing
reporting/analysis data.)

Loving the great answers, much appreciated!
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
>
> COPY is supposed to be many times faster than even multi-row inserts, this
> StackOverflow answer has a short explanation:
>
> https://stackoverflow.com/a/32045034/980575


For the sake of the archives, I thought I'd post a couple of more links on
this subject to augment Jim's link:

Depesz seems to be much respected in the Postgres world and his advice is
still considered sound:
https://www.depesz.com/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/

The official docs have a section on this subject too:
https://www.postgresql.org/docs/9.6/static/populate.html

Note that Postgres has been around for ages (since the 1980s) and so it's
easy to find threads with less than up-to-the-minute advice. I'm using the
current shipping version, 9.6.3. Version 9.5 added some substantial new
features (like INSERT ON CONFLICT) and various optimizations. So, if you're
reading this from the future, check out materials that take into account
whatever version you're using there.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
On Aug 1, 2017, at 4:55 PM, David Adams via 4D_Tech <[hidden email]> wrote:
>
> Great idea - I was planning on trying out files after going with the
> plug-in. I hadn't thought of your idea of parallelizing the task via files,
> that's genius. Unfortunately, I think that these will be cooperative
> threads as both plug-ins and LEP aren't accessible from a pre-emptive
> thread. But if feels like something should be possible. What about setting
> up several files and then just making command-lien calls to \copy? At that
> point, the client work is handled by the PG CLI. Hope that made sense….

If you’re using pre-emptive workers in 4D to generate the data into files, you could also generate a shell script that would run psql like this:

my_command > output.log 2>&1 &

and then LEP that shell script. That will run the command detached to a background process, so LEP should return instantly. All output from my_command will go to output.log. I haven’t tested this; background processes will die when the parent process dies, so if 4D does something funny with LEP it might not work.

You would use psql \copy from the command line like this:

psql -c "\copy tbname from '/tmp/the_file.csv' delimiter '|' csv"

See https://stackoverflow.com/a/42726226/980575 for more info.


Jim Crate

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
Jim,

Awesome, I'll check out that SO thread. Back to your idea, you can't use
LEP from within a pre-emptive process, it just isn't supported. But given a
large enough task (one worth parallelizing), it's not hard to imagine the
cost of dividing work and marshaling results could pay for itself:

Cooperative    Kick off the job
                        Figure out how to divide the task (record numbers
or what have you)
                       Launch a bunch of pre-emptive workers with
instructions on what to do

Pre-emptive   Each worker loads the right records
                      --- exports them to a unique file
                      --- signals that they're done

Cooperative   Use LEP to launch a CLI call to PG to import each of the files

The bulk of the 4D work, loading and processing the records, would then be
done on multiple cores. That's an awesome idea. For the signaling, you
could use CALL FORM or CALL WORKER, or write to a status table. The master
coordinator could receive notification/scan for outcomes and start pushing
data to Postgres as it's ready.

That definitely sounds workable. And Postgres COPY IN supports TSV, so that
makes me happy....
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Fastest way to get text into a BLOB?

4D Tech mailing list
In reply to this post by 4D Tech mailing list
Awesome, thanks!

On Tue, Aug 1, 2017 at 17:04 John DeSoi <[hidden email]> wrote:

> David,
>
> You just need LEP at the end after the file is built, so you can just
> launch that in another process that returns immediately. On the Mac I use
> something like the line below to create an executable file and launch it
> with Terminal to monitor the import.
>
> TERMINAL ("psql < "+Convert path system to POSIX($path)+LF)
>
> John DeSoi, Ph.D.
>
>
> > On Aug 1, 2017, at 3:55 PM, David Adams via 4D_Tech <
> [hidden email]> wrote:
> >
> > Great idea - I was planning on trying out files after going with the
> > plug-in. I hadn't thought of your idea of parallelizing the task via
> files,
> > that's genius. Unfortunately, I think that these will be cooperative
> > threads as both plug-ins and LEP aren't accessible from a pre-emptive
> > thread. But if feels like something should be possible. What about
> setting
> > up several files and then just making command-lien calls to \copy? At
> that
> > point, the client work is handled by the PG CLI. Hope that made sense....
>
>
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************
Loading...