Cleaning Text & Replace string speed

4D Tech mailing list
I could not find anything in the archives, so I quickly wrote a method to strip two unwanted but unidentified characters from a body of text (6 MB). The method appended each acceptable character to an output text variable. I stopped it after an hour because the compiled code still had not returned a result.

Since I am always impressed by how fast 4D commands are, I gave "Replace string" a try.
In interpreted mode, the same text file took 37 seconds to strip the two offending characters, and 17 seconds to strip everything except uppercase characters. In compiled mode, both tests took about 4 seconds.

Here's the code.

Keith - CDI

  // ----------------------------------------------------
  // Method: StringOmit
  // -   Uses Replace string with the * parameter (character-code comparison) to clear characters
  // INPUT1: Text - to strip
  // INPUT2: Longint - lowest allowed character code
  // INPUT3: Longint - highest allowed character code
  // INPUT{4}: Longint - additional allowed character codes
  //
  // OUTPUT:  Text - with remaining characters
  // ----------------------------------------------------
C_TEXT($inText;$1;$0)
C_LONGINT($i;$len;$low;$2;$high;$3;$cp;$cc;$start)
C_LONGINT(${4})  // optional additional character codes
C_BOOLEAN($hasAdded)

$inText:=$1
$len:=Length($inText)
$low:=$2
$high:=$3

$cp:=Count parameters
If ($cp>3)
        $hasAdded:=True
        ARRAY LONGINT($added;0)
        For ($i;4;$cp)
                APPEND TO ARRAY($added;${$i})
        End for
End if

$start:=1
$i:=$len+2

If ($hasAdded)  // also test the array of additional characters
        While ($start<=$len) & ($i>($len+1))
                For ($i;$start;$len)
                        $cc:=Character code($inText[[$i]])
                        If ((($cc<$low) | ($cc>$high)) & (Find in array($added;$cc)<1))  // outside the range and not in the extra list
                                $inText:=Replace string($inText;Char($cc);"";*)
                                $start:=$i
                                $i:=$len+2
                                $len:=Length($inText)
                        End if
                End for
        End while
       
Else   // no point in testing an empty array
        While ($start<=$len) & ($i>($len+1))
                For ($i;$start;$len)
                        $cc:=Character code($inText[[$i]])
                        If ($cc<$low) | ($cc>$high)
                                $inText:=Replace string($inText;Char($cc);"";*)
                                $start:=$i
                                $i:=$len+2
                                $len:=Length($inText)
                        End if
                End for
        End while
       
End if

$0:=$inText
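
A usage sketch, assuming the method above is saved as a project method named StringOmit (the name given in its header). The variable names and the sample text are made up for illustration. The first call keeps character codes 32 to 126 plus tab (9) and carriage return (13); the second is a small check on the alphabet.

```4d
  // Assumes the method above is saved as a project method named StringOmit.
C_TEXT($text;$clean;$test)

  // Sample text with two control characters (codes 1 and 7) to strip
$text:="Line 1"+Char(1)+Char(9)+"ok"+Char(13)+Char(7)

  // Keep codes 32..126, plus tab (9) and carriage return (13)
$clean:=StringOmit($text;32;126;9;13)
  // $clean is now "Line 1"+Char(9)+"ok"+Char(13)

  // Keep "k".."m", plus "d" and "y"
$test:=StringOmit("abcdefghijklmnopqrstuvwxyz";\
Character code("k");Character code("m");\
Character code("d");Character code("y"))
  // $test is now "dklmy"
```

Because Replace string is called with *, the comparison is based on character codes, so removing one character does not also remove its uppercase or accented variants.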



**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************

Re: Cleaning Text & Replace string speed

4D Tech mailing list
As a Match regex exercise, you could do:

$test:=Method1 ("abcdefghijklmnopqrstuvwxyz";\
Character code("k");Character code("m");\
Character code("d");Character code("y"))
  //dklmy

>   // ----------------------------------------------------
>   // Method: Method1 (same parameters as StringOmit)
>   // -   Uses Match regex to collect the allowed characters
>   // INPUT1: Text - to strip
>   // INPUT2: Longint - lowest allowed character code
>   // INPUT3: Longint - highest allowed character code
>   // INPUT{4}: Longint - additional allowed character codes
>   //
>   // OUTPUT:  Text - with remaining characters
>   // ----------------------------------------------------
>
> C_TEXT($1;$in;$0;$out)
> C_LONGINT($2;$3)
> C_LONGINT(${4})
>
> C_LONGINT($i;$pos;$len)
> C_TEXT($min;$max;$motif)
>
> $min:="\\u"+Substring(String($2;"&x");3)
> $max:="\\u"+Substring(String($3;"&x");3)
>
> $motif:="["+$min+"-"+$max
>
> For ($i;4;Count parameters)
>         $motif:=$motif+"\\u"+Substring(String(${$i};"&x");3)
> End for
>
> $motif:=$motif+"]+"
>
> $in:=$1
>
> $i:=1
>
> While (Match regex($motif;$in;$i;$pos;$len))
>         $out:=$out+Substring($in;$pos;$len)
>         $i:=$pos+$len
> End while
>
> $0:=$out
>
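
To make the pattern concrete: for the sample call above, the method is expected to build the character class shown below, assuming String(...;"&x") returns four hexadecimal digits after the "0x" prefix for these codes. This is a minimal sketch with the pattern written out by hand rather than generated; the variable names are illustrative.

```4d
  // Pattern written out by hand for: range "k".."m", plus "d" and "y"
C_TEXT($motif;$in;$out)
C_LONGINT($start;$pos;$len)

$motif:="[\\u006B-\\u006D\\u0064\\u0079]+"
$in:="abcdefghijklmnopqrstuvwxyz"
$out:=""
$start:=1

  // Each pass finds the next run of allowed characters and appends it
While (Match regex($motif;$in;$start;$pos;$len))
        $out:=$out+Substring($in;$pos;$len)
        $start:=$pos+$len
End while
  // $out is now "dklmy" (runs found: "d", "klm", "y")
```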




Re: Cleaning Text & Replace string speed

4D Tech mailing list
Even better. With Match regex it takes about a second to remove a small number of characters ($text;32;126;9;13) from this file. However, when only a few characters are kept in the result ($text;65;90;9;13), the time goes up into minutes: 8 compiled and 11 interpreted. I guess it's all about the payload.

Thanks,
Keith - CDI
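
For anyone who wants to repeat the comparison, here is a rough timing sketch. It assumes the two methods from this thread are saved as project methods named StringOmit and Method1; the sample text and the variable names are made up, and the numbers will depend on the text and the machine.

```4d
  // Timing sketch: StringOmit and Method1 are assumed to be the two methods in this thread.
C_TEXT($chunk;$text;$viaReplace;$viaRegex)
C_LONGINT($i;$t0;$msReplace;$msRegex)

  // Build an illustrative sample text (not the 6 MB file discussed above)
$chunk:="Hello World"+Char(9)+"4D"+Char(13)
$text:=""
For ($i;1;1000)
        $text:=$text+$chunk
End for

$t0:=Milliseconds
$viaReplace:=StringOmit($text;65;90;9;13)  // keep A..Z, tab, carriage return
$msReplace:=Milliseconds-$t0

$t0:=Milliseconds
$viaRegex:=Method1($text;65;90;9;13)
$msRegex:=Milliseconds-$t0

ALERT("Replace string: "+String($msReplace)+" ms, Match regex: "+String($msRegex)+" ms")
```

Both calls should return the same text, so comparing $viaReplace and $viaRegex (with Position and the * parameter, since = ignores case) is a quick sanity check before trusting the timings.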

> On Jun 15, 2017, at 9:08 PM, Keisuke Miyako via 4D_Tech <[hidden email]> wrote:
>
> as a match regex exercise, you could do...
