Help with Regex

4D Tech mailing list
I need some help with two regular expressions.

I have to process some files in groups from a list of files. Here is an example of the file names:

SP_050317
SP_050417
SP_050517
SP_INS_050317
SP_INS_050417
SP_INS_050517

I need to process the files that start with “SP_” and then the date as one group.

I need to process the files that start with “SP_INS_” and then the date in another group.

I will have a text array that contains all the file names. I could write some brute-force code to accomplish this, but I’d like to use regex so that I have some flexibility if the file naming scheme changes in the future. Then I’ll only have to change the regex pattern.

What regex pattern would I use with "Match regex" for the first group of files? And then what pattern to use for the second group of files?

Example Code:

$regexPattern_t:="what do I put here"

DOCUMENT LIST($folderPath_t;$documentNames_at)
For ($i;1;Size of array($documentNames_at))
   $documentName_t:=$documentNames_at{$i}
   $processFile_b:=Match regex($regexPattern_t;$documentName_t;1;$foundPosition_l;$foundLength_l)
   If ($processFile_b)  // need to process this file
        // do some things to the file
   End if
End for


Tim

********************************************
Tim Nevels
Innovative Solutions
785-749-3444
[hidden email]
********************************************

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************

Re: Help with Regex

4D Tech mailing list
Tim,

“SP_[0-9]*” will match the first lot, and “SP_INS_[0-9]*” will match the rest.

But it might be quicker to do the second batch first, matching the names simply to “SP_INS_@” and removing them from the list when they’re done, then re-scanning for “SP_@” to do the rest. Just a thought. Regular expressions are fantastic but can be overkill.
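
One caveat, offered as a sketch rather than a definitive answer: as far as the 4D documentation describes it, the five-parameter form of Match regex used in the question searches for the pattern anywhere in the string, and [0-9]* also accepts zero digits. An unanchored SP_[0-9]* would therefore also find the leading "SP_" of an SP_INS_ name. Anchoring the pattern and requiring at least one digit avoids that. The variable names below are hypothetical.

```4d
C_LONGINT($pos_l;$len_l)
C_BOOLEAN($loose_b;$strict_b)

  // Unanchored, zero digits allowed: finds "SP_" at position 1
$loose_b:=Match regex("SP_[0-9]*";"SP_INS_050317";1;$pos_l;$len_l)  // expected True

  // Anchored at the start, at least one digit required: "I" is not a digit
$strict_b:=Match regex("^SP_[0-9]+";"SP_INS_050317";1;$pos_l;$len_l)  // expected False
```

The two-parameter form, Match regex(pattern;string), is documented as requiring the whole string to match, which is another way to get the stricter behaviour.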

Jeremy


Jeremy Roussak
[hidden email]




Re: Help with Regex

4D Tech mailing list
In reply to this post by 4D Tech mailing list
On Mon, May 29, 2017 at 8:29 PM, Tim Nevels via 4D_Tech <
[hidden email]> wrote:

> $regexPattern_t:="what do I put here"
>

Tim,

Try

SP_(\D{3}_|)(\d{2})(\d{2})(\d{2})

With properly escaped backslashes, that is

$regexPattern_t:="SP_(\\D{3}_|)(\\d{2})(\\d{2})(\\d{2})"

This should capture four groups: the first is either empty or INS_ (with the trailing _, which I was too lazy to remove), followed by three groups of two digits each.
Test it with the following code:
ARRAY TEXT($at;6)

$at{1}:="SP_050317"
$at{2}:="SP_050417"
$at{3}:="SP_050517"
$at{4}:="SP_INS_050317"
$at{5}:="SP_INS_050417"
$at{6}:="SP_INS_050517"

ARRAY LONGINT($start;0)
ARRAY LONGINT($len;0)

For ($i;1;6)
  $b:=Match regex("SP_(\\D{3}_|)(\\d{2})(\\d{2})(\\d{2})";$at{$i};1;$start;$len)
  If ($b)
      // element 0 holds the whole match; elements 1 to 4 hold the groups
    $prefix:=Substring($at{$i};$start{1};$len{1})
    $day:=Substring($at{$i};$start{2};$len{2})
    $month:=Substring($at{$i};$start{3};$len{3})
    $year:=Substring($at{$i};$start{4};$len{4})
  End if
End for
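
The first capture group can also decide which of the two groups a file belongs to, so one pattern serves both. A minimal sketch, assuming that an empty group is reported with a length of 0 in the length array and that the names carry no extension; the variable names are hypothetical.

```4d
C_TEXT($name_t;$prefix_t)
ARRAY LONGINT($start_al;0)
ARRAY LONGINT($len_al;0)

$name_t:="SP_INS_050317"
If (Match regex("^SP_(\\D{3}_|)\\d{6}$";$name_t;1;$start_al;$len_al))
  If ($len_al{1}=0)
      // empty first group: plain SP_ file (assumes length 0 is reported)
  Else
      // non-empty first group, here "INS_": second group of files
    $prefix_t:=Substring($name_t;$start_al{1};$len_al{1})
  End if
End if
```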

--

Peter Bozek

Re: Help with Regex

4D Tech mailing list
In reply to this post by 4D Tech mailing list
On Mon, May 29, 2017 at 8:29 PM, Tim Nevels via 4D_Tech <
[hidden email]> wrote:

> $regexPattern_t:="what do I put here"
>

With a tiny bit of playing,

$regexPattern_t:="SP_(\\D{3}|)(?>_|)(\\d{2})(\\d{2})(\\d{2})"

captures the prefix (INS) without the "_".

--

Peter Bozek

Re: Help with Regex

4D Tech mailing list
In reply to this post by 4D Tech mailing list
Tim -

> On May 29, 2017, at 3:00 PM, [hidden email] wrote:
>
> Subject: Help with Regex


If you have flexibility to process file names on Mac & outside of 4D…

http://renamer.com

____________________________
David Eddy
Babson Park, MA


Re: Help with Regex

4D Tech mailing list
In reply to this post by 4D Tech mailing list
On May 29, 2017, at 9:00 PM, Jeremy Roussak wrote:

> “SP_[0-9]*” will match the first lot, and “SP_INS_[0-9]*” will match the rest.
>
> But it might be quicker to do the second batch first, matching the names simply to “SP_INS_@” and removing them from the list when they’re done, then re-scanning for “SP_@” to do the rest. Just a thought. Regular expressions are fantastic but can be overkill.

Thanks Jeremy, that's exactly what I was looking for.  

I can't easily control the order of execution for each group or the arrival times for the files. I am currently doing the simple "SP_@" and "SP_INS_@" and am having problems.  So that's why I want to switch to using regex. Then I can just grab all file names in a folder and process either group at any time and always ensure I only process the files I want.
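
Since the arrival order can't be controlled, one option is a single pass that sorts every name into its own group, whatever order the files show up in. This is only a sketch: $folderPath_t is assumed to be set elsewhere, the array names $spNames_at and $insNames_at are hypothetical, and the patterns are start-anchored variants of Jeremy's.

```4d
C_TEXT($folderPath_t;$spPattern_t;$insPattern_t;$name_t)
C_LONGINT($i;$pos_l;$len_l)
ARRAY TEXT($documentNames_at;0)
ARRAY TEXT($spNames_at;0)
ARRAY TEXT($insNames_at;0)

  // Only these two lines change if the naming scheme changes
$spPattern_t:="^SP_[0-9]+"
$insPattern_t:="^SP_INS_[0-9]+"

DOCUMENT LIST($folderPath_t;$documentNames_at)
For ($i;1;Size of array($documentNames_at))
  $name_t:=$documentNames_at{$i}
  Case of
    : (Match regex($insPattern_t;$name_t;1;$pos_l;$len_l))
      APPEND TO ARRAY($insNames_at;$name_t)
    : (Match regex($spPattern_t;$name_t;1;$pos_l;$len_l))
      APPEND TO ARRAY($spNames_at;$name_t)
  End case
End for
```

Because each pattern requires a digit right after its prefix, a name can land in at most one array and the order of the two cases does not matter.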

Tim

Sent from my iPad

Re: Help with Regex

4D Tech mailing list
In reply to this post by 4D Tech mailing list
JR's solution works. Some would argue that explicitly requiring the six digits is more robust, in case a slightly misnamed file exists.

Therefore
^SP_\d\d\d\d\d\d

^SP_INS_\d\d\d\d\d\d


iPhone RRL


Re: Help with Regex

4D Tech mailing list
Fair point, if you’re sure there will always be six digits. \d{6} will work, too.
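
For reference, here is how the \d{6} form might look inside 4D string literals, where each backslash has to be doubled. It is a sketch under the assumption that the names have no extension; without the closing $ a name with a seventh digit would still be accepted.

```4d
C_LONGINT($pos_l;$len_l)
C_BOOLEAN($ok_b)

  // Exactly six digits, then end of string
$ok_b:=Match regex("^SP_\\d{6}$";"SP_050317";1;$pos_l;$len_l)  // expected True
$ok_b:=Match regex("^SP_\\d{6}$";"SP_0503171";1;$pos_l;$len_l)  // expected False: seven digits

  // Same pattern without the end anchor
$ok_b:=Match regex("^SP_\\d{6}";"SP_0503171";1;$pos_l;$len_l)  // expected True
```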


Jeremy Roussak
[hidden email]



> On 30 May 2017, at 00:19, Robert Livingston <[hidden email]> wrote:
>
> JR's solution works. Some would argue that explicitly requiring the six digits is more robust, in case a slightly misnamed file exists.
>
> Therefore
> ^SP_\d\d\d\d\d\d
>
> ^SP_INS_\d\d\d\d\d\d
>
>
> iPhone RRL