|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Fri Oct 03, 2008 11:56 am
Error with Regex |
So, I've never used {#,#} on my own before, and apparently I'm using it wrong.
Code: |
ID: Fireproof off
Pattern: ^([\w'- ]+){1,6} protective aura fades.$
#say %1's Fireproof has dropped!
Regular Expression
ID: Fireproof-m
Pattern: ^([\w'- ]+){1,6} Fireproof has dropped!$
#color white
#CAP Capture Window
Regular Expression
|
What exactly am I doing wrong to cause this error? (doesn't work for "a divine ring of protection's protective aura fades.") |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Fri Oct 03, 2008 6:45 pm |
The problem is actually with "[\w'- ]". Inside a character class "[]", a dash denotes a range covering every character between its two endpoints. For example, "[a-z]" means the same as "[abcdefghijklmnopqrstuvwxyz]". Here "'- " is read as a range running from the apostrophe to the space, which is not what you intended. To use a literal dash within a class you must either escape it or place it as the first or last character of the class. The corrected class would be either "[\w' -]" or "[\w'\- ]".
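To see the difference outside the client, here is a minimal sketch in Python. It assumes Python's re module is a fair stand-in for the PCRE engine the triggers use; the helper name compiles is made up for the illustration. Python rejects the original class outright because the range from the apostrophe to the space runs backwards, while both corrected forms compile.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the engine accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

broken = r"[\w'- ]+"          # "'- " is parsed as a range from ' to space
fixed_last = r"[\w' -]+"      # dash is the last character, so it is literal
fixed_escaped = r"[\w'\- ]+"  # dash is escaped, so it is literal

results = {
    "broken": compiles(broken),
    "fixed_last": compiles(fixed_last),
    "fixed_escaped": compiles(fixed_escaped),
}

# The corrected class covers the whole item name, spaces included.
match = re.fullmatch(fixed_last, "a divine ring of protection's")
print(results, match is not None)
```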
The {x,y} quantifier repeats the preceding item a minimum of x times and a maximum of y times. For it to be effective here you have to remove the space from the class, because the class is already repeated by "+". The plus quantifier is the same as {1,}, and means minimum 1, maximum as many as possible. You want to limit your matching to 6 full words as an accelerator, but that can only be done by moving the space out of the class and making it the last item inside the group controlled by the {x,y} repetition. So far the modification is "([\w'-]+ ){1,6}".
Then you want the capture to actually contain all the words matched. As it stands at "([\w'-]+ ){1,6}", each repetition discards the previous capture value. This is because both the opening and closing parentheses of the capture are repeated. To fix this, use a non-capturing group for the repetition and place the capturing group outside of it: "((?:[\w'-]+ ){1,6})"
Finally, this means the space at the beginning of the literal text " Fireproof has dropped!" has already been captured. You could make that space optional, but it is much faster to trim it from your captured data. The final pattern is "^((?:[\w'-]+ ){1,6})Fireproof has dropped!$".
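A rough illustration of the capture behaviour described above, again using Python's re module as an assumed stand-in for the trigger engine. The first pattern repeats the capturing group itself, so only the last word survives; the second wraps a non-capturing repeat in one outer capture and then trims the trailing space.

```python
import re

line = "a divine ring of protection's Fireproof has dropped!"

# The capturing group is what gets repeated: each pass overwrites the last.
repeated = re.match(r"^([\w'-]+ ){1,6}Fireproof has dropped!$", line)

# The non-capturing group repeats; the outer group captures the whole run.
wrapped = re.match(r"^((?:[\w'-]+ ){1,6})Fireproof has dropped!$", line)

last_word_only = repeated.group(1)
all_words = wrapped.group(1)
trimmed = all_words.strip()  # plays the role of trimming in the trigger script

print(repr(last_word_only))
print(repr(all_words))
print(repr(trimmed))
```

The untrimmed capture ends in a space because the space now lives inside the repeated group.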
Why you marked it as a capture with the code you are presenting is beyond me, but since you did I made sure to explain all those details as well. A small speed gain is available whenever a non-capturing group is used instead of a capturing group. |
|
_________________ The only good questions are the ones we have never answered before.
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Fri Oct 03, 2008 6:57 pm |
Thank you, I think I understand that better now. Time to tackle input triggers!
|
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Sat Oct 18, 2008 5:58 pm |
Code: |
^(([\w'-]+){1,2}) pages: .* |
So, I got this to work, then tried to make it {1,3} and added a . to the class, making it [\w'-.], and I managed to break it so it doesn't work. I checked the regex site and it showed me what each piece is used for, but not how to troubleshoot it. I've tried everything I can think of to get it to work. This is what it should capture:
Erin Macquarie pages: ooh,
This is why I tried to add 3 and a .
Monet St. Croix
It's maddening that it worked before and that I tried correcting it in a way that should have been legal, can anyone help me? |
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Sat Oct 18, 2008 10:52 pm |
First change: the inner parentheses should be non-capturing. This improves the speed slightly.
^((?:[\w'-]+){1,2}) pages: .*
Next is to add the additional character; specifically the period.
^((?:[\w.'-]+){1,2}) pages: .*
Now we have to understand what "[\w.'-]+" will match. It covers as many characters as possible from this set: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz.'-". So it will match "St.", "Croix", "Erin", or "Macquarie", etc. It does not match a space. The space is a separate item, and it is the entire reason to use the repeat range.
^((?:[\w.'-]+ ){1,2}) pages: .*
Putting the space inside the repeated group means it needs to be removed from the outside.
^((?:[\w.'-]+ ){1,2})pages: .*
Now you can set the repeat numbers to whatever you want; you just have to remember to %trim off the extra space in your code. |
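As a quick check of the finished pattern, here is a small Python sketch with the period kept in the class and the repeat raised to {1,3}. Python's re module stands in for the trigger engine, the helper name pager_name is invented for the example, and the text after "pages:" in the second and third sample lines is made up.

```python
import re

# Up to three "word + space" units, then the literal text with no leading space.
PAGES = re.compile(r"^((?:[\w.'-]+ ){1,3})pages: .*")

def pager_name(line: str):
    """Return the trimmed name in front of 'pages:', or None if no match."""
    m = PAGES.match(line)
    return m.group(1).strip() if m else None

two_words = pager_name("Erin Macquarie pages: ooh,")
three_words = pager_name("Monet St. Croix pages: hello")
four_words = pager_name("One Two Three Four pages: hello")  # over the limit

print(two_words, "|", three_words, "|", four_words)
```

The four-word line returns None because the pattern is anchored at the start and allows at most three words before "pages:".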
|
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Sun Oct 19, 2008 3:47 am |
I knew I needed to place a space in there, but I couldn't get it to work right. I'm going to repeat it in my head a little and get it right. Sorry.
|
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Mon Oct 20, 2008 4:30 pm |
What wonderful fun, the space is the culprit again, but for a wholly different reason. In this particular pattern I need the space for between multiple words but not at the end, so that the colon is right after the last character of the last word. I cannot think of a way of delimiting the spaces like that, but I'm certain it has to do with how I arrange the space or how I tell it to repeat... does anyone know, or is there some reading that speaks specifically to this situation?
^Long distance to ((?:[\w'-.]+ ){1,3}): .* |
|
|
|
Rahab Wizard
Joined: 22 Mar 2007 Posts: 2320
|
Posted: Mon Oct 20, 2008 4:44 pm |
In that situation, I would probably remove the curly braces and move the space inside the square bracket pattern:
^Long distance to ([\w'. -]+): .* |
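A hedged sketch of this suggestion in Python (standing in for the trigger engine), with the dash moved to the end of the class so it is read literally. For comparison it also shows one possible way to keep a word limit without a trailing space: repeat "word + space" up to two times and then require a final word. That second pattern is an alternative not given in the thread, and the sample lines are made up.

```python
import re

# Space inside the class: any number of words, no trailing space captured.
LOOSE = re.compile(r"^Long distance to ([\w'. -]+): .*")

# Hypothetical bounded variant: at most two "word + space" units, then a last word.
BOUNDED = re.compile(r"^Long distance to ((?:[\w'.-]+ ){0,2}[\w'.-]+): .*")

def name_from(pattern, line):
    """Return the captured name, or None if the line does not match."""
    m = pattern.match(line)
    return m.group(1) if m else None

short_line = "Long distance to Monet St. Croix: hello there"
long_line = "Long distance to One Two Three Four: hello there"

loose_short = name_from(LOOSE, short_line)
bounded_short = name_from(BOUNDED, short_line)
loose_long = name_from(LOOSE, long_line)
bounded_long = name_from(BOUNDED, long_line)  # four words exceed the limit

print(loose_short, "|", bounded_short, "|", loose_long, "|", bounded_long)
```

With the colon as a fixed delimiter, both forms capture the same name for ordinary input; the bounded form only differs in rejecting names longer than three words.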
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Mon Oct 20, 2008 4:49 pm |
Oh, that would effectively cover multiple words too? What's the point of the curly brackets then? To define a number of words when there isn't a constant following after?
|
|
|
|
Rahab Wizard
Joined: 22 Mar 2007 Posts: 2320
|
Posted: Mon Oct 20, 2008 5:03 pm |
+ means one or more of the previous pattern. I'm not very familiar with the curly brace syntax, but it looks to me like it is used when you want to specify an exact number of repeats, or a range of repeats. In your case, it was matching 1 to 3 instances of the pattern. By removing the curly braces and putting the space in the square brackets, the + will match any number of words separated by apostrophes, hyphens, periods, or spaces. At least, according to my understanding. I use regexes only occasionally.
|
|
|
|
chamenas Wizard
Joined: 26 Mar 2008 Posts: 1547
|
Posted: Mon Oct 20, 2008 5:28 pm |
Right, so I'm thinking curly is for when there isn't a defined constant to continue the pattern like ':', I was just so focused on that because of past experiences using it that I didn't realize I didn't need it, so thanks, I think I understand both methods better now as a result.
|
|
|
|
Vijilante SubAdmin
Joined: 18 Nov 2001 Posts: 5182
|
Posted: Thu Oct 23, 2008 7:28 pm |
The use of the brace syntax is for a controlled repeat. The reason to use it is to cause backtracking to give up an entire word at a time. This provides a small speed increase when the repeated section is followed by a wildcard or list. The regex where I first used this with you had a list following the repeated words. There it was a definite speed enhancement. With this particular regex it is not nearly as useful, but I still wanted to explain the syntax.
|
|