Category: Regex Programming
Using regex to parse arguments

I'm working on parsing a string from an RFC, and I can't get my regex to work, so I've written a small Java program to test it. I don't understand the results, so I can't figure out what I'm doing wrong.

The applicable section deals with a "type=" string.

The regex that I'm using is:

type=(HOME|WORK|PREF|MSG|CELL)(,(HOME|WORK|PREF|MSG|CELL))*(;type=(HOME|WORK|PREF|MSG|CELL)(,(HOME|WORK|PREF|MSG|CELL))*)*

The spec says there can be either a series of type=X parameters separated by semicolons,
type=X;type=Y;type=Z
or a single type= parameter with a comma-separated list of values,
type=X,Y,Z
where X, Y and Z are keywords.
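To check that the regex above accepts both forms (and rejects an unknown keyword), a quick sketch could run Matcher.matches() on the parameter part alone. The class and method names here are hypothetical, and the sketch assumes the parameters have already been cut off from the ":value" part.

```java
import java.util.regex.Pattern;

public class TypeFormsCheck {
    // Same keyword list and structure as the regex in the question
    static final String KEYWORDS = "HOME|WORK|PREF|MSG|CELL";
    static final String SEQ = "type=(" + KEYWORDS + ")(,(" + KEYWORDS + "))*";
    static final Pattern FULL = Pattern.compile(SEQ + "(;" + SEQ + ")*", Pattern.CASE_INSENSITIVE);

    // matches() succeeds only if the whole string fits the pattern
    // (hypothetical helper; assumes params holds only the "type=..." part)
    static boolean accepts(String params) {
        return FULL.matcher(params).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("type=CELL;type=WORK;type=pref")); // true: semicolon form
        System.out.println(accepts("type=CELL,pref,msg"));            // true: comma form
        System.out.println(accepts("type=INTERNET;type=WORK"));       // false: INTERNET is not a listed keyword
    }
}
```

Matching alone works for both forms; the trouble only starts when reading the individual values back out of the groups.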


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TelTypeTest {
    private static final String teltypesarg = "HOME|WORK|PREF|MSG|CELL";
    private static final String teltypeseq = "type=(" + teltypesarg + ")(,(" + teltypesarg + "))*";
    private static final String teltypefull = teltypeseq + "(;" + teltypeseq + ")*";
    static final Pattern teltypesPat = Pattern.compile(teltypefull, Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String[] tests = {
            "type=CELL,pref:(301) 996-1054",
            "type=INTERNET;type=WORK;type=pref:jiabr@comcast.net",
            "type=CELL,pref,msg:(703) 304-8914",
        };
        System.out.println(teltypefull);
        for (String s : tests) {
            System.out.println(s);
            Matcher m = teltypesPat.matcher(s);
            if (m.find()) {
                // Print every capturing group (7 in total) for the first match
                for (int j = 1; j <= m.groupCount(); j++) {
                    System.out.println("gc: " + j + " = " + m.group(j));
                }
            }
        }
    }
}


It seems to work fine for the "type=X;type=Y" form, but the output doesn't do a proper greedy match for a series of keywords separated by commas, such as:


type=CELL,pref,msg:(703) 304-8914
gc: 1 = CELL
gc: 2 = ,msg
gc: 3 = msg
gc: 4 = null
gc: 5 = null
gc: 6 = null
gc: 7 = null
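One way to see what is going on is to print group 0 (the whole match) for the same input. The sketch below reuses the pattern from the question; the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeMatchDemo {
    // Same pattern as teltypefull in the question
    static final String KEYWORDS = "HOME|WORK|PREF|MSG|CELL";
    static final String SEQ = "type=(" + KEYWORDS + ")(,(" + KEYWORDS + "))*";
    static final Pattern FULL = Pattern.compile(SEQ + "(;" + SEQ + ")*", Pattern.CASE_INSENSITIVE);

    // Returns the entire text matched by find() (group 0), or null if nothing matched
    static String wholeMatch(String s) {
        Matcher m = FULL.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Prints "type=CELL,pref,msg": the repetition consumed the whole comma list,
        // only groups 2 and 3 inside (...)* are left holding the last iteration
        System.out.println(wholeMatch("type=CELL,pref,msg:(703) 304-8914"));
    }
}
```

So the match itself is greedy; what gets lost is the earlier iterations of the repeated groups.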

Thanks
pat

Java's regex engine (java.util.regex) isn't actually PCRE, but it shares one of PCRE's limitations: a repeated group such as

( pattern )*
only captures the last time it matches.
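A minimal illustration of that behaviour with java.util.regex, using a toy pattern (\w)+ rather than the vCard one; the class name is just for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastIterationDemo {
    // Returns {group 0, group 1} for the first match of (\w)+ in s, or null if none
    static String[] groups(String s) {
        Matcher m = Pattern.compile("(\\w)+").matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("abc");
        System.out.println("group 0 = " + g[0]); // abc: the + consumed all three letters
        System.out.println("group 1 = " + g[1]); // c: the group kept only its last iteration
    }
}
```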

You could try this instead for a comma-separated list; it captures the whole list in one group, because the inner (?: ) group is non-capturing:

( pattern (?: , pattern )* )
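Applied to the keywords in the question, that idea might look like the sketch below: capture each whole type= list in one group, loop with find() to pick up every type= parameter, and split each captured list on commas. The class and method names are hypothetical, and the keyword list is the one from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeListCapture {
    static final String KEYWORD = "(?:HOME|WORK|PREF|MSG|CELL)";
    // One capturing group around the whole list; the repetition inside it is non-capturing
    static final Pattern TYPE_LIST = Pattern.compile(
            "type=(" + KEYWORD + "(?:," + KEYWORD + ")*)", Pattern.CASE_INSENSITIVE);

    // Collects the keywords from every type=... list found in s
    static List<String> keywords(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TYPE_LIST.matcher(s);
        while (m.find()) {
            // group(1) holds the full comma-separated list, e.g. "CELL,pref,msg"
            result.addAll(Arrays.asList(m.group(1).split(",")));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(keywords("type=CELL,pref,msg:(703) 304-8914"));                  // [CELL, pref, msg]
        System.out.println(keywords("type=INTERNET;type=WORK;type=pref:jiabr@comcast.net")); // [WORK, pref]
    }
}
```

Because find() is called in a loop, the semicolon form is handled by repeated matches instead of a repeated group, so nothing is overwritten.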

Though if I were doing it, I would use String.split() first on the semicolon, then on the equals sign, then on the comma.
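A rough sketch of the split-based approach, assuming the parameters end at the first ':' (everything after it being the value, as in the test strings). The class and method names are hypothetical, and unlike the regex this does not check values against the keyword list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SplitTypes {
    // Assumption: parameters end at the first ':'; the rest of the line is the value
    static List<String> types(String line) {
        int colon = line.indexOf(':');
        String params = colon >= 0 ? line.substring(0, colon) : line;
        List<String> result = new ArrayList<>();
        for (String param : params.split(";")) {         // e.g. "type=X,Y"
            String[] nameValue = param.split("=", 2);     // e.g. ["type", "X,Y"]
            if (nameValue.length == 2 && nameValue[0].trim().equalsIgnoreCase("type")) {
                for (String value : nameValue[1].split(",")) {
                    result.add(value.trim().toUpperCase(Locale.ROOT));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(types("type=CELL,pref,msg:(703) 304-8914"));                  // [CELL, PREF, MSG]
        System.out.println(types("type=INTERNET;type=WORK;type=pref:jiabr@comcast.net")); // [INTERNET, WORK, PREF]
    }
}
```

If only the listed keywords should be accepted, each value could be checked against a set afterwards.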









