I'm parsing a string defined in an RFC and can't get my regex to work, so I wrote a small Java program to test it. I don't understand the results, so I can't figure out what I'm doing wrong.
The applicable section deals with a "type=" string.
The spec allows either a series of type=X parameters separated by semicolons,
type=X;type=Y;type=Z
or a single type= parameter with a comma-separated list of values,
type=X,Y,Z
where X, Y, and Z are keywords from a fixed list. Here is my test program:
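To make the two forms concrete, here is a minimal sketch (not taken from the RFC) that reads the keywords out of either form by plain splitting rather than a regex. The class name `TypeForms` and the helper `keywords` are hypothetical, and the sketch assumes the input is only the parameter section, without the value that follows it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TypeForms {
    // Hypothetical helper: collects the keywords from either
    // "type=X;type=Y" or "type=X,Y" (or a mix of the two forms).
    static List<String> keywords(String params) {
        List<String> out = new ArrayList<>();
        for (String param : params.split(";")) {
            // Skip anything that is not a type= parameter (case-insensitive check).
            if (!param.regionMatches(true, 0, "type=", 0, 5)) continue;
            for (String kw : param.substring(5).split(",")) {
                out.add(kw.toUpperCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keywords("type=HOME;type=WORK;type=PREF")); // [HOME, WORK, PREF]
        System.out.println(keywords("type=HOME,WORK,PREF"));           // [HOME, WORK, PREF]
    }
}
```

Both forms yield the same keyword list, which is the result the regex below is meant to reach.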
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TelTypeTest {
    // Keywords allowed as type values
    private static final String teltypesarg = "HOME|WORK|PREF|MSG|CELL";
    // One "type=" followed by one or more comma-separated keywords
    private static final String teltypeseq = "type=(" + teltypesarg + ")(,(" + teltypesarg + "))*";
    // One or more such sequences separated by semicolons
    private static final String teltypefull = teltypeseq + "(;" + teltypeseq + ")*";
    static final Pattern teltypesPat = Pattern.compile(teltypefull, Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String[] tests = {
            "type=CELL,pref:(301) 996-1054",
            "type=INTERNET;type=WORK;type=pref:jiabr@comcast.net",
            "type=CELL,pref,msg:(703) 304-8914",
        };
        System.out.println(teltypefull);
        for (String s : tests) {
            System.out.println(s);
            Matcher m = teltypesPat.matcher(s);
            if (m.find()) {
                // Print every capturing group of the first match
                for (int j = 1; j <= m.groupCount(); j++)
                    System.out.println("gc: " + j + " = " + m.group(j));
            }
        }
    }
}
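For reference when reading the output: in `java.util.regex`, a capturing group that sits inside a repetition such as `(,(X))*` keeps only the text from its last iteration, and returns `null` if the repetition matched zero times. The following is a minimal sketch of that behavior, using a shortened keyword list and the hypothetical names `RepeatedGroupDemo` and `lastRepeated`. It is not a fix for the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Shortened, illustrative keyword list (not the full list from the post).
    static final Pattern P = Pattern.compile(
            "type=(CELL|PREF|MSG)(,(CELL|PREF|MSG))*", Pattern.CASE_INSENSITIVE);

    // Returns what the inner repeated group (group 3) holds after find(),
    // or null if there was no match or the group never participated.
    static String lastRepeated(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Two comma iterations: only the last one ("msg") is retained.
        System.out.println(lastRepeated("type=CELL,pref,msg:(703) 304-8914")); // msg
        // Zero comma iterations: the group did not participate.
        System.out.println(lastRepeated("type=CELL:(301) 996-1054"));          // null
    }
}
```

The whole match still covers `type=CELL,pref,msg`, so `m.group()` shows the full matched text even though the individual groups do not list every keyword.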
It seems to work fine for the "type=X;type=Y" form.
However, the output doesn't show a proper greedy match for a series of keywords separated by commas, such as