Category: Regex Programming
Grouping has me stuck

I'm using Java's regex libraries, and I can't get grouping to work. Or at least I can't get it to work the way I want.

What I want is to match from the beginning of the string up to, but not including, any number of trailing semicolon characters. I expected that grouping would make the first group hold the characters I want.
But no.

Here is a code snippet:

static final Pattern pat = Pattern.compile("^(.*?);*$");

private static final String[] list = {
    "abc;",
    "N:Berger;Gary;;;",
    "EMAIL;type=INTERNET;type=pref:halberman@alum.mit.edu"};

private void bar(String arg) {
    Matcher m = pat.matcher(arg);
    int count = 0;
    while (m.find()) {
        count++;
        System.out.println("Match number " + count);
        System.out.println("start(): " + m.start());
        System.out.println("end(): " + m.end());
        System.out.println(arg.substring(m.start(), m.end()));
        for (int i = 0; i < m.groupCount(); i++) {
            System.out.println(m.group(i));
        }
    }
}


Any pointers greatly appreciated.

Hi, there are two small points about what you're doing.

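1. Your regular expression ("^(.*?);*$") says: match the shortest string from the start that is followed by zero or more semicolons and then the end of the input. Because of the $ anchor, group 1 already holds everything before the trailing semicolons.

If you wanted only the text before the first semicolon, it would be "^(.*?);.*$": match the shortest string that is followed by a semicolon and then zero or more of any character.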

2. When you do find a match, your group count will be 1, as you have only one set of parentheses, so your loop ...



for (int i = 0; i < m.groupCount(); i++)
{
    System.out.println(m.group(i));
}


...will never show m.group(1). You need to change it to ...



for (int i = 0; i <= m.groupCount(); i++)
{
    System.out.println(m.group(i));
}


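Note: group(0) is the entire match, not a capturing group, and groupCount() does not count it. Here it happens to be the whole string you're testing against, because the pattern is anchored with ^ and $.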
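
To see how the group numbering plays out, here is a minimal sketch that collects groups 0 through groupCount() for both patterns mentioned above, using one of the sample strings from the question. The class name GroupDemo and the helper groups() are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns groups 0..groupCount() of the first match,
    // or an empty list when the pattern does not match at all.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            // groupCount() excludes group 0, so the loop bound must be <=.
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "N:Berger;Gary;;;";
        // Lazy group followed by only semicolons up to the end of the input.
        System.out.println(groups("^(.*?);*$", input));
        // Lazy group that stops at the first semicolon.
        System.out.println(groups("^(.*?);.*$", input));
    }
}
```

With the first pattern, group 1 comes out as N:Berger;Gary. With the second it is N:Berger, since the lazy group stops at the first semicolon. In both cases group 0 is the whole input.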

Thanks, it was the missing <= that threw me.









