Category: Regex Programming
Unique characters only in string

Hi,

I need help with a simple expression - it's driving me nuts though! Basically, I need to match words whose letters are all unique (although they can contain other characters too) - e.g. "bent" would return a match, but "here" wouldn't.

Any ideas? Tons of rep for the winner :tntworth:

Hmm...I don't think it's possible to do this quickly. Plus, it's probably always going to be quicker to loop through the string one character at a time. In PHP:

function has_duplicate_letters( $string ) {
    $len = strlen($string);
    // Compare each character with every character before it.
    for ( $p = 1; $p < $len; $p++ ) {
        for ( $q = 0; $q < $p; $q++ ) {
            if ( $string[$p] === $string[$q] ) return true;
        }
    }
    return false;
}

-Dan
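
For longer strings, the nested loop above ends up comparing every pair of characters. A minimal sketch of a single-pass variant, assuming single-byte characters just like the loop above, could remember which characters it has already seen. The function name has_duplicate_chars_fast is made up for this example:

```php
<?php
// Hypothetical single-pass variant of has_duplicate_letters().
// Assumes single-byte characters, like the byte-indexed loop above.
function has_duplicate_chars_fast( $string ) {
    $seen = array();
    $len = strlen($string);
    for ( $p = 0; $p < $len; $p++ ) {
        $c = $string[$p];
        if ( isset($seen[$c]) ) {
            return true; // this character already appeared earlier
        }
        $seen[$c] = true;
    }
    return false;
}

var_dump(has_duplicate_chars_fast("bent")); // bool(false)
var_dump(has_duplicate_chars_fast("here")); // bool(true)
```

For short words the difference is unlikely to matter; the lookup table only pays off on long inputs.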

This regex will do the trick:


^(?:(.)(?!.*?\1))*$

That regexp may work in languages that support \1 references inside the match condition, which I don't normally use. However, it seems to only check that the first letter is unique, not all of them. Though without experience in that style of expression, I can't tell you.

-Dan
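
For anyone who wants to try the lookahead pattern rather than reason about it, here is a small sketch in PHP. The test words are just examples. Because the group (.)(?!.*?\1) is repeated by the trailing *, every character is captured in turn and checked against the rest of the string, not only the first one:

```php
<?php
// Pattern from the post above: each captured character must not occur again later.
$pattern = '/^(?:(.)(?!.*?\1))*$/';

foreach (array("bent", "here", "foo", "bar") as $word) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $unique = preg_match($pattern, $word) === 1;
    echo $word, ": ", ($unique ? "all characters unique" : "has a repeated character"), "\n";
}
```

Note that . does not match a newline by default, so this sketch assumes single-line input.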

You don't normally use backreferences?

What Cavemann posted works - try it. Kinda complicated though.

(.)[^\1]*\1
[edit] This expression tests if something has a repeated character. If you want to test for uniqueness then just negate the result you get. In PHP:

if (!preg_match('/(.)[^\1]*\1/', $text)) {
    // each character in $text is unique
}

...

What Cavemann posted works - try it.

Err, "Cavemann"?


Kinda complicated though.

(.)[^\1]*\1

Most PCRE regex engines I know don't accept back references inside character sets. May I ask how you tested yours (in what language and with what input)?
Besides, if that had worked, you only seem to be checking if one character is repeated once, not what the OP is looking for (checking if all characters are unique).

That regexp may work in languages that support \1 references inside the match condition, which I don't normally use.

I see you are familiar with PHP, which supports back references. Note that (nearly) all PCRE regex engines (like PHP's preg-functions) support them.


However, it seems to only check that the first letter is unique, not all of them.

No, that is not correct.


Though without experience in that style of expression, I can't tell you.

-Dan

No offence, but before commenting on something you don't fully understand, perhaps you should first try it?

Err, "Cavemann"?
Sometimes when I refer to people I call them by a different name. For fun. No bad feelings.

The name Prometheus reminds me of an old clay animation called Prometheus and Bob (http://en.wikipedia.org/wiki/Prometheus_and_Bob). While the caveman is actually Bob (the alien is Prometheus) for some reason I remember it the other way around.
Since you repeated the last character in your name I repeated the last in mine too.

Thus "prometheuzz" -> "Cavemann" (capitalized because it's a name) :)

Yeah, I'll admit that was a bit of a stretch. Most of the time it's more obvious (like E-Oreo becoming just Oreo).


Most PCRE regex engines I know don't accept back references inside character sets. May I ask how you tested yours (in what language and with what input)?
I tested with PHP's preg_ functions (PHP 5.2.8, PCRE 7.8). If I had Perl I would have tried that.

$words = array(
    "there",
    "foo",
    "bar",
    "was not"
);

foreach ($words as $w) {
    echo "$w: ";
    var_dump(preg_match('/(.)[^\1]*\1/', $w));
}

Besides, if that had worked, you only seem to be checking if one character is repeated once, not what the OP is looking for (checking if all characters are unique).
Right. It checks for repeated characters. If it fails this test then all characters are unique.

I have a "check if it's invalid" mentality (as opposed to "check if it's valid") and considering how OP asked for something that does the exact opposite I probably should have mentioned that the result of my regex should be inverted.
(That, and I don't like using lookaheads or lookbehinds if I don't need to.)

PS: In that other thread, when I said "{ and } are special characters" I was simplifying. They're just as special as . * and ? (that is, most of the time but not always).

Ah, I noticed the double "n" in "Cavemann", but didn't know the animation. Thanks for the link. ; )


I tested with PHP's preg_ functions (PHP 5.2.8, PCRE 7.8). If I had Perl I would have tried that.

Hmm, Java's java.util.regex package (which is highly PCRE-like) does not support them. I would have guessed PHP's preg functions wouldn't either, but apparently that is not the case!
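
A note on the [^\1] part, for anyone reading later: according to the PCRE pattern documentation, a backslash followed by digits inside a character class is read as an octal character code, not as a backreference. On that reading, [^\1] means "any character except the byte 0x01", and the pattern still finds repeats only because [^\1]* behaves almost like .* for ordinary text. A small sketch to check this on your own PHP build (the third test string is my own, chosen to tell the two readings apart):

```php
<?php
$pattern = '/(.)[^\1]*\1/';

// Ordinary words: repeats are reported as expected.
var_dump(preg_match($pattern, "there")); // int(1), repeated "e"
var_dump(preg_match($pattern, "bar"));   // int(0), no repeats

// "a" repeats here, but the two copies are separated by the byte 0x01.
// If [^\1] really meant "not the captured character", this would match.
// With \1 read as octal 001 inside the class, it cannot.
var_dump(preg_match($pattern, "a\x01a")); // int(0)
```

So the "was it accepted?" question and the "does it do what it looks like?" question have different answers: the pattern compiles, but the class is not tied to the captured character.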




Right. It checks for repeated characters. If it fails this test then all characters are unique.

I have a "check if it's invalid" mentality (as opposed to "check if it's valid") and considering how OP asked for something that does the exact opposite I probably should have mentioned that the result of my regex should be inverted.
(That, and I don't like using lookaheads or lookbehinds if I don't need to.)

Since your "raw" regex pattern only matched a single repeated character, I couldn't see it working. But it seems that the if() statement in PHP does a bit more than I would think (I know very little PHP...).
Anyway, thank you for your clarification on the nickname and your example!
; )
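
On the "only matched a single character" point: as far as I can tell it is not the if() doing extra work. preg_match() reports whether the pattern matches anywhere in the subject, so a pattern describing one repeated pair is enough to test the whole word. A rough sketch of the difference, using (.).*\1 as a stand-in for the pattern discussed above (my substitution, to keep the character class out of the picture):

```php
<?php
// Unanchored search: succeeds if a repeated pair exists anywhere in the word.
var_dump(preg_match('/(.).*\1/', "there"));   // int(1), "ere" matches inside the word

// Anchored to the whole word: only succeeds if the first and last characters are equal.
var_dump(preg_match('/^(.).*\1$/', "there")); // int(0)
var_dump(preg_match('/^(.).*\1$/', "tart"));  // int(1)
```

An API that requires the whole input to match, such as Java's Matcher.matches(), would behave like the anchored version.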









