Category: Regex Programming
Pulling my hair out..what little I have left! :)

Hi all- Newbie here.

I have a string as follows (for example): "I have visited the U.S., Puerto Rico, Canada, U.S. Virgin Islands and the Carribean."

I want to extract just the country names from that sentence, so: 1) U.S. 2) Puerto Rico 3) Canada 4) U.S. Virgin Islands and 5) Carribean.

I have something like: (United States|U\.S\.|Canada|Puerto Rico|U\.S\. Virgin Islands|Carribean)(,|\s|and|the)+

This matches the following: "U.S., Puerto Rico, Canada, U.S. Virgin Islands and the Carribean."

The problem I have is it matches U.S. twice, where it should only match once (it extracts U.S. from "U.S. Virgin Islands").

There must be a nicer way of doing this. It sucks to be a newb! :)

Thanks for any help.

*cough* The Carribean isn't a country ;)

Can't you just ignore the extra match?

(United States|U\.S\.(?! Virgin Islands)|Canada|Puerto Rico|U\.S\. Virgin Islands|Carribean)
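A minimal sketch of how that lookahead pattern behaves on the example sentence. The thread doesn't say which regex flavor is in use, so Python's `re` module is an assumption here, and the trailing separator group from the original pattern is left out because only the names are wanted.

```python
import re

# Example sentence from the original question (spelling kept as posted).
text = ("I have visited the U.S., Puerto Rico, Canada, "
        "U.S. Virgin Islands and the Carribean.")

# "U\.S\." is only allowed to match when it is NOT followed by
# " Virgin Islands", so the longer name is left for its own alternative.
pattern = re.compile(
    r"United States|U\.S\.(?! Virgin Islands)|Canada|Puerto Rico"
    r"|U\.S\. Virgin Islands|Carribean"
)

matches = pattern.findall(text)
print(matches)
# ['U.S.', 'Puerto Rico', 'Canada', 'U.S. Virgin Islands', 'Carribean']
```

The negative lookahead `(?! Virgin Islands)` consumes no characters; it only rejects the short alternative at that position, so the order of the alternatives no longer matters.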

Thanks...I knew there was a simple way of doing this...I was overcomplicating it!! :) I'm probably going to have lots more questions to post on this board, hopefully ones that aren't so easy.

P.S. Can anyone recommend a good regex editor? I installed a trial version of RegexBuddy on Vista and after the first use, it got corrupted! Any freeware out there?

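For matches such as these, if some of your alternatives are extensions of others (e.g. "U.S. Virgin Islands" is "U.S." plus some more text), you need to put the longer one first. Most regex engines try the alternatives from left to right and take the first one that succeeds, so the longer one has to come before the shorter one to get a chance to match.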
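As a rough illustration of the longest-first ordering, here is a sketch that assumes Python's `re` module (the thread doesn't name a regex flavor). The names `names`, `ordered` and `found` are made up for the example. Sorting by length is just one easy way to guarantee that any name which extends another comes before it.

```python
import re

text = ("I have visited the U.S., Puerto Rico, Canada, "
        "U.S. Virgin Islands and the Carribean.")

# Names to look for, in no particular order (spelling kept as posted).
names = ["United States", "U.S.", "Canada", "Puerto Rico",
         "U.S. Virgin Islands", "Carribean"]

# Longest names first, so "U.S. Virgin Islands" is tried before "U.S.".
ordered = sorted(names, key=len, reverse=True)

# re.escape turns the dots into literal dots, so no hand-escaping is needed.
pattern = re.compile("|".join(re.escape(name) for name in ordered))

found = pattern.findall(text)
print(found)
# ['U.S.', 'Puerto Rico', 'Canada', 'U.S. Virgin Islands', 'Carribean']
```

With this ordering no lookahead is needed: at the position of "U.S. Virgin Islands" the long alternative is tried first and wins, while the plain "U.S." earlier in the sentence still falls through to the short alternative.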









