Now the ^ isn't negating because it's not in a class. So how would I negate all those tags (meaning match anything EXCEPT those?)
Also, what's a better alternative to the .* match so that they can't just throw a newline in there and **** things up?
As you note, the negation appears to not be working because '^' serves as a negator for character classes (the [] construct.)
Possibly you could negate the match itself instead? (Change the =~ to !~ ?)
I've read that trying [^>]* will encompass everything (including \n) until the close of the tag; will this help?
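Here is a minimal sketch of both points, not a complete sanitizer. A negative lookahead, (?!...), is one way to say "any tag EXCEPT these" outside a character class, and [^>]* matches newlines where a plain . does not (unless /s is in effect). The allow-list and the sample input below are made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical allow-list, chosen only for this illustration.
my $allowed = qr/(?:b|i|em|strong)/i;

# Sample input: the opening <script> tag has a newline inside the tag itself.
my $html = "<b>bold</b> <script\ntype='x'>alert(1)</script> <i>ok</i>";

# '.' does not match "\n" without the /s modifier, so this pattern misses
# the opening <script ...> tag that spans two lines.
(my $dot = $html) =~ s{<(?!/?$allowed\b).*?>}{}g;

# [^>] is a negated character class, so it matches "\n" as well.
(my $cls = $html) =~ s{<(?!/?$allowed\b)[^>]*>}{}g;

print "$dot\n---\n$cls\n";
```

With the .*? version the two-line script tag survives; with [^>]* both script tags are removed and only the b and i tags remain. This still treats HTML as flat text, so it inherits the usual weaknesses of doing this with a regex.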
Hi,
I am having the same issue and would like to know the answer to this post.
I need a regular expression to strip out all HTML tags EXCEPT the ones I've allowed.
I have made many attempts, but none have succeeded.
Your answer will be highly appreciated.
Regards,
Miss Moon ;)
HTML::TokeParser::Simple does this pretty well.
use HTML::TokeParser::Simple;
my $parser = HTML::TokeParser::Simple->new(string => $html);
my $clean_html = '';
while ( my $token = $parser->get_token ) {
    # Keep text tokens and tags on the allow-list; drop everything else.
    next unless $token->is_text
        || $token->is_tag(qr/^(?:img|[pbuia]|font|strong|em|code|pre|h[1-6])$/);
    $clean_html .= $token->as_is;
}
Another alternative that is better suited to this specific task is HTML::TagFilter; HTML::Restrict looks good too.
Either way, I wouldn't recommend trying to use regular expressions for this.
Hi,
Yes, regular expressions might not be the best solution for this, but in my case I need it to be done with regular expressions. Check the following attempts of mine:
But each has its disadvantages. Any idea of a better solution?
Regards,
Miss Moon
Here you go:
{<(?!i|b|h[1-6]|/i|/b|/h[1-6][\s|>|/])[^>]*>}
Regards ..
Miss Moon
Best not to roll your own regular expression for this. It is better to use a time-tested Perl module such as HTML::Parser or HTML::TokeParser. A hand-rolled regular expression is almost certain to fail on special cases. For instance, Miss Moon, your regexp will not work for tags such as '<h1 class="foo">'.
See http://perldoc.perl.org/perlfaq6.html#How-do-I-match-XML%2C-HTML%2C-or-other-nasty%2C-ugly-things-with-a-regex%3F for an official Perl answer.
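To make "special cases" concrete, here is a small sketch of one of them: a '>' inside a quoted attribute value. The input string is made up for illustration, and the pattern is the generic [^>]* tag stripper rather than anyone's exact regexp from this thread.

```perl
use strict;
use warnings;

# Made-up input: the title attribute legitimately contains a '>' character.
my $html = q{<a href="/x" title="a > b">link</a>};

# Naive "strip every tag" pattern built on [^>]*.
(my $stripped = $html) =~ s/<[^>]*>//g;

# The first match ends at the '>' inside the attribute value, so the rest
# of the tag leaks into the output as text.
print "$stripped\n";
```

A parser-based module tokenizes the quoted attribute value correctly, which is one reason the modules above are the safer choice.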
Also remember Jamie Zawinski's famous saying:
'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.'