Category: Regex Programming

Extracting from HTML using PHP

Hi,
I am completely new to regular expressions, but I think I need one to extract some information from an HTML table. Could somebody give me some pointers?

Here is the HTML:

<table width="430" border="0" cellpadding="4"> <tr> <td width="122" valign="top"><img src="graphics/ice_logo_1.gif"></td> <td width="130" valign="top"><p class="style3"><strong>Home Broadband</strong><br /><span class="tool">Wireless <a href="" title=""> <img src="graphics/tooltip.gif" width="10" height="10" border="0"></a></span><br /><em>Cont </em> (36:1) <a href="" title=""><img src="graphics/tooltip.gif" width="10" height="10" border="0"></a><br> </p></td> <td width="100" valign="top"><span class="style3"> <em>Dn</em> 3Mbps <em><a href="" title=""> <img src="graphics/tooltip.gif" width="10" height="10" border="0"></a></em><br> <em>Up</em> 1Mbps</span> <span><em><a href="" title=""><strong><img src="graphics/tooltip.gif" width="10" height="10" border="0"></strong></a></em></span></td> <td width="78" align="right" valign="top"><p> €37.99 <a href="test" title=""> <strong><img src="graphics/tooltip.gif" width="10" height="10" border="0"></strong></a><br /> </p> </td> </tr> </table>

Can somebody help me get the fields out of this?

Generally, using your own regexps to parse HTML is a bad idea. I'm not a PHP programmer, but I'm fairly confident there are HTML parsing libraries available for this kind of task.
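As a rough illustration of the library route: PHP ships with a DOM extension whose DOMDocument class can load HTML and hand back the <td> elements directly. The sketch below assumes PHP 7 or later with that extension enabled. The $html value is a shortened, hypothetical stand-in for the table in the question, and extractCellText is just an illustrative name.

```php
<?php
// Hypothetical, shortened stand-in for the table in the question.
$html = '<table><tr>'
      . '<td><img src="graphics/ice_logo_1.gif"></td>'
      . '<td><p class="style3"><strong>Home Broadband</strong></p></td>'
      . '<td><span class="style3"><em>Dn</em> 3Mbps <br><em>Up</em> 1Mbps</span></td>'
      . '<td align="right"><p> &euro;37.99 </p></td>'
      . '</tr></table>';

// Return the text of every <td> in the given HTML, with tags removed.
function extractCellText(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        // textContent is the cell's text without markup;
        // collapse runs of whitespace and trim the ends.
        $cells[] = trim(preg_replace('/\s+/u', ' ', $td->textContent));
    }
    return $cells;
}

print_r(extractCellText($html));
```

A cell that holds only an image comes back as an empty string, and text on either side of a <br> may run together, so some per-field tidying is probably still needed.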

This will give you all the contents of the <td> fields:

$string = <<<STRING
<table width="430" border="0" cellpadding="4"> <tr> <td width="122" valign="top"><img src="graphics/ice_logo_1.gif"></td> <td width="130" valign="top"><p class="style3"><strong>Home Broadband</strong><br /><span class="tool">Wireless <a href="" title=""> <img src="graphics/tooltip.gif" width="10" height="10" border="0"></a></span><br /><em>Cont </em> (36:1) <a href="" title=""><img src="graphics/tooltip.gif" width="10" height="10" border="0"></a><br> </p></td> <td width="100" valign="top"><span class="style3"> <em>Dn</em> 3Mbps <em><a href="" title=""> <img src="graphics/tooltip.gif" width="10" height="10" border="0"></a></em><br> <em>Up</em> 1Mbps</span> <span><em><a href="" title=""><strong><img src="graphics/tooltip.gif" width="10" height="10" border="0"></strong></a></em></span></td> <td width="78" align="right" valign="top"><p> €37.99 <a href="test" title=""> <strong><img src="graphics/tooltip.gif" width="10" height="10" border="0"></strong></a><br /> </p> </td> </tr> </table>
STRING;

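// Capture everything between each opening <td ...> and its closing </td>.
// The "s" modifier lets "." match newlines in case the HTML spans several lines.
preg_match_all("#<td[^>]*>(.+?)</td>#s", $string, $foo);
print_r($foo[1]);

However, the captured contents still include plenty of HTML. If the HTML is well formed, you can use the DOM (Document Object Model) to parse it instead, but I personally like using regexps for this sort of thing.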

-Dan
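Since the captured cells still contain markup, one possible follow-up is to strip the tags from each match. This is only a sketch, assuming PHP 7 or later: $string here is a shortened, hypothetical stand-in for the heredoc above, and cleanCell is an illustrative helper name. The pattern uses .*? rather than .+? so that an empty <td></td> does not make the match run on into the next cell.

```php
<?php
// Hypothetical, shortened stand-in for the $string heredoc in the post above.
$string = '<table> <tr> <td width="122"><img src="graphics/ice_logo_1.gif"></td>'
        . ' <td width="100"><span class="style3"> <em>Dn</em> 3Mbps <br> <em>Up</em> 1Mbps</span></td>'
        . ' <td width="78" align="right"><p> &euro;37.99 <br /> </p> </td> </tr> </table>';

// "s" lets "." cross newlines; "i" also accepts <TD>.
preg_match_all('#<td[^>]*>(.*?)</td>#si', $string, $foo);

// Reduce one captured cell to plain text.
function cleanCell(string $cellHtml): string
{
    $text = strip_tags($cellHtml);                                       // drop the markup
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');  // &euro; becomes the euro sign
    return trim(preg_replace('/\s+/', ' ', $text));                      // tidy whitespace
}

$fields = array_map('cleanCell', $foo[1]);
print_r($fields);
```

strip_tags only removes markup; it does not validate it, so badly broken HTML can still produce odd results, which is where a real parser tends to be the safer choice.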