
Parsing HTML in Java to extract the contents of a table: solution

2010-06-05 00:23:15

I need to parse the following HTML in Java and extract the contents of a table:
<BODY leftmargin="1" topmargin="1" onload="javascript:resizeMe()">
<CENTER><a href="http://www.ip138.com" target="_blank"><FONT class=tdc>手机号码归属地专业在线查询网</FONT></a>
</CENTER>
<HR SIZE=1 width=320>
<TABLE width="360" border=1 align="center" cellPadding=4 borderColor=#3366cc style="BORDER-COLLAPSE: collapse">
<FORM action="" method="get" name="mobileform" onsubmit="return checkMobile();">
<TR bgColor=#eff1f3 class=tdc>
<TD align=middle width=130 noswap>手机号码(段) </TD>
<TD align=middle width=*><INPUT class=tdc name="mobile" maxLength="11">
<INPUT name="action" type="hidden" value="mobile"> <INPUT class="bdtj" type="submit" value="查询">
</TD>
</TR>
</FORM>
</TABLE>
<BR>

<TABLE width="360" border="1" align="center" cellpadding="4" bordercolor=#3366cc style="border-collapse: collapse">
<TR>
<TD colspan=2 class=tdc1 align=center height=24 bgcolor=#6699cc>++ ip138.com查询结果 ++</TD>
</TR>
<TR class=tdc bgcolor=#EFF1F3>
<TD width="138" align="center" noswap>您查询的手机号码段</TD>
<TD width=* align="center" class=tdc2>1821***** <a href="http://jx.ip138.com/182172****/" target="_blank">测吉凶(<font color="red">新</font>)</a></TD>
</TR>
<TR class=tdc bgcolor=#EFF1F3>
<TD width="138" align="center" noswap>卡号归属地</TD>
<TD width=* align="center" class=tdc2>上海 上海</TD>
</TR>
<TR class=tdc bgcolor=#EFF1F3>
<TD width="138" align="center" noswap>卡 类 型</TD>
<TD width=* align="center" class=tdc2>上海移动全球通卡</TD>
</TR>
<TR class=tdc bgcolor=#EFF1F3>
<TD align="center">区号</TD>
<TD align="center" class=tdc2>021</TD>
</TR>
<TR class=tdc bgcolor=#EFF1F3>
<TD align="center">邮编</TD>
<TD align="center" class=tdc2>200000 <a href="http://alexa.ip138.com/post/" target="_blank">更详细的..</a></TD></TR>
</TABLE>
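I want to parse this result and pull the data out of the table, for example the card's home location (卡号归属地: 上海) and the card type (卡 类 型: 全球通). How can I do this?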

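------Solution--------------------
There are two ways:
1. Use htmlparse, an open-source package you can find online. It is powerful for parsing and modifying HTML, and reasonably efficient.
2. If you don't need to modify the HTML, a regular expression is simple enough. The regex code looks like this: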

Java code


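import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile(regex, Pattern.DOTALL); // regex is the pattern you want to match
Matcher matcher = pattern.matcher(source);                // source is the HTML source text
List<String> list = new ArrayList<String>();              // collects the captured values

while (matcher.find()) {
    list.add(matcher.group(1)); // group(i) is the text captured by the i-th pair of parentheses, counting from the left; group(0) is the whole match
}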
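
The snippet above leaves the regex itself open. As a rough sketch of how it could be filled in for the page in the question, the class below pairs each label cell with the plain text at the start of the cell that follows it. The class name TableCellExtractor, the method extract and the pattern are illustrative assumptions, not something from the original answer. The pattern relies on the layout shown in the question (a plain-text label TD immediately followed by a value TD) and will not cope with arbitrary HTML. Chinese labels in the sample are written as Unicode escapes so the source compiles whatever the file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class TableCellExtractor {

    // Group 1: plain text of a label cell.
    // Group 2: text at the start of the next cell, up to its first nested tag.
    private static final Pattern ROW = Pattern.compile(
            "<TD[^>]*>([^<]*)</TD>\\s*<TD[^>]*>([^<]*)",
            Pattern.CASE_INSENSITIVE);

    // Returns label -> value pairs in document order, skipping empty values
    // (for example the form row, whose second cell starts with an INPUT tag).
    static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        Matcher matcher = ROW.matcher(html);
        while (matcher.find()) {
            String label = matcher.group(1).trim();
            String value = matcher.group(2).trim();
            if (!value.isEmpty()) {
                result.put(label, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two rows copied from the question: area code and postal code.
        String html =
              "<TR class=tdc bgcolor=#EFF1F3>\n"
            + "<TD align=\"center\">\u533a\u53f7</TD>\n"
            + "<TD align=\"center\" class=tdc2>021</TD>\n"
            + "</TR>\n"
            + "<TR class=tdc bgcolor=#EFF1F3>\n"
            + "<TD align=\"center\">\u90ae\u7f16</TD>\n"
            + "<TD align=\"center\" class=tdc2>200000 "
            + "<a href=\"http://alexa.ip138.com/post/\" target=\"_blank\">...</a></TD></TR>\n";

        for (Map.Entry<String, String> entry : extract(html).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Because the label group only accepts text without nested tags, the header cell that spans both columns is not paired with the next row, and links inside a value cell are dropped. If the page layout changes, the pattern has to change with it, which is the usual argument for option 1 when the HTML is less regular.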