PDF parsing: PDFBox returns the wrong title and author

Published: 2011-07-03 07:05:34   Source: www.iduyao.cn   Editor: 星星草

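I'm building a search system for PDF papers. I need to parse each PDF and extract the paper's author names, title, journal name, and keywords to use as index fields. I'm using the PDFBox API, but what it extracts is not what I want. Can anyone help me fix this? Thanks!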

My code:
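import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

// Written against PDFBox 1.x and Lucene 3.x, the APIs used in the original post.
// The post omitted the class and method signature; PdfIndexer and getDocument are hypothetical names.
public class PdfIndexer {

    public static Document getDocument(File file) throws IOException {
        Document doc = new Document();
        FileInputStream is = new FileInputStream(file);
        COSDocument cosDoc = null;
        PDDocument pDoc = null;
        try {
            cosDoc = parseDocument(is);
            pDoc = new PDDocument(cosDoc);
            PDDocumentInformation docInfo = pDoc.getDocumentInformation();
            if (docInfo != null) {
                // Each getter returns null when the PDF has no such entry.
                String author = docInfo.getAuthor();
                String title = docInfo.getTitle();
                String summary = docInfo.getSubject();
                String keywords = docInfo.getKeywords();
                System.out.println("Author      " + author);
                System.out.println("Title     " + title);
                System.out.println("Summary     " + summary);
                System.out.println("Keywords     " + keywords);

                // Null-safe: calling isEmpty() on a null subject is what threw the exception.
                addIfPresent(doc, "author", author);
                addIfPresent(doc, "title", title);
                addIfPresent(doc, "summary", summary);
                addIfPresent(doc, "keywords", keywords);
            }
        } catch (Exception e) {
            // Print the exception itself: a NullPointerException may have a null message.
            System.err.println("cannot get pdf meta-data: " + e);
        } finally {
            // Release resources on every path, not only when an exception is thrown.
            if (pDoc != null) {
                closePDDocument(pDoc); // closing the PDDocument also closes its COSDocument
            } else {
                closeCOSDocument(cosDoc);
            }
            is.close();
        }
        return doc;
    }

    // Adds a searchable (not stored) field only when the value is non-null and non-blank.
    private static void addIfPresent(Document doc, String name, String value) {
        if (value != null && !value.trim().isEmpty()) {
            doc.add(new Field(name, value, Field.Store.NO, Field.Index.ANALYZED));
        }
    }

    private static COSDocument parseDocument(FileInputStream is) throws IOException {
        PDFParser parser = new PDFParser(is);
        parser.parse();
        return parser.getDocument();
    }

    private static void closeCOSDocument(COSDocument cosDoc) {
        if (cosDoc != null) {
            try {
                cosDoc.close();
            } catch (IOException e) {
                // ignore: nothing useful to do if closing fails
            }
        }
    }

    private static void closePDDocument(PDDocument pdDoc) {
        if (pdDoc != null) {
            try {
                pdDoc.close();
            } catch (IOException e) {
                // ignore: nothing useful to do if closing fails
            }
        }
    }
}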

Output when run (the error lines were shown in red):
authorxcannot get pdf meta-datanull

Author      x
Title     3.1 Editorials.indd Colin.indd
Summary     null
Keywords     null
authorx
Author      x
Title     3.1 N&V NR IF.indd
Summary     null
Keywords     null
cannot get pdf meta-datanull
authorx
Author      x
Title     3.1 Editorials.indd Colin.indd
Summary     null
Keywords     null
cannot get pdf meta-datanull
Optimizing index...
3271 total milliseconds
Documents 3
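Reading the output above: the PDDocumentInformation getters return null when the file has no such entry (Summary and Keywords print as null), and calling isEmpty() on that null throws a NullPointerException. On the JVMs of that time such an exception normally carries no message, so e.getMessage() yields null and the log shows "cannot get pdf meta-datanull". That text runs straight into "authorx" because System.out and System.err are separate streams that the console interleaves. The null check itself can be sketched in plain Java, without PDFBox or Lucene; NullSafeFields and usableFields are hypothetical names, and in a real indexer each surviving entry would become one Lucene Field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: null-safe filtering of optional PDF metadata values.
final class NullSafeFields {

    // True only for a non-null string with at least one non-whitespace character.
    static boolean hasText(String s) {
        return s != null && !s.trim().isEmpty();
    }

    // Keeps only entries whose value has text, preserving insertion order.
    static Map<String, String> usableFields(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if (hasText(e.getValue())) {
                out.put(e.getKey(), e.getValue().trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values copied from the first document in the output above.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("author", "x");
        raw.put("title", "3.1 Editorials.indd Colin.indd");
        raw.put("summary", null);  // getSubject() returned null
        raw.put("keywords", null); // getKeywords() returned null
        System.out.println(usableFields(raw));
    }
}
```

Running main prints {author=x, title=3.1 Editorials.indd Colin.indd}: the two missing entries are skipped instead of aborting the whole document.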
Is this caused by my code or by PDFBox?
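On the PDFBox side: getTitle() and getAuthor() return whatever the document information dictionary contains, and that dictionary is written by the authoring software, not derived from the visible page. A Title such as "3.1 Editorials.indd Colin.indd" is an InDesign (.indd) file name, and an Author of "x" is likewise not a real author name, so PDFBox appears to be reporting the files' metadata faithfully; the metadata simply does not hold the papers' real titles. The journal name is not a standard information-dictionary entry at all. One common workaround is to treat such values as unusable and fall back to text from the first page (for example PDFBox's PDFTextStripper with setStartPage(1) and setEndPage(1)). The sketch below covers only that decision logic in plain Java; TitleGuess, the list of file extensions, and the "first non-blank line" rule are assumptions, and on real journal pages the first line is often a running header, so more layout-aware heuristics may be needed.

```java
import java.util.regex.Pattern;

// Hypothetical helper: reject file-name-like metadata titles and fall back to page text.
final class TitleGuess {

    // Assumed set of authoring-file extensions that tend to leak into the Title entry.
    private static final Pattern FILE_NAME =
            Pattern.compile("\\.(indd|docx?|tex|dvi|qxd|pdf)\\b", Pattern.CASE_INSENSITIVE);

    // True if the title is missing, blank, or contains something that looks like a file name.
    static boolean looksLikeFileName(String title) {
        if (title == null || title.trim().isEmpty()) {
            return true;
        }
        return FILE_NAME.matcher(title).find();
    }

    // Returns the metadata title if it looks genuine, otherwise the first non-blank line
    // of the first-page text (which the caller would extract separately, e.g. with PDFBox).
    static String chooseTitle(String metadataTitle, String firstPageText) {
        if (!looksLikeFileName(metadataTitle)) {
            return metadataTitle.trim();
        }
        if (firstPageText != null) {
            for (String line : firstPageText.split("\\R")) {
                if (!line.trim().isEmpty()) {
                    return line.trim();
                }
            }
        }
        return null; // nothing usable: skip the title field for this document
    }

    public static void main(String[] args) {
        String page = "\n   \nSample Paper Title\nA. Author, B. Author\n";
        System.out.println(chooseTitle("3.1 Editorials.indd Colin.indd", page)); // Sample Paper Title
        System.out.println(chooseTitle("A Real Title", page));                   // A Real Title
    }
}
```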
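------Suggested solution----------------------
No idea either; following this thread.
------Suggested solution----------------------
How experienced is the original poster with generating PDF files automatically?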

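A project I recently took over includes related work. Its PDF requirement is: take each client's own document-format definition (clients come from different industries, so these formats differ considerably, which the original poster should keep in mind), fill result data from other sources accurately into that format file, and finally produce the merged result as a PDF file. The project's other interfaces are already wrapped and can communicate either through API functions or through temporary files.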
