According to document Jena Full Text Search, it is possible that the indexed text is content external to the RDF store with only additional triples in the RDF store. The subject URI returned as a search result may then be considered to refer via the indexed property to the external content.
To work with external content, the maintenance of the index is external to the RDF data store too. The key of the index is: building an URI field from indexed document's path.
When the text search is performed, the URIs returned as the search result will be used for matching the subject URI in the SPARQL query.
In this tutorial, we create lucene index for external text documents, perform a full text search in SPARQL using jena-text, then return the highlighting results.
The text files under directory to_index are the external files which will be used to create a Lucene index. File name of each text file is the email id, corresponding to the email id in the turtle file.
$ ls -la to_index/
total 16
drwxrwxr-x 2 hy hy 4096 12月 6 08:44 .
drwxrwxr-x 7 hy hy 4096 12月 13 15:06 ..
-rw-rw-r-- 1 hy hy 30 12月 6 08:44 id1
-rw-rw-r-- 1 hy hy 28 12月 6 08:44 id2
$ cat to_index/id1
context
good luck
background
$ cat to_index/id2
context
bad luck
background
// in file IndexFiles.java
/** Indexes a single document */
static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException {
try (InputStream stream = Files.newInputStream(file)) {
// make a new, empty document
Document doc = new Document();
// Add the path of the file as a field named "path". Use a
// field that is indexed (i.e. searchable), but don't tokenize
// the field into separate words and don't index term frequency
// or positional information:
Field pathField = new StringField("uri", App.EMAIL_URI_PREFIX + "id/" + file.getFileName().toString(), Field.Store.YES);
doc.add(pathField);
// Add the last modified date of the file a field named "modified".
// Use a LongPoint that is indexed (i.e. efficiently filterable with
// PointRangeQuery). This indexes to milli-second resolution, which
// is often too fine. You could instead create a number based on
// year/month/day/hour/minutes/seconds, down the resolution you require.
// For example the long value 2011021714 would mean
// February 17, 2011, 2-3 PM.
doc.add(new LongPoint("modified", lastModified));
// Add the contents of the file to a field named "contents". Specify a Reader,
// so that the text of the file is tokenized and indexed, but not stored.
// Note that FileReader expects the file to be in UTF-8 encoding.
// If that's not the case searching for special characters will fail.
//doc.add(new TextField("text", new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))));
doc.add(new TextField("text", new String(Files.readAllBytes(file)), Field.Store.YES));
if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
// New index, so we just add the document (no old document can be there):
System.out.println("adding " + file);
writer.addDocument(doc);
} else {
// Existing index (an old copy of this document may have been indexed) so
// we use updateDocument instead to replace the old one matching the exact
// path, if present:
System.out.println("updating " + file);
writer.updateDocument(new Term("path", file.toString()), doc);
}
}
}
}
With in line Field pathField = new StringField("uri", App.EMAIL_URI_PREFIX + "id/" + file.getFileName().toString(), Field.Store.YES);, we create a StringField uri from the external text file's name. And Field.Store.YES indicates the text file's content will be stored in the index, which is used for working with jena-text's full text search highlighting feature.
Load dataset and define the index mapping
// in file JenaTextSearch.java
public static Dataset createCode()
{
// Base data
Dataset ds1 = DatasetFactory.create() ;
Model defaultModel = ModelFactory.createDefaultModel();
defaultModel.read("data.ttl", "N-TRIPLES");
ds1.setDefaultModel(defaultModel);
// Define the index mapping
EntityDefinition entDef = new EntityDefinition( "uri", "text", ResourceFactory.createProperty( App.EMAIL_URI_PREFIX, "content" ) );
Directory dir = null;
try {
dir = new SimpleFSDirectory(Paths.get("index")); // lucene index directory
}
catch( IOException e){
e.printStackTrace();
}
// Join together into a dataset
Dataset ds = TextDatasetFactory.createLucene( ds1, dir, new TextIndexConfig(entDef) ) ;
return ds ;
}
We define the index mapping according to the index we built. The index itself is maintained by the main app, see code in below.
1.java.util.Timer.schedule(TimerTask task, long delay):多长时间(毫秒)后执行任务
2.java.util.Timer.schedule(TimerTask task, Date time):设定某个时间执行任务
3.java.util.Timer.schedule(TimerTask task, long delay,longperiod
java.lang.UnsupportedClassVersionError: cn/support/cache/CacheType : Unsupported major.minor version 51.0 (unable to load class cn.support.cache.CacheType)
at org.apache.catalina.loader.WebappClassL
昨天发了一个提问,启动5个线程将一个List中的内容,然后将5个线程的内容拼接起来,由于时间比较急迫,自己就写了一个Demo,希望对菜鸟有参考意义。。
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
public c
select list.listname, list.createtime,listcount from dream_list as list , (select listid,count(listid) as listcount from dream_list_user group by listid order by count(
此文转自IBM.
Apache 服务简介
Web 服务器也称为 WWW 服务器或 HTTP 服务器 (HTTP Server),它是 Internet 上最常见也是使用最频繁的服务器之一,Web 服务器能够为用户提供网页浏览、论坛访问等等服务。
由于用户在通过 Web 浏览器访问信息资源的过程中,无须再关心一些技术性的细节,而且界面非常友好,因而 Web 在 Internet 上一推出就得到
1) I love you not because of who you are, but because of who I am when I am with you. 我爱你,不是因为你是一个怎样的人,而是因为我喜欢与你在一起时的感觉。 2) No man or woman is worth your tears, and the one who is, won‘t