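5. Use as much RAM as you can
The original text says:
The more RAM you use before flushing, the larger the segments are, and larger segments mean fewer merges later on. Testing for LUCENE-843 found that a 48 MB buffer gave the best performance for the content set used there. Your application is probably a different case, though, heh.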
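To make the "bigger buffer, fewer segments" reasoning concrete, here is a minimal sketch in plain Java. It assumes a deliberately simplified model in which one segment is flushed each time the RAM buffer fills; the class name FlushEstimate, the method flushCount, and the 1024 MB figure are hypothetical and are not part of Lucene. In Lucene itself the buffer size is set with IndexWriter.setRAMBufferSizeMB(...).

```java
// Simplified model (an assumption, not Lucene's real accounting):
// every time the in-memory buffer fills up, exactly one segment is flushed.
final class FlushEstimate {

    /** Returns how many flushes are needed to write totalMB of buffered data. */
    static long flushCount(long totalMB, long bufferMB) {
        if (totalMB < 0 || bufferMB <= 0) {
            throw new IllegalArgumentException("totalMB must be >= 0 and bufferMB > 0");
        }
        // Ceiling division: a partly filled buffer still needs a final flush.
        return (totalMB + bufferMB - 1) / bufferMB;
    }

    public static void main(String[] args) {
        long totalMB = 1024; // hypothetical amount of buffered index data
        for (long bufferMB : new long[] {16, 48, 256}) {
            System.out.println(bufferMB + " MB buffer: "
                    + flushCount(totalMB, bufferMB) + " flushed segments");
        }
    }
}
```

Fewer flushed segments means fewer merges later, but the 48 MB result came from one specific content set, so it is worth measuring with your own data.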
Next, let's look at the expert's translation.
6. Disable the compound index
Turn off compound file format.
Call setUseCompoundFile(false). Building the compound file format takes time during indexing (7-33% in testing for LUCENE-888). However, note that doing this will greatly increase the number of file descriptors used by indexing and by searching, so you could run out of file descriptors if mergeFactor is also large.
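The file descriptor warning can be illustrated with a rough back-of-the-envelope sketch. It assumes, as a simplification, that a compound segment costs one open file while a non-compound segment costs a fixed number of files; the class DescriptorEstimate, the method openFiles, and the figures of 30 segments and 8 files per segment are hypothetical and chosen only for illustration.

```java
// Rough estimate of open files, under the assumption that a compound
// segment is a single file and a non-compound segment is several files.
final class DescriptorEstimate {

    /** Estimates the files held open for the given number of live segments. */
    static long openFiles(long segments, int filesPerSegment, boolean compound) {
        if (segments < 0 || filesPerSegment < 1) {
            throw new IllegalArgumentException("segments >= 0 and filesPerSegment >= 1 required");
        }
        return compound ? segments : segments * filesPerSegment;
    }

    public static void main(String[] args) {
        long segments = 30;      // hypothetical; grows with a larger mergeFactor
        int filesPerSegment = 8; // hypothetical; depends on the fields and features used
        System.out.println("compound:     " + openFiles(segments, filesPerSegment, true));
        System.out.println("non-compound: " + openFiles(segments, filesPerSegment, false));
    }
}
```

Each open searcher holds its own set of files as well, so the estimate should be compared against the operating system's per-process descriptor limit before turning the compound format off.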
Re-use Document and Field instances
As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.
Note that you cannot add the same Field instance more than once within a single Document, and you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details.
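// Writer for the file-system directory; `true` creates a new index.
writerFS = new IndexWriter(dirFS, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED);
//
// Create the Field instances once and re-use them for every document.
Field f1 = new Field("f1", "", Store.YES, Index.ANALYZED);
Field f2 = new Field("f2", "", Store.YES, Index.ANALYZED);
for (int i = 0; i < 1000000; i++) {
    // Only the Fields are re-used here; a new Document is created per iteration.
    Document doc = new Document();
    f1.setValue("f1 hello doc" + i);
    doc.add(f1);
    f2.setValue("f2 world doc" + i);
    doc.add(f2);
    // `writer` is a second IndexWriter whose creation is not shown in the original.
    writer.addDocument(doc);
}
//
writer.commit();
// Copy everything indexed through `writer` into the file-system index.
writerFS.addIndexes(writer.getReader());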