Chinese word segmentation with IKAnalyzer and Lucene (Java version)

Two jars are needed, lucene-core and IKAnalyzer-lucene, and their versions must be compatible with each other; see the version numbers in pom.xml.
I use the Aliyun Maven repositories: https://maven.aliyun.com/repository/central and https://maven.aliyun.com/repository/public
The relevant configuration in pom.xml is as follows:

    <dependencies>
        <dependency>
            <groupId>com.jianggujin</groupId>
            <artifactId>IKAnalyzer-lucene</artifactId>
            <version>8.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>7.6.0</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>Ali_central</id>
            <name>Alibaba central</name>
            <url>https://maven.aliyun.com/repository/central</url>
        </repository>
        <repository>
            <id>Ali_public</id>
            <name>Alibaba public</name>
            <url>https://maven.aliyun.com/repository/public</url>
        </repository>
    </repositories>

The code is also fairly simple:

import java.io.IOException;
import java.io.StringReader;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TestFenci {
    private static Logger logger = Logger.getLogger(TestFenci.class);

    @Test
    public void fenci() throws IOException {
        String text = "中国空间站将于今年完成在轨建造 扎实迈好每一步";

        // create the analyzer; true enables IK's smart (coarse-grained) mode
        Analyzer anal = new IKAnalyzer(true);
        StringReader reader = new StringReader(text);
        // obtain the token stream (the field name is unused here)
        TokenStream ts = anal.tokenStream("", reader);
        ts.reset();
        CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);

        // iterate over the tokens, printing them separated by "|"
        while (ts.incrementToken()) {
            System.out.print(term.toString() + "|");
        }
        ts.end();
        ts.close();
        reader.close();
        System.out.println();
    }
}
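The boolean passed to the IKAnalyzer constructor toggles smart mode: true asks IK for coarse-grained output, while false yields fine-grained segmentation with more overlapping tokens. Below is a minimal sketch of a reusable helper built on the same API; the class name TokenizeHelper, the method name tokenize, and the list-returning shape are my own additions, not part of the original post.

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TokenizeHelper {
    // Sketch: collect IK tokens into a list; useSmart toggles smart vs. fine-grained mode.
    public static List<String> tokenize(String text, boolean useSmart) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (Analyzer analyzer = new IKAnalyzer(useSmart);
             TokenStream ts = analyzer.tokenStream("", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end(); // finalize stream state before close
        }
        return tokens;
    }
}

Calling tokenize(text, true) and tokenize(text, false) on the same sentence is an easy way to compare the two modes.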

The output looks like this:


[分词.png: screenshot of the segmented output]
