Notes kept for company research.
#prepare language stuff
#build a large lexicon that involves words in both the training and decoding.
(
echo "make word graph ..."
cd $H; mkdir -p data/{dict,lang,graph} && \
cp $thchs/resource/dict/{extra_questions.txt,nonsilence_phones.txt,optional_silence.txt,silence_phones.txt} data/dict && \
cat $thchs/resource/dict/lexicon.txt $thchs/data_thchs30/lm_word/lexicon.txt | \
grep -v '<s>' | grep -v '</s>' | sort -u > data/dict/lexicon.txt || exit 1;
utils/prepare_lang.sh --position_dependent_phones false data/dict "<SPOKEN_NOISE>" data/local/lang data/lang || exit 1;
gzip -c $thchs/data_thchs30/lm_word/word.3gram.lm > data/graph/word.3gram.lm.gz || exit 1;
utils/format_lm.sh data/lang data/graph/word.3gram.lm.gz $thchs/data_thchs30/lm_word/lexicon.txt data/graph/lang || exit 1;
)
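Topic of this block: prepare language stuff.
It creates the directories data/dict, data/lang and data/graph.
Look at the cp and gzip statements:
cp copies (it does not move) extra_questions.txt, nonsilence_phones.txt, optional_silence.txt and silence_phones.txt from $thchs/resource/dict into data/dict.
lm_word/lexicon.txt is merged with resource/dict/lexicon.txt into data/dict/lexicon.txt, and gzip compresses lm_word/word.3gram.lm into data/graph/word.3gram.lm.gz, which format_lm.sh then turns into the grammar FST (G.fst) under data/graph/lang.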
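The lexicon merge can be tried on toy files. This is a minimal sketch: `merge_lexicons` is a hypothetical helper name, and the file names and pinyin-style entries are made up, not taken from the thchs30 resources.

```bash
#!/usr/bin/env bash
# Toy version of the word-lexicon merge: concatenate two lexicons,
# drop the <s> and </s> entries, then sort and de-duplicate.
# merge_lexicons, the file names and the entries are made up.

merge_lexicons() {
  # Two separate greps are needed: "<s>" is not a substring of "</s>".
  # LC_ALL=C gives a plain byte-order sort.
  cat "$1" "$2" | grep -v '<s>' | grep -v '</s>' | LC_ALL=C sort -u
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'SIL sil' 'ni3 n i3' '<s> sil' > "$tmp/resource_lexicon.txt"
printf '%s\n' 'ni3 n i3' 'hao3 h ao3' '</s> sil' > "$tmp/lm_word_lexicon.txt"

# Prints the three surviving entries, each once, in byte order.
merge_lexicons "$tmp/resource_lexicon.txt" "$tmp/lm_word_lexicon.txt"
```

Note that an empty pattern (`grep -v ''`) matches every line and would leave the lexicon empty, which is why the two explicit token patterns matter.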
#make_phone_graph
(
echo "make phone graph ..."
cd $H; mkdir -p data/{dict_phone,graph_phone,lang_phone} && \
cp $thchs/resource/dict/{extra_questions.txt,nonsilence_phones.txt,optional_silence.txt,silence_phones.txt} data/dict_phone && \
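cat $thchs/data_thchs30/lm_phone/lexicon.txt | grep -v '<eps>' | sort -u > data/dict_phone/lexicon.txt && \
echo "<SPOKEN_NOISE> sil " >> data/dict_phone/lexicon.txt || exit 1;
utils/prepare_lang.sh --position_dependent_phones false data/dict_phone "<SPOKEN_NOISE>" data/local/lang_phone data/lang_phone || exit 1;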
gzip -c $thchs/data_thchs30/lm_phone/phone.3gram.lm > data/graph_phone/phone.3gram.lm.gz || exit 1;
utils/format_lm.sh data/lang_phone data/graph_phone/phone.3gram.lm.gz $thchs/data_thchs30/lm_phone/lexicon.txt \
data/graph_phone/lang || exit 1;
)
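Topic of this block: make_phone_graph. It should be read together with the block above, because it repeats the same steps at the phone level.
It creates the directories data/dict_phone, data/graph_phone and data/lang_phone.
Again, look at the cp and gzip statements:
the same four phone-set files are copied into data/dict_phone, lm_phone/lexicon.txt becomes the phone-level lexicon, and gzip compresses lm_phone/phone.3gram.lm into data/graph_phone/phone.3gram.lm.gz.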
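Both dict directories (data/dict and data/dict_phone) end up holding the same set of files. A pre-flight check could confirm they are there before `utils/prepare_lang.sh` runs. This is a sketch only: `check_dict_dir` is a hypothetical helper, not a Kaldi script; Kaldi's own `utils/validate_dict_dir.pl` performs much stricter validation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Kaldi): does a dict directory
# contain the five files this recipe puts there, with a non-empty lexicon?

check_dict_dir() {
  local dir=$1 f status=0
  for f in lexicon.txt nonsilence_phones.txt silence_phones.txt \
           optional_silence.txt extra_questions.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  # An empty lexicon often means a filter step dropped every line
  # (grep -v '' removes everything, since the empty pattern matches all lines).
  if [ -f "$dir/lexicon.txt" ] && [ ! -s "$dir/lexicon.txt" ]; then
    echo "empty: $dir/lexicon.txt" >&2
    status=1
  fi
  return "$status"
}

# Usage, from $H after the make_phone_graph copy step:
#   check_dict_dir data/dict_phone && echo "data/dict_phone looks complete"
```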
These are some of the files left after make_phone_graph has run.
Point of interest: the third line from the bottom, data/graph_phone/lang/G.fst.
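G.fst is what utils/format_lm.sh builds from the gzipped ARPA model. `gzip -c` writes the compressed copy to stdout and leaves the original .lm file untouched, so the .gz can be read back with `gunzip -c`. The sketch below prints the ARPA `\data\` header, i.e. the n-gram count per order behind G.fst. `arpa_counts` is a hypothetical helper and the toy LM is invented.

```bash
#!/usr/bin/env bash
# Print the \data\ header (n-gram counts per order) of a gzipped ARPA LM.
# arpa_counts is a hypothetical helper; the toy LM below is invented.

arpa_counts() {
  gunzip -c "$1" | awk '
    /^\\data\\/ { in_data = 1; next }  # header block starts
    in_data && /^ngram / { print }     # lines like "ngram 3=1"
    in_data && /^\\/ { exit }          # next section, e.g. \1-grams:
  '
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/toy.3gram.lm" <<'EOF'
\data\
ngram 1=3
ngram 2=2
ngram 3=1

\1-grams:
-99 <s> -0.5
-0.5 a1 -0.3
-0.5 </s>

\2-grams:
-0.3 <s> a1 -0.2
-0.3 a1 </s>

\3-grams:
-0.1 <s> a1 </s>

\end\
EOF

# Same pattern as the recipe: gzip -c keeps the original file.
gzip -c "$tmp/toy.3gram.lm" > "$tmp/toy.3gram.lm.gz"
arpa_counts "$tmp/toy.3gram.lm.gz"
```

On the real model this would be `arpa_counts data/graph_phone/phone.3gram.lm.gz`.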
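So the large vocabulary and the language model end up as .fst files, i.e. OpenFst finite-state transducers.
This FST is effectively a model: G.fst is the n-gram language model (the grammar) encoded as a weighted FST.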
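.fst files are binary, so OpenFst tools are needed to look inside. `fstinfo data/graph_phone/lang/G.fst` prints summary statistics. `fstprint --isymbols=data/graph_phone/lang/words.txt --osymbols=data/graph_phone/lang/words.txt data/graph_phone/lang/G.fst` prints the graph as text, assuming words.txt sits alongside it as in a lang directory. Below is a toy FST in that text (AT&T) format, plus a hypothetical awk helper `fst_summary` that counts states, arcs and final states. The labels and weights are made up, and the helper assumes transducer-style arc lines with both labels printed.

```bash
#!/usr/bin/env bash
# Count states, arcs and final states in an FST written in OpenFst's text
# (AT&T) format. fst_summary is a hypothetical helper; the toy FST is made up.
# Arc lines:         src dst ilabel olabel [weight]
# Final-state lines: state [weight]

fst_summary() {
  awk '
    function add(s) { if (!(s in seen)) { seen[s] = 1; n++ } }
    NF >= 4 { arcs++; add($1); add($2); next }   # arc line
    NF >= 1 { finals++; add($1) }                 # final-state line
    END { printf "states=%d arcs=%d finals=%d\n", n, arcs, finals }
  ' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/toy_G.txt" <<'EOF'
0 1 ni3 ni3 0.69
0 2 hao3 hao3 1.2
1 2 hao3 hao3 0.5
2 0.1
EOF

fst_summary "$tmp/toy_G.txt"
```

For a real graph the same helper could read `fstprint` output saved to a file.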
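The figure above illustrates the result after the monophone stage has run:
namely data/graph/lang/tmp/LG.fst, the composition of the lexicon FST (L) with G.fst, which utils/mkgraph.sh builds while making the decoding graph.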
Directories used by this stage: data/lang (the lang directory for training) and exp/mono/log/ (the training logs).
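The decision-tree information is in exp/mono/tree, a Kaldi binary file that vim cannot display meaningfully; Kaldi's tree-info tool prints a summary of it.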
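You can check whether a Kaldi file is binary before opening it: objects Kaldi writes in binary mode start with the two bytes NUL and `B`. This is a sketch; `is_kaldi_binary` is a hypothetical helper and the test files are made up. For real inspection, Kaldi's own tools such as `tree-info exp/mono/tree` are the way to go.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a file starts with Kaldi's binary-mode
# marker, the two bytes NUL and "B" (hex 00 42).

is_kaldi_binary() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "0042" ]
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '\0BTOY' > "$tmp/binary_obj"   # made-up bytes after the marker
printf 'plain text\n' > "$tmp/text_obj"

for f in "$tmp/binary_obj" "$tmp/text_obj"; do
  if is_kaldi_binary "$f"; then
    echo "$(basename "$f"): binary, use tools such as tree-info"
  else
    echo "$(basename "$f"): text"
  fi
done
```

On the real file: `is_kaldi_binary exp/mono/tree && tree-info exp/mono/tree`.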