Hugging Face Course: Diving into the Hugging Face Tokenizers library (WordPiece tokenization & Unigram tokenization)
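WordPiece tokenization

WordPiece is the subword tokenization algorithm used by BERT. It is very similar to BPE, but the actual tokenization step works differently.

Training algorithm

⚠️ Google never open-sourced its implementation of the WordPiece training algorithm.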
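To make the difference from BPE concrete: BPE tokenizes a word by replaying its learned merge rules in order, while WordPiece keeps only the final vocabulary and, at tokenization time, repeatedly takes the longest prefix of the remaining characters that is in the vocabulary (greedy longest-match-first, as in BERT's tokenizer). Pieces that do not start a word carry a "##" prefix, and if some part of a word cannot be matched at all, the whole word becomes the unknown token. The sketch below shows only this lookup step. The vocabulary `VOCAB` and the function name `wordpiece_tokenize` are hypothetical toy choices for illustration, not BERT's real vocabulary or any library's API.

```python
# HYPOTHETICAL toy vocabulary for illustration (not BERT's real vocabulary).
# Word-initial pieces have no prefix; continuation pieces start with "##".
VOCAB = {"h", "##u", "##g", "##s", "hug", "##gs", "[UNK]"}


def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece lookup for a single word (sketch)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it one character at a time.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # pieces after the first are continuations
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # Nothing in the vocabulary matches here, so the whole word is unknown.
            return [unk]
        tokens.append(match)
        start = end
    return tokens


print(wordpiece_tokenize("hugs", VOCAB))  # ['hug', '##s']
print(wordpiece_tokenize("bugs", VOCAB))  # ['[UNK]']
```

Note the design choice this illustrates: because the match is greedy, "hugs" becomes `hug` + `##s` even though `##gs` is also in the vocabulary, and a single unmatched character ("b" in "bugs") turns the entire word into `[UNK]` rather than just that character.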