[arXiv 2023] SpecInfer: Accelerating LLM Serving with Speculative Inference + Token Tree Verification
Contents
Introduction
Method
Speculative Inference
Collective Boost-Tuning
Learning-based Speculative Scheduler
Token Tree Verifier
Tree Attention
Verification
Optimizations
Evaluation
References

Introduction
LLMs' high memory and compute