2019.12.13 note

Fast Transformer Decoding: One Write-Head is All You Need

The authors propose multi-query attention, an alternative to multi-head attention in which all heads share a single set of keys and values, which greatly reduces memory-bandwidth requirements in the incremental (autoregressive decoding) setting. They believe that this enables wider adoption of attention-based sequence models in inference-performance-critical applications.
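
To make the idea concrete, below is a minimal NumPy sketch of the non-incremental form of multi-query attention. The weight shapes and names are illustrative assumptions, not the paper's reference code; the key point is that K and V are projected once and shared by all heads, so an incremental decoder only needs to cache one d_k-sized key and value per position rather than one per head.

```python
# Minimal multi-query attention sketch (illustrative shapes, not the paper's code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, Wq, Wk, Wv, Wo):
    """x: (seq, d_model); Wq: (h, d_model, d_k); Wk, Wv: (d_model, d_k);
    Wo: (h * d_k, d_model). Returns (seq, d_model)."""
    h, _, d_k = Wq.shape
    K = x @ Wk                       # (seq, d_k) -- shared across all heads
    V = x @ Wv                       # (seq, d_k) -- shared across all heads
    heads = []
    for i in range(h):
        Q = x @ Wq[i]                # (seq, d_k) -- per-head queries only
        scores = Q @ K.T / np.sqrt(d_k)
        heads.append(softmax(scores) @ V)
    return np.concatenate(heads, axis=-1) @ Wo

# Toy usage
rng = np.random.default_rng(0)
seq, d_model, h, d_k = 5, 16, 4, 4
x = rng.standard_normal((seq, d_model))
out = multi_query_attention(
    x,
    rng.standard_normal((h, d_model, d_k)),
    rng.standard_normal((d_model, d_k)),
    rng.standard_normal((d_model, d_k)),
    rng.standard_normal((h * d_k, d_model)),
)
print(out.shape)  # (5, 16)
```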

Single Headed Attention RNN: Stop Thinking With Your Head

When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks

In this work, the authors take an architectural perspective and investigate which patterns of network architecture are resilient to adversarial attacks. To obtain the large number of networks needed for this study, they adopt one-shot neural architecture search: a large supernet is trained once, and the sub-networks sampled from it are then fine-tuned. Their "robust architecture Odyssey" yields several observations: 1) densely connected patterns improve robustness; 2) under a fixed computational budget, adding convolution operations to direct edges is effective; 3) the flow of solution procedure (FSP) matrix is a good indicator of network robustness. Based on these observations, they discover a family of robust architectures (RobNets).
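
For observation 3, the FSP matrix between two layers with the same spatial size is the spatially averaged Gram matrix of their feature maps, summarizing how information flows from one layer to the next. Below is a minimal NumPy sketch of that computation; the layer shapes are illustrative assumptions and this is not the RobNets reference code.

```python
# Minimal FSP (flow of solution procedure) matrix sketch (illustrative shapes).
import numpy as np

def fsp_matrix(f1, f2):
    """f1: (C1, H, W) and f2: (C2, H, W) feature maps from two layers with the
    same spatial size. Returns the (C1, C2) FSP matrix: channel-wise inner
    products averaged over the H*W spatial positions."""
    c1, h, w = f1.shape
    c2 = f2.shape[0]
    a = f1.reshape(c1, h * w)
    b = f2.reshape(c2, h * w)
    return (a @ b.T) / (h * w)

# Toy usage: FSP between two feature maps of a sampled sub-network
feat_a = np.random.rand(8, 16, 16)
feat_b = np.random.rand(4, 16, 16)
print(fsp_matrix(feat_a, feat_b).shape)  # (8, 4)
```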
