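Vulkan manages resources and memory separately. This gives the application more freedom to manage memory itself (memory pools, memory reuse, and so on), but it also brings additional concerns:

- The number of memory allocations is limited (`VkPhysicalDeviceLimits::maxMemoryAllocationCount`).

`VkPhysicalDeviceMemoryProperties` reports the memory properties of the current physical device: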
```cpp
typedef struct VkPhysicalDeviceMemoryProperties {
    uint32_t        memoryTypeCount;
    VkMemoryType    memoryTypes[VK_MAX_MEMORY_TYPES];
    uint32_t        memoryHeapCount;
    VkMemoryHeap    memoryHeaps[VK_MAX_MEMORY_HEAPS];
} VkPhysicalDeviceMemoryProperties;
```
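Of the individual `VK_MEMORY_PROPERTY_*_BIT` flags:

- `HOST_VISIBLE`: the memory can be mapped with `vkMapMemory` and is visible to the host.
- `HOST_COHERENT`: host access does not need `vkFlushMappedMemoryRanges` and `vkInvalidateMappedMemoryRanges`.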
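To make the host-access flags concrete, here is a minimal sketch of how an application might decide whether a memory type can be mapped and whether mapped access needs explicit `vkFlushMappedMemoryRanges` / `vkInvalidateMappedMemoryRanges` calls. The constants mirror the values of `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT` and `VK_MEMORY_PROPERTY_HOST_COHERENT_BIT` so the snippet compiles without the Vulkan headers; the helper names are hypothetical and not part of any API.

```cpp
#include <cstdint>

// Stand-ins for VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and
// VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, redefined here only so the
// sketch builds without vulkan.h.
constexpr uint32_t kHostVisibleBit  = 0x00000002;
constexpr uint32_t kHostCoherentBit = 0x00000004;

// Hypothetical helper: true when the memory type can be mapped with vkMapMemory.
constexpr bool IsMappable(uint32_t propertyFlags) {
    return (propertyFlags & kHostVisibleBit) != 0;
}

// Hypothetical helper: true when the memory is mappable but not coherent,
// so host writes need a flush and host reads need an invalidate.
constexpr bool NeedsExplicitFlush(uint32_t propertyFlags) {
    return IsMappable(propertyFlags) && (propertyFlags & kHostCoherentBit) == 0;
}
```

In real code the argument would be `memoryTypes[i].propertyFlags` for the memory type the allocation came from.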
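Without extensions, the possible combinations of property flags are listed in the table below. What a given device actually supports is returned in the `VkPhysicalDeviceMemoryProperties::memoryTypes` array.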
# | DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT | HOST_CACHED | LAZILY_ALLOCATED | PROTECTED |
---|---|---|---|---|---|---|
0 | | | | | | |
1 | | ✓ | ✓ | | | |
2 | | ✓ | | ✓ | | |
3 | | ✓ | ✓ | ✓ | | |
4 | ✓ | | | | | |
5 | ✓ | ✓ | ✓ | | | |
6 | ✓ | ✓ | | ✓ | | |
7 | ✓ | ✓ | ✓ | ✓ | | |
8 | ✓ | | | | ✓ | |
9 | | | | | | ✓ |
10 | ✓ | | | | | ✓ |
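The combinations in the table can also be written down as data, which is handy for a debug-time sanity check of what a driver reports. The sketch below is only an illustration: the constants mirror the `VK_MEMORY_PROPERTY_*_BIT` values so it compiles without the Vulkan headers, and `IsCoreCombination` is a hypothetical helper name. Extensions add further bits that this list does not cover.

```cpp
#include <cstdint>

// Stand-ins for the VK_MEMORY_PROPERTY_*_BIT values.
constexpr uint32_t kDeviceLocal     = 0x01;
constexpr uint32_t kHostVisible     = 0x02;
constexpr uint32_t kHostCoherent    = 0x04;
constexpr uint32_t kHostCached      = 0x08;
constexpr uint32_t kLazilyAllocated = 0x10;
constexpr uint32_t kProtected       = 0x20;

// The eleven extension-free combinations, in the same order as the table.
constexpr uint32_t kCoreCombinations[] = {
    0,
    kHostVisible | kHostCoherent,
    kHostVisible | kHostCached,
    kHostVisible | kHostCached | kHostCoherent,
    kDeviceLocal,
    kDeviceLocal | kHostVisible | kHostCoherent,
    kDeviceLocal | kHostVisible | kHostCached,
    kDeviceLocal | kHostVisible | kHostCached | kHostCoherent,
    kDeviceLocal | kLazilyAllocated,
    kProtected,
    kProtected | kDeviceLocal,
};

// Hypothetical helper: true when `flags` is one of the combinations above.
constexpr bool IsCoreCombination(uint32_t flags) {
    for (uint32_t combination : kCoreCombinations) {
        if (combination == flags) {
            return true;
        }
    }
    return false;
}
```

Note that `HOST_VISIBLE` never appears on its own: in every listed combination it comes with `HOST_COHERENT`, `HOST_CACHED`, or both.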
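The spec requires at least one memory type with both `HOST_VISIBLE` and `HOST_COHERENT` set, and at least one memory type with `DEVICE_LOCAL` set.

The returned `memoryTypes` array is also sorted. For two memory types X and Y, X comes before Y (X < Y) when either:

- the property flags of X are a strict subset of the property flags of Y; or
- the property flags of X and Y are equal and X belongs to a memory heap with greater performance, as determined by the implementation.

This ordering lets an application find the best matching memory type in a single pass over the array.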
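Allocating memory in Vulkan takes the following steps (sparse resources aside), as shown in the figure below:

1. Call `vkGetImageMemoryRequirements` or `vkGetBufferMemoryRequirements` to obtain a `VkMemoryRequirements`.
2. Based on the `VkMemoryRequirements`, pick the best `memoryTypeIndex` from `VkPhysicalDeviceMemoryProperties::memoryTypes`.
3. Call `vkAllocateMemory` to allocate the memory.

The following function can serve as a reference for finding the `memoryTypeIndex`: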
```cpp
// Returns the index of the first memory type that is allowed by memoryTypeBits
// and has all of requiredProperties set, or -1 if there is none.
int32_t FindProperties(const VkPhysicalDeviceMemoryProperties* properties, uint32_t memoryTypeBits, VkMemoryPropertyFlags requiredProperties)
{
    const uint32_t memoryCount = properties->memoryTypeCount;
    for (uint32_t i = 0; i < memoryCount; ++i) {
        // Unsigned literal: shifting a signed 1 by 31 would be undefined behavior.
        const bool isRequiredMemoryType = (memoryTypeBits & (1u << i)) != 0;
        const bool hasRequiredProperties = (properties->memoryTypes[i].propertyFlags & requiredProperties) == requiredProperties;
        if (isRequiredMemoryType && hasRequiredProperties)
            return static_cast<int32_t>(i);
    }
    return -1;
}
```
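`memoryTypes`: a device exposes only a limited set of combinations. By intended use, the common ones are:

Real use cases are more complex than the combinations above: the application also has to handle OOM and fallback paths. AMD's Vulkan Memory Allocator (VMA) is worth considering here. The library reduces the usage scenarios to the following few cases and manages memory pools itself; an analysis of VMA will be added later.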
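One simple way to handle the fallback case is to search twice: first for a memory type that has both the required and the preferred flags, then for one that has only the required flags. The sketch below illustrates that idea and is not how VMA implements it. `MemoryType` is a stand-in for `VkMemoryType`, and both function names are hypothetical; the search itself is the same single-pass loop as `FindProperties`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for VkMemoryType, keeping only the field this sketch needs.
struct MemoryType {
    uint32_t propertyFlags;
};

// Single-pass search: first allowed type that has every bit in `flags`, or -1.
int32_t FindMemoryType(const std::vector<MemoryType>& types,
                       uint32_t memoryTypeBits, uint32_t flags) {
    // memoryTypeBits has one bit per memory type (VK_MAX_MEMORY_TYPES is 32).
    assert(types.size() <= 32);
    for (uint32_t i = 0; i < types.size(); ++i) {
        const bool allowed = (memoryTypeBits & (1u << i)) != 0;
        const bool matches = (types[i].propertyFlags & flags) == flags;
        if (allowed && matches) {
            return static_cast<int32_t>(i);
        }
    }
    return -1;
}

// Hypothetical helper: try required + preferred flags first, then fall back
// to the required flags alone. Returns -1 if even the fallback fails.
int32_t FindMemoryTypeWithFallback(const std::vector<MemoryType>& types,
                                   uint32_t memoryTypeBits,
                                   uint32_t requiredFlags,
                                   uint32_t preferredFlags) {
    const int32_t best = FindMemoryType(types, memoryTypeBits,
                                        requiredFlags | preferredFlags);
    if (best >= 0) {
        return best;
    }
    return FindMemoryType(types, memoryTypeBits, requiredFlags);
}
```

For an upload buffer, for example, the required flags could be `HOST_VISIBLE` with `DEVICE_LOCAL` as the preference, so the search degrades to plain host memory on devices without a host-visible device-local type. A failed `vkAllocateMemory` (OOM) would need a similar retry with the next candidate, which this sketch leaves out.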
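Reasons to consider suballocation:

- The number of allocations is capped by `VkPhysicalDeviceLimits::maxMemoryAllocationCount`, and the guaranteed minimum is only 4096. In a typical case such as per-object UBOs allocated one object at a time, the limit is easily exhausted, which results in undefined behavior.

The remedy is to allocate memory blocks up front and hand out suballocations from them yourself; a block size of 256 MB is recommended.

Suggested rules for memory alignment:

- Align addresses and sizes according to `VkPhysicalDeviceLimits::bufferImageGranularity` and `VkMemoryRequirements::alignment`.

`HOST_VISIBLE` memory can be mapped with `vkMapMemory` to obtain a host virtual address pointer. The application can keep the mapped pointer around, which has two advantages: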
Exception:

On AMD GPUs under Windows versions earlier than 10, keeping a mapped pointer to `DEVICE_LOCAL` + `HOST_VISIBLE` memory may cause the memory to be migrated to system memory.
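Returning to the suballocation and alignment advice above, the simplest possible scheme is a linear (bump) allocator that hands out aligned offsets inside one pre-allocated block. The sketch below tracks offsets only, so it compiles without the Vulkan headers; in a real allocator the block would be backed by a single `VkDeviceMemory`. `LinearBlock` and `AlignUp` are hypothetical names, and the code assumes power-of-two alignments, which holds for `VkMemoryRequirements::alignment`.

```cpp
#include <cassert>
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment` (a power of two).
constexpr uint64_t AlignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical linear suballocator over one pre-allocated memory block.
class LinearBlock {
public:
    // Returned by Allocate when the request does not fit.
    static constexpr uint64_t kFailed = UINT64_MAX;

    explicit LinearBlock(uint64_t blockSize) : size_(blockSize) {}

    // Returns the aligned offset of the new suballocation, or kFailed.
    uint64_t Allocate(uint64_t size, uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const uint64_t offset = AlignUp(head_, alignment);
        if (offset > size_ || size > size_ - offset) {
            return kFailed;
        }
        head_ = offset + size;
        return offset;
    }

    // Frees every suballocation at once, e.g. at the end of a frame.
    void Reset() { head_ = 0; }

private:
    uint64_t size_;      // total size of the block in bytes
    uint64_t head_ = 0;  // first unused byte
};
```

The returned offset is what would be passed to `vkBindBufferMemory` or `vkBindImageMemory`. When buffers and images share a block, one conservative choice is to pass the larger of `VkMemoryRequirements::alignment` and `bufferImageGranularity` as the alignment. A linear allocator cannot free individual suballocations, so long-lived resources need a free-list or similar scheme instead.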
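A device's general-purpose memory has to support suballocation, memory aliasing, and sparse binding, and that generality can get in the way of optimizations for special cases. A device may therefore offer dedicated memory that gives better access performance in specific scenarios.

Dedicated allocation requires enabling the device extension `VK_KHR_dedicated_allocation` (promoted to core in Vulkan 1.1), together with the following structures:

- `VkMemoryDedicatedRequirements`
- `VkMemoryDedicatedAllocateInfo`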
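```cpp
// Describe which image is being queried.
VkImageMemoryRequirementsInfo2 memoryReqsInfo = {};
memoryReqsInfo.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_REQUIREMENTS_INFO_2;
memoryReqsInfo.image = image;
// Chain VkMemoryDedicatedRequirements so the same query fills it in.
VkMemoryDedicatedRequirements memDedicatedReq = {};
memDedicatedReq.sType = VK_STRUCTURE_TYPE_MEMORY_DEDICATED_REQUIREMENTS;
VkMemoryRequirements2 memoryReqs2 = {};
memoryReqs2.sType = VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2;
memoryReqs2.pNext = &memDedicatedReq;
vkGetImageMemoryRequirements2(vkDevice, &memoryReqsInfo, &memoryReqs2);
```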
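The query result is returned in `VkMemoryDedicatedRequirements`. When its `prefersDedicatedAllocation` member is `VK_TRUE`, chain a `VkMemoryDedicatedAllocateInfo` into the allocation:

```cpp
VkMemoryDedicatedAllocateInfo dedicatedInfo = {};
dedicatedInfo.sType = VK_STRUCTURE_TYPE_MEMORY_DEDICATED_ALLOCATE_INFO;
dedicatedInfo.image = image;
// FindProperties returns -1 when no memory type matches, so check before use.
const int32_t typeIndex = FindProperties(&phyMemProps, memoryReqs2.memoryRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
VkDeviceMemory memory = VK_NULL_HANDLE;
if (typeIndex >= 0) {
    VkMemoryAllocateInfo memoryAllocateInfo = {};
    memoryAllocateInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    // Chain the dedicated info only when the driver prefers a dedicated allocation.
    memoryAllocateInfo.pNext = memDedicatedReq.prefersDedicatedAllocation ? &dedicatedInfo : nullptr;
    memoryAllocateInfo.allocationSize = memoryReqs2.memoryRequirements.size;
    memoryAllocateInfo.memoryTypeIndex = static_cast<uint32_t>(typeIndex);
    // Treat any result other than VK_SUCCESS as "no memory".
    if (vkAllocateMemory(vkDevice, &memoryAllocateInfo, nullptr, &memory) != VK_SUCCESS) {
        memory = VK_NULL_HANDLE;
    }
}
```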