MongoDB Indexes Deep Dive: Understanding Indexes

Get a performance boost from indexes by understanding their data structure, how they are stored on disk, how they are loaded into memory, and how the query optimizer decides which index to select.

A basic understanding of indexes is required, i.e. what indexes are, the index types, and how to create them: https://docs.mongodb.com/manual/indexes/

  • Data Structure
  • Storage on Disk
  • Memory Allocation

Data Structure

An index on a field or fields is stored in the order we specify, using a B-tree data structure. Let's see what "stored in order" means and how it helps.

  • An index is built on the values of the field, and each entry references the actual stored document.
  • Using a B-tree index significantly reduces the number of comparisons needed to find a document.
[Figure. Source: MongoDB University]
  • Likewise, in the picture below, with an index (sky-blue line) the number of documents examined stays limited even as documents are added, compared with no index (COLLSCAN).
[Figure. Source: MongoDB University]
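
To get a feel for the reduction in comparisons, here is a minimal sketch in plain JavaScript. It is not MongoDB's B-tree: it assumes a sorted array of keys and a binary search as a stand-in for an ordered index, and the function names are made up for illustration.

```javascript
// Full scan: compare every value until the target is found (like a COLLSCAN).
function scanComparisons(values, target) {
  let comparisons = 0;
  for (const v of values) {
    comparisons++;
    if (v === target) break;
  }
  return comparisons;
}

// Ordered lookup: binary search over sorted keys (a stand-in for an index).
function sortedSearchComparisons(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    comparisons++; // one probe of the sorted keys
    if (sortedKeys[mid] === target) break;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return comparisons;
}

const keys = Array.from({ length: 1000000 }, (_, i) => i);
const scan = scanComparisons(keys, 999999);
const indexed = sortedSearchComparisons(keys, 999999);
console.log(scan, indexed);
```

With a million keys the scan needs a million comparisons for the last key, while the ordered lookup needs at most about twenty probes.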

Storage on Disk

Let's visualize how indexes are stored on disk. The on-disk index is managed by the database storage engine itself.

  • Prefix index compression: a repeated prefix value is not written again. Let's look at an example to understand what that means.
// Create a compound index on showDate, seatNo and status (all ascending)
db.getCollection("movieTicket")
.createIndex({"showDate":1, "seatNo":1, "status":1});

How the index {"showDate": 1, "seatNo": 1, "status": 1} is stored on disk:

[Figure: showDate_1_seatNo_1_status_1]

How the index {"seatNo": 1, "showDate": 1, "status": 1} is stored on disk:

[Figure: seatNo_1_showDate_1_status_1]

The compound index size may vary depending on the position of each index field (cardinality/selectivity). In this example, the index "seatNo_1_showDate_1_status_1" will take more space than the index "showDate_1_seatNo_1_status_1", because fewer leading values repeat between adjacent keys when seatNo comes first.

The example above is meant to help explain how prefix compression works; it is not a suggested criterion for index selection.
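
The effect of field order on prefix compression can be sketched with a toy model. This is only an illustration under assumptions: keys are plain strings joined with "|", each key stores just the part that differs from the previous key in sorted order, and the sample dates, seats and status values are invented. WiredTiger's real key encoding is different, and the result depends on the data.

```javascript
// Length of the prefix shared by two strings.
function commonPrefixLength(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Toy prefix compression: in sorted order, store only the suffix of each key
// that differs from the previous key, and sum up the stored characters.
function compressedSize(docs, fields) {
  const keys = docs.map(d => fields.map(f => d[f]).join("|")).sort();
  let total = 0;
  let previous = "";
  for (const key of keys) {
    total += key.length - commonPrefixLength(previous, key);
    previous = key;
  }
  return total;
}

// Hypothetical sample data: 2 show dates x 4 seats, all with the same status.
const docs = [];
for (const showDate of ["2020-05-01", "2020-05-02"]) {
  for (const seatNo of ["A1", "A2", "A3", "A4"]) {
    docs.push({ showDate, seatNo, status: "AVAILABLE" });
  }
}

const dateFirst = compressedSize(docs, ["showDate", "seatNo", "status"]);
const seatFirst = compressedSize(docs, ["seatNo", "showDate", "status"]);
console.log(dateFirst, seatFirst); // 103 133
```

In this toy data set the showDate-first order stores fewer characters, because adjacent keys share a longer prefix.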

  • Parallelized I/O operations, by storing indexes in their own separate subdirectory, which can be linked to a different disk.

    Start the mongod server with the "--wiredTigerDirectoryForIndexes" option.

    Refer to:

    https://docs.mongodb.com/manual/reference/program/mongod/#cmdoption-mongod-wiredtigerdirectoryforindexes

Memory Allocation

Indexes perform best when they fit in memory. But what if they do not? Can we still optimize by understanding how memory allocation for indexes works?

Memory here means the memory allocated to MongoDB, not the total RAM on the machine. To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.

// Find the internal memory allocated to the WiredTiger cache
db.serverStatus().wiredTiger.cache;
// OUTPUT (excerpt) - "maximum bytes configured" : 16586375168

// Find the system memory
db.hostInfo();
// OUTPUT (excerpt) -
{
"system" : {
..
"memSizeMB" : 32660,
"memLimitMB" : 32660,
"numCores" : 8,
....
},
...
}
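
The two numbers above are related: by default the WiredTiger internal cache is the larger of 50% of (RAM minus 1 GB) and 256 MB. Below is a small sketch of that arithmetic plus a rough "does the index fit" check. The helper names and the 2 GB index size are hypothetical; in the mongo shell the real index size could be taken from db.movieTicket.totalIndexSize().

```javascript
const MB = 1024 * 1024;

// Default WiredTiger cache size: the larger of 50% of (RAM - 1 GB) and 256 MB.
function defaultCacheBytes(memSizeMB) {
  return Math.max(0.5 * (memSizeMB - 1024), 256) * MB;
}

// Rough check only: the cache also holds collection data, so an index that
// "fits" by this test may still compete for space with documents.
function indexFitsInCache(totalIndexBytes, cacheBytes) {
  return totalIndexBytes <= cacheBytes;
}

const cacheBytes = defaultCacheBytes(32660); // memSizeMB from db.hostInfo()
console.log(cacheBytes); // 16586375168, matching "maximum bytes configured"

const assumedIndexBytes = 2 * 1024 * MB; // hypothetical 2 GB of indexes
console.log(indexFitsInCache(assumedIndexBytes, cacheBytes));
```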

The best case is when the index fits completely in memory.

[Figure. Source: MongoDB University]

But what if it doesn't fit into memory?

[Figure. Source: MongoDB University]

Then, as the index is traversed, its older pages are flushed from memory and the most recently used ones are loaded into memory.

[Figure. Source: MongoDB University]
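
The flush-and-load behaviour can be imitated with a tiny page cache. This sketch assumes a plain least-recently-used policy and page numbers as simple integers; WiredTiger's actual eviction is more elaborate, so treat it only as a mental model.

```javascript
// Count how many page reads (cache misses) a sequence of page accesses causes
// when the cache holds at most `capacity` pages and evicts the least recently used.
function countPageReads(accesses, capacity) {
  const cache = new Map(); // Map keeps insertion order: first key = least recent
  let reads = 0;
  for (const page of accesses) {
    if (cache.has(page)) {
      cache.delete(page); // hit: re-inserted below as most recent
    } else {
      reads++; // miss: the page must be read from disk
      if (cache.size >= capacity) {
        const leastRecent = cache.keys().next().value;
        cache.delete(leastRecent);
      }
    }
    cache.set(page, true);
  }
  return reads;
}

// Accesses that stay on one page versus accesses that cycle over four pages,
// both with room for only two pages in the cache.
const clusteredReads = countPageReads([0, 0, 0, 0, 0, 0, 0, 0], 2);
const scatteredReads = countPageReads([0, 1, 2, 3, 0, 1, 2, 3], 2);
console.log(clusteredReads, scatteredReads); // 1 8
```

When the working set is larger than the cache, every access in the scattered pattern turns into a disk read.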

Consider a scenario in which the index in use is traversed continuously, i.e. the index keys for the queried field values are spread across many pages [and the index doesn't fit completely into memory]. This leads to a lot of page-in/page-out and disk I/O, which greatly impacts the performance of the query/index. [See the picture below.]

For example, consider the queries below. Use hint() with a query to force the use of the index of your choice.

db.movieTicket.find({"showDate":"", "seatNo": {$in: ['A1', 'A2', 'A3', 'A4']}}).hint({"showDate": 1, "seatNo": 1, "status": 1});

A single page will be read into memory.

db.movieTicket.find({"showDate":"", "seatNo": {$in: ['A1', 'A2', 'A3', 'A4']}}).hint({"seatNo": 1, "showDate": 1, "status": 1});

Multiple pages will be read into memory.
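
Why one index order touches a single page and the other touches several can be sketched as follows. The model is an assumption: the index is a sorted list of keys cut into pages of 8 keys each, only leaf pages are counted, and the dates and seats are invented sample data.

```javascript
const KEYS_PER_PAGE = 8; // assumed; real pages are sized in bytes, not keys

// Build a toy index: one "|"-joined key per document, kept in sorted order.
function buildIndex(docs, fields) {
  return docs.map(d => fields.map(f => d[f]).join("|")).sort();
}

// Number of distinct pages that hold the wanted keys.
function pagesTouched(index, wantedKeys) {
  const pages = new Set();
  for (const key of wantedKeys) {
    const position = index.indexOf(key);
    if (position !== -1) pages.add(Math.floor(position / KEYS_PER_PAGE));
  }
  return pages.size;
}

// Hypothetical data: 8 show dates x 4 seats.
const seats = ["A1", "A2", "A3", "A4"];
const dates = Array.from({ length: 8 }, (_, i) => `2020-05-0${i + 1}`);
const docs = [];
for (const showDate of dates) {
  for (const seatNo of seats) docs.push({ showDate, seatNo });
}

const byDate = buildIndex(docs, ["showDate", "seatNo"]);
const bySeat = buildIndex(docs, ["seatNo", "showDate"]);

// Query: one show date, seats A1..A4.
const target = "2020-05-03";
const dateFirstPages = pagesTouched(byDate, seats.map(s => `${target}|${s}`));
const seatFirstPages = pagesTouched(bySeat, seats.map(s => `${s}|${target}`));
console.log(dateFirstPages, seatFirstPages); // 1 4
```

With showDate first, the four matching keys sit next to each other on one page; with seatNo first, each seat's key for that date lands on a different page.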

To check the information on index pages read into the cache and evicted from it:

// Collection stats with indexDetails give lots of detailed information.
var stats = db.movieTicket.stats({indexDetails:true});
// We will just look at the cache information.
stats.indexDetails.showDate_1_seatNo_1_status_1.cache;
stats.indexDetails.seatNo_1_showDate_1_status_1.cache;
// OUTPUT - {
"bytes currently in the cache" : 210148,
"bytes dirty in the cache cumulative" : 0,
"bytes read into cache" : 79139,
"bytes written from cache" : 0,
"checkpoint blocked page eviction" : 0,
"data source pages selected for eviction unable to be evicted" : 0,
"eviction walk passes of a file" : 0,
"eviction walk target pages histogram - 0-9" : 0,
"eviction walk target pages histogram - 10-31" : 0,
"eviction walk target pages histogram - 128 and higher" : 0,
"eviction walk target pages histogram - 32-63" : 0,
"eviction walk target pages histogram - 64-128" : 0,
"eviction walks abandoned" : 0,
"eviction walks gave up because they restarted their walk twice" : 0,
"eviction walks gave up because they saw too many pages and found no candidates" : 0,
"eviction walks gave up because they saw too many pages and found too few candidates" : 0,
"eviction walks reached end of tree" : 0,
"eviction walks started from root of tree" : 0,
"eviction walks started from saved location in tree" : 0,
"hazard pointer blocked page eviction" : 0,
"in-memory page passed criteria to be split" : 0,
"in-memory page splits" : 0,
"internal pages evicted" : 0,
"internal pages split during eviction" : 0,
"leaf pages split during eviction" : 0,
"modified pages evicted" : 0,
"overflow pages read into cache" : 0,
"page split during eviction deepened the tree" : 0,
"page written requiring cache overflow records" : 0,
"pages read into cache" : 6,
"pages read into cache after truncate" : 0,
"pages read into cache after truncate in prepare state" : 0,
"pages read into cache requiring cache overflow entries" : 0,
"pages requested from the cache" : 70,
"pages seen by eviction walk" : 0,
"pages written from cache" : 0,
"pages written requiring in-memory restoration" : 0,
"tracked dirty bytes in the cache" : 0,
"unmodified pages evicted" : 0
}

As more pages are read into memory and evicted from it, more disk reads are required, which leads to a performance hit.
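
One way to compare the two indexes is to turn the cache counters into a single ratio. The helper below is a sketch: treating "pages read into cache" divided by "pages requested from the cache" as a miss ratio is my own rough reading of these counters, not an official metric, and the function name is hypothetical.

```javascript
// Rough miss ratio: share of requested pages that had to be read into the cache.
function cacheMissRatio(cacheStats) {
  const requested = cacheStats["pages requested from the cache"];
  const read = cacheStats["pages read into cache"];
  return requested === 0 ? 0 : read / requested;
}

// Values taken from the output above; in the mongo shell you could pass
// stats.indexDetails.showDate_1_seatNo_1_status_1.cache instead.
const sample = {
  "pages read into cache": 6,
  "pages requested from the cache": 70
};
console.log(cacheMissRatio(sample).toFixed(3)); // 0.086
```

A higher ratio for one index than for another, under the same queries, would suggest that it is pulling more pages from disk.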

  • https://university.mongodb.com/courses/M201/about
  • https://docs.mongodb.com/manual/indexes/

Understanding how an index is structured, stored on disk, and loaded into memory helps in creating better-optimized indexes. There are still many more deciding factors that come into the picture when selecting an index. Hope this is helpful; do share your thoughts.

Happy indexing, thank you :)

Translated from: https://medium.com/swlh/mongodb-indexes-deep-dive-understanding-indexes-9bcec6ed7aa6
