MySQL创建自定义哈希索引
如果存储引擎不支持哈希索引,则可以模拟像Memory存储引擎一样创建哈希索引,这样可以享受哈希索引的便利,例如只需要很小的索引就可以为超长的键创建索引。
思路很简单:在B-Tree基础上创建一个伪哈希索引。这和真正的哈希索引不是一回事。因为还是使用真正的哈希索引进行查找,但是它使用哈希值而不键本身进行索引查找。你需要做的就是在查询的where子句中手动指定使用哈希函数。
下面我们来看一个例子:
mysql> CREATE TABLE pseudohash( id int unsigned not null auto_increment, url varchar(255) not null, url_crc int unsigned not null default 0, primary key(id) ); Query OK, 0 rows affected mysql> describe pseudohash; +---------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | url | varchar(255) | NO | | NULL | | | url_crc | int(10) unsigned | NO | | 0 | | +---------+------------------+------+-----+---------+----------------+ 3 rows in set
我们可能会存储大量的url,并需要根据url进行搜索查找。如果使用B-Tree来存储URL,存储的内容就会很大,因为URL本身都很长。正常情况下,我们会在url列上建立索引,使用如下的方式来查找特定的url。
创建url列上的索引
mysql> create unique index url_index on pseudohash (url); Query OK, 0 rows affected Records: 0 Duplicates: 0 Warnings: 0 mysql> show index from pseudohash; +------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | pseudohash | 0 | PRIMARY | 1 | id | A | 1 | NULL | NULL | | BTREE | | | | pseudohash | 0 | url_index | 1 | url | A | 1 | NULL | NULL | | BTREE | | | +------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 2 rows in set
插入一条数据
mysql> insert into pseudohash (url) values('www.cnivi.com'); Query OK, 1 row affected mysql> select * from pseudohash; +----+---------------+------------+ | id | url | url_crc | +----+---------------+------------+ | 3 | www.cnivi.com | 0 | +----+---------------+------------+ 2 rows in set
你可能会使用这种方式来查询
mysql> select url from pseudohash where url="www.cnivi.com"; +---------------+ | url | +---------------+ | www.cnivi.com | +---------------+ 1 row in set
这种方式可见的坏处就是url本身可能很长,使用B-Tree存储的内容太大。
解决方法就是删除掉url列上的索引,而新增一个被索引的url_crc列
如下面操作:
mysql> show index from pseudohash; +------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | pseudohash | 0 | PRIMARY | 1 | id | A | 2 | NULL | NULL | | BTREE | | | | pseudohash | 0 | url_index | 1 | url | A | 2 | NULL | NULL | | BTREE | | | +------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 2 rows in set mysql> drop index url_index on pseudohash; Query OK, 0 rows affected Records: 0 Duplicates: 0 Warnings: 0 mysql> show index from pseudohash; +------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | pseudohash | 0 | PRIMARY | 1 | id | A | 1 | NULL | NULL | | BTREE | | | +------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 1 row in set mysql> create index url_crc_index on pseudohash (url_crc); Query OK, 0 rows affected Records: 0 Duplicates: 0 Warnings: 0 mysql> show index from pseudohash; +------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | pseudohash | 0 | PRIMARY | 1 | id | A | 1 | NULL | NULL | | BTREE | | | | pseudohash | 1 | url_crc_index | 1 | url_crc | A | 1 | NULL | NULL | | BTREE | | | +------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 2 rows in set mysql>
使用CRC32做哈希,就可以使用下面这中方式查询:
mysql> insert into pseudohash (url,url_crc) values('www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz',crc32('www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz')); Query OK, 1 row affected mysql> select * from pseudohash; +----+---------------------------------------------------+------------+ | id | url | url_crc | +----+---------------------------------------------------+------------+ | 1 | www.ceit.com | 3722524106 | | 5 | www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz | 2788083859 | +----+---------------------------------------------------+------------+ 2 rows in set mysql> select * from pseudohash where url = 'www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz' and url_crc = crc32('www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz'); +----+---------------------------------------------------+------------+ | id | url | url_crc | +----+---------------------------------------------------+------------+ | 5 | www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz | 2788083859 | +----+---------------------------------------------------+------------+ 1 row in set mysql>
where查询条件
where url = 'www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz' and url_crc = crc32('www.ceit.com/adasds/sdfsafwe/sadfsadfew/weqfvscxz');
这样做的性能就会很高,因为mysql优化器会使用这个选择性很高而体积很小的基于url_crc列的索引来完成查询。即使有很多个记录有相同的索引值,查找仍然很快,只需要根据哈希值做快速的整数比较就能找到索引条目,然后一一比较返回对应的行。对完整的URL字符串做索引,那就会非常慢。。。。
处理哈希冲突的情况:要避免冲突问题,必须在where条件中带入哈希值和对应列值。
如果采用这种方式,记住不要使用SHA1()和MD5()作为哈希函数。因为这两个哈希函数计算出来的哈希值是非常长的字符串,会浪费大量空间,比较时也会更慢。
创建触发器,维护哈希值
mysql> delimiter // mysql> CREATE TRIGGER pseudohash_crc_insert BEFORE INSERT ON pseudohash FOR EACH ROW BEGIN SET NEW.url_crc = crc32(NEW.url); END; // Query OK, 0 rows affected mysql> CREATE TRIGGER pseudohash_crc_update BEFORE UPDATE ON pseudohash FOR EACH ROW BEGIN SET NEW.url_crc = crc32(NEW.url); END;// Query OK, 0 rows affected mysql> delimiter ; mysql>
查看触发器
mysql> SELECT * FROM information_schema.triggers WHERE TRIGGER_NAME='pseudohash_crc_insert' \G *************************** 1. row *************************** TRIGGER_CATALOG: def TRIGGER_SCHEMA: test TRIGGER_NAME: pseudohash_crc_insert EVENT_MANIPULATION: INSERT EVENT_OBJECT_CATALOG: def EVENT_OBJECT_SCHEMA: test EVENT_OBJECT_TABLE: pseudohash ACTION_ORDER: 0 ACTION_CONDITION: NULL ACTION_STATEMENT: BEGIN SET NEW.url_crc = crc32(NEW.url); END ACTION_ORIENTATION: ROW ACTION_TIMING: BEFORE ACTION_REFERENCE_OLD_TABLE: NULL ACTION_REFERENCE_NEW_TABLE: NULL ACTION_REFERENCE_OLD_ROW: OLD ACTION_REFERENCE_NEW_ROW: NEW CREATED: NULL SQL_MODE: NO_ENGINE_SUBSTITUTION DEFINER: root@localhost CHARACTER_SET_CLIENT: utf8 COLLATION_CONNECTION: utf8_general_ci DATABASE_COLLATION: latin1_swedish_ci 1 row in set (0.09 sec) mysql>
测试触发器
mysql> insert into pseudohash (url) values('www.12345.com'); Query OK, 1 row affected mysql> select * from pseudohash where id = 6; +----+---------------+------------+ | id | url | url_crc | +----+---------------+------------+ | 6 | www.12345.com | 1421191900 | +----+---------------+------------+ 1 row in set mysql> update pseudohash set url = 'www.111111.com' where id = 6; Query OK, 1 row affected Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from pseudohash where id = 6; +----+----------------+------------+ | id | url | url_crc | +----+----------------+------------+ | 6 | www.111111.com | 1825805409 | +----+----------------+------------+ 1 row in set mysql>
====END====