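Aerospike (AS for short) is a distributed, scalable key-value NoSQL database. It serves highly concurrent reads and writes over terabyte-scale structured data at sub-millisecond latency, with 99% of responses completing within 1 millisecond. It uses a hybrid architecture: indexes are kept in memory, while the data itself can be stored on hard disk drives (HDD) or solid-state drives (SSD), or in memory as well. When accessing an SSD, AS bypasses the file system layer and addresses the device directly, which keeps reads fast. AS also supports secondary indexes and client-side aggregation, along with simple SQL-like operations through aql, which gives it some advantages over other NoSQL databases.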
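Personalized ad recommendation builds on understanding each consumer's individual preferences and habits. It predicts or guides the consumer's purchasing needs and presents ads that closely match those needs in the right place, at the right time, and in the right form, in order to encourage purchases.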
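The user behavior log collection system gathers logs and pushes them to ETL for data cleaning and transformation.
The data coming out of ETL is sent to the recommendation engine, which computes the recommendations for each consumer. The recommendation logic has two parts, rules and algorithms: the rules cover items the user recently viewed, added to the cart, or added to favorites, while the algorithms include item similarity, user similarity, text similarity, and image similarity.
The recommendation engine's results are stored in the Aerospike cluster, where the ad serving engine reads them in real time.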
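The notes do not say which measure the recommendation engine uses for item similarity. As a minimal sketch, assuming items are described by numeric feature vectors, cosine similarity is one common choice. The class and method names here are hypothetical.

```java
// Hypothetical helper: cosine similarity between two item feature vectors.
// Assumption: the source does not specify the similarity measure actually used.
final class ItemSimilarity {
    private ItemSimilarity() {}

    /** Returns a value in [-1, 1]; returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // similarity is undefined for a zero vector; treat as "unrelated"
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Items whose vectors point in the same direction score close to 1, and unrelated items score close to 0, so the engine could rank candidates by this value before writing the top results to Aerospike.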
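When a user visits a site that has joined an SSP (supply-side platform), the SSP sends the request to the ad exchange (ADX), which forwards it to multiple DSPs (demand-side platforms). Each DSP bids according to how well its own DMP (data management platform) knows this user, and the DSP that wins the auction gets the chance to show its ad.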
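The auction step can be pictured with a small sketch. It assumes the simplest possible rule, that the highest bid wins; the notes do not say how the exchange prices the winning impression (first price or second price), and the `Auction` and `Bid` names are hypothetical.

```java
import java.util.List;

// Hypothetical model of one ADX auction round.
// Assumption: the highest bid wins; the pricing rule is not covered by the source.
final class Auction {
    static final class Bid {
        final String dsp;
        final double price;

        Bid(String dsp, double price) {
            this.dsp = dsp;
            this.price = price;
        }
    }

    /** Returns the winning bid, or null when no DSP bid. Ties go to the earliest bid. */
    static Bid winner(List<Bid> bids) {
        Bid best = null;
        for (Bid b : bids) {
            if (best == null || b.price > best.price) {
                best = b;
            }
        }
        return best;
    }
}
```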
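The key to winning a DSP bid (RTB: real-time bidding) is whether the DMP can analyze and pin down user attributes from data such as browsing history. The user profile (UserProfile) is therefore a very important part of real-time bidding ads.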
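Logs are analyzed offline and in real time with HDFS and HBase respectively, and the resulting user profile tags (for example: programmer, homebody, ...) are stored in Aerospike, a high-performance NoSQL database, while the data is also backed up to a remote data center. For each front-end ad request, the decision engine (serving engine) reads the matching user profile from the profile database and then bids at the price given by its bidding algorithm. Once the bid wins, the ad can be shown. Which ad is actually shown to the user after a winning bid is decided by the personalized recommendation described above.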
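Aerospike is a NoSQL data store, whereas Redis is a cache.
Aerospike is multi-threaded, whereas Redis is single-threaded.
Redis requires developers to manage sharding themselves and to supply the sharding algorithm that balances data across shards, for example:
client: hashing or consistent hashing
codis: a proxy handles the sharding
RedisCluster: hash slots
AerospikeDB handles the equivalent of sharding automatically.
To increase throughput in Redis, you have to add shards, rework the sharding algorithm, and rebalance the data, which usually requires downtime;
in AerospikeDB, data volume and throughput can be increased dynamically without downtime, and AerospikeDB rebalances data and traffic automatically.
In Redis, replication and failover require developers to synchronize data themselves at the application layer;
in AerospikeDB, you only set the replication factor, and AerospikeDB performs synchronous replication to maintain immediate consistency; failover is also handled transparently.
Redis runs in memory; AerospikeDB keeps indexes in memory and stores data on HDD or SSD, or optionally in memory.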
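To make the client-side option in the comparison above concrete, here is a minimal sketch of a consistent hash ring of the kind an application might maintain for Redis shards. It is an illustration only: the class name, the CRC32 hash, and the virtual-node scheme are assumptions, not the algorithm of any particular Redis client.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical client-side consistent hash ring with virtual nodes.
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is used only because it ships with the JDK; any stable hash would do.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            // Two-argument remove: only drop slots that really belong to this node.
            ring.remove(hash(node + "#" + i), node);
        }
    }

    /** Walks clockwise from the key's hash to the first slot, wrapping around at the end. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Long slot = ring.ceilingKey(hash(key));
        return ring.get(slot != null ? slot : ring.firstKey());
    }
}
```

When a shard is removed, only the keys that lived on it move to another shard, but the application still has to migrate that data itself. This is the bookkeeping that Aerospike takes over.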
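Aerospike is organized into three layers:
Client layer:
Distribution layer:
Manages balanced data distribution, replication, and fault tolerance within a cluster, as well as data synchronization between clusters. It consists of three modules:
Cluster management: tracks the nodes in the cluster. The key algorithm is a Paxos-like consensus vote that determines which nodes are part of the cluster.
Aerospike also implements dedicated heartbeats (active and passive) to monitor connectivity between nodes.
Data migration: when nodes are added or removed, this module redistributes the data and ensures that every piece of data is replicated across nodes and data centers according to the configured replication factor.
Transaction processing: ensures read/write consistency and isolation; a write goes to the replicas first and then to the master. This module includes:
Sync/Async Replication: to guarantee write consistency, updates are propagated to all replicas before the data is committed and the result is returned to the client.
Proxy: during cluster reconfiguration a client's view of the cluster may be briefly out of date, so the request is transparently proxied to another node.
Duplicate Resolution: when the cluster recovers from a network partition, resolves conflicts between the different copies of the data.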
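The notes do not describe how conflicting copies are compared during duplicate resolution. As a sketch of one plausible policy, assume each copy carries a generation counter (incremented on every write) and a last-update timestamp, and the copy with the higher generation wins. All names here are hypothetical.

```java
// Hypothetical duplicate-resolution rule: prefer the higher generation,
// and fall back to the later update time when generations are equal.
final class DuplicateResolution {
    static final class Copy {
        final int generation;        // assumed to grow by one on each write
        final long lastUpdateMillis; // assumed wall-clock time of the last write
        final String value;

        Copy(int generation, long lastUpdateMillis, String value) {
            this.generation = generation;
            this.lastUpdateMillis = lastUpdateMillis;
            this.value = value;
        }
    }

    static Copy resolve(Copy a, Copy b) {
        if (a.generation != b.generation) {
            return a.generation > b.generation ? a : b;
        }
        return a.lastUpdateMillis >= b.lastUpdateMillis ? a : b;
    }
}
```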
Data layer:
Responsible for storing the data. Aerospike is a schema-less key-value database; data is organized into namespaces, sets, records, and bins.
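# Download the server package; name the output file explicitly, otherwise wget saves it as "el6"
wget -O aerospike-server.tgz https://www.aerospike.com/download/server/latest/artifact/el6
tar -zxvf aerospike-server.tgz
# The extracted directory carries the version, e.g. aerospike-server-community-5.0.0.7-el6
mv aerospike-server-community-*-el6 aerospike-server
cd aerospike-server
./asinstall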
[root@192 aerospike-server]# yum list installed | grep aerospike
aerospike-server-community.x86_64 5.0.0.7-1.el6 installed
aerospike-tools.x86_64 3.26.2-1.el6 installed
# Uninstall Aerospike
[root@localhost ~]# rpm -e aerospike-server-community.x86_64
[root@localhost ~]# rpm -e aerospike-tools.x86_64
[root@localhost ~]# rm -rf /etc/aerospike/
systemctl start aerospike
systemctl stop aerospike
systemctl restart aerospike
systemctl status aerospike
asadm  # enter the admin console
Admin> info
Admin> i net
aql> show namespaces
+------------+
| namespaces |
+------------+
| "test"     |
| "bar"      |
+------------+
Aerospike | MySQL |
namespace | database |
set | table |
bin | column |
record | row |
key (PK) | pk |
-- Records with different primary keys may be inserted with different bins
INSERT INTO <ns>[.<set>] (PK, <bins>) VALUES (<key>, <values>)
DELETE FROM <ns>[.<set>] WHERE PK = <key>
<ns> is the namespace for the record.
<set> is the set name for the record.
<key> is the record's primary key.
<bins> is a comma-separated list of bin names.
<values> is a comma-separated list of bin values.
There is no UPDATE statement.
An INSERT with an existing PK modifies that record.
Examples:
INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')
DELETE FROM test.demo WHERE PK = 'key1'
insert into test.user1 (PK,name,age,sex,address) VALUES (2,'zhaoyun',21, 'M','beijing')
insert into test.user2(pk,name,sex,age) values(1,'zhangfei','M',23)
-- Same PK (1), so this modifies the existing record
insert into test.user2(pk,name,sex,age) values(1,'diaochan','F',18)
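The "insert on an existing PK modifies the record" behavior can be modeled in a few lines. This is an in-memory sketch, not Aerospike code; it assumes the default write behavior merges the written bins into the existing record, and the `UpsertModel` name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one set: primary key -> (bin name -> bin value).
// Assumption: a write to an existing key merges its bins into the stored record.
final class UpsertModel {
    private final Map<Object, Map<String, Object>> records = new HashMap<>();

    /** Creates the record if the key is new, otherwise overwrites or adds the given bins. */
    void insert(Object pk, Map<String, Object> bins) {
        records.computeIfAbsent(pk, k -> new HashMap<>()).putAll(bins);
    }

    /** Returns the record's bins, or null when the key does not exist. */
    Map<String, Object> select(Object pk) {
        return records.get(pk);
    }

    /** Returns true when a record was actually removed. */
    boolean delete(Object pk) {
        return records.remove(pk) != null;
    }
}
```

Under this model the two inserts into test.user2 above leave a single record for PK 1 whose name, sex, and age bins hold the values from the second statement.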
QUERY
SELECT <bins> FROM <ns>[.<set>]
SELECT <bins> FROM <ns>[.<set>] WHERE <bin> = <value>
SELECT <bins> FROM <ns>[.<set>] WHERE <bin> BETWEEN <lower> AND <upper>
SELECT <bins> FROM <ns>[.<set>] WHERE PK = <key>
SELECT <bins> FROM <ns>[.<set>] IN <indextype> WHERE <bin> = <value>
SELECT <bins> FROM <ns>[.<set>] IN <indextype> WHERE <bin> BETWEEN <lower> AND <upper>
<ns> is the namespace for the records to be queried.
<set> is the set name for the record to be queried.
<key> is the record's primary key.
<bin> is the name of a bin.
<value> is the value of a bin.
<indextype> is the type of index the user wants to query (LIST/MAPKEYS/MAPVALUES).
<bins> can be either a wildcard (*) or a comma-separated list of bin names.
<lower> is the lower bound for a numeric range query.
<upper> is the upper bound for a numeric range query.
Examples:
SELECT * FROM test.demo
SELECT * FROM test.demo WHERE PK = 'key1'
SELECT foo, bar FROM test.demo WHERE PK = 'key1'
SELECT foo, bar FROM test.demo WHERE foo = 123
SELECT foo, bar FROM test.demo WHERE foo BETWEEN 0 AND 999
select * from test.user2 where name='zhaoyun'
-- No index exists on the name bin yet, so the query fails
Error: (201) AEROSPIKE_ERR_INDEX_NOT_FOUND
create index idx_1 on test.user2(name) string
select * from test.user2 where name='zhaoyun'
+-----------+-----+-----+-----------+
| name | sex | age | address |
+-----------+-----+-----+-----------+
| "zhaoyun" | "M" | 21 | "beijing" |
+-----------+-----+-----+-----------+
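Why the first query fails and the second succeeds can be illustrated with a toy secondary index: a map from bin value to the primary keys holding that value, which must exist before an equality query on that bin can be answered. This is an in-memory sketch with hypothetical names, not how the server stores its indexes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a set with optional secondary indexes on bins.
final class SecondaryIndexModel {
    private final Map<Integer, Map<String, Object>> records = new HashMap<>();
    // bin name -> (bin value -> primary keys of the records holding that value)
    private final Map<String, Map<Object, Set<Integer>>> indexes = new HashMap<>();

    /** Stores (replaces) a record and keeps every existing index up to date. */
    void put(int pk, Map<String, Object> bins) {
        Map<String, Object> old = records.put(pk, new HashMap<>(bins));
        for (Map.Entry<String, Map<Object, Set<Integer>>> idx : indexes.entrySet()) {
            String bin = idx.getKey();
            if (old != null && old.containsKey(bin)) {
                Set<Integer> pks = idx.getValue().get(old.get(bin));
                if (pks != null) {
                    pks.remove(pk);
                }
            }
            if (bins.containsKey(bin)) {
                idx.getValue().computeIfAbsent(bins.get(bin), v -> new TreeSet<>()).add(pk);
            }
        }
    }

    /** Builds an index over the records already stored, like CREATE INDEX. */
    void createIndex(String bin) {
        Map<Object, Set<Integer>> idx = new HashMap<>();
        for (Map.Entry<Integer, Map<String, Object>> e : records.entrySet()) {
            Object v = e.getValue().get(bin);
            if (v != null) {
                idx.computeIfAbsent(v, x -> new TreeSet<>()).add(e.getKey());
            }
        }
        indexes.put(bin, idx);
    }

    /** Equality query on a bin; fails when the bin has no index. */
    Set<Integer> queryEquals(String bin, Object value) {
        Map<Object, Set<Integer>> idx = indexes.get(bin);
        if (idx == null) {
            throw new IllegalStateException("index not found for bin " + bin);
        }
        return idx.getOrDefault(value, new TreeSet<>());
    }
}
```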
CREATE INDEX <index> ON <ns>[.<set>] (<bin>) NUMERIC|STRING|GEO2DSPHERE
CREATE LIST/MAPKEYS/MAPVALUES INDEX <index> ON <ns>[.<set>] (<bin>) NUMERIC|STRING|GEO2DSPHERE
CREATE INDEX idx_foo ON test.demo (foo) NUMERIC
DROP INDEX test.demo idx_foo
<dependency>
    <groupId>com.aerospike</groupId>
    <artifactId>aerospike-client</artifactId>
    <version>4.4.9</version>
</dependency>
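import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.BatchPolicy;
import com.aerospike.client.policy.WritePolicy;

public class SingleNodeDemo {
    public static void main(String[] args) {
        // Connect by IP + port
        AerospikeClient client = new AerospikeClient("192.168.127.128", 3000);
        // Write policy
        WritePolicy wp = new WritePolicy();
        // Timeout in milliseconds
        wp.setTimeout(1000);

        // Key: namespace, set, primary key
        Key k1 = new Key("test", "user1", 1);
        // Bins (name/value pairs)
        Bin b11 = new Bin("name", "zhangfei");
        Bin b12 = new Bin("sex", "M");
        Bin b13 = new Bin("age", 23);
        // Write the record
        client.put(wp, k1, b11, b12, b13);
        // Read selected bins back
        Record r1 = client.get(wp, k1, "name", "age", "sex");
        System.out.println(r1);
        System.out.println("===================================");

        Key k2 = new Key("test", "user1", 2);
        Bin b21 = new Bin("name", "diaochan");
        Bin b22 = new Bin("sex", "F");
        Bin b23 = new Bin("age", 21);
        client.put(wp, k2, b21, b22, b23);

        // Batch read: fetch the records for an array of keys
        BatchPolicy bp = new BatchPolicy(wp);
        Key[] ks = {k1, k2};
        for (Record r : client.get(bp, ks)) {
            System.out.println(r);
        }
        // Release the client's connections
        client.close();
    }
}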
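Aerospike cluster management
Consider a cluster of four nodes: each node is the master for 1/4 of the data and also holds replicas of another 1/4. If node 1 becomes unreachable, the replicas of node 1's data are copied to other nodes.
The replication factor is a configuration parameter and cannot exceed the number of nodes in the cluster. More replicas mean higher reliability,
but also more work per write request, since every write must go through all replicas. In practice most deployments use a replication factor of 2 (one master copy and one replica). Synchronous replication guarantees immediate consistency with no data loss: a write transaction is propagated to all replicas before the data is committed and the result is returned to the client.
The client treats a write as successful only after both the master and the replica have succeeded.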
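The two rules above fit in a few lines. This is a sketch under stated assumptions: the cap on the replication factor is modeled as a simple minimum with the cluster size, and the acknowledgement rule is reduced to boolean flags. The class and method names are hypothetical.

```java
// Hypothetical model of the replication rules described above.
final class SyncReplication {
    private SyncReplication() {}

    /** Assumption: the number of copies actually kept is capped by the cluster size. */
    static int effectiveReplicationFactor(int configured, int clusterSize) {
        if (configured < 1 || clusterSize < 1) {
            throw new IllegalArgumentException("both values must be at least 1");
        }
        return Math.min(configured, clusterSize);
    }

    /** The client sees success only if the master and every replica succeeded. */
    static boolean writeSucceeded(boolean masterOk, boolean... replicaOk) {
        if (!masterOk) {
            return false;
        }
        for (boolean ok : replicaOk) {
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```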
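During cluster reconfiguration, when an Aerospike smart client sends a request to the wrong node because its view of the cluster is briefly out of date, the Aerospike cluster transparently proxies the request to the correct node.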
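A rough sketch of that routing: Aerospike hashes each key into one of 4096 partitions, and the smart client keeps a map from partition to node. The sketch below uses SHA-1 from the JDK as a stand-in for the RIPEMD-160 digest Aerospike actually uses, and the map layout and names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of partition-based routing with a transparent proxy hop.
final class PartitionRouting {
    static final int PARTITIONS = 4096;

    private PartitionRouting() {}

    /** Maps a key to a partition id in [0, 4096). SHA-1 is a stand-in hash. */
    static int partitionId(String set, String userKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] d = md.digest((set + ":" + userKey).getBytes(StandardCharsets.UTF_8));
            // Take 12 bits from the first two digest bytes.
            return ((d[0] & 0xFF) | ((d[1] & 0xFF) << 8)) & (PARTITIONS - 1);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the nodes a request passes through: the node the client picked from
     * its (possibly stale) map, plus the real owner if the first node must proxy.
     */
    static List<String> hops(int partition, String[] clientMap, String[] currentMap) {
        List<String> path = new ArrayList<>();
        String first = clientMap[partition];
        path.add(first);
        String owner = currentMap[partition];
        if (!owner.equals(first)) {
            path.add(owner); // transparent proxy hop inside the cluster
        }
        return path;
    }
}
```

Once the client refreshes its partition map, requests go straight to the owner again and the extra hop disappears.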
heartbeat {
mode multicast
multicast-group 239.1.139.1
port 3000
address 192.168.127.131
interval 150
timeout 10
}
heartbeat {
mode mesh
# add current node address here
address 192.168.127.131
port 3002
# add all cluster node address here
mesh-seed-address-port 192.168.127.131 3002
mesh-seed-address-port 192.168.127.128 3002
interval 150
timeout 10
}
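The `interval 150` and `timeout 10` settings above can be read as follows, assuming `interval` is in milliseconds and `timeout` counts missed heartbeat intervals: a peer is declared dead after roughly 150 ms x 10 = 1.5 s of silence. The sketch below models that rule; the class is hypothetical and is not the server's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical heartbeat-based failure detector.
// Assumption: a node is dead once (interval x timeout) ms pass without a heartbeat.
final class HeartbeatMonitor {
    private final long intervalMs;
    private final int timeoutIntervals;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long intervalMs, int timeoutIntervals) {
        this.intervalMs = intervalMs;
        this.timeoutIntervals = timeoutIntervals;
    }

    /** Records that a heartbeat from the node arrived at the given time. */
    void onHeartbeat(String node, long nowMs) {
        lastSeen.put(node, nowMs);
    }

    /** A node that was never seen, or has been silent too long, is not alive. */
    boolean isAlive(String node, long nowMs) {
        Long seen = lastSeen.get(node);
        return seen != null && nowMs - seen <= intervalMs * timeoutIntervals;
    }
}
```

Lower values detect failures faster but make the cluster more sensitive to short network hiccups.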
# On node 192.168.127.128
vim /etc/aerospike/aerospike.conf
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
pidfile /var/run/aerospike/asd.pid
proto-fd-max 15000
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address any
port 3000
access-address 192.168.127.128
}
heartbeat {
mode mesh
address 192.168.127.128
port 3002
#all cluster
mesh-seed-address-port 192.168.127.128 3002
mesh-seed-address-port 192.168.127.131 3002
# To use unicast-mesh heartbeats, remove the 3 lines above, and see
# aerospike_mesh.conf for alternative.
interval 150
timeout 10
}
fabric {
address any
port 3001
}
info {
address any
port 3003
}
}
namespace test {
replication-factor 2
memory-size 256M
storage-engine memory
}
namespace bar {
replication-factor 2
memory-size 256M
storage-engine memory
}
# On node 192.168.127.131: same file, with this node's own addresses
service {
user root
group root
paxos-single-replica-limit 1
# Number of nodes where the replica count is automatically reduced to 1.
pidfile /var/run/aerospike/asd.pid
proto-fd-max 15000
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address any
port 3000
access-address 192.168.127.131
}
heartbeat {
mode mesh
address 192.168.127.131
port 3002
#all cluster
mesh-seed-address-port 192.168.127.131 3002
mesh-seed-address-port 192.168.127.128 3002
# To use unicast-mesh heartbeats, remove the 3 lines above, and see
# aerospike_mesh.conf for alternative.
interval 150
timeout 10
}
fabric {
address any
port 3001
}
info {
address any
port 3003
}
}
namespace test {
replication-factor 2
memory-size 256M
storage-engine memory
}
namespace bar {
replication-factor 2
memory-size 256M
storage-engine memory
}
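import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Host;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.WritePolicy;

public class ClusterDemo {
    public static void main(String[] args) {
        // Seed hosts: both cluster nodes
        Host[] hosts = new Host[]{
            new Host("192.168.127.128", 3000),
            new Host("192.168.127.131", 3000)
        };
        ClientPolicy policy = new ClientPolicy();
        AerospikeClient client = new AerospikeClient(policy, hosts);
        // Write policy
        WritePolicy wp = new WritePolicy();
        // Timeout in milliseconds
        wp.setTimeout(500);

        Key key1 = new Key("test", "SUser", "11");
        Bin bin11 = new Bin("name", "zhangfei-c");
        Bin bin12 = new Bin("age", 25);
        Bin bin13 = new Bin("sex", "M-c");
        client.put(wp, key1, bin11, bin12, bin13);

        Key key2 = new Key("test", "SUser", "22");
        Bin bin21 = new Bin("name", "zhaoyun-c");
        Bin bin22 = new Bin("age", 24);
        Bin bin23 = new Bin("sex", "M-c");
        client.put(wp, key2, bin21, bin22, bin23);

        // Read the first record back
        Record r1 = client.get(wp, key1, "name", "age", "sex");
        System.out.println(r1);
        // Release the client's connections
        client.close();
    }
}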