kafka & storm @api2:
follow the tutorial: http://kafka.apache.org/quickstart
$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
$ tar xvfz kafka_2.11-1.0.0.tgz
$ cd kafka_2.11-1.0.0
start zookeeper:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
open another window, start kafka server:
$ bin/kafka-server-start.sh config/server.properties
open window #3 (tmux recommended), create topic "test":
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Created topic "test".
check:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
test
send message:
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
consumer:
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
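The console consumer can also be replaced by a few lines of Python. A minimal sketch, assuming the kafka-python package (installed further down in these notes) and the broker on localhost:9092; decode_value and consume are made-up helper names, not part of any library.

```python
def decode_value(raw):
    """Turn a raw Kafka message value (bytes or None) into text."""
    if raw is None:
        return ""
    return raw.decode("utf-8", errors="replace")


def consume(topic="test", servers="localhost:9092"):
    """Print every message on the topic, starting from the earliest offset."""
    # Imported here so decode_value works without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset="earliest",  # roughly what --from-beginning does
        consumer_timeout_ms=5000,      # stop iterating after 5 s with no messages
    )
    for message in consumer:
        print(decode_value(message.value))
    consumer.close()
```

With zookeeper and the broker running, calling consume() should print whatever was typed into the console producer, then return after 5 seconds of silence.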
storm
http://storm.apache.org/releases/current/Setting-up-development-environment.html
install 0.9.2 (the tarball fetched below is 0.9.7)
$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/storm/apache-storm-0.9.7/apache-storm-0.9.7.tar.gz
$ tar xvfz ...
the storm client is now available
pyleus
https://www.jianshu.com/p/224995b66a84
$ pip install pyleus
$ git clone https://github.com/Yelp/pyleus.git
let pyleus find the storm client:
$ export PATH=$PATH:/home/evan/storm/apache-storm-0.9.7/bin
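A quick way to confirm the PATH change took effect before running pyleus. This is only a sketch using the standard library; find_client is a made-up helper name, and "storm" is assumed to be the executable name inside the bin directory above.

```python
import os
import shutil


def find_client(name="storm", extra_dirs=()):
    """Return the full path of an executable, or None if it cannot be found."""
    # Search the current PATH plus any extra directories passed in.
    search_path = os.pathsep.join([os.environ.get("PATH", "")] + list(extra_dirs))
    return shutil.which(name, path=search_path)
```

If find_client() returns None in a fresh shell, the export line has not been applied there.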
$ cd pyleus/examples/word_count
$ pyleus build pyleus_topology.yaml
this produces word_count.jar
$ pyleus local word_count.jar
starts the topology locally
make a copy of the topology file as pyleus_kafka_topology.yaml, with this content:
# An ultra-simple topology which shows off Storm and the pyleus.storm library
name: word_count

topology:

    - spout:
        name: kafka-test
        type: kafka
        options:
            topic: test
            zk_hosts: localhost:2181
            zk_root: /pyleus-kafka-offsets/word_count
            consumer_id: pyleus-word_count
            from_start: true

    - bolt:
        name: split-words
        module: word_count.split_words
        parallelism_hint: 3
        groupings:
            - shuffle_grouping: kafka-test

    - bolt:
        name: count-words
        module: word_count.count_words
        parallelism_hint: 3
        groupings:
            - fields_grouping:
                component: split-words
                fields:
                    - word

    - bolt:
        name: log-results
        module: word_count.log_results
        groupings:
            - global_grouping: count-words
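To make the wiring concrete, here is a plain-Python sketch of what this topology computes, with no Storm or Kafka involved. It assumes split-words splits each message on whitespace and count-words keeps a running count per word, which is what the bolt names suggest. The function names and the CRC32 routing are illustrative stand-ins, not pyleus or Storm code.

```python
import zlib
from collections import Counter


def split_words(line):
    """Stand-in for the split-words bolt: one item per whitespace-separated word."""
    return line.split()


def route(word, tasks=3):
    """Stand-in for fields_grouping on 'word': equal words always reach the same task."""
    return zlib.crc32(word.encode("utf-8")) % tasks


def run(lines, tasks=3):
    """Feed lines through split, then grouped counting, and return the merged totals."""
    counters = [Counter() for _ in range(tasks)]  # one per count-words task
    for line in lines:
        for word in split_words(line):
            counters[route(word, tasks)][word] += 1
    # Stand-in for global_grouping: every result ends up in the single log-results task.
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total
```

Because routing depends only on the word, each count-words task sees all occurrences of the words it owns, so the per-task counts can simply be merged. For example, run(["to be or not to be"]) counts "to" and "be" twice and "or" and "not" once.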
rebuild so the jar picks up the new yaml:
$ pyleus build pyleus_kafka_topology.yaml
$ pyleus local word_count.jar
starts the topology locally. It connects to kafka.
$ tail -f /tmp/word_count_results.log
watch the log
~/kafka$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
type some text and watch the log output.
It works.
Test
then send to kafka from python:
$ sudo pip install kafka-python
create test.py as follows:
from kafka import KafkaProducer
import os
import time
from sys import argv

producer = KafkaProducer(bootstrap_servers='127.0.0.1:9092')

def log(msg):
    # prefix each message with a timestamp
    t = time.strftime("%Y-%m-%d_%H-%M-%S", time.localtime())
    print("[%s]%s" % (t, msg))

def list_file(path):
    # send every file name under path as one kafka message
    for f in os.listdir(path):
        producer.send('test', f.encode('utf-8'))  # topic name; value must be bytes
        producer.flush()
        log('send: %s' % f)

list_file(argv[1])
producer.close()
log('done')
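The script is hard to try without a running broker. A sketch that separates the listing and encoding from the Kafka connection, so the logic can be exercised with a stand-in producer first. to_bytes, send_names and make_producer are made-up names; the only kafka-python calls assumed are KafkaProducer(bootstrap_servers=...), send and flush.

```python
import os


def to_bytes(value):
    """Encode text as UTF-8; pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")


def send_names(producer, path, topic="test"):
    """Send each file name under path to the topic; return how many were sent."""
    names = sorted(os.listdir(path))
    for name in names:
        producer.send(topic, to_bytes(name))
    producer.flush()  # block until the buffered messages are delivered
    return len(names)


def make_producer(servers="127.0.0.1:9092"):
    """Build a real producer; imported lazily so the helpers need no broker."""
    from kafka import KafkaProducer
    return KafkaProducer(bootstrap_servers=servers)
```

With the broker up, send_names(make_producer(), "/bin") sends the same messages as the script, flushing once at the end instead of after every message.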
$ tail -f /tmp/word_count_results.log
watch the log
$ python test.py /bin
sends the file names under /bin to kafka
watch the log output
Everything works.
QY 20180129