kafka-storm notes

kafka & storm @api2:

follow the tutorial: http://kafka.apache.org/quickstart

$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz

$ tar xvfz kafka_2.11-1.0.0.tgz

$ cd kafka_2.11-1.0.0

start zookeeper:

$ bin/zookeeper-server-start.sh config/zookeeper.properties

open another window and start the kafka server:

$ bin/kafka-server-start.sh config/server.properties

open window #3 (tmux recommended) and create the topic "test":

$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Created topic "test".
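
The same step can also be done from Python; a minimal sketch, assuming a recent kafka-python that ships KafkaAdminClient:

from kafka.admin import KafkaAdminClient, NewTopic

# create the "test" topic (1 partition, replication factor 1) on the local broker
admin = KafkaAdminClient(bootstrap_servers='localhost:9092')
admin.create_topics(new_topics=[NewTopic(name='test', num_partitions=1, replication_factor=1)])
admin.close()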

check:

$ bin/kafka-topics.sh --list --zookeeper localhost:2181

test

send message:

$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

consumer:

$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
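
The topic can also be read from Python via kafka-python; a minimal consumer sketch, assuming the broker and topic above:

from kafka import KafkaConsumer

# read the "test" topic from the beginning and print every message
consumer = KafkaConsumer('test',
                         bootstrap_servers='localhost:9092',
                         auto_offset_reset='earliest')
for msg in consumer:
    print(msg.value)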

storm

http://storm.apache.org/releases/current/Setting-up-development-environment.html

Install 0.9.7:

$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/storm/apache-storm-0.9.7/apache-storm-0.9.7.tar.gz

$ tar xvfz apache-storm-0.9.7.tar.gz

Now we have the Storm client.

pyleus

https://www.jianshu.com/p/224995b66a84

$ pip install pyleus

$ git clone https://github.com/Yelp/pyleus.git

Add the Storm client to PATH so pyleus can find it:

$ export PATH=$PATH:/home/evan/storm/apache-storm-0.9.7/bin

$ cd examples/word_count

$ pyleus build pyleus_topology.yaml

This produces word_count.jar.

$ pyleus local word_count.jar

This starts the topology locally.

Make a copy named pyleus_kafka_topology.yaml with a Kafka spout, and rebuild the jar from it:

# An ultra-simple topology which shows off Storm and the pyleus.storm library
name: word_count
topology:
    - spout:
        name: kafka-test
        type: kafka
        options:
          topic: test
          zk_hosts: localhost:2181
          zk_root: /pyleus-kafka-offsets/word_count
          consumer_id: pyleus-word_count
          from_start: true
    - bolt:
        name: split-words
        module: word_count.split_words
        parallelism_hint: 3
        groupings:
            - shuffle_grouping: kafka-test
    - bolt:
        name: count-words
        module: word_count.count_words
        parallelism_hint: 3
        groupings:
            - fields_grouping:
                component: split-words
                fields:
                    - word
    - bolt:
        name: log-results
        module: word_count.log_results
        groupings:
            - global_grouping: count-words
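
The bolt modules referenced above (word_count.split_words etc.) come from the pyleus word_count example; each one is a small class built on pyleus.storm.SimpleBolt. A rough sketch of what split_words looks like (not the exact example code):

from pyleus.storm import SimpleBolt

class SplitWordsBolt(SimpleBolt):
    # each emitted tuple carries a single field named "word"
    OUTPUT_FIELDS = ['word']

    def process_tuple(self, tup):
        line, = tup.values
        for word in line.split():
            self.emit((word,), anchors=[tup])

if __name__ == '__main__':
    SplitWordsBolt().run()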

$ pyleus local word_count.jar

Start the topology locally; this time it is connected to Kafka.

$ tail -f /tmp/word_count_results.log

Watch the log.

~/kafka$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Type some text and watch the log output.

It works.

Test

Next, send to Kafka from Python:

$ sudo pip install kafka-python

Create test.py as follows:

from kafka import KafkaProducer
import os
import time
from sys import argv

# connect to the local Kafka broker
producer = KafkaProducer(bootstrap_servers='127.0.0.1:9092')

def log(msg):
    t = time.strftime(r"%Y-%m-%d_%H-%M-%S", time.localtime())
    print("[%s]%s" % (t, msg))

def list_file(path):
    # send every file name under `path` to the "test" topic
    for f in os.listdir(path):
        producer.send('test', f.encode('utf-8'))  # topic name; values must be bytes
        producer.flush()
        log('send: %s' % f)

list_file(argv[1])
producer.close()
log('done')

$ tail -f /tmp/word_count_results.log

Watch the log.

$ python test.py /bin

This sends the file names under /bin to Kafka.

Watch the log output.

Everything works well.

QY 20180129
