Importing JSON data into ES with the Python elasticsearch module

The system is CentOS 6.9 minimal. The ES cluster is a three-node test cluster with one master node and two data nodes; the master node's IP is 192.168.104.71.

1. Install setuptools and pip
    cd /opt
    wget https://bootstrap.pypa.io/ez_setup.py
    python ez_setup.py
    cd /opt
    wget "https://pypi.python.org/packages/source/p/pip/pip-1.5.4.tar.gz#md5=834b2904f92d46aaa333267fb1c922bb" --no-check-certificate
    tar -zxvf pip-1.5.4.tar.gz
    cd pip-1.5.4
    python setup.py install
2. Install the Python elasticsearch module
    pip install elasticsearch
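Before moving on, it can help to confirm that the modules installed in steps 1 and 2 are importable by the same interpreter that will run the script. The following is a minimal sketch; `is_installed` is a hypothetical helper name, not part of any of these packages.

```python
def is_installed(module_name):
    """Return True if module_name can be imported in this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False

# Report the status of each module the steps above are meant to install.
for name in ('setuptools', 'pip', 'elasticsearch'):
    print('%s: %s' % (name, 'ok' if is_installed(name) else 'missing'))
```

If any module is reported as missing, repeat the corresponding install step before running the import script.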
3. The code is as follows; save it as json2es.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-
 
from itertools import islice
import json , sys
from elasticsearch import Elasticsearch , helpers
import threading
 
_index = 'packets-2018-07-30'   # change to your index name
_type = 'pcap_file'     # change to your type name
es_url = 'http://192.168.104.71:9200/'  # change to your Elasticsearch server
 
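# Python 2 only: make UTF-8 the default encoding for implicit str/unicode conversions
reload(sys)
sys.setdefaultencoding('utf-8')
es = Elasticsearch(es_url)
# To create the index with an explicit mapping instead:
#es.indices.create(index='webinfo', ignore=400,body = mapping)
# ignore=400 lets the script continue if the index already exists
es.indices.create(index=_index, ignore=400)
chunk_len = 10  # number of lines sent per bulk request
num = 0         # running count of lines read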
 
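def bulk_es(chunk_data):
    # Build one bulk action per line; each line is expected to hold one JSON document.
    # Iterating over chunk_data directly means a short final chunk is still indexed.
    bulks=[]
    try:
        for line in chunk_data:
            bulks.append({
                    "_index": _index,
                    "_type": _type,
                    "_source": json.loads(line)
                })
        helpers.bulk(es, bulks)
    except Exception as e:
        # Report the failure instead of silently dropping the chunk
        sys.stderr.write('\nbulk failed: %s\n' % e)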
 
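with open(sys.argv[1]) as f:
    while True:
        # Read up to chunk_len lines; an empty list means end of file
        lines = list(islice(f, chunk_len))
        if not lines:
            print("\n")
            print("task has finished")
            break
        # Count the lines actually read, so the last partial chunk is not overcounted
        num = num + len(lines)
        sys.stdout.write('\r' + 'num:'+'%d' % num)
        sys.stdout.flush()
        bulk_es(lines)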
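
The reading loop depends on `islice` returning at most `chunk_len` lines per pass, which means the final chunk is usually shorter than the others. The sketch below shows that behaviour on an in-memory list instead of a real file; `read_chunks` is a hypothetical helper name and the sample lines are made up for illustration.

```python
from itertools import islice

def read_chunks(lines, chunk_len):
    """Yield lists of at most chunk_len items from an iterable of lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_len))
        if not chunk:
            return  # iterator exhausted
        yield chunk

# 25 made-up JSON lines split into chunks of 10
sample = ['{"id": %d}\n' % i for i in range(25)]
sizes = [len(chunk) for chunk in read_chunks(sample, 10)]
print(sizes)  # [10, 10, 5]
```

Any code that processes a chunk should therefore loop over the chunk itself rather than assume it holds exactly `chunk_len` items.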
Run:  python json2es.py test3.json
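
The post does not show the contents of test3.json. Because the script reads the file line by line, the input is presumably one JSON document per line; that is an inference from the code, not something the original states. The sketch below writes a small sample file in that layout and builds the action dictionaries in the shape accepted by `helpers.bulk`, without contacting a cluster. The field names, the file name `sample.json`, and the helper `build_actions` are all hypothetical.

```python
import json
import os
import tempfile

INDEX = 'packets-2018-07-30'  # same index name as the script
DOC_TYPE = 'pcap_file'        # same type name as the script

def build_actions(lines, index, doc_type):
    """Turn JSON lines into bulk action dicts, skipping blank lines."""
    actions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        actions.append({
            "_index": index,
            "_type": doc_type,
            "_source": json.loads(line),
        })
    return actions

# Made-up sample records; the real test3.json is not shown in the post.
records = [{"src": "10.0.0.1", "len": 60}, {"src": "10.0.0.2", "len": 1500}]
path = os.path.join(tempfile.mkdtemp(), 'sample.json')
with open(path, 'w') as f:
    for rec in records:
        f.write(json.dumps(rec) + '\n')

with open(path) as f:
    actions = build_actions(f, INDEX, DOC_TYPE)

print(len(actions))  # 2
```

A list built this way is what the script hands to `helpers.bulk(es, bulks)` for each chunk.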
