Streaming data processing in Python

1. Log in to the server directly: ssh 2014210***@thumedia.org -p 6349

Create streaming.py with touch streaming.py, then edit it as follows:

#!/usr/bin/python

import logging
import math
import time

# page -> [accumulated count, pass number in which it was last updated]
pg2count = {}
t = 1
times = ''

while 1:
    fp = open('/tmp/hw3.log', 'r')
    for line in fp:
        fields = line.strip().split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        times, page, count = fields[0], fields[1], fields[2]
        if count.isdigit() and page.startswith('Page-'):
            try:
                pg2count[page] = [pg2count[page][0] + int(count), t]
            except KeyError:
                # first time this page is seen
                pg2count[page] = [int(count), t]
    fp.close()

    # sort pages by accumulated count, largest first
    a = sorted(pg2count.items(), key=lambda page: page[1][0], reverse=True)
    print('the page rank at current time %s is:' % times)
    for i in range(0, min(10, len(a))):
        print('%s\t%d' % (a[i][0], a[i][1][0]))
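The loop above reopens /tmp/hw3.log and reads it from the beginning on every pass. If the log is append-only (an assumption, the post does not say), rows that were already counted would be added again each time. The following is a minimal sketch of one way to avoid that: remember a byte offset and parse only what was appended since the last pass. The helper names read_new_lines, update_counts and top_pages are hypothetical and not part of the original script, and heapq.nlargest is swapped in for the full sort. The sketch does not handle log rotation or truncation.

```python
import heapq
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete lines appended since byte `offset`, new offset)."""
    with open(path, 'rb') as fp:
        fp.seek(offset)
        chunk = fp.read()
    # Consume only up to the last newline so a half-written line is
    # left for the next pass (rfind returns -1 when there is none).
    end = chunk.rfind(b'\n') + 1
    lines = chunk[:end].decode('utf-8', 'replace').splitlines()
    return lines, offset + end


def update_counts(lines, counts):
    """Add the count of each valid 'time page count' row to counts."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        page, count = fields[1], fields[2]
        if count.isdigit() and page.startswith('Page-'):
            counts[page] = counts.get(page, 0) + int(count)
    return counts


def top_pages(counts, n=10):
    """Return up to n (page, count) pairs, largest count first."""
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


if __name__ == '__main__':
    # Demo on a temporary file rather than the real /tmp/hw3.log.
    path = os.path.join(tempfile.mkdtemp(), 'demo.log')
    counts, offset = {}, 0
    for batch in ('100 Page-1 3\n100 Page-2 5\n', '101 Page-1 4\n'):
        with open(path, 'a') as fp:
            fp.write(batch)
        lines, offset = read_new_lines(path, offset)
        update_counts(lines, counts)
        print(top_pages(counts))
```

In a long-running script, the two calls inside the demo loop would sit inside the while loop, followed by a time.sleep() between passes.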
