Scraping Baidu Index Data with Java

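While browsing a forum I came across a request for scraping Baidu Index data. I had just been looking into HttpClient and HTMLParser, so I gave it a try.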

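Approach:
1. Inspect the Baidu Index page and find out which character encoding it uses.
2. Submit some test keywords in the browser and watch how the URL in the address bar changes.
3. Use HttpClient to check whether the page content can be fetched.
4. Build the request URL in the same format that Baidu shows in the address bar.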
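Steps 1 and 4 are connected: the keyword in the address bar is percent-encoded using the page's character set, so the same word produces different URLs under different encodings. The sketch below illustrates this with the JDK's `URLEncoder`. It assumes the page uses GB2312 (the charset the program later in this post reads the response with) and reuses the `/main/word.php?word=` path from that program; `QueryUrlDemo` and `buildQueryUrl` are hypothetical names for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryUrlDemo {

    // Hypothetical helper: builds the word.php URL the way the browser's address bar
    // shows it, percent-encoding the keyword with the given page charset (assumption).
    static String buildQueryUrl(String keyword, String charset) {
        try {
            return "http://index.baidu.com/main/word.php?word="
                    + URLEncoder.encode(keyword, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // ASCII keywords encode the same way in any charset
        System.out.println(buildQueryUrl("3Q", "gb2312"));
        // The keyword "Baidu" in Chinese characters, written as Unicode escapes
        String chinese = "\u767e\u5ea6";
        // Same keyword, different bytes: GB2312 vs UTF-8
        System.out.println(buildQueryUrl(chinese, "gb2312"));
        System.out.println(buildQueryUrl(chinese, "UTF-8"));
    }
}
```

If the encoding used for the query does not match what the server expects, a non-ASCII keyword will most likely be misread on the server side, which is why step 1 comes first.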

PS: The code is fairly simple, so I haven't added many comments.
The code is as follows:


package com.lch.hibaidu;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URLEncoder;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class GetIndex {

	// Character encoding of the Baidu Index page (found in step 1)
	private static final String CHARSET = "gb2312";

	public static void main(String[] args) throws Exception {

		String queryString = "3Q";
		// Encode the keyword with the page's charset instead of the platform default
		String urlQueryString = URLEncoder.encode(queryString, CHARSET);

		DefaultHttpClient httpClient = new DefaultHttpClient();
		try {
			HttpHost targetHost = new HttpHost("index.baidu.com");
			// Same path and parameter the browser shows in the address bar (steps 2 and 4)
			HttpGet httpGet = new HttpGet("/main/word.php?word=" + urlQueryString);
			System.out.println("Target: " + targetHost);
			System.out.println("Request: " + httpGet.getRequestLine());

			HttpResponse response = httpClient.execute(targetHost, httpGet);
			HttpEntity entity = response.getEntity();
			System.out.println("---------------------------------");
			System.out.println(response.getStatusLine());

			// Only read the body if the response actually has one
			if (entity != null) {
				System.out.println("Response content length: "
						+ entity.getContentLength());
				// Closing the reader also closes the content stream and releases the connection
				try (BufferedReader buReader = new BufferedReader(
						new InputStreamReader(entity.getContent(), CHARSET))) {
					String line;
					while ((line = buReader.readLine()) != null) {
						System.out.println(line);
					}
				}
			}
		} finally {
			// Shut down the connection manager and free all connections
			httpClient.getConnectionManager().shutdown();
		}
	}
}
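The program hardcodes `gb2312` based on inspecting the page by hand. As a possible refinement, the charset could instead be taken from the response's `Content-Type` header when the server sends one, falling back to `gb2312` otherwise. The sketch below is a minimal, stdlib-only parser for that header value; `CharsetSniffer` and `charsetFromContentType` are hypothetical names, and it assumes a simple `type/subtype; charset=...` format (it does not handle every edge case of the header grammar).

```java
import java.util.Locale;

public class CharsetSniffer {

    // Hypothetical helper: extracts the charset parameter from a Content-Type value
    // such as "text/html; charset=gb2312", returning the fallback if none is present.
    static String charsetFromContentType(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        // No charset parameter in the header
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=gb2312", "gb2312"));
        System.out.println(charsetFromContentType("text/html;charset=\"UTF-8\"", "gb2312"));
        System.out.println(charsetFromContentType("text/html", "gb2312"));
        System.out.println(charsetFromContentType(null, "gb2312"));
    }
}
```

In `GetIndex` the header value could be passed in as `entity.getContentType() == null ? null : entity.getContentType().getValue()`. The fallback is still needed because some servers declare the charset only in an HTML `<meta>` tag, not in the header.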


Once you have the response data, you can analyze it with HTMLParser. I won't go into that here.
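For a quick look at the fetched HTML before setting up a proper parser, a regular expression can pull out the text of simple elements. Note that this is a swapped-in technique, not HTMLParser: it is a rough sketch that does not handle nested tags of the same name, comments, or scripts. `TagTextExtractor` and `extractTagText` are hypothetical names, and the HTML in `main` is a made-up sample, not Baidu's actual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagTextExtractor {

    // Hypothetical helper: returns the trimmed inner text of every <tagName ...>...</tagName>.
    // Regex stand-in for a real HTML parser, only suitable for simple, flat markup.
    static List<String> extractTagText(String html, String tagName) {
        String name = Pattern.quote(tagName);
        Pattern p = Pattern.compile(
                "<" + name + "\\b[^>]*>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample page, for illustration only
        String html = "<html><head><TITLE>example</TITLE></head>"
                + "<body><table><tr><td class=\"v\">123</td><td>456</td></tr></table></body></html>";
        System.out.println(extractTagText(html, "title"));
        System.out.println(extractTagText(html, "td"));
    }
}
```

For anything beyond a quick check, a real HTML parser such as the HTMLParser library mentioned above is the more robust choice.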
