I ran into the following problem on a project: I needed to read the contents of a web page, but only part of the page came back.
Here is the code:
public static void read1(String urlStr) {
    BufferedReader br = null;
    StringBuilder sb = new StringBuilder();
    try {
        URL url = new URL(urlStr);
        br = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        char[] c = new char[1024];
        int len;
        // read(char[]) returns how many chars were actually read, which can be
        // fewer than the buffer size; append only that many. The original code
        // ignored the count and did new String(c), copying stale buffer content.
        while ((len = br.read(c)) != -1) {
            sb.append(c, 0, len);
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (br != null) { // guard against NPE when openStream() itself failed
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    System.out.println(sb.toString());
}
When I saved the fetched content to a local file, it turned out to occupy exactly 4096 bytes, which happens to be one memory page on my system (page size differs across systems, but is generally a multiple of 512 bytes). It looked as if the server had returned only part of the resource, and I could not find a way to keep reading until the whole resource had arrived.
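One contributing bug in the snippet above is worth isolating: `Reader.read(char[])` may fill only part of the buffer, so the count it returns must be honored on every iteration. Below is a minimal offline sketch of the accumulate-by-count pattern; the class and method names are mine, and a `StringReader` stands in for the network stream:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class PartialReadDemo {
    // Drain a Reader completely: append exactly `len` chars per call
    // and keep looping until read() returns -1 (end of stream).
    public static String readFully(Reader r) throws IOException {
        char[] buf = new char[1024];
        StringBuilder sb = new StringBuilder();
        int len;
        while ((len = r.read(buf)) != -1) {
            sb.append(buf, 0, len);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 3000 chars: more than one 1024-char buffer, so read() runs
        // several times and the last call fills the buffer only partially.
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 3000; i++) {
            payload.append((char) ('a' + i % 26));
        }
        String result = readFully(new StringReader(payload.toString()));
        System.out.println(result.length());                   // 3000
        System.out.println(result.equals(payload.toString())); // true
    }
}
```

With a real network stream the same loop applies unchanged, since `InputStreamReader` over a socket routinely returns short reads.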
Then it occurred to me that a browser could load the full page without trouble. So I switched to org.apache.commons.httpclient.HttpClient, and sure enough, the complete content came back.
public static String doGet(String url) {
    String respStr = "";
    GetMethod getMethod = new GetMethod(url);
    HttpClient httpClient = new HttpClient();
    try {
        httpClient.executeMethod(getMethod);
        InputStream inputStream = getMethod.getResponseBodyAsStream();
        // Decode with the charset declared by the response instead of the
        // platform default, which the original code silently relied on.
        BufferedReader br = new BufferedReader(
                new InputStreamReader(inputStream, getMethod.getResponseCharSet()));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            // readLine() strips the line terminator, so re-add it.
            sb.append(line).append('\n');
        }
        respStr = sb.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        // Return the connection to the pool even when an exception is thrown.
        getMethod.releaseConnection();
    }
    return respStr;
}