Uploading local files to HDFS via the Hadoop Java API:
Before testing the API, let's look at Hadoop's configuration files:
core-site.xml:
The hadoop.tmp.dir property specifies the directory where the NameNode stores its metadata; on a DataNode it is the directory where that node stores file data.
The fs.default.name property specifies the NameNode's hostname (or IP address) and port; the default value is file:///. A Java API client must use this URL to connect to HDFS, and DataNodes use the same URL to reach the NameNode. (Hadoop 2.x deprecates fs.default.name in favor of fs.defaultFS, but the old name still works.)
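Putting the two properties together, a minimal core-site.xml might look like the following (the hdfs://rack:8020 address matches the run output below; the tmp directory path is only a placeholder):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://rack:8020</value>
  </property>
  <property>
    <!-- placeholder path; use your own data directory -->
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
</configuration>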
hdfs-site.xml:
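A minimal hdfs-site.xml often contains little more than the replication factor (the value below is only an example; use your cluster's actual setting):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>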
---------------------------------------------------------------------------------------------------------------------------------
Here is a simple test case that uploads all files under the local helloworld folder to HDFS and then lists the HDFS root to verify the result (a minimal sketch built around copyFromLocalFile; the class name is arbitrary):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        // Configuration picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(new Configuration());
        // upload the local helloworld folder to the HDFS root
        fs.copyFromLocalFile(new Path("helloworld"), new Path("/helloworld"));
        // list the HDFS root to verify the upload
        for (FileStatus file : fs.listStatus(new Path("/"))) {
            System.out.println(file.getPath());
        }
    }
}
Before running, copy the core-site.xml and hdfs-site.xml configuration files from hadoop/etc/hadoop into the project's bin directory so that they end up on the classpath.
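Alternatively (a sketch; unnecessary when the XML files are on the classpath), the NameNode address can be set directly in the program above:

Configuration conf = new Configuration();
// hdfs://rack:8020 is this cluster's fs.default.name, taken from the run output below
conf.set("fs.default.name", "hdfs://rack:8020");
FileSystem fs = FileSystem.get(conf);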
Output from the Java program:
14/08/29 16:45:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/29 16:45:04 WARN hdfs.DomainSocketFactory: The short-circuit local reads feature is disabled because UNIX Domain sockets are not available on Windows.
hdfs://rack:8020/data
hdfs://rack:8020/hbase
hdfs://rack:8020/helloworld
hdfs://rack:8020/jobtracker
hdfs://rack:8020/sts
Now verify the upload from the Hadoop command line (the /helloworld directory should appear in the root listing):
[guo@compute-51-00 bin]$ ./hadoop fs -ls /
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop-2.0.0-cdh4.3.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop-2.0.0-cdh4.3.0/share/hadoop/mapreduce/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Found 5 items
drwxrwxrwx - guo guo 0 2014-07-30 10:21 /data
drwxrwxrwx - guo guo 0 2014-08-27 20:42 /hbase
drwxr-xr-x - cys guo 0 2014-08-29 16:44 /helloworld
drwxrwxrwx - guo guo 0 2014-07-30 10:21 /jobtracker
drwxrwxrwx - guo guo 0 2014-08-28 16:21 /sts