Uploading local files to HDFS with the Hadoop Java API

This post walks through uploading local files to HDFS using the Hadoop Java API.

Before testing the API, let's look at the Hadoop configuration files:


core-site.xml:

Property hadoop.tmp.dir: on the NameNode this is the directory where the metadata is kept; on a DataNode it is the directory where the file data (blocks) is stored on that node.

Property fs.default.name (named fs.defaultFS in Hadoop 2.x, as in the file below): the NameNode address and port. The default value is file:///. A Java API client must use the URL configured here to connect to HDFS, and the DataNodes use the same URL to reach the NameNode.

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://rack:8020</value>
    <description>The name for the cluster. HBase will use this to connect to HDFS</description>
  </property>

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop</value>
  </property>

</configuration>

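As a quick sanity check that core-site.xml is actually being picked up from the classpath, a minimal sketch like the following prints the loaded values (the class name is mine, not from the original post):

import org.apache.hadoop.conf.Configuration;

public class ShowHdfsConfig {
    public static void main(String[] args) {
        // core-site.xml and hdfs-site.xml are loaded automatically when they are on the classpath
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS   = " + conf.get("fs.defaultFS"));
        System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
    }
}

If these print null, the XML files are not on the classpath.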

----------------------------------

hdfs-site.xml:

<configuration>

  <property>
    <name>dfs.nameservices</name>
    <value>rack</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.rack</name>
    <value>racknn1,racknn2</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.rack.racknn1</name>
    <value>compute-51-00:8020</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.rack.racknn2</name>
    <value>compute-52-06:8020</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.rack.racknn1</name>
    <value>compute-51-00:50070</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.rack.racknn2</name>
    <value>compute-52-06:50070</value>
  </property>

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://compute-51-00:8485;compute-52-06:8485;compute-52-08:8485/rack</value>
  </property>

  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/dfs/jn</value>
  </property>

  <property>
    <name>dfs.client.failover.proxy.provider.rack</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

</configuration>

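This cluster runs NameNode HA: clients address the nameservice "rack", and ConfiguredFailoverProxyProvider resolves it to whichever of racknn1/racknn2 is currently active. A minimal sketch of what a client connection looks like (class name is mine; it assumes both XML files above are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConnectViaNameservice {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS from core-site.xml points at the nameservice, not at a concrete host
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}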
---------------------------------------------------------------------------------------------------------------------------------

Here is a simple test case that uploads everything in the local helloworld folder to HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadHelloWorld {

    public static void main(String[] args) throws Exception {
        // core-site.xml and hdfs-site.xml must be on the classpath (see below)
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);

        Path src = new Path("D:\\C++\\helloworld");   // local directory to upload
        Path dst = new Path("hdfs://rack:8020/");     // the HDFS root directory is /

        // Copy the local directory and everything in it to HDFS
        hdfs.copyFromLocalFile(src, dst);
        // fs.default.name is the deprecated alias of fs.defaultFS
        System.out.println("Upload to " + conf.get("fs.default.name"));

        // List the HDFS root to confirm the upload
        FileStatus[] files = hdfs.listStatus(dst);
        for (FileStatus file : files) {
            System.out.println(file.getPath());
        }
    }
}

Before running it, copy core-site.xml and hdfs-site.xml from the cluster's hadoop/etc/hadoop directory into the project's bin directory so that they end up on the classpath.
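If copying the files into bin is inconvenient, the Configuration can also be pointed at them explicitly; a sketch (the local paths and class name here are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExplicitConfigUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Load the cluster configuration from explicit file locations
        // instead of relying on the classpath (paths are hypothetical):
        conf.addResource(new Path("D:\\hadoop-conf\\core-site.xml"));
        conf.addResource(new Path("D:\\hadoop-conf\\hdfs-site.xml"));

        FileSystem hdfs = FileSystem.get(conf);
        System.out.println("Connected to " + hdfs.getUri());
        hdfs.close();
    }
}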

Console output of the Java program:

14/08/29 16:45:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/29 16:45:04 WARN hdfs.DomainSocketFactory: The short-circuit local reads feature is disabled because UNIX Domain sockets are not available on Windows.
hdfs://rack:8020/data
hdfs://rack:8020/hbase
hdfs://rack:8020/helloworld
hdfs://rack:8020/jobtracker
hdfs://rack:8020/sts

Check the upload from the Hadoop command line:

[guo@compute-51-00 bin]$ ./hadoop fs -ls /
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop-2.0.0-cdh4.3.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop-2.0.0-cdh4.3.0/share/hadoop/mapreduce/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Found 5 items
drwxrwxrwx   - guo guo          0 2014-07-30 10:21 /data
drwxrwxrwx   - guo guo          0 2014-08-27 20:42 /hbase
drwxr-xr-x   - cys guo          0 2014-08-29 16:44 /helloworld
drwxrwxrwx   - guo guo          0 2014-07-30 10:21 /jobtracker
drwxrwxrwx   - guo guo          0 2014-08-28 16:21 /sts
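copyFromLocalFile above pushes the whole helloworld directory. To upload a single file and control whether an existing HDFS file is overwritten, a variation along these lines also works (the file names and class name are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadSingleFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // core-site.xml / hdfs-site.xml on the classpath
        FileSystem hdfs = FileSystem.get(conf);

        Path src = new Path("D:\\C++\\helloworld\\hello.txt");   // hypothetical local file
        Path dst = new Path("/helloworld/hello.txt");

        // delSrc = false keeps the local copy; overwrite = true replaces an existing HDFS file
        hdfs.copyFromLocalFile(false, true, src, dst);
        System.out.println("Exists on HDFS: " + hdfs.exists(dst));
        hdfs.close();
    }
}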



            
