[Hadoop Training Notes] 05 - HDFS In-Depth Analysis (Part 2)

Note: personal notes from the OSForce (开源力量) Hadoop Development online training; course link: http://new.osforce.cn/course/52


Review:

1) HDFS read path: DistributedFileSystem --> FSDataInputStream --> DFSClient.open (RPC mechanism) --> namenode.open (see the sketch after this list)
      HDFS write path: DistributedFileSystem --> FSDataOutputStream --> DFSClient.create (RPC mechanism) --> namenode.create

2) Role and mechanism of the SecondaryNamenode: the SNN is not a full backup of the NN; it pulls the fsimage and edits files and merges them in the SNN's memory.
      Configuration: fs.checkpoint.period, fs.checkpoint.size, fs.checkpoint.dir
      checkpoint node
      backup node (a true, full backup of the NN)

3) If the NN or its metadata is lost, the metadata can be restored from the SNN's checkpoint directory:
      hadoop namenode -importCheckpoint
      hadoop-daemon.sh start namenode

4) Rack awareness: by default all DNs are treated as being on the same rack, regardless of whether they physically are.
      /default-rack
      The topology.script.file.name property points to a script that describes the real network topology, producing locations such as:
      /d1/rack1/h1
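
The read path in item 1 can be illustrated with a minimal sketch (not part of the training code): open the file through the FileSystem API and copy it to stdout. The NameNode address matches the hdfsUrl used in the test class below, and /test/b.txt is just an assumed path created by the rename test.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFromHDFS {
	public static void main(String[] args) throws IOException {
		Configuration conf = new Configuration();
		//FileSystem.get resolves to DistributedFileSystem for an hdfs:// URI
		FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.56.101:9100"), conf);
		//open() returns an FSDataInputStream; the DFSClient asks the NameNode for
		//block locations over RPC and then streams the data from the DataNodes
		FSDataInputStream in = fs.open(new Path("/test/b.txt"));
		//copy the file contents to stdout; the 'true' flag closes the stream afterwards
		IOUtils.copyBytes(in, System.out, 4096, true);
	}
}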


HDFS API usage: http://hadoop.apache.org/docs/current1/api/ (Hadoop 1.2.1)

The session mainly walked through some of these APIs in code. Below is the JUnit test code; it is not explained in detail here.

The results can be checked with commands like the following:

./hadoop fs -lsr /
./hadoop fs -cat /test/a.txt

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
import org.junit.Test;

import junit.framework.TestCase;


public class TestHDFS extends TestCase {
	
	public static String hdfsUrl = "hdfs://192.168.56.101:9100"; //NameNode RPC address of the training cluster
	//create HDFS folder
	@Test
	public void testHDFSMkdir() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		Path path = new Path("/test");
		fs.mkdirs(path);
	}
	
	//create a file
	@Test
	public void testCreateFile() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		Path path = new Path("/test/a.txt");
		FSDataOutputStream out = fs.create(path);
		out.write("hello hadoop".getBytes());
		out.close(); //flush and close so the data is actually written out to HDFS
	}
	
	//rename a file
	@Test
	public void testRenameFile() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		Path path = new Path("/test/a.txt");
		Path newpath = new Path("/test/b.txt");
		System.out.println(fs.rename(path, newpath));
	}
	
	//upload a local file to HDFS
	@Test
	public void testUploadFile1() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		Path src = new Path("/home/xwchen/hadoop/hadoop-1.2.1/bin/rcc");
		Path dst = new Path("/test");
		fs.copyFromLocalFile(src, dst);
	}
	
	//upload a local file to HDFS via an explicit stream copy
	@Test
	public void testUploadFile2() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		InputStream in = new BufferedInputStream(new FileInputStream(new File("/home/xwchen/hadoop/hadoop-1.2.1/bin/rcc")));
		FSDataOutputStream out = fs.create(new Path("/test/rcc1"));
		IOUtils.copyBytes(in, out, 4096, true); //the 'true' flag closes both streams when the copy completes
	}
	
	//upload a local file to HDFS, reporting progress through a Progressable callback
	@Test
	public void testUploadFile3() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		InputStream in = new BufferedInputStream(new FileInputStream(new File("/home/xwchen/hadoop/hadoop-1.2.1/bin/rcc")));
		FSDataOutputStream out = fs.create(new Path("/test/rcc2"), new Progressable(){

			@Override
			public void progress() {
				//called by the client each time a packet of data is flushed to the pipeline
				System.out.println(".");
			}});
		IOUtils.copyBytes(in, out, 4096, true);
	}
	
	
	//upload a larger test file (created with: dd if=/dev/zero of=data bs=1024 count=1024)
	@Test
	public void testUploadFile4() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		InputStream in = new BufferedInputStream(new FileInputStream(new File("/home/xwchen/hadoop/hadoop-1.2.1/bin/data")));
		FSDataOutputStream out = fs.create(new Path("/test/data"), new Progressable(){

			@Override
			public void progress() {
				System.out.println(".");
			}});
		IOUtils.copyBytes(in, out, 4096, true);
	}
	
	//list files under folder
	@Test
	public void testListFiles() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		Path dst = new Path("/test");
		FileStatus[] files = fs.listStatus(dst);
		for(FileStatus file: files){
			System.out.println(file.getPath().toString());
		}
	}
	
	//list block info of file
	@Test
	public void testGetBlockInfo() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		Path dst = new Path("/test/data");
		FileStatus fileStatus = fs.getFileStatus(dst);
		BlockLocation[] blkLoc = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
		for(BlockLocation loc: blkLoc){
			//print every DataNode host that holds a replica of this block
			for(String host: loc.getHosts())
				System.out.println(host);
		}
	}
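	
	//additional example, not from the training session: recursively delete the /test folder;
	//fs.exists and fs.delete(path, true) are standard FileSystem calls in Hadoop 1.x
	@Test
	public void testDeleteFolder() throws IOException{
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);
		Path path = new Path("/test");
		if(fs.exists(path)){
			//the second argument enables recursive deletion of the folder contents
			System.out.println(fs.delete(path, true));
		}
	}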
}


Quiz-related notes:

  • The FileSystem class is an abstract class.
  • The RPC protocol used between the client and the Namenode is ClientProtocol.
  • A traditional Hadoop cluster topology has two layers, rack and host; with virtualization the hypervisor layer also needs to be known. HVE (Hadoop Virtualization Extensions) makes a Hadoop cluster aware of the three-layer rack - physical host - Hadoop node architecture, and its placement algorithms let storage and compute nodes running on the same physical host communicate in a way that still satisfies data locality.
  • The return type of DistributedFileSystem's create method is FSDataOutputStream.
  • The fastest way to read a file in Java is FileChannel, followed by BufferedInputStream, FileInputStream, and RandomAccessFile (http://bbs.itheima.com/thread-48379-1-1.html).
  • Interfaces implemented by FSDataOutputStream: Closeable, DataOutput, Flushable, CanSetDropBehind, Syncable
  • File systems such as ZFS, MooseFS, GlusterFS, and Lustre have FUSE-based implementations; FastDFS does not provide FUSE support.
    Filesystem in Userspace (FUSE) is a file system implemented entirely in user space. (http://zh.wikipedia.org/wiki/FUSE)
  • The default Namenode web UI port is 50070.
  • DirectByteBuffer vs. ByteBuffer: a ByteBuffer wraps a byte array via the wrap method and allocates its memory on the heap; a DirectByteBuffer offers faster byte access than a ByteBuffer; a ByteBuffer's memory is garbage-collected by the JVM (a direct buffer's is not). See the sketch after this list.
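
To make the FileChannel and DirectByteBuffer points more concrete, here is a minimal sketch (not from the training material) that reads a file through a FileChannel into a heap buffer and a direct buffer; the file name sample.txt is only a placeholder.

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class BufferDemo {
	public static void main(String[] args) throws IOException {
		//heap buffer: wraps an ordinary byte[] on the JVM heap, reclaimed by the garbage collector
		ByteBuffer heapBuf = ByteBuffer.wrap(new byte[4096]);
		//direct buffer: memory allocated outside the heap, so byte access avoids an extra copy
		ByteBuffer directBuf = ByteBuffer.allocateDirect(4096);

		//FileChannel is generally the fastest way to read a file in Java
		FileChannel channel = new FileInputStream("sample.txt").getChannel(); //placeholder file name
		System.out.println("read into heap buffer:   " + channel.read(heapBuf) + " bytes");
		System.out.println("read into direct buffer: " + channel.read(directBuf) + " bytes");
		channel.close();
	}
}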
