While using Mahout in a project today, I ran into this exception:
[14:43:13.059] [2015-11-24 14:43:13,058] [INFO ] resin-port-8080-23 AbstractJob - Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[file:////root/mahout_test/email/tzm_email], --keyPrefix=[], --method=[sequential], --output=[file:////root/mahout_test/email/SeqFile], --startPhase=[0], --tempDir=[temp]}
[14:43:13.255] [2015-11-24 14:43:13,255] [WARN ] resin-port-8080-23 NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[14:43:13.255] [2015-11-24 14:43:13,255] [WARN ] resin-port-8080-23 NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[14:43:13.395] [2015-11-24 14:43:13,391] [ERROR] resin-port-8080-23 KmeansTestController - method<test> e<java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation>
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
    at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2365)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.mahout.text.SequenceFilesFromDirectory.runSequential(SequenceFilesFromDirectory.java:101)
    at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:88)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.kxm.service.kmeans.KmeansCluster.run(KmeansCluster.java:80)
    at com.kxm.controller.KmeansTestController.test(KmeansTestController.java:43)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:215)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:749)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:689)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:83)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:938)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:870)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:961)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:863)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:159)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:837)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:97)
    at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:109)
    at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:156)
    at com.caucho.server.webapp.AccessLogFilterChain.doFilter(AccessLogFilterChain.java:95)
    at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:289)
    at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:838)
    at com.caucho.network.listen.TcpSocketLink.dispatchRequest(TcpSocketLink.java:1346)
    at com.caucho.network.listen.TcpSocketLink.handleRequest(TcpSocketLink.java:1302)
    at com.caucho.network.listen.TcpSocketLink.handleRequestsImpl(TcpSocketLink.java:1286)
    at com.caucho.network.listen.TcpSocketLink.handleRequests(TcpSocketLink.java:1194)
    at com.caucho.network.listen.TcpSocketLink.handleAcceptTaskImpl(TcpSocketLink.java:993)
    at com.caucho.network.listen.ConnectionTask.runThread(ConnectionTask.java:117)
    at com.caucho.network.listen.ConnectionTask.run(ConnectionTask.java:93)
    at com.caucho.network.listen.SocketLinkThreadLauncher.handleTasks(SocketLinkThreadLauncher.java:169)
    at com.caucho.network.listen.TcpSocketAcceptThread.run(TcpSocketAcceptThread.java:61)
    at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
    at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118)
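After some investigation, the cause turned out to be that, besides the Mahout artifacts I actually needed, I had also added the mahout-core artifact. mahout-core pulls in hadoop-core as a dependency, and that is what triggers the exception. The stack trace is consistent with this: FileSystem.loadFileSystems calls getScheme() on each FileSystem implementation it finds, and the DistributedFileSystem class that ends up on the classpath does not implement it.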
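One way to confirm this kind of classpath conflict is to print which jar each suspect class is actually loaded from. The following is only a minimal sketch, not something from the original project: the class name WhichJar and the method locate are hypothetical, and it assumes that org.apache.hadoop.hdfs.DistributedFileSystem is the class named in the error message. It should be run with the same classpath as the web application.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Mahout or Hadoop): reports where a class was loaded from.
public class WhichJar {

    /** Returns the jar or directory a class was loaded from, or a marker string if unknown. */
    static String locate(String className) {
        try {
            // Load without initializing, using the same class loader that loaded this helper.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself usually have no code source.
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumption: these are the two classes involved in the stack trace above.
        String[] names = {
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.hdfs.DistributedFileSystem"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two classes resolve to jars from different Hadoop lines (for example, one of them coming from a hadoop-core jar), that would point to the same conflict described above.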
The correct pom is shown below. My Mahout version is 0.11:
<!-- mahout -->
<!-- <dependency> -->
<!--   <groupId>org.apache.mahout</groupId> do not include mahout-core -->
<!--   <artifactId>mahout-core</artifactId> -->
<!--   <version>0.11.0</version> -->
<!-- </dependency> -->
<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-math</artifactId>
    <version>0.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-hdfs</artifactId>
    <version>0.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-mr</artifactId>
    <version>0.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-integration</artifactId>
    <version>0.11.0</version>
</dependency>