I had some exposure to Hadoop before but never dug in. It has been looking more and more interesting, so I decided to study it properly. A couple of days ago I bought two Hadoop books and skimmed through them; today I'm getting hands-on and setting up the environment first.
Environment: CentOS 6.2, JDK 7u45, Hadoop 2.2.0.
I'll skip the download and extraction steps and go straight to configuration (the JAVA_HOME and HADOOP_HOME environment-variable setup is also omitted here). See the reference documentation for details.
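For completeness, the environment-variable setup skipped above usually amounts to a few exports in ~/.bashrc. The install paths below are only examples; substitute wherever you actually unpacked the JDK and Hadoop:

```shell
# Example ~/.bashrc entries; the install paths are illustrative,
# replace them with your actual JDK and Hadoop locations.
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After editing, run source ~/.bashrc so the current shell picks up the variables.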
1. Set JAVA_HOME in hadoop-env.sh
2. Edit the core-site.xml configuration file
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <final>true</final>
  </property>
</configuration>
3. Edit the hdfs-site.xml configuration file
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
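The local directories referenced in core-site.xml and hdfs-site.xml should exist before the NameNode is formatted; a quick way to create them (the /data/hadoop base path is the one used in this article):

```shell
# Create the data directories referenced in core-site.xml and
# hdfs-site.xml; /data/hadoop is the base path used in this article.
BASE=/data/hadoop
mkdir -p "$BASE/tmp" "$BASE/dfs/name" "$BASE/dfs/data"
```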
4. Copy mapred-site.xml.template to mapred-site.xml, then edit it
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5. Edit the yarn-site.xml configuration file
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
    <description>hostname of the RM</description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:5274</value>
    <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:5273</value>
    <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:5271</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value></value>
    <description>the local directories used by the nodemanager</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>localhost:5272</value>
    <description>the nodemanagers bind to this port</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
    <description>directory on hdfs where the application logs are moved to</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value></value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
</configuration>
At this point, the single-node Hadoop configuration is complete.
1) First, format the NameNode, then start it:
hadoop namenode -format
(Note the hyphen must be a plain ASCII "-". In Hadoop 2.x this command is deprecated in favor of hdfs namenode -format, though both still work.)
hadoop-daemon.sh start namenode
Check the NameNode log (the file with namenode in its name, ending in .log) linked from http://localhost:50070/dfshealth.jsp to confirm the startup; if the log shows no errors, it started successfully.
2) Next, start the HDFS DataNode:
hadoop-daemon.sh start datanode
As before, check the corresponding log file on the status page (the one with datanode in its name); if there are no errors and it has registered with the NameNode, startup succeeded.
You can also type jps on the command line to confirm that the processes are listed.
3) Then start YARN:
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
Verify that these started the same way as above.
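The checks above can be sketched as a small helper that greps the process list. The daemon names below (NameNode, DataNode, ResourceManager, NodeManager) are the standard Hadoop 2.x process names; this uses ps rather than jps in case the JDK bin directory is not on PATH:

```shell
# Return success if a process matching the given name is running.
is_running() {
  ps -ef | grep -v grep | grep -q "$1"
}

# Report the status of each Hadoop daemon started above.
for d in NameNode DataNode ResourceManager NodeManager; do
  if is_running "$d"; then
    echo "$d is running"
  else
    echo "$d is NOT running"
  fi
done
```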
Finally, change into the hadoop-2.2.0/share/hadoop/mapreduce directory and run a test job:
hadoop jar hadoop-mapreduce-examples-2.2.0.jar randomwriter out
Then check whether the job completed successfully.