
Hadoop Study Notes, Part 2: Deployment and Application Examples

  Hadoop environment variables

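  Set the environment variables Hadoop needs in hadoop-env.sh in the /home/dbrg/HadoopInstall/hadoop-conf directory. JAVA_HOME must be set. HADOOP_HOME is optional: if it is not set, it defaults to the parent of the bin directory, which in this article is /home/dbrg/HadoopInstall/hadoop. My settings are: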

  export HADOOP_HOME=/home/dbrg/HadoopInstall/hadoop

  export JAVA_HOME=/usr/java/jdk1.6.0
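  The HADOOP_HOME fallback described above can be sketched as a small shell function: keep an explicit value if there is one, otherwise take the parent of the directory that holds the script. This only mirrors the behavior described in the text; resolve_hadoop_home is a hypothetical name, not a function from Hadoop's own scripts.

```shell
# Hypothetical helper: print the effective HADOOP_HOME.
# $1 is the path of a script inside Hadoop's bin/ directory (e.g. bin/hadoop).
resolve_hadoop_home() (
  if [ -n "${HADOOP_HOME:-}" ]; then
    # An explicit setting (e.g. exported in hadoop-env.sh) wins.
    printf '%s\n' "$HADOOP_HOME"
  else
    # Otherwise use the parent of the script's bin/ directory.
    # Plain "pwd" reports the logical path, so a "hadoop" symlink is kept as-is.
    bin_dir=$(CDPATH= cd "$(dirname "$1")" && pwd) || exit 1
    dirname "$bin_dir"
  fi
)
```

  With HADOOP_HOME unset, `resolve_hadoop_home /home/dbrg/HadoopInstall/hadoop/bin/hadoop` prints /home/dbrg/HadoopInstall/hadoop, the same value the author sets explicitly.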

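  The HADOOP_HOME setting above shows the advantage of the link named hadoop that points to hadoop0.12.0, created earlier: when you upgrade Hadoop later, you do not need to edit the configuration files again, only change the link.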
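  Changing the link is a single `ln` call. A minimal sketch, assuming the new release has been unpacked next to the old one in /home/dbrg/HadoopInstall; switch_hadoop is a hypothetical helper and hadoop0.13.0 below is only a placeholder directory name.

```shell
# Hypothetical helper: point the "hadoop" link in install dir $1 at release dir $2.
switch_hadoop() (
  CDPATH= cd "$1" || exit 1
  # Refuse to create a dangling link to a release that is not unpacked.
  [ -d "$2" ] || { echo "switch_hadoop: no directory $1/$2" >&2; exit 1; }
  # -s: symbolic link; -f: replace the existing link;
  # -n: treat an existing link to a directory as a plain file. Without -n,
  # ln would create the new link inside the old release directory instead.
  ln -sfn "$2" hadoop
)
```

  For example, `switch_hadoop /home/dbrg/HadoopInstall hadoop0.13.0` re-points /home/dbrg/HadoopInstall/hadoop, and HADOOP_HOME in hadoop-env.sh stays unchanged.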

  Hadoop configuration files

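  As mentioned earlier, open the slaves file in the hadoop-conf/ directory. This file lists all the slave nodes, one hostname per line. In this article they are dbrg-2 and dbrg-3, so the slaves file should look like this: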

  dbrg-2

  dbrg-3
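  Hadoop's start scripts read this file and contact each listed host over ssh. The sketch below only illustrates the host-list part and prints the ssh commands instead of running them; list_slaves and show_slave_commands are hypothetical helpers, and skipping blank lines and `#` comments is an assumption of the sketch, not a documented Hadoop 0.12 behavior.

```shell
# Hypothetical helper: print the host names in slaves file $1, one per line,
# dropping '#' comments, surrounding whitespace and blank lines.
list_slaves() {
  sed -e 's/#.*$//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}

# Hypothetical dry run: print the ssh command that would run $2 on each slave.
show_slave_commands() {
  list_slaves "$1" | while read -r host; do
    echo "ssh $host $2"
  done
}
```

  The file itself can be written with `printf '%s\n' dbrg-2 dbrg-3 > /home/dbrg/HadoopInstall/hadoop-conf/slaves`, after which `list_slaves` on it prints dbrg-2 and dbrg-3.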

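  hadoop-default.xml in the conf/ directory contains all of Hadoop's configuration options, but it must not be edited directly! Instead, define the options you need in hadoop-site.xml in the hadoop-conf/ directory; values set there override the defaults in hadoop-default.xml. Customize them to suit your own setup. Here is my configuration file: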

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>fs.default.name</name>
  <value>dbrg-1:9000</value>
  <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>dbrg-1:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/dbrg/HadoopInstall/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/dbrg/HadoopInstall/filesystem/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/dbrg/HadoopInstall/filesystem/data</value>
  <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
</configuration>
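  The override rule (hadoop-site.xml first, hadoop-default.xml as the fallback) can be modeled with a small shell sketch. Hadoop itself loads these files in Java, not like this; prop_in_file and conf_get are hypothetical helpers, and the awk matching assumes one `<name>` and one `<value>` element per line, as in the file above.

```shell
# Hypothetical helper: print the <value> of property $1 defined in file $2.
# Assumes one <name> and one <value> element per line.
# Exit status is 1 if the file does not define the property.
prop_in_file() {
  awk -v want="$1" '
    /<name>/ {
      cur = $0
      gsub(/.*<name>|<\/name>.*/, "", cur)   # keep only the text between the tags
      next
    }
    /<value>/ && cur == want {
      val = $0
      gsub(/.*<value>|<\/value>.*/, "", val)
      print val
      found = 1
      exit
    }
    /<\/property>/ { cur = "" }             # forget the name at the end of a property
    END { exit !found }
  ' "$2"
}

# Hypothetical helper: look property $1 up in files $2, $3, ... in order and
# print the first value found. Listing hadoop-site.xml before
# hadoop-default.xml makes the site value override the default.
conf_get() (
  name=$1
  shift
  for file in "$@"; do
    prop_in_file "$name" "$file" && exit 0
  done
  exit 1
)
```

  With the file above, `conf_get fs.default.name /home/dbrg/HadoopInstall/hadoop-conf/hadoop-site.xml /home/dbrg/HadoopInstall/hadoop/conf/hadoop-default.xml` prints dbrg-1:9000, while a property not set in hadoop-site.xml falls through to its value in hadoop-default.xml.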
