Hadoop3 – Pseudo-Distributed Mode

Environment

  • Mac OS X (10.15.3)
  • Hadoop 3.1.3 (Hadoop 2 is covered in a separate entry)
  • Java 1.8

For details, see the official Hadoop documentation page (Link).

Preparation

To start the daemons with the sbin scripts (script mode), first set up passwordless SSH to localhost (see the entry: Mac OS X ssh localhost passwordless).
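The start scripts log in to localhost over SSH, so your own public key must be in authorized_keys. Below is a minimal bash sketch of that setup, based on the ssh-keygen / authorized_keys commands used in the Hadoop single-node guide. The function name authorize_local_key is hypothetical, and the sketch assumes an RSA key at id_rsa. Unlike running those commands by hand, it can be re-run without duplicating the key. On macOS the SSH server must also be turned on (System Preferences > Sharing > Remote Login).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): authorize your own key for "ssh localhost".
# Safe to re-run: a key is generated only if missing, and appended only once.
# ssh_dir defaults to ~/.ssh; pass another directory to experiment safely.
authorize_local_key() {
  local ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir"

  # Generate an RSA key with an empty passphrase only if none exists yet.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -t rsa -P '' -f "$ssh_dir/id_rsa" || return 1
  fi

  local pub
  pub=$(cat "$ssh_dir/id_rsa.pub") || return 1
  touch "$ssh_dir/authorized_keys"

  # Append the public key only if that exact line is not already present.
  grep -qxF -- "$pub" "$ssh_dir/authorized_keys" ||
    printf '%s\n' "$pub" >> "$ssh_dir/authorized_keys"

  # sshd ignores authorized_keys files that are group/world writable.
  chmod 600 "$ssh_dir/authorized_keys"
}
```

After running authorize_local_key, `ssh localhost` should log you in without a password prompt.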

Install Java (Java 8)
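If several JDKs are installed, it is easy to end up running Hadoop on the wrong one. A minimal sketch, assuming bash, that extracts the major version from a `java -version` banner (Java 8 reports itself as "1.8.0_<update>"). The function name java_major_version is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the Java major version from a banner.
# Note: "java -version" writes to stderr, so capture it with 2>&1.
java_major_version() {
  local v
  # Take the first quoted version string, e.g. 1.8.0_241 or 11.0.6.
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;  # legacy scheme: 1.8.0_241 -> 8
    *)   echo "${v%%[.+-]*}" ;;         # Java 9+ scheme: 11.0.6 -> 11
  esac
}
```

For example, `java_major_version "$(java -version 2>&1)"` should print 8. On macOS, `/usr/libexec/java_home -v 1.8` prints the home directory of the installed Java 8, which can be used for JAVA_HOME in hadoop-env.sh.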

Steps

  • Download Apache Hadoop
  • Decompress the tar.gz (tar xvfz)
  • Change the configuration files
  • Format HDFS (first time only)
  • Start HDFS
  • Start YARN
  • Check (jps, Hadoop web consoles)
  • Stop YARN
  • Stop HDFS

Download Apache Hadoop from the website

Apache Hadoop Page (Link)

This entry uses 3.1.3 (the hadoop-3.1.3.tar.gz binary release).

Decompress it in any directory you like.
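The download and decompress steps can be scripted. Below is a minimal bash sketch, assuming the release is fetched from the Apache archive at archive.apache.org (any mirror works). The function name fetch_and_extract_hadoop is hypothetical. It skips the download when the tarball is already present and prints the resulting Hadoop home directory.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): download (if needed) and unpack a Hadoop release.
# Assumption: the Apache archive URL layout below; swap in your own mirror.
fetch_and_extract_hadoop() {
  local version="$1" dest="$2"
  local tarball="hadoop-${version}.tar.gz"
  local url="https://archive.apache.org/dist/hadoop/common/hadoop-${version}/${tarball}"
  mkdir -p "$dest" || return 1

  # Download only if the tarball is not already in the destination directory.
  if [ ! -f "$dest/$tarball" ]; then
    curl -fL -o "$dest/$tarball" "$url" || return 1
  fi

  # Same as "tar xvfz" minus v (verbose listing), so stdout stays clean.
  tar xzf "$dest/$tarball" -C "$dest" || return 1
  echo "$dest/hadoop-${version}"
}
```

For example, `export HADOOP_HOME=$(fetch_and_extract_hadoop 3.1.3 "$HOME/bigdata/work")` would leave Hadoop in $HOME/bigdata/work/hadoop-3.1.3.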

Structure

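This is the directory structure used in this entry (only the files mentioned here are shown).

hadoop-3.1.3
|- bin
|   |- hadoop
|   |- hdfs
|   |- yarn
|- sbin
|   |- start-dfs.sh
|   |- start-yarn.sh
|   |- stop-dfs.sh
|   |- stop-yarn.sh
|- etc
    |- hadoop
        |- hadoop-env.sh
        |- core-site.xml
        |- hdfs-site.xml
        |- mapred-site.xml
        |- yarn-site.xml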

Change configuration

Edit the following files under etc/hadoop to configure Pseudo-Distributed Mode.

hadoop-env.sh

#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HOME="/Users/dj110/bigdata/work/hadoop-3.1.3"

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
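Writing these files by hand is tedious if you set Hadoop up more than once. Below is a minimal bash sketch that writes exactly the four XML files shown above into a configuration directory. The function name write_pseudo_distributed_conf is hypothetical; hadoop-env.sh is left alone because its JAVA_HOME and HADOOP_HOME lines depend on your machine.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): write the four *-site.xml files from this entry.
# conf_dir is normally "$HADOOP_HOME/etc/hadoop". Existing files are
# overwritten, so back them up first if you have edited them.
write_pseudo_distributed_conf() {
  local conf_dir="$1"
  mkdir -p "$conf_dir" || return 1

  # Default file system: the NameNode listening on localhost:9000.
  cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # Only one DataNode exists, so keep a single replica of each block.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

  # Run MapReduce jobs on YARN.
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

  # Enable the shuffle auxiliary service on the NodeManager.
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```

After running `write_pseudo_distributed_conf "$HADOOP_HOME/etc/hadoop"`, `bin/hdfs getconf -confKey fs.defaultFS` should print hdfs://localhost:9000, which is a quick way to confirm the file is being picked up.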

Now everything is ready to start.

Format HDFS (first time only)

Do this only the first time. Formatting HDFS erases everything under the HDFS file system root. From the Hadoop home directory, format the NameNode:

bin/hdfs namenode -format
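Because re-running the format wipes HDFS, it can help to guard it. Below is a minimal bash sketch, assuming dfs.namenode.name.dir is left at its default (file://${hadoop.tmp.dir}/dfs/name, where hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}); a formatted directory contains current/VERSION. The function names needs_format and format_once are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): format the NameNode only if it was never formatted.
# Assumption: default dfs.namenode.name.dir, i.e. /tmp/hadoop-$USER/dfs/name.

# Succeeds (returns 0) when the name directory has no current/VERSION file.
needs_format() {
  local name_dir="${1:-/tmp/hadoop-$USER/dfs/name}"
  [ ! -f "$name_dir/current/VERSION" ]
}

# Runs "bin/hdfs namenode -format" only when needs_format says so.
format_once() {
  local hadoop_home="$1" name_dir="${2:-/tmp/hadoop-$USER/dfs/name}"
  if needs_format "$name_dir"; then
    "$hadoop_home/bin/hdfs" namenode -format
  else
    echo "NameNode already formatted at $name_dir; skipping" >&2
  fi
}
```

Usage: `format_once "$HADOOP_HOME"`. If you changed dfs.namenode.name.dir, pass that directory as the second argument.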

Start HDFS

sbin/start-dfs.sh

Start YARN

sbin/start-yarn.sh

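Check

You can access the following web UIs:

NameNode (HDFS): http://localhost:9870/dfshealth.html#tab-overview
ResourceManager (YARN): http://localhost:8088/cluster

Hadoop 2 uses different ports (for example, the NameNode UI is on port 50070), so check the Hadoop 2 documentation.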

Run jps to list the running Java processes.

jps

You should see output like the following:

2227 DataNode
2723 Jps
2563 ResourceManager
2660 NodeManager
2360 SecondaryNameNode
2126 NameNode

NameNode, DataNode, SecondaryNameNode => HDFS

ResourceManager, NodeManager => YARN

Jps => the jps command itself
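Instead of reading the list by eye, you can check it with a small function. Below is a minimal bash sketch (the name check_hadoop_daemons is hypothetical) that reads jps output on stdin and reports which of the five HDFS and YARN daemons are missing. It compares the class name column exactly, so NameNode is not mistaken for SecondaryNameNode.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): verify pseudo-distributed daemons from jps output.
# Prints "missing: <Daemon>" for each absent daemon; returns 0 only if all are up.
check_hadoop_daemons() {
  local jps_out missing=0 daemon
  jps_out=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps lines look like "<pid> <ClassName>"; match column 2 exactly.
    if ! printf '%s\n' "$jps_out" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | check_hadoop_daemons && echo "HDFS and YARN are up"`. If a daemon is missing, its log under $HADOOP_HOME/logs is the first place to look.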

Stop

Stop YARN

sbin/stop-yarn.sh

Stop HDFS

sbin/stop-dfs.sh
Professional Programmer2
