Configuring the Hadoop Datanodes on Ubuntu 16.04

In the previous post we configured the Namenode; now it is the Datanodes' turn.

Tell the master (namenode) the hostnames of the datanodes:

$ cat > $HADOOP_HOME/etc/hadoop/slaves << "EOF"
datanode01
datanode02
EOF

Installing Hadoop on the Datanodes

Copy the public keys of the 3 nodes into the ~hadoop/.ssh/authorized_keys file on each node of the cluster:

$ cat ~hadoop/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCv3JwqGL0VshGz+cyoz29mQLUrrPMHCo4ZiqfWCsbtB+50WKX+866DJHo39byRw5qF6hu7otKNc6jRETPzIj4OHNJvsNNnogwcakXeio6eY1C05nNYWD/NZtovbljx6b3qivwLMCK4PpPeMz88TY4845LeRguK+QFUV8IoWLZjf314qPwfVijuZUqiiASRLEsFW5TbMcYcCpnDCkiVhD2oW84LX7CcHUFLjZIzpJ0/vHsUI3h06R10y7Wj68Dnuq6msttRIIWHdrNM/2l3LdFgZL7XR6RQhVrgR0cEVQ4kuWOcOjhGyzACWR6Yv9ukBErIJYgFxTwvx0qglQcSX6OF hadoop@namenode01
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUpSmXGnQB5OUdRenMM62txj6CaRcUNfdSCFoSI0fowc5kUNM9LFsXXvM5SaW90Q9uqTE+h/D7mOgx0lE5Di09IVdgHzYQR7cxzQL7xqXJczasxk+RG2xaQRpuXU4KgIVFbx64cvLbCHx9mJt5irbvcV3HK7k58vcNIYzDQNdO2EtFnnuCg4f/iNRnFb+eE1YOZPoErox63WO3YGRwPcRudLqrCwImqgUYHUDg0S4KrSHgyWQ40eab4OKC+exC9v9lskt1XU8KNTstcpzaJ+C/oP4JD3iUhfVlo22iFMBw19wg4hdgp3ccQMiOYc+qpFv0d4iJUuKt3CTqEfaVpXZZ hadoop@datanode01
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC33VMDFp+lnn39icLjvT5HOKz1Aqs2zJxoEcmNqLgsIZHeohoQcn0cZuPJRZ5f45WKMAv9I1EaVrPrw3kUYHAEfc+GUbKGinKKwmiprfUmL1IPTfuNvC+oFt6fdPZD2Mmv2imz2Yl47CiH5eClrE7jD4imAQs8FD5DRQ59GKyi5JNJfg7K/BJj1jqKOi6axpcLdK3KtbJ497iF0TXffSZsVB2Q2RP97mVFcj0wzlW3wEJiWtzqVRSsyZClVqyNzSBBQO7EodGyPlS+19BrOzhGnuhkBcRxvvlTq8J+eE45MbvjH8XnRFhrHsfFNhJikBoyk0OkF2rgKHdWyu6hMeil hadoop@datanode02
$
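If the keys still need to be distributed, one simple way to do it (a sketch, assuming password authentication is still enabled and each node already has an id_rsa key pair from the previous post) is to run ssh-copy-id on every node, so that its public key is appended to authorized_keys on all three hosts:

# Run as the hadoop user on namenode01, datanode01 and datanode02
$ for host in namenode01 datanode01 datanode02; do ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host; done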

Copy the /srv/hadoop tree from namenode01 to the datanodes (datanode01 and datanode02):

$ scp -r /srv datanode01:/
$ scp -r /srv datanode02:/

Create the directories below on the datanodes:

$ mkdir -p $HADOOP_HOME/hdfs/datanode
$ mkdir -p $HADOOP_HOME/yarn/local
$ mkdir -p $HADOOP_HOME/yarn/log
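If passwordless SSH is already working, the same directories can be created remotely from namenode01. A minimal sketch, assuming HADOOP_HOME resolves to /srv/hadoop-2.9.0 on the datanodes (as the log paths in the next step show):

$ ssh datanode01 "mkdir -p /srv/hadoop-2.9.0/hdfs/datanode /srv/hadoop-2.9.0/yarn/local /srv/hadoop-2.9.0/yarn/log"
$ ssh datanode02 "mkdir -p /srv/hadoop-2.9.0/hdfs/datanode /srv/hadoop-2.9.0/yarn/local /srv/hadoop-2.9.0/yarn/log"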

Start the DFS on namenode01:

hadoop@namenode01:~$ $HADOOP_HOME/sbin/start-dfs.sh
[...]
Starting namenodes on [namenode01]
namenode01: starting namenode, logging to /srv/hadoop-2.9.0/logs/hadoop-hadoop-namenode-namenode01.out
datanode02: starting datanode, logging to /srv/hadoop-2.9.0/logs/hadoop-hadoop-datanode-datanode02.out
datanode01: starting datanode, logging to /srv/hadoop-2.9.0/logs/hadoop-hadoop-datanode-datanode01.out
[...]

Validating the daemons

hadoop@namenode01:~$ jps
2653 Jps
2333 NameNode
2509 SecondaryNameNode
hadoop@namenode01:~$
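jps on the namenode only lists the local daemons. To confirm that both datanodes actually registered with the NameNode, the dfsadmin report can be used; its summary should show two live datanodes (if one is missing, check its datanode log under $HADOOP_HOME/logs):

hadoop@namenode01:~$ hdfs dfsadmin -report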

Creating directories in HDFS for YARN and the History Server

hadoop@namenode01:~$ hadoop fs -mkdir /tmp
hadoop@namenode01:~$ hadoop fs -chmod -R 1777 /tmp
hadoop@namenode01:~$ hadoop fs -mkdir /user
hadoop@namenode01:~$ hadoop fs -chmod -R 1777 /user
hadoop@namenode01:~$ hadoop fs -mkdir /user/app
hadoop@namenode01:~$ hadoop fs -chmod -R 1777 /user/app
hadoop@namenode01:~$ hadoop fs -mkdir -p /var/log/hadoop-yarn
hadoop@namenode01:~$ hadoop fs -chmod -R 1777 /var/log/hadoop-yarn
hadoop@namenode01:~$ hadoop fs -mkdir -p /var/log/hadoop-yarn/apps
hadoop@namenode01:~$ hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps

# Listing the HDFS file structure
hadoop@namenode01:~$ hadoop fs -ls -R /
drwxrwxrwt - hadoop supergroup 0 2017-11-30 20:46 /tmp
drwxrwxrwt - hadoop supergroup 0 2017-11-30 20:48 /user
drwxrwxrwt - hadoop supergroup 0 2017-11-30 20:48 /user/app
drwxr-xr-x - hadoop supergroup 0 2017-11-30 20:48 /var
drwxr-xr-x - hadoop supergroup 0 2017-11-30 20:48 /var/log
drwxrwxrwt - hadoop supergroup 0 2017-11-30 20:49 /var/log/hadoop-yarn
drwxrwxrwt - hadoop supergroup 0 2017-11-30 20:49 /var/log/hadoop-yarn/apps
hadoop@namenode01:~$

Starting YARN

hadoop@namenode01:~$ $HADOOP_HOME/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /srv/hadoop-2.9.0/logs/yarn-hadoop-resourcemanager-namenode01.out
datanode02: starting nodemanager, logging to /srv/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-datanode02.out
datanode01: starting nodemanager, logging to /srv/hadoop-2.9.0/logs/yarn-hadoop-nodemanager-datanode01.out
hadoop@namenode01:~$
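Similarly to the HDFS check, it is worth confirming that both NodeManagers registered with the ResourceManager; the listing below should show datanode01 and datanode02 in the RUNNING state:

hadoop@namenode01:~$ yarn node -list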

Now we will start the MapReduce History Server. It is started from the host that will act as the History Server node, which in our case is namenode01 itself. First, edit the file $HADOOP_HOME/etc/hadoop/mapred-site.xml on the NameNode and change the value of the following properties from NameNode to 0.0.0.0:

mapreduce.jobhistory.address: from NameNode:10020 to 0.0.0.0:10020
mapreduce.jobhistory.webapp.address: from NameNode:19888 to 0.0.0.0:19888
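For reference, after the edit the relevant section of mapred-site.xml should look roughly like the sketch below (the property names are the standard Hadoop 2.x ones; only the host part of the values changes):

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>0.0.0.0:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>0.0.0.0:19888</value>
</property>

With that in place, start the History Server: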
hadoop@namenode01:~$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /srv/hadoop-2.9.0/logs/mapred-hadoop-historyserver-namenode01.out
hadoop@namenode01:~$

Below is the list of processes running on each node of the cluster:

hadoop@namenode01:~$ jps
3057 ResourceManager
3361 JobHistoryServer
3399 Jps
2333 NameNode
2509 SecondaryNameNode
hadoop@namenode01:~$

hadoop@datanode01:~$ jps
2545 Jps
2417 NodeManager
2290 DataNode
hadoop@datanode01:~$

hadoop@datanode02:~$ jps
2575 Jps
2319 DataNode
2447 NodeManager
hadoop@datanode02:~$

Creating a test directory in HDFS and uploading a file:

hadoop@namenode01:~$ hadoop fs -mkdir /analysis
hadoop@namenode01:~$ hadoop fs -ls /
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2017-11-30 20:59 /analysis
drwxrwxrwt - hadoop supergroup 0 2017-11-30 20:46 /tmp
drwxrwxrwt - hadoop supergroup 0 2017-11-30 20:48 /user
drwxr-xr-x - hadoop supergroup 0 2017-11-30 20:48 /var
hadoop@namenode01:~$

Creating a file and uploading it:

hadoop@namenode01:~$ echo "Testando, arquivo, hdfs" > ./teste.txt
hadoop@namenode01:~$ hadoop fs -put ./teste.txt /analysis/teste.txt
hadoop@namenode01:~$ hadoop fs -ls /analysis
Found 1 items
-rw-r--r-- 3 hadoop supergroup 24 2017-11-30 21:13 /analysis/teste.txt
hadoop@namenode01:~$ hadoop fs -tail /analysis/teste.txt
Testando, arquivo, hdfs
hadoop@namenode01:~$

Now we can check the cluster status through a web browser. Remember to use the public IP (or hostname) of the NameNode, ResourceManager, and History Server, respectively. In our case the NameNode IP is 192.168.0.11:

http://namenode01:50070
http://namenode01:8088
http://namenode01:19888

Below we can browse the HDFS file system at:

http://namenode01:50070/explorer.html
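The same listing is also available over the WebHDFS REST API, which can be handy for quick scripted checks (assuming dfs.webhdfs.enabled is left at its default value of true):

hadoop@namenode01:~$ curl -s "http://namenode01:50070/webhdfs/v1/?op=LISTSTATUS"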

Now we will run a sample MapReduce job on our cluster.

hadoop@namenode01:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar pi 2 4
Number of Maps = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/11/30 21:20:46 INFO client.RMProxy: Connecting to ResourceManager at namenode01/192.168.0.11:8032
17/11/30 21:20:52 INFO input.FileInputFormat: Total input files to process : 2
17/11/30 21:20:53 INFO mapreduce.JobSubmitter: number of splits:2
17/11/30 21:20:54 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
17/11/30 21:20:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512083545978_0001
17/11/30 21:21:05 INFO impl.YarnClientImpl: Submitted application application_1512083545978_0001
17/11/30 21:21:06 INFO mapreduce.Job: The url to track the job: http://namenode01:8088/proxy/application_1512083545978_0001/
17/11/30 21:21:06 INFO mapreduce.Job: Running job: job_1512083545978_0001
17/11/30 21:22:43 INFO mapreduce.Job: Job job_1512083545978_0001 running in uber mode : false
17/11/30 21:22:43 INFO mapreduce.Job: map 0% reduce 0%
17/11/30 21:23:54 INFO mapreduce.Job: map 100% reduce 0%
17/11/30 21:24:58 INFO mapreduce.Job: map 100% reduce 100%
17/11/30 21:25:00 INFO mapreduce.Job: Job job_1512083545978_0001 completed successfully
17/11/30 21:25:03 INFO mapreduce.Job: Counters: 49
 File System Counters
 FILE: Number of bytes read=50
 FILE: Number of bytes written=607194
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=534
 HDFS: Number of bytes written=215
 HDFS: Number of read operations=11
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=3
 Job Counters
 Launched map tasks=2
 Launched reduce tasks=1
 Data-local map tasks=2
 Total time spent by all maps in occupied slots (ms)=121666
 Total time spent by all reduces in occupied slots (ms)=57322
 Total time spent by all map tasks (ms)=121666
 Total time spent by all reduce tasks (ms)=57322
 Total vcore-milliseconds taken by all map tasks=121666
 Total vcore-milliseconds taken by all reduce tasks=57322
 Total megabyte-milliseconds taken by all map tasks=124585984
 Total megabyte-milliseconds taken by all reduce tasks=58697728
 Map-Reduce Framework
 Map input records=2
 Map output records=4
 Map output bytes=36
 Map output materialized bytes=56
 Input split bytes=298
 Combine input records=0
 Combine output records=0
 Reduce input groups=2
 Reduce shuffle bytes=56
 Reduce input records=4
 Reduce output records=0
 Spilled Records=8
 Shuffled Maps =2
 Failed Shuffles=0
 Merged Map outputs=2
 GC time elapsed (ms)=830
 CPU time spent (ms)=15840
 Physical memory (bytes) snapshot=409604096
 Virtual memory (bytes) snapshot=1325137920
 Total committed heap usage (bytes)=258678784
 Shuffle Errors
 BAD_ID=0
 CONNECTION=0
 IO_ERROR=0
 WRONG_LENGTH=0
 WRONG_MAP=0
 WRONG_REDUCE=0
 File Input Format Counters
 Bytes Read=236
 File Output Format Counters
 Bytes Written=97
Job Finished in 259.733 seconds
Estimated value of Pi is 3.50000000000000000000
hadoop@namenode01:~$

Stopping the Hadoop Cluster

hadoop@namenode01:~$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
hadoop@namenode01:~$ $HADOOP_HOME/sbin/stop-yarn.sh
hadoop@namenode01:~$ $HADOOP_HOME/sbin/stop-dfs.sh
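After the stop scripts finish, only the Jps process itself should remain in the jps output on each node. A quick check over SSH (a sketch; adjust if jps is not on the remote PATH for non-interactive sessions):

hadoop@namenode01:~$ for node in namenode01 datanode01 datanode02; do echo "== $node =="; ssh $node jps; done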

Douglas Ribas de Mattos
E-mail: douglasmattos0@gmail.com
Github: https://github.com/douglasmattos0
LinkedIn: https://www.linkedin.com/in/douglasmattos0/
