Spark Installation Guide

Scala Installation

  1. Before switching bigdata1/2/3 over to hadoop3, set the memory to 4 GB. First install and deploy Hadoop and ZooKeeper, start them, and confirm that the expected processes are present.

  2. Open IDEA and connect to the servers with its Terminal tool.

  3. Extract Scala and rename the directory:

tar -zxvf /opt/software/scala-2.12.18.tgz -C /opt/module/
mv /opt/module/scala-2.12.18/ /opt/module/scala/
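
If you want a quick optional check that the extraction and rename worked (using the paths above):

ls /opt/module/scala/bin
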
  4. Set the Scala environment variables and apply them:
vi /etc/profile

Append at the end of the file:

export SCALA_HOME=/opt/module/scala
export PATH=$PATH:$SCALA_HOME/bin

Reload to apply the changes:

source /etc/profile
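
As a quick sanity check, confirm that the variable is set and scala is on the PATH (these commands only read back what was configured above):

echo $SCALA_HOME
scala -version
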
  5. Verify the Scala installation:
[root@master module]# scala

Welcome to Scala 2.12.18 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161).
Type in expressions for evaluation. Or try :help.
scala> val a=1
a: Int = 1
scala> val b=2
b: Int = 2
scala> a+b
res0: Int = 3

Press Ctrl+C to exit the Scala shell.

Spark Installation

  6. Extract Spark and rename the directory:
tar -zxvf /opt/software/spark-3.1.1-bin-hadoop3.2.tgz -C /opt/module/
mv /opt/module/spark-3.1.1-bin-hadoop3.2/ /opt/module/spark/
  7. Configure the Spark environment variables and apply them:
vi /etc/profile

Append at the end of the file:

export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin

Reload to apply the changes:

source /etc/profile
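
A quick check that the Spark client scripts are now on the PATH (this simply prints the version of the tarball installed above):

spark-submit --version
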
  8. Modify the Spark configuration file:
cp /opt/module/spark/conf/spark-env.sh.template /opt/module/spark/conf/spark-env.sh
vi /opt/module/spark/conf/spark-env.sh

Press Page Down to jump to the end of the file, then add:

export HADOOP_HOME=/opt/module/hadoop
export JAVA_HOME=/opt/module/jdk
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
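
These paths assume the Hadoop and JDK install locations used elsewhere in this guide; adjust them if yours differ. A simple sanity check that they resolve:

ls -d /opt/module/hadoop/etc/hadoop /opt/module/jdk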

  9. Distribute the files, including scala, spark, and /etc/profile:

scp -r /opt/module/scala/ slave1:/opt/module/
scp -r /opt/module/scala/ slave2:/opt/module/
scp -r /opt/module/spark/ slave1:/opt/module/
scp -r /opt/module/spark/ slave2:/opt/module/
scp /etc/profile slave1:/etc/profile
scp /etc/profile slave2:/etc/profile

Reload on the other nodes to apply the changes:

[root@slave1 ~]# source /etc/profile
[root@slave2 ~]# source /etc/profile
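
Optionally, verify from master that both the files and the environment reached the workers; this sketch assumes passwordless SSH, which the scp commands above already rely on:

ssh slave1 "source /etc/profile; scala -version; ls /opt/module/spark/bin/spark-submit"
ssh slave2 "source /etc/profile; scala -version; ls /opt/module/spark/bin/spark-submit"
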
  10. Submit a compute job to test Spark on YARN:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn /opt/module/spark/examples/jars/spark-examples_2.12-3.1.1.jar

You can watch the job being submitted, and the estimated value of Pi appears in the output:

2024-04-07 08:55:33,209 INFO cluster.YarnScheduler: Killing all running tasks in stage 0: Stage finished
2024-04-07 08:55:33,211 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.251154 s
Pi is roughly 3.1335756678783393
2024-04-07 08:55:33,232 INFO server.AbstractConnector: Stopped Spark@57a4d5ee{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
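
If you want to control the resources the example uses, spark-submit accepts the standard resource flags; the command below is only an illustrative variant of the one above, and the trailing 10 is SparkPi's optional partition count:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --num-executors 2 --executor-memory 1g --executor-cores 1 /opt/module/spark/examples/jars/spark-examples_2.12-3.1.1.jar 10

With the default client deploy mode the Pi line still prints to the console as shown above.
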
  11. If Spark on YARN fails with an insufficient-resources error such as

java.nio.channels.ClosedChannelException

handle it as follows:

(1) Stop Hadoop first: run stop-all.sh on master.

(2) Edit the Hadoop configuration file yarn-site.xml and add the following:

vi /opt/module/hadoop/etc/hadoop/yarn-site.xml
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
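
For context, these two properties disable YARN's physical- and virtual-memory checks on the NodeManagers, which is the usual cause of containers being killed and the resulting ClosedChannelException. If you would rather keep the checks, a common alternative (not part of the original steps) is to loosen the virtual-memory ratio instead, for example:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>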

(3) Distribute yarn-site.xml to the other servers:

scp /opt/module/hadoop/etc/hadoop/yarn-site.xml slave1:/opt/module/hadoop/etc/hadoop/yarn-site.xml

scp /opt/module/hadoop/etc/hadoop/yarn-site.xml slave2:/opt/module/hadoop/etc/hadoop/yarn-site.xml

(4) Restart Hadoop:

stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh
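
After the restart, a quick way to confirm YARN is healthy again (assuming the same node layout) is to list the NodeManagers and check the Java processes on each machine:

yarn node -list
jps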