Writing and running your first Spark Scala program in IDEA
1. Create a project in IDEA: choose Scala as the category, select the IDEA project type, then on the next step choose JDK 1.8 and the Scala SDK: scala-sdk-2.12.20 (the Scala installation directory).
2. Right-click the project, choose Add Framework Support, and enable Maven support.
3. Contents of pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>groupId</groupId>
<artifactId>scala01</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.12.20</version>
        </dependency>
        <!-- spark-core_2.12 already pulls in spark-common-utils_2.12 and
             spark-network-common_2.12 transitively; do not add the _2.13
             variants, which conflict with a Scala 2.12 project -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.5.2</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- scala-maven-plugin is a build plugin, not a dependency -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Refresh the Maven project so the dependencies are downloaded.
4. Create a package: com.rainpet
Inside the package, create a new Scala file Hello.scala with the following content:
package com.rainpet

import org.apache.spark.{SparkConf, SparkContext}

object Hello {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("educoder").setMaster("local")
    val sc = new SparkContext(conf)
    // Word count: read the input file from HDFS. Use the NameNode port your
    // cluster is configured with (commonly 8020, or 9000 on older setups):
    // val file1 = sc.textFile("hdfs://master:9000/user/input/1.txt")
    val file1 = sc.textFile("hdfs://master:8020/user/input/1.txt")
    val wordCounts = file1.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.collect().foreach(println)
    sc.stop()
  }
}
5. Click Run or Debug, and the program should run normally.
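If the HDFS cluster is not reachable yet, the flatMap/map/reduceByKey pipeline above can be sanity-checked with plain Scala collections, no Spark required. This is a minimal sketch; the sample lines are made-up stand-ins for the contents of 1.txt:

```scala
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // Sample input standing in for the lines of 1.txt
    val lines = Seq("hello spark", "hello scala", "spark streaming")

    // Same logic as the Spark job, expressed on Scala collections:
    // split each line into words, pair each word with 1,
    // group by word, then sum the counts per word
    val wordCounts = lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    wordCounts.foreach(println)
  }
}
```

Once this prints the expected counts, the same chain can be moved back onto the RDD, where groupBy + sum becomes reduceByKey(_ + _).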