Spark learning notes [1]: Scala environment installation and basic syntax

   As the saying goes, to do a good job you must first sharpen your tools. Spark is developed in Scala rather than Java. Although both run on the JVM, the two languages differ in their basic characteristics. One concept to keep in mind is that the JVM is not the same thing as Java: any language can run on the JVM as long as it compiles to class files that conform to the JVM specification.

   Compared with Java, Scala is more concise. It is also a functional programming language, which means that functions are values: any function can be assigned to a variable, a bit like a function pointer in C++. The downside of the concise syntax is that Scala code can be much harder to read than the equivalent Java code.

   In the end, a programming language is only a tool. Language features differ, but what the languages can accomplish is basically the same. This article covers some basic features of Scala. For details, check the official website.

1. Preparation (using Windows as an example)

  • 1) Download the Scala installation package from the official website. On Windows, download the corresponding msi file and double-click it to install. Do not install into the default folder, Program Files (x86): if the installation path contains spaces, an error will be reported

  • 2) Add the environment variables, or let the installer write the PATH during installation

  • 3) Open the command line and enter scala. If the Scala interactive shell (REPL) starts, the installation is complete

  • 4) The integrated development environment used in this article is IDEA. Scala can be integrated into IDEA by installing the Scala plugin first. For details, please refer to the blog:

2. Basic grammar and language characteristics

2-1 introduction to main, class, object and statement writing rules
  • Java's class definition keyword is class. Besides class, Scala also has the object keyword. The main method can only be written in a type defined with object. The example code is as follows:
object Collection {

  def main(args: Array[String]): Unit = {}
}
  • A Java source file can contain only one public class, and that class name must match the file name. Scala has no such requirement
  • Difference between object and class: a class defined with class in Scala cannot have static variables or static methods (there is no static keyword at all). The role of static is filled by object (which is why main must be in an object). When an object and a class have the same name and are defined in the same file, the object is called the "companion object" of the class. For example, class A and object A in the same file are companions, and the members of object A act as the static members of class A
  • A type defined with object is equivalent to a static singleton object
  • In Java, a semicolon marks the end of a statement. In Scala, semicolons are optional, but multiple statements on the same line must be separated by semicolons
  • Although Scala allows file names and class names to differ, and allows several classes in one file, Scala code must still be compiled into class files that conform to the JVM specification, so the compiler generates a class file named after each class. Therefore, different files in the same package are not allowed to define classes with the same name
  • In Scala, business logic can be executed directly in the class body, whereas in Java it can only be executed inside methods. Code written directly in the class body (that is, outside any method) is compiled into the default constructor by the Scala compiler. For example, a class with such code is defined as follows
class test{

  var a:Int = 3
  val name = "bbb"
  println(s"$a $name") // runs every time an instance is constructed
}
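
The object/class distinction and the "class body is the constructor" rule described above can be sketched together. This is only an illustration: `Counter`, `created`, and `CompanionDemo` are made-up names, not part of the original notes.

```scala
// Counter, created and CompanionDemo are hypothetical names for this sketch.
class Counter(val label: String) {
  // Code in the class body runs as part of the constructor.
  Counter.created += 1
}

// Companion object: same name, same file. Its members play the role of Java statics.
object Counter {
  var created: Int = 0
}

object CompanionDemo {
  def main(args: Array[String]): Unit = {
    new Counter("a")
    new Counter("b")
    println(Counter.created) // number of instances constructed so far
  }
}
```

Because `created` lives in the companion object, there is exactly one copy of it no matter how many `Counter` instances exist.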



2-2 variable definition
var/val variableName: Type = value  // var defines a variable that can be reassigned; val defines a constant, similar to final in Java
2-3 constructor

   Writing a constructor in Scala is optional. If none is written, the default constructor consists of the code in the class body. A custom constructor is defined as follows

def this(paramName: ParamType, ...) = {
        this() // the default constructor must be called first
}
2-4 class name constructor

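  In addition to the constructor described above, Scala has a class name constructor (the primary constructor in Scala terminology), whose parameters are declared right after the class name. It is defined as follows

 class A(val name: String) { // general form: var/val paramName: ParamType
 }

 new A(name = "aaa")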

The basic features of the class name constructor are as follows:

  • 1) var/val can be omitted. In that case the parameter is immutable (like val) and private to the class
  • 2) Only parameters of the class name constructor can be declared with var. Parameters of all other methods are always val and cannot be declared with var
  • 3) If a class has a class name constructor and also defines a custom constructor, the custom constructor must call the class name constructor to initialize its parameters. For example, adding a custom constructor to A:
class A(name:String){
         def this(age:Int) = {
            this("aaa") // explicitly initializes the parameter of the class name constructor
         }
}
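
As a minimal sketch of the three rules above, assuming a made-up `User` class (the class and its fields are illustrative, not from the original notes):

```scala
// User, name and age are hypothetical names for illustration.
class User(val name: String, var age: Int) {
  // Custom (auxiliary) constructor: its first statement must call another constructor.
  def this(name: String) = this(name, 0)
}

object ConstructorDemo {
  def main(args: Array[String]): Unit = {
    val u = new User("aaa")
    u.age = 18        // allowed: age is declared with var
    // u.name = "bbb" // would not compile: name is declared with val
    println(s"${u.name} ${u.age}")
  }
}
```

Declaring a parameter with val or var also turns it into a member that can be read from outside the class, which is why `u.name` and `u.age` are accessible here.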
2-5 process control
2-5-1 if/else

Same as Java

 var i:Int = 0
if(i == 0){
  println("i is 0")
} else println("i is not 0")
2-5-2 while loop

Same as Java, except that there is no ++ increment operator. Use += 1 instead

 var i = 0
 while (i < 10) {
            i += 1
 }
2-5-3 for loop

Scala does not support the for (int i = 0; i < 10; i++) syntax. It only supports the enhanced for loop, similar to for (a : collection) in Java. The Scala equivalent of for (int i = 0; i < 10; i++) is:

 for(i <- 0 to 9 by 1){ // inclusive: 9 is included; "by 1" is the step
   println(i)
 }
 for(j <- 0 until 10 by 1){ // exclusive: 10 is not included
   println(j)
 }

The loop expression can be followed by a guard condition, so that the loop body is executed only when the condition is met

 for(i <- 0 to 9 by 1 if (i%2==0)) println(i) // prints only the even numbers

A nested (double) for loop can be written more concisely with two generators, for example to print the multiplication table

  for(i <- 1 to 9 /* outer loop */; j <- 1 to 9 if (i >= j) /* inner loop */){
              print(s"$j * $i = ${i*j} \t")
              if (i == j) println() // start a new line at the end of each row
  }
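
The range, guard, and multi-generator forms above can be checked with a small sketch. `LoopDemo`, `sumEven`, and `trianglePairs` are hypothetical names chosen for this example.

```scala
// LoopDemo and its methods are hypothetical helpers for this sketch.
object LoopDemo {
  // Sums the even numbers in 0 to n (inclusive) using a guard.
  def sumEven(n: Int): Int = {
    var total = 0
    for (i <- 0 to n if i % 2 == 0) {
      total += i
    }
    total
  }

  // Counts how often the body of a two-generator loop with guard i >= j runs.
  def trianglePairs(n: Int): Int = {
    var count = 0
    for (i <- 1 to n; j <- 1 to n if i >= j) {
      count += 1
    }
    count
  }
}
```

For n = 9, `trianglePairs` counts the 45 cells of the multiplication table printed above, and `sumEven(9)` adds 0, 2, 4, 6 and 8.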
2-6 function
2-6-1 general functions

Function definition

      def functionName(paramName: ParamType, ...): ReturnType = {
        // function body
      }

If the function has no return value, the return type is Unit. The value can be returned with return, or the expression can simply be written on the last line, for example

     def test1(): Int ={
        var i = 3
        //return i
        i // the value of the last expression is the return value
     }

2-6-2 anonymous function
      // General form: var y = (parameter list) => { function body }
      var y:(Int,Int)=>Int = (a:Int,b:Int)=>{
         a + b
      }

(Int, Int) => Int is called the signature (type) of the function, and it can be used as the type of another function's formal parameter. An anonymous function is called like an ordinary function: y(arguments)

2-6-3 nested functions (functions defined in functions)
 def test1(a:String):Unit = {
        def test2():Unit = println(a) // the inner function can use the outer function's parameter
        test2()
 }
2-6-4 partially applied functions
        import java.util.Date
        def fun07(date:Date,tp:String,msg:String): Unit ={
          println(s"$date\t$tp\t$msg")
        }
        var info = fun07(_:Date,"info","ok") // fixes the last two arguments; the "_" for the first parameter is a placeholder
        info(new Date())
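
A deterministic variant of the same idea, as a sketch: it fixes the first two arguments instead of the last two and returns a String so the result can be inspected. `PartialApplyDemo` and `logLine` are hypothetical names, not from the original notes.

```scala
// PartialApplyDemo and logLine are hypothetical names for this sketch.
object PartialApplyDemo {
  def logLine(level: String, tag: String, msg: String): String =
    s"[$level][$tag] $msg"

  // Fix the first two arguments; "_" leaves the last one open.
  // The result is an ordinary function value of type String => String.
  val info: String => String = logLine("INFO", "app", _: String)
}
```

Calling `PartialApplyDemo.info("ok")` then behaves like `logLine("INFO", "app", "ok")`.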
2-6-5 variable parameter function
 def fun08(a: Int*): Unit = {
          for (elem <- a) {
            println(elem)
          }
          // Function as parameter: foreach takes a one-parameter function with a generic return type;
          // println takes one parameter and returns Unit
          a.foreach(println) // prints each element of a
 }
2-6-6 higher order function

A higher-order function takes a function as a parameter or returns a function as its result

        // Function as a parameter: y: (Int, Int) => Int is the required function type. It takes two Ints and returns an Int
        def compute(a:Int,b:Int,y:(Int,Int)=>Int):Int = {
          y(a, b)
        }
        println(compute(3,4,_ % _))
        // Function as return value
        def factory(op:String):(Int,Int)=>Int ={
          // for example: add for "+", multiply otherwise
          if (op == "+") (a:Int,b:Int) => a + b
          else (a:Int,b:Int) => a * b
        }
        var addFunc = factory("+")
        var mulFunc = factory("*")
2-6-7 currying

Also called multiple parameter lists. It feels a little abstract. The form is as follows

      def func09(a:Int)(b:String): Unit = println(s"$a\t$b")

This may seem superfluous, since def func09(a:Int,b:String) could be defined directly. It is mainly used in two cases

  • 1) To accept several variable-length parameter lists whose element types differ
def func09(a:Int*)(b:String*): Unit = {
        println(s"$a\t$b")
}
// This could also be implemented with def func09(a:Any*), but then the types of the arguments passed in cannot be controlled
  • 2) Implicit parameters: to mark only some of the parameters as implicit, they must be placed in their own parameter list
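
A small sketch of what multiple parameter lists allow, with results that can be checked. `CurryDemo`, `scale`, `twice`, and `describe` are hypothetical names introduced for this example.

```scala
// CurryDemo and its members are hypothetical names for this sketch.
object CurryDemo {
  // Curried method: two parameter lists.
  def scale(factor: Int)(x: Int): Int = factor * x

  // Supplying only the first list yields a function of the second.
  val twice: Int => Int = scale(2)

  // Two varargs lists with different element types,
  // which a single parameter list cannot express.
  def describe(nums: Int*)(tags: String*): String =
    s"${nums.sum}:${tags.mkString(",")}"
}
```

Here `CurryDemo.twice(21)` applies the remaining parameter list, and `CurryDemo.describe(1, 2, 3)("a", "b")` keeps the Int and String arguments type-checked separately.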
2-7 collection framework
  • 1) Java's collection framework is written in Java, but it is compiled to bytecode and runs on the JVM, so Scala can use it like any other Java class library

  • 2) Scala defines two kinds of collection classes, mutable and immutable. By default, the collection classes in the immutable package are used, as shown in the following example

        import scala.collection.mutable.ListBuffer
        // Array
        var arr01 = Array(1,2,3,4)
        println(arr01(0)) // index with parentheses; [] is used for generic type parameters in Scala
        var arr02 = Array[Int](1,2,3,4)
        // Linked list
        var list01 = List(1,2,3,4,5) // immutable List
        // Mutable List
        var list02 = new ListBuffer[Int]()
        list02.+=(32) // add an element; see the API documentation for the meaning of operators such as ++ and ++:
        // Set: also comes in mutable and immutable versions
        // Tuple
        var t2 = new Tuple2(11,"sssss") // the 2 means it holds 2 elements; tuples go up to Tuple22
        println(t2._1) // get a value
        val iterator = t2.productIterator // get an iterator
        // Map: Tuple2 describes key-value pairs in Scala
        import scala.collection.mutable.Map
        val map01:Map[String,Int] = Map(("a", 33), "b" -> 22, ("c", 44))
        val keys:Iterable[String] = map01.keys
        val value = map01.get("a").getOrElse(0)
        // get("a") returns an Option, which has two cases, None and Some. If the key exists it returns Some, otherwise None; the value is then taken out of the Option, here with getOrElse

        // Collection operations
        // map takes a function: one element in, one element out
        val list = List(1,2,3,4,5)
        val list04 = list.map((x: Int) => (x * x))
        // reduce takes a function: many elements in, one value out
        list.reduce((a:Int,b:Int)=>(a+b)) // internally calls reduceLeft, accumulating from left to right, starting from the first element (there is no separate initial value)
        // flatMap: maps each element to a collection and flattens the results
        val list03 = List("hello word","hello jeje")
        val strings = list03.flatMap((x: String) => {
          x.split(" ")
        })
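
The three collection operations and the Option lookup can be combined in one small sketch. `CollectionDemo` and its member names are assumptions made for illustration; the input strings are taken from the example above.

```scala
// CollectionDemo and its members are hypothetical names for this sketch.
object CollectionDemo {
  val lines = List("hello word", "hello jeje")

  // flatMap: split each line into words and flatten the results into one list.
  val words: List[String] = lines.flatMap(_.split(" "))

  // map: one element in, one element out.
  val lengths: List[Int] = words.map(_.length)

  // reduce: combine all elements pairwise into a single value.
  val totalLength: Int = lengths.reduce(_ + _)

  // Option: get returns Some(value) or None; getOrElse supplies a fallback.
  val ages = Map("a" -> 33, "b" -> 22)
  val found: Int = ages.get("a").getOrElse(-1)
  val missing: Int = ages.get("z").getOrElse(-1)
}
```

Note that `reduce` throws on an empty collection because it has no initial value; `fold` or `foldLeft` with an explicit start value is the usual alternative in that case.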
2-8 iterators

To avoid running out of memory by loading a large amount of data at once, the iterator pattern is widely used in data processing. An iterator only holds a pointer to the real data and walks through the data via that pointer. The following example iterates over a Scala List with an iterator

        val list03 = List("hello word","hello jeje","hehe hhhhhh")
        val iter:Iterator[String] = list03.iterator

        val strings = iter.flatMap((x: String) => {
          x.split(" ")
        }) // the result is also an iterator; no actual computation has happened yet
    //    strings.foreach(println) // iterating over the elements performs the computation, which ultimately calls iter's next and hasNext methods (see the source code)
        val tuples = strings.map((_, 1))
        tuples.foreach(println)
        // strings is an iterator. If strings.foreach has already been called, it points to the end, and iterating over the map result outputs no elements
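
The "an iterator can only be consumed once" behaviour can be observed directly. This is a minimal sketch; `IteratorDemo` and `run` are hypothetical names, and consuming the same iterator twice is done here only to show the effect, not as a recommended practice.

```scala
// IteratorDemo and run are hypothetical names for this sketch.
object IteratorDemo {
  def run(): (Int, Int) = {
    val iter = List("hello word", "hello jeje").iterator
    val words = iter.flatMap(_.split(" ")) // lazy: nothing is computed yet

    val firstPass = words.toList.size  // consumes the iterator
    val secondPass = words.toList.size // the iterator is already exhausted
    (firstPass, secondPass)
  }
}
```

If the elements are needed more than once, materialize them first (for example with `toList`) and reuse the resulting collection.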
2-9 advanced features
2-9-1 trait

Similar to an interface in Java (and it is indeed compiled to an interface). A class can mix in several traits, which gives a form of multiple inheritance

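         trait A {
           def say(): Unit={
             println("say from A")
           }
         }
         trait B {
           def sayB():Unit={
             println("sayB from B")
           }
           def sayB2():Unit // abstract: must be implemented by the class that mixes in B
         }
         class Person(name:String) extends A with B {
           def hello(): Unit = {
             println(s"$name say hello")
           }
           override def sayB2(): Unit ={
             println("sayB2 implemented in Person")
           }
         }
         object traitTest {

           def main(args: Array[String]): Unit = {
             val p = new Person("ssss")
             p.hello(); p.say(); p.sayB(); p.sayB2()
           }
         }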

2-9-2 case class

Case classes are mainly used for pattern matching. Unlike ordinary classes, case classes are compared by value rather than by reference, so a and a2 below are equal

         // Similar to a factory: as long as the constructor arguments are the same, the products are equal
         case class Dog(name:String,age:Int){
         }

         object caseClassTest {
           def main(args: Array[String]): Unit = {
             val a = new Dog("hashiqi", 18)
             val a2 = new Dog("hashiqi", 18)
             println(a == a2) // true: the field values are compared, not the references
           }
         }
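
To make the contrast with ordinary classes explicit, here is a small sketch that compares both kinds side by side. `CaseClassDemo` and `PlainDog` are hypothetical names added for this example.

```scala
// CaseClassDemo and PlainDog are hypothetical names for this sketch.
object CaseClassDemo {
  case class Dog(name: String, age: Int)
  class PlainDog(val name: String, val age: Int)

  // Case class: == compares the field values.
  val sameCase: Boolean = Dog("hashiqi", 18) == Dog("hashiqi", 18)

  // Ordinary class: == falls back to reference equality unless equals is overridden.
  val samePlain: Boolean = new PlainDog("hashiqi", 18) == new PlainDog("hashiqi", 18)
}
```

A case class also gets a generated `apply` method in its companion object, which is why `Dog("hashiqi", 18)` works without `new`.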
2-9-3 match pattern matching

It feels like an enhanced switch, which can match not only values but also types

         val tup:(Double,Int,String,Char) = (1.0, 2, "aaa", 'a')
         val iter = tup.productIterator
         val res = iter.map((x) => {
           x match{
             case 1.0 => println("1.0")  // match a value
     //        case 2 => println("2")
             case o:Int => println(s"$o is Int") // match a type; o is bound to the incoming x
             case o:String => println(s"$o is String")
             case _ => println("default") // the default case, like default in a switch
           }
         })
         while (res.hasNext) res.next() // map on an iterator is lazy; consuming res runs the match
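
The same match can be written as a function that returns a value instead of printing, which makes the result of each case easy to inspect. `MatchDemo` and `describe` are hypothetical names used only in this sketch.

```scala
// MatchDemo and describe are hypothetical names for this sketch.
object MatchDemo {
  // match is an expression: each case yields the value of the whole match.
  def describe(x: Any): String = x match {
    case 1.0       => "1.0"            // match a value
    case o: Int    => s"$o is Int"     // match a type and bind it to o
    case o: String => s"$o is String"
    case _         => "default"        // fallback, like default in a switch
  }

  val results: List[String] =
    (1.0, 2, "aaa", 'a').productIterator.map(describe).toList
}
```

Cases are tried from top to bottom and the first one that matches wins, so more specific patterns should come before more general ones.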
2-10 partial function

A partial function processes its input and returns a value according to the matching rule

         // The first type parameter is the input type, and the second is the return type
         def test:PartialFunction[Any,String]={
           case "hello" => "val is Hello"
           case x:Int => s"$x is Int"
           case _ => "none"
         }
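
What makes a PartialFunction different from an ordinary function is that it may be undefined for some inputs. The sketch below drops the catch-all case to show this; `PartialFunctionDemo`, `describe`, and `collected` are hypothetical names.

```scala
// PartialFunctionDemo and its members are hypothetical names for this sketch.
object PartialFunctionDemo {
  // Defined only for the string "hello" and for Int values; there is no catch-all case.
  val describe: PartialFunction[Any, String] = {
    case "hello" => "val is Hello"
    case x: Int  => s"$x is Int"
  }

  // collect applies the partial function only where it is defined
  // and skips the other elements.
  val collected: List[String] = List[Any]("hello", 3, 2.5).collect(describe)
}
```

`isDefinedAt` can be used to ask whether an input is covered before applying the function; applying it to an uncovered input throws a `MatchError`.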
2-11 implicit conversion

Implicit conversion is used to enhance existing, already compiled classes. Suppose a Java LinkedList is being used

             import java.util
             val list01 = new util.LinkedList[Int]()

Suppose it needs to be traversed with a Scala-style foreach that takes a Scala function, which the Java LinkedList does not provide. It can be wrapped in the following ways

  • 1) Wrapper method
   def foreach[T](linkedList: util.LinkedList[T],f:(T)=>Unit)={ // T is a generic type parameter
                  val iter = linkedList.iterator()
                  while (iter.hasNext) f(iter.next())
   }
  • 2) Wrapper class
class ListEx[T](linkedList: util.LinkedList[T]){
            def foreach(f:(T)=>Unit)={
              val iter = linkedList.iterator()
              while (iter.hasNext) f(iter.next())
            }
}
              // Using the wrapper class
              val listex = new ListEx(list01)
              listex.foreach(println)
  • 3) Still wrap it in a class, and use an implicit conversion method to enhance the original collection
// Implicit conversion method. The name does not matter; the parameter and return types must be right
              implicit def tran[T](linkedList: util.LinkedList[T]): ListEx[T] = {
                 new ListEx(linkedList)
              }
              list01.foreach(println) // the compiler inserts tran(list01) automatically
  • 4) Use an implicit class
implicit class tran[T](linkedList: util.LinkedList[T]) {
                def foreach(f: (T) => Unit) = {
                  val iter = linkedList.iterator()
                  while (iter.hasNext) {
                    f(iter.next())
                  }
                }
}

Using implicit conversion can enhance a class without modifying its source code. Besides implicit conversion methods and classes, there are also implicit parameters, as follows

implicit val aa:String = "aaa"
// The implicit parameter may be passed or omitted. If omitted, the compiler looks for an implicit value of matching type defined in the program and fills it in. If several matching values are found, an error is reported
def aaaa(implicit aaa:String):Unit={
  println(aaa)
}
// Calling
aaaa("bbb") // passed explicitly
aaaa        // filled in with the implicit value aa
                // If the function is changed to
                //   def aaaa(implicit aaa:String,bbb:Int)
                // then, although an implicit String value exists, aaaa cannot be called with only bbb: both parameters must be passed, or both omitted. To pass only bbb, use currying (multiple parameter lists) and define the function as
                def test01(bbb:Int)(implicit aaa:String):Unit ={
                  println(s"$aaa ----> $bbb")
                }
                test01(5) // aaa is filled in implicitly
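
The implicit class and the implicit parameter list can be put into one self-contained sketch. All names here (`ImplicitDemo`, `RichLinkedList`, `toScalaList`, `defaultPrefix`, `label`, `demoList`, `demoLabel`) are assumptions made for illustration; an implicit value of a common type such as String is used only to mirror the example above.

```scala
import java.util
import scala.collection.mutable.ListBuffer

// All names in this object are hypothetical and chosen for this sketch.
object ImplicitDemo {
  // Implicit class: adds a toScalaList method to java.util.LinkedList
  // without touching its source code.
  implicit class RichLinkedList[T](list: util.LinkedList[T]) {
    def toScalaList: List[T] = {
      val buf = ListBuffer[T]()
      val iter = list.iterator()
      while (iter.hasNext) buf += iter.next()
      buf.toList
    }
  }

  implicit val defaultPrefix: String = "aaa"

  // bbb is passed explicitly; prefix is filled in implicitly unless it is given.
  def label(bbb: Int)(implicit prefix: String): String = s"$prefix ----> $bbb"

  def demoList(): List[Int] = {
    val javaList = new util.LinkedList[Int]()
    javaList.add(1)
    javaList.add(2)
    javaList.toScalaList // compiles only because RichLinkedList is in scope
  }

  def demoLabel(): String = label(5) // prefix comes from defaultPrefix
}
```

An implicit parameter can still be supplied by hand, as in `ImplicitDemo.label(7)("x")`, which overrides the implicit value for that call.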

Keywords: Scala Big Data Spark

Added by GateGuardian on Sun, 06 Feb 2022 08:36:12 +0200