[software engineering practice] Hive research - Blog13
2021SC@SDUSC
Research content introduction
I am responsible for converting the query block QB into a logical query plan (OP Tree)
The code analyzed here comes from apache-hive-3.1.2-src/ql/src/java/org/apache/hadoop/hive/ql/plan. In Blog9-12, we completed the analysis of the following files:
- BoundaryDef.java
- PTFExpressionDef.java
- OrderDef.java
- OrderExpressionDef.java
- PartitionDef.java
- WindowTableFunctionDef.java
- PartitionedTableFunctionDef.java
- PTFInputDef.java
- PTFQueryInputDef.java
- ShapeDetails.java
This is the final blog of the series. Let's continue our research and analyze the remaining files in the ptf folder:
- WindowExpressionDef.java
- WindowFrameDef.java
- WindowFunctionDef.java
WindowExpressionDef.java file code parsing
We first attach the entire java file source code.
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hive.ql.plan.ptf;

import org.apache.hadoop.hive.ql.plan.Explain;

public abstract class WindowExpressionDef extends PTFExpressionDef {
  private String alias;

  @Explain(displayName = "alias")
  public String getAlias() {
    return alias;
  }

  public void setAlias(String alias) {
    this.alias = alias;
  }
}
Start parsing.
Global variable resolution (1)
As before, let's start with the analysis of the imported package. Let's first look at the import packages:
import org.apache.hadoop.hive.ql.plan.Explain;
org.apache.hadoop.hive.ql.plan.Explain import package resolution
Let's look this type up in the API documentation on the Apache website.
Explain is neither a package nor an ordinary class: it is an annotation type. It has no regular methods or fields; it only declares annotation elements such as displayName, some of which refer to other types. We will come back to it when the code actually uses it.
So far, we have resolved all the import packages.
Global variable resolution (2)
Let's take a look at the global variables:
private String alias;
This is a private field of type String, a standard Java type, so we won't analyze it further.
So far, we have resolved all global variables, and then we begin to analyze the code.
Explain settings
@Explain(displayName = "alias")
What kind of statement is this? The @ sign marks an annotation. @Explain applies the annotation type org.apache.hadoop.hive.ql.plan.Explain that we imported. That annotation declares an element displayName of type String, and displayName = "alias" sets this element to "alias" for the annotated method.
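To make the mechanism concrete, here is a minimal sketch of how an annotation element such as displayName can be declared and then read back through reflection. Label, AliasNode and LabelReader are hypothetical names invented for this illustration; this is not Hive's Explain annotation and not how Hive itself renders plans, only the general Java technique under those assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class LabelReader {
    // Returns the displayName of the first annotated method declared in the class,
    // or null if no method carries the (hypothetical) Label annotation.
    static String firstLabel(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            Label label = m.getAnnotation(Label.class);
            if (label != null) {
                return label.displayName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstLabel(AliasNode.class));
    }
}

// Hypothetical stand-in for an Explain-like annotation; only displayName is modeled.
// RUNTIME retention is required so that reflection can see the annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface Label {
    String displayName() default "";
}

// Hypothetical node with one annotated getter, shaped like WindowExpressionDef.
class AliasNode {
    private String alias;

    @Label(displayName = "alias")
    public String getAlias() {
        return alias;
    }

    public void setAlias(String alias) {
        this.alias = alias;
    }
}
```

In this sketch only getAlias carries the annotation, so the reader finds the label "alias" regardless of the order in which reflection returns the methods.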
Method getAlias
@Explain(displayName = "alias")
public String getAlias() {
  return alias;
}
This is the getter for the alias field; it returns the current value of alias.
Method setAlias
public void setAlias(String alias) { this.alias = alias; }
This is the setter for the alias field; it assigns the passed-in value to alias.
At this point, the analysis of WindowExpressionDef.java is complete, and we move on to the next file.
WindowFrameDef.java file code parsing
We first attach the entire java file code.
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hive.ql.plan.ptf;

import org.apache.hadoop.hive.common.classification.InterfaceAudience;
import org.apache.hadoop.hive.common.classification.InterfaceStability;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.parse.WindowingSpec.WindowType;

@InterfaceAudience.Public
@InterfaceStability.Stable
public class WindowFrameDef {
  private WindowType windowType;
  private BoundaryDef start;
  private BoundaryDef end;
  private final int windowSize;
  private OrderDef orderDef; // Order expressions which will only get set and used for RANGE windowing type

  public WindowFrameDef(WindowType windowType, BoundaryDef start, BoundaryDef end) {
    this.windowType = windowType;
    this.start = start;
    this.end = end;

    // Calculate window size
    if (start.getDirection() == end.getDirection()) {
      windowSize = Math.abs(end.getAmt() - start.getAmt()) + 1;
    } else {
      windowSize = end.getAmt() + start.getAmt() + 1;
    }
  }

  public BoundaryDef getStart() {
    return start;
  }

  public BoundaryDef getEnd() {
    return end;
  }

  public WindowType getWindowType() {
    return windowType;
  }

  public void setOrderDef(OrderDef orderDef) {
    this.orderDef = orderDef;
  }

  public OrderDef getOrderDef() throws HiveException {
    if (this.windowType != WindowType.RANGE) {
      throw new HiveException("Order expressions should only be used for RANGE windowing type");
    }
    return orderDef;
  }

  public boolean isStartUnbounded() {
    return start.isUnbounded();
  }

  public boolean isEndUnbounded() {
    return end.isUnbounded();
  }

  public int getWindowSize() {
    return windowSize;
  }

  @Override
  public String toString() {
    return windowType + " " + start + "~" + end;
  }
}
Start parsing.
Global variable resolution (1)
First, we look at the imports and the fields, which makes the later analysis easier. Let's see which types WindowFrameDef.java imports.
import org.apache.hadoop.hive.common.classification.InterfaceAudience;
import org.apache.hadoop.hive.common.classification.InterfaceStability;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.parse.WindowingSpec.WindowType;
Let's start parsing these packages now.
org.apache.hadoop.hive.common.classification.InterfaceAudience import package resolution
As before, let's look this type up in the API documentation on the Apache website.
InterfaceAudience is not a package of interfaces but a class that groups several nested annotation types. Importing it lets the file refer to them in the form InterfaceAudience.Public. The official description reads: "Annotation to inform users of a package, class or method's intended audience." We will look at the specific nested annotation when the code uses it.
org.apache.hadoop.hive.common.classification.InterfaceStability import package resolution
As before, let's look this type up in the API documentation on the Apache website.
This type is of the same kind as the previous one: a class that groups nested annotation types, which are used in the form InterfaceStability.Stable. The official description reads: "Annotation to inform users of how much to rely on a particular package, class or method not changing over time." We will look at the specific nested annotation when the code uses it.
org.apache.hadoop.hive.ql.metadata.HiveException import package resolution
As before, let's look this class up in the API documentation on the Apache website.
As the name suggests, this is an exception class, and the official description confirms it: "Generic exception class for Hive." It is imported into this file so that methods can declare and throw it when they detect an invalid state. We will return to it when we reach the method that throws it.
org.apache.hadoop.hive.ql.parse.WindowingSpec.WindowType import package resolution
As before, let's look this type up in the API documentation on the Apache website.
WindowType is an enum with two constants, RANGE and ROWS, which are likely to be set or referenced below. Its API page lists only two methods, which are relatively simple (for a Java enum these are typically the generated values() and valueOf()). We will come back to them when the code references them.
So far, the analysis of the imports of WindowFrameDef.java is complete. Now we continue with the annotations and fields.
Global variable resolution (2)
Let's take a look at what the two annotations at the top of the class mean:
@InterfaceAudience.Public @InterfaceStability.Stable
These two lines apply the annotations Public and Stable, which are nested annotation types declared inside InterfaceAudience and InterfaceStability. Let's look at each of them.
Annotation InterfaceAudience.Public
As before, let's look it up in the API documentation on the Apache website.
The official description reads: "Intended for use by any project or application." No further information is given, so we won't analyze it further for now.
Annotation InterfaceStability.Stable
As before, let's look it up in the API documentation on the Apache website.
The official description reads: "Can evolve while retaining compatibility for minor release boundaries; can break compatibility only at major release." In other words, the class may change between minor versions without breaking compatibility, and compatibility may only be broken at a major version. No further information is given, so we won't analyze it further for now.
Global variable resolution (3)
Let's now look at the global variables defined:
private WindowType windowType;
private BoundaryDef start;
private BoundaryDef end;
private final int windowSize;
private OrderDef orderDef; // Order expressions which will only get set and used for RANGE windowing type
windowType is of type WindowType, the enum imported from org.apache.hadoop.hive.ql.parse.WindowingSpec.WindowType.
start and end are of type BoundaryDef, a class in the same ptf folder that we analyzed in an earlier blog.
windowSize is a final int, a primitive Java type, so we won't analyze it further. Because it is final, it has to be assigned exactly once, which happens in the constructor.
orderDef is of type OrderDef, also a class in the ptf folder. According to the source comment, it is only set and used for the RANGE windowing type.
Class constructor method WindowFrameDef
public WindowFrameDef(WindowType windowType, BoundaryDef start, BoundaryDef end) {
  this.windowType = windowType;
  this.start = start;
  this.end = end;

  // Calculate window size
  if (start.getDirection() == end.getDirection()) {
    windowSize = Math.abs(end.getAmt() - start.getAmt()) + 1;
  } else {
    windowSize = end.getAmt() + start.getAmt() + 1;
  }
}
The constructor begins with plain field assignments, which we won't analyze. Let's look at the if statement that follows. What does start.getDirection() do? Let's go back to the source code of BoundaryDef.java:
public Direction getDirection() { return direction; }
What is this direction? It is a field of BoundaryDef. Let's view its declaration:
Direction direction;
Direction is an imported type: org.apache.hadoop.hive.ql.parse.WindowingSpec.Direction. If we need to use it, we will analyze it in detail.
Back in the constructor, look at the condition of the if statement: it checks whether start.getDirection() equals end.getDirection(). If so, the statement windowSize = Math.abs(end.getAmt() - start.getAmt()) + 1; is executed. Math.abs() is the standard Java method that returns the absolute value of its argument. What about getAmt()? Let's go back to the source code of BoundaryDef.java:
public int getAmt() { return amt; }
amt is a field of BoundaryDef. Let's view its declaration:
private int amt;
So the statement windowSize = Math.abs(end.getAmt() - start.getAmt()) + 1; sets windowSize to the absolute difference between the amt of end and the amt of start, plus 1.
If the condition is false, that is, the two boundaries point in different directions, windowSize is set to the sum of the amt of end and the amt of start, plus 1.
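The arithmetic is easier to follow with numbers. The sketch below reproduces only the size calculation from the constructor. WindowSizeSketch and its two-value Direction enum are hypothetical stand-ins (the real Direction type in WindowingSpec is not analyzed in this blog), and the mapping of the sample frame clauses to amt values is an assumption made for illustration.

```java
class WindowSizeSketch {
    // Hypothetical stand-in for WindowingSpec.Direction; only two values are modeled.
    enum Direction { PRECEDING, FOLLOWING }

    // Same arithmetic as the WindowFrameDef constructor.
    static int windowSize(Direction startDir, int startAmt, Direction endDir, int endAmt) {
        if (startDir == endDir) {
            // Both boundaries on the same side of the current row.
            return Math.abs(endAmt - startAmt) + 1;
        }
        // Boundaries on opposite sides: rows before + rows after + the current row.
        return endAmt + startAmt + 1;
    }

    public static void main(String[] args) {
        // Assumed frame: ROWS BETWEEN 2 PRECEDING AND 3 FOLLOWING
        System.out.println(windowSize(Direction.PRECEDING, 2, Direction.FOLLOWING, 3)); // 6
        // Assumed frame: ROWS BETWEEN 5 PRECEDING AND 2 PRECEDING
        System.out.println(windowSize(Direction.PRECEDING, 5, Direction.PRECEDING, 2)); // 4
    }
}
```

In the first case the window covers 2 rows before, the current row, and 3 rows after, so 6 rows. In the second case it covers the rows at offsets 5, 4, 3 and 2 before the current row, so 4 rows.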
Method getStart
public BoundaryDef getStart() { return start; }
The getter method for the start parameter is used to get the start parameter.
Method getEnd
public BoundaryDef getEnd() { return end; }
The getter method of the end parameter is used to get the end parameter.
Method getWindowType
public WindowType getWindowType() { return windowType; }
This is the getter for the windowType field; it returns the current value of windowType.
Method setOrderDef
public void setOrderDef(OrderDef orderDef) { this.orderDef = orderDef; }
The setter method for the parameter orderDef is used to set the orderDef parameter.
Method getOrderDef
public OrderDef getOrderDef() throws HiveException {
  if (this.windowType != WindowType.RANGE) {
    throw new HiveException("Order expressions should only be used for RANGE windowing type");
  }
  return orderDef;
}
This method can throw an exception. The if statement checks whether the windowType of this object differs from the enum constant WindowType.RANGE. If it differs, the method throws a HiveException with the message "Order expressions should only be used for RANGE windowing type". If the type is RANGE, it returns orderDef. So this is a getter for the orderDef field with a guard added, which prevents reading orderDef from a frame for which it is not meaningful.
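The guarded-getter pattern can be sketched in isolation as follows. GuardedGetterSketch, WindowKind, PlanException and Frame are hypothetical names; PlanException merely stands in for HiveException as a checked exception, and a String stands in for OrderDef.

```java
class GuardedGetterSketch {
    // Hypothetical stand-in for WindowType.
    enum WindowKind { ROWS, RANGE }

    // Hypothetical checked exception standing in for HiveException.
    static class PlanException extends Exception {
        PlanException(String message) {
            super(message);
        }
    }

    static class Frame {
        private final WindowKind kind;
        private String orderBy; // stands in for the OrderDef field

        Frame(WindowKind kind) {
            this.kind = kind;
        }

        void setOrderBy(String orderBy) {
            this.orderBy = orderBy;
        }

        // Getter with a guard: only RANGE frames may expose their order expressions.
        String getOrderBy() throws PlanException {
            if (kind != WindowKind.RANGE) {
                throw new PlanException("Order expressions should only be used for RANGE windowing type");
            }
            return orderBy;
        }
    }

    // Helper that reports whether the guard rejects the call.
    static boolean rejects(Frame frame) {
        try {
            frame.getOrderBy();
            return false;
        } catch (PlanException e) {
            return true;
        }
    }
}
```

Because the exception is checked, every caller has to either catch it or declare it, which makes the RANGE-only restriction visible at compile time.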
Method isStartUnbounded
public boolean isStartUnbounded() { return start.isUnbounded(); }
Let's take a look at the source code of the isUnbounded method in BoundaryDef.java:
public boolean isUnbounded() { return this.getAmt() == BoundarySpec.UNBOUNDED_AMOUNT; }
This method checks whether the amt value of the BoundaryDef object equals BoundarySpec.UNBOUNDED_AMOUNT and returns the result as a boolean. BoundarySpec is imported in BoundaryDef.java as org.apache.hadoop.hive.ql.parse.WindowingSpec.BoundarySpec; it declares UNBOUNDED_AMOUNT, a field of type int.
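A minimal sketch of this sentinel check, assuming a hypothetical Boundary class. The concrete sentinel value used here, Integer.MAX_VALUE, is an assumption for illustration; the blog only establishes that UNBOUNDED_AMOUNT is an int declared in BoundarySpec.

```java
class UnboundedSketch {
    // Assumed sentinel value; the real constant is declared in WindowingSpec.BoundarySpec.
    static final int UNBOUNDED_AMOUNT = Integer.MAX_VALUE;

    // Hypothetical stand-in for BoundaryDef, reduced to the amt field.
    static class Boundary {
        private final int amt;

        Boundary(int amt) {
            this.amt = amt;
        }

        int getAmt() {
            return amt;
        }

        // A boundary is unbounded when its amount equals the reserved sentinel.
        boolean isUnbounded() {
            return getAmt() == UNBOUNDED_AMOUNT;
        }
    }
}
```

Reserving one int value as a marker avoids a separate boolean field, at the cost that the marker can never be used as a real amount.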
Method isEndUnbounded
public boolean isEndUnbounded() { return end.isUnbounded(); }
This method is the same as the previous method, and we won't repeat it here
Method getWindowSize
public int getWindowSize() { return windowSize; }
This is the getter for the windowSize field; it returns the value computed in the constructor.
(override) method toString
@Override public String toString() { return windowType + " " + start + "~" + end; }
Here the toString method that every class inherits from Object is overridden to return windowType + " " + start + "~" + end, that is, the window type, a space, the start boundary, a tilde and the end boundary, concatenated into one string.
So far, the analysis of WindowFrameDef.java is complete. Let's continue with the next file.
WindowFunctionDef.java file code parsing
We first attach the entire java file code
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hive.ql.plan.ptf;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.plan.Explain;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;

@Explain(displayName = "window function definition")
public class WindowFunctionDef extends WindowExpressionDef {
  String name;
  boolean isStar;
  boolean isDistinct;
  List<PTFExpressionDef> args;
  WindowFrameDef windowFrame;
  GenericUDAFEvaluator wFnEval;
  boolean pivotResult;

  @Explain(displayName = "name")
  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }

  @Explain(displayName = "isStar", displayOnlyOnTrue = true)
  public boolean isStar() {
    return isStar;
  }

  public void setStar(boolean isStar) {
    this.isStar = isStar;
  }

  @Explain(displayName = "isDistinct", displayOnlyOnTrue = true)
  public boolean isDistinct() {
    return isDistinct;
  }

  public void setDistinct(boolean isDistinct) {
    this.isDistinct = isDistinct;
  }

  public List<PTFExpressionDef> getArgs() {
    return args;
  }

  public void setArgs(List<PTFExpressionDef> args) {
    this.args = args;
  }

  public void addArg(PTFExpressionDef arg) {
    args = args == null ? new ArrayList<PTFExpressionDef>() : args;
    args.add(arg);
  }

  @Explain(displayName = "arguments")
  public String getArgsExplain() {
    if (args == null) {
      return null;
    }
    StringBuilder builder = new StringBuilder();
    for (PTFExpressionDef expression : args) {
      if (builder.length() > 0) {
        builder.append(", ");
      }
      builder.append(expression.getExprNode().getExprString());
    }
    return builder.toString();
  }

  public WindowFrameDef getWindowFrame() {
    return windowFrame;
  }

  public void setWindowFrame(WindowFrameDef windowFrame) {
    this.windowFrame = windowFrame;
  }

  @Explain(displayName = "window frame")
  public String getWindowFrameExplain() {
    return windowFrame == null ? null : windowFrame.toString();
  }

  public GenericUDAFEvaluator getWFnEval() {
    return wFnEval;
  }

  public void setWFnEval(GenericUDAFEvaluator wFnEval) {
    this.wFnEval = wFnEval;
  }

  @Explain(displayName = "window function")
  public String getWFnEvalExplain() {
    return wFnEval == null ? null : wFnEval.getClass().getSimpleName();
  }

  @Explain(displayName = "isPivotResult", displayOnlyOnTrue = true)
  public boolean isPivotResult() {
    return pivotResult;
  }

  public void setPivotResult(boolean pivotResult) {
    this.pivotResult = pivotResult;
  }
}
Start parsing
Global variable resolution (1)
As before, let's start with the analysis of the imported package. Let's first look at the import packages:
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.ql.plan.Explain;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
The first two, ArrayList and List, are standard collection types from java.util, so we won't analyze them here.
org.apache.hadoop.hive.ql.plan.Explain import package resolution
We have parsed this package at the beginning, so we won't repeat it here
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator import package resolution
As before, let's look this class up in the API documentation on the Apache website.
This class has many methods and fields. We won't analyze them one by one; when the code uses one of them, we will come back for a specific analysis.
Global variable resolution (2)
Let's now look at the global variables defined:
String name;
boolean isStar;
boolean isDistinct;
List<PTFExpressionDef> args;
WindowFrameDef windowFrame;
GenericUDAFEvaluator wFnEval;
boolean pivotResult;
name is a String; isStar, isDistinct and pivotResult are boolean; args is a List. These are standard Java types, so we won't analyze them here. Note that none of these fields declares an access modifier, so they are package-private rather than private.
windowFrame is of type WindowFrameDef, the class in the ptf folder that we analyzed above.
wFnEval is of type GenericUDAFEvaluator, imported from org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.
The elements of the args list are of type PTFExpressionDef, which is also a class in the ptf folder.
At this point, all fields have been covered, and we begin to analyze the methods.
Explain settings (1)
@Explain(displayName = "name")
This is the same kind of annotation we saw in WindowExpressionDef.java: @Explain applies the imported annotation type org.apache.hadoop.hive.ql.plan.Explain, and displayName = "name" sets its String element displayName to "name".
getter and setter methods of parameter name
@Explain(displayName = "name") public String getName() { return name; }
public void setName(String name) { this.name = name; }
The getter and setter methods of the parameter name are used to get and set the name parameter.
Explain settings (2)
@Explain(displayName = "isStar", displayOnlyOnTrue = true)
This time the annotation sets displayName to "isStar" and additionally sets the boolean element displayOnlyOnTrue of Explain to true. As the name suggests, the entry is meant to be displayed only when the annotated getter returns true.
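The following sketch shows one way a consumer of such an annotation could honor a displayOnlyOnTrue element. Shown, Node and describe are hypothetical names, and the filtering rule is an assumption inferred from the element's name; this is not Hive's actual explain logic.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class FlagExplainSketch {
    // Hypothetical stand-in for Explain with the two elements used in this file.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Shown {
        String displayName();
        boolean displayOnlyOnTrue() default false;
    }

    // Hypothetical node with two boolean flags, one false and one true.
    static class Node {
        boolean star = false;
        boolean distinct = true;

        @Shown(displayName = "isStar", displayOnlyOnTrue = true)
        public boolean isStar() {
            return star;
        }

        @Shown(displayName = "isDistinct", displayOnlyOnTrue = true)
        public boolean isDistinct() {
            return distinct;
        }
    }

    // Collects displayName -> value for annotated getters, skipping entries that
    // are marked displayOnlyOnTrue and whose getter returned false.
    static Map<String, Object> describe(Object node) {
        Map<String, Object> out = new TreeMap<>();
        for (Method m : node.getClass().getDeclaredMethods()) {
            Shown shown = m.getAnnotation(Shown.class);
            if (shown == null) {
                continue;
            }
            Object value;
            try {
                value = m.invoke(node);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            if (shown.displayOnlyOnTrue() && Boolean.FALSE.equals(value)) {
                continue;
            }
            out.put(shown.displayName(), value);
        }
        return out;
    }
}
```

For the Node above, describe returns a map containing only isDistinct, because isStar is false and is therefore suppressed.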
getter and setter methods of parameter isStar
@Explain(displayName = "isStar", displayOnlyOnTrue = true) public boolean isStar() { return isStar; }
public void setStar(boolean isStar) { this.isStar = isStar; }
The getter and setter methods of the parameter isStar are used to get and set the isStar parameter.
Explain settings (3)
@Explain(displayName = "isDistinct", displayOnlyOnTrue = true)
As with isStar, the annotation sets displayName to "isDistinct" and sets the boolean element displayOnlyOnTrue of Explain to true.
getter and setter methods of parameter isDistinct
@Explain(displayName = "isDistinct", displayOnlyOnTrue = true) public boolean isDistinct() { return isDistinct; }
public void setDistinct(boolean isDistinct) { this.isDistinct = isDistinct; }
The getter and setter methods of the parameter isDistinct are used to get and set the isDistinct parameter.
getter and setter methods of parameter args
public List<PTFExpressionDef> getArgs() { return args; }
public void setArgs(List<PTFExpressionDef> args) { this.args = args; }
These are the getter and setter for the args field; they return and replace the whole argument list.
Method addArg
public void addArg(PTFExpressionDef arg) {
  args = args == null ? new ArrayList<PTFExpressionDef>() : args;
  args.add(arg);
}
Let's look at the first statement: args = args == null ? new ArrayList<PTFExpressionDef>() : args;. It means that if args has not been initialized yet, a new ArrayList is created and assigned to args; otherwise args is left unchanged. The second statement then adds the passed-in argument to the list.
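The lazy-initialization idiom can be sketched on its own like this. LazyArgsSketch is a hypothetical class that uses String elements instead of PTFExpressionDef, and it writes the null check as an if statement, which behaves the same as the conditional expression in the original.

```java
import java.util.ArrayList;
import java.util.List;

class LazyArgsSketch {
    private List<String> args; // stays null until the first element is added

    List<String> getArgs() {
        return args;
    }

    // Creates the list on first use, then appends the argument.
    void addArg(String arg) {
        if (args == null) {
            args = new ArrayList<>();
        }
        args.add(arg);
    }
}
```

A consequence of this design is that getArgs() returns null, not an empty list, for a function without arguments, which is why getArgsExplain has to check for null first.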
Explain settings (4)
@Explain(displayName = "arguments")
The Explain setting this time is to set the value of the variable displayName to "arguments"
Method getArgsExplain
@Explain(displayName = "arguments")
public String getArgsExplain() {
  if (args == null) {
    return null;
  }
  StringBuilder builder = new StringBuilder();
  for (PTFExpressionDef expression : args) {
    if (builder.length() > 0) {
      builder.append(", ");
    }
    builder.append(expression.getExprNode().getExprString());
  }
  return builder.toString();
}
First comes the if statement: if args is null, the method returns null immediately. Otherwise it creates a StringBuilder, which can be thought of as a mutable string buffer: strings are appended to it, and toString() produces the combined result. The for loop iterates over all elements of args, binding each one to the loop variable expression. Inside the loop, the if statement checks whether the builder already contains text; if so, a comma and a space are appended first to separate the elements. Then the return value of expression.getExprNode().getExprString() is appended. We analyzed that call in an earlier blog, so we won't repeat it here.
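The joining loop can be tried out with plain strings. ArgsExplainSketch is a hypothetical class in which each String stands in for the result of getExprNode().getExprString(); the sample values are made up.

```java
import java.util.List;

class ArgsExplainSketch {
    // Joins the expression strings with ", ", mirroring the loop in getArgsExplain.
    static String explain(List<String> exprs) {
        if (exprs == null) {
            return null;
        }
        StringBuilder builder = new StringBuilder();
        for (String expr : exprs) {
            // Only add a separator once something has already been written.
            if (builder.length() > 0) {
                builder.append(", ");
            }
            builder.append(expr);
        }
        return builder.toString();
    }
}
```

For non-empty expression strings this gives the same result as String.join(", ", exprs); an empty list yields an empty string rather than null.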
getter and setter methods of parameter windowFrame
public WindowFrameDef getWindowFrame() { return windowFrame; }
public void setWindowFrame(WindowFrameDef windowFrame) { this.windowFrame = windowFrame; }
These are the getter and setter for the windowFrame field; they return and set windowFrame.
Explain settings (5)
@Explain(displayName = "window frame")
The Explain setting this time is to set the value of the variable displayName to "window frame"
Method getWindowFrameExplain
@Explain(displayName = "window frame") public String getWindowFrameExplain() { return windowFrame == null ? null : windowFrame.toString(); }
The return statement first checks whether windowFrame is null. If so, it returns null; otherwise it returns the String produced by windowFrame.toString().
getter and setter methods of parameter wFnEval
public GenericUDAFEvaluator getWFnEval() { return wFnEval; }
public void setWFnEval(GenericUDAFEvaluator wFnEval) { this.wFnEval = wFnEval; }
The getter and setter methods of the parameter wFnEval are used to get and set the wFnEval parameter.
Explain settings (6)
@Explain(displayName = "window function")
The Explain setting this time is to set the value of the variable displayName to "window function"
Method getWFnEvalExplain
@Explain(displayName = "window function") public String getWFnEvalExplain() { return wFnEval == null ? null : wFnEval.getClass().getSimpleName(); }
The return statement again first checks whether wFnEval is null. If so, it returns null; otherwise it returns wFnEval.getClass().getSimpleName(). getClass() returns the runtime Class object of the instance, that is, the class the object was actually created from. Since GenericUDAFEvaluator is an abstract class, this is a concrete subclass of org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator rather than GenericUDAFEvaluator itself. getSimpleName() then returns the simple name of that class as written in the source code, without the package prefix, so the result is the name of the concrete evaluator class.
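A short sketch makes the runtime-class behavior visible. Evaluator and SumEvaluator are hypothetical stand-ins for GenericUDAFEvaluator and one of its concrete subclasses.

```java
class SimpleNameSketch {
    // Hypothetical abstract base, standing in for GenericUDAFEvaluator.
    static abstract class Evaluator { }

    // Hypothetical concrete evaluator.
    static class SumEvaluator extends Evaluator { }

    // Same shape as getWFnEvalExplain: null-safe simple name of the runtime class.
    static String explain(Evaluator evaluator) {
        return evaluator == null ? null : evaluator.getClass().getSimpleName();
    }
}
```

Even though the parameter is declared as Evaluator, explain(new SumEvaluator()) returns "SumEvaluator", because getClass() reports the class of the actual object and not the declared type of the variable.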
Explain settings (7)
@Explain(displayName = "isPivotResult", displayOnlyOnTrue = true)
As before, the annotation sets displayName to "isPivotResult" and sets the boolean element displayOnlyOnTrue of Explain to true.
getter and setter methods of parameter pivotResult
@Explain(displayName = "isPivotResult", displayOnlyOnTrue = true) public boolean isPivotResult() { return pivotResult; }
public void setPivotResult(boolean pivotResult) { this.pivotResult = pivotResult; }
The getter and setter methods of the parameter pivotResult are used to get and set the pivotResult parameter.
So far, we have completed the analysis of all the code in WindowFunctionDef.java, and with it all the code in the ptf folder.
Summary
This is the last blog of the series. We have finished analyzing all the code in two folders: the mapper folder and the ptf folder. While parsing the code, we not only learned about Hive's underlying logic but also learned a lot about Java. It was a very rewarding learning process.