c++/java/go data structure and algorithm dynamic array

Original address

Array

Basic concepts

  • It is a data structure composed of a collection of elements of the same type, which allocates a continuous piece of memory for storage
  • The storage address corresponding to the element can be calculated by using the index of the element

Basic use of c + + array

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    int a[10] = {0}; // Initialization is required, otherwise it will be an unknown value {1, 2, 0, 0, 4196048, 0, 4195696, 0, - 8688, 32767}

    a[0] = 1;
    a[1] = 2;

    a[10] = 10;      // c/c + + does not check the array out of bounds
    *(a + 10) = 11;

    // Create an array using new  
    int* pA = new int[10];
    pA[0] = 1;
    *(pA + 1) = 2;

    // After the memory is released, pA[0] is the initial value of 0  
    delete []pA;    // Don't leave it blank. The local variable is released after it is finished  

    return 0;
}

C attaches great importance to the efficiency of the runtime and does not perform the array cross-border check, while C + + inherits the efficiency requirements of C and does not perform the array cross-border check If the data is out of bounds, the compiler must add additional code to the generated object code to detect whether the subscript is out of bounds when the program runs, which will reduce the running speed of the program
First of all, we should be clear that the essence of accessing an array is to access a unit in a continuous memory. As long as the memory of this unit is available, the program will not crash. The reason for the unavailability of memory is usually the memory protection mechanism of the operating system, that is, the program will crash because it accesses the memory not allocated to it. (artificial restriction)

In addition, from the perspective of assembly implementation, although the declared array has capacity display, the assembly operation is still based on the address offset without additional inspection:

7	    int a[10] = {0}; 
   0x00000000004006f5 <+8>:	mov    QWORD PTR [rbp-0x30],0x0       //Array variable a is no different from ordinary variables, but an address on the stack  
   0x00000000004006fd <+16>:	mov    QWORD PTR [rbp-0x28],0x0
   0x0000000000400705 <+24>:	mov    QWORD PTR [rbp-0x20],0x0
   0x000000000040070d <+32>:	mov    QWORD PTR [rbp-0x18],0x0
   0x0000000000400715 <+40>:	mov    QWORD PTR [rbp-0x10],0x0

8	
9	    a[0] = 1;
   0x000000000040071d <+48>:	mov    DWORD PTR [rbp-0x30],0x1

10	    a[1] = 2;
   0x0000000000400724 <+55>:	mov    DWORD PTR [rbp-0x2c],0x2

11	
12	    a[10] = 10;      // c/c + + does not check the array out of bounds
   0x000000000040072b <+62>:	mov    DWORD PTR [rbp-0x8],0xa

java array basic use

    public static void main(String[] args) {
        // Static initialization, the compiler determines the length according to the number of elements
        int[] a = new int[] { 0, 1, 2, 3, 4 }; 
        System.out.println(Arrays.toString(a));

        // dynamic initialization 
        int[] b = new int[10];
        b[0] = 1;
        b[1] = 2;

        // b[10] = 10;    //  The ArrayIndexOutOfBoundsException runtime checks whether the array is out of bounds, and it cannot be detected at compile time  
        System.out.println(Arrays.toString(b));
    }

java array out of bounds will be checked at runtime, which can be checked through jdk1 8-byte code discovery details:

Classfile /root/work/vscode_prj/java/JavaDemo.class
  Last modified Feb 28, 2022; size 527 bytes
  MD5 checksum 39d8114753bae0b7d44aa6cb28d840c7
  Compiled from "JavaDemo.java"
public class JavaDemo
  minor version: 0
  major version: 52                       // edition
  flags: ACC_PUBLIC, ACC_SUPER            // public
Constant pool:
   #1 = Methodref          #6.#15         // java/lang/Object."<init>":()V
{
  public JavaDemo();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 4: 0                             // The position corresponding to line 4 of the code is: 0

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=4, locals=3, args_size=1
        ...
        34: bipush        10                 // int[] b = new int[10];  Push byte  
        36: newarray      int
        38: astore_2                         // Store reference into local variable
        39: aload_2                          // b[0] = 1; // Load reference from local variable 
        40: iconst_0                         // Push int constant , push 0  
        41: iconst_1                         // push 1 
        42: iastore                          // Store into int array (I = > int a = > array) will check the boundary and throw exceptions when storing  
        43: aload_2                          // b[1] = 2;     
        44: iconst_1
        45: iconst_2
        46: iastore
        47: return
      LineNumberTable:
        line 7: 0
        line 8: 24
        line 11: 34                           // int[] b = new int[10];
        line 12: 39                           // b[0] = 1;
        line 13: 43                           // b[1] = 2;
        line 17: 47
}
SourceFile: "JavaDemo.java"

From the current version jdk1 In the bytecode of 8, there is no check on the array boundary, but you can see the operation of arraylength and iaload to obtain the length of the array from the web article??
Different versions of jvm have different implementations. The operation used in the current version is iastore

If arrayref is null, iastore throws a NullPointerException.
Otherwise, if index is not within the bounds of the array referenced by arrayref, the iastore instruction throws an ArrayIndexOutOfBoundsException.

vscode uses the JVM Bytecode Viewer plug-in in Right click the class file and select View bytecode. idea click View > > show bytecode in the menu bar, and use the jclasslib plug-in to view the bytecode more clearly

go array basic usage

     1:	package main
     2:	
     3:	import "fmt"
     4:	
     5:	func main() {
     6:		var a [10]int // var a []int is an empty array
=>   7:		a[0] = 1
     8:		a[1] = 2
     9:		fmt.Println(a)
    10:	}

Check the assembly language of lines 7 and 8 through dlv:

	godemo.go:7	0x10cbaf3	48c744243001000000		mov qword ptr [rsp+0x30], 0x1
	godemo.go:8	0x10cbafc	48c744243802000000		mov qword ptr [rsp+0x38], 0x2

It can be seen from the assembly implementation that there is no boundary check at run time, and the array of go is checked at compile time, which can be viewed go source code

	// InvalidIndex occurs when an index argument is not of integer type,
	// negative, or out-of-bounds.
	//
	// Example:
	//  var s = [...]int{1,2,3}
	//  var x = s[5]
	//
	// Example:
	//  var s = []int{1,2,3}
	//  var _ = s[-1]
	//
	// Example:
	//  var s = []int{1,2,3}
	//  var i string
	//  var _ = s[i]
	InvalidIndex

Dynamic array

  • In actual programming, it often happens that the required memory space depends on the actual input data and cannot be determined in advance.
  • The length of static array is pre-defined. In the whole program, once the size is given, it cannot be changed. Dynamic arrays, on the other hand, can be resized as needed by the program.

C + + dynamic array

A Vector is a Sequence Container that encapsulates dynamic size arrays
Like any other type of container, it can hold various types of objects. It can be simply considered that a vector is a dynamic array that can store any type.

#include <iostream>  
#include <vector>
using namespace std;

int main()
{
    vector<int> vi = {1,2,3,4,5,6,7,8};
    cout << "size:" << vi.size() << ",cap:" << vi.capacity() << "" << endl;

    vi.push_back(9);
    cout << "size:" << vi.size() << ",cap:" << vi.capacity() << "" << endl;

    vector<int>::iterator it;
    for (it = vi.begin(); it != vi.end(); it++)
    {
        cout << *it << " ";
    }
    
    return 0;
}

Output log:

size:8,cap:8
size:9,cap:16   # Double capacity change
1 2 3 4 5 6 7 8 9 

From the perspective of stl implementation, the expansion is based on double expansion. The expansion strategy implemented by Microsoft is:_ Capacity = _Capacity + _Capacity / 2

In addition, it can be seen from the creation and assembly implementation of dynamic array that it is related to space allocator:

	    vector<int> vi = {1,2,3,4,5,6,7,8};
   0x0000000000400c0a <+13>:	lea    rax,[rbp-0x35]
   0x0000000000400c0e <+17>:	mov    rdi,rax
   0x0000000000400c11 <+20>:	call   0x400e70 <std::allocator<int>::allocator()>
   0x0000000000400c16 <+25>:	mov    r12d,0x401f40
   0x0000000000400c1c <+31>:	mov    r13d,0x8
   0x0000000000400c22 <+37>:	lea    rdi,[rbp-0x35]
   0x0000000000400c26 <+41>:	mov    rcx,r12
   0x0000000000400c29 <+44>:	mov    rbx,r13
   0x0000000000400c2c <+47>:	mov    rax,r12
   0x0000000000400c2f <+50>:	mov    rdx,r13
   0x0000000000400c32 <+53>:	mov    rsi,rcx
   0x0000000000400c35 <+56>:	lea    rax,[rbp-0x50]
   0x0000000000400c39 <+60>:	mov    rcx,rdi
   0x0000000000400c3c <+63>:	mov    rdi,rax
   0x0000000000400c3f <+66>:	call   0x400efe <std::vector<int, std::allocator<int> >::vector(std::initializer_list<int>, std::allocator<int> const&)>
   0x0000000000400c44 <+71>:	lea    rax,[rbp-0x35]
   0x0000000000400c48 <+75>:	mov    rdi,rax
   0x0000000000400c4b <+78>:	call   0x400e8a <std::allocator<int>::~allocator()>

Java dynamic array

The underlying data structure of the Java dynamic array ArrayList depends on the array:

    /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer. Any
     * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
     * will be expanded to DEFAULT_CAPACITY when the first element is added.
     */
    transient Object[] elementData; // non-private to simplify nested class access

The capacity expansion strategy is int newcapacity = oldcapacity + (oldcapacity > > 1), that is, newCap = oldCap + (oldCap / 2), which is the same as the C++ Vector capacity expansion strategy

    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

Example code:

import java.lang.reflect.Array;
import java.util.ArrayList;

// java demo 
public class JavaDemo {
    public static void main(String[] args) {
        ArrayList<Integer> al = new ArrayList<>();
        al.add(1);
        al.add(2);
        al.add(3);

        System.out.println(al.size());
    }
}

Go dynamic array

Slicing in Go language can also be regarded as a dynamic array. With the continuous change of capacity, there will be strategies to expand and shrink the capacity

From the growslice, we can see the expansion logic. If the capacity is less than 1024, it will be expanded to twice. If the capacity is greater than 1024, the capacity will be programmed with newcap += newcap / 4

func growslice(et *_type, old slice, cap int) slice {
  ...
  newcap := old.cap
	doublecap := newcap + newcap
	if cap > doublecap {
		newcap = cap
	} else {
		if old.cap < 1024 {
			newcap = doublecap
		} else {
			// Check 0 < newcap to detect overflow
			// and prevent an infinite loop.
			for 0 < newcap && newcap < cap {
				newcap += newcap / 4
			}
			// Set newcap to the requested cap when
			// the newcap calculation overflowed.
			if newcap <= 0 {
				newcap = cap
			}
		}
	}
  ...
}

Example code:

func main() {
	var s []int
	s = append(s, 1)
	s = append(s, 2)
	s = append(s, 3)
	s = append(s, 4)
	s = append(s, 5)
	s = append(s, 6)
	s = append(s, 7)
	s = append(s, 8)
	fmt.Println(len(s), cap(s))

	s = append(s, 9)
	fmt.Println(len(s), cap(s))
}

Printout:

8 8
9 16

Keywords: C++ Go data structure

Added by aleigh on Sat, 05 Mar 2022 16:02:10 +0200