How Lua works on top of C: the basics

I Overview

I have been studying the design and implementation of Lua recently, and I'm glad to share what I have learned here. If you spot any mistakes, please don't hesitate to point them out.
When I first learned Lua, I often had these questions:

  • Why can a Lua variable refer to a value of any type?
  • How does Lua load and run code dynamically? Is it compiled into machine code and executed directly by the operating system, or is it interpreted by a loop written in C?

This article mainly answers the questions above. If you have doubts about specific details, you can read the reference material (I recommend the book Lua Design and Implementation) or leave me a message. I will tell you everything I know~

II Reading and writing data

Take reading and writing a number as an example:

	lua_State *L = luaL_newstate();
	luaL_openlibs(L);
	lua_pushnumber(L, 666);
	if (lua_isnumber(L, -1)) {
		cout << lua_tonumber(L, -1) << endl;
	}
	lua_close(L);

Run the code and it prints the number we pushed: 666.

1. lua_pushnumber

LUA_API void lua_pushnumber (lua_State *L, lua_Number n) {
  lua_lock(L);
  setfltvalue(L->top, n);
  api_incr_top(L);
  lua_unlock(L);
}

#define setfltvalue(obj,x) \
  { TValue *io=(obj); val_(io).n=(x); settt_(io, LUA_TNUMFLT); }
  
#define val_(o)		((o)->value_)
#define settt_(o,t)	((o)->tt_=(t))

lua_pushnumber takes the slot at the top of L's stack (a TValue), sets top->value_.n = n (the 666 we passed in) and top->tt_ = LUA_TNUMFLT (the type tag), and then advances top by one slot.
So what exactly are L and TValue?

a. Stack

L is the Lua state (the state of the virtual machine). Functionally you can think of it as a small operating system that maintains Lua's execution state and context. L maintains a stack, and L->top points into it (note that top means the next free slot, not the last element pushed). When indexing the stack through the API, positive indices count from the bottom up (1 is the first element) and negative indices count from the top down (-1 is the element pushed last).
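The indexing rule can be sketched with a small stand-alone model. This is not the Lua source: DemoState, demo_push and demo_index are hypothetical names, the slots hold plain doubles instead of TValue, and the capacity is fixed. It only illustrates that top is the first free slot and how positive and negative indices resolve.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STACK_SIZE 16  /* hypothetical fixed capacity for this sketch */

typedef struct DemoState {
  double stack[DEMO_STACK_SIZE];  /* slots; real Lua stores TValue here */
  double *top;                    /* first free slot, like L->top */
} DemoState;

static void demo_init(DemoState *L) {
  L->top = L->stack;  /* empty stack: top points at the first slot */
}

/* Push: write into the free slot, then advance top. Returns 0 if full. */
static int demo_push(DemoState *L, double n) {
  if (L->top - L->stack >= DEMO_STACK_SIZE) return 0;
  *L->top = n;
  L->top++;
  return 1;
}

/* Resolve an index to a slot. Positive indices count from the bottom
   (1 is the first element), negative indices count from the top
   (-1 is the last element pushed). Returns NULL when out of range. */
static double *demo_index(DemoState *L, int idx) {
  ptrdiff_t count = L->top - L->stack;
  assert(count >= 0);
  if (idx > 0 && idx <= count) return L->stack + (idx - 1);
  if (idx < 0 && -idx <= count) return L->top + idx;
  return NULL;
}
```

After pushing 10, 20 and 30, demo_index(&s, 1) and demo_index(&s, -3) resolve to the same slot (10), and demo_index(&s, -1) resolves to 30.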

b. TValue

typedef struct lua_TValue {
  TValuefields;
} TValue;

#define TValuefields	Value value_; int tt_

typedef union Value {
  GCObject *gc;    /* collectable objects */
  void *p;         /* light userdata */
  int b;           /* booleans */
  lua_CFunction f; /* light C functions */
  lua_Integer i;   /* integer numbers */
  lua_Number n;    /* float numbers */
} Value;

From the source code, TValue has two members: value_ and tt_. Because value_ is a union, it can hold any one of the listed types; logically it is either the data itself or a pointer to the data. tt_ is a type tag that records which kind of data is currently stored.
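This is the answer to the first question: a variable is just a slot holding a value plus a tag, so the same slot can hold a number now and a string later. Here is a minimal sketch of the idea, with hypothetical Demo* names and only three payload types (real Lua keeps strings behind a GCObject pointer rather than a const char *):

```c
#include <assert.h>

typedef enum DemoTag {
  DEMO_TBOOLEAN, DEMO_TNUMFLT, DEMO_TSTRING
} DemoTag;

typedef union DemoValue {
  int b;          /* boolean */
  double n;       /* float number */
  const char *s;  /* string (simplified; Lua uses a GCObject pointer) */
} DemoValue;

typedef struct DemoTValue {
  DemoValue value_;  /* the payload */
  DemoTag tt_;       /* which union member is valid */
} DemoTValue;

/* Setters always write the payload and the tag together. */
static void demo_setnumber(DemoTValue *o, double x) {
  o->value_.n = x; o->tt_ = DEMO_TNUMFLT;
}
static void demo_setbool(DemoTValue *o, int x) {
  o->value_.b = x; o->tt_ = DEMO_TBOOLEAN;
}
static void demo_setstring(DemoTValue *o, const char *x) {
  o->value_.s = x; o->tt_ = DEMO_TSTRING;
}

/* Readers check the tag before touching the union. */
static double demo_numvalue(const DemoTValue *o) {
  assert(o->tt_ == DEMO_TNUMFLT);
  return o->value_.n;
}

static const char *demo_typename(const DemoTValue *o) {
  switch (o->tt_) {
    case DEMO_TBOOLEAN: return "boolean";
    case DEMO_TNUMFLT:  return "number";
    case DEMO_TSTRING:  return "string";
  }
  return "unknown";
}
```

Calling demo_setnumber(&v, 666) and then demo_setstring(&v, "hi") on the same v changes what demo_typename(&v) reports from "number" to "string", which is what happens when a Lua variable is reassigned to a value of another type.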

2. lua_tonumber

LUA_API lua_Number lua_tonumberx (lua_State *L, int idx, int *pisnum) {
  lua_Number n;
  const TValue *o = index2addr(L, idx);
  int isnum = tonumber(o, &n);
  if (!isnum)
    n = 0;  /* call to 'tonumber' may change 'n' even if it fails */
  if (pisnum) *pisnum = isnum;
  return n;
}

#define tonumber(o,n) \
	(ttisfloat(o) ? (*(n) = fltvalue(o), 1) : luaV_tonumber_(o,n))
#define fltvalue(o)	check_exp(ttisfloat(o), val_(o).n)

/* type checks */
#define ttisfloat(o)		checktag((o), LUA_TNUMFLT)
#define checktag(o,t)		(rttype(o) == (t))
#define rttype(o)	((o)->tt_)

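lua_tonumber(L, idx) is a macro for lua_tonumberx(L, idx, NULL). index2addr finds o from the index, the tonumber macro checks o's type tag, and the value is returned if the type is valid (otherwise the result is 0 and, when pisnum is given, *pisnum is set to 0). Because a TValue stores both the value and its type, every step here follows naturally.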
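The read path can be modelled in a few lines. This is a sketch under assumptions, not the real implementation: the Demo* names are hypothetical, demo_tonumber_slow stands in for luaV_tonumber_ and only converts integers (the real function also tries to convert strings), and the stack lookup done by index2addr is left out.

```c
#include <assert.h>

typedef enum DemoTag { DEMO_TNIL, DEMO_TNUMFLT, DEMO_TNUMINT } DemoTag;

typedef struct DemoTValue {
  union { double n; long long i; } value_;
  DemoTag tt_;
} DemoTValue;

/* Like fltvalue(o): only valid when the tag says "float". */
static double demo_fltvalue(const DemoTValue *o) {
  assert(o->tt_ == DEMO_TNUMFLT);  /* plays the role of check_exp */
  return o->value_.n;
}

/* Slow path: try to convert other types. Returns 1 on success. */
static int demo_tonumber_slow(const DemoTValue *o, double *n) {
  if (o->tt_ == DEMO_TNUMINT) {
    *n = (double)o->value_.i;
    return 1;
  }
  return 0;  /* not convertible */
}

/* Mirrors the shape of lua_tonumberx: fast path for floats,
   slow path otherwise, 0 plus *pisnum = 0 on failure. */
static double demo_tonumberx(const DemoTValue *o, int *pisnum) {
  double n = 0;
  int isnum;
  if (o->tt_ == DEMO_TNUMFLT) {
    n = demo_fltvalue(o);
    isnum = 1;
  } else {
    isnum = demo_tonumber_slow(o, &n);
  }
  if (!isnum) n = 0;
  if (pisnum) *pisnum = isnum;
  return n;
}
```

A float value comes back unchanged, an integer is converted, and anything else yields 0 with the flag cleared, so the caller can tell a real 0 from a failed conversion.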

III Compile and run

Compared with the simplicity of reading and writing data (leaving aside the algorithms behind tables and strings), compiling and running Lua code is a little more complicated. Don't worry, though: we still only care about the question raised above, "How does Lua load and run code dynamically?", and ignore the finer details of compilation and execution for now. Let's look at the following example.
Create a new Lua script, luaTest02.lua:

local coder = "The widows are Coding"
print(coder .. " I don't work today ^0^!")

Add the following code to the C++ main function:

	lua_State *L = luaL_newstate();
	luaL_openlibs(L);
	int stat = luaL_loadfile(L, "luaTest02.lua") || lua_pcall(L, 0, 0, 0);  // || skips lua_pcall if loading failed
	if (stat)
	{
		cout << "error" << endl;
	}
	else
	{
		cout << "succ" << endl;
	}
	lua_close(L);

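Run the program. If the script loads and runs without error, the script's print output appears, followed by succ.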

1. Loading and compiling

Step through the luaL_loadfile(L, "luaTest02.lua") call above in a debugger and follow it down to f_parser:

static void f_parser (lua_State *L, void *ud) {
  LClosure *cl;
  struct SParser *p = cast(struct SParser *, ud);
  int c = zgetc(p->z);  /* read first character */
  if (c == LUA_SIGNATURE[0]) {
    checkmode(L, p->mode, "binary");
    cl = luaU_undump(L, p->z, p->name);
  }
  else {
    checkmode(L, p->mode, "text");
    cl = luaY_parser(L, p->z, &p->buff, &p->dyd, p->name, c);
  }
  lua_assert(cl->nupvalues == cl->p->sizeupvalues);
  luaF_initupvals(L, cl);
}

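In short, f_parser checks the first character of the input: a precompiled binary chunk is loaded with luaU_undump, while source text goes to luaY_parser, which performs lexical and syntax analysis and generates bytecode as it goes. Either way the result is a closure, which is pushed onto the stack to wait for the call. Let's look at some of the data structures involved.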

a. LClosure

typedef struct LClosure {
  ClosureHeader;
  struct Proto *p;
  UpVal *upvals[1];  /* list of upvalues (captured outer local variables) */
} LClosure;

This is the closure type for Lua functions. CClosure is the closure type for C functions. The two are wrapped together in the Closure union.

b. Proto

typedef struct Proto {
  CommonHeader;
  lu_byte numparams;  /* number of fixed parameters */
  lu_byte is_vararg;
  lu_byte maxstacksize;  /* number of registers needed by this function */
  int sizeupvalues;  /* size of 'upvalues' */
  int sizek;  /* size of 'k' */
  int sizecode;
  int sizelineinfo;
  int sizep;  /* size of 'p' */
  int sizelocvars;
  int linedefined;  /* debug information  */
  int lastlinedefined;  /* debug information  */
  TValue *k;  /* constants used by the function */
  Instruction *code;  /* opcodes (the compiled bytecode) */
  struct Proto **p;  /* functions defined inside the function */
  int *lineinfo;  /* map from opcodes to source lines (debug information) */
  LocVar *locvars;  /* information about local variables (debug information) */
  Upvaldesc *upvalues;  /* upvalue information */
  struct LClosure *cache;  /* last-created closure with this prototype */
  TString  *source;  /* used for debug information */
  GCObject *gclist;
} Proto;

Note the Instruction *code field: it points to the bytecode array generated by compilation.
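Each element of that array is one packed integer. As a sketch, assuming the Lua 5.3 iABC layout (a 32-bit word with a 6-bit opcode in the low bits, then an 8-bit A field, a 9-bit C field and a 9-bit B field), encoding and decoding look like this. The Demo* names are hypothetical; the real macros are CREATE_ABC, GET_OPCODE, GETARG_A and so on.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DemoInstruction;

/* Field sizes and bit positions, assuming the Lua 5.3 iABC format. */
enum { DEMO_SIZE_OP = 6, DEMO_SIZE_A = 8, DEMO_SIZE_C = 9, DEMO_SIZE_B = 9 };
enum { DEMO_POS_OP = 0, DEMO_POS_A = 6, DEMO_POS_C = 14, DEMO_POS_B = 23 };

/* Pack opcode and operands into one 32-bit instruction. */
static DemoInstruction demo_create_abc(unsigned op, unsigned a,
                                       unsigned b, unsigned c) {
  assert(op < 64 && a < 256 && b < 512 && c < 512);
  return ((DemoInstruction)op << DEMO_POS_OP) |
         ((DemoInstruction)a << DEMO_POS_A) |
         ((DemoInstruction)b << DEMO_POS_B) |
         ((DemoInstruction)c << DEMO_POS_C);
}

/* Extract 'size' bits starting at bit 'pos'. */
static unsigned demo_field(DemoInstruction i, int pos, int size) {
  return (unsigned)((i >> pos) & ((1u << size) - 1u));
}

static unsigned demo_get_opcode(DemoInstruction i) {
  return demo_field(i, DEMO_POS_OP, DEMO_SIZE_OP);
}
static unsigned demo_getarg_a(DemoInstruction i) {
  return demo_field(i, DEMO_POS_A, DEMO_SIZE_A);
}
static unsigned demo_getarg_b(DemoInstruction i) {
  return demo_field(i, DEMO_POS_B, DEMO_SIZE_B);
}
static unsigned demo_getarg_c(DemoInstruction i) {
  return demo_field(i, DEMO_POS_C, DEMO_SIZE_C);
}
```

For example, demo_create_abc(0, 1, 2, 0) builds an instruction whose opcode decodes to 0 with A = 1 and B = 2. In the interpreter, the opcode selects the case to run and the operands name the registers or constants it works on.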

c. FuncState

typedef struct FuncState {
  Proto *f;  /* current function header */
  struct FuncState *prev;  /* enclosing function */
  struct LexState *ls;  /* lexical state */
  struct BlockCnt *bl;  /* chain of current blocks */
  int pc;  /* next position to code (equivalent to 'ncode') */
  int lasttarget;   /* 'label' of last 'jump label' */
  int jpc;  /* list of pending jumps to 'pc' */
  int nk;  /* number of elements in 'k' */
  int np;  /* number of elements in 'p' */
  int firstlocal;  /* index of first local var (in Dyndata array) */
  short nlocvars;  /* number of elements in 'f->locvars' */
  lu_byte nactvar;  /* number of active local variables */
  lu_byte nups;  /* number of upvalues */
  lu_byte freereg;  /* first free register */
} FuncState;

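FuncStates are nested: each one links to the FuncState of its enclosing function through prev. The f member of each FuncState is the Proto being built for that function. The Proto of the outermost FuncState (the main chunk) holds the compiled bytecode, with nested functions reachable through its p array, and it is handed to the LClosure.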

d. Brief description of compilation process

Take loading a Lua script as an example.

  1. f_parser calls luaY_parser to parse the source, then initializes the closure's upvalues (captured outer local variables).
  2. luaY_parser wraps a FuncState in a LexState and calls luaX_next to start the analysis. The results are saved in the code array of the Proto structure, which belongs to the LClosure that is pushed onto the stack.
  3. The parser calls luaX_next repeatedly to read tokens and, following the lexical and grammar rules, calls luaK_code to generate bytecode.
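The "read a token, emit an instruction" idea can be shown with a toy compiler. Everything here is an assumption made for illustration: the grammar only accepts sums such as 1+20+3, the instruction set is stack-based (Lua's real bytecode is register-based), and DemoProto, demo_code and demo_compile are hypothetical names, with demo_code playing the role of luaK_code.

```c
#include <assert.h>
#include <ctype.h>

enum { DEMO_OP_LOADK, DEMO_OP_ADD, DEMO_OP_RETURN };
enum { DEMO_MAXK = 8, DEMO_MAXCODE = 32 };

typedef struct DemoInstr { int op; int arg; } DemoInstr;

typedef struct DemoProto {
  double k[DEMO_MAXK];           /* constants used by the function */
  int sizek;
  DemoInstr code[DEMO_MAXCODE];  /* generated bytecode */
  int sizecode;
} DemoProto;

/* Append one instruction to the code array. */
static void demo_code(DemoProto *f, int op, int arg) {
  assert(f->sizecode < DEMO_MAXCODE);  /* holds because sizek <= DEMO_MAXK */
  f->code[f->sizecode].op = op;
  f->code[f->sizecode].arg = arg;
  f->sizecode++;
}

/* Read one number token, store it as a constant, emit LOADK. */
static int demo_term(DemoProto *f, const char **ps) {
  const char *s = *ps;
  double v = 0;
  if (!isdigit((unsigned char)*s)) return 0;  /* syntax error */
  if (f->sizek >= DEMO_MAXK) return 0;        /* too many constants */
  while (isdigit((unsigned char)*s)) v = v * 10 + (*s++ - '0');
  f->k[f->sizek] = v;
  demo_code(f, DEMO_OP_LOADK, f->sizek++);
  *ps = s;
  return 1;
}

/* expr -> number { '+' number }. Returns 1 on success, 0 on error. */
static int demo_compile(DemoProto *f, const char *src) {
  f->sizek = 0;
  f->sizecode = 0;
  if (!demo_term(f, &src)) return 0;
  while (*src == '+') {
    src++;  /* skip '+', like calling luaX_next */
    if (!demo_term(f, &src)) return 0;
    demo_code(f, DEMO_OP_ADD, 0);
  }
  if (*src != '\0') return 0;  /* trailing garbage */
  demo_code(f, DEMO_OP_RETURN, 0);
  return 1;
}
```

Compiling "1+20+3" fills k with 1, 20, 3 and emits LOADK 0, LOADK 1, ADD, LOADK 2, ADD, RETURN. The bytecode is produced while parsing, without building a separate syntax tree first, which is also how the Lua parser works.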

Part of the parsing code:

static void statement (LexState *ls) {
  int line = ls->linenumber;  /* may be needed for error messages */
  enterlevel(ls);
  switch (ls->t.token) {
    case ';': {  /* stat -> ';' (empty statement) */
      luaX_next(ls);  /* skip ';' */
      break;
    }
    case TK_IF: {  /* stat -> ifstat */
      ifstat(ls, line);
      break;
    }
    //.....................
 }
}

2. Running

Once the code is compiled, the resulting closure can be run. Step through the lua_pcall(L, 0, 0, 0) call above in a debugger and follow it down to luaD_call:

void luaD_call (lua_State *L, StkId func, int nResults) {
  if (++L->nCcalls >= LUAI_MAXCCALLS)
    stackerror(L);
  if (!luaD_precall(L, func, nResults))  /* is a Lua function? */
    luaV_execute(L);  /* call it */
  L->nCcalls--;
}

luaD_precall is called first to do the preparation work. It takes a new CallInfo element from the chain that starts at L->base_ci (extending the chain if necessary) to hold the state of the call, including the virtual machine's instruction pointer (savedpc), so that the caller's state can be restored after the call completes, and it points the instruction pointer at the closure's instruction array (closure->p->code).
Then luaV_execute is called. It fetches instructions one by one in a loop and executes them.
Part of the interpreter loop in luaV_execute:

void luaV_execute (lua_State *L) {
  CallInfo *ci = L->ci;
  LClosure *cl;
  TValue *k;
  StkId base;
  ci->callstatus |= CIST_FRESH;  /* fresh invocation of 'luaV_execute" */
 newframe:  /* reentry point when frame changes (call/return) */
  lua_assert(ci == L->ci);
  cl = clLvalue(ci->func);  /* local reference to function's closure */
  k = cl->p->k;  /* local reference to function's constant table */
  base = ci->u.l.base;  /* local copy of function's base */
  /* main loop of interpreter */
  for (;;) {
    Instruction i;
    StkId ra;
    vmfetch();
    vmdispatch (GET_OPCODE(i)) {
      vmcase(OP_MOVE) {
        setobjs2s(L, ra, RB(i));
        vmbreak;
      }
      /* ... remaining opcodes omitted ... */
    }
  }
}
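This loop answers the second question: the bytecode is not machine code, it is interpreted by a C loop that fetches, decodes and dispatches one instruction at a time. Here is a minimal register-based sketch of such a loop. The four opcodes, the unpacked DemoInstr struct and the fixed eight registers are simplifying assumptions and not Lua's real instruction set.

```c
#include <assert.h>

enum { DEMO_OP_MOVE, DEMO_OP_LOADK, DEMO_OP_ADD, DEMO_OP_RETURN };
enum { DEMO_NREGS = 8 };

/* Unpacked instruction; real Lua packs these fields into 32 bits. */
typedef struct DemoInstr { int op, a, b, c; } DemoInstr;

/* Execute 'code' with constant table 'k'.
   Returns the value of register A of the RETURN instruction. */
static double demo_execute(const DemoInstr *code, const double *k) {
  double base[DEMO_NREGS] = {0};  /* registers of the current frame */
  const DemoInstr *pc = code;     /* instruction pointer, like savedpc */
  for (;;) {
    DemoInstr i = *pc++;          /* fetch and advance, like vmfetch() */
    double *ra;
    assert(i.a >= 0 && i.a < DEMO_NREGS);
    ra = &base[i.a];              /* operand A is the target register */
    switch (i.op) {               /* dispatch on the opcode */
      case DEMO_OP_MOVE:          /* R[A] = R[B] */
        *ra = base[i.b];
        break;
      case DEMO_OP_LOADK:         /* R[A] = K[B] */
        *ra = k[i.b];
        break;
      case DEMO_OP_ADD:           /* R[A] = R[B] + R[C] */
        *ra = base[i.b] + base[i.c];
        break;
      case DEMO_OP_RETURN:        /* leave the loop with R[A] */
        return *ra;
      default:
        assert(0 && "unknown opcode");
        return 0;
    }
  }
}
```

With constants {666, 1}, the program LOADK 0 0; LOADK 1 1; ADD 2 0 1; MOVE 0 2; RETURN 0 loads both constants, adds them into register 2, copies the sum to register 0 and returns it.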

a. CallInfo

While a function executes, the lua_State tracks its status through the CallInfo data structure. The CallInfo elements are linked through previous and next, starting at base_ci, and this chain grows and shrinks to maintain the call stack.

typedef struct CallInfo {
  StkId func;  /* function index in the stack */
  StkId	top;  /* top for this function */
  struct CallInfo *previous, *next;  /* dynamic call link */
  union {
    struct {  /* only for Lua functions */
      StkId base;  /* base for this function */
      const Instruction *savedpc;
    } l;
    struct {  /* only for C functions */
      lua_KFunction k;  /* continuation in case of yields */
      ptrdiff_t old_errfunc;
      lua_KContext ctx;  /* context info. in case of yields */
    } c;
  } u;
  ptrdiff_t extra;
  short nresults;  /* expected number of results from this function */
  unsigned short callstatus;
} CallInfo;
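How the previous and next links maintain the call stack can be sketched as follows. This is a simplified model with hypothetical Demo* names: each frame stores only a saved instruction index, a call moves to the next frame (allocating it on first use and reusing it afterwards), and a return steps back to the previous frame and resumes at its saved position.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoCallInfo {
  struct DemoCallInfo *previous, *next;  /* dynamic call link */
  int savedpc;  /* where this frame resumes after a nested call */
} DemoCallInfo;

typedef struct DemoState {
  DemoCallInfo base_ci;  /* frame of the outermost level */
  DemoCallInfo *ci;      /* frame of the running function */
} DemoState;

static void demo_init(DemoState *L) {
  L->base_ci.previous = NULL;
  L->base_ci.next = NULL;
  L->base_ci.savedpc = 0;
  L->ci = &L->base_ci;
}

/* Enter a call: remember where the caller resumes, then switch to
   the next frame, extending the chain only if no node exists yet. */
static int demo_call(DemoState *L, int resume_pc) {
  DemoCallInfo *ci = L->ci->next;
  if (ci == NULL) {
    ci = malloc(sizeof *ci);
    if (ci == NULL) return 0;  /* out of memory */
    ci->previous = L->ci;
    ci->next = NULL;
    L->ci->next = ci;
  }
  L->ci->savedpc = resume_pc;
  ci->savedpc = 0;  /* callee starts at its first instruction */
  L->ci = ci;
  return 1;
}

/* Leave a call: step back and report where the caller resumes. */
static int demo_return(DemoState *L) {
  assert(L->ci != &L->base_ci);  /* cannot return from the base frame */
  L->ci = L->ci->previous;
  return L->ci->savedpc;
}

/* Free every frame that was allocated after base_ci. */
static void demo_close(DemoState *L) {
  DemoCallInfo *ci = L->base_ci.next;
  while (ci != NULL) {
    DemoCallInfo *next = ci->next;
    free(ci);
    ci = next;
  }
  L->base_ci.next = NULL;
  L->ci = &L->base_ci;
}
```

Two nested calls made with resume positions 5 and 9 unwind in reverse order: the first demo_return yields 9 and the second yields 5, after which ci is back at base_ci.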

IV Epilogue

That is the core technique I wanted to pass on to my fellow practitioners. To understand the details you will need to read the source code and the reference material; the rest depends on your own aptitude.
There may be a follow-up article on the advanced principles of Lua on C.

Keywords: C lua

Added by jmdavis on Sun, 26 Dec 2021 23:03:59 +0200