Backend development interview questions

Backend development interview knowledge outline:

Language (C++):

Keyword explanations:

The role of volatile

The first property of the volatile keyword: volatility. At the assembly level, this means that between two statements, the second statement does not reuse the register contents cached for a volatile variable by the first statement; it re-reads the value from memory.

The second property: "no optimization". volatile tells the compiler not to apply aggressive optimizations to the variable, and in particular not to eliminate it, so that the instructions the programmer wrote are actually executed.

The third property: ordering, between volatile variables. The compiler will not reorder accesses to volatile variables relative to each other. However, operations mixing a volatile variable and a non-volatile variable may still be reordered by the compiler. And even if every variable were declared volatile, eliminating compiler reordering, the CPU may still execute the generated instructions out of order, breaking the logic the program relies on; volatile can do nothing about that.
For multithreaded programs, the correct approach is to establish happens-before semantics (in C++11, with std::atomic and the memory model).

Reference: in-depth analysis of the C/C++ volatile keyword


The role of static

static controls a variable's storage duration and its visibility (linkage).

(1) Modifying local variables

Normally, local variables are stored in the stack area, and their lifetime ends when the enclosing block finishes executing. If a local variable is declared static, however, it is stored in the static data area and lives until the whole program ends. Note that although static changes the variable's lifetime and storage location, its scope does not change: it is still a local variable, visible only inside the block.

(2) Modifying global variables

A global variable can be accessed not only in its own source file but also in other source files of the same project (they only need an extern declaration). Declaring the global variable static narrows its visibility from the whole project to the defining source file (internal linkage).

(3) Modifying functions

Declaring a function static works like declaring a global variable static: it gives the function internal linkage, restricting it to its source file.

(4) static in C++

In C++, a static member function belongs to the class rather than to any particular object of the class, so it has no this pointer and can only access static members. A static data member is shared by the class and all of its objects: there is exactly one copy in storage, and it can be accessed through either the class name or an object.

The meaning and implementation mechanism of const

const is the constant qualifier: it marks a variable as immutable and tells the compiler so. Using const habitually prevents accidental modification of variables that a function should not change.
(1) const modifying basic data types

1. const modifying ordinary constants and arrays
For basic data types, const may appear before or after the type specifier with the same effect. The only rule when using such constants is not to modify their values.
2. const modifying pointer variables (*) and reference variables (&)

If const appears to the left of the asterisk, it qualifies the object the pointer points to: the pointer points to a constant (pointer to const).

If const appears to the right of the asterisk, it qualifies the pointer itself: the pointer is a constant (const pointer).

(2) const applied to functions

1. const modifying a parameter
When the function is called, the const parameter is initialized from the corresponding argument; inside the function body the const-qualified part is treated as constant, protecting the original object from modification.
[Note]: const parameters are normally useful only when the parameter is a pointer or a reference.
2. const modifying the return value
A const return value is protected according to the same qualification rules as above.

(3) const usage in classes

A const data member cannot be initialized in the class declaration. The correct way is to initialize const data members in the constructor's initializer list.
For member functions in a class, a declaration such as `A fun4() const;` means the function may not modify any member variable of its class.

(4) const modifying class objects (defining constant objects)
A constant object can only call const member functions; it cannot call any other member function.


The role of extern

In C, the extern modifier appears before the declaration of a variable or function to indicate that the variable or function is defined elsewhere and is merely being referenced here.

Note that where the extern declaration appears affects its scope: if it is declared inside main, the name can be used only inside main, not in other functions. In practice, to call functions and variables from other files you can simply #include the corresponding file. Why use extern at all? Because an extern declaration avoids processing a whole header, which can speed up compilation.

In C++, extern has an additional role: indicating the C or C++ calling convention of a function. For example, to call a C library function from C++, the C++ program must declare the referenced function with extern "C". This is an instruction to the linker: link this symbol using the C convention. The underlying reason is that C++ and C compilers produce different symbol names in object code (C++ mangles names), and extern "C" solves the name-matching problem.

Differences between macro definitions and inline functions

An inline function is a function whose code is inserted directly into the caller's code. Like a #define macro, an inline function improves efficiency by avoiding call overhead, and in addition the compiler can optimize it across the call site ("procedural integration"). A macro definition does not check parameter types or return values; it is simply textually expanded. An inline function, by contrast, type-checks its parameters, so it is safer. Inline functions resemble macros, but macros are replaced by the preprocessor, while inlining is performed under compiler control. Moreover, an inline function is a real function; it is expanded like a macro only where it is used, eliminating the function's stack frame setup and reducing call overhead.

Macros are input to the preprocessor; the result of macro expansion is then handed to the compiler for syntax analysis. Macros and functions therefore live at different levels and operate on different entities: a macro operates on tokens, performing replacement and concatenation before parsing. A function is a language-level concept that creates an entity in the syntax tree; inline is merely an attribute of a function.
To the question "then what are inline functions for?": first, functions cannot completely replace macros; some macros can introduce variables into the current scope, which functions cannot do. Second, an inline function is still just a function; inline is a hint to the compiler that the function body should preferably be expanded at the call site to save the cost of a call (pushing the stack frame, jumping, returning).

Inline functions also have limitations: the function body should not be too large. If it is, the compiler generally abandons inlining and calls the function in the ordinary way, so its efficiency becomes the same as a normal function's.

An inline function takes effect only if it is declared together with its function body (defined, not merely declared).

Reference: differences between macro definitions and inline functions

Library function implementation:

Implementations of malloc, strcpy, strcmp and other common library functions; which library functions are considered high-risk (functions such as strcpy, strcat, sprintf and gets perform no bounds checking and can overflow buffers).

STL principles and implementation:

STL provides six kinds of components that can be combined with one another:

1. Containers: the various data structures, such as the sequential containers vector, list and deque and the associative containers set, map, multiset and multimap, used to store data. From an implementation perspective, an STL container is a class template.

2. Algorithms: the common algorithms such as sort, search, copy and erase. From an implementation perspective, an STL algorithm is a function template. Note one point: every STL algorithm takes a pair of iterators marking the range it operates on, and that range is half-open: [first, last).

3. Iterators: the glue between containers and algorithms, the so-called "generic pointers". There are five categories, plus derived variations. From an implementation perspective, an iterator is a class template that overloads the pointer-related operators operator*, operator->, operator++ and operator--. Every STL container has its own iterators, because only the container itself knows how to traverse its own elements. A native pointer is also an iterator.

4. Functors (function objects): objects that behave like functions and can serve as policies for algorithms. From an implementation perspective, a functor is a class or class template that overloads operator(). Ordinary function pointers can be regarded as functors in a narrow sense.

5. Adapters: components that wrap the interface of containers, functors or iterators. For example, the queue and stack provided by the STL look like containers but are really container adapters, because their underlying storage and all operations are supplied by a deque. Something that changes a functor's interface is a function adapter; a container's interface, a container adapter; an iterator's interface, an iterator adapter.

6. Allocators: responsible for space allocation and management. From an implementation perspective, an allocator is a class template that implements dynamic space allocation, management and release.

How the six components interact: a container obtains its storage through an allocator; an algorithm accesses the container's contents through iterators; a functor lets an algorithm vary its policy; and an adapter can wrap or modify a functor, container or iterator.

Sequential containers:
vector - a dynamic array; when there is no room for new elements it reallocates memory and copies the elements of the original array into the newly allocated one.
list - a doubly linked list.
deque - distributes storage via a central controller called the map (not the map container): the map records the addresses of a series of fixed-length arrays. Note that the map stores only the array addresses; the real data lives in the arrays. deque starts from the middle of the map (since a double-ended queue inserts at both front and back), finds an array, and stores data in it; when an array fills up, it takes another free array from the map; when the map itself is exhausted, it reallocates a larger map and copies the old map's contents into the new one. deque is therefore more complex and slower than vector; prefer vector where possible.

stack - based on deque.
queue - based on deque.
heap - a complete binary tree stored in an array (vector), maintained with max-heap ordering.
priority_queue - based on heap.
slist - a singly linked list.

Associative containers:
set, map, multiset, multimap - based on the red-black tree (RB-tree), a binary search tree with additional balancing conditions.

hash table - a hash table. The key of each record is mapped by a function to an index of an array (usually a vector), for example index = key % array size (text keys can first be converted to numbers by some algorithm), and the record is stored as the element at that index. Different keys may map to the same index (collisions, which can also be handled by linear probing or quadratic probing); the STL resolves them by separate chaining: each array slot maintains a list, and records with the same index go into that slot's list. As long as the lists stay short, deletion, insertion and lookup remain fast.

hash_map, hash_set, hash_multiset, hash_multimap - based on hashtable.

Reference: the six STL components
What is a "standard non-STL container"?

What is the difference between list and vector?

vector occupies contiguous memory, so it supports random access. If you need efficient random access and do not care about insertion/deletion efficiency, use vector.
list occupies non-contiguous memory, so it does not support random access. If you need many insertions and deletions and do not care about random access, use list.

Virtual function:

The role and implementation principle of virtual functions. What is a virtual function, and what is it for?

C++ polymorphism divides into static polymorphism (compile-time) and dynamic polymorphism (run-time). Static polymorphism is achieved through overloading and templates; dynamic polymorphism is embodied by the virtual function, the protagonist here.
Implementation principle of virtual functions: the virtual function table (vtbl) and virtual table pointer (vptr).

The role of a virtual function is that when it is called, the code executed must match the dynamic type of the object making the call; the compiler's job is to implement this property efficiently. Implementation details vary between compilers, but most use a vtbl (virtual table) and vptr (virtual table pointer). When a class declares or inherits a virtual function, the class gets its own vtbl. The vtbl is essentially an array of function pointers (some compilers use linked lists, but the idea is the same); each element points to one of the class's virtual functions. Every object of the class contains a vptr that points to the class's vtbl.


Each class that declares or inherits a virtual function has its own vtbl.
Each object of such a class contains a vptr pointing to that vtbl.
Virtual functions are placed in the vtbl in declaration order; each vtbl entry is a pointer to one virtual function of the class.
If a subclass overrides a virtual function of its parent, the override occupies the slot of the parent's original function in the virtual table.
Under multiple inheritance there is one virtual table per parent class; the subclass's own virtual member functions go into the table of the first parent.

Follow-up question: why is calling a virtual function slower than calling an ordinary function in C++?

With single inheritance the performance is similar; it becomes slower with multiple inheritance.

Call performance

From the virtual-call mechanism above, calling a virtual function proceeds as follows (quoting More Effective C++):

1. Through the object's vptr, find the class's vtbl. This is cheap: the compiler knows where it placed the vptr inside the object, so the cost is one offset adjustment (to reach the vptr) plus one pointer indirection (to reach the vtbl).
2. Find the pointer to the called function within the vtbl. Also cheap: the compiler assigned each virtual function a fixed index within the vtbl, so the cost is one offset into the array.
3. Call the function that pointer points to.

Under single inheritance, the cost of calling a virtual function is thus essentially the same as calling a non-virtual function: on most machines it executes only a few extra instructions, so the blanket claim that "virtual functions perform badly" is not very scientific. Under multiple inheritance an object carries one vptr per polymorphic parent, and locating the right vptr involves more offset arithmetic, but that is still not the bottleneck. The real run-time cost of virtual functions is that they defeat inlining. Inlining means replacing the call with the body of the called function at compile time, while the "virtual" in virtual function means "which function to call cannot be known until run time"; the two are fundamentally at odds. A virtual function called directly on an object (not through a pointer or reference) can be inlined, but most virtual calls go through a pointer or reference, and such calls cannot be inlined: they are genuine, full function calls.

Space overhead

As the implementation above shows, to support run-time polymorphism the compiler builds a virtual function table for every class that contains or inherits virtual functions, so one cost of virtual functions is increased class size. For classes with few virtual functions this cost is small: a vtbl is only a handful of function pointers. With very many classes, or very many virtual functions per class, the vtbls can consume noticeable address space, but even that is not the main cost. The main cost arises through inheritance: when a subclass inherits its parent's virtual functions, the subclass gets its own vtbl, and if it overrides only one or two of the parent's virtual interfaces, the rest of its vtbl duplicates the parent's. With many subclasses that each override only a small fraction of the interface, this wastes a lot of address space; it is common in GUI libraries, where large numbers of subclasses inherit from the same parent and override only one or two virtual functions, noticeably enlarging the library's memory footprint. The vptr also enlarges every object of the class: with single or no inheritance each object carries one extra pointer (4 bytes on a 32-bit platform, 8 on a 64-bit one); with multiple inheritance, N vptrs (N = the number of parent classes containing virtual functions).
For a class with large objects this cost is barely noticeable, but for a lightweight class, say one whose member variables total only 4 bytes, adding one (or N) pointer-sized vptrs can double the object size or worse, which is expensive.

Reference: analysis of C++ virtual functions

Pure virtual functions; why are they needed?

A pure virtual function is a virtual function declared in a base class but not defined there; every derived class is required to supply its own implementation. A pure virtual function is declared by appending "= 0" to the function prototype in the base class:

virtual void function1() = 0;

1. To use polymorphism conveniently, we often need to define virtual functions in the base class.
2. In many cases it makes no sense for the base class itself to produce objects. For example, an Animal base class can derive Tiger and Peacock subclasses, but instantiating Animal itself is clearly unreasonable.

To solve both problems, C++ introduces the pure virtual function: declare the function as pure virtual (virtual ReturnType Function() = 0;), and the compiler requires it to be overridden in derived classes for polymorphism to work. A class containing a pure virtual function is an abstract class and cannot produce objects, which resolves both issues above: users cannot create instances of the class itself, only of its derived classes.

The purpose of defining a pure virtual function is to make derived classes inherit only the function's interface.
The meaning of a pure virtual function is that all class objects (chiefly derived-class objects) can perform its action, but the base class cannot supply a reasonable default implementation. Declaring one tells the designers of subclasses: "you must provide an implementation, but I cannot know how you will implement it."

Reference: the difference between virtual functions and pure virtual functions

Why are virtual destructors needed, and when are they not? Why define the parent class's destructor as a virtual function?

A class's destructor usually releases memory and other resources; if it is not called, resources leak. Making the base class's destructor virtual ensures that when an object of a derived class is deleted through a pointer to the base class, the derived class's destructor runs (followed by the base class's).
Of course, not every class needs a virtual destructor: once a class has any virtual function, the compiler adds a virtual table to the class and a vptr to each object, increasing storage. So a destructor is written as virtual only when the class is intended to be used as a base class.

Can inline functions, constructors and static member functions be virtual?

None of the three, inline, static, or constructor, can carry the virtual keyword.
inline is expanded at compile time and must have a concrete body;
static functions belong to the class itself and likewise must have a concrete body;
virtual functions rely on the vtable (in memory). If a constructor were virtual, calling it would require looking it up through the vtable, but before construction no instance exists, and without an instance the vtable cannot be reached, so a virtual constructor could never be found. (Without an instance, no members of the class can be accessed at all, virtual or not, apart from public/protected static members and friends.)

Virtual functions cannot really be inlined: inlining replaces the call with the function body at compile time, while "virtual" means the target is unknown until run time. A virtual function called directly on an object can be inlined, but most virtual calls go through a pointer or reference and cannot be; they remain genuine, full function calls.

Constructors cannot be virtual. Moreover, if a virtual function is called inside a constructor, the version actually executed is the one of the class currently being constructed, because the derived part is not yet constructed; polymorphism is disabled there.

Static member functions belong to the whole class, not to any object; their function pointers are stored differently from ordinary member functions and cannot occupy a slot in an object's virtual table, so they cannot take part in the dynamic dispatch mechanism.

Can virtual functions be called in constructors?

Finally, a summary of common questions about virtual functions:

1) Virtual functions are dynamically bound: calling a virtual function through a pointer or reference finds the function of the actual (dynamic) class, not the function of the class that declared the pointer. This is the basic purpose of virtual functions and needs no further explanation.

2) Constructors cannot be virtual. Moreover, a virtual function called inside a constructor executes the current class's version, because the derived part is not yet constructed; polymorphism is disabled there.

3) A destructor can be a virtual function, and for a class used as a base it usually should be.
4) Defining a function as pure virtual makes the class abstract; it cannot be instantiated.

5) A pure virtual function usually has no definition body, but it may have one.

6) A destructor can be pure virtual, but a pure virtual destructor must have a definition body, because destructor calls are implicit in subclasses.

7) A non-pure virtual function must have a definition body, otherwise it is an error.

8) A derived class's override must have exactly the same signature as the parent's virtual function, with one exception: if the parent's return type is a pointer or reference, the child's override may return a pointer (or reference) to a type derived from it (covariant return types). For example, if Base defines virtual Base* clone();, Derived may define virtual Derived* clone();. This relaxation is very useful for the Clone pattern.

Virtual destructor (√), pure virtual destructor (√), virtual constructor (X)

Why virtual inheritance? Analysis of its implementation principle.

Virtual inheritance is a concept specific to multiple inheritance: virtual base classes exist to solve a multiple-inheritance problem.
For example: class D inherits from classes B1 and B2, and B1 and B2 both inherit from class A; then the variables and functions of A appear twice inside D. To save memory (and remove the ambiguity), declare B1's and B2's inheritance from A as virtual inheritance, making A a virtual base class. Virtual inheritance is rarely used in practice and often ignored, mainly because multiple inheritance itself is discouraged and uncommon in C++; without multiple inheritance, virtual inheritance loses its reason to exist, since it only reduces efficiency and occupies more space.

The defining property of virtual inheritance is that in any derived class, the virtual base class is always represented by one and the same shared subobject.

Reference: C++ virtual inheritance

Design mode:

The C++ singleton pattern:

A class of static members is not the Singleton pattern:
First, the initialization order of static member variables is not controlled by any constructor and cannot be guaranteed (extreme case: with two members a and b where b needs a as an initialization parameter, your class must have a constructor and a guaranteed initialization order, which statics do not give you).

Second, and most seriously, it loses a key object-oriented property, polymorphism: static member functions cannot be virtual, so subclasses of the Log class cannot enjoy the convenience of polymorphism.
class Log {
public:
  static void Write(char const *logline);
  static bool SaveTo(char const *filename);
private:
  static std::list<std::string> m_data;
};

In log.cpp we need to add

std::list<std::string> Log::m_data;

Eager ("hungry") singleton:
In the eager style, the single instance is initialized as soon as the program starts:

class Log {
public:
  static Log* Instance() {
    return &m_pInstance;
  }

  virtual void Write(char const *logline);
  virtual bool SaveTo(char const *filename);

private:
  Log();              // ctor is hidden
  Log(Log const&);    // copy ctor is hidden

  static Log m_pInstance;
  static std::list<std::string> m_data;
};

// in log.cpp we have to add
Log Log::m_pInstance;
The problem with this style is also obvious: the class is now polymorphic, but the initialization order of static member variables is still not guaranteed.

Lazy singleton (heap, rough version)
The single instance is initialized only when it is first used:

class Log {
public:
  static Log* Instance() {
    if (!m_pInstance)
      m_pInstance = new Log;
    return m_pInstance;
  }

  virtual void Write(char const *logline);
  virtual bool SaveTo(char const *filename);

private:
  Log();              // ctor is hidden
  Log(Log const&);    // copy ctor is hidden

  static Log* m_pInstance;
  static std::list<std::string> m_data;
};

// in log.cpp we have to add
Log* Log::m_pInstance = NULL;
Instance() allocates and initializes m_pInstance only on the first call. It seems all problems are solved: the initialization order is guaranteed, and polymorphism works.
But when the program exits, the destructor is never executed, which on systems with unreliable designs can leak resources such as file handles, socket connections and memory.
A crude fix for this problem is to add an explicit destruction method to each Singleton class.

Lazy singleton (local static variable, best version)

Also known as the Meyers Singleton [Meyers]:

class Log {
public:
  static Log& Instance() {
    static Log theLog;
    return theLog;
  }

  virtual void Write(char const *logline);
  virtual bool SaveTo(char const *filename);

private:
  Log();                       // ctor is hidden
  Log(Log const&);             // copy ctor is hidden
  Log& operator=(Log const&);  // assign op is hidden

  static std::list<std::string> m_data;
};

The benefit of defining a local static variable inside Instance() is that theLog's constructor is called only the first time Instance() runs: the same lazy-initialization effect as the heap version, with the initialization order of member variables and the Singleton itself guaranteed. (Since C++11, the initialization of a local static is additionally guaranteed to be thread-safe.)

It also has a built-in safety measure: Instance() returns a reference to the local static. If it returned a pointer, callers might wrongly assume they should check the pointer's validity and take responsibility for destroying it. The constructor and copy constructor are also private, so users of the class cannot instantiate it themselves.

In addition, multiple different Singleton instances are destroyed in the reverse of their construction order.

Reference: an optimal implementation of the C++ Singleton pattern

Design a class in C++ that cannot be inherited.

Make the constructor or destructor a private function; then the class cannot be inherited. (In C++11 and later, marking the class final achieves this directly.)

How to define a class whose objects can only be created on the heap? Or only on the stack?

A class that can only be instantiated on the heap: define the destructor as private. The destructor then cannot be invoked automatically for stack objects, only manually (e.g. through a member function). You can also define the constructor as private, but then you need to write a factory function to construct the object.

A class that can only be instantiated on the stack: define the operator new and operator delete functions as private. Then the new operator cannot call operator new when creating an object, and delete cannot call operator delete when destroying one.

[Design a class that can only be instantiated on the heap or stack](
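A sketch of both idioms under the rules above (HeapOnly/StackOnly are hypothetical names): the private destructor blocks stack objects, and the private operator new/delete block heap objects.

```cpp
#include <cassert>
#include <cstddef>

// Heap-only: a stack object would need an implicit destructor call,
// which is inaccessible, so `HeapOnly h;` fails to compile.
class HeapOnly {
public:
    static HeapOnly* Create() { return new HeapOnly(); }
    void Destroy() { delete this; }   // manual destruction
    int tag() const { return 1; }
private:
    HeapOnly() {}
    ~HeapOnly() {}
};

// Stack-only: `new StackOnly` cannot call the private operator new,
// so heap creation fails to compile; stack objects are unaffected.
class StackOnly {
public:
    int tag() const { return 2; }
private:
    static void* operator new(std::size_t size);
    static void operator delete(void* ptr);
};
```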

Meet the above three conditions

[C++Singleton mode in](

Order of construction and destruction of multiple classes

The constructor of the base class is called first, then the constructor of the derived class.

Objects constructed first are destructed last: destruction proceeds in the reverse order of construction.
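The order can be observed with a small trace (Base/Derived and the trace string are illustrative): the base constructor runs first, and destruction happens in the reverse order.

```cpp
#include <cassert>
#include <string>

static std::string g_trace;  // records the order of constructor/destructor calls

struct Base {
    Base()  { g_trace += "B+"; }   // base constructed first
    ~Base() { g_trace += "B-"; }   // base destructed last
};

struct Derived : Base {
    Derived()  { g_trace += "D+"; }
    ~Derived() { g_trace += "D-"; }
};

std::string TraceLifetime() {
    g_trace.clear();
    { Derived d; }    // construct, then destroy at the end of the block
    return g_trace;   // "B+D+D-B-"
}
```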

Memory allocation:

There are three ways to allocate memory:

(1) Allocation from the static storage area. The memory is allocated when the program is compiled and exists for the program's entire run, e.g. global variables and static variables.

(2) Creation on the stack. While a function executes, the storage for its local variables is created on the stack and released automatically when the function ends. Stack allocation is built into the processor's instruction set and is very efficient, but the capacity that can be allocated is limited.

(3) Allocation from the heap, also called dynamic memory allocation. The program obtains any amount of memory at run time with malloc or new, and the programmer is responsible for releasing it with free or delete. The lifetime of dynamic memory is up to us; it is very flexible, but also the source of the most problems.
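The three areas can be shown in a few lines (function names are illustrative):

```cpp
#include <cassert>
#include <cstdlib>

int g_static = 1;          // (1) static storage area: lives for the whole run

int StackDemo() {
    int local = 2;         // (2) stack: released when the function returns
    return local;
}

int HeapDemo() {
    int *p = (int*)malloc(sizeof(int));  // (3) heap: programmer-managed
    *p = 3;
    int v = *p;
    free(p);               // must be freed explicitly, or it leaks
    return v;
}
```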

C++ runtime memory layout (heap, stack, static area, data segment, BSS, ELF); the BSS segment
sizeof of a class (byte-alignment rules)

The four kinds of C++ cast

Lengths of the int, char, float, and long long types


To prevent a pointer from being used out of bounds:

The pointer must point to a valid memory address.
1 Prevent array out-of-bounds access.
2 Prevent copying more content into a block of memory than it can hold.
3 Prevent using null pointers.
4 Prevent modifying data through a const-qualified pointer.
5 Prevent changing content that lives in static storage and must not change.
6 Prevent releasing the same pointer twice.
7 Prevent using dangling (wild) pointers.

What is pointer decay (degradation) and how to prevent it

When an array is used as a function parameter, it decays to a pointer.
For example:
void fun(char a[100])
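The decay can be demonstrated directly (helper names are illustrative): inside the callee, sizeof sees only a char*, not the 100-byte array; passing the array by reference is one way to keep the size.

```cpp
#include <cassert>
#include <cstddef>

// `a` has decayed to char*: sizeof(a) is the pointer size, not 100.
std::size_t DecayedSize(char a[100]) { return sizeof(a); }

// A reference to the whole array preserves the size information.
std::size_t RealSize(char (&a)[100]) { return sizeof(a); }
```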

The movement of the pointer,

Pointer p++ moves the pointer by a number of bytes equal to the size of the type it points to.
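This can be checked for any type (StepInBytes is a hypothetical helper): the byte distance covered by one increment equals sizeof of the pointed-to type.

```cpp
#include <cassert>
#include <cstddef>

// Advance a pointer by one element and measure the distance in bytes.
template <typename T>
std::size_t StepInBytes() {
    T arr[2] = {};
    T *p = arr;
    T *q = p;
    q++;                          // moves by sizeof(T) bytes
    return (char*)q - (char*)p;
}
```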

What const and volatile mean when they modify a pointer

Pointers on the heap and on the stack

Where was the memory the pointer points to allocated? If on the heap, it is called a pointer on the heap; if on the stack, a pointer on the stack.
A pointer on the heap can be saved in a global data structure and used by different functions to access the same block of memory.
For a pointer on the stack, after the function exits the memory is no longer accessible.

Pointer release and memory leakage causes,

Pointer as the parameter of the function, function pointer,

The difference between pointer and reference and address, array name,

Difference between a pointer and an address?
1 A pointer is a variable whose value is an address; the pointer variable itself is also stored at some address and occupies four bytes (on 32-bit systems). An address by itself does not imply that any variable exists there.
2 The value of a pointer, unless restricted, is usually changeable; it can be made to point to another address.
   An address denotes a location in the memory space and is what a pointer holds; the address itself has no notion of size. The size of what the pointer points to depends on the type of the variable stored at that address.
Relationship between a pointer and an array name?
Both evaluate to an address, but the former can be moved while the latter is immutable.
The difference between pointer and reference (usually asked)

Similarities: 1. Both are address concepts;
a pointer points to a block of memory and its content is that memory's address; a reference is an alias for a block of memory.

Differences: 1. A pointer is an entity, while a reference is only an alias;

2. A reference is used without dereferencing (*); a pointer must be dereferenced;

3. A reference can only be initialized once, at its definition, and cannot be reseated afterwards; a pointer variable can be changed to point elsewhere;

4. There is no "const reference" binding (a reference is inherently fixed to its referent), while a pointer itself can be const (e.g. int* const p);

5. A reference cannot be null, while a pointer can be null;

6. "sizeof reference" yields the size of the referenced variable (object), while "sizeof pointer" yields the size of the pointer itself (the address it holds);

7. Increment (++) means different things for pointers and references;

8. From the perspective of memory allocation: the program allocates storage for a pointer variable, while a reference needs no storage of its own.
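Several of the points above fit in one small function (the function is illustrative): the pointer is reseated, while assigning through the reference writes into the original variable.

```cpp
#include <cassert>

int PointerVsReference() {
    int a = 1, b = 2;
    int *p = &a;     // a pointer is an entity holding a's address
    int &r = a;      // a reference is an alias for a, bound once
    p = &b;          // ok: the pointer itself can be reseated
    r = b;           // NOT a rebinding: this stores b's value into a
    // sizeof(r) == sizeof(int); sizeof(p) == sizeof(int*)
    return a * 10 + *p;   // a == 2 and *p == 2
}
```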

What is the difference between iterators and ordinary pointers

The principle of smart pointer,

A smart pointer is actually a class object that behaves like a pointer. The usual implementation uses reference counting.
1. The smart pointer associates a counter with the object it points to; the reference count tracks how many objects of the class share the same pointer.
2. Each time a new object of the class is created, the pointer is initialized and the reference count is set to 1;
3. When an object is created as a copy of another, the copy constructor copies the pointer and increments the shared reference count;
4. On assignment, the assignment operator decrements the reference count of the object the left operand pointed to (deleting it if the count drops to 0) and increments the count of the object the right operand points to, because the left pointer now shares the right pointer's referent;
5. When the destructor is called, it decrements the reference count (deleting the underlying object if the count drops to 0).
6. There are two classic implementation strategies: introducing an auxiliary class, or using a handle class. Here we mainly discuss the auxiliary-class approach.
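A minimal sketch of the reference-counting scheme above (RefPtr is a hypothetical class, not std::shared_ptr, and not thread-safe): all copies share one counter, and the managed object is deleted when the count reaches zero.

```cpp
#include <cassert>

template <typename T>
class RefPtr {
public:
    explicit RefPtr(T *p) : ptr_(p), count_(new int(1)) {}
    RefPtr(const RefPtr &other) : ptr_(other.ptr_), count_(other.count_) {
        ++*count_;                  // copy: share the pointer, bump the count
    }
    RefPtr& operator=(const RefPtr &other) {
        ++*other.count_;            // bump the right side first (self-assignment safe)
        release();                  // drop the old referent
        ptr_ = other.ptr_;
        count_ = other.count_;
        return *this;
    }
    ~RefPtr() { release(); }        // destructor decrements the count
    T& operator*() const { return *ptr_; }
    int use_count() const { return *count_; }
private:
    void release() {
        if (--*count_ == 0) {       // last owner: delete object and counter
            delete ptr_;
            delete count_;
        }
    }
    T *ptr_;
    int *count_;
};
```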

Others: the difference between override and overload,

override (rewrite):
1. The method name, parameters, and return value are the same.
2. A subclass method cannot reduce the access level of the parent method.
3. A subclass method cannot throw more exceptions than the parent method (but it may throw none).
4. Exists between parent and child classes.
5. A method declared final cannot be overridden.
overload (overloading):
1. At least the type, number, or order of parameters differs.
2. Methods that differ only in return type cannot be overloaded.
3. Exists within a class as well as between parent and child classes.

Overload means overloading; Override means overriding (covering).

Overloading (Overload) means a class can have multiple methods with the same name whose parameter lists differ (in the number or types of parameters).

Overriding (Override) means a method in a subclass has exactly the same name and parameters as a method in the parent class. When that method is called through an instance of the subclass, the subclass's definition runs, effectively covering the identical method defined in the parent class. This is one manifestation of polymorphism in object-oriented programming.

When a subclass overrides a parent method, it may only throw fewer exceptions than the parent, or subclasses of the exceptions the parent throws, because the subclass should solve some of the parent's problems without introducing more problems than the parent. The access level of the subclass method can only be equal to or wider than the parent's, never narrower. If the parent method is private, the subclass is not constrained by overriding; it is simply adding a new method.
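In C++ terms the two concepts look like this (the classes and functions are illustrative; note that the final/exception rules in the list above are Java-specific):

```cpp
#include <cassert>
#include <string>

struct Animal {
    virtual std::string Speak() const { return "..."; }
    virtual ~Animal() {}
};

struct Dog : Animal {
    // Override: same name and signature as the base virtual; the call
    // dispatches to Dog even through an Animal pointer (polymorphism).
    std::string Speak() const { return "woof"; }
};

// Overload: same name, different parameter lists, resolved at compile time.
int Add(int a, int b)          { return a + b; }
double Add(double a, double b) { return a + b; }
```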

Write the constructor, destructor, copy constructor, and assignment operator of a String class

The prototype of the String class is as follows:
class String {
public:
    String(const char *str = NULL);         // constructor
    String(const String &other);            // copy constructor
    ~String(void);                          // destructor
    String& operator=(const String &other); // assignment operator overload
private:
    char *m_data;                           // pointer to the character data
};

String::String(const char *str) {
    if (str == NULL) {          // no initial string: request space for '\0'
        m_data = new char[1];
        *m_data = '\0';
    } else {                    // otherwise request space of the same size and copy
        int length = strlen(str);
        m_data = new char[length + 1];
        strcpy(m_data, str);
    }
}

String::~String(void) {
    delete [] m_data;           // destructor frees the owned space
}

String::String(const String &other) {   // copy constructor, similar to the constructor
    int length = strlen(other.m_data);
    m_data = new char[length + 1];
    strcpy(m_data, other.m_data);
}

String& String::operator=(const String &other) {
    if (this == &other)         // same address: return directly
        return *this;
    delete [] m_data;           // different address: free the old space and rebuild
    int length = strlen(other.m_data);  // strlen, not sizeof: sizeof would give the pointer size
    m_data = new char[length + 1];
    strcpy(m_data, other.m_data);
    return *this;
}

Since m_data is a private member, objects can only expose it through public member functions such as a ShowString() accessor.

Usage:
String AD;
const char *p = "ABCDE";
String B(p);


Four elements of a pointer

1 The pointer variable: it holds a memory address, usually a logical address that has a mapping to a physical address.
2 The length of the pointer variable: four bytes under WIN32.
3 The variable pointed to by the pointer:
   the variable stored at that memory address; it may be of any type.
4 The length of the variable pointed to: the size of the memory region starting at that address.

Data structures and algorithms:

Linked lists, trees, and hash tables; how to effectively handle collisions of hash values

Performance comparison of sorting algorithms

Operating system:

linux memory management mechanism, memory addressing mode, what is virtual memory, memory paging algorithm, task scheduling algorithm

	The implementation of virtual memory in Linux relies on six mechanisms: address mapping, memory allocation and reclamation, caching and flushing, demand paging, swapping, and memory sharing.

The memory manager maps the logical addresses of a user program to physical addresses through the mapping mechanism. When the program runs and a virtual address it needs has no corresponding physical memory, a demand-paging request is issued. If free memory is available, it is allocated (using the allocation and reclamation mechanism) and the physical page in use is recorded in the cache (using the caching mechanism). If there is not enough memory to allocate, some is freed: the swap mechanism, via the swap cache, writes the contents of physical pages out to the swap file and updates the page table to map the file address. Address mapping also consults the TLB (translation lookaside buffer) to find physical pages.
Process and thread, inter process and thread communication mode, and the use and implementation principle of shared memory

Deadlock necessary conditions and avoidance algorithm

1. Mutual exclusion: a resource cannot be shared and can only be used by one process at a time.
2. Hold and wait: a process that has already obtained resources may request new ones.
3. No preemption: allocated resources cannot be forcibly taken away from the process holding them.
4. Circular wait: several processes in the system form a cycle in which each waits for a resource held by the adjacent process.

Strategies for handling deadlocks: 1. Ignore the problem, e.g. the ostrich algorithm, applicable when deadlocks are rare. (The name comes from the claim that an ostrich buries its head in the sand when it sees danger, as if danger it cannot see is not danger at all.) 2. Deadlock detection and recovery. 3. Careful dynamic resource allocation to avoid deadlock (e.g. the banker's algorithm). 4. Prevention, by breaking one of the four necessary conditions.

The difference between dynamic link and static link

Dynamic linking establishes only a reference interface; the real code and data live in another executable module (a shared library) and are loaded at run time. Static linking copies all the code and data into this module, so the library is no longer needed at run time.

Write a C program to determine whether the system is 16-bit or 32-bit, and big-endian or little-endian


Method 1: int k=~0;

if ((unsigned int)k > 65535) cout << "at least 32 bits" << endl;
else cout << "16 bits" << endl;

Method 2: // on a 32-bit system i is nonzero; on a 16-bit system int is 16 bits wide, so 65536 overflows to 0

int i = 65536;
int j = 65535;

Big-endian or little-endian?

1) Little-Endian: the low-order bytes sit at the low addresses in memory, the high-order bytes at the high addresses.
2) Big-Endian: the high-order bytes sit at the low addresses in memory, the low-order bytes at the high addresses.
For example, the number 0x12345678 is laid out in memory as:

1) Big-endian mode:
Low address -----------------> High address
0x12  |  0x34  |  0x56  |  0x78
2) Little-endian mode:
Low address ------------------> High address
0x78  |  0x56  |  0x34  |  0x12

How the 32-bit number 0x12345678 is stored in memory by a CPU in little-endian vs big-endian mode (assuming storage starts at address 0x4000):
Memory address	Content in little-endian mode	Content in big-endian mode
0x4000	0x78	0x12
0x4001	0x56	0x34
0x4002	0x34	0x56
0x4003	0x12	0x78

4) Neither endianness is inherently better; each one's advantage is the other's disadvantage:
Little-endian: no byte reordering is needed when narrowing a value by cast; 1-, 2-, and 4-byte values start at the same address.
Big-endian: the sign bit is always in the first byte, which makes it easy to determine whether a number is positive or negative.

BOOL IsBigEndian() {
    int a = 0x1234;
    char b = *(char *)&a;  // cast int to char to take the single byte at a's lowest address
    if (b == 0x12)
        return TRUE;
    return FALSE;
}

A union stores all its members starting at the same low address; this feature makes it easy to tell whether the CPU reads and writes memory in little-endian or big-endian mode:

BOOL IsBigEndian() {
    union NUM {
        int a;
        char b;
    } num;
    num.a = 0x1234;
    if (num.b == 0x12)
        return TRUE;
    return FALSE;
}

Generally, operating systems on common hardware are little-endian, while network protocols use big-endian byte order.
Byte order of common CPUs:
Big Endian: PowerPC, IBM, Sun
Little Endian: x86, DEC
ARM can work in either big-endian or little-endian mode.

Common signals, how the system notifies a signal to the process

The signal mechanism is a way of passing messages between processes. Signals are known as soft interrupts.
Processes can send soft interrupt signals to each other through the kill system call.
SIGHUP 1 A Terminal hangup or controlling process terminated
SIGINT 2 A Keyboard interrupt (the break key was pressed)
SIGQUIT 3 C Keyboard quit key pressed
SIGILL 4 C Illegal instruction
SIGABRT 6 C Abort signal issued by abort(3)
SIGFPE 8 C Floating-point exception
SIGKILL 9 AEF Kill signal
SIGSEGV 11 C Invalid memory reference
SIGPIPE 13 A Broken pipe: write to a pipe with no reader

The signal mechanism is asynchronous: when a process receives a signal, it handles it immediately, without waiting for the current function, or even the current line of code, to finish. There are dozens of signals, each with a different meaning. Signals are distinguished by their values, but programs usually refer to a signal by name. In Linux, these signals and the constants named after them are defined in the file /usr/include/bits/signum.h. (A program normally should not include that header directly; it should include <signal.h>.)

Signal events come from two sources: hardware (e.g. a key press or a hardware fault) and software. The system functions most commonly used to send signals are kill, raise, alarm, setitimer, and sigqueue; software sources also include illegal operations and the like.

The main signal-sending functions are: kill(), raise(), sigqueue(), alarm(), setitimer(), and abort().

A process can respond to a signal in three ways: (1) ignore it, i.e. take no action on it; two signals cannot be ignored: SIGKILL and SIGSTOP; (2) catch it, by defining a signal handler that runs when the signal occurs; (3) perform the default action.
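Response (2), catching a signal, can be sketched with sigaction on a POSIX system (the handler and flag names are illustrative); SIGKILL and SIGSTOP could not be caught this way.

```cpp
#include <cassert>
#include <signal.h>

static volatile sig_atomic_t g_got_signal = 0;

static void OnSignal(int) {
    g_got_signal = 1;   // only async-signal-safe work belongs in a handler
}

int CatchOnce() {
    struct sigaction sa;
    sa.sa_handler = OnSignal;       // define the signal processing function
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);  // install it for SIGUSR1
    raise(SIGUSR1);                 // send the signal to ourselves
    return g_got_signal;            // the handler has already run
}
```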

Various synchronization mechanisms of linux system, various asynchronous mechanisms of linux system

How to implement daemon

The most important feature of daemon is running in the background.		

1. Run in the background.

To avoid hangups from the controlling terminal, the daemon is put into the background: call fork in the process, terminate the parent, and let the daemon run on in the child.

if (pid = fork()) exit(0); // in the parent process: end the parent, the child continues

2. Detach from the controlling terminal, login session, and process group

First, the relationship in Linux among the process, the controlling terminal, the login session, and the process group: every process belongs to a process group, and the process group ID (GID) is the PID of the group leader. A login session can contain multiple process groups, which share one controlling terminal, usually the terminal the creating process logged in from. The controlling terminal, login session, and process group are normally inherited from the parent; the goal is to break free of them so the daemon is not affected by them. Building on step 1, call setsid() to make the process a session leader:


Note: setsid() fails if the calling process is already a session leader, but step 1 ensures it is not. After setsid() succeeds, the process becomes the leader of a new session and a new process group, detached from the original login session and process group. Since the session leader has exclusive ownership of the controlling terminal, the process is detached from the controlling terminal at the same time.

3. Prohibit the process from reopening the control terminal

Now the process is a session leader without a terminal, but it could reapply to open a controlling terminal. This can be prevented by making the process no longer the session leader:

if(pid=fork()) exit(0); //End the first subprocess and the second subprocess continues (the second subprocess is no longer the session leader)

4. Close open file descriptors

The process inherits open file descriptors from the parent that created it. Left open, they waste system resources, can prevent the filesystem they live on from being unmounted, and can cause unexpected errors. Close them as follows:

for (i = 0; i < NOFILE; i++)  /* NOFILE: upper bound on open files, from <sys/param.h> */
    close(i);

5. Change current working directory

While a process is active, the filesystem containing its working directory cannot be unmounted. Generally the working directory is changed to the root directory; processes that need to dump core or write run logs change it to a specific directory such as /tmp: chdir("/")

6. Reset file creation mask

The process inherits the file creation mask from the parent process that created it. It may modify the access bits of files created by the daemon. To prevent this, clear the file creation mask: umask(0);

7. Handle the SIGCHLD signal

Handling SIGCHLD is not strictly necessary. But for some processes, especially server processes, child processes are often forked to handle requests as they arrive. If the parent does not wait for its children to end, they become zombie processes and occupy system resources; if the parent waits, that increases its burden and hurts the server's concurrency. On Linux, the disposition of SIGCHLD can simply be set to SIG_IGN.


This way, the kernel does not create zombie processes when children exit. This differs from BSD4, where the parent must explicitly wait for the child to end before the zombie is released.

The difference between standard library functions and system calls,

1. System calls

Functions provided as system calls, such as open, close, read, write, ioctl, etc., declared in the header unistd.h. Take write as an example: its prototype is size_t write(int fd, const void *buf, size_t nbytes), and its operand is a file descriptor (file handle) fd. To write a file, you must first open it with write permission using the open system call and obtain its fd, e.g. fd = open("/dev/video/", O_RDWR). fd is an integer; the value obtained when a new file is opened is the current maximum fd plus 1. The Linux system assigns three file descriptors by default: 0 - standard input, 1 - standard output, 2 - standard error.

System calls are usually used for low-level file access, for example direct access to device files in drivers.

System calls are operating system related, so there is generally no portability across operating systems.

System calls occur in kernel space, so using them for file operations in ordinary user-space applications incurs the overhead of switching from user space to kernel space. In fact, even when library functions are used to operate on files in user space, a system call is ultimately unavoidable: the file always lives on a storage medium, and any read or write must operate on the hardware. In other words, library functions operate on files by means of system calls; for example, the C library function fwrite() is implemented through the write() system call.

In that case, library functions also carry the overhead of system calls, so why not use system calls directly? Because reading and writing files usually involves large amounts of data (large relative to the data unit the underlying driver's system call operates in), and library functions can greatly reduce the number of system calls. This is thanks to buffering: both user space and kernel space buffer file operations. For example, when fwrite writes a file, the content is first written into a user-space buffer; when that buffer fills up or the write operation ends, its content is written into the kernel buffer; likewise, when the kernel buffer fills up or the write ends, its content is written to the hardware medium backing the file.
2. Library function calls

File-operation functions provided by the standard C library, such as fopen, fread, fwrite, fclose, fflush, fseek, etc., declared in the header stdio.h. Take fwrite as an example: its prototype is size_t fwrite(const void *buffer, size_t size, size_t item_num, FILE *pf), and its operand is a file pointer FILE *pf. To write a file, you must first open it with write permission using the fopen function and obtain its FILE structure pointer, e.g. pf = fopen("~/proj/filename", "w"). Since library-function file operations are ultimately implemented through system calls, every FILE structure pointer for an open file has a kernel-space file descriptor fd corresponding to it. There are also predefined FILE pointers: stdin - standard input, stdout - standard output, stderr - standard error.

Library function calls are often used to access general files in applications.

Library function calls are system independent, so they are portable.

Because library function calls are based on the C library, they cannot be used to operate on devices in kernel-space drivers.
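The two paths can be put side by side, assuming a POSIX system (the wrapper names and /tmp paths are illustrative): one writes through a file descriptor with the write system call, the other through a FILE* with the buffered fwrite.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// System-call path: open a descriptor; write() goes straight to the kernel.
long WriteWithSyscall(const char *path, const char *msg) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    long n = write(fd, msg, strlen(msg));   // one trip into the kernel
    close(fd);
    return n;
}

// Library path: fwrite() fills a user-space buffer; the underlying
// write() system call happens on flush/close.
long WriteWithStdio(const char *path, const char *msg) {
    FILE *pf = fopen(path, "w");
    if (!pf) return -1;
    size_t n = fwrite(msg, 1, strlen(msg), pf);
    fclose(pf);                             // flush triggers write() here
    return (long)n;
}
```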

fd and PCB,

How much heap memory can a 32-bit system process have at most,

Five I/O modes,

Five I/O modes:
[1]       Blocking I/O           (the default under Linux: I/O created by open and socket is blocking)
[2]       Non-blocking I/O       (set the fd to non-blocking via fcntl, or pass the O_NONBLOCK parameter to open)
[3]       I/O multiplexing       (usually used together with non-blocking I/O)
[4]       Signal-driven I/O      (SIGIO)
[5]       Asynchronous I/O

Apache model (PPC, Process Per Connection), TPC (Thread Per Connection) model, and the select, poll, and epoll models

Generally speaking, program input involves two steps:
1. Wait for data to become readable.
2. Copy the data from the system kernel into the program's data area.

For socket programming:

         Step one: wait for data to arrive from the network; when a packet arrives, it is copied from the network layer into the kernel buffer;

         Step two: copy the data from the kernel into the program's data area.


Blocking I/O mode                           // while blocked, the process gives up the CPU and sleeps
        Blocking I/O is the most commonly used I/O mode, and the default in the Linux system.

       Most programs use blocking I/O.

       A newly created socket is in blocking I/O mode (because that is the Linux default).

For a UDP socket, the "data ready" flag is relatively simple:
(1) a whole datagram has been received, or
(2) it has not.
For TCP the concept is more complex, and some additional variables are involved.

       A process calls recvfrom; the system call does not return until a datagram arrives at the local system and the data has been copied into the process's buffer. (If the system call receives an interrupt signal, the call is interrupted.)

   We say the process is blocked from the moment it calls recvfrom until recvfrom returns. When recvfrom returns normally, the process continues its work.

Non-blocking I/O mode                          // not widely used on its own, because polling wastes a lot of CPU
       When we set a socket to non-blocking mode, we are telling the kernel: "when an I/O operation I request cannot complete immediately, do not put my process to sleep to wait for it; return an error to me right away."
      We start calling recvfrom; since the system has not received any network data, the kernel immediately returns an EWOULDBLOCK error.

      On, say, the fourth call to recvfrom, a datagram has arrived; the kernel copies it into our application's buffer, recvfrom returns normally, and we can process the received data.
      When an application uses non-blocking sockets, it needs a loop to test whether a file descriptor has readable data (polling). Constantly polling the kernel to check whether the I/O operation is ready wastes a great deal of CPU. This model is not very common in practice.


 For example:

         for pipe operations, non-blocking mode is often the best choice!


I/O multiplexing                            // very good for batches of I/O operations

       With I/O multiplexing we call the select() and poll() functions, or the epoll functions (supported since the 2.6 kernel); blocking happens in those calls rather than in recvfrom (or recv).
       When we call select and block, the select function waits for the datagram socket to become readable. When select returns, the socket can be read, and we call recvfrom to copy the data into our program buffer.
        For a single I/O operation, select(), poll(), and epoll have no advantage over blocking mode.

       Moreover, in blocking mode only one function call is needed:

                            the read or send function.

                  With multiplexing we need to call two functions:

                             first select() or poll(), then the real read or write.

       The advanced aspect of multiplexing is:

             it can wait on multiple file descriptors at the same time; when any of these (socket) descriptors becomes readable, select() can return.


I/O multiplexing technology is typically used in the following cases:
1. A client needs to handle the input and output of multiple file descriptors at once (typically standard input/output plus a network socket).
2. A program needs to operate on multiple sockets at the same time.
3. A TCP server must handle both a socket listening for connections and sockets already connected.
4. A server program uses both the TCP and UDP protocols.
5. A server provides multiple services at once, each possibly over a different protocol (e.g. inetd).
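A minimal select() sketch (the helper and the self-made pipe are illustrative; a real server would wait on socket descriptors): select blocks until the descriptor is readable, so the subsequent read is guaranteed not to block.

```cpp
#include <cassert>
#include <sys/select.h>
#include <unistd.h>

int SelectOnPipe() {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    write(fds[1], "x", 1);            // make the read end ready

    fd_set readset;
    FD_ZERO(&readset);
    FD_SET(fds[0], &readset);
    struct timeval tv = {1, 0};       // wait at most one second

    // Blocks here (not in read) until fds[0] is readable or timeout.
    int ready = select(fds[0] + 1, &readset, NULL, NULL, &tv);
    char c = 0;
    if (ready > 0 && FD_ISSET(fds[0], &readset))
        read(fds[0], &c, 1);          // guaranteed not to block now
    close(fds[0]);
    close(fds[1]);
    return c == 'x' ? 1 : 0;
}
```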

Asynchronous I/O comes in two variants:

      1. Signal-driven I/O

      2. Asynchronous I/O

Signal-driven I/O                                                  // I have not used this myself.

       We can arrange for the kernel to notify us with the SIGIO signal when a file descriptor becomes ready. This is called signal-driven I/O.

To use signal-driven I/O on a socket, three steps are required:
(1) Install a handler for the SIGIO signal.
(2) Set the owner of the socket, usually with fcntl() and the F_SETOWN command.
(3) Enable asynchronous I/O on the socket, usually with fcntl() and the F_SETFL command, passing O_ASYNC as the flag.

       Setting a socket up for signal-driven I/O is simple; the hard part is determining, when SIGIO is delivered to the socket owner, what state the program was in and what actually happened.

1. SIGIO on a UDP socket                   (relatively simple)
Using asynchronous I/O on a UDP socket is straightforward. The signal is generated when:

1. The socket receives a datagram.
2. An asynchronous error occurs on the socket.
        When using asynchronous I/O on a UDP socket, we call recvfrom() to read the datagram, or to pick up the asynchronous I/O error message.
2. SIGIO on a TCP socket                  (rarely used)
          Unfortunately, asynchronous I/O is of little use for TCP sockets: SIGIO fires so often, for so many reasons, that the signal alone tells us nothing about what actually happened.

On a TCP connection, SIGIO is generated when:
-  A new connection completes on a listening socket.
-  A disconnect request is initiated successfully.
-  A disconnect request completes successfully.
-  Half of the connection (the send or receive channel) is closed.
-  The socket receives new data.
-  The socket's sent data is acknowledged.

-  An asynchronous I/O error occurs.

One practical use of signal-driven I/O is an NTP (Network Time Protocol) server, which runs over UDP. The server's main loop receives datagrams from clients and sends replies, and it is essential to record the exact time each packet arrives.

That timestamp is returned to the client, which uses it to compute the round-trip time of the datagram on the network. Figure 6-8 shows how to build such a UDP server.



Asynchronous I/O             // e.g. a write only has to be submitted, not flushed to disk; the benefit of asynchronous I/O is efficiency.
      In asynchronous I/O mode, when we want to perform an I/O operation we simply tell the kernel what to do, and the call returns immediately. The actual I/O and the data copy are done by the kernel while our program keeps running; when the kernel has finished both the operation and the copy, it notifies our program.
The difference between asynchronous I/O and signal-driven I/O:
        1. In signal-driven I/O, the kernel sends our application SIGIO to say an operation can be started.

        2. In asynchronous I/O, the kernel notifies our application only after the whole operation has completed.
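That "submit now, be told when everything is done" flow can be sketched with POSIX AIO (a minimal sketch; aio_read_all is an illustrative name, and for brevity it polls aio_error() instead of taking a completion notification; on older glibc this needs -lrt):

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Submit an asynchronous read of `len` bytes from offset 0 of `fd`,
 * then wait for the kernel to finish it. aio_read() itself returns
 * immediately; the kernel performs the I/O and the data copy in the
 * background. */
static ssize_t aio_read_all(int fd, char *buf, size_t len)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_read(&cb) == -1)           /* submit; returns at once */
        return -1;
    while (aio_error(&cb) == EINPROGRESS)
        ;                              /* the program could do other work here */
    return aio_return(&cb);            /* final byte count, as read() would give */
}
```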


1. What exactly is epoll?

epoll is currently the popular choice for building large-scale concurrent network programs on Linux. It was formally introduced in kernel 2.6 and, like select, it is just an I/O multiplexing technique; there is no magic to it.

In fact, Linux has never lacked ways to build concurrent network programs: the classic Apache models (Process Per Connection, PPC, and Thread Per Connection, TPC), the select model, and the poll model. So why introduce epoll at all? That deserves some explanation.

2. Shortcomings of the common models

Without showing the weaknesses of the other models, we cannot demonstrate what epoll improves.

2.1 The PPC/TPC model

Both models follow the same idea: give each incoming connection its own worker and stop worrying about it; PPC forks a process per connection, TPC spawns a thread. That convenience has a cost in time and space: once connections pile up, the overhead of switching between so many processes/threads kicks in. The maximum number of connections these models can handle therefore stays low, typically a few hundred.

2.2 select Model

1. The maximum number of concurrent operations is limited because a process is open FD (File descriptor) is limited, from FD_SETSIZE The default value is 1024/2048 ,therefore Select The maximum concurrency of the model is limited accordingly. Change this yourself FD_SETSIZE ?Although the idea is good, let's take a look at the following

2. Efficiency issues, select Each call will linearly scan all the FD Set, so the efficiency will decrease linearly FD_SETSIZE The consequence of the reform is that everyone takes their time. What? Time out??!!

3. kernel / User space memory copy problem, how to make the kernel FD Message notification to user space? On this issue select Memory copy method is adopted.

2.3 The poll model

Essentially the same efficiency as select: poll removes the FD_SETSIZE cap, but it has not fixed select's shortcomings 2 and 3.
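For contrast with select's fixed-size sets, a minimal poll() sketch (wait_readable is an illustrative helper name):

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to `timeout_ms` for `fd` to become readable. poll() takes a
 * pointer to an array of pollfd plus its length, so there is no
 * FD_SETSIZE-style compile-time cap on the set. Returns 1 if readable,
 * 0 on timeout or error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;              /* interested in "readable" only */
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return 0;                     /* timeout (0) or error (-1) */
    return (pfd.revents & POLLIN) != 0;
}
```

Note the kernel still scans the whole array on every call, which is exactly shortcoming 2 above.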

3. epoll's improvements

Having criticized the other models one by one, let us look at what epoll improves; in effect, the converse of select's shortcomings is exactly epoll's list of advantages.

3.1. epoll has no hard cap on the number of concurrent connections. The upper bound is the maximum number of files that can be opened, generally far above 2048; that number is closely tied to system memory, and the exact value can be checked with cat /proc/sys/fs/file-max.

3.2. Efficiency: epoll's greatest strength is that its cost depends only on your "active" connections, not on the total number of connections, so in a real network environment epoll is far more efficient than select and poll.

3.3. Memory copies: epoll avoids the per-call copy in this area (often described as using "shared memory").

4. Why epoll is efficient

epoll's efficiency is inseparable from the design of its data structures, discussed below.

First recall the select model: when an I/O event arrives, select() tells the application that events are pending, and the application must then poll the whole FD set itself, testing every FD for an event and handling it. The code looks like this:

int res = select(maxfd + 1, &readfds, NULL, NULL, 120);

if (res > 0) {
    for (int i = 0; i < MAX_CONNECTION; i++) {
        if (FD_ISSET(allConnection[i], &readfds)) {
            handleEvent(allConnection[i]);   // placeholder for the real handler
        }
    }
}

// if (res == 0) handle timeout; if (res < 0) handle error

epoll not only tells the application that an I/O event has arrived, it also hands back the information the application registered with that event, so the application can locate the event directly instead of traversing the whole FD set:

int res = epoll_wait(epfd, events, 20, 120);

for (int i = 0; i < res; i++) {
    handleEvent(events[i]);                  // placeholder for the real handler
}

5. epoll's key data structure

As mentioned, epoll's speed is closely tied to its data structures, and the key one is:

struct epoll_event {

    __uint32_t   events;   // epoll events

    epoll_data_t data;     // user data variable

};

typedef union epoll_data {

    void *ptr;

    int fd;

    __uint32_t u32;

    __uint64_t u64;

} epoll_data_t;

So epoll_data is a union: with it, the application can store information of several kinds (an fd, a pointer, and so on), and with that information the application can locate the target directly.
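A minimal sketch of the epoll calls built around this structure (epoll_wait_one is an illustrative helper; a real server would keep the epoll instance open and loop over epoll_wait):

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register `fd` for read events on a fresh epoll instance, then wait once.
 * Returns the ready descriptor, recovered straight from epoll_data, or -1. */
static int epoll_wait_one(int fd, int timeout_ms)
{
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;                  /* stash the fd in the epoll_data union */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    /* no scan over all connections: the ready fd comes back in out.data */
    return n == 1 ? out.data.fd : -1;
}
```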

Socket server implementation: the difference between select and epoll (always asked)

select is essentially a fixed-size bitmap, e.g. 32 words of 32 bits, 32*32 = 1024 bits, one per fd, so fd values are limited to roughly 1-1024. When fd values exceed the 1024 limit, FD_SETSIZE must be enlarged, after which 32*max fd values can be represented.

For a single process with multiple threads, each thread handling many fds, select is a poor fit:

1. Every thread scans the full 1 to 32*max range even though it handles only one slice of fd values, which is wasteful.

2. The 1024 cap: a process serving many users will see fd values far above 1024.

poll should be used instead here.

poll passes an array head pointer plus the array length. As long as the array is not very long, performance is still good, because poll moves the fds through the kernel one page (4 KB) at a time; try to keep the set within 4 KB.

epoll is a further optimization of poll: after returning there is no need to traverse all fds, because the fd list is maintained inside the kernel. With select and poll the list is maintained in user space and passed to the kernel on every call. epoll, however, is only supported from kernel 2.6 on.

epoll is best suited to the case of very many fds of which only a few are active at any moment; after all, processing the ready fds is still a serial operation.


What trigger modes does epoll have, and how do they differ? (Explain the difference between level triggering and edge triggering in great detail, and what extra confirmation edge triggering requires when programming.)

epoll supports both level triggering and edge triggering (Edge Triggered: it tells the process only which file descriptors have just become ready, and it says so only once; if we take no action, it will not tell us again). In theory edge triggering performs better, but the code is considerably harder to get right.

Like select, epoll reports only the ready file descriptors, but when we call epoll_wait() the return value is the number of ready descriptors, not the descriptors themselves; we simply take that many entries, in order, from the array we passed to epoll. (Memory mapping, mmap, is often credited here with eliminating the cost of copying these descriptors.)

The other essential improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all monitored file descriptors only when the process calls the wait function; with epoll, descriptors are registered ahead of time via epoll_ctl(), and once a descriptor becomes ready the kernel uses a callback-like mechanism to record it, so a subsequent epoll_wait() call is notified quickly.
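The extra confirmation edge triggering demands can be sketched like this: the descriptor must be non-blocking, and on each notification it must be drained completely, reading until EAGAIN, before waiting again (helper names are illustrative):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put `fd` into non-blocking mode; mandatory for edge-triggered epoll. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags == -1 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Edge-triggered epoll reports a descriptor once per state change, so we
 * must consume everything buffered: keep reading until EAGAIN. Returns
 * total bytes read, or -1 on a real error. */
static long drain_fd(int fd)
{
    long total = 0;
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;               /* consume everything available */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return total;             /* drained; safe to wait for the next edge */
        } else {
            return n == 0 ? total : -1;  /* EOF or real error */
        }
    }
}
```

Skipping this drain step is the classic edge-triggered bug: leftover data generates no further notifications and the connection stalls.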

The thundering herd problem

 A simple example: throw a piece of food into a flock of pigeons. Only one pigeon finally grabs it, but every pigeon is startled into competing; the losers go back to sleep and wait for the next piece. Every piece of food thrown startles the whole flock. For an operating system, when multiple processes/threads wait on the same resource, the effect is similar: each time the resource becomes available, all of them wake and compete for it, with these consequences:
1) The system performs frequent, useless scheduling and context switching of user processes/threads, which badly degrades performance.
2) To guarantee that only one thread gets the resource, the user must lock around the resource operation, adding further overhead.

What counts as a thundering herd
    The most common example is accept() on a socket descriptor: when multiple user processes/threads listen on the same port, only one accept() can actually succeed per connection, so the rest are woken for nothing. As mentioned, this is an old problem that newer operating system kernels have solved.
    How the Linux kernel solves the thundering herd

    For the known cases, kernel developers added an "exclusive wait" option. An exclusive wait behaves much like an ordinary sleep, with two main differences:
    1) A wait-queue entry with the WQ_FLAG_EXCLUSIVE flag set is added to the tail of the wait queue; entries without the flag are, conversely, added to the head.
    2) When wake_up is called on the wait queue, it stops after waking the first entry that carries WQ_FLAG_EXCLUSIVE.
    In other words, with exclusive waiting, e.g. multiple threads blocked in accept() on the same listening socket descriptor, the kernel wakes only the first entry in the wait queue; the others keep waiting for the next event, avoiding the thundering herd on a shared socket descriptor.

What is the difference between a block device and a character device?

(1) Character device: provides a continuous data stream that applications read sequentially; random access is usually not supported. Such devices are read and written byte by byte/character by character. A modem is a typical character device.
(2) Block device: applications can access the device data randomly, choosing for themselves where to read. A hard disk is a typical block device; an application can address any location on the disk and read data from it. In addition, data can only be read and written in multiples of the block size (usually 512 bytes). Unlike character devices, block devices do not support character-based addressing.
There is no strict dividing line between the two kinds of device; the real difference is the access interface exposed by the character-device and block-device drivers (the file I/O API). The comparison here covers the data interface, the access interface, and the way the devices are registered.

The difference between user mode and kernel mode

	Although programs running in user mode and kernel mode differ in many ways, the most important difference is privilege level, i.e. the difference in power. A program running in user mode cannot directly access the operating system's kernel data structures and routines.
	When a program runs, it spends most of its time in user mode; it switches to kernel mode only when it needs the operating system's help for work it lacks the privilege or ability to complete itself.

Linux file system: the inode; what an inode stores; directory names and file names

An inode holds a file's metadata, specifically:
  * the file's size in bytes
  * the User ID of the file's owner
  * the Group ID of the file
  * the file's read, write, and execute permissions
  * the file's three timestamps: ctime, when the inode itself last changed; mtime, when the file's content last changed; atime, when the file was last accessed
  * the link count, i.e. how many file names point to this inode
  * the location of the file's data blocks
 Inodes consume disk space too, so when a disk is formatted the operating system divides it into two regions: the data region, holding file data, and the inode region (the inode table), holding the inode information.
Each inode is typically 128 or 256 bytes. The total number of inodes is fixed at format time, generally one inode per 1 KB or per 2 KB of space. On a 1 GB disk with 128-byte inodes and one inode per 1 KB, the inode table alone reaches 128 MB, 12.8% of the whole disk.

Each inode has a number, and the operating system uses inode numbers to distinguish files.
It is worth stressing that internally Unix/Linux does not use file names at all; it identifies files by inode number. To the system, the file name is just an easily recognized alias for the inode.
On the surface, a user opens a file by its name. Internally this takes three steps: first the system maps the file name to its inode number; then it uses the inode number to fetch the inode's information; finally, from that information it locates the file's data blocks and reads the data.
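The lookup above ends at the inode, which can be observed directly with stat() (a minimal sketch; inode_of is an illustrative name):

```c
#include <sys/stat.h>

/* Return a file's inode number, or -1 on failure. The name is only a
 * label; the inode number is what actually identifies the file, so a
 * hard link (a second name) reports the same inode number. */
static long inode_of(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long)st.st_ino;
}
```

Comparing inode_of() on a file and on a hard link to it shows the two names sharing one inode, while a symbolic link made with ln -s gets an inode of its own.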

In general, a file name and an inode number are in one-to-one correspondence, each inode number matching one file name. But Unix/Linux allows multiple file names to point to the same inode number.
This means the same content can be accessed through different file names; modifying the content affects all of them; yet deleting one file name does not affect access through another. This situation is called a "hard link".
The ln command creates a hard link: ln source-file target-file

With a soft link, file A and file B have different inode numbers, but the content of file A is the path of file B. When file A is read, the system automatically redirects the visitor to file B, so whichever file is opened, what is finally read is file B. File A is then called a "soft link" (or "symbolic link") to file B.
This means file A depends on file B's existence: if file B is deleted, opening file A reports the error "No such file or directory". This is the biggest difference between soft and hard links: file A points at file B's file name, not at file B's inode number, so file B's inode link count does not change.

The ln -s command creates a soft link: ln -s source-file-or-directory target-file-or-directory

Where does /proc live? (In memory.)

The /proc file system is a virtual file system that provides a new way of communicating between Linux kernel space and user space. Under /proc we read and write virtual files as a means of communicating with entities inside the kernel; unlike ordinary files, the contents of these virtual files are created dynamically.


Differences between TCP and UDP

Key point: TCP is a connection-oriented, reliable, byte-stream service.

1. Connection-oriented: two applications using TCP (typically a client and a server) must establish a TCP connection with a three-way handshake before exchanging data. A TCP connection involves exactly two communicating parties; multicast and broadcast cannot use TCP. UDP is an unreliable transport that needs no connection before sending, and it supports multicast and broadcast for one-to-many communication.

2. Reliability: TCP provides end-to-end flow control, acknowledges received data, retransmits on timeout, and reorders out-of-order data, guaranteeing reliable data communication. UDP is an unreliable service: the receiver may never get the sender's datagram.

3. TCP is a stream protocol; UDP is a datagram protocol. Each output operation by a process produces exactly one UDP datagram, which is assembled into one IP datagram to be sent. With TCP there may be no correspondence between the data the application produces and the individual IP datagrams actually sent; hence TCP exhibits "sticky" packets and partial packets.

4. Efficiency: in speed, TCP is generally slower, since during transfer data must be acknowledged, retransmitted on timeout, and sorted; UDP has none of these mechanisms and is fast. In data overhead, a TCP header is at least 20 bytes while a UDP header is 8 bytes, so UDP is more efficient on the wire; in assembly, the 20-byte-minimum TCP header also makes TCP slower to assemble than UDP's 8-byte header.

5. Usage: HTTP and FTP use TCP for its reliability. Because UDP is fast, video and online games mostly use UDP to preserve real-time behavior.

On point 3: with TCP it is possible to send 100 "packets" and receive only 50 "reads"; no packets are lost, each read simply returns more than one send's worth, because TCP in fact has no concept of a packet. Send 10 bytes at a time and a single read may return 20. TCP is a stream protocol: in the receive buffer, the data is spliced together automatically in send order, and because it all comes from the same host and arrives in order, the buffer holds one continuous byte stream; it just feels like one more step than UDP. UDP cannot merge data that way, because two different hosts may send to the same destination (one port may receive data for several applications); merging as TCP does would inevitably corrupt the data. The key reason is that TCP is connection-oriented, so the received data comes from one host, in order, while UDP is connectionless and may receive out-of-order or mixed data from many hosts.

TCP and UDP header field layouts

TCP three-way handshake and four-way teardown: states and message types

Causes of the time_wait and close_wait states; keepalive

TIME_WAIT: you have received the peer's FIN and sent your ACK. A TCP connection in TIME_WAIT waits 2*MSL (Max Segment Lifetime: the longest a TCP segment may survive on the Internet; every concrete TCP implementation must choose an MSL value, RFC 1122 recommends 2 minutes, the BSD tradition uses 30 seconds; on Linux, cat /proc/sys/net/ipv4/tcp_fin_timeout shows the local value) before returning to CLOSED. If, while in FIN_WAIT_1, you receive a segment carrying both the FIN and ACK flags, you can enter TIME_WAIT directly without passing through FIN_WAIT_2.

With an nginx proxy in front, the system's TIME_WAIT count grows, because nginx uses short-lived connections to interact with the backend: few connections stay ESTABLISHED between nginx and the backend while TIME_WAIT piles up. This happens not only on the nginx proxy server but also on the backend app servers. Consulting the literature on TIME_WAIT, a large number of entries in this state is not in itself a big problem, but they can tie up too many ports in the system, so that later requests cannot obtain one.

Although TIME_WAIT causes some problems, killing it off entirely is also wrong, however harmless that looks. See the referenced document for details.

So the best approach seems to be letting TIME_WAIT entries expire early.

On Linux this can be configured as follows:

# Allow TIME_WAIT states to be reused, so that even if TIME_WAIT fills all ports, new requests are not rejected
echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
# Recycle TIME_WAIT states as soon as possible; how soon is unspecified, but observation suggests about one second
echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle

Many documents suggest setting both parameters, but I found that modifying only tcp_tw_recycle solved the problem; TIME_WAIT reuse is something the TCP protocol itself advises against enabling.

When ports cannot be reused, some system services may fail to start. For example, restarting a monitoring daemon that listens on port 40000 may fail if that port happens to be in use during the restart window. Linux accounts for this by default with the following setting:

# View the system's range of usable local (ephemeral) ports
cat /proc/sys/net/ipv4/ip_local_port_range

The command returns two numbers, by default 32768 61000, meaning this machine can make 61000-32768=28232 local outbound connections. Note that local connections are outbound connections; they do not limit the number of inbound connections to, say, port 80 on this machine. The number does cap the proxy server's (nginx's) connections to the app servers, but since nginx talks to the app servers asynchronously, those connections turn over very quickly and rarely pile up. If the bandwidth between nginx and the app servers is poor, or the app servers are struggling, connections may accumulate; in that case, setting nginx's proxy timeout releases them sooner. Generally speaking, the 28232 connections are rarely exhausted.

Because some software listens on port 40000, if errors recur you can raise the minimum of ip_local_port_range:

echo "40001 61000" > /proc/sys/net/ipv4/ip_local_port_range

This obviously shrinks the system's pool of usable ports. You can raise the maximum of ip_local_port_range to compensate, but the good habit is to listen for services on ports no higher than 32768, and there is little point in setting ip_local_port_range to 1024 65535.

Because an nginx proxy is in use, Windows machines will also accumulate many TIME_WAIT entries; of course Windows can be tuned too:

In the registry (regedit), under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, add a DWORD value named TcpTimedWaitDelay, whose value is the wait in seconds.

Windows reuses TIME_WAIT by default. I do not know how to switch it to non-reuse, and I have not found the local-port range setting, but it matters little: both can be left at the system defaults.
 Per the TCP protocol, the side that initiates the close enters TIME_WAIT and stays there for 2*MSL (Max Segment Lifetime), 240 seconds by default. This post briefly explains why the state is needed.
It is worth noting that with HTTP over TCP, it is the server that closes the TCP connection, so the server ends up in TIME_WAIT. For a heavily visited web server this means a great many TIME_WAIT entries: a server receiving 1000 requests per second accumulates a backlog of 240*1000 = 240,000 TIME_WAIT records, and maintaining them burdens the server. Modern operating systems use fast lookup algorithms to manage these TIME_WAIT entries, so checking whether a new TCP connection request hits one costs little time, but carrying so much state is still undesirable.
HTTP/1.1 made Keep-Alive the default behavior, reusing one TCP connection to carry multiple request/response exchanges, and one of the main reasons was the discovery of this problem. Another way to relieve TIME_WAIT pressure is to reduce the system's 2*MSL, since 240 seconds is rather long: on Windows, modify the registry, adding under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters a DWORD value named TcpTimedWaitDelay; the consensus is that it should not go below 60, or trouble may follow.
For a large service, one server is not enough; a load balancer (LB) distributes traffic to several backend servers. If the LB works in NAT fashion, a problem may appear: every IP packet from the LB to a backend server carries the same source address (the LB's internal address), so TCP connections from the LB to that server are limited. Frequent TCP setup and teardown leaves TIME_WAIT entries on the server, and their remote addresses are all the LB's; the LB has fewer than 60000 usable source ports (2^16 = 65536, of which 1-1023 are reserved and some others are unused by default). Once an LB port lands on the server's TIME_WAIT blacklist, it cannot be used to establish a connection with that server for 240 seconds, so the LB can sustain at most around 300 new connections per second to the server. Without the LB there is no such problem, because the server sees remote addresses drawn from the vast set of the Internet, and for each address 60000-odd ports are plenty.
At first I thought the LB would therefore sharply limit the TCP connection rate, but experiments showed otherwise: a Windows Server 2003 box behind the LB still handled 600 requests per second. Was the TIME_WAIT state not taking effect? Observing with Net Monitor and netstat: after the connection between an LB port and the server entered TIME_WAIT, another SYN arrived from the same LB port, and the server received and processed it as usual instead of dropping it as imagined. Digging out the dusty, secondhand college copy of "UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI": for BSD-derived implementations, as long as the SYN's sequence number is greater than the largest sequence number seen before the close, TIME_WAIT will accept the SYN as-is. Could Windows count as BSD-derived? Following that clue and keyword (BSD) led to a post saying that back in NT 4.0 the behavior differed from BSD-derived stacks, but Windows Server 2003 is already NT 5.2, so perhaps things changed.
An experiment: using the Socket API, build a client that binds each time to the same local port, say 2345, and repeatedly establishes a TCP connection to a server, sending Keep-Alive=false HTTP requests. Windows implements sequence numbers that keep growing, so although the server holds the connection from the client's port 2345 in TIME_WAIT, it always accepts the new requests without refusing. What happens if the SYN's sequence number is smaller? Using the Socket API again, but this time Raw IP, send a SYN with a small sequence number: Net Monitor shows the server receives this SYN and then acts as if nothing happened, no response at all; it is dropped.
As the book says, BSD-derived stacks and Windows Server 2003 carry a security risk here, but at least TIME_WAIT no longer blocks new TCP connections; of course, the client must cooperate by ensuring the sequence numbers of different connections rise rather than fall.
The TIME_WAIT state in sockets
 On a high-concurrency short-connection server, when the server calls closesocket() immediately after handling a client request, the connection enters TIME_WAIT. If the client then opens another 2000 concurrent connections, some of them will fail to connect. Forcing the close with the linger option solves this, but linger can cause data loss: with a linger value of 0 the close is forced and any level of concurrency connects normally, while a non-zero value still leaves some connections unable to connect! (Call setsockopt() to set the socket's linger flag, with the delay set to 0.)
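The setsockopt() call described above can be sketched like this (set_abortive_close is an illustrative name; remember the data-loss caveat):

```c
#include <string.h>
#include <sys/socket.h>

/* Set SO_LINGER with l_onoff = 1 and l_linger = 0: close() then sends a
 * RST immediately instead of the normal FIN handshake, so the closing
 * side never enters TIME_WAIT, at the price of discarding unsent data. */
static int set_abortive_close(int fd)
{
    struct linger lg;
    memset(&lg, 0, sizeof lg);
    lg.l_onoff  = 1;   /* enable linger */
    lg.l_linger = 0;   /* zero timeout => abortive close (RST) */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

As the end of this section notes, this is rarely a good idea in general; TIME_WAIT usually works in our favor.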
Per the TCP/IP RFCs, TIME_WAIT is a state that must occur when a TCP connection is torn down.
It is an unavoidable part of any TCP protocol implementation.
On Windows, the registry can be modified to make this interval shorter.

The time_wait duration is 2 MSL, 4 minutes by default.
You can change this variable to shorten it to 30 s.
TCP guarantees that all data is delivered under every possible circumstance. When a socket is closed, the end that closes actively enters TIME_WAIT while the passively closing end moves to CLOSED, and this is what ensures all data is transmitted. A socket is closed through the four-way handshake in which the two ends send segments to each other. When one end calls close(), it declares that it has no more data to send, and it would seem that after the handshake the socket should go straight to CLOSED. But two problems remain: first, there is no mechanism guaranteeing that the final ACK is transmitted successfully; second, stray duplicate packets (wandering duplicates) may still linger in the network, and we must be able to handle them correctly.
Through the correct state machine, we know that the shutdown process of both parties is as follows

 Suppose the final ACK is lost. The server will then retransmit the last FIN it sent, so the client must keep enough state to be able to resend the ACK. If no state were kept, the client would answer the retransmitted FIN with a RST, which the server would interpret as an error. For TCP to terminate both data streams cleanly, all four segments of the four-way handshake must be delivered completely and correctly without loss. That is why the socket stays in TIME_WAIT after close: it has to be ready to retransmit the ACK.
Suppose instead that after both ends call close() they both went straight to CLOSED rather than TIME_WAIT. A new connection could then be established whose IP addresses and ports are exactly the same as the old one; such a connection is called an incarnation of the original. If datagrams from the old connection are still drifting in the network, the new connection may receive them. To prevent this, TCP refuses to establish a connection from a socket in the TIME_WAIT state. A socket in TIME_WAIT waits twice the MSL before moving to CLOSED (twice, because MSL is the one-way time after which a datagram in the network is considered lost, and either the datagram or its response may become a stray, so confirming both are gone takes 2 MSL). This guarantees that once a connection is successfully established, all stray datagrams of the previous connection have died out in the network.
Because of the problems TIME_WAIT causes, we can set the SO_LINGER flag to keep the socket out of TIME_WAIT: the connection is then terminated by sending a RST instead of the normal TCP four-way handshake. But this is usually a bad idea; TIME_WAIT is often our friend.
After a client and server establish a TCP/IP connection and then close the SOCKET, the server-side port of the connection
 is in the TIME_WAIT state.
 Does every actively closed socket enter the TIME_WAIT state?
Is there a way for an actively closed socket to go straight to the CLOSED state?
After sending the last ACK, the actively closing side
 enters TIME_WAIT and stays there for 2 MSL (max segment lifetime) of time.
 This is essential to TCP/IP; in other words, it cannot be avoided:

that is simply how the TCP/IP designers designed it.
 There are two main reasons:
1. To prevent packets from the previous connection from reappearing after getting lost and corrupting the new connection
   (after 2 MSL, all duplicate packets from the previous connection will have disappeared).
2. To close the TCP connection reliably.
   The last ACK (of the peer's FIN) sent by the actively closing side may be lost, in which case the passive side resends its
   FIN. If the active side were already in the CLOSED state, it would answer with RST instead of ACK. Therefore
   the active side must stay in TIME_WAIT rather than CLOSED.

TIME_WAIT does not consume many resources unless the server is under attack.

Also, if one side times out in send or recv, it goes directly to the CLOSED state.
The relevant passage of the socket-faq is also very good; see section:
2.7. Please explain the TIME_WAIT state.

What are the sliding window and timeout retransmission mechanisms?

List the TCP options you know.

Detecting and preventing a blocking connect; when is a socket readable?

connect can block. How do you handle that? (a must-ask question)

The most common and effective method is to add a timer; non-blocking mode also works.

(Set the socket non-blocking, call connect, and then use select to detect the result.)

If select reports the socket readable but the read returns 0 bytes, what happened?

A read of 0 bytes means the peer has performed an orderly close (end of stream); the socket should be closed and its bit cleared from the fd set with FD_CLR.

When is socket readable?

Before each read operation returns, check whether any unread data remains; if so, keep the data-ready flag set. Otherwise there is an obvious inconsistency: data sits in the read buffer but no readable flag is set.

What is keepalive? How is it used?

Set the keepalive parameters to detect client connections that have died.
 TCP has a keep-alive mechanism that detects dead connections. The principle is simple: after the connection has been idle for a certain time, TCP sends a probe to the peer:

1. If the host is reachable, the peer answers with an ACK and the connection is considered alive.

2. If the host is reachable but the application has exited, the peer answers with RST and the sending TCP tears down the connection.

3. If the host is reachable but the application has crashed, the peer sends a FIN.

4. If the peer host answers with neither ACK nor RST, probing continues until it times out and the connection is torn down. The default idle time before this happens is two hours.

Benefits of calling connect on a UDP socket:

1: It improves efficiency, as described earlier. 2: It improves the stability of high-concurrency services. Reason: suppose client A uses unconnected UDP to talk to servers B and C, which provide the same service. For load balancing we let A talk to B and C alternately. A talks to B as IPa:PORTa <----> IPb:PORTb, and to C as IPa:PORTa' <----> IPc:PORTc.
Suppose PORTa and PORTa' happen to be the same (which can occur under heavy concurrency). Then A may be waiting for a message from B but receive one from C instead, corrupting the receive logic. The solution is connected UDP: create two UDP sockets on A and connect one to B and the other to C.

Long and short connections,

DNS and HTTP protocol, HTTP request mode,


Consistent hash load balancing,

Describe what happens after typing in a web address in the browser and pressing enter,

PING command

The ping command works like this: every machine on the network has a unique IP address. If we send a packet to the target IP address, the other side returns a packet of the same size. From the returned packets we can determine that the target host exists and make a preliminary guess at its operating system.


Talk about your understanding of indexes in databases, and the difference between an index and a primary key.

  • A table can have only one clustered index, but may have multiple non-clustered indexes.

  • Records under a clustered index are stored physically contiguously; a non-clustered index is only logically contiguous, and the physical storage is not contiguous.

      Clustered index: the logical order of the key values in the index determines the physical order of the corresponding rows in the table.
      A clustered index determines the physical order of the data in a table. It is like a phone book, which arranges entries by last name. Because a clustered index dictates the physical storage order of the data, a table can contain only one clustered index; the index may, however, cover multiple columns (a composite index), just as a phone book is organized by last name and then first name.
           Notes on using a clustered index
       The fewer columns in the clustered index key, the better. A clustered index suits:
       • A column containing a large number of distinct (non-repeating) values.
       • Queries that return a range of values using operators such as BETWEEN, >, >=, < and <=.
       • Columns that are accessed sequentially.
       • Queries that return large result sets.
       • Columns frequently accessed by queries with JOIN or GROUP BY clauses; typically these are foreign key columns. Indexing the columns named in an ORDER BY or GROUP BY clause saves SQL Server from sorting, because the rows are already ordered, which improves query performance.
       • OLTP applications that need very fast single-row lookups, usually via the primary key; create the clustered index on the primary key.
      Clustered indexes are not suitable for:
       • Columns that change frequently, since each change moves the entire row (SQL Server must keep the data values of the rows in physical order). Pay special attention to this in large transaction-processing systems where data is volatile.
       • Wide keys. The key values of the clustered index are used as lookup keys by every non-clustered index and are therefore stored in the leaf entries of each non-clustered index.
       Non-clustered index: the data is stored in one place and the index in another, with the index holding pointers to where the data is stored.
        Entries in a non-clustered index are stored in the order of the index key values, while the rows of the table are stored in a different order (which a clustered index may dictate). You can create a non-clustered index for each column commonly used to look up data in the table. Some books similarly contain multiple indexes: a gardening book may carry one index of common plant names and another of botanical names, because those are the two ways readers most often search for information.
        A popular example illustrates the difference between the two.
        The body of a Chinese dictionary is itself a clustered index. To look up the character "an" you naturally open the first pages of the dictionary: characters are ordered by pinyin from "a" to "z", so "an" sits near the front. If you exhaust all the pages under "a" without finding the character, it is not in the dictionary; likewise, to look up "zhang" you turn to the back. In other words, the body of the dictionary is its own directory; you need no other index to find what you want. Body content that is itself arranged according to the lookup rule is what we call a "clustered index".
          But you may meet a character whose pronunciation you do not know, and then the method above fails. Instead you look the character up by its radical, and jump straight to a page using the page number listed after it. The ordering you see by combining the radical table and the character list is not the real ordering of the body: looking up "zhang" via the radicals, you may find "zhang" listed at page 672, with "chi" just above it in the list at page 63 and "crossbow" just below it at page 390. Obviously these characters are not really adjacent in the body; the consecutive entries you see are their ordering in the non-clustered index, a mapping of the dictionary body into that index. Finding a character this way takes two steps: first find the entry in the directory, then turn to the page it names. A directory that is purely a directory, with a body that is purely a body, is what we call a "non-clustered index".
      First: the constraint on a clustered index is uniqueness. Does the indexed field also have to be unique?
      Analysis: if you answered yes, you were probably misled by the system defaults. When we designate a table's primary key, if the table has no clustered index yet and a non-clustered index is not forced when creating the primary key, SQL Server creates a clustered index on that field by default; and since the primary key is unique, it is natural to assume the clustered-index field must be unique too.
      Conclusion: a clustered index can be created on any field you like. That is the theory; in practice it cannot be chosen arbitrarily, or performance becomes a nightmare.
      Second: why can a clustered index be created on any column, even when the table has no primary-key constraint and may therefore contain duplicate rows?
      At first glance this seems to contradict the clustered-index constraint, but in reality the clustered index can indeed be created.
      The reason: if a clustered index is created without the UNIQUE property, the database engine automatically adds a four-byte uniqueifier column to the table, adding a uniqueifier value to a row when necessary to make every key unique. The column and its values are for internal use and cannot be viewed or accessed by users.
      Third: does a clustered index always perform better than a non-clustered one?
      To fetch the credits and names of students whose credits are between 60 and 90, is a clustered index on the credits column optimal?
      Answer: no. Since only those two columns are output, a composite non-clustered index on credits and student name forms a covering index: the index itself stores the final output data, which gives better query performance than using credits as the clustered index.
      Fourth: how do clustered and non-clustered indexes relate inside the database?
      An index is organized as a balanced tree. The distinction: the leaf nodes of a clustered index are the final data pages themselves, while the leaf nodes of a non-clustered index are still index entries, each carrying a pointer to the final data.
      Fifth: why is inserting into a table whose primary key carries a clustered index slower than when the primary key carries a non-clustered index?
      With point four understood, this is easy to analyze. Inserting a row into a table with a primary key requires checking that the key is not duplicated, because of the uniqueness constraint. Compare the two index types: with a clustered index the leaf nodes are data pages, so the uniqueness check must walk down to the data pages; with a non-clustered index the primary-key values are already held in the index pages, so checking uniqueness only requires traversing index pages, which costs far less IO than traversing data rows. That is the real reason a non-clustered index on the primary key makes inserts faster than a clustered one.

What data structures do typical relational databases use?

Reference: The Data Structures and Algorithms Behind MySQL Indexes

The advantages and disadvantages of indexing,

Advantages of indexes
1. They greatly speed up data retrieval.
2. A unique index guarantees the uniqueness of each row of data in the table.
3. They speed up joins between tables.
4. When grouping and sorting clauses are used in retrieval, they significantly reduce the time spent grouping and sorting in queries.
Disadvantages of indexes
1. Indexes occupy physical space.
2. When data in the table is inserted, deleted, or modified, the indexes must be maintained dynamically as well, which slows data modification.

Unique index
 A unique index is an index in which no two rows are allowed to have the same index value.
Most databases refuse to save a newly created unique index together with the table when the existing data contains duplicate key values, and they may also prevent inserting new data that would create duplicate keys. For example, if a unique index is created on the last-name column (lname) of the employee table, no two employees may share a last name.
primary key 
Database tables often have a column or a combination of columns whose values uniquely identify each row in the table. This column is called the primary key of the table.
Defining a primary key for a table in a database diagram automatically creates a primary key index, which is a specific type of unique index. The index requires each value in the primary key to be unique. It also allows fast access to data when using primary key indexes in queries.
Clustered index
 In a clustered index, the physical order of the rows in the table is the same as the logical (index) order of the key values. A table can contain only one clustered index.
If an index is not a clustered index, the physical order of the rows in the table does not match the logical order of the key values. Clustered indexes generally provide faster data access than non clustered indexes. 

The characteristics of relational database and non relational database,

In short, the relational model is the two-dimensional table model: a relational database organizes data as two-dimensional tables and the relationships between them.

Non-relational databases take a different approach. Data may be stored as key-value pairs, for example, with no fixed schema: each tuple can have different fields and can add key-value pairs as needed, so it is not constrained to a fixed structure, which reduces overhead in time and space. Users add exactly the fields they need. To obtain different pieces of information about a user there is no need for the multi-table joins of a relational database; fetching the value by id completes the query. On the other hand, non-relational databases cannot offer the field-level filtering of SQL's WHERE clause, and it is hard to enforce integrity in the design. They suit relatively simple data; for data needing more complex queries, a SQL database is clearly more appropriate.

The defining feature of relational databases is transactional consistency: traditional relational reads and writes are transactional and satisfy ACID, which is why relational databases serve in almost every system that demands consistency, the classic example being banking systems.

In web applications, however, especially SNS applications, consistency matters much less. It is tolerable for user A and user B to see different versions of user C's content updates, or to see a friend's data update a few seconds apart. So the greatest strength of relational databases is useless here, or at least far less important.

Conversely, the huge price a relational database pays for maintaining consistency is poor read/write performance. SNS applications such as microblogs and Facebook demand high concurrent read/write capacity that relational databases cannot deliver (on the read side, the traditional fix for relational databases' shortcomings is to add a memcache layer and serve static pages, but content in an SNS changes too fast for memcache to help), so a new form of structured data storage has to replace them.

Another trait of relational databases is the fixed table schema, which makes them poorly extensible: in an SNS, system upgrades and new features often mean large changes to the data structures, which relational databases handle badly; again, new structured data storage is needed.

Hence the rise of non-relational databases. Because no single storage structure can meet every new requirement, a non-relational database is, strictly speaking, not one database but a collection of structured-storage approaches.

It must be stressed that persistent storage, especially persistent storage of massive data, is still the territory of the relational-database veterans.

Categories of non-relational databases:
They fall mainly into the following classes:
Key-value stores for high-performance concurrent reads and writes:

The defining feature of key-value databases is extremely high concurrent read/write performance; Redis, Tokyo Cabinet and Flare are representative of this class.

Document-oriented databases for massive data access:

These databases can query quickly across very large data sets; MongoDB and CouchDB are typical representatives.

Distributed databases oriented toward scalability:

These address the poor scalability of traditional databases; they can adapt to growth in data volume and changes in data structure.

Relational database and non relational database

The difference between optimistic lock and pessimistic lock,

Pessimistic locking: assumes that concurrency conflicts will happen, and blocks every operation that might violate data integrity.[1] A pessimistic lock assumes a high probability that other users will try to access or change the object you are accessing and changing, so it locks the object before you start changing it and releases the lock only when you commit your changes. The drawback is that, whether it is a page lock or a row lock, the lock may be held for a very long time and block other users for a long time; pessimistic locking therefore has poor concurrency.

Optimistic locking: assumes no concurrency conflict will occur and checks for integrity violations only when the operation is committed.[1] Optimistic locking cannot solve the dirty-read problem. An optimistic lock assumes a low probability that other users will change the object you are changing, so it takes no lock while you read and modify the object and locks only when you are ready to commit; lock times are therefore much shorter than with pessimistic locking, and better concurrent access can be obtained even with coarser lock granularity. But if a second user reads the object just before the first user commits a change, then when the second user completes their own commit the database detects that the object has changed; that user must re-read the object and redo the change. In an optimistic-lock environment, the number of times concurrent users re-read objects therefore increases.

From a database vendor's perspective, optimistic page locks are preferable, especially in bulk operations affecting many rows, because they reduce resource demand and improve database performance. Clustered indexes matter here: records are stored in the physical order of the clustered index, so with page locks, when two users simultaneously access and change two adjacent rows on the same data page, one must wait for the other to release the lock, which visibly degrades performance. Like most relational databases, InterBase uses optimistic locking, with shared read locks and exclusive write locks: a read lock can be stacked on a read lock but not a write lock, and no lock can be stacked on a write lock. Locking is an effective means of handling concurrent multi-user access.

Reference: The difference between optimistic locks and pessimistic locks

Database normal forms

1NF: every attribute in a relation that satisfies 1NF is atomic and cannot be subdivided. 1NF is the most basic requirement of every relational database.
2NF: on top of 1NF, eliminates partial functional dependencies of non-key attributes on the candidate key.
3NF: on top of 2NF, eliminates transitive functional dependencies of non-key attributes on the candidate key; if such a transitive dependency exists, the relation does not satisfy 3NF.
BCNF: on top of 3NF, eliminates partial and transitive functional dependencies of the key attributes on the candidate keys.

Explain the first, second and third normal forms of relational databases.

Types of database logs and their roles

MySQL log file classification
1. Error log
2. Binary log (binary log & binary log index)
3. General query log
4. Slow query log
5. InnoDB redo log
6. Update log

The differences between InnoDB and MyISAM

InnoDB: clustered index; supports foreign keys and transactions (commit, rollback, and crash-recovery capabilities; transaction-safe, ACID compliant); does not support full-text indexing; does not store a row count, so counting requires a table scan.
MyISAM: non-clustered index; does not support foreign keys or transactions; supports full-text indexing; stores a row count, and queries perform well.

Which part of InnoDB are you most familiar with?

What index types does InnoDB support?

Four kinds: hash, B-tree, spatial, full-text.
1. B-Tree indexes

The most common index type, based on the B-Tree data structure. The basic idea of a B-Tree is that all values (the indexed columns) are kept in order, and every leaf node is at the same depth. B-Tree indexes therefore suit lookups over a range and can directly support sorting (ORDER BY). When indexing multiple columns, though, the column order is particularly important and needs special attention. Both InnoDB and MyISAM support B-Tree indexes: InnoDB uses a B+Tree variant, while MyISAM compresses its indexes to save space at the expense of some performance.

2. Hash indexes

Based on a hash table, so this index type supports only exact lookups, not range searches or sorting; range lookups and ORDER BY therefore rely on extra work on the server side. Currently only the Memory engine supports explicit hash indexes (its hashes are non-unique, and too many collisions will also hurt lookup performance). Hash is the Memory engine's default index type, although it supports B-Tree indexes as well.
A hash index can only satisfy "=", "IN" and "<=>" lookups; it cannot be used for range queries.


CREATE TABLE testhash (
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    KEY USING HASH (fname)
) ENGINE = MEMORY;

3. Spatial (R-Tree) indexes

Only the MyISAM engine supports them, and not well; they can be ignored.
4. Full-text indexes

Mainly used to find keywords within text, rather than to compare values directly against the index. A full-text index is very different from the other index types: it works more like a search engine than like simple matching of WHERE-clause parameters. You can put a full-text index and a B-Tree index on the same column without conflict. Full-text indexes are used with the MATCH ... AGAINST operation, not with an ordinary WHERE ... LIKE.

Difference between a B-tree and a B+tree

Does InnoDB have full-text indexes?

Historically no; only MyISAM did. (InnoDB gained full-text index support in MySQL 5.6.)

UNION and JOIN

JOIN connects records from two tables according to the ON condition. There are four types:

INNER JOIN: joins the records of the two tables and returns a row only when it satisfies the join condition in both tables. In other words, any record that fails the ON condition does not appear in the result set.

LEFT JOIN / LEFT OUTER JOIN: an outer join that keeps all the records of the left table. If a record of the left table has no matching record in the right table, all the right table's select-list columns are NULL in the associated result row. Even rows that fail the ON condition are shown for the left table, with their right-table fields NULL.

RIGHT JOIN / RIGHT OUTER JOIN: an outer join that keeps all the records of the right table; simply the converse of LEFT JOIN.

FULL JOIN / FULL OUTER JOIN: a full outer join returns all rows from both the left and right tables, i.e. the combination of LEFT JOIN and RIGHT JOIN; the data of both tables is shown in full.

Basic JOIN syntax (the join condition here is a generic illustration):

SELECT table1.* FROM table1 JOIN table2 ON table1.id = table2.id;

UNION operator

Combines the result sets of two or more queries into a single result set containing all rows of all queries in the combined query. The column names of a UNION result set are those of the result set of the first SELECT statement; the result-set column names of the other SELECT statements are ignored.

There are two forms, UNION and UNION ALL. The difference: UNION removes duplicate rows from the result set, while UNION ALL includes all rows and does not delete duplicates.

Note: in certain specific cases a JOIN can implement a UNION ALL; when that situation arises, whether to use UNION ALL or a JOIN with GROUP BY depends on the circumstances, or on the cost of each.

Massive data processing:


Map reduce principle,

BloomFilter principle

A Bloom filter is essentially a long binary vector plus a series of random mapping (hash) functions. It can be used to test whether an element belongs to a set. Its advantage is space efficiency and query time far beyond ordinary algorithms; its drawbacks are a certain false-positive rate and the difficulty of deleting elements. Bloom filters are widely used wherever such membership queries are needed, for example in the Oracle database and in Google's BigTable.

Bloom Filter properties:
No false negatives: an element that is in the set is always reported as present.
Possible false positives: an element that is not in the set may still be reported as present.
The cost of testing whether an element is in the set is independent of the total number of elements.

Trie tree principle
Also known as a word-lookup tree or prefix tree.

B + tree principle,

LSM tree principle,

Big data processing


Toolchain: GCC, GDB, Perf, Valgrind, and makefiles
Other tools: netstat, ps, top, df, fdisk, lsof, ifconfig, uname, kill, tcpdump, ipcs, grep

Others: security, encryption (DES, SHA)

Attachment: Software developer interview questions

Related books

Language class:
C: The C Programming Language (K&R) -> Pointers on C -> Expert C Programming -> C Traps and Pitfalls -> 495 C-language questions you must know
C++: C++ Primer -> Effective C++ -> Inside the C++ Object Model -> The Annotated STL Sources -> C++ Common Knowledge
Java: Thinking in Java -> Java Concurrency in Practice -> Understanding the Java Virtual Machine: Advanced JVM Features and Best Practices
Algorithms and data structures:
Introduction to Algorithms -> Data Structures and Algorithm Analysis (Weiss) -> Beauty of Programming -> Coding Interviews (Jianzhi Offer)
Operating systems:
Computer Systems: A Programmer's Perspective -> Compilers: Principles, Techniques, and Tools (the Dragon Book)
Vbird's Linux tutorial -> Linux Kernel Development -> Understanding the Linux Kernel
An introduction to Linux shell scripting (short and concise)

Network programming: 
TCP/IP Illustrated, Volume 1 ->
Advanced Programming in the UNIX Environment ->
UNIX Network Programming (Volumes 1 & 2) ->
The Art of UNIX Programming (advanced)

Large-Scale Website Technical Architecture: Core Principles and Case Studies,
Understanding Nginx: Module Development and Architecture Analysis,
Large-Scale Distributed Storage Systems: Principles and Architectural Practice

Programmer's Self-Cultivation: Linking, Loading and Libraries,
The Art of Readable Code,
Head First Design Patterns

Related network resources

Matrix67's blog (a real expert):
July's blog on CSDN:
He Haitao's blog:
The written-interview classic Cracking the Coding Interview: questions and answers:
A collection of written test questions:
The Art of Programming: interview and algorithm notes

Added by senthilnayagam on Fri, 04 Mar 2022 15:59:28 +0200