Brief concepts of XHProf
Native method of re-encapsulating zend
If the CPU is to be detected, there will be a 5 ms delay because the CPU frequency needs to be calculated.
Internal use of linked lists
Source address: / root/Downloads/xhprof/extension/xhprof.c
Two of the most important structures
/* Xhprof's global state. * * This structure is instantiated once. Initialize defaults for attributes in * This structure is initialized only once * hp_init_profiler_state() Cleanup/free attributes in * hp_clean_profiler_state() */ typedef struct hp_global_t { /* ---------- Global attributes: ----------- */ /* Indicates if xhprof is currently enabled Is it currently available */ int enabled; /* Indicates if xhprof was ever enabled during this request Has xhprof been used during this request */ int ever_enabled; /* Holds all the xhprof statistics */ zval *stats_count; /* Indicates the current xhprof mode or level Current operating mode and level*/ int profiler_level; /* Top of the profile stack The first on the stack*/ hp_entry_t *entries; /* freelist of hp_entry_t chunks for reuse... */ hp_entry_t *entry_free_list; /* Callbacks for various xhprof modes Does it represent callbacks of different modes?*/ hp_mode_cb mode_cb; /* ---------- Mode specific attributes: ----------- */ /* Global to track the time of the last sample in time and ticks */ struct timeval last_sample_time; uint64 last_sample_tsc; /* XHPROF_SAMPLING_INTERVAL in ticks */ uint64 sampling_interval_tsc; /* This array is used to store cpu frequencies for all available logical * cpus. For now, we assume the cpu frequencies will not change for power * saving or other reasons. If we need to worry about that in the future, we * can use a periodical timer to re-calculate this arrary every once in a * while (for example, every 1 or 5 seconds). How often does the processor execute?*/ double *cpu_frequencies; /* The number of logical CPUs this machine has. Number of logical CPUs*/ uint32 cpu_num; /* The saved cpu affinity. */ cpu_set_t prev_mask; /* The cpu id current process is bound to. (default 0) id of the processor in which the current process is located*/ uint32 cur_cpu_id; /* XHProf flags */ uint32 xhprof_flags; /* counter table indexed by hash value of function names. Table of the number of calls to a method*/ uint8 func_hash_counters[256]; /* Table of ignored function names and their filter Tables that ignore statistical methods*/ char **ignored_function_names; uint8 ignored_function_filter[XHPROF_IGNORED_FUNCTION_FILTER_SIZE]; } hp_global_t;
typedef struct hp_entry_t { char *name_hprof; /* function name Method name*/ int rlvl_hprof; /* recursion level for function Recursive Hierarchy of Methods*/ uint64 tsc_start; /* start value for TSC counter Initial clock cycle*/ long int mu_start_hprof; /* memory usage Memory usage*/ long int pmu_start_hprof; /* peak memory usage Peak memory usage*/ struct rusage ru_start_hprof; /* user/sys time start */ struct hp_entry_t *prev_hprof; /* ptr to prev entry being profiled Point to the last analyzed pointer*/ uint8 hash_code; /* hash_code for the function name hash corresponding to each method name*/ } hp_entry_t;
Use of XHProf in php
Let's first look at the use of XHProf
<?php // Open xhprof monitoring //xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY); xhprof_enable(); // Execution function, where zend_execute is called test(); // Processing strings into command parsing calls zend_compile_string eval('test();'); $data = xhprof_disable(); include './xhprof_lib/utils/xhprof_lib.php'; include './xhprof_lib/utils/xhprof_runs.php'; $xhprof_runs = new XHProfRuns_Default(); $run_id = $xhprof_runs->save_run($data,'test'); // I print out the visual link address directly here for debugging convenience. echo "<a href='http://192.168.186.136/xhprof_html/index.php?run=".$run_id."&source=test'>test</a>"; function test() { $a = range(0,10000); foreach($a as $item) { // pass } }
The results are as follows: (you can skip the results directly, see below, but remember that there are two values: ct and wt)
array(7) { ["test==>range"]=> array(2) { ["ct"]=> int(2) ["wt"]=> int(4463) } ["main()==>test"]=> array(2) { ["ct"]=> int(1) ["wt"]=> int(3069) } ["main()==>eval::/var/www/html/index2.php(9) : eval()'d code"]=> array(2) { ["ct"]=> int(1) ["wt"]=> int(16) } ["eval==>test"]=> array(2) { ["ct"]=> int(1) ["wt"]=> int(2614) } ["main()==>eval"]=> array(2) { ["ct"]=> int(1) ["wt"]=> int(2617) } ["main()==>xhprof_disable"]=> array(2) { ["ct"]=> int(1) ["wt"]=> int(0) } ["main()"]=> array(2) { ["ct"]=> int(1) ["wt"]=> int(5716) } }
XHProf source code
xhprof_enable()
First, let's look at xhprof_enable(), which defines three parameters to be accepted and passes them to two methods for use, the most important of which is hp_begin().
/** * Start XHProf profiling in hierarchical mode. * * @param long $flags flags for hierarchical mode * @return void * @author kannan */ PHP_FUNCTION(xhprof_enable) { long xhprof_flags = 0; /* XHProf flags */ zval *optional_array = NULL; /* optional array arg: for future use */ /* Get parameters and allow passing an optional parameter of l and z to represent xhprof_flags and optional_array, respectively For TSRMLS_CC, see http://www.laruence.com/2008/08/03/201.html. In addition, whether the processing of the return value failure of zend_parse_parameters is successful or not */ if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "|lz", &xhprof_flags, &optional_array) == FAILURE) { return; } /* Obtaining methods that need to be ignored from parameters Refer to the manual parameter description http://php.net/manual/zh/function.xhprof-enable.php */ hp_get_ignored_functions_from_arg(optional_array); hp_begin(XHPROF_MODE_HIERARCHICAL, xhprof_flags TSRMLS_CC); }
hp_begin()
This method seems very long, but the logic in the world is very simple, mainly with some initialization.
The next four replace ments are used to encapsulate zend's method.
The following four replacement encapsulations are very important, and the specific method functions are described in the code comments below.
zend_compile_file
=>hp_compile_file
zend_compile_string
=>hp_compile_string
zend_execute
=>hp_execute
zend_execute_internal
=>hp_execute_internal
/** * This function gets called once when xhprof gets enabled. * This method is called once when enable * It replaces all the functions like zend_execute, zend_execute_internal, * etc that needs to be instrumented with their corresponding proxies. * Some of the proxy methods he used to replace zend meant that xhprof hijacked the native method * hp_begin(XHPROF_MODE_HIERARCHICAL, xhprof_flags TSRMLS_CC); * * level Grade * xhprof_flags Operation mode */ static void hp_begin(long level, long xhprof_flags TSRMLS_DC) { /* If xhprof is not turned on, that is, the enable method is not called, then the logic to buy here is judged by hp_globals. */ if (!hp_globals.enabled) { int hp_profile_flag = 1; hp_globals.enabled = 1; /* Here, the enbale state is modified to ensure that enable s are triggered only by the first call during the entire request process. */ hp_globals.xhprof_flags = (uint32)xhprof_flags; /* Format into 32-bit unsigned integers */ /* Here are four replace ments to encapsulate zend 1. zend_compile_file => hp_compile_file zend_compile_file It is responsible for compiling the script file to be executed into op codes composed of ZE basic instruction sequence, and then handing op codes to zend_execute for execution, so as to get the result of our script. http://www.laruence.com/2008/08/14/250.html 2. zend_compile_string => hp_compile_string This is the process of compiling php code into opcode http://www.phpchina.com/portal.php?mod=view&aid=40347 3. zend_execute => hp_execute zend_compile_file() zend_compile_file() is the wrapper for the lexer, parser, and code generator. It compiles a file and returns a zend_op_array. zend_execute() After a file is compiled, its zend_op_array is executed by zend_execute(). http://php.find-info.ru/php/016/ch23lev1sec2.html 4. zend_execute_internal => hp_execute_internal There is also a companion zend_execute_internal() function, which executes internal functions. */ /* Replace zend_compile with our proxy It is backed up first by adding underscore, and then replaced by hp_compile_file.*/ _zend_compile_file = zend_compile_file; zend_compile_file = hp_compile_file; /* Replace zend_compile_string with our proxy */ _zend_compile_string = zend_compile_string; zend_compile_string = hp_compile_string; /* Replace zend_execute with our proxy */ #if PHP_VERSION_ID < 50500 _zend_execute = zend_execute; zend_execute = hp_execute; #else _zend_execute_ex = zend_execute_ex; zend_execute_ex = hp_execute_ex; #endif /* Replace zend_execute_internal with our proxy */ _zend_execute_internal = zend_execute_internal; /* XHPROF_FLAGS_NO_BUILTINGS It's used for identification and does not require statistical built-in function performance Bit Operations - Determine whether user-delivered flags contain NO_BUILTINGS In addition, there are three flags 1. HPROF_FLAGS_NO_BUILTINS (integer) Make it possible to skip all built-in (internal) functions. 2. XHPROF_FLAGS_CPU (integer) Make the output performance data add CPU data. 3. XHPROF_FLAGS_MEMORY (integer) Make the output performance data add memory data. */ if (!(hp_globals.xhprof_flags & XHPROF_FLAGS_NO_BUILTINS)) { /* if NO_BUILTINS is not set (i.e. user wants to profile builtins), * then we intercept internal (builtin) function calls. * If it is not set up, it represents the user who wants to analyze the performance of built-in functions, and we intercept built-in method requests. */ zend_execute_internal = hp_execute_internal; } /* Initialize with the dummy mode first Having these dummy callbacks saves * us from checking if any of the callbacks are NULL everywhere. * First, initialize these methods to avoid callback when the method is NULL*/ hp_globals.mode_cb.init_cb = hp_mode_dummy_init_cb; hp_globals.mode_cb.exit_cb = hp_mode_dummy_exit_cb; hp_globals.mode_cb.begin_fn_cb = hp_mode_dummy_beginfn_cb; hp_globals.mode_cb.end_fn_cb = hp_mode_dummy_endfn_cb; /* Register the appropriate callback functions Override just a subset of * all the callbacks is OK. According to different processing modes, simple or detailed*/ switch(level) { /* Usually this mode is used, so let's focus on this mode. */ case XHPROF_MODE_HIERARCHICAL: hp_globals.mode_cb.begin_fn_cb = hp_mode_hier_beginfn_cb; hp_globals.mode_cb.end_fn_cb = hp_mode_hier_endfn_cb; break; case XHPROF_MO Start executing the analysis, see line:1933D: hp_globals.mode_cb.init_cb = hp_mode_sampled_init_cb; hp_globals.mode_cb.begin_fn_cb = hp_mode_sampled_beginfn_cb; hp_globals.mode_cb.end_fn_cb = hp_mode_sampled_endfn_cb; break; } /* one time initializations Initialize the analyzer, internally fix the cpu frequency, initcb, negligible methods*/ hp_init_profiler_state(level TSRMLS_CC); /* start profiling from fictitious main() */ BEGIN_PROFILING(&hp_globals.entries, ROOT_SYMBOL, hp_profile_flag); } }
hp_init_profiler_state()
/** * Initialize profiler state * Initialization of analyzer status * * The level passed in at the beginning here is XHPROF_MODE_HIERARCHICAL * * @author kannan, veeve */ void hp_init_profiler_state(int level TSRMLS_DC) { /* Setup globals */ if (!hp_globals.ever_enabled) { /* If xhprof hasn't been turned on before, initialize this value to 1, and now it's turned on. */ hp_globals.ever_enabled = 1; /* The first setup of the stack is empty */ hp_globals.entries = NULL; } /* Level of analyzer */ hp_globals.profiler_level = (int) level; /* Init stats_count Initialization statistics */ if (hp_globals.stats_count) { /* Release this memory */ zval_dtor(hp_globals.stats_count); /* Notify the garbage collection mechanism to reclaim this memory */ FREE_ZVAL(hp_globals.stats_count); } /* Create a zval variable and initialize it as an array reference http://www.cunmou.com/phpbook/8.3.md */ MAKE_STD_ZVAL(hp_globals.stats_count); array_init(hp_globals.stats_count); /* NOTE(cjiang): some fields such as cpu_frequencies take relatively longer * to initialize, (5 milisecond per logical cpu right now), therefore we * calculate them lazily. Some fields take a long time to initialize, so we have to be lazy to calculate, that is, to calculate later.*/ if (hp_globals.cpu_frequencies == NULL) { get_all_cpu_frequencies(); restore_cpu_affinity(&hp_globals.prev_mask); } /* bind to a random cpu so that we can use rdtsc instruction. Here's a random binding of a cpu*/ bind_to_cpu((int) (rand() % hp_globals.cpu_num)); /* Call current mode's init cb Depending on the pattern, call the initial method, see line:1933*/ hp_globals.mode_cb.init_cb(TSRMLS_C); /* Set up filter of functions which may be ignored during profiling Setting the method to be filtered*/ hp_ignored_functions_filter_init(); }
get_cpu_frequency()
In the above method, a get_all_cpu_frequencies() is called, and it is interesting to call a get_cpu_frequencies () internally, because this method will cause a 5 ms delay if CPU detection is turned on.
/** * This is a microbenchmark to get cpu frequency the process is running on. The * returned value is used to convert TSC counter values to microseconds. * * @return double. * @author cjiang */ static double get_cpu_frequency() { struct timeval start; struct timeval end; /* gettimeofday Get the current time and put it in start */ if (gettimeofday(&start, 0)) { perror("gettimeofday"); return 0.0; } uint64 tsc_start = cycle_timer(); /* Sleep for 5 miliseconds. Comparaing with gettimeofday's few microseconds * execution time, this should be enough. * This is to get the execution frequency of the CPU, using the execution times of the CPU in 5000 microseconds to get the execution frequency of the CPU per second. * TSC Clock cycles recorded since the CPU was started * */ usleep(5000); if (gettimeofday(&end, 0)) { perror("gettimeofday"); return 0.0; } uint64 tsc_end = cycle_timer(); /* The number of clock cycles divided by the number of microsecond time intervals to get the cpu frequency */ return (tsc_end - tsc_start) * 1.0 / (get_us_interval(&start, &end)); }
BEGIN_PROFILING is important!
This is the logic of analysis. His point is to generate a single list.
/* * Start profiling - called just before calling the actual function * Start the analysis and call only before formal method calls * NOTE: PLEASE MAKE SURE TSRMLS_CC IS AVAILABLE IN THE CONTEXT * OF THE FUNCTION WHERE THIS MACRO IS CALLED. * TSRMLS_CC CAN BE MADE AVAILABLE VIA TSRMLS_DC IN THE * CALLING FUNCTION OR BY CALLING TSRMLS_FETCH() * TSRMLS_FETCH() IS RELATIVELY EXPENSIVE. * entries The address passed in here is a pointer to hp_entry_t * This place actually generates a single linked list, all associated with prev_hprof * * Here do while is used to encapsulate macros. Ask Yanchen specifically * */ #define BEGIN_PROFILING(entries, symbol, profile_curr) \ do { \ /* Use a hash code to filter most of the string comparisons. */ \ uint8 hash_code = hp_inline_hash(symbol); \ /* Is this a method that needs to be neglected, if not neglected, then analysis? */ \ profile_curr = !hp_ignore_entry(hash_code, symbol); \ if (profile_curr) { \ /* Returns a pointer (address), opens up a memory space for cur_entry, including hash_code, method name, stack pointer */ \ hp_entry_t *cur_entry = hp_fast_alloc_hprof_entry(); \ (cur_entry)->hash_code = hash_code; \ (cur_entry)->name_hprof = symbol; \ /* Here * entries point to the first address of the pointer hp_global_t.entires stack */ \ (cur_entry)->prev_hprof = (*(entries)); \ /* Call the universal callback*/ \ hp_mode_common_beginfn((entries), (cur_entry) TSRMLS_CC); \ /* Call the mode's beginfn callback This method only sets tsc_Start except cpu and mem */ \ hp_globals.mode_cb.begin_fn_cb((entries), (cur_entry) TSRMLS_CC); \ /* Update entries linked list */ \ (*(entries)) = (cur_entry); \ } \ } while (0)
We can see that the above list calls the hp_globals.mode_cb.begin_fn_cb method during the generation process. We don't consider CPU and memory here, so we find that we set a tsc start clock cycle every other current.
/** * XHPROF_MODE_HIERARCHICAL's begin function callback * * @author kannan */ void hp_mode_hier_beginfn_cb(hp_entry_t **entries, hp_entry_t *current TSRMLS_DC) { /* Get start tsc counter */ current->tsc_start = cycle_timer(); /* Get CPU usage If you want to calculate the cpu*/ if (hp_globals.xhprof_flags & XHPROF_FLAGS_CPU) { getrusage(RUSAGE_SELF, &(current->ru_start_hprof)); } /* Get memory usage If memory is to be computed*/ if (hp_globals.xhprof_flags & XHPROF_FLAGS_MEMORY) { current->mu_start_hprof = zend_memory_usage(0 TSRMLS_CC); current->pmu_start_hprof = zend_memory_peak_usage(0 TSRMLS_CC); } }
hp_execute code execution part
Every time there is code execution, it will go to this place. This code is mainly before and after executing zend_execute. Powder calls BEGIN_PROFILING and END_PROFILING respectively.
#if PHP_VERSION_ID < 50500 ZEND_DLEXPORT void hp_execute (zend_op_array *ops TSRMLS_DC) { #else ZEND_DLEXPORT void hp_execute_ex (zend_execute_data *execute_data TSRMLS_DC) { zend_op_array *ops = execute_data->op_array; #endif char *func = NULL; int hp_profile_flag = 1; func = hp_get_function_name(ops TSRMLS_CC); if (!func) { #if PHP_VERSION_ID < 50500 _zend_execute(ops TSRMLS_CC); #else _zend_execute_ex(execute_data TSRMLS_CC); #endif return; } BEGIN_PROFILING(&hp_globals.entries, func, hp_profile_flag); #if PHP_VERSION_ID < 50500 _zend_execute(ops TSRMLS_CC); #else _zend_execute_ex(execute_data TSRMLS_CC); #endif if (hp_globals.entries) { END_PROFILING(&hp_globals.entries, hp_profile_flag); } efree(func); }
END_PROFILING
hp_globals.mode_cb.end_fn_cb((entries) TSRMLS_CC);
This code eventually points to hp_mode_hier_endfn_cb, which mainly consists of a'=>'data format and calculates the number of calls to each method.
void hp_mode_hier_endfn_cb(hp_entry_t **entries TSRMLS_DC) { /* The last call to the entire stack */ hp_entry_t *top = (*entries); zval *counts; struct rusage ru_end; char symbol[SCRATCH_BUF_LEN]; long int mu_end; long int pmu_end; /* Get the stat array */ hp_get_function_stack(top, 2, symbol, sizeof(symbol)); if (!(counts = hp_mode_shared_endfn_cb(top, symbol TSRMLS_CC))) { return; } if (hp_globals.xhprof_flags & XHPROF_FLAGS_CPU) { /* Get CPU usage */ getrusage(RUSAGE_SELF, &ru_end); /* Bump CPU stats in the counts hashtable */ hp_inc_count(counts, "cpu", (get_us_interval(&(top->ru_start_hprof.ru_utime), &(ru_end.ru_utime)) + get_us_interval(&(top->ru_start_hprof.ru_stime), &(ru_end.ru_stime))) TSRMLS_CC); } if (hp_globals.xhprof_flags & XHPROF_FLAGS_MEMORY) { /* Get Memory usage */ mu_end = zend_memory_usage(0 TSRMLS_CC); pmu_end = zend_memory_peak_usage(0 TSRMLS_CC); /* Bump Memory stats in the counts hashtable */ hp_inc_count(counts, "mu", mu_end - top->mu_start_hprof TSRMLS_CC); hp_inc_count(counts, "pmu", pmu_end - top->pmu_start_hprof TSRMLS_CC); } }
hp_mode_shared_endfn_cb
This method counts the number of calls and the elapsed time, and in fact all the data is eventually stored in the linked list constructed by hp_entry_t.
zval * hp_mode_shared_endfn_cb(hp_entry_t *top, char *symbol TSRMLS_DC) { zval *counts; uint64 tsc_end; /* Get end tsc counter */ tsc_end = cycle_timer(); /* Get the stat array */ if (!(counts = hp_hash_lookup(symbol TSRMLS_CC))) { return (zval *) 0; } /* Bump stats in the counts hashtable */ hp_inc_count(counts, "ct", 1 TSRMLS_CC); hp_inc_count(counts, "wt", get_us_from_tsc(tsc_end - top->tsc_start, hp_globals.cpu_frequencies[hp_globals.cur_cpu_id]) TSRMLS_CC); return counts; }