Easy to get wrong: the ten most common mistakes PHP developers make

PHP makes web programming easy, which is a big part of why it became so popular. But that same ease has let PHP grow, over time, into a fairly complex language: there are endless frameworks, and the many language features and version differences can leave us scratching our heads and losing a lot of time to debugging. This article lists ten of the most error-prone areas that deserve our attention.

Common error #1: leaving array references after foreach loops

How does foreach iteration work in PHP? If you want to modify each element of an array while iterating over it, using a reference in the foreach loop is convenient, for example:

$arr = array(1, 2, 3, 4);  
foreach ($arr as &$value) {  
    $value = $value * 2;  
}  
// $arr is now array(2, 4, 6, 8) 

The problem is that, if you're not careful, this can also have some undesirable side effects. In the example above, after the code executes, $value remains in scope and still holds a reference to the last element of the array. Subsequent operations involving $value could therefore unintentionally end up modifying the last element of the array.

Remember that foreach does not create a block scope. Thus, $value in the example above is a reference that lives in the enclosing scope (here, the global scope). On each iteration, foreach makes $value a reference to the next element of $arr. When the loop completes, $value still references the last element of $arr and remains in scope. This behavior can lead to some obscure and confusing bugs. Here is an example:

$array = [1, 2, 3];  
echo implode(',', $array), "\n";  

foreach ($array as &$value) {}    // Traversal by reference  
echo implode(',', $array), "\n";  

foreach ($array as $value) {}     // Traversal by assignment  
echo implode(',', $array), "\n";  

The above code outputs:

1,2,3  
1,2,3  
1,2,2

Yes, you read that right: the last value on the last line is 2, not 3. Why? After the first foreach loop, the array is unchanged, but, as explained above, $value is left as a dangling reference to the last element of the array (because that loop obtained $value by reference). That is what causes the "weird stuff" when the second foreach loop runs. Because the second loop obtains $value by assignment, foreach copies each element of $array into $value in turn:

  • Step 1: copy $array[0] (that is, 1) into $value (which is still a reference to $array[2]), so $array[2] now equals 1. $array now contains [1, 2, 1].
  • Step 2: copy $array[1] (that is, 2) into $value (still a reference to $array[2]), so $array[2] now equals 2. $array now contains [1, 2, 2].
  • Step 3: copy $array[2] (which now equals 2) into $value (a reference to $array[2] itself), so $array[2] still equals 2. $array now contains [1, 2, 2].

To keep the convenience of references in foreach loops without running into this kind of problem, call unset() on the reference variable as soon as the loop finishes. For example:

$arr = array(1, 2, 3, 4);  
foreach ($arr as &$value) {  
    $value = $value * 2;  
}  
unset($value);   // $value no longer references $arr[3]  
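To check the fix end to end, here is a minimal, self-contained sketch that reruns the earlier three-element example with unset() added between the two loops; the variable names are only for illustration.

```php
<?php
$array = [1, 2, 3];

foreach ($array as &$value) {}   // traversal by reference leaves $value bound to $array[2]
unset($value);                   // break the reference before $value is reused

foreach ($array as $value) {}    // traversal by assignment now writes to a fresh $value

echo implode(',', $array), "\n"; // prints 1,2,3
```

Because the reference is removed before the second loop, that loop only overwrites an ordinary variable and the array keeps its original contents.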

Common error #2: misunderstanding the behavior of isset()

Despite its name, isset() returns false not only when an item does not exist, but also when its value is null.

This behavior is more problematic than it might first appear, and it is a common source of bugs. Consider the following code:

$data = fetchRecordFromStorage($storage, $identifier);  
if (!isset($data['keyShouldBeSet'])) {  
    // do something here if 'keyShouldBeSet' is not set  
}  

The developer presumably wanted to check whether keyShouldBeSet was set in $data. But, as discussed, isset($data['keyShouldBeSet']) also returns false if $data['keyShouldBeSet'] exists but is set to null, so the logic above is flawed. Here is another example:

if ($_POST['active']) {  
    $postData = extractSomething($_POST);  
}  

if (!isset($postData)) {  
    echo 'post not active';  
}  

In the above code, it is generally assumed that if $_POST['active'] is truthy, then $postData is necessarily set, so isset($postData) returns true. Conversely, the assumption goes, the only way isset($postData) could return false is if $_POST['active'] was also falsy. That is not the case! As explained, isset($postData) also returns false if $postData is set but holds null, which happens whenever extractSomething() returns null. So isset($postData) can return false even when $_POST['active'] is truthy, and again the logic above is flawed. Incidentally, if the intent of the code above really was to check again whether $_POST['active'] is truthy, relying on isset() for that was a poor decision in any case. It is better to simply recheck $_POST['active']:

if ($_POST['active']) {  
    $postData = extractSomething($_POST);  
}  

if (!$_POST['active']) {  
    echo 'post not active';  
}  

For cases where it is important to check whether a variable was really set (that is, to distinguish between a variable that was never set and one that was set to null), array_key_exists() is a much more robust solution. For example, we can rewrite the first example above as follows:

$data = fetchRecordFromStorage($storage, $identifier);  
if (! array_key_exists('keyShouldBeSet', $data)) {  
    // do this if 'keyShouldBeSet' isn't set  
}  
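The difference is easy to see on a single array whose key exists but holds null. A minimal sketch; the $record array is hypothetical:

```php
<?php
$record = ['keyShouldBeSet' => null];

// isset() treats a null value the same as a missing key...
var_dump(isset($record['keyShouldBeSet']));             // bool(false)
// ...while array_key_exists() only asks whether the key is present.
var_dump(array_key_exists('keyShouldBeSet', $record));  // bool(true)
var_dump(array_key_exists('missingKey', $record));      // bool(false)
```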

In addition, by combining array_key_exists() with get_defined_vars(), we can reliably check whether a variable has been set in the current scope:

if (array_key_exists('varShouldBeSet', get_defined_vars())) {  
    // variable $varShouldBeSet exists in current scope  
}  
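The same distinction applies to plain variables. In the sketch below, the function name checkVariable() is hypothetical; inside a function, get_defined_vars() returns only that function's local variables, including ones assigned null.

```php
<?php
function checkVariable(): array
{
    $varShouldBeSet = null;   // declared, but holding null

    return [
        'isset'      => isset($varShouldBeSet),                                  // false for null
        'key_exists' => array_key_exists('varShouldBeSet', get_defined_vars()),  // true: it exists
    ];
}

var_dump(checkVariable());
```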

Common error #3: confusing returning by reference with returning by value

Consider the following code snippet:

class Config  
{  
    private $values = [];  
    public function getValues() {  
        return $this->values;  
    }  
}  
$config = new Config();  
$config->getValues()['test'] = 'test';  
echo $config->getValues()['test'];  

If you run the above code, you will get the following output:

PHP Notice:  Undefined index: test in /path/to/my/script.php on line 21    

What's wrong? The problem is that the code above confuses returning an array by reference with returning an array by value. Unless you explicitly tell PHP to return an array by reference (i.e., by using &), PHP returns the array by value by default. This means a copy of the array is returned, so the called function and the caller are not working with the same array. So the call to getValues() above returns a copy of the $values array rather than a reference to it. With that in mind, let's revisit the two key lines from the example above:

// getValues() returns a copy of the $values array  
// So the 'test' element is added to the copy, not the $values array itself.  
$config->getValues()['test'] = 'test';  
// getValues() returns another copy of the $values array  
// And this copy does not contain a 'test' element (which is why we get the "undefined index" message).  
echo $config->getValues()['test']; 

One possible fix is to save the first copy of the $values array returned by getValues() and then operate on that copy, for example:

$vals = $config->getValues();  
$vals['test'] = 'test';  
echo $vals['test'];   

This code works (i.e., it outputs test without generating any "undefined index" message), but depending on what you are trying to do, it may not be adequate. In particular, it does not modify the original $values array. If you want your changes (such as adding a test element) to affect the original array, you need to change getValues() to return a reference to the $values array itself. You do this by adding & before the function name, for example:

class Config  
{  
    private $values = [];
    // Returns a reference to the $values array  
    public function &getValues() {  
        return $this->values;  
    }  
}  
$config = new Config();  
$config->getValues()['test'] = 'test';  
echo $config->getValues()['test'];  
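One related subtlety, described in the PHP manual's section on returning references: to keep the result as a reference when you store it in a variable, the caller must also bind it with =&. A plain = assignment still makes a copy, even when the function is declared with &. The RefConfig class below is a hypothetical, trimmed variant of Config used only to show this:

```php
<?php
class RefConfig
{
    private $values = [];

    // Declared with & so it can hand out a reference to $values
    public function &getValues() {
        return $this->values;
    }
}

$config = new RefConfig();

$copy = $config->getValues();     // plain assignment: $copy is an independent copy
$copy['a'] = 1;                    // does not touch the object's $values

$ref = &$config->getValues();     // reference assignment: $ref aliases $values
$ref['b'] = 2;                     // modifies the object's $values

var_dump(array_keys($config->getValues()));   // only 'b' is present
```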

With getValues() now returning a reference, the Config example above outputs test, as expected. But to make things even more confusing, consider the following code snippet:

class Config  
{  
    private $values;  
    // Use array objects instead of arrays  
    public function __construct() {  
        $this->values = new ArrayObject();  
    }  

    public function getValues() {  
        return $this->values;  
    }  
}  

$config = new Config();  
$config->getValues()['test'] = 'test';  
echo $config->getValues()['test'];  

If you think this code will produce the same "undefined index" error as the earlier array example, you are wrong. In fact, it works just fine. The reason is that, unlike arrays, objects are not copied when they are returned or passed around: PHP passes an object handle, so the caller and the method work with the same object (this is often loosely described as objects being "passed by reference"). (ArrayObject is an SPL class that fully mimics array usage but behaves as an object.) As these examples show, whether you are dealing with a copy or a reference is not always obvious in PHP. It is therefore essential to understand these default behaviors (scalars and arrays are passed by value; objects are passed by handle) and to check the API documentation of any function you call to see whether it returns a value, a copy of an array, a reference to an array, or a reference to an object.

That said, returning a reference to an array or an ArrayObject is generally something to avoid, because it lets the caller modify the instance's private data, which breaks the object's encapsulation. A better approach is to use traditional "getters" and "setters", for example:

class Config  
{  
    private $values = [];  
    public function setValue($key, $value) {  
        $this->values[$key] = $value;  
    }  

    public function getValue($key) {  
        return $this->values[$key];  
    }  
}  

$config = new Config();  
$config->setValue('testKey', 'testValue');  
echo $config->getValue('testKey');    // Output "testValue"   

This approach lets the caller set or get any value in the array without giving public access to the otherwise private $values array itself.

Common error #4: performing queries in a loop

It's not uncommon to come across code like the following in a PHP application that isn't performing well:

$models = [];  
foreach ($inputValues as $inputValue) {  
    $models[] = $valueRepository->findByValue($inputValue);  
}    

There may be nothing truly wrong here, but if you follow the logic of the code, you may find that the innocent-looking call to $valueRepository->findByValue() ultimately results in a query of some sort, such as:

$result = $connection->query("SELECT `x`,`y` FROM `values` WHERE `value`=" . $inputValue);  

As a result, each iteration of the loop sends a separate query to the database. So if, for example, you passed an array of 1,000 values to the loop, it would make 1,000 separate queries! If such a script runs in many concurrent requests, it could potentially bring the system to a grinding halt. It is therefore crucial to recognize when your code is making queries and, whenever possible, to gather the values you need and fetch all the results in a single query. One common place to find inefficient querying (i.e., in a loop) is code that requests rows based on a list of values from an array (for example, IDs). To retrieve the full data for each ID, the code loops through the array and issues a separate SQL query for each ID, often like this:

$data = [];  
foreach ($ids as $id) {  
    $result = $connection->query("SELECT `x`, `y` FROM `values` WHERE `id` = " . $id);  
    $data[] = $result->fetch_row();  
}  

However, the same work can be done more efficiently with only one SQL query statement, such as the following:

$data = [];  
if (count($ids)) {  
    $result = $connection->query("SELECT `x`, `y` FROM `values` WHERE `id` IN (" . implode(',', $ids) . ")");  
    while ($row = $result->fetch_row()) {  
        $data[] = $row;  
    }  
}  
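Concatenating the IDs straight into the SQL string is only safe when every ID is known to be an integer. A common alternative is to generate one placeholder per ID and bind the values through a prepared statement. The helper below, buildInQuery(), is a hypothetical sketch that only builds the SQL and the parameter list; how you execute it (for example with PDO::prepare() and PDOStatement::execute()) depends on your database layer.

```php
<?php
/**
 * Hypothetical helper: builds a single "IN (...)" query with one "?" placeholder
 * per ID, so the IDs can be bound as parameters instead of concatenated.
 *
 * @param int[] $ids
 * @return array{0: string, 1: int[]} the SQL string and the values to bind
 */
function buildInQuery(array $ids): array
{
    if ($ids === []) {
        // "IN ()" is not valid SQL, so refuse an empty list
        throw new InvalidArgumentException('At least one ID is required');
    }

    $ids = array_values(array_map('intval', $ids));   // normalize to a 0-indexed list of ints
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));

    $sql = "SELECT `x`, `y` FROM `values` WHERE `id` IN ($placeholders)";
    return [$sql, $ids];
}

[$sql, $params] = buildInQuery([3, 7, 42]);
echo $sql, "\n";
// With PDO, for example: $stmt = $pdo->prepare($sql); $stmt->execute($params);
```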

So be sure to recognize when your code makes queries, directly or indirectly, and, wherever possible, get the results you need with a single query. Even so, be careful, or you may run into the next mistake on our list.

Common error #5: misleading memory usage figures and inefficiencies

Fetching many records at once is certainly more efficient than running a separate query for each row, but when PHP's MySQL extension uses libmysqlclient, such an approach can potentially lead to an "out of memory" condition in libmysqlclient. To demonstrate, let's use a test box with limited resources (512MB RAM), MySQL, and the PHP CLI. We'll bootstrap a database table like this:

// Connect to mysql  
$connection = new mysqli('localhost', 'username', 'password', 'database');  

// Create 400 fields  
$query = 'CREATE TABLE `test`(`id` INT NOT NULL PRIMARY KEY AUTO_INCREMENT';  
for ($col = 0; $col < 400; $col++) {  
    $query .= ", `col$col` CHAR(10) NOT NULL";  
}  
$query .= ');';  
$connection->query($query);  

// Write 2 million rows of data  
for ($row = 0; $row < 2000000; $row++) {  
    $query = "INSERT INTO `test` VALUES ($row";  
    for ($col = 0; $col < 400; $col++) {  
        $query .= ', ' . mt_rand(1000000000, 9999999999);  
    }  
    $query .= ')';  
    $connection->query($query);  
}  

OK, now let's take a look at the memory usage:

// Connect to mysql  
$connection = new mysqli('localhost', 'username', 'password', 'database');  
echo "Before: " . memory_get_peak_usage() . "\n";  

$res = $connection->query('SELECT * FROM `test` LIMIT 1');  
echo "Limit 1: " . memory_get_peak_usage() . "\n";  

$res = $connection->query('SELECT * FROM `test` LIMIT 10000');  
echo "Limit 10000: " . memory_get_peak_usage() . "\n";

The output is:

Before: 224704  
Limit 1: 224704  
Limit 10000: 224704  

Cool. It looks as if, memory-wise, the query is safely managed internally. Just to be sure, though, let's raise the limit tenfold, to 100,000. Doing so gives the following result:

PHP Warning:  mysqli::query(): (HY000/2013): Lost connection to MySQL server during query in /root/test.php on line 11    

What happened? The answer lies in how PHP's MySQL extension works when it is built against libmysqlclient. The extension is really just a proxy for libmysqlclient, which does the heavy lifting. When data is selected, it goes directly into memory, but that memory is managed by libmysqlclient rather than by PHP's memory manager, so memory_get_peak_usage() shows no increase in resource usage as we raise the limit in the query. This lulls us into believing our memory management is fine, as in the demonstration above, when in reality it is seriously flawed and we hit problems like the one shown. You can at least avoid being misled this way (although it will not by itself improve your memory utilization) by using the mysqlnd module instead. mysqlnd is compiled as a native PHP extension and does use PHP's memory manager. So if we run the test above with mysqlnd rather than libmysqlclient, we get a much more realistic picture of memory usage:

Before: 232048  
Limit 1: 324952  
Limit 10000: 32572912      

This is even worse than it looks, by the way. According to the PHP documentation, libmysqlclient uses twice as much memory as mysqlnd to store data, so the original script using libmysqlclient actually used even more memory than shown here (roughly twice as much).

To avoid such problems, consider limiting the size of each query and fetching the rows in smaller batches in a loop, for example:

$totalNumberToFetch = 10000;  
$portionSize = 100;  
for ($i = 0; $i < ceil($totalNumberToFetch / $portionSize); $i++) {  
    $limitFrom = $portionSize * $i;  
    $res = $connection->query("SELECT * FROM `test` LIMIT $limitFrom, $portionSize");  
    // process this batch of rows here before fetching the next one
}  
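Batch offsets are easy to get wrong at the boundaries. Here is a hedged sketch of a hypothetical helper, limitRanges(), that computes the offset/size pair for each batch, including a smaller final batch when the total is not a multiple of the batch size:

```php
<?php
/**
 * Hypothetical helper: splits $total rows into LIMIT offset/size pairs
 * of at most $portionSize rows each.
 *
 * @return array<int, array{0: int, 1: int}>
 */
function limitRanges(int $total, int $portionSize): array
{
    if ($portionSize < 1) {
        throw new InvalidArgumentException('Portion size must be at least 1');
    }

    $ranges = [];
    for ($offset = 0; $offset < $total; $offset += $portionSize) {
        // The last batch may be smaller than $portionSize
        $ranges[] = [$offset, min($portionSize, $total - $offset)];
    }
    return $ranges;
}

foreach (limitRanges(250, 100) as [$offset, $size]) {
    echo "LIMIT $offset, $size\n";   // LIMIT 0, 100 / LIMIT 100, 100 / LIMIT 200, 50
}
```

Each pair can be dropped straight into a "LIMIT offset, size" clause, so no batch overlaps the previous one and none runs past the total.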

Considering this mistake together with Common error #4, we can see that our code needs to strike a healthy balance between making queries too granular and repetitive and making each individual query too large. As with most things in life, balance is needed; either extreme is bad and can keep PHP from working properly.

Common error #6: ignoring Unicode/UTF-8 issues

In a sense, this is really more of an issue in PHP itself than something you would run into while debugging PHP, and it has never been adequately addressed. Native Unicode support was meant to be the core of PHP 6, but that effort was put on hold when PHP 6 was suspended in 2010. That by no means absolves developers from handling UTF-8 properly and from avoiding the mistaken assumption that all strings will necessarily be "plain old ASCII". Code that fails to handle non-ASCII strings properly is notorious for introducing gnarly heisenbugs. Even a simple strlen($_POST['name']) call could cause problems if someone with a last name like "Schrödinger" tries to sign up for your system. Here is a short checklist to avoid such problems:

  • If you don't know much about UTF-8, you should at least learn the basics. Here's a good introduction.
  • Make sure you always use the mb_* functions instead of the old string functions (this requires the "multibyte" extension, mbstring, to be enabled in your PHP build).
  • Make sure your database and tables are set to use Unicode (many builds of MySQL still use latin1 by default).
  • Remember that json_encode() escapes non-ASCII characters (e.g., "Schrödinger" becomes "Schr\u00f6dinger"), but serialize() does not.
  • Make sure your PHP source files are also UTF-8 encoded, to avoid collisions when concatenating strings with hardcoded or configured string constants.
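A short sketch of the byte-versus-character problem and the json_encode() behavior, using the "Schrödinger" example. The \u{00F6} escape (PHP 7.0+) spells "ö" as UTF-8 bytes so the source file's encoding does not matter. mb_strlen() needs the mbstring extension, so the sketch assumes it may be missing and falls back to counting characters with a /u PCRE pattern; JSON_UNESCAPED_UNICODE is a json_encode() flag that keeps such characters as-is.

```php
<?php
$name = "Schr\u{00F6}dinger";   // "Schrödinger", 11 characters

var_dump(strlen($name));        // int(12): counts bytes, and "ö" takes two in UTF-8

// mb_strlen() needs the mbstring extension; fall back to a PCRE character count otherwise
$chars = function_exists('mb_strlen')
    ? mb_strlen($name, 'UTF-8')
    : preg_match_all('/./su', $name);
var_dump($chars);               // int(11): counts characters

var_dump(json_encode($name));                           // "ö" escaped as \u00f6
var_dump(json_encode($name, JSON_UNESCAPED_UNICODE));   // "ö" kept as-is
```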

Common error #7: assuming $_POST will always contain your POST data

Despite its name, the $_POST array won't always contain your POST data and can easily end up empty. To understand why, let's look at an example. Suppose we make a server request with a jQuery.ajax() call as follows:

// js  
$.ajax({  
    url: 'http://my.site/some/path',  
    method: 'post',  
    data: JSON.stringify({a: 'a', b: 'b'}),  
    contentType: 'application/json'  
});   

(Incidentally, note the contentType: 'application/json' here. We are sending data as JSON, which is very popular for APIs; it is the default, for example, for posting with the AngularJS $http service.) On the server side of our example, we simply dump the $_POST array:

// php  
var_dump($_POST);  

Strangely, the results are as follows:

array(0) { }  

Why? What happened to our JSON string {a: 'a', b: 'b'}? The answer is that PHP only parses a POST payload automatically when its content type is application/x-www-form-urlencoded or multipart/form-data. The reasons are historical: these two content types were essentially the only ones in use when PHP's $_POST was implemented. So with any other content type (even ones that are very popular today, like application/json), PHP does not automatically load the POST payload into $_POST. Since $_POST is a superglobal, if we override it once (preferably early in our script), the modified value (i.e., including the POST payload) can then be referenced throughout our code. This matters because $_POST is widely used by PHP frameworks and almost all custom scripts to extract and pass along request data. So, for example, when processing a POST payload with a content type of application/json, we need to manually parse the request body (i.e., decode the JSON data) and override the $_POST variable, as follows:

// php  
$_POST = json_decode(file_get_contents('php://input'), true);  

Then, when we dump the $_POST array, we can see that it correctly contains the POST payload:

array(2) { ["a"]=> string(1) "a" ["b"]=> string(1) "b" }    
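In practice you may want to decode the body only when the request actually declares a JSON content type, and to fall back to an empty array when the body is not valid JSON (json_decode() returns null in that case). A hedged sketch; the function name jsonBodyToArray() is hypothetical, and it takes the Content-Type header and raw body as arguments so it can be tested outside a web server:

```php
<?php
/**
 * Hypothetical helper: decodes a raw request body into an array when the
 * Content-Type is JSON; returns null when the body should be left alone.
 */
function jsonBodyToArray(string $contentType, string $rawBody): ?array
{
    // Ignore parameters such as "; charset=UTF-8" and compare case-insensitively
    $mediaType = strtolower(trim(explode(';', $contentType)[0]));
    if ($mediaType !== 'application/json') {
        return null;   // let PHP's own form parsing stand
    }

    $decoded = json_decode($rawBody, true);
    return is_array($decoded) ? $decoded : [];   // invalid or non-object JSON -> empty array
}

// In a real request handler, for example:
// $parsed = jsonBodyToArray($_SERVER['CONTENT_TYPE'] ?? '', file_get_contents('php://input'));
// if ($parsed !== null) { $_POST = $parsed; }
```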

Common error #8: thinking PHP supports a character data type

Read the following code and think about what it will print:

for ($c = 'a'; $c <= 'z'; $c++) {  
    echo $c . "\n";  
}    

If you answered a through z, you may be surprised to learn that you were wrong. Yes, it does print a through z, but it then also prints aa through yz. Let's see why. PHP has no char data type; only string is available. With that in mind, incrementing the string z in PHP yields aa:

php> $c = 'z'; echo ++$c . "\n";  
aa    

Even more confusingly, aa is lexicographically less than z:

php> var_export((boolean)('aa' < 'z')) . "\n";  
true

That's why the simple loop above prints a through z and then goes on to print aa through yz. It stops when it reaches za, the first value that compares greater than z:

php> var_export((boolean)('za' < 'z')) . "\n";  
false

Here is one proper way to loop through the values a to z in PHP:

for ($i = ord('a'); $i <= ord('z'); $i++) {  
    echo chr($i) . "\n";  
}  

Or this:

$letters = range('a', 'z');  
for ($i = 0; $i < count($letters); $i++) {  
    echo $letters[$i] . "\n";  
}  
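To see the overshoot concretely, the sketch below collects what the string-increment loop actually produces and compares it with range(); it relies only on the behavior described above.

```php
<?php
// Collect the values produced by the original string-increment loop
$fromLoop = [];
for ($c = 'a'; $c <= 'z'; $c++) {
    $fromLoop[] = $c;
}

$fromRange = range('a', 'z');

echo count($fromLoop), "\n";     // 676: 26 single letters + 650 two-letter strings (aa..yz)
echo end($fromLoop), "\n";       // yz
echo count($fromRange), "\n";    // 26
```

The first 26 values match range('a', 'z'); everything after that is the unexpected aa through yz tail.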

Common error #9: ignoring coding standards

Although ignoring coding standards doesn't directly lead to needing to debug PHP code, it is still probably one of the most important things to discuss here. Ignoring coding standards can cause a whole slew of problems on a project. At best, it results in inconsistent code (since every developer is "doing their own thing"). At worst, it produces PHP code that does not work or that is difficult (sometimes almost impossible) to navigate, making it extremely hard to debug, enhance, and maintain. That means reduced productivity for your team, including lots of wasted (or at least unnecessary) effort.

Fortunately for PHP developers, there are the PHP Standards Recommendations (PSR), which include the following standards:

  • PSR-0: autoloading standard (since deprecated in favor of PSR-4)
  • PSR-1: basic coding standard
  • PSR-2: coding style guide (since deprecated in favor of PSR-12)
  • PSR-3: logger interface
  • PSR-4: improved autoloading

PSR was originally created with input from the maintainers of the most recognized platforms on the market. Zend, Drupal, Symfony, Joomla, and others contributed to these standards and now follow them. Even PEAR, which attempted to become a standard years before that, now participates in PSR. In a sense, it almost doesn't matter what your coding standard is, as long as you agree on one and stick to it, but following PSR is generally a good idea unless your project has a compelling reason to do otherwise. More and more teams and projects are adopting PSR, and most PHP developers now agree on it, so using the PSR standards helps ensure that new developers joining your team will find your coding standard familiar and comfortable.

Common error #10: misusing empty()

Some PHP developers like to use empty() for boolean checks on just about everything. There are cases, though, where this can lead to confusion. First, let's come back to arrays and ArrayObject instances (which mimic arrays). Given their similarity, it's easy to assume that arrays and ArrayObject instances behave identically. That, however, proves to be a dangerous assumption. For example, in PHP 5.0 and later:

// PHP 5.0 or later:  
$array = [];  
var_dump(empty($array));        // Output bool(true)  
$array = new ArrayObject();  
var_dump(empty($array));        // Output bool(false)  
// Why don't these two produce the same output? 

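Worse still, the results were different before PHP 5.0, when an object with no properties was itself considered empty. Neither the [] short syntax nor ArrayObject existed yet, so the equivalent check looked like this:

// Before PHP 5.0:  
$array = array();  
var_dump(empty($array));        // Output bool(true)  
$object = new stdClass();  
var_dump(empty($object));       // Output bool(true): an object without properties counted as empty  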

Unfortunately, relying on empty() in this way is quite common. For example, in Zend Framework 2, Zend\Db\TableGateway returns data this way when you call current() on the result of TableGateway::select(), as its documentation suggests, and developers can easily fall victim to this mistake with such data. To avoid these problems, a better way to check for an empty array-like structure is to use count():

// Note that count() gives the same answer for both arrays and ArrayObject:  
$array = [];  
var_dump(count($array));        // Output int(0)  
$array = new ArrayObject();  
var_dump(count($array));        // Output int(0)    

Incidentally, since PHP casts 0 to false, count() can also be used in an if() condition to check for an empty array. It's also worth noting that, for arrays, count() is a constant-time (O(1)) operation in PHP, which makes it even clearer that it is the right choice.

Another example where empty() can be dangerous is when it is combined with the magic class method __get(). Let's define two classes, each with a test property. First, we define a Regular class that includes test as a normal public property:

class Regular  
{  
    public $test = 'value';  
}     

Then we define a Magic class that uses the magic __get() method to access its test property:

class Magic  
{  
    private $values = ['test' => 'value'];  
    public function __get($key)  
    {  
        if (isset($this->values[$key])) {  
            return $this->values[$key];  
        }  
    }  
}  

Now let's try to access the test property in each class and see what happens:

$regular = new Regular();  
var_dump($regular->test);    // Output string(4) "value"  
$magic = new Magic();  
var_dump($magic->test);      // Output string(4) "value"

So far so good. But now when we call empty() on each of them, let's see what happens:

var_dump(empty($regular->test));    // Output bool(false)  
var_dump(empty($magic->test));      // Output bool(true)  

Ugh. So if we rely on empty(), we can be misled into believing that the test property of $magic is empty, when in reality it is set to 'value'. This happens because, for an inaccessible or undefined property, empty() first calls the magic __isset() method, and since Magic does not define one, the property is treated as unset. Unfortunately, if a class uses __get() to retrieve property values and does not also implement __isset(), there is no foolproof way to check from outside the class whether such a property's value is empty. Outside the class's scope, you can really only check whether a null value will be returned, and that doesn't necessarily mean the corresponding key is not set, since it could actually have been set to null. In contrast, if we try to reference a nonexistent property of a Regular class instance, we get a notice similar to the following:

Notice: Undefined property: Regular::$nonExistantTest in /path/to/test.php on line 10  
Call Stack:  0.0012     234704   1. {main}() /path/to/test.php:0   
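If you control the class, the PHP manual documents a way out: isset() and empty() on an inaccessible or undefined property call the magic __isset() method, and only if it returns true does empty() go on to read the value through __get(). A minimal sketch, with MagicWithIsset as a hypothetical variant of the Magic class above:

```php
<?php
class MagicWithIsset
{
    private $values = ['test' => 'value'];

    public function __get($key)
    {
        return $this->values[$key] ?? null;
    }

    // Called by isset() and empty() for properties that are not accessible
    public function __isset($key)
    {
        return isset($this->values[$key]);
    }
}

$magic = new MagicWithIsset();
var_dump(empty($magic->test));     // bool(false): __isset() says it exists, __get() returns 'value'
var_dump(empty($magic->missing));  // bool(true): __isset() returns false
```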

So the main takeaway is that empty() should be used with caution, since, if you're not careful, it can produce confusing, or even misleading, results.

Summary

PHP's ease of use can lull developers into a false sense of comfort, leaving them exposed to lengthy debugging sessions caused by some of the nuances and idiosyncrasies of the language. These can keep PHP from working as expected and lead to problems like those described here. PHP has evolved significantly over its 20-year history. Familiarizing yourself with its subtleties is well worth the effort, as it will help ensure that the software you write is more scalable, robust, and maintainable.

Added by gmann001 on Mon, 01 Nov 2021 09:47:37 +0200