PHP 7.4: FFI, a new way to extend PHP

PHP 7.4 ships an extension that I think is very useful: PHP FFI (Foreign Function Interface). To quote the description in the PHP FFI RFC:

 

For PHP, FFI opens a way to write PHP extensions and bindings to C libraries in pure PHP.

Yes, FFI lets code written in one language call code written in another directly. For PHP, FFI lets us conveniently call libraries written in C.

In fact, a large number of existing PHP extensions are wrappers around existing C libraries, such as the commonly used mysqli, curl, and gettext. PECL also has many similar extensions.

Traditionally, when we need the capabilities of an existing C library, we have to write a wrapper in C and package it as an extension, which means learning how to write PHP extensions. There are more convenient routes, such as Zephir, but they still come with a learning cost. With FFI, we can call the functions of a C library directly from a PHP script.

C has accumulated a huge number of excellent libraries over its decades of history, and FFI lets us tap into this resource directly.

Back to the point. Today I'll use an example to show how to call libcurl from PHP to fetch the content of a web page. Why libcurl, when PHP already has a curl extension? Well, first, I'm familiar with the libcurl API. Second, precisely because the extension exists, we can compare the two: is FFI as easy to use as the traditional extension?

First of all, let's take the article you are reading as an example. Suppose I need to write some code to fetch its content. With PHP's traditional curl extension, we would write something like this:

<?php

$url = "https://www.laruence.com/2020/03/11/5475.html";
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);

curl_exec($ch);

curl_close($ch);

(Because my website uses HTTPS, there is an extra call to set CURLOPT_SSL_VERIFYPEER.) What about using FFI?

First of all, you need to enable ext/ffi in PHP 7.4. Note that PHP FFI requires libffi 3 or above.

Then we need to tell PHP FFI the prototypes of the functions we want to call. We can use FFI::cdef for this. Its prototype is:

FFI::cdef([string $cdef = "" [, string $lib = null]]): FFI 

In the string $cdef we write C function declarations. FFI parses them to learn the signatures of the functions we want to call in the library named by $lib. In this example we use a few libcurl functions, whose declarations can be found in the libcurl documentation, for example the one for curl_easy_init.

For this example, we write a curl.php that contains all the declarations. The code is as follows:

$libcurl = FFI::cdef(<<<CTYPE
void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);
CTYPE
 , "libcurl.so"
 );

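One thing to note: the documentation gives the return type as CURL *, but since our example never dereferences the handle and only passes it along, we save ourselves the trouble and use void * instead.

There is one more annoyance, though. Constants such as CURLOPT_URL are normally predefined by PHP's curl extension; here we have to define them ourselves, using the values from libcurl's curl.h: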

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;
  
$libcurl = FFI::cdef(<<<CTYPE
void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);
CTYPE
 , "libcurl.so"
 );
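The numeric values are not arbitrary. In curl/curl.h each option number is a type base plus an index: 0 for long values, 10000 for object pointers (strings included), 20000 for function pointers. That is also why curl_easy_setopt can be declared with `...`: the option number tells the library which type to read from the variadic argument. The C sketch below mimics that dispatch. Note that mock_handle, mock_setopt, demo_setopt, the OPT_* names and the example.com URL are hypothetical stand-ins for illustration, not libcurl's own code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Type bases; the values match the CURLOPTTYPE_* bases in curl/curl.h. */
#define OPT_LONG          0
#define OPT_OBJECTPOINT   10000
#define OPT_FUNCTIONPOINT 20000

enum {
    OPT_WRITEDATA      = OPT_OBJECTPOINT + 1,    /* 10001 */
    OPT_URL            = OPT_OBJECTPOINT + 2,    /* 10002 */
    OPT_SSL_VERIFYPEER = OPT_LONG + 64,          /* 64    */
    OPT_WRITEFUNCTION  = OPT_FUNCTIONPOINT + 11  /* 20011 */
};

/* Hypothetical handle: just enough state to record what was set. */
struct mock_handle {
    const char *url;
    long verify_peer;
    void *write_data;
};

/* Reads the variadic argument with the type implied by the option number.
 * Returns 0 on success, 1 for an option this sketch does not handle. */
int mock_setopt(struct mock_handle *h, int option, ...) {
    va_list ap;
    int rc = 0;

    va_start(ap, option);
    if (option < OPT_OBJECTPOINT) {
        long value = va_arg(ap, long);
        if (option == OPT_SSL_VERIFYPEER) h->verify_peer = value;
        else rc = 1;
    } else if (option < OPT_FUNCTIONPOINT) {
        void *ptr = va_arg(ap, void *);
        if (option == OPT_URL) h->url = ptr;
        else if (option == OPT_WRITEDATA) h->write_data = ptr;
        else rc = 1;
    } else {
        rc = 1; /* function-pointer options are left out of this sketch */
    }
    va_end(ap);
    return rc;
}

/* Self-check: returns 0 when every option landed in the right field. */
int demo_setopt(void) {
    struct mock_handle h = { NULL, 1, NULL };
    int sink = 0;
    int rc = 0;

    /* Pointer options: the caller must pass a pointer. */
    rc |= mock_setopt(&h, OPT_URL, (void *)"https://example.com/");
    rc |= mock_setopt(&h, OPT_WRITEDATA, (void *)&sink);
    /* Long options: the caller must pass a long, hence 0L rather than 0. */
    rc |= mock_setopt(&h, OPT_SSL_VERIFYPEER, 0L);
    assert(rc == 0);

    return (strcmp(h.url, "https://example.com/") == 0 &&
            h.verify_peer == 0 && h.write_data == (void *)&sink) ? 0 : 1;
}
```

Calling demo_setopt() from a main function exercises all three options. The design point is that the caller and the library must agree on the argument type purely through the option number, since the compiler cannot check a variadic argument.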

OK, the definition part is done. Now for the actual logic. The whole script looks like this:

<?php
require "curl.php";
  
$url = "https://www.laruence.com/2020/03/11/5475.html";
  
$ch = $libcurl->curl_easy_init();
$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
  
$libcurl->curl_easy_perform($ch);
  
$libcurl->curl_easy_cleanup($ch);

How does it compare with the curl extension? Just as concise, isn't it?

Next, let's make it a little more complicated. Suppose we don't want the result printed directly but returned as a string. With PHP's curl extension, we only need to call curl_setopt to set CURLOPT_RETURNTRANSFER to 1. libcurl itself, however, cannot return a string directly. Instead it offers the WRITEFUNCTION option to install a callback, which libcurl calls whenever data arrives. In fact, PHP's curl extension does the same thing internally.

At present, we can't pass a PHP function directly to libcurl as a callback through FFI, so we have two ways to do it:

1. Use WRITEDATA. By default libcurl uses fwrite as the callback, and through WRITEDATA we can hand libcurl a file handle (a FILE *) so that it writes to that file instead of stdout.

2. Write a simple C function ourselves, bring it in through FFI, and pass it to libcurl.
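To make the first option concrete: the libcurl documentation states that when no write callback is set, the internal default writes each chunk with fwrite to the FILE * given as WRITEDATA, or to stdout if none was given. The C sketch below models that behaviour without libcurl. Note that default_write, deliver_chunk and demo_default_write are hypothetical names for illustration, not libcurl functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as a libcurl write callback. */
typedef size_t (*write_cb)(char *ptr, size_t size, size_t nmemb, void *userdata);

/* Models the default: treat userdata as a FILE * and fwrite the chunk.
 * Returns the number of bytes written. */
size_t default_write(char *ptr, size_t size, size_t nmemb, void *userdata) {
    FILE *out = userdata ? (FILE *)userdata : stdout;
    return fwrite(ptr, size, nmemb, out) * size;
}

/* Hypothetical stand-in for the library delivering one chunk of a response.
 * Returns 0 if the callback accepted all bytes, -1 otherwise. */
int deliver_chunk(write_cb cb, void *userdata, const char *chunk) {
    size_t len = strlen(chunk);
    return cb((char *)chunk, 1, len, userdata) == len ? 0 : -1;
}

/* Sends two chunks into a temporary file, then reads them back into out. */
int demo_default_write(char *out, size_t out_size) {
    FILE *fp;
    size_t n;

    assert(out_size > 0);
    fp = tmpfile();
    if (fp == NULL) {
        return -1;
    }
    if (deliver_chunk(default_write, fp, "hello, ") != 0 ||
        deliver_chunk(default_write, fp, "world") != 0) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    n = fread(out, 1, out_size - 1, fp);
    out[n] = '\0';
    fclose(fp);
    return 0;
}
```

The data arrives in several chunks, and each chunk is appended to the same stream. That is why handing over one FILE * is enough to collect the whole response.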

Let's start with the first approach, for which we need fopen. This time we put the prototypes in a C header file (file.h):

void *fopen(const char *filename, const char *mode);
int fclose(void *fp);

Similarly, we put all the libcurl function declarations in curl.h:

#define FFI_LIB "libcurl.so"
  
void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);

Then we can use FFI::load to load the .h files:

static function load(string $filename): FFI;

But how do we tell FFI which library to load? As shown above, we define an FFI_LIB macro to tell FFI that these functions come from libcurl.so. When we load the .h file with FFI::load, PHP FFI loads libcurl.so automatically.

Then why doesn't fopen need a library to be specified? Because FFI also looks up symbols that are already loaded into the process, and fopen is a standard C library function that is always there.

OK, now the whole code will be:

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;
const CURLOPT_WRITEDATA = 10001;
  
$libc = FFI::load("file.h");
$libcurl = FFI::load("curl.h");
  
$url = "https://www.laruence.com/2020/03/11/5475.html";
$tmpfile = "/tmp/tmpfile.out";
  
$ch = $libcurl->curl_easy_init();
$fp = $libc->fopen($tmpfile, "a");
  
$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEDATA, $fp);
$libcurl->curl_easy_perform($ch);
  
$libcurl->curl_easy_cleanup($ch);
  
$libc->fclose($fp);
  
$ret = file_get_contents($tmpfile);
@unlink($tmpfile);

But this approach goes through a temporary file, which is not elegant. So let's turn to the second approach, for which we need to write a callback function in C and pass it to libcurl:

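#include <stdlib.h>
#include <string.h>
#include "write.h"

/* libcurl calls this once per received chunk; data is the WRITEDATA pointer. */
size_t own_writefunc(void *ptr, size_t size, size_t nmember, void *data) {
    own_write_data *d = (own_write_data*)data;
    size_t total = size * nmember;

    if (d->buf == NULL) {
        /* First chunk: allocate the buffer. */
        d->buf = malloc(total);
        if (d->buf == NULL) {
            return 0; /* returning less than total makes libcurl abort */
        }
        d->size = total;
        memcpy(d->buf, ptr, total);
    } else {
        /* Later chunks: grow the buffer and append. Keep the old pointer
         * until realloc succeeds so it is not lost on failure. */
        void *grown = realloc(d->buf, d->size + total);
        if (grown == NULL) {
            return 0;
        }
        d->buf = grown;
        memcpy((char *)d->buf + d->size, ptr, total);
        d->size += total;
    }

    return total;
}

/* Returns the address of own_writefunc so PHP can hand it to libcurl. */
void * init() {
    return (void *)own_writefunc;
}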

Note the init function here. In the current version of PHP FFI (as of 2020-03-11) we can't obtain a function pointer directly, so we define this function to return the address of own_writefunc.
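A quick way to see what the callback does is to drive it by hand the way libcurl would during a transfer: one call per chunk, always with the same data pointer. The sketch below puts a condensed copy of own_write_data and own_writefunc in one file so that it compiles on its own. This copy assumes the struct starts zeroed and lets realloc handle the first allocation too. Note that feed_chunks and demo_accumulate are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _writedata {
    void *buf;
    size_t size;
} own_write_data;

/* Condensed copy of the callback: appends each chunk to a heap buffer.
 * realloc(NULL, n) acts like malloc(n), so the first chunk needs no
 * special case as long as the struct starts out zeroed. */
size_t own_writefunc(void *ptr, size_t size, size_t nmember, void *data) {
    own_write_data *d = (own_write_data *)data;
    size_t total = size * nmember;
    void *grown = realloc(d->buf, d->size + total);

    if (grown == NULL) {
        return 0;
    }
    d->buf = grown;
    memcpy((char *)d->buf + d->size, ptr, total);
    d->size += total;
    return total;
}

/* Hypothetical driver: calls the callback once per chunk.
 * Returns 0 if every chunk was accepted in full, -1 otherwise. */
int feed_chunks(own_write_data *d, const char **chunks, size_t count) {
    size_t i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(chunks[i]);
        if (own_writefunc((void *)chunks[i], 1, len, d) != len) {
            return -1;
        }
    }
    return 0;
}

/* Feeds three chunks, checks the accumulated bytes, returns the total size. */
size_t demo_accumulate(void) {
    const char *chunks[] = { "<html>", "<body>", "</html>" };
    own_write_data d = { NULL, 0 };
    size_t size;
    int rc = feed_chunks(&d, chunks, 3);

    assert(rc == 0);
    assert(memcmp(d.buf, "<html><body></html>", d.size) == 0);
    size = d.size;
    free(d.buf);
    return size;
}
```

Note that the buffer is not NUL-terminated; only size says how many bytes are valid, so whoever reads the result has to pass the size along with the pointer.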

Finally, we define the header file write.h used above:

#define FFI_LIB "write.so"
  
typedef struct _writedata {
    void *buf;
    size_t size;
} own_write_data;
  
void *init();

Note that FFI_LIB is defined in the header file as well, so the same header can be used both by write.c and by PHP FFI.

Then we compile the write function as a dynamic library:

gcc -O2 -fPIC -shared  -g  write.c -o write.so

Now, the whole code will be:

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;
const CURLOPT_WRITEDATA = 10001;
const CURLOPT_WRITEFUNCTION = 20011;
  
$libcurl = FFI::load("curl.h");
$write = FFI::load("write.h");
  
$url = "https://www.laruence.com/2020/03/11/5475.html";
  
$data = $write->new("own_write_data");
  
$ch = $libcurl->curl_easy_init();
  
$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEDATA, FFI::addr($data));
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEFUNCTION, $write->init());
$libcurl->curl_easy_perform($ch);
  
$libcurl->curl_easy_cleanup($ch);
  
$ret = FFI::string($data->buf, $data->size);

Here we use FFI::new (called as $write->new) to allocate memory for an own_write_data structure:

function FFI::new(mixed $type [, bool $own = true [, bool $persistent = false]]): FFI\CData 

$own indicates whether the memory is owned and managed by PHP. Usually the memory we allocate follows the PHP lifecycle and does not need to be released explicitly, but sometimes you may want to manage it yourself. In that case, set $own to false and call FFI::free to release it when appropriate.

Then we pass $data to libcurl as WRITEDATA, using FFI::addr to get the actual memory address of $data:

static function addr(FFI\CData $cdata): FFI\CData;

Next we pass own_writefunc to libcurl as the write function. Whenever data arrives, libcurl calls our own_writefunc to handle it and passes along the own_write_data we registered as WRITEDATA as the callback's custom argument.

Finally, we use FFI::string to convert a block of memory into a PHP string:

static function FFI::string(FFI\CData $src [, int $size]): string 

All right, let's run it.

However, loading the .so files on every request would be a serious performance problem, so we can use preloading instead. In this mode, we use opcache.preload to load them once when PHP starts:

ffi.enable=1
opcache.preload=ffi_preload.inc

ffi_preload.inc:

<?php
FFI::load("curl.h");
FFI::load("write.h");

But how do we then reference the FFI instances that were preloaded? For that we need to modify the two .h header files and add FFI_SCOPE, for example in curl.h:

#define FFI_LIB "libcurl.so"
#define FFI_SCOPE "libcurl"
  
void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);

Correspondingly, we add an FFI_SCOPE of "write" to write.h, and now our script looks like this:

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;
const CURLOPT_WRITEDATA = 10001;
const CURLOPT_WRITEFUNCTION = 20011;
  
$libcurl = FFI::scope("libcurl");
$write = FFI::scope("write");
  
$url = "https://www.laruence.com/2020/03/11/5475.html";
  
$data = $write->new("own_write_data");
  
$ch = $libcurl->curl_easy_init();
  
$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEDATA, FFI::addr($data));
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEFUNCTION, $write->init());
$libcurl->curl_easy_perform($ch);
  
$libcurl->curl_easy_cleanup($ch);
  
$ret = FFI::string($data->buf, $data->size);

That is, instead of FFI::load, we now use FFI::scope to reference the corresponding functions.

static function scope(string $name): FFI;

Then there is another problem. Although FFI gives us great power, calling C library functions directly is still very risky. We should only let users call functions we have vetted, and this is where ffi.enable=preload comes in. With ffi.enable=preload, only functions in the opcache.preload script can call the FFI API; user-written code cannot call it directly.

Let's change ffi_preload.inc into ffi_safe_preload.inc:

<?php
class CURLOPT {
 const URL = 10002;
 const SSL_VERIFYHOST = 81;
 const SSL_VERIFYPEER = 64;
 const WRITEDATA = 10001;
 const WRITEFUNCTION = 20011;
}
  
FFI::load("curl.h");
FFI::load("write.h");
  
function get_libcurl() : FFI {
 return FFI::scope("libcurl");
}
  
function get_write_data($write) : FFI\CData {
 return $write->new("own_write_data");
}
  
function get_write() : FFI {
 return FFI::scope("write");
}
  
function get_data_addr($data) : FFI\CData {
 return FFI::addr($data);
}
  
function paser_libcurl_ret($data) :string{
 return FFI::string($data->buf, $data->size);
}

In other words, we define every function that calls the FFI API in the preload script, and our example then becomes (ffi_safety.php):

<?php
$libcurl = get_libcurl();
$write =  get_write();
$data = get_write_data($write);
  
$url = "https://www.laruence.com/2020/03/11/5475.html";
  
  
$ch = $libcurl->curl_easy_init();
  
$libcurl->curl_easy_setopt($ch, CURLOPT::URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT::SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT::WRITEDATA, get_data_addr($data));
$libcurl->curl_easy_setopt($ch, CURLOPT::WRITEFUNCTION, $write->init());
$libcurl->curl_easy_perform($ch);
  
$libcurl->curl_easy_cleanup($ch);
  
$ret = paser_libcurl_ret($data); 

In this way, with ffi.enable=preload, we ensure that the FFI API can only be called from the preload script we control; users cannot call it directly. We can then do the necessary safety checks inside these functions, which gives us a certain degree of security.

Well, after this example you should have a deeper understanding of FFI. For a detailed description of the PHP API, see the PHP-FFI Manual. If you are interested, find a C library and give it a try.

For the example of this article, you can download it on my github: FFI example

One last thing: the example only demonstrates the feature, so a lot of error handling has been left out. Add it back when you write your own code. After all, with FFI there are a thousand ways to make PHP segfault, so be careful.

Keywords: PHP curl C SSL

Added by pgudge on Tue, 28 Apr 2020 15:57:51 +0300