Introduction to Linux Capabilities -- basic practical part

This series is divided into three articles.

The previous article introduced the background and basic principles of Linux capabilities. This article shows how to view and set file capabilities through concrete examples.

Linux provides two main toolsets for managing capabilities: libcap and libcap-ng. libcap provides the getcap and setcap commands to view and set the capabilities of a file, respectively, and also provides capsh to view the capabilities of the current shell process. libcap-ng is easier to use: a single command, filecap, both views and sets capabilities.

1. libcap

Installation is simple. On CentOS, for example, you can install it with the following command:

$ yum install -y libcap

To view the capabilities of the current shell process, use the capsh command. The following is the output of capsh run as root on a CentOS system:

$ capsh --print

Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)


  • Current: shows the capability sets of the current shell process (here, effective and permitted). It can contain multiple groups, each written as capability[,capability...]+(e|i|p), where e stands for effective, i for inheritable, and p for permitted. Groups are separated by spaces, for example: Current: = cap_sys_chroot+ep cap_net_bind_service+eip. Likewise, cap_net_bind_service+e cap_net_bind_service+ip is equivalent to cap_net_bind_service+eip.
  • Bounding set: lists only the capabilities in the bounding set and no other set, so no +... suffix is needed at the end of the group.
  • Securebits: per-thread flags that control how capabilities are handled for UID 0. The three flags shown (secure-noroot, secure-no-suid-fixup, secure-keep-caps) are all off here, and "unlocked" means they can still be changed. See capabilities(7) for details.
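
The group notation used in the Current: line can be taken apart with a few lines of shell. The following is only a minimal sketch: split_caps is a hypothetical helper name, and it assumes the simple +flags form shown above (the - and = operators accepted by cap_from_text(3) are not handled).

```shell
# Split a capability string into one "name flags" pair per line.
# Groups without a "+" (such as the leading "=" printed by capsh) are skipped.
split_caps() {
    for clause in $1; do
        case $clause in
            *+*) ;;
            *) continue ;;
        esac
        names=${clause%+*}      # text before the last "+"
        flags=${clause##*+}     # text after the last "+"
        old_ifs=$IFS
        IFS=,
        for name in $names; do
            printf '%s %s\n' "$name" "$flags"
        done
        IFS=$old_ifs
    done
}

split_caps "cap_sys_chroot+ep cap_net_bind_service+eip"
```

Each output line pairs one capability with the flag letters of its group, which makes it easy to check which sets a given capability belongs to.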

The output of this command is limited. For complete information, look at the /proc file system; for the current shell process, that is /proc/$$/status. One important field is NoNewPrivs, which can be viewed with the following command:

$ grep NoNewPrivs /proc/$$/status

NoNewPrivs:    0
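
For scripting, the same field can be extracted as a bare number. This is a small sketch; nnp_value is a hypothetical helper name, and it simply reads status text from standard input, so it works for any PID whose status file is readable.

```shell
# Print the NoNewPrivs value found in status text read from standard input.
nnp_value() {
    awk '$1 == "NoNewPrivs:" { print $2 }'
}

# Query the current shell; any other PID can be substituted for $$.
if [ -r "/proc/$$/status" ]; then
    nnp_value < "/proc/$$/status"
fi
```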

As described in prctl(2), since Linux 4.10 the NoNewPrivs value in /proc/[pid]/status reflects the no_new_privs attribute of the thread. What no_new_privs is deserves a separate explanation.


In general, the execve() system call can grant a newly started program privileges that its parent process does not have. The most common examples are setuid and setgid programs, which change the UID and GID of the process, and files that carry capabilities. This leaves a lot of room for abuse: an attacker who can get such a program executed can raise the privileges of a process and use them for unintended purposes.

To solve this problem, the Linux kernel introduced the no_new_privs attribute in version 3.5 (actually a bit that can be set but not cleared again). It gives a process a safe way to change its execution environment that stays in effect across execve() calls.

  • With no_new_privs set, execve() promises not to grant any privilege that could not have been obtained without the execve() call. Neither the thread nor its children can gain additional privileges: the setuid and setgid bits no longer change the UID or GID, and file capabilities are not added to the permitted set.
  • Once no_new_privs is set on the current thread, it is inherited by children created through fork and clone, preserved across execve, and cannot be cleared.
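
The inheritance described above can be observed outside of Docker as well. The sketch below assumes the setpriv utility from util-linux is installed and uses its --no-new-privs option; show_nnp_inheritance is a hypothetical helper name. It starts a shell with the bit set, lets that shell start another one, and prints the NoNewPrivs field seen at the end of the chain.

```shell
# Run nested shells under setpriv --no-new-privs and print the NoNewPrivs
# field seen by the innermost command, showing that the bit survives
# fork and execve.
show_nnp_inheritance() {
    if ! command -v setpriv >/dev/null 2>&1 || [ ! -r /proc/self/status ]; then
        echo "skipped: setpriv or /proc not available"
        return 0
    fi
    setpriv --no-new-privs sh -c 'sh -c "grep NoNewPrivs /proc/self/status"'
}

show_nnp_inheritance
```

On a kernel that exposes the field (Linux 4.10 or later), the value reported should be 1 even though only the outermost command asked for the bit.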

In Docker, you can enable the no_new_privs attribute with the --security-opt option, for example: docker run --security-opt=no-new-privileges busybox. The following example shows the effect of the no_new_privs attribute.

First, write a small C program that prints the effective user ID of the current process:

$ cat testnnp.c

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
        /* geteuid() returns the effective user ID of the calling process */
        printf("Effective uid: %d\n", (int)geteuid());
        return 0;
}

$ make testnnp
cc     testnnp.c   -o testnnp

Put the executable into a Docker image:

FROM fedora:latest
ADD testnnp /root/testnnp
RUN chmod +s /root/testnnp
ENTRYPOINT /root/testnnp

Build image:

$ docker build -t testnnp .
Step 1 : FROM fedora:latest
 ---> 760a896a323f
Step 2 : ADD testnnp /root/testnnp
 ---> 6c700f277948
Removing intermediate container 0981144fe404
Step 3 : RUN chmod +s /root/testnnp
 ---> Running in c1215bfbe825
 ---> f1f07d05a691
Removing intermediate container c1215bfbe825
Step 4 : ENTRYPOINT /root/testnnp
 ---> Running in 5a4d324d54fa
 ---> 44f767c67e30
Removing intermediate container 5a4d324d54fa
Successfully built 44f767c67e30

Let's do two experiments. First, start the container without enabling no-new-privileges:

$ docker run -it --rm --user=1000  testnnp
Effective uid: 0

The output shows that as long as the SUID bit is set on the executable, the effective user of the process becomes root even though the container runs as an ordinary user (UID=1000).

Next, start the container with no-new-privileges enabled, which prevents an executable with the SUID bit set from changing its UID:

$ docker run -it --rm --user=1000 --security-opt=no-new-privileges testnnp
Effective uid: 1000

As you can see, with the no_new_privs attribute enabled, the effective user ID of the process does not become root even though the SUID bit is set on the executable. In this way, even if code in the image has security flaws, it is kept from escalating its privileges, which limits the damage of an attack.

Kubernetes can also enable no_new_privs, but the logic is a little more complicated. When the allowPrivilegeEscalation field in the SecurityContext of a Pod is set to false (if the field is left unset, no_new_privs is not requested), the no_new_privs attribute is enabled unless one of the following conditions holds:

  • privileged=true is set
  • The CAP_SYS_ADMIN capability is added, i.e. capAdd=CAP_SYS_ADMIN
  • The container runs as root, i.e. UID=0

For example, when privileged=true and allowPrivilegeEscalation=false are both set, the no_new_privs attribute is not enabled. Similarly, setting capAdd=CAP_SYS_ADMIN together with allowPrivilegeEscalation=false does not enable the no_new_privs attribute.

Managing capabilities

You can view the capabilities of a file with getcap, for example:

$ getcap /bin/ping /usr/sbin/arping

/bin/ping = cap_net_admin,cap_net_raw+p
/usr/sbin/arping = cap_net_raw+p

You can also use the -r option to query recursively:

$ getcap -r /usr 2>/dev/null

/usr/bin/ping = cap_net_admin,cap_net_raw+p
/usr/bin/newgidmap = cap_setgid+ep
/usr/bin/newuidmap = cap_setuid+ep
/usr/sbin/arping = cap_net_raw+p
/usr/sbin/clockdiff = cap_net_raw+p
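
Output in this form is easy to post-process. As a sketch, the hypothetical helper effective_only below keeps only the files whose flags include e (effective). It assumes the "path = caps+flags" layout shown above with a single group per line; other libcap releases may print a different layout, in which case the field separator needs adjusting.

```shell
# Read "path = caps+flags" lines on standard input and print the paths
# whose flag letters include "e" (effective).
effective_only() {
    awk -F' = ' '{
        n = split($2, part, "+")
        if (n >= 2 && index(part[n], "e") > 0) print $1
    }'
}

# Sample input copied from the listing above.
effective_only <<'EOF'
/usr/bin/ping = cap_net_admin,cap_net_raw+p
/usr/bin/newgidmap = cap_setgid+ep
/usr/bin/newuidmap = cap_setuid+ep
/usr/sbin/arping = cap_net_raw+p
/usr/sbin/clockdiff = cap_net_raw+p
EOF
```

With real data the function would be used as a filter, for example getcap -r /usr 2>/dev/null | effective_only.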

If you want to view the capabilities of a process, you can directly use getpcaps, followed by the PID of the process:

$ getpcaps 1234

If you want to view the capabilities of a group of related processes (such as nginx), you can do it as follows:

$ getpcaps $(pgrep nginx)

Here you will see that only the master process has capabilities, while the worker processes have none. This is because only the master needs special privileges, such as binding to network ports; the workers only need to respond to requests.
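
The kernel also exposes the capability sets of every process as hexadecimal masks in /proc/PID/status (the CapInh, CapPrm, CapEff and CapBnd fields). capsh --decode=MASK translates such a mask into names; the sketch below does the same in plain shell to show how the encoding works. decode_caps and CAP_NAMES are hypothetical names, and the list covers only the 38 capabilities that appear in the capsh output earlier in this article.

```shell
# Capability names in bit order (bit 0 first), as listed by capsh above.
CAP_NAMES="cap_chown cap_dac_override cap_dac_read_search cap_fowner
cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable
cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw
cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot
cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice
cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease
cap_audit_write cap_audit_control cap_setfcap cap_mac_override
cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read"

# Decode a hex mask (without the 0x prefix) into a comma-separated name list.
decode_caps() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

# Bits 12 and 13 are set in this sample mask.
decode_caps 0000000000003000
```

A real mask can be taken from the status file, for example the second column of the CapEff line in /proc/1234/status.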

setcap can be used to set the capabilities of the file. The syntax is as follows:

$ setcap CAP+set filename

For example, to add the CAP_CHOWN and CAP_DAC_OVERRIDE capabilities to the permitted and effective sets:

$ setcap CAP_CHOWN,CAP_DAC_OVERRIDE+ep file1

If you want to remove the capabilities of a file, you can use the -r option:

$ setcap -r filename
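
When setcap is called from a script, it can help to assemble and check the CAP+set argument before using it. The following is a minimal sketch; cap_clause is a hypothetical helper name, and it only covers the + form with the flag letters e, i and p used in this article.

```shell
# Build a "names+flags" argument for setcap from a flag string and a list
# of capability names. Returns non-zero if the flags are empty or contain
# anything other than e, i or p.
cap_clause() {
    flags=$1
    shift
    case $flags in
        ''|*[!eip]*)
            echo "invalid flags: $flags" >&2
            return 1
            ;;
    esac
    names=""
    for cap in "$@"; do
        names="${names:+$names,}$cap"
    done
    printf '%s+%s\n' "$names" "$flags"
}

cap_clause ep CAP_CHOWN CAP_DAC_OVERRIDE
```

The result can then be passed on, for example setcap "$(cap_clause ep CAP_CHOWN CAP_DAC_OVERRIDE)" file1.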

2. libcap-ng

The installation is also simple. Take CentOS as an example:

$ yum install libcap-ng-utils


libcap-ng uses the filecap command to manage the capabilities of files. There are several points to note:

  • When adding, removing, or viewing capabilities with filecap, capability names are written without the CAP_ prefix (for example, use NET_ADMIN instead of CAP_NET_ADMIN);
  • filecap does not support relative paths, only absolute paths;
  • filecap does not allow you to specify the set of capabilities. Capabilities will only be added to the permitted and effective sets.
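
The first two points are easy to handle in a wrapper script. The sketch below uses two hypothetical helpers: filecap_name strips the cap_ prefix (and lowercases the name to match the form filecap prints), and abs_path turns a relative path into an absolute one.

```shell
# Convert a libcap-style name (CAP_NET_ADMIN or cap_net_admin) to the
# prefix-free form used with filecap (net_admin).
filecap_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/^cap_//'
}

# Turn a path into an absolute one; the containing directory must exist.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    printf '%s/%s\n' "${dir%/}" "$(basename "$1")"
}

filecap_name CAP_NET_ADMIN
```

With these in place, a call could look like filecap "$(abs_path ./mytool)" "$(filecap_name CAP_NET_ADMIN)", where ./mytool is a placeholder for your own binary.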

To view the capabilities of a file:

$ filecap /full/path/to/file

Recursively view the capabilities of all files in a directory:

$ filecap /full/path/to/dir

For example:

$ filecap /usr/bin

file                 capabilities
/usr/bin/newgidmap     setgid
/usr/bin/newuidmap     setuid

Note: filecap only displays files whose capabilities are in both the permitted and effective sets, so ping and arping are not shown here.

Recursively view the capabilities of all files in the whole system:

$ filecap /
# or
$ filecap -a

The syntax for setting the capabilities of a file is as follows:

$ filecap /full/path/to/file cap_name

For example:

$ filecap /usr/bin/tac dac_override

Remove the capabilities of a file:

$ filecap /full/path/to/file none

3. Summary

This article demonstrated how to manage the capabilities of executable files with two tools, and used Docker as an example to show the power of no_new_privs. Where conditions permit, we recommend using capabilities instead of full root permission or the SUID bit.

Added by SOL-ion on Wed, 09 Feb 2022 21:38:08 +0200