Installation of common packages
# yum -y install coreutils glib2 lrzsz mpstat dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc make smartmontools flex bison perl-devel perl-ExtUtils* openldap-devel jadetex openjade bzip2
Configuring OS Kernel Parameters
1. sysctl
Note that some parameters are configured according to memory size (explained)
The meaning is detailed.
"Operating System Kernel Parameters that DBA Can't Know"
# vi /etc/sysctl.conf # add by digoal.zhou fs.aio-max-nr = 1048576 fs.file-max = 76724600 kernel.core_pattern= /data01/corefiles/core_%e_%u_%t_%s.%p # / Data 01/corefiles was built beforehand with 777 privileges. If it is a soft link, the corresponding directory should be changed to 777. kernel.sem = 4096 2147483647 2147483646 512000 # The semaphores, ipcs-l or-u view, need 17 semaphores for each group of 16 processes. kernel.shmall = 107374182 # Increase or decrease restrictions on all shared memory segments (80% of recommended memory) kernel.shmmax = 274877906944 # Maximum single shared memory segment size (recommended half of memory). Version 9.2 has significantly reduced the use of shared memory kernel.shmmni = 819200 # How many shared memory segments can be generated altogether, and at least two shared memory segments per PG database cluster net.core.netdev_max_backlog = 10000 net.core.rmem_default = 262144 # The default setting of the socket receive buffer in bytes. net.core.rmem_max = 4194304 # The maximum receive socket buffer size in bytes net.core.wmem_default = 262144 # The default setting (in bytes) of the socket send buffer. net.core.wmem_max = 4194304 # The maximum send socket buffer size in bytes. net.core.somaxconn = 4096 net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_keepalive_intvl = 20 net.ipv4.tcp_keepalive_probes = 3 net.ipv4.tcp_keepalive_time = 60 net.ipv4.tcp_mem = 8388608 12582912 16777216 net.ipv4.tcp_fin_timeout = 5 net.ipv4.tcp_synack_retries = 2 net.ipv4.tcp_syncookies = 1 # Open SYN Cookies. When SYN waiting queue overflow occurs, cookie s are enabled to prevent a small number of SYN attacks. net.ipv4.tcp_timestamps = 1 # Reduce time_wait net.ipv4.tcp_tw_recycle = 0 # If = 1, the rapid recovery of TIME-WAIT sockets in TCP connections is turned on, but the NAT environment may cause the connection to fail. It is recommended that the server close it. net.ipv4.tcp_tw_reuse = 1 # Open reuse. Allow TIME-WAIT sockets to be reused for new TCP connections net.ipv4.tcp_max_tw_buckets = 262144 net.ipv4.tcp_rmem = 8192 87380 16777216 net.ipv4.tcp_wmem = 8192 65536 16777216 net.nf_conntrack_max = 1200000 net.netfilter.nf_conntrack_max = 1200000 vm.dirty_background_bytes = 409600000 # When the dirty pages of the system reach this value, the pdflush (or other) scheduling process automatically brushes the dirty pages to disk (dirty_expire_centisecs/100) seconds ago. vm.dirty_expire_centisecs = 3000 # Dirty pages older than this will be brushed to disk. 3000 means 30 seconds. vm.dirty_ratio = 95 # If the system process brushes dirty pages too slowly, making the system dirty pages more than 95% of memory, then the user process needs to actively brush out the system dirty pages if it has the operation of writing to disk (such as fsync, fdatasync, etc.). # Effectively prevent user processes from brushing dirty pages. It is very effective in the case of single machine with multiple instances and using CGROUP to restrict single instance IOPS. vm.dirty_writeback_centisecs = 100 # The wake-up interval for a pdflush (or other) backstage dirty page brushing process is 100 for one second. vm.mmap_min_addr = 65536 vm.overcommit_memory = 0 # When allocating memory, a small amount of over malloc is allowed. If set to 1, there is always enough memory, and the test environment with less memory can use 1. vm.overcommit_ratio = 90 # When overcommit_memory = 2, it is used to participate in calculating the allowable allocated memory size. vm.swappiness = 0 # Close switching partitions vm.zone_reclaim_mode = 0 # Disable numa, or in vmlinux. net.ipv4.ip_local_port_range = 40000 65535 # Locally Automated TCP, UDP Port Number Range fs.nr_open=20480000 # The upper limit of file handle allowed to open by a single process # Note the following parameters # vm.extra_free_kbytes = 4096000 # vm.min_free_kbytes = 2097152 # If it is a small memory machine, the above two values are not recommended. # vm.nr_hugepages = 66536 # It is recommended to use large page size / proc / meminfo Huge page size when shared buffer settings exceed 64GB # vm.lowmem_reserve_ratio = 1 1 1 # For memory larger than 64G, it is recommended to set it, otherwise the default value is 256 256 32.
2. Effective configuration
sysctl -p
Configuring OS resource constraints
# vi /etc/security/limits.conf # If nofile exceeds 1048 576, you must first set fs.nr_open of sysctl to a larger value and then continue to set nofile after it takes effect. * soft nofile 1024000 * hard nofile 1024000 * soft nproc unlimited * hard nproc unlimited * soft core unlimited * hard core unlimited * soft memlock unlimited * hard memlock unlimited
It's better to focus on the file contents in the / etc/security/limits.d directory, which overrides the configurations of limits.conf.
For ulimit of existing processes, see / proc/pid/limits, for example
Limit Soft Limit Hard Limit Units Max cpu time unlimited unlimited seconds Max file size unlimited unlimited bytes Max data size unlimited unlimited bytes Max stack size 10485760 unlimited bytes Max core file size 0 unlimited bytes Max resident set unlimited unlimited bytes Max processes 11286 11286 processes Max open files 1024 4096 files Max locked memory 65536 65536 bytes Max address space unlimited unlimited bytes Max file locks unlimited unlimited locks Max pending signals 11286 11286 signals Max msgqueue size 819200 819200 bytes Max nice priority 0 0 Max realtime priority 0 0 Max realtime timeout unlimited unlimited us
If you want to start other processes, it is recommended to exit SHELL and go in again to confirm that the ulimit environment configuration is in effect and restart.
Configuring OS Firewall
(Suggested by business scenario settings, I'll clear it here first)
iptables -F
Configuration example
# Private network segment -A INPUT -s 192.168.0.0/16 -j ACCEPT -A INPUT -s 10.0.0.0/8 -j ACCEPT -A INPUT -s 172.16.0.0/16 -j ACCEPT
selinux
If there is no need for this, it is recommended to disable it.
# vi /etc/sysconfig/selinux SELINUX=disabled SELINUXTYPE=targeted
Turn off unnecessary OS services
chkconfig --list|grep on Turn off unnecessary, for example chkconfig iscsi off
Deployment of file systems
Pay attention to SSD alignment, prolong life and avoid writing enlargement.
parted -s /dev/sda mklabel gpt parted -s /dev/sda mkpart primary 1MiB 100%
Formatting (if you choose ext4)
mkfs.ext4 /dev/sda1 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 -T largefile -L u01
The recommended ext4 mount option
# vi /etc/fstab LABEL=u01 /u01 ext4 defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback 0 0 # mkdir /u01 # mount -a
Why do you need data=writeback?
It is recommended that pg_xlog be placed in a separate IOPS block device with good performance.
Set SSD Disk Scheduling to deadline
If it's not SSD, CFQ is still used, otherwise DEADLINE is recommended.
Temporary settings (such as sda disks)
echo deadline > /sys/block/sda/queue/scheduler
Permanent settings
Editing grub file to modify block device scheduling strategy
vi /boot/grub.conf elevator=deadline
Note that if there are both mechanical disks and SSD s, you can use / etc/rc.local to modify the specified disk to the corresponding scheduling strategy.
Close transparent pages, numa
Add the previous default IO scheduling, as follows
vi /boot/grub.conf elevator=deadline numa=off transparent_hugepage=never
Compiler
It is recommended to use a newer compiler to install gcc 6.2.0 reference
PostgreSQL clang vs gcc Compilation
If installed, it can be distributed to different machines.
cd ~ tar -jxvf gcc6.2.0.tar.bz2 tar -jxvf python2.7.12.tar.bz2 # vi /etc/ld.so.conf /home/digoal/gcc6.2.0/lib /home/digoal/gcc6.2.0/lib64 /home/digoal/python2.7.12/lib # ldconfig
environment variable
# vi ~/env_pg.sh export PS1="$USER@`/bin/hostname -s`-> " export PGPORT=$1 export PGDATA=/$2/digoal/pg_root$PGPORT export LANG=en_US.utf8 export PGHOME=/home/digoal/pgsql9.6 export LD_LIBRARY_PATH=/home/digoal/gcc6.2.0/lib:/home/digoal/gcc6.2.0/lib64:/home/digoal/python2.7.12/lib:$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH export PATH=/home/digoal/gcc6.2.0/bin:/home/digoal/python2.7.12/bin:/home/digoal/cmake3.6.3/bin:$PGHOME/bin:$PATH:. export DATE=`date +"%Y%m%d%H%M"` export MANPATH=$PGHOME/share/man:$MANPATH export PGHOST=$PGDATA export PGUSER=postgres export PGDATABASE=postgres alias rm='rm -i' alias ll='ls -lh' unalias vi
icc, clang
If you want to compile PostgreSQL using ICC or clang, please refer to
Compiling PostgreSQL with intel Compiler icc
PostgreSQL clang vs gcc Compilation
Compile PostgreSQL
Recommended use of NAMED_POSIX_SEMAPHORES
src/backend/port/posix_sema.c create sem : named : mySem = sem_open(semname, O_CREAT | O_EXCL, (mode_t) IPCProtection, (unsigned) 1); unamed : /* * PosixSemaphoreCreate * * Attempt to create a new unnamed semaphore. */ static void PosixSemaphoreCreate(sem_t * sem) { if (sem_init(sem, 1, 1) < 0) elog(FATAL, "sem_init failed: %m"); } remove sem : #ifdef USE_NAMED_POSIX_SEMAPHORES /* Got to use sem_close for named semaphores */ if (sem_close(sem) < 0) elog(LOG, "sem_close failed: %m"); #else /* Got to use sem_destroy for unnamed semaphores */ if (sem_destroy(sem) < 0) elog(LOG, "sem_destroy failed: %m"); #endif
Compile item
. ~/env_pg.sh 1921 u01 cd postgresql-9.6.1 export USE_NAMED_POSIX_SEMAPHORES=1 LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" ./configure --prefix=/home/digoal/pgsql9.6 LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" make world -j 64 LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" make install-world
If you are a development environment and need to debug, it is recommended to compile this way.
cd postgresql-9.6.1 export USE_NAMED_POSIX_SEMAPHORES=1 LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" ./configure --prefix=/home/digoal/pgsql9.6 --enable-cassert LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" make world -j 64 LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" make install-world
Initialize database cluster
pg_xlog is recommended for the best partition in IOPS.
. ~/env_pg.sh 1921 u01 initdb -D $PGDATA -E UTF8 --locale=C -U postgres -X /u02/digoal/pg_xlog$PGPORT
Configure postgresql.conf
Take PostgreSQL 9.6, 512G memory host as an example
Best to the end of the file can be repeated with the end as a valid value. $ vi postgresql.conf listen_addresses = '0.0.0.0' port = 1921 max_connections = 5000 unix_socket_directories = '.' tcp_keepalives_idle = 60 tcp_keepalives_interval = 10 tcp_keepalives_count = 10 shared_buffers = 128GB # 1/4 Host Memory maintenance_work_mem = 2GB # Min (2G, (1/4 host memory)/autovacuum_max_workers) dynamic_shared_memory_type = posix vacuum_cost_delay = 0 bgwriter_delay = 10ms bgwriter_lru_maxpages = 1000 bgwriter_lru_multiplier = 10.0 bgwriter_flush_after = 0 # IO is a good machine, no need to consider smooth scheduling max_worker_processes = 128 max_parallel_workers_per_gather = 0 # If parallel queries are required, set to greater than 1 and do not recommend exceeding host cores-2 old_snapshot_threshold = -1 backend_flush_after = 0 # IO good machine, do not need to consider smooth scheduling, otherwise recommend 128 ~ 256 kB wal_level = replica synchronous_commit = off full_page_writes = on # Block devices that support atomic writing over BLOCK_SIZE can be turned off after alignment. Or the file system that supports cow can be shut down. wal_buffers = 1GB # min( 2047MB, shared_buffers/32 ) = 512MB wal_writer_delay = 10ms wal_writer_flush_after = 0 # IO good machine, do not need to consider smooth scheduling, otherwise recommend 128 ~ 256 kB checkpoint_timeout = 30min # Frequent checkpoints are not recommended, otherwise XLOG will generate a lot of FULL PAGE WRITE(when full_page_writes=on). max_wal_size = 256GB # The recommendation is twice that of SHARED BUFFER min_wal_size = 64GB # max_wal_size/4 checkpoint_completion_target = 0.05 # When the hard disk is good, the checkpoint can be quickly terminated and the consistent state can be quickly reached when the hard disk is restored. Otherwise, recommendations are 0.5-0.9 checkpoint_flush_after = 0 # IO good machine, do not need to consider smooth scheduling, otherwise recommend 128 ~ 256 kB archive_mode = on archive_command = '/bin/date' # Later revisions, such as'test! - f/disk1/digoal/arch/%f & & cp%p/disk1/digoal/arch/%f' max_wal_senders = 8 random_page_cost = 1.3 # IO is a good machine without considering the cost differences between discrete and sequential scans parallel_tuple_cost = 0 parallel_setup_cost = 0 min_parallel_relation_size = 0 effective_cache_size = 300GB # Watch and do, deduct the session connection RSS, shared buffer, autovacuum worker, and the rest is the ACHE available to OS. force_parallel_mode = off log_destination = 'csvlog' logging_collector = on log_truncate_on_rotation = on log_checkpoints = on log_connections = on log_disconnections = on log_error_verbosity = verbose log_timezone = 'PRC' vacuum_defer_cleanup_age = 0 hot_standby_feedback = off # It is recommended to close down in order to avoid the expansion of the main repository caused by the warehouse manager's transaction. max_standby_archive_delay = 300s max_standby_streaming_delay = 300s autovacuum = on log_autovacuum_min_duration = 0 autovacuum_max_workers = 16 # CPU core, and IO good case, can be more, but note that 16*autovacuum mem, will consume more memory, so memory also has a foundation. autovacuum_naptime = 45s # It is recommended that the frequency should not be too high, otherwise there will be more XLOG due to vacuum. autovacuum_vacuum_scale_factor = 0.1 autovacuum_analyze_scale_factor = 0.1 autovacuum_freeze_max_age = 1600000000 autovacuum_multixact_freeze_max_age = 1600000000 vacuum_freeze_table_age = 1500000000 vacuum_multixact_freeze_table_age = 1500000000 datestyle = 'iso, mdy' timezone = 'PRC' lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' default_text_search_config = 'pg_catalog.english' shared_preload_libraries='pg_stat_statements' ## If your database has many very small files (such as hundreds of thousands of tables, indexes, etc.), and each table will be accessed, it is recommended that FD set up more files to avoid the need for the process to open and close files. ## But don't be larger than ulimit-n (open files) set by the system in the previous section. max_files_per_process=655360
Configure pg_hba.conf
Avoid unnecessary access and open access. It is recommended that password access be used.
$ vi pg_hba.conf host replication xx 0.0.0.0/0 md5 # Stream replication host all postgres 0.0.0.0/0 reject # Deny Super User Logon from Network host all all 0.0.0.0/0 md5 # Other users login
Start the database
pg_ctl start