Use Varnish to speed up your site

(1). Overview of Varnish

Varnish is a high-performance, open source HTTP accelerator that can effectively reduce the load on the web server and increase access speed. According to the official description, Varnish is a caching HTTP reverse proxy.

Poul-Henning Kamp, the author of Varnish, is one of FreeBSD's core developers. He argues that computers today are much more complex than they were in 1975. Back then there were only two storage media: memory and hard disk. A modern system's memory includes the CPU's L1, L2 and even L3 caches in addition to main memory, and hard disks have caching of their own. Squid's approach of managing object replacement entirely by itself cannot know about these layers or optimize for them, but the operating system can. So this part of the work should be left to the operating system, and that is the design philosophy behind Varnish.

When Varnish is deployed, the handling of web requests changes somewhat. A client's request is first accepted by Varnish, which analyzes it and forwards it to the back-end web server. The back-end web server handles the request as usual and returns the result to Varnish, which passes it back to the client.

Varnish's capabilities are not limited to this. Its core function is to cache the results returned by the back-end web server: if a subsequent identical request arrives, Varnish does not forward it to the web server but returns the result from its cache. This effectively reduces the load on the web server, improves response speed, and allows more requests to be served per second. Another major reason Varnish is fast is that its cache lives entirely in memory, which is much faster than disk. Optimizations like these make Varnish faster than you might expect. However, since memory is limited in real systems, you need to configure a size limit for the cache and avoid caching duplicate content.

The order in which a request is processed: accept the request - analyze the request (URL and headers) - hash calculation - cache lookup - freshness check - fetch from the origin - cache the result - build the response - respond and log.
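In VCL terms, these steps map roughly onto Varnish's built-in subroutines. A minimal sketch of the mapping (illustration only, not a loadable configuration):

```vcl
vcl 4.0;

# Sketch: where each step of the flow above happens in VCL.
sub vcl_recv { }              # accept and analyze the request (URL, headers)
sub vcl_hash { }              # hash calculation used for the cache lookup
sub vcl_hit { }               # object found in cache and still fresh
sub vcl_miss { }              # not cached: fetch from the origin server
sub vcl_backend_response { }  # decide whether and how long to cache the result
sub vcl_deliver { }           # build the response, then respond and log
```

Empty subroutine bodies fall through to Varnish's built-in behavior, which is why the default configuration works with almost nothing in it.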

By default Varnish listens on port 6081 and runs as two processes: a management (master) process and a child (cache) process. Official website: https://www.varnish-cache.org/.

(2). Comparison of Varnish features with Squid

Varnish features:

Based on an in-memory cache, so the data disappears after a restart.
Good I/O performance through the use of virtual memory.
Supports setting precise cache times in the 0-60 second range.
VCL (Varnish Configuration Language, Varnish's own domain-specific language) makes configuration management flexible.
On a 32-bit machine the cache file size is limited to 2 GB.
Has powerful management functions such as top, stat, admin, list, etc.
The state machine is cleverly designed, with a clear structure.
Uses a binary heap to manage cached objects for timely eviction.
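The precise per-object cache time mentioned above is set in VCL. A minimal sketch:

```vcl
sub vcl_backend_response {
    set beresp.ttl = 30s;  # cache this object for exactly 30 seconds
}
```

`beresp.ttl` controls how long the fetched object stays fresh in the cache before Varnish must revalidate or refetch it.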

Varnish vs. Squid:

Similarities:

Both are reverse proxy servers;
Both are open source software.

Varnish has advantages over Squid:

Varnish is more stable: given the same workload, a Squid server fails more often and frequently needs to be restarted.
Varnish serves responses faster. Varnish uses "Visual Page Cache" technology and reads all cached data directly from memory, while Squid reads from the hard disk, so Varnish has faster access speed.
Varnish can support more concurrent connections, because it releases TCP connections faster than Squid and can therefore sustain more TCP connections under high concurrency.
Through its management port, Varnish can use regular expressions to purge parts of the cache in batches, which Squid cannot do.
Squid runs as a single process and uses only one CPU core, whereas Varnish forks multiple processes for handling requests, making reasonable use of all cores.
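The regex-based batch purge works through Varnish's ban mechanism. A hedged VCL sketch (the BAN request method and the X-Ban-Pattern header name are illustrative assumptions, not part of this article's setup):

```vcl
acl bannable { "127.0.0.1"; }  # only allow bans from localhost

sub vcl_recv {
    if (req.method == "BAN") {
        if (client.ip !~ bannable) {
            return (synth(403, "Forbidden"));
        }
        # Invalidate every cached object whose URL matches the regex
        # supplied in the (hypothetical) X-Ban-Pattern request header.
        ban("req.url ~ " + req.http.X-Ban-Pattern);
        return (synth(200, "Ban added"));
    }
}
```

The same thing can be done from the management interface, e.g. `varnishadm ban "req.url ~ ^/images/"`.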

Disadvantages of Varnish over Squid:

Under high concurrency, Varnish's CPU, I/O and memory overhead are higher than Squid's.
Once the Varnish process hangs, crashes or is restarted, the cached data is released from memory entirely and all requests go to the back-end server, which can put great pressure on the back end under high concurrency.
In practice, if requests for a single URL are distributed by HA/F5 load balancing to a different Varnish server each time, each of those servers passes the request through to the back end, and the same request gets cached on multiple servers; this wastes Varnish's cache resources and degrades performance.
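A common mitigation for that last point is to have the load balancer choose the Varnish node by hashing the URL, so the same URL always lands on the same cache. A rough sketch of the idea in shell (the node count and URL are illustrative assumptions):

```shell
#!/bin/sh
# Pick a fixed Varnish node for a given URL by hashing the URL.
url="http://www.you.cn/index.html"
nodes=2                                    # e.g. two Varnish servers behind HA/F5
h=$(printf '%s' "$url" | cksum | cut -d' ' -f1)
node=$((h % nodes))
echo "send $url to varnish node $node"
```

Real balancers implement this as "URL hash" or consistent-hash scheduling; the point is simply that the mapping from URL to node is deterministic.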

(3). Install Varnish

1) Installation environment

youxi1 (192.168.1.6): source package installation

youxi2 (192.168.1.7): yum installation instance, web backend

youxi3 (192.168.1.8): web backend

2) Installation

Install Varnish 6.2.0 from source on youxi1 (recommended)

//Install Dependent Packages
[root@youxi1 ~]# yum -y install make autoconf automake libedit-devel libtool ncurses-devel pcre-devel pkgconfig python3-docutils python3-sphinx graphviz
[root@youxi1 ~]# tar xf varnish-6.2.0.tgz -C /usr/local/src/
[root@youxi1 ~]# cd /usr/local/src/varnish-6.2.0/
[root@youxi1 varnish-6.2.0]# ./configure --prefix=/usr/local/varnish
[root@youxi1 varnish-6.2.0]# make && make install
[root@youxi1 varnish-6.2.0]# echo $?
0
[root@youxi1 varnish-6.2.0]# cd /usr/local/varnish/
[root@youxi1 varnish]# mkdir etc
[root@youxi1 varnish]# cp share/doc/varnish/example.vcl etc/default.vcl //Generate VCL configuration file

Install Varnish with yum on youxi2

[root@youxi2 ~]# vim /etc/yum.repos.d/varnishcache_varnish62.repo
[varnishcache_varnish62]
name=varnishcache_varnish62
baseurl=https://packagecloud.io/varnishcache/varnish62/el/7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/varnishcache/varnish62/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300

[varnishcache_varnish62-source]
name=varnishcache_varnish62-source
baseurl=https://packagecloud.io/varnishcache/varnish62/el/7/SRPMS
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/varnishcache/varnish62/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300
[root@youxi2 ~]# yum clean all && yum list //Clear the yum cache and regenerate it
[root@youxi2 ~]# yum -y install varnish

3) Configure Varnish on youxi1 to cache the site on youxi2

Modify the VCL configuration file on youxi1

[root@youxi1 ~]# vim /usr/local/varnish/etc/default.vcl
backend default {  //Lines 16-19
    .host = "192.168.1.7";  //Change to the IP address of the web back-end site
    .port = "80";  //Change to the port of the web back-end site
}

sub vcl_deliver {  //Starting around line 35; mark whether the cache was hit
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT cache";
    }
    else {
        set resp.http.X-Cache = "Miss cache";
    }
}

Configuring environment variables

[root@youxi1 ~]# vim /etc/profile.d/varnish.sh
export PATH=/usr/local/varnish/bin:/usr/local/varnish/sbin:$PATH
[root@youxi1 ~]# . /etc/profile.d/varnish.sh //load environment variables

Start Varnish

[root@youxi1 ~]# varnishd -a 192.168.1.6:80,HTTP -f /usr/local/varnish/etc/default.vcl
Debug: Version: varnish-6.2.0 revision b14a3d38dbe918ad50d3838b11aa596f42179b54
Debug: Platform: Linux,3.10.0-957.el7.x86_64,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
Debug: Child (18374) Started
[root@youxi1 ~]# ps aux | grep varnishd 
root      18364  0.0  0.0  22188  1532 ?        SLs  22:59   0:00 varnishd -a 192.168.1.6:80,HTTP -f /usr/local/varnish/etc/default.vcl
root      18374  1.8  4.4 1029912 89468 ?       SLl  22:59   0:00 varnishd -a 192.168.1.6:80,HTTP -f /usr/local/varnish/etc/default.vcl
root      18593  0.0  0.0 112724   992 pts/0    S+   23:00   0:00 grep --color=auto varnishd
[root@youxi1 ~]# firewall-cmd --permanent --zone=public --add-port=80/tcp && firewall-cmd --reload
success
success

Build a test Web backend on youxi2

[root@youxi2 ~]# yum -y install httpd
[root@youxi2 ~]# echo youxi2 > /var/www/html/index.html
[root@youxi2 ~]# systemctl start httpd
[root@youxi2 ~]# firewall-cmd --permanent --zone=public --add-port=80/tcp && firewall-cmd --reload
success
success

* Final Test

Then use the curl command for a cache-hit test; the -I option fetches only the HTTP response headers, not the page content.

[root@youxi1 ~]# curl -I 192.168.1.7 //Direct access to youxi2
HTTP/1.1 200 OK
Date: Sun, 04 Aug 2019 15:14:16 GMT
Server: Apache/2.4.6 (CentOS)
Last-Modified: Sun, 04 Aug 2019 14:56:47 GMT
ETag: "7-58f4bccfca680"
Accept-Ranges: bytes
Content-Length: 7
Content-Type: text/html; charset=UTF-8

[root@youxi1 ~]# curl -I 192.168.1.6 //First visit to youxi1
HTTP/1.1 200 OK
Date: Sun, 04 Aug 2019 15:14:19 GMT
Server: Apache/2.4.6 (CentOS)
Last-Modified: Sun, 04 Aug 2019 14:56:47 GMT
ETag: "7-58f4bccfca680"
Content-Length: 7
Content-Type: text/html; charset=UTF-8
X-Varnish: 12
Age: 0
Via: 1.1 varnish (Varnish/6.2)
X-Cache: Miss cache  //This time it was a miss
Accept-Ranges: bytes
Connection: keep-alive

[root@youxi1 ~]# curl -I 192.168.1.6 //Second visit to youxi1
HTTP/1.1 200 OK
Date: Sun, 04 Aug 2019 15:16:39 GMT
Server: Apache/2.4.6 (CentOS)
Last-Modified: Sun, 04 Aug 2019 14:56:47 GMT
ETag: "7-58f4bccfca680"
Content-Length: 7
Content-Type: text/html; charset=UTF-8
X-Varnish: 15 32773
Age: 2
Via: 1.1 varnish (Varnish/6.2)
X-Cache: HIT cache  //This hit cached
Accept-Ranges: bytes
Connection: keep-alive

The cache time here is short; you can try enabling httpd's persistent connections (set KeepAlive On in the configuration file and restart).
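On the back end that is a one-line change (Apache configuration fragment; the file path is the CentOS default):

```apacheconf
# In /etc/httpd/conf/httpd.conf, then: systemctl restart httpd
KeepAlive On
```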

4) Configure Varnish caching on youxi1 for multiple websites (youxi2, youxi3)

Modify the VCL configuration file on youxi1

[root@youxi1 ~]# vim /usr/local/varnish/etc/default.vcl
backend youxi2 {  //Change default to host name
    .host = "192.168.1.7";
    .port = "80";
}
backend youxi3 {  //Create one more
    .host = "192.168.1.8";
    .port = "80";
}

sub vcl_recv {  //Add in vcl_recv
    if (req.http.host ~ "^(www.)?you.cn"){  //regular expression matching
        set req.http.host = "www.you.cn";
        set req.backend_hint = youxi2;  //Point to youxi2 backend
    } elsif (req.http.host ~ "^bbs.you.cn") {  //regular expression matching
        set req.backend_hint = youxi3;  //Point to youxi3 backend
    }
}

To restart Varnish, use the killall command, which requires the psmisc package:

[root@youxi1 ~]# yum -y install psmisc
[root@youxi1 ~]# killall varnishd
[root@youxi1 ~]# varnishd -a 192.168.1.6:80,HTTP -f /usr/local/varnish/etc/default.vcl
Debug: Version: varnish-6.2.0 revision b14a3d38dbe918ad50d3838b11aa596f42179b54
Debug: Platform: Linux,3.10.0-957.el7.x86_64,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
Debug: Child (19017) Started

Build a test Web backend on youxi3

[root@youxi3 ~]# yum -y install httpd
[root@youxi3 ~]# echo youxi3 > /var/www/html/index.html
[root@youxi3 ~]# systemctl start httpd
[root@youxi3 ~]# firewall-cmd --permanent --zone=public --add-port=80/tcp && firewall-cmd --reload
success
success

Edit the /etc/hosts file on youxi1

[root@youxi1 ~]# vim /etc/hosts
192.168.1.6 www.you.cn
192.168.1.6 bbs.you.cn

* Testing

[root@youxi1 ~]# curl www.you.cn //First visit; you can see it points to youxi2
youxi2
[root@youxi1 ~]# curl -I www.you.cn //Second visit, headers only; you can see the cache was hit
HTTP/1.1 200 OK
Date: Sun, 04 Aug 2019 16:09:19 GMT
Server: Apache/2.4.6 (CentOS)
Last-Modified: Sun, 04 Aug 2019 14:56:47 GMT
ETag: "7-58f4bccfca680"
Content-Length: 7
Content-Type: text/html; charset=UTF-8
X-Varnish: 5 32772
Age: 12
Via: 1.1 varnish (Varnish/6.2)
X-Cache: HIT cache  //Hit Cache
Accept-Ranges: bytes
Connection: keep-alive

[root@youxi1 ~]# curl bbs.you.cn //First visit; you can see it points to youxi3
youxi3
[root@youxi1 ~]# curl -I bbs.you.cn //Second visit, headers only; you can see the cache was hit
HTTP/1.1 200 OK
Date: Sun, 04 Aug 2019 16:09:49 GMT
Server: Apache/2.4.6 (CentOS)
Last-Modified: Sun, 04 Aug 2019 16:07:43 GMT
ETag: "7-58f4ccaa0e583"
Content-Length: 7
Content-Type: text/html; charset=UTF-8
X-Varnish: 32774 8
Age: 6
Via: 1.1 varnish (Varnish/6.2)
X-Cache: HIT cache
Accept-Ranges: bytes
Connection: keep-alive

(4). Extension

1) Why use caching:

Data that has been accessed tends to be accessed again, and hot data is accessed many times.

After data has been accessed once, subsequent accesses can be served from the location closest to the user.

2) Since a cache exists to be read at high speed, the best place to keep it is entirely in memory.

Common in-memory databases: memcached, Redis, HANA.

But for pages it is unrealistic to keep everything in memory, so the cache is stored in memory plus high-speed disk.

Key-value: the key is kept in memory, the value on disk.

3) One form of the data: key-value.

Key: the result of hashing the access path, URL or other specific features; stored in memory.

Value: the pages themselves, what users actually get; usually stored on high-speed disks.
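The idea of the key can be illustrated by hashing host and path together (sha256 here is just for illustration; Varnish uses its own hash over the URL and, by default, the Host header):

```shell
#!/bin/sh
# Hash host + path into a fixed-size cache key (illustrative sketch).
host="www.you.cn"
path="/index.html"
key=$(printf '%s%s' "$host" "$path" | sha256sum | cut -d' ' -f1)
echo "key=$key"
```

Whatever the length of the URL, the key is a fixed-size value that is cheap to store and compare in memory.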

4) Anything cache-related involves two components: memory and a high-speed hard disk.

5) Common terms:

Hit: the data can be retrieved from the cache. For a web site, the cache server acts as a front-end server.

Hit ratio: hits / (hits + misses).
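For example, with made-up counters (950 hits, 50 misses):

```shell
#!/bin/sh
# Hit ratio = hits / (hits + misses); the numbers are made up for illustration.
hits=950
misses=50
awk -v h="$hits" -v m="$misses" 'BEGIN { printf "hit ratio: %.2f\n", h / (h + m) }'
# prints: hit ratio: 0.95
```

In a running setup, varnishstat exposes cache_hit and cache_miss counters from which the same ratio can be computed.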

Hot data: Frequently accessed data.

Cache space: memory cache space and disk cache space.

Cleanup: periodic cleanup; LRU (least recently used: the oldest, least-used data is deleted first); periodic update (purge).

Cached objects: user information, cookies, transaction information, page content; all can be understood as objects.



Added by Jezz on Sun, 04 Aug 2019 19:42:02 +0300