Caching is one of the most important functions performed by proxy servers, particularly in a corporate environment. It is especially relevant when the network provides internet connectivity to the desktop, where caching helps reduce the amount of traffic generated by accessing the web.
If you look at the logs of any corporate network and analyse which external websites are being visited, you’ll normally find that a large percentage of traffic goes to a small number of sites. News and social media sites, if not blocked, will often be accessed repeatedly, which means multiple requests for the same information. Using a proxy server to cache these pages locally can greatly reduce the amount of network traffic generated by these requests.
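As a rough illustration of that kind of log analysis, the sketch below counts requests per host and reports what share of all requests goes to the busiest sites. The function name `top_site_share` and the sample URLs are hypothetical, and a real proxy log would need parsing into URLs first; this only shows the counting step.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_site_share(urls, top_n=3):
    """Return the top_n busiest hosts and the fraction of requests they receive."""
    hosts = Counter(urlsplit(u).hostname for u in urls)
    total = sum(hosts.values())
    top = hosts.most_common(top_n)
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share


# Hypothetical sample of requested URLs, standing in for a real proxy log.
sample_log = [
    "https://www.bbc.co.uk/news",
    "https://www.bbc.co.uk/news",
    "https://www.bbc.co.uk/sport",
    "https://example.com/",
    "https://example.org/page",
]

top, share = top_site_share(sample_log, top_n=1)
print(top, share)
```

If a handful of hosts account for most of the requests, those are the pages where caching is likely to save the most bandwidth.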
For example, in the UK you may find that a popular website like the BBC is generating hundreds of requests for its news pages. If you enable on-demand caching on a proxy server, the proxy stores a copy of a page locally the first time it is requested. When the proxy receives the next request for the same page, it serves the cached copy from its store and does not need to contact the web server. This means that repeat requests generate no external traffic in this example, and the amount of external bandwidth used is heavily reduced.
This is called on-demand caching, and it means that the proxy only stores documents which are requested by a client. The proxy will not attempt to store other pages from that web server, only those which are specifically requested by the client browser. Because these requests pass through the proxy, it can also help you filter traffic which is not appropriate, for example if someone was using a VPN to stream Netflix to their desktop.
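The on-demand behaviour can be sketched in a few lines. This is a minimal illustration, not how any particular proxy product is implemented: the class name `OnDemandCache` and the `fake_origin` function are hypothetical, and the sketch ignores expiry and cache-control rules that a real proxy would have to honour.

```python
class OnDemandCache:
    """Stores a page only after a client has asked for it."""

    def __init__(self, fetch):
        self._fetch = fetch          # function that retrieves a page from the origin
        self._store = {}             # url -> cached page body
        self.origin_requests = 0     # how many times we had to go external

    def get(self, url):
        if url not in self._store:
            # Cache miss: fetch from the origin once and keep a local copy.
            self.origin_requests += 1
            self._store[url] = self._fetch(url)
        # Cache hit (or freshly stored copy): serve from the local store.
        return self._store[url]


def fake_origin(url):
    # Hypothetical stand-in for a real HTTP request to the web server.
    return f"<html>content of {url}</html>"


cache = OnDemandCache(fake_origin)
for _ in range(100):
    page = cache.get("https://www.bbc.co.uk/news")

print(cache.origin_requests)
```

A hundred client requests for the same page result in a single external request; pages nobody asks for are never stored.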
In bigger organisations, although proxies configured with caching can dramatically decrease network traffic, one proxy is rarely enough. However, it makes little sense to have duplicate proxies all independently caching the same external pages. The question then is how to distribute this data efficiently within the network and stop any individual proxy from being overloaded. One of the most common models used in this scenario is the replication model, in which a server mirrors or replicates its content to other servers in the network.
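One possible reading of the replication model is sketched below: a primary proxy fetches a page once and pushes a copy to each replica, so that clients can be served by any of them. The class names `PrimaryProxy` and `ReplicaProxy` and the push-on-fetch strategy are assumptions made for illustration; real deployments may replicate on a schedule or use a different topology.

```python
class ReplicaProxy:
    """Holds mirrored content pushed to it by the primary."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def receive(self, url, body):
        self.store[url] = body

    def get(self, url):
        # Returns None if the page has not been replicated yet.
        return self.store.get(url)


class PrimaryProxy:
    """Fetches from the origin once, then mirrors the page to every replica."""

    def __init__(self, fetch, replicas):
        self._fetch = fetch
        self._replicas = replicas
        self.store = {}
        self.origin_requests = 0

    def get(self, url):
        if url not in self.store:
            self.origin_requests += 1
            body = self._fetch(url)
            self.store[url] = body
            for replica in self._replicas:
                replica.receive(url, body)   # replicate the new content
        return self.store[url]


def fake_origin(url):
    # Hypothetical stand-in for a real HTTP request to the web server.
    return f"<html>content of {url}</html>"


replicas = [ReplicaProxy("floor-1"), ReplicaProxy("floor-2")]
primary = PrimaryProxy(fake_origin, replicas)

primary.get("https://www.bbc.co.uk/news")
served = [r.get("https://www.bbc.co.uk/news") for r in replicas]
print(primary.origin_requests, all(s is not None for s in served))
```

Here the external page is fetched once, and client load can then be spread across the replicas without each of them contacting the origin separately.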
John Soames, Working Netflix VPN, Cromer Press, 2015