nginx proxy cache

Not too long ago, there was an initiative to secure the books and scientific papers of the Library Genesis project. It attracted a lot of new seeders and project contributors. However, I noticed that the daily database dumps were becoming slower and slower to download because of the increased traffic.

43kBit/s from origin server

I decided to contribute some of my bandwidth to the project by creating a mirror of the database dump files. The initial idea was to write a bash script that would periodically download all new dumps, clean up the old ones, and somehow handle connection problems and duplicate files. I quickly realized that this solution could become a hassle to maintain, so I opted for a simpler alternative.

Basic nginx setup

The following configuration is all that is needed to get a cache server up and running.

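# http context: on-disk cache location and limits
proxy_cache_path /files/.cache/
	levels=1:2
	keys_zone=libgen_cache:1m
	max_size=90g inactive=72h use_temp_path=off;

# server context
location / {
	proxy_cache libgen_cache;

	# Ignore the origin's caching headers and keep every response for a week
	proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
	proxy_cache_valid any 168h;
	# Refresh expired entries with conditional requests
	proxy_cache_revalidate on;

	# Tell the client whether the response came from the cache
	add_header X-Cache-Status $upstream_cache_status;
	proxy_pass http://gen.lib.rus.ec/dbdumps/;
}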

The proxy_cache_path statement sets up a cache directory capped at 90 GB. Entries that have not been accessed for more than 72h are periodically purged, regardless of their freshness. See ngx_http_proxy_module for more details about all its options.

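In the location block, we tell nginx to ignore the caching headers sent by the origin server (X-Accel-Expires, Expires and Cache-Control) and to consider all cached items valid for (an arbitrary value of) one week. After 168 hours, a file is considered stale. Because proxy_cache_revalidate is on, nginx then sends a conditional request (If-Modified-Since, and If-None-Match when an ETag is available) and keeps serving the cached copy if the file wasn’t modified on the origin server.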
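One detail that is easy to miss: proxy_cache_path is only accepted in the http context, while the location block lives inside a server block. The fragment below is a minimal sketch of how the two pieces nest. The listen port and the server name mirror.example.org are placeholders, not values from the actual setup, and the remaining location directives are omitted for brevity.

```nginx
http {
	# proxy_cache_path is only valid at the http level
	proxy_cache_path /files/.cache/
		levels=1:2
		keys_zone=libgen_cache:1m
		max_size=90g inactive=72h use_temp_path=off;

	server {
		listen 80;                       # assumption: plain HTTP
		server_name mirror.example.org;  # hypothetical host name

		location / {
			proxy_cache libgen_cache;
			# ... remaining proxy_* directives from the block above ...
			proxy_pass http://gen.lib.rus.ec/dbdumps/;
		}
	}
}
```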

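The $upstream_cache_status variable, exposed here through the X-Cache-Status response header, tells the client whether they’re downloading from the origin server (for example MISS or EXPIRED) or from the cache (HIT or REVALIDATED).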

X-Cache-Status header
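The same variable can also be written to the access log, which makes it possible to see on the server side how often requests are answered from the cache. This is only a sketch: the format name cache_log and the log file path are made up for the example.

```nginx
# http context: "cache_log" is a hypothetical format name
log_format cache_log '$remote_addr [$time_local] "$request" '
                     '$status $body_bytes_sent $upstream_cache_status';

# server context: the log path is an assumption
access_log /var/log/nginx/lgmirror_cache.log cache_log;
```

Each logged request then ends with the cache status (HIT, MISS, EXPIRED, REVALIDATED and so on), so a quick grep over the log gives a rough hit ratio.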

Download speed improvements

The initial download of libgen.rar took 3h12m (~310 kB/s). When I re-downloaded the file immediately after, I was able to saturate my home connection and finish the download in 3 minutes, about 60 times faster!

You can find the cache server at lgmirror.simon987.net.

Connecting to lgmirror.simon987.net (lgmirror.simon987.net)|104.31.86.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3431679578 (3.2G) [application/x-rar-compressed]
Saving to: ‘libgen_2020-02-17.rar’
2020-02-20 18:16:22 (13.9 MB/s) - ‘libgen_2020-02-17.rar’ saved [3431679578/3431679578]

Limitations and workarounds

I noticed that the file listing has symlinks (such as libgen.rar) pointing to the latest database dump. Unfortunately, nginx’s proxy cache keys its entries by request URL, so the symlink and the dated file it points to are cached separately. Both files would need to be pulled from the origin server, even though they are identical.

libgen.rar is symlinked to today's dump

Since I’m not aware of a way to create an HTTP redirect based on the current date, a workaround for now is to force users to use the *_yyyy-mm-dd.rar files.

location /libgen.rar {
	add_header Content-Type "text/plain;charset=UTF-8";
	return 200 'Please download libgen_yyyy-mm-dd.rar instead.\nПожалуйста, скачайте libgen_yyyy-mm-dd.rar.\n';
}
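For what it's worth, a date-based redirect might be possible with the built-in $time_iso8601 variable, used in place of the block above. The following is an untested sketch that rests on two assumptions: the server clock matches the date used in the dump file names, and a dump for the current day already exists on the origin. The second assumption does not always hold (the transcript above fetched a 2020-02-17 dump on 2020-02-20), in which case the redirect would lead to a 404. The variable name $today is made up for the example.

```nginx
location = /libgen.rar {
	# $time_iso8601 looks like 2020-02-20T18:16:22+00:00 (server local time);
	# keep only the yyyy-mm-dd part. The regex is quoted because it contains braces.
	if ($time_iso8601 ~ "^(\d{4}-\d{2}-\d{2})") {
		set $today $1;
	}
	return 302 /libgen_$today.rar;
}
```

Because the dated file name is what ends up being requested, the download would then share a single cache entry with direct requests for the same dump.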

Full configuration