Let’s say you bought a textbook that comes with a code that lets you read its online version.

Of course, that online version is tied to your account, expires in 6 months, and is not compatible with your tablet browser. So what are you gonna do? Hack together a script that takes screenshots of the pages? That’s not a bad idea, but first let’s see if we can get through the e-reader’s DRM.

After logging into the app, I immediately open the dev tools and this is what I see:

So, individual PDF pages are fetched from this getpdfpage endpoint, rendered with a JavaScript library and displayed in your browser every time you flip a page in the app.

This is what is sent to the endpoint:

globalbookid: "<hash>"
pdfpage: "<hash>.pdf"
iscover: "N"
authkey: "<hash>"
hsid: "<hash>"

Obviously, globalbookid is the unique ID of the book I am looking at. pdfpage is the ID of the page; there is probably a way to get a list of those from another endpoint. iscover and authkey are self-explanatory. So what exactly is that hsid parameter? From what I can see, it is different for every request.

Looking further, I find the getpagedetails endpoint, which does exactly what the name suggests:

/getpagedetails
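
The shape of that response can be inferred from how the download script later in this post reads it: a JSON list whose first element holds a pdfPlayerPageInfoTOList array, with each entry carrying pdfPath, pageOrder and bookPageNumber. Here is a minimal sketch of pulling the page list out of a saved response. The sample values are placeholders I made up, and the real response may well contain more fields than these three.

```python
import json

# Hypothetical sample: only the key names come from the real response,
# the values are placeholders.
sample = """[{"pdfPlayerPageInfoTOList": [
    {"pdfPath": "bbb.pdf", "pageOrder": 2, "bookPageNumber": "1"},
    {"pdfPath": "aaa.pdf", "pageOrder": 1, "bookPageNumber": "i"}
]}]"""


def list_pages(raw):
    """Return (pageOrder, bookPageNumber, pdfPath) tuples in reading order."""
    pages = json.loads(raw)[0]["pdfPlayerPageInfoTOList"]
    ordered = sorted(pages, key=lambda p: p["pageOrder"])
    return [(p["pageOrder"], p["bookPageNumber"], p["pdfPath"]) for p in ordered]


for order, number, path in list_pages(sample):
    print(order, number, path)
```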

Okay, so we have our authkey, the list of pdfpages, and we know the globalbookid. Let’s try to dig into the minified JavaScript code to find out how the getpdfpage endpoint is called.

var o = "".concat(e.serverDetails, "/ebook/pdfplayer/getpdfpage?globalbookid=") + "".concat(e.globalBookId, "&pdfpage=").concat(t.pdfPath, "&iscover=N&authkey=").concat(r),
	i = o.replace("https", "http"),
	c = Object(s.c)(l.b.MD5_SECRET_KEY + i);
o = "".concat(o, "&hsid=").concat(c), n.pdfPath = o, a.bookPagesInfo.pages.push(n)

Interesting… So the query URL is built by concatenating the different parameters together as you would expect but then a part of the URL - everything but the mysterious hsid parameter - is put into a hash function and its result is the value of the hsid parameter.

Without even looking at the s.c function, it is becoming more and more obvious that the value of hsid is an MD5 hash of the query URL (with https swapped for http, as the replace call shows), prefixed with l.b.MD5_SECRET_KEY as the salt.
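
To make the scheme concrete, here is a small sketch of the signing step in Python. It assumes that s.c really is a plain MD5 with a hex digest, which is my reading of the minified code rather than something I verified line by line. The function names make_hsid and signed_url are my own, not the app’s.

```python
import hashlib


def make_hsid(secret, url):
    """MD5 of secret + url, with the first 'https' swapped for 'http'."""
    # Mirrors o.replace("https", "http") in the client: a JavaScript string
    # replace with a string pattern only touches the first occurrence.
    to_hash = secret + url.replace("https", "http", 1)
    return hashlib.md5(to_hash.encode()).hexdigest()


def signed_url(secret, server, book_id, pdf_path, authkey):
    """Build the getpdfpage URL and append its hsid."""
    url = (
        f"{server}/ebook/pdfplayer/getpdfpage?globalbookid={book_id}"
        f"&pdfpage={pdf_path}&iscover=N&authkey={authkey}"
    )
    return f"{url}&hsid={make_hsid(secret, url)}"
```

Note that the request itself still goes to the original URL; only the string fed to the hash has its scheme rewritten.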

MD5_SECRET_KEY hidden in plain sight

The secret key was hidden only a few keystrokes away in the source. Now that we have all the puzzle pieces, let’s hack together a simple Python script to automate the download process:

import hashlib
import json

import requests

# MD5_SECRET, URL, BOOKID and AUTHKEY are constants defined earlier in the script


def get_page(page):
    # Unsigned query URL, built the same way the web client builds it
    query = ("%s/ebook/pdfplayer/getpdfpage?globalbookid=%s&pdfpage=%s&iscover=N&authkey=%s"
             % (URL, BOOKID, page["pdfPath"], AUTHKEY))

    # Generate the 'hsid' verification hash; like the client, hash the
    # http:// form of the URL (a no-op if URL already uses http)
    verification = hashlib.md5(
        (MD5_SECRET + query.replace("https", "http", 1)).encode()).hexdigest()

    r = requests.get("%s&hsid=%s" % (query, verification))
    print(r.status_code)

    # Write the raw PDF response to a file
    with open(BOOKID + "_" + str(page["pageOrder"]) + "_" + page["bookPageNumber"] + ".pdf", "wb") as out:
        out.write(r.content)


# To save time, I manually saved the content of /getpagedetails to a file
with open("book.json") as f:
    for page in json.load(f)[0]["pdfPlayerPageInfoTOList"]:
        get_page(page)

To stitch the pages together, I used pdfunite:

pdfunite $(ls -v) output.pdf
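
The -v flag matters here: GNU ls uses it to sort numbers inside file names naturally, so page 10 lands after page 9 instead of after page 1. On a system where ls -v is missing or means something else, the same ordering can be sketched in Python. The file names below are made-up examples following the naming pattern of the download script.

```python
import re


def natural_key(name):
    """Split a name into text and digit runs so digit runs compare as numbers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


# Hypothetical file names in the BOOKID_pageOrder_bookPageNumber.pdf pattern
files = ["book_10_8.pdf", "book_2_ii.pdf", "book_1_i.pdf"]
ordered = sorted(files, key=natural_key)
print(ordered)

# The ordered list could then be handed to pdfunite, for example:
# subprocess.run(["pdfunite", *ordered, "output.pdf"], check=True)
```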

Even if you wanted to, you couldn’t buy a digital version of that book at this quality.