Programmatically Creating a 500 Book Cover Collage

For the past few years, I’ve really gotten back into reading after a long hiatus. I used to be a voracious reader as a kid, but a long the way I got busy (swimming a ridiculous number of hours a week) and I just stopped.

But a bit more than three years ago, during the holiday break of 2015, I picked up a book I had got for Christmas. It really captured me in a way a book hadn’t in years. So when I started my co-op term that January I challenged myself to read 20 books that year as part of an office New Year’s resolution competition.

I ended up reading 70.

Since then reading and books has become such a big part of my life I’m not sure how I really lived during my hiatus.

Part of my success at reading has been: setting reading goals, carving out time for it (you have more than you think) and logging books I’ve read religiously.

I use a popular book cataloguing and social media site called Goodreads to do so. It lets you keep track of whether you’re on track to your yearly goals and also lets me keep track of my to read pile (which as of writing sits at a probably insurmountable number of 2240 books but I digress).

What this did was create a living history of all the books I have been reading in the last few years.

But then I got curious: what about all the books before I started actively cataloguing? What about all the hundreds of books I read as a kid? Could I catalogue them?

This isn’t something that happened overnight. Over the last year, I began to meticously comb through my parent’s houses for books I remember reading. I remembered back to each year of school and tried to name the books we had read for class. I searched through library catalogues to figure out which ones I remembered.

In the end, I’m pretty satisified with what I’ve recorded even if I forgot a few I’m pretty sure I’m close to the true number.

Now my Goodreads is a true life time library with all of my books lined up under one roof.

Playing Around With a Lifetime Library

There’s a number of ways we could play around with this data, so for something fun: could we create a poster of all the book covers I had read?

With some tinkering I figured it out (the result of which you can see above) but here’s a little guide of how you can too!

Steps

There are three components of this once we have your Goodreads library:

  1. Retrieve all the link locations of all the book covers
  2. Download all the images to my computer using this list of links
  3. Make a poster sized collage of these images

There’s a number of ways I could have done this, the most complex but robust way would be to create a Selenium bot to extract the links. A regular scraping method like Beautiful Soup wouldn’t have worked given that viewing a library in Goodreads is different when you’re not logged in and requires keyboard/mouse actions to navigate through. A Selenium scraper would be able to login and manipulate the browser but that seemed like more work than it was worth (although I do love using Selenium).

What I settled for was a bit hacky, and definitely less automated but still worked great.

I went to my bookshelf in my browser, turned on infinite scroll and scrolled to the bottom until all the books (and their covers) were loaded into memory.

Then I opened my developer tools and ran this tiny script in the console:

1
2
3
4
5
6
7
8
var cover_elements = document.querySelectorAll('*[id^="cover_review"]');

var links = [];
for (var i = 1; i < cover_links.length; i++){
links.push(cover_links[i].src);
}

console.log(links)

This printed out each indivudal link to all of the covers of the books in my library. I then just copied this output and placed it into a text file.

(Like I said hacky, but it works).

Downloading the Images

Now that I had a list of files, it was pretty simple to download them using Python and the urllib.request package in the standard library.

1
2
3
4
5
6
7
8
9
10
11
12
import urllib.request as ur 

raw_output = open("links.txt","r+").read()
raw_output = raw_output[1:-1].split('"')

links = [i for i in raw_output if i[:5] == 'https']

for i, link in enumerate(links):
image = ur.urlopen(link)
output = open("images/book" + str(i) + ".jpg","wb")
output.write(image.read())
output.close()

My download took a bit longer since I had over 500 books in my library but it should be decently quick.

Creating a Collage

So I spent some time googling on how to make a collage of hundreds of photos. Obviously most regular online collage services wouldn’t want you uploading hundreds of photos. I did find some really sketchy desktop software that claimed to do it but I didn’t want to go that route. I definitely didn’t want to try it manually. I figured there must be a way to do it programmatically.

With some digging, I found an actually really simple way to do something like this.

Open CV is an open source computer vision library that has a Python interface. Neccessary for many machine vision tasks is loading image data into a matrix, and so it comes with a great little function called imread().

So for example, I could load an image like so:

1
2
image = cv2.imread('images/book15.jpg', 1)
print(image)

The output would be:

1
2
3
4
5
6
7
8
9
10
11
12
[[[201 204 208]
[201 204 208]
[200 203 207]
...
[138 142 143]
[139 141 142]
[137 139 140]]

[[201 204 208]
[201 204 208]
[200 203 207]
...

Knowing that just each photo could be loaded in as a matrix, I could easily stack both horizontally and vertically all of the matrix forms of each image until I got one big collage.

Looping through all of the photos in the folder, loading each into a numpy matrix, then using np.hstack() or np.vstack() I ended up with an awesome poster!

The final result is below:

You can find all my code here.

Final Thoughts

I showed a few friends this, and someone suggested I build a web app for other Goodreads users to make their own posters which I might do eventually (if I figure out a less hacky way to retrieve the cover links).

Another thought I had was that when I hit 1000 books read I should download higher definition book covers and print out a poster to put up on my wall. That would be pretty awesome.

Anyway, you can let me know what you think on Twitter @jtloong or at joshua.t.loong@gmail.com

Comments powered by Talkyard.