PDF Splitting
One of the things that I find myself needing to do every couple of years is split and recombine PDF files. In this case, I'm playing in a Pathfinder game that starts this week. I own PDFs of the rulebooks, and I wanted to do the following:
- Pull out the sections that I was using from across three books (class, race, skills, some spells, etc.)
- Reorder and combine these into a single, smaller PDF
- Load the combined PDF onto my tablet (allows quicker browsing, eliminates the need to switch between PDFs)
There are lots of ways to split and merge PDFs (according to the internet, anyway), but as it turned out, the PDFs that I had used a "pseudo-DRM" that disallowed page splitting. Thankfully, I finally found a way around all that.
I ended up using The PDF Toolkit (pdftk), which seems like very nice software, except for one thing -- it still respects the OwnerPassword and whatever flag says I can't pull subsets of pages out. However, I found a submitted patch (that wasn't accepted into my Debian distribution) here: https://bugs.launchpad.net/ubuntu/+source/pdftk/+bug/127389.
The actual patch file is located here: https://launchpadlibrarian.net/8541628/pdftk-1.12_user_pw.patch
And in case it gets moved/removed, here is the change:
diff -ur -x '*.o' -x '*.a' -x '*.class' -x tags -x '*.h' -x pdftk pdftk-1.12.orig/java_libs/com/lowagie/text/pdf/PdfReader.java pdftk-1.12/java_libs/com/lowagie/text/pdf/PdfReader.java
--- pdftk-1.12.orig/java_libs/com/lowagie/text/pdf/PdfReader.java 2004-10-23 02:22:44.000000000 +0200
+++ pdftk-1.12/java_libs/com/lowagie/text/pdf/PdfReader.java 2006-03-28 21:15:37.000000000 +0200
@@ -107,7 +107,7 @@
     protected char pdfVersion;
     protected PdfEncryption decrypt;
     protected byte password[] = null; //added by ujihara for decryption
-    protected boolean passwordIsOwner= false; // added by ssteward
+    protected boolean passwordIsOwner= true;
     protected ArrayList strings = new ArrayList();
     protected boolean sharedStreams = true;
     protected boolean consolidateNamedDestinations = false;
Thanks gsauthof, whoever you are! I applied the patch and rebuilt pdftk from source. The build instructions are semi-confusing, so make sure you have gcj installed. It also doesn't hurt to install pdftk from aptitude just to get the prerequisites, and then uninstall only the pdftk package. I don't know how to do that directly, so I uninstalled it, looked at all the other "unused" packages that were removed along with it, and then manually installed them back again.
Grab a page or subset of pages
pdftk Pathfinder\ Roleplaying\ Game\ -\ Core\ Rulebook.pdf cat 25 output Half-Elves.pdf
pdftk Pathfinder\ Roleplaying\ Game\ -\ Advanced\ Players\ Guide.pdf cat 55-65 output summoner.pdf
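Before recombining, it can help to confirm that each extract has the number of pages you expect. A minimal sketch, assuming pdftk's dump_data operation reports a "NumberOfPages:" line (page_count is a helper name made up for this example, not part of pdftk):

```shell
# page_count is a hypothetical helper, not a pdftk feature.
# It prints the page count that "pdftk FILE dump_data" reports for a PDF.
page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ { print $2 }'
}
```

For example, running page_count summoner.pdf after the second command above should print 11, since pages 55-65 are an inclusive range.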
Recombine the pages into a PDF
pdftk summoner.pdf Half-Elves.pdf cat output My-Summoner.pdf
You can actually do these in a single step with some fancy pdftk options, but this was more straightforward to me (and you can double check each extraction, since the page numbers printed in the book do not necessarily match the page positions in the PDF).
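For reference, the single-step version relies on pdftk's input handles: each input file gets a one-letter name, and the page ranges after cat are prefixed with that letter. This is only a sketch of how it could look for the files above. CORE, APG, and build_summoner are names invented for the example, and I have not checked the output against the two-step version.

```shell
# Sketch only: CORE, APG, and build_summoner are illustrative names.
CORE='Pathfinder Roleplaying Game - Core Rulebook.pdf'
APG='Pathfinder Roleplaying Game - Advanced Players Guide.pdf'

build_summoner() {
    # A= and B= attach handles to the two input files. The ranges after
    # "cat" are listed in the order they should appear in the output:
    # pages 55-65 of the Advanced Players Guide first, then page 25 of
    # the Core Rulebook. The output name defaults to My-Summoner.pdf.
    pdftk A="$CORE" B="$APG" cat B55-65 A25 output "${1:-My-Summoner.pdf}"
}
```

Calling build_summoner with no arguments writes My-Summoner.pdf. Passing a filename writes to that file instead.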
Victory!
