Searchable PDF annotations: Automating conversion to Skim notes

Many academics now read journal articles on screen, as PDF files. Holdouts—and they are everywhere—print out forest-sized stacks of paper that teeter on crowded desks.

Freedom from clutter is just one advantage of digital reading. Another is searchability: a large database of articles is a crutch for our fallible memories. Thanks to Spotlight, a full-text keyword query can substitute for a laborious hunt for literature. A few carefully phrased searches are all it takes for instant recall.

Spotlight has a maddening flaw, however. Standard PDF annotations—both “sticky notes” and text “typed” on a page—are not indexed. As far as Spotlight is concerned, they do not exist.

For me this is no trivial problem. More often than not, these annotations contain the very text I’m searching for.

Papers, my reference manager of choice, has a partial solution. The software has its own annotation feature built-in. Though sticky notes created in Papers don’t get indexed by Spotlight, they are searchable with Papers’ own search function. Still, I often want to search PDF notes that aren’t annotated in Papers. I’m also wary of relying too heavily on Papers’ annotations since the feature is still a little buggy.

Thankfully there’s a better solution: Skim, the open source PDF software for Mac. Skim is designed for academics, integrates with LaTeX and BibDesk, and has great AppleScript support.

Skim is also Spotlight-friendly. The app stores its annotations in a separate file (with a .skim extension) that Spotlight happily indexes. And it’s easy to convert the Adobe-style annotations used by Preview, PDFpen and most other PDF software into Skim notes. Select “Convert Notes…” under the File menu, or assign a keyboard shortcut.

I now designate Skim as my default PDF editor in Papers. (In the “Paper” menu, scroll down to the “Open PDF with…” option. If you want Papers to default to Skim, rather than open a new tab in Papers, go to Preferences, select the “Papers” pane, and select Skim in the “Open PDF Files” drop-down.)

My problem was mostly solved, but what about all those PDFs already annotated using the standard tools? And then there were the PDFs annotated on the iPad. All of the best PDF apps on iOS—including iAnnotate, GoodReader, PDFpen for iPad, and (my favorite) PDF Expert—use standard, Adobe-style annotation tools. It takes too much time to manually convert each and every old or iPad-derived PDF to Skim formatting.

There’s an obvious need here for automation, but unfortunately no simple solution. The problem is that there’s no easy way to automatically detect that a PDF contains annotations (and therefore needs converting to Skim). None of the PDF metadata registers that a PDF has been annotated, so there’s no way to trigger an automated conversion.

Be warned that the solution I hit upon is fiddly. It works, but the setup is only worth the trouble if you have a large batch of PDFs to convert or if you frequently take notes on your iPad. The solution involves Hazel, the indispensable Mac automation tool.

A post on Hazel’s forums (by AppleSuperlatives) suggested using grep, the Unix plain-text search utility, for a related issue (detecting whether a PDF had been OCRed). The basic insight was that PDFs are, at core, binary files, which grep can search.

The next step was to isolate a snippet of text that gets added to a PDF’s binary when annotations exist. To do this I opened two PDFs—identical, except that one had annotations—in a text editor and saved them as text files. Next, I used the superb file-comparison tool Kaleidoscope to call out the differences between the two files. The annotations, in this case, had been made in Preview, and I found and copied some text (“Annot /T”) that gets generated in every annotation. This would be my trigger.
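The same comparison can be roughed out on the command line, without a text editor or Kaleidoscope. What follows is only a sketch under assumptions: the function names are made up for illustration, and it only surfaces text that is stored uncompressed in the file, so a marker hidden inside a compressed stream would not show up.

```sh
#!/bin/sh
# Sketch: list text fragments present in an annotated PDF but absent from
# the otherwise identical original. Function names are illustrative only.

# Turn every non-printable byte into a newline so the binary file becomes
# a list of readable fragments, then drop the empty lines.
printable_fragments() {
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" | grep -v '^$'
}

# $1: original PDF, $2: annotated copy. Prints fragments unique to $2.
fragments_added() {
    before=$(mktemp) || return 1
    after=$(mktemp) || return 1
    printable_fragments "$1" | LC_ALL=C sort -u > "$before"
    printable_fragments "$2" | LC_ALL=C sort -u > "$after"
    # comm -13 keeps only the lines that appear in the second file alone.
    LC_ALL=C comm -13 "$before" "$after"
    rm -f "$before" "$after"
}
```

Calling `fragments_added original.pdf annotated.pdf` (both file names are placeholders) prints the candidate trigger phrases, one per line.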

I set up a new Hazel rule to monitor my main Papers folder. (For basics on Hazel, see the tutorials and tips on the developer’s site.) The updated version of Hazel has a few new “if” conditions, including “Passes shell script.”

[Screenshot: Skim-Hazel-1]

I embedded a short script to detect if the “Annot /T” phrase appeared; if so, the Hazel action would trigger.

[Screenshot: Skim-Hazel-2]
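The script from the screenshot is not reproduced in the text, so here is a minimal sketch of what such a check might look like. It assumes Hazel hands the file to the embedded script as "$1" and treats an exit status of 0 as a pass; the function name is invented for illustration.

```sh
#!/bin/sh
# Sketch: succeed (exit status 0) when the file contains the literal
# text "Annot /T". The function name is illustrative only.
has_annotation_marker() {
    # -q: print nothing, report the result through the exit status
    # -a: treat the binary PDF as if it were text
    # -F: match the phrase literally rather than as a regular expression
    LC_ALL=C grep -qaF -- 'Annot /T' "$1"
}

# Inside a Hazel embedded script, the last line would be:
#   has_annotation_marker "$1"
```

Setting LC_ALL=C keeps grep from stumbling over byte sequences that are not valid text in the current locale.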

Just in case, I added a second condition to ensure that the matched file is a PDF.

[Screenshot: Skim-Hazel-3]
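Hazel makes this check through its own file-type condition. If you wanted the equivalent inside a shell script, one rough alternative (not what the rule above does) is to look for the "%PDF-" signature that opens a PDF file. The sketch assumes the signature sits at the very first byte, and the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: succeed when the first five bytes of the file are "%PDF-".
# This is a stand-in for Hazel's file-type condition, not a copy of it.
looks_like_pdf() {
    [ "$( { head -c 5 < "$1"; } 2>/dev/null )" = '%PDF-' ]
}
```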

The next step was to ask Hazel to convert the standard-annotation PDF to Skim. First I prompted Hazel to open the file in Skim, and then added a short, embedded AppleScript.

[Screenshot: Skim-Hazel-4]

I also added a separate Hazel rule, to run the conversion rule on all the subfolders in my Papers directory.

[Screenshot: Skim-Hazel-5]
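Before letting a rule loose on a whole library, it can help to see which files it would touch. The sketch below is a command-line stand-in for the subfolder rule, not part of the Hazel setup: it walks a folder recursively and prints every PDF containing the "Annot /T" marker. The function name is made up, and only files ending in lowercase ".pdf" are considered.

```sh
#!/bin/sh
# Sketch: print the path of every PDF under a folder that contains the
# literal text "Annot /T". The function name is illustrative only.
list_annotated_pdfs() {
    # $1: folder to scan; find descends into all of its subfolders.
    # grep -l prints the names of matching files instead of the matches.
    LC_ALL=C find "$1" -type f -name '*.pdf' \
        -exec grep -laF -- 'Annot /T' {} +
}
```

Running `list_annotated_pdfs ~/Papers` (the path is a placeholder) gives a preview of what would be converted. Some versions of find report a non-zero exit status when grep finds nothing in a batch of files, so rely on the printed list rather than the status.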

Soon all of my old, Preview-generated annotated PDFs converted to Skim. I encountered a problem, though, with the PDFs I annotated in PDF Expert for iPad: the “Annot /T” did not appear. So I used Kaleidoscope to find another phrase that appeared in PDF Expert-generated PDFs: “/Name /Comment”. I then modified the Hazel rule, using a nested condition—so that either “Annot /T” or “/Name /Comment” would trigger Hazel.

[Screenshot: Skim-Hazel-6]
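The either/or logic can also live in a single script rather than a nested condition. This is a sketch under the same assumptions as before (Hazel passes the file as "$1" and reads exit status 0 as a pass), with a function name invented for illustration.

```sh
#!/bin/sh
# Sketch: succeed when the file contains either known annotation marker.
# The function name is illustrative only.
has_known_marker() {
    # Each -e adds one literal phrase; a match on any of them counts.
    LC_ALL=C grep -qaF -e 'Annot /T' -e '/Name /Comment' -- "$1"
}

# Inside a Hazel embedded script, the last line would be:
#   has_known_marker "$1"
```

Supporting another annotation app then means adding one more `-e 'phrase'` once you have found a phrase that app writes.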

Now the articles I annotate on my iPad get converted too, once they’re back on my Mac. Of course, if you use another iOS app like iAnnotate, you’ll need to locate a phrase and modify your Hazel rule accordingly.

It’s a hassle, for sure, but worth it. Now all my notes exist for Spotlight again. Memory restored.

About Jeff Pooley

Jeff Pooley is associate professor of media & communication at Muhlenberg College, in Allentown, PA.
This entry was posted in Annotation, Automation, Tags and folders.

19 Responses to Searchable PDF annotations: Automating conversion to Skim notes

  1. asaflab says:

    You can also use Skim’s handy Convert Annotations menu command to convert existing annotations entered in other PDF readers

  2. Penny says:

    Have you heard of WritePDF? I wanted to give this one a try and it doesn’t seem too bad at all.

    • Jeff Pooley says:

      WritePDF for iOS looks strong, definitely. Feature rich at least, though PDF Expert has nearly all those features (including full-text search). Exceptions seem to be 1. Conversion to PDF (from, eg, Word file) and 2. Extended printing options (though this requires background app on Mac).

  3. Roy says:

    I’ve been making annotations with iannotate on my ipad, emailing the pdf to a sendtodropbox address through the app which extracts all highlights and comments into text of the email. The sendtodropbox extracts the text from the email into a separate txt file, puts the now annotated pdf in a folder that’s named after the subject line of my email, and I’m all done. I get a folder with two files, one the annotated pdf, the second, a txt file containing the text of my highlights and comments–which is, of course, searchable via spotlight. This is working nicely at the start as I can keep a clean copy of the pdf as well as my annotated one; but I’m unsure that it’s a long term solution as I’m relying on a webservice that might close at any time and I don’t think this is that viable with more than a few hundred pdfs.

  4. byin says:

    I have a question, though it’s not directly related to the main point of your post.

    Right now I am mainly using Adobe Professional to annotate my pdf articles. I sometimes do convert the Adobe annotations to Skim notes because I want to be able to use the Export notes function of Skim.

    But what I find is that, after the note conversion, when I re-open the pdf file in Adobe, the annotations can no longer be seen in Adobe. Does the conversion process make the annotations disappear or become invisible to Adobe Professional? Is there a way to keep annotations visible for both Adobe and Skim?

    Thanks

    • Jeff Pooley says:

      You can always re-“embed” the notes, Adobe-style, to open in Adobe, Preview, PDFpen, etc, by choosing “Export” from the File menu, and selecting “PDF With Embedded Notes” (default). That should do the trick.

  5. Great post. I have the same problem. I initially only used papers annotations but changed to skim to be able to edit annotations. Then I create some tags in the annotations (with a hashtag #) and the export to taskpaper if I need to use the paper for some specific research.
    Your workflow seems great to convert a large amount of notes (though I’m gradually doing it manually). Great!
    Now do you keep reading and annotating on the iPad? Is there any easier solution for the iPad workflow?
    And how do you manage the iPad mac/papers workflow?
    Thanks a lot
    And great solution!

    • Jeff Pooley says:

      Cool idea with the within-annotation tags. On the iPad I use PDF Expert, which employs the standard, Adobe-style annotations. I have my entire Papers folder–including nested folders of PDFs–on Dropbox, though normally I import PDFs from the iOS Papers app. Once I’ve annotated a pdf, I move it to a Dropbox folder that my Mac sweeps using Hazel into my Mac inbox. After that, I normally replace the PDF in Papers for Mac with the newly annotated one, checking off the “read” box in the process. The script I describe in the post then converts the annotations to Skim notes. The iOS to Mac workflow, to recap, isn’t pretty. But at least one step, conversion to Skim notes, is now automated.

      • Karl Kemp-O'Brien says:

        If you use iAnnotate it supports two-way dropbox sync so your annotations/highlights etc are already present in your copy in your Papers library. You still have to do the convert skim notes bit but it’s one less step for you and less duplicate PDFs on your system.

  6. Januz says:

    Thanks for the instructions. Strangely, although I am using PDFExpert on the iPad as you do, your search string didn’t work for me (at least not for an annotated file without any comments). I diffed the files and “/Annots” seems to be a search string that is able to differentiate the annotated and the original file; it also seems to work with files annotated with Preview. The bad thing is that it is still contained even if I remove the annotations in PDFExpert again…

    Do you have any thoughts on that? Thanks, J

  7. Januz says:

    … please ignore my former comment – “Annots” doesn’t seem to be a good search term, it is contained in most of my PDFs (not the ones I first tested with).

    Leaves the question: Do you know why your search term doesn’t work for me? Does it work for you with files that are only annotated not commented with PDFExpert?

    Thanks,

    J.

  8. Mark says:

    This is a great thread, but really shows the next holy grail of academic workflows that needs to be found. Skim is a fantastic tool for note taking, but the lack of an iOS PDF reader that knows how to handle its notes is a real downfall. I keep my PDF library on dropbox and I have to use Skimnotes so I can maintain edits across multiple Macs…I really hope an iOS developer cracks this!

  9. Rory says:

    First off, thanks for a great article! I’m salivating at the awesome power of automation!

    I’m with you, Mark. I use Goodreader and Skim to read through journal articles, synced with Dropbox. But, once I hit the ‘convert notes’ button in Skim, all my annotations disappear on the iPad. I’d love to find a way to have both.

  10. Raff says:

    An “advanced find” in Adobe Reader will search within the full text of multiple PDF files, including annotations. It’s a bit slower than Spotlight though, despite building its own search index.

  11. ivotron says:

    I think the main problem is Skim’s resistance to following the general PDF-annotation-flow. Every other app handles annotations in the “right” way (embedding them in the file). This should be the default in Skim and, on top of it, it could optionally apply its own annotation mechanism (why does converting annotations have to erase the original ones?). The current “exclusive OR” way in which it handles this (either Skim’s or Adobe’s way but not both) makes it a non-viable alternative for some people having the traditional workflow in mind, especially, as noted in the article and comments, for people annotating PDFs on their tablets.

    • Jeff Pooley says:

      True, although Spotlight doesn’t index traditional Adobe-style PDF annotating– and that’s the big issue for me. Half the value of making annotations is that they be searchable years later.

  12. On a similar topic, you can automate the exporting of your Skim notes. I’ve written a script that exports all of your annotations into Markdown formatted plain text and embeds hyperlinks to the individual pages of your source PDF (if you use my custom URL scheme, or DEVONthink). For anyone interested, you can check out my blog at hackademic.postach.io

  13. makuezue says:

    This is amazing and I’m considering adopting this method. However, since Papers 3 has been released, they’ve given us an option to synchronize our libraries to Dropbox for easier access between the Mac and iOS versions of the app. I say all of that to say that the library is stored on Dropbox so that it’s not an actual, browsable folder from within Finder, but a kind of archive!

    So I wonder if this method will still work given that the script you’ve written here depends on the folder(s) being open and browsable? I’ll keep my eye on this (and perhaps try playing around with it…)

  14. Marc says:

    Thank you for this information. But there is a slightly better way of extracting the notes from a PDF annotated with a program other than Skim. Instead of automating Skim with AppleScript, you can run the command line tool that comes with Skim, called skimpdf. So, for example, instead of the AppleScript you would have the following shell script.

    ~/bin/skimpdf unembed "$1"

    Have a look at the following, for where to locate skimpdf

    http://sourceforge.net/apps/mediawiki/skim-app/index.php?title=SkimPDF_Tool

    Hope it helps someone
