Wednesday, February 25, 2015

Darren on Blogger: Pagination for Blogger powered by jQuery

UPDATE: Replaced all uses of the pagelist class with pagelinks, as depending on the template used, Blogger may or may not use this class to manage its 'blog pages' i.e. tabs.

Some time ago I wrote a 6 part blog and I have very recently written another multi-part blog that has the potential to be quite long.

Once I have done the technical work for a blog post and start writing it up, if it becomes unwieldy, I split it into parts to make it more digestible for the reader.

I observe that different parts receive different levels of traffic, which I take to mean either that some parts aren't all that interesting or, put another way, that in the internet age readers seek out the specific information they want and nothing else.

But for a while now, people who get a lot of traffic on their blog are choosing to monetize their published intellectual property, whether it be with AdSense or Amazon book lists or whatever.

I'm no SEO or click-stream expert, but obviously there's a big difference between an e-Commerce site and the blog of an individual containing all sorts of ramblings; you'd expect e-Commerce sites to be more cohesive, but blogs not necessarily so.

Whatever your online presence, you might find a trade-off between time spent on a page vs. bounce-rate.

Again, while not being an expert, I figured it might be desirable for bloggers to keep visitors on a single blog post for longer using content pagination, with the caveat that consolidating a multi-part blog into a single multi-page blog will likely improve visit times but worsen bounce-rate.

As an interesting exercise I thought I would have a go at implementing a JavaScript library to paginate blog posts.

For the implementation, jQuery seemed like an obvious choice, given its popularity and its status as the incumbent technology for doing such things, not least because the work that inspired me was also written using jQuery.

NB: I am trying to publish this library as a Blogger gadget, but am yet to succeed - watch this space!


Structure your posts like so:
  1. Leading and header content
  2. Start a page and each subsequent page with <h2 class="pagebreak">Page Title</h2>
  3. End your paginated content with <h2 class="pagelinks"></h2>
  4. Finish off with trailing and footer content
NB: leading and trailing content will always be visible, regardless of which page you are on.
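As a rough illustration of the four steps above, here is a hypothetical post body held in a JavaScript string, with a small helper that counts the markers. The names examplePostHtml and countMarkers, and the recipe content, are made up for this sketch and are not part of the library.

```javascript
// Hypothetical example post body, following the four-step structure above.
const examplePostHtml = `
<p>Leading content, always visible.</p>
<h2 class="pagebreak">Ingredients</h2>
<p>Flour, eggs, sugar.</p>
<h2 class="pagebreak">Method</h2>
<p>Mix, then bake.</p>
<h2 class="pagelinks"></h2>
<p>Trailing content, always visible.</p>
`;

// Quick sanity check: count how often a marker class appears on an <h2>.
function countMarkers(html, className) {
  const pattern = new RegExp('<h2 class="' + className + '"', "g");
  return (html.match(pattern) || []).length;
}

console.log(countMarkers(examplePostHtml, "pagebreak")); // 2
console.log(countMarkers(examplePostHtml, "pagelinks")); // 1
```

Two pagebreak markers give two pages, and the single pagelinks marker closes the paginated section.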

Pre-Requisites and Installation

Make sure you have jQuery installed; as mentioned, I will try to bundle this as a gadget, but jQuery needs to be installed separately in your blog's template - Google will tell you how.

Install a UUID function with the name uuid into jQuery so the paginator can use it like $.uuid().
For your convenience, you can use this UUID function, adapted here for jQuery and lifted from Kevin Hakanson's answer on StackOverflow; edit the layout of your blog and add a 'HTML/JavaScript' gadget at the bottom with the following content:

To install the BlogPaginator code, again edit the layout of your blog and add a 'HTML/JavaScript' gadget at the bottom with the following content:

Credit Where Credit is Due

Firstly let me give a shout to Harshit Jain, whose work largely inspired and seeded mine.

I liked what I saw there but thought it was a bit too involved: let's say the blogger is your somewhat tech-savvy, retired grandmother posting baking recipes; do we really want her taking time out from baking to fiddle around with jQuery? Hell nah! #wewantcake

Also, the pages are statically formed and numbered, so adding new ones or restructuring becomes a chore, and you have to rinse and repeat for every blog post.

So I sought to make this more automatic with minimal first-time setup - in installing the library - and just imposing the need to write HTML.

The Approach

So I had two objectives initially:
  1. Allow bloggers to write new posts with pages added ad-hoc or worked into their document plan
  2. Allow bloggers to easily retro-fit long past posts, or consolidate multi-part blog posts into a single post split across pages instead
It seemed intuitive to model a page as an item with content: old pre-existing blogs likely won't have such an obvious structure; while it is easy to plan a new post like this, you realise it isn't all that convenient when you start moving content and pages around - but we do not need to abandon this intuition just yet...

Having worked a lot with XML, in all forms including HTML, I know you basically have two ways to process a document:
  • With the DOM, which is built upfront and you use something like XPath to traverse and locate the elements you want
  • With a StAX parser, that processes a document tag (node) by tag, where start <div> and end </div> tags are two different events (if I had to guess, a DOM would be built this way)
The StAX idea is key here and, combined with document manipulation, allows us to separate the way content and pages are authored from how content and pages are rendered, that is, a structural transformation.
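To make the event-by-event idea concrete, here is a minimal sketch that walks a tiny hand-made tree and emits start, text and end events in document order. The tree shape (tag, children, text) and the name streamEvents are assumptions for illustration only; this is not a real StAX parser and not the library's code.

```javascript
// Hypothetical mini-tree: elements have a tag and children, text nodes have text.
function* streamEvents(node) {
  if (typeof node.text === "string") {
    yield { type: "text", text: node.text };
    return;
  }
  yield { type: "start", tag: node.tag };
  for (const child of node.children || []) {
    yield* streamEvents(child);
  }
  yield { type: "end", tag: node.tag };
}

const tree = {
  tag: "div",
  children: [
    { tag: "h2", children: [{ text: "Page Title" }] },
    { text: "Body text" },
  ],
};

const eventLog = Array.from(streamEvents(tree), (e) => e.type + ":" + (e.tag || e.text));
console.log(eventLog.join(" "));
// start:div start:h2 text:Page Title end:h2 text:Body text end:div
```

A consumer of such a stream never needs the whole tree at once; it only reacts to whichever event comes next, which is the property the pagination approach relies on.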

In other words, if you can drop in a 'pagebreak' marker that your processor can use to create a new page and swallow up all the content until the next pagebreak, implying the end of one page and the start of the next, then we are on our way to a solution.

This is easy, we just use a standard HTML tag; it can pretty much be anything, knowing we can replace it with something more useful and extract content - if it has any - and use it later.

For this I chose the h2 tag with a class value of pagebreak:
<h2 class="pagebreak">Page Title</h2>
The Page Title will be extracted and used to build a Table of Contents (ToC).

Using the <h2 /> tag this way feels a lot more natural and gets out of the way of content authoring; you spend less effort thinking about and implementing your pagination structure and more on what you want to write e.g. how she baked that cake!

Once all the content is authored with pagebreaks, we need another marker to indicate to the processor that there are no more pages and it needs to insert the click-able page navigation functionality. The same principles apply here, but we use a class value of pagelinks and note that any content is discarded:
<h2 class="pagelinks"></h2>
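The way the two markers carve a post into leading content, pages and trailing content can be sketched as a single pass over a flat list of nodes. This is a simplified model, not the library's actual code: the node shape ({ kind, className, title, html }) and the function name splitIntoPages are hypothetical, and it uses the pagelinks class name from the update note at the top of this post.

```javascript
// Each node is a plain object: { kind: "marker", className, title } for an
// <h2> marker, or { kind: "content", html } for anything else.
function splitIntoPages(nodes) {
  const result = { leading: [], pages: [], trailing: [] };
  let state = "leading"; // "leading" -> "paging" -> "trailing"
  for (const node of nodes) {
    const isMarker = node.kind === "marker";
    if (state !== "trailing" && isMarker && node.className === "pagebreak") {
      // A pagebreak ends the previous page (if any) and starts a new one.
      result.pages.push({ title: node.title, content: [] });
      state = "paging";
    } else if (state === "paging" && isMarker && node.className === "pagelinks") {
      // End of the paginated section; the marker's own content is discarded.
      state = "trailing";
    } else if (state === "leading") {
      result.leading.push(node);
    } else if (state === "paging") {
      result.pages[result.pages.length - 1].content.push(node);
    } else {
      result.trailing.push(node);
    }
  }
  return result;
}

const sample = [
  { kind: "content", html: "<p>Intro</p>" },
  { kind: "marker", className: "pagebreak", title: "Ingredients" },
  { kind: "content", html: "<p>Flour, eggs</p>" },
  { kind: "marker", className: "pagebreak", title: "Method" },
  { kind: "content", html: "<p>Bake</p>" },
  { kind: "marker", className: "pagelinks", title: "" },
  { kind: "content", html: "<p>Footer</p>" },
];

const split = splitIntoPages(sample);
console.log(split.pages.map((p) => p.title)); // [ 'Ingredients', 'Method' ]
```

The stored titles are what a Table of Contents would later be built from, while the leading and trailing arrays hold the content that stays visible on every page.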

The Implementation Details

To implement this we don't actually use a StAX parser; we just simulate one using jQuery's .contents() method. It's important to use .contents() and not .children(), as the former is one of the few jQuery operations that process text nodes, whereas the latter does not.
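The difference can be modelled without a browser. The sketch below mimics, on plain objects, what .contents() and .children() conceptually return for one parent node; contentsOf, childrenOf and the fake post object are illustrative stand-ins, not jQuery itself.

```javascript
const ELEMENT_NODE = 1; // same numeric values as the DOM's Node.ELEMENT_NODE
const TEXT_NODE = 3;    // and Node.TEXT_NODE

// Like .contents(): every direct child node, text nodes included.
function contentsOf(node) {
  return Array.from(node.childNodes);
}

// Like .children(): direct child elements only, so text nodes are dropped.
function childrenOf(node) {
  return Array.from(node.childNodes).filter((n) => n.nodeType === ELEMENT_NODE);
}

// A fake post body: loose text, a pagebreak heading, then more loose text.
const fakePostBody = {
  childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "Some introductory text " },
    { nodeType: ELEMENT_NODE, tagName: "H2", className: "pagebreak" },
    { nodeType: TEXT_NODE, nodeValue: " text belonging to page one" },
  ],
};

console.log(contentsOf(fakePostBody).length); // 3
console.log(childrenOf(fakePostBody).length); // 1
```

With .children() the two runs of loose text would never be seen by the processor, so they could not be assigned to a page.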

So when our pseudo-jQuery-StAX processor starts up, it creates a page for each pagebreak and some corresponding meta-data, until it sees a pagelinks marker; at this point it knows:
  1. Where the first page started
  2. How many pages there are
  3. The page title for each
  4. And where the last page ended, which is not necessarily the end of the blog post!
So we can generate the page navigation links after the last page, but we can also jump back and insert another set of page navigation links before the first page too; this is useful for long pages where, if the page is sufficiently introduced, the reader can decide early whether to keep reading or click ahead or back without needing to scroll to the links at the bottom of the paginated section.

Furthermore, with the page titles stored, we can jump back and insert a ToC, where each page title links through to its page.

For usability, the implementation disables links to the page you are currently on and also provides 'Show All' links.
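One possible way to model the navigation links is as plain data that a renderer could turn into anchors. This is only a sketch: buildPageLinks and the shape of its result are hypothetical, and treating 'Show All' as disabled while everything is already shown is my assumption here rather than documented behaviour.

```javascript
// titles: page titles in order; currentPage: 1-based page number,
// or null when every page is being shown at once.
function buildPageLinks(titles, currentPage) {
  const links = titles.map((title, index) => ({
    label: String(index + 1),
    title: title,
    page: index + 1,
    enabled: index + 1 !== currentPage, // no link to the page you are on
  }));
  links.push({
    label: "Show All",
    title: "Show All",
    page: null,
    enabled: currentPage !== null, // assumption: nothing to do if all are shown
  });
  return links;
}

const navLinks = buildPageLinks(["Ingredients", "Method", "Icing"], 2);
console.log(navLinks.map((l) => l.label + (l.enabled ? "" : " (disabled)")).join(" | "));
// 1 | 2 (disabled) | 3 | Show All
```

The same list can be rendered twice, once before the first page and once after the last, and the titles can double as the entries of the ToC.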

I have not implemented (nor do I plan to implement) First, Previous, Next and Last semantics as I wouldn't expect a blog post to be that long - First and Last would be easy, Prev and Next a little less so.

And that's it... for a sunny day scenario at least.

What Happens When Multiple Posts Are Listed? Chaos!

The above all works fine when, say in a testing scenario, you are writing a web page and using the <body /> tag to demarcate content for pagination i.e. with a $("body") jQuery selector.

When in Blogger, pages must obey the rules like any other web page, that is, the <body /> tag contains all the renderable content you see in the browser window, including all the chrome, navigation links, gadget content, adverts etc. and, when on your blog's landing page, all the (recent) posts listed too.

The effect, in theory, is that you get cross-post pagination; in practice you get a mess and missing content.

There are two steps to counteract this:

First we must set apart different posts and demarcate pagination content by something other than the <body /> tag; after some inspection you'll see Blogger uses a <div /> tag with a post-body class entry to contain each post i.e.
<div class="... post-body ..."></div>
We can grab each of these containers using a jQuery class selector on post-body.
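A minimal sketch of processing each post on its own might look like the following. The selector string ".post-body" is my assumption based on the class name above, and paginateAllPosts and its paginateOne callback are hypothetical names, not the library's API.

```javascript
// Run a per-post paginator over every post container on the page.
// "$" is passed in so the function can be exercised without a browser.
function paginateAllPosts($, paginateOne) {
  $(".post-body").each(function () {
    // Inside .each(), "this" is the raw DOM element for one post.
    paginateOne($(this));
  });
}

// Browser wiring, skipped when jQuery is not loaded (e.g. under Node).
if (typeof jQuery !== "undefined") {
  jQuery(function ($) {
    paginateAllPosts($, function ($post) {
      // Placeholder: a real paginator would walk $post.contents() here.
      console.log("Would paginate a post with", $post.contents().length, "nodes");
    });
  });
}
```

Because each post is handed to the paginator separately, a pagebreak in one post can never swallow content from the next post on the landing page.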

Secondly, we need to make page links unique to a post so we don't get cross-post page linking.

This is easily done by introducing a UUID and associating that with a post and all its corresponding page links.
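As a sketch of the idea, the code below generates a random version-4 style UUID and uses it to namespace page ids per post. It is not the gadget code referred to earlier: makeUuid and pageId are hypothetical names, the id format is made up, and Math.random() is not cryptographically strong, which should be fine for keeping link ids apart but not for anything security-related.

```javascript
// Generate a random UUID in the common version-4 text layout.
function makeUuid() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function (c) {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8; // "y" becomes 8, 9, a or b
    return v.toString(16);
  });
}

// Build an id for one page of one post; the prefix is arbitrary.
function pageId(postUuid, pageNumber) {
  return "page-" + postUuid + "-" + pageNumber;
}

// If jQuery is present and has no uuid function yet, expose ours as $.uuid().
if (typeof jQuery !== "undefined" && typeof jQuery.uuid !== "function") {
  jQuery.uuid = makeUuid;
}

const firstPost = makeUuid();
const secondPost = makeUuid();
console.log(pageId(firstPost, 1));
console.log(pageId(secondPost, 1));
```

Page 1 of the first post and page 1 of the second post now get different ids, so a link in one post cannot accidentally show or hide a page in another.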

That's it, sunny-day and rainy-day scenarios accounted for.

As always, feedback and comments are welcome.

Also check back for progress updates on having this published as a gadget.