On Means of Organizing Web Discussions

Suppose you have a room filled with loose folders. There are too many of them for one person to manage in a reasonable time, but you feel that you need to organize them somehow. Your friend Hubert is there to offer advice. “Let’s organize all of the folders by date,” he says, “and put them into boxes, with the newest folders being in one box, and so on.”

You suppose that’s fine, you say (although you, not having an eidetic memory of when folders were last relevant in your life, wonder whether it wouldn’t make more sense to box them based on subject), but how should the boxes be labelled? “It couldn’t be simpler,” comes the reply. “The box with all the newest files is a 1. Then we add a number for each extra box. I guess we have 167 folders here, and we should be able to put 15 folders in a box, so that’s 11.13 boxes. We put down a 12 on the oldest box. We’re good to go.”

“Don’t we have to move the folders around every time that we want to add a new folder to the newest box?”

Hubert looks at you quizzically, adjusting his smudged glasses. “Of course. But you have to move a folder every time you update it, too. What’s the matter with that?”

“Don’t you think it might be a little hard to remember where each folder is, after it’s been rearranged a few dozen times?”

“I remember everything. But if you don’t want that, that’s fine. We can just do an ascending sort. Put all the oldest folders, in order, in box one. Now the newest folders will be in the box with the largest number. Does that suit you?”

“Not really! I don’t know anything about the boxes, other than that the lower numbers are old and the higher numbers are new. I don’t know how old they are or the subject of the topics… without looking at the folders inside one, I don’t have anything. And if I do remember where a few folders are, once something happens and they get updated, everything moves and I don’t even have that. These labels are useless.”

***

This is the essence of the difficulty with most online paging systems. Nevertheless, the system was taken for granted as early as the webboards of the 1990s, and it must be older than that. I would guess that the basic concept was in use on BBSes, and probably before.

Why? Having implemented it myself, I can think of three plausible reasons.

1) It is pretty easy to program (even though it is also easy to get wrong, more on that in a minute).

2) It helps keep pages a uniform length. A short page may not be a problem, but 20+ items quickly become difficult for humans to read.

3) It avoids the possibility of running into a blank page: for instance, if you organize data by time period, it may not be obvious when you pick a month whether it will be empty. When a page necessarily consists of a known number of existing items, that problem is avoided.
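As a rough illustration of the first point, here is the arithmetic behind numbered pages, using Hubert's figures of 167 folders at 15 to a box. The function names are hypothetical, and the rounding slip mentioned in the comment is only one common way to get this wrong, not necessarily the one meant above.

```python
def page_count(total_items: int, per_page: int) -> int:
    """Pages needed, counting a partly filled last page.

    Plain floor division (total_items // per_page) would give 11 here
    and silently drop the last two folders.
    """
    return (total_items + per_page - 1) // per_page


def page_bounds(page: int, per_page: int) -> tuple[int, int]:
    """Zero-based (start, end) positions for a one-based page; end is exclusive."""
    start = (page - 1) * per_page
    return start, start + per_page


print(page_count(167, 15))   # 12
print(page_bounds(12, 15))   # (165, 180)
```

The last page only has items at positions 165 and 166, so code that reads a page has to tolerate a short result.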

So, in spite of my misgivings, I intended to use pages for organizing threads, but ran into a problem.

***

“Let’s try this out,” Hubert says. “Please select an offset that you wish your folder selection to begin with.”

“Let’s go with 150, I guess.”

“Excellent. Processing…” Reaching for the first box, Hubert begins to count off each folder inside. You unthinkingly count with him, until he finishes and reaches for the second box. Then you come to yourself.

“What in the world are you doing? These folders don’t have anything to do with the search.”

“On the contrary, they are essential,” Hubert explains, not turning around. “We can’t know that we are selecting folders from 150 on unless we have counted out 150 folders before that.”

“You’re telling me that if we had a million folders, and I asked you to start half-way through, you would count off five hundred thousand, just so you could give me 15 folders after that?”

“That’s correct,” acknowledges Hubert, without a hint of consternation.

You sigh. “We are going to have to rethink this filing system.”

***

Many common guides on how to use a database will mention the OFFSET keyword, giving you the impression that it is the solution to all of your paging problems. What they neglect to mention is that the process works just as described in the story: the database typically has to read and discard every row before the offset before it can return the ones you asked for. This inefficiency might be tolerable if there are never that many things to search for, but if you’re assuming that nobody is going to use your system, why make it in the first place?
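For reference, this is roughly what the OFFSET approach looks like. It is a minimal sketch using Python's built-in sqlite3 module; the `threads` table and the `fetch_page_by_offset` helper are assumptions made up for the example, not taken from any particular guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO threads (id, title) VALUES (?, ?)",
    [(i, f"Thread {i}") for i in range(1, 168)],  # 167 rows, like the folders
)


def fetch_page_by_offset(conn, offset, page_size=15):
    # The query is short, but the engine still steps past `offset` rows
    # before it returns anything, just as Hubert counts through the boxes.
    return conn.execute(
        "SELECT id, title FROM threads ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


rows = fetch_page_by_offset(conn, 150)
print(rows[0][0], rows[-1][0])  # 151 165
```

With 167 rows the skipped work is invisible; the cost only shows up when the offset runs into the hundreds of thousands.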

There are means of getting around this, some of which are more or less suitable depending on the context.

***

“So we keep track of the location of the last file in the group, and then future searches can start after that point,” Hubert explains. “Now, please enter your query.”

“Please show me the first 15 files after 150.”

Hubert shakes his head. “This system doesn’t work that way. You have to ask for the first box first. Then, if you wish to look at the second box, and so on, I can oblige you.”

“So you still can’t get to 150 without counting off every file first?”

Hubert blinks owlishly. “Not really, but this is practically irrelevant. If those files were important to you, they would be earlier in the count, not later. The later the files are in the search, the less relevant they must be.”

“What if I wanted to read older files first?”

“You would specify that in your original search, so that the first box consists of the oldest files, and so on.”

“What about the files in the middle? It sounds like they’re difficult to get to either way.”

Hubert shrugs. “There are inefficiencies in any system.”

***

This system, which some call ‘cursor-based’ and which is the primary method described in a MariaDB guide on pagination optimization, is fine in certain contexts. It is essentially the method search engines use (in a context where the first results are assumed to be the most relevant). But as a means of pagination on a forum or a messaging system, it seemed lacking. The premise of a forum, as I see it, is that information is being organized, not just dumped, as it might be on postmodern social media.
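A minimal sketch of the cursor idea, again with sqlite3 and a made-up `threads` table. The `fetch_page_after` helper and the choice of `id` as the cursor column are assumptions for illustration; a real forum would more likely sort newest first on a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO threads (id, title) VALUES (?, ?)",
    [(i, f"Thread {i}") for i in range(1, 168)],
)


def fetch_page_after(conn, last_seen_id=None, page_size=15):
    """Return one page plus the cursor to pass in for the page after it."""
    if last_seen_id is None:
        # First page: nothing to skip.
        rows = conn.execute(
            "SELECT id, title FROM threads ORDER BY id LIMIT ?",
            (page_size,),
        ).fetchall()
    else:
        # Later pages: resume right after the last row already shown,
        # which the primary key index can locate without counting.
        rows = conn.execute(
            "SELECT id, title FROM threads WHERE id > ? ORDER BY id LIMIT ?",
            (last_seen_id, page_size),
        ).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


first_page, cursor = fetch_page_after(conn)
second_page, cursor = fetch_page_after(conn, cursor)
print(second_page[0][0], cursor)  # 16 30
```

Note what is missing: there is no way to ask for "page 11" directly. Each page can only be reached from the one before it, which is exactly Hubert's complaint-proof answer about the files in the middle.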

I had already implemented what I called ‘chapters’ for thread pagination in my messaging system (the code for which is accessible in my archive of Middling Works). The fundamental idea of chapters is that organization comes from manual rather than automated processes: users manage pages themselves by defining the start and end points of each page. This is great for managing discussions (which may have several topic shifts that can be split into different pages), forum games (which are likely to have ‘rounds’ that can be split into chapters), and fiction writing (which naturally lends itself to being divvied up, one thread chapter for each story chapter and its commentary). But although I felt this system was brilliant for organizing posts inside threads, using it to organize the threads themselves had problems.

The first issue was that I was designing this system for a private messenger. So each user would see different threads, and there would be a lot of database overhead in storing each and every user’s personal arrangements for their threads.

The second issue was that these kinds of chapters break sorting systems. I wanted to be able to organize threads by the times they were updated, as well as the times they were initially created, but chapters, as I conceived them in threads, assume an ascending sort by creation time.

Finally, even if the previous points were glossed over, there’s a problem in that users might not be able to distinguish a clear pattern in the types of threads that were being made (or last updated) in certain time periods. It might be obvious in hindsight that an era was more peaceful or more tumultuous than another, but then again, it might not. In the case of longstanding threads that are accessed across multiple generations of users, you might also have enough crossover that this kind of delineation is unhelpful.

So, I settled for organizing these threads by month and year, and patched up the ‘empty month’ problem by employing a conditional search: if you struck an empty month, the system would show you the next older (and next newer) months that did have content in them, so you could easily access those.
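The conditional search might look something like the following. This is a sketch under stated assumptions, not the code from the messaging system itself: the `threads` table, the ISO-formatted `updated_at` text column, and the `month_view` function are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO threads (title, updated_at) VALUES (?, ?)",
    [
        ("Spring plans", "2023-03-14"),
        ("Spring recap", "2023-03-29"),
        ("Autumn plans", "2023-09-02"),
    ],
)


def month_view(conn, year, month):
    """Threads for one month, or the nearest non-empty months if it is empty."""
    key = f"{year:04d}-{month:02d}"  # matches the first 7 characters of an ISO date
    rows = conn.execute(
        "SELECT title, updated_at FROM threads "
        "WHERE substr(updated_at, 1, 7) = ? ORDER BY updated_at",
        (key,),
    ).fetchall()
    if rows:
        return {"threads": rows, "older": None, "newer": None}
    # Empty month: find the closest months on either side that have content.
    # MAX/MIN over no rows yields NULL, which sqlite3 returns as None.
    older = conn.execute(
        "SELECT MAX(substr(updated_at, 1, 7)) FROM threads "
        "WHERE substr(updated_at, 1, 7) < ?",
        (key,),
    ).fetchone()[0]
    newer = conn.execute(
        "SELECT MIN(substr(updated_at, 1, 7)) FROM threads "
        "WHERE substr(updated_at, 1, 7) > ?",
        (key,),
    ).fetchone()[0]
    return {"threads": [], "older": older, "newer": newer}


print(month_view(conn, 2023, 6))
# {'threads': [], 'older': '2023-03', 'newer': '2023-09'}
```

Calling `substr` on the column keeps the example short but would stop an index on `updated_at` from being used; comparing against the first day of the month and of the following month would be the friendlier choice on a large table.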

There are some practical limitations that can come up here, although they weren’t likely to in my original context. My study of forums suggests that organizing a board ‘by month’ can lead to ponderously long lists in the case of something like a support board, which can have dozens or hundreds of queries in a month.

My inclination was to fall back on an ordinary paging system in this case, and that might be the best solution for a general discussion board. When it comes to support boards, it might be wiser to encourage a system where users navigate questions by tags or keywords rather than spatial organization.

***

You decide to put dates on the boxes.

“It’s a viable system, if you want that kind of thing,” Hubert acknowledges. “Now, how are you going to set up the file content search?”

“I think I will leave that problem to someone with a lot of free time.”
