Falvotech.
Conquering the world is easy — what do you do with it afterwards?

Unsuitable on Unsuitable: Articles Table
Samuel A. Falvo II
kc5tja -at- arrl.net
2010 Jul 03 15:57 PDT

The simplest possible blogging approach involves simply editing a single HTML file over time. Among other disadvantages, one cannot automate generation of an RSS feed using this approach, and articles typically appear in one giant list. By storing articles in a database, the blog may organize the articles in more meaningful ways, depending on the view requested. Unsuitable depends on this capability to provide three views of articles: synopsis (what's seen on the homepage), detailed (what's seen when a user clicks on a synopsis title), and RSS feed.

1 Manual Blog Updates

Some people who work with Forth extensively, and who openly embrace the philosophy of minimalism it encourages, would probably wonder why I chose to write a whole application just to serve what looks like static content to the reader. Indeed, no overt need for blogging software exists, as Chuck Moore proves on his Haypress Creek and GreenArrays blogs. Just fire up a text editor and edit away.

Numerous disadvantages exist with this approach, however. I can think of the following:

  • No automated means of updating syndication feeds exists. You'll have to create, and later hand-edit, RSS and/or Atom feeds yourself. Note that RSS has pretty wonky escaping rules, which you must observe should you wish to post HTML tags in your feed content (and you really should, unless you intend to post only single paragraphs of non-hyperlinked text). This means you'll need to expand every <tag> into &lt;tag&gt; yourself. If your HTML already contains ampersands, such as the entity references in this paragraph, you'll need to escape those a second time: &lt;tag&gt; in the HTML becomes &amp;lt;tag&amp;gt; in the feed.
  • The visual layout of the document remains more or less fixed. CSS templates help immensely, but don't solve every layout problem. Additionally, layouts evolve over time, so you'll eventually add <div> tags you never knew you needed. As a result, even with CSS, you'll find yourself revisiting old content on every layout change.
  • You'll need to manually create a synopsis or index page. Those familiar with my website will remember my first attempt at such an index, my site's navigation index, which covers content dating as far back as 2003; the last major site revamp occurred in 2006. Every time I added content to the website, I updated the navigation page. This publication process quickly became burdensome, and I updated the site less often as a result.
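The escaping rule in the first bullet is mechanical enough to sketch. The Forth below is a hypothetical helper, not Unsuitable's feed generator (which this article doesn't show): escape copies a string into a fixed buffer of 4096 characters (an arbitrary size), replacing &, <, and > with their entity references. Feeding text that already contains entity references through it yields exactly the doubled escaping the bullet describes.

```forth
\ Hypothetical helper, not part of Unsuitable: entity-escape HTML so that it
\ can sit inside an RSS <description> element. Output accumulates in esc-buf.
4096 constant /esc                     \ arbitrary output capacity, in chars
create esc-buf  /esc chars allot
variable esc-len                       \ chars currently held in esc-buf

: esc-room  ( u -- )                   \ abort unless u more chars will fit
  esc-len @ +  /esc >  abort" escape buffer full" ;
: esc+  ( c-addr u -- )                \ append a string to esc-buf
  dup esc-room
  esc-buf esc-len @ chars +  swap  dup esc-len +!  cmove ;
: esc-c+  ( c -- )                     \ append a single character
  1 esc-room
  esc-buf esc-len @ chars + c!  1 esc-len +! ;

: esc-char  ( c -- )
  case
    \ Every literal & is escaped, including one that already begins an
    \ entity; that is what makes pre-escaped text come out doubly escaped.
    [char] & of  s" &amp;" esc+  endof
    [char] < of  s" &lt;"  esc+  endof
    [char] > of  s" &gt;"  esc+  endof
    dup esc-c+                         \ all other characters pass through
  endcase ;

: escape  ( c-addr u -- c-addr' u' )   \ result is valid until the next ESCAPE
  0 esc-len !
  bounds ?do  i c@ esc-char  loop
  esc-buf esc-len @ ;
```

For example, s" <p>Tom &amp; Jerry</p>" escape type prints &lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;: the tags gain one level of escaping, while the ampersand that was already escaped in the HTML gains a second.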

I created Unsuitable to alleviate the tedium of making my thoughts known to the world. By consolidating the data that makes up a typical article into a more structured form than simple text files, Unsuitable actually adds value in several ways:

  • I don't have to manually update an index page. The RSS feed generator does this for me, automatically. External RSS readers, like the inestimable Google Reader, take on the responsibility of aggregating my entire history for me. I needn't provide it myself (but I might do this in the future for those visiting the site for the first time; I'll cross that bridge when I get there). As an additional benefit, my friends with RSS readers will automatically see new content when I post it. I don't have to send e-mails to them asking if they've seen the latest article.
  • The blog updates article links for me, automatically. As a result, Google's web crawler automatically finds all new content. I take no action to ensure inter-article link integrity. I simply search Google for site:falvotech.com followed by whatever I'm looking for, and things Just Work.
  • The blog takes care of encapsulating the content in whatever layout I choose. As I write this entry, I intend the layout to resemble a clean-looking academic publication. Tomorrow, I might choose differently. If I do, all articles update in kind, with very little effort on my part.

Of course, other reasons exist for writing my own blog engine instead of using an out-of-the-box solution, but discussion of those reasons falls outside the scope of this article.

2 Articles Schema

To offer multiple views of articles to different audiences, Unsuitable records all content in a database. Blobs of text live in the General Object Store (GOS), while handles to them live in the articles relation (table). The relation records the minimum information necessary to render a detailed view of an article:

articleId | title | lead | body | timestamp
1033 | Unsuitable on Unsuitable: General Object Store | <p>Out of the box, Forth lacks support for persistent text strings. A blog depends heavily on the concept of persistent string data to store its articles. The General Object Store was written specifically to address this problem by providing the persistent string abstract data type. Since the data type exhibits persistency as a core feature, many parts of the blog software exhibits structural simplicity over competing blog engines written in more mainstream languages, for they must marshall data to/from SQL constructs.</p> | <h1>1 General Object Store</h1> … etc. | 2010 June 27, … etc.
1034 | Unsuitable on Unsuitable: Articles Table | <p>The simplest possible blogging approach involves simply editing a single HTML file over time. Among other disadvantages, one cannot automate generation of an RSS feed using this approach, and articles typically appear in one giant list. By storing articles in a database, the blog may organize the articles in more meaningful ways, depending on the view requested. Unsuitable depends on this capability to provide three views of articles: synopsis (what's seen on the homepage), detailed (what's seen when a user clicks on a synopsis title), and RSS feed.</p> | <h1>2 Articles Schema</h1> … etc. | 2010 July 1, … etc.

The articleId attribute stores each article's unique identifying integer. I started numbering Unsuitable's articles at 1000 to leave sufficient room for articles imported from my older Serendipity-based blog. When you bring up an article directly in a web browser, its ID appears in the URL (e.g., http://www.falvotech.com/blog2/blog.fs/articles/1033).

The title, lead, and body attributes all hold GOS handles to text relevant to the article. For example, if you were debugging Unsuitable and needed to double-check a title, you might enter 1033 articleWithId! title gob! get to retrieve article 1033's title text (in this case, Unsuitable on Unsuitable: General Object Store).

Note that not all articles provide a body field. If your article does not require the formality of an MLA- or APA-style publication with an abstract, the entire article content may appear in the lead. In such cases, the body field will contain a GOS handle of -1. Otherwise, the lead maps directly to the article's abstract, while the body references the remainder of the document. For examples of either kind, refer to http://www.falvotech.com/blog2/blog.fs/articles/1030 for an article with only a lead, and http://www.falvotech.com/blog2/blog.fs/articles/1033 for an article with both a lead and body. We'll get more into the distinction between leads and bodies when we come to how Unsuitable renders pages. For now, just remember that Unsuitable requires titles and leads for all articles, but it permits body omissions.

The timestamp attribute records when the author published the article; more precisely, when the database row was created. This attribute assumes a native cell size of 32 bits or wider, because each timestamp packs several bit fields into a single cell, as follows:

  bits 31-20  year
  bits 19-16  month
  bits 15-11  day
  bits 10-6   hour
  bits 5-0    minutes

This arrangement of subfields lets Forth compare timestamps meaningfully using ordinary (unsigned) arithmetic comparison operators, such as U< or U>=, because more significant fields occupy higher-order bits. The range of each subfield matches that of the correspondingly named result of ANSI Forth's TIME&DATE word (see footnote 1). See the definition of now in the time.fs listing.
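A worked example makes the layout concrete. The sketch below assumes a cell of at least 32 bits. The names >stamp and stamp-year through stamp-minute are hypothetical; they mirror pack, yr, mo, dy, hr, and mn from time.fs, using distinct names because time.fs reuses mo for both packing and extraction.

```forth
\ >STAMP performs the same shifting as time.fs's PACK, but takes its fields
\ explicitly; PACK also receives TIME&DATE's seconds, which it discards (NIP).
: >stamp  ( minutes hour day month year -- u )
  4 lshift or            \ year<<4 | month
  5 lshift or            \ ...<<5  | day
  5 lshift or            \ ...<<5  | hour
  6 lshift or ;          \ ...<<6  | minutes

\ Field extractors equivalent to yr, mo, dy, hr, and mn in time.fs.
: stamp-year    ( u -- year )     20 rshift ;
: stamp-month   ( u -- month )    16 rshift 15 and ;
: stamp-day     ( u -- day )      11 rshift 31 and ;
: stamp-hour    ( u -- hour )      6 rshift 31 and ;
: stamp-minute  ( u -- minute )   63 and ;

\ This article's dateline, 2010 Jul 03 15:57, and midnight on July 1.
57 15 3 7 2010 >stamp constant posted
 0  0 1 7 2010 >stamp constant july1st
```

Here posted u. prints 2108103673, that is, 2010×2^20 + 7×2^16 + 3×2^11 + 15×2^6 + 57, and july1st posted u< yields true: the earlier moment packs to the smaller number, so a single unsigned comparison orders timestamps.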

The astute reader will observe that neither an author nor an email field exists. Since only I may submit articles for publication to this blog, Unsuitable hardcodes the author information when rendering the page. Anyone desiring a copy of Unsuitable for their own use must remember to adjust this setting. A number of ways exist to change this behavior, of course, all requiring fairly simple coding changes to Unsuitable.

3 Code Walk-Through

The astonishingly small amount of logic found in the articles.fs listing explains the exceptionally brief walk-through that follows. We start with the usual accessors needed to populate and query the articles database:

variable arn
: article     arn @ ;
: article!    arn ! ;
: articleId   articleIds arn @ + @f ;
: articleId!  articleIds arn @ + !f ;
: title       titles arn @ + @f ;
: title!      titles arn @ + !f ;
: lead        leads arn @ + @f ;
: lead!       leads arn @ + !f ;
: body        bodies arn @ + @f ;
: body!       bodies arn @ + !f ;
: timestamp   timestamps arn @ + @f ;
: timestamp!  timestamps arn @ + !f ;

As usual, for any getter name N, N! provides the corresponding setter.
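To see why a single arn value serves every accessor, the sketch below rebuilds the layout in ordinary memory. Assumptions: @ and ! stand in for Unsuitable's @f and !f, the 4-row capacity is arbitrary, and hasBody? is a hypothetical convenience word for the -1 "no body" convention; only two columns get accessors, since title, lead, and timestamp follow the same pattern.

```forth
\ In-memory sketch of the articles relation's parallel-column layout.
\ ASSUMPTIONS: @ and ! stand in for @f and !f; 4 rows is arbitrary.
: @f  ( addr -- x )  @ ;
: !f  ( x addr -- )  ! ;

4 cells constant /afields              \ bytes in each column array
create articleIds  /afields allot      \ one array per attribute
create titles      /afields allot
create leads       /afields allot
create bodies      /afields allot
create timestamps  /afields allot

variable arn                           \ current row, as a byte offset
: article     arn @ ;
: article!    arn ! ;
: articleId   articleIds arn @ + @f ;
: articleId!  articleIds arn @ + !f ;
: body        bodies arn @ + @f ;
: body!       bodies arn @ + !f ;
: hasBody?    ( -- flag )  body -1 <> ;   \ hypothetical: -1 means "lead only"

\ Select the third row (byte offset 2 cells) and fill two of its fields.
2 cells article!
1035 articleId!
-1 body!
```

Because one byte offset indexes every column, articleId now yields 1035 and hasBody? yields false for that row, while the other rows are unaffected. Storing the row handle as a byte offset rather than a row number also spares each accessor a cells multiplication.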

The arn variable records the currently referenced database row. If you already know the row handle, you may set this variable directly using article!. Most of the time, however, you'll want to scan through the database for an article given its human-facing ID. The articleWithId! procedure offers this service:

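\ Except for articleWithId! ( id -- ), each word below expects ( id addr ),
\ where addr walks the articleIds column. On a match, -= executes r> drop to
\ discard its own return address, so its exit returns straight to the caller
\ of articleWithId!. -eoi uses the same idiom, although -found never returns,
\ since it calls bye.
: -found           ." Content-type: text/plain" cr cr ." Article ID " drop . ." doesn't exist." bye ;
: -eoi             dup [ articleIds /afields + ] literal u>= if -found r> drop then ;
: found            articleIds - article! drop ;
: -=               2dup @f = if found r> drop then ;
: articleWithId!   articleIds begin -eoi -= cell+ again ;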

If a given article does not exist, a plain-text error message (note the Content-type: text/plain header) appears to the user, either on the interactive console or in place of the requested web page, and the program then exits via bye. In this way, illegal IDs cannot compromise the blog installation. Per the DItI pattern, if the word returns to its caller at all, we can be confident that it found the appropriate article.
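The same linear scan can be written with a conventional loop. articleWithId! follows DItI and ends the process when an ID is missing; the hypothetical find-article below returns a flag instead, which makes the scan easy to exercise interactively. Assumptions: the column lives in ordinary memory (@ replaces @f), and the 4-row capacity is arbitrary.

```forth
\ A flag-returning variant of articleWithId!'s linear scan, in ordinary memory.
\ ASSUMPTIONS: @ replaces @f; 4 rows is arbitrary; FIND-ARTICLE is hypothetical.
4 cells constant /afields
create articleIds  /afields allot
variable arn                             \ current row, as a byte offset

: find-article  ( id -- found? )
  articleIds /afields bounds ?do         \ visit each cell of the column
    dup i @ = if                         \ does this row carry the wanted ID?
      drop  i articleIds - arn !         \ record the row's byte offset
      true unloop exit                   \ leave both the loop and the word
    then
  cell +loop
  drop false ;                           \ ran off the end: no such article

\ Sample contents: row 2 is free (-1), the others hold article IDs.
1030 articleIds !
1033 articleIds cell+ !
  -1 articleIds 2 cells + !
1034 articleIds 3 cells + !
```

With this data, 1033 find-article leaves true and sets arn to 1 cells, the offset of the second row, while 9999 find-article leaves false and leaves arn alone.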

Unlike the GOS, the articles relation does not record the number of valid rows. When I wrote this code, I decided that the extra expense of maintaining a fencepost for articles made no sense, particularly since, according to commonly understood relational theory, rows can appear in any order in a relation without changing its meaning. Instead, using a fixed-size array for the articleId attribute, we employ a first-fit row allocation mechanism.

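\ The loop keeps ( cursor end ) on the stack. A row whose articleId is -1 is
\ free, and available returns its byte offset. If the scan runs off the end,
\ the nonzero end address left on the stack is the flag that fires abort".
: available   articleIds /afields over + begin 2dup < while over
              @f -1 = if drop articleIds - exit then swap cell+
              swap repeat abort" Out of article records" ;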

available identifies unused article rows by checking whether articleId equals -1. Hence, if you make an error in posting an article, or want to take an article down from the blog, you would locate the article and mark its row free, like so: 1032 articleWithId! -1 articleId! update. The hide word in util.fs packages exactly this sequence.
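First-fit allocation and take-down can be sketched together in ordinary memory. Assumptions: @ and ! replace @f and !f, the 4-row capacity is arbitrary, and first-free and free-row are hypothetical names. Unlike available, which aborts when the table is full, first-free returns a flag.

```forth
\ First-fit allocation over an in-memory articleIds column.
\ ASSUMPTIONS: @ and ! replace @f and !f; 4 rows is arbitrary;
\ FIRST-FREE and FREE-ROW are hypothetical, not part of Unsuitable.
4 cells constant /afields
create articleIds  /afields allot

: first-free  ( -- offset true | false )
  articleIds /afields bounds ?do
    i @ -1 = if                          \ -1 in articleId marks an unused row
      i articleIds -  true  unloop exit  \ first fit: take the lowest free row
    then
  cell +loop
  false ;                                \ every row is in use

: free-row  ( offset -- )                \ take an article down
  -1 swap articleIds + ! ;

\ Sample contents: rows 0, 1, and 3 are in use; row 2 is free.
1000 articleIds !
1001 articleIds cell+ !
  -1 articleIds 2 cells + !
1002 articleIds 3 cells + !
```

Here first-free returns 2 cells and true; after 0 free-row it returns offset 0 instead, because first fit always reuses the lowest-numbered free row. As with available, the caller still has to store a fresh articleId in the returned row.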

Note that available returns a direct handle to the next available row; it remains the responsibility of the caller to update the articleId field correctly. Thankfully, the util.fs utilities listing contains the code to perform this safely:

: id        a-nextId @ articleId! 1 a-nextId +! update ;
: t         put gob title! ;
: l         S" lead.txt" slurp-file put gob lead! ;
: b         -1 body! ;
: import    available article! now timestamp! id t l b ;

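The import operation populates an article row: it allocates a free row, timestamps it, assigns the next article ID, stores the title string passed on the stack, reads the lead from lead.txt, and marks the body as absent. I typically invoke this utility like so (I type the shell commands and the final Forth line; the computer responds with the rest):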

$ vim lead.txt
...edit text and save...
$ gforth util.fs
redefined place  redefined part  redefined Body with body  redefined
available  redefined mo  redefined dy  redefined hr  redefined mn
redefined .m  Gforth 0.6.2, Copyright (C) 1995-2003 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
S" Unsuitable on Unsuitable: Foreward" import flush bye

However, import populates only the bare minimum of fields: it doesn't account for a body segment, and simply sets the body field to -1. If the article you're posting has a body, you'll need to follow import with the w/body qualifier:

: w/body    S" body.txt" slurp-file put gob body! ;

Like so:

$ vim lead.txt
...edit text and save...
$ vim body.txt
...edit text and save...
$ gforth util.fs
redefined place  redefined part  redefined Body with body  redefined
available  redefined mo  redefined dy  redefined hr  redefined mn
redefined .m  Gforth 0.6.2, Copyright (C) 1995-2003 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
S" Unsuitable on Unsuitable: Foreward" import w/body flush bye

I've toyed with the idea of a web-accessible means of posting articles. At this time, however, I remain unmotivated to offer one. While not particularly difficult, I don't feel like dealing with the security implications it raises. Logging in to my computer via ssh lets me reuse the security features already present in my Linux installation. Having used this system for just about a year, I find the lack of web-based posting merely a minor inconvenience.

4 What's Next

In this article, I described the articles relation and some tools for working with it. This concludes the discussion of the lowest levels of blog operation and implementation. Sometime next week, I intend to discuss how the blog works from the web server's perspective: the CGI interface, blog.fs. Stay tuned.

5 Complete Source to articles.fs

variable arn
: article     arn @ ;
: article!    arn ! ;
: articleId   articleIds arn @ + @f ;
: articleId!  articleIds arn @ + !f ;
: title       titles arn @ + @f ;
: title!      titles arn @ + !f ;
: lead        leads arn @ + @f ;
: lead!       leads arn @ + !f ;
: body        bodies arn @ + @f ;
: body!       bodies arn @ + !f ;
: timestamp   timestamps arn @ + @f ;
: timestamp!  timestamps arn @ + !f ;

: -found           ." Content-type: text/plain" cr cr ." Article ID " drop . ." doesn't exist." bye ;
: -eoi             dup [ articleIds /afields + ] literal u>= if -found r> drop then ;
: found            articleIds - article! drop ;
: -=               2dup @f = if found r> drop then ;
: articleWithId!   articleIds begin -eoi -= cell+ again ;

: available   articleIds /afields over + begin 2dup < while over
              @f -1 = if drop articleIds - exit then swap cell+
              swap repeat abort" Out of article records" ;

6 Complete Source to time.fs

: mo          4 lshift or ;
: dy          5 lshift or ;
: hr          5 lshift or ;
: mn          6 lshift or ;
: pack        mo dy hr mn nip ;
: now         time&date pack ;

: yr          20 rshift ;
: mo          16 rshift 15 and ;
: dy          11 rshift 31 and ;
: hr          6 rshift 31 and ;
: mn          63 and ;

: months      S"    JanFebMarAprMayJunJulAugSepOctNovDec" drop ;
: .y          yr . ;
: .m          mo 3 * months + 3 type space ;
: .d          dy s>d <# # # #> type space ;
: .ymd        dup .y dup .m .d ;
: .dmy        dup .d dup .m .y ;
: .h          hr s>d <# # # #> type ;
: .m          mn s>d <# # # #> type ;
: .hm         dup .h [char] : emit .m ;
: .time       dup .ymd .hm ."  PDT" ;
: .time822    dup .dmy .hm ."  PDT" ;

7 Complete Source to util.fs

require mappings.fs
require general.fs
require articles.fs
require time.fs

: id        a-nextId @ articleId! 1 a-nextId +! update ;
: t         put gob title! ;
: l         S" lead.txt" slurp-file put gob lead! ;
: b         -1 body! ;
: import    available article! now timestamp! id t l b ;
: w/body    S" body.txt" slurp-file put gob body! ;
: hide      articleWithId! -1 articleId! update ;

1  On a 32-bit Forth system, the 11-bit wide reservation for the year field implies that Unsuitable suffers from a year-2047 bug. With the software as currently written, articles submitted after December 31, 2047 will appear on or after January 1, year 0. This will break RSS generation, as a future walk-through of the feed generator will show. Fixing this bug requires only a one-line change to the yr definition in time.fs. However, I'll likely have died of old age by the time this happens, so I'm not particularly motivated to fix it.