|
Unsuitable on Unsuitable: Index Generation
Samuel A. Falvo II
kc5tja -at- arrl.net 2010 Jul 17 14:01 PDT
Many blogs exhibit index pages which resemble 1 Introduction
Blog site designers regularly overload the casual blog reader with distracting, confusing, and sometimes deliberately misleading information; for an example, visit the homepage of the Technorati site. As I write this article, the site includes, among other things, five push advertisements, four independent horizontal menu bars (three on top of the page, one at the very bottom), a plurality of conflicting update features, including recent updates, related articles, Today on Technorati, top five rising blogs, top five visited blogs, latest posts Unsuitable chooses a more spartan look, focusing on the material readers want. Unsuitable's delivery of material all but guarantees a rapid and pleasant user experience. The page loads markedly faster1, and the browser remains unburdened while the user views the page. While viewing the Technorati page linked above, for example, my browser took 2.5 seconds to re-render the page after switching away from the window and switching back. On a 2.8GHz dual-core processor equipped with 2GB of RAM and a mid-grade nVidia video card, I find this response time utterly unacceptable. A Commodore 64 updated its bitmapped screen in about 2.5 seconds. A 64-bit, multi-gigabyte machine with no less than a 201 482× speed advantage2 should do much, much better. Visiting Falvotech's Unsuitable page shows one, and only one, feature on the site: the latest five leads (abstracts). The newest articles appear towards the top, minimizing the user's need to scroll far for new content. As I post more articles, older articles disappear from the bottom of the page. The top-right corner of the page contains a very noticable RSS icon, which links to the RSS feed of the blog. The icon size minimizes the chances of mouse over-shoot when attempting to copy its link address into an RSS reader, while not over-bearing the user. Many RSS readers auto-detects the RSS link. This article explains how Unsuitable renders the index page. The process involves several moving parts, so I will cover each separately. 2 Code Walk-Through for m-index.fsThe blog invokes m-index.fs when rendering the index page. Because the web server delegated full authority for page rendering to Unsuitable, it must communicate with the browser that it will send an HTML response back. ." Content-type: text/html" cr cr Next, we need to provide the HTML response to our web client by reading a template file from disk, and expanding macros contained within it. variable s variable end here s ! S" theme/index.html" slurp here end ! include response.fs The s variable marks the start of a text buffer, while end marks the end of the same. slurp reads in the entire contents of a file, and places it here. The response.fs file contains the logic necessary to parse this buffer, expanding macros as it proceeds, and finally outputs the result to the client. You may notice that we use the Forth dictionary to hold the file contents. Per ANSI's definition, here points to the next unallocated byte of memory therein. The following illustration shows the dictionary state prior to reading in the file:
+-------------------------------------------------------------+
|///////////////////| |
|///////////////////| |
+-------------------------------------------------------------+
^
|
here --+
After slurp finishes, the dictionary state looks like this:
+-------------------------------------------------------------+
|///////////////////|...file...| |
|///////////////////|...data...| |
+-------------------------------------------------------------+
^
|
here --+
Since Forth only keeps track of here, we use s and end to record the template data boundaries, for as we'll see later, here will advance again as a result of template expansion. A template's supported macros require definition prior to expansion of the template. In the case of the index template, we define last-five-posts as a template macro which produces our familiar list of five articles. require latest-5.fs : last-five-posts scan latest ; When we've finished with the template, we terminate the process. bye 3 Code Walk-Through for response.fsIncluding response.fs generates a response for the client's web browser. Unsuitable generates this response in two steps: first, it expands the template into a raw HTML string, and then it types it to stdout. The web server, which launched Unsuitable as a CGI handler, captures all output to stdout, forwarding it verbatim to the client. variable response here response ! expand response @ here over - type Expansion of the source text, in this case our template, occurs character-by-character. Unsuitable identifies macros to expand via the tilde (~) escape character. All other characters appear in the output, verbatim. : c s @ c@ ; : -eos s @ end @ < ; : ch c [char] ~ over xor if c, else drop macro then 1 s +! ; : expand begin -eos while ch repeat ; When Unsuitable identifies a candidate macro name, we advance beyond the tilde and parse the name, per normal Forth parsing rules: any string of characters terminated by a non-graphic character (e.g., whitespace or control characters). We then locate the Forth definition with this name, and execute it immediately. : -eon c 32 > -eos and ; : name s @ begin -eon while 1 s +! repeat s @ over - ; : macro 1 s +! name sfind if execute then ; Gforth's sfind provides a more friendly interface to ANSI's find word; otherwise, they behave identically. An obvious security risk exists here: attempting to invoke an arbitrary macro name may result in execution of arbitrary Forth words. Hence, a malicious user may interject ~bye in a template to cause an abnormal termination of Unsuitable. This obviously affects site availability. However, if they can edit a template file, they can also edit the raw source code to Unsuitable, potentially affecting reliability and serviceability as well. Unsuitable operates under tight security preconditions, including a properly secured HTTP server and remote console access. I think I made a reasonable trade-off here, for even if I offered tighter security, malicious users can still easily acquire access to the source code and invalidate my efforts. Unsuitable, therefore, trusts the user to write templates responsibly. 4 Code Walk-Through of latest-5.fsReading the definition of latest-five-posts, we see that it first invokes scan, which creates the set of five posts to showcase in the index output. I discuss scan in the next section. After building the set of posts to include, latest generates the corresponding HTML. : l5 e0 e e e e e drop ; : latest ['] l5 S\" <div class=\"blogArticleIndex\">" div ; The div word wraps some functionality of macro expansion in a <div> tag. The opening tags all differ, so it expects it as a parameter. The closing tags resemble each other, so it's factored into div. : div respond execute s" </div>" respond ; respond behaves similarly to type, except that it defers output to a temporary buffer. At this point, the response to the HTTP client remains unfinished, so we cannot just type it. Although l5 indicates we always scan through five articles, beginning at address e0 (discussed below), a fresh blog may have fewer articles. e conditionally generates an entry in the response. : e dup @ -1 xor if dup @ article! entry then cell+ ; When rendered to HTML, all entries consist of a title, a timestamp indicating when the article appeared, and a copy of the lead. : entry .title .timestamp .lead ; The lead, title, and timestamp all contain their own <div> tags. : t title .goblink ; : .title ['] t S\" <div class=\"blogArticleIndexTitle\">" div ; : >web ['] respond is type ['] c, is emit ; : >con ['] (type) is type ['] (emit) is emit ; : t >web timestamp .time ." — Samuel A. Falvo II" >con ; : .timestamp ['] t S\" <div class=\"blogArticleIndexTimestamp\">" div ; : c >web ." (continued...)" >con ; : l lead .gob body 1+ if ['] c S\" <div class=\"blogArticleIndexContinued\">" div then ; : .lead ['] l S\" <div class=\"blogArticleIndexLead\">" div ; The reader will observe that I hard-wire the author of all articles. As indicated in a previous article, I reject external article contributions due to the overhead required. I choose, instead, to ssh to my computer, and inject articles from command-line. Since only I have the credentials to my server, it follows only I can post articles. Thus, I author them all, and no reason yet exists to identify the author in the articles table.
Earlier, I mentioned the behavior of respond mimicks that of type, but stuffs its output into a temporary buffer. here points to the next available location in that buffer. Moreover, we learned earlier that get, retrieved the contents of a GOS object at here. It follows that : .gob gob! get, ; For the purposes of printing a title, however, we want to wrap our GOS object in a link to the relevant article. : .goblink S\" <a href=\"/blog2/blog.fs/articles/" respond articleId
s>d <# #s #> respond S\" \">" respond .gob S" </a>" respond ;
5 Code Walk-Through of latest.fsAt the top of latest-5.fs, we find the following chunk of seemingly obscure code. create e0 5 cells allot e0 5 cells -1 fill e0 4 cells + constant en-1 en-1 cell+ constant en variable ep include latest.fs Code such as this parameterizes the latest.fs instance included into the program. latest.fs exists to provide the n most recent blog articles in the database. I parameterize it this way because m-index.fs requires five of the latest posts, while the RSS feed requires sixteen. The e0 pointer refers to the first cell of the sort buffer. en-1 refers to the en-1th slot. en refers to the enth slot. Hence, the total memory consumed by the sort buffer spans the range [e0, en). Variable ep provides operating storage for the sort implementation.
I concede I don't recall why I chose the variable name e; thinking in terms of buffer entries helps. However, like I said at the start of this series, I never said this code was terribly pretty. Now that we understand how to parameterize latest.fs, we can now understand how scan works. At the highest level, we perform a full-table scan on the articles table, while using the above-configured buffer to perform an insertion sort as we go along. : scan articleIds dup /afields + begin 2dup < while
over articleIds - article! consider swap cell+ swap repeat 2drop ;
For each row in the article table, we need to consider its eligibility for sorting. Recall that you may identify empty article table rows by the presence of -1 in the articleId field. Clearly, we do not wish to waste sorting effort on non-existent rows. : consider articleId -1 xor if sort then ; During the sorting procedure, we always start at the start of the array and work our way forward, as buffer space permits. Only one of four possible events may occur while looping through the sort buffer. : sort e0 ep ! begin eol nil Te<Tr Te>=Tr again ; If we reach the end of the sort buffer, eol will break out of the loop. : eol ep @ en >= if r> drop then ; If we find an unused sort buffer slot, we'll just assign the current article ID there. : nil ep @ @ -1 = if article ep @ ! r> drop then ; If we find an existing article ID which was posted earlier than the article we're considering, e.g., Te<Tr, we need to vacate the array at that position, and store the article ID in the newly formed hole. This will eject the article identified in slot en-1. : insert ep @ dup cell+ over en-1 swap - move ;
: Te<Tr article ep @ @ article! timestamp swap article! timestamp <
if insert article ep @ ! r> drop then ;
Otherwise, the considered article must exceed the age of the article in the sort buffer slot, so we try to find another slot by considering the next. Note that we might not have any place to put it, which eol handles. : Te>=Tr [ 1 cells ] literal ep +! ; 6 Obvious EnhancementsLooking at the implementation of scan and the articles table, one can identify a ready-made optimization: let the articles table implementation provide words to access the latest n articles by maintaining an index into the articles table. If we update the index upon article submission, we can guarantee synchronization between the index and the article table at all times. I considered implementing the software this way for a long time. I chose not to persue this approach, however, because the blog worked good enough. Considering the articles table size, the overwhelming majority of work occurs inside the CPU's cache, even with the brute force methods I use. Implementing more sophisticated data structures not only makes the code larger (longer to compile; remember Unsuitable compiles on every page hit!), it makes the code harder to learn (just look at how much effort it took to write this page), harder to debug (more moving parts), and it makes the code slower due to increased cache spills. Understand that indices hold great value in expiditing data access. However, like all things, their benefits come at some cost. Unsuitable's articles table proves large enough to hold several years worth of postings, given my current article publication rate, yet it still consumes only 10KiB of storage space. Although scan currently provides a lack-luster Ω(mn) performance, the constant factor for each iteration proves quite small in practice. A b-tree or skip-list index offers Ω(log2 n) performance, apparently quite an improvement, but with a larger constant factor. While there exists a threshold where a large enough body of articles gives the performance advantage to using an index, at present I find the simpler implementation wins on several accounts: fewer cache fetches, smaller code size, and easier maintenance. 7 What's NextI decided that the depth-first traversal of the m-index.fs's source code offered the greatest opportunity for understanding to the reader. I would like to look at the RSS feed generation in the next article. I promise a shorter article for the elucidation of m-rss.fs, for it re-uses many of the parts found here. See you then. 8 Source Listing for m-index.fs#! /usr/bin/env gforth ." Content-type: text/html" cr cr require mappings.fs require general.fs require articles.fs require slurp.fs require time.fs require respond.fs require latest-5.fs : last-five-posts scan latest ; variable s variable end here s ! S" theme/index.html" slurp here end ! include response.fs bye 9 Source Listing for slurp.fsvariable h : open r/o open-file throw h ! ; : close h @ close-file throw ; : grab begin here 65536 h @ read-file throw dup allot 0= until ; : slurp open grab close ; 10 Source Listing for response.fs: c s @ c@ ; : -eos s @ end @ < ; : -eon c 32 > -eos and ; : name s @ begin -eon while 1 s +! repeat s @ over - ; : macro 1 s +! name sfind if execute then ; : ch c [char] ~ over xor if c, else drop macro then 1 s +! ; : expand begin -eos while ch repeat ; variable response here response ! expand response @ here over - type 11 Source Listing for template/index.html<html>
<head>
<title>
Sam's All New, New and Improved, New News news news
</title>
<link rel="stylesheet" href="/blog2/theme/css.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="/blog2/blog.fs/rss">
</head>
<body>
<table>
<tr>
<td width="100%" align="left" valign="top">
<div class="blogHead">
Falvotech.
</div>
<div class="blogSubhead">
Conquering the world is easy — what do you do with it afterwards?
</div>
</td>
<td width="48" align="right" valign="top">
<a href="/blog2/blog.fs/rss"><img width="48" height="48" src="/blog2/theme/rss48.png" border="0" /></a>
</td>
</tr>
</table>
<hr />
<div id="blogLastFivePosts">
~last-five-posts
</div>
</body>
</html>
12 Source Listing of latest-5.fscreate e0 5 cells allot
e0 5 cells -1 fill
e0 4 cells + constant en-1
en-1 cell+ constant en
variable ep
include latest.fs
: .gob gob! get, ;
: .goblink S\" <a href=\"/blog2/blog.fs/articles/" respond articleId
s>d <# #s #> respond S\" \">" respond .gob S" </a>" respond ;
: div respond execute s" </div>" respond ;
: t title .goblink ;
: .title ['] t S\" <div class=\"blogArticleIndexTitle\">" div ;
: >web ['] respond is type ['] c, is emit ;
: >con ['] (type) is type ['] (emit) is emit ;
: t >web timestamp .time ." — Samuel A. Falvo II" >con ;
: .timestamp ['] t S\" <div class=\"blogArticleIndexTimestamp\">" div ;
: c >web ." (continued...)" >con ;
: l lead .gob body 1+ if ['] c S\" <div class=\"blogArticleIndexContinued\">" div then ;
: .lead ['] l S\" <div class=\"blogArticleIndexLead\">" div ;
: entry .title .timestamp .lead ;
: e dup @ -1 xor if dup @ article! entry then cell+ ;
: l5 e0 e e e e e drop ;
: latest ['] l5 S\" <div class=\"blogArticleIndex\">" div ;
13 Source Listing of latest.fs: insert ep @ dup cell+ over en-1 swap - move ;
: nil ep @ @ -1 = if article ep @ ! r> drop then ;
: eol ep @ en >= if r> drop then ;
: Te<Tr article ep @ @ article! timestamp swap article! timestamp <
if insert article ep @ ! r> drop then ;
: Te>=Tr [ 1 cells ] literal ep +! ;
: sort e0 ep ! begin eol nil Te<Tr Te>=Tr again ;
: consider articleId -1 xor if sort then ;
: scan articleIds dup /afields + begin 2dup < while
over articleIds - article! consider swap cell+ swap repeat 2drop ;
1 Informal Unsuitable performance measurements suggests I/O limitations; over a typical 14-hop Internet connection, Unsuitable satisfies no less than 20 requests for content per second using single-threaded tests. People smarter than I in this field seem to agree 25 RPS meets the criteria for good-performing websites. If I do not publish a more formal performance analysis, please remind me! I'm interested in how well Unsuitable fares too. 2 In a previous article, I estimated a PC, only one PC-generation ago, executed instructions some 56,280 times faster than a 1.79MHz 6502. In this article, I'm comparing a modern (as I write this) PC against a 1.00MHz 6502. Contemporary CPUs double the clock speed, and the C64 ran slower than any contemporary Atari offering. |