Work on pyblosxom · 1. September 2004, 15:30

Walk caching, metadata storage and comment moderation

While working on my upgrade to Bonsai Bugs I have implemented a number of changes to pyblosxom. These include:

  1. Caching of Walk results
  2. Metadata callback to facilitate storing mtime, post status, entry type, summary etc.
  3. Integrating comments plugin to use metadata facility and implement comment moderation

All of these ideas are intertwined, but I will present them in the order they were implemented.

The Walk function in pyblosxom is called by a number of separate plugins and by the main pyblosxom script itself. Each time it is called it must reread the filesystem, which is a large overhead for a weblog with many entries (the results of ‘stat-ing’ individual files were already cached by Will in a previous release, but the directory traversal itself was not). Solving this problem will help reduce the overhead of having a large number of entries but unfortunately will not eliminate it entirely.

In my implementation I make a single call to build a cache of a full walk of all of the entries on the file system. Subsequent calls to Walk return subsets of this cache of entries. In a revised version of my patch I have also integrated the filestat cache into my walk cache. This has the benefit of saving memory but does have the undesired effect of increasing the filestat execution time. More investigation is required here to achieve the right balance.

While the walk cache is being built, a new callback, cb_metadata, is invoked. The purpose of this callback is to read any metadata specific to the directory passed as an argument. For demonstration purposes I have implemented a simple scheme that reads an INI-like structure from a file named ’.pyblosxom-metadata’ using the Python standard library module ConfigParser. This callback could however read a more complex format, e.g. RFC 2822 metadata, as has been suggested for use as part of the more ambitious NewProject (NewProject was proposed to allow separation of content from its management, thus allowing pyblosxom to maintain its simplicity while allowing extended and advanced functionality). It could even read from a database, though I suspect that few in the minimalist world of pyblosxom would be interested in this.
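As a rough sketch of what such a reader might look like: the function name `read_directory_metadata` and the file layout (one INI section per entry filename, with keys such as `status`) are assumptions made for illustration, and the module is spelled `configparser` as in Python 3 rather than `ConfigParser`.

```python
import configparser
import os

METADATA_FILENAME = ".pyblosxom-metadata"

def read_directory_metadata(directory):
    """Return {section name: {key: value}} read from the directory's metadata file."""
    # interpolation=None keeps literal '%' characters in values from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so a directory without metadata yields {}.
    parser.read(os.path.join(directory, METADATA_FILENAME), encoding="utf-8")
    return {section: dict(parser.items(section)) for section in parser.sections()}
```

With this assumed layout, a metadata file containing a section `[first-post.txt]` with the line `status = draft` would come back as `{"first-post.txt": {"status": "draft"}}`. All values are returned as strings, so a stored mtime would still need converting by the caller.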

Finally, to demonstrate the usefulness and extensibility of this new structure, I added some code to the comments plugin. This code first builds a cache of the comments directory and then queries this cache for comments relating to the current story (this should be more efficient than the current implementation, which uses glob). It also means that if metadata exists for all comments then new comments can be marked with a default status assigned by the weblog owner. As Ted Leung pointed out though, currently the weblog owner must edit the metadata file to mark comments as live. This should be easily solved either through a simple administration interface or a shell script.
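The lookup and the moderation check could be combined along these lines. This is a sketch under stated assumptions, not the plugin's code: the function `comments_for_entry`, the status values `live` and `pending`, and the comment filename pattern `<entry>-<timestamp>.cmt` are all assumed here for illustration.

```python
import os

DEFAULT_STATUS = "pending"  # hypothetical default chosen by the weblog owner

def comments_for_entry(cached_paths, entry_name, metadata, show_status="live"):
    """Select an entry's comments from a cached path list, filtered by status.

    metadata maps a comment filename to a dict that may hold a 'status' key.
    """
    prefix = entry_name + "-"
    suffix = ".cmt"
    visible = []
    for path in cached_paths:
        filename = os.path.basename(path)
        if not (filename.startswith(prefix) and filename.endswith(suffix)):
            continue
        # Require a numeric timestamp so that entry "post" does not
        # pick up comments belonging to an entry named "post-script".
        stamp = filename[len(prefix):-len(suffix)]
        if not stamp.replace(".", "", 1).isdigit():
            continue
        status = metadata.get(filename, {}).get("status", DEFAULT_STATUS)
        if status == show_status:
            visible.append(path)
    return sorted(visible)
```

A comment with no metadata record falls back to the default status, so with a default of `pending` nothing appears on the weblog until its status is changed to `live`.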

Performance

The final work I have done on pyblosxom is to use the profile module to assess the total cost of running pyblosxom with a large number of entries. Once I have made more sense of the numbers I will make some notes here about other performance bottlenecks.
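For anyone wanting to take similar measurements, a minimal sketch follows. It uses `cProfile`, the lower-overhead sibling of the `profile` module, together with `pstats`. The helper `profile_call` and the stand-in workload `stat_everything` are hypothetical and do not call pyblosxom itself.

```python
import cProfile
import io
import os
import pstats

def profile_call(func, *args, top=10):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sorting by cumulative time shows which calls dominate the total cost.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()

def stat_everything(root):
    """Stand-in workload: stat every file under root, as an uncached walk would."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.stat(os.path.join(dirpath, name))
            count += 1
    return count
```

Calling `profile_call(stat_everything, "/path/to/entries")` returns the file count together with a report listing the most expensive functions by cumulative time. The path shown is a placeholder.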

* * *