Previous Article:
Book: Girl with a Pearl Earring
Next Article:
Bonsai Bugs Update
Related Articles:
Recent pyblosxom activity
The future of PyBlosxom - WSGI Integration
The future of PyBlosxom
Work on pyblosxom
Bonsai Bugs and pyblosxom update
Pyblosxom 0.9.1
Pyblosxom 0.9
Pyblosxom Performance Analysis
Pyblosxom doesn’t scale well. This fact is demonstrated on my site Bonsai Bugs and on Ted Leung’s site. Both these sites have very large numbers of posts. There is a definite long delay as posts are read from the disk and displayed to the user.
Pyblosxom can be run in one of two ways – dynamically or statically. Posts are rendered dynamically on both Bonsai Bugs and Ted’s weblog. Static page generation would appear to be the answer – after all Apache is highly optimised for the delivery of static pages. Unfortunately static generation is not yet optimised either; an analysis of static page generation shall be covered by a future post in this series. This post shall comprise an analysis of dynamic page generation only. I shall also outline ideas for future directions to be taken by pyblosxom in future post/posts.
In order to test the performance of dynamic page generation – I have used the Python profile module. If you would like to try this on your own setup then simply change the call to p.run() to the code below in pyblosxom.cgi:
import profile
profile.run('p.run()', '/path/to/results/dbg-profile-patch1.txt')
import pstats
p = pstats.Stats('/path/to/results/dbg-profile-patch1.txt')
p.sort_stats('cumulative').print_stats(100)
This code will wrap the call to Pyblosxom.run with a call to the profiler. The results of profiling will be written to the file dbg-profile-patch1.txt. The pstats module will then read these profiling results and write them to standard output (which will then appear at the end of the HTML sent to the browser). Python 2.3 has a method dump_stats which will print all results to a file name, however, I’m using Python 2.2 and found it just as easy to copy and paste the results to Excel for analysis there. The statistics will be ordered by the cumulative time spent in each function.
My full setup is as follows:
I recommend that you keep at least two different codebases available (you can switch between them quickly by keeping only a single codebase line uncommented in config.py). The first of these should be a stable CVS checkout used as a reference codebase – this should be the first codebase you profile – your hardware will provide different result to mine. The second codebase should be your current codebase – this is code that you are currently optimising. Along the way you could also create “stable” checkpoints to benchmark performance differences using different ideas.
The results files linked to below use only a single run. To get an average performance measurement it is usually preferable to run several tests and average these. However, the computer I’m using to test pyblosxom is running a minimal set of services and is not also the client machine making the request. Total running time of subsequent renderings tend to differ by very little. As will be seen very quickly in the results below this error is negligible in the context of the problems in pyblosxom’s architecture.
To fully understand these results I recommend downloading and viewing the stable CSV results as you read. The quick patches I wrote recently attempt to solve some of the problems presented here were also benchmarked and these current CSV results are also available. A patch with all my changes is available in the pyblosxom-devel archive . More information on the columns in these results files is available in the profile module documentation.
The total time taken to render the front page for this test setup is 13.59 seconds. Of this time, 12.18 seconds is spent in run_callback and 12.17 seconds of this time is spent in blosxom_file_list_handler. That means 89.55% of the total time to render this test weblog is spent in one function! Most of the problem lies in a single function – a closer analysis of this function is required.
The docstring for the blosxom_file_list_handler function defines it as follows:
This is the default handler for getting entries. It takes the
request object in and figures out which entries based on the
default behavior that we want to show and generates a list of
EntryBase subclass objects which it returns.
Understanding this function certainly isn’t rocket science. It is responsible for sorting the entries into a list of objects that have EntryBase characteristics. Why does it take so long!?
Delving a little deeper we find that most of the time spent inside this function is actually spent inside the FileEntry.<i>init</i>. At this point it is useful to look at a column other than the cumulative column – the ncalls column. FileEntry.<i>init</i> is called 2168 times – does this value look familiar? It should – it’s the number of entries that I used for testing. We are building a full FileEntry object for every entry – including those that are never shown!
The indiviudal actions performed by this function in setTime (sets the date fields for entries) and __populateBasicMetadata (sets metadata such as paths etc. for entries) are not in themselves very time consuming, however the number of times they are called make them so.
To understand why building all of these FileEntry objects every time is so pointless, given most peoples use of pyblosxom, it is also necessary to read the code for the blosxom renderer. The blosxom renderer when handed a list of FileEntry objects slices the list of objects down to the maximum number of entries set in your configuration (the py[‘num_entries’] variable). This can be seen in BlosxomRenderer._processContent.
The function blosxom_file_list_handler is the main problem in pyblosxom. Optimising the performance of this function will go a long way towards solving the pyblosxom performance problem for large number of entries.
In my patch, I have attempted to solve this problem by building a small proxy object that only calls filestat on each file to generate an mtime and a timetuple for each entry. These are used for sorting the entries into order. A valid list of entries is extracted and this valid list is sliced to be of size py[‘num_entries’]. From each of these proxy objects a real file entry object is created then created.
The results of this change and some changes to Walk and filestat are presented in the second CSV file The changes I have made to filestat actually slows down the performance of this example by about 0.5 seconds (the changes to Walk help improve performance dramatically when plugins relying on Walk are also used – the benefits of the changes to filestat are less clear). Despite these slowdowns these results run 62% faster than a stable CVS checkout!
Unfortunately, life is never that simple. Although my patch does show an excellent increase in performance it is still not ideal. For example, one common callback that is requested is a sort callback – obviously to be able to sort the entries there must exist an Entry object for every file. In order to implement a sort callback efficiently therefore it will be necessary to compile a detailed list of use cases for sorting so that only the required entry metadata need be computed for every entry.
In the static generation case I would assume (although I have not tested this so far) that it would be preferable to keep a full cache of full FileEntry objects between renderings of every page on the system. This is not covered by my patch (although my patch should still provide a performance increase in its current form for static rendering – currently FileEntry objects are created for all entries for every entry/index rendered). I shall test the performance of static rendering in a later post.
The architecture of pyblosxom requires some serious work. In my opinion there is much of it that is needlessly complex and could be simplified dramatically. As a first step for the most common case of using pyblosxom dynamically – my patch is useful. For the future – a rearchitecting/refactoring of the Entry hierarchy must be completed to achieve optimal performance and allow for additional callbacks, e.g. sort. As plugins are added the Walk function must also be refactored (it is not a bottleneck in this simple example but does quickly become one – I have some further CSV files obtained from profiling which show the Walk function taking an increased time slice when pycategories, pyarchives and pycalendar are installed). Together these two fixes already reduce the execution time substantially.
* * *