Harvest stuff for GSoC: Week 7!

Time for my bi-weekly Harvest update! Everything this time went into the gsoc-client-stuff branch.

The first thing I learned (well, decided) is that YUI has incredibly dense, loopy and uncool documentation. I guess different people are compatible with different kinds of docs. As I read the YUI stuff I just couldn’t keep it all straight for some reason. Its landing page leads off in many directions: there’s an API reference that was written and designed to put me to sleep, an Examples section that doesn’t bother to link to the API reference (but is more pleasantly written), and a lot of extra listings in between.

All I could think about was how much I preferred JQuery’s docs. So, I dropped YUI for JQuery and nothing has exploded. In fact, it’s all gone wonderfully.

JQuery has a considerably simpler core than YUI, but there is lots of functionality in little self-contained plugins. Some of these are official parts of the project, others are external things linked from the plugin repository. (Granted, lots of redundancy there). It’s a different approach — YUI is richer from the start — but for me, JQuery wins by being so much easier to take in at a glance. That, and its documentation is a lot prettier. Instead of your boring automatically generated list of docstrings attached to function names, they have a beautifully presented web app. There are no frustrating stubs; everything I may think to use is there and explained in detail. And it’s a single destination. One page to learn everything there is to know about each bit of functionality in JQuery, including examples. Pretty cool.
JQuery, I promise I will always love you :)

I also learned about doing object-oriented Javascript. This one surprised me. I have used Javascript for lots of small jobs before, but I have never used it with anything big. I realized Javascript doesn’t have the class keyword I have come to love, but after some learning I am back to thinking it’s a pretty cool scripting language. That is mainly thanks to an excellent blog post by Stoyan Stefanov, all about doing classes in Javascript. Everything is an object, so of course we don’t need a class keyword! (Apparently we’re getting one some time this century, though, if everyone is nice to it).

Making something that feels like a class is a bit of a hack, but it works elegantly in the end. It's really just a function, and we put other functions and things inside it for methods and properties. I'm doing them like this:

    function Filter (dom_node) {
        var filter = this;  // a stable reference to the instance, for use inside callbacks
        this.get_value_serialized = function () {
            return null;
        };
    }

To create an instance we use Javascript's new keyword and call the function as usual, parameters included: new Filter(dom_node). The this keyword can't be trusted inside callback functions, because every call gets its own this, depending on how the function was called. (If it's called through a reference to the function stored somewhere else, things get messy.) On the other hand, anything inside Filter can always see that filter variable we created at the top, as well as the dom_node parameter, because both are captured in the function's closure.
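Here is a small sketch of why capturing filter matters. Filter follows the shape of the snippet above; set_value, make_change_handler and the plain objects standing in for DOM nodes are hypothetical additions so the example runs outside a browser.

```javascript
// Hypothetical extension of the Filter "class" from the post.
function Filter(dom_node) {
  var filter = this; // captured once, so callbacks can always reach the instance

  this.dom_node = dom_node;
  this.value = null;

  this.set_value = function (value) {
    filter.value = value;
  };

  // Returns a callback of the kind you might hand to an event system.
  this.make_change_handler = function () {
    return function (new_value) {
      // `this` here depends on how the caller invokes us, so ignore it
      // and go through the captured `filter` variable instead.
      filter.set_value(new_value);
    };
  };
}

var name_filter = new Filter({ id: "pkg_name" });
var on_change = name_filter.make_change_handler();

// Simulate an event system calling the handler with some unrelated `this`.
on_change.call({ unrelated: true }, "ge");
console.log(name_filter.value); // prints: ge

// For contrast: a handler that relies on `this` updates the wrong object.
var wrong_target = { unrelated: true };
var naive_handler = function (new_value) {
  this.value = new_value;
};
naive_handler.call(wrong_target, "xx");
// name_filter.value is still "ge"; wrong_target.value is now "xx".
```

The naive handler isn't broken as such; it just writes to whatever object the caller chose, which is exactly the trap described above.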

Those properties pointing at functions can, of course, be reassigned very easily to point at different functions. Lots of power here. For fancier stuff, including multiple inheritance (yes, it gets crazier), Mike Koss has an excellent article. In my case, I decided not to go into that. (Well, okay, I chickened out and then called it a decision.) Proper subclasses might make my code look smarter, but this whole chunk barely needs to do anything anyway, so I'm fine with it as it is.
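To make the reassignment point concrete, here is a small sketch that specialises a single Filter instance by pointing its get_value_serialized property at a new function, rather than declaring a subclass. The package-set filter and its comma-joined serialization are hypothetical examples, not Harvest's actual code.

```javascript
// Minimal Filter, with the default method from the post.
function Filter(dom_node) {
  this.dom_node = dom_node;
  this.value = null;

  // Default behaviour: nothing to serialize.
  this.get_value_serialized = function () {
    return null;
  };
}

// A hypothetical "package set" filter whose value is a list of set names.
var set_filter = new Filter({ id: "pkg_set" });
set_filter.value = ["ubuntu-desktop", "core"];

// Point the property at a different function for this one instance only.
set_filter.get_value_serialized = function () {
  return set_filter.value.join(",");
};

var plain_filter = new Filter({ id: "pkg_name" });

console.log(set_filter.get_value_serialized());   // prints: ubuntu-desktop,core
console.log(plain_filter.get_value_serialized()); // prints: null
```

Other instances keep the default, so this works well when only one or two filters need special behaviour.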

With that all out of the way, I worked on a really fun list of new stuff!

I have Harvest using XHR (XMLHttpRequest) in a few different places now. XHR is a wonderful thing that lets us request new data for the page directly and handle it in Javascript, without loading a whole new page. Combined with a short delay before sending, it means you can select a bunch of filters without flooding our server with a request for each one.

When a filter’s value is changed, it posts the query parameter that change represents to the global Results object. (Some goodies here: if you change the value of a filter in a positive way, that filter is selected implicitly). Instead of instantly yapping at the server, the Results object stores the parameter it receives in an object (dictionary style) and starts (or restarts) its timer. When the timer finishes, it uses JQuery’s $.get function to request new results for the selected filters, passing the function its dictionary of new parameters. (JQuery magically turns that into a querystring for us, so if something weird happens and everyone decides to use something else, Harvest will still work).
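The batching described above is essentially a debounce. Below is a minimal sketch under a few assumptions: send stands in for a call like $.get(url, params, callback), the 300 ms delay is made up, and URLSearchParams builds the query string so the example runs outside a browser (in the browser, jQuery's $.param does the equivalent conversion).

```javascript
// Hypothetical debounced Results object.
function Results(send, delay_ms) {
  var results = this;
  var pending = {}; // parameters collected since the last request
  var timer = null;

  // Filters call this whenever their value changes.
  this.post = function (name, value) {
    pending[name] = value; // later changes overwrite earlier ones
    if (timer !== null) {
      clearTimeout(timer); // restart the countdown
    }
    timer = setTimeout(results.flush, delay_ms);
  };

  // Sends a single request containing everything collected so far.
  this.flush = function () {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    var params = pending;
    pending = {};
    send(params);
  };
}

// Usage: three quick changes turn into one request.
var sent = [];
var results = new Results(function (params) {
  sent.push(new URLSearchParams(params).toString());
}, 300);

results.post("pkg_name", "ge");
results.post("pkg_set", "ubuntu-desktop,core");
results.post("pkg_name", "gn");
// About 300 ms later, `sent` holds a single query string.
```

Because post() restarts the timer every time, the request only goes out once the user pauses, and it carries the latest value for each parameter.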

The result surprised me. I’m still not doing much to limit the number of results, but even with a query that returns the most packages possible (around 3700 at once) the whole thing feels a lot quicker than it did. Funny…

The next one is expanding packages. When the user clicks a package, we send another http request asking for that package’s details. We get a snippet of HTML back from the server, throw it in the right element and reveal it with an animation. (Did I mention I love JQuery?).

By the way, I worked on the visual design for packages. Any comments? Thoughts on the arbitrary green highlight?

I don’t trust http requests going on unchecked. So, as is the convention, I added little “loading” indicators (from the very helpful and sickeningly popular ajaxload.info). It was really distracting to have these appear all the time, though, so I played with it and now the indicators will slowly fade in. A loading indicator should only become obvious if there is a long operation going on. If everything is normal and the operation is nearly instant, the indicator stays out of the way.
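One way to express "only become obvious during a long operation" is to make the indicator's opacity a function of how long the request has been running: invisible for a short grace period, then ramping up. This is a hypothetical sketch with assumed timings; in the browser, jQuery's animation methods (fadeIn with a long duration, for example) can produce a similar effect without manual arithmetic.

```javascript
// Assumed timings, not the values Harvest uses.
var INDICATOR_DELAY_MS = 300; // stay fully invisible for quick requests
var INDICATOR_FADE_MS = 1000; // then fade in gradually over this long

// Opacity (0 to 1) the indicator should have after `elapsed_ms` of waiting.
function indicator_opacity(elapsed_ms) {
  if (elapsed_ms <= INDICATOR_DELAY_MS) {
    return 0;
  }
  var progress = (elapsed_ms - INDICATOR_DELAY_MS) / INDICATOR_FADE_MS;
  return Math.min(1, progress);
}

console.log(indicator_opacity(100));  // prints: 0
console.log(indicator_opacity(800));  // prints: 0.5
console.log(indicator_opacity(5000)); // prints: 1
```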

I went two whole days without an Internet connection this week (oh, the humanity!), so I was thinking about people in similar predicaments. It sucks when an application decides it wants to download something and keeps on trying and trying, oblivious to my repeated attempts to convince it I have no connection. JQuery's $.get function returns an object that controls the HTTP request, so I store it on the package's node with JQuery's data method:

    package_node.data('xhr', xhr);

Later, if someone collapses a package that is still loading its details, we can do something like:

    var xhr = package_node.data('xhr');
    if (xhr) {
        xhr.abort();
    }

The request also gets a callback that runs when it finishes, whether it succeeded, the server returned an error, or it was aborted, so all the cleanup (like removing the loading indicator) can be done in one place. (The callback passed directly to $.get only runs on success; a callback for every outcome needs $.ajax's complete option.)
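The expand/collapse bookkeeping can be sketched without a browser. Here a Map plays the role of JQuery's data store, and start_request is a hypothetical helper that returns an object with an abort() method and calls done() however the request ends. All names are illustrative only.

```javascript
// Hypothetical sketch of abortable package-detail requests.
function PackageList(start_request) {
  var pending = new Map(); // package name -> in-flight request object

  this.expand = function (name) {
    var xhr = start_request(name, function done() {
      // Cleanup that must happen on success, error or abort,
      // e.g. removing the loading indicator.
      if (pending.get(name) === xhr) {
        pending.delete(name);
      }
    });
    pending.set(name, xhr);
  };

  this.collapse = function (name) {
    var xhr = pending.get(name);
    if (xhr) {
      xhr.abort(); // stop fetching details nobody is waiting for any more
    }
  };

  this.is_loading = function (name) {
    return pending.has(name);
  };
}

// A fake request that never completes on its own, so aborting is observable.
var requests = [];
var list = new PackageList(function (name, done) {
  var request = {
    name: name,
    aborted: false,
    abort: function () {
      request.aborted = true;
      done(); // mimic a completion callback firing on abort
    }
  };
  requests.push(request);
  return request;
});

list.expand("gedit");
console.log(list.is_loading("gedit")); // prints: true
list.collapse("gedit");
console.log(requests[0].aborted, list.is_loading("gedit")); // prints: true false
```

Keeping the cleanup in done() means collapse() only has to abort; it never has to know about loading indicators.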

The Django Debug Toolbar only kicks in when we load a whole new page inheriting from the base template, so I had to try something new to gauge performance. In this case I found a cool bit of middleware at DjangoSnippets.org. It adds a header to every response Django creates, saying how long it took to generate. It isn't much information, but it helps! What I like here is that it doesn't edit the page at all and it's a really small bit of code; it is as unobtrusive as possible. The data is always visible with a tool like Firebug or WebKit's Web Inspector (or telnet, if you're crazy). It is super easy to present this with Javascript, too:

    var time_header = xhr.getResponseHeader('X-Django-Request-Time');
    if (time_header) {
        $('#requeststats').html('Results generated in ' +
            parseFloat(time_header).toFixed(2) + ' seconds');
    }

And that is that! There is lots of polish left to do for next week, and a strange headache of a merge conflict to resolve. Assuming bzr doesn’t eat anyone’s work, things are really picking up!

Harvest GSoC project: week 5!

The last two weeks of my Harvest project have gone really well. It isn’t flashy and exciting and earth-shattering (yet), but I’m happy with it.

First of all, my branch now has Packages and Opportunities filters. I implemented a bunch of each, and they are resolved in order. First Harvest runs the package filters, then it filters the opportunities that belong to those packages, then it hides packages that have no visible opportunities after all that filtering.
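Here is a rough in-memory sketch of that three-stage order. Harvest does this with database queries, so the arrays, field names and example filters (including the "bitesize" and "upgrade" opportunity lists) are hypothetical stand-ins.

```javascript
// Hypothetical in-memory version of the filter order.
function apply_filters(packages, opportunities, package_filters, opportunity_filters) {
  // 1. Run the package filters.
  var kept_packages = packages.filter(function (pkg) {
    return package_filters.every(function (f) { return f(pkg); });
  });
  var kept_names = new Set(kept_packages.map(function (pkg) { return pkg.name; }));

  // 2. Filter the opportunities that belong to the surviving packages.
  var kept_opportunities = opportunities.filter(function (opp) {
    return kept_names.has(opp.package) &&
      opportunity_filters.every(function (f) { return f(opp); });
  });

  // 3. Hide packages left with no visible opportunities.
  var visible = new Set(kept_opportunities.map(function (opp) { return opp.package; }));
  return {
    packages: kept_packages.filter(function (pkg) { return visible.has(pkg.name); }),
    opportunities: kept_opportunities
  };
}

var packages = [
  { name: "gedit", set: "ubuntu-desktop" },
  { name: "gnome-panel", set: "ubuntu-desktop" },
  { name: "linux", set: "kernel" }
];
var opportunities = [
  { package: "gedit", list: "bitesize" },
  { package: "gnome-panel", list: "upgrade" },
  { package: "linux", list: "bitesize" }
];

var result = apply_filters(
  packages,
  opportunities,
  [function (pkg) { return pkg.set === "ubuntu-desktop"; }],
  [function (opp) { return opp.list === "bitesize"; }]
);
// gnome-panel survives step 1 but loses its only opportunity in step 2,
// so step 3 hides it; only gedit remains.
```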

After a long period of me obsessively poking things, Daniel talked me into making a merge request for my branch (to lp:harvest). He and James W gave it some really thorough code review, which has been a huge help! Now it all feels tidier and a little more justified, so I can use the code I wrote without that constant temptation to rewrite it.

At this point, performance is much improved. Details are only shown for one package at a time, so the incredibly long waits (and self-destructs) have gone away.

In addition, the filter system consistently does 4 SQL queries no matter what is being searched for. Of course, that doesn’t say it’s running as well as it can, but it does mean the system is more orderly. It only hits the database once for each type of data. (One query for package sets, one for opportunity lists, one for relevant source packages, one for relevant opportunities). This puts more thorough optimization within reach.

Rather than trying to optimize things for eternity, I'm off to something completely different for now: fancy Javascript to load new content in line with the page! The idea is that a big query will still have a visible wait (it always will!), but a complete page won't need to be created each time; just the specific results, with a nice spinner while they're loaded. Less jumping around, quicker and more fun.

For the first step, I have to admit I got a little carried away and started redoing the base template from scratch. (It should save me some time, really. The Javascript will be attached to some kind of DOM tree and I don’t want to fiddle with that twice). Still an early WIP, but I think this looks quite pretty :)

I am starting with a prototype written in straight HTML; no Django template markup yet. It's helping me straighten out my thoughts on how the filters' render() functions should link together. The interface is taller than I would like, so I will need to do something to collapse the Choice filters when they aren't being used. Oh, and we don't seem to have a logo. Still, the gist of it is there.

Next for me is figuring out YUI and making that filter interface on the left interactive. Really exciting! Javascript toolkits are amazingly fun to use.

Speaking of web design, I decided I didn’t like my nearly-stock Blogger template anymore so I spent way too long redesigning it (while cursing Blogger for being an awful platform to make templates with). What do you think?

Filters are addictive! (GSoC 2010 Week 3)

Over the last two weeks I have spent a bit more time than I expected playing with filters, filters and more filters. The first iteration went in maybe a bit too quickly, so I have done quite a bit of clean-up. Python is being too lenient with me!

I’m finally pulling my head out of the code and looking at this from further back, and from here things are looking pretty cool. There is still a lot of stuff in the implementation that needs fixing, but it is entirely possible to start playing with something else without wasting any effort.

 

Yes, that bar on the left is for controlling the filters! Clicking on a filter toggles it, which is done by loading a new URL with the appropriate parameters appended so Harvest knows what to do. The debug toolbar is reminding us that performance is still pretty ugly, but most of those SQL queries are from the opportunities list below each package. Some time in the next few weeks I will be moving that feature so it only appears on demand.
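A sketch of the "toggle by loading a new URL" idea, assuming the parameter layout floated in the Week 1 post, where pkg lists the enabled package filters (pkg=name,set). toggle_filter and the harvest.example base URL are hypothetical; the base is only there so relative paths can be parsed.

```javascript
// Hypothetical helper: returns the URL to load when a filter is clicked.
function toggle_filter(url_string, group, filter_name) {
  var url = new URL(url_string, "http://harvest.example/");
  var current = url.searchParams.get(group);
  var enabled = current ? current.split(",") : [];

  var index = enabled.indexOf(filter_name);
  if (index === -1) {
    enabled.push(filter_name); // filter was off: switch it on
  } else {
    enabled.splice(index, 1); // filter was on: switch it off
  }

  if (enabled.length > 0) {
    url.searchParams.set(group, enabled.join(","));
  } else {
    url.searchParams.delete(group);
  }
  return url.pathname + url.search;
}

console.log(toggle_filter("/opportunities/filter?pkg=name,set&pkg_name=ge", "pkg", "set"));
// prints: /opportunities/filter?pkg=name&pkg_name=ge
```

URLSearchParams percent-encodes the comma when it rewrites the query, so re-enabling a second filter yields pkg=name%2Cset; the server decodes that to the same value.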

As for the other filters, those are coming. At this point it should be pretty easy to get a rough implementation of everything that is on the left side in the mockup. They won’t be pretty, but they should work. Just extend one of the Filter base classes, add a custom function for manipulating a queryset, override the render function if desired, and it’s done!

All the HTML stuff for that filters bar had to be implemented in Python, rather than in the template. It's the same kind of design as Django's own forms module, so I think it is forgivable…

Speaking of filters, I’ve developed a dangerous obsession with The GIMP’s collection of goodies. Quite proud of this rendering I made for a friend. It’s proof he looks exactly like Tintin!

It was straightforward to do. I used the toon filter and oil painting filter to add outlines and soften the details. Then I whipped out the trusty liquid rescale to make his face a little taller (which worked beautifully, since the remaining details were insignificant). Finally I did some extra brush work to mimic Hergé's drawing style, making colours more solid and connecting some lines.

In the end it's probably too subtle, and it would have been easier (though slightly less exciting) to draw it from scratch. Still, it was an interesting exploration. I look forward to doing this to more people's heads in the future.

Making Harvest awesome! (My GSoC 2010 project) — Week 1(ish)

It is time that I babble about my project for Google Summer of Code 2010!

Over the summer, I will be working on Harvest with Daniel Holbach as my awesome mentor. Harvest is a neat web service, built with Django, that brings together opportunities (things that need doing) from many different places on the web. Those opportunities are all neatly linked to source packages, which are, themselves, nicely described by package sets like ubuntu-desktop, unr, xubuntu and kernel.

Suffice it to say, Harvest is a huge database with a lot of intricate detail. It can be really helpful to see all this stuff in one place. My task is to implement views that harness all that data and resist the temptation to throw it all at the user at once. The result should be really helpful, quick and fun to use. (This may be difficult; I can be a very wordy person).

The first stage to achieving that is filters.

I’m using an existing wiki page as a starting point, which has one very pretty mockup. I created a mockup of my own, full of crazy tangents and marked by the single worst green pencil I have ever used.

There are a few crazy and opinionated things I want to do. First of all, I want to go as far as possible without a numerical paging system. Numerical pages make no sense. The second thing I want to avoid is a detached searching system; I think searching could be integrated pretty smoothly with filtering, so everything on the site works under the same rules. Third, while I want this to work really well with Javascript, it has to work well without Javascript.

Last week I continued familiarizing myself with Harvest, and I made a start on the back-end for the filtering. I am building a generic structure that describes all the filters and what they do in terms of database queries. This way we can change the mechanics or add new filters quite easily (and have them do all sorts of neat things), and we’re secure from people executing arbitrary operations on Harvest; only filtering operations that we define can be run.
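The "only filtering operations that we define can be run" property comes from treating filter names as keys into a fixed table rather than as anything executable. Here is a minimal sketch of that idea in Javascript; the real back-end is Django, and the filter names, predicates and package data are hypothetical.

```javascript
// Hypothetical registry: the only filters that can ever run.
var PACKAGE_FILTERS = {
  name: function (pkg, value) {
    return pkg.name.startsWith(value); // name starts with value
  },
  set: function (pkg, value) {
    return value.split(",").includes(pkg.set); // belongs to any listed set
  }
};

// requested: { filter_name: parameter_value }, e.g. taken from the query string.
function run_package_filters(packages, requested) {
  var active = Object.keys(requested).filter(function (key) {
    // Unknown names (including inherited properties) are ignored, never run.
    return Object.prototype.hasOwnProperty.call(PACKAGE_FILTERS, key);
  });
  return packages.filter(function (pkg) {
    return active.every(function (key) {
      return PACKAGE_FILTERS[key](pkg, requested[key]);
    });
  });
}

var packages = [
  { name: "gedit", set: "ubuntu-desktop" },
  { name: "gnome-panel", set: "ubuntu-desktop" },
  { name: "gcc-4.4", set: "core" },
  { name: "linux", set: "kernel" }
];

var found = run_package_filters(packages, {
  name: "ge",
  set: "ubuntu-desktop,core",
  constructor: "anything" // not a registered filter, so it is skipped
});
// found contains only the gedit package.
```

Adding a filter means adding one entry to the table, which matches the goal of changing the mechanics or adding filters easily.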

The actual visual portion of the filtering stuff is entirely detached, and it is non-existent at this point. I expect the rest will be easy to pop together once the back-end is in place.

So, yeah, not very exciting yet. It’s Harvest, and the list of packages is filtered. The URL format is, of course, not at all decided but it’s most likely going to carry parameters with GET. That way people can link straight to Harvest showing a particular query. The one for that screenshot is /opportunities/filter?pkg=name,set&pkg_name=ge&pkg_set=ubuntu-desktop,core, which describes a package with a name that starts with “ge” and belongs to the “ubuntu-desktop” or “core” package set. I would love to know if that looks sane to anyone but me :)
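To check that the proposed format is at least easy to read back, here is a sketch that parses it, assuming that pkg lists the enabled filters and each pkg_&lt;filter&gt; parameter carries a comma-separated value. parse_filter_query and the base URL are hypothetical.

```javascript
// Hypothetical parser for the proposed query format.
function parse_filter_query(url_string, group) {
  // The base URL only exists so that relative paths can be parsed.
  var params = new URL(url_string, "http://harvest.example/").searchParams;
  var enabled = (params.get(group) || "").split(",").filter(Boolean);

  var result = {};
  enabled.forEach(function (filter_name) {
    var raw = params.get(group + "_" + filter_name);
    result[filter_name] = raw === null ? [] : raw.split(",");
  });
  return result;
}

var query = parse_filter_query(
  "/opportunities/filter?pkg=name,set&pkg_name=ge&pkg_set=ubuntu-desktop,core",
  "pkg"
);
// query.name is ["ge"]; query.set is ["ubuntu-desktop", "core"].
```

A filter that is listed in pkg but has no pkg_&lt;filter&gt; value simply gets an empty list, which leaves the decision about what "empty" means to the filter itself.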

Exciting, eh?

This week I will be doing a bit more with the invisible half of the filtering system, then I’ll make a start on the user interface half. Still nothing pretty (yet), unless the urge overpowers me; just drawing the widgets and having them do the right thing when they are clicked. I want to test my approach before it’s stuck that way.