Sunday, May 2, 2010

A New Design for Managing Categories

If you haven't read the previous blog on Categories, please read it now.  Otherwise, the examples in this entry won't make a lot of sense.

In Agenda, Items are managed by the categories they belong to.  Agenda treats Categories like 'Initial View' and 'PC Platforms' (which seem to be Views that present groups of items) identically to 'Game' and 'Manufacturer' (which seem to be groupings of items), and even identically to 'Dell' and 'Xbox360' (which seem to be attributes assigned to Items).

This simple approach is the root of Agenda's great power.



In the previous column, I mentioned two 'bugs' - not searching the Category tree deep enough, and starting every View at the top of the tree.

There is an Agenda 'workaround' for the first bug.  You can use a Category's 'Assignment Actions' or 'Assignment Conditions' to add explicit or conditional assignments to the parent.  But remember, it's a tree.  If you are deeper in the Category tree, you might have to add assignments to the parent-of-parent as well.

That seems too complicated - especially since a View/Section already knows where in the tree this Item is being viewed.  The View is almost always the parent of the Section, so it's more reasonable to make that assignment automatically.  If Agenda had done that, I wouldn't have realized there was a problem unless I created another sub-view to my sub-view (the parent-of-parent link).

There's an Agenda 'workaround' for the 'Views at the Top' bug as well.  Before you create a View, you can manually create the Category that you intend to use as your first Section, and assign it where you want in the Category tree.  Blech.




Let's talk about something much more serious.

Agenda looks to have a totally flexible way of organizing data.  But underneath, it only allows a tree hierarchy of categories - for example Hardware / PC / Dell / Latitude / D620.    It's a nice way to classify things, like a Dewey Decimal System at the Library.

Maybe it's not the best approach.  Hierarchical trees are very limited.  Consider, for example, that I might also have a hardware category for 'Smartphones' (with Manufacturers like Palm, RIM, Motorola, Google, HTC, etc).

But Apple has both PCs and Smartphones.  If I stick Apple in as a Smartphone, then I'm stuck on how to classify their PCs.  And what do I do if Apple releases a game?

Today we use relational models for data instead of the hierarchical models that we used when Agenda was written.  In database terms, a 'Relation' is a set of objects that have the same attributes - think of it as a big spreadsheet-like table with rows representing items and columns representing attributes of those items.

In a relational system, 'Apple' could be an attribute of each of 'Smartphones', 'PCs', and 'Game' without any confusion.

Further, sometimes I might want to reorganize the way I think about the data.   Maybe I want Apple and Dell to be at the top of my tree, and drill down on their product lines and then their products.

In a relational system, this is just a different wording of the query I use to ask for data.  But in a hierarchical system, it's like pulling your arteries out and repositioning them.  There is serious pain involved - especially once the tree grows a bit.
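To make the 'different wording of the query' point concrete, here is a minimal Python sketch.  The product rows and the drill_down helper are made up for illustration (only the Latitude D620 comes from the example above); the point is that the same flat rows can be drilled into in either order without moving any data.

```python
from collections import defaultdict

# Hypothetical rows: each row is an item, each key is an attribute.
PRODUCTS = [
    {"name": "Latitude D620", "manufacturer": "Dell", "kind": "PC"},
    {"name": "MacBook", "manufacturer": "Apple", "kind": "PC"},
    {"name": "iPhone", "manufacturer": "Apple", "kind": "Smartphone"},
    {"name": "Pre", "manufacturer": "Palm", "kind": "Smartphone"},
]


def drill_down(rows, first, second):
    """Group rows by attribute `first`, then by attribute `second`."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row[first]][row[second]].append(row["name"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {key: dict(inner) for key, inner in tree.items()}


# Same data, two different "trees": only the query changes.
by_kind = drill_down(PRODUCTS, "kind", "manufacturer")
by_maker = drill_down(PRODUCTS, "manufacturer", "kind")

print(by_kind["PC"])      # manufacturers under PC
print(by_maker["Apple"])  # product kinds under Apple
```

Apple shows up under both 'PC' and 'Smartphone' in the first grouping and sits at the top in the second, and nothing had to be reclassified to get there.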




Let's be realistic.  Changing the Category tree to a full relational model just so I can classify two types of Apple products is like crushing a nut with a sledgehammer.  I need something more powerful than a tree, but only a little more powerful.

Well, a tree is just a constrained network.  Here's a sample of other types of networks, showing different constraints.












We regularly use network diagrams other than trees.  Mind Maps, Flow Charts, and Pert Charts are common examples.

On reflection, the network style I want for my version of Agenda is a 'Semantic Network'.  It's a popular way to represent knowledge, and pretty easy to navigate.












For now, I'm going to stick with the most basic style of Semantic Network - a directed graph with edges (lines) representing either of two concepts: 'IsA' or 'HasA'.

I'm going to try to build this graph as automatically as Agenda builds the current Category tree graph.


Remember that Views are just categories in Agenda, and so are Columns.  Sections are simply groupings of Items that have a Category assignment.



The rules for adding links as the user adds views, columns, and category assignments are as follows:

When a user creates a View, each Column's Category gets a 'HasA' link to the View's Category.  When a Category is assigned to a Column, then that Category gets an 'IsA' link to that Column's Category.  No changes to the relationship between Items and Categories.



Here are the assignments that would have been created for the example I used in the previous posting.

"Initial Section"  HasA    "Hardware"
"Initial Section"  HasA    "Game"

"Xbox360"          IsA     "Hardware"
"Far Cry 2"        IsA     "Game"
"Wow"              IsA     "Game"

"PC Platforms"     HasA    "Manufacturer"
"PC Platforms"     HasA    "OS"

"Dell"             IsA    "Manufacturer"
"XP"               IsA    "OS"


Almost done.  This is simply a tree with each view starting at the top (like Agenda).   Every time a Section is specified in a View, I will add an IsA from the View's Category to the Section's Category.   Like this:

"PC Platforms"     IsA    "PC"

This is analogous to adding the 'parent' link like I would have manually done to fix the 'parent' bug.

And I will assign the View's Category to the Item as a conditional link (i.e., on the basis of being in that View).  Agenda doesn't have to do this explicitly because it implicitly knows that categories under a View belong to that view.
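The link-building rules so far can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the function names, the (source, relation, target) triple layout, and the sample item are all hypothetical, and the direction of each link follows the listings above (View HasA Column, Category IsA Column, View IsA Section).

```python
links = []            # (source, relation, target) triples
item_categories = {}  # item -> set of categories assigned to it


def create_view(view, columns):
    """Rule 1: a View gets a HasA link to each of its Columns."""
    for column in columns:
        links.append((view, "HasA", column))


def assign_to_column(category, column):
    """Rule 2: a Category assigned to a Column IsA that Column's Category."""
    links.append((category, "IsA", column))


def add_section(view, section):
    """Rule 3: a View IsA each Section specified in it."""
    links.append((view, "IsA", section))


def enter_item(item, view):
    """Rule 4: an Item entered in a View is assigned the View's Category."""
    item_categories.setdefault(item, set()).add(view)


create_view("Initial Section", ["Hardware", "Game"])
assign_to_column("Xbox360", "Hardware")
assign_to_column("Far Cry 2", "Game")
assign_to_column("Wow", "Game")
create_view("PC Platforms", ["Manufacturer", "OS"])
assign_to_column("Dell", "Manufacturer")
assign_to_column("XP", "OS")
add_section("PC Platforms", "PC")
enter_item("My work laptop", "PC Platforms")  # hypothetical item

for source, relation, target in links:
    print(f'"{source}" {relation} "{target}"')
```

Running it prints the same ten links as the listings above, in the same order.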




Both the bugs are fixed.   Now if I enter an item into the "PC Platforms" view, it will show in Initial Section properly because I'm looking for Hardware, and the system can find the rules:

"PC Platforms"     IsA      "PC"
"PC"               IsA      "Hardware"


So it knows that "PC Platforms"  IsA  "PC"  IsA  "Hardware".  Because the Item entered in the view is assigned "PC Platforms", it is "Hardware".
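Following the chain is just a walk along IsA links.  Here is a small Python sketch of that lookup; the IS_A table and the is_a function are hypothetical names, and the table holds only the rules quoted in this post.

```python
# IsA rules from the post: category -> list of categories it IsA.
IS_A = {
    "PC Platforms": ["PC"],
    "PC": ["Hardware"],
    "Xbox360": ["Hardware"],
}


def is_a(category, target, rules=IS_A):
    """Return True if `target` can be reached from `category` via IsA links."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue  # already explored; also guards against accidental loops
        seen.add(current)
        stack.extend(rules.get(current, []))
    return False


print(is_a("PC Platforms", "Hardware"))  # two hops: PC Platforms -> PC -> Hardware
print(is_a("Hardware", "PC Platforms"))  # links only run one way
```

An item assigned "PC Platforms" therefore counts as "Hardware", but "Hardware" does not count as "PC Platforms".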



And no issues with 'Parent of Parent'.  If I added another subview - perhaps about the video cards of PC Platforms - then I would automatically have another rule like "Video Cards" IsA "PC Platforms".



I'm going to put in some safeguards.  There is a problem with loops in networks: if someone added the rule 'Dell IsA Initial Section' then everything in the database becomes a Dell.

So I'm going to number each Category's distance from 'Initial Section' and require that edges always move 'away' (much like a tree, just a bit less constrained).
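One way this numbering could work, as a rough Python sketch.  Everything here is an assumption on my part: the depth table, the add_link name, and the choice to fix a category's number the first time it is linked in.  Because every accepted link runs from a smaller number to a strictly larger one, no chain of links can ever come back to where it started.

```python
depth = {"Initial Section": 0}  # category -> number assigned at first insert


def add_link(child, parent):
    """Place `child` below `parent`, rejecting links that point back toward the root."""
    if parent not in depth:
        raise KeyError(f"unknown parent category: {parent}")
    if child not in depth:
        # New category: it sits one step further out than its parent.
        depth[child] = depth[parent] + 1
        return
    if depth[child] <= depth[parent]:
        raise ValueError(f"{child} -> {parent} does not move away from the root")
    # Existing category that is already further out: a second parent is fine.


add_link("Hardware", "Initial Section")
add_link("PC", "Hardware")
add_link("Smartphones", "Hardware")
add_link("Apple", "PC")
add_link("Apple", "Smartphones")  # allowed: Apple is further out than Smartphones

try:
    add_link("Hardware", "Apple")  # would close a loop
except ValueError as error:
    print("rejected:", error)
```

The check is stricter than it needs to be (it also rejects some harmless links between categories at the same distance), which fits 'much like a tree, just a bit less constrained'.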

With this rule, I think I can display the 'Tree' pretty easily in my version of Category Manager.  It will look like a tree, but some categories (and all their child categories) will show up in more than one place.  
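Here is roughly what that display could look like, as a hypothetical Python sketch (the CHILDREN table and render function are invented for illustration).  It assumes the network has no loops, which is what the distance rule is for; with a loop, the recursion would never end.

```python
# Hypothetical network: Apple has two parents, so it is not a tree.
CHILDREN = {
    "Hardware": ["PC", "Smartphones"],
    "PC": ["Dell", "Apple"],
    "Smartphones": ["Palm", "Apple"],
}


def render(category, children=CHILDREN, indent=0):
    """Return indented lines for `category` and everything below it."""
    lines = ["  " * indent + category]
    for child in children.get(category, []):
        # A shared category is simply rendered again under each parent.
        lines.extend(render(child, children, indent + 1))
    return lines


print("\n".join(render("Hardware")))
```

Apple appears once under PC and once under Smartphones, and anything filed under Apple would be repeated in both places.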

I'm going to follow the conditional and explicit assignment rules of Agenda, not changing much except the topology.  I'm thinking this isn't a huge change and won't break Agenda.  And maybe make it a bit easier for users to understand.


Down the road, I can experiment with letting users define their own relationships (like the "LivesIn" in the Whale example).   And if I'm really brave, I can add negative rules like 'NotIsA' and 'NotHasA'.  Maybe I can build in a 'Mind Map' tool to help the user view and build their models.




This change to managing Categories will make Agenda a lot smarter.  I'm guessing that it will also make Agenda about 100 times slower.   Not a worry.  

The original Lotus Agenda was written in 1990, and we've run through 13 iterations of Moore's Law doubling horsepower since then.  That means my server should be about 8,000 times more powerful than the 486 processors of those days.
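The back-of-the-envelope arithmetic, spelled out in Python.  The only inputs are the 1990 date and the 13 doublings quoted above, plus the 2010 date of this post; the variable names are mine.

```python
doublings = 13
speedup = 2 ** doublings  # each iteration doubles the horsepower

years = 2010 - 1990
months_per_doubling = years * 12 / doublings

print(f"{speedup:,}x faster")                               # 8,192x faster
print(f"one doubling every {months_per_doubling:.1f} months")
```

So 'about 8,000 times' is 2 to the 13th, and 13 doublings in 20 years works out to one doubling roughly every 18 months.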

Can't think of a better way to burn off that extra horsepower.
