Tuesday, October 12, 2010

624 #12 Constellation Models: Sharon

Introduction
This paper introduces Constellation Models (pictorial structure models) from computer vision as a method of sketch recognition. Constellation Models are used to identify the subcomponents of complex shapes, such as faces. Labels are applied to shapes using both the features of individual shapes and the features shared between shapes. When recognizing the parts of a face, for example, this means identifying that an ear has a roughly ear-like shape and that it lies a certain distance to the side of the eye and nose.

To make this method more efficient, since it is at heart an O(n^2) algorithm, Sharon defines certain sub-shapes as mandatory or optional. The mandatory shapes are a smaller subset of the total shapes and are identified first. Once the mandatory shapes are labeled, they serve as a solid anchor for labeling the optional shapes. The method is further optimized with a multipass search that starts with a very optimistic threshold for identifying shapes and progressively lowers it, identifying more shapes and narrowing down the search space as it goes.
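
The mandatory-first, multipass idea can be sketched roughly as follows. This is only a toy illustration, not Sharon's implementation: the part names, the model positions, the scoring function, and the threshold values are all invented here, and each stroke is reduced to a single centre point.

```python
from itertools import permutations
from math import hypot

# Hypothetical toy model: expected (x, y) centre of each part in a normalised face.
MANDATORY = {"left_eye": (0.3, 0.4), "right_eye": (0.7, 0.4), "mouth": (0.5, 0.8)}
OPTIONAL = {"nose": (0.5, 0.6), "left_ear": (0.05, 0.5)}
MODEL = {**MANDATORY, **OPTIONAL}


def pair_score(label_a, pos_a, label_b, pos_b):
    """Score in (0, 1]: how well the offset between two strokes matches the model."""
    (ax, ay), (bx, by) = MODEL[label_a], MODEL[label_b]
    expected = (bx - ax, by - ay)
    actual = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    return 1.0 / (1.0 + hypot(actual[0] - expected[0], actual[1] - expected[1]))


def label_mandatory(strokes):
    """Exhaustively assign strokes to the mandatory labels; keep the best pairwise fit."""
    labels = list(MANDATORY)
    best, best_score = None, -1.0
    for combo in permutations(strokes, len(labels)):
        score = sum(
            pair_score(labels[i], strokes[combo[i]], labels[j], strokes[combo[j]])
            for i in range(len(labels))
            for j in range(i + 1, len(labels))
        )
        if score > best_score:
            best, best_score = dict(zip(combo, labels)), score
    return best


def label_optional(strokes, assigned, thresholds=(0.9, 0.8, 0.7)):
    """Multipass: accept only confident optional labels first, then lower the threshold."""
    assigned = dict(assigned)
    for threshold in thresholds:
        for stroke_id, pos in strokes.items():
            if stroke_id in assigned:
                continue
            free = [label for label in OPTIONAL if label not in assigned.values()]
            # A candidate label is only as good as its worst fit against the anchors.
            scored = [
                (min(pair_score(a_label, strokes[a_id], label, pos)
                     for a_id, a_label in assigned.items()), label)
                for label in free
            ]
            if scored:
                score, label = max(scored)
                if score >= threshold:
                    assigned[stroke_id] = label
    return assigned
```

Here `strokes` is assumed to be a dict mapping a stroke id to its centre point, with at least as many strokes as there are mandatory parts. Calling `label_optional(strokes, label_mandatory(strokes))` labels the anchors first and then only compares each remaining stroke against those anchors, which is where the reduction in search space comes from in this sketch.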

Discussion
I like the Constellation method mostly because of the simplicity of the feature vector. The features calculated for each stroke are very simple, but when used in conjunction with the relative positioning and shape of other strokes they work well as a means of labeling. It is amazing to see some of the example sketches, which have wildly varying sub-shapes but are all identified correctly thanks to their relative positioning.

I do, however, have some problems with the paper as a whole. There does not seem to be much evaluation of recognition rates or failure points. There is a lot of discussion of the speed of the method, which is important, but what good is speed if you are mislabeling a large portion of the elements?

Another question is how the method deals with multi-stroke shapes. Most of the example sub-shapes are single-stroke, though there are a few that must be multi-stroke. It is not clear whether these are grouped and labeled as a single shape or treated as multiple shapes with the same label.

624 #11 LADDER: Hammond

Introduction
LADDER is a system for describing and recognizing hand-drawn shapes using a human-readable geometric description language. This is meant to allow system designers to create sets of shapes that can be recognized as part of a visual grammar for a certain domain. In addition to shape recognition, LADDER also allows designers to describe how recognized shapes should be displayed, what actions can be performed on them, and what actions they perform on other shapes.

Shape structures can be made up of basic recognition shapes, such as lines, polylines, and circles, as well as previously defined shapes. Constraints can then be placed on the relationships between these subshapes.

The system uses these descriptions in a bottom-up approach: it starts by identifying basic shapes from strokes, then constructs many higher-level shapes from each basic shape. Eventually each shape is part of one high-level shape.
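
To give a feel for what "subshapes plus constraints" means in practice, here is a small Python sketch. It is not LADDER syntax and not Hammond's recognizer: the `Line` class, the `PLUS` definition, the constraint functions, and the tolerances are all hypothetical, and lines are assumed to have non-zero length.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Line:
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def midpoint(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    @property
    def direction(self):
        return (self.x2 - self.x1, self.y2 - self.y1)


def perpendicular(a, b, tol=0.1):
    """True when the cosine of the angle between the two lines is close to zero."""
    (ax, ay), (bx, by) = a.direction, b.direction
    cos_angle = (ax * bx + ay * by) / (hypot(ax, ay) * hypot(bx, by))
    return abs(cos_angle) <= tol


def same_center(a, b, tol=0.5):
    """True when the midpoints of the two lines lie within tol of each other."""
    (ax, ay), (bx, by) = a.midpoint, b.midpoint
    return hypot(ax - bx, ay - by) <= tol


# Hypothetical definition of a "Plus" shape: two lines and constraints between them.
PLUS = {"name": "Plus", "arity": 2, "constraints": [perpendicular, same_center]}


def recognize(definition, primitives):
    """Bottom-up step: return every group of primitives satisfying all constraints."""
    found = []
    for group in combinations(primitives, definition["arity"]):
        if all(check(*group) for check in definition["constraints"]):
            found.append((definition["name"], group))
    return found
```

Given a list of already-recognized `Line` primitives, `recognize(PLUS, lines)` returns the pairs that form a plus sign. In a hierarchical setup those results could themselves be fed into later definitions as components, which is the bottom-up flow described above.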

Discussion
LADDER is very useful for domains with simple geometric shapes that are easy to describe either individually or as part of a hierarchy. This becomes problematic with more complex individual shapes that are hard to describe. It might be interesting, as mentioned in LADDER's future work, if a designer could automatically generate a LADDER description of a complex shape, both to make it easier on the designer and to show which shapes might be problematic for LADDER to describe at all. In such cases it would seem useful if a designer could use some other manner of recognition to describe a particular shape, but could then use that shape in a later LADDER description. This way LADDER could incorporate more complex shapes while still keeping the geometric descriptions to create composite complex shapes.

Tuesday, October 5, 2010

The question we all want answering....

"Does the generative annealing activation information composition visualization in the hot space, driven by information semantics, a user-interest model, and a responsive crawler, help people to be creative?"
From Provocative Stimuli, Kerne et al., CHI 2011