A series of unfortunate thoughts.: 2010

Thursday, December 16, 2010

624 #28 Dixon,Dasarp,Hammond - iCanDraw?

Introduction

This paper presents an assistive feedback system for hand drawing human faces. It is meant to help teach students how to draw a human face and help them see where to make corrections to make their drawing more accurate. The system does this automatically by constructing a drawing template from the reference image that the user has chosen to draw from. The user's drawing strokes are then compared against the template to see if they need correction and feedback.

Discussion

iCanDraw is a nice, different way to apply sketch recognition ideas. I personally would like to use some of the design principles from this paper to create other such systems, for example one for teaching users one- and two-point perspective drawing.

Monday, December 13, 2010

624 #27 Davis, Colwell, Landay - K-Sketch

Introduction

K-Sketch is an animated sketching system, a tool that aims directly at one of the main uses of modern pen-based interfaces: drawing and animation. The focus of the work is not only to build a tool that makes animation simpler and easier, but also to develop and refine interaction techniques for sketch workspaces. In a user study comparing K-Sketch and PowerPoint on an animation task, users learned K-Sketch much faster and generally felt that K-Sketch required much less cognitive load.

Discussion

Frankly, it is a little surprising that they compared creating animations in K-Sketch against PowerPoint. While PowerPoint can do animations, it is well known that Microsoft products usually require a high cognitive load and are not particularly easy to use. Additionally, an actual animation program, such as Flash or something like it, seems like a better comparison. Still, I expect the results would have been very similar, because the controls of K-Sketch seem well thought out and do not overly crowd the interface.

In other news the manipulator tool used in K-Sketch looks very similar to the manipulator tool in Prezi (or vice versa rather).

624 #26 Gabe Johnson - Picturephone

Introduction

Picturephone, as discussed earlier on this blog, is a game developed to build sketch recognition datasets. The game asks multiple players to simultaneously draw a picture based on a text description. This results in the "same" drawing but visually the results are completely different. Generating sketch testing data with this technique allows for a much broader set of images, but at the same time they still depict the same concept.

Discussion

See discussion here

624 #25 Eitz, Hildebrand, Boubekeur, Alexa - Image retrieval based on sketched feature lines

Introduction

This paper describes an image descriptor that allows retrieval of images from a database based on a sketched drawing as input. The descriptor, a tensor descriptor, is proposed for use instead of an edge histogram descriptor. The tensor descriptor works by finding the direction of the image gradient in subsections of the image. This descriptor is calculated for every image in the database; then it is calculated for the sketch that is submitted as a query, and the descriptors in the database are compared against it. The tensor descriptor performed better than the edge histogram, and subjectively the users of the system preferred the results it returned.
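To make the idea of a per-cell gradient direction descriptor concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the grid size, the central-difference gradient, the normalisation and the distance function (`cell_tensors`, `tensor_distance`) are all assumptions chosen for illustration. Each grid cell accumulates a 2x2 structure tensor from the gradients inside it, which captures the dominant edge direction without caring about the sign of the gradient.

```python
import math

def cell_tensors(image, cells=2):
    """Return one normalised structure tensor (xx, xy, yy) per grid cell.

    `image` is a list of rows of grey values. Assumption: central
    differences for the gradient and a cells x cells grid.
    """
    h, w = len(image), len(image[0])
    sums = [[[0.0, 0.0, 0.0] for _ in range(cells)] for _ in range(cells)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            t = sums[y * cells // h][x * cells // w]
            t[0] += gx * gx
            t[1] += gx * gy
            t[2] += gy * gy
    out = []
    for row in sums:
        for xx, xy, yy in row:
            # Frobenius norm of the symmetric 2x2 tensor [[xx, xy], [xy, yy]]
            norm = math.sqrt(xx * xx + 2 * xy * xy + yy * yy) or 1.0
            out.append((xx / norm, xy / norm, yy / norm))
    return out

def tensor_distance(a, b):
    """Sum over cells of the Frobenius distance between the two tensors."""
    return sum(
        math.sqrt((p[0] - q[0]) ** 2 + 2 * (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2)
        for p, q in zip(a, b)
    )

def line_image(size, index, vertical):
    """Hypothetical test image: a single white line on a black background."""
    return [[1.0 if (x if vertical else y) == index else 0.0
             for x in range(size)] for y in range(size)]

sketch = cell_tensors(line_image(8, 3, vertical=True))
similar = cell_tensors(line_image(8, 4, vertical=True))
different = cell_tensors(line_image(8, 3, vertical=False))
```

A query would then rank database images by `tensor_distance` to the sketch's descriptor. Here the shifted vertical line ends up closer to the sketch than the horizontal line does.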

Discussion

Where can I find one of these? It would be a great way for artists to search for material, for people to be able to look up places or things that they remember visually. It would be interesting to try to visualize your own dreams by sketching out thoughts and seeing what is returned by the search. I hope that Flickr or some other large scale repository picks up on this as a search technique.

624 #24 Gabe Johnson, Ellen Yi-Luen Do - Games for sketch data collection

Introduction

This presentation introduces two games that can be used to collect sketch samples as well as associated metadata or textual descriptions for context. The first is Picturephone, which is very similar to the party game of the same name, except that it is played on a website and players can submit new descriptions for drawing, submit new drawings from concepts, or rate drawings. The second game is Stellasketch, a game which asks a user to draw an image based on a prompt, which other players will then see and use as a clue to figure out the current theme of image prompts. Both games would provide traditional sketch samples, that is, stroke and timestamp information, as well as ratings and descriptions, all of which are useful data for training sketch recognition systems.

Discussion

Systems such as this are some of my favorite topics in computer science. The use of people as "Mechanical Turks" while at the same time entertaining them and providing solid data is a very appealing idea. If you can present people with motivation to help solve problems that are inherently human, while at the same time keeping the task straightforward and engaging, you will have no lack of useful data. Imagine if solving puzzles in your favorite game actually solved real world problems at the same time, wouldn't it be ten times more addictive and interesting to play? (I am looking at you PopCap!)

624 #23 Hinckley et al. - InkSeine

Introduction

InkSeine is a sketch input overlay interface that is focused on providing search functionality to support active note taking tasks. The system lets the user mingle ink notes, search queries and documents in one space, acting as a sketch workspace for research, design or creative activities.

Discussion

It is interesting that the main focus of InkSeine is in-situ searching, though it makes sense when their primary user task is note taking and analysis of their personal document collections. I would think personally that users would not want handwritten notes all over their computer workspace, but the idea of having that information available for later access in its original context seems compelling.

624 #22 Mori, Igarashi - Plushie

Introduction

Plushie builds off of Teddy and provides an interface not just to create 3D models but to construct a simulated plush toy. From this simulation the system is able to generate patterns and instructions for assembling the plush toy in real life. Editing can be performed either using the techniques of Teddy on the 3D representation, or directly on the 2D construction pattern. While editing, the 3D representation also runs a "plushie" simulation that gives an accurate representation of what the final shape will look like.

Discussion

I did not think I would see anything more brilliant than Teddy, but this is an amazing combination of natural interaction and physical modeling all working together to simplify an inherently difficult problem. Even professional balloon designers, who were interviewed as part of the user study process for Plushie, felt that the software could help them decrease design time for new balloons.

624 #21 Igarashi, Matsuoka, Tanaka - Teddy

Introduction

Teddy is a system that turns 2D freeform strokes into 3D objects by extracting 2D silhouettes from the sketches. The system is easy to use, and it typically took novice users only 10 minutes before they were able to start making 3D shapes. Teddy supports several 3D editing commands, such as creating new objects, painting on the surface, extrusion, cutting, smoothing and transformation.

Discussion

Teddy is quite amazing, as it truly takes advantage of sketching as a natural input form to make a task like 3D modeling, which typically requires a long learning curve, simple and quick to learn. The only issue with Teddy is that it requires explicit editing modes, but I suppose that is necessary given the complexity of the domain.

Sunday, December 12, 2010

624 #18 Shilman, Viola - Spatial Recognition and grouping of text and graphics

Introduction

This paper discusses a spatial approach for recognition that is quick and efficient. The strokes of the sketch are connected in a proximity graph, then a classifier determines if the strokes compose part of an already classified shape. The classifier only uses a small subset of the features (image, curvature and endpoints) to increase recognition speed.

Discussion

While this approach is very efficient at achieving recognition for the given set of shapes, it is not as flexible as a geometrically based recognizer. The first issue is that each gesture is limited to 6 strokes, which is fine for gesture recognition but is constraining in an actual sketch environment that is supposed to be free form. Secondly the classifier is based on non-deformable templates so objects like arrows must be drawn as the template specifies and cannot be shaped differently.

624 #17 Bishop, Svensen - Distinguishing text from graphics in on-line handwritten ink

Introduction

This paper presents three methods for distinguishing text from shapes in sketches: a multilayer perceptron neural network (MLP), a hidden Markov model (HMM) and a bi-partite HMM. These methods can be layered on top of each other to get a more complete picture of the stroke type. The MLP is the lowest level: it constructs a feature vector of 9 features for each stroke and runs it through the network to identify the stroke's type. The uni-partite HMM combines the individual stroke knowledge of the MLP with information about the temporal context of the stroke, that is, the strokes that came before it. The intuition with this HMM is that text strokes will follow text strokes and graphical strokes will follow graphical ones. Finally the bi-partite HMM adds information about the spatial context of strokes, namely how close they are to preceding strokes. In experiments, they found that adding the temporal context improved recognition rates over the MLP alone, but it was not always the case that the spatial context helped.
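The temporal intuition (text follows text, graphics follows graphics) can be sketched as a two-state HMM decoded with the Viterbi algorithm. This is a simplified illustration, not the paper's model: it assumes the per-stroke classifier outputs a probability of "text" that is used directly as the emission term, and the `stay` probability of 0.9 is a made-up value.

```python
import math

def viterbi_text_graphics(p_text, stay=0.9):
    """Label each stroke 'text' or 'graphics' given per-stroke P(text).

    Assumptions: p_text comes from some per-stroke classifier (such as an
    MLP), and `stay` is a hypothetical probability that consecutive
    strokes share the same label.
    """
    states = ("text", "graphics")
    eps = 1e-12  # avoids log(0)

    def emit(state, p):
        return math.log((p if state == "text" else 1.0 - p) + eps)

    def trans(a, b):
        return math.log(stay if a == b else 1.0 - stay)

    score = {s: math.log(0.5) + emit(s, p_text[0]) for s in states}
    back = []
    for p in p_text[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda r: score[r] + trans(r, s))
            new_score[s] = score[best] + trans(best, s) + emit(s, p)
            pointers[s] = best
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# The third stroke is slightly "graphics" on its own (0.45), but its
# neighbours are confidently text, so the temporal context overrides it.
labels = viterbi_text_graphics([0.9, 0.8, 0.45, 0.9, 0.1, 0.2, 0.1])
```

With these inputs the first four strokes are labelled text and the last three graphics, which shows how an uncertain stroke is pulled toward the label of its temporal neighbours.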

Discussion

It is interesting that they compared several layers of recognition in this paper, rather than entirely different techniques. Also of interest is the set of features that they chose for their feature vector, though it is not clear why they chose that particular set (perhaps from their own previous work). The combination of static classifiers and on-line classifiers was particularly interesting as it showed how dependent this method is on training data.

624 #16 Segzin An Efficient Graph based recognizer

Introduction

This paper introduces a graph based system for recognizing symbols. Graph structures are used to represent the primitives that make up a symbol and how they are connected, geometrically and topologically. The recognizer is trained by creating these graph structures for each example sketch, then building average graphs that represent the individual symbol classifications. Four graph comparison algorithms were then compared for use in recognition: stochastic search, error driven matching, greedy search and geometric sort matching. Stochastic, error driven and greedy search all achieved similar top-1 recognition rates, around 93%, with relatively close running times, stochastic being the slowest and greedy the quickest. Geometric sort matching was much faster, 2ms compared to the 12ms of greedy or 68ms of stochastic; however its top-1 recognition rate was lower, at 78%, and was aided by drawing consistency between users in the study.

Discussion

I like the search methods presented in this paper, and the fact that they can be applied directly to a symbol recognition system. Such a system lends itself to future improvements in search speeds, and would seem ideally suited for other optimizations like parallelization. This is in contrast with other recognition systems we have been introduced to, which have not appeared trivially parallelizable.

Tuesday, October 12, 2010

624 #12 Constellation Models: Sharon

Comments

Introduction

This paper introduces Constellation Models (pictorial structure models) from computer vision as a method of sketch recognition. Constellation Models are used to identify the subcomponents of complex shapes, such as faces. This is done by using features of individual shapes as well as shared features between shapes to apply labels to shapes. In the case of recognizing parts of a face this would mean identifying that an ear has a relatively ear-like shape and that it is located a certain distance to the side of the eye and nose.

In order to make this method more efficient, as it is at heart an O(n^2) algorithm, Sharon defines certain sub-shapes as mandatory or optional. The mandatory shapes are a smaller subset of the total shapes and are identified first. Once the mandatory shapes are labeled they serve as a solid anchor for labeling the optional shapes. The algorithm is further optimized by using a multipass approach that starts with a very optimistic threshold for identifying shapes and progressively lowers it, identifying more shapes and narrowing down the search space as it goes.

Discussion

I like the Constellation method mostly because of the simplicity of the feature vector. The features calculated for each stroke are very simple but when used in conjunction with the relative positioning and shape of other strokes work as a good means of labeling. It is amazing to see some of the example sketches which have wildly varying sub-shapes, but due to their relative positioning all are identified correctly.

I do, however, have some problems with the paper as a whole. There does not seem to be much evaluation of recognition rates or failure points. There is a lot of discussion of the speed of the method, which is important, but what good is speed if you are mislabeling a large portion of the elements?

Another question is how this functions and deals with multi stroke shapes. Most of the example sub-shapes are single stroke, though there are a few that must be multi-stroke. It is not clear if these are grouped and labeled as a single shape or they are treated as multiple shapes of the same label.

624 #11 LADDER: Hammond

Comments

Drew

Introduction

LADDER is a system for describing and recognizing hand drawn shapes using a human readable geometric description language. This is meant to allow system designers to create sets of shapes that can be recognized as part of a visual grammar, that is, for a certain domain. In addition to shape recognition, LADDER also allows designers to describe how recognized shapes should be displayed, what actions can be performed on them or what actions they perform on other shapes.

Shape structures can be made up of basic recognition shapes, such as lines, poly-lines, circles etc. as well as previously defined shapes. Constraints can then be placed on the relationships between these subshapes.

The system uses these descriptions in a bottom up approach, starting with identifying basic shapes from strokes, constructing many higher level shapes from each basic shape. Eventually each shape is part of one high level shape.

Discussion

LADDER is very useful for domains with simple geometric shapes that are easy to describe either individually or as part of a hierarchy. This becomes problematic with more complex individual shapes that are hard to describe. It might be interesting, as mentioned in LADDER's future work, if a designer could automatically generate a LADDER description of a complex shape, both to make it easier on the designer and to show which shapes might be problematic for LADDER to describe at all. In such cases it would seem useful if a designer could use some other manner of recognition to describe a particular shape, but could then use that shape in a later LADDER description. This way LADDER could incorporate more complex shapes while still keeping the geometric descriptions to create composite complex shapes.

Tuesday, October 5, 2010

The question we all want answering....

"Does the generative annealing activation information composition visualization in the hot space, driven by information semantics, a user-interest model, and a responsive crawler, help people to be creative?"

From Provocative Stimuli, Kerne et al., CHI 2011

Tuesday, September 28, 2010

624 #10 Graphical Input Through Machine Recognition of Sketches: Herot

Comments

Chris

Introduction

In this paper Herot discusses the HUNCH system, a hierarchy of inference programs for sketch recognition. HUNCH works by taking input data and running several layers of inference programs on top of that data, from basic line and curve recognizers, to line latching, overtracing and finally high level inferences such as 3D object inference or floor plan recognition. HUNCH encountered many problems with specific modules, such as the curve recognizer CURVIT and the line recognizer with endpoint latching STRAIT; however, due to its modular design some of these inferences could be thrown out and higher level inferences could still be made to a certain extent.

Discussion

The most important idea from this paper is the hierarchy of recognizers that provide a chain of recognition and alternate interpretations of the same starting data. If a system is going to be interactive, multiple interpretations must be calculated while the user is entering a sketch so that the user is given options for correcting the interpretation if necessary. This also allows the programmer to add new recognition contexts to the system simply by adding a module that understands that context. In many ways this is what we are doing with Paleo for our truss recognition project.

624 #9 PaleoSketch: Paulson

Comments

Chris

Introduction

PaleoSketch is a primitive shape recognizer and beautifier. It supports recognition of several basic geometric types: Lines, Polylines, Circles, Ellipses, Arcs, Curves, Spirals and Helixes. The recognizer takes input strokes and gives several possible interpretations depending on how ambiguous the stroke is. Paleo uses a few new features for recognition, mostly to distinguish between Polylines and Curves. These features are normalized distance between direction extremes (NDDE) and direction change ratio (DCR), both of which are used to find spikes in the direction graph that would differentiate between a Polyline and a Curve.
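A rough sketch of how NDDE and DCR could be computed from a stroke follows. It is an approximation for illustration, not Paleo's code: the direction unwrapping step and the lack of any trimming of the stroke ends are assumptions. NDDE is taken as the path length between the points of highest and lowest direction divided by the total stroke length, and DCR as the maximum change in direction divided by the average change.

```python
import math

def directions(points):
    """Direction (radians) of each segment, unwrapped so that the
    direction graph has no artificial 2*pi jumps (an assumption)."""
    dirs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = math.atan2(y1 - y0, x1 - x0)
        if dirs:
            while d - dirs[-1] > math.pi:
                d -= 2 * math.pi
            while d - dirs[-1] < -math.pi:
                d += 2 * math.pi
        dirs.append(d)
    return dirs

def ndde(points):
    """Path length between the direction extremes over total length."""
    dirs = directions(points)
    seg = [math.dist(a, b) for a, b in zip(points, points[1:])]
    lo, hi = sorted((dirs.index(min(dirs)), dirs.index(max(dirs))))
    return sum(seg[lo:hi + 1]) / sum(seg)

def dcr(points):
    """Maximum change in direction over average change in direction."""
    dirs = directions(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    mean = sum(changes) / len(changes)
    return max(changes) / mean if mean > 0 else 0.0

# A smooth half circle versus an L-shaped polyline with one sharp corner.
arc = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
```

For the arc the direction changes steadily, so NDDE is close to 1 and DCR is close to 1. The polyline has a single spike in its direction graph, which shows up as a much larger DCR.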

Discussion

Paleo is very accurate at classification (99.89%) although it does not always present the correct interpretation as the primary interpretation. Still an overall recognition rate of 98.56% is very impressive and shows that Paleo can form a very solid basis for higher level recognition systems. I find this system to be a lot more useful for actual sketch recognition than most of the previous papers we have read. I understand that most of those are gesture recognizers, and that is all very well, but Paleo is much more relevant for complete sketch recognition.

Thursday, September 16, 2010

624 #8 $N Multistroke Recognizer - Anthony

Introduction

$N is a recognizer based on Wobbrock's $1 recognizer, extending it and giving it several new abilities. The most important extension is that $N is now a multistroke recognizer. This is achieved by connecting the endpoints of the multiple strokes together to form a unistroke, and enumerating all the different possible ways the multiple strokes could be connected. This in essence treats the user's whole gesture, both when their pen is drawing on the screen and when it is in the air, as a unistroke. The second improvement is that $N introduces bounded rotation invariance so that it can distinguish between gestures that have been rotated if necessary. The third change is that $N can now recognize the difference between 1D and 2D gestures. Finally there are a few optimizations: the first uses the start angle of the gesture to constrain the search space, and the second uses the number of strokes to further restrict the search space.
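The step of turning one multistroke into every possible unistroke can be sketched with stroke order permutations combined with a direction choice for each stroke. This is only an illustration of that idea; the function name `unistroke_orderings` is made up here, and the matching of each resulting unistroke against templates is left out.

```python
from itertools import permutations, product

def unistroke_orderings(strokes):
    """Yield every unistroke obtained by choosing a stroke order and a
    drawing direction for each stroke, then joining them end to end."""
    for order in permutations(strokes):
        for flips in product((False, True), repeat=len(order)):
            unistroke = []
            for stroke, flip in zip(order, flips):
                unistroke.extend(reversed(stroke) if flip else stroke)
            yield unistroke

# An "X" drawn as two strokes: 2! orders times 2^2 directions = 8 unistrokes.
x_gesture = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
variants = list(unistroke_orderings(x_gesture))
```

The number of variants grows as n! * 2^n for n strokes, which is why constraining the search space matters for gestures with many strokes.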

Discussion

$N adds several useful changes to $1 that make it more flexible without losing very much of its simplicity. What I like the most is the way in which they decided to bring multistroke gestures down to a single stroke in order to use the existing methods of $1 to analyze them.

624 #7 Sketch Based Interfaces - Sezgin

Introduction

The Sezgin, Stahovich, Davis system is meant to take sketch data, clean it up and recognize basic shapes within a drawing. There are several steps to this process. The first is vertex detection, which requires filtering out noise to find vertices on straight edge objects. The second is detecting curves and approximating them with Bézier curves. Finally the figures are beautified and basic object recognition is performed.

Discussion

This paper gives a fair amount of description about the vertex and curve detection processes, however it does not give very many specifics on beautification or recognition or for that matter what sort of recognition applications this system would be used for. I can only assume it was an early paper that is further expounded upon later, or that most of the other concepts were explored sufficiently in the related work.

624 #6 Protractor - Li

Introduction

Protractor is a modified version of the $1 recognizer by Wobbrock that is meant to decrease the memory and processing requirements for unistroke gestures so that gesture recognition is feasible on systems with less capable hardware, such as mobile devices like Android phones. Li introduces changes to the classification algorithm, which now measures the total angular distance between pre-processed templates and a new unclassified gesture. This angle measure is enhanced by calculating the optimal angle to rotate the template by so that it best matches the unknown gesture.

Additionally, Protractor can be made orientation sensitive in a simple way, by breaking the orientation up into 8 cardinal directions.
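The optimal rotation angle has a closed form, which is what removes the need for an iterative search. The sketch below is an illustration under assumptions: both gestures are taken to be already resampled to the same number of points, the helper names are invented, and `atan2` is used so that the maximising angle is found regardless of the signs of the two sums.

```python
import math

def to_unit_vector(points):
    """Centre the points on their centroid and flatten them into one
    vector of unit length: [x1, y1, x2, y2, ...]."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    flat = [c for x, y in points for c in (x - cx, y - cy)]
    norm = math.sqrt(sum(c * c for c in flat))
    return [c / norm for c in flat]

def optimal_angular_distance(template, gesture):
    """Return (best rotation angle, angular distance after that rotation)
    for two unit vectors of equal length."""
    a = sum(template[i] * gesture[i] + template[i + 1] * gesture[i + 1]
            for i in range(0, len(template), 2))
    b = sum(template[i] * gesture[i + 1] - template[i + 1] * gesture[i]
            for i in range(0, len(template), 2))
    angle = math.atan2(b, a)
    similarity = a * math.cos(angle) + b * math.sin(angle)
    # Clamp against rounding error before taking the arc cosine.
    return angle, math.acos(max(-1.0, min(1.0, similarity)))

# A gesture and the same gesture rotated by 30 degrees.
shape = [(0, 0), (2, 0), (2, 1), (0, 3)]
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
rotated = [(x * c - y * s, x * s + y * c) for x, y in shape]
angle, distance = optimal_angular_distance(to_unit_vector(shape), to_unit_vector(rotated))
```

The recovered angle is the 30 degree rotation and the remaining angular distance is essentially zero, so the two gestures count as the same shape once the template is rotated optimally.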

Discussion

Protractor is a practical implementation of gesture recognition for mobile phones. I think that mobile applications are the most obvious use for simple gestures and having an efficient and simple gesture recognition system makes including gestures into mobile applications much easier for developers.

624 #5 $1 Recognizer - Wobbrock

Introduction

Wobbrock's $1 recognizer provides a lightweight, quick and easy gesture recognizer that can be used without requiring much technical knowledge on the developer's end. The recognizer works by taking one initial gesture for each template, then comparing the point-to-point distance between an unclassified gesture and each template and determining which template matches most closely. To make certain that factors such as gesture speed and sampling speed do not affect recognition, the gestures are re-sampled to between 32 and 256 points that are linearly interpolated from the original gesture data. The indicative angle of the gesture (the angle between the first point and the centroid) is then found and the gesture is rotated by that angle so that all examples of that gesture, no matter their orientation, can be compared. The points are then scaled to fit a bounding square, and the distance between corresponding points is calculated and turned into a 0-1 score. $1 is therefore rotation, scale and position invariant.
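The pipeline described above (resample, rotate by the indicative angle, scale, then score by average point distance) can be sketched as follows. This is a simplified illustration rather than the reference implementation: it resamples to 64 points, scales to a unit square, and leaves out the search over rotation angles that refines the match.

```python
import math

def resample(points, n=64):
    """Resample a stroke to n points spaced evenly along its path."""
    total = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    interval = total / (n - 1)
    pts, out, accumulated, i = list(points), [points[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and accumulated + d >= interval:
            t = (interval - accumulated) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q starts the next segment
            accumulated = 0.0
        else:
            accumulated += d
        i += 1
    while len(out) < n:  # rounding can drop the final point
        out.append(pts[-1])
    return out[:n]

def rotate_to_zero(points):
    """Move the centroid to the origin and rotate so that the first
    point lies on the positive x axis (indicative angle of zero)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
            for x, y in points]

def scale_to_square(points, size=1.0):
    """Scale non-uniformly so the bounding box becomes a square."""
    xs, ys = zip(*points)
    w = (max(xs) - min(xs)) or 1.0
    h = (max(ys) - min(ys)) or 1.0
    return [(x * size / w, y * size / h) for x, y in points]

def normalize(points):
    return scale_to_square(rotate_to_zero(resample(points)))

def score(candidate, template, size=1.0):
    """Average point distance mapped to a 0-1 score (1 is a perfect match)."""
    a, b = normalize(candidate), normalize(template)
    d = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return 1.0 - d / (0.5 * math.sqrt(2.0) * size)

# A "V" and the same "V" rotated by 40 degrees, scaled by 3 and moved.
v = [(0, 0), (1, 2), (2, 0)]
c, s = math.cos(math.radians(40)), math.sin(math.radians(40))
moved = [(3 * (x * c - y * s) + 5, 3 * (x * s + y * c) - 2) for x, y in v]
other = [(0, 0), (4, 0.5), (5, 3)]
```

The transformed "V" scores almost exactly 1 against the original, while the unrelated stroke scores lower, which is the rotation, scale and position invariance at work.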

Discussion

I very much like the idea of the $1 because it would be easy to implement in any programming context quickly. Too often advanced computing concepts, such as gesture recognition, require specific libraries and are limited to specific platforms, which limits their usefulness and hampers finding new applications of the technology. I hope to implement the $N extension of the $1 recognizer so I can see exactly what it is capable of, particularly on a handheld system that accepts finger input.

Thursday, September 9, 2010

The Star

"The purpose of computer metaphors, in general, and particularly of graphical or icon-oriented ones, is to let people use recognition rather than recall. People are good at recognition, but tend to be poor at recall."

"... we chose a two-button mouse because, in testing, we found that users demonstrated lower error rates, shorter learning times, and less confusion than when they used either one-button or three-button mice."

- David E. Liddle

Wednesday, September 8, 2010

624 #4 Sketchpad: Ivan E. Sutherland

Introduction
In this paper Sutherland introduces his Sketchpad system, one of the earliest pen based computing devices. The main feature of Sketchpad is a system for drawing and defining geometric shapes. Users can draw shapes, duplicate them and place geometric and mathematical constraints onto the subcomponents of the shape. The geometries are stored in a well defined format which can then be used in other programs to perform simulations and other manipulation.

In essence Sutherland created the precursors to our modern 3d CAD systems. The pen would ultimately be replaced with the mouse but otherwise it is very similar.

Discussion

Sutherland's system is a very rigid way to draw, but the constraints create exact geometric shapes. In some sense what we are trying to achieve now is to get Sutherland's resulting exact geometric shapes without the rigid input. We want to take an inexact drawing and get the computer to interpret it into the precise shape that we intended. So far it has been hard to achieve this without a very constrained system, large amounts of exact numerical input or a user who is an expert in the CAD system being used. In sketch system terms, Sutherland's Sketchpad is better for idea stabilization than creative change. Some combination of Sutherland's geometric system and a more natural input could be ideal to balance the creative change and idea stabilization.

Perhaps the most important thing to take away from this paper comes in Sutherland's conclusions:

"It is only worthwhile to make drawings on the computer if you get something more out of the drawing than drawing. "

Tuesday, September 7, 2010

624 #3 Gesture Design Advice: Chris Long

Introduction

In this paper the authors put forward quill, a tool for designing gestures for use in pen based applications. quill lets the designer build gestures, then advises the designer (user) as to whether the gestures they have created are easy to distinguish, both for the gesture recognition algorithm and for the humans who will eventually use the software.

Discussion

The most interesting idea not in this paper is the algorithm for determining which gestures might be confused by users. quill itself seems to be a solution looking for a problem, as I cannot think of an occasion where a designer would be designing gestures without being expert enough to check the distinguishability of gestures on their own. Also the question of when to alert users to errors seems to me to be a well covered problem that doesn't really need to be addressed in this paper.

Monday, September 6, 2010

624 #2 Specifying Gestures By Examples: Dean Rubine

Rubine specifies a technique for gesture recognition using 13 features of the gesture. Using these 13 features with a linear classifier it can generally recognize a gesture to 90% accuracy with 15 pre-defined examples of each class. These gestures must have a known start and stop point to be able to be classified, so only individual parts of a sketch can be used.
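As a rough illustration of features plus a linear classifier, the sketch below computes only five of the 13 features (cosine and sine of the initial angle, bounding box diagonal, distance between first and last point, and total path length) and scores each class as a weighted sum. The weights are hand-picked, hypothetical values; in the real technique they are learned from the training examples, and that training step is omitted here.

```python
import math

def rubine_subset(points):
    """Five Rubine-style features of a stroke (needs at least 3 points)."""
    (x0, y0), (x2, y2) = points[0], points[2]
    d = math.hypot(x2 - x0, y2 - y0) or 1.0
    xs, ys = zip(*points)
    return [
        (x2 - x0) / d,                                      # cosine of initial angle
        (y2 - y0) / d,                                      # sine of initial angle
        math.hypot(max(xs) - min(xs), max(ys) - min(ys)),   # bounding box diagonal
        math.dist(points[0], points[-1]),                   # first-to-last distance
        sum(math.dist(a, b) for a, b in zip(points, points[1:])),  # path length
    ]

def classify(features, weights):
    """Pick the class with the highest linear score w0 + sum(wi * fi)."""
    def score(name):
        w0, *w = weights[name]
        return w0 + sum(wi * fi for wi, fi in zip(w, features))
    return max(weights, key=score)

# Hypothetical weights: [bias, w1..w5]. Real weights come from training.
weights = {
    "right_swipe": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "up_swipe":    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
}
horizontal = [(0, 0), (1, 0), (2, 0), (3, 0)]
vertical = [(0, 0), (0, 1), (0, 2)]
```

Here `classify(rubine_subset(horizontal), weights)` picks "right_swipe" and the vertical stroke picks "up_swipe", because only the initial angle features carry weight in this toy setup.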

It will be interesting to implement Rubine's recognizer, but what will be more interesting is to use some of the extensions discussed at the end of the paper. The most interesting will be implementing this with multitouch to create my own multifinger gesture recognition.

Thursday, September 2, 2010

624 #1 Gesture Recognition: Tracy Hammond

This paper is an overview of a few basic techniques for pen gesture recognition. Note that these techniques are generally only good for recognizing a single stroke gesture and nothing more complex than that.

The first technique introduced is Rubine's features. These features can distinguish gesture classes by using around 15 training examples. Rubine's technique uses linear classifiers to compare a new gesture to all known classes and can distinguish between similar objects at different orientations. Long's feature set can be added to Rubine's, but it does not add much capability for the additional complexity it costs, and so it is not often used.

The second technique addressed is that of Wobbrock, the "$1 Gesture Recognizer". Wobbrock's method is simpler than Rubine's, however it is slower and unable to differentiate gestures that have been rotated or stretched.

Tuesday, August 31, 2010

624 #0 Self Introduction

Email: eyce9000 at gmail dot com

Standing: 1st Year MSCS

Taking this class: Because sketch recognition/machine learning are things I want to use in projects.

Experience: Not much directly useful to SR, but I am a sharp cookie (mmm cookies)

Doing in 10 years: Living somewhere by the sea, making awesome gizmos

Next Tech Advance: Memristors!

Favorite Undergrad Course: Figure Drawing.

Favorite Movie: Kung Fu Panda, mostly because it has great character animation and because Jack Black is awesome.

Time Traveler: I would travel back to meet Steve Jobs and Wozniak right when they started Apple. Then I would go meet Bill Gates when he started Microsoft.

Interesting Fact: I lived in Italy when I was little, and got engaged this summer!