Draft of: Stampe, D. M., & Reingold, E. M. (1995). Selection by looking: A novel computer interface and its application to psychological research. In J. M. Findlay, R. Walker, & R. W. Kentridge (Eds.), Eye movement research: Mechanisms, processes and applications (pp. 467-478). Amsterdam: Elsevier Science Publishers.
Selection by Looking:
A Novel Computer Interface and its
Application to Psychological Research
Dave M. Stampe and Eyal M. Reingold
Real time monitoring of a subject's gaze position on a computer display of response options may form an important element of future computer interfaces. Response by gaze position can also be a useful tool in psychological research, for example in a visual search task.
Experimental tasks reported include visual search and typing from an alphabetic menu. A lexical decision study revealed gaze response RT to be much more sensitive to the word/nonword manipulation than button press RT, and new phenomena including self-correction were observed. Methods of improving response reliability are introduced, including drift correction by dynamic recentering, gaze aggregation, and automated selection.
Eye-movements, eye-tracking, computer-interface, control, handicapped, lexical-decision, typing, disabled, visual search
Psychological research using eye movements has focused on the study of natural tasks such as reading or problem solving. Typically, gaze position is simply recorded for later analysis, and is incidental to the experimental task. One exception is the gaze-contingent display paradigm, where the displayed stimuli are changed rapidly in response to eye movements in order to present different images to the foveal and peripheral regions of the subject's visual field (e.g. McConkie and Rayner, 1976). In this paper, we explore the active use of gaze in performance of an experimental task: selection by looking. We investigate implementation and psychophysical issues, and demonstrate tasks which illustrate the unique potential of this paradigm.
In a typical task, the subject's gaze position is monitored while viewing displays such as those in Figure 1. The subject registers responses or commands by directing his gaze to targets on the screen and holding his gaze until the response is registered. The gaze-response system's computer processes the eye tracker output in real time to compute gaze position on the screen and detect response events, then modifies the image displayed to the subject in response to the gaze input.
The use of control by eye-movement as an aid to handicapped persons is not new (e.g. Laefsky and Roemer, 1978), but such systems have usually been developed with little research into psychophysical or cognitive factors.
------------------------------------------- In final print, Figure 1 appears here. -------------------------------------------
Recently, eye movements have been proposed as computer interface devices for normal users (e.g. Jacob, 1991), but despite high expectations, the advantages and limitations of this interface modality have yet to be determined. There are many tasks in which gaze control would be advantageous, for example as an input device when hands are otherwise occupied (Charlier, Sourdille, Behague, and Buquet, 1991).
In this paper, we will discuss some of the advantages of gaze position response, then summarize the practical aspects of implementing gaze response systems. Dynamic recentering will be introduced as a technique to correct for eye tracker drift to prevent response errors. Results from several experimental tasks will be presented that explore implementation issues, demonstrate the methods discussed, and compare response by gaze to more traditional button press responses.
Advantages of Gaze Response in Research
Gaze response is well suited to typical eye movement research tasks such as visual search within pictures or stimulus arrays. Once the search target is found, all the subject need do is continue fixating the target until the response is registered. This is a highly intuitive response method, requiring no training and resulting in fast reaction times. Targets can be selected even when embedded within pictures or dense arrays of distractors. Gaze response acts as a pointer, providing two-dimensional input similar to devices such as touch screens, light pens or computer mice. However, these physical input devices require gross motor movements, which may introduce noise into the eye movement record. Verbal responses can be used to implement complex response paradigms, but are not practical with eye tracking systems that require bite bars or chin rests. Gaze response not only avoids these problems, but actually enhances the eye movement record.
In eye movement research, gaze response can help to disambiguate the temporal relationship between eye movements and responses during task performance. Asynchrony between eye movements and manual or vocal responses is often seen, as eye movements are free to continue with task execution after the motor program for the response is initiated. With gaze response, the temporal relationship between eye movements and responses is clear, since both cannot be performed at once. Task-related fixations (e.g. searching) and response fixations are easily disambiguated during analysis, allowing the eye movement record of sequential (multiple step) tasks to be clearly divided at the gaze responses.
The use of gaze response in eye movement research is likely to reveal new aspects of cognitive processes. It is important to compare gaze response to more classical response methods such as button presses: such a comparison will be reported in this paper for a lexical decision task. The combination of intuitive operation, spatial selection capability and potential for new findings make gaze position response a powerful tool for psychological research.
For gaze response to be practical, especially for psychological research, the implementation is critical. The method used to detect gaze responses in the eye movement data must be chosen carefully to prevent unintended responses. In the following discussion we will evaluate the effects of eye tracker performance on response accuracy, and discuss methods for correcting eye tracker drift.
Detecting Gaze Responses
The most natural technique for gaze response is simply to hold gaze on the response area for a critical time period. Subjectively, the subject simply concentrates on a target until the selection response occurs. Most subjects can use this technique immediately, and need little or no practice to perform well. The duration of gaze needed to select a target, also called dwell time (Jacob, 1991), must be short enough to be comfortable for the subject, yet long enough to prevent unintentional triggering. Fixations with durations greater than 500 msec are often seen during cognitive integration phases of difficult tasks, and could be mistaken by the computer for response events. Pilot studies indicated that a dwell time of 1000 msec makes such false selections unlikely, and 700 msec or less works well for simple tasks.
System responses to gaze input must be quick, correct and predictable to encourage linking of gaze and response by the user. It is disconcerting if the target next to the intended selection is selected because the eye tracking system has miscalculated the gaze position due to eye tracker drift. Subjects become frustrated if the gaze time required to select a response is unpredictable or if selection does not occur at all due to system instabilities. Careful layout of response targets, use of high quality eye tracking systems, drift correction, and reliable response detection methods can prevent these problems.
Ideally, gaze on a response target would be detected as a single long fixation. Initial investigations indicated that gaze periods longer than 800 msec are often broken by blinks or corrective saccades. Single fixations are therefore an unreliable measure of gaze duration: it is necessary to aggregate several nearby fixations into a gaze period, for example by cluster analysis (Kundel, Nodine, and Krupinski, 1989). Fixations may also be grouped within a region surrounding each response target. Responses are registered when the sum of the duration of all fixations within the cluster or region exceeds the dwell time threshold, and the average position of gaze may be computed.
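The region-based aggregation described above can be sketched as follows. This is a minimal illustration and not the system's actual code: the names Fixation, Target and detect_selection are hypothetical, the square region shape and the rule that accumulation restarts whenever gaze leaves the current region are assumptions, and the default region size and dwell time are simply the values used in the typing task.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Fixation:
    x: float         # horizontal gaze position, degrees of visual angle
    y: float         # vertical gaze position, degrees of visual angle
    duration: float  # fixation duration, msec


@dataclass
class Target:
    name: str
    x: float
    y: float


def detect_selection(
    fixations: Iterable[Fixation],
    targets: List[Target],
    region_size: float = 4.0,   # side of the square region around each target
    dwell_time: float = 750.0,  # msec of accumulated gaze needed to select
) -> Optional[Tuple[str, float, float]]:
    """Return (target name, mean gaze x, mean gaze y) for the first target
    whose accumulated fixation time reaches dwell_time, or None."""
    half = region_size / 2.0
    current: Optional[Target] = None
    total = sum_x = sum_y = 0.0
    for fix in fixations:
        # Find the target region (if any) that contains this fixation.
        hit = next(
            (t for t in targets
             if abs(fix.x - t.x) <= half and abs(fix.y - t.y) <= half),
            None,
        )
        if hit is not current:
            # Assumption: leaving the region restarts the gaze period.
            current = hit
            total = sum_x = sum_y = 0.0
        if current is None:
            continue
        # Blinks and refixations split gaze into several fixations;
        # their durations are summed within the region.
        total += fix.duration
        sum_x += fix.x * fix.duration
        sum_y += fix.y * fix.duration
        if total >= dwell_time:
            # Duration-weighted average gaze position for this selection.
            return current.name, sum_x / total, sum_y / total
    return None
```

With this rule, three short fixations on one target (for example 300, 250 and 250 msec) are aggregated into a single 800 msec gaze period and trigger a selection, whereas no single fixation among them would have reached the threshold.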
Eye Tracker Drift Correction
It is important for any eye monitoring system to have good resolution, accuracy, and stability. Almost all eye tracking systems exhibit drift over time, with computed gaze position gradually moving away from the subject's true gaze location. Severe drift can be caused by head movement, or motion of eye cameras relative to the eye, but even systems that control these may require periodic drift correction.
While a complete system recalibration will correct drift, it is much more efficient to measure the drift directly and compensate for it. This is performed by displaying a fixation target to the subject, then measuring the deviation of computed gaze from the target position. This process of recentering may be done between each trial or block of trials, and dramatically improves stability (Stampe, 1993).
A novel drift correction technique was developed which is unique to gaze position control and is invisible to the subject. Assuming that the average gaze position during target selection falls at the center of the response target, we can compute drift as the mean offset between the target and gaze position at each selection. Small variations in gaze position on targets will be averaged out over several selections. This technique dynamically performs the recentering operation to correct system drift at each gaze response event, and can be combined with normal recentering to correct for larger drifts. Drift usually accumulates slowly, and the mean error from several fixations of targets will track it closely. Sudden increases in drift will be corrected over several selections.
Dynamic recentering is implemented by a low-pass filter which tracks the drift component of target fixation error while ignoring small random differences in target fixation. A step-by-step description of the dynamic recentering algorithm is:
1) Subtract the estimated drift from the uncorrected gaze position to compute the corrected gaze position. The drift estimate is initially set to zero after recentering or calibration, or is carried over from step 3 of the previous selection's correction.
2) Subtract the location of the true target center from the corrected gaze position to compute the residual fixation error.
3) Add a fraction (1/4 to 1/6) of the residual fixation error to the estimate of drift. This will reduce the error in the estimate of drift, eventually eliminating the fixation error. Random variations in target fixation will average out over time.
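The three steps above can be sketched in code as follows. This is an illustrative sketch only: the class and method names are hypothetical, and the gain of 1/4 is one value from the 1/4 to 1/6 range given in step 3.

```python
class DynamicRecentering:
    """Low-pass drift estimator updated at each gaze selection."""

    def __init__(self, gain: float = 0.25):
        self.gain = gain      # fraction of residual error added per selection
        self.drift_x = 0.0    # current drift estimate, degrees
        self.drift_y = 0.0

    def reset(self) -> None:
        """Zero the drift estimate after a recentering or calibration."""
        self.drift_x = 0.0
        self.drift_y = 0.0

    def correct(self, raw_x: float, raw_y: float):
        """Step 1: subtract the estimated drift from the uncorrected gaze."""
        return raw_x - self.drift_x, raw_y - self.drift_y

    def update(self, raw_x: float, raw_y: float,
               target_x: float, target_y: float):
        """Apply steps 1 to 3 for one selection; return the corrected gaze."""
        cx, cy = self.correct(raw_x, raw_y)
        # Step 2: residual fixation error relative to the true target center.
        err_x = cx - target_x
        err_y = cy - target_y
        # Step 3: add a fraction of the residual error to the drift estimate.
        self.drift_x += self.gain * err_x
        self.drift_y += self.gain * err_y
        return cx, cy
```

Under these assumptions, a constant drift d leaves a residual error of d(1 - gain)^n after n selections, so with a gain of 1/4 about 90% of a sudden drift is removed within eight selections, while random fixation scatter on any single selection moves the estimate by only a quarter of its size.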
Three experimental tasks were performed to validate the gaze response system and to explore important aspects of the gaze control paradigm. The first task measured fixation accuracy and selection error rates, and evaluated the effectiveness of dynamic recentering in correcting drift. The second task demonstrated the efficacy of dynamic recentering in a typing by eye paradigm. The third task contrasted button press and gaze response methods in a simple lexical decision task.
General Method for Tasks
Experimental tasks were implemented using a prototype eye tracking system developed by SR Research Ltd. This system uses a headband-mounted video camera and a proprietary image processing card to measure the subject's pupil position 60 times per second. Resolution of the system is very good (0.005° or 15 seconds of arc), with extremely low noise. A second camera on the headband views LED targets attached to the display monitor to compensate for head position, correcting gaze position to within 0.5° of visual angle over ± 20° of head rotation, and allows head motion within a 100 cm cube.
Task displays were presented in black on a white background on a 21" VGA monitor located 75 cm in front of the subject, with a field of view of 30° horizontally and 24° vertically. A second VGA monitor was used by the experimenter to perform calibrations and monitor subject gaze in real time during experiments. Gaze position accuracies of better than 0.5° on all parts of the screen were routinely obtained.
Twelve subjects, five male and seven female, with an average age of 25 years, were run on all tasks in a single 60-minute session. Tasks were run in the same order for all subjects. All subjects had normal or corrected-to-normal vision, four with eyeglasses and three with contact lenses. A system calibration was performed before each task, and repeated if needed to meet a 0.5° accuracy criterion (Stampe, 1993). The experimenter monitored gaze position during trials to ensure continuing accuracy.
TASK 1: Array Search and Selection
The first task explored the effect of target layout and dynamic recentering in tasks where arrays of targets representing multiple response alternatives are used. The distance between targets affects the likelihood that an eye tracking or fixation error will cause a target adjacent to the intended one to be selected in error. One dimensional (line) and two dimensional (grid) arrays were investigated, and visual density of the target array was also manipulated.
Subjects were required to indicate a search target "T" hidden in an array of "O" distractors by gaze. The search target was highly salient, keeping trials short and minimizing non-selection errors. All characters in the search array were selectable by gaze, with selection of distractor characters counted as selection errors.
Three search array configurations were used in the task: a square grid, a horizontal line, and a vertical line. Line arrays contained 6 characters spaced by 3° or 8 characters spaced by 2°. The small character spacing was designed to increase error rates, which were known from pilot studies to be vanishingly small for target spacings of 4° or greater. Grids consisted of a square array of 6 by 6 characters spaced by 3° or 8 by 8 characters spaced by 2°. All characters were 0.6° in size.
A total of 124 trials were presented, consisting of 32 8x8 grids, 36 6x6 grids, 16 each of horizontal and vertical line arrays spaced by 2°, and 12 each of horizontal and vertical line arrays spaced by 3°. Complete sampling of all target positions and arrays required 496 trials, which were divided between 4 subjects to reduce task length, for a total of 124 trials per subject.
Trials were presented in a random sequence in four blocks of 31 trials each, separated by recentering screens to correct eye tracking system drift. The first 8 trials for each subject were discarded as practice. Search arrays were displayed until a target was selected by a gaze duration of 1000 msec, and the screen blanked for 200 msec between trials. Gaze position was aggregated by cluster analysis with a diameter of 1.5°, with position recorded for use in analysis. Dynamic recentering was simulated during analysis.
Results and Discussion
Two error measures were computed for each selection: fixation error and selection error. The calculation of fixation error was determined by the target array type. For horizontal line arrays, the horizontal distance from the search target to the gaze location was used as the fixation error, the vertical distance for vertical line arrays, and the largest of either the horizontal or vertical distance for grid arrays. Selection errors were considered to have occurred if the fixation error exceeded half of the intertarget spacing.
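The two error measures just defined can be expressed compactly. The sketch below is illustrative only: the function names and the string labels for array type are hypothetical, and positions are assumed to be (x, y) pairs in degrees of visual angle.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in degrees of visual angle


def fixation_error(array_type: str, target: Point, gaze: Point) -> float:
    """Fixation error magnitude for one selection, by array type."""
    dx = abs(gaze[0] - target[0])
    dy = abs(gaze[1] - target[1])
    if array_type == "horizontal":
        return dx              # only horizontal error can reach a neighbor
    if array_type == "vertical":
        return dy              # only vertical error can reach a neighbor
    if array_type == "grid":
        return max(dx, dy)     # larger of the two components
    raise ValueError(f"unknown array type: {array_type}")


def is_selection_error(error: float, spacing: float) -> bool:
    """A selection error occurs when the fixation error exceeds half of
    the intertarget spacing."""
    return error > spacing / 2.0
```

For example, a gaze position 0.4° to the right of and 1.2° below the target in a grid gives a fixation error of 1.2°, which counts as a selection error at 2° spacing (limit 1°) but not at 3° spacing (limit 1.5°).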
Gaze position was corrected by the dynamic recentering method, allowing both corrected and uncorrected target fixation error magnitudes (target position minus gaze position) to be measured. Dynamic recentering was found to be effective with mean corrected fixation errors of 0.38° versus 0.51° before correction: t(11) = -4.72, p < .001. More importantly, dynamic recentering was highly effective in reducing the frequency of selection errors from 6.6% before correction to 2.4% after correction: t(11) = 2.71, p < .05. The relationship between the magnitude of fixation error and probability of selection error (gaze falling on response targets neighboring the intended target) is best illustrated by a plot of the fixation error distribution before and after dynamic recentering was applied (Figure 2).
A fixation error larger than half of the distance between response targets places the gaze position closer to one of the search distractor targets than the search target itself, producing a selection error. For example, a fixation error greater than 1.5° will cause selection of an adjacent target in an array of targets spaced by 3°.
--------------------------------------- In final print, Figure 2 appears here. ---------------------------------------
Figure 2 also reveals a long tail caused by eye tracker drift, which results in significant numbers of selection errors at target spacings of up to 3°. For example, at 2° target spacing, fixation errors greater than 1° produce selection errors, and occur in 9.5% of trials before dynamic recentering, and 2.8% of trials after correction.
Three dependent measures (fixation error, selection error, and number of fixations) were analyzed after dynamic recentering was applied, using a 3 by 2 analysis of variance. Array type and target spacing were evaluated as within subject factors. Means and F values are summarized in Table 1.
The horizontal line arrays showed significantly smaller fixation errors than vertical and grid arrays, attributable to the largely vertical component of drift for this eye tracking system, a characteristic shared by most eye trackers we have tested. This is also shown by comparing horizontal and vertical errors collapsed across array types: the mean vertical error of 0.38° was greater than the mean horizontal error of 0.23°, t = -3.73, p < .01. Note that no selection errors occurred for horizontal one-dimensional target arrays at all: thus it should be possible to place response targets closer together horizontally than vertically when designing gaze response screens.
Comparisons of selection error rates of 2.8% for 2° target spacing and 1.9% for 3° target spacing were not significant (Table 1) as the rarity of selection errors inflated the variance. The means do indicate that errors decrease with large target spacings, and pilot studies had indicated that errors with target spacings of 4° or greater are very rare.
The number of fixations per trial was greater for 2° target spacing, indicating that the task became more difficult as visual density increased. This implies that well-spaced target arrays will decrease search time and improve task performance.
-------------------------------------- In final print, Table 1 appears here. --------------------------------------
TASK 2: Typing by Eye
Typing by eye is one of the most common applications of gaze-controlled aids for the handicapped. It is unfortunate that little investigation has been done into the cognitive aspects and efficiency of this paradigm, perhaps because of the emphasis on the implementation of such systems rather than on research with them.
This task was set up to evaluate the subjects' impressions and types of error made during performance. The screen layout, as shown in Figure 1, used a 7 by 4 grid of 1.2° characters spaced by 4° horizontally and vertically. The top of the screen contained a line of text to be typed and a space for display of the typed output.
To type, subjects fixated a desired letter for 750 msec, with all fixations within a 4° region centered on each target counted towards the gaze period. Dynamic recentering was applied at each selection to correct for system drift. Selection feedback was given by placing a round highlight spot on the letter for 300 msec. If the subject continued to fixate the character, it was typed repeatedly. Typing ended when a button was pressed by the subject.
Each subject performed three typing trials. The subject first typed random input (usually their name), then two test sentences of 48 and 44 characters each. Characters typed and use of the backspace function were recorded for later analysis.
Results and Discussion
In this preliminary investigation, only simple statistics and subjective impressions were collected. Subjects enjoyed the task, but found it slow compared to manual typing. The gaze selection time of 750 msec subjectively seemed limiting, but in reality selection time was only 40% of the 1870 msec average time required to type each character, with the remaining 60% of the time spent searching for the next character in the typing array.
Errors were classified by counting backspaces and examining the typed output: transcription errors included missed characters or spelling mistakes (4 instances in 1400 characters typed, 0.29%). Selection errors were scored if a spelling mistake involved a letter adjacent to the correct letter on the selection grid, and occurred 5 times out of 1400 characters typed (0.36%). Error rates compare well to the 1.3% reported for a 54-character typing screen (Spaepen and Wouters, 1989). All selection errors involved selection of a target above or below the intended character, analogous to the vertical selection errors seen in Task 1.
It is apparent that typing by eye is much slower than manual typing, with most time spent searching for the character to be typed. With much practice, search time may be minimized and the dwell time may be reduced further. Research has shown that typing by touch screen can be as fast as 500 msec per character (25 words/minute), and by mouse at 700 msec (17 words/minute) (Andrew, 1991). Typing by eye can probably be as fast once character positions in the typing array are memorized.
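As a rough check on the rates quoted above, the conversion from time per character to words per minute can be sketched as follows. The function names are hypothetical, and the convention of five characters per word is an assumption not stated in the text; under it the touch screen and mouse figures come out near 24 and 17 words/minute, and the 1870 msec per character observed in this task corresponds to roughly 6 words/minute.

```python
def words_per_minute(msec_per_char: float, chars_per_word: float = 5.0) -> float:
    """Convert time per character to typing rate.

    chars_per_word = 5 is an assumed convention, not a value from the study.
    """
    return 60000.0 / (msec_per_char * chars_per_word)


def dwell_share(dwell_msec: float, msec_per_char: float) -> float:
    """Fraction of the time per character spent on the selection dwell."""
    return dwell_msec / msec_per_char


if __name__ == "__main__":
    for label, msec in (("touch screen", 500.0), ("mouse", 700.0), ("gaze", 1870.0)):
        print(f"{label}: {words_per_minute(msec):.1f} words/minute")
    print(f"dwell share for gaze typing: {dwell_share(750.0, 1870.0):.0%}")
```

The same arithmetic shows why search time matters more than dwell time here: even if the 750 msec dwell were halved, the time per character would remain near 1500 msec unless the search component were also reduced.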
TASK 3: Lexical Decision Comparison
To demonstrate the potential of selection by looking in psychological research, this task compared reaction times for gaze response to a more typical button press response method. The experimental paradigm used was a simple lexical decision task, classifying five-letter strings as words or nonwords. Both reaction time and response accuracy measures were compared for button and gaze response methods in the analysis.
-------------------------------------- In final print, Figure 3 appears here. --------------------------------------
-------------------------------------- In final print, Table 2 appears here. --------------------------------------
In the gaze response condition the subject classified the stimulus by fixating the word or nonword response areas on the screen (Figure 1). The response areas were also displayed in the button response condition to indicate word/nonword response assignments to the upper and lower buttons on a three switch button box. In both conditions the third button was used to start each trial.
Gaze and button response conditions were blocked and their order counterbalanced across subjects. The first 12 trials in each block served as practice, with 60 experimental trials following. The stimuli were selected randomly from 144 common five-letter words and 144 nonwords created by randomizing the letters of the word set. The strings were displayed in characters of 1.2° in size.
The subject initiated each trial by pressing a button. A fixation point was displayed for 300 msec, followed by the word/nonword string. The subject responded by fixating one of the response areas or by pressing a response button, depending on the block type. In the gaze response condition, selection was triggered by 850 msec of gaze within each response area. The selected response was then highlighted for 300 msec, and dynamic recentering was automatically applied at each selection to correct for system drift.
Results and Discussion
In the gaze response condition, response reaction time was measured from stimulus onset to when gaze last entered a response area: the start of the gaze period that resulted in the response. Because gaze must leave the stimulus to register a response, a second temporal measure is available: duration of gaze on the stimulus. This provides a secondary measure of processing required to make the decision. In the button response condition, the subject's gaze remained on the stimulus throughout the trial and the only meaningful measures were button-press reaction time and response accuracy.
Table 2 summarizes button and gaze reaction time measures. The overall mean RT for gaze response is not significantly different from overall button RT: F(1,11) = 1.55, p = .24. When reaction times for word and nonword stimuli are compared, the interaction between word/nonword and gaze/button conditions indicates that gaze response RT is much more sensitive to the word/nonword stimulus dimension than button RT: F(1,11) = 21.27, p < .001. Smoothed histograms of word/nonword RTs are shown in Figure 3 for gaze and button conditions. These clearly show the greater sensitivity of the gaze response paradigm to the word/nonword stimulus dimension, and display the positively skewed RT distributions typical of this task. Similarly, gaze duration on the stimulus in the gaze response condition is also shorter for words versus nonwords, t(11) = 6.64, p < .001, and is also more sensitive to the word/nonword manipulation than button RT: F(1,11) = 5.47, p < .05.
The proportions of correct responses in the gaze and button response conditions are summarized in Table 2. Fewer errors occur with gaze response, F(1,11) = 10.34, p < .01, as the long gaze dwell needed for target selection allowed the subject to correct their choice. Analysis of fixations preceding target selection shows that such corrections occurred in 11.5% of trials. Typical fixation time on the first response target before correction was 100 msec: such transient cognitive events can only be observed using the gaze response paradigm.
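The gaze-condition measures used in this section (response RT, gaze duration on the stimulus, and self-correction) can all be derived from the sequence of fixations in a trial. The following sketch shows one way to do so; it is not the analysis code used in the study. The Fix record, the region labels, and the rule that a gaze period restarts whenever gaze changes region are assumptions, and the 850 msec default is the dwell time used in this task.

```python
from typing import Dict, List, NamedTuple, Optional


class Fix(NamedTuple):
    start: float           # msec from stimulus onset
    duration: float        # msec
    region: Optional[str]  # "stimulus", "word", "nonword", or None


RESPONSE_AREAS = ("word", "nonword")


def analyze_trial(fixations: List[Fix], dwell_time: float = 850.0) -> Optional[Dict]:
    """Return the response, its RT, gaze time on the stimulus, and whether
    another response area was fixated first; None if nothing was selected."""
    stimulus_gaze = 0.0
    current: Optional[str] = None
    entered = 0.0            # time at which gaze entered the current region
    total = 0.0              # accumulated gaze in the current region
    visited: List[str] = []  # response areas entered, in order
    for f in fixations:
        if f.region == "stimulus":
            stimulus_gaze += f.duration
        if f.region != current:
            # Gaze moved to a new region: start a new gaze period.
            current, entered, total = f.region, f.start, 0.0
            if f.region in RESPONSE_AREAS:
                visited.append(f.region)
        if current in RESPONSE_AREAS:
            total += f.duration
            if total >= dwell_time:
                return {
                    "response": current,
                    # RT: when gaze last entered the selected response area.
                    "rt": entered,
                    "stimulus_gaze": stimulus_gaze,
                    # Self-correction: a different area was fixated earlier.
                    "corrected": any(r != current for r in visited),
                }
    return None
```

For a trial with 400 msec of gaze on the stimulus, a 100 msec fixation on the nonword area, and then 900 msec on the word area beginning at 560 msec, this returns a "word" response with an RT of 560 msec and flags the trial as self-corrected.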
In general, the results from the experimental tasks suggest that gaze response is intuitive and reliable enough to be practical in many psychological research and computer interface applications. All subjects performed a wide variety of control tasks without need for any training, and were enthusiastic about the natural quality of selection by looking. These positive subjective impressions were supported by the speed and accuracy scores for the tasks.
Important to the success of the paradigm was the ability to precisely place gaze on response targets and to hold the gaze for long enough to trigger the response. Although natural gaze is often broken by blinks or refixations, the aggregation of gaze by cluster or region resulted in reliable selections and predictable gaze times. Subjects had no difficulty with dwell times requiring gaze periods as long as 1000 msec. This is in marked contrast to reported difficulty with dwell times over 700 msec by Jacob (1991), who used only single fixation as a measure of gaze duration.
Psychophysical limits on accuracy of gaze placement were not large enough to be a problem in response selection. The main source of selection errors appeared to be the result of occasional drifts in the eye tracking system. Such drift could be corrected by the use of dynamic recentering, which in Task 1 reduced selection errors by 64%. Selection errors were also reduced by increasing spacing between response targets: error rates were 2.8% for the 2° target spacing, 1.9% for 3° target spacing, and were 0.4% for the 4° target spacing used in the typing task. No selection errors occurred with horizontal arrays of targets even for 2° target spacing, as most drift in the eye tracking system used for the experiment was in the vertical direction.
If eye trackers with low resolution or with rapid drifting such as that caused by head movements are used, response targets must be widely separated. For example, the screen layout used in the lexical decision task used targets spaced by 11°, and would probably work reliably with most eye trackers. Dynamic recentering works best in systems where drift changes slowly over several responses. Sudden changes in drift will require several responses to repair. If the drift is large enough to cause target selection errors, correction will be toward the center of the erroneously selected response target and can result in shifting of responses in the target array. A conventional recentering using a single target will correct the offset.
The typing task was representative of gaze control computer interfaces. Tasks requiring reliable selection between many response targets require the use of high-quality eye tracking devices with good accuracy and low drift. User comfort is important if gaze control is to be accepted by computer users, requiring headband-mounted or desktop eye trackers that do not constrain head motion. Computer control systems must be able to be set up and used by the user without assistance.
Gaze response shows great promise in psychological research. A commonly used lexical decision task was utilized to compare gaze and button response methods. The gaze response RT proved much more sensitive to the word/nonword manipulation than the button response RT. The new phenomenon of rapid self-correction was revealed, and the new measure of gaze time on the stimulus was made possible by use of gaze response. Investigation of the effects of the gaze response method in other research paradigms is likely to show similar advantages.
This research was supported by NSERC grant OGP0105451 to E. Reingold. We thank Elizabeth Bosman for her helpful and constructive comments on a preliminary version of this paper.
Andrew, S. (1991). Improving touchscreen keyboards: Design issues and a comparison with other devices. Interacting with Computers, 3(3), 253-269.
Charlier, J., Sourdille, P., Behague, M., & Buquet, C. (1991). Eye-controlled microscopes for surgical applications. Developments in Ophthalmology, 22, 154-158.
Jacob, R.J.K. (1991). The use of eye movements in human-computer interaction techniques: What you look at is what you get. ACM Transactions on Information Systems, 9(3), 152-169.
Kundel, H.L., Nodine, C.F., & Krupinski, E.A. (1989). Searching for lung nodules: Visual dwell indicates locations of false-positive and false-negative decisions. Investigative Radiology, 24, 472-478.
Laefsky, I.A. & Roemer, R.A. (1978). A real-time control system for CAI and prosthesis. Behavior Research Methods & Instrumentation, 10(2), 182-185.
McConkie, G.W. & Rayner, K. (1976). Identifying the span of the effective stimulus in reading: Literature review and theories of reading. In H. Singer and R. Ruddell (Eds.), Theoretical models and processes of reading. Newark, Del.: International Reading Association.
Spaepen, A.J. & Wouters, M. (1989). Using an eye-mark recorder for alternative communication. In A.M. Tjoa, H. Reiterer, and R. Wagner (Eds.), Computers for Handicapped Persons (pp. 475-478). Vienna: R. Oldenbourg.
Stampe, D.M. (1993). Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems. Behavior Research Methods, Instruments, & Computers, 25(2), 137-142.