Wizard of Oz experiment

From Wikipedia, the free encyclopedia

In the field of human-computer interaction, a Wizard of Oz experiment is a research experiment in which subjects interact with a computer system that subjects believe to be autonomous, but which is actually being operated or partially operated by an unseen human being.

The name of the experiment comes from The Wonderful Wizard of Oz story, in which an ordinary man hides behind a curtain and pretends, through the use of "amplifying" technology, to be a powerful wizard.

Contents

[edit] Concept

The term Wizard of Oz (originally OZ Paradigm) has come into common usage in the fields of experimental psychology, human factors, ergonomics, linguistics, and usability engineering to describe a testing or iterative design methodology wherein an experimenter (the "wizard"), in a laboratory setting, simulates the behavior of a theoretical intelligent computer application (often by going into another room and intercepting all communications between participant and system). Sometimes this is done with the participant's a-priori knowledge and sometimes it is a low-level deceit employed to manage the participant's expectations and encourage natural behaviors (though always, one would hope, with appropriate disclosure during the debriefing part of the experiments).

For example, a test participant may think he or she is communicating with a computer using a speech interface, when the participant's words are actually being secretly entered into the computer by a person in another room (the "wizard") and processed as a text stream, rather than as an audio stream. The missing system functionality that the wizard provides may be implemented in later versions of the system (or may even be speculative capabilities that current-day systems do not have), but its precise details are generally considered irrelevant to the study. In testing situations, the goal of such experiments may be to observe the use and effectiveness of a proposed user interface by the test participants, rather than to measure the quality of an entire system.

[edit] Etymology

John F. ("Jeff") Kelley coined the terms "Wizard of OZ" and "OZ Paradigm" for this purpose circa 1980 to describe the method he developed during his dissertation work at The Johns Hopkins University (his dissertation advisor was Professor Alphonse Chapanis, the "Godfather of Human Factors and Engineering Psychology"). Amusingly enough, in addition to some one-way mirrors and such, there literally was a curtain separating Jeff, as the "Wizard", from view by the participant during the study.

An unpublished fact about the etymology of the term in this context: Dr. Kelley did originally have a definition for the "OZ" acronym (aside from the obvious parallels with the 1900 book The Wonderful Wizard of Oz by L. Frank Baum). "Offline Zero" was a reference to the fact that an experimenter (the "Wizard") was interpreting the users' inputs in real time during the simulation phase. Colleagues considered this acronym to be contrived and he eventually dropped it.

[edit] Significance

The Wizard of OZ method (unlike the eponymous "wizard" in the book) is very powerful. In its original application, Dr. Kelley was able to create a simple keyboard-input natural language recognition system that far exceeded the recognition rates of any of the far more complex systems of the day.

The thinking current among many computer scientists and linguists at the time was that, in order for a computer to be able to "understand" natural language enough to be able to assist in useful tasks, the software would have to be attached to a formidable "dictionary" having a large number of categories for each word. The categories would enable a very complex parsing algorithm to unravel the ambiguities inherent in naturally produced language. The daunting task of creating such a dictionary led many to believe that computers simply would never truly "understand" language until they could be "raised" and "experience life" as humans, since humans seem to apply a life's worth of experiences to the interpretation of language.

The key enabling factor for the first use of the OZ method was that the system was designed to work in a single context (calendar-keeping), which constrained the complexity of language encountered from users to the extent where a simple language processing model was sufficient to meet the goals of the application. The processing model was a two-pass keyword/keyphrase matching approach, based loosely on the algorithms employed in Weizenbaum's famous Eliza program. By inducing participants to generate language samples in the context of solving an actual task (using a computer that they believed actually understood what they were typing), the variety and complexity of the lexical structures gathered was greatly reduced and simple keyword matching algorithms could be developed to address the actual language collected.

This first use of OZ was in the context of an Iterative design approach. In the early development sessions, the experimenter simulated the system in toto, performing all the database queries and composing all the responses to the participants by hand. As the process matured, the experimenter was able to replace human interventions, piece by piece, with newly-created developed code (which, at each phase, was designed to accurately process all the inputs that were generated in preceding steps). By the end of the process, the experimenter was able to observe the sessions in a "hands-off" mode (and measure the recognition rates of the completed program).

OZ was important because it addressed the obvious criticism:

Who can afford to use an iterative method to build a separate natural language system (dictionaries, syntax) for each new context? Wouldn't you be forever adding new structures and algorithms to handle each new batch of inputs?

The answer turned out to be:

By using an empirical approach like OZ, anyone can afford to do this; Dr. Kelley's dictionary and syntax growth reached asymptote (achieving from 86% to 97% recognition rates, depending on the measurements employed) after only 16 experimental trials and the resulting program, with dictionaries, was less than 300k of code.

In the 23 years that followed initial publication, the OZ method has been employed in a wide variety of settings, notably in the prototyping and usability testing of proposed user interface designs in advance of having actual application software in place.

[edit] References

Here are some of the original (and subsequent) references on the subject (the method has been picked up in many research domains, and there are numerous subsequent references, only a few of which are listed here).

Summary of the technical aspects of the work:

Kelley, J.F., "CAL - A Natural Language program developed with the OZ Paradigm: Implications for Supercomputing Systems". First International Conference on Supercomputing Systems (St. Petersburg, Florida, 16-20 December 1985), New York: ACM, pp. 238-248

Brief description of the method:

Kelley, J.F., "An empirical methodology for writing user-friendly natural language computer applications". Proceedings of ACM SIG-CHI '83 Human Factors in Computing systems (Boston, 12-15 December 1983), New York: ACM, pp. 193-196. [1]

The best description of the method:

Kelley, J.F., "An iterative design methodology for user-friendly natural language office information applications". ACM Transactions on Office Information Systems, March 1984, 2:1, pp. 26-41. [2]

The unpublished dissertation itself:

Kelley, J.F., "Natural Language and computers: Six empirical steps for writing an easy-to-use computer application". Unpublished doctoral dissertation, The Johns Hopkins University, 1983. (Item 8321592 can be obtained from University Microfilms International; 300 North Zeeb Road; Ann Arbor, Michigan 48106.)

Subsequent References and implementations (a sampling of 20+ years of citations):

Akers, D. 2006. Wizard of Oz for participatory design: inventing a gestural interface for 3D selection of neural pathway estimates. In CHI '06 Extended Abstracts on Human Factors in Computing Systems (Montréal, Québec, Canada, April 22 - 27, 2006). CHI '06. ACM Press, New York, NY, 454-459. [3]

Höysniemi, J., Hämäläinen, P., and Turkki, L. 2004. Wizard of Oz prototyping of computer vision based action games for children. In Proceeding of the 2004 Conference on interaction Design and Children: Building A Community (Maryland, June 01 - 03, 2004). IDC '04. ACM Press, New York, NY, 27-34. [4]

Molin, L. 2004. Wizard-of-Oz prototyping for co-operative interaction design of graphical user interfaces. In Proceedings of the Third Nordic Conference on Human-Computer interaction (Tampere, Finland, October 23 - 27, 2004). NordiCHI '04, vol. 82. ACM Press, New York, NY, 425-428. [5]

Lai, J. and Yankelovich, N. 2003. Conversational speech interfaces. In the Human-Computer interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, J. A. Jacko and A. Sears, Eds. Human Factors And Ergonomics. Lawrence Erlbaum Associates, Mahwah, NJ, 698-713.

Gleicher, M. L., Heck, R. M., and Wallick, M. N. 2002. A framework for virtual videography. In Proceedings of the 2nd international Symposium on Smart Graphics (Hawthorne, New York, June 11 - 13, 2002). SMARTGRAPH '02, vol. 24. ACM Press, New York, NY, 9-16. [6]

Klemmer, S. R., Sinha, A. K., Chen, J., Landay, J. A., Aboobaker, N., and Wang, A. 2000. Suede: a Wizard of Oz prototyping tool for speech user interfaces. In Proceedings of the 13th Annual ACM Symposium on User interface Software and Technology (San Diego, California, United States, November 06 - 08, 2000). UIST '00. ACM Press, New York, NY, 1-10. [7]

Hewett, Thomas T. (et al), "Curricula for Human-Computer Interaction", ACM SIGCHI, 1992, 1996, Chapter 2. [8]

Piernot, P. P., Felciano, R. M., Stancel, R., Marsh, J., and Yvon, M. 1995. Designing the PenPal: blending hardware and software in a user-interface for children. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Denver, Colorado, United States, May 07 - 11, 1995). I. R. Katz, R. Mack, L. Marks, M. B. Rosson, and J. Nielsen, Eds. Conference on Human Factors in Computing Systems. ACM Press/Addison-Wesley Publishing Co., New York, NY, 511-518. [9]

Prager, J. M., Lamberti, D. M., Gardner, D. L., and Balzac, S. R. 1990. REASON: an intelligent user assistant for interactive environments. IBM Syst. J. 29, 1 (Jan. 1990), 141-164.

Dahlbäck, N. and Jönsson, A. 1989. Empirical studies of discourse representations for natural language interfaces. In Proceedings of the Fourth Conference on European Chapter of the Association For Computational Linguistics (Manchester, England, April 10 - 12, 1989). European Chapter Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 291-298. [10]

Carroll, J. and Aaronson, A. 1988. Learning by doing with simulated intelligent help. Commun. ACM 31, 9 (Aug. 1988), 1064-1079. [11]

Gould, J. D. and Lewis, C. 1985. Designing for usability: key principles and what designers think. Commun. ACM 28, 3 (Mar. 1985), 300-311. [12]

Embley, D. W. and Kimbrell, R. E. 1985. A scheme-driven natural language query translator. In Proceedings of the 1985 ACM Thirteenth Annual Conference on Computer Science (New Orleans, Louisiana, United States). CSC '85. ACM Press, New York, NY, 292-297. [13]

Good, M. D., Whiteside, J. A., Wixon, D. R., and Jones, S. J. 1984. Building a user-derived interface. Commun. ACM 27, 10 (Oct. 1984), 1032-1043. [14]