Eolas Technologies Incorporated v. Adobe Systems Incorporated et al

Filing 1301

Opposed MOTION for Leave to File a Brief Re The Term "Browser Application" by Adobe Systems Incorporated, Amazon.com Inc., CDW Corporation, Google Inc., J.C. Penney Corporation, Inc., Staples, Inc., The Go Daddy Group, Inc., Yahoo! Inc., YouTube, LLC. (Attachments: # 1 Exhibit 1 - Defs Brief re the Term "Browser Application", # 2 Exhibit C to Brief - p353-meyrowitz82, # 3 Exhibit D to Brief - IRIS Hypermedia_Haan92, # 4 Exhibit E to Brief - ADBE0196713 Rowe92, # 5 Exhibit F to Brief - DBE0196715 Hindus93)(Wolff, Jason) (Additional attachment(s) added on 2/1/2012: # 6 Certificate of Authorization to File Under Seal, # 7 Text of Proposed Order) (mjc, ).

EXHIBIT F

Capturing, Structuring, and Representing Ubiquitous Audio

DEBBY HINDUS, CHRIS SCHMANDT, and CHRIS HORNER
MIT Media Lab

Although talking is an integral aspect of collaboration, there has been little computer support for acquiring and accessing the contents of conversations. Speech recognition technology cannot transcribe fluent conversational speech, so the spoken words themselves are not available for structuring stored audio. Instead, our approach derives structure from the acoustical information and user interaction inherent in the unobtrusive capture of everyday office discussions and telephone calls. This article describes the evolution of a family of applications for capturing speech interactions, for structuring the stored audio during or after capture, and for choosing appropriate visual representations and mechanisms for later retrieval. The work is placed within the broader context of ubiquitous audio, in which speech can be captured across a range of desktop and mobile computing environments, and the article discusses the social implications of this approach.

Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.4.3 [Information Systems Applications]: Communications Applications; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—audio input/output; H.5.2 [Information Interfaces and Presentation]: User Interfaces—interaction styles; H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces—asynchronous interaction, synchronous interaction

General Terms: Design, Human Factors

Additional Key Words and Phrases: Audio interactions, collaborative work, semi-structured data, stored speech, telephony software, ubiquitous computing, multimedia workstation software

1. INTRODUCTION

People spend much of their workday talking. In Reder and Schwab's [1990] study of professionals, phone calls accounted for 25-50% of the workday, and face-to-face meetings comprised about 20%.

This work has been supported by Sun Microsystems and Interval Research Corporation. Authors' addresses: D. Hindus, Interval Research Corporation, 1801 Page Mill Road, Building C, Palo Alto, CA 94304; email: hindus@interval.com; C. Schmandt and C. Horner, MIT Media Lab, 20 Ames Street, Cambridge, MA 02139; email: geek@media.mit.edu; horner@media.mit.edu. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1993 ACM 1046-8188/93/1000-0376 $01.50

ACM Transactions on Information Systems, Vol. 11, No. 4, October 1993, Pages 376-400.
Yet this time spent talking has been to a great extent out of reach of computer technology. The loss of the audio medium is all the more striking given the dominance of visual communication media and the role of speech in influencing organizational outcomes. Audio fulfills different communicative purposes than text and other media [Oschman and Chapanis 1974], and speech communication "is more valuable for the complex, controversial, and social aspects of a collaborative task" [Chalfonte et al. 1991, p. 21]. Nevertheless, speech is an underutilized resource for CSCW. Recorded speech has been used in a limited way, especially in applications that require little organization of the audio data.

A number of CSCW systems have focused on synchronous video and audio communication for conferencing and informal communication, summarized in Egido [1990], including RAVE from EuroPARC [Gaver et al. 1992], Cruiser from Bellcore [Fish et al. 1993], CAVECAT from the University of Toronto [Mantei et al. 1991], PARC's media spaces [Bly et al. 1993], Mermaid from NEC [Watabe et al. 1991], TeamWorkStation from NTT [Ishii 1990], and COCO from Sun [Isaacs and Tang 1993]. However, the potential to capture these collaborative interactions as a source of data has rarely been exploited to date. Capturing conversations enables repeated hearing of interesting utterances, sharing of conversations with colleagues not present for the original discussion, and collating of such conversations with other kinds of communications, such as electronic mail and shared documents, that are already stored on computers.

The Activity Information Retrieval (AIR) project at Rank Xerox EuroPARC illustrates how capture can provide people with access to information about their own previously inaccessible day-to-day activities. Lamming and Newman [1992] make use of EuroPARC's RAVE system, which continually videotapes lab members, and they have made the stored video retrievable by using situational information, such as where a person was at a particular time (obtained from the "active badges" developed by Olivetti Research, Limited, and worn by laboratory members [Want et al. 1992]) and by using timestamped notations made on pen-based devices during meetings. AIR is, we believe, an example of the progression from support of synchronous interactions to storage and retrieval of the contents of interactions.

This article describes various means of capturing speech interactions in everyday work environments; we call this ubiquitous audio. Common workday activities other than formal meetings provide a starting point for exploring ubiquitous audio, in terms of both user interfaces and audio processing. Ubiquitous audio can come from a number of sources and through a variety of physical input devices. Ubiquitous computing refers to the eventual replacement of explicit computer interactions by specialized smart devices that are unobtrusively present in day-to-day pursuits [Weiser 1991]. The ubiquitous computing approach will eventually lead to sizable but not unmanageable quantities of stored information. For example, assuming four hours of conversation per workday (of which 30% is silence) and 10:1 compression of telephone-quality speech, a year of office speech for one person would require approximately 2 gigabytes of storage.
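The arithmetic behind this estimate can be made explicit. The following minimal C sketch is our own illustration, assuming one-byte samples at the 8,000-sample-per-second telephone rate described in Section 7 and roughly 250 workdays per year (both assumptions, not figures stated above):

```c
#include <stdio.h>

int main(void)
{
    const double sample_rate     = 8000.0; /* samples/sec, telephone quality */
    const double bytes_per_sample = 1.0;   /* 8-bit samples */
    const double hours_per_day   = 4.0;    /* conversation per workday */
    const double speech_fraction = 0.70;   /* 30% of the time is silence */
    const double compression     = 10.0;   /* 10:1 compression */
    const double workdays        = 250.0;  /* assumed workdays per year */

    double bytes_per_day = hours_per_day * 3600.0 * speech_fraction
                         * sample_rate * bytes_per_sample / compression;
    double bytes_per_year = bytes_per_day * workdays;

    printf("%.1f MB/day, %.2f GB/year\n",
           bytes_per_day / 1e6, bytes_per_year / 1e9);
    /* prints roughly: 8.1 MB/day, 2.02 GB/year */
    return 0;
}
```

The result, about 2.02 gigabytes, agrees with the approximation in the text.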
In many ways, storing and retrieving communication for later review is much more demanding than merely establishing the synchronous channel. Both audio and video are time-dependent media with few sharp boundaries to exploit for indexing or classification.

If it were practicable, speech recognition and natural language processing techniques could convert speech to text. Such processing will not be as reliable and perhaps as fast as human speech for decades [Zue 1991], and in any case the transcription would lose the nuances carried by the audio signal but not by the words themselves. In the meantime, extending audio technology to spontaneous conversation will require automatic derivation of structure without understanding the spoken words.

Malone et al. [1987] introduced the term "semi-structured messages" in their work on electronic mail. Such messages contain a known set of fields, but some of the fields contain unstructured text or other information. Information Lens users can fill in these fields when writing messages and can write rules to route and sort received messages based on these attributes [Mackay et al. 1989]. We use the term semi-structured with respect to audio recordings to indicate that some information about the recordings is known (e.g., date, time, who is involved in the conversation, and when someone was speaking), but the actual words in the recordings are not known. The semi-structured approach defines a framework for making these quantities of audio usable by incorporating acoustical cues, situational data, and user-supplied structure. Acoustical structure includes speech and silence detection and the association of portions of the audio signal with the correct talker. Semi-structure aids in providing flexible access to the data without relying on explicit creation of structure—users can create structure as they see fit, but they are not required to create structure for the audio data to be manageable and accessible.
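The distinction between known structure and unknown words can be made concrete. The C sketch below is our own illustration (the type and field names are hypothetical, not a data format from the applications in this article): everything about a recording except its words is representable as explicit fields.

```c
#include <time.h>

/* One talker's turn: where it lies in the audio and who spoke.
 * The words themselves are unknown; only acoustical structure is kept. */
typedef struct {
    long start_sample;     /* offset into the stored audio */
    long length_samples;
    int  talker_id;        /* derived from acoustical cues */
    int  marked;           /* user-supplied: flagged as interesting */
} AudioSegment;

typedef struct {
    time_t        when;            /* situational: date and time of capture */
    char          parties[2][64];  /* situational: who was involved, if known */
    int           num_segments;
    AudioSegment *segments;        /* acoustical: talker turns and silences */
    char         *text_tag;        /* user-supplied: optional description */
} SemiStructuredRecording;
```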
In the following sections, we describe applications for capturing speech in work situations and how these applications derive structure during or after capture. We then describe user interfaces for later retrieval of these stored speech interactions. Our emphasis is on choosing visual representations and mechanisms for interacting with speech segments that range from seconds-long snippets to hour-long recordings. Finally, we discuss the technological and social contexts of digital audio recording. These contexts include mobile computing devices and the use of speech as a data type across a variety of applications.

2. RELATED WORK

Audio in most CSCW applications is only a medium for synchronous communication and not yet a source of data. Short pieces of recorded speech—speech snippets—are the main use of speech as data in current CSCW applications. These snippets are used in message systems, such as voice mail, and in multiuser editing systems. Speech can also be used to annotate text, a facility demonstrated in Quilt [Fish et al. 1988] and now common in commercial software. The stored speech is not itself structured in these applications; the recorded speech is treated as a single unbroken entity, and the application maintains an external reference to the sound, such as a message number or a position within the text. This simple approach is suitable for snippets but is not very informative for our work.

The Etherphone system at Xerox PARC addressed many aspects of providing a functional interface for stored speech, although the primary applications were for snippets of speech or annotations in documents (see Zellweger et al. [1988] for an overview of the Etherphone system). This sophisticated and innovative work included a moving indicator during playback, a sound-and-silence display, segmentation at phrase boundaries, editing, cut and paste of pieces of audio, markers, text annotations, an elegant storage system, and encryption [Ades and Swinehart 1986]. We have replicated many of these features and extended them to explicitly support dynamic displays of conversation and spontaneous capture. We also support lengthy recordings and speech as a data type that can be cut and pasted across a range of applications.

PhoneSlave and HyperVoice are examples of using a semi-structured approach to enrich telephone interactions with respect to messages. PhoneSlave [Schmandt and Arons 1985] used conversational techniques to take a telephone message, asking callers a series of questions and recording the answers. These speech segments could be highly correlated with structured information about the call. For example, the response to, "At what number can you be reached?" contained the phone number. Structured data capture has been applied by Resnick [1992] in HyperVoice, an application generator for telephone-based bulletin boards. HyperVoice applications provide a speech- and touchtone-driven interface that uses the form-entry metaphor. While recording their messages, contributors to the bulletin board are asked to fill in some specific fields using appropriate mechanisms. For instance, the headline field is filled in with a brief recording, whereas expiration dates are given by touchtones so that validity checks can be performed. HyperVoice also supports Resnick's Skip and Scan retrieval mechanism for easily navigating among fields and messages [Resnick and Virzi 1992]. PhoneSlave and HyperVoice demonstrate the value of even simple structuring, and we have taken a similarly simple approach to structure in our applications that capture conversations.

A contrasting approach can be seen in hypermedia documents, which embody considerable structure that is explicitly supplied during the authoring process. Muller and Daniel [1990] implemented HyperPhone, a software environment for accessing voice documents in a conversational fashion. HyperPhone's voice documents are text items that have been structured with links to facilitate access when spoken by a synthesizer. One reported conclusion is that the items must be short and very highly connected for the user interactions to be successful. Arons [1991] describes HyperSpeech, a speech-only hypermedia system utilizing speech recognition for navigation. HyperSpeech nodes contain recorded speech from a series of interviews, and users can move between topics or between speakers. Hundreds of manually constructed links exist in this system. These examples illustrate the amount of structure needed to make quantities of audio useful. Our work on structuring emphasizes automatic derivation rather than explicit authoring. As we have extended our visual displays and interactions to lengthy recordings, we have had to add features so that applications can support multiple levels of structure. None of the above applications addresses our primary interest area, spontaneous collaboration.
The SoundBrowser project at Apple is the most closely related recent work. A portable prototype for capturing user-structured audio has been developed by Degen et al. [1992]. They modified a handheld tape recorder so that users could mark interesting portions of recordings of meetings or demarcate items in personal memos. A key point is that these annotations could be made in real time, as the sound was being recorded. The SoundBrowser itself is a Macintosh application for reviewing the stored audio, and it supports innovative visual representations and user interactions, including zooming and scanning operations during playback of the recordings. Although our visual display of recorded audio is quite different from the SoundBrowser's, we have incorporated into our display bookmark-style annotations, zooming, and scanning. We, too, recognized the importance of retrospective marking and invented a dynamic interactive display that continually shows the conversation's recent past.

3. THE "HOLY GRAIL": AUTOMATIC TRANSCRIPTION OF FORMAL MEETINGS

Conspicuously absent from our discussion so far is the notion of capturing the spoken contents of formal group meetings without human transcription. Given the importance of meetings, this is an obvious CSCW application, as indicated by the body of work on electronic meeting systems [Dennis et al. 1988; Mantei 1988]. Due to technological issues, however, it is very difficult to automatically structure recordings of meetings.

One issue is the association of each utterance with a participant. The optimal solution is to record each person's speech on a separate audio channel, but it is quite difficult to get each attendee's speech to be transmitted by only one microphone. In fact, high-quality recordings of meetings are problematic in general, due to background noise, room acoustics, and poor microphone placement with respect to some meeting participants. Using one or more wide-area microphones (such as the boundary zone microphones often used for teleconferences) allows more flexibility in seating but compromises audio quality. Highly directional microphones can eliminate some background noise and ambient room noise, but they require careful placement and restrict the mobility of meeting participants. The recording may be intelligible; however, the added noise and variable speech amplitude interfere with further digital signal processing, particularly speech recognition, which is quite sensitive to microphone type and placement.

Transcription of the spoken words is the other issue. Speech recognition of fluent, unconstrained natural language is nowhere near ready yet, even with ideal acoustic conditions. Keyword spotting, a less ambitious approach that could produce partial transcripts, is very difficult when applied to spontaneous speech, especially speech from multiple talkers [Soclof and Zue 1990]. However, word-spotting and indexing techniques need not be perfect to be useful; keyword spotting has been incorporated into a graphical audio editing system that used color luminance to display word recognition confidence levels [Wilcox and Bush 1991], and the Intelligent Ear [Schmandt 1981], an audio editing tool, used such a display.

4. CAPTURING AND RETRIEVING OFFICE DISCUSSIONS

We have explored the issues inherent in capturing, structuring, and retrieving ubiquitous audio with a variety of tools that support informal office discussions, personal notes, and telephone conversations. This section describes xcapture, an application that provides a short-term auditory memory of conversations in the office. Xcapture is a digital tape loop that applies no inherent structuring to the recording.
4.1 Capturing Office Discussions

When multiple authors are working on a collaborative writing task, discussions of wording are typical. In one scenario, an author suggests the exact wording for the next paragraph, but by the time the other author says, "That was perfect, say it again," the exact wording has already been forgotten to the flow of the conversation. Xcapture strives to supply the short-term memory that the authors themselves cannot: it remembers what was just said. Many workstations are now equipped with microphones, but in practice the microphone is rarely used. Xcapture, a background application, records the ambient sound from the microphone into a circular buffer, so that whenever a discussion turns out to be worth keeping, the recent speech has already been recorded. Buffer lengths of 5 to 15 minutes are in use; longer recordings were impractical in early versions of xcapture, as discussed in the next section.

4.2 Retrieving Office Discussions

While xcapture records, a small animated icon is displayed as a reminder that sound is flowing into the circular buffer. During recording, xcapture stores the recent audio in its digital circular buffer; when the buffer fills, the oldest data is discarded and replaced with fresh data. When the user clicks on the icon to replay the recording, recording temporarily halts, xcapture receives the recorded audio data from the audio server, and a new window appears, as illustrated in Figure 1. The window displays the entire buffer in a SoundViewer, the direct-manipulation audio widget used extensively in our applications. During playback a moving cursor bar indicates the time offset within the sound, moving horizontally from left to right past tick marks that show the progress of time. Clicking the mouse at another location causes playback to jump to the new position, providing random access into the stream of recorded audio.

Fig. 1. An early version of xcapture after an office discussion; a five-minute segment of spontaneous speech is represented.
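The digital tape loop at the heart of xcapture amounts to a circular buffer of recent samples. The following is a minimal C sketch of that idea under simplifying assumptions of our own (a fixed 5-minute loop; xcapture's actual source is not reproduced in this article): the newest audio continually overwrites the oldest, so the most recent few minutes are always available.

```c
#include <string.h>

/* 5 minutes of 8 kHz, 8-bit audio. */
#define LOOP_SECONDS (5 * 60)
#define SAMPLE_RATE  8000

typedef struct {
    unsigned char buf[LOOP_SECONDS * SAMPLE_RATE];
    size_t head;     /* next write position */
    int    wrapped;  /* nonzero once the buffer has filled */
} TapeLoop;

/* Append newly captured samples, overwriting the oldest audio. */
void loop_write(TapeLoop *t, const unsigned char *samples, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        t->buf[t->head] = samples[i];
        if (++t->head == sizeof t->buf) {
            t->head = 0;
            t->wrapped = 1;
        }
    }
}

/* Copy the buffered audio out in chronological order for review;
 * returns the number of bytes written to 'out'. */
size_t loop_snapshot(const TapeLoop *t, unsigned char *out)
{
    size_t n = 0;
    if (t->wrapped) {  /* the oldest data starts at 'head' */
        memcpy(out, t->buf + t->head, sizeof t->buf - t->head);
        n = sizeof t->buf - t->head;
    }
    memcpy(out + n, t->buf, t->head);
    return n + t->head;
}
```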
xcapture can sound of Xcapture immediately when the improved by smoothing Discussion Xcapture is to record so that increasing raises Discarding quality audio Comprehension speech the even xcapture recorded is replayed Sound- of recording, Therefore, material speech original contents the it familiar Simply the speed. when Time-compressing voices. approach; the interesting The lengthy required normal this the allows times through inadequate; characters’ was reduced straightforward. played than to a above. SoundViewer scanning clue for through time is significantly speed, no described less to three speech recording. gave scanning discussion to find longer) onerous the under playback it is support speech A but mechanism SoundViewer wants (or long Retrieval random-access ~ user xcapture time, 1.2 version of Xcapture after anoffice a five-minute represented. the Anearly retrieval, segment Speed; with xcapture of spontaneous implicit in led to several recordings. a conversation can One be research direction exploited directions considers as part to improve how of the the capture strucand process. A second direction explores improvements to visual representations of stored speech and to audio-related interaction mechanisms. These directions are described in the following two sections, respectively. retrieval ACM Transactions on Information Systems, Vol 11, No. 4, October 1993 Ubiquitous Audio 5. 383 . DYNAMIC CAPTURE AND DISPLAY OF TELEPHONE CONVERSATIONS This part of our structure, work addresses interaction at the the time derivation of inherent of capture, and the conversational visual presentation of audio during conversations. The inherent structure of a conversation is defined by the naturally occurring pauses at the end of phrases and sentences and by the alternation of utterances audio; workstations, can very the relevant are a practical little and talker be separated. average speakers (this equipment of telephone choice for is because calls are in a study also calls beyond as the semi- audio-capable the two audio typically of professional is demonstrating required is possible detection Telephone of 3–6 minutes portions alternation segmented into tool that allows ). users to identify and save conversation progresses. Telephone conversations structured between The audio data can be automatically pieces by the Listener, a telephone listening turntaking called understandable briefi work channels calls activities lasted an [Reder and Schwab 1990]. Studies of audio interactions and telephone calls informed the design of the Listener’s segmentation strategy. Beattie and Barnard [1979] focused on turntaking during found turntaking that inquiries to British pauses directory averaged turns were accomplished conversational parameters, within such 0.2 seconds. as turntaking, as summarized [1987]. The by Rutter and of itself is insufficient Turntaking pauses length, happen ing and the 5.1 The will pauses conversational Capturing Listener following not A number pausing, studies suggest a conversation, be distinguishable phrases although from of one They 34% of of studies quantify and interruptions, that pausing in for three reasons: pauses by their other will be attributable to turntaking, pausing. The Listener therefore between operators. long, and many turns uses both turntak- speaker’s utterance to derive structure. and Displaying captures scenario. 
notification above for structuring not all pauses with minimal assistance 0.5 seconds structure You window Conversational from receive on your Structure telephone a telephone screen. You calls, call. choose as described The Listener to record in pops the call. the up While a you are talking, a graphical representation of the conversation is constructed your screen, showing the shifts in who is speaking and the relative length on of each turn. You can click on a segment to indicate that it should be saved. At the end of the phone call, you can listen to segments and decide which ones to save, or just save all the marked Two microphones collect audio to the telephone microphone sits person’s speech phone). This distinguish handset and in the user’s (assuming second, between segments. signals for the carries office that the single-talker the two ACM talkers. Transactions speech near Listener. from both the telephone handset is used One is connected talkers. and rather than audio stream enables the The Listener receives audio on Information Systems, The carries just other that a speakerListener data Vol. 11, No. 4, October to from 1993. 384 . Debby Hindus et al both microphones, performs pause detection on each source, synchronizes the sources, and then locates changes of talker between pauses. This last step is needed because turntaking pauses can be undetectable with just pause detection. The new segment is then added to the call display. The call display that appears during the conversation must so as not straints to interfere imply and interesting takes that with only the the recent Also, portions of the 2 shows conversation call proceeds, 30 seconds reflecting the display a visual is displayed the can be identified during memory conversation to the left, relative Segments sation visual conversational segments marking or por- distinguished when to mark and allows segments highlighted, requires even fewer matters, and not on interacting with user actions are clicking application functions. is shown, segment, audio each one second of audio. New segments scroll out of view colors for users segments to substantial marking interaction As the the previous turn Each can be visually border by their unmarked. segments users of the conver- those by clicking and marking user when automatic is focused on the seg- on them. is reversible. interactions; attention the Listener the pointer to identify at any time the user can toggle that all new segments are marked. During the phone call, the user’s automatic the user conversation. may merit later rehearing. The Listener’s structure reinforces the user’s memory of individual are visually mechanism turns talkers. segment Structure mechanisms A user can mark Marked sation provides the within a SoundViewer, the same our applications. In this picture, each talker that are interesting and display of conversational significant ments. from after conversational of the and by different User-Supplied The Listener of the each SoundViewer represents at the right-hand side, and older positioning 5.2 Adding shortly con- are salient, of approximately Each utterances tion of the audio signal, is displayed representation that is used throughout tick mark within segments appear part representation retrospectively. phrase-level only be unobtrusive, short-term place. Figure segments conversation. Another the convermarking so conversation program. Therefore, the only feasible on segments of interest or on the toggle. 
The call display that appears during the conversation must be unobtrusive so as not to interfere with the conversation taking place. These constraints imply that only the recent and interesting portions of the conversation are salient, and that segments can be identified retrospectively. Figure 2 shows part of the visual representation of a conversation shortly after the previous turn. As the call proceeds, only approximately 30 seconds of conversation is displayed, reflecting the short-term memory of the conversation. Each phrase-level segment is represented by a SoundViewer, the same visual representation of the audio signal that is used throughout our applications; in this picture, each tick mark within a SoundViewer represents one second of audio. New segments appear at the right-hand side, and older segments scroll out of view to the left, their positioning reflecting their relative place in the conversation. Utterances by different talkers are visually distinguished by the border colors of the segments.

5.2 Adding User-Supplied Structure

The Listener's display of conversational structure provides a mechanism for users to mark segments that are interesting and may merit later rehearing. A user can mark segments during the conversation by clicking on them; marked segments are visually highlighted, and marking is reversible—the user can toggle a segment between marked and unmarked at any time. Another mechanism requires even fewer user interactions: the user can set a toggle so that all new segments are marked automatically. During the phone call, the user's attention is focused on the conversation, not on interacting with the Listener program. Therefore, the only feasible user actions are clicking on segments of interest or on the toggle.

Once the conversation is completed, the nature of the interaction changes from capture to review. The postcall Browser displays the entire conversation and provides additional editing functions. A user can replay all or part of the conversation, revise the choice of segments to store, and save the segments for later retrieval. Users can also provide a descriptive text tag for each conversation, although tags are not required. Once these postcall revisions are made, only the marked segments are saved. Marked segments will typically occur in consecutive groups, and when the conversation is retrieved in the future these groups are visually distinct, as shown in Figure 3.

D: Hello, this is Debby Hindus speaking.
B: Hi Deb, it's Bob. I'm just getting out of work, I figured I'd call and see how late you're going to stay tonight.
D: Well, I think it'll take me about another hour, hour and a half, to finish up the things I'm doing now.
B: OK, I'm just going to head on home, I'll probably do a little shopping on the way. Well, if you think of it, maybe you could get some of that good ice cream that you got last week.
D: OK. By the way, somebody mentioned an article you might be able to use in your tutorial.
B: Oh really?
D: Yeah, uh... it's by Graeme Hirst, in the June '91 Computational Linguistics.

Fig. 2. Sequence of segments during a phone call, with transcriptions. [Debby's very short turn is ignored.]

Fig. 3. Browsing a stored chat; there are three groups of marked segments.

5.3 Retrieving Stored Telephone Conversations

Stored conversations may be retrieved long after the phone call took place. Situational and supplemental structure can provide memory cues to the contents of the conversation. The Listener collects and stores situational data, including the time and date of the call, the other party's name and phone number if known, and the name of the user. The situational data is stored in a file along with the conversational structure, indices of the segments chosen for saving, and the corresponding audio. The choice of which segments to save is one form of supplemental structure; text tags are another. The stored representation of a conversation—situational data, structure, and saved segments—is referred to as a chat. The speech itself cannot be searched, but the situational and other textual information can.

There are two kinds of retrieval: one is finding the desired audio segments within a chat, and the other is locating a particular chat from among numerous stored chats. Our work has been narrowly focused on capturing and retrieving segments within a single conversation. Future efforts will need to address mechanisms for navigating among many chats, such as making use of situational data to locate a chat in a fashion akin to locating an electronic mail message.

5.4 Discussion of the Listener

We have used the applications ourselves enough to be confident that the underlying concepts are viable and worthy of additional research. We have, for example, used the Listener while collaborating long-distance on papers and mailed xcapture recordings of impromptu office discussions to other group members. Although the Listener's day-to-day usage was limited to one of the authors for technical reasons, that author experienced consistent success in marking segments of interest or adjoining segments.
Furthermore, ACM Transactmns on Information Systems, Vol 11, No 4, October 1993 Ubiquitous the minimal interactions in conversations, by casual interactive the Listener with was engaging static highlights involving into such distinguishable display several her and was aspects conversation. and One participation comprehended considerably of building less real-time continuing segment length of two Additionally, problem by the aspect aspect seconds, As Listener is how well is how determination segments or sequentially. calculated between utterances. Another significant Another Final as a minimum individually are imperfect. segments. segments. played boundaries interfere display Browser’s is awkward signal constraints, when noticeably 387 . is consistent high-quality audio from microphones in offices. The from telephones is good, but using two microphones to segment on talker audio not dynamic The applications how to obtain audio quality the the observers. understandable. Experience with based did and Audio in so that the chosen that ensure should shown to divide must sound reflect visually complete Figure fall they visual 4, segment the in pauses presentation works during and after the conversational the conversation. The Listener’s call display does represent structure of speech, and it worked well as a dynamic repre- sentation the conversation. tation and during for later browsing. innovation when with informed It was less successful Clearly, respect there to representation by the cognitive as a static are opportunities science and perspective. represen- for experimentation interaction, particularly For example, interacting with a computer program while engaged in conversation raises issues of task and memory workload, and use of attentional resources. Finally, privacy issues received only minimal attention in this prototype implementation. that record As we discuss conversation need toward outside 6. PRESENTING AND INTERACTING accommodate material for with speech working lengthy of a small with research that well, application-specific of this privacy the however, and limitations by using SoundViewers arranging them the applications concerns before they Listener SoundViewer to be too simple structure. avoided for each segment conversational RECORDINGS original and it proved or user-supplied snippets, article, group. WITH LENGTHY xcapture recordings to represent end to accommodate can be employed We saw when the did It worked the In well SoundViewer’s of the conversation structure. not for audio this section, and we describe enhancements we made to the SoundViewer that enable it to directly support segmentation, multiple levels of structure, and presentation of lengthy recordings. These enhancements include the display of segmentation, scaling and zooming of long sounds, and the ability to annotate parts of the sound with text or markers that act as bookmarks. Mechanisms for navigating among segments and for rapid searches were developed as well. 6.1 Displaying Multiple Levels of Structure The enhanced SoundViewer widget supports the optional display of several levels of structure. The most general structuring for speech is to distinguish ACM Transactions on Information Systems, Vol 11, No 4, October 1993. 388 . Debby ..~ Hindus et al “~~~’ “ “ segment Fig. 4 n segment Silences are divided segment n+l between n+2 seagnents. between speech and silence intervals. 
Figure 5 shows the modified widget incorporated into a voice mail application; segments of speech are displayed as black bars, and SoundViewer silence allows is white, following the user to jump Etherphone’s forward example. and backward between The speech segments during playback, by pressing the space bar or “b” key, respectively. Applications may require segmentation at a level of semantic and structural ing information between higher two application-specific level Viewer its may than speakers specify of structure, own layer and silence bars, as shown layer is up to the application, content. For musical notes. another. The third example, level arrow-shaped another way and an interludes can of structure silence, of black be used and as To distinguishpresent containing white in bars a radio to skip by adding [Degen in the SoundViewer to convey such speech. application is user-supplied within a recording the SoundBrowser markers and music below the information within show from structure. visual et al. are one Displaying and Interacting wkh The SoundViewer emphasizes interactive playback controls speech indicated content Users can Lengthy by bar to denote bookmarks. We followed 1992]; users can place by pressing the caret a SoundViewer, and key. Text is an optional text label can be set by the application. This label can display the name in a telephone message, for example, or the date of a recording. 6.2 this a Sound- in Figure 6. The interpretation of this content which can also associate a visual icon with the musical Keystrokes points of interest the example of speech or between caller’s Recordings the temporal aspect of audio and has required improvement together with to accommo- date recordings longer than a few minutes. Like other graphical interfaces to time-varying media, the SoundViewer uses a mapping from time to space (length) to represent sound. Showing the total duration of a recording, along with a continually updated position indication during playback, is important for navigation and user comfort [Myers 1985]. The SoundViewer initially used tick marks of varying size to convey a time scale, and longer sounds were shown with closer spacing between tick marks. But these visual cues were inadequate for indicating total duration or positioning within long recordings. Because tick marks failed to present absolute duration, text labels were introduced into the SoundViewer to a Is-minute display sound duration; for example, “1 min 30 see” labels recording. Tick marks did provide navigational cues, however, and we are evaluating tick marks and speech-and-silence displays for navigation. ACM Transactions on Jnformatlan Systems, VO1 11, No 4, October 1993 Ubiquitous . —“. . . . . . “. —-------- “.. -...” ---- —.--..—. “-— pmail -::W:! —.-— LEFT:vieu 119 \ .—--.-- RIGHT: delete II 1 <ec ml Ilml 2 mln 30 258-9803 389 . —. HIllllLEssave - F Unknoun v ——. Audio IIBI mllllIll Ml Inmlll I mmllllml~m, I sec Unknoun - ~ Play L ..— speed: . .. . ..-. —.-. Fig.5. 1 mln nullins .,, mllllll la Unknaun i 3 .— Enhanced lmlll~,lllllllllm Illllmmllllmmll Illlmmllm min SoundViewer nllmlll Imlllmll 47 sec I 1.4 ——. ——— .- .—. --- .-.— Ill 33 sec in Vmail, showing -- “--. ..— ... . “.—.. speech andsilence .-. . ..-. intervals, 111111111111111111111111111114 Fig.6. Accurate ping position to support Content bars indication direct displayed beneath is important manipulation segmentation for using during layer. the time-to-space playback. 
But when map- recordings become very long (e.g., a thirty-minute conversation), so much audio corresponds to a single pixel that this mapping breaks down; the bar moves so slowly during playback that the display appears static and provides inadequate feedback. Additionally, if the user chooses to move to a different location within To overcome the sound, temporal the SoundViewer’s called MegaSound chical Video compress A Magnifier pieces MegaSound was added resolution resolution above of Mills the without widget, in shown SoundViewer. et al. [1992], of the timeline is too coarse. limitations, an interface losing Figure Following MegaSound global can layer the Hierarexpand and positioning information. of two SoundViewer 7, consists widgets, with lines connecting them, that show the zoom region in its global context. Because MegaSound incorporates SoundViewers, SoundViewer behaviors are preserved, such as speed control and random access, by moving the position indicator. The user can interact with either the root level or the zoomed SoundViewer, and the region of magnification moves in synchrony with playback. MegaSound maintains the link between the two layers so that the position indicators are synchronized. MegaSound also generalizes the bookmark mentioned earlier, by turning it into an annotation where the user can type arbitrary text. It does this by “flagging” bookmarks and linking them to a text entry window. The annotation text can be entered by the user or can be provided by the application. As ACM Transactions on Information Systems, Vol. 11, No. 4, October 1993. 390 Fig. Debby . 7. the Revmed pointer displayed H[ndus xcapture, showing is moved in the et al. text a MegaSound into a given window, widget flag, and the its with both root associated sound’s position and text zoom levels is automatically indicator automati- cally jumps to the flag’s position, as shown in Figure 8. Clicking on a flag sets the zoom region of the MegaSound widget to the temporal boundaries for that flag and plays the associated audio, navigating among the annotations. Along with adding the MegaSound Accelerator layer, keys the can also be used for itself was SoundViewer improved with respect to scanning and speed control. Playback speed can be specified with the numeric keys or changed relative to the current speed with the plus and minus keys. Additionally, the SoundViewer includes speed control as part of the scanning interface. Mouse motion events are treated forward dragging as play commands; or backward, motion, as the user clicks there the faster is accompanying the audio plays and drags audio in order the position feedback. The indicator faster to cover the region the swept out . 7. IMPLEMENTATION OVERVIEW The audio applications described in this article were developed on a common platform and were designed to complement each other. Software was developed in the C language on a Sun Sparcstation 2 running SunOS. Releases 4 and 5 of the X Window System toolkit were used with the widget set. The Sparcstation 2 can digitize a single sound Athena channel (Xaw) at a sampling rate of 8,000 samples per second in an 8-bit ~-law format, which is comparable to a good-quality telephone connection; a l-minute recording is 480KB long without further compression. The desktop applications described here are X Window System client applications that rely on independent servers—the X Server and the audio server—to handle their interactions with the user’s display and workstation audio, respectively. 
The workstation’s X server displays the user interfaces and passes along user input events. The audio server is a networked server for managing asynchronous audio operations that is built on top of audio library routines. (These device-independent routines provide a level of abstraction above the workstation audio devices and support nonblocking ACM Transactions on Information Systems, Vol 11, No 4, October 1993 Ubiquitous Audio w .lhd uhere evening. Me begin in sanalia lmnighL U.S. forces have rmu been caughb 111111111111111111114 30 min I Fig.8. audio 391 . Anonzoomed operations.) The MegaSound audio widget server’s with annotation architecture flags. emulates that of the X Window System in that it consists of a server process that executeson the library of routines that are linked into an workstation and a client-side application municate The to com- program. The client-side routines enable the application with the server process through callbacks [Arons 1992b]. Listener also made use of the Phoneserver, another independent server. The Phoneserver monitors lines and can deliver phone-related the status of the group’s ISDN telephone events to programs such as the Listener. The Phoneserver on a Sun Sparcstation telephone To currently interface implement Listener, two kinds of items: the strings added by users. widget superclass like Among type of the the bulletin ChatViewer widget manager. database for “Chat ChatViewer because (See Hindus Although CAPTURE, xcapture and of by the Listener and text components, the Soundmanager a provides fields are defined as key-value was designed as a more general ChatMan manages the item decisions up to its subclass. or text [1992] are how to lay out items items. it can handle database, The OmniViewer all possible for lower-level layouts descriptions is and what one by acting STRUCTURE, the Listener such as a of ChatViewer functions and implementation details, and Homer [1993] for detailed tions of the ChatMan, OmniViewer, and MegaSound widgets.) 8. BEYOND The track all widget by the subclass made created. and keeps leaves to use for sound board. ISDN widget. but so named built-in was displays The database in which Manager”) choices these widget that sound segments generated It makes use of two other ChatViewer, of widget subclass, the System and the database simple record-oriented pairs. ChatMan (short just with hardware. the is an X Window ChatViewer Viewer resides descrip- AND PRESENTATION are worthwhile by themselves, such ubiquitous audio applications are more powerful when implemented in a broader computing context. This section considers three contexts in which these applications operate. The first context is a family of applications employing speech at the desktop. These applications feature a consistent ACM Transactions on Information Systems, Vol. 11, No. 4, October 1993 392 . visual interface Debby Hindus and et al interoperability. The second context is portability and mobility to allow speech to be captured in a wider range of social situations. The final context is the social implications of ubiquitous recording technology. 8.1 Desktop Audio Appllcatlons The SoundViewer Media Lab audio applications and related applications. 
make use audio widgets are used in a number of other As shown in Figure 9, the various audio of shared whether the user’s interactions A screen-based user interface random access within call-return and originate from text and between audio messages, message-return unanswered and databases capability calls, speed control [Stifelman other voice mail 1990]. A speech extent of editor, Sedit, phrases that additional recording. NewsTime provides uses the the a visual interface digital Radio), audio recordings delivered and it utilizes the structure 1993]. Figure right-hand top story 10 shows cut, to minute-long region of the NewsTime recording display audio to indicate and augment radio broadcast application. that summary does not capabilities can or news with and to over computer networks (Internet Talk inherent in news programming [Homer the of the talk necessarily although both the text and the recording use of the SoundViewer in all these cut-and-paste and messages can be used as daily schedule [Schmandt paste, window displays MegaSound widgets is being played, so the top MegaSound window displays the text A summary paragraph Sound, The the can 5, provides subscribers, speech-and-silence user of of playback, 1991]; attachments to ordinary Internet email. Speech snippets calendar entries in xcal, a speech and text personal the regardless are graphical or telephone based. to voice mail, shown in Figare between (In this picture, the for four feature stories; is expanded to magnify user is hearing. show.) correspond The the the left-hand to a single Mega- are in order. applications provides audio applications in addition to a consistent visual user interface. The SoundViewer supports cut and paste through standard X Window System selection mechanisms. A segment of audio can be selected in any SoundViewer and then pasted into any other application that supports recording, thus increasing the utility of simple capture applications. For example, one could take a portion of a conversation and send it as voice mail to a third party who was asked to comment. Or one might agree to perform a task and then copy the portion of the conversation describing the task into one’s calendar on the task due date, as shown in Figure 11. Tools to capture ubiquitous audio contribute a new source of audio as a data type applications. 8.2 to be shared and manipulated range of desktop Mobility provide recording Xcapture and the Listener tive conversations also occur spontaneously outside of work sites. Capture applications ACM by a wide TransactIons on Information Systems, Vol of office conversations. Producin hallways, common areas, and will ultimately employ technolo- 11, No 4, October 1993 Ubiquitous window 393 . phone shell databases system Audio text ......--”” ... voice e ...... .....-’ phone visual ““ ...---- ............ text - .. user voice e interface rolodex, ..--.. list ---------text @ ... ....-----------““’” voice ,....’., .“ calendar-”””” ““ ,....- j .... .......... . ... telephone interface text ! voice -“”” ‘e mail ,, : :::, :, ,! 1-:1 ...... ‘“’ other unix processes (mail, appointments, Fig. 9. Multimedia databases are accessed by fax service..) applications that support various media for presentation. that gies can operate portable ubiquity might in prototypes A et al. the hand-held VoiceNotes, 1993]. consists keyboard hands and eyes ments. VoiceNotes conversations. 
Although memos be accessed or devices interfaces into initial how such recognition for speech manual control, fit in is each By and segmented based on of that lit saved to other user’s environ- for recording can a list literally the dimly xcapture of category eliminating while or in version [Stifelman a pocket example, elements or categories; interaction a nonvisual sequential system sequentially. could for provides file allow also driving, become these computer, tions offer speech the is to Lab complete hand-held pauses, VoiceNotes prototypes personal information mobility a family provide telephone Phoneshell Media telephone-based Phoneshell speech Lab’s show be and browsed audio files 1992]. top and Media speech audio can recording, During speech digital record busy—while also segments under The interfaces supports al.’s portable Speech are et that display, hand. these as well. user computer users of memos and be at [Stifelman voice Stifelman VoiceNotes of a list always environments telephone-based be achieved. prototype accessing such and users applications to mail, can by applications, access number afforded of such voice databases, as email, and calendar, and news, weather, as record messages into databases above, or type short TransactIons cellular digitized described ACM tethered on Information to management hand-held using well are shared text entries Systems, a lapapplica- telephones. synthesized personal name and with such traffic. the desktop as “ok” Vol. 11, No. 4, October on 1993. 394 . Debby Hindus et al. u “ L# EEEIEEclEEmEEIEEEIEEzlEEEmEl Carl Iialanud interviews Billlllllnnlllllmlmlllllllmnlllllllll Lynch, forner conputer manager at SRI, founder and president of Interop Conpang, and nenber of the Internet a long-tine Daniel This Board. uas conducted in 1993, just before Lynch doun as a nenber of the flrchitecture intervieu January, stepped IRE. unique The intervieu contribution Lynch nade In to this shows that the Daniel body. that intervieu, Carl tialanud and Dan Lynch range of topics, fron role for the telphone the reasons that the cover a wide the proper conpang to current 11 m Internet does not yet address the needs of the snail corporate network. Lynch shares his views on why unified global directories cannot uork and why Netware is just as inportant as TCP/IP. of the Tourist, lllso in this Ueek is the featuring episode Incidental a review of ImlllllnlmBlmBllllInlllfllIIIlllml Geek of 1 t 10. NewsTime m~m I I I I 111111111111111% Chili Pepper naeazine, the capisciun lover;s periodical. Fig. < mln I application displaying aportion ofan Internet Talk Radio talk show. the telephone keypad. Phoneshell has been operating for over two years and is regularly used by group members [ Schmandt 1993]. by mobile devices can be retrieved at Recorded speech that was captured the desktop. provides For example, a top-level is routinely used note tasks on ferred to speech by one the text of the mail authors and can be system to record home.) speech items voice function commute a free-form and the menu to record These text sorted memos under memos. ideas display, and offered personal during are as shown arranged Phoneshell (This long function walks automatically in Figare spatially or to trans- 12. Here, into different categories. 8.3 Social Context and Privacy Ubiquitous audio will exist organizations. The the user, individual researchers/users. all beneficial. 
ACM Transactions within applications as we Outside Capturing on Information the described have experienced of that conversation Systems, limited has social in context this within context, social of article a small, the impact Vol, 11, No 4, October 1993 individuals can social clearly trusted impacts because it and benefit group of are not challenges Ubiquitous Audio 395 . P——’——7 L————___1 Monday June 10:30-11:00 Tuesday 7 June 1:00- Knight-lUdder Jur proposal for Emil photos for FRAMES 1:00-1:45 sun TOISpape.r Ava and Kaya tic 11111[ Cd Debby Fig, 11. [19911, assumption in discussing munication as say and take less the and stored back can capture during For only or tools, the relates requests video-based I see you. for Often communication and Kiesler the com- remark that about what concerned Listener therefore, assumptions proposed privacy key enable conversation makes a framework it of stored and will important an audible to data, be and the point for privacy. out, One do not to is most feedback like consent capture a telephone call accepted ACM Transactions to the incorporated model or rejected. on Information and feel employed, Strict of control is spied feedupon. particularly et al. if 1992]. during in the conver- you symmetry; is for to which issue appropriate, [Gaver have defines addressed purposes Sellen signal ubiqui- representations of a recording systems are must constructing notification-people to control: designing framework issues important connotation The that data, Bellotti for mechanisms. considerations accessibility Some if in actions—capturing As issue storable; and ephemeral They and Xcapture and change artifacts. committed received. have as the most capture tangible Sproull describe concerns. the conveys Another sation. be [1993] capture, speech directly of tangible be put. are is ephemeral. as ephemeral, to be less incorporate storage, data over Sellen or user for mail This feedback system conversation a lack will privacy that and data it become ephemeral. systems four how account Bellotti control people to into tous by causes conversation become that electronic marked ephemeralness they overdue Recorded speech can be selected from xcapture and pasted into the calendar. an ingrained it 8 1:45 ABC Radio 4:00-5:00 I I I I II Wedn see which symmetry, me video while Systems, Vol. 11, No 4, October 1993 396 Debby . Hindus et al —“— —.. ..— r“— Wmpx=iiqlmzqpimiiq(ixq 5PONSOFS TOIS n3net_ conf , rm Consort dates I urn PHONESHELL ~= -.—. f GRRPHICRL UI ~ [ 1 f F I —— Fig. 12. Spatial management suitable layout mto for some kinds system Listener-capable recall . “e-—. in a mixed who how [Dourish the they 1993] remote group were and would such . voice Listener—only interaction may need could calling, Cruiser captured. using and interactive systems outside be overly be and text reminded Consent controls, might periodically project work, of a people the part of based on on as in the RAVE Obtaining more consent problematic. although this utility from controls such is of the calls access workplace consent limit et al. 1993]. [Fish of the confirming to as the be negotiated could collaborators touchtone well interaction, workstations within from .——-— supports of video capture someone and categories k list telephone-based at ——.. d~fau Itz 1 change app f I x d I a I –by–name –– f reward to bus’) option A conversant during lengthy conversations, Making stored of conversation material stored information safeguarded [ 1992]. 
ping might have Potential by use would for be ACM Transactions and been used. guidelines to attitudes stored a barrier on Information to capture of the how Encryption snooping. which of the digital Kling [ 1991] systems afterward. for stored and misrepresent long purposes questions general Dunlop recordings. real-time issues The for in of audio editing political of the raises be addressed parties, is encryption also later misuses third examination retrievable include data and later what was One barrier during Another Systems, Vol 11, No 4, October 1993, the risks can be Rothfeder eavesdrop- said, and even to unintended the idea conversation is to build in Ubiquitous autodestruct mechanisms, have not been not eliminate their and systems should encryption shows and how can privacy. providing The choices, such as providing Sellen and framework Godard, and systems, [Dourish individual’s Dourish’s control communication be implemented an meantime, Bellotti and flexible that does however. in the affordances, ones This periodically. supersede choices for those computer-mediated dances those may 397 . or perhaps removed copies, forces support for conversations, are on backup political design routine recording, affordances to infrastructure in since existence Organizational but so that accessed Audio a software feedback illustrates mechanisms how these affor- 1993]. 9. CONCLUSION This article has presented have introduced applications only begun the and to explore amounts of audio, and Interval Research will designing ways pursuing auditory 1993] and the Another an Chen that aspects The data aids feasible in by in use becomes persons this article to deriving of interest by struc- examining in creating segments of discourse could operate extending exploring on capture the demonstrates situational telephoneapplications technical that and and structuring and recordings related everyday semi-structuring constraints lengthy issues social can be value and audio display adopting is simple presented visually. mobility, utility in speech the and by of privacy, available ubiquitously to information Real-time the addition above. locate ways, retrieval. the is [Arons devices. in various and in and Lab audio described at audio Media databases, prosodic that and The approaches automatically influence more exploited work of through segments recording with will new and of strategies, type, is in include and along news of interest two users. We large Follow-on skimming and audio. presenting research. applications significant be captured concepts, a data audio [ 1992] than making segmentation These work ubiquitous storage audio could be described can accessing and to for of audio ubiquitous presentations tangible is to delineate areas of more work future to Other of portable, interactions for that found speech. to meetings more of the Withgott detector listeners quality as for this visual techniques approach and emphasis to continue audio gathering manage structuring, exploring interactions direction prosody; expect development A promising to capturing, include make of ubiquitous interfaces presentation audio continuing ture. to we concept user of workday and recorded speech audio as situations. ACKNOWLEDGMENTS The work the presented Media audio server, Stifelman many ten Lab a key contributed of the by in this Speech Mark X Window audio relies who ACM Barry of audio research applications. applications on software Group. 
ACKNOWLEDGMENTS

The work presented in this article relies on software designed and built by members of the Speech Research Group at the Media Lab. Barry Arons developed the audio server, a key component of the audio applications, and the SoundViewer widget was initially written by Mark Ackerman. Both he and Lisa Stifelman provided invaluable advice on the content and mechanics of the writing, as well as training on X Window System widget software and audio software engineering; Lisa Stifelman also contributed to many of the audio research applications. Audio cut and paste was implemented by Lorin Jurow; xcapture was written by Sheldon Pacotti; and Eric Ly and Atty Mullins worked on related audio applications.

We are indebted to Tom Malone and Terry Winograd for their guidance and support in the development of this work; to Wendy Mackay for her persistent, thoughtful critiques; to James Baker for his careful comments; and to Amy Bruckman, Diane Schiano, and Mark Ackerman for their significant contributions to the discussion of privacy issues. We also thank the anonymous reviewers for their comments on an earlier version of this article.

REFERENCES

ADES, S., AND SWINEHART, D. C. 1986. Voice annotation and editing in a workstation environment. In Proceedings of the 1986 Conference. American Voice I/O Society, San Jose, Calif., 13-28.
ARONS, B. 1993. Interactively skimming recorded speech. In UIST '93 Conference Proceedings, ACM Symposium on User Interface Software and Technology. ACM, New York.
ARONS, B. 1992a. Techniques, perception, and applications of time-compressed speech. In Proceedings of the 1992 Conference. American Voice I/O Society, San Jose, Calif., 169-177.
ARONS, B. 1992b. Tools for building asynchronous servers to support speech and audio applications. In UIST '92 Conference Proceedings. ACM, New York, 71-78.
ARONS, B. 1991. Hyperspeech: Navigating in speech-only hypermedia. In Hypertext '91 Proceedings. ACM, New York, 133-146.
BEATTIE, G. W., AND BARNARD, P. J. 1979. The temporal structure of natural telephone conversations (directory enquiry calls). Linguistics 17, 213-229.
BELLOTTI, V., AND SELLEN, A. 1993. Design for privacy in ubiquitous computing environments. In Proceedings of the European Conference on Computer-Supported Cooperative Work. Available as Rank Xerox EuroPARC Tech. Rep. EPC-93-103.
BLY, S. A., HARRISON, S. R., AND IRWIN, S. 1993. Media spaces: Bringing people together in a video, audio, and computing environment. Commun. ACM 36, 1 (Jan.), 28-46.
CHALFONTE, B. L., FISH, R. S., AND KRAUT, R. E. 1991. Expressive richness: A comparison of speech and text as media for revision. In Human Factors in Computing Systems, CHI '91 Conference Proceedings. ACM, New York, 21-26.
CHEN, F. R., AND WITHGOTT, M. 1992. The use of emphasis to automatically summarize a spoken discourse. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. IEEE, New York, I-229-232.
DEGEN, L., MANDER, R., AND SALOMON, G. 1992. Working with audio: Integrating personal tape recorders and desktop computers. In Human Factors in Computing Systems, CHI '92 Conference Proceedings. ACM, New York, 413-418.
DENNIS, A. R., GEORGE, J. F., JESSUP, L. M., NUNAMAKER, J. F., JR., AND VOGEL, D. R. 1988. Information technology to support electronic meetings. MIS Q. 12, 4, 591-624.
DOURISH, P. 1993. Culture and control in a media space. In Proceedings of the European Conference on Computer-Supported Cooperative Work. Available as Rank Xerox EuroPARC Tech. Rep. EPC-93-101.
DUNLOP, C., AND KLING, R., EDS. 1991. Computerization and Controversy: Value Conflicts and Social Choices. Academic Press, New York.
EGIDO, C. 1990. Teleconferencing as a technology to support cooperative work: Its possibilities and limitations. In Intellectual Teamwork: Social and Technological Foundations of Cooperative Work. Lawrence Erlbaum, Hillsdale, N.J., Chapter 13, 351-371.
FISH, R. S., KRAUT, R. E., LELAND, M. D., AND COHEN, M. 1988. Quilt: A collaborative tool for cooperative writing. In Proceedings of the Conference on Office Information Systems, COIS '88. ACM, New York, 30-37.
FISH, R., KRAUT, R., ROOT, R., AND RICE, R. 1993. Video as a technology for informal communication. Commun. ACM 36, 1 (Jan.), 48-61.
GAVER, W., MORAN, T., MACLEAN, A., LOVSTRAND, L., DOURISH, P., CARTER, K., AND BUXTON, B. 1992. Realizing a video environment: EuroPARC's RAVE system. In Human Factors in Computing Systems, CHI '92 Conference Proceedings. ACM, New York, 27-35.
HINDUS, D. 1992. Semi-structured capture and display of telephone conversations. Master's thesis, Massachusetts Institute of Technology, Cambridge, Mass.
HORNER, C. 1993. NewsTime: A graphical user interface to audio news. Master's thesis, Massachusetts Institute of Technology, Cambridge, Mass.
ISHII, H. 1990. TeamWorkStation: Towards a seamless shared workspace. In Computer-Supported Cooperative Work, CSCW '90 Conference Proceedings. ACM, New York, 13-26.
ISAACS, E. A., AND TANG, J. C. 1993. What video can and can't do for collaboration. In Proceedings of the 1st International Conference on Multimedia. ACM, New York, 199-206.
LAMMING, M., AND NEWMAN, W. 1992. Activity-based information retrieval: Technology in support of human memory. Tech. Rep. 92-002, Rank Xerox EuroPARC.
MACKAY, W. E., MALONE, T. W., CROWSTON, K., RAO, R., ROSENBLITT, D., AND CARD, S. K. 1989. How do experienced Information Lens users use rules? In Human Factors in Computing Systems, CHI '89 Conference Proceedings. ACM, New York, 211-216.
MALONE, T. W., GRANT, K. R., LAI, K.-Y., RAO, R., AND ROSENBLITT, D. 1987. Semi-structured messages are surprisingly useful for computer-supported coordination. ACM Trans. Office Inf. Syst. 5, 2, 115-131.
MANTEI, M. 1988. Capturing the Capture Lab concepts: A case study in the design of computer-supported meeting environments. In Computer-Supported Cooperative Work, CSCW '88 Conference Proceedings. ACM, New York, 257-270.
MANTEI, M., BAECKER, R., SELLEN, A., BUXTON, W., AND MILLIGAN, T. 1991. Experiences in the use of a media space. In Human Factors in Computing Systems, CHI '91 Conference Proceedings. ACM, New York, 203-208.
MILLS, M., COHEN, J., AND WONG, Y. Y. 1992. A magnifier tool for video data. In Human Factors in Computing Systems, CHI '92 Conference Proceedings. ACM, New York.
MULLER, M. J., AND DANIEL, J. E. 1990. Toward a definition of voice documents. In Proceedings of the Conference on Office Information Systems, COIS '90. ACM, New York, 174-183.
MYERS, B. A. 1985. The importance of percent-done progress indicators for computer-human interfaces. In Human Factors in Computing Systems, CHI '85 Conference Proceedings. ACM, New York, 11-17.
OCHSMAN, R. B., AND CHAPANIS, A. 1974. The effects of ten communication modes on the behavior of teams during co-operative problem solving. Int. J. Man-Machine Stud. 6, 579-619.
REDER, S., AND SCHWAB, R. G. 1990. The temporal structure of cooperative activity. In Computer-Supported Cooperative Work, CSCW '90 Conference Proceedings. ACM, New York, 303-316.
RESNICK, P. 1992. HyperVoice: A phone-based CSCW platform. In Computer-Supported Cooperative Work, CSCW '92 Conference Proceedings. ACM, New York, 218-225.
RESNICK, P., AND VIRZI, R. A. 1992. Skip and Scan: Cleaning up telephone interfaces. In Human Factors in Computing Systems, CHI '92 Conference Proceedings. ACM, New York, 419-426.
ROTHFEDER, J. 1992. Privacy for Sale. Simon and Schuster, New York.
RUTTER, D. R. 1987. Communicating by Telephone. Pergamon Press, New York.
SCHMANDT, C. 1993. Phoneshell: The telephone as computer terminal. In Proceedings of the 1st International Conference on Multimedia. ACM, New York, 373-382.
SCHMANDT, C. 1990. Caltalk: A multi-media calendar. In Proceedings of the 1990 Conference. American Voice I/O Society, San Jose, Calif., 71-75.
SCHMANDT, C. 1981. The Intelligent Ear: A graphical interface to digital audio. In Proceedings of the IEEE Conference on Cybernetics and Society. IEEE, New York, 393-397.
SCHMANDT, C., AND ARONS, B. 1985. Phone Slave: A graphical telecommunications interface. Proc. Soc. Inf. Display 26, 1, 79-82.
SOCLOF, M., AND ZUE, V. 1990. Collection and analysis of spontaneous and read corpora for spoken language system development. In Proceedings of ICSLP. 1105-1108.
SPROULL, L., AND KIESLER, S. 1991. Connections: New Ways of Working in the Networked Organization. MIT Press, Cambridge, Mass.
STIFELMAN, L. J. 1992. VoiceNotes: An application for a voice-controlled hand-held computer. Master's thesis, Massachusetts Institute of Technology, Cambridge, Mass.
STIFELMAN, L. J. 1991. Not just another voice mail system. In Proceedings of the 1991 Conference. American Voice I/O Society, San Jose, Calif., 21-26.
STIFELMAN, L. J., ARONS, B., SCHMANDT, C., AND HULTEEN, E. A. 1993. VoiceNotes: A speech interface for a hand-held voice notetaker. In Human Factors in Computing Systems, InterCHI '93 Conference Proceedings. ACM, New York, 179-186.
WANT, R., HOPPER, A., FALCAO, V., AND GIBBONS, J. 1992. The active badge location system. ACM Trans. Inf. Syst. 10, 1, 91-102.
WATABE, K., SAKATA, S., MAENO, K., FUKUOKA, H., AND OHMORI, T. 1991. Distributed desktop conferencing system with multiuser multimedia interface. IEEE J. Sel. Areas Commun. 9, 4, 531-539.
WEISER, M. 1991. The computer for the 21st century. Sci. Am. 265, 3 (Sept.), 66-75.
WILCOX, L., AND BUSH, M. 1991. HMM-based wordspotting for voice editing and indexing. In Proceedings of Eurospeech 91. 25-28.
ZELLWEGER, P., TERRY, D., AND SWINEHART, D. 1988. An overview of the Etherphone system and its applications. In Proceedings of the 2nd IEEE Conference on Computer Workstations. IEEE, New York, 160-168.
ZUE, V. W. 1991. From signals to symbols to meaning: On machine understanding of spoken language. In Proceedings of the 12th International Congress of Phonetic Sciences.

Received January 1993; revised July 1993; accepted July 1993
