Freshub, Inc. et al v. Amazon.Com Inc. et al
Filing
1
COMPLAINT ( Filing fee $ 400 receipt number 0542-12267080), filed by Freshub, Inc., Freshub, Ltd.. (Attachments: # 1 Exhibit 1, # 2 Exhibit 2, # 3 Exhibit 3, # 4 Exhibit 4, # 5 Exhibit 5, # 6 Exhibit 6, # 7 Exhibit 7, # 8 Exhibit 8, # 9 Exhibit 9, # 10 Exhibit 10, # 11 Exhibit 11, # 12 Exhibit 12, # 13 Exhibit 13, # 14 Exhibit 14, # 15 Exhibit 15, # 16 Exhibit 16, # 17 Exhibit 17, # 18 Exhibit 18, # 19 Exhibit 19, # 20 Exhibit 20, # 21 Exhibit 21, # 22 Exhibit 22, # 23 Exhibit 23, # 24 Exhibit 24, # 25 Exhibit 25, # 26 Exhibit 26, # 27 Exhibit 27, # 28 Exhibit 28, # 29 Exhibit 29, # 30 Exhibit 30, # 31 Exhibit 31, # 32 Exhibit 32, # 33 Exhibit 33, # 34 Exhibit 34, # 35 Exhibit 35, # 36 Exhibit 36, # 37 Exhibit 37, # 38 Exhibit 38, # 39 Exhibit 39, # 40 Civil Cover Sheet)(Palmer, John)
EXHIBIT 29
6/6/2019
SpeechRecognizer Interface | Alexa Voice Service
SpeechRecognizer Interface
version 2.0
Table of Contents
Version Changes
State Diagram
Capabilities API
SpeechRecognizer Context
Recognize Event
StopCapture Directive
ExpectSpeech Directive
ExpectSpeechTimedOut Event
Every user utterance leverages SpeechRecognizer. It is the core interface of the Alexa Voice
Service (AVS). It exposes directives and events for capturing user speech and prompting a
client when Alexa needs additional speech input.
Additionally, this interface allows your client to inform AVS of how an interaction with Alexa
was initiated (press and hold, tap and release, voice-initiated/wake word enabled
(/docs/alexa-voice-service/audio-hardware-configurations.html#applications)), and choose
the appropriate Automatic Speech Recognition (ASR) profile (/docs/alexa-voice-service/audio-hardware-configurations.html#asr) for your product, which allows Alexa to
understand user speech and respond with precision.
Important: Cloud-based wake word verification is required for voice-initiated
products. It improves wake word accuracy by reducing false wakes that are caused by
utterances that sound similar to the wake word. See Enable Cloud-based Wake Word
Verification (/docs/alexa-voice-service/enable-cloud-based-wake-word-verification.html)
for implementation details.
Version Changes
Opus (http://opus-codec.org/) is now a supported format for captured audio. For more
details, see the specification under the Recognize event.
State Diagram
The following diagram illustrates state changes driven by SpeechRecognizer components.
Boxes represent SpeechRecognizer states and the connectors indicate state transitions.
SpeechRecognizer has the following states:
IDLE: Prior to capturing user speech, SpeechRecognizer should be in an idle state.
SpeechRecognizer should also return to an idle state after a speech interaction with AVS has
concluded. This can occur when a speech request has been successfully processed or when the
timeout window for an ExpectSpeech directive has elapsed and an ExpectSpeechTimedOut event has been sent.
https://developer.amazon.com/docs/alexa-voice-service/speechrecognizer.html
1/9
Additionally, SpeechRecognizer may return to an idle state during a multi-turn interaction, at
which point, if additional speech is required from the user, it should transition from the idle
state to the expecting speech state without a user starting a new interaction.
RECOGNIZING: When a user begins interacting with your client, specifically when captured
audio is streamed to AVS, SpeechRecognizer should transition from the idle state to the
recognizing state. It should remain in the recognizing state until the client stops recording
speech (or streaming is complete), at which point your SpeechRecognizer component should
transition from the recognizing state to the busy state.
BUSY: While processing the speech request, SpeechRecognizer should be in the busy state.
You cannot start another speech request until the component transitions out of the busy
state. From the busy state, SpeechRecognizer will transition to the idle state if the request is
successfully processed (completed) or to the expecting speech state if Alexa requires
additional speech input from the user.
EXPECTING SPEECH: SpeechRecognizer should be in the expecting speech state when
additional audio input is required from a user. From expecting speech, SpeechRecognizer
should transition to the recognizing state when a user interaction occurs or the interaction is
automatically started on the user's behalf. It should transition to the idle state if no user
interaction is detected within the specified timeout window.
[Figure: SpeechRecognizer state diagram (https://images-na.ssl-images-amazon.com/images/G/01/mobile-apps/dex/alexa/alexa-voice-service/docs/speechrecognizer-state.png)]
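The states and transitions described above can be sketched as a small state machine. The enum and transition table below are an illustrative sketch of the documented behavior, not part of the AVS API:

```python
from enum import Enum, auto

class RecognizerState(Enum):
    IDLE = auto()
    RECOGNIZING = auto()
    BUSY = auto()
    EXPECTING_SPEECH = auto()

# Allowed transitions, as described in the state descriptions above.
TRANSITIONS = {
    RecognizerState.IDLE: {RecognizerState.RECOGNIZING},
    RecognizerState.RECOGNIZING: {RecognizerState.BUSY},
    RecognizerState.BUSY: {RecognizerState.IDLE, RecognizerState.EXPECTING_SPEECH},
    RecognizerState.EXPECTING_SPEECH: {RecognizerState.RECOGNIZING, RecognizerState.IDLE},
}

def transition(current: RecognizerState, nxt: RecognizerState) -> RecognizerState:
    """Return the new state, or raise if the transition is not allowed."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```

For example, a successful speech request walks IDLE → RECOGNIZING → BUSY → IDLE, while a multi-turn interaction detours through EXPECTING_SPEECH.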
Capabilities API
To use version 2.0 of the SpeechRecognizer interface, you must declare it in your call to the
Capabilities API. For additional details, see Capabilities API (../alexa-voice-service/capabilities-api.html).
Sample Object
{
    "type": "AlexaInterface",
    "interface": "SpeechRecognizer",
    "version": "2.0"
}
SpeechRecognizer Context
Alexa expects all clients to report the currently set wake word, if wake word detection is enabled.
To learn more about reporting Context, see Context Overview (../alexa-voice-service/context.html).
Sample Message
{
    "header": {
        "namespace": "SpeechRecognizer",
        "name": "RecognizerState"
    },
    "payload": {
        "wakeword": "ALEXA"
    }
}
Payload Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| wakeword | Identifies the current wake word. Accepted value: "ALEXA" | string |
Recognize Event
The Recognize event is used to send user speech to AVS and translate that speech into one or
more directives. This event must be sent as a multipart message, consisting of two parts:
A JSON-formatted object
The binary audio captured by the product's microphone.
Captured audio that is streamed to AVS should be chunked to reduce latency. The stream
should contain 10ms of captured audio per chunk (320 bytes).
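The 320-byte chunk size follows directly from the 16 kHz, 16-bit, single-channel PCM specification given later in this section; a quick sketch of the arithmetic:

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz sample rate
BYTES_PER_SAMPLE = 2      # 16-bit linear PCM
CHANNELS = 1              # single channel
CHUNK_MS = 10             # 10 ms of captured audio per chunk

# Bytes of PCM audio in one 10 ms chunk.
chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * CHUNK_MS // 1000
print(chunk_bytes)  # 320
```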
After an interaction with Alexa is initiated, the microphone must remain open until:
A StopCapture directive is received.
The stream is closed by the Alexa service.
The user manually closes the microphone. For example, a press and hold implementation
(/docs/alexa-voice-service/audio-hardware-configurations.html#applications).
The profile parameter and initiator object tell Alexa which ASR profile should be used to
best understand the captured audio, and how the interaction was initiated.
All captured audio must be sent to AVS in either PCM or Opus, and adhere to the following
specifications:
| PCM | Opus |
| --- | --- |
| 16bit Linear PCM | 16bit Opus |
| 16kHz sample rate | 16kHz sample rate |
| Single channel | 32k bit rate |
| Little endian byte order | Little endian byte order |
Important: If your product is voice-initiated, it must adhere to the Requirements for
Cloud-Based Wake Word Verification (/docs/alexa-voice-service/streaming-requirements-for-cloud-based-wake-word-verification.html).
For a protocol-specific example, see Structuring an HTTP/2 Request (/docs/alexa-voice-service/structure-http2-request.html#examples).
Sample Message
{
    "context": [
        // This is an array of context objects that are used to communicate the
        // state of all client components to Alexa. See Context for details.
    ],
    "event": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "Recognize",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "profile": "{{STRING}}",
            "format": "{{STRING}}",
            "initiator": {
                "type": "{{STRING}}",
                "payload": {
                    "wakeWordIndices": {
                        "startIndexInSamples": {{LONG}},
                        "endIndexInSamples": {{LONG}}
                    },
                    "token": "{{STRING}}"
                }
            }
        }
    }
}
Binary Audio Attachment
Each Recognize event requires a corresponding binary audio attachment as one part of the
multipart message. The following headers are required for each binary audio attachment:
Content-Disposition: form-data; name="audio"
Content-Type: application/octet-stream
{{BINARY AUDIO ATTACHMENT}}
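Framing the audio attachment part can be sketched as follows; the boundary string and the silence payload are placeholder assumptions for illustration, not values mandated by AVS:

```python
BOUNDARY = "this-is-a-boundary"  # hypothetical multipart boundary

def audio_part(audio: bytes) -> bytes:
    """Frame the binary audio attachment with the headers required above."""
    headers = (
        f"--{BOUNDARY}\r\n"
        'Content-Disposition: form-data; name="audio"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    return headers + audio + b"\r\n"

# One 10 ms chunk of PCM silence (320 bytes) as the attachment body.
part = audio_part(b"\x00" * 320)
```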
Context
This event requires your product to report the status of all client component states to Alexa
in the context object. For additional information, see Context (/docs/alexa-voice-service/context.html).
Header Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| messageId | A unique ID used to represent a specific message. | string |
| dialogRequestId | A unique identifier that your client must create for each Recognize event sent to Alexa. This parameter is used to correlate directives sent in response to a specific Recognize event. | string |
Payload Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| profile | Identifies the Automatic Speech Recognition (ASR) profile associated with your product. AVS supports three distinct ASR profiles optimized for user speech from varying distances. Accepted values: CLOSE_TALK, NEAR_FIELD, FAR_FIELD. | string |
| format | Identifies the format of captured audio. Accepted values: AUDIO_L16_RATE_16000_CHANNELS_1 (PCM), OPUS. | string |
| initiator | Lets Alexa know how an interaction was initiated. This object is required when an interaction is originated by the end user (wake word, tap, push and hold). If initiator is present in an ExpectSpeech directive, then it must be returned in the following Recognize event. If initiator is absent from the ExpectSpeech directive, then it should not be included in the following Recognize event. | object |
| initiator.type | Represents the action taken by a user to initiate an interaction with Alexa. Accepted values: PRESS_AND_HOLD, TAP, and WAKEWORD. If an initiator.type is provided in an ExpectSpeech directive, that string must be returned as initiator.type in the following Recognize event. | string |
| initiator.payload | Includes information about the initiator. | object |
| initiator.payload.wakeWordIndices | This object is required when initiator.type is set to WAKEWORD. wakeWordIndices includes the startIndexInSamples and endIndexInSamples. For additional details, see Requirements for Cloud-Based Wake Word Verification (/docs/alexa-voice-service/streaming-requirements-for-cloud-based-wake-word-verification.html). | object |
| initiator.payload.wakeWordIndices.startIndexInSamples | Represents the index in the audio stream where the wake word starts (in samples). The start index should be accurate to within 50ms of wake word detection. | long |
| initiator.payload.wakeWordIndices.endIndexInSamples | Represents the index in the audio stream where the wake word ends (in samples). The end index should be accurate to within 150ms of the end of the detected wake word. | long |
| initiator.payload.token | An opaque string. This value is only required if present in the payload of a preceding ExpectSpeech (/docs/alexa-voice-service/speechrecognizer.html#expectspeech) directive. | string |
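Because captured audio is sampled at 16 kHz, one millisecond of the stream spans 16 samples. The sketch below converts hypothetical wake word detector timestamps into the sample indices described above; the 500 ms and 1250 ms offsets are illustrative values, not AVS requirements:

```python
SAMPLES_PER_MS = 16_000 // 1000  # 16 samples per millisecond at 16 kHz

def ms_to_samples(ms: int) -> int:
    """Convert a millisecond offset in the audio stream to a sample index."""
    return ms * SAMPLES_PER_MS

# Hypothetical detector output: wake word heard from 500 ms to 1250 ms.
wake_word_indices = {
    "startIndexInSamples": ms_to_samples(500),   # 8000
    "endIndexInSamples": ms_to_samples(1250),    # 20000
}
```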
Profiles
ASR profiles are tuned for different products, form factors, acoustic environments and use
cases. Use the table below to learn more about accepted values for the profile parameter.
| Value | Optimal Listening Distance |
| --- | --- |
| CLOSE_TALK | 0 to 2.5 ft. |
| NEAR_FIELD | 0 to 5 ft. |
| FAR_FIELD | 0 to 20+ ft. |
Note: See Audio Hardware Configurations (/docs/alexa-voice-service/audio-hardware-configurations.html) to determine the appropriate ASR Profile for your Alexa-enabled
product.
Initiator
The initiator parameter tells AVS how an interaction with Alexa was triggered, and
determines two things:
1. If StopCapture will be sent to your client when the end of speech is detected in the cloud.
2. If cloud-based wake word verification will be performed on the stream.
initiator must be included in the payload of each SpeechRecognizer.Recognize event. The
following values are accepted:
| Value | Description | Supported Profile(s) | StopCapture Enabled | Wake Word Verification Enabled | Wake Word Indices Required |
| --- | --- | --- | --- | --- | --- |
| PRESS_AND_HOLD | Audio stream initiated by pressing a button (physical or GUI) and terminated by releasing it. | CLOSE_TALK | N | N | N |
| TAP | Audio stream initiated by the tap and release of a button (physical or GUI) and terminated when a StopCapture directive is received. | NEAR_FIELD, FAR_FIELD | Y | N | N |
| WAKEWORD | Audio stream initiated by the use of a wake word and terminated when a StopCapture directive is received. | NEAR_FIELD, FAR_FIELD | Y | Y | Y |
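The rules in the preceding table can be encoded as a simple lookup. This is an illustrative sketch of a client-side sanity check, not an official AVS client API:

```python
# Which profiles each initiator supports and whether wake word
# indices are required, per the initiator table above.
INITIATOR_RULES = {
    "PRESS_AND_HOLD": {"profiles": {"CLOSE_TALK"}, "indices_required": False},
    "TAP": {"profiles": {"NEAR_FIELD", "FAR_FIELD"}, "indices_required": False},
    "WAKEWORD": {"profiles": {"NEAR_FIELD", "FAR_FIELD"}, "indices_required": True},
}

def validate_recognize(profile: str, initiator_type: str, has_indices: bool) -> bool:
    """Check a profile/initiator pairing against the table above."""
    rules = INITIATOR_RULES[initiator_type]
    if profile not in rules["profiles"]:
        return False
    if rules["indices_required"] and not has_indices:
        return False
    return True
```

For example, a WAKEWORD interaction with the CLOSE_TALK profile fails validation, as does a WAKEWORD interaction that omits wake word indices.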
StopCapture Directive
This directive instructs your client to stop capturing a user’s speech after AVS has identified
the user’s intent or when end of speech is detected. When this directive is received, your
client must immediately close the microphone and stop listening for the user’s speech.
Note: StopCapture is sent to your client on the downchannel stream and may be
received while speech is still being streamed to AVS. To receive the StopCapture directive,
you must use a profile in your Recognize event that supports cloud-endpointing, such
as NEAR_FIELD or FAR_FIELD .
Sample Message
{
    "directive": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "StopCapture",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {}
    }
}
Header Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| messageId | A unique ID used to represent a specific message. | string |
| dialogRequestId | A unique ID used to correlate directives sent in response to a specific Recognize event. | string |
ExpectSpeech Directive
ExpectSpeech is sent when Alexa requires additional information to fulfill a user's request. It
instructs your client to open the microphone and begin streaming user speech. If the
microphone is not opened within the specified timeout window, an ExpectSpeechTimedOut
event must be sent from your client to AVS.
During a multi-turn interaction with Alexa, your device will receive at least one ExpectSpeech
directive instructing your client to start listening for user speech. If present, the initiator
object included in the payload of the ExpectSpeech directive must be passed back to Alexa as
the initiator object in the following Recognize event. If initiator is absent from the
payload, the following Recognize event should not include initiator .
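The echo rule above can be sketched as follows. The profile and format values chosen here are illustrative assumptions; only the initiator handling reflects the documented requirement:

```python
def build_recognize_payload(expect_speech_payload: dict) -> dict:
    """Build the next Recognize payload, echoing initiator only if present."""
    payload = {
        "profile": "FAR_FIELD",                       # illustrative choice
        "format": "AUDIO_L16_RATE_16000_CHANNELS_1",  # illustrative choice
    }
    # If initiator was present in the ExpectSpeech payload, pass it back
    # unchanged; if absent, the Recognize event must not include one.
    if "initiator" in expect_speech_payload:
        payload["initiator"] = expect_speech_payload["initiator"]
    return payload
```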
For information on the rules that govern audio prioritization, please review the Interaction
Model (/docs/alexa-voice-service/interaction-model.html).
Sample Message
{
    "directive": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "ExpectSpeech",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "timeoutInMilliseconds": {{LONG}},
            "initiator": {
                "type": "{{STRING}}",
                "payload": {
                    "token": "{{STRING}}"
                }
            }
        }
    }
}
Header Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| messageId | A unique ID used to represent a specific message. | string |
| dialogRequestId | A unique ID used to correlate directives sent in response to a specific Recognize event. | string |

Payload Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| timeoutInMilliseconds | Specifies, in milliseconds, how long your client should wait for the microphone to open and begin streaming user speech to AVS. If the microphone is not opened within the specified timeout window, then the ExpectSpeechTimedOut event must be sent. The primary use case for this behavior is a PRESS_AND_HOLD implementation. | long |
| initiator | Contains information about the interaction. If present, it must be sent back to Alexa in the following Recognize event. | object |
| initiator.type | An opaque string. If present, it must be sent back to Alexa in the following Recognize event. | string |
| initiator.payload | Includes information about the initiator. | object |
| initiator.payload.token | An opaque string. If present, it must be sent back to Alexa in the following Recognize event. | string |
ExpectSpeechTimedOut Event
This event must be sent to AVS if an ExpectSpeech directive was received, but was not
satisfied within the specified timeout window.
Sample Message
{
    "event": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "ExpectSpeechTimedOut",
            "messageId": "{{STRING}}"
        },
        "payload": {}
    }
}
Header Parameters

| Parameter | Description | Type |
| --- | --- | --- |
| messageId | A unique ID used to represent a specific message. | string |
Payload Parameters
An empty payload should be sent.
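A minimal sketch of the client-side timeout decision, assuming the client tracks elapsed milliseconds since the ExpectSpeech directive arrived. The helper name is hypothetical; the event framing matches the sample message above, with a freshly generated messageId:

```python
import uuid
from typing import Optional

def check_expect_speech_timeout(elapsed_ms: int, timeout_ms: int) -> Optional[dict]:
    """Return an ExpectSpeechTimedOut event if the microphone did not open in time.

    Returns None while the timeout window is still open.
    """
    if elapsed_ms < timeout_ms:
        return None
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "ExpectSpeechTimedOut",
                "messageId": str(uuid.uuid4()),
            },
            "payload": {},  # an empty payload, per the note above
        }
    }
```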