
Techniques for interpreting spoken input using non-verbal cues

Title:
Techniques for interpreting spoken input using non-verbal cues
Author / Contributor: DISNEY ENTERPRISES, INC.
Published: 2024
Media type: Patent
Other details:
  • Indexed in: USPTO Patent Grants
  • Languages: English
  • Patent Number: 11,887,600
  • Publication Date: January 30, 2024
  • Appl. No.: 16/593,938
  • Application Filed: October 04, 2019
  • Assignees: DISNEY ENTERPRISES, INC. (Burbank, CA, US)
  • Claim: 1. A computer-implemented method for interpreting spoken user input, the method comprising: determining a first time marker associated with a first prediction that is generated based on a first type of non-verbal cue and a second time marker associated with a second prediction that is generated based on a second type of non-verbal cue; determining that the first time marker and the second time marker fall within a relevance time window associated with a base time marker for a first text input that has been derived from a first spoken input received from a user; upon determining that the first time marker and the second time marker fall within the relevance time window, generating a first predicted context based on a function of the first text input, the first prediction, the second prediction, a first weight indicating a relative contribution of the first text input to the first predicted context, a second weight indicating a relative contribution of the first prediction to the first predicted context, and a third weight indicating a relative contribution of the second prediction to the first predicted context; and transmitting the first text input and the first predicted context to at least one software application that subsequently performs one or more additional actions based on the first text input and the first predicted context.
  • Claim: 2. The computer-implemented method of claim 1, wherein the first prediction is based on one or more user actions.
  • Claim: 3. The computer-implemented method of claim 1, wherein the one or more additional actions performed by the at least one software application comprise generating a first text output based on the first text input and the first predicted context.
  • Claim: 4. The computer-implemented method of claim 1, wherein generating the first predicted context comprises inputting the first prediction and the second prediction into a trained machine learning model that, in response, outputs a composite prediction that is included in the first predicted context.
  • Claim: 5. The computer-implemented method of claim 1, wherein generating the first predicted context comprises applying one or more rules to the first prediction and the second prediction to compute a composite prediction that is included in the first predicted context.
  • Claim: 6. The computer-implemented method of claim 1, wherein the first predicted context relates to at least one of an intention, an emotion, a personality trait, a user identification, a level of attentiveness, or a user action.
  • Claim: 7. The computer-implemented method of claim 1, wherein generating the first predicted context comprises computing a composite prediction based on the first prediction and the second prediction, wherein the second prediction predicts at least one of a personality trait or a user identification.
  • Claim: 8. The computer-implemented method of claim 1, further comprising: determining that a third prediction is relevant to a second text input that has been derived from a second spoken input received from the user; generating a second predicted context based on the third prediction and the first predicted context; and transmitting the second text input and the second predicted context to the at least one software application.
  • Claim: 9. The computer-implemented method of claim 1, wherein determining that the first time marker and the second time marker fall within the relevance time window comprises determining the relevance time window based on a prediction type associated with the first prediction.
  • Claim: 10. The computer-implemented method of claim 1, wherein the first prediction is generated by inputting at least one of an audible input associated with the user or a visual input associated with the user into a trained machine-learning model.
  • Claim: 11. The computer-implemented method of claim 1, further comprising generating a fused output that includes the first text input, the base time marker, and the first predicted context, wherein transmitting the first text input and the first predicted context comprises transmitting the fused output to the at least one software application.
  • Claim: 12. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to interpret spoken user input by performing the steps of: determining a first time marker associated with a first prediction that is generated based on a first type of non-verbal cue and a second time marker associated with a second prediction that is generated based on a second type of non-verbal cue; determining that the first time marker and the second time marker fall within a relevance time window associated with a base time marker for a first text input that has been derived from a first spoken input received from a user; upon determining that the first time marker and the second time marker fall within the relevance time window, generating a first predicted context based on a function of the first text input, the first prediction, the second prediction, a first weight indicating a relative contribution of the first text input to the first predicted context, a second weight indicating a relative contribution of the first prediction to the first predicted context, and a third weight indicating a relative contribution of the second prediction to the first predicted context; and transmitting the first text input and the first predicted context to at least one software application that subsequently performs one or more additional actions based on the first text input and the first predicted context.
  • Claim: 13. The one or more non-transitory computer readable media of claim 12, wherein the first type of non-verbal cue comprises at least one of a non-verbal sound, a gesture, or a facial expression.
  • Claim: 14. The one or more non-transitory computer readable media of claim 12, wherein the one or more additional actions performed by the at least one software application comprise generating a first text output based on the first text input and the first predicted context.
  • Claim: 15. The one or more non-transitory computer readable media of claim 12, wherein generating the first predicted context comprises inputting the first prediction and the second prediction into a trained machine learning model that, in response, outputs a composite prediction that is included in the first predicted context.
  • Claim: 16. The one or more non-transitory computer readable media of claim 12, wherein the first prediction predicts at least one of an intention, an emotion, a personality trait, a user identification, a level of attentiveness, or a user action.
  • Claim: 17. The one or more non-transitory computer readable media of claim 12, wherein generating the first predicted context comprises computing a composite prediction based on the first prediction and the second prediction, wherein the second prediction predicts at least one of a personality trait or a user identification.
  • Claim: 18. The one or more non-transitory computer readable media of claim 17, wherein the second prediction is generated by inputting at least one of an audible input associated with the user or a visual input associated with the user into a trained machine-learning model.
  • Claim: 19. The one or more non-transitory computer readable media of claim 12, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to perform the steps of: determining that a third prediction is relevant to a second text input that has been derived from a second spoken input received from the user; generating a second predicted context based on the third prediction and the first predicted context; and transmitting the second text input and the second predicted context to the at least one software application.
  • Claim: 20. The one or more non-transitory computer readable media of claim 12, wherein determining that the first time marker and the second time marker fall within the relevance time window comprises determining the relevance time window based on a prediction type associated with the first prediction.
  • Claim: 21. A system, comprising: one or more memories storing instructions; and one or more processors that are coupled to the one or more memories and, when executing the instructions, are configured to: determine a first time marker associated with a first prediction that is generated based on a first type of non-verbal cue and a second time marker associated with a second prediction that is generated based on a second type of non-verbal cue; determine that the first time marker and the second time marker fall within a relevance time window associated with a base time marker for a first text input that has been derived from a first spoken input received from a user; upon determining that the first time marker and the second time marker fall within the relevance time window, generate a first predicted context based on a function of the first text input, the first prediction, the second prediction, a first weight indicating a relative contribution of the first text input to the first predicted context, a second weight indicating a relative contribution of the first prediction to the first predicted context, and a third weight indicating a relative contribution of the second prediction to the first predicted context; and transmit the first text input and the first predicted context to at least one software application that subsequently performs one or more additional actions based on the first text input and the first predicted context.
  • Patent References Cited: 10831442 November 2020 Fox ; 20100280983 November 2010 Cho ; 20100306796 December 2010 McDonald ; 20170140041 May 2017 Dotan-Cohen ; 20170160813 June 2017 Divakaran ; 20180232571 August 2018 Bathiche ; 20190289372 September 2019 Merler ; 20200380389 December 2020 Eldeeb
  • Primary Examiner: Armstrong, Angela A
  • Attorney, Agent or Firm: Artegis Law Group, LLP
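
Note: for orientation, the following is a minimal, hypothetical Python sketch of the mechanism recited in claim 1 above: check whether the time markers of two non-verbal-cue predictions fall within a relevance time window around the base time marker of a transcribed utterance and, if they do, combine the text input and the predictions under per-source weights into a predicted context that is handed to a downstream application. All names, weights, and the window size are illustrative assumptions, not taken from the patent; claims 4 and 5 would replace the simple weighted scoring below with a trained machine-learning model or a rule set, respectively.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch only; all identifiers and numeric choices are assumptions,
# not details disclosed in the patent record above.

@dataclass
class Prediction:
    cue_type: str       # e.g. "gesture" or "facial_expression"
    label: str          # e.g. "pointing", "smiling"
    confidence: float   # model confidence in [0, 1]
    time_marker: float  # seconds since the start of the session


def in_relevance_window(pred: Prediction, base_time: float, window: float) -> bool:
    """True if the prediction's time marker falls within the relevance time
    window centred on the text input's base time marker."""
    return abs(pred.time_marker - base_time) <= window


def predicted_context(text: str,
                      pred_a: Prediction, pred_b: Prediction,
                      w_text: float, w_a: float, w_b: float) -> dict:
    """Combine the text input and two non-verbal predictions using per-source
    weights (a simple weighted-scoring stand-in for the claimed 'function of
    the first text input, the first prediction, the second prediction, ...')."""
    return {
        "text": text,
        "text_weight": w_text,
        "cues": {
            pred_a.cue_type: {"label": pred_a.label, "score": w_a * pred_a.confidence},
            pred_b.cue_type: {"label": pred_b.label, "score": w_b * pred_b.confidence},
        },
    }


def fuse_and_dispatch(text: str, base_time: float,
                      pred_a: Prediction, pred_b: Prediction,
                      app: Callable[[str, dict], None],
                      window: float = 1.5,
                      w_text: float = 0.5, w_a: float = 0.3, w_b: float = 0.2) -> None:
    # Only fuse predictions whose time markers lie inside the relevance window.
    if in_relevance_window(pred_a, base_time, window) and \
       in_relevance_window(pred_b, base_time, window):
        context = predicted_context(text, pred_a, pred_b, w_text, w_a, w_b)
    else:
        context = {"text": text, "text_weight": 1.0, "cues": {}}
    # Hand the text input and the predicted context to the downstream application.
    app(text, context)


if __name__ == "__main__":
    gesture = Prediction("gesture", "pointing", 0.9, time_marker=12.3)
    face = Prediction("facial_expression", "smiling", 0.8, time_marker=12.6)
    fuse_and_dispatch("can I have that one", base_time=12.4,
                      pred_a=gesture, pred_b=face,
                      app=lambda t, c: print(t, c))
```

In the same spirit, the "fused output" of claim 11 would simply bundle the text, its base time marker, and the predicted context into one object before transmission; the sketch keeps these as separate arguments only for brevity.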
