Miscellaneous: |
- Indexed in: USPTO Patent Grants
- Languages: English
- Patent Number: 11,848,001
- Publication Date: December 19, 2023
- Appl. No: 17/848,028
- Application Filed: June 23, 2022
- Assignees: Intel Corporation (Santa Clara, CA, US)
- Claim: 1. A system comprising: at least one memory; machine readable instructions; and processor circuitry to at least one of instantiate or execute the machine readable instructions to: generate a breathing cue to enhance speech to be synthesized from text; determine a first insertion point of the breathing cue in the text, wherein the breathing cue is identified by a first tag of a markup language; generate a prosody cue to enhance speech to be synthesized from the text; determine a second insertion point of the prosody cue in the text, wherein the prosody cue is identified by a second tag of the markup language; insert the breathing cue at the first insertion point based on the first tag and the prosody cue at the second insertion point based on the second tag; and trigger a synthesis of the speech from the text, the breathing cue, and the prosody cue.
- Claim: 2. The system of claim 1, wherein the breathing cue is to fill a pause.
- Claim: 3. The system of claim 1, wherein the prosody cue is to fill a pause.
- Claim: 4. The system of claim 1, wherein the processor circuitry is to: generate a phrasal stress cue to enhance the speech to be synthesized from the text; and determine a third insertion point of the phrasal stress cue in the text.
- Claim: 5. The system of claim 4, wherein the processor circuitry is to trigger the synthesis of the speech from the phrasal stress cue.
- Claim: 6. The system of claim 1, wherein the processor circuitry is to: generate an intonation cue to enhance the speech to be synthesized from the text; and determine a third insertion point of the intonation cue in the text.
- Claim: 7. The system of claim 6, wherein the processor circuitry is to trigger the synthesis of the speech from the intonation cue.
- Claim: 8. The system of claim 1, wherein the processor circuitry is to: generate a disfluency cue to enhance the speech to be synthesized from the text; and determine a third insertion point of the disfluency cue in the text.
- Claim: 9. The system of claim 8, wherein the processor circuitry is to trigger the synthesis of the speech from the disfluency cue.
- Claim: 10. The system of claim 1, wherein the processor circuitry is to: identify an intent of a user; and identify the text based on the intent.
- Claim: 11. The system of claim 1, wherein the processor circuitry is to: identify an intent of a user; and execute a command based on the intent.
- Claim: 12. At least one storage device comprising computer readable instructions that, when executed, cause at least one machine to at least: generate a breathing cue to enhance speech to be synthesized from text, the breathing cue identified by a first tag of a markup language; generate a prosody cue to enhance speech to be synthesized from the text, the prosody cue identified by a second tag of the markup language; insert the breathing cue at a first insertion point in the text based on the first tag; insert the prosody cue at a second insertion point in the text based on the second tag; and transmit data to cause a device to synthesize the speech from the text, the breathing cue, and the prosody cue.
- Claim: 13. The at least one storage device of claim 12, wherein the breathing cue is to fill a pause.
- Claim: 14. The at least one storage device of claim 12, wherein the prosody cue is to fill a pause.
- Claim: 15. The at least one storage device of claim 12, wherein the instructions cause the at least one machine to: generate a phrasal stress cue to enhance the speech to be synthesized from the text; and insert the phrasal stress cue at a third insertion point in the text.
- Claim: 16. The at least one storage device of claim 12, wherein the instructions cause the at least one machine to: generate an intonation cue to enhance the speech to be synthesized from the text; and insert the intonation cue at a third insertion point in the text.
- Claim: 17. The at least one storage device of claim 12, wherein the instructions cause the at least one machine to: generate a disfluency cue to enhance the speech to be synthesized from the text; and insert the disfluency cue at a third insertion point in the text.
- Claim: 18. The at least one storage device of claim 12, wherein the instructions cause the at least one machine to: identify an intent of a user; and identify the text based on the intent.
- Claim: 19. The at least one storage device of claim 12, wherein the instructions cause the at least one machine to: identify an intent of a user; and execute a command based on the intent.
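The claims above describe generating cues identified by tags of a markup language, inserting each cue at an insertion point in the text, and then triggering synthesis from the tagged text. A minimal sketch of that flow, under stated assumptions: the `<prosody>` element is standard W3C SSML, while the `<breath/>` tag, the helper function, and the insertion-point choices are hypothetical illustrations, not taken from the patent.

```python
# Sketch only (not the patented implementation): insert markup-language cue
# tags into text before speech synthesis, in the spirit of claims 1 and 12.
# <prosody> follows W3C SSML; <breath/> is a hypothetical breathing-cue tag.

def insert_cue(text: str, position: int, tag: str) -> str:
    """Insert a cue tag at a character offset (an 'insertion point')."""
    return text[:position] + tag + text[position:]

text = "Well that was unexpected. Let me think about it."

# First insertion point: a breathing cue right after the first sentence.
breath_point = text.index(".") + 1
text = insert_cue(text, breath_point, " <breath/>")

# Second insertion point: a prosody cue wrapping the final clause.
prosody_point = text.index("Let")
text = insert_cue(text, prosody_point, '<prosody rate="slow">')
text += "</prosody>"

# The tagged text would then be handed to a synthesizer to trigger synthesis.
ssml = f"<speak>{text}</speak>"
print(ssml)
```

Running this prints the tagged text with both cues in place; a real system would pass the result to a speech synthesizer rather than print it.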
- Patent References Cited: 6226614 May 2001 Mizuno et al. ; 6236966 May 2001 Fleming ; 6282599 August 2001 Gallick et al. ; 7617188 November 2009 Hu et al. ; 7685140 March 2010 Jackson ; 7689617 March 2010 Parikh ; 8935151 January 2015 Petrov et al. ; 8972259 March 2015 Tepperman et al. ; 9223537 December 2015 Brown et al. ; 9223547 December 2015 Endresen et al. ; 9305544 April 2016 Petrov et al. ; 9524650 December 2016 Yavari ; 9542929 January 2017 Christian ; 9721573 August 2017 Fritsch ; 9767788 September 2017 Li ; 10026393 July 2018 Christian ; 10445668 October 2019 Oehrle ; 10679606 June 2020 Christian ; 11398217 July 2022 Christian ; 11404043 August 2022 Christian ; 20060217966 September 2006 Hu et al. ; 20070094030 April 2007 Xu ; 20120065977 March 2012 Tepperman et al. ; 20120084248 April 2012 Gavrilescu ; 20130006952 January 2013 Wong et al. ; 20130282688 October 2013 Wong et al. ; 20130289998 October 2013 Eller et al. ; 20150371626 December 2015 Li ; 20170256252 September 2017 Christian et al. ; 20180227417 August 2018 Segalis et al. ; 20190115007 April 2019 Christian et al. ; 20200243064 July 2020 Christian et al. ; 20200243065 July 2020 Christian et al. ; 1208910 February 1999 ; 1602483 March 2005 ; 1604183 April 2005 ; 1945693 April 2007 ; 101000764 July 2007 ; 101000765 July 2007 ; 101504643 August 2009 ; 102368256 March 2012 ; 103366731 October 2013 ; 103620605 March 2014 ; 104021784 September 2014 ; 1363200 November 2003
- Other References: Allen, “Linguistic Aspects of Speech Synthesis,” Proceedings of the National Academy of Sciences, vol. 92, Colloquium Paper, Oct. 1995, pp. 9946-9952. cited by applicant ; Sproat, “Multilingual Text Analysis for Text-to-Speech Synthesis,” IEEE, vol. 3, Oct. 3, 1996, pp. 1365-1368. cited by applicant ; United States Patent and Trademark Office, “Non-final Office Action,” issued in connection with U.S. Appl. No. 14/497,994, dated May 5, 2016, 5 pages. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowance,” issued in connection with U.S. Appl. No. 14/497,994, dated Sep. 19, 2016, 7 pages. cited by applicant ; Arnold et al., “Disfluencies Signal Theee, Um, New Information,” Journal of Psycholinguistic Research, vol. 32, No. 1, Jan. 2003, pp. 25-36. cited by applicant ; International Searching Authority, “International Search Report and Written Opinion,” issued in connection with International Patent Application No. PCT/US2015/047534, dated Oct. 30, 2015, 11 pages. cited by applicant ; Shriver et al., “Audio Signals in Speech Interfaces,” Language Technologies Institute, Carnegie Mellon University, 2000, 7 pages. cited by applicant ; Tang et al., “Humanoid Audio-Visual Avatar with Emotive Text-to-Speech Synthesis,” IEEE Transactions on Multimedia, vol. 10, No. 6, Oct. 6, 2008, pp. 969-981. cited by applicant ; United States Patent and Trademark Office, “Non-final Office Action,” issued in connection with U.S. Appl. No. 15/384,148, dated Oct. 18, 2017, 7 pages. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowance,” issued in connection with U.S. Appl. No. 15/384,148, dated Mar. 27, 2018, 5 pages. cited by applicant ; European Patent Office, “Extended European Search Report,” issued in connection with European Patent application No. 15844926.4, dated Apr. 30, 2018, 7 pages. 
cited by applicant ; United States Patent and Trademark Office, “Non-Final Office action,” issued in connection with U.S. Appl. No. 16/037,872, dated Aug. 12, 2019, 13 pages. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowance,” issued in connection with U.S. Appl. No. 16/037,872, dated Feb. 3, 2020, 12 pages. cited by applicant ; Patent Cooperation Treaty, “International Preliminary Report on Patentability,” dated Mar. 28, 2017, issued in International Application No. PCT/US2015/047534, 7 pages. cited by applicant ; United States Patent and Trademark Office, “Non-Final Office Action,” dated Nov. 3, 2021, issued in related U.S. Appl. No. 16/851,457, 12 pages. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowance,” dated Mar. 3, 2022, issued in related U.S. Appl. No. 16/851,457, 8 pages. cited by applicant ; Nick Campbell, “Specifying Affect and Emotion for Expressive Speech Synthesis,” Lecture Notes in Computer Science, 2004, pp. 395-406, vol. 2945, ATR Human Information Science Laboratories, Kyoto, Japan. cited by applicant ; European Patent Office, “Examination Report,” issued in connection with Patent Application No. 15 844 926.4-1231, dated Apr. 5, 2021, 9 pages. cited by applicant ; Jonathan Allen, “Linguistic aspects of speech synthesis,” Human-Machine Communication by Voice, Feb. 8-9, 1993, 7 pages, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, United States of America. cited by applicant ; Shiva Sundaram and Shrikanth Narayanan, “An Empirical Text Transformation Method for Spontaneous Speech Synthesizers,” Eurospeech, 2003, 4 pages, Department of Electrical Engineering-Systems and Integrated Media Systems Center, University of Southern California, Los Angeles, CA, United States of America. cited by applicant ; Chinese Patent Office, “Notification to Grant Patent Right for Invention,” issued in connection with Chinese patent application No. 
201580045620.X, dated Dec. 25, 2020, 7 pages. cited by applicant ; Wata et al., “Speaker's intentions conveyed to listeners by sentence-final particles and their intonations in Japanese conversational speech,” retrieved from https://waseda.pure.elsevier.com/en/publications/speakers-intentions-conveyed-to-listeners-by-sentence-final-parti, on Feb. 24, 2021, 4 pages. Abstract only. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowance,” issued in connection with U.S. Appl. No. 16/851,457, dated Jun. 2, 2022, 5 pages. cited by applicant ; Chinese Patent Office, “Second Office Action and Search Report,” issued in connection with Chinese patent application No. 201580045620.X, dated Sep. 10, 2020, 8 pages. cited by applicant ; Chinese Patent Office, “First Office Action,” issued in connection with Chinese patent application No. 201580045620.X, dated Mar. 26, 2020, 39 pages. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowability,” issued in connection with U.S. Appl. No. 16/851,444, dated Jun. 10, 2022, 6 pages. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowance,” issued in connection with U.S. Appl. No. 16/851,444, dated May 18, 2022, 5 pages. cited by applicant ; United States Patent and Trademark Office, “Notice of Allowance,” issued in connection with U.S. Appl. No. 16/851,444, dated Mar. 2, 2022, 8 pages. cited by applicant ; United States Patent and Trademark Office, “Non-Final Office action,” issued in connection with U.S. Appl. No. 16/851,444, dated Mar. 2, 2022, 15 pages. cited by applicant ; United States Patent and Trademark Office, “Restriction Election,” issued in connection with U.S. Appl. No. 16/851,444, dated Jul. 21, 2022, 6 pages. cited by applicant ; United States Patent and Trademark Office, “Restriction Election,” issued in connection with U.S. Appl. No. 16/851,457, dated Jul. 30, 2022, 6 pages. 
cited by applicant ; European Patent Office, “Communication pursuant to Article 94(3) EPC,” dated Apr. 5, 2021, issued in connection with European application No. 15844926.4, 8 pages. cited by applicant
- Primary Examiner: McFadden, Susan I
- Attorney, Agent or Firm: HANLEY, FLIGHT & ZIMMERMAN, LLC