{"id":17290,"date":"2018-03-28T12:00:48","date_gmt":"2018-03-28T19:00:48","guid":{"rendered":"https:\/\/www.kith.org\/words\/?p=17290"},"modified":"2018-03-28T12:17:33","modified_gmt":"2018-03-28T19:17:33","slug":"ai-aided-text-to-speech","status":"publish","type":"post","link":"https:\/\/www.kith.org\/words\/2018\/03\/28\/ai-aided-text-to-speech\/","title":{"rendered":"AI-aided text-to-speech"},"content":{"rendered":"\r\n<p>I\u2019ve written here in the past about speech recognition (<a href=\"https:\/\/www.kith.org\/words\/1998\/07\/26\/ddragon\/\">column DD<\/a> and <a href=\"https:\/\/www.kith.org\/words\/2009\/07\/15\/wreck-an-icings-beach-again\/\">brief notes on Google Voice<\/a>), but I haven\u2019t written much about speech synthesis, except for a post about <a href=\"https:\/\/www.kith.org\/words\/2012\/11\/07\/daisy-50-years-of-speech-synth\/\">song synthesis<\/a> and an aside in <a href=\"https:\/\/www.kith.org\/words\/1999\/04\/25\/iiintonation\/\">column iii<\/a>.<\/p>\r\n<p>So I\u2019m pleased to note that Google has made some remarkable improvements in text-to-speech lately.<\/p>\r\n<p>For example, as I posted elsewhere at the time, a couple of months ago Google provided <a href=\"https:\/\/google.github.io\/tacotron\/publications\/tacotron2\/index.html\">audio samples<\/a> from a paper titled \u201cNatural TTS [Text To Speech] Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.\u201d I recommend going to that page and, without listening to any of the earlier samples on it, scrolling down to the \u201cTacotron 2 or Human?\u201d section. 
Listen to the four pairs of recordings, and see if you can tell which recording in each pair is machine-generated and which is human.<\/p>\r\n<p>(Google apparently hasn\u2019t said what the answers are, but an <a href=\"https:\/\/www.inc.com\/minda-zetlin\/googles-new-text-to-speech-ai-is-so-good-we-bet-you-cant-tell-it-from-a-real-human.html\">article at Inc<\/a> provides a likely-sounding meta-criterion.)<\/p>\r\n<p>After you\u2019ve listened to the computer-or-human samples, it\u2019s also worth listening to the other samples on the page. Most of those do still sound machine-generated to me, but not nearly as much so as the output of most text-to-speech systems.<\/p>\r\n<p>And this week, Google announced a new <a href=\"https:\/\/www.theverge.com\/2018\/3\/27\/17167200\/google-ai-speech-tts-cloud-deepmind-wavenet\">text-to-speech service<\/a> called Cloud Text-to-Speech that anyone can use. It\u2019s partly powered by machine-learning software called WaveNet, which puts speech sounds together using different techniques than most traditional speech synthesis software has used.<\/p>\r\n<p>The main <a href=\"https:\/\/cloud.google.com\/text-to-speech\/\">documentation page<\/a> lets you enter text and choose one of the many available voices and accents to read the text aloud. 
Unfortunately, only a few of those voices are currently powered by WaveNet (it may only be available for US English, not sure); but I\u2019m hoping there\u2019ll be a wider range of WaveNet voices soon.<\/p>\r\n\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[81,84],"tags":[],"class_list":["post-17290","post","type-post","status-publish","format-standard","hentry","category-software","category-speech-spoken"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/17290","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/comments?post=17290"}],"version-history":[{"count":5,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/17290\/revisions"}],"predecessor-version":[{"id":17296,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/posts\/17290\/revisions\/17296"}],"wp:attachment":[{"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/media?parent=17290"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/categories?post=17290"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kith.org\/words\/wp-json\/wp\/v2\/tags?post=17290"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}