【已解决】调用语言识别接口传递wav录音尝试翻译识别出的英文的效果

<code>https://speech.platform.bing.com/speech/recognition/&lt;RECOGNITION_MODE&gt;/cognitiveservices/v1?language=&lt;LANGUAGE_TAG&gt;&amp;format=&lt;OUTPUT_FORMAT&gt;
</code>

好像是旧的的接口啊

Sample for Speech-to-Text – Microsoft Cognitive Services | Microsoft Docs

幸好有sample：

Azure-Samples/SpeechToText-REST: REST Samples of Speech To Text API

下载看看

结果是：

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/getstarted/getstartedrest?tabs=Powershell

旧的接口：

speech.platform.bing.com

-》“Utterances are limited to 15 seconds or less when using the REST API.”

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/concepts#using-the-speech-recognition-service-from-your-apps

“Convert a short spoken audio, for example, commands (audio length < 15 s) without interim results”

关于语音识别，传入禁区的wav录音，最大时长是：15秒

-》

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/getstarted/getstartedrest?tabs=Powershell

如果想要去除时间限制，可以使用SDK（client library）或Websocket协议

然后大致看懂了：

语音识别的：

新接口：

举例：

https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US

文档：

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech-service/rest-apis
没解释具体使用步骤
只说了句：参考（下面这个）旧的文档

旧接口：

举例：

https://speech.platform.bing.com/speech/recognition/<RECOGNITION_MODE>/cognitiveservices/v1?language=<LANGUAGE_TAG>&format=<OUTPUT_FORMAT>

文档：

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/getstarted/getstartedrest?tabs=Powershell
解释了具体使用步骤
其中包括了：

基本概念

Concepts | Microsoft Docs

支持的语言种类：

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/concepts#recognition-languages

返回响应：

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/concepts#transcription-responses

输出格式：

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/concepts#output-format

脏话处理

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/concepts#profanity-handling-in-speech-recognition

参考代码：

https://github.com/Azure-Samples/SpeechToText-REST

最终，关于API如何使用，还是要去好好看看：

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/getstarted/getstartedrest?tabs=Powershell

现在是：

已经知道了：

URL：

https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US

需要搞清楚：

RECOGNITION_MODE

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/concepts#recognition-modes

Interactive模式

适合人对着机器说话
一般很短，比如2，3秒

Conversation模式

人与人对话
有时候还会说俚语（方言）

Dictation模式

往往说话很长，引用一些别的长的句子
一般比较长，比如5-8秒

此处选择：Interactive模式

OUTPUT_FORMAT

https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech/concepts#output-format
看起来用simple即可

-》

目前情况：

<code>POST https://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?language=en-US&amp;format=simple

Headers:
Ocp-Apim-Subscription-Key: your_key
Content-type:audio/wav; codec=audio/pcm; samplerate=16000
Accept: application/json;
Transfer-Encoding: chunked
</code>

注意：

此处实际上自己的Content-type是：

audio/wav; codec=audio/pcm; samplerate=44100

但是先这么去试试，看看能否使用再说。

所以，此处先要去在服务器端Flask中封装出API接口，供浏览器客户端调用此语言识别接口。

其中，需要先去实现：

【已解决】Flask的REST API添加支持POST时body中分块传输二进制数据

转载请注明：在路上 » 【已解决】调用语言识别接口传递wav录音尝试翻译识别出的英文的效果

【已解决】调用语言识别接口传递wav录音尝试翻译识别出的英文的效果

与本文相关的文章

Hi，您需要填写昵称和邮箱！