[Android Advanced] Notes on Using Text-to-Speech on Android

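This post records how to use TextToSpeech to implement text-to-speech on Android.
Background
I have recently been working on features that integrate large AI models. After getting the result back from the chat API, I noticed that clients such as Doubao and Kimi offer voice playback, and that these big vendors have tuned it to produce pleasant voices with natural rhythm and pauses.
So can an individual developer build TTS (Text To Speech) playback on top of the free speech engine that ships with the system?
It turns out Google already provides an API for this, and I got speech playback working on a Meizu 20 Pro. This post records the process.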
TextToSpeech
TextToSpeech is Android's text-to-speech API. It synthesizes speech from text, and the result can either be played immediately or saved as an audio file.
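The file-output path can be sketched as follows. This is only a minimal sketch: saveSpeechToFile, the file name, and the utterance ID are made up for illustration, and it assumes tts has already finished initializing successfully.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.io.File

// Hypothetical helper: asks the engine to write the speech for `text` into a
// file in the app's cache directory instead of playing it.
fun saveSpeechToFile(context: Context, tts: TextToSpeech, text: String): File? {
    // The name and .wav extension are assumptions; the actual audio format
    // is decided by the engine.
    val outFile = File(context.cacheDir, "tts_output.wav")
    val result = tts.synthesizeToFile(
        text,
        null,            // no extra engine parameters
        outFile,
        "save-to-file"   // utterance ID chosen for this example
    )
    // The call only queues the request. The file is complete once an
    // UtteranceProgressListener reports onDone for this utterance ID.
    return if (result == TextToSpeech.SUCCESS) outFile else null
}
```

Because synthesis is asynchronous, the returned file should not be read until the completion callback for "save-to-file" has arrived.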
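Initializing an instance
The constructor takes two arguments: a Context and an initialization listener, which is called back once initialization completes.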
/**
 * The constructor for the TextToSpeech class, using the default TTS engine.
 * This will also initialize the associated TextToSpeech engine if it isn't already running.
 *
 * @param context
 *            The context this instance is running in.
 * @param listener
 *            The {@link TextToSpeech.OnInitListener} that will be called when the
 *            TextToSpeech engine has initialized. In a case of a failure the listener
 *            may be called immediately, before TextToSpeech instance is fully constructed.
 */
public TextToSpeech(Context context, OnInitListener listener) {
    this(context, listener, null);
}
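From Kotlin, calling this constructor might look like the sketch below. TtsHolder and its members are hypothetical names used only for illustration. Note how the listener avoids touching the instance itself, since the Javadoc above warns that it may fire before construction finishes.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import android.util.Log

// Hypothetical wrapper that owns a single TextToSpeech instance.
class TtsHolder(context: Context) {
    // Declared before `tts` so that a listener call made during construction
    // is not overwritten by this initializer.
    @Volatile
    var ready = false
        private set

    // On failure the listener may run before the constructor returns, so it
    // only records the status and never dereferences `tts`.
    private val tts = TextToSpeech(context.applicationContext) { status ->
        ready = status == TextToSpeech.SUCCESS
        Log.d("TtsHolder", "TTS init status: $status")
    }

    // Releases the engine resources; the instance must not be used afterwards.
    fun release() {
        tts.shutdown()
    }
}
```

Passing the application context rather than an Activity is a design choice in this sketch, made so that a long-lived holder does not keep an Activity alive.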
Playing, stopping, and releasing
Playback is very simple to use: create a TextToSpeech object and call its speak method.
Method signature:
public int speak(final CharSequence text,
                 final int queueMode,
                 final Bundle params,
                 final String utteranceId) {
    return runAction((ITextToSpeechService service) -> {
        Uri utteranceUri = mUtterances.get(text);
        if (utteranceUri != null) {
            return service.playAudio(getCallerIdentity(), utteranceUri, queueMode,
                    getParams(params), utteranceId);
        } else {
            return service.speak(getCallerIdentity(), text, queueMode, getParams(params),
                    utteranceId);
        }
    }, ERROR, "speak");
}
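Parameters:
text: the text to synthesize.
queueMode: the queuing strategy. The public API defines two modes: QUEUE_ADD appends the new utterance to the end of the playback queue, and QUEUE_FLUSH drops everything already queued and plays the new utterance in its place.
params: optional request parameters passed as a Bundle, such as the audio stream, volume, and pan (may be null). Language, pitch, and speech rate are set on the instance through setLanguage(), setPitch(), and setSpeechRate() rather than through this Bundle.
utteranceId: a unique identifier for this request, used to tell utterances apart.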
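The sketch below puts these parameters together. It is illustrative only: speakWithOptions, the log tag, the utterance ID "reply-1", and the chosen rate, pitch, and volume values are all assumptions rather than anything from the project described here.

```kotlin
import android.os.Bundle
import android.speech.tts.TextToSpeech
import android.speech.tts.UtteranceProgressListener
import android.util.Log

// Hypothetical helper; `tts` must already be initialized successfully.
fun speakWithOptions(tts: TextToSpeech, text: String) {
    // Rate and pitch are properties of the instance, not of the request.
    tts.setSpeechRate(1.0f) // 1.0 is the normal rate
    tts.setPitch(1.0f)      // 1.0 is the normal pitch

    // Callbacks carry the utteranceId passed to speak().
    tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
        override fun onStart(utteranceId: String?) {
            Log.d("TtsDemo", "started: $utteranceId")
        }

        override fun onDone(utteranceId: String?) {
            Log.d("TtsDemo", "finished: $utteranceId")
        }

        @Deprecated("Deprecated in Java")
        override fun onError(utteranceId: String?) {
            Log.w("TtsDemo", "failed: $utteranceId")
        }
    })

    // Per-request parameters go into the Bundle; volume ranges from 0 to 1.
    val params = Bundle().apply {
        putFloat(TextToSpeech.Engine.KEY_PARAM_VOLUME, 0.8f)
    }

    // QUEUE_FLUSH interrupts anything still playing; QUEUE_ADD would wait.
    tts.speak(text, TextToSpeech.QUEUE_FLUSH, params, "reply-1")
}
```

For a chat reply that replaces the previous one, QUEUE_FLUSH is probably the more natural choice. QUEUE_ADD fits better when a long answer is fed to the engine in several chunks.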
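To stop playback, call stop() on the instance. When you are done and exiting, call shutdown() to release the native resources used by the engine. My guess is that the engine holds a connection to the system's media codecs, so it should be released promptly to keep other apps' media playback from failing.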
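Complete utility class code
The utility is a Kotlin object, so it is a singleton shared across the app. It is initialized in the ViewModel, which exposes the playback interface to the UI.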
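import android.speech.tts.TextToSpeech
import android.util.Log
import java.util.Locale

object SpeechUtils {
    // Nullable rather than lateinit, so stop()/shutdown() are safe before init()
    private var textToSpeech: TextToSpeech? = null
    private const val TAG = "SpeechUtils"
    private const val TEST_IDENTIFIER = "test"
    private const val TEST_HELLO = "Hi, How are you? I'm fine. Thank you. And you?"
    private var isConnected = false

    // Called by the framework once the engine has finished initializing
    val ttsConnectedListener = TextToSpeech.OnInitListener { status ->
        Log.d(TAG, "OnInitListener status: $status")
        isConnected = status == TextToSpeech.SUCCESS
    }

    // appContext is assumed to be an application-wide Context defined elsewhere in the project
    fun init() {
        textToSpeech = TextToSpeech(appContext, ttsConnectedListener)
    }

    fun speak(text: String = TEST_HELLO, locale: Locale = Locale.US) {
        Log.d(TAG, "==========>speak<=========")
        val tts = textToSpeech
        if (isConnected && tts != null) {
            tts.language = locale
            tts.speak(
                text,
                TextToSpeech.QUEUE_ADD,
                null,
                TEST_IDENTIFIER
            )
        } else {
            Log.d(TAG, "==========>TTS is not connected!<=========")
        }
    }

    fun stop() {
        textToSpeech?.stop()
    }

    // Reset the state as well, so speak() cannot use a released instance
    fun shutdown() {
        textToSpeech?.shutdown()
        textToSpeech = null
        isConnected = false
    }
}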
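One thing the utility above does not check is whether the engine actually supports the requested language. setLanguage() returns a status code, so a guard could be sketched like this. applyLanguage is a hypothetical helper, not part of the original utility.

```kotlin
import android.speech.tts.TextToSpeech
import java.util.Locale

// Hypothetical helper: returns true only if the engine accepted the locale.
fun applyLanguage(tts: TextToSpeech, locale: Locale): Boolean {
    val result = tts.setLanguage(locale)
    // LANG_MISSING_DATA: the voice data for this language is not installed.
    // LANG_NOT_SUPPORTED: the engine does not support this language at all.
    return result != TextToSpeech.LANG_MISSING_DATA &&
        result != TextToSpeech.LANG_NOT_SUPPORTED
}
```

A caller could skip playback, or fall back to another locale, when this returns false. That matters when the model's reply is in Chinese but the engine on the device only ships English voices.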
ViewModel code:
class MainStateHolder(
    private val retroService: RetroService,
    private val ktorClient: KtorClient,
) : ViewModel() {
    companion object {
        const val TAG = "MainStateHolder"
        const val TOKEN_KEY = "token"
        const val USER_NAME_KEY = "user_name"
    }

    init {
        SpeechUtils.init()
    }

    // ...

    fun speak(text: String, locale: Locale) {
        SpeechUtils.speak(text, locale)
    }

    fun stopSpeech() {
        SpeechUtils.stop()
    }

    override fun onCleared() {
        super.onCleared()
        ktorClient.release()
        SpeechUtils.shutdown()
    }
}
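In the UI, playback starts when the server returns a value, and stop is called when the page leaves the composition. Lifecycle awareness is also added so that the stop interface is called when the Activity goes to the background: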
@Composable
fun MyServerPage(
    mainStateHolder: MainStateHolder,
    lifecycleOwner: LifecycleOwner = LocalLifecycleOwner.current,
    onBackStack: () -> Unit,
) {
    BasePage("Personal server test", onCickBack = onBackStack) {
        LaunchedEffect(Unit) {
            mainStateHolder.getMyServerResponse()
        }
        val myResponse = mainStateHolder.myServerResponseStateFlow.collectAsState().value
        // Speak each new non-empty response
        LaunchedEffect(myResponse) {
            if (myResponse.isNotEmpty()) {
                mainStateHolder.speak(myResponse, Locale.US)
            }
        }
        Box(
            modifier = Modifier.fillMaxSize(),
            contentAlignment = androidx.compose.ui.Alignment.Center
        ) {
            Text(text = myResponse)
        }
        DisposableEffect(lifecycleOwner) {
            Log.i("MyServerPage", "MyServerPage ${lifecycleOwner.lifecycle.currentState}")
            val observer = LifecycleEventObserver { _, event ->
                if (event == Lifecycle.Event.ON_STOP) {
                    // When the Activity goes to the background, Lifecycle fires ON_STOP
                    mainStateHolder.stopSpeech()
                }
            }
            lifecycleOwner.lifecycle.addObserver(observer)
            onDispose {
                // The page is leaving the composition: stop playback and unregister
                mainStateHolder.stopSpeech()
                lifecycleOwner.lifecycle.removeObserver(observer)
            }
        }
    }
}
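Next, I plan to try a paid on-device engine, integrating its AAR locally to get better playback quality. Its usage presumably follows the design of the native interface.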
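If such an engine registers itself as a system TTS engine, it could be selected through the three-argument constructor, as sketched below. That is an assumption: a vendor AAR may well expose its own SDK instead. The function names and the enginePackage parameter are placeholders for illustration.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import android.util.Log

// Hypothetical factory: binds to a specific engine by its package name
// instead of the system default.
fun createTtsWithEngine(
    context: Context,
    enginePackage: String,          // placeholder, e.g. taken from EngineInfo.name
    onReady: (Boolean) -> Unit
): TextToSpeech {
    val listener = TextToSpeech.OnInitListener { status ->
        onReady(status == TextToSpeech.SUCCESS)
    }
    return TextToSpeech(context.applicationContext, listener, enginePackage)
}

// Logs the engines installed on the device so a package name can be picked.
fun logInstalledEngines(tts: TextToSpeech) {
    for (engine in tts.engines) {
        Log.d("TtsEngines", "${engine.label} -> ${engine.name}")
    }
}
```

Listing the engines first is a cheap way to confirm that the engine is visible to the framework before wiring it into the utility class.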