首先明确需求和操作步骤,应用必须要实时监听获取面试官的提问,然后手动确认问题调用大模型api流式输出,这里面第一个技术是实时获取面试官的语音问题转成文字,这里推荐使用开源的whisper-large-v3-turbo
使用/whisper-large-v3-turbo实时获取说话者的文本后再点击确认调用ai大模型的api即可实现
import torch from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline from datasets import load_dataset device = "cuda:0" if torch.cuda.is_available() else "cpu" torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32 model_id = "openai/whisper-large-v3-turbo" model = AutoModelForSpeechSeq2Seq.from_pretrained( model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True ) model.to(device) processor = AutoProcessor.from_pretrained(model_id) pipe = pipeline( "automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor, torch_dtype=torch_dtype, device=device, ) dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation") sample = dataset[0]["audio"] result = pipe(sample) print(result["text"])
https://huggingface.co/openai/whisper-large-v3-turbo
体验地址:https://huggingface.co/spaces/KingNish/Realtime-whisper-large-v3-turbo
网友回复
acejs代码编辑器如何调用openai api实现选择代码修改与代码自动补全?
ace.js如何获取选择文本的开始和结束行数?
如何把qwen code cli或gemini cli的免费调用额度换成http api对外开放接口?
如何限制windows10电脑只能打开指定的程序?
python如何调用ai大模型实现web网页系统的功能测试并生成测试报告?
有没有免费进行web网站ai仿真人测试生成测试报告的mcp服务或api?
Context Engineering到底是啥,有什么用?
如何使用Google veo 3+高斯溅射(Gaussian Splatting)技术生成4d视频?
浏览器中如何实时调用摄像头扫描二维码?
grok4、gemini2.5pro、gpt5、claude4.1到底谁的编程能力更强一些?