[RaspberryPi] Trying Out Speech Recognition with Whisper

I tried out speech recognition with Whisper, the speech recognition model developed by OpenAI.

Last time I installed Whisper on macOS and tried speech recognition there. This time I install Whisper on a RaspberryPi and check whether the same speech recognition works.

The test environment is as follows.

RaspberryPi 4
Linux raspberrypi 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
Python 3.9.2
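
For readers who want to compare their own board against the environment above, the following is a minimal sketch that collects the same three pieces of information from Python. The function name environment_summary is hypothetical and is not part of the original setup.

```python
import platform


def environment_summary():
    """Collect the CPU architecture, kernel release, and Python version."""
    return {
        "machine": platform.machine(),        # CPU architecture as reported by the OS
        "kernel": platform.release(),         # kernel release string
        "python": platform.python_version(),  # version of the running interpreter
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

If the reported architecture or Python version differs from the values listed above, the package versions that pip selects may differ as well.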

The steps are: environment setup, then transferring an audio file, then running Whisper.

1. Installing Whisper on the RaspberryPi
2. Transferring an audio file to the RaspberryPi
3. Creating a Python file and running Whisper
4. Summary

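1. Installing Whisper on the RaspberryPi

First, install Whisper on the RaspberryPi.

For details on how to install Whisper, see also the previous article.

Basically, you only need to run the following commands in order.

The overall flow is: install pip, then install whisper, then install ffmpeg.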

1.1. Install the Python 3 development package

sudo apt-get -y install python3-dev

1.2. Install pip

sudo apt-get -y install python3-pip

1.3. Install whisper

pip install -U openai-whisper

1.4. Install ffmpeg

sudo apt update && sudo apt install ffmpeg

1.5. Install ffmpeg-python

pip install ffmpeg-python
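
After the install commands above, it can be useful to confirm that everything is in place without importing anything yet. The following is a minimal sketch, assuming the packages are importable as whisper and ffmpeg and that the ffmpeg binary is on PATH. The function name check_prerequisites is hypothetical.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("whisper", "ffmpeg"), tools=("ffmpeg",)):
    """Report which Python modules and command-line tools can be found."""
    report = {}
    for name in modules:
        # find_spec only locates a top-level module; it does not execute it
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        report[f"tool:{tool}"] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for item, found in check_prerequisites().items():
        print(f"{item}: {'found' if found else 'MISSING'}")
```

Because the modules are only located and not executed, this check does not verify that importing them actually succeeds.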

1.6. Run whisper from Python

At this point, start the Python interpreter and check whether whisper can be used.

$ python
>>> import whisper
Illegal instruction

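The import failed with the error "Illegal instruction".

1.7. Debugging the Illegal instruction error

The error message is nothing more than "Illegal instruction", so it is of almost no help.

To debug it, run Python with the "-v" option, which prints a message for each module as it is imported and so shows how far the import got before the crash.

Create a file named "test_whisper.py" and write "import whisper" in it.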

$ vi test_whisper.py
import whisper # write this line into the file

Run the Python file with the "-v" option.

$ python -v test_whisper.py
...（省略）...
import 'torch.version' # <_frozen_importlib_external.SourceFileLoader object at 0x7f92703a60>
import 'torch.torch_version' # <_frozen_importlib_external.SourceFileLoader object at 0x7f9272b670>
Illegal instruction

The last modules imported before the crash belong to torch, so the problem is occurring in torch.
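
The same narrowing-down can be sketched in Python by importing each suspect module in a separate child process, so that a crash does not take down the calling script. This is only an illustrative sketch: the function name probe and the list of modules to try are assumptions, not part of the original procedure. On Linux, a child killed by a signal is reported by subprocess with a negative return code.

```python
import signal
import subprocess
import sys


def probe(code):
    """Run `code` in a child interpreter and return (ok, description)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return True, "ok"
    if proc.returncode < 0:
        # A negative return code means the child was terminated by a signal
        return False, f"killed by {signal.Signals(-proc.returncode).name}"
    return False, f"exited with status {proc.returncode}"


if __name__ == "__main__":
    for name in ("torch", "whisper"):
        ok, detail = probe(f"import {name}")
        print(f"import {name}: {detail}")
```

An "Illegal instruction" crash corresponds to the SIGILL signal, so a module that triggers it would be reported as "killed by SIGILL".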

Check the installed version of torch.

$ pip list | grep torch
torch 2.0.0

The installed torch version was 2.0.0.
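
The version can also be read from Python without importing the package, which matters here because importing torch is what crashes. The following is a minimal sketch using importlib.metadata from the standard library (available in Python 3.8 and later). The function name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        # Reads the package's installed metadata; the package itself is not imported
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("torch:", installed_version("torch"))
```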

Let's try downgrading it.

pip uninstall torch==2.0.0
pip install torch==1.13.1

After this, Whisper can be imported.

$ python test_whisper.py 
/home/username/.local/lib/python3.9/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):

A warning is printed, but I ignore it this time.

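2. Transferring an audio file to the RaspberryPi

Once Whisper is usable, transfer an audio file to the RaspberryPi.

This time I use the following mp3 file.

It is a synthesized voice saying 「今日の天気は晴れのち曇りです」 ("Today's weather is sunny, then cloudy").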

The file is named weather.mp3.

Transfer this file to the RaspberryPi.

scp weather.mp3 username@raspberrypi.local:~/

On the RaspberryPi, run "ls" on the "~" directory to confirm that the file was transferred.

$ ls ~ | grep weather.mp3
weather.mp3

If weather.mp3 is listed, the transfer succeeded.
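
Listing the file only shows that it exists. To additionally check that its contents arrived intact, one option is to compute a checksum on both machines and compare the two values. The following is a minimal sketch of that idea. The function name sha256_of is hypothetical, and this check is not part of the original procedure.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large audio files do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(sha256_of(sys.argv[1]))
```

Saved under a file name of your choice, the script takes the path to weather.mp3 as its argument. Running it on both the sending machine and the RaspberryPi should print the same digest if the transfer was clean.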

3. Creating a Python file and running Whisper

Create the following Python file on the RaspberryPi.

$ cd ~
$ vi test_whisper.py

Copy and paste the following code into it.

import whisper

model = whisper.load_model("tiny") # use the tiny model
result = model.transcribe("weather.mp3") # audio file to transcribe
print(result["text"]) # print the recognized text

The argument to whisper.load_model selects the model. Because the RaspberryPi has limited compute resources, I use the tiny model.

Once the file is created, run it.

$ python test_whisper.py
/home/username/.local/lib/python3.9/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
明日の天気は晴れのチクモリです

A UserWarning is printed, but the script works.
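
The warning appears because transcribe requests FP16 by default and then falls back to FP32 on a CPU. As a sketch of one way to avoid it, the options below pass fp16=False explicitly and optionally fix the language. The helper names cpu_decode_options and transcribe_on_cpu are hypothetical, and fixing the language to "ja" is an assumption about the input, not something the original script does.

```python
def cpu_decode_options(language=None):
    """Build keyword arguments for transcribe() on a CPU-only machine."""
    options = {"fp16": False}  # ask for FP32 up front instead of falling back to it
    if language is not None:
        options["language"] = language  # e.g. "ja"; skips automatic language detection
    return options


def transcribe_on_cpu(path, model_name="tiny", language=None):
    """Transcribe the audio file at `path` and return the recognized text."""
    import whisper  # imported lazily so the helper above works without Whisper installed

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **cpu_decode_options(language))
    return result["text"]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(transcribe_on_cpu(sys.argv[1], language="ja"))
```

This only changes how the options are requested. The model still runs in FP32 on the CPU, so the recognition result and speed should be about the same.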

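The output is not a perfect transcription: 「今日」 was recognized as 「明日」 and 「晴れのち曇り」 as 「晴れのチクモリ」. The recognition accuracy is the same as when running the tiny model on macOS.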

The run took about 10 seconds.
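
To measure the run time more precisely than by eye, a small timing wrapper can be used. The following is a minimal sketch. The helper name timed is hypothetical, and the example times a built-in function only so that the snippet runs without Whisper installed.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(f"sum={total}, took {seconds:.3f} s")
```

In the script above, the same wrapper could be applied as timed(model.transcribe, "weather.mp3") to time the transcription separately from loading the model.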