A digital pattern playback system implemented in Python

The software converts images of magnitude spectrograms to sound. The spectrogram image may be loaded from a file (e.g. png, jpg), cropped from a loaded file, or drawn from scratch on a blank canvas. The spectrogram is converted to a waveform via the inverse short-time Fourier transform, using either a zero-phase spectrum or the Griffin-Lim algorithm (Griffin & Lim, 1984).
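The zero-phase route can be sketched with NumPy alone. This is a minimal illustration, not the code in pattern_playback.py: the function names, the FFT size (256), the hop (64), the "dark pixel means high energy" mapping, and the nearest-neighbour resampling are all assumptions made for the example.

```python
import numpy as np

def image_to_magnitude(img, n_bins, n_frames):
    """Map a grayscale image (0-255) onto an (n_bins, n_frames) magnitude grid."""
    img = np.asarray(img, dtype=float)
    mag = 1.0 - img / 255.0   # assumption: dark ink means high energy
    mag = mag[::-1, :]        # image row 0 is the top, so flip to put 0 Hz first
    # Nearest-neighbour resampling onto the STFT grid (hypothetical choice).
    rows = np.linspace(0, mag.shape[0] - 1, n_bins).round().astype(int)
    cols = np.linspace(0, mag.shape[1] - 1, n_frames).round().astype(int)
    return mag[np.ix_(rows, cols)]

def zero_phase_istft(magnitude, n_fft=256, hop=64):
    """Overlap-add inverse STFT that treats every frame as having zero phase."""
    window = np.hanning(n_fft)
    n_frames = magnitude.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for t in range(n_frames):
        # A real, non-negative spectrum has zero phase; its inverse FFT is
        # symmetric about sample 0, so shift it to the centre of the window.
        frame = np.fft.fftshift(np.fft.irfft(magnitude[:, t], n=n_fft))
        start = t * hop
        out[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    # Drop half a window at each end, where the normalisation is unreliable.
    return out[n_fft // 2:-(n_fft // 2)]

# Synthetic "image": white background with one dark horizontal stripe.
sampling_rate = 8000
image = np.full((129, 50), 255, dtype=np.uint8)
image[96, :] = 0
magnitude = image_to_magnitude(image, n_bins=129, n_frames=50)
wave = zero_phase_istft(magnitude)

# Find the strongest frequency in the synthesized waveform.
spectrum = np.abs(np.fft.rfft(wave))
peak_hz = np.argmax(spectrum) * sampling_rate / len(wave)
print(round(peak_hz))
```

A single horizontal stripe in the image comes out as a steady tone whose frequency depends on the stripe's height, which is the basic idea behind pattern playback.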

Download

Download and unzip this file. It should contain:

pattern_playback.py
tutorial.pdf
README.md
environment.yml
An examples folder

The files are also available on GitHub here.

How to cite

Koo, H. (2022). A digital pattern playback system implemented in Python. Journal of the Acoustical Society of America, 151(4), A132.

How to install and use the software

Read tutorial.pdf in the zip file. It's also available here.

Examples

Loading an image from a jpg file (from here) and converting it to a waveform using the Griffin-Lim algorithm:

python pattern_playback.py --duration 3 --sampling_rate 16000 --load ./examples/example1.jpg --griffinlim --save_wav ./examples/example1.wav --show_graphs

Input
Output
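For readers curious about what the --griffinlim option does conceptually, the following is a minimal NumPy sketch of the Griffin-Lim iteration: start from random phase, then alternate between resynthesizing a waveform and keeping only the phase of its short-time Fourier transform. It is not the implementation used by pattern_playback.py; the helper names, FFT size, hop, iteration count, and zero-padding scheme are assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT of a zero-padded signal, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    padded = np.pad(x, n_fft // 2)  # zero padding so the edges get full coverage
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack([padded[t * hop:t * hop + n_fft] * window
                       for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def istft(spec, n_fft=256, hop=64):
    """Window-weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for t in range(n_frames):
        frame = np.fft.irfft(spec[:, t], n=n_fft)
        out[t * hop:t * hop + n_fft] += frame * window
        norm[t * hop:t * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out[n_fft // 2:-(n_fft // 2)]  # drop the padding added by stft()

def griffin_lim(magnitude, n_iter=50, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        wave = istft(magnitude * phase, n_fft, hop)
        # Keep the given magnitude, adopt the phase of the resynthesized signal.
        phase = np.exp(1j * np.angle(stft(wave, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)

def spectral_error(wave, magnitude, n_fft=256, hop=64):
    """Relative mismatch between the waveform's STFT magnitude and the target."""
    diff = np.abs(stft(wave, n_fft, hop)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)

# Target magnitude taken from a 1000 Hz tone, 0.4 s at 8000 Hz.
sampling_rate = 8000
t = np.arange(int(round(0.4 * sampling_rate))) / sampling_rate
target = np.abs(stft(np.sin(2 * np.pi * 1000 * t)))

rough = griffin_lim(target, n_iter=0)     # random phase only
refined = griffin_lim(target, n_iter=50)  # after 50 iterations
print(spectral_error(rough, target), spectral_error(refined, target))
```

The printed pair compares the spectral mismatch before and after iterating; the second value should be the smaller one, since each pass pulls the phase toward one that is consistent with the given magnitudes.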

Cropping an image from a png file (from Ladefoged & Johnson, 2014) and converting it to a waveform using the Griffin-Lim algorithm:

python pattern_playback.py --duration 0.4 --sampling_rate 8000 --load ./examples/example2.png --crop --griffinlim --save_wav ./examples/example2.wav --show_graphs

Input
Output
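The --duration and --sampling_rate values fix the size of the output: 0.4 s at 8000 Hz is 3200 samples, and the highest representable frequency is the Nyquist frequency, 4000 Hz. The sketch below works out the corresponding spectrogram grid. The FFT size and hop are hypothetical values picked for the example, and it is an assumption (not confirmed here) that the cropped image is stretched to cover 0 Hz up to Nyquist over the full duration; tutorial.pdf is the authority on what the software actually does.

```python
def playback_grid(duration, sampling_rate, n_fft=256, hop=64):
    """Relate a duration and sampling rate to an STFT grid.

    n_fft and hop are hypothetical analysis settings, not values taken
    from pattern_playback.py.
    """
    n_samples = int(round(duration * sampling_rate))
    return {
        "n_samples": n_samples,                    # length of the waveform
        "nyquist_hz": sampling_rate / 2,           # highest representable frequency
        "n_bins": n_fft // 2 + 1,                  # frequency rows, 0 Hz to Nyquist
        "hz_per_bin": sampling_rate / n_fft,       # frequency resolution per row
        "n_frames": 1 + n_samples // hop,          # time columns (centred frames)
        "ms_per_frame": 1000 * hop / sampling_rate,  # time step per column
    }

# Values from the cropping example: --duration 0.4 --sampling_rate 8000
grid = playback_grid(duration=0.4, sampling_rate=8000)
print(grid)
```

Under these assumptions, halving the sampling rate halves the frequency range that the image height is mapped onto, which is worth keeping in mind when choosing a crop.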

Cropping an image from a png file (from Cooper et al., 1952) and converting it to a waveform assuming a zero-phase spectrum:

python pattern_playback.py --duration 1.1 --sampling_rate 8000 --load ./examples/example4.png --crop --save_wav ./examples/example4.wav --show_graphs

Input
Output

Drawing on a blank canvas (my attempt to recreate "salmon" from the figure in Cooper et al., 1952, shown above) and converting it to a waveform assuming a zero-phase spectrum:

python pattern_playback.py --duration 0.5 --sampling_rate 8000 --draw --save_drawing ./examples/example3.png --save_wav ./examples/example3.wav --show_graphs

Input
Output

References

Cooper, F. S., Delattre, P. C., & Liberman, A. M. (1952). Some experiments on the perception of synthetic speech sounds. Journal of the Acoustical Society of America, 24(6), 597-606.

Griffin, D., & Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2), 236-243.

Ladefoged, P., & Johnson, K. (2014). A Course in Phonetics. Cengage Learning.