Reverse Engineering the Speaking Piano

I was really intrigued by Peter Ablinger’s Speaking Piano (officially titled Quadraturen, German for “quadratures”): a system that takes human speech and translates it into a sequence of notes played on a piano by a set of solenoids, or “mechanical fingers.”

Since I’ve been learning Pure Data, I thought it would be a fun exercise to attempt to recreate the software Ablinger wrote to translate speech into MIDI notes. The secondary purpose was to turn my oft-idle digital piano into an interactive sound piece, translating sound from another part of the house into music downstairs. The result isn’t perfect, but I think it still achieves the same ambiguous effect, where you can only hear the voice once you see the transcript. The biggest difference is obviously that I’m using a digital piano rather than a mechanically actuated acoustic piano. However, my Roland has a fairly sophisticated physical model, with features like damping and string resonance, so it’s better than nothing.

Below are the two components of the software. Clicking an image will link to the pd file if you’d like to experiment yourself. You can either load a pre-recorded wave file and play it back, or set the gain on the adc~ to 1 and use a microphone to drive it in real time (although I set up a delay of about 3 seconds so I could evaluate the results without hearing my own voice). delread~ passes the data into fiddle~, which does all the hard work of Fourier analysis. A metronome set to 15 ms samples the outputs of the individual sine components and creates MIDI notes. The blocks that create the actual notes are instances of partial_key. The highest key on the piano is MIDI note 108, which corresponds to 4186 Hz, so I added a low-pass filter to remove frequencies that can’t be reproduced.
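For readers who don’t speak Pure Data, the analysis step can be sketched in Python. This is only a rough stand-in for fiddle~ (which does proper sinusoidal peak tracking); the function names `freq_to_midi` and `top_partials` are my own, but the frequency-to-MIDI formula and the 108-key cutoff match what the patch does:

```python
import math

def freq_to_midi(freq):
    """Convert a frequency in Hz to the nearest MIDI note number
    (A440 = MIDI 69, twelve notes per octave)."""
    return round(69 + 12 * math.log2(freq / 440.0))

def top_partials(spectrum, sample_rate, fft_size, count=8):
    """Pick the `count` strongest bins of a magnitude spectrum and
    return (midi_note, amplitude) pairs, discarding anything above
    MIDI 108 (~4186 Hz), the top of the piano keyboard."""
    # Pair each bin with its center frequency, skipping the DC bin.
    bins = [(i * sample_rate / fft_size, mag)
            for i, mag in enumerate(spectrum) if i > 0]
    # Strongest partials first.
    bins.sort(key=lambda fm: fm[1], reverse=True)
    notes = []
    for freq, mag in bins[:count]:
        note = freq_to_midi(freq)
        if note <= 108:  # acts like the low-pass filter in the patch
            notes.append((note, mag))
    return notes
```

In the actual patch this selection happens continuously in fiddle~; the 15 ms metronome then samples whatever partials are current and forwards them to the partial_key blocks.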


partial_key.pd creates the MIDI notes that are sent to the piano. The makenote_b signal, received from the metronome, triggers the creation of a note. No note is sent if the MIDI key number is higher than 108, the limit of my piano, or if the amplitude is too small (< 0.01).
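The gating logic inside partial_key.pd amounts to something like the following sketch. The thresholds (108, 0.01) come from the patch; the linear amplitude-to-velocity mapping is my own guess at how it could be done:

```python
def partial_key(note, amplitude, max_key=108, min_amp=0.01):
    """Emit a (note, velocity) MIDI pair for one partial, or None if
    the note is off the top of the keyboard or too quiet to bother with."""
    if note > max_key or amplitude < min_amp:
        return None
    # Assumed mapping: scale amplitude in [0, 1] to MIDI velocity 1..127.
    velocity = max(1, min(127, round(amplitude * 127)))
    return (note, velocity)
```

Each partial_key block in the patch runs this check independently every time the metronome fires, so quiet or out-of-range partials simply drop out of the chord.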


Here is a sample of the result, speaking the following: “these are very profound words, which is why they are being spoken by a piano. I hope you are forever moved by these profound words.”

I’d love any feedback from Pure Data or DSP gurus on how the software could be improved. I’m not quite sure what additional analysis and synthesis steps the Quadraturen software takes, as it is not publicly available.