Creating a 'Winning' Audio Lambda Service using Serverless, Polly and compiled SOX
Following on from my previous post which discussed manipulating images, I would now like to expand upon this and look into how you can interact with audio using Lambda. To highlight this use-case we will be creating a simple service which, given a name and an optional voice (provided by Polly), will synthesise the name and include it in a returned “And the winner is…” applause MP3 file. This will demonstrate how to integrate Polly within Lambda, compile and execute native code within Lambda and return a binary MP3 file to the client.
Compiling SOX for Lambda
As we wish to join our static intro and outro audio files with the dynamically produced Polly response, we will need an application that can achieve this.
I have decided to use SOX for this task, as it provides us with a very simple API for joining multiple files together into a single track.
Lambda allows us to execute natively compiled code, provided that it has been compiled for the underlying host operating system.
To correctly compile SOX for Lambda, we will be using a Docker image which locally replicates the environment as best it can, providing all the necessary build tooling.
First, we need to start a container from this image with bash running, so we can compile SOX and its required dependencies.
$ docker run -it lambci/lambda:build bash
This will pull the required image from Docker Hub and begin a bash interpreter session.
Within this session, we will start by compiling libmad (the MPEG Audio Decoder library), which SOX depends on to read MP3 files.
$ curl -L -o libmad-0.15.1b.tar.gz "http://downloads.sourceforge.net/project/mad/libmad/0.15.1b/libmad-0.15.1b.tar.gz"
$ tar zxf libmad-0.15.1b.tar.gz && cd libmad-0.15.1b
$ sed -i '/-fforce-mem/d' configure # https://stackoverflow.com/questions/14015747/gccs-fforce-mem-option
$ ./configure --prefix=/usr/libmad-0.15.1b --disable-shared --enable-static
$ make && make install
Next we will compile LAME, which will allow us to encode the desired MP3 audio file within SOX.
$ curl -L -o lame-3.100.tar.gz "https://downloads.sourceforge.net/project/lame/lame/3.100/lame-3.100.tar.gz"
$ tar zxf lame-3.100.tar.gz && cd lame-3.100
$ ./configure --prefix=/usr/lame-3.100 --disable-shared --enable-static
$ make && make install
Finally, we can compile SOX, pointing it at the include and library directories of the two libraries we have just built.
$ curl -L -o sox-14.4.2.tar.bz2 "http://downloads.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.bz2"
$ tar jxf sox-14.4.2.tar.bz2 && cd sox-14.4.2
$ CPPFLAGS="-I/usr/libmad-0.15.1b/include -I/usr/lame-3.100/include" \
LDFLAGS="-L/usr/libmad-0.15.1b/lib -L/usr/lame-3.100/lib" \
./configure --prefix=/usr/sox-14.4.2 --disable-shared --enable-static
$ make && make install
You will notice that we have compiled everything statically (--disable-shared --enable-static), as we want the Lambda service to depend on only a single executable.
With SOX now compiled, we can open a terminal session on the host and copy the newly compiled sox executable out of the container.
$ docker ps # displays the running container's ID
$ docker cp {CONTAINER-ID}:/usr/sox-14.4.2/bin/sox ~/sox
Creating the Serverless Project
Now with the native executable compiled, we can create the accompanying Lambda service. In a similar manner to the previous blog post, we will first create a skeleton Serverless project template.
$ serverless create --template aws-nodejs --path and-the-winner-is
Running this will create the basic handler and Serverless definition file. Replace the given Serverless definition file with the following:
service: and-the-winner-is
provider:
  name: aws
  runtime: nodejs6.10
  stage: prod
  region: eu-west-1
  environment:
    SOX_EXEC: ./sox
    INTRO_FILE: ./intro.mp3
    OUTRO_FILE: ./outro.mp3
  iamRoleStatements:
    - Effect: Allow
      Action:
        - polly:DescribeVoices
        - polly:SynthesizeSpeech
      Resource: '*'
plugins:
  - serverless-apigw-binary
custom:
  apigwBinary:
    types:
      - '*/*'
functions:
  winner:
    handler: handler.winner
    events:
      - http:
          path: /
          method: get
This configuration defines a single Lambda function which is exposed via a root API Gateway path.
It also provides a couple of environment variables which specify the SOX executable location, along with the desired intro and outro audio files.
We then use a Serverless plugin to correctly add binary support to the API Gateway.
As we want to use Polly within Lambda, we permit access to both the DescribeVoices and SynthesizeSpeech actions.
Before continuing, we should include the Serverless plugin we have defined as a development dependency.
$ npm install serverless-apigw-binary --save-dev
Synthesising the Name
With this definition in place, we will move on to generating (synthesising) the name provided by the client using Polly.
If the client does not supply a desired voice, we will randomly choose one from the list of available options.
After creating a new file called synthesise-name.js, copy the following functions into the file:
'use strict';
const AWS = require('aws-sdk');
const polly = new AWS.Polly();
// Pick a random element from an array.
const random = arr => arr[Math.floor(Math.random() * arr.length)];
// Resolve with the Id of a randomly chosen Polly voice.
const getRandomVoice = () =>
  new Promise((res, rej) => {
    // The data argument is not destructured in the signature, because it is
    // not populated when the call fails and destructuring it would throw.
    polly.describeVoices({}, (err, data) => {
      if (err) rej(err);
      else res(random(data.Voices).Id);
    });
  });
// Resolve with the MP3 audio for the given text and voice.
const synthesiseSpeech = (text, voice) =>
  new Promise((res, rej) => {
    const params = {
      OutputFormat: 'mp3',
      SampleRate: '22050',
      Text: text,
      TextType: 'text',
      VoiceId: voice,
    };
    polly.synthesizeSpeech(params, (err, speech) => {
      if (err) rej(err);
      else res(speech.AudioStream);
    });
  });
// Use the supplied voice if there is one, otherwise fall back to a random one.
module.exports = (name, voice) =>
  Promise.resolve(voice || getRandomVoice()).then(voiceId => synthesiseSpeech(name, voiceId));
We have a couple of helper functions: one returns a randomly selected Polly voice (used when no voice is supplied), and the other generates the audio representation of the supplied name. The exported function combines the two and returns a promise that resolves to the synthesised audio, which we can later use in our response.
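The random selection above draws from every voice Polly offers, whatever its language. As a minimal sketch, assuming you would rather restrict the choice to one language, the selection could be written as a pure function over the Voices array. The helper name pickVoiceId and the sample data are hypothetical and not part of the service; only the Id and LanguageCode fields of each voice are relied upon.

```javascript
'use strict';

// Hypothetical helper (not part of the original service): choose a voice Id
// from a DescribeVoices-style list, optionally restricted to one language.
// `rand` is injectable so the choice can be made deterministic when testing.
const pickVoiceId = (voices, languageCode, rand = Math.random) => {
  const candidates = languageCode
    ? voices.filter(v => v.LanguageCode === languageCode)
    : voices;
  if (candidates.length === 0) {
    throw new Error(`No voices available for ${languageCode || 'any language'}`);
  }
  return candidates[Math.floor(rand() * candidates.length)].Id;
};

// Sample data shaped like the Voices array returned by Polly.
const sampleVoices = [
  { Id: 'Amy', LanguageCode: 'en-GB' },
  { Id: 'Brian', LanguageCode: 'en-GB' },
  { Id: 'Joanna', LanguageCode: 'en-US' },
];

// There is only one en-US candidate in the sample data.
console.log(pickVoiceId(sampleVoices, 'en-US'));
```

Keeping the selection separate from the Polly call means it can be exercised without AWS credentials.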
Joining audio files using SOX
Having synthesised the client’s desired name, we now wish to join the multiple audio files together and generate the output track.
SOX requires that all audio files be of the same sample rate and channel count to successfully produce a joined file.
As Polly returns a mono audio file with a sample rate of 22050 Hz, the intro and outro files I have provided have been resampled to match.
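If you want to use your own intro or outro, it needs the same conversion. The sketch below only builds the argument list for such a SOX call, using the -r (sample rate) and -c (channel count) options placed before the output file. The helper name and the file names are placeholders, and the conversion itself is assumed to be run once on your own machine rather than inside Lambda.

```javascript
'use strict';

// Target format taken from the text: Polly's MP3 output is mono at 22050 Hz.
const TARGET_RATE = 22050;
const TARGET_CHANNELS = 1;

// Build the SoX argument list for converting one file to the target format.
// Format options placed directly before the output file apply to that output.
const buildResampleArgs = (inputFile, outputFile) => [
  inputFile,
  '-r', String(TARGET_RATE),
  '-c', String(TARGET_CHANNELS),
  outputFile,
];

// The file names here are placeholders.
console.log(['sox'].concat(buildResampleArgs('intro-original.mp3', 'intro.mp3')).join(' '));
// prints "sox intro-original.mp3 -r 22050 -c 1 intro.mp3"
```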
After creating a new file called generate-track.js, copy the following logic into the file:
'use strict';
const fs = require('fs');
const tempfile = require('tempfile');
const childProcess = require('child_process');
const { SOX_EXEC, INTRO_FILE, OUTRO_FILE } = process.env;
module.exports = nameAudio => {
  const nameTempFile = tempfile('.mp3');
  fs.writeFileSync(nameTempFile, nameAudio);
  const trackTempFile = tempfile('.mp3');
  childProcess.execFileSync(SOX_EXEC, [INTRO_FILE, nameTempFile, OUTRO_FILE, trackTempFile]);
  return fs.readFileSync(trackTempFile);
};
This function takes the audio returned by Polly and writes it to a temporary file. We use an external temporary-file library to generate the file paths, so we need to include it as a project dependency.
$ npm install tempfile --save
We then supply this file, along with the intro and outro audio files, to the SOX executable to generate the final joined output track. As this output is written to a temporary file, we then read its contents into a buffer which we can later use within our service.
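The function above leaves both temporary files behind. Files written to /tmp can survive between invocations while a Lambda container stays warm, so you may want to remove them once the track has been read. The following is a minimal sketch of that idea, not the original implementation: it swaps the tempfile package for Node's standard library, and the names tempPath, generateTrackWithCleanup and the opts fields are hypothetical.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');
const crypto = require('crypto');
const childProcess = require('child_process');

// Hypothetical helper: a unique path in the OS temp directory (/tmp on Lambda).
const tempPath = ext =>
  path.join(os.tmpdir(), crypto.randomBytes(8).toString('hex') + ext);

// Joins intro + name + outro and always removes the temporary files.
// `opts.run` defaults to execFileSync and is injectable so the flow can be
// exercised without a SoX binary present.
const generateTrackWithCleanup = (nameAudio, opts) => {
  const run = opts.run || childProcess.execFileSync;
  const nameFile = tempPath('.mp3');
  const trackFile = tempPath('.mp3');
  try {
    fs.writeFileSync(nameFile, nameAudio);
    run(opts.sox, [opts.intro, nameFile, opts.outro, trackFile]);
    return fs.readFileSync(trackFile);
  } finally {
    // Runs whether or not SoX succeeded.
    [nameFile, trackFile].forEach(file => {
      if (fs.existsSync(file)) fs.unlinkSync(file);
    });
  }
};

module.exports = generateTrackWithCleanup;
```

In the service this would be called with the values of the SOX_EXEC, INTRO_FILE and OUTRO_FILE environment variables as opts.sox, opts.intro and opts.outro.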
Wiring it all together
With the two key problems solved, we can now wire the handler together.
Replace the contents of the sample handler.js file with the following:
'use strict';
const synthesiseName = require('./synthesise-name');
const generateTrack = require('./generate-track');
module.exports.winner = (event, context, callback) => {
  // queryStringParameters is null when the request has no query string.
  const input = event.queryStringParameters || {};
  synthesiseName(input.name || 'All of us', input.voice)
    .then(generateTrack)
    .then(track => {
      callback(null, {
        statusCode: 200,
        headers: { 'Content-Type': 'audio/mpeg' },
        body: track.toString('base64'),
        isBase64Encoded: true,
      });
    })
    // Report failures to Lambda instead of leaving the invocation to time out.
    .catch(callback);
};
This composes the two functions together, returning the resulting audio track to the client. API Gateway requires that a binary response body be Base64-encoded and flagged with isBase64Encoded, so we do both within the callback.
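To make the response shape easier to see in isolation, here is a small sketch that builds it from a buffer. The function names are hypothetical, and the error response is an assumption about how you might choose to report failures; the original handler does not define one.

```javascript
'use strict';

// Build an API Gateway proxy response carrying binary MP3 data.
const toAudioResponse = track => ({
  statusCode: 200,
  headers: { 'Content-Type': 'audio/mpeg' },
  body: track.toString('base64'),
  isBase64Encoded: true,
});

// Hypothetical error shape; the original handler does not define one.
const toErrorResponse = err => ({
  statusCode: 500,
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: err.message }),
});

// Three sample bytes (the ASCII characters "ID3") stand in for a real track.
const response = toAudioResponse(Buffer.from([0x49, 0x44, 0x33]));
console.log(response.body); // prints "SUQz"
```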
We are all winners
With the implementation now fully complete, we can deploy the Lambda service by executing:
$ serverless deploy -v
Finally, we can visit the returned endpoint URL and enjoy creating our own winning audio tracks! You can find the code in its entirety, along with supporting assets, in this GitHub repository.
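The handler reads name and voice from the query string, so a request URL can be put together as in the sketch below. The helper name is hypothetical and the base URL is a placeholder; use whatever endpoint serverless deploy prints for your deployment.

```javascript
'use strict';
const querystring = require('querystring');

// Build a request URL for the deployed endpoint. `baseUrl` is whatever
// `serverless deploy` prints; the value used below is a placeholder.
const buildWinnerUrl = (baseUrl, name, voice) => {
  const params = { name };
  if (voice) params.voice = voice; // omit to let the service pick at random
  return `${baseUrl}?${querystring.stringify(params)}`;
};

console.log(buildWinnerUrl('https://example.invalid/prod/', 'Ada Lovelace', 'Amy'));
// prints "https://example.invalid/prod/?name=Ada%20Lovelace&voice=Amy"
```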