Creating a 'Winning' Audio Lambda Service using Serverless, Polly and compiled SOX
Following on from my previous post, which discussed manipulating images, I would now like to look at how you can interact with audio using Lambda. To highlight this use case, we will create a simple service which, given a name and an optional voice (provided by Polly), synthesises the name and includes it in a returned ‘And the winner is…’ applause MP3 file. This will demonstrate how to integrate Polly within Lambda, compile and execute native code within Lambda, and return a binary MP3 file to the client.
Compiling SOX for Lambda
As we wish to join our static intro and outro audio files with the dynamically produced Polly response, we will need an application that can go about achieving this.
I have decided to use SOX for this task, as it provides us with a very simple API for joining multiple files together into a single track.
Lambda allows us to execute natively compiled code, providing that it has been correctly compiled for the underlying host operating system.
To go about correctly compiling SOX for Lambda, we will be using a Docker image which locally replicates the environment as best it can, providing all the necessary build tooling.
First, we need to start up a container of this image with bash running, so we can compile SOX and its required dependencies.
$ docker run -it lambci/lambda:build bash
This will pull down the required image from Docker Hub and begin a bash interpreter session.
Within this session, we will start by compiling libmad (the MPEG Audio Decoder), which is a dependency of SOX.
$ curl -L -o libmad-0.15.1b.tar.gz "http://downloads.sourceforge.net/project/mad/libmad/0.15.1b/libmad-0.15.1b.tar.gz"
$ tar zxf libmad-0.15.1b.tar.gz && cd libmad-0.15.1b
$ sed -i '/-fforce-mem/d' configure # https://stackoverflow.com/questions/14015747/gccs-fforce-mem-option
$ ./configure --prefix=/usr/libmad-0.15.1b --disable-shared --enable-static
$ make && make install
Next, we will compile LAME, which will allow us to encode the desired MP3 audio file within SOX.
$ curl -L -o lame-3.100.tar.gz "https://downloads.sourceforge.net/project/lame/lame/3.100/lame-3.100.tar.gz"
$ tar zxf lame-3.100.tar.gz && cd lame-3.100
$ ./configure --prefix=/usr/lame-3.100 --disable-shared --enable-static
$ make && make install
Finally, we are able to compile SOX, providing locations to the previously compiled libraries.
$ curl -L -o sox-14.4.2.tar.bz2 "http://downloads.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.bz2"
$ tar jxf sox-14.4.2.tar.bz2 && cd sox-14.4.2
$ CPPFLAGS="-I/usr/libmad-0.15.1b/include -I/usr/lame-3.100/include" \
LDFLAGS="-L/usr/libmad-0.15.1b/lib -L/usr/lame-3.100/lib" \
./configure --prefix=/usr/sox-14.4.2 --disable-shared --enable-static
$ make && make install
You will notice that we have statically compiled all of these libraries, as we want to depend on only a single executable within the Lambda service.
With SOX now compiled, we can open up a host terminal session and copy the newly compiled sox executable from the container.
$ docker ps # displays the running container's ID
$ docker cp {CONTAINER-ID}:/usr/sox-14.4.2/bin/sox ~/sox
Creating the Serverless Project
Now with the native executable compiled, we can go about creating the accompanying Lambda service. In a similar manner to the previous blog post, we will first create a skeleton Serverless project template.
$ serverless create --template aws-nodejs --path and-the-winner-is
Running this will create the basic handler and Serverless definition file. Replace the given Serverless definition file with the following.
service: and-the-winner-is
provider:
  name: aws
  runtime: nodejs6.10
  stage: prod
  region: eu-west-1
  environment:
    SOX_EXEC: ./sox
    INTRO_FILE: ./intro.mp3
    OUTRO_FILE: ./outro.mp3
  iamRoleStatements:
    - Effect: Allow
      Action:
        - polly:DescribeVoices
        - polly:SynthesizeSpeech
      Resource: '*'
plugins:
  - serverless-apigw-binary
custom:
  apigwBinary:
    types:
      - '*/*'
functions:
  winner:
    handler: handler.winner
    events:
      - http:
          path: /
          method: get
This configuration defines a single Lambda function which is exposed via a root API Gateway path.
It also provides a few environment variables which specify the SOX executable location, along with the desired intro and outro audio files.
We then use a Serverless plugin to correctly add the desired binary support to the API Gateway.
As we desire to use Polly within Lambda, we permit access to both the DescribeVoices and SynthesizeSpeech actions.
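Since the function depends on three files being packaged alongside the handler, a quick local check before deploying can save a failed invocation. The following is only a minimal sketch: `missingAssets` is a hypothetical helper that is not part of the original project, and it assumes the three environment variables have been exported in your local shell with the same values as in the definition above.

```javascript
const fs = require('fs');

// Names mirror the environment block of the Serverless definition.
const REQUIRED = ['SOX_EXEC', 'INTRO_FILE', 'OUTRO_FILE'];

// Hypothetical helper: lists the variables that are unset or that point
// at a file which cannot be read.
const missingAssets = env =>
  REQUIRED.filter(key => {
    if (!env[key]) return true;
    try {
      fs.accessSync(env[key], fs.constants.R_OK);
      return false;
    } catch (err) {
      return true;
    }
  });

const missing = missingAssets(process.env);
if (missing.length > 0) {
  console.warn(`Missing or unreadable assets: ${missing.join(', ')}`);
}
```

This only confirms that the files are present and readable; it does not prove that the sox binary will run inside Lambda.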
Before continuing we should include the Serverless plugin we have defined as a development dependency.
$ npm install serverless-apigw-binary --save-dev
Synthesising the Name
With this definition in-place we will move on to generating (synthesising) the provided name given to us by the client using Polly.
If the client happens to not supply us with a desired voice we will randomly choose one from the list of available options.
After creating a new file called synthesise-name.js, copy the following functions into the file.
'use strict';

const AWS = require('aws-sdk');

// Pick a random element from an array.
const random = arr => arr[Math.floor(Math.random() * arr.length)];

const polly = new AWS.Polly();

// Resolve with the Id of a randomly chosen Polly voice.
const getRandomVoice = () =>
  new Promise((res, rej) => {
    // `data` is null when the request fails, so it is not destructured in the signature.
    polly.describeVoices({}, function (err, data) {
      if (err) rej(err);
      else res(random(data.Voices).Id);
    });
  });

// Resolve with the MP3 audio of `text` spoken by `voice`.
const synthesiseSpeech = (text, voice) =>
  new Promise((res, rej) => {
    const params = {
      OutputFormat: 'mp3',
      SampleRate: '22050',
      Text: text,
      TextType: 'text',
      VoiceId: voice,
    };
    polly.synthesizeSpeech(params, function (err, speech) {
      if (err) rej(err);
      else res(speech.AudioStream);
    });
  });

module.exports = (name, voice = undefined) =>
  Promise.resolve(voice || getRandomVoice()).then(voice => synthesiseSpeech(name, voice));
We have a couple of helper functions: one returns a randomly selected Polly voice (used if no voice is supplied) and the other generates the audio representation of the supplied name. Combining these two helpers gives us a promise for the synthesised audio buffer, which we can later use within our response.
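One gap worth noting is that a voice supplied by the client is passed to Polly unchecked. As a hedged sketch of one way to handle this, the hypothetical `resolveVoice` helper below keeps the requested voice only when it appears in the list returned by describeVoices, and otherwise falls back to a random one. The helper name, the injectable `rand` parameter and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical helper: `voices` is the `Voices` array from a describeVoices
// response; `rand` is injectable so that the random choice can be tested.
const resolveVoice = (requested, voices, rand = Math.random) => {
  const ids = voices.map(voice => voice.Id);
  if (requested && ids.includes(requested)) return requested;
  return ids[Math.floor(rand() * ids.length)];
};

// Sample data shaped like a describeVoices response (only Id is used here).
const sampleVoices = [{ Id: 'Amy' }, { Id: 'Brian' }, { Id: 'Joanna' }];

console.log(resolveVoice('Brian', sampleVoices)); // the requested voice is available
console.log(resolveVoice('Nobody', sampleVoices, () => 0)); // falls back to the first voice
```

The trade-off is an extra describeVoices call on every request that names a voice, so caching the list between invocations may be worthwhile.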
Joining audio files using SOX
Having synthesised the client's desired name, we now wish to join the multiple audio files together and generate the output track.
SOX requires that all audio files be of the same sample-rate and channel count to successfully produce a joined file.
As Polly returns a mono-channel audio file with a sample-rate of 22050 Hz, the intro and outro files I have provided have been re-sampled to match.
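If you would like to use your own intro or outro, it needs the same treatment. The sketch below shows how the re-sampling could be scripted with SOX itself, using its `-r` (sample rate) and `-c` (channel count) output options. The `resampleArgs` and `resample` helpers and the file names are hypothetical, and the wrapper assumes a sox binary with MP3 support is available on your PATH.

```javascript
const { execFileSync } = require('child_process');

// Builds the SOX argument list that converts `input` to the given sample rate
// and channel count. Format options placed before the output file apply to it.
const resampleArgs = (input, output, rate = 22050, channels = 1) =>
  [input, '-r', String(rate), '-c', String(channels), output];

// Hypothetical wrapper; `soxPath` defaults to a `sox` found on the PATH.
const resample = (input, output, soxPath = 'sox') =>
  execFileSync(soxPath, resampleArgs(input, output));

console.log(resampleArgs('applause.mp3', 'outro.mp3').join(' '));
// prints "applause.mp3 -r 22050 -c 1 outro.mp3"
```

Calling `resample('applause.mp3', 'outro.mp3')` would then write a 22050 Hz mono copy that can be joined with Polly's output.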
After creating a new file called generate-track.js, copy the following logic into the file.
'use strict';

const fs = require('fs');
const tempfile = require('tempfile');
const childProcess = require('child_process');

const { SOX_EXEC, INTRO_FILE, OUTRO_FILE } = process.env;

module.exports = nameAudio => {
  const nameTempFile = tempfile('.mp3');
  fs.writeFileSync(nameTempFile, nameAudio);

  const trackTempFile = tempfile('.mp3');
  childProcess.execFileSync(SOX_EXEC, [INTRO_FILE, nameTempFile, OUTRO_FILE, trackTempFile]);

  return fs.readFileSync(trackTempFile);
};
This function simply takes in the audio buffer stream returned from the Polly service and writes it into a temporary file. We use an external temporary file library to achieve this so we need to include it as a project dependency.
$ npm install tempfile --save
We then supply this file, along with the intro and outro audio files, to the SOX executable to generate the final joined output track. As this output is written into a temporary file, we then read its contents into a buffer which we can later use within our service.
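Note that the function above never deletes its temporary files. Lambda can reuse an execution environment between invocations and its temporary storage is limited, so these files may accumulate over time. The following is a minimal sketch of one way to clean up afterwards: `withTempFiles` and `tempPath` are hypothetical helpers (the latter stands in for the tempfile package so that the example has no dependencies), and the SOX call is replaced with a plain file copy.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const crypto = require('crypto');

// Stand-in for the `tempfile` package: a random file name in the OS temp directory.
const tempPath = ext =>
  path.join(os.tmpdir(), crypto.randomBytes(8).toString('hex') + ext);

// Hypothetical helper: hands `count` temporary paths to `work`, then removes
// whichever of them were created, even if `work` throws.
const withTempFiles = (count, ext, work) => {
  const files = Array.from({ length: count }, () => tempPath(ext));
  try {
    return work(files);
  } finally {
    files.forEach(file => {
      if (fs.existsSync(file)) fs.unlinkSync(file);
    });
  }
};

// Usage: the SOX call from generate-track.js would sit where the copy is.
const result = withTempFiles(2, '.mp3', ([nameFile, trackFile]) => {
  fs.writeFileSync(nameFile, Buffer.from('name audio'));
  fs.writeFileSync(trackFile, fs.readFileSync(nameFile)); // placeholder for execFileSync(SOX_EXEC, ...)
  return fs.readFileSync(trackFile);
});

console.log(result.toString()); // prints "name audio"
```

Because the track is read into a buffer before the `finally` block runs, the returned data survives the clean-up.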
Wiring it all together
With the two key problems solved, we can now wire the handler together.
Replace the sample handler.js file contents with the following.
'use strict';

const synthesiseName = require('./synthesise-name');
const generateTrack = require('./generate-track');

module.exports.winner = (event, context, callback) => {
  const input = event.queryStringParameters || {};

  synthesiseName(input.name || 'All of us', input.voice)
    .then(generateTrack)
    .then(track => {
      callback(null, {
        statusCode: 200,
        headers: { 'Content-Type': 'audio/mpeg' },
        body: track.toString('base64'),
        isBase64Encoded: true,
      });
    })
    // Report failures to Lambda rather than leaving the invocation to time out.
    .catch(callback);
};
This composes the two functions together, returning the resulting audio track to the client. API Gateway requires that we Base64-encode the binary response, so we do so within the callback.
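To make the encoding step concrete, here is a small sketch that builds the same response shape from a three-byte buffer and confirms that decoding the body restores the original bytes. The `binaryResponse` helper is a hypothetical name introduced purely for illustration.

```javascript
// Hypothetical helper: wraps a binary buffer in the response shape used by the handler.
const binaryResponse = (buffer, contentType) => ({
  statusCode: 200,
  headers: { 'Content-Type': contentType },
  body: buffer.toString('base64'),
  isBase64Encoded: true,
});

const track = Buffer.from([0x49, 0x44, 0x33]); // the ASCII bytes "ID3"
const response = binaryResponse(track, 'audio/mpeg');

console.log(response.body); // prints "SUQz"

// Decoding the body restores the original bytes.
console.log(Buffer.from(response.body, 'base64').equals(track)); // prints true
```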
We are all winners
With the implementation now complete, we can deploy the Lambda service by executing the following command.
$ serverless deploy -v
Finally, we can visit the returned endpoint URL and enjoy creating our own winning audio tracks! You can find the code in its entirety, along with supporting assets in this GitHub repository.
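The handler reads the `name` and `voice` query string parameters, so a request URL can be assembled as sketched below. The endpoint shown is a made-up placeholder and `winnerUrl` is a hypothetical helper; substitute the URL printed by the deploy command.

```javascript
const querystring = require('querystring');

// Hypothetical endpoint; replace it with the URL printed by `serverless deploy`.
const ENDPOINT = 'https://example.execute-api.eu-west-1.amazonaws.com/prod/';

// Builds a request URL, omitting `voice` so the service picks a random one.
const winnerUrl = (name, voice) => {
  const params = { name };
  if (voice) params.voice = voice;
  return `${ENDPOINT}?${querystring.stringify(params)}`;
};

console.log(winnerUrl('Jane Doe', 'Brian'));
// prints "https://example.execute-api.eu-west-1.amazonaws.com/prod/?name=Jane%20Doe&voice=Brian"
```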