flypig.co.uk

Personal Blog


Blog

5 most recent items

3 Sep 2018 : Yet More Proof that the Human Race is Screwed #

I don’t usually get angry, but something about this really hustles my hircus. I just clicked through an advertarticle on the Register about “Serverless Computing London”, a conference that claims to help developers “decide on the best path to a more efficient, scalable and secure computing future”.

The speaker roster looked interesting, because I’d never heard of any of them (that’s just me; I’m not following Serverless trends closely), so I clicked through to find out about the headline keynote, Chad Arimura, from Oracle. Chad’s image seemed to load slower than the rest of the page, which made me suspicious. So I loaded up the image separately and this is what I found.
 

This image is too large

Chad’s mugshot is being downloaded as a behemoth 2756 x 2756 pixel image and then scaled down on my screen to a 114 x 114 pixel image client-side. Check out those timing bars. It’s taking 1.5 seconds to download the bastard. Because it’s nearly 1 meg of data.

I did some scientific testing, and established that if the image had been scaled down at the site, it could have been served to me as 3.9kB of data. That’s 0.4% of the bandwidth. Huge swaths of time, resources and human ingenuity have gone into developing efficient image compression algorithms so that we can enjoy a rich multimedia Web, minimising the energy required while we fret about global warming due to our historical excesses. A visually-identical 114x114 pixel BMP image (circa 1995) would have taken 52kB of bandwidth.

This all wouldn’t be so bad if maybe Chad didn’t look quite so smug*, and if we couldn’t discern from the title of the image that someone went to the trouble of cropping it. Why didn’t they just scale it down at the same time?

But the saddest part, of course, is that this is to advertise a conference about Serverless Computing. What’s the point of Serverless Computing? To allow better allocation of resources so that server time is spent serving content, rather than waiting for requests.

I totally appreciate the irony of me spending an hour posting an image-heavy blog post about how a conference on a perfectly valid technical subject is wasting bandwidth. But I would simply say that this only strengthens my argument: I'm human too. We're all screwed.

* To be fair to Chad, it’s almost certainly not his fault (other speakers get the same treatment), and the admirable minimalism of his personal website suggests he’s actually totally bought into the idea of efficient web design.
 
23 Aug 2018 : Sending emails from AWS Lambda inside a VPC without NAT #

Many websites are made up of some core stateless functionality tied to a database where all the state lives. The functionality may make changes to the database state, but all of the tricky issues related to concurrency and consistency (for example, when two users simultaneously cause the database to be updated) are left to the database to deal with. That allows the stateless part to be partitioned off and run only when the user is actually requesting a page, making it easily scalable.

In this scenario, having a full-time monolithic server (or bank of servers) handling the website requests is overkill. Creating a new server instance for each request is potentially much more cost efficient and scalable. Each request to the site triggers a new function to be called that runs the code needed to generate a webpage (e.g. filling out a template with details for the user and view), updating the database if necessary. Once that’s done, the server is deleted and the only thing left is the database. An important benefit is that, if there are no requests coming in, there’s no server time to pay for. This is the idea behind ‘serverless’ architectures. Actually, there are lots of servers involved (receiving and actioning HTTP requests, running the database, managing the cluster) but they’re hidden and costs are handled by transaction rather than by uptime.

AWS Lambda is one of the services Amazon provides to allow this kind of serverless setup. Creating ‘Lambda functions’ (named after the Lambda calculus, but really they’re just functions) that run on various triggers, like a web request, has been made as easy as pie. Connecting these functions to an RDS database has also been made really easy. But there’s a fly in the ointment.

To get the Lambda function communicating with the RDS instance, it’s common practice to set them both up inside the same Virtual Private Cloud. This isn’t strictly necessary: it’s possible to have the database exposed on a public IP and have the Lambda function communicate with it that way. However, the obvious downside to doing it like this is that the database is exposed to the world, making it a hacking and denial-of-service target. If both the Lambda function and database are in a VPC, then assuming everything is suitably configured, the database will be effectively protected from external attack.

Setting up a VPC

The beauty of this arrangement is that the Lambda functions will still respond to the GET and POST requests for accessing the site, because these are triggered by API Gateway events rather than direct connections to the functions. It’s a nice arrangement.

However, with the Lambda function inside the VPC, just like the database, it has no public IP address. This means that by default it can’t make any outgoing connections to public IP addresses. This doesn’t necessarily matter: a website access will trigger an event, the Lambda function fires up, communicates with the database, and hands over a response which is sent back to the user. The API Gateway handles the translation between the HTTP request/response and the Lambda function interface.

The problem comes if the Lambda function needs to access an external resource for some other reason. For example, it might want to send an email out to the user, which requires it to communicate with an SMTP server. Websites don’t often need to send out emails, but on the occasions they do it tends to be to ensure there’s a second communication channel, so it can’t be handled client-side. For example, when a user registers on a site it’s usual for the site to send an email with a link the user must click to complete the registration. If the user forgets their password, it’s common practice for a site to send a password reset link by email. Increasingly, sites like Slack are even using emails as an alternative to passwords.

A Lambda function inside a VPC can’t access an external SMTP server, so it can’t send out emails. One solution is to have the RDS and the Lambda function on the public Internet, but this introduces the attack surface problem mentioned above. The other solution, the one that’s commonly recommended, is to set up a NAT Gateway to allow the Lambda function to make outgoing connections to the SMTP server.

Technically this is fine: the Lambda function and RDS remain protected behind the NAT because they’re not externally addressable, but the Lambda function can still make the outgoing connection it needs to send out emails. But there’s a dark side to this. Amazon is quite happy to set up a NAT to allow all this to happen, but it’ll charge for it by the hour as if it’s a continuously allocated instance. The benefits of running a serverless site go straight out the window, because now you’ve essentially got a continuously running, continuously charged, EC2 server running just to support the NAT. D’oh.

Happily there is a solution. It’s a kludge, but it does the trick. And the trick is to use S3 as a file-based gateway between a Lambda function that’s inside a VPC and a Lambda function that’s outside it. If the Lambda function inside the VPC wants to send an email, it creates a file inside a dedicated S3 bucket. At the same time we run a Lambda function outside the VPC, triggered by a file creation event attached to the bucket. The external Lambda function reads in the newly created file to collect the parameters needed for the email (recipient, subject and body), and then interacts with an SMTP server to send it out. Because this second Lambda function is outside the VPC it has no problem contacting the external SMTP server directly.

So what’s so magical about S3 that means it can be accessed by both Lambda functions, when nothing else can? The answer is that we can create a VPC endpoint for S3, meaning that it can be accessed from inside the VPC, without affecting the ability to access it from outside the VPC. Amazon have made special provisions to support this. You’d have thought they could do something similar with SES, their Simple Email Service, as well and fix the whole issue like that. But it’s not currently possible to set SES up as a VPC endpoint, so in the meantime we’re stuck using S3 as a poor-man’s messaging interface.

The code needed to get all this up-and-running is minimal, and even the configuration of the various things required to fit it all together isn’t particularly onerous. So let’s give it a go.
 

Creating an AWS Lambda S3 email bridge

As we’ve discussed, the vagaries of AWS mean it’s hard to send out emails from a Lambda function that’s trapped inside a VPC alongside its RDS instance. Let’s look at how it’s possible to use S3 as a bridge between two Lambda functions, allowing one function inside the VPC to communicate with a function outside the VPC, so that we can send some emails.

At the heart of it all is an S3 bucket, so we need to set that up first. We’ll create a dedicated bucket for the purpose called ‘yfp-email-bridge’. You can call it whatever you want, but you’ll need to switch out ‘yfp-email-bridge’ in the instructions below for whatever name you choose.

Create the bucket using the Amazon S3 dashboard and create a folder inside it called email. You don’t need to do anything clever with permissions, and in fact we want everything to remain private, otherwise we introduce the potential for an evil snooper to read the emails that we’re sending.
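
If you'd rather script this step than click through the console, a minimal boto3 sketch along the following lines should do the same job. The bucket name and region are just the ones I'm using in this walkthrough, so swap in your own.

import boto3

s3 = boto3.client('s3', region_name='eu-west-1')

# Create a private bucket; the LocationConstraint has to match your region
s3.create_bucket(
    Bucket='yfp-email-bridge',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'}
)

# S3 doesn't really have folders, so an empty 'email/' key stands in for one
s3.put_object(Bucket='yfp-email-bridge', Key='email/')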

Here’s my S3 bucket set up with the email folder viewed through the AWS console.

 
Create an S3 bucket with a folder called 'email' inside
 

Now let’s create our email sending Lambda function. We’re using Python 3.6 for this, but you can rewrite it for another language if that makes you happy.

So, open the AWS Lambda console and create a new function. You can call it whatever you like, but I’ve chosen send_email_uploaded_to_s3_bridge (which in retrospect is a bit of a mouthful, but there’s no way to rename a function after you’ve created it so I’m sticking with that). Set the runtime to Python 3.6. You can either use an existing role, or create a new one with S3 read and write permissions.
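
If you prefer to create the function from code rather than the console, the boto3 equivalent looks roughly like this. The role ARN and zip file name are placeholders for whatever role and deployment package you're actually using.

import boto3

lambda_client = boto3.client('lambda', region_name='eu-west-1')

# The zip should contain lambda_function.py holding the handler code shown later
with open('function.zip', 'rb') as package:
    lambda_client.create_function(
        FunctionName='send_email_uploaded_to_s3_bridge',
        Runtime='python3.6',
        Role='arn:aws:iam::123456789012:role/my-s3-email-role',  # placeholder ARN
        Handler='lambda_function.lambda_handler',
        Code={'ZipFile': package.read()}
    )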

Now add an S3 trigger for when an object is created, associated with the bucket you created, for files with a prefix of email/ and a suffix of .json. That’s because we’re only interested in JSON format files that end up in the ‘email’ folder. You can see how I’ve set this up using the AWS console below.

 
Set up the Lambda function to trigger at the right time.
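
For reference, the same trigger can be set up programmatically too. Something like the following boto3 sketch should match the console settings above; the Lambda ARN is a placeholder, and note that the function also needs a resource policy allowing S3 to invoke it, which the console adds for you behind the scenes.

import boto3

s3 = boto3.client('s3')

# Invoke the Lambda function whenever a .json file is created under email/
s3.put_bucket_notification_configuration(
    Bucket='yfp-email-bridge',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            # Placeholder ARN: use the ARN of your own bridge function
            'LambdaFunctionArn': 'arn:aws:lambda:eu-west-1:123456789012:function:send_email_uploaded_to_s3_bridge',
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'email/'},
                {'Name': 'suffix', 'Value': '.json'}
            ]}}
        }]
    }
)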
 

When the trigger fires, a JSON string is sent to the Lambda function with contents much like the following. Look closely and you’ll see this contains not only details of the bucket where the file was uploaded, but also the filename of the file uploaded.

 
{
    "Records": [
        {
            "eventVersion": "2.0",
            "eventSource": "aws:s3",
            "awsRegion": "eu-west-1",
            "eventTime": "2018-08-20T00:06:19.227Z", 
            "eventName": "ObjectCreated:Put", 
            "userIdentity": {
                "principalId": "A224SDAA064V4C"
            }, 
            "requestParameters": {
                "sourceIPAddress": "XX.XX.XX.XX"
            }, 
            "responseElements": {
                "x-amz-request-id": "D76E8765EFAB3C1", 
                "x-amz-id-2": "KISiidNG9NdKJE9D9Ak9kJD846hfii0="
            }, 
            "s3": {
                "s3SchemaVersion": "1.0", 
                "configurationId": "67fe8911-76ae-4e67-7e41-11f5ea793bc9", 
                "bucket": {
                    "name": "yfp-email-bridge", 
                    "ownerIdentity": {
                        "principalId": "9JWEJ038UEHE99"
                    }, 
                    "arn": "arn:aws:s3:::yfp-email-bridge"
                }, 
                "object": {
                    "key": "email/email.json", 
                    "size": 83, 
                    "eTag": "58934f00e01a75bc305872", 
                    "sequencer": "0054388a73681"
                }
            }
        }
    ]
}

Now we need to add some code to be executed on this trigger. The code is handed the JSON shown above, so it will need to extract the data from it, load in the appropriate file from S3 that the JSON references, extract the contents of the file, send out an email based on the contents, and then finally delete the original JSON file. It sounds complex but is actually pretty trivial in Python. The code I use for this is the following. You can paste this directly in as your function code too, just remember to update the sender variable to the email address you want to send from.

 

import os, smtplib, boto3, json
from email.mime.text import MIMEText

s3_client = boto3.client('s3')

def send_email(data):
	sender = 'test@test.com'
	recipient = data['to']
	msg = MIMEText(data['body'])
	msg['Subject'] = data['subject']
	msg['From'] = sender
	msg['To'] = recipient

	result = json.dumps({'error': False, 'result': ''})
	try:
		with smtplib.SMTP(host=os.environ['SMTP_SERVER'], port=os.environ['SMTP_PORT']) as smtp:
			smtp.set_debuglevel(0)
			smtp.starttls()
			smtp.login(os.environ['SMTP_USERNAME'], os.environ['SMTP_PASSWORD'])
			smtp.sendmail(sender, [recipient, sender], msg.as_string())
	except smtplib.SMTPException as e:
		result = json.dumps({'error': True, 'result': str(e)})
	return result

def lambda_handler(event, context):
	for record in event['Records']:
		bucket = record['s3']['bucket']['name']
		key = record['s3']['object']['key']
		size = record['s3']['object']['size']
		# Ignore files over a certain size
		if size < (12 * 1024):
			obj = s3_client.get_object(Bucket=bucket, Key=key)
			data = json.loads(obj['Body'].read().decode('utf-8'))
			send_email(data)

		# Delete the file
		print("Deleting file {bucket}:{key}".format(bucket=bucket, key=key))
		s3_client.delete_object(Bucket=bucket, Key=key)

This assumes that the following environment variables have been defined:

SMTP_SERVER
SMTP_PORT
SMTP_USERNAME
SMTP_PASSWORD

The purpose of these should be self-explanatory, and you’ll need to set their values to something appropriate to match the SMTP server you plan to use. As long as you know what values to use, filling them in on the page when creating your Lambda function should be straightforward, as you can see in the screenshot below.

 
The lambda function needs some environment variables configured.
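
If you'd rather not use the console for this, the same environment variables can be set with a boto3 call along these lines. The values shown are placeholders; use the details of whichever SMTP server you're sending through.

import boto3

lambda_client = boto3.client('lambda', region_name='eu-west-1')

# Placeholder SMTP details: substitute your own server, port and credentials
lambda_client.update_function_configuration(
    FunctionName='send_email_uploaded_to_s3_bridge',
    Environment={'Variables': {
        'SMTP_SERVER': 'smtp.example.com',
        'SMTP_PORT': '587',
        'SMTP_USERNAME': 'username',
        'SMTP_PASSWORD': 'password'
    }}
)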
 

Now save the Lambda function configuration. We’ve completed half the work, and so this is a great time to test whether things are working correctly.

To perform a test, create a text file on your local machine with the following contents:

{
	"to": "test@test.com",
	"subject": "Auto S3 send test",
	"body": "Hello there\n\nFrom me."
}

Fill out the to field with your own email address. It’s important you use an email address you have access to, otherwise you won’t be able to check whether things are working or not.

Now upload this file to the email folder of your S3 bucket using the S3 Web interface. Immediately on uploading the file, AWS will trigger the Lambda function we created to send our little ditty of an email to the address specified.
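
Equally, you can trigger the same test from a script. Assuming you've saved the file locally as email.json, a boto3 one-liner like this will upload it and kick off the Lambda function.

import boto3

s3 = boto3.client('s3')

# Uploading the test file into email/ fires the email-sending Lambda function
s3.upload_file('email.json', 'yfp-email-bridge', 'email/email.json')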

I’m going to assume everything is working and you received the email. Wahey! You’re half way there. If not, you’ll need to go back and retrace your steps to get things working.

Now, if you don’t have one already, you should set up another Lambda function inside a VPC. This is the original function that you wanted to send email from, for example for your website, so I’ll assume you have this set up already.

We’ll add the following function to our Lambda code. Anywhere in the Lambda code where you need to send out an email, just call this function to do the work.

import boto3, json, uuid

EMAIL_S3_BRIDGE_BUCKET = 'yfp-email-bridge'

def send_email(recipient, subject, message):
	s3_client = boto3.client('s3')
	output = json.dumps({'to': recipient, 'subject': subject, 'body': message})
	filename = "email/{name}.json".format(name=uuid.uuid4())
	s3_client.put_object(Bucket=EMAIL_S3_BRIDGE_BUCKET, Key=filename, Body=output)

This is a pretty simple function that constructs a JSON string comprised of the recipient address, subject line and message body. It uploads this as a file to the S3 bucket and then its job is done. All of the hard work of actually sending the email is done by the other function we set up earlier.

We’re nearly done, but there’s one final step we need to complete, and that’s to set up the VPC S3 endpoint. Select the VPC dashboard from the AWS web console, then in the menu bar on the left click the Endpoints entry. Create a new endpoint, selecting s3 as the service name, along with the VPC and route table that match your internal Lambda function. Create the endpoint, and you’re good to go. The configuration I use for this is shown below.

 
Create a VPC Endpoint for S3
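
For completeness, the endpoint can also be created from code. A boto3 sketch like the following should be equivalent to the console steps above; the VPC and route table IDs are placeholders for your own.

import boto3

ec2 = boto3.client('ec2', region_name='eu-west-1')

# Gateway endpoint that lets Lambda functions inside the VPC reach S3
ec2.create_vpc_endpoint(
    VpcId='vpc-0123456789abcdef0',             # placeholder: your VPC ID
    ServiceName='com.amazonaws.eu-west-1.s3',
    RouteTableIds=['rtb-0123456789abcdef0']    # placeholder: your route table ID
)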

And that’s it. Now, whenever you want to send out an email, just use something like the following:

send_email("david@flypig.co.uk", "Subject line", "Hello\n\nThis is a test email.\n")

And hey presto! It’ll send an email without the need to set up a costly NAT instance.

29 Jun 2018 : Going QML-Live #

In my spare time I've been developing a Qt app called GetiPlay. It's a simple app that allows you to download audio and video from BBC iPlayer, for use on Sailfish OS phones. The traditional approach on Linux devices would be to use get_iplayer in a console, but for all of the progress that's been made on mobile devices in the last decade, console use still sucks. Given I spend so much time listening to or watching BBC content, slapping a simple UI over the command-line get_iplayer was an obvious thing to do.

The app has been developing nicely, using Qt Creator for the C++ and with the UI written in QML. Historically I've not been a fan of QML, but as I grow more familiar with it, it's been growing on me. For all of the things that I find weird about it, it really does give great performance and helps build a consistent UI, as well as promoting loose coupling between the UI and the underlying functional logic.

A big downside to QML is that there's no preview, so the development process follows a consistent cycle: adjust code, build code, deploy code, test, repeat. The build and deploy steps are loooong. This impacts things in three serious ways: it makes development slow, it makes me sleepy, and it incentivises against making minor tweaks or experimentation.

Is It Worth the Time?
 

Nevertheless, there's always a trade-off between configuring and learning new technologies, and just getting things done using those you're already using. The ever-relevant XKCD has more than one pertinent comic covering this topic.

 
Automation

The UI for GetiPlay is straightforward, so I was quite content to use this lengthy, but (crucially) working approach until yesterday. What prompted me to change was a feature request that needed some more subtle UI work, with animated transitions between elements that I knew would take a couple of hundred cycles round that development loop to get right. Doing the maths using Randall Munroe's automation matrix, I needed to find a more efficient approach.

So this morning I started out using QML Live. This is a pretty simple tool with an unnecessarily bulky UI that nevertheless does a great job of making the QML design approach more efficient. You build and run the app as usual, then any QML changes are directly copied over to the device (or emulator) and appear in the app immediately. Previously a build cycle took between 40 and 100 seconds. Now it's too quick to notice: less than a second.

Qt Creator IDE and QML-Live

Using a quick back-of-the-envelope calculation, I'll perform a UI tweak that would previously have required a rebuild around 20 times a day, but probably only every other day, so let's say 10 times a day for the next six months. At roughly a minute saved per rebuild, that's (10 * 365 * 0.5) / (60 * 24) = 1.27 days I can save. I spent about half a day configuring everything properly, so that leaves a saving of 0.77 days, or 18 hours. Not bad!

QML-Live certainly isn't perfect, but it's simple, neat and has made me far more likely to try out interesting and experimental UI designs. Time configuring it is time well spent, even if that extra 18 hours is just about the same amount of time I wasted dithering over the last two days!

12 Jun 2018 : GetiPlay now actually plays, too #
For some time now I've been meaning to add a proper media player to GetiPlay. Why, you may well ask, bother to do this when Sailfish already has a perfectly good media player built in? Well, there are two reasons. First, for TV and radio programmes, one of the most important controls you can have is 'jump back a few seconds'. I need this when I'm watching something and get interrupted, or miss an important bit of the narrative, or whatever. It's such a useful button, it's worth writing a completely new media player for. Second, it's just far more seamless to have it all in one application.

So I finally got to adding it in. Here's the video player screen.
 

The Qt framework really does make it easy to add media like this. It still took a good few days to code up of course, but it'd be a lot quicker for someone who knew what they were doing.

I'm also quite proud of the audio player, with the same, super-useful '10 seconds back' button. It also stays playing no matter where you move to in the app. Here it is, showing the controls at the bottom of the screen.
 


If you'd like to get these new features in your copy of GetiPlay, just download the latest version from OpenRepos, grab yourself the source from GitHub, or check out the GetiPlay page.
6 Jun 2018 : Huge GetiPlay release 0.3-1 #
I'm really pleased to release version 0.3-1 of GetiPlay, the unofficial interface for accessing BBC iPlayer stuff on Sailfish OS. This latest version is a huge update compared to previous releases, with a completely new tab-based UI and a lovely download queue so you can download multiple programmes without interruption.

Immediate info about every one of the thousands and thousands of TV and radio programmes is also now just a tap away.

Install yourself a copy from OpenRepos, grab the MIT-licensed source from GitHub or visit the GetiPlay page on this site.
 