Baresoil Image Resizing Benchmark (AWS)

This is a repeatable benchmark with open code and data.

Benchmark date: Mon Sep 11 2017 20:41:27 GMT-0700 (PDT)

Introduction

Baresoil can be used to quickly turn standard command-line programs into scalable web services. Consider the case of building a web API that accepts photo uploads and returns cleaned-up thumbnails of the image, as well as Exif camera metadata like the make and model.

In this benchmark, a Baresoil program uses the Python library Pillow, a common open-source package for processing images, to perform these tasks.

As in all Baresoil programs, each socket connection is allocated its own Linux container, which holds a clean copy of the Baresoil server-side project. This allows server-side programs to be short, often resembling simple shell scripts, while remaining scalable across a cluster of servers. The Baresoil runtime ensures that each connection gets a fresh container.

Summary

Images processed per hour: 120449
Image data processed per hour: 507 GB
Cluster cost per hour (on-demand)*: $8.56 USD
Cluster cost per hour (reserved): $5.64 USD
Cluster cost per hour (spot): $2.41 USD

* Using 20 on-demand EC2 c4.2xlarge instances in us-east-1, priced at $0.398 per hour (on-demand), $0.252 per hour (reserved), $0.0908 per hour (spot), RDS on-demand costs of $0.095 per hour, ELB costs of $0.025 per hour and $0.008 per gigabyte transferred.
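The cost figures can be approximated from the prices in the footnote. The sketch below is an illustration, not the benchmark's own accounting: how the per-gigabyte charge was applied is not stated, so it assumes the charge covers the request and response bytes moved during the run (taken from Raw Statistics) and counts a gigabyte as 2^30 bytes. The function name `cluster_cost` is hypothetical.

```python
# Prices from the footnote above (USD per hour unless noted).
INSTANCES = 20
INSTANCE_PRICE = {"on-demand": 0.398, "reserved": 0.252, "spot": 0.0908}
RDS_PER_HOUR = 0.095
ELB_PER_HOUR = 0.025
ELB_PER_GB = 0.008  # USD per gigabyte transferred

# Assumption: data transfer is charged on the bytes moved during the run
# (request plus response bytes from Raw Statistics), counted in units of 2**30.
BYTES_IN = 63408498114
BYTES_OUT = 1331274987


def cluster_cost(pricing: str) -> float:
    """Return the estimated cluster cost in USD for one pricing model."""
    transfer_gb = (BYTES_IN + BYTES_OUT) / 2**30
    return (
        INSTANCES * INSTANCE_PRICE[pricing]
        + RDS_PER_HOUR
        + ELB_PER_HOUR
        + ELB_PER_GB * transfer_gb
    )


for pricing in INSTANCE_PRICE:
    print(f"{pricing}: ${cluster_cost(pricing):.4f}")
```

Under these assumptions each total lands within one cent of the corresponding figure in the Summary.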

Benchmark Setup


This benchmark load-tests an API, hosted on a Baresoil cluster, that processes user-uploaded images. The processing consists of basic cropping, image adjustments, and metadata extraction from a JPEG image, using the Unix command-line tool ImageMagick.

A Baresoil cluster of the dimensions below is first created on Amazon AWS using the standard Baresoil cluster setup tool. This includes assigning the load balancer to a top-level DNS domain name secured by a TLS certificate.

Then, a separate client tier of 10 instances is created in the same AWS region as the server to generate traffic for it. Each instance in the client tier spawns 64 independent processes that each perform the following steps in a loop:

  1. Make an HTTP POST request to the server's DNS name with one of four sample JPEG images, each between 9 and 11 megabytes.
  2. Wait for the server to return the resized versions of the image.
  3. Wait a short time, then loop back to step 1.

All requests from the client tier are sent as multipart HTTPS requests, using curl, to the top-level domain name of the server cluster. As a result, the benchmarks here are for SSL/TLS-secured traffic.
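The agent loop described above can be sketched as follows. This is a minimal illustration rather than the benchmark's actual client code: the form field name `image`, the function names, the fixed pause, and the round-robin choice of sample image are all assumptions, and the response body is discarded instead of being saved.

```python
import itertools
import subprocess
import time
from typing import Callable, Dict, List, Sequence


def curl_command(url: str, image_path: str) -> List[str]:
    """Build a curl invocation that uploads one JPEG as multipart form data."""
    return [
        "curl", "--silent", "--show-error",
        "--output", "/dev/null",           # discard the returned images
        "--write-out", "%{http_code}",     # print only the HTTP status code
        "--form", f"image=@{image_path}",  # "image" is a hypothetical field name
        url,
    ]


def upload_with_curl(url: str, image_path: str) -> bool:
    """Run curl and report whether the server answered with HTTP 200."""
    result = subprocess.run(
        curl_command(url, image_path), capture_output=True, text=True
    )
    return result.returncode == 0 and result.stdout.strip() == "200"


def run_agent(
    upload: Callable[[str, str], bool],
    url: str,
    image_paths: Sequence[str],
    iterations: int,
    pause_seconds: float = 0.1,
) -> Dict[str, int]:
    """Upload images in a loop and count successful and failed responses."""
    counts = {"ok": 0, "error": 0}
    # Assumption: sample images are used round-robin.
    for image_path in itertools.islice(itertools.cycle(image_paths), iterations):
        ok = upload(url, image_path)       # steps 1 and 2: POST, then wait
        counts["ok" if ok else "error"] += 1
        time.sleep(pause_seconds)          # step 3: short pause before looping
    return counts
```

Passing the upload function as a parameter keeps the loop testable without network access; a real agent would call `run_agent(upload_with_curl, ...)` with the cluster's HTTPS URL.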

Baresoil Cluster

Instance Count: 20
Instance Type: c4.2xlarge
AWS Region: us-east-1
Instance cost per hour (on-demand): $0.398 USD
Instance cost per hour (reserved): $0.252 USD
Instance cost per hour (spot): $0.0908 USD

Load Generating Instances

Instance Count: 10
Instance Type: c4.xlarge
AWS Region: us-east-1

Raw Statistics


Experiment time (seconds): 419
Requests made: 15426
Successful responses received: 14019
Error responses received: 806
Image bytes processed by the cluster: 63408498114
Total response bytes returned from cluster: 1331274987
Total CPU-seconds used by all requests: 19002.747
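The hourly throughput figures in the Summary follow from these raw numbers by scaling the experiment time up to one hour. The sketch below shows that arithmetic; the variable names are illustrative, and treating a gigabyte as 2^30 bytes is an assumption.

```python
# Raw statistics from the benchmark run.
SECONDS = 419
REQUESTS = 15426
SUCCESSES = 14019
ERRORS = 806
IMAGE_BYTES = 63408498114
CPU_SECONDS = 19002.747

# Scale the run up to one hour (3600 seconds).
images_per_hour = SUCCESSES * 3600 // SECONDS
# Assumption: one gigabyte is counted as 2**30 bytes.
gb_per_hour = IMAGE_BYTES * 3600 / SECONDS / 2**30

# Derived ratios that are not listed in the Summary.
error_rate = ERRORS / REQUESTS
cpu_seconds_per_image = CPU_SECONDS / SUCCESSES

print(f"images per hour:       {images_per_hour}")
print(f"image GB per hour:     {gb_per_hour:.1f}")
print(f"error rate:            {error_rate:.1%}")
print(f"CPU-seconds per image: {cpu_seconds_per_image:.2f}")
```

This gives 120449 images per hour and about 507 GB per hour, matching the Summary.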

Request Statistics


Request Rate

Total number of requests that were initiated to the server in each time window, aggregated over all clients.

Image Bytes Processed

Total amount of image data processed by the cluster in each time window. Image data is counted only once it has been successfully processed by the cluster and returned, and is attributed to the time of return.

Server Time per Image

Wall-clock time spent by the server on resizing each image, as reported by the server.

Round-trip Latency

Time from starting the HTTP POST request to receiving a successful response. Requests are grouped by the time they were started at the client, not when the response was successfully received.
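Grouping by start time can be sketched as below. This is an illustration only: the window length, the use of the median as the per-window summary, and the function name `latency_by_start_window` are assumptions, not details taken from the benchmark code.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def latency_by_start_window(
    requests: Iterable[Tuple[float, float]],
    window_seconds: float = 10.0,
) -> Dict[float, float]:
    """Summarize round-trip latency per time window.

    `requests` holds (start_time, end_time) pairs, in seconds, for
    successful requests. Each request is assigned to the window in which
    it started, regardless of when its response arrived.
    """
    buckets = defaultdict(list)
    for start, end in requests:
        window = int(start // window_seconds)
        buckets[window].append(end - start)
    # Key each window by its starting offset in seconds.
    return {
        window * window_seconds: median(latencies)
        for window, latencies in sorted(buckets.items())
    }
```

A request that starts at second 9 and finishes at second 15 is counted in the first 10-second window, even though its response arrives in the second one.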

Client Tier Statistics


Client CPU Usage

Instances in the client tier must not be overloaded, so that server response measurements are not biased. The following time series plot shows average CPU usage for each instance in the client tier.

Client Memory Usage

The following time series plot shows the percentage of system memory used per host, as reported by Node's built-in os module.

Concurrent Agents

Each agent is an independent process running on one of the client tier instances that makes a continuous stream of requests to the server. The following time series plot shows the total number of active agents over time for all client tier hosts.