Baresoil Image Resizing Benchmark (AWS)
This is a repeatable benchmark; its code and data are open.
Benchmark date: Mon Sep 11 2017 20:41:27 GMT-0700 (PDT)
Demo | img.baresoil.cloud |
Code and data | iceroad/baresoil-benchmark-image-resizer |
Homepage | www.baresoil.org |
Other benchmarks | Face Detection |
Baresoil can be used to quickly turn standard command-line programs into scalable web services. Consider the case of building a web API that accepts photo uploads and returns cleaned-up thumbnails of the image, as well as Exif camera metadata like the make and model.
In this benchmark, a Baresoil program uses the Python library Pillow, a common open-source package for processing images, to perform these tasks.
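The two tasks named above, producing a thumbnail and reading the camera make and model, can be sketched with Pillow roughly as follows. This is a minimal illustration and not the benchmark's actual program: the function names, the 256-pixel bound, and the JPEG quality setting are assumptions made for the example.

```python
import io

# Standard Exif tag IDs for camera make and model.
EXIF_MAKE = 0x010F
EXIF_MODEL = 0x0110


def camera_info(exif):
    """Pick make and model out of any mapping of Exif tag ID -> value."""
    def clean(value):
        # Exif strings are often padded with NULs or whitespace.
        return None if value is None else str(value).strip("\x00 \t\r\n")

    return {"make": clean(exif.get(EXIF_MAKE)), "model": clean(exif.get(EXIF_MODEL))}


def resize_upload(jpeg_bytes, max_side=256):
    """Return (thumbnail JPEG bytes, camera info) for one uploaded image.

    max_side is a hypothetical bound chosen for this sketch.
    """
    # Imported here so camera_info() stays usable without Pillow installed.
    from PIL import Image, ImageOps

    with Image.open(io.BytesIO(jpeg_bytes)) as img:
        info = camera_info(img.getexif())
        # Honour the Exif orientation, then shrink while keeping aspect ratio.
        upright = ImageOps.exif_transpose(img)
        upright.thumbnail((max_side, max_side))
        out = io.BytesIO()
        upright.convert("RGB").save(out, format="JPEG", quality=85)
    return out.getvalue(), info
```

Keeping the Exif lookup in a small function of its own means it can be checked with a plain dictionary, without decoding an image.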
As with all Baresoil programs, each socket connection is allocated its own Linux container holding a clean copy of the Baresoil server-side project. This allows server-side programs to be short, often resembling simple shell scripts, yet scalable across a cluster of servers. The Baresoil runtime ensures that each connection gets a fresh container.
Images processed per hour | 120449 |
Image data processed per hour | 507 GiB |
Cluster cost per hour (on-demand)* | $8.56 USD |
Cluster cost per hour (reserved) | $5.64 USD |
Cluster cost per hour (spot) | $2.41 USD |
* Using 20 on-demand EC2 c4.2xlarge instances in us-east-1, priced at $0.398 per hour (on-demand), $0.252 per hour (reserved), $0.0908 per hour (spot), RDS on-demand costs of $0.095 per hour, ELB costs of $0.025 per hour and $0.008 per gigabyte transferred.
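As a rough check on the footnote, the fixed hourly components (instances, RDS, ELB) can be summed for each pricing model and compared with the published totals. The sketch below uses only the prices quoted above; the function and variable names are illustrative. The source does not state how the per-gigabyte transfer charge was folded into the totals, so the code reports it as a remainder and does not try to reproduce it.

```python
# Prices copied from the footnote above (USD per hour unless noted).
INSTANCE_COUNT = 20
INSTANCE_PRICE = {"on-demand": 0.398, "reserved": 0.252, "spot": 0.0908}
RDS_PER_HOUR = 0.095
ELB_PER_HOUR = 0.025
TRANSFER_PRICE_PER_GB = 0.008

# Totals as published in the summary table.
PUBLISHED_TOTAL = {"on-demand": 8.56, "reserved": 5.64, "spot": 2.41}


def fixed_cost_per_hour(pricing):
    """Instances plus RDS plus ELB, excluding any data transfer charge."""
    return INSTANCE_COUNT * INSTANCE_PRICE[pricing] + RDS_PER_HOUR + ELB_PER_HOUR


for pricing, published in PUBLISHED_TOTAL.items():
    fixed = fixed_cost_per_hour(pricing)
    remainder = published - fixed
    implied_gb = remainder / TRANSFER_PRICE_PER_GB
    print(f"{pricing:>9}: fixed ${fixed:.3f}, remainder ${remainder:.3f} "
          f"(about {implied_gb:.0f} GB of transfer)")
```

The fixed part comes to $8.08, $5.16, and $1.936 per hour respectively, which leaves a little under $0.50 per hour in each case, equivalent to about 60 GB at the quoted transfer price.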
This benchmark load tests a user-uploaded image processing API hosted on a Baresoil cluster. The image processing performed is basic cropping, image adjustments, and metadata extraction from a JPEG image, using the Unix command-line tool ImageMagick.
A Baresoil cluster of the dimensions below is first created on Amazon AWS using the standard Baresoil cluster setup tool. This includes assigning the load balancer to a top-level DNS domain name secured by a TLS certificate.
Then, a separate client tier of 10 instances is created in the same AWS region as the server cluster, to generate traffic for it. Each instance in the client tier spawns 64 independent processes that each perform the following steps in a loop:
All requests from the client tier are sent as multipart HTTPS requests via curl to the top-level domain name of the server cluster. As a result, the benchmarks here are for SSL/TLS-secured traffic.
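A single multipart HTTPS upload through curl might look like the sketch below. It is only an assumption about the shape of such a request: the form field name `image`, the URL, and the helper names are hypothetical, and the benchmark's real agent code lives in the linked repository. The curl options used (`--form`, `--output`, `--write-out`) are standard.

```python
import subprocess


def build_upload_command(url, image_path, field="image"):
    """Build a curl command line for one multipart/form-data upload.

    The field name is a hypothetical placeholder for this sketch.
    """
    return [
        "curl", "--silent", "--show-error",
        "--form", f"{field}=@{image_path}",          # multipart file upload
        "--output", "/dev/null",                     # discard the response body
        "--write-out", "%{http_code} %{time_total}",  # status and total seconds
        url,
    ]


def upload_once(url, image_path):
    """Run one upload; return (HTTP status, seconds) or None on bad output."""
    result = subprocess.run(
        build_upload_command(url, image_path),
        capture_output=True, text=True, check=False,
    )
    parts = result.stdout.split()
    if len(parts) != 2:
        return None
    return int(parts[0]), float(parts[1])
```

For example, `upload_once("https://example.invalid/upload", "photo.jpg")` would time one request against a placeholder endpoint; an agent would call it repeatedly and record each result.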
Instance Count | 20 |
Instance Type | c4.2xlarge |
AWS Region | us-east-1 |
Instance cost per hour (on-demand) | $0.398 USD |
Instance cost per hour (reserved) | $0.252 USD |
Instance cost per hour (spot) | $0.0908 USD |
Instance Count | 10 |
Instance Type | c4.xlarge |
AWS Region | us-east-1 |
Experiment time (seconds) | 419 |
Requests made | 15426 |
Successful responses received | 14019 |
Error responses received | 806 |
Image bytes processed by the cluster | 63408498114 |
Total response bytes returned from cluster | 1331274987 |
Total CPU-seconds used by all requests | 19002.747 |
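The headline hourly figures follow from these raw counts by scaling the 419-second experiment up to one hour. The sketch below reproduces that arithmetic using only numbers from the table. Two points are interpretation, not stated in the source: the image data figure of 507 is matched when a gigabyte is taken as 2^30 bytes, and the CPU time per image divides the CPU total for all requests by successful responses only, so it slightly overstates the cost of a single successful resize.

```python
# Figures copied from the results table above.
EXPERIMENT_SECONDS = 419
REQUESTS_MADE = 15426
SUCCESSES = 14019
ERRORS = 806
IMAGE_BYTES = 63408498114
CPU_SECONDS = 19002.747

SECONDS_PER_HOUR = 3600
GIB = 2 ** 30  # binary gigabyte, an assumption that matches the published 507

images_per_hour = SUCCESSES / EXPERIMENT_SECONDS * SECONDS_PER_HOUR
gib_per_hour = IMAGE_BYTES / EXPERIMENT_SECONDS * SECONDS_PER_HOUR / GIB
error_rate = ERRORS / REQUESTS_MADE
# Requests that had neither succeeded nor failed when the experiment ended.
unresolved = REQUESTS_MADE - SUCCESSES - ERRORS
cpu_seconds_per_image = CPU_SECONDS / SUCCESSES

print(f"images per hour:       {int(images_per_hour)}")
print(f"image data per hour:   {int(gib_per_hour)} GiB")
print(f"error rate:            {error_rate:.2%}")
print(f"unresolved requests:   {unresolved}")
print(f"CPU-seconds per image: {cpu_seconds_per_image:.3f}")
```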
Total number of requests that were initiated to the server in each time window, aggregated over all clients.
Total amount of image data processed by the cluster in each time window. Image data is counted only once it has been successfully processed and returned, and is attributed to the window in which the response was returned.
Wall time spent by server on resizing each image, as reported by the server.
Time from starting the HTTP POST request to receiving a successful response. Requests are grouped by the time they were started at the client, not when the response was successfully received.
To keep server response measurements unbiased, instances in the client tier must not be overloaded. The following time series plot shows average CPU usage for each instance in the client tier.
The following time series plot shows the percentage of system memory used per host, as reported by Node's built-in os module.
Each agent is an independent process running on one of the client tier instances that makes a continuous stream of requests to the server. The following time series plot shows the total number of active agents over time for all client tier hosts.