Baresoil Face Detection Benchmark (AWS)

This is a repeatable benchmark with open code and data.

Benchmark date: Wed Aug 09 2017 20:30:52 GMT-0700 (PDT)


A recent positive development in machine learning has been the availability of fast and accurate methods for detecting and recognizing human faces in images. Given a photograph, we would like to know where the human faces (if any) are located, as well as the approximate shape of each face. This is an important first step for face recognition, biometrics, and instant messaging apps that overlay animal features onto human faces.

Baresoil can be used to quickly turn standard command-line programs into scalable web services. For this benchmark, Adam Geitgey's Python tutorial "Modern Face Recognition with Deep Learning" is adapted into a Baresoil server-side function, reusing approximately 100 lines of Python from the tutorial.

The images on the right show the location of facial features extracted from each face found in the left image. Each yellow box represents a face, and facial features are outlined in red.

The adapted code is deployed to a Baresoil cluster consisting of 5 c4.8xlarge AWS EC2 instances. As of Wed Aug 09 2017 20:30:52 GMT-0700 (PDT), the overall throughput of this cluster was measured at 633 gigabytes of image data analyzed for faces per hour, at a total on-demand cost of $8.58 USD per hour. The code and data repository contains the Baresoil adaptation of Adam Geitgey's tutorial.


Images processed per hour: 120,100
Image data processed per hour: 633 GB
Cluster cost per hour (on-demand)*: $8.58 USD
Cluster cost per hour (reserved): $5.66 USD

* Using 5 on-demand EC2 c4.8xlarge instances in us-east-2 priced at $1.591 per instance-hour, RDS on-demand costs of $0.095 per hour, ELB costs of $0.025 per hour, and $0.008 per gigabyte transferred. All test images are approximately 3840x2160 pixels (i.e., the size of a 4K video frame).
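The fixed hourly charges in the footnote can be checked with a few lines of Python. This sketch assumes the reserved total uses the $1.008 reserved instance price listed under Baresoil Cluster plus the same RDS and ELB charges. The source does not state the transferred volume, so the script only reports how much of each published total the fixed charges leave unexplained (about $0.50 per hour in both cases, presumably the per-gigabyte transfer charge).

```python
# Hourly charges quoted in the report (USD).
INSTANCE_COUNT = 5
ON_DEMAND_PER_INSTANCE = 1.591
RESERVED_PER_INSTANCE = 1.008  # from the Baresoil Cluster table
RDS_PER_HOUR = 0.095
ELB_PER_HOUR = 0.025

def fixed_hourly_cost(per_instance: float) -> float:
    """Instance, RDS and ELB charges, excluding the per-gigabyte transfer charge."""
    return INSTANCE_COUNT * per_instance + RDS_PER_HOUR + ELB_PER_HOUR

on_demand_fixed = fixed_hourly_cost(ON_DEMAND_PER_INSTANCE)
reserved_fixed = fixed_hourly_cost(RESERVED_PER_INSTANCE)

# Part of each published total not covered by the fixed charges
# (assumed, not confirmed, to be data transfer).
on_demand_remainder = 8.58 - on_demand_fixed
reserved_remainder = 5.66 - reserved_fixed

print(f"on-demand: fixed ${on_demand_fixed:.3f}, remainder ${on_demand_remainder:.3f}")
print(f"reserved:  fixed ${reserved_fixed:.3f}, remainder ${reserved_remainder:.3f}")
```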

Benchmark Setup

Baresoil cluster
A Baresoil cluster is first created to host the face recognition service using the instructions for creating a Baresoil cluster on AWS. This can take 25 minutes or longer in some AWS regions.
Benchmarking client cluster
A standalone cluster of load-generating EC2 instances is created using the benchoid benchmarking tool, available on npm.
Load generating agents
Each of the 10 instances in the client cluster starts 20 parallel processes, each of which performs the following actions in a loop:
  1. Connect to the face detection service over an encrypted, persistent WebSocket using the Baresoil cluster's DNS endpoint.
  2. Randomly choose one of the test images and send it to the server to be analyzed for faces.
  3. Wait for the server response before repeating the previous step.
After a traffic ramp-up period, this results in 200 simultaneous WebSocket connections to the cluster continuously sending images to be analyzed. The source code for the load-generating agent is in the agent directory of the code and data repository.

Baresoil Cluster

Instance count: 5
Instance type: c4.8xlarge
AWS region: us-east-2
Instance cost per hour (on-demand): $1.591 USD
Instance cost per hour (reserved): $1.008 USD

Load Generating Instances

Instance count: 10
Instance type: t2.xlarge
AWS region: us-east-2

Raw Statistics

Experiment time (seconds): 357
Requests made: 12,110
Successful responses received: 11,910
Error responses received: 0
Image bytes processed by the cluster: 67,406,267,535
Total response bytes returned from cluster: 747,173,248
Total CPU-seconds used by all requests: 9,199.121
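The summary figures can be reproduced from these raw counts by extrapolating the 357-second run to one hour. In this sketch, images per hour counts only successful responses, and the 633 figure is matched only when a gigabyte is taken as 2^30 bytes (GiB); in decimal gigabytes the same rate is roughly 680 GB per hour. The 200 requests without a recorded response equal the number of agents, which is consistent with each agent having one request in flight when the run stopped, although the source does not say so.

```python
EXPERIMENT_SECONDS = 357
REQUESTS_MADE = 12110
SUCCESSFUL_RESPONSES = 11910
IMAGE_BYTES = 67406267535
CPU_SECONDS = 9199.121

scale = 3600 / EXPERIMENT_SECONDS  # extrapolate the run to one hour

images_per_hour = SUCCESSFUL_RESPONSES * scale
bytes_per_hour = IMAGE_BYTES * scale
gib_per_hour = bytes_per_hour / 2**30  # binary gigabytes
gb_per_hour = bytes_per_hour / 10**9   # decimal gigabytes

mean_image_mb = IMAGE_BYTES / SUCCESSFUL_RESPONSES / 10**6
cpu_seconds_per_image = CPU_SECONDS / SUCCESSFUL_RESPONSES
unanswered = REQUESTS_MADE - SUCCESSFUL_RESPONSES

print(f"images/hour:       {int(images_per_hour)}")
print(f"GiB/hour:          {gib_per_hour:.1f}")
print(f"GB/hour (decimal): {gb_per_hour:.1f}")
print(f"mean image size:   {mean_image_mb:.2f} MB")
print(f"CPU-s per image:   {cpu_seconds_per_image:.3f}")
print(f"unanswered:        {unanswered}")
```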

Request Statistics

Request Rate

Total number of requests that were initiated to the server in each time window, aggregated over all clients.

Image Bytes Processed

Total amount of image data processed by the cluster at each time window. Image data is only counted when it is successfully processed by the cluster and returned, at the time of return.
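The Request Rate and Image Bytes Processed series bucket events differently: a request counts toward the window in which it started, while its image bytes count toward the window in which a successful response arrived. A minimal sketch of that bookkeeping, assuming hypothetical per-request `Sample` records and a 10-second window (the report does not state the window width):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

WINDOW_SECONDS = 10  # hypothetical bucket width

@dataclass
class Sample:
    start: float          # client time the request was sent (s since run start)
    end: Optional[float]  # client time of a successful response, else None
    image_bytes: int

def window(t: float) -> int:
    return int(t // WINDOW_SECONDS)

def request_rate(samples: Iterable[Sample]) -> Counter:
    """Requests initiated per window, keyed by start time."""
    return Counter(window(s.start) for s in samples)

def bytes_processed(samples: Iterable[Sample]) -> Counter:
    """Image bytes per window, credited only on success and at return time."""
    totals: Counter = Counter()
    for s in samples:
        if s.end is not None:
            totals[window(s.end)] += s.image_bytes
    return totals

samples = [Sample(1.0, 12.0, 100), Sample(5.0, 8.0, 200), Sample(9.5, None, 300)]
print(request_rate(samples), bytes_processed(samples))
```

Here the first request is counted in window 0 for the request rate but its bytes land in window 1, and the unanswered request contributes no bytes at all.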

Server Time per Image

Wall-clock time the server spent analyzing each image for faces, as reported by the server.

Round-trip Latency

Time from starting the RPC request over a WebSocket to receiving a successful response. Requests are grouped by the time they were started at the client, not when the response was successfully received.
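Grouping by start time means a slow request is attributed to the moment it was issued, not the moment it finished. This sketch applies that rule to hypothetical `(start, end)` client timestamps; the 10-second window and the use of the median as the per-window summary are assumptions, since the report does not say which statistic is plotted.

```python
from collections import defaultdict
from statistics import median

WINDOW_SECONDS = 10  # hypothetical bucket width

def latency_by_start_window(samples):
    """Median round-trip latency per window, keyed by request start time.

    samples: iterable of (start, end) client timestamps in seconds; end is
    None when no successful response arrived, and such requests are skipped.
    """
    groups = defaultdict(list)
    for start, end in samples:
        if end is not None:
            groups[int(start // WINDOW_SECONDS)].append(end - start)
    return {w: median(lat) for w, lat in sorted(groups.items())}

samples = [(1.0, 3.0), (2.0, 12.0), (11.0, 12.5), (4.0, None)]
print(latency_by_start_window(samples))
```

The request started at 2.0 s finishes in the second window but still counts toward the first.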

Client Tier Statistics

Client CPU Usage

Instances in the client tier should not be overloaded in order to ensure that server response measurements are not biased. The following time series plot shows average CPU usage for each server in the client tier.
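The report does not say how client CPU usage is sampled. One common approach, sketched here under that assumption, takes two snapshots of cumulative per-core time counters (shaped like the `times` objects returned by Node's `os.cpus()`: `user`, `nice`, `sys`, `idle`, `irq`) and reports the non-idle share of the elapsed time, averaged over all cores of a host. A value sustained near 100% would suggest the client, not the server, is limiting throughput.

```python
FIELDS = ("user", "nice", "sys", "idle", "irq")

def host_cpu_usage(before_cores, after_cores) -> float:
    """Busy percentage across all cores between two counter snapshots."""
    busy = idle = 0
    for before, after in zip(before_cores, after_cores):
        for field in FIELDS:
            delta = after[field] - before[field]
            if field == "idle":
                idle += delta
            else:
                busy += delta
    total = busy + idle
    return 0.0 if total == 0 else 100.0 * busy / total

# Illustrative counter values (milliseconds), not measurements from this run.
core_a_before = {"user": 1000, "nice": 0, "sys": 500, "idle": 8500, "irq": 0}
core_a_after = {"user": 1600, "nice": 0, "sys": 700, "idle": 9700, "irq": 0}
core_b_before = {"user": 0, "nice": 0, "sys": 0, "idle": 0, "irq": 0}
core_b_after = {"user": 200, "nice": 0, "sys": 0, "idle": 1800, "irq": 0}

print(host_cpu_usage([core_a_before, core_b_before], [core_a_after, core_b_after]))
```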

Client Memory Usage

The following time series plot shows the percentage of system memory used per host, as reported by Node's built-in os module.
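Assuming the plotted value is derived from the total and free byte counts that Node's `os.totalmem()` and `os.freemem()` return, the percentage reduces to the arithmetic below. The 16 GiB total in the example matches a t2.xlarge, but the free-memory figure is illustrative, not a measurement from this run.

```python
def memory_used_percent(total_bytes: int, free_bytes: int) -> float:
    """Share of system memory in use, from total and free byte counts."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - free_bytes) / total_bytes

print(memory_used_percent(16 * 2**30, 12 * 2**30))
```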

Concurrent Agents

Each agent is an independent process, running on one of the client-tier instances, that makes a continuous stream of requests to the server. The following time series plot shows the total number of active agents over time across all client-tier hosts.
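The active-agent series can be derived from each agent's start and stop times. This sketch assumes a hypothetical ramp-up in which each of the 10 client hosts launches its 20 agents one second after the previous host, with every agent stopping at the end of the 357-second run; the report does not describe the actual ramp schedule.

```python
HOSTS = 10
AGENTS_PER_HOST = 20
RUN_SECONDS = 357.0

def active_agents(intervals, times):
    """Count agents whose [start, stop) interval covers each sample time."""
    return [sum(1 for start, stop in intervals if start <= t < stop) for t in times]

# Hypothetical schedule: host h starts all of its agents at t = h seconds.
intervals = [(float(h), RUN_SECONDS) for h in range(HOSTS) for _ in range(AGENTS_PER_HOST)]

print(active_agents(intervals, [0.5, 4.5, 10.0, RUN_SECONDS]))
```

Once every host has started, the count plateaus at 200, the steady-state concurrency described under Load generating agents.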