Computer Vision WebSocket Support

April 22nd, 2020


Bruce Armstrong

Principal Engineer


The MobiledgeX Computer Vision Android library and Face Detection Server are a couple of pieces of software that help demonstrate some of the advantages of mobile edge computing. We use them in our demo app and in our workshop projects. This is all open source code and is available in our GitHub repository.

A Little History

The initial implementation of the Android library had only one method of communicating with the server running on the Edge cloudlet, and that was via an HTTP REST interface. Last year we added support for a second method, which was a persistent TCP socket connection. This offered speed improvements, but also added complexity, and it was difficult for client software to implement the protocol required to send images to the server and receive results.
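To illustrate the kind of bookkeeping that protocol forced on clients, here is a minimal sketch of length-prefixed framing in Python. This is a generic illustration of the technique, not our actual wire format:

```python
import struct

def frame_message(payload):
    # Prefix the payload with its length as a 4-byte big-endian integer
    # so the receiver knows exactly how many bytes belong to this message.
    return struct.pack(">I", len(payload)) + payload

def read_message(buffer):
    # Parse one framed message off the front of a receive buffer,
    # returning (payload, remaining_bytes).
    (length,) = struct.unpack(">I", buffer[:4])
    return buffer[4:4 + length], buffer[4 + length:]

framed = frame_message(b"image bytes here")
payload, rest = read_message(framed)
```

WebSocket handles this framing (plus masking, fragmentation, and close handshakes) inside the library, which is exactly the complexity we no longer have to maintain ourselves.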

What's a WebSocket?

From Wikipedia: WebSocket is a computer communications protocol, providing full-duplex communication channels over a single TCP connection.

This sounds quite similar to what we invented in our persistent TCP socket connection implementation, but common sense tells us WebSockets probably have some advantages over our own hastily-invented and unoptimized protocol. Having been a standard for many years, and having libraries available for most programming languages in use today, we can easily see why having WebSocket support for our computer vision server and Android library is an attractive proposition.

WebSocket Libraries

Now that we’ve decided to add WebSocket support to our server and client, we need to find some libraries.


Client

Our most fully featured client is written in Java for Android devices. If you search for “Android WebSocket libraries”, you’ll find a few to choose from. We went with OkHttp’s version because we were already using their main library for HTTP communication in other parts of the app. Their WebSocket implementation is pretty simple: you create an okhttp3.WebSocket instance to send data on, and an okhttp3.WebSocketListener instance to receive results back from the server. You get the image from the camera, convert it to a string of bytes, and call send() on the WebSocket instance. Much simpler than having to worry about headers, lengths, and offsets like with our own protocol.

Here is the code for our WebSocketListener. The onMessage method is where we receive the results from the server. We call handleResponse exactly as we would if the results had come from the REST server or the persistent TCP server.

private final class ResultWebSocketListener extends WebSocketListener {
    @Override
    public void onMessage(WebSocket webSocket, String text) {
        Log.i(TAG, "onMessage text="+text);
        mBusy = false;
        long endTime = System.nanoTime();
        mLatency = endTime - mStartTime;
        handleResponse(text, mLatency);
    }

    @Override
    public void onClosing(WebSocket webSocket, int code, String reason) {
        webSocket.close(NORMAL_CLOSURE_STATUS, null);
        Log.i(TAG, "Closing: " + code + " / " + reason);
    }

    @Override
    public void onFailure(WebSocket webSocket, Throwable t, okhttp3.Response response) {
        String message = "WebSocket Error: " + t.getMessage();
        Log.e(TAG, message);
    }
}

The startWebSocketClient method is what we call to start communicating with the WebSocket server. Notice that the 'listener' used is an instance of the 'ResultWebSocketListener' defined above.

private void startWebSocketClient() {
    String url = "ws://"+mHost+":8008/ws"+mDjangoUrl;
    okhttp3.Request request = new okhttp3.Request.Builder().url(url).build();
    ResultWebSocketListener listener = new ResultWebSocketListener();
    mWebSocketClient = new OkHttpClient();
    mWebSocket = mWebSocketClient.newWebSocket(request, listener);
    Log.i(TAG, "Started WebSocket client. url: " + url);
}

Finally, this is the code that sends the image data “bytes” to the WebSocket server. It really couldn’t be much simpler.

if(mConnectionMode == ConnectionMode.WEBSOCKET) {
    mWebSocket.send(ByteString.of(bytes));
}

You can see the full Java source code for the ImageSender class here.


Server

Our server code is written in Python, and I expected it would be easy to find a WebSocket library to use. Our server has been based on the Django framework since day one, and Django is designed specifically for single-shot HTTP request/response messages. Because of this, I didn’t have high hopes for tightly integrating any WebSocket implementation with the Django server. I was pleasantly surprised when I discovered the Channels library. It does exactly what I was looking for, and it allows us to use all of our existing face detection, face recognition, pose detection, and object detection Python class files with little to no modification.
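A Channels routing configuration looks roughly like the sketch below. The module layout and URL pattern here are assumptions for illustration; see our repository for the real routing:

```python
# (hypothetical module name)
from channels.routing import ProtocolTypeRouter, URLRouter
from django.urls import path

from tracker import consumers  # consumers module name is an assumption

application = ProtocolTypeRouter({
    # Ordinary HTTP requests continue to be served by Django's views;
    # WebSocket connections are dispatched by URL to a consumer class.
    "websocket": URLRouter([
        path("ws/detector/detect/", consumers.ImageConsumerFaceDetector),
    ]),
})
```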

After using the Channels framework to set up routing for the WebSocket URLs, the data from each WebSocket connection is handled by a consumer class. Here is an example of a WebsocketConsumer that uses the existing FaceDetector instance to perform detection, then returns the results via the send function.

import io
import json
import logging
import time

from channels.generic.websocket import WebsocketConsumer
from imageio import imread  # an imread that accepts a file-like object

from tracker.apps import myFaceDetector, myFaceRecognizer, myOpenPose, myOpWrapper, myObjectDetector

logger = logging.getLogger(__name__)

class ImageConsumerFaceDetector(WebsocketConsumer):
    def connect(self):
        # Accept the incoming WebSocket connection.
        self.accept()

    def disconnect(self, close_code):"disconnect. close_code=%s" % close_code)

    def receive(self, text_data=None, bytes_data=None):
        if bytes_data is not None:
  "bytes_data length=%d" % len(bytes_data))
            start = time.time()
            image = imread(io.BytesIO(bytes_data))
            rects = myFaceDetector.detect_faces(image)
            elapsed = "%.3f" % ((time.time() - start) * 1000)
            if len(rects) == 0:
                ret = {"success": "false", "server_processing_time": elapsed}
            else:
                ret = {"success": "true", "server_processing_time": elapsed, "rects": rects.tolist()}
            response = json.dumps(ret)
        else:
            # If text is received, just echo back what we received.
  "text_data=%s" % text_data)
            response = text_data"response=%s" % response)
        self.send(text_data=response)

You can see the full Python source code for the ImageConsumer classes here.


The results we are interested in are the “Full Process Latency” values. Full process is not simply network-only latency plus server processing time; it also includes the time to transfer the image data and receive the results back. Depending on network speed, that transfer time can be the biggest part of the “full process” value. We shrink the camera image down to 180x240 before sending it, and at some point we also switched from PNG to JPG for a further speedup.

Real World Testing

These first results were measured using a Samsung Galaxy S8 phone on 4G LTE, connecting to a server with around a 40 ms ping time. This output is from the Benchmark feature of the MobiledgeX SDK Demo app’s Face Detection activity.

Apr 19, 2020 17:35
EDGE hostname:
Connection mode=REST
Latency test method=socket

EDGE Full Process Latency:
min/avg/max/stddev = 93/110/142/10 ms

EDGE Network Only Latency:
min/avg/max/stddev = 25/40/66/10 ms

Apr 19, 2020 17:37
EDGE hostname:
Connection mode=WEBSOCKET
Latency test method=socket

EDGE Full Process Latency:
min/avg/max/stddev = 41/53/78/7 ms

EDGE Network Only Latency:
min/avg/max/stddev = 26/43/64/11 ms

The main numbers we care about are the Full Process average values: 110 ms for REST and 53 ms for WebSocket. Using (old - new)/old x 100%, that is a 51.82% decrease in latency. Stated another way, this is over a 2x speedup!
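The arithmetic, as a quick check:

```python
rest_avg = 110      # ms, REST Full Process average
websocket_avg = 53  # ms, WebSocket Full Process average

# Percentage decrease in full-process latency.
decrease = (rest_avg - websocket_avg) / rest_avg * 100

# Speedup factor: how many times faster the WebSocket path is.
speedup = rest_avg / websocket_avg

print(f"{decrease:.2f}% decrease, {speedup:.2f}x speedup")
# → 51.82% decrease, 2.08x speedup
```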

Synthetic Testing

Another method of measuring we can use is our test script. This script can send images to our server and receive results using either REST or WebSocket. For our test setup, we have two machines on the same LAN, one acting as the client and the other as the server. Since this results in a ping time between the machines of less than 1 ms, it is not a very realistic scenario, even for Edge computing. To remedy this, we use the Linux Traffic Control tool, tc, which allows us to add a delay to a specified network interface. The following command gives us around a 15 ms ping time between the two machines.

sudo tc qdisc add dev eth0 root netem delay 15ms

This is a much more realistic Edge-like environment. Let’s see the results.

$ python3 -s -e /detector/detect/ -c rest -f Bruce.jpg -r 40 --base64
2020-04-20 16:35:08,622 - MainThread - INFO - =====================================================
2020-04-20 16:35:08,622 - MainThread - INFO - Grand totals for /detector/detect/ rest
2020-04-20 16:35:08,623 - MainThread - INFO - 1 threads repeated 40 times on 1 files
2020-04-20 16:35:08,623 - MainThread - INFO - =====================================================
2020-04-20 16:35:08,623 - MainThread - INFO - ====> Average Latency Full Process=44.963 ms (stddev=1.846)
2020-04-20 16:35:08,623 - MainThread - INFO - ====> Average Latency Network Only=15.408 ms (stddev=0.077)
2020-04-20 16:35:08,623 - MainThread - INFO - ====> Average Server Processing Time=3.477 ms (stddev=0.203)

$ python3 -s -e /detector/detect/ -c websocket -f Bruce.jpg -r 40
2020-04-20 16:35:24,353 - MainThread - INFO - ==========================================================
2020-04-20 16:35:24,354 - MainThread - INFO - Grand totals for /detector/detect/ websocket
2020-04-20 16:35:24,354 - MainThread - INFO - 1 threads repeated 40 times on 1 files
2020-04-20 16:35:24,354 - MainThread - INFO - ==========================================================
2020-04-20 16:35:24,354 - MainThread - INFO - ====> Average Latency Full Process=22.300 ms (stddev=0.556)
2020-04-20 16:35:24,355 - MainThread - INFO - ====> Average Latency Network Only=16.644 ms (stddev=0.187)
2020-04-20 16:35:24,355 - MainThread - INFO - ====> Average Server Processing Time=4.382 ms (stddev=0.234)

In this case, we see 44.963 ms for REST, and 22.3 ms for WebSocket. This is a 50.4% decrease in latency. Again, right around a 2x speedup.

Using the tc command, we can easily modify our environment to simulate different network scenarios. These are the Full Process results for several different network delay values.

Network Latency

REST Full Process

WebSocket Full Process

Full Process Latency Decrease

Full Process Speedup

5 ms

24.826 ms

12.196 ms



10 ms

34.835 ms

17.08 ms



20 ms

55.085 ms

27.317 ms



40 ms

94.614 ms

47.304 ms



80 ms

174.706 ms

87.618 ms



160 ms

335.182 ms

167.215 ms



320 ms

654.888 ms

327.8 ms



The scaling is remarkably linear, which was somewhat surprising. I expected the ratio to vary as different network latencies were used, but it looks safe to say we can expect roughly a 2x speedup on most any network we encounter.
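That consistency is easy to verify directly from the measured values:

```python
# (network_delay_ms, rest_ms, websocket_ms) measured Full Process averages
results = [
    (5, 24.826, 12.196),
    (10, 34.835, 17.080),
    (20, 55.085, 27.317),
    (40, 94.614, 47.304),
    (80, 174.706, 87.618),
    (160, 335.182, 167.215),
    (320, 654.888, 327.800),
]

for delay, rest, ws in results:
    # REST latency divided by WebSocket latency: roughly 2x at every delay.
    print(f"{delay:>3} ms delay: {rest / ws:.2f}x speedup")
```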

Bonus Speedup for REST

During benchmarking with the Python client script, we were not able to reproduce the numbers we were seeing on the Android phone. Investigation revealed that we had introduced a significant speedup to the server some time ago but had never taken advantage of it in the Android client. That speedup was removing the requirement that the image data be encoded in Base64 before being sent. Sending the raw image bytes instead of Base64-encoding them meant a much smaller outgoing payload and faster overall processing of the image. Base64-encoding an image increases its size by roughly 33%.
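The encoding overhead is easy to demonstrate: Base64 maps every 3 input bytes to 4 output characters.

```python
import base64
import os

raw = os.urandom(129600)  # roughly the size of a raw 180x240 RGB frame
encoded = base64.b64encode(raw)

# Every 3 input bytes become 4 output characters, so the encoded
# payload is about a third larger than the raw bytes.
overhead = (len(encoded) - len(raw)) / len(raw) * 100
print(f"{overhead:.1f}% larger")
# → 33.3% larger
```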

The --base64 option was added to the script to allow for an “apples to apples” comparison with the Android phone. A Jira ticket has been opened against the Android code to remove Base64 encoding and enjoy the same speedup seen by our Python script.


Conclusion

Adding WebSocket support to our client and server was well worth it. The benefits of lower latency and easier integration for new clients are a great payoff, and the existence of well-designed libraries made our implementations relatively quick and uncomplicated.

This new WebSocket implementation completely supersedes the old persistent TCP socket connection method in every way. We’ll keep support around for a while for all three connection methods, if only for experimentation and benchmarking purposes, but I expect that persistent TCP support will eventually be removed.