Computer Vision Persistent TCP Connection

January 14th, 2020

Bruce Armstrong

Principal Engineer

Did you know that the MobiledgeX Computer Vision Android library communicates with a server running on the Edge cloudlet via a REST interface? This is how most face detection apps are developed, and using the interface makes sense: it is simple, well documented, and usable from almost any type of client. Unfortunately, it's not very efficient. For every image sent to the server, a new HTTP connection must be established to send the image data and receive the response, and then the connection is torn down. This sequence occurs for every single image frame processed. At ten frames per second, the "open, send, receive, close" sequence is repeated hundreds of times every minute.

One obvious optimization is to remove some of the repeated steps. In this case, we implemented a protocol that runs over a persistent TCP connection, so the open and close steps happen only once per session.
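To make the difference concrete, here is a minimal Python sketch that counts how many TCP connections each style needs for the same number of frames. It is not the MobiledgeX server or client: the local server, the fixed FRAME_SIZE, the "ok" reply, and every function name are assumptions made up for illustration.

```python
import socket
import socketserver
import threading

FRAME_SIZE = 1024  # hypothetical fixed frame size, so no framing logic is needed


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closed first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


class CountingHandler(socketserver.BaseRequestHandler):
    connections = 0  # TCP connections accepted so far

    def handle(self):
        CountingHandler.connections += 1
        # Serve frames until the client closes the connection.
        while recv_exact(self.request, FRAME_SIZE) is not None:
            self.request.sendall(b"ok")


class Server(socketserver.ThreadingTCPServer):
    daemon_threads = True
    allow_reuse_address = True


def send_frame(sock, frame):
    """Send one frame and wait for the two-byte reply."""
    sock.sendall(frame)
    return recv_exact(sock, 2)


def run_per_request(address, frames):
    """REST-like: open, send, receive, close for every frame."""
    for frame in frames:
        with socket.create_connection(address) as sock:
            send_frame(sock, frame)


def run_persistent(address, frames):
    """Persistent: open once, then send and receive for every frame."""
    with socket.create_connection(address) as sock:
        for frame in frames:
            send_frame(sock, frame)


def count_connections(runner, frame_count=10):
    """Run one client style against a fresh local server; return connections used."""
    CountingHandler.connections = 0
    with Server(("127.0.0.1", 0), CountingHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        runner(server.server_address, [bytes(FRAME_SIZE)] * frame_count)
        server.shutdown()
    return CountingHandler.connections


if __name__ == "__main__":
    print("per-request connections:", count_connections(run_per_request))
    print("persistent connections:", count_connections(run_persistent))
```

The per-request client opens one connection per frame, while the persistent client opens a single connection for the whole session. The sketch only works this simply because every frame has the same, known size.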

This idea is good in theory, but implementing it comes at a cost: added complexity, and the loss of the pre-existing frameworks that we leveraged for our REST version.

With that said, here is a summary of some of the pros and cons of each connection mode.

REST

Con: Inefficient, slower
Pro: Easy to document the API
Pro: Leverages a well-supported REST server framework, Django
Pro: Leverages a well-supported Android HTTP request library, Volley
Pro: Supported by many client types

Persistent TCP

Pro: Efficient, faster
Con: Complex, must invent our own protocol
Con: Can't use the server framework
Con: TCP socket libraries not as well supported on Android

Early Results - Too Good to be True

From the beginning, we’ve had command-line scripts to test the REST server implementation. The script is fairly simple – it encodes a given image, and uses an HTTP POST to send it to the server repeatedly, then tallies the results. Here is some sample output:

python -s -e /detector/detect/ -f Bruce.jpg --show-responses -r 4
171.609 ms - {"success": "true", "server_processing_time": "14.969", "rects": [[73, 76, 147, 150]]}
42.769 ms to open socket
172.330 ms - {"success": "true", "server_processing_time": "22.358", "rects": [[73, 76, 147, 150]]}
163.079 ms - {"success": "true", "server_processing_time": "11.933", "rects": [[73, 76, 147, 150]]}
157.578 ms - {"success": "true", "server_processing_time": "12.203", "rects": [[73, 76, 147, 150]]}
Average Latency Full Process=166.149 ms
Average Latency Network Only=42.769 ms

The simplest way to do initial testing of the Persistent TCP server was to use a similar test script. The initial implementation of the server and client was a bit naive. There was no flow control and no way to tell where one request ended and another began, and the results turned out too good to be true. Results were showing a 100% increase in performance: the REST results shown above had a full process latency of 166 ms, and using the same server location, the new results were around 80 ms, roughly half!

When it came time to implement the client side in our Android app, we found that we had to define a protocol with a header that includes an operation code (specifying detection, recognition, etc.) and the length of the actual payload. The size of each of these header fields had to be defined as well. This extra information did not noticeably increase the size of the data stream, but processing it did come at a cost. Constantly parsing the stream for these lengths and values added a surprising amount of overhead, and the results were much less impressive.
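A length-prefixed header of this kind can be sketched in a few lines of Python. The actual MobiledgeX wire format is not given here, so the field widths (a 4-byte opcode followed by a 4-byte payload length, both big-endian), the opcode values, and the function names below are all assumptions for illustration only.

```python
import socket
import struct

# Assumed header layout: 4-byte opcode, then 4-byte payload length, big-endian.
HEADER = struct.Struct("!II")

# Hypothetical opcode values.
OP_FACE_DETECTION = 1
OP_FACE_RECOGNITION = 2


def send_message(sock, opcode, payload):
    """Prefix the payload with a fixed-size header and send both."""
    sock.sendall(HEADER.pack(opcode, len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() alone may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)


def recv_message(sock):
    """Read one header, then exactly as many payload bytes as it announces."""
    opcode, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return opcode, recv_exact(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for the client and server ends.
    client, server = socket.socketpair()
    with client, server:
        send_message(client, OP_FACE_DETECTION, b"first image bytes")
        send_message(client, OP_FACE_RECOGNITION, b"second image bytes")
        print(recv_message(server))
        print(recv_message(server))
```

Because the receiver always knows how many bytes to expect next, two requests sent back to back on the same connection can be separated reliably. The price is the extra read-and-parse step per message that the naive first version skipped.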

Final Results

Test Script -s -o 1 -f Bruce.jpg -r 4 --show-responses
143.631 ms to send and receive: {"success": "true", "server_processing_time": "12.842", "rects": [[73, 76, 147, 150]]}
50.676 ms to open socket
103.116 ms to send and receive: {"success": "true", "server_processing_time": "13.233", "rects": [[73, 76, 147, 150]]}
96.268 ms to send and receive: {"success": "true", "server_processing_time": "12.541", "rects": [[73, 76, 147, 150]]}
105.010 ms to send and receive: {"success": "true", "server_processing_time": "11.806", "rects": [[73, 76, 147, 150]]}
===> Average Latency Full Process=112.006 ms
===> Average Latency Network Only=50.676 ms

This is a 32.5% improvement. Nothing like the 100% seen in our first iteration, but definitely worth implementing.

Note: The improvement calculation is (old - new) / old x 100%, so ((166 - 112) / 166) x 100% = 32.53%.
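For anyone who wants to reproduce the numbers, the short Python sketch below recomputes the two averages from the per-request latencies in the sample outputs and applies the same formula. The variable and function names are made up for illustration.

```python
# Per-request latencies in ms, copied from the two sample outputs above.
REST_MS = [171.609, 172.330, 163.079, 157.578]
TCP_MS = [143.631, 103.116, 96.268, 105.010]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def improvement_percent(old, new):
    """Latency improvement: (old - new) / old x 100%."""
    return (old - new) / old * 100


if __name__ == "__main__":
    rest_avg = mean(REST_MS)
    tcp_avg = mean(TCP_MS)
    print(f"REST average: {rest_avg:.3f} ms")
    print(f"Persistent TCP average: {tcp_avg:.3f} ms")
    print(f"Improvement (rounded inputs): {improvement_percent(166, 112):.2f}%")
    print(f"Improvement (unrounded averages): {improvement_percent(rest_avg, tcp_avg):.2f}%")
```

With the averages rounded to 166 ms and 112 ms, the result matches the 32.53% quoted above. Using the unrounded averages gives a marginally higher figure of about 32.6%.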

Android Results

The following are some results collected from a few of our cloudlets around the world. The improvements observed appear dependent on network latency. The lower the network latency, the more substantial the increase in performance. This increase in performance is a great reason to run your application on the MobiledgeX infrastructure!

Network Latency | REST (min/avg/max/stddev) | Persistent TCP (min/avg/max/stddev)
49 ms | 115/150/218/22 ms | 103/121/176/16 ms
41 ms | 140/163/199/13 ms | 63/109/172/19 ms
13 ms | 38/55/104/13 ms | 18/32/48/6 ms


Note: These results may have some variance. The data represented was collected using various Android phones and different server configurations.
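To check the relationship between network latency and improvement, the Python sketch below applies the (old - new) / old formula to the average latencies of each cloudlet. It assumes the first set of figures in each row is the REST measurement and the second is Persistent TCP. The names are made up for illustration.

```python
# (network latency ms, REST avg ms, Persistent TCP avg ms), copied from the results above.
# Assumption: the first set of figures in each row is the REST measurement.
ANDROID_RESULTS = [
    (49, 150, 121),
    (41, 163, 109),
    (13, 55, 32),
]


def summarize(results):
    """Return (network latency, ms saved, percent improvement) for each row."""
    rows = []
    for network_ms, rest_avg, tcp_avg in results:
        saved_ms = rest_avg - tcp_avg
        percent = saved_ms / rest_avg * 100
        rows.append((network_ms, saved_ms, round(percent, 1)))
    return rows


if __name__ == "__main__":
    for network_ms, saved_ms, percent in summarize(ANDROID_RESULTS):
        print(f"{network_ms} ms network latency: {saved_ms} ms saved, {percent}% improvement")
```

On these three rows, the percentage improvement grows as network latency drops (roughly 19%, 33%, and 42%), even though the absolute time saved does not follow the same order. Given the variance noted above, three data points are suggestive rather than conclusive.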

Computer Vision Library and Face Detection Server Support

Both our server and our client library now support persistent TCP connections. To enable them, call the static setPreferencesConnectionMode() method on the ImageSender class, passing in the connection mode and your ImageSender instance. Here's an example:

ImageSender.setPreferencesConnectionMode(ImageSender.ConnectionMode.PERSISTENT_TCP, mImageSenderEdge);


You can work through the Face Detection App Workshop to try this yourself. See Android Workshop: Adding Edge Support and Face Detection to Workshop App.