Did you know that the MobiledgeX Computer Vision Android library communicates with a server running on the Edge cloudlet via a REST interface? This is how most of our face detection apps have been developed, and using the interface makes sense: it is simple, well documented, and almost any type of client can use it. Unfortunately, it's not very efficient. For every image sent to the server, a new HTTP connection must be established to send the image data and receive the response, and then the connection is torn down. This sequence occurs for every single image frame processed. At ten frames per second, this quickly adds up to hundreds of frames, each paying the cost of this "open, send, receive, close" sequence.
One obvious optimization is to remove some of those repeated steps. In this case, we implement a persistent TCP connection, so the open and close steps happen only once per session.
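As a rough illustration, the persistent approach looks something like the following minimal Python sketch. The host, port, and frame handling are placeholders, and the sketch is deliberately naive, with no framing or flow control (which, as we'll see below, matters):

```python
import socket

HOST, PORT = "facedetection.example.net", 8011  # placeholder endpoint

# Load one encoded image frame to stand in for a live camera feed.
with open("Bruce.jpg", "rb") as f:
    frame = f.read()

# Open the connection once, reuse it for every frame, and close it
# once -- rather than paying an "open, send, receive, close" cycle
# for each frame, as with HTTP REST.
with socket.create_connection((HOST, PORT)) as sock:
    for _ in range(10):             # one iteration per image frame
        sock.sendall(frame)         # send
        response = sock.recv(4096)  # receive
# The socket is closed here, once per session.
```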

This idea is good in theory, but implementing it comes at a cost: added complexity, and the loss of the pre-existing frameworks we leveraged for our REST version.
With that said, here is a summary of some of the pros and cons of each connection mode.
HTTP REST

| Pros | Cons |
|---|---|
| Simple | Inefficient, slower |
| Easy to document the API | |
| Leverages well-supported REST server framework, Django | |
| Leverages well-supported Android HTTP request library, Volley | |
| Supported by many client types | |
Persistent TCP

| Pros | Cons |
|---|---|
| Efficient, faster | Complex, must invent our own protocol |
| | Can't use server framework |
| | TCP socket libraries not as well supported on Android |
Early Results - Too Good to be True
From the beginning, we've had command-line scripts to test the REST server implementation. The script is fairly simple: it encodes a given image, sends it to the server repeatedly via HTTP POST, and tallies the results. Here is some sample output, followed by a stripped-down sketch of what such a script looks like:
```
python server_tester.py -s facedetection.defaultcloud.mobiledgex.net -e /detector/detect/ -f Bruce.jpg --show-responses -r 4
171.609 ms - {"success": "true", "server_processing_time": "14.969", "rects": [[73, 76, 147, 150]]}
42.769 ms to open socket
172.330 ms - {"success": "true", "server_processing_time": "22.358", "rects": [[73, 76, 147, 150]]}
163.079 ms - {"success": "true", "server_processing_time": "11.933", "rects": [[73, 76, 147, 150]]}
157.578 ms - {"success": "true", "server_processing_time": "12.203", "rects": [[73, 76, 147, 150]]}
Average Latency Full Process=166.149 ms
Average Latency Network Only=42.769 ms
```
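A stripped-down version of such a tester might look like the sketch below. The URL scheme and upload field name are assumptions inferred from the command line above, not the actual script; the real script encodes the image however the server expects, which this sketch glosses over with a plain multipart upload:

```python
import time

import requests  # well-supported Python HTTP library

# Assumed URL, built from the -s and -e arguments above.
URL = "http://facedetection.defaultcloud.mobiledgex.net/detector/detect/"

with open("Bruce.jpg", "rb") as f:
    image = f.read()

latencies = []
for _ in range(4):  # -r 4: repeat the request four times
    start = time.time()
    # Each iteration pays for a full "open, send, receive, close" cycle.
    response = requests.post(URL, files={"image": image})
    elapsed_ms = (time.time() - start) * 1000
    latencies.append(elapsed_ms)
    print("%.3f ms - %s" % (elapsed_ms, response.text))

print("Average Latency Full Process=%.3f ms" % (sum(latencies) / len(latencies)))
```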
The simplest way to do initial testing of the persistent TCP server was to use a similar test script. The initial implementation of the server and client was a bit naive: there was no flow control and no way to tell where one request ended and another began, and the results turned out to be too good to be true, showing a 100% increase in performance. For example, the REST results shown above had a full-process latency of 166 ms, while the new results from the same server location were around 80 ms!
When it came time to implement the client side in our Android app, we found that we had to define a protocol with a header containing an operation code (specifying detection, recognition, etc.) and the length of the actual payload. Additionally, the length of each of these header fields had to be defined as well. This extra information barely increased the size of the data stream, but it turned out that processing it did add cost: constantly parsing the stream for these lengths and values added a surprising amount of overhead, and the results were much less impressive.
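To make that concrete, here is one way such framing might be built and parsed in Python. The opcode value and field widths are illustrative assumptions, not the library's actual wire format:

```python
import struct

OP_DETECT = 1  # illustrative operation code, not the library's actual value

def frame_request(opcode, payload):
    """Prefix the payload with a fixed-size header: a 1-byte opcode
    and a 4-byte big-endian payload length."""
    return struct.pack("!BI", opcode, len(payload)) + payload

def recv_exactly(sock, count):
    """recv() may return fewer bytes than requested, so loop until the
    full count arrives. This bookkeeping runs for every single frame,
    and it is where the surprising parsing overhead comes from."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        data += chunk
    return data

def read_request(sock):
    """Parse the header first, then read exactly payload-length bytes,
    so the receiver knows where one request ends and the next begins."""
    opcode, length = struct.unpack("!BI", recv_exactly(sock, 5))
    payload = recv_exactly(sock, length)
    return opcode, payload
```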
Final Results
Test Script
```
tcp_client.py -s facedetection.defaultcloud.mobiledgex.net -o 1 -f Bruce.jpg -r 4 --show-responses
143.631 ms to send and receive: {"success": "true", "server_processing_time": "12.842", "rects": [[73, 76, 147, 150]]}
50.676 ms to open socket
103.116 ms to send and receive: {"success": "true", "server_processing_time": "13.233", "rects": [[73, 76, 147, 150]]}
96.268 ms to send and receive: {"success": "true", "server_processing_time": "12.541", "rects": [[73, 76, 147, 150]]}
105.010 ms to send and receive: {"success": "true", "server_processing_time": "11.806", "rects": [[73, 76, 147, 150]]}
===> Average Latency Full Process=112.006 ms
===> Average Latency Network Only=50.676 ms
```
This is a 32.5% improvement. Nothing like the 100% seen in our first iteration, but definitely worth implementing.
Note: The improvement calculation is (old-new)/old x 100%, so (166-112)/166 x 100% = 32.53%.
Android Results
The following are some results collected from a few of our cloudlets around the world. The improvement observed appears to depend on network latency: the lower the network latency, the more substantial the performance gain. This increase in performance is a great reason to run your application on the MobiledgeX infrastructure!
| Network Latency | HTTP REST | Persistent TCP | Improvement |
|---|---|---|---|
| 49 ms | min/avg/max/stddev = 115/150/218/22 ms | min/avg/max/stddev = 103/121/176/16 ms | 19.33% |
| 41 ms | min/avg/max/stddev = 140/163/199/13 ms | min/avg/max/stddev = 63/109/172/19 ms | 33.13% |
| 13 ms | min/avg/max/stddev = 38/55/104/13 ms | min/avg/max/stddev = 18/32/48/6 ms | 41.82% |
Note: These results may have some variance. The data represented was collected using various Android phones and different server configurations.
Computer Vision Library and Face Detection Server Support
Both our server and our client library now support persistent TCP connections. To enable them, call the setPreferencesConnectionMode() static method on the ImageSender class, passing the desired connection mode and your ImageSender instance. Here's an example:
```java
ImageSender.setPreferencesConnectionMode(ImageSender.ConnectionMode.PERSISTENT_TCP, mImageSenderEdge);
```
Activity
You can work through the Face Detection App Workshop to try this yourself. See Android Workshop: Adding Edge Support and Face Detection to Workshop App.