Correcting Image Orientation Using Convolutional Neural Networks: Adding the Final Operationalisation Touch

Yousry Mohamed
10 min read · Mar 4, 2019
A subway platform shot using Dutch angle — https://en.wikipedia.org/wiki/Dutch_angle

Preamble

First things first: due credit goes to Daniel Sáez, as most of the content of this article is based on his work developing a Keras deep learning model that detects the rotation angle of an image. The original blog post can be found here.

A teammate of mine who is a HoloLens guru asked whether machine learning could help with a vision problem he had in a mixed reality application. Rephrasing the problem, it's mainly about finding whether an image is rotated and what the rotation angle is. He needs that information to adjust camera orientation and similar things. So, ignoring the mixed reality world, our machine learning problem is to take an input image like the left half of the image below and detect how many degrees it is rotated from normal orientation. Once the rotation angle is known, the left image can be rotated (or the camera adjusted) to produce the correct one on the right. Normal orientation here is my term for the orientation of most images taken by cameras or smartphones, with a viewpoint similar to that of the human eye.

Coming back to my teammate's question, I assumed a CNN (Convolutional Neural Network), or perhaps a GAN built from convolutional layers, could solve this problem. A quick Google search landed me on the blog post by Daniel mentioned above. It's very detailed and provides the source code needed to train and test such a model. That's awesome indeed, but as usual we need to make the final model more accessible to application developers or whoever will be its final consumer. So my idea was to grab the source code for training the model, convert the trained model from Keras format to ONNX format, and then make it easy to consume.

The rest of this article is a step-by-step tutorial that starts with Daniel's GitHub repo and ends with a web service that takes an image and returns its rotation angle. The main thing we need here is an Azure subscription with a bit of credit. The training part and the conversion to ONNX could probably be done on any machine equipped with Keras and a decent GPU. The part where Azure is more relevant is using the Azure Machine Learning service to host that ONNX model in a Docker container and expose the inference functionality as a web service. It makes it easy to use the final model without setting up a complex environment like TensorFlow Serving.

Model training

I urge you to read Daniel's blog post, but in a nutshell, the core idea is transfer learning: take a ResNet model with its trained weights and add an extra final layer for detecting the rotation angle. The model also treats the task as a classification problem rather than a regression one. The data can be MNIST images or Google Street View images, fed to the model through a custom data generator that takes an image, rotates it a random number of degrees and crops it from the centre. The training features are the rotated images, and the training labels (i.e. classes) are the known rotation angles.
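To make the generator idea concrete, here is a minimal sketch (not the repo's actual code) of producing one training pair by rotating an image a random number of degrees and cropping the centre; the function name and crop size are my own choices:

import cv2
import numpy as np

def make_training_pair(image, crop_size=224):
    # pick a random rotation angle, this is also the class label
    angle = np.random.randint(360)

    # rotate the image around its centre
    height, width = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1.0)
    rotated = cv2.warpAffine(image, matrix, (width, height))

    # crop a square patch from the centre to hide the black corners
    y = (height - crop_size) // 2
    x = (width - crop_size) // 2
    patch = rotated[y:y + crop_size, x:x + crop_size]

    return patch, angle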

The first step is to create an Azure Deep Learning virtual machine, preferably a Linux-based one with at least an NC6 SKU, which comes with an NVIDIA K80 GPU. Once the VM is created, SSH into it and run the following to clone this repo.

git clone https://github.com/ylashin/RotNet.git

It’s basically a forked version of Daniel’s repo with a few tweaks:

  • Remove URL encoding from the links used to download the Google Street View training data in data/streetview.py, as it caused the script to fail. Maybe the original code was written for a different OS flavour or Python version
  • Train against half of the original Street View dataset; this halves the training time with probably only a minor impact on overall accuracy

The main change to the training script is extending it to save the final model in ONNX format. The Azure Deep Learning VM comes with a Python library called onnxmltools that makes the conversion from Keras to ONNX very easy. The few extra lines added to the training script are as follows:

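Roughly, the added lines look like the sketch below, assuming the trained Keras model object is called model (the exact code in the repo may differ slightly):

import onnxmltools

# convert the trained Keras model to ONNX and save it next to the script
onnx_model = onnxmltools.convert_keras(model, name='rotnet_street_view_resnet50')
onnxmltools.utils.save_model(onnx_model, 'rotnet_street_view_resnet50.onnx')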
We are now ready to run the training script. This will take a while, so you'd better find something useful to do and just check the progress every now and then.

cd ./RotNet
python ./train/train_street_view.py

The first step of the training script is to download the Google Street View dataset. This is a huge dataset, so it will take time to download even on a cloud VM with a decent internet connection.

For the model training part, the screenshot below shows only a single epoch because I did the main training (50 epochs) on one VM and used another VM just to take screenshots and write this article, initialising the model on the second VM with the weights trained on the first. If you do the normal training from scratch, it runs for 50 epochs (unless you change that), and the final few lines of the script convert the model to ONNX format and save it in the directory you run the training script from (the repo root).

To confirm the existence of the final ONNX model, run:

stat -c "%y %s %n" *

As shown above, there is a new ONNX file generated and it’s pretty huge.

Side note: for devices with limited compute power like phones or HoloLens, there are options to train smaller models with acceptable accuracy and a much smaller model size.

Now that we have a trained model in ONNX format, the next step is to host it for consumption.

Hosting using Azure Machine Learning Service

To avoid repeating myself, I will follow a similar approach to my own blog post about hosting ONNX models using the new Azure Machine Learning service, so I will be less verbose here.

First we need to copy that ONNX file and host it somewhere so we can download it from an Azure notebook. In my case, I uploaded the file to a public Azure blob storage container. Using the Azure CLI, which is already installed on the VM, the upload goes as follows:

az storage blob upload --container-name PUBLIC_CONTAINER_NAME --file ./rotnet_street_view_resnet50.onnx --name rotnet_street_view_resnet50.onnx

Next we need to create an Azure Machine Learning service workspace and create a new Azure notebook inside it. For more details on how to do that, please refer to this blog post.

Inside the new notebook, the first step is to load the Azure ML SDK, log in to your Azure subscription and download the ONNX file from the location it was uploaded to above.

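A minimal sketch of that first cell could look like the following; the blob URL is a placeholder for wherever you uploaded the file, and the actual subscription login happens when we fetch the workspace in the next step:

import azureml.core
import urllib.request

# confirm the SDK is available
print(azureml.core.VERSION)

# download the ONNX model from the public blob container it was uploaded to
model_url = 'https://YOUR_STORAGE_ACCOUNT.blob.core.windows.net/PUBLIC_CONTAINER_NAME/rotnet_street_view_resnet50.onnx'
urllib.request.urlretrieve(model_url, 'rotnet_street_view_resnet50.onnx')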
By the way, I will share this Azure notebook in the same forked GitHub repo.

After that, we get a pointer to the workspace and register the ONNX file as an ML model in this workspace.

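A sketch of those two steps, with placeholder names for the subscription, resource group, workspace and registered model:

from azureml.core import Workspace
from azureml.core.model import Model

# getting the workspace triggers the interactive Azure login if needed
ws = Workspace.get(name='WORKSPACE_NAME',
                   subscription_id='SUBSCRIPTION_ID',
                   resource_group='RESOURCE_GROUP')

# register the ONNX file as a model in the workspace
model = Model.register(workspace=ws,
                       model_path='rotnet_street_view_resnet50.onnx',
                       model_name='rotnet',
                       description='RotNet street view model in ONNX format')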
If the above runs fine, the workspace models tab will show the new model registered and ready to be used in a container image.

To consume the model, we need to build a Docker image with a web service that runs inference using this model. The Azure ML SDK provides the API to build this image, but it needs the following:

  • A scoring script that acts as the web service code: it parses the input, runs the model in inference mode and returns the result.
  • A conda dependency file for any libraries needed by the above script. To keep things simple for the consumer, all pre-processing is done in the scoring script, which needs Keras (for ResNet pre-processing) and OpenCV (to load the input image from a base64-encoded byte array). If this pre-processing were not done in the scoring code, the web service consumer would have to do it, which would just complicate this tutorial.
  • A Docker file with steps to install some system dependencies required by OpenCV.

Let's have a look at those bits, starting with the scoring script:

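The full script is in the repo; the sketch below captures its shape using onnxruntime for inference (the registered model name, JSON field name and return format are assumptions on my side):

import json
import base64
import numpy as np
import cv2
import onnxruntime
from keras.applications.resnet50 import preprocess_input
from azureml.core.model import Model

def init():
    # load the ONNX model injected into the container and keep it as a global
    global session
    model_path = Model.get_model_path('rotnet')
    session = onnxruntime.InferenceSession(model_path)

def run(raw_data):
    # the input image arrives base64 encoded inside a JSON document
    encoded = json.loads(raw_data)['image']
    image_bytes = np.frombuffer(base64.b64decode(encoded), dtype=np.uint8)
    image = cv2.imdecode(image_bytes, cv2.IMREAD_COLOR)

    # apply the same ResNet-style pre-processing used during training
    batch = preprocess_input(np.expand_dims(image.astype(np.float32), axis=0))

    # run the model and return the angle (class) with the highest probability
    input_name = session.get_inputs()[0].name
    predictions = session.run(None, {input_name: batch})[0]
    return json.dumps({'angle': int(np.argmax(predictions))})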
The above script has two main components: an init function that is called once to load the ONNX model injected into the container and keep it ready as a global variable, and a run function that acts as the web service, receiving an input image base64-encoded in a JSON document. The core steps of the web service execution are:

  1. Extract the image from the JSON string and convert it to a numpy array
  2. Pre-process it using some Keras helpers (normalising pixels and so on, using the same values used for ResNet)
  3. Expand the numpy array dimensions, because in most deep learning training/inference scenarios we need to pass a batch of samples/images. In this case it will be a batch of a single element
  4. Run the model in inference mode and return the rotation angle with the highest probability

Next we should prepare a conda dependency file that lists all the needed dependencies in YAML format. The Azure ML SDK helps with that, so we don't have to write the file manually or remember the schema.

From the scoring script listed before, it's obvious that we need numpy, OpenCV and Keras (which requires TensorFlow as a backend). ONNX Runtime and azureml-core are also needed.
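A sketch of generating that file with the SDK, using the packages listed above (the file name myenv.yml is my own choice):

from azureml.core.conda_dependencies import CondaDependencies

# list the pip packages needed by the scoring script
conda_deps = CondaDependencies.create(pip_packages=[
    'numpy', 'opencv-python', 'keras', 'tensorflow', 'onnxruntime', 'azureml-core'])

with open('myenv.yml', 'w') as f:
    f.write(conda_deps.serialize_to_string())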

Next, a small Docker file is required to install some Ubuntu packages needed by OpenCV; otherwise the OpenCV installation will simply fail.
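Something along these lines does the trick; the exact apt package list below is an assumption based on the usual opencv-python runtime dependencies:

# write a minimal docker file with the system packages OpenCV needs
docker_file_content = """
RUN apt-get update && \\
    apt-get install -y libglib2.0-0 libsm6 libxext6 libxrender-dev
"""

with open('dockerfile_steps', 'w') as f:
    f.write(docker_file_content)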

The next step is to write the code that builds a Docker image hosting the model and the web service code. The code below is pretty easy to follow as it contains pointers to all the components mentioned above.

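A sketch of that cell using the SDK's ContainerImage API, wiring together the scoring script, conda file and Docker steps created above (file and image names are my own):

from azureml.core.image import ContainerImage

# wire together the scoring script, conda file and extra docker steps
image_config = ContainerImage.image_configuration(execution_script='score.py',
                                                  runtime='python',
                                                  conda_file='myenv.yml',
                                                  docker_file='dockerfile_steps')

# kick off building the image in the workspace
image = ContainerImage.create(name='rotnet-image',
                              models=[model],
                              image_config=image_config,
                              workspace=ws)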
The last line will trigger building the image and, if it works fine, the newly created Docker image will appear in the workspace Images tab. The image is stored in the container registry linked to the current workspace.

If building the image fails for any reason, you can run the following to get a URL for the build log, which can be inspected for more details.

print(image.image_build_log_uri)

The next step is to deploy a container instance from the built image. This instance will be hosted by Azure Container Instances, but there is also an option to use Azure Kubernetes Service; at the end of the day it's just a Docker image. All that's needed is to specify the hardware specs of the container and its Docker image, and we are good to go.

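A sketch of the deployment cell, assuming modest container specs and using the image object built above:

from azureml.core.webservice import AciWebservice, Webservice

# 1 CPU core and 2 GB of memory are plenty for single-image inference
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)

service = Webservice.deploy_from_image(workspace=ws,
                                       name='rotnet-service',
                                       image=image,
                                       deployment_config=aci_config)

service.wait_for_deployment(show_output=True)
print(service.scoring_uri)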
Assuming things work fine, a success message will be shown and the web service URL can also be obtained.

The new container instance will also appear in Deployments tab of the workspace.

Now it's the moment of truth: we have a web service and we want to see if it works as expected. I took a picture of Queens Wharf in Brisbane, rotated it 180° and cropped it to a 224x224 image like the one below:

This image was then uploaded to the same folder containing my Azure notebook. The following simple snippet shows how to open the image, wrap it in a JSON document and call the web service.

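A sketch of that snippet; the image file name is whatever you uploaded, and the service object comes from the deployment step above:

import json
import base64

# read the test image and base64 encode it into a JSON payload
with open('queens_wharf_rotated.jpg', 'rb') as f:
    payload = json.dumps({'image': base64.b64encode(f.read()).decode('utf-8')})

# call the deployed web service and print the predicted angle
result = service.run(input_data=payload)
print(result)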
And the result would be …

Not too bad, just slightly off the correct value 😉

Note that if we provide an image rotated by, say, 90°, the inference result might be around 270°, depending on whether the rotation direction used (clockwise vs anti-clockwise) matches the one used while training the model. I am just too lazy to go and check.

That's cool, but what if we want to call this web service from C# or Java?

No problem, a small LINQPad program pointing to the correct web service URL can easily do the job.

And the result should be the same.

Conclusion

Deep learning is cool. Thanks to Daniel's effort, it's easy to grab the trained model or do the training from scratch, and with the help of Azure Machine Learning we ended up with a working web service ready to be used from many types of applications.

In the spirit of transfer learning, which is basically about not reinventing wheels and standing on the shoulders of giants, there are heaps of trained models out there (with more to come) for developers to make use of. Most deep learning libraries have repositories called model zoos hosting such things. Think of them as package repositories like npm/NuGet/Maven, but for magic instead of code!

Resources:

The notebook, sample image and LINQPad script are shared in the Azure folder of the forked repository.


Yousry Mohamed

Yousry is a lead consultant working for Cuusoo. He is very passionate about all things data including Big Data, Machine Learning and AI.