Quick and easy guide to prepare a Detectron sandbox

Yousry Mohamed
8 min read · Jan 12, 2019



Update March 2020:

There is now a next-generation version of Detectron, Detectron2. If you are not strictly interested in Detectron v1, please head over to v2 instead.

Object detection is an exciting topic in deep learning. It takes the basic idea of classifying an object in an image to the next level: finding where the object(s) are inside the image and even what their bounding contours are. Many deep learning models are available for this, and some of them are even published as ONNX models. One of those models is YOLO, and I have a post on hosting and consuming it using Azure Machine Learning Service.

Facebook AI Research released a new system in 2018 called Detectron. Detectron is based on the Caffe2 deep learning library and can be used with several underlying architectures like Mask R-CNN and RetinaNet. From its home page it looked cool, especially since it can detect the contours of the objects it identifies, so I wanted to give it a go.

Using Detectron is supposed to be straightforward as per its getting started page on GitHub. Unfortunately this is not the case, especially with the latest changes to Caffe2. Caffe2 was merged into PyTorch sometime in 2018, and most of the guides for building a Caffe2 environment to try Detectron no longer worked. I tried different combinations and kept hitting walls, although that could be a problem of mine only.

Anyway, I thought I would try again, and instead of preparing a Caffe2 environment I switched to creating a PyTorch environment, assuming it would come with a valid Caffe2 setup. That approach worked fine as of Jan 2019, so I thought I would share it. So let’s get cooking!

Create a PyTorch GPU-enabled docker container

We will not start from scratch, so the easy way is to create a GPU-enabled docker container of PyTorch. The official PyTorch images are found on the PyTorch page on Docker Hub. There are a bunch of different images, but I picked pytorch/pytorch:nightly-devel-cuda9.2-cudnn7 for two reasons. The first is that it has a recent build of PyTorch, which means it will probably have Caffe2 bundled. The second is that it uses CUDA 9; as I recall, I tried the one with CUDA 10 and it failed somewhere along the way. The Detectron documentation says it has been tested on certain versions of CUDA and Python, so pulling the latest and greatest might not always work.

Another thing this setup needs is nvidia-docker. As we will have a GPU-enabled version of PyTorch, we need GPU pass-through from the host to the container. On a Windows box that might be hard unless you have a recent version of Windows Server (2016 or maybe 2019). I don’t know about macOS options, but unless you run a Linux distro, you will need the help of the cloud.

Azure came to the rescue here, as I can simply create a deep learning virtual machine. There are a bunch of settings to provision it, but they are mainly around OS flavour and hardware specs. A Linux one with a single GPU will be good enough, and it also comes with docker, nvidia-docker and GPU drivers pre-installed.

You can create a free trial Azure account if you don’t have one, but the same idea applies to any other cloud. We just need a Linux VM with an NVIDIA GPU and with nvidia-docker and the GPU drivers installed.

Once the VM is created, its public IP address is available in the Azure portal and can be used to SSH into the VM. The first step is to pull the needed image and spawn a container out of it.

sudo docker pull pytorch/pytorch:nightly-devel-cuda9.2-cudnn7
sudo nvidia-docker run --rm -it pytorch/pytorch:nightly-devel-cuda9.2-cudnn7

We can then verify PyTorch is correctly installed and works fine with the GPU.

python -c "import torch;print(torch.cuda.get_device_name(0))"

The next step is to verify Caffe2 is installed as well, but unfortunately if we try the same approach with Caffe2, it fails with an error.

To save you the Googling step, it’s due to the missing Python modules protobuf and future. Once they are installed, things work fine for Caffe2.

pip install protobuf
pip install future
python -c "from caffe2.python import workspace; print(workspace.NumCudaDevices())"
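If you would rather find out what is missing before hitting the error, a small check along these lines could help. It is only a sketch: the helper name missing_modules is made up for illustration, and the list of module names is my assumption based on the packages mentioned above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is absent
            # or when a module carries a broken spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list: torch and caffe2 should ship with the PyTorch image,
    # google.protobuf and future are the two pip packages installed above.
    required = ["torch", "caffe2", "google.protobuf", "future"]
    print("Missing modules:", missing_modules(required) or "none")
```

Anything it reports can then be installed with pip as shown above. Note that it only checks that the modules can be found, not that Caffe2 can actually see the GPU.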

Cool, just with a simple provisioning of an Azure VM plus a few bash commands we have a running Caffe2 environment. Next we need to have Detectron installed inside the container.

Add Detectron to the recipe

As per the installation page of Detectron, the following snippet will get it installed. There will be a couple of warnings during the build, but they don’t break anything.

git clone https://github.com/facebookresearch/Detectron
pip install -r ./Detectron/requirements.txt
cd Detectron && make

To test that the installation is fine, we will run one of the tests mentioned on the same installation page.

python ./detectron/tests/test_spatial_narrow_as_op.py

Fine, but we need to fix one thing before proceeding, which is OpenCV. The version installed along the way is 4.0.0 (at the time of writing), and it comes with two issues. The first is a missing dependency, which we will install now. The second is a breaking change in the findContours function, but we will come to that later.

So first let’s install the missing dependency and verify OpenCV is fine.

# credit goes to : https://www.kaggle.com/c/inclusive-images-challenge/discussion/70226
apt-get update
apt-get install libgtk2.0-dev
python -c "import cv2; print(cv2.__version__)"

One extra thing before we try Detectron is to install the COCO Python API, as it’s referenced from the test script we are going to use shortly. The installation of the COCO API is pretty easy.

cd /workspace
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make install
cd /workspace/Detectron/

Moment of truth

Now it’s time to run one of those simple commands mentioned on Detectron’s getting started page on GitHub and see if things work fine.

python tools/infer_simple.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir /tmp/detectron-visualizations \
--image-ext jpg \
--wts https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
demo

The above script first downloads the weights of a certain model, then uses this model to run inference on a bunch of images in the demo folder. It is supposed to generate one PDF for every input image, containing the image itself with the identified objects highlighted. Running the above, you would probably get an error.

This error is due to the fact that the test scripts coming with Detectron assume OpenCV 3.x, while we now have 4.0.0. The findContours function returned three values in 3.x, but in 4.0.0 it returns only two. It is just a small breaking change we need to work around.

We could use one of those lovely Linux text editors to edit the file detectron/utils/vis.py and fix the two occurrences that expect that extra return value. But I thought I would fork the Detectron repo and add a small version check instead.

That version check will not work for 4.1, but that’s another story. All that is needed now is to drop the current Detectron folder, pull the forked version, and build it.

cd /workspace
rm -rf ./Detectron
git clone https://github.com/ylashin/Detectron
cd Detectron && make
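For readers who would rather patch detectron/utils/vis.py by hand than switch to a fork, here is a minimal sketch of one way to absorb the findContours change. It is not the code from my fork: instead of checking the OpenCV version, it looks at how many values came back, so it should not care whether the version is 4.0 or 4.1. The helper name unpack_contours is made up for illustration, and the call shown in the comment is an assumed example of how it might be used.

```python
def unpack_contours(result):
    """Normalize the return value of cv2.findContours across OpenCV versions.

    OpenCV 3.x returns (image, contours, hierarchy), while OpenCV 4.x
    returns (contours, hierarchy). The last two items are the same in
    both cases, so we always take those.
    """
    if len(result) not in (2, 3):
        raise ValueError(
            "unexpected findContours result of length %d" % len(result))
    contours, hierarchy = result[-2:]
    return contours, hierarchy


# Hypothetical use inside a visualization helper (requires OpenCV):
#   import cv2
#   contours, hierarchy = unpack_contours(
#       cv2.findContours(mask.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE))
```

The idea is to wrap each findContours call that unpacks three values, so the rest of the drawing code stays untouched.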

Update May 2019:

The above OpenCV issue has been fixed now, so there is probably no need to worry about it anymore, and it is better to skip the step above that uses my forked copy.

Alright, let’s try the same test script once again.

python tools/infer_simple.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir /tmp/detectron-visualizations \
--image-ext jpg \
--wts https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
demo

Congrats, nothing failed, and if we list the files in the output folder /tmp/detectron-visualizations we will see the generated PDF files.
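A quick way to confirm the run produced something is to list the PDFs from Python. This is just a convenience sketch; the function name list_pdfs is my own, and the folder is the one passed to --output-dir above.

```python
from pathlib import Path


def list_pdfs(output_dir):
    """Return the sorted names of PDF files directly inside `output_dir`."""
    folder = Path(output_dir)
    if not folder.is_dir():
        # A missing folder simply means nothing was generated yet.
        return []
    return sorted(p.name for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    for name in list_pdfs("/tmp/detectron-visualizations"):
        print(name)
```

An empty result would suggest the inference script failed before writing any output.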

It’s up to you how you pull those PDFs and download them for inspection. Myself, I used docker to copy them from the container to the host VM. Then on the host VM, I put them in a folder under /notebooks, as the Azure deep learning VM comes with Jupyter installed and published externally on port 8000 (using HTTPS). So I could access them using the browser from my Windows laptop and download the PDFs locally.
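The copy step can be sketched as follows, run on the host VM rather than inside the container. The container name my-detectron and the destination folder are placeholders, not values from my setup; the only thing assumed about docker is the standard `docker cp CONTAINER:SRC_PATH DEST_PATH` form.

```python
def docker_cp_command(container, src_path, dest_path, use_sudo=True):
    """Build the argv for `docker cp CONTAINER:SRC_PATH DEST_PATH`."""
    argv = ["docker", "cp", "%s:%s" % (container, src_path), dest_path]
    return ["sudo"] + argv if use_sudo else argv


if __name__ == "__main__":
    # "my-detectron" is a placeholder; `sudo docker ps` shows the real
    # container name or ID. The destination folder is also hypothetical.
    cmd = docker_cp_command("my-detectron",
                            "/tmp/detectron-visualizations",
                            "./detectron-output")
    print(" ".join(cmd))
    # To actually run the copy instead of printing it:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Since the container was started with --rm, the copy has to happen while the container is still running.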

Trying Detectron with some of my images yielded results like the following:

If you have high res images, Detectron will be able to find small objects that you may not even notice when you view the image without zooming 😃

Now what

That was nice and cool. Before we forget, it’s better to open another SSH window and commit that container as a local image. We can also push it to Docker Hub for other people to use directly.

So if you want to save your time, you may try the final image.

sudo docker pull ylashin/detectron
sudo nvidia-docker run --rm -it ylashin/detectron

Other ideas to be investigated now could be:

  1. Exporting Detectron models to ONNX format so that they can be consumed more easily.
  2. Publishing a web service out of this image using Flask or .NET Core. The PDFs generated with object masks are a cool feature that comes with the Detectron source code for testing and verification purposes. But to consume Detectron in a real application, we need to get the plain outcome of the inference and interpret/use it from the caller application.

P.S. The image below the post title is an image from Flickr fed to the same test script above.

Happy detection!!



Yousry Mohamed

Yousry is a lead consultant working for Cuusoo. He is very passionate about all things data including Big Data, Machine Learning and AI.