“MS-RMAC: Multiscale Regional Maximum Activation of Convolutions for Image Retrieval” improves current Maximum Activation of Convolutions (MAC) feature for image retrieval by using multi-layer and regional MAC. It is published in IEEE Signal Processing Letters in 2017.


  • Problem Statement
    • Features from a single convolutional layer are not robust enough for shape deformation, scale variation, and heavy occlusion.
  • Proposed Solution
    • Extract multiscale (MS) regional maximum activation of convolutions features from different layers of the convolutional neural networks.
    • Propose aggregating MS features into a single vector by a parameter-free hedge method for image retrieval.
  • Contribution
    • MS-RMAC feature is aggregating from MS local convolutional feature maps, which does not require PCA-whitening or specialized fine-tuning on the test dataset and can mimic the ability of SIFT descriptors and CNN category-level features.
    • Find that features from higher layers capture more semantic information of objects, and thus perform better than features from lower layers.
    • Propose parameter-free weighting schemes that boost the effect of highly active semantic responses and improve retrieval accuracy.


MS-RMAC Representation

Regions Figure 1: Sample regions extracted at three different scales (l = 1, …, 3). The top-left region of each scale (gray colored region) and its neighboring regions toward each direction (dashed borders) are highlighted. We depict the centers of all regions with a red cross.

Computing RMAC

As shown in Fig. 1, we uniformly sample regions of width at each scale . Also, the regions are sampled to allow around 40% overlap between consecutive retions. Finally, we sum all regions features into a single vector:

  • Notation
    • : the layer number of convolutional feature maps
    • : RMAC features
    • : the total region number

Computing MS-RMAC

MS-RMAC Figure 2: Flowchart of the proposed MS-RMAC feature extraction method. Multiple RMAC features from different layers are concatenated into a vector.

Instead of using the final convolutional layer RMAC feature, we propose to use features extracted from multiple convolutional layers.

  • Notation
    • : proposed MS-RMAC feature
    • : the total layer number

The image search is then performed by finding the nearest database image to the query and sorting image based on the MS-RMAC feature Euclidean distance, formally

  • Notation
    • : distance between two images and
    • : weight of different convolutional feature maps, and

Hedge Weight for MS-RMAC

The standard parameter-free hedge algorithm is proposed to tackle decision-theoretic online learning problems in a multiexpert multiround setting. And it can be used to calculate the weights .

In round , the hedge algorithm tries to calculate the weights . The loss of expert is computed as

  • Notation
    • : the retrieval accuracy only using one layer RMAC feature
    • : the retrieval accuracy using all layer RMAC features with the weights in round .

The standard parameter-free hedge algorithm generates a new weight distribution on all experts by introducing a regret measure defined by

where the weighted average loss among all experts is computed as .

By minimizing the cumulative regret in the first rounds to any expert , the weights will be generated.



The reason why the proposed MS-RMAC feature performs well is two fold:

  1. Visual representations using MS hierarchical RMAC features are more effective than single-scale CNN features.
    • With CNN features from multiple scale, the proposed feature contains both category-level semantic information and fine-grained details information, which account for appearance changes caused by illumination variation, shape deformation, heavy occlusion, and background clutters.
  2. The hedge weight method for MS-RMAC* is suitable for boosting the effect of highly active semantic responses and improves image retrieval accuracy.