“MS-RMAC: Multiscale Regional Maximum Activation of Convolutions for Image Retrieval” improves current Maximum Activation of Convolutions (MAC) feature for image retrieval by using multi-layer and regional MAC. It is published in IEEE Signal Processing Letters in 2017.
- Problem Statement
- Features from a single convolutional layer are not robust enough for shape deformation, scale variation, and heavy occlusion.
- Proposed Solution
- Extract multiscale (MS) regional maximum activation of convolutions features from different layers of the convolutional neural networks.
- Propose aggregating MS features into a single vector by a parameter-free hedge method for image retrieval.
- MS-RMAC feature is aggregating from MS local convolutional feature maps, which does not require PCA-whitening or specialized fine-tuning on the test dataset and can mimic the ability of SIFT descriptors and CNN category-level features.
- Find that features from higher layers capture more semantic information of objects, and thus perform better than features from lower layers.
- Propose parameter-free weighting schemes that boost the effect of highly active semantic responses and improve retrieval accuracy.
Figure 1: Sample regions extracted at three different scales (l = 1, …, 3). The top-left region of each scale (gray colored region) and its neighboring regions toward each direction (dashed borders) are highlighted. We depict the centers of all regions with a red cross.
As shown in Fig. 1, we uniformly sample regions of width at each scale . Also, the regions are sampled to allow around 40% overlap between consecutive retions. Finally, we sum all regions features into a single vector:
- : the layer number of convolutional feature maps
- : RMAC features
- : the total region number
Figure 2: Flowchart of the proposed MS-RMAC feature extraction method. Multiple RMAC features from different layers are concatenated into a vector.
Instead of using the final convolutional layer RMAC feature, we propose to use features extracted from multiple convolutional layers.
- : proposed MS-RMAC feature
- : the total layer number
The image search is then performed by finding the nearest database image to the query and sorting image based on the MS-RMAC feature Euclidean distance, formally
- : distance between two images and
- : weight of different convolutional feature maps, and
Hedge Weight for MS-RMAC
The standard parameter-free hedge algorithm is proposed to tackle decision-theoretic online learning problems in a multiexpert multiround setting. And it can be used to calculate the weights .
In round , the hedge algorithm tries to calculate the weights . The loss of expert is computed as
- : the retrieval accuracy only using one layer RMAC feature
- : the retrieval accuracy using all layer RMAC features with the weights in round .
The standard parameter-free hedge algorithm generates a new weight distribution on all experts by introducing a regret measure defined by
where the weighted average loss among all experts is computed as .
By minimizing the cumulative regret in the first rounds to any expert , the weights will be generated.
The reason why the proposed MS-RMAC feature performs well is two fold:
- Visual representations using MS hierarchical RMAC features are more effective than single-scale CNN features.
- With CNN features from multiple scale, the proposed feature contains both category-level semantic information and fine-grained details information, which account for appearance changes caused by illumination variation, shape deformation, heavy occlusion, and background clutters.
- The hedge weight method for MS-RMAC* is suitable for boosting the effect of highly active semantic responses and improves image retrieval accuracy.