Open Access

Fig. 1.

Generic architecture of the multimodality network. The number of parallel blocks depends on the number of modalities. The depths of the parallel and common blocks are determined by the type of fusion implemented (early, middle, or late fusion); however, the total network depth is fixed at eight, i.e., each modality passes through eight inception blocks in total. Likewise, the average pooling layers are placed identically in all architectures: before the 1st, 4th, and 6th inception blocks, with a final one after the valid-padding convolution layers. The baseline model without the multimodality approach is a special case in which all image bands are grouped into a single modality. Fixing the depth allows a standardized comparison between the different experiments.
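The layer bookkeeping described in the caption can be sketched in Python as follows. The fixed depth of eight inception blocks and the pooling positions (before the 1st, 4th, and 6th blocks, plus one after the valid-padding convolutions) come from the caption. The names `build_layout`, `parallel_depth`, `Layout`, and the string labels are hypothetical. The caption does not state which split point corresponds to early, middle, or late fusion, so the split is left as a parameter. Placing the valid-padding convolutions and the final pooling in the common stack is also an assumption.

```python
from dataclasses import dataclass

TOTAL_DEPTH = 8           # inception blocks per modality (fixed, per the caption)
POOL_BEFORE = {1, 4, 6}   # 1-indexed inception blocks preceded by average pooling


@dataclass
class Layout:
    parallel: list  # one layer stack per modality (hypothetical representation)
    common: list    # shared layer stack applied after the modality streams are fused


def build_layout(n_modalities: int, parallel_depth: int) -> Layout:
    """Split the fixed 8-block stack into parallel and common parts.

    parallel_depth is a hypothetical parameter: the number of inception blocks
    each modality passes through before fusion (0 = fuse at the input).
    """
    if n_modalities < 1:
        raise ValueError("need at least one modality")
    if not 0 <= parallel_depth <= TOTAL_DEPTH:
        raise ValueError("parallel_depth must be in [0, TOTAL_DEPTH]")

    # Full single-path sequence with pooling inserted before blocks 1, 4 and 6.
    layers = []
    for i in range(1, TOTAL_DEPTH + 1):
        if i in POOL_BEFORE:
            layers.append("avg_pool")
        layers.append(f"inception_{i}")

    # Split point: just after the parallel_depth-th inception block.
    split = 0
    if parallel_depth > 0:
        split = layers.index(f"inception_{parallel_depth}") + 1

    # Assumption: valid-padding convolutions and the last pooling are shared.
    common = layers[split:] + ["valid_conv_layers", "avg_pool"]
    parallel = [list(layers[:split]) for _ in range(n_modalities)]
    return Layout(parallel=parallel, common=common)


def inception_depth_per_modality(layout: Layout) -> int:
    """Count the inception blocks one modality passes through (parallel + common)."""
    stack = layout.parallel[0] + layout.common
    return sum(name.startswith("inception_") for name in stack)


# Hypothetical split points standing in for different fusion types; the
# baseline is the special case of one modality holding all image bands.
for k in (0, 3, 8):
    layout = build_layout(n_modalities=2, parallel_depth=k)
    assert inception_depth_per_modality(layout) == TOTAL_DEPTH
```

Whatever split point is chosen, each modality still traverses exactly eight inception blocks, which is the property that keeps the experiments comparable.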
