Searching academic databases and digital libraries for scholarly articles on "multimodal information retrieval" returns a considerable number of papers. For instance, Google Scholar lists about 200 papers, and the top results are related to image retrieval,
I sent my research proposal to a doctoral symposium, where it was accepted for presentation and publication. Two of the three referees pointed out that "multimodal" is a confusing term for our intention of combining text and visual features. Later, at the defense of my research proposal, one of the two committee members also recommended changing that term. At that point I became unsure about the right use of the word. I believe I have enough evidence that "multimodal" has been used in the sense I intend, but the comments from other experts contradict it.
As far as I understand, "multimodal" may be used to describe the interaction between a user and a system through different devices, as one of the referees indicated in his review (multimodal interaction). On the other hand, when someone talks about multimodal data, it means that several sensors measure different aspects of the same phenomenon (such as this). So, from the multimodal data perspective, images and text would be measurements of the same phenomenon: a meaning or a semantic unit. However, that framing seems complicated and unnatural, both to explain and to understand.
The discussion about multimodal data in the context of our research is still open. Maybe we can publish a review paper to discuss it with many other people, at an information retrieval conference for instance. Meanwhile, I think I'll avoid the term unless we can be sure that it will be correctly understood.
[Lew2006] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, “Content-based multimedia information retrieval: State of the art and challenges,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, no. 1, pp. 1–19, February 2006.
[Datta2008] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influences, and trends of the new age,” ACM Comput. Surv., vol. 40, no. 2, pp. 1–60, April 2008.