Speech2Action: Cross-Modal Supervision for Action Recognition

CVPR, pp. 10314-10323, 2020.

DOI: https://doi.org/10.1109/CVPR42600.2020.01033

Abstract:

Is it possible to guess human action from dialogue alone? In this work we investigate the link between spoken words and actions in movies. We note that movie screenplays describe actions, as well as contain the speech of characters, and hence can be used to learn this correlation with no additional supervision. We train a BERT-based Speech2Action classifier on a thousand unaligned movie screenplays to predict action labels from transcribed speech segments, and then apply it to a large unlabelled corpus of videos to obtain weak action labels for video clips from the speech alone.
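Since the paper's central component is a BERT-based text classifier that maps a transcribed speech segment to an action label, a minimal sketch of such a classifier is given below using the Hugging Face transformers library. This is an illustration only, not the authors' implementation: the verb vocabulary, the (speech, verb) training pairs, the bert-base-uncased checkpoint and the hyperparameters are placeholder assumptions.

```python
# Minimal sketch (not the authors' code): fine-tune BERT to map transcribed
# speech segments to action-verb labels, assuming (speech, verb) pairs have
# already been mined from screenplays.
import torch
from torch.utils.data import DataLoader
from transformers import BertTokenizerFast, BertForSequenceClassification

VERBS = ["phone", "run", "kiss", "eat", "drive", "dance"]      # illustrative subset
LABEL2ID = {v: i for i, v in enumerate(VERBS)}

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(VERBS))

# Hypothetical mined training pairs: transcribed speech -> action verb.
pairs = [("Hello, thanks for calling.", "phone"),
         ("Keep up, we're nearly there!", "run")]

def collate(batch):
    texts, labels = zip(*batch)
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
    enc["labels"] = torch.tensor([LABEL2ID[l] for l in labels])
    return enc

loader = DataLoader(pairs, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):                       # toy number of epochs
    for batch in loader:
        out = model(**batch)                 # cross-entropy loss over verb classes
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

At inference time, the softmax over the verb classes produced by such a model is what the Results section later thresholds to decide which verbs are reliably predictable from speech and to assign weak labels to unlabelled clips.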

Introduction
  • You can get a sense of human activity in a movie by listening to the dialogue alone.
  • The words “Hello, thanks for calling” are a good indication that somebody is speaking on the phone.
  • Could this be a valuable source of information for learning good action recognition models?
  • Obtaining large scale human labelled video datasets to train models for visual action recognition is a notoriously challenging task
  • While large datasets, such as Kinetics [20] or Moments in Time [30] consisting of individual short clips (e.g. 10s) are available, these datasets come at formidable human cost and effort.
  • Many such datasets suffer from heavily skewed distributions with long tails – i.e. it is difficult to obtain manual labels for rare or infrequent actions [15]
Highlights
  • Often, you can get a sense of human activity in a movie by listening to the dialogue alone
  • We make the following four contributions: (i) we train a Speech2Action model from literary screenplays, and show that it is possible to predict certain actions from transcribed speech alone without the need for any manual labelling; (ii) we apply the Speech2Action model to a large unlabelled corpus of videos to obtain weak labels for video clips from the speech alone; (iii) we demonstrate that an action classifier trained with these weak labels achieves state of the art results for action classification when fine-tuned on standard benchmarks, compared to other weakly supervised/domain transfer methods; and (iv) more interestingly, we evaluate the action classifier trained only on these weak labels, with no fine-tuning, on the mid and tail classes from the AVA dataset [15] in the zero-shot and few-shot setting, and show a large boost over fully supervised performance for some classes without using a single manually labelled example.
  • We plot the precision-recall curves using the softmax scores obtained from the Speech2Action model (Fig. 6 in the Appendix)
  • The key assumption is that if there is a consistent trend of a verb appearing in the screenplays before or after a speech segment, and our model is able to exploit this trend to minimise a classification objective, we infer that the speech is correlated with the action verb (a minimal sketch of this pair-mining step follows this list)
  • We provide a new data-driven approach to obtain weak labels for action recognition, using speech alone
  • The same principle used here could be applied to mine videos for more general visual content
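To make the pair-mining assumption above concrete, the sketch below harvests (speech, verb) training pairs from a parsed screenplay by collecting verbs from stage directions adjacent to each dialogue block. The block parsing, the verb vocabulary, the one-block window and the NLTK-based verb tagging are illustrative assumptions rather than the authors' exact procedure.

```python
# Minimal sketch (illustrative): pair each dialogue block with action verbs
# found in neighbouring stage directions of a screenplay.
# Requires NLTK data: punkt, averaged_perceptron_tagger, wordnet.
import nltk
from nltk.stem import WordNetLemmatizer

VERB_VOCAB = {"run", "kiss", "eat", "drive", "dance", "shoot"}   # assumed vocabulary
WINDOW = 1                  # how many neighbouring blocks to inspect on each side
lemmatizer = WordNetLemmatizer()

def verbs_in(direction):
    """Lemmatized verbs from a stage direction that fall in the vocabulary."""
    tagged = nltk.pos_tag(nltk.word_tokenize(direction))
    lemmas = {lemmatizer.lemmatize(w.lower(), pos="v")
              for w, tag in tagged if tag.startswith("VB")}
    return lemmas & VERB_VOCAB

def mine_pairs(blocks):
    """blocks: ordered list of (kind, text), kind in {'dialogue', 'direction'}."""
    pairs = []
    for i, (kind, text) in enumerate(blocks):
        if kind != "dialogue":
            continue
        lo, hi = max(0, i - WINDOW), min(len(blocks), i + WINDOW + 1)
        found = set()
        for k, b in blocks[lo:hi]:
            if k == "direction":
                found |= verbs_in(b)
        pairs.extend((text, verb) for verb in sorted(found))
    return pairs

# Toy example
blocks = [("direction", "She runs to catch the bus."),
          ("dialogue", "Wait for me, I'm coming!"),
          ("direction", "He kisses her goodbye.")]
print(mine_pairs(blocks))   # [("Wait for me, I'm coming!", 'kiss'), ("Wait for me, I'm coming!", 'run')]
```

Pairs mined this way are, of course, noisy; the Speech2Action classifier is what turns this noisy proximity signal into per-verb predictions that can then be evaluated and filtered.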
Methods
  • Architecture and pre-training comparison (accuracies in Tables 3 and 6): Shuffle&Learn [29] (S3D-G RGB, UCF101† [37], 35.8), OPN [24] (VGG-M-2048, UCF101†, 23.8), ClipOrder [49] (R(2+1)D, UCF101†, 30.9), Wang et al. [43] (C3D, Kinetics†, 33.4), 3DRotNet [19] (S3D-G RGB, Kinetics†), DPC [16] (3DResNet18, Kinetics†), CBT [38] (S3D-G RGB, Kinetics†), and DisInit (RGB) [14] (R(2+1)D-18 [42], Kinetics∗∗).
Results
  • The authors evaluate the performance of the model on the 220 movie screenplays in the val set.
  • The authors plot the precision-recall curves using the softmax scores obtained from the Speech2Action model (Fig. 6 in the Appendix)
  • Those verbs that achieve an average precision (AP) higher than 0.01 are inferred to be correlated with speech (a sketch of this filtering step follows this list).
  • Because the evaluation is performed purely on the basis of the proximity of speech to verb class in the stage direction of the movie screenplay, it is not a perfect ground truth indication of whether an action will be performed in a video.
  • The authors improve over these works by 3-4%.
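The verb-filtering step just described (per-verb precision-recall from Speech2Action softmax scores on the validation screenplays, keeping verbs whose AP exceeds 0.01) can be sketched as follows. The array layout and the use of scikit-learn's average_precision_score are assumptions made purely for illustration.

```python
# Minimal sketch (illustrative): keep only verbs whose average precision,
# computed from Speech2Action softmax scores against proximity-based ground
# truth on the validation screenplays, exceeds a threshold (0.01 here).
import numpy as np
from sklearn.metrics import average_precision_score

AP_THRESHOLD = 0.01

def correlated_verbs(softmax_scores, true_verbs, verb_names):
    """
    softmax_scores: (num_segments, num_verbs) array of model scores.
    true_verbs:     (num_segments,) array of ground-truth verb indices,
                    derived from verb proximity in the stage directions.
    Returns [(verb, AP)] for verbs deemed correlated with speech.
    """
    keep = []
    for v, name in enumerate(verb_names):
        y_true = (true_verbs == v).astype(int)      # one-vs-rest labels
        if y_true.sum() == 0:                       # verb absent from the val set
            continue
        ap = average_precision_score(y_true, softmax_scores[:, v])
        if ap > AP_THRESHOLD:
            keep.append((name, float(ap)))
    return keep

# Toy call with random scores, only to show the expected shapes.
rng = np.random.default_rng(0)
scores = rng.random((1000, 3))
labels = rng.integers(0, 3, size=1000)
print(correlated_verbs(scores, labels, ["phone", "run", "kiss"]))
```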
Conclusion
  • The authors provide a new data-driven approach to obtain weak labels for action recognition, using speech alone.
  • With only a thousand unaligned screenplays as a starting point, the authors obtain weak labels automatically for a number of rare action classes.
  • The authors note that besides actions, people talk about physical objects, events and scenes – descriptions of which are present in screenplays and books.
Tables
  • Table1: Statistics of the IMSDb dataset of movie screenplays. This dataset is used to learn the correlation between speech and verbs
  • Table2: Number of true positives for 100 randomly retrieved samples for 10 classes. These estimates are obtained through manual inspection of video clips that are labelled with Speech2Action. While the true positive rate for some classes is low, the other samples still contain valuable information for the classifier. For example, although there are only 18 true samples of ‘kiss’, many of the other videos have two people with their lips very close together, or even if they are not ‘eating’ strictly, many times they are holding food in their hands
  • Table3: Action classification results on HMDB51. Pre-training on videos labelled with Speech2Action leads to a 17% improvement over training from scratch and also outperforms previous self-supervised and weakly supervised works.
  • Table4: Per-class average precision for 14 AVA mid and tail classes. These actions occur rarely, and hence are harder to get manual supervision for. For 8 of the 14 classes, we exceed fully supervised performance without a single manually labelled training example
  • Table5: Examples of speech samples for six verb categories labelled with the keyword spotting baseline (a minimal keyword-matching sketch follows this list). Each block shows the action verb on the left, and the speech samples on the right. Since we do not need to use the movie screenplays for this baseline, unlike Speech2Action (results in Table 2 of the main paper), we show examples of transcribed speech obtained directly from the unlabelled corpus. Note how the speech labelled with the verb ‘point’ is indicative of a different semantic meaning to the physical action of ‘pointing’.
  • Table6: Comparison with previous pre-training strategies for action classification on UCF101. Training on videos labelled with Speech2Action leads to a 7% improvement over training from scratch and outperforms previous self-supervised works. It also performs competitively with other weakly supervised works
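Table 5 refers to a keyword spotting baseline. The summary does not spell out the procedure, but as the name suggests, a plausible minimal version labels a speech segment with any action verb whose lemma appears literally in the transcript, with no word-sense disambiguation. The sketch below, using NLTK lemmatization and an illustrative verb list, is an assumption rather than the authors' implementation.

```python
# Minimal sketch (assumed behaviour, not the authors' code): label a speech
# segment with every action verb that appears in it after lemmatization.
# Note the lack of word-sense disambiguation: the verb 'point' also fires on
# "You've missed the point." -- exactly the failure mode noted in Table 5.
# Requires NLTK data: punkt, wordnet.
import nltk
from nltk.stem import WordNetLemmatizer

ACTION_VERBS = {"point", "kiss", "eat", "drive", "open", "push"}   # illustrative
lemmatizer = WordNetLemmatizer()

def keyword_spot(speech):
    lemmas = {lemmatizer.lemmatize(w.lower(), pos="v")
              for w in nltk.word_tokenize(speech)}
    return sorted(lemmas & ACTION_VERBS)

print(keyword_spot("Look at where I am pointing."))   # ['point']
print(keyword_spot("You've missed the point."))       # ['point'] -- wrong sense
```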
Related work
  • Aligning Screenplays to Movies: A number of works have explored the use of screenplays to learn and automatically annotate character identity in TV series [6, 10, 31, 36, 40]. Learning human actions from screenplays has also been attempted [2, 9, 23, 26, 27]. Crucially, however, all these works rely on aligning these screenplays to the actual videos themselves, often using the speech (as subtitles) to provide correspondences. However, as noted by [2], obtaining supervision for actions in this manner is challenging due to the lack of explicit correspondence between scene elements in video and their textual descriptions in screenplays.

    Apart from the imprecise temporal localization inferred from subtitle correspondences, a major limitation is that this method is not scalable to all movies and TV shows, since screenplays with stage directions are simply not available at the same order of magnitude. Hence previous works have been limited to a small scale, no more than tens of movies or a season of a TV series [2, 9, 23, 26, 27]. A similar argument can be applied to works that align books to movies [41, 53]. In contrast, we propose a method that can exploit the richness of information in a modest number of screenplays, and then be applied to a virtually limitless set of edited video material with no alignment or manual annotation required.
Funding
  • Acknowledgments: Arsha is supported by a Google PhD Fellowship
Reference
  • Relja Arandjelovic and Andrew Zisserman. Look, listen and learn. In Proceedings of the IEEE International Conference on Computer Vision, pages 609–617, 2017.
  • Piotr Bojanowski, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, and Josef Sivic. Finding actors and actions in movies. In Proceedings of the IEEE international conference on computer vision, pages 2280–2287, 2013.
  • Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. Activitynet: A large-scale video benchmark for human activity understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 961–970, 2015.
  • Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? a new model and the Kinetics dataset. In proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017.
  • Luciano Del Corro, Rainer Gemulla, and Gerhard Weikum. Werdy: Recognition and disambiguation of verbs and verb phrases with syntactic and semantic pruning. 2014.
  • Timothee Cour, Benjamin Sapp, Chris Jordan, and Ben Taskar. Learning from ambiguously labeled images. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 919–926. IEEE, 2009.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • Olivier Duchenne, Ivan Laptev, Josef Sivic, Francis Bach, and Jean Ponce. Automatic annotation of human actions in video. In 2009 IEEE 12th International Conference on Computer Vision, pages 1491–1498. IEEE, 2009.
  • Mark Everingham, Josef Sivic, and Andrew Zisserman. “Hello! My name is... Buffy” – automatic naming of characters in TV video. In BMVC, 2006.
  • Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 6202–6211, 2019.
  • David F Fouhey, Wei-cheng Kuo, Alexei A Efros, and Jitendra Malik. From lifestyle vlogs to everyday interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4991–5000, 2018.
  • Deepti Ghadiyaram, Du Tran, and Dhruv Mahajan. Largescale weakly-supervised pre-training for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12046–12055, 2019.
  • Rohit Girdhar, Du Tran, Lorenzo Torresani, and Deva Ramanan. Distinit: Learning video representations without a single labeled video. ICCV, 2019.
  • Chunhui Gu, Chen Sun, David A Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik. AVA: A video dataset of spatio-temporally localized atomic visual actions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6047–6056, 2018.
  • Tengda Han, Weidi Xie, and Andrew Zisserman. Video representation learning by dense predictive coding. In Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.
  • Haroon Idrees, Amir R Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, and Mubarak Shah. The THUMOS challenge on action recognition for videos in the wild. Computer Vision and Image Understanding, 155:1–23, 2017.
  • Oana Ignat, Laura Burdick, Jia Deng, and Rada Mihalcea. Identifying visible actions in lifestyle vlogs. arXiv preprint arXiv:1906.04236, 2019.
  • Longlong Jing and Yingli Tian. Self-supervised spatiotemporal feature learning by video geometric transformations. arXiv preprint arXiv:1811.11387, 2018.
  • W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, M. Suleyman, and A. Zisserman. The Kinetics human action video dataset. CoRR, abs/1705.06950, 2017.
  • Bruno Korbar, Du Tran, and Lorenzo Torresani. Cooperative learning of audio and video models from self-supervised synchronization. In Advances in Neural Information Processing Systems, pages 7763–7774, 2018.
  • Hildegard Kuehne, Hueihan Jhuang, Estıbaliz Garrote, Tomaso Poggio, and Thomas Serre. Hmdb: a large video database for human motion recognition. In 2011 International Conference on Computer Vision, pages 2556–2563. IEEE, 2011.
  • Ivan Laptev, Marcin Marszałek, Cordelia Schmid, and Benjamin Rozenfeld. Learning realistic human actions from movies. In IEEE Conference on Computer Vision & Pattern Recognition, 2008.
  • Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, and MingHsuan Yang. Unsupervised representation learning by sorting sequences. In Proceedings of the IEEE International Conference on Computer Vision, pages 667–676, 2017.
  • Edward Loper and Steven Bird. Nltk: the natural language toolkit. arXiv preprint cs/0205028, 2002.
  • Marcin Marszałek, Ivan Laptev, and Cordelia Schmid. Actions in context. In CVPR 2009-IEEE Conference on Computer Vision & Pattern Recognition, pages 2929–2936. IEEE Computer Society, 2009.
  • Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, and Josef Sivic. Learning from video and text via large-scale discriminative clustering. In Proceedings of the IEEE international conference on computer vision, 2017.
  • Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, and Josef Sivic. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. In Proceedings of the IEEE international conference on computer vision, 2019.
  • Ishan Misra, C Lawrence Zitnick, and Martial Hebert. Shuffle and learn: unsupervised learning using temporal order verification. In European Conference on Computer Vision, pages 527–544.
  • Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Yan Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, et al. Moments in time dataset: one million videos for event understanding. IEEE transactions on pattern analysis and machine intelligence, 2019.
  • Iftekhar Naim, Abdullah Al Mamun, Young Chol Song, Jiebo Luo, Henry Kautz, and Daniel Gildea. Aligning movies with scripts by exploiting temporal ordering constraints. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 1786–1791. IEEE, 2016.
  • Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 631–648, 2018.
  • Andrew Owens, Jiajun Wu, Josh H McDermott, William T Freeman, and Antonio Torralba. Ambient sound provides supervision for visual learning. In European conference on computer vision, pages 801–816.
  • Christopher Riley. The Hollywood standard: the complete and authoritative guide to script format and style. Michael Wiese Productions, 2009.
  • Gunnar A Sigurdsson, Gul Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. Hollywood in homes: Crowdsourcing data collection for activity understanding. In European Conference on Computer Vision, pages 510–526.
  • Josef Sivic, Mark Everingham, and Andrew Zisserman. “Who are you?” – Learning person specific classifiers from video. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 1145–1152. IEEE, 2009.
  • Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
  • Chen Sun, Fabien Baradel, Kevin Murphy, and Cordelia Schmid. Contrastive bidirectional transformer for temporal representation learning. arXiv preprint arXiv:1906.05743, 2019.
  • Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, and Jie Zhou. Coin: A large-scale dataset for comprehensive instructional video analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1207–1216, 2019.
  • Makarand Tapaswi, Martin Bauml, and Rainer Stiefelhagen. “Knock! Knock! Who is it?” Probabilistic person identification in TV-series. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2658–2665. IEEE, 2012.
  • Makarand Tapaswi, Martin Bauml, and Rainer Stiefelhagen. Book2movie: Aligning video scenes with book chapters. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
  • Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 6450–6459, 2018.
  • Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, and Wei Liu. Self-supervised spatio-temporal representation learning for videos by predicting motion and appearance statistics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4006– 4015, 2019.
  • Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In ECCV, 2016.
  • Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In CVPR, 2018.
  • David R Winer and R Michael Young. Automated screenplay annotation for extracting storytelling knowledge. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2017.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
  • Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, and Kevin Murphy. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In Proceedings of the European Conference on Computer Vision (ECCV), pages 305–321, 2018.
  • Dejing Xu, Jun Xiao, Zhou Zhao, Jian Shao, Di Xie, and Yueting Zhuang. Self-supervised spatiotemporal learning via video clip order prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10334–10343, 2019.
  • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, and Antonio Torralba. The sound of pixels. In Proceedings of the European Conference on Computer Vision (ECCV), pages 570–586, 2018.
  • Hang Zhao, Zhicheng Yan, Heng Wang, Lorenzo Torresani, and Antonio Torralba. SLAC: A sparsely labeled dataset for action classification and localization. arXiv preprint arXiv:1712.09374, 2017.
  • Luowei Zhou, Chenliang Xu, and Jason J Corso. Towards automatic learning of procedures from web instructional videos. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision, pages 19–27, 2015.

Note (on how the speech segments are mined): there is no word-sense disambiguation, i.e. ‘Look at where I am pointing’ vs. ‘You’ve missed the point’. Word-sense disambiguation is the task of identifying which sense of a word is used in a sentence when a word has multiple meanings. This task tends to be more difficult with verbs than nouns, because verbs have more senses on average than nouns and may be part of a multiword phrase [5].