Language Science Press

Multiword Expressions and Neology: Corpus Analysis and NLP Approaches


Background

Following our participation in the ENEOLI COST Action (Chair: Giovanni Luca Tallarico; Vice-Chair: Rute Costa) and the NeoLex project (Chen, Dao, Nouvel & Delaporte 2025), we are particularly interested in phraseology and multiword expressions, neology, corpus linguistics, natural language processing (NLP), and AI. In this context, we are pleased to announce the call for papers for the collective volume Multiword Expressions and Neology: Corpus Analysis and NLP Approaches, to be published by Language Science Press and scheduled for publication in June 2027.


Keywords

Phraseology, multiword expressions, neology, LLMs, NLP, and AI


Arguments

Phraseology (Cowie 1998; Granger & Meunier 2008; Mitkov 2017; Mel’čuk 2023; Polguère 2002, 2014; Mejri 2018; Chen 2021) and multiword expressions (MWEs) (Savary 2008; Constant 2012) play a central role in language structure, lexical creativity, and cultural expression (Chen 2022a, 2022b). While MWEs have traditionally been associated with stability and conventionalization, contemporary language use, particularly in digital media, journalism, and social networks, reveals a growing number of innovative phrasemes that challenge existing theoretical and computational models (Chen 2025).

In recent years, phraseological neologisms (Lombard, Huyghe & Gygax 2021; Serebriak 2024), that is, newly created or resemanticized idioms, collocations, proverbs and paroemias, locutions, and support verb constructions, have become a key area of interest in linguistics, lexicography, and natural language processing (NLP). These innovative phrasemes often arise in media discourse, social networks, and digital communication.

Following Lombard, Huyghe, and Gygax (2021), phraseological neologisms are defined as “new multiword expressions that result from the conventional use of a phrase and are characterized by non-compositionality.” For instance, the French expression être au taquet (lit. “to be at the cleat”) has recently been lexicalized with the meaning “to be going full throttle.”

These expressions often exhibit a paradoxical combination of lexical fixity and semantic dynamism, making them difficult to detect, annotate, and model automatically.

To better situate phraseological neologisms within broader processes of lexical innovation, it is first necessary to clarify what is meant by neologism in linguistic theory.

The Oxford Dictionary of English defines a neologism as “a newly coined word or expression” (Soanes & Stevenson, 2005: 1179), while Webster’s Third New International Dictionary adopts a broader perspective, referring to “a new word, usage, or expression” (Gove et al., 1993: 1516), thereby explicitly integrating the notion of usage. Neologisms reflect ongoing social and cultural change and constitute a key indicator of linguistic vitality, as illustrated by recent lexical innovations such as teleworking, cryptocurrency, or generative artificial intelligence. As Crystal (1996: 73) notes, “the invention of new words is perhaps the most obvious way in which a language can exceed its existing resources.” Importantly, neology is not limited to the creation of new lexical items, but also encompasses innovative constructional patterns, morphological schemas, and shifts in grammatical categories.

From a natural language processing (NLP) perspective, the detection and analysis of neologisms, often approached through the identification of unknown or out-of-vocabulary words, has been the object of several research initiatives. In the French context, projects such as EDyLex (Sagot & Nouvel 2013a, 2013b) and the Néoveille platform have proposed methods for the automatic detection, linguistic characterization, and monitoring of lexical innovation in large corpora, combining rule-based and statistical approaches. More recently, projects such as NeoLex have sought to extend these approaches by integrating lexicographic modeling and ontological structuring of neological data, with a particular emphasis on Chinese and Vietnamese, two languages that remain underrepresented in existing neology-oriented NLP projects (Chen, Dao et al. 2025; Chen, Nouvel, Dao et al. 2026). At the European level, the COST Action ENEOLI (2024–2027) aims to structure research on lexical innovation by fostering collaboration between linguistic, sociological, and computational approaches. Despite these efforts, NLP research on neology remains largely focused on single-word units, and the systematic modeling of more complex phenomena, such as phraseological neologisms, remains underdeveloped. Recent work, including studies presented at the ACL Workshop on Multiword Expressions (MWE), on neural identification, multilingual MWE detection, and large language model-based approaches further highlights the growing interest in modeling MWEs in contemporary NLP (Baldwin et al. 2023; Barbu Mititelu et al. 2024; Hadj Mohamed et al. 2024; Kissane et al. 2025; Laureano De Leon et al. 2025; Miletić & Schulte im Walde 2024; Savary et al. 2023).
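As a purely illustrative sketch of the out-of-vocabulary approach mentioned above, the following Python fragment flags tokens that are absent from a reference lexicon. The function name, the toy lexicon, and the frequency threshold are hypothetical choices made for this example; they do not reproduce the methods of EDyLex, Néoveille, or NeoLex.

```python
import re
from collections import Counter

def find_oov_candidates(text, known_lexicon, min_count=2):
    """Return tokens absent from the reference lexicon, with their frequencies.

    Assumption: a flat set of lowercase word forms stands in for a real
    lexical resource; no lemmatization or named-entity filtering is done.
    """
    # Runs of letters, optionally joined by an internal hyphen or apostrophe.
    tokens = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())
    counts = Counter(tok for tok in tokens if tok not in known_lexicon)
    # A frequency threshold removes one-off typos and hapaxes.
    return {tok: n for tok, n in counts.items() if n >= min_count}

# Toy reference lexicon and text (hypothetical data).
lexicon = {"the", "team", "is", "now", "and", "pays", "in", "loves"}
sample = ("The team is teleworking now and pays in cryptocurrency. "
          "The team loves teleworking.")
candidates = find_oov_candidates(sample, lexicon)
print(candidates)
```

A filter of this kind only yields single-word candidates, which is why it cannot, on its own, capture phraseological neologisms built from already known words.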

While neology in single-word units has been extensively studied, neology in phraseology and multiword expressions (MWEs) remains comparatively underexplored. Research has long demonstrated that phraseological fixedness is not static: under the influence of social, economic, and political change, idioms and fixed expressions may undergo significant formal and semantic transformations. Studies on défigement, the creative “unfreezing” of fixed expressions (Mejri 2013; Chen 2022), and on variation have shown that phraseological units are subject to creative manipulation, contextual adaptation, and semantic reanalysis.

For each phraseological unit, it is therefore crucial to identify not only a canonical (lemmatized) form, but also its possible variants, including conjugational patterns, agreement constraints, alternative spellings, and syntactic realizations. Particular attention must be paid to discontinuous or disjunctive realizations, in which the components of a multiword unit are separated in the sentence while preserving their idiomatic meaning. As illustrated by Polguère (2020), expressions such as poser un lapin à quelqu’un (“to stand someone up”) may appear in syntactic variants like “At the good friends’ meeting, there were often no rabbits”, while avoir la pêche (“to feel energetic”) may occur in elliptical realizations such as “What energy this morning!” Such cases demonstrate that collocations and fixed expressions can surface in dislocated or truncated forms, a phenomenon well documented in phraseological studies.

Phraseological innovation offers a privileged perspective on language change, shedding light on the interaction between lexical dependencies, semantic restructuring, usage frequency, and socio-cultural factors. While multiword expressions (MWEs), which encompass idiomatic expressions, collocations, fixed phrases, support verb constructions, and named entities, are widely acknowledged as central units of linguistic description and processing, their neological dynamics remain comparatively underexplored, particularly from a computational perspective (Sag et al. 2002; Gross 1982, 1996).

The present volume, Multiword Expressions and Neology: Corpus Analysis and NLP Approaches, aims to address a significant gap in current research by focusing on the intersection of phraseology, neology, MWEs, corpus linguistics, lexical semantics, computational linguistics, lexicography and natural language processing (NLP). Some approaches have proven effective for identifying MWEs and capturing their internal variability, whether through statistical association measures, syntactic patterns, or machine-learning models (Smadja, 1993; Pecina, 2010; Ramisch et al., 2010a, 2010b; Vincze et al., 2011).
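To make the notion of a statistical association measure concrete, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one classic measure in this family. It is an illustration only: the function name, the toy token list, and the frequency threshold are hypothetical, and the sketch is not the procedure of any of the works cited above.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated by relative frequency. Rare pairs are skipped because
    PMI is known to overrate low-frequency events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy corpus built around the expression "au taquet" (hypothetical data).
toy_tokens = ("she is au taquet today he was au taquet again "
              "they are au taquet").split()
scores = bigram_pmi(toy_tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A positive score indicates that the two words co-occur more often than chance would predict. Because the sketch only considers adjacent pairs, discontinuous realizations of an MWE would require a window-based or syntax-based extension.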

The volume thus seeks to bring together contributions from computational linguistics, corpus linguistics, lexicography, psycholinguistics, and theoretical linguistics, with a strong emphasis on studies that combine theoretical insight with empirical and computational validation. It focuses on how new multiword combinations are created, disseminated, conventionalized, and cognitively processed, as well as on how they can be automatically identified, modeled, and represented in NLP systems and lexicographic resources. Unlike traditional approaches that emphasize the stability and fixedness of MWEs, contemporary corpus data reveal a growing number of phraseological innovations arising from metaphorical extension, semantic shift, calquing, ellipsis, controlled variation, and contextual reuse. These phenomena challenge rigid classifications of MWEs and call for updated theoretical frameworks capable of accounting for both degrees of semantic opacity and patterns of syntactic flexibility (Gross 1986; Mel’čuk 2011). Cross-linguistic and multilingual research is particularly encouraged, especially work involving typologically distinct languages such as English and French (Indo-European), Chinese (Sino-Tibetan, largely analytic), and Vietnamese (Austroasiatic, analytic and tonal). Such perspectives are essential for understanding how cultural, cognitive, and structural factors shape the formation, diffusion, and conventionalization of neological MWEs, and for developing robust models capable of capturing phraseological innovation across languages.

Topics of interest include, but are not limited to:

1. Mechanisms and Drivers of Phraseological Innovation
mechanisms of phraseological innovation, including metaphor, blending, ellipsis, calquing, re-motivation, and semantic drift
phraseological neology: creation, resemanticization, and stabilization of MWEs
semantic drift and re-motivation in idioms and collocations
sociocultural and pragmatic factors driving phraseological innovation

2. Identification and Detection of Neological MWEs
identification and annotation of neological MWEs in corpora and treebanks
corpus-based methods for detecting new multiword units
computational methods for detecting new MWEs, including distributional semantics, contextual embeddings, clustering, and language models

3. Diachronic and Evolutionary Perspectives
diachronic analysis of MWE neology in large-scale corpora
distributional and embedding-based approaches to MWE evolution
computational modeling of variation, stability, and entropy in MWEs

4. Semantic and Syntactic Modeling
semantic and syntactic modeling of innovative MWEs within frameworks such as Construction Grammar (CxG), HPSG, LFG, Universal Dependencies, and Meaning–Text Theory
annotation frameworks based on Meaning–Text Theory (MTT) and lexical functions

5. Cross-linguistic and Multilingual Studies
cross-linguistic comparison and multilingual studies of emerging MWEs, especially across typologically diverse languages (e.g. EN–FR–ZH–DE–VI)

6. Lexicographic and Digital Resources
construction of lexicographic resources for neological MWEs
digital resources and lexicons documenting MWE neology, including bilingual and multilingual databases
contributions presenting reusable resources (corpora, lexicons, annotation schemes) or reproducible tools are particularly encouraged

7. NLP Applications and Language Technologies
NLP applications for MWE extraction, semantic disambiguation, and multilingual alignment
treatment of neological MWEs in NLP applications, including machine translation, terminology extraction, and text generation
evaluation of emerging MWEs in large language models (LLMs) and generative systems


Bibliography (selection)

Baldwin, T., Croft, W., Nivre, J., Savary, A., Stymne, S., et al. (2023). Universals of linguistic idiosyncrasy in multilingual computational linguistics (Dagstuhl Seminar 23191). Dagstuhl Reports, 13(5), 22–70. https://doi.org/10.4230/DagRep.13.5.22

Barbu Mititelu, V., Giouli, V., Evang, K., Zeman, D., Osenova, P., Tiberius, C., Krek, S., Markantonatou, S., Stoyanova, I., Stanković, R., & Chiarcos, C. (2024). Multiword expressions between the corpus and the lexicon: Universality, idiosyncrasy, and the lexicon-corpus interface. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024 (pp. 147–153). ELRA & ICCL.

Chen, L. (2025). Modeling and structuring of a bilingual French–Chinese phraseological dictionary: Neural automatic approach for ontology and lexicography. In eLex 2025: Electronic lexicography in the 21st century – Intelligent lexicography (pp. 830–851). Lexical Computing CZ.

Chen, L., Nouvel, D., Dao, H. L., & Delaporte, A. (2026). Étude et détection automatique des néologismes en chinois et en vietnamien (2015–2025) [Conference presentation, accepted]. Lexiques – Lexicons – Lexik, ATILF-CNRS, Université de Lorraine, Nancy, France.

Chen, L., Dao, H. L., Nouvel, D., & Delaporte, A. (2025). Extraction automatique et modélisation lexicale des néologismes en chinois et en vietnamien (2015–2025) : Le projet NeoLex. NLP & TAL – Traitement automatique des langues, INALCO, 1–75.

Chiarcos, C., Ionov, M., Apostol, E.-S., Gkirtzou, K., Kabashi, B., Khan, A. F., & Truică, C.-O. (2024). Multiword expressions, collocations and the OntoLex vocabulary. In V. Giouli & V. Barbu Mititelu (eds.), Multiword expressions in lexical resources: Linguistic, lexicographic, and computational perspectives (pp. 187–227). Berlin: Language Science Press. https://langsci-press.org/catalog/book/440

Constant, M. (2012). Mettre les expressions multi-mots au cœur de l’analyse automatique de textes : Sur l’exploitation de ressources symboliques externes (Habilitation à diriger des recherches, Université Paris-Est).

Kissane, H., Schilling, A., & Krauss, P. (2025). Probing internal representations of multi-word verbs in large language models. In Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025) (pp. 7–13). Association for Computational Linguistics.

Laureano De Leon, F. A., Abbas, A., Madabushi, H. T., & Lee, M. (2025). Evaluating large language models on multiword expressions in multilingual and code-switched contexts. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing: Natural Language Processing in the Generative AI Era (pp. 644–653). INCOMA Ltd.

Lombard, A., Huyghe, R., & Gygax, P. (2021). Neological intuition in French: A study of formal novelty and lexical regularity as predictors. Lingua, 254, Article 103055.

Mel’čuk, I. A. (2023). General phraseology: Theory and practice. John Benjamins.

Miletić, F., & Schulte im Walde, S. (2024). Semantics of multiword expressions in transformer-based models: A survey. Transactions of the Association for Computational Linguistics, 12, 593–612.

Savary, A., Liu, J., Pierredon, A., Antoine, J.-Y., & Grobol, L. (2023). We thought the eyes of coreference were shut to multiword expressions and they mostly are. Journal of Language Modelling, 11(1), 147–187.

Savary, A., Stymne, S., Barbu Mititelu, V., Schneider, N., Ramisch, C., et al. (2023). PARSEME meets Universal Dependencies: Getting on the same page in representing multiword expressions. Northern European Journal of Language Technology, 9(1). https://doi.org/10.3384/nejlt.2000-1533.2023.4453

Zilio, L., & Kabashi, B. (2024). Using neural machine translation for normalising historical documents. In 21st EURALEX International Congress: Lexicography and Semantics, Cavtat/Dubrovnik, Croatia. http://euralex.org/wp-content/themes/euralex/proceedings/Euralex%202024/EURALEX2024_Pr_p827-839_Zilio-Kabashi.pdf


Submission Guidelines

Please submit your proposal, comprising a title, a 500-word abstract, a bibliography, and optionally 3–5 keywords, by June 15, 2026 via the submission platform. When submitting, select “Language Science Press: MWEs and Neology (June 15, 2026)” at the following link:
https://phraseo-mwe2026.sciencesconf.org/submission/submit?lang=en


Reviewing Process

The volume will follow the publisher’s early two-reviewer (E2R) system. The reviewing process will proceed as follows:

  • The first revision phase, including the selection of submitted chapters, will be supervised by the Volume Editors (VEs). Each chapter will be reviewed by two anonymous external reviewers (Reviewer 1s), recruited and coordinated by the VEs. The evaluations provided by these reviewers constitute the primary basis for revisions and are of central importance to the review process.

  • A second reviewer appointed by the Series Editor (Reviewer 2) will have access to the full set of first-round reviews. Reviewer 2’s role is to ensure that the review process has been conducted with due diligence and to provide final approval before the Reviewer 1 reports, together with any additional recommendations, are transmitted to the authors via the Volume Editors.

  • Upon acceptance by Reviewer 2, the Series Editor will inform the Volume Editors that the first-level reviews may be forwarded to the authors and that the volume is considered accepted under the E2R procedure. Authors will then incorporate the requested revisions.

  • The Volume Editors will compile the final version of the manuscript, integrating all revisions and any additional recommendations. Acceptance for proof-reading by the publisher will be based on the final submitted volume.

Guest Editors

Lian CHEN 陈恋 (LLL, University of Orléans, CRLAO-CNRS-INALCO, France)
lian.chen@univ-orleans.fr

Besim KABASHI (Computational Linguistics, University of Tübingen, Germany)
besim.kabashi@fau.de

HuyLinh DAO 匋辉靈 (CRLAO-CNRS-INALCO-EHESS, France)
huy-linh.dao@inalco.fr


Chief Editors

Mike Rosner (University of Malta, Malta)
mike.rosner@um.edu.mt

Petya Osenova (Sofia University "St. Kliment Ohridski", Sofia, Bulgaria)
petya@bultreebank.org


Important Dates

1 March 2026: Call for abstracts
15 June 2026: Submission of abstracts
10 July 2026: Notification of acceptance
30 November 2026: Submission of full chapters
15 January 2027: Return of Reviewer 1 reports
22 January 2027: Submission to Reviewer 2 (final versions of chapters, Reviewer 1 comments, and preliminary covering materials from the VEs)
22 February 2027: Reviewer 2 report received; Reviewer 1 reports sent to authors
15 April 2027: Submission of revised chapters (final version)
15 May 2027: Final checks and bibliography compilation
1 June 2027: Final volume submitted to Language Science Press

Contact

For any questions regarding the Phraseo-MWE 2026 workshop, please contact us at the following addresses:
lian.chen@univ-orleans.fr
besim.kabashi@fau.de
huylinh.dao@inalco.fr


Scientific Committee (in preparation)
