Multiword Expressions and Neology: Corpus Analysis and NLP Approaches
Background
Following our participation in the ENEOLI COST Action (Chair: Giovanni Luca Tallarico; Vice-Chair: Rute Costa) and NeoLex (Chen, Dao, Nouvel, Delaporte, 2025), we are particularly interested in the fields of phraseology and multiword expressions, neology, corpus linguistics, natural language processing (NLP), and AI. In this context, we are pleased to announce the call for papers for the special issue Multiword Expressions and Neology: Corpus Analysis and NLP Approaches, a collective volume published by Language Science Press and scheduled for publication in December 2027.
Keywords
Phraseology, multiword expressions, neology, LLMs, NLP, and AI
Rationale
Phraseological units (Cowie 1998; Granger & Meunier 2008; Mitkov 2017; Mel’čuk 2023; Polguère 2002, 2014; Mejri 2018; Chen 2021), or multiword expressions (MWEs) (Savary 2008; Constant 2012), play a central role in language structure, lexical creativity, and cultural expression (Chen 2022a, 2022b). While MWEs have traditionally been associated with stability and conventionalization, contemporary language use, particularly in digital media, journalism, and social networks, reveals a growing number of innovative phrasemes that challenge existing theoretical and computational models (Chen 2025).
In recent years, the emergence of phraseological neologisms (Lombard, Huyghe, Gygax 2021; Serebriak 2024), that is, newly created or resemanticized idioms, collocations, proverbs and paroemias, locutions, and support verb constructions, has become a key area of interest in linguistics, lexicography, and natural language processing (NLP). These innovative phrasemes often arise in media discourse, social networks, and digital communication.
Following Lombard, Huyghe, and Gygax (2020), phraseological neologisms are defined as “new multiword expressions that result from the conventional use of a phrase and are characterized by non-compositionality.” For instance, the French expression être au taquet (lit. “to be at the cleat”) has recently been lexicalized with the meaning “to be going full throttle.”
These expressions often exhibit a paradoxical combination of lexical fixity and semantic dynamism, making them difficult to identify, describe, annotate, and model automatically.
To better situate phraseological neologisms within broader processes of lexical innovation, it is first necessary to clarify what is meant by neologism in linguistic theory.
The Oxford Dictionary of English defines a neologism as “a newly coined word or expression” (Soanes & Stevenson, 2005: 1179), while Webster’s Third New International Dictionary adopts a broader perspective, referring to “a new word, usage, or expression” (Gove et al., 1993: 1516), thereby explicitly integrating the notion of usage. Neologisms reflect ongoing social and cultural change and constitute a key indicator of linguistic vitality, as illustrated by recent lexical innovations such as teleworking, cryptocurrency, or generative artificial intelligence. As Crystal (1996: 73) notes, “the invention of new words is perhaps the most obvious way in which a language can exceed its existing resources.” Importantly, neology is not limited to the creation of new lexical items, but also encompasses innovative constructional patterns, morphological schemas, and shifts in grammatical categories.
From a Natural Language Processing (NLP) perspective, the detection and analysis of neologisms, often approached through the identification of unknown or out-of-vocabulary words, has been the object of several research initiatives. In the French context, projects such as EDyLex (Sagot & Nouvel 2013a; 2013b) and the Néoveille platform have proposed methods for the automatic detection, linguistic characterization, and monitoring of lexical innovation in large corpora, combining rule-based and statistical approaches. More recently, projects such as NeoLex have sought to extend these approaches by integrating lexicographic modeling and ontological structuring of neological data, with a particular emphasis on Chinese and Vietnamese, two languages that remain underrepresented in existing neology-oriented NLP projects (Chen, Dao et al. 2025; Chen, Nouvel, Dao et al. 2026). At the European level, recent initiatives such as the COST Action ENEOLI (2024–2027) aim to structure research on lexical innovation by fostering collaboration between linguistic, sociological, and computational approaches. Despite these efforts, NLP research on neology remains largely focused on single-word units, and the systematic modeling of more complex phenomena, such as phraseological neologisms, still appears to be underdeveloped.
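As a purely illustrative sketch of the out-of-vocabulary heuristic mentioned above (and not the method of EDyLex, Néoveille, or NeoLex), the following Python snippet flags tokens that are absent from a reference lexicon and recur in a text. The function name `find_oov_candidates`, the toy lexicon, and the frequency threshold are all assumptions made for this example.

```python
import re
from collections import Counter


def find_oov_candidates(text, known_lexicon, min_count=2):
    """Return {token: frequency} for tokens that are absent from
    known_lexicon and occur at least min_count times in text."""
    # Letters only, optionally joined by a hyphen or an apostrophe.
    tokens = re.findall(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*", text.lower())
    counts = Counter(tok for tok in tokens if tok not in known_lexicon)
    return {tok: n for tok, n in counts.items() if n >= min_count}


# Toy data: the lexicon and the sentence are invented for illustration.
lexicon = {"changed", "offices", "many", "firms", "adopted", "stayed", "niche"}
sample = ("Teleworking changed offices. Many firms adopted teleworking; "
          "cryptocurrency stayed niche.")
candidates = find_oov_candidates(sample, lexicon)
```

A filter of this kind only surfaces single-word candidates, which is precisely the limitation noted above: a phraseological neologism made of already known words would pass through it unnoticed.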
While neology in single-word units has been extensively studied, neology in phraseology and multiword expressions (MWEs) remains comparatively underexplored. Research has long demonstrated that phraseological fixedness is not static: under the influence of social, economic, and political change, idioms and fixed expressions may undergo significant formal and semantic transformations. Studies on défigement (the "unfreezing" of fixed expressions; Mejri 2013; Chen 2022) and variation have shown that phraseological units are subject to creative manipulation, contextual adaptation, and semantic reanalysis.
For each phraseological unit, it is therefore crucial to identify not only a canonical (lemmatized) form, but also its possible variants, including conjugational patterns, agreement constraints, alternative spellings, and syntactic realizations. Particular attention must be paid to discontinuous or disjunctive realizations, in which the components of a multiword unit are separated in the sentence while preserving their idiomatic meaning. As illustrated by Polguère (2020), expressions such as poser un lapin à quelqu’un (“to stand someone up”) may appear in syntactic variants such as (glossed here in English) At the good friends’ meeting, there were often no rabbits, while avoir la pêche (“to feel energetic”) may occur in elliptical realizations such as (again glossed in English) What energy this morning!. Such cases demonstrate that collocations and fixed expressions can surface in dislocated or truncated forms, a phenomenon well documented in phraseological studies.
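To make the notion of discontinuous realization concrete, here is a minimal Python sketch that looks for the components of a multiword unit in order, tolerating a bounded number of intervening tokens. It assumes the sentence has already been lemmatized by some external tool; the function name `find_discontinuous` and the `max_gap` parameter are hypothetical and chosen for this example only.

```python
def find_discontinuous(lemmas, components, max_gap=3):
    """Return tuples of token indices where `components` occur in order,
    with at most `max_gap` intervening tokens between two consecutive
    components. `lemmas` is assumed to be a pre-lemmatized sentence."""
    matches = []

    def extend(start, comp_idx, found):
        if comp_idx == len(components):
            matches.append(tuple(found))
            return
        # The first component may start anywhere; later ones must fall
        # within the allowed gap after the previous component.
        if comp_idx == 0:
            stop = len(lemmas)
        else:
            stop = min(len(lemmas), start + max_gap + 1)
        for i in range(start, stop):
            if lemmas[i] == components[comp_idx]:
                extend(i + 1, comp_idx + 1, found + [i])

    extend(0, 0, [])
    return matches


# Toy lemmatized sentence (invented): "il me pose encore un lapin".
sentence = ["il", "me", "poser", "encore", "un", "lapin"]
hits = find_discontinuous(sentence, ["poser", "un", "lapin"])
```

Such a matcher covers only insertions between components. Elliptical or lexically substituted realizations, like those cited from Polguère, would require semantic rather than purely formal matching.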
Phraseological innovation offers a privileged perspective on language change, shedding light on the interaction between lexical dependencies, semantic restructuring, usage frequency, and socio-cultural factors. While multiword expressions (MWEs) are widely acknowledged as central units of linguistic description and processing—encompassing idiomatic expressions, collocations, fixed phrases, support verb constructions, and named entities—their neological dynamics remain comparatively underexplored, particularly from a computational perspective (Sag et al., 2002; Gross, 1982, 1996).
The present volume, Multiword Expressions and Neology: Corpus Analysis and NLP Approaches, aims to address a significant gap in current research by focusing on the intersection of phraseology, neology, MWEs, corpus linguistics, lexical semantics, computational linguistics, lexicography and natural language processing (NLP). Some approaches have proven effective for identifying MWEs and capturing their internal variability, whether through statistical association measures, syntactic patterns, or machine-learning models (Smadja, 1993; Pecina, 2010; Ramisch et al., 2010a, 2010b; Vincze et al., 2011).
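As one example of a statistical association measure, pointwise mutual information (PMI) compares the observed probability of a bigram with the probability expected if its two words were independent. The sketch below is a simplified illustration, not the procedure of any of the works cited above; the function name `bigram_pmi`, the maximum-likelihood probability estimates, and the `min_count` threshold are assumptions of this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Return {(w1, w2): PMI in bits} for adjacent word pairs seen at
    least min_count times, using plain relative-frequency estimates."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs yield unreliable PMI estimates
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy token sequence (invented) in which "au taquet" recurs.
toy_tokens = "she is au taquet and he is au taquet too".split()
pmi_scores = bigram_pmi(toy_tokens)
```

PMI is known to overrate rare pairs, which is why a frequency threshold is applied here. On real corpora it would normally be compared with other measures and restricted to contiguous pairs or to syntactically filtered candidates.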
Thus, the volume seeks to bring together contributions from computational linguistics, corpus linguistics, lexicography, psycholinguistics, and theoretical linguistics, with a strong emphasis on studies that combine theoretical insight with empirical and computational validation. It focuses on how new multiword combinations are created, disseminated, conventionalized, and cognitively processed, as well as on how they can be automatically identified, modeled, and represented in NLP systems and lexicographic resources. Unlike traditional approaches that emphasize the stability and fixedness of MWEs, contemporary corpus data reveal a growing number of phraseological innovations arising from metaphorical extension, semantic shift, calquing, ellipsis, controlled variation, and contextual reuse. These phenomena challenge rigid classifications of MWEs and call for updated theoretical frameworks capable of accounting for both degrees of semantic opacity and patterns of syntactic flexibility (Gross, 1986; Mel’čuk, 2011). Cross-linguistic and multilingual research is particularly encouraged, especially work involving typologically distinct languages such as English and French (Indo-European), Chinese (Sino-Tibetan, largely analytic), and Vietnamese (Austroasiatic, analytic and tonal). Such perspectives are essential for understanding how cultural, cognitive, and structural factors shape the formation, diffusion, and conventionalization of neological MWEs, and for developing robust models capable of capturing phraseological innovation across languages.
Topics of interest include, but are not limited to:
1. Mechanisms and Drivers of Phraseological Innovation
   - mechanisms of phraseological innovation, including metaphor, blending, ellipsis, calquing, re-motivation, and semantic drift
   - phraseological neology: creation, resemanticization, and stabilization of MWEs
   - semantic drift and re-motivation in idioms and collocations
   - sociocultural and pragmatic factors driving phraseological innovation
2. Identification and Detection of Neological MWEs
   - identification and annotation of neological MWEs in corpora and treebanks
   - corpus-based methods for detecting new multiword units
   - computational methods for detecting new MWEs, including distributional semantics, contextual embeddings, clustering, and language models
3. Diachronic and Evolutionary Perspectives
   - diachronic analysis of MWE neology in large-scale corpora
   - distributional and embedding-based approaches to MWE evolution
   - computational modeling of variation, stability, and entropy in MWEs
4. Semantic and Syntactic Modeling
   - semantic and syntactic modeling of innovative MWEs within frameworks such as Construction Grammar (CxG), HPSG, LFG, Universal Dependencies, and Meaning–Text Theory
   - annotation frameworks based on Meaning–Text Theory (MTT) and lexical functions
5. Cross-linguistic and Multilingual Studies
   - cross-linguistic comparison and multilingual studies of emerging MWEs, especially across typologically diverse languages (e.g. EN–FR–ZH–DE–VI)
6. Lexicographic and Digital Resources
   - construction of lexicographic resources for neological MWEs
   - digital resources and lexicons documenting MWE neology, including bilingual and multilingual databases
   - contributions presenting reusable resources (corpora, lexicons, annotation schemes) or reproducible tools are particularly encouraged
7. NLP Applications and Language Technologies
   - NLP applications for MWE extraction, semantic disambiguation, and multilingual alignment
   - treatment of neological MWEs in NLP applications, including machine translation, terminology extraction, and text generation
   - evaluation of emerging MWEs in large language models (LLMs) and generative systems
Bibliography
Expression of Interest
Please send a title and a 500-word abstract, together with a bibliography and, optionally, 3–5 keywords, by May 30, 2026, to the addresses below:
lian.chen@univ-orleans.fr
besim.kabashi@fau.de
huy-linh.dao@inalco.fr
Guest Editors
Lian CHEN 陈恋 (LLL, University of Orléans, CRLAO-CNRS-INALCO, France)
Besim KABASHI (University of Tübingen, Germany)
HuyLinh DAO 匋辉靈 (CRLAO-CNRS-INALCO-EHESS, France)
Chief Editors
Mike Rosner
mike.rosner@um.edu.mt
Petya Osenova
petya@bultreebank.org
Important Dates
| Date | Milestone |
| --- | --- |
| February 15, 2026 | Opening of the call for papers |
| May 30, 2026 | Submission of abstracts by authors |
| October 30, 2026 | Submission of full chapters by authors |
| April 30, 2027 | Reviews returned |
| June 30, 2027 | Submission of the revised version by authors |
| August 1, 2027 | Final versions due |
| December 2027 | Expected publication |
Contact
For any questions regarding the Phraseo-MWE 2026 workshop, please contact us at the following addresses:
lian.chen@univ-orleans.fr
besim.kabashi@fau.de
huylinh.dao@inalco.fr