Language Science Press

Lexicography in Asia and Generative AI


Keywords

Asian Lexicography, LLMs, Computational Lexicography, NLP, and AI


Arguments

The rapid development of generative artificial intelligence (AI), particularly large language models (LLMs), is profoundly reshaping the landscape of lexicography (de Schryver, 2023; Lew, 2024; OpenAI, 2023). These technologies offer unprecedented opportunities for automating and enhancing key stages of dictionary production, including sense discovery, definition writing, example generation, multilingual alignment, and semantic modeling (Hanks, 2012; Kilgarriff et al., 2014; Chen, 2025). At the same time, Asian languages present distinctive challenges and research opportunities due to their typological diversity, writing systems, segmentation issues, and the uneven availability of linguistic resources (Wang & Huang, 2013; Zhan et al., 2021). This special issue aims to address the intersection of these two dynamics by examining how generative AI can be meaningfully and responsibly integrated into lexicographic practices in Asia.

Lexicography in Asia encompasses a wide range of languages and scripts, such as Chinese, Japanese, Korean, Vietnamese, Thai, and Indic languages, many of which involve complex morphology, rich phraseology, or non-alphabetic writing systems (McEnery & Hardie, 2012). These characteristics complicate corpus processing, sense modeling, and dictionary structuring, especially in low-resource contexts (Lew, 2012). Generative AI, when combined with corpus-driven methodologies and computational pipelines, provides new ways to transform large-scale linguistic data into structured, reusable lexicographic resources. From corpus acquisition and preprocessing to micro- and macrostructural design, AI-assisted workflows have the potential to significantly reduce manual effort while improving coverage and consistency (de Schryver, 2023; Lew, 2024).

Beyond automation, recent advances in knowledge representation (knowledge graphs, Linked Open Data, and ontological models such as OntoLex-Lemon) enable lexicographic data to be integrated into broader semantic ecosystems (Cimiano et al., 2020; McCrae et al., 2017; Chen et al., 2025). This is particularly relevant for Asian lexicography, where cross-lingual alignment, dialectal variation, and script conversion (e.g., traditional and simplified Chinese, romanization systems) play a central role (Navigli & Ponzetto, 2012; McCrae et al., 2017). Generative AI can support these processes by facilitating multilingual sense alignment, suggesting semantic relations, and assisting in the construction of interoperable lexical databases (Brown et al., 2020; OpenAI, 2023).

However, the adoption of generative AI also raises critical methodological and ethical questions. Issues of bias, hallucination, data licensing, reproducibility, and evaluation standards are especially pressing when dealing with culturally sensitive lexical items and underrepresented language communities (Gebru et al., 2018; Bender et al., 2021; Blodgett et al., 2020). Systematic investigation is therefore required to establish best practices for human–AI collaboration in lexicography, robust evaluation protocols, and sustainable models for long-term dictionary maintenance.

By bringing together research on computational lexicography, AI technologies, and Asian languages, this special issue seeks to provide a comprehensive overview of current advances, practical tools, and theoretical perspectives (Lew, 2012; McCrae et al., 2017). It aims to foster interdisciplinary dialogue between lexicographers, computational linguists, and AI researchers, and to promote the development of reliable, transparent, and ethically grounded lexicographic resources for the multilingual realities of Asia.


Bibliography

Blodgett, S. L., Barocas, S., Daumé III, H., & Wallach, H. (2020). Language (technology) is power: A critical survey of “bias” in NLP. ACL 2020.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots. FAccT ’21.

Brown, T., et al. (2020). Language models are few-shot learners. NeurIPS.

Cimiano, P., Chiarcos, C., McCrae, J. P., & Gracia, J. (2020). Modelling lexical resources as linked data. Springer.

Chen, L. (2025). Modeling and structuring of a bilingual French–Chinese phraseological dictionary. eLex 2025.

Chen, L., Gasparini, N., Dao, H.-L., & Do-Hurinville, D.-T. (2025). Toward a trilingual ontology of phraseological units. AsiaLex 2025.

Cimiano, P., McCrae, J., & Buitelaar, P. (2020). Ontology-based lexical resources. Morgan & Claypool.

Gebru, T., et al. (2018). Datasheets for datasets. FAT*.

Hanks, P. (2012). Corpus evidence and electronic lexicography. Oxford University Press.

Kabashi, B. (2018). A Lexicon of Albanian for Natural Language Processing. Lexicographica.

Wang, S., & Huang, C.-R. (2013). Applying Chinese Word Sketch Engine to facilitate lexicography. ASIALEX.

Kilgarriff, A., et al. (2014). The Sketch Engine: ten years on. Lexicography.

Lew, R. (2012). How can we make electronic dictionaries more effective? Oxford University Press.

Lew, R. (2024). Dictionaries and lexicography in the AI era.

McCrae, J. P., et al. (2017). The OntoLex-Lemon model. Semantic Web.

McEnery, T., & Hardie, A. (2012). Corpus Linguistics: Method, Theory and Practice. Cambridge University Press.

Navigli, R., & Ponzetto, S. P. (2012). BabelNet. Artificial Intelligence.

OpenAI. (2023). GPT-4 Technical Report.

De Schryver, G.-M. (2003). Lexicographers’ dreams in the electronic-dictionary age.

De Schryver, G.-M. (2023). Generative AI and Lexicography.

Zilio, L., & Kabashi, B. (2024). Using Neural Machine Translation for Normalising Historical Documents. EURALEX 2024.


Expression of Interest

Please send a title and a 500-word abstract, together with a bibliography and, optionally, 3–5 keywords, by May 30, 2026, to the addresses below:
besim.kabashi@fau.de
lian.chen@univ-orleans.fr


Guest Editors

Besim Kabashi (University of Tübingen, Germany)

Lian Chen 陈恋 (LLL, University of Orléans, France)


Chief Editors

Vincent Ooi
vinceooi@nus.edu.sg

Hai Xu 徐海
xuhai1101@gdufs.edu.cn

Important Dates

February 15, 2026: Opening of the call for papers
March 1, 2026: Invitations sent to experts
May 30, 2026: Submission of abstracts by authors
February 28, 2027: Submission of the first version by authors
April 30, 2027: Reviews returned
June 30, 2027: Submission of the revised version by authors
August 1, 2027: Final versions due
December 2027: Expected publication

Contact

For any questions regarding Lexicography: International Journal of Asialex, please contact us at the following addresses:
besim.kabashi@fau.de
lian.chen@univ-orleans.fr

Special Issue Focus

Lexicography of Asian Languages, LLMs, Computational Lexicography, NLP, and AI
