February 2026 - Corpora

Fully funded doctoral fellowship in multilingual LLM development at University of Oslo
by Stephan Oepen 11 Feb '26


To whom it may concern (with apologies for cross-posting):

The Language Technology Group (LTG) at the University of Oslo has a vacancy for a fully-funded doctoral fellowship for three years. The position will have its thematic focus on LLM development for “smaller” languages and the interplay of (model-based) data selection and fine-grained evaluation. Please see the job announcement for additional details: https://www.jobbnorge.no/en/available-jobs/job/294517/phd-research-fellow-i…

LTG is a happy and productive research environment of close to 25 researchers in natural language processing from different walks of life. The group is part of the Computer Science Department at the University of Oslo, where we enjoy modern facilities and good access to national and European supercomputing facilities. LTG participates in several national and European flagship research initiatives, including the Digital Europe consortium OpenEuroLLM: Foundation Models for Transparent AI in Europe. This doctoral fellowship will be offered in association with OpenEuroLLM, for example by means of collaboration with or research visits to other consortium members.

The application deadline for this position is March 1. Interviews will be conducted by early April, and offers of employment made shortly thereafter. The latest possible starting date for this position is October 1, 2026.

Please do not hesitate to contact me for further inquiries.

Best wishes, oe


Summer School: The Paradigm Shift: From Rules to Models in Natural Language Processing
by Amal Haddad 11 Feb '26


The Paradigm Shift: From Rules to Models in Natural Language Processing International Summer School Alicante, Spain, 15, 16 and 17 June 2026 https://summer-school.gplsi.es First Call for Participation Natural Language Processing (NLP) has witnessed a clear paradigm shift: the transition from rule-based approaches to data-driven language models. While rule-based approaches dominated NLP for many years, during the 1990s and early 2000s they gradually gave way to statistical and machine-learning methods. It would be fair to say that data-driven models--and, most prominently, Deep Learning (DL), including more recently Large Language Models (LLMs)--have taken the world by storm. Deep Learning models are now used almost everywhere, across nearly every discipline, and Natural Language Processing is no exception. DL has proved highly promising so far, delivering improvements for almost every NLP task and application. However, as observed on numerous occasions, the outputs of DL models are not always ideal, with some studies reporting cases in which machine-learning approaches do not necessarily outperform the 'old-fashioned' rule-based ones. The overarching theme of the summer school will be this paradigm shift, with lectures and practical sessions reflecting the latest trends at both theoretical and practical levels. More specifically, the programme will combine lectures focusing on theoretical foundations with hands-on practical sessions. Specific topics will include an Introduction to Large Language Models (LLMs), Explainable AI in LLMs, Datasets and bias in LLMs, Building foundational LLMs for low-resource languages, Machine Translation for Low-Resource Languages, LLMs and sentiment analysis, Model and hyperparameter optimisation and Eye-tracking and gaze data for NLP and language models, among others. 
The summer school will be ideal for both newcomers and experienced professionals in NLP, computer science, data science, cybersecurity, corpus linguistics, language technologies, and related disciplines, offering a unique opportunity to deepen expertise and engage with the rapidly evolving world of LLMs.

Venue and dates
The summer school will take place at the research institute of Informatics of the University of Alicante on 15, 16 and 17 June 2026.

Registration
Registration will open in March 2026.

Related events
The summer school will follow the second international conference _Natural Language Processing and Artificial Intelligence_ (NLPAICS'2026), which will take place in Alicante on 11 and 12 June 2026 (https://nlpaics2026.gplsi.es).

Keynote speaker
Roberto Navigli (Sapienza University of Rome)

Lecturers
The list of summer school lecturers includes:
Tharindu Ranasinghe (Lancaster University)
Salima Lamsiyah (University of Luxembourg)
Cengiz Acartürk (Jagiellonian University)
Hansi Hettiarachchi (Lancaster University)
Juan Pablo Consuegra Ayala (University of Alicante)
Robiert Sepulveda Torres (University of Alicante)
Alicia Picazo Izquierdo (University of Alicante)
Isuri Anuradha (Lancaster University)
Damith Premasiri (Lancaster University)
Ernesto Luis Estevanell (University of Alicante)
Maram Alharbi (Lancaster University)

Summer School Directors
Tharindu Ranasinghe (Lancaster University)
Salima Lamsiyah (University of Luxembourg)

Summer School Chair
Ruslan Mitkov (University of Alicante)

Advisory Committee
Manuel Palomar Sanz (University of Alicante)
Rafael Muñoz Guillena (University of Alicante)
Andrés Montoyo Guijarro (University of Alicante)

Organising Committee
Raúl García Cerdá (University of Alicante)
Alicia Picazo Izquierdo (University of Alicante)
Ernesto Luis Estevanell (University of Alicante)
Maram Alharbi (Lancaster University)

Further information
Further information including registration details will be provided in
subsequent calls. Alternatively, interested parties can email summer-school(a)dlsi.ua.es for more information.

--
Amal Haddad Haddad (She/her)
Facultad de Traducción e Interpretación, Universidad de Granada | https://www.ugr.es/personal/amal-haddad-haddad
Lexicon Research Group | http://lexicon.ugr.es/haddad
Co-Convenor, BAAL SIG 'Humans, Machines, Language' | https://r.jyu.fi/humala
Event Coordinator, BAAL SIG 'Language, Learning and Teaching'


Second CfP: Identity Aware NLP at LREC 2026
by Neele Falk 11 Feb '26


Dear all, We are organizing a workshop co-located with LREC 2026 on Identity Aware NLP. The details are as follows: ===================================================================== SECOND CALL FOR PAPERS Ethical and Technical Challenges for Identity-Aware NLP Workshop at LREC 2026, Palma de Mallorca, Spain, May 11-16, 2026 https://identity-aware-ai.github.io/ ===================================================================== *Workshop Theme:* What makes each of us unique, and which ethical and technical challenges does this imply? *OVERVIEW* What makes us unique? Language (and thus the automatic processing of it) is about people and what they mean. However, current practice relies on the assumptions that the involved humans are all the same, and that if enough data (and compute power) is present, the resulting generalizations will be robust enough and represent the majority. This approach often harms marginalized communities and ignores the notion of identity in models and systems. Our interdisciplinary workshop aims to raise the question of "what makes each of us unique?" to the NLP community. *WORKSHOP GOALS* - The development of a shared and interdisciplinary understanding of identities and how identity is treated in AI - The development of new methods that push the effective, fair, and inclusive treatment of individuals in AI to the next level *TOPICS OF INTEREST* We invite submissions on the following topics: *Modeling subjective phenomena and disagreement: *Personalization and perspectivist methods that challenge one-size-fits-all approaches by leveraging disaggregated data and annotator metadata. Methods that learn from disagreements rather than forcing consensus that erases unique perspectives. *Auditing and evaluating identity representation:* Techniques to measure how well models represent diverse identities, diagnose failures in capturing marginalized perspectives, and assess whether systems treat all identities equitably. 
Frameworks for identity-aware performance evaluation beyond aggregate metrics. *Bias detection and fairness interventions: *Methods to identify when models fail marginalized groups due to over-generalization, and techniques to mitigate such harms while preserving model utility. *Identity representation in LLMs: *How language models encode (or erase) diverse identities, embody particular perspectives, and either reproduce or challenge stereotypes. Measuring LLMs' capacity for reasoning about identities beyond majority groups. *Socio-political applications: *Modeling polarization, opinion formation, and deliberation in ways that account for identity rather than assuming homogeneous populations. How identity-aware approaches improve accuracy for politically sensitive tasks. *Methodological foundations from social sciences:* Best practices from psychology and survey science for measuring identity constructs (values, morals, narratives). Addressing challenges of using LLMs to model diverse populations while avoiding erasure through aggregation. *Accountability and responsible development: *Ethical responsibilities when building systems that represent (or exclude) identities. Making AI development processes accountable to marginalized communities most affected by over-generalization. *Identity-aware and community informed evaluation and auditing*: Community informed bias evaluation and auditing. Human evaluation of LLMs and other AI systems in an identity-aware manner. 
*SUBMISSION TYPES* We welcome the following types of submissions: * Long papers: 4-8 pages of content (excluding references) * Short papers: 4-8 pages of content (excluding references) * Non-archival submissions, student project presentations, mixed-media submissions For non-archival submissions, we welcome creative formats including: - Art, poetry, music - Blog posts - Jupyter notebooks - Teaching materials - Videos - Findings papers - Late-breaking papers - Extended abstracts For creative format submissions, please submit a PDF containing: - A summary or abstract of your work - A link to your work (if hosted externally) - Any additional context or documentation *SUBMISSION GUIDELINES * * All submissions will be double-blind reviewed * Submissions should follow LREC 2026 formatting guidelines available at: https://lrec2026.info/authors-kit/ * Papers must be 4-8 pages in length (excluding references) * Papers must include ethics and limitations sections * NO appendices are allowed (initial submission), up to 10 pages camera-ready * Originality and simultaneous submissions: submissions must be original, previously unpublished work. If a paper is submitted to or under consideration at another venue at the same time, this must be declared at submission time. If accepted here, it must be withdrawn from other venues; if accepted elsewhere while under review here, please notify us promptly. * Preprints: there is no anonymity period at LREC 2026, so authors may post preprints at any time; however, the version submitted for review must still be anonymized * Language resources (optional): at submission time, authors may share related language resources with the community; repository entries are linked to the LRE Map and provide metadata for the resource * Submission site: https://softconf.com/lrec2026/IdentityAwareAI * Proceedings and presentation: accepted papers will appear in the workshop proceedings. All accepted papers will be presented as posters. 
For remote participants, we will also organize a lightning round of short virtual presentations to accompany the posters. *WORKSHOP FORMAT* The workshop will be a half-day event featuring: - Keynote speeches from leading experts in the field - Paper presentations (oral and lightning talks) - Participatory design activity to develop a shared interdisciplinary vocabulary, identify current gaps in datasets for studying identity, and design a vision for collecting new datasets We are committed to ensuring that our workshop is accessible to all. The workshop will be held in a hybrid format, allowing both in-person and virtual participation. *IMPORTANT DATES* All deadlines are 11:59 PM AoE (Anywhere on Earth) * Submission Deadline: February 20, 2026 * Notification of Acceptance: March 20, 2026 * Camera-Ready Deadline: March 30, 2026 * Workshop Date: May 16, 2026 *DIVERSITY & INCLUSION * We actively encourage submissions from underrepresented communities and countries. The workshop organizers will provide mentorship and thorough feedback, especially to first-time authors and reviewers. * ORGANIZERS* Pranav A (University of Hamburg) Valerio Basile (University of Turin) Neele Falk (University of Stuttgart) David Jurgens (University of Michigan) Gabriella Lapesa (GESIS, Leibniz Institute for the Social Sciences & Heinrich-Heine University of Düsseldorf) Anne Lauscher (University of Hamburg) Soda Marem Lo (University of Turin) *CONTACT* For queries, please contact: identity-aware-ai(a)googlegroups.com Join us at Identity-Aware AI 2026 to contribute to this important conversation!


Two PostDoc positions in Natural Language Processing (Open Position and for Authorship Analysis)
by Steffen Eger 11 Feb '26


The Natural Language Learning & Generation (NLLG) group (https://nl2g.github.io/) at University of Technology Nuremberg (UTN) is looking to fill two postdoc positions (E13 100%) as soon as possible:
* one open position in our fields of expertise (see below)
* one position for LLM-based authorship verification/attribution in German data (speaking German is beneficial)

The duration of the positions is 12 to 15 months.

The tasks include:
* Scientific research in at least one of our focus areas, see below
* Writing and publishing research results in relevant conferences and journals, as well as scientific networking and outreach through their presentation at conferences
* Participation in third-party funding applications
* Supervision of doctoral students and student assistants
* Design and teaching of courses on a small scale

Application materials include:
* tabular CV
* letter of motivation (restricted to 1 page)
* 1-page description of your desired contribution to the group as a postdoc
* links to at least 3 top-quality conference or journal publications (ACL, EMNLP, NAACL, EACL, COLING, TACL, ICLR, ICCV, NeurIPS, AAAI, CVPR, or an equivalent) and a description of your role in each publication.

Application deadline: February 20, 2026

For questions please contact steffen.eger(a)utn.de. Please apply online via https://www.utn.de/en/career/job-openings/

Focus areas: The NLLG group is among the leading NLP groups in Europe. It has a broad focus on NLP related topics, including: evaluation of text generation (e.g. machine translation, multimodal tasks, etc.), NLP and digital humanities (e.g. automatic poetry generation, literary translation, language change, argumentation, authorship analysis), NLP and social sciences (e.g. LLM-based analysis of social solidarity over time, biases, fairness) and AI4Science topics (e.g. automatic generation of scientific figures).

---------------------------------------------
Prof. Dr. Steffen Eger
Heisenberg Professor
Natural Language Learning & Generation (NLLG)
University of Technology Nuremberg (UTN)
https://nl2g.github.io/
https://www.utn.de/en/person/prof-dr-steffen-eger/
https://www.utn.de/en/departments/department-engineering/nllg-lab/
Ulmenstraße 52i
90443 Nürnberg


UCL Summer School in English Corpus Linguistics
by Bas Aarts 11 Feb '26


The Survey of English Usage at University College London will be running its 13th Summer School in English Corpus Linguistics online from 24 to 26 June 2026.

This Summer School is an accessible and inspiring introductory course in English Corpus Linguistics for students of linguistics and students of the English language. The course will be taught online over three days in the morning (UK time). The course consists of theoretical and practical sessions.

Over the course of the three days, participants learn about the following:
- the scope of Corpus Linguistics, and how we can use corpora to study the English Language;
- key issues in Corpus Linguistics methodology;
- how to use corpora to analyse issues in syntax, semantics, discourse and World Englishes;
- basic elements of statistics;
- how to navigate large and small corpora, particularly ICE-GB and DCPSE.

At the end of the course, participants will have:
- acquired a basic but solid knowledge of the terminology, concepts and methodologies used in English Corpus Linguistics;
- had practical experience working with two corpora and a corpus exploration tool (ICECUP);
- gained an understanding of the breadth of Corpus Linguistics and the potential application for projects;
- learned about the fundamental concepts of inferential statistics and their practical application to Corpus Linguistics.

Students are expected to have a basic knowledge of concepts in linguistics, especially grammar. Places are limited. Be sure to book early to get the early bird rate.

For students in full-time education the course fee includes a free copy of either the ICE-GB Corpus (https://www.ucl.ac.uk/arts-humanities/research-projects/1998/sep/ice-gb) or the DCPSE Corpus (https://www.ucl.ac.uk/arts-humanities/research-projects/2006/sep/dcpse), with the associated exploration software ICECUP.
For more information about the course, provisional timetable and how to apply, see: https://www.ucl.ac.uk/english/summer-school-english-corpus-linguistics

Prof. Bas Aarts
Department of English Language and Literature, UCL
Substack: https://basaarts.substack.com/
Continuous Professional Development and INSET courses for teachers: https://bit.ly/39qnKIH
X: @UCLEnglishUsage and @EngliciousUCL


CfP: Shared Task in Vocabulary Difficulty Prediction at BEA Workshop, ACL 2026.
by Skidmore, Lucy (Exams) 11 Feb '26


Dear all,

We invite participation in our Shared Task on Vocabulary Difficulty Prediction for English Learners, which will be hosted at the 21st Workshop on Innovative Use of NLP for Building Educational Applications (https://sig-edu.org/bea/2026), co-located with ACL 2026, both online and in person in San Diego, CA, United States.

This shared task focuses on predicting the difficulty of English vocabulary for learners with different L1 backgrounds. Evaluation will use the British Council’s Knowledge-based Vocabulary Lists (KVL), which provide psychometrically calibrated difficulty scores for English learners with Spanish, German, and Mandarin L1s. The task includes a Closed Track, limited to the provided data and standard NLP resources, and an Open Track, which allows external data and use of LLMs, to explore the full potential of current AI approaches.

Important Dates
26 January: Release of training data and baseline models (https://github.com/britishcouncil/bea2026st)
20 March: Test data release
27 March: System submissions from teams due
3 April: Announcement of evaluation results by the organisers
24 April: System papers due
1 May: Paper reviews returned
12 May: Final camera-ready submissions
2-3 July: BEA 2026 workshop at ACL

Further details can be found at our shared task website (https://www.britishcouncil.org/data-science-and-insights/bea2026st). Please send any questions to vocabularychallenge(a)britishcouncil.org or post a new topic in our forum (https://groups.google.com/g/bea-2026-shared-task/).

We look forward to your participation!

Organisers: Mariano Felice (British Council) and Lucy Skidmore (British Council).

The British Council is the United Kingdom's international organisation for cultural relations and educational opportunities. A registered charity: 209131 (England and Wales) SC037733 (Scotland).


[DEADLINE EXTENSION] 3rd CfP: Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI)
by Marco Antonio Stranisci 11 Feb '26


-------------------------------------------------------------------
Joint Call for Papers
Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI)
Co-located with LREC 2026, Palma, Mallorca (Spain)
-------------------------------------------------------------------
Workshop day: May 12, 2026
Deadline for paper submission: February 23, 2026 (extended from February 16, 2026)
Website: https://socon-nlpsi.github.io
Contact: socon-nlpsi-workshop-organizers.nlproc(a)uni-bamberg.de

OVERVIEW
-------------------------------------------------------------------
Natural Language Processing has evolved significantly, enabling the modeling of high-level aspects of human communication. Relevant topics include pragmatics, social dynamics, and the integration of social context to better understand communicative intent. The SoCon and NLPSI workshops share a focus on the social dimensions of communication, while addressing distinct challenges.

The Social Context Workshop explores how context shapes language use, seeking interdisciplinary collaboration across NLP, Pragmatics, Sociolinguistics, and Sociology. It aims to develop shared terminology and promote community-centered approaches as alternatives to traditional crowdsourcing.

The NLPSI Workshop focuses on psychological processes shaping human communication, including how individuals perceive, process, and produce language. It welcomes interdisciplinary work from NLP, Social Psychology, and Affective Computing, with an emphasis on large-scale studies.

TOPICS
-------------------------------------------------------------------
This joint Call for Papers contains two tracks, SoCon and NLPSI. Authors should choose the track that best matches their contribution.

SoCon Track
"Towards Responsibly Infusing NLP with Social Context, Community Meanings, and Pragmatics Through Interdisciplinary NLP Efforts."
Topics include, but are not limited to:
* Interdisciplinary methods for modeling context, integrating NLP with pragmatics and social sciences
* Studying social communities and how to engage with communities of practice and speech communities
* Ethical challenges in resource creation, including participatory design involving relevant communities
* Explaining behaviors in social interactions through models of social attitudes shaped by backgrounds, contexts, and triggering events

NLPSI Track
"Bridging the gap between NLP and psychological insights to foster a deeper understanding of social interactions."
Topics include, but are not limited to:
* Psychological constructs (beliefs, motives, feelings, affect, personality)
* Psychological studies, especially those focused on interaction
* Communication patterns such as empathy, persuasion, and conflict resolution
* The role of emotions in interpersonal communication, such as emotion contagion and interpersonal emotion regulation

SUBMISSION TYPES
-------------------------------------------------------------------
* Long papers (up to 8 pages) presenting original research, from preliminary to established contributions
* Short papers (up to 4 pages) presenting emerging ideas or early-stage research
* Extended abstracts (non-archival, up to 2 pages): a new format designed to be inclusive of researchers from fields where conference papers are not standard (e.g., Social Sciences). Extended abstracts are not included in conference proceedings.

SUBMISSION GUIDELINES
-------------------------------------------------------------------
Submissions will be double-blind reviewed. Papers must follow the LREC templates (LaTeX, Word, Open Office, Overleaf). Page limits apply only to the main content; limitations, ethics, acknowledgements, references, and appendices do not count. Submission via Softconf: https://softconf.com/lrec2026/SoConNLPSI/ Authors must indicate resources used or created (data, tools, technologies, evaluation kits).
ELRA encourages sharing of language resources to support reuse and replicability. Authors must follow ethical AI research policies and include an ethics statement.

WORKSHOP FORMAT
-------------------------------------------------------------------
The workshop follows LREC’s attendance policy. It will be a full-day hybrid event with keynotes and paper presentations (oral and lightning talks).

IMPORTANT DATES
-------------------------------------------------------------------
Paper submission deadline: February 23, 2026 (extended from February 16, 2026)
Notification of acceptance: March 23, 2026
Camera-ready deadline: March 30, 2026
Workshop day: May 12, 2026

ORGANIZERS
-------------------------------------------------------------------
SoCon
Marco Antonio Stranisci, University of Turin
Soda Marem Lo, University of Turin
Sabine Weber, Bamberg University
Rossana Damiano, University of Turin
Simona Frenda, Heriot-Watt University
Roman Klinger, University of Bamberg
Viviana Patti, University of Turin
Maarten Sap, Carnegie Mellon University
Seid Muhie Yimam, University of Hamburg

NLPSI
Aswathy Velutharambath, University of Bamberg
Sofie Labat, Ghent University
Neele Falk, University of Stuttgart
Flor Miriam Plaza-del-Arco, Bocconi University
Roman Klinger, University of Bamberg
Véronique Hoste, Ghent University
Bennett Kleinberg, Tilburg University

Marco
https://marcostranisci.github.io/


Extended Deadline: 12th Workshop on the Representation and Processing of Sign Languages (sign-lang@LREC 2026)
by Schulder, Dr. Marc 11 Feb '26


Dear all, due to numerous requests we are extending the submission period for sign-lang@LREC 2026 to Friday, 20 February. See below for the updated Call for Papers: Event: 12th Workshop on the Representation and Processing of Sign Languages (sign-lang@LREC 2026) **NEW** Submission deadline: 20 February 2026 Workshop date: 16 May 2026 Website: https://www.sign-lang.uni-hamburg.de/lrec2026/ Submission page: https://softconf.com/lrec2026/signlang2026/ CALL FOR PAPERS Submissions are invited for a full day workshop on sign language resources and technologies, to take place on 16 May 2026 as a satellite event of LREC 2026 in Palma de Mallorca, Spain. As in the previous four years, the workshop will be a hybrid event. The extended submission deadline is Friday, 20 February 2026. During the past years, a number of large-scale sign language corpus projects have started. Some have already been completed, but many more projects are about to start. At the same time, sign language technologies are maturing and are promising to support the time-consuming basic annotation. The workshop aims at bringing together those researchers who already work with multimodal sign language corpora (and those who see the need for empirical underpinnings of their current research) with those who develop sign language technologies. It provides the platform to compare competing approaches. As sign language resource technologies build to a large extent on methodologies and tools used in the language resource community in general, but add very specific perspectives (e.g. no writing system established, use of video as data source) and works with a different modality of human language, sign language research is able to feed back to the language resource community at large. At the same time, as the raw data are in the visual domain, the field naturally bridges into Computer Vision. Thus, researchers use Machine Learning methods on both visual and linguistic data. 
We invite submissions of papers to be presented either on stage (20 minutes plus 10 minutes discussion), as posters (with or without demonstrations) or remotely (poster PDF plus text chat) on the following topics: 2026 SPECIAL TOPIC: LANGUAGE IN MOTION Motion is at the core of sign languages, both literally, through their existence in the visual-gestural modality, and figuratively, in how their communities drive language change. Equally, sign language research must stay in motion, adapting to new insights and technological possibilities, advancing how we create and use resources, evolving the capabilities of tools, and pushing the boundaries of what can be expected from the field, both technologically and ethically. We especially invite contributions relating to the representation and processing of sign languages that address these various facets of language in motion, but also welcome papers on other general issues relating to sign language resources and technologies. GENERAL ISSUES ON SIGN LANGUAGE CORPORA AND TOOLS • Evaluation of sign language resources • Experiences in building sign language corpora • Elicitation methodology appropriate for corpus collection • Proposals for standards for linguistic annotation or for metadata descriptions • Experiences from linguistic research using corpora • Use of (parallel) corpora and lexicons in translation studies and machine translation • Avatar technology as a tool in sign language corpora and corpus data feeding into advances in avatar technology • Language documentation and long-term accessibility for sign language data • Annotation and visualization tools • Linking corpora and lexicons and integrated presentation of corpus and dictionary contents • “Internet as a corpus” for sign languages • Sign language corpus mining • Crowd and community sourcing for corpus work • Multi-lingual sign language resources and connecting sign language resources to language resources for spoken languages • Language change and how it 
relates to resource creation, corpus-driven linguistic research, and language technologies

We are pleased to confirm that the workshop will be a hybrid event. Similar to the 2022 and 2024 workshops, all participants will be given access to an online text chat before and during the event to allow remote participants to present their work as well as for discussion of all workshop contributions. On-stage presentations will be live streamed (including International Sign/English interpretation) with opportunity for questions from remote and on-site participants. The live poster sessions will be held on-site only, but posters will be made available online for discussion via text chat.

In the tradition of LREC, oral/signed presentations, poster presentations (with or without demonstrations) and remote presentations have equal status, and authors are encouraged to suggest the presentation format best suited to communicate their ideas. Papers (4–8 pages) of all accepted submissions to this workshop will be published as workshop proceedings on the conference website, independent of whether you have a poster, remote or oral/signed presentation. The workshop does not differentiate between long, short, or position papers.

Please submit your paper through the LREC START system (https://softconf.com/lrec2026/signlang2026/) not later than 20 February 2026 (any time zone), indicating whether you prefer an oral/signed presentation, a poster presentation, a poster presentation with demo, or a remote poster. Unlike the main conference, the workshop will be reviewed single-blind, so submissions SHOULD NOT BE ANONYMOUS. In all other respects, submissions should follow the LREC 2026 style guide (https://lrec2026.info/authors-kit/).

ATTENTION
Please note that you are expected to submit the full paper, not an extended abstract as in previous years!
IMPORTANT DATES

• **NEW** deadline for submissions: 20 February 2026 (11:59PM UTC-12:00 “anywhere on Earth”)
• Notification of acceptance: 16 March 2026
• Early bird registration ends: tbd
• Camera ready version of the paper (for both oral/signed presentations and posters): 27 March 2026
• Submission of slides for interpreters' preparation (oral/signed presentations only): 6 May 2026
• Submission of all slides/posters for the conference platform: 6 May 2026
• Submission of additional material, including demo videos, to be made available alongside the posters/slides on the conference platform: 6 May 2026
• This workshop: 16 May 2026
• LREC main conference: 13–15 May 2026
• LREC workshops: 11, 12 & 16 May 2026


Important Update
by Mel Mistica 11 Feb '26


https://share.google/ONWDW22As9yNi8r66


Third Call for Papers — Joint Workshop on Legal and Ethical Issues in Human Language Technologies (LEGAL2026) and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (CALD-pseudo 2026)
by Maria Irena Szawerna 10 Feb '26


**Apologies for cross-posting**

Third Call for Papers: Joint Workshop on Legal and Ethical Issues in Human Language Technologies (LEGAL2026) and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (CALD-pseudo 2026)

Website: https://legal2026.mobileds.de/
Submission: https://softconf.com/lrec2026/LEGAL2026/

We invite submissions to the Joint Workshop on Legal and Ethical Issues in Human Language Technologies (LEGAL2026) and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (CALD-pseudo 2026), to be held at LREC 2026 on the 12th of May 2026.

Important Dates

* 20th of February 2026: paper submission deadline
* 30th of March 2026: camera ready deadline (strict)
* 12th of May 2026: workshop date

Introduction

Access to text and speech data is essential for research, yet personal and sensitive information often prevents open sharing. Techniques such as pseudonymization and anonymization offer potential solutions, but their effectiveness, limitations, and impact on data utility require deeper investigation. Balancing privacy protection with meaningful scientific use remains a key challenge.

At the same time, legal and ethical requirements increasingly shape how language resources can be created, processed, and distributed. Regulatory frameworks, such as the GDPR, the Data Act, and the Artificial Intelligence Act, affect access, reuse, and documentation duties for both text and speech data, creating a complex environment that demands interdisciplinary insight.

The workshop brings these two perspectives together by addressing both the technical and practical aspects of de-identification as well as the legal and ethical obligations governing data handling. Topics include anonymization and pseudonymization methods, compliance in practical workflows, provenance and rights tracking, and emerging approaches to legal metadata.
The goal is to foster responsible, legally sound, and technically robust innovation in human language technologies.

Topics of Interest

We invite contributions from all disciplines involved in the creation, processing, governance, and de-identification of text and speech data. Submissions may address theoretical, empirical, methodological, legal, or technical questions, including cross-disciplinary work. We particularly encourage research on less-represented languages and on data from under-represented communities.

1. Legal Aspects of Language Data (LEGAL2026)

* Regulatory frameworks and global governance
* Intellectual property, data protection, and LLM governance
* Ethics, fairness, trust, and transparency
* Compliance in practice
* Ethics, fairness, and trust
* Operationalizing compliance
* Emerging and grey areas
* Interdisciplinary and cross-border coordination

2. Pseudonymization, Anonymization, and De-identification: Theoretical, Methodological, and Technical Aspects (CALD-pseudo 2026)

* Detection and classification of personal information (PI)
* Replacement and transformation of PI
* Utility and bias after de-identification
* Approaches to evaluation and adversarial testing
* Dataset creation for de-identification research
* Low-resource scenarios
* Speech-specific challenges
* Cross-disciplinary applications and challenges

We invite submissions from fields where de-identification of data plays an important role, including but not limited to Computational Linguistics, Applied Linguistics, Corpus Linguistics, Digital Humanities, Social Sciences, Political Sciences, Medical Science etc., from the perspectives of researchers, public organizations, and industry.
Submission Guidelines

Authors are invited to submit original and unpublished research papers in the following categories:

* Long papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for:
  * Small, focused contributions or ongoing or preliminary work
  * Extended abstracts for non-technical submissions only, such as conceptual, theoretical, legal, ethical, policy-oriented, or position papers. Extended abstract submissions are expected to be developed into regular papers by the camera-ready submission deadline.

The full papers will be published as workshop proceedings along with the LREC main conference. They should follow the LREC stylesheet, which is available on the conference website on the Author’s kit<https://lrec2026.info/authors-kit/> page.

Unlike the main conference, we allow appendices of up to 10 pages already in the review phase. However, the reviewers will not be required to look in the appendices and must be able to review the paper based on everything contained within the main body of the paper (as if there were no appendices).

Submission deadline: 20th of February 2026
Submission link: https://softconf.com/lrec2026/LEGAL2026/

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of your research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).
Keynote Talks

We are delighted to announce the workshop will host keynote talks from two speakers:

* Paweł Kamocki, Leibniz-Institut für Deutsche Sprache, Germany
* Ivan Habernal, Ruhr University Bochum, Germany

Workshop Organizers

LEGAL 2026:

* Ingo Siegert, Otto-von-Guericke Universität Magdeburg, Germany
* Paweł Kamocki, Leibniz-Institut für Deutsche Sprache, Germany
* Kossay Talmoudi, ELDA, France
* Khalid Choukri, ELDA, France

CALD-pseudo 2026:

* Maria Irena Szawerna, University of Gothenburg, Sweden
* Simon Dobnik, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Pierre Lison, Norwegian Computing Center & University of Oslo, Norway
* Ildikó Pilán, Norwegian Computing Center, Norway
* Ricardo Muñoz Sánchez, University of Gothenburg, Sweden
* Lisa Södergård, University of Helsinki, Finland
* Elena Volodina, University of Gothenburg, Sweden
* Xuan-Son Vu, Lund University & DeepTensor AB, Sweden

Program Committee

A list of program committee members is available on the workshop webpage.

Contact

For inquiries, please contact ingo.siegert(a)ovgu.de for questions about LEGAL2026 or mormor.karl(a)svenska.gu.se for questions about CALD-pseudo 2026.

Best regards,
Maria Irena Szawerna
____________________
PhD student
Språkbanken Text<https://spraakbanken.gu.se/>
Institutionen för svenska, flerspråkighet och språkteknologi<https://www.gu.se/svenska-spraket>
UNIVERSITY OF GOTHENBURG<https://www.gu.se/>
https://spraakbanken.gu.se/om/personal/maria-szawerna

