February 2026 - Corpora

CFP: Convergence 2026: Human-AI Integration for Multilingual and Accessible Communication (Extended deadline)
by Constantin Orasan 18 Feb '26

Convergence 2026: Human-AI Integration for Multilingual and Accessible Communication
Guildford, UK, 17 - 19 June 2026
Third call for papers
https://www.surrey.ac.uk/centre-translation-studies/convergence-2026

New submission deadline: 14th March 2026

The conference

Building on the success of the first Convergence conference<https://www.surrey.ac.uk/centre-translation-studies/convergence-2023> in 2023, which explored the responsible and intelligent integration of human and machine capabilities in translation and interpreting, the Centre for Translation Studies at the University of Surrey, UK, is proud to announce Convergence 2026: Human-AI Integration for Multilingual and Accessible Communication.

The second edition of the Convergence conference will create an opportunity to bring together innovative research on the evolving landscape of AI in the context of multilingual and accessible communication, reflecting on the complexity and effects of using AI-driven technologies in these fields. The conference will foster a multidisciplinary dialogue that will generate new theoretical perspectives and practical research, focusing on themes such as the ethical aspects of AI in translation and interpreting, AI-enabled digital accessibility and societal inclusion, and the impact of Generative AI on language mediation. We will also examine the evolving role of language professionals, the power of Large Language Models (LLMs) in supporting multilingual communication, and the crucial need for responsible use of language AI in the public sector.

The conference will publish full papers in open access proceedings with assigned ISBN and DOI.

The conference will be preceded by a Summer school on Artificial Intelligence for Accessible Communication between 15th and 17th June 2026.
The application process for the summer school is currently open at https://www.surrey.ac.uk/centre-translation-studies/convergence-2026/summer…

Conference themes

Theme 1: Ethical aspects of AI in translation and interpreting
Theme 2: AI-enabled digital accessibility and societal inclusion
Theme 3: Which creative turn? Language mediation in the era of GenAI
Theme 4: The evolving role of language professionals in the era of AI
Theme 5: LLMs supporting multilingual communication
Theme 6: Responsible use of language AI in the public sector

A full description of the themes is available on the conference website: https://www.surrey.ac.uk/centre-translation-studies/convergence-2026#themes

Submissions and publications

Convergence 2026 invites the following types of submissions on one of the conference themes:
* Long papers - describing original completed research. Allowed paper length: maximum 8 pages + unlimited number of pages for references and appendices
* Short papers - describing work in progress. Allowed paper length: maximum 4 pages + unlimited number of pages for references and appendices

The conference will not consider or evaluate abstract-only submissions. Full details about paper submission are available on the conference website at https://www.surrey.ac.uk/centre-translation-studies/convergence-2026/submis…

Invited speakers

* Horacio Saggion<https://www.upf.edu/web/horacio-saggion>, Chair in Computer Science and Artificial Intelligence and Head of the TALN Group and Large Scale Text Understanding Systems Lab at the Department of Information and Communication Technologies, Universitat Pompeu Fabra.
* John Anthony O'Shea<https://www.linkedin.com/in/johnanthonyoshea/>, LL.B, LL.M, Founder of Jurtrans, Chairperson of FIT-Europe, member of EU's Language Industry Expert Group

Programme committee

The programme committee is available at https://www.surrey.ac.uk/centre-translation-studies/convergence-2026/commit…

Important dates

* 7th March 2026: Registration of intention to submit a paper (optional)
* 14th March 2026: Submissions of full papers
* 16th April 2026: Notification of acceptance
* 29th May 2026: Camera ready papers for the draft proceedings
* 15th - 17th June 2026: Summer school on Artificial Intelligence for Accessible Communication
* 17th - 19th June 2026: The Convergence conference
* 1st Sept 2026: Camera ready papers for final proceedings

Venue

The conference will take place in Guildford at the University of Surrey. If you have any questions, do not hesitate to contact us at cts_inquiries(a)surrey.ac.uk<mailto:cts_enquiries@surrey.ac.uk>

Conference organisers

Conference chair: Prof Sabine Braun<https://www.surrey.ac.uk/people/sabine-braun>
Programme chairs: Prof Constantin Orasan<https://www.surrey.ac.uk/people/constantin-orasan> and Dr Diana Singureanu<https://www.surrey.ac.uk/people/diana-singureanu>
Proceedings chairs: Dr Felix do Carmo<https://www.surrey.ac.uk/people/felix-do-carmo> and Prof Constantin Orasan<https://www.surrey.ac.uk/people/constantin-orasan>
Summer school chairs: Dr Elena Davitti<https://www.surrey.ac.uk/people/elena-davitti> and Prof Sabine Braun<https://www.surrey.ac.uk/people/sabine-braun>
Sponsorship chairs: Sara Palmer<https://www.surrey.ac.uk/people/sara-palmer> and Aimee Savage<https://www.surrey.ac.uk/people/aimee-savage>
Local organisers: Aimee Savage<https://www.surrey.ac.uk/people/aimee-savage> and Dr Yuan Zou<https://www.surrey.ac.uk/people/yuan-zou>

---
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Centre for Translation Studies<https://www.surrey.ac.uk/centre-translation-studies>, University of Surrey, UK
Personal page: https://www.surrey.ac.uk/people/constantin-orasan

1st Call for Papers: SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages
by Atul K. Ojha 18 Feb '26

Apologies for cross-posting. --------------------------------------------------------------------------- *SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL* *Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages* *https://sites.google.com/view/sigul2026/home-page <https://sites.google.com/view/sigul2026/home-page>* ------------------------------------ We are pleased to announce the upcoming SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL on Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages <https://sites.google.com/view/sigul2026/home-page>, co-located with *LREC 2026 *in Palma, Mallorca, Spain. This workshop brings together researchers working on less-resourced, endangered, minority, low-density, and underrepresented languages to share novel techniques, resources, strategies, and evaluation methods. We emphasize the entire pipeline: data creation, modeling, adaptation/transfer, system development, evaluation, deployment, and ethical/community engagement. We invite contributions on, but not limited to, the following topics: - Data collection, annotation, and curation for under-resourced languages (crowdsourcing, participatory methods, gamification, unsupervised or weakly supervised methods) - Learning with limited supervision (zero- or few-shot, PEFT, RAG with linguistic resources) - Multilingual alignment, representation learning, and language embeddings, including rare languages - Speech, multimodal, and cross-modal technologies for under-resourced languages (speech recognition, synthesis, speech-to-text, speech translation, multimodal resources) - Basic text processing (normalization, orthography, transliteration, tokenization/segmentation, morphological and syntactic processing) in and for low-resource settings. 
- Low-resource machine translation (pivoting, alignment, synthetic data) - Evaluation frameworks, benchmarks, and metrics designed or adapted for underrepresented languages - Adaptation, domain adaptation, and robustness to domain shift in low-resource contexts - Responsible approaches, ethical issues, community engagement, data sovereignty, and language revitalization - Deployment, tools, and practical systems for underserved languages (e.g., mobile apps, dictionary or translation apps, linguistic tools) - Case studies of success and negative results (with lessons learned) - Interoperability, standardization, and metadata practices for datasets in low-resource scenarios Special Themes Language modeling for intra-language variation, dialects, accents, and regional variants of less-resourced languages Many less-resourced languages display rich internal diversity, including dialects, accents, and regional or social varieties. This special theme focuses on developing language models and speech technologies that capture and respect intra-language variation rather than reduce it to a single “standard.” We welcome work on dialect identification and adaptation, accent-robust speech systems, normalization vs. diversity-preserving modeling, and cross-dialect transfer in low-data scenarios. Approaches combining linguistic insights, community participation, and ethical awareness are especially encouraged. The aim is to build technologies that reflect and sustain the true linguistic richness of under-resourced languages. Ultra-Low-Resource Language Adaptation This special theme focuses on methods that enable effective language and speech technology development under extreme data scarcity. We invite research on transfer learning, cross-lingual adaptation, multilingual pretraining, and self-supervised or few-shot approaches tailored to ultra-low-resource settings. 
Work on evaluation, data augmentation (including synthetic data), and leveraging typological or linguistic knowledge is also welcome. The goal is to advance techniques that extend modern language technologies to the most underrepresented languages, ensuring inclusivity in the digital age. Community-Led Project Showcase To help ground research in community needs, we invite brief (5–10 min) presentations by language community members, NGOs, or practitioners describing real-world challenges or resource needs. Position papers or research posters are appropriate formats for this category. Important Dates Paper Submission Deadline: February 20 (Friday), 2026 Notification of Acceptance: March 22 (Sunday), 2026 Submission of Camera-Ready: March 30 (Monday), 2026 Workshop Date: 11-12 May 2026 All deadlines are anywhere-on-earth (AoE). Call for Papers We welcome original research papers and ongoing work relevant to the topics of the workshop. Each submission can be one of the following categories: - research papers; - position papers for reflective considerations of methodological, best practice, and institutional issues (e.g., ethics, data ownership, speakers’ community involvement, de-colonizing approaches); - posters, for work-in-progress projects in the early stage of development or description of new resources; - demo papers and early-career/student papers (to be submitted as extended abstracts and presented as posters). The research and position papers should range from four (4) to eight (8) pages, while demo papers are limited to four (4) pages. References don't count towards page limits. Accepted papers will appear in the workshop proceedings, which include both oral and poster papers in the same format. Determination of the presentation format (oral vs. poster) is based solely on an assessment of the optimal method of communication (more or less interactive), given the paper content. 
Submissions must be anonymous and follow LREC formatting guidelines <https://lrec2026.info/authors-kit/>. For inquiries, send an email to claudia.soria(a)cnr.it. Identify, Describe and Share your LRs! When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of your research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones). Thanks, Atul

CFP : [LaTeLL 2026] International Conference ‘LAnguage TEchnologies for Low-resource Languages’
by chafik.salmane＠um6p.ma 18 Feb '26

International Conference ‘LAnguage TEchnologies for Low-resource Languages’ (LaTeLL ’2026)
Fes, Morocco
30 September, 1 and 2 October 2026
www.latell.org/2026/

Fourth Call for Papers

- The conference

Natural Language Processing (NLP) has witnessed remarkable progress in recent years, largely driven by the emergence of deep learning architectures and, more recently, large language models (LLMs). Nevertheless, these advances have disproportionately benefited high-resource languages that possess abundant data for model training. By contrast, low-resource languages, which account for at least 85% of the world’s linguistic diversity and are often spoken by smaller or marginalised communities, have not yet reaped the full benefits of contemporary NLP technologies. This imbalance can be attributed to several interrelated factors, including the scarcity of high-quality training data, limited computational and financial resources, and insufficient community engagement in data collection and model development.

Developing NLP applications for low-resource languages poses major challenges, particularly the need for large, well-annotated datasets, standardised tools, and robust linguistic resources. Although several workshops have previously addressed NLP for low-resource languages, LaTeLL represents the first international conference dedicated specifically to the automatic processing of such languages. The event aims to provide a forum for researchers to present and discuss their latest work in NLP in general, and in the development and evaluation of language models for low-resource languages in particular.
- Conference topics We invite submissions on a broad range of themes concerning linguistic and computational studies focusing on low-resource languages, including but not limited to the following topics: Language resources for low-resource languages ● Dataset creation and annotation ● Evaluation methodologies and benchmarks for low-resource settings ● Lexical resources, corpora, and linguistic databases ● Crowdsourcing and community-driven data collection ● Tools and frameworks for low-resource language processing Core language technologies for low-resource languages ● Language modelling and pre-training for low-resource languages ● Speech recognition, text-to-speech, and spoken language understanding ● Phonology, morphology, word segmentation, and tokenisation ● Syntax: tagging, chunking, and parsing ● Semantics: lexical and sentence-level representation NLP Applications for low-resource languages ● Information extraction and named entity recognition ● Question answering systems ● Dialogue and interactive systems ● Summarisation ● Machine translation ● Sentiment analysis, stylistic analysis, and argument mining ● Content moderation ● Information retrieval and text mining Multimodality and Grounding for low-resource languages ● Vision and language for low-resource contexts ● Speech and text multimodal systems ● Low-resource sign language processing Ethics, Equity, and Social Impact for low-resource languages ● Bias and fairness in low-resource language technologies ● Sociolinguistic considerations in technology development ● Cultural appropriateness and sensitivity Human-Centred Approaches in low-resource languages ● Usability and accessibility of low-resource language technologies ● Educational applications and language learning ● Community needs assessment and technology adoption ● User experience research in low-resource contexts Multilinguality and Cross-Lingual Methods for low-resource languages ● Multilingual language models and their adaptation ● 
Code-switching and code-mixing ● Cross-lingual transfer learning in low-resource languages. - Special Theme Track 1 — Building Applications Based on Large Language Models for Low-Resource Languages LaTeLL’2026 will feature a Special Theme Track dedicated to the development of applications based on Large Language Models (LLMs) for low-resource languages. This track aims to explore innovative methodologies, architectures, and tools that leverage the power of LLMs to enhance linguistic processing, accessibility, and inclusivity for underrepresented languages. Contributions are encouraged on topics such as model adaptation and fine-tuning, multilingual and cross-lingual transfer, ethical and fairness considerations, and the creation of datasets and benchmarks that facilitate the integration of LLM-based solutions in low-resource settings. - Special Theme Track 2 — Modern Standard Arabic (MSA) and Arabic Dialects This special track addresses the unique challenges and opportunities in processing Modern Standard Arabic (MSA) and the rich landscape of Arabic dialects. The diglossic nature of Arabic, where the formal MSA coexists with numerous, widely used spoken dialects, presents a significant hurdle for NLP. While MSA is relatively well-resourced, Arabic dialects are quintessential examples of low-resource languages, often lacking standardised orthographies, annotated corpora, and dedicated processing tools. This track invites submissions on novel research and resources aimed at bridging this gap and advancing the state of the art in Arabic language technology. 
Topics of interest include, but are not limited to: ● Dialect identification and classification ● Creation of corpora and lexical resources for Arabic dialects ● Machine translation between MSA and dialects, and across different dialects ● Speech recognition and synthesis for dialectal Arabic ● Computational modelling of morphology, syntax, and semantics for dialects ● NLP applications (e.g., sentiment analysis, NER) for dialectal user-generated content ● Code-switching between Arabic dialects, MSA, and other languages - Submissions and Publication LaTeLL’2026 welcomes high-quality submissions in English, which may take one of the following two forms: ● Regular (long) papers: Up to eight (8) pages in length, presenting substantial, original, completed, and unpublished research. ● Short (poster) papers: Up to four (4) pages in length, suitable for concise or focused contributions, ongoing research, negative results, system demonstrations, and similar work. Short papers will be presented during a dedicated poster session. The conference will not consider submissions consisting of abstracts only. All accepted papers (both long and short) will be published as electronic proceedings (with ISBN) and made available on the conference website at the time of the event. The organisers intend to submit the proceedings for inclusion in the ACL Anthology. 
To prepare your submission, please make sure to use the LaTeLL’2026 style files available here: LaTeX: https://drive.google.com/file/d/1RceWyUqjFLEbv_oNto-x2Quop7qT4-wf/view?usp=… Word: https://docs.google.com/document/d/1m6VeC9jtMpe-Ku2QREgrPlE2-NTDvJvZ/edit?u… Overleaf: https://www.overleaf.com/read/ttzzfcnjrgvw#e82bef Papers should be submitted through Softconf/START using the following link: https://softconf.com/p/latell2026 Authors of papers receiving exceptionally positive reviews will be invited to prepare extended and substantially revised versions for submission to a leading journal in the field of Natural Language Processing (NLP). The conference will also feature a Student Workshop, and awards will be presented to the authors of outstanding papers. Important dates ● Submissions due: 1 May 2026 ● Reviewing process: 20 May – 20 June 2026 ● Notification of acceptance: 25 June 2026 ● Camera-ready due: 10 July 2026 ● Conference camera-ready proceedings ready 10 July 2026 ● Conference: 30 September, 1 October and 2 October 2026 Keynote speaker Nizar Habash (New York University Abu Dhabi) Organisation Conference Chair Ruslan Mitkov (Lancaster University and University of Alicante) Programme Committee Chairs Saad Ezzini (King Fahd University of Petroleum & Minerals) Salima Lamsiyah (University of Luxembourg) Tharindu Ranasinghe (Lancaster University) Organising Committee Maram Alharbi (Lancaster University) Salmane Chafik (Mohammed VI Polytechnic University) Ernesto Estevanell (University of Alicante) Milica Ikonić Nešić (University of Belgrade) Further information and contact details The follow-up calls will provide more details on the conference venue and registration. The conference website is www.latell.org/2026/ and will be updated on a regular basis. For further information, please email 2026(a)latell.org Registration will open in April 2026.

Call For Participation: MedGenVidQA at BioNLP-ACL 2026 — Test Set Released & Submission Open
by deepak.gupta＠nih.gov 18 Feb '26

Dear Colleagues and Friends,

We are pleased to inform you that the CodaBench registration and submission portal for the MedGenVidQA shared task is now open. Participants can access the test dataset and submit their system runs for evaluation through the portal.

Task A: Multimodal Retrieval (MMR) https://www.codabench.org/competitions/13989/
Task B: Multimodal Answer Generation (MAG) https://www.codabench.org/competitions/14014/
Task C: Visual Answer Localization (VAL) https://www.codabench.org/competitions/14015/

Submission Deadline: March 31, 2026

More details can be found on the shared task webpage: https://medgenvidqa.github.io/

We look forward to your participation. Please join our Google Group (https://groups.google.com/g/medgenvidqa2026) for important updates. If you have any questions, please contact us via the Google Group or email.

Best regards,
MedGenVidQA 2026 Organizers

CFP: [IRAI-ECIR 2026] Late Breaking Paper for First Workshop on Information Retrieval for Accountability and Integrity
by Yohei Seki 18 Feb '26

Hello,

We are excited to invite you to submit a late-breaking paper (up to two pages) to IRAI 2026 - the First Workshop on Information Retrieval for Accountability and Integrity, a half-day pilot workshop dedicated to exploring how IR and NLP can help evaluate forward-looking statements, verify commitments, and restore trust across public and private domains.

- What IRAI aims to do

Information systems shape public discourse, decisions, and trust, yet we lack systematic ways to evaluate the accuracy of forward-looking statements (e.g., campaign promises, corporate forecasts). Media coverage is selective, standards are uneven, and the signal is buried in noise. The result: accountability gaps and eroded confidence. IRAI brings the IR and NLP communities together to assess the fulfilment and reliability of claims and commitments. It complements ECIR’s mission by tackling a pressing, real-world challenge with societal impact.

- Aligned with the IR and NLP community

IRAI 2026 will be part of the European Conference on Information Retrieval (ECIR), held in Delft on April 2nd, 2026, as it highlights concrete applications for social good.

- Important Information (for Late-breaking paper)

* When: Apr 2, 2026
* Where: Delft, Netherlands
* Submission Deadline: Feb 25, 2026
* Notification Due: Mar 3, 2026
* Final Version Due: Mar 10, 2026

- More info and Registration: https://nlpfin.github.io/sites/ECIR2026.html

--
IRAI organizers

Final CFP: 1st Workshop on Creating Interoperable Corpora of Historical Newspapers (PressMint) at LREC 2026
by Petya Osenova 17 Feb '26

1st Workshop on Creating Interoperable Corpora of Historical Newspapers (PressMint)
Final Call for Papers

Date: May 16, 2026, a half-day workshop
Location: Palma de Mallorca, Spain
Website: https://www.clarin.eu/PressMint-LREC2026
Submission Deadline: 1 March 2026
Submission link: https://softconf.com/lrec2026/PressMint/

Advertisement/Tagline

Unlock the pan-European history! Join the PressMint workshop to build & analyze multilingual, interoperable historical newspaper corpora!

Workshop description

Historical newspapers are of interest to historians and historical linguists, as well as to social and political scientists, ethnologists, anthropologists, media and communication scholars, and researchers in cultural studies. All of these are fields where contemporary digital resources, tools and methods (e.g. “distant reading”) are still underutilised. On the other hand, corpora of historical newspapers already exist to a large extent for a number of languages and countries, as they are out of copyright. The images, and often the OCR, are also available through the national libraries. In recent years these data have attracted considerable interest from researchers, since they preserve the historical, cultural, political and societal past. However, these corpora are not interoperable, which precludes methods for their comparison, as well as any translingual and transnational research. This is an especially important consideration, as statehood and nationhood are highly dynamic in Europe in the period to be covered by the project corpora.

An initial joint attempt towards the creation of a corpus of historical newspapers from the beginning of the 20th century onwards is the CLARIN flagship project PressMint<https://www.clarin.eu/pressmint>. The project currently features data from 20 partners and aims to develop a standard for interoperable resources of newspapers in diachronic timespans.
The final goal is to provide structured and high quality multilingual data in a common format, with the same type of linguistic annotation that covers (at least partially) the same time period.

Objective

The PressMint workshop aims to gather experts interested in creating, processing and analyzing interoperable corpora of historical data in general, but especially with a focus on newspapers. Another very important objective is to consider the perspective of the communities who use historical data: their purposes, requirements and feedback. We encourage interested colleagues to present their work at both levels: national and pan-European, monolingual and multilingual, as well as task-specific and multidisciplinary. We view this workshop as a venue to exchange research ideas and start collaboration on this topic.

The workshop will feature one invited speaker: Maud Ehrmann, EPFL, CH

We invite unpublished original work focusing on (but not limited to) the following topics:
* compilation, annotation, visualisation and utilisation of historical newspaper corpora of the period relevant to PressMint (ideally around the start of the 20th century but not constrained by this period)
* harmonisation of the existing multilingual historical newspaper corpora that contain either synchronic or diachronic data, or both
* linking or comparing historical newspaper corpora with other datasets, including sources of structured knowledge, such as formal ontologies and LOD datasets
* enrichment of historical newspaper corpora (with e.g. sentiment annotation, etc.)
* machine translation of historical newspaper corpora
* employment of LLMs as stand-alone tools or as parts of NLP architectures for historical data processing, maintenance and knowledge deployment
* various scenarios of usage of historical data

Submission & Publication

We accept submission of long papers (from 6 to 8 pages), short papers (4 pages) and demo papers (4 pages) to be presented as a long or short oral presentation or poster presentations at the workshop. To support double-blind reviewing, all submissions must be fully anonymized and should be formatted according to the stylesheet available on the LREC 2026 website<https://lrec2026.info/authors-kit/>. The papers of the workshop will be published in online proceedings.

At the time of submission, authors are also offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map [https://lremap.elra.info/], which provides metadata for the resources.

Please note that the LREC style guide should be followed. The formatting guidelines can be found here: https://lrec2026.info/authors-kit/.

Important Dates

* Paper submission deadline: 1 March 2026
* Notification of acceptance: 15 March 2026
* Camera-ready papers: 30 March 2026
* Workshop date: 16 May 2026

Organizing Committee

* Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences, PL
* Tanja Wissik, Austrian Academy of Sciences, AT
* Petya Osenova, Sofia University “St. Kl. Ohridski” & IICT-BAS, BG

The workshop is supported by the CLARIN research infrastructure and the PressMint Project. To contact the organisers, please email maciej.ogrodniczuk(a)gmail.com<mailto:maciej.ogrodniczuk@gmail.com>

CFP: AmericasNLP 2026 - Sixth Workshop on NLP for Indigenous Languages of the Americas
by Luis Chiruzzo - Inco 17 Feb '26

Sixth Workshop on NLP for Indigenous Languages of the Americas

AmericasNLP 2026 will be co-located with ACL 2026 <https://2026.aclweb.org/> in San Diego, California, USA!

Call for Papers

The goal of AmericasNLP is to encourage and increase the visibility of work on the Indigenous languages of the Americas. It aims to encourage research on NLP, computational linguistics, corpus linguistics and speech for Indigenous languages, to connect researchers and professionals from underrepresented communities and native speakers of endangered languages with the ACL community, and, more generally, to promote machine learning approaches suitable for low-resource languages.

We invite the submission of:
- Long papers (8 pages) and short papers (4 pages) on substantial, original, and unpublished research
- Non-archival extended abstracts (2 pages), technical reports (8 pages), and work which has been presented at other venues (in the format of the original publication).

Submissions do not need to describe work on native languages directly, as long as it is clear why those can benefit from the described approaches.
Areas of interest include but are not limited to:
- Creation of datasets for NLP applications
- Incorporation of external knowledge into neural systems
- Linguistic typology and the use of typological features for NLP
- Transfer learning, meta-learning, and active learning
- Weakly supervised, semi-supervised, and unsupervised learning
- Machine translation of low-resource languages
- Applications of, and innovation with, LLMs for Indigenous languages of the Americas
- Morphology and phonology of low-resource languages
- NLP applications for Indigenous languages of the Americas
- Ethical considerations for research on languages spoken by Indigenous communities
- Language activism, revitalization, and sovereignty, in the context of NLP models and research

Submissions will be accepted until April 15th, 2026 via softconf: submission portal <https://softconf.com/acl2026/americasnlp>

*Note:* Limitation section and ethics statement are not mandatory, but strongly encouraged. If they are part of your submission, they do *not* count towards the page limit.

Shared Task

To motivate the NLP community to increase research efforts on Indigenous and endangered languages, AmericasNLP 2026 will feature a new shared task about image captioning of culturally relevant images. The results of the shared task will be presented during the in-person workshop in San Diego.
More information can be found here <https://turing.iimas.unam.mx/americasnlp/2026_st.html>.

Important Dates

- Submission Deadline: April 15th *(After the ACL acceptance notification)*
- Notification of Acceptance: May 10th
- Camera-Ready Papers Due: May 22nd
- Workshop: July 3 or 4

All deadlines are 11:59pm anywhere on Earth (AoE).

Organizing Committee

- *Manuel Mager*, Johannes Gutenberg University of Mainz, jmagerho(a)uni-mainz.de
- *Arturo Oncevay*, Independent, arturo.oncevay(a)gmail.com
- *Abteen Ebrahimi*, University of Colorado Boulder, abteen.ebrahimi(a)colorado.edu
- *Minh Duc Bui*, Johannes Gutenberg University of Mainz, minhducbui(a)uni-mainz.de
- *Shruti Rijhwani*, Google DeepMind, shrutirijhwani(a)google.com
- *Luis Chiruzzo*, Universidad de la República, Uruguay, luischir(a)fing.edu.uy
- *Robert Pugh*, University of Indiana, pughrob(a)iu.edu
- *Rolando Coto-Solano*, Dartmouth College, rolando.a.coto.solano(a)dartmouth.edu
- *John E. Ortega*, Northeastern University, j.ortega(a)northeastern.edu
- *Katharina von der Wense*, University of Colorado Boulder and Johannes Gutenberg University of Mainz, katharina.kann(a)colorado.edu

Contact

Contact: americas.nlp.workshop(a)gmail.com
Website: https://turing.iimas.unam.mx/americasnlp/

Deadline extension: Holocaust Testimonies as Language Resources (HTRes-2026)
by Martin Wynne 17 Feb '26

Please note that the deadline for submissions has now been extended to Friday 27 February 2026.

CALL FOR PAPERS

The Second Workshop on Holocaust Testimonies as Language Resources (HTRes-2026), pre-conference workshop W53 at LREC2026

Date: 11 May 2026 (afternoon)
Location: Palma de Mallorca, Spain
Workshop web page: https://www.clarin.eu/HTRes2026
Submission Deadline: 27 February 2026
Submission link: https://softconf.com/lrec2026/HTRes2026/

Holocaust testimonies serve as a bridge between survivors and history’s darkest chapters, providing a connection to the profound experiences of the past. Testimonies stand as the primary source of information that describes the Holocaust, offering first-hand accounts and personal narratives of those who experienced it. The majority of testimonies are captured in an oral format, as survivors vividly explain and share their personal experiences and observations from that time period. Transforming Holocaust testimonies into a machine-processable digital format can be a difficult task owing to the unstructured nature of the text.

The creation of accessible, comprehensive, and well-annotated Holocaust testimony collections is of paramount importance to our society. These collections empower researchers and historians to validate the accuracy of socially and historically significant information, enabling them to share critical insights and trends derived from these data.

The primary objective of this workshop is to explore how various theories, techniques, and tools from corpus linguistics, natural language processing, and digital humanities can contribute to the examination, analysis, dissemination, and preservation of Holocaust testimonies and other Holocaust-related documents.

The workshop is supported by CLARIN and EHRI. Please find full details of the call for papers at the workshop web page at https://www.clarin.eu/HTRes2026. The main conference website is at https://lrec2026.info/.

IMPORTANT DATES

Final date for paper submission: extended to 27 February 2026
Notification of Acceptance: 11 March 2026
Camera-ready version submission: 30 March 2026
Workshop date: 11 May 2026

To contact the organisers, please email holocausttlr(a)gmail.com

From Martin Wynne on behalf of the organizing committee.

--
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
martin.wynne(a)ling-phil.ox.ac.uk
https://orcid.org/0000-0002-4155-0530

Final CfP LANLP: Bridging Ibero and Latin American NLP communities Co-located Networking Symposium @ LREC 2026 [Extended deadline]
by GAMALLO OTERO PABLO 17 Feb '26

Final Call for Papers

LANLP: Bridging Ibero and Latin American NLP communities
16 May 2026, Palma de Mallorca, Spain
https://sites.google.com/view/lanlp2026/home
Co-located Networking Symposium @ LREC 2026
https://lrec2026.info/

Description and Goals

We organise a Networking Symposium on Latin American NLP (LANLP), focusing on natural language processing for the diverse languages of the Iberian Peninsula and Latin America. This region includes major world languages (e.g. Spanish (~558M speakers) and Portuguese (~267M)) as well as regional and indigenous languages. For example, Latin America alone hosts tens of millions of speakers of Quechua (~10M), Guaraní (>6M), Nahuatl (~2M), and Aymara (~2M), among many others. Such languages are highly under‐resourced: over 88% of the world’s languages remain largely unsupported by language technologies. This networking event addresses that gap by promoting collaboration on ethically and culturally sensitive resource creation, evaluation, and novel methods for low-resource multilingual NLP in Iberian and Latin American languages and varieties.

Our goal is to bring together communities (SEPLN<http://www.sepln.org/>, CLARIAH-ES<https://www.clariah.es/>, PROPOR<https://propor2024.citius.gal/>, AmericasNLP<https://turing.iimas.unam.mx/americasnlp/index.html>, and SomosNLP<https://somosnlp.org/>) to share cutting-edge research, language resources, and best practices. LANLP focuses on community-driven resource development and evaluation for Iberian languages and diverse Latin American languages (including indigenous and minority languages). We aim to bridge regional communities: for instance, past forums like OpenCor note that “Latin American and Iberian communities... did not have an established event” to share initiatives, corpora and tools. LANLP fills this gap, fostering new contacts between Iberian and Latin American NLP research groups.
The goals are to (1) highlight challenges in processing these languages, (2) share novel datasets and models, and (3) catalyze future collaborations and shared tasks. We emphasize both academic rigor and community inclusivity, encouraging contributions from established researchers and grassroots language advocates alike.

Topics of Interest

We invite submissions on topics including (but not limited to):

* Language resource creation: Corpora, lexicons, and annotations for Iberian and Latin American languages (text, speech, multimodal).
* LLM opportunities and challenges: Small Language Models, synthetic data, mitigating biases, linguistic inequalities, data scarcity, language domination.
* Multilingual transfer & modeling: Cross-lingual and multilingual representations, transfer learning, and embedding methods that bridge Spanish, Portuguese, varieties and minority languages.
* Machine translation & generation: MT, summarization, and language generation for Spanish, Portuguese, and low-resource languages (e.g., Quechua, Aymara, Nahuatl).
* Speech and audio processing: ASR, TTS, and spoken language resources for under-resourced languages and regional dialects (e.g. indigenous languages, Brazilian Portuguese, Latin American Spanish).
* Dialectal and code-switching NLP: Identification and handling of dialectal variation and code-switching (e.g. Spanish–Portuguese code-mixing, Spanish–indigenous language contact).
* Morphology and syntax: Analysis and tagging for morphologically rich or under-documented languages (e.g. Basque, Mapudungun, Bribri) using universal dependencies or other frameworks.
* Domain-specific NLP: Social media, sentiment, hate-speech detection, and other tasks in Iberian and Latin American language contexts (e.g. Latin American social media analysis).
* Digital humanities & cultural heritage: NLP for historical texts, literature, and cultural content in Spanish, Portuguese, and regional languages.
* Community-driven methods: Crowdsourcing, citizen science, and participatory approaches for data collection and annotation in these languages.
* Evaluation and benchmarks: Development of evaluation metrics and benchmarks tailored to low-resource Iberian/Latin languages.
* Ethical and social issues: Fairness, bias, and indigenous language rights in NLP; collaboration with native speaker communities; data governance and sustainability of resources.

Important dates

* February 27, 2026: Paper submission deadline *extended* (previously February 18)
* March 20, 2026: Notification of acceptance
* March 30, 2026: Camera-ready deadline
* May 16, 2026: Networking Symposium Date

Submission Instructions

We invite non-anonymous submissions in English, Spanish or Portuguese on the topics of interest, with between 4 and 8 pages of content. The page limit of 8 pages does not include acknowledgements, references, potential Ethics Statements and discussion on Limitations, in line with the policy of the main LREC conference. All submissions must follow the LREC stylesheet (https://lrec2026.info/authors-kit/). Any submissions which are over-length, poorly formatted or make excessive use of appendices to circumvent page limits are liable to desk-rejection.

At the time of submission, authors are offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map (https://lremap.elra.info/), which provides metadata for the resource.

Organizing Committee

* Luis Chiruzzo Inco (AmericasNLP, luischir(a)fing.edu.uy)
* Pablo Gamallo (PROPOR, CiTIUS, pablo.gamallo(a)usc.gal)
* María Grandury (SomosNLP, EPFL, mariagrandury(a)gmail.com)
* Rafael Muñoz Guillena (SEPLN, CENID, UA, rafael(a)dlsi.ua.es)
* German Rigau Claramunt (CLARIAH-ES, HiTZ Center, EHU, german.rigau(a)ehu.eus)

OTELC Research Competition
by colinfinnerty＠yahoo.co.uk 17 Feb '26

Hello,

The first annual Oxford Test of English Learner Corpora (OTELC) Research Competition, hosted by Oxford University Press, is now open.

This competition offers master’s students in linguistics, corpus linguistics, or language assessment the opportunity to design a research project using authentic English‑language test‑taker responses from the Oxford Test of English Learner Corpora. Selected entrants will receive full access to the OTELC for the duration of the competition. The winning submission will be awarded a 13‑inch iPad Air and the opportunity to have their work published on the Oxford English Assessment Research webpage.

Eligibility requirements

Applicants must:

- Be enrolled in a master’s programme
- Be taking a course in linguistics, corpus linguistics, or language assessment
- Have at least one semester remaining in their programme

How to apply

Applicants should submit a research proposal using the official application form. Proposals must clearly outline research aims, research questions, and how the OTELC will be used to address them.

Application deadline: Sunday, 31 May 2026
Further information: https://elt.oup.com/feature/global/learner-corpora/

Best wishes,
Colin Finnerty
Head of Assessment Research
Oxford English Assessment Research
Oxford University Press
