Colleagues,
I have updated the corpuslab.com website with a simple concordance/collocation interface. You can search IMRDC sections of research articles in Arts, Economics, Engineering, Psychology and Social Science. It is not heavy-duty text analysis; you get 100 hits. You can sort by clicking the left or right context and you can highlight hedges, boosters, connectives, and reporting verbs. It is meant to be a gentle introduction to text analysis, an aid to student writers.
There are also details of quite advanced disciplinary writing books written by me and Claude. As noted in those, many disciplines don’t follow the IMRDC structure but use alternative labels or thematic headings. In the concordance search described above, I am assuming that the language features found in the Method section will still be present in articles that use an alternative heading, such as Experiment 1. I should perhaps add an explicit statement about the typical macrostructure found in these disciplines.
Let me know if you have any questions, find any problems etc.
Michael
---
Michael Barlow. www.michaelbarlow.com<http://www.michaelbarlow.com/>
Assoc. Prof. Applied Linguistics, University of Auckland
Recent publications
M. Barlow. 2026. Writing the Social Science Research Article.<https://www.amazon.com/dp/0940753367/ref=sr_1_1_so_ABIS_BOOK?crid=25EQBROGY…> Athelstan.
M. Barlow. 2026. Writing the Humanities Research Article.<https://www.amazon.com/s?k=writing+the+humanities+research+article&i=stripb…> Athelstan.
M. Barlow. 2026. Writing the Business Research Article<https://www.amazon.com/dp/0940753405/ref=sr_1_2?crid=2XO1YCSNL8ZT&dib=eyJ2I…>. Athelstan.
M. Barlow. 2026. Writing the Economics Research Article<https://www.amazon.com/dp/0940753383/ref=sr_1_1?crid=2XO1YCSNL8ZT&dib=eyJ2I…>. Athelstan.
M. Barlow. 2023. Ten Lectures on Corpora and Cognitive Linguistics<https://brill.com/display/title/61682>. Brill.
Le, Pham & Barlow. 2023. The Academic Discourse of Mechanical Engineering<https://benjamins.com/catalog/scl.107>. Benjamins.
Dear colleagues,
We invite submissions to the First Workshop on Evaluating LLMs for Specialized Domains (Eval4SD), co-located with KONVENS 2026 in Hamburg, Germany (September 14-17).
The workshop focuses on the evaluation of large language models in specialized domains such as—but not limited to—law, medicine, science, finance, digital humanities, social sciences, education, and politics. In this space, we have identified three core areas detailed below: LLM Benchmarking, Domain Research Replication, and Evaluation Methodology. Work that fits within the general theme but not any of the focus areas is also welcome!
- **LLM Benchmarking:** We invite contributions that evaluate multiple models, datasets, inference methods, or prompting techniques on existing data, or that introduce novel, specialized benchmarking datasets. Papers in this direction may seek to answer questions like: ‘Which model should I use for my social science project?’, ‘Are open-weight models inferior for specialized tasks?’, or ‘Given a limited budget, what is my best choice of LLM for my digital humanities question?’ We especially encourage submissions that evaluate performance in low- and medium-resource languages.
- **Domain Research Replication:** Does information automatically extracted using a different model or a slightly altered approach still support the same domain conclusions? We invite submissions that attempt to replicate existing domain research using a tweaked LLM setup. Testing open-weight models is especially important here in light of replicability. We are excited to see how robust domain research is to changes in the automation setup, from prompting to model weights and training data.
- **Metrics and Evaluation Methodology:** We invite submissions on methodology for assessing LLM outputs in complex tasks. This includes work on LLM judge setups or novel rule-based metrics for specialized tasks.
We allow submissions in two categories:
- **Long Papers (up to 8 pages + references):** Complete research contributions with novel findings, experimental results, and thorough analysis. Suitable for mature work on LLM evaluation methodology or new benchmark proposals.
- **Short & Position Papers (up to 4 pages + references):** Preliminary results, position papers, system descriptions, and focused contributions. Great for provocative arguments or narrowly scoped empirical studies.
Submissions follow the ACL template; reviews are double-blind and are conducted via OpenReview (https://openreview.net/group?id=GSCL.org/KONVENS/2026/Workshop/Eval4SD).
Additionally, we welcome non-archival submissions to present recently published work or seek feedback on work-in-progress without violating dual-submission policies. Accepted papers will be presented at the workshop, but will not be included in the official proceedings.
Important dates:
- Submission deadline: July 03, 2026 (23:59 CEST)
- Notification of acceptance: July 31, 2026
- Camera-ready deadline: August 15, 2026
- Workshop date: co-located with KONVENS (September 14-17), exact day TBA
Website: https://eval4sd.github.io/
Contact: eval4sd-organizers(a)googlegroups.com
[Apologies for cross-postings]
Call for Papers
First International Workshop on Extraction from Triplet
Text-Table-Knowledge Graph and associated Challenge
https://ecladatta.github.io/triplet2026/
in conjunction with the 23rd European Semantic Web Conference (ESWC 2026)
https://2026.eswc-conferences.org/, Dubrovnik, Croatia
Important dates:
- **Submission deadline (extended)**: 13 March, 2026 (11:59pm, AoE)
- **Challenge registration deadline**: 15 March, 2026
- **Notifications**: 31 March, 2026
- **Challenge results submission**: 10 April, 2026
- **Camera-ready deadline**: 15 April, 2026 (11:59pm, AoE)
- **Workshop**: Sunday 10 May OR Monday 11 May 2026
Motivation:
Understanding information spread across text and tables is essential for
tasks such as question answering and fact checking. Existing benchmarks
primarily deal with semantic table interpretation or reasoning over
tables for question answering, leaving a gap in evaluating models that
integrate tabular and textual information, perform joint information
extraction across modalities, or can automatically detect
inconsistencies between modalities.
This workshop aims to provide a forum for exchanging ideas between the
NLP community working on open information extraction and the vibrant
Semantic Web community working on the core challenge of matching tabular
data to Knowledge Graphs, on populating knowledge graphs using texts and
on reasoning across text, tabular data and knowledge graphs. The
workshop also targets researchers focusing on the intersection of
learning over structured data and information retrieval, for example, in
retrieval augmented generation (RAG) and question answering (QA)
systems. Hence, the goal of the workshop is to connect researchers and
trigger collaboration opportunities by bringing together views from the
Semantic Web, NLP, database, and IR disciplines.
Scope:
The topics of interest include but are not limited to:
- Semantic Table Interpretation
- Automated Tabular Data Understanding
- Using Large Language Models (LLMs) for Information Extraction
- Generative Models and LLMs for Structured Data
- Knowledge Graph Construction and Completion with Tabular Data and Texts
- Analysis of Tabular Data on the Web (Web Tables)
- Benchmarking and Evaluation Frameworks for Joint Text-Table Data Analysis
- Applications (e.g. data search, fact-checking, Question-Answering, KG
alignment)
Submission Guidelines:
We invite two types of submissions:
1. Full research papers (12-15 pages) including references and appendices
2. Challenge papers (6-8 pages) including references and appendices
All submissions should be formatted in the CEUR layout format,
https://www.overleaf.com/latex/templates/template-for-submissions-to-ceur-w…
This workshop is double-blind and non-archival. Submissions are managed
through EasyChair at
https://easychair.org/conferences/?conf=triplet2026. All accepted papers
will be presented as posters or as oral talks.
**TRIPLET Challenge:**
In recent years, the research community has shown increasing interest in
the joint understanding of text and tabular data, often for performing
tasks such as question answering or fact checking where evidence can be
found in both texts and tables. Accordingly, various benchmarks have been
developed for jointly querying tabular data and textual documents in
areas such as finance, scientific publications, and the open domain.
While benchmarks for triple extraction from text for Knowledge Graph
construction and for semantic annotation of tabular data exist in the
community, there remains a gap in benchmarks and tasks that specifically
address the joint extraction of triples from text and tables by
leveraging complementary clues across these different modalities.
The TRIPLET 2026 challenge proposes three sub-tasks and benchmarks
for understanding the complementarity between tables, texts, and
knowledge graphs, and in particular for developing a joint knowledge
extraction and reconciliation process.
# Sub-Task 1: Assessing the Relatedness Between Tables and Textual Passages
The goal of this task is to assess the relatedness between tables and
textual passages (within documents and across documents). For this
purpose, we have constructed LATTE (Linking Across Table and Text for
Relatedness Evaluation), a human annotated dataset comprising table–text
pairs with relatedness labels. LATTE consists of 7,674 unique tables and
41,880 unique textual paragraphs originating from 3,826 distinct
Wikipedia pages. Each text paragraph is drawn from the same or
contextually linked pages as the corresponding table, rather than being
artificially generated. LATTE provides a challenging benchmark for
cross-modal reasoning by requiring classification of related and
unrelated table–text pairs. Unlike prior resources centered on
table-to-text generation or text retrieval, LATTE emphasizes
fine-grained semantic relatedness between structured and unstructured data.
The Figure below provides an example, using a web-annotation tool we
developed, of how we identify the relatedness between the sentence
containing the entity AirPort Extreme 802.11n (highlighted in orange)
and the data table providing information about output power and
frequency for this entity. Participants are provided with tables and
textual passages that need to be ranked. The evaluation will use
metrics such as P@k, R@k and F1@k.
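As a rough illustration of these ranking metrics, the following sketch computes P@k and R@k for a single query (the passage identifiers are hypothetical, and the official Codabench scorer may differ in detail):

```python
def precision_recall_at_k(ranked, relevant, k):
    """P@k and R@k for one query, e.g. passages ranked for one table."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    p_at_k = hits / k
    r_at_k = (hits / len(relevant)) if relevant else 0.0
    return p_at_k, r_at_k

# Hypothetical example: five passages ranked for one table,
# two of which are actually related to it.
ranked = ["p3", "p1", "p7", "p2", "p9"]
relevant = {"p1", "p2"}
p, r = precision_recall_at_k(ranked, relevant, k=3)
# F1@k is the harmonic mean of P@k and R@k.
f1 = 2 * p * r / (p + r) if (p + r) else 0.0
```

In practice these per-query scores would be averaged over all tables in the test set.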
Go to https://www.codabench.org/competitions/12776/ and enroll to
participate in this Task.
# Sub-Task 2: Joint Relation Extraction Between Texts and Tables
The goal of this task is to automatically extract knowledge jointly from
tables and related texts. For this purpose, we created ReTaT, a dataset
that can be used to train and evaluate systems for extracting such
relations. This dataset is composed of (table, surrounding text) pairs
extracted from Wikipedia pages and has been manually annotated with
relation triples. ReTaT is organized into three subsets with distinct
characteristics: domain (business, telecommunications and female
celebrities), size (from 50 to 255 pairs), language (English vs. French),
type of relations (data vs. object properties), closed vs. open list of
relations, and size of the surrounding text (paragraph vs. full page). We
then assessed its quality and suitability for the joint table-text
relation extraction task using Large Language Models (LLMs).
Given a Wikipedia page containing texts and tables and a list of
predicates defined in Wikidata, a participant system should extract
triples composed of mentions located partly in the text and partly in
the table and disambiguated with entities and predicates identified in
the Wikidata reference knowledge graph. For example, in the Figure
below, an annotation triple <Q13567390, P2109, 24.57> is associated with
mentions highlighted in orange (subject), blue (predicate) and green
(object) to annotate the document available at
https://en.wikipedia.org/wiki/AirPort_Extreme. Similar to the
Text2KGBench evaluation
(https://link.springer.com/chapter/10.1007/978-3-031-47243-5_14), and
because the set of triples is not exhaustive for a given sentence, to
avoid false negatives, we follow a locally closed approach by only
considering the relations that are part of the ground truth. The
evaluation then uses metrics such as P, R and F1.
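As a rough illustration of this locally closed scoring (a sketch only, reusing the example triple above; the official scorer may differ in detail):

```python
def local_prf(predicted, gold):
    """Precision/recall/F1 over (subject, predicate, object) triples.

    Locally closed world: predicted triples whose predicate does not
    occur in the gold set are ignored rather than penalized, since the
    gold annotations are not exhaustive.
    """
    gold_predicates = {p for (_, p, _) in gold}
    considered = {t for t in predicted if t[1] in gold_predicates}
    tp = len(considered & gold)
    precision = tp / len(considered) if considered else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical triples with Wikidata-style identifiers.
gold = {("Q13567390", "P2109", "24.57")}
pred = {("Q13567390", "P2109", "24.57"), ("Q13567390", "P9999", "x")}
# The P9999 triple lies outside the gold predicates, so it is ignored.
scores = local_prf(pred, gold)
```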
Go to https://www.codabench.org/competitions/12936/ and enroll to
participate in this Task.
# Sub-Task 3: Detecting Inconsistencies Between Texts, Tables and
Knowledge Graphs
The goal of this task is to check the consistency of knowledge extracted
from tables and texts with existing triples in the Wikidata knowledge
graph. Different kinds of inconsistencies will be considered in this
task. Participants in this task will be able to report on their findings
in their system paper.
See the Figure at
https://ecladatta.github.io/images/triplet_annotation_tool.png
# Data & Evaluation:
For the first two sub-tasks, we have released a training dataset with
ground-truth annotations, enabling participant teams to develop machine
learning-based systems, in particular for training, hyperparameter
optimization and internal validation.
A separate blind test dataset will remain private and be used for
ranking the submissions.
Participants should register on Codabench and then enroll for each
sub-task separately (Task 1:
https://www.codabench.org/competitions/12776/ and Task 2:
https://www.codabench.org/competitions/12936/). Each team is allowed a
limited number of daily submissions, and the highest achieved accuracy
will be reported as the team's final result. We encourage participants
to develop open-source solutions, to utilise and fine-tune pre-trained
language models, and to experiment with LLMs of various sizes in
zero-shot or few-shot settings.
# Challenge Important Dates:
- Release of training set: 13 February 2026
- Deadline for registering to the challenge: 15 March 2026
- Release of test set: 24 March 2026
- Submission of results: 10 April 2026
- System Results & Notification of Acceptance: 17 April 2026
- Submission of System Papers: 28 April 2026
- Presentations @ TRIPLET Workshop: May 2026
Workshop Organizers
- Raphael Troncy (EURECOM, France)
- Yoan Chabot (Orange, France)
- Véronique Moriceau (IRIT, France)
- Nathalie Aussenac-Gilles (IRIT, France)
- Mouna Kamel (IRIT, France)
Contact:
For discussions, please use our Google Group,
https://groups.google.com/g/triplet-challenge
The workshop is supported by the ECLADATTA project funded by the French
National Funding Agency ANR under the grant ANR-22-CE23-0020.
--
Raphaël Troncy
EURECOM, Campus SophiaTech
Data Science Department
450 route des Chappes, 06410 Biot, France.
e-mail: raphael.troncy(a)eurecom.fr & raphael.troncy(a)gmail.com
Tel: +33 (0)4 - 9300 8242
Fax: +33 (0)4 - 9000 8200
Web: http://www.eurecom.fr/~troncy/
Call for papers - Workshop on New Trends in Automatized Language Assessment (TALA)
The Workshop on New Trends in Automatized Language Assessment (TALA) will take place on 7 April 2026 in Louvain-la-Neuve, Belgium, and online (hybrid event). The meeting aims to provide an overview of recent approaches to automated language assessment and to offer researchers, academics and (PhD) students an excellent opportunity to share and discuss recent trends and cutting-edge methods in language assessment-related research.
In particular, the workshop will focus on proficiency assessment by mainly targeting automatic readability assessment (ARA) and automated essay scoring (AES).
Automatic readability assessment constitutes an interdisciplinary field of research concerned with the linguistic, cognitive, and typographic factors that influence the ease with which a text can be read and understood by different audiences. It is gaining increasing importance across a wide range of domains, including education, institutional communication, digital accessibility, and automated assessment of language proficiency. It has been an active field within natural language processing since the beginning of the 21st century.
Automated essay scoring aims to analyze written productions in order to generate an evaluation of writers’ competence in a specific field. For language-oriented AES, it is the written linguistic skills that are targeted. This task is particularly critical in language assessment contexts, but it can also support learning processes and the generation of formative feedback.
The workshop will include an invited talk and a number of presentations selected on the basis of submitted abstracts.
Invited speaker:
Rodrigo Wilkens (University of Exeter) is a specialist in computational readability modeling and automated essay scoring. His research focuses on multilingual proficiency assessment, linguistic feature modeling, and the use of large language models for educational applications. He has contributed to the development and evaluation of ARA and AES systems, with particular emphasis on non-English languages. His recent work explores the representational capacity of transformer models for proficiency prediction, interpretability in automated assessment, and readability-guided text generation.
We welcome abstracts addressing literature review, research results, ongoing research or negative results on the topics related to the main theme, with particular interest in the following subfields:
· AI and LLM-based approaches to automated language assessment, especially AES and ARA
· Computational and linguistic modeling of readability and writing proficiency
· Evaluation methodologies, validation frameworks, and interpretability in automated assessment
· Multilingual and non-English language assessment
· Corpus creation, annotation schemes, and new benchmark tasks
· Fairness, bias, and ethical considerations in automated assessment
· Theoretical perspectives linking linguistic features and proficiency modeling
· Critical reviews and meta-analyses in ARA/AES research
Submission format: Abstracts may be submitted in French or English and should be between 300 and 500 words. Authors are encouraged to include a short list of relevant references, which will not count toward the word limit. Abstracts should be anonymized for review. Author names and affiliations must be provided separately in the submission form. Submissions must be made via the online form available at: https://forms.gle/w1L6JNx8YAEgtswB7.
Please note that no proceedings will be published, as this workshop aims above all to foster scientific exchange.
Important dates:
Abstract deadline: 20 March
Acceptance notification: 1st of April
Workshop: 7 April
Organizing Committee:
Prof. Thomas François, Prof. Rodrigo Wilkens, Dr. Eleonora Guzzi, Lingyun Gao, Amandine Pay, Elodie Vanzeveren, Romane Werner.
(Apologies for cross-postings)
*** The GUM Corpus - Release 12.0.0 ***
*** Georgetown University Multilayer corpus ***
The Corpling Lab <https://gucorpling.org/corpling/> at Georgetown University is happy to announce the first release of series 12 of the Georgetown University Multilayer corpus (GUM V12.0.0):
https://gucorpling.org/gum/
New in this version:
* New documents – the corpus now contains 291,056 tokens
* Completely reworked GUMBridge annotation scheme for bridging anaphora (work led by Lauren Levine):
  * Manual re-annotation of the entire corpus
  * Much more densely and consistently annotated using new guidelines
  * 11 subtypes of bridging anaphora
  * Multiple concurrent bridging subtypes are now possible
GUM is an open source corpus of richly annotated English texts from 24 genres:
* Main genres (available in train/dev/test):
* academic writing
* biographies
* courtroom transcripts
* essays
* fiction
* how-to guides
* interviews
* letters
* news
* online forum discussions
* podcasts
* political speeches
* spontaneous face-to-face conversations
* textbooks
* travel guides
* vlogs
* Out-of-domain test genres (test2, aka the GENTLE partition):
* dictionary entries
* live esports commentary
* legal documents
* medical notes
* poetry
* mathematical proofs
* course syllabuses
* threat letters
The corpus is created by students as part of the Computational Linguistics curriculum at Georgetown University and is available under Creative Commons licenses.
This is the first version of GUM series 12, containing 301 documents annotated for:
* Multiple POS tags (100% manual gold PTB, extended PTB, converted CLAWS5 and UPOS) and UD morphological features
* Manually corrected lemmatization and morphological segmentation
* Sentence segmentation and rough speech act (manual)
* Document structure using TEI tags (paragraphs, headings, figures, captions etc., all manual)
* Constituent and dependency syntax (manually corrected Universal Dependencies, and PTB parses from gold tags with function labels and enhanced dependencies)
* Construction Grammar annotations following UCxn
* Information status (given-active/inactive, accessible-inferable/common ground/aggregate, and new)
* Entity type, graded salience (0-5) and coreference annotation (including non-named entities, singletons, appositions, cataphora and discourse deixis), as well as Centering Theory annotations
* Bridging anaphora classified into 11 subtypes (multiple concurrent types are possible)
* Entity linking (Wikification) of all named entities with Wikipedia articles and Wikidata, including their non-named and pronominal mentions
* Discourse parses in enhanced Rhetorical Structure Theory (eRST) and discourse dependencies, including multiple concurrent and non-projective relations
* Discourse signal annotations classified into 9 major and 46 minor types indicating how the presence of a relation is marked (based on the Signaling Corpus scheme)
* Shallow discourse relations following the PDTB v3 scheme
* Five abstractive summaries for each document following strict, comparable guidelines across genres
Note on Reddit data: token text is not contained in the release but can be downloaded with an included script.
For more information and to search or download the corpus online, see the corpus website <https://gucorpling.org/gum/> .
Best wishes,
The GUM team
PS – if you like GUM, also check out our automatically annotated AMALGUM <https://github.com/gucorpling/amalgum/> corpus!
Call for Papers: ArgMining 2026 – Workshop on Argument Mining
The Workshop on Argument Mining (ArgMining) provides a regular forum for presenting and discussing cutting-edge research in argument mining (a.k.a. argumentation mining) for academic and industry researchers. Continuing a series of twelve successful previous workshops, the 2026 edition welcomes submissions of long papers, short papers, extended abstracts, and PhD proposals.
Workshop Theme
The 2026 edition of ArgMining places a special focus on understanding and evaluating arguments in both human and machine reasoning. With this theme, we broaden the workshop’s scope to include reasoning—a long-standing area of AI research that has recently gained renewed interest within the ACL community, driven by the latest generation of large language models (LLMs).
Reasoning is tightly connected to argumentation, as it represents, analyzes, and evaluates the process of reaching conclusions based on available information. Viewing argumentation as a paradigm for capturing reasoning enables the evaluation of machines (particularly LLMs) based on their ability to address argument mining tasks.
Topics of Interest
Topics include, but are not limited to:
* Automatic extraction of textual patterns describing argument components in human and machine argumentation
* Cross-lingual, cross-cultural, and multi-perspective argument mining and reasoning
* Argument mining and generation from multimodal and/or multilingual data
* Explainability in argument mining through reasoning
* Modeling, assessing, and critically reflecting on the argumentation capabilities of LLMs
* Novel benchmarks in argument mining addressing recent developments in LLM reasoning
* Guidelines for assessing and documenting reasoning processes reflected in benchmarks
* Annotation guidelines, linguistic analysis, and argumentation corpora
* Real-world applications (e.g., social sciences, education, law, scientific writing; misinformation detection)
* Integration of commonsense and domain knowledge into argumentation models
* Combining information retrieval with argument mining (e.g., argumentative search engines)
* Ethical aspects and societal impact of argument mining and LLM reasoning
Submissions from all application areas are welcome.
Submission Types
The workshop accepts the following submission types:
* Long Papers (archival)
* Short Papers (archival)
* Extended Abstracts (non-archival)
* PhD Proposals (non-archival)
Accepted contributions will be presented as oral or poster presentations.
Archival Submissions
* Long papers:
* Substantial, original, completed, and unpublished work
* Up to 8 pages (excluding references)
* Unlimited references
* Up to 2 appendix pages
* 1 additional page in the final version for reviewer comments
* Short papers:
* Original, unpublished work with a focused contribution
* Not shortened versions of long papers
* Up to 4 pages (excluding references)
* Unlimited references
* Up to 1 appendix page
* 1 additional page in the final version for reviewer comments
Non-Archival Submissions
* Extended abstracts:
* Up to 2 pages including references
* 1 additional appendix page for tables/figures
* Selection based on workshop fit and the special theme
* Priority given to abstracts whose first authors are doctoral students unable to present at *CL conferences due to visa restrictions
* PhD proposals:
* Up to 4 pages
* Description of PhD project, research challenges, contributions, and future directions
* Presented in a dedicated poster session for feedback and discussion
Multiple Submissions Policy
ArgMining 2026 will not consider papers simultaneously under review elsewhere. Submissions overlapping significantly (>25%) with active ARR submissions will not be accepted. ARR-reviewed papers are allowed if reviews and meta-reviews are available by the ARR commitment deadline.
Submission Format
* Two-column ACL 2026 format
* LaTeX or Microsoft Word templates
* PDF submissions only
* Submissions via OpenReview
Important Dates
* Direct paper submission deadline (archival): March 5, 2026
* ARR commitment deadline (archival): March 24, 2026
* Direct paper submission deadline (non-archival): April 7, 2026
* Notification of acceptance: April 28, 2026
* Camera-ready deadline: May 12, 2026
* Workshop dates: July 2–3, 2026
Review Policy
Long and short papers will follow ACL double-blind review policies. Submissions must be anonymized, including self-references and links. Papers violating anonymity requirements will be rejected without review. Demo descriptions are exempt from anonymization.
Best Paper Award
ArgMining 2026 will present a Best Paper Award to recognize significant contributions to argument mining research. All accepted papers are eligible.
Contact and Information
Website: https://argminingorg.github.io/2026/
Email: argmining.org [at] gmail.com
Workshop Organizers
Mohamed Elaraby (University of Pittsburgh)
Annette Hautli-Janisz (University of Passau)
John Lawrence (University of Dundee)
Elena Musi (University of Liverpool)
Julia Romberg (GESIS Leibniz Institute for the Social Sciences)
Federico Ruggeri (University of Bologna)
International Conference 'New Trends in Translation and Interpreting
Technology' (NeTTIT'2026)
Dubrovnik, Croatia, 24-27 June 2026
Fifth Call for Papers
# The conference
The third edition of the International Conference 'New Trends in
Translation and Interpreting Technology' (NeTTIT'2026) will take place
in Dubrovnik, Croatia from 24 to 27 June 2026.
The objective of the conference is (i) to bridge the gap between
academia and industry in the field of translation and interpreting by
bringing together academics in linguistics, translation and interpreting
studies, machine translation and natural language processing,
developers, practitioners, language service providers and vendors who
work on or are interested in different aspects of technology for
translation and interpreting, and (ii) to be a distinctive event for
discussing the latest developments and practices. NeTTIT'2026 invites
all professionals who would like to learn about the new trends, present
their latest work and/or share their experience in the field, and who
would like to establish business and research contacts, collaborations
and new ventures.
The conference will include plenary presentations (research and user
presentations, keynote speeches), poster sessions and panel discussions.
All submitted papers will be peer-reviewed by experts, and the accepted
papers will be published as open-access conference e-proceedings, which
will be available at the time of the conference.
# Conference topics
Contributions are invited on any topic related to the latest technology
and practices in translation, subtitling, localisation, interpreting,
machine translation and Large Language Models used in translation and
interpreting.
NeTTIT'2026 will feature a Special Theme Track "Future of Translation
and Interpreting Technologies in the Era of LLMs and Generative AI".
The conference topics include but are not limited to (see also the
special conference theme below):
## CAT tools
- Translation Memory (TM) systems
- NLP and MT for translation memory systems
- Terminology extraction tools
- Localisation tools
## Machine Translation
- Latest developments in Neural Machine Translation
- MT for under-resourced languages
- MT with low computing resources
- Multimodal MT
- Integration of MT in TM systems
- Resources for MT
## Technologies for MT deployment
- MT evaluation techniques, metrics and evaluation results
- Human evaluations of MT output
- Evaluating MT in a real-world setting
- Quality estimation for MT
- Domain adaptation
## Translation Studies
- Corpus-based studies applied to translation
- Corpora and resources for translation
- Translationese
- Cognitive effort and eye-tracking experiments in translation
## Interpreting studies
- Corpus-based studies applied to interpreting
- Corpora and resources for interpreting
- Interpretese
- Resources for interpreting and interpreting technology applications
- Cognitive effort and eye-tracking experiments in interpreting
## Interpreting technology
- Machine interpreting
- Computer-aided interpreting
- NLP for dialogue interpreting
- Development of NLP based applications for communication in public
service settings (healthcare, education, law, emergency services)
## Emerging Areas in Translation and Interpreting
- MT and translation tools for literary texts and creative texts
- MT for social media and real-time conversations
- Sign language recognition and translation
## Subtitling
- NLP and MT for subtitling
- Latest technology for subtitling
## User needs
- Analysis of translators' and interpreters' needs in terms of
translation and interpreting technology
- User requirements for interpreting and translation tools
- Incorporating human knowledge into translation and interpreting
technology
- What existing translators' (including subtitlers') and interpreters'
tools do not offer
- User requirements for electronic resources for translators and
interpreters
- Translation and interpreting workflows in larger organisations and the
tools for translation and interpreting employed
## The business of translation and interpreting
- Translation workflow and management
- Technology adoption by translators and industry
- Setting up a translation / interpreting / language provider company
## Teaching translation and interpreting
- Teaching Machine Translation
- Teaching translation technology
- Teaching interpreting technology
- The latest AI developments in translation and interpreting curricula
## Ethical issues in translation and technology
- Bias and fairness in MT
- Privacy and security in cloud MT systems
- Transparency and explainability of MT systems
- Environmental impact of MT systems
# Special Theme Track - Future of Translation and Interpreting
Technologies in the Era of LLMs and Generative AI
We are excited to share that NeTTIT'2026 will have a special theme with
the goal of stimulating discussion around Large Language Models,
Generative AI and the Future of Translation and Interpreting
Technologies. While the new generation of Large Language Models such as
ChatGPT, Gemini, Claude, DeepSeek and LLaMA showcase remarkable
advancements in language generation and understanding, we find ourselves
in uncharted territory when it comes to their performance on various
Translation and Interpreting Technology tasks with regard to fairness,
interpretability, ethics and transparency.
The theme track invites studies on how LLMs perform on Translation and
Interpreting Technology tasks and applications, and what this means for
the future of the field. The possible topics of discussion include (but
are not limited to) the following:
- Changes in (and the impact on) the translator and interpreter
professions in the new AI era, especially as a result of the latest
developments in LLMs and Generative AI
- Generative AI and translation
- Generative AI and interpreting
- Augmenting machine translation systems with generative AI
- Domain and terminology adaptation with Large Language Models
- Literary translation with Large Language Models
- Translation for low-resourced and minority languages with LLMs
- Improving Machine Translation Quality with Contextual Prompts in Large
Language Models
- Prompt engineering for translation
- Generative AI for professional translation
- Generative AI for professional interpreting
# Invited speakers
Yves Champollion, Wordfast LLC
Marko Grobelnik, Jožef Stefan Institute
# Submissions and publication
NeTTIT'2026 invites the following types of submissions in English:
## Academic papers
- Regular long papers: These can be up to eight (8) pages long,
presenting substantial, original, completed, and unpublished work.
- Short papers: These can be up to four (4) pages long and are suitable
for describing small, focused contributions, work-in-progress, negative
results, system demonstrations, etc.
## User papers
For industry and practitioners. References to related work are optional.
Allowed paper length: between 2 and 4 pages.
Papers should be submitted through Softconf/START using the following
link: https://softconf.com/p/nettit2026/user/
Authors should submit papers complying with the ACL format, using the
templates available on the conference website. Abstract-only submissions
will not be considered or evaluated.
Further details on the submission procedure are available on the
conference website:
https://nettt-conference.com/2026/submissions-and-publication/
The accepted papers will be published in the conference e-proceedings
with assigned ISBN and DOI and made available online on the conference
website at the time of the conference. The conference organisers will
seek the inclusion of the conference proceedings in the ACL anthology.
# Important dates
- Submissions due: 23 March 2026
- Reviewing process: 25 March-25 April 2026
- Notification of acceptance: 28 April 2026
- Camera-ready due: 25 May 2026
- Conference camera-ready proceedings ready: 15 June 2026
- Conference: 24-27 June 2026
# Pre-conference Tutorials
The pre-conference tutorials will include:
- Post-editing and AI-augmented translation - Marie Escribe (LanguageWire and Polytechnic University of Valencia)
- Machine Translation Quality Evaluation - Tharindu Ranasinghe (Lancaster University)
- Automatic Speech Recognition as a supporting tool for interpreters - Constantin Orasan (University of Surrey)
# Conference Chairs
- Gloria Corpas Pastor (University of Malaga)
- Ruslan Mitkov (Lancaster University and University of Alicante)
- Marko Tadic (University of Zagreb)
# Programme Committee Chairs
- Constantin Orasan (University of Surrey)
- Tharindu Ranasinghe (Lancaster University)
# Publication Chairs
- Marie Escribe (LanguageWire and Polytechnic University of Valencia)
- Alicia Picazo Izquierdo (University of Alicante)
# Organising Committee and Programme Committee coordination
- Marie Escribe (LanguageWire and Polytechnic University of Valencia)
- Alicia Picazo Izquierdo (University of Alicante)
- Xiaojing Zhao (Hong Kong Polytechnic University)
# Publicity and Sponsorship Chair
- Vilelmini Sosoni (Ionian University)
# Programme committee
For a list of the programme committee members visit:
https://nettt-conference.com/2026/programme-committee/
# Venue
The conference will take place at the Centre for Advanced Academic
Studies (CAAS) of the University of Zagreb (http://www.caas.unizg.hr/)
in Dubrovnik.
# Sponsor
Juremy.com
# Sponsorship opportunities
Companies working in the fields of translation technology, interpreting
technology and/or related fields, are welcome to familiarise themselves
with the sponsorship opportunities that the conference offers. Please
visit https://nettt-conference.com/2026/sponsors/ for more details.
# Further information and contact details
The conference website https://nettt-conference.com/ is updated on a
regular basis. For further information, please email
nettit2026@nettt-conference.com.
You can also follow us on social media for updates and announcements.
LinkedIn - https://www.linkedin.com/company/nettit2026/
Twitter/X - https://x.com/NeTTIT2026
--
Amal Haddad Haddad (She/her)
Facultad de Traducción e Interpretación
Universidad de Granada |https://www.ugr.es/personal/amal-haddad-haddad
Lexicon Research Group |http://lexicon.ugr.es/haddad
Co-Convenor, BAAL SIG 'Humans, Machines,
Language'|https://r.jyu.fi/humala
Event Coordinator, BAAL SIG 'Language, Learning and Teaching'
===============
Dear colleagues,
I would like to draw your attention to my recent book, Understanding Conversational AI: Philosophy, Ethics, and Social Impact of Large Language Models (Ubiquity Press, 270 pages).
The book is fully open access and can be downloaded either as a complete volume or chapter by chapter. A paperback edition is also available on demand.
It offers an interdisciplinary examination of large language models, focusing on how they reshape our understanding of language, cognition, and social practices. Engaging with philosophy of language, linguistics, cognitive science, and AI ethics, it analyzes how these systems generate meaning, simulate reasoning, and increasingly participate in activities such as translation, writing, and evaluation. It also addresses broader epistemic and political issues, including bias, misinformation, automation, and the transformation of professional and educational contexts.
The book is available here:
https://www.ubiquitypress.com/books/m/10.5334/bde
Please feel free to read and share it.
Best regards,
Thierry Poibeau
*** Second Call for Papers (Industry Track) ***
37th IEEE International Symposium on Software Reliability Engineering
(ISSRE 2026)
October 20-23, 2026, 5* St. Raphael Resort and Marina
Limassol, Cyprus
https://cyprusconferences.org/issre2026/
The ISSRE Industry Track gathers industry representatives, as well as researchers
from, within, or in collaboration with industry, to discuss software reliability
and quality assurance, along with experiences and lessons learned. This year we
will highlight experiences with self-built tools and with the use of AI,
generative AI, and machine learning in relation to software reliability.
Industry track papers are expected to be of interest to software development
professionals, as well as to anyone researching or working in software
reliability, software quality, and process improvement, with concrete relevance
to industrial problems and practical applications.
All presenters of accepted papers will be required to attend the conference in person.
Participating in the conference offers the chance to meet and exchange ideas
with a wide range of researchers and industry experts in the area.
Topics of Interest
Topics of interest include development, analysis methods and models throughout the
software development lifecycle, from an industrial and practitioner-oriented perspective.
Ask yourself this: is the work grounded in real-world systems, operational
experience, or industrial practice, and does it address reliability or
dependability concerns? If so, you have found the right conference track. For
more detail, see the topics list for the research track on the conference website.
• Use cases, practical experiences, lessons learned, improvement programs in reliability
or dependability.
• Foundations of reliability and dependability, including process, technology, methods,
metrics and lessons learned.
• Design for reliability or dependability, failure and incident case studies, including
experiences in security, testing, verification, and related practices in the field.
• Reliability in AI-driven and autonomic systems or AI techniques used for Reliability
Engineering.
• Software reliability in any system domain.
• Trustworthiness, security, and Responsible Software Engineering.
• Human-centric focus on reliability and dependability.
• Adoption of reliability standards, measurements and similar experiences.
We look for papers with good evaluation, honest data, new insights and practical
experiences that can be used to help others. We also encourage submissions reporting
negative results, unexpected outcomes, and lessons learned from real-world practice.
Submission Guidelines and Instructions
We invite three kinds of submissions to the Industry Track:
• Enlightening Talk or Tool Demo: 1-2 page abstract (or a PowerPoint
presentation, or a video for a tool demo).
• Short paper: 4 pages (including references).
• Full paper: 6 pages (including references).
All the submissions will be reviewed by members of the Industry Track Program
Committee. Accepted papers (with an abstract) will be included in the ISSRE Supplemental
Proceedings and submitted for publication to IEEE Xplore.
Papers are submitted via EasyChair: https://easychair.org/conferences?conf=issre2026 .
Submissions must adhere to the IEEE Computer Society Format Guidelines (for more
Information, please refer to the relevant part on the conference website:
https://cyprusconferences.org/issre2026/industry-track/).
Note that:
• A paper must include the title, the name and affiliation of each author, an abstract of up
to 150 words, and up to 4 keywords. Thus, submissions are not anonymous.
• Reviewers will use the abstract during the bidding process for peer-review. Thus, the
abstract should state the paper goals clearly, along with the means used to achieve them.
• The first page is not a separate page, but is a part of the paper (i.e., it has technical
material in it). Thus, this page counts toward the total page budget for the paper.
• Symbols and labels used in the graphs should be readable as printed, without requiring
on-screen magnification.
• Limit the file size to less than 15 MB (for videos, provide a live link).
Papers that exceed the specified page limits, fall outside the scope of ISSRE,
or do not follow the formatting guidelines will be rejected without review.
Authors of accepted papers will have the chance to present their work at ISSRE 2026.
Submission implies the willingness of at least one of the authors to register for the
conference and to give the talk, if the paper is accepted.
Best Paper Awards
The Industry Program Chair will select three candidates among top-ranked papers
presenting and motivating novel and disruptive ideas that address problems relevant for
industry. Selection will be based on the reviewers’ feedback, novelty and potential impact
of the results.
The final selection of the best paper will be done by the audience attending the
presentation of the candidate papers. Eligible papers must be (1) full papers accepted to
the industry track, and (2) co-authored by at least one author whose primary affiliation is
in Industry.
Important Dates (AoE)
• Abstract Submission Deadline: June 28, 2026 & July 3, 2026
• Paper Submission Deadline: July 5, 2026 & July 12, 2026
• Notification to Authors: August 12, 2026
• Camera Ready Papers: August 19, 2026
• Enlightening Talks or Tool Demos (without abstract; not to appear in the proceedings): August 15, 2026
• Author Registration Deadline (Industry Track): August 19, 2026
Organisation
General Chairs
• Leonardo Mariani, University of Milano - Bicocca, Italy
• George A. Papadopoulos, University of Cyprus, Cyprus
Program Coordinator
• Roberto Natella, GSSI, Italy
Research Program Committee Chairs
• Domenico Cotroneo, UNC Charlotte, USA
• Jie M. Zhang, King's College London, UK
Industry Program Chairs
• Jinyang Liu, Bytedance, USA
• Sigrid Eldh, Ericsson AB, Sweden
Workshop Chairs
• Georgia Kapitsaki, University of Cyprus, Cyprus
• August Shi, The University of Texas at Austin, USA
Doctoral Symposium Chairs
• Stefan Winter, LMU Munich, Germany
• Lili Wei, McGill University, Canada
Fast Abstract Chairs
• Luigi Lavazza, University of Insubria, Italy
• Yintong Huo, SMU, Singapore
JIC2 Chair
• Helene Waeselynck, LAAS-CNRS, France
Publicity Chairs
• Allison K. Sullivan, The University of Texas at Arlington, USA
• Jose D'Abruzzo Pereira, University of Coimbra, Portugal
Publication Chairs
• Sherlock Licorish, Otago Business School, New Zealand
• Maria Teresa Rossi, GSSI, Italy
Artifact Evaluation Chairs
• Naghmeh Ivaki, University of Coimbra, Portugal
• Fumio Machida, University of Tsukuba, Japan
Diversity and Inclusion Chair
• Eleni Constantinou, University of Cyprus, Cyprus
Financial Chair
• Costas Pattichis, University of Cyprus, Cyprus
Web Chairs
• Michalis Ioannides, Easy Conferences LTD
• Elena Masserini, University of Milano - Bicocca, Italy
Registration Chair
• Easy Conferences LTD
(apologies for cross-postings)
====
HIPE-OCRepair 2026 - Historical OCR Post-Correction Shared Task
Website: https://hipe-eval.github.io/HIPE-OCRepair-2026/
Task: LLM-Assisted OCR Post-Correction for Multilingual Historical Documents
Venue: ICDAR 2026<https://icdar2026.org/> (31 Aug - 4 Sep 2026)
====
Data: https://github.com/hipe-eval/HIPE-OCRepair-2026-data
How-to: Participation Guidelines<https://github.com/hipe-eval/HIPE-OCRepair-2026-data/blob/main/README-Parti…>
Scorer: https://github.com/hipe-eval/HIPE-OCRepair-scorer/
====
We invite participation in HIPE-OCRepair 2026, the ICDAR 2026 Competition on LLM-Assisted OCR Post-Correction for Historical Documents.
Large-scale digitized historical collections still contain substantial OCR errors. Re-processing millions of pages with improved engines is rarely feasible, making post-correction the most viable strategy for addressing the OCR debt accumulated in digital heritage collections. Recent progress in large language models opens promising new directions, but their effectiveness varies across languages and error types, and they may introduce hallucinations.
To what extent can large language models address the OCR debt accumulated in large-scale digitized historical collections?
HIPE-OCRepair 2026 addresses this question through HIPE-OCRepair-Bench, a unified multilingual benchmark comprising curated datasets, a standardised evaluation protocol, baseline systems, and an open leaderboard.
Task
Participants correct noisy OCR transcripts of historical documents without access to the original images. For each text chunk (typically a paragraph or article), the dataset provides:
* one OCR hypothesis
* document metadata (language, date, publication title)
* OCR quality indicators (CER, WER, lexicon-based quality score)
Systems must produce improved corrected text. Both generative (LLM-based) and discriminative or hybrid approaches are welcome.
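The official scorer is linked above; purely as an illustration of the CER quality indicator mentioned in the dataset description, the sketch below computes a character error rate as Levenshtein edit distance normalised by reference length. This is my own minimal sketch with invented function names, not the official HIPE-OCRepair scorer.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over characters
    # (insertions, deletions, substitutions), using two rolling rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edit distance normalised by reference length.
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# An OCR misreading of "ri" as "n" yields 2 edits over 10 reference characters.
print(cer("historical", "histoncal"))  # → 0.2
```

A post-correction system succeeds to the extent that the CER of its corrected output against the ground truth is lower than the CER of the raw OCR hypothesis.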
Data
The benchmark consists of parallel OCR and ground truth data drawn from multiple curated historical collections, covering English, French, and German materials from the 17th to the 20th century, including newspapers and printed works. It consolidates existing resources alongside newly curated materials.
Important dates
* 10 Dec 2025: Sample data release
* 02 Mar 2026: Training and development data release; scorer
* 23 Mar 2026: Hugging Face leaderboard release
* 06-08 Apr 2026: Evaluation phase (test release and submission)
* 10 Apr 2026: Results publication
* 31 Aug-4 Sep 2026: Presentation at ICDAR 2026
HIPE-OCRepair addresses a central challenge for the document analysis, NLP, and digital humanities communities: improving the usability of large historical text collections at scale. It offers a reproducible evaluation framework, openly available data and tools, and a leaderboard for benchmarking beyond the competition itself.
We look forward to your participation!
Best regards,
HIPE-OCRepair 2026 Organizers
https://hipe-eval.github.io/HIPE-OCRepair-2026/