Starting from May 2026, the Data & Knowledge Engineering group at the
Computer Science department at Heinrich-Heine-University Düsseldorf
(https://www.cs.hhu.de/lehrstuehle-und-arbeitsgruppen/data-knowledge-enginee…),
affiliated with the Heine Center for Artificial Intelligence and Data
Science (HeiCAD) (https://www.heicad.hhu.de/), is looking for a
*PhD student – Information Extraction & Natural Language Processing*
(Salary group 13 TV-L, working time 100%, initially limited to 36 months
with the possibility of further extension)
In the context of the research project "WIEGE", we will investigate the
spread of claims and political narratives across different social media
platforms in an interdisciplinary consortium involving researchers from
Communication Science, Computational Social Science and Computer
Science. Our research will be concerned with novel Natural Language
Processing (NLP) methods for the detection, linking and classification
of claims and narratives in online discourse data. Challenges arise from
the heterogeneous nature of the different data sources; the development
of generalizable approaches will therefore be a key focus. The PhD
student will work in close collaboration with our project partners from
Communication Science, benefitting from their tailored expert
annotations and, vice versa, aiding their annotation efforts by
providing semi-automatic labeling approaches. Another task will entail
the modeling and publishing of generated data according to Semantic Web
principles.
Your tasks will be:
*******************
* Research in fields such as NLP, Machine Learning, Language Modeling
and Representation Learning, with the aim of extracting structured
information from online discourse data
* Development of NLP methods for (i) the detection and classification
of claims and narratives on social media, and (ii) the linking of
related claims and narratives within and across data sources; further,
(iii) assisting semi-automated data annotation and (iv) modeling and
publishing data according to Semantic Web standards
* Writing, publishing and presenting project results
* Collaboration with team members and project partners in an
interdisciplinary consortium
Your profile:
**************
* University degree (diploma/MSc) in Computer Science, Computational
Linguistics or related fields
* Research interests in NLP, machine learning, data mining, large
language models, Semantic Web
* Hands-on experience with Python, including knowledge of ML frameworks
such as TensorFlow or PyTorch
* Ability to communicate fluently in English (mandatory), good knowledge
of the German language (desirable)
What we offer:
***************
* Flexible working hours and home office arrangements
* A fast-growing and international working environment with a lot of
creative scientific freedom
* Access to unique research data, (social) web archives and behavioral data
* Support of collaborations with international research labs and experts
The PhD research will be supervised by Prof. Dr. Stefan Dietze
(Professor for Data & Knowledge Engineering at HHU and Scientific
Director of KTS at GESIS - Leibniz Institute for the Social Sciences,
Cologne) and mentored by Dr. Katarina Boland (Postdoctoral Researcher
at Data & Knowledge Engineering and HeiCAD).
For further information please contact Stefan Dietze
(stefan.dietze(a)hhu.de) and/or Katarina Boland (katarina.boland(a)hhu.de).
Interested?
*************
Please apply by sending your complete application documents as a single
PDF file to katarina.boland(a)hhu.de by 01 April 2026.
--
Prof. Dr. Stefan Dietze
Scientific Director Knowledge Technologies for the Social Sciences
GESIS - Leibniz Institute for the Social Sciences
Web: https://www.gesis.org/en/kts
Chair of Data & Knowledge Engineering
Heinrich-Heine-University Düsseldorf
Web: https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering
Phone: +49 (0)221-47694-421
Web: http://stefandietze.net
Introduction
We invite proposals for tasks to be run as part of SemEval-2027. SemEval (the International Workshop on Semantic Evaluation) is an ongoing series of evaluations of computational semantics systems, organized under the umbrella of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics.
SemEval tasks investigate the nature of meaning in natural languages, exploring how to characterize and compute meaning. This is achieved in practical terms, using shared datasets and standardized evaluation metrics to quantify the strengths and weaknesses of possible solutions. SemEval tasks encompass a broad range of semantic topics from the lexical level to the discourse level, including word sense identification, semantic parsing, coreference resolution, and sentiment analysis, among others.
For SemEval-2027, we welcome tasks that can test an automatic system for semantic analysis of text (e.g., intrinsic semantic evaluation, or an application-oriented evaluation). We especially encourage tasks for languages other than English, cross-lingual tasks, and tasks that develop novel applications of computational semantics. See the websites of previous editions of SemEval to get an idea about the range of tasks explored, e.g., SemEval-2020 (http://alt.qcri.org/semeval2020/) and SemEval-2021 through SemEval-2026 (https://semeval.github.io/).
We strongly encourage proposals based on pilot studies that have already generated initial data, evaluation measures, and baselines. In this way, we can avoid unforeseen challenges down the road that may delay the task. We suggest providing a reasonable baseline (e.g., providing a Transformer / LLM baseline for a classification task) apart from the majority vote / random guess.
If you are not sure whether a task is suitable for SemEval, please feel free to get in touch with the SemEval organizers at semevalorganizers(a)gmail.com to discuss your idea.
The submission webpage is: https://softconf.com/acl2026/semevaltasks2027/
Task Selection
Task proposals will be reviewed by experts, and reviews will serve as the basis for acceptance decisions. Everything else being equal, more innovative new tasks will be given preference over task reruns. Task proposals will be evaluated on:
Novelty: Is the task on a compelling new problem that has not been explored much in the community? Is the task a rerun, but covering substantially new ground (new subtasks, new types of data, new languages, etc. - one addition is not sufficient)?
Interest: Is the proposed task likely to attract a sufficient number of participants?
Data: Are the plans for collecting data convincing? Will the resulting data be of high quality? Will annotations have meaningfully high inter-annotator agreements? Have all appropriate licenses for use and re-use of the data after the evaluation been secured? Have all international privacy concerns been addressed? Will the data annotation be ready on time?
Evaluation: Is the methodology for evaluation sound? Is the necessary infrastructure available, or can it be built in time for the shared task? Will research inspired by this task be able to evaluate in the same manner and on the same data after the initial task? Is the task significantly challenging (e.g., room for improvement over the baselines)?
Impact: What is the expected impact of the data in this task on future research beyond the SemEval Workshop?
Ethics: Does the data comply with privacy policies, e.g., avoiding personally identifiable information (PII)? Tasks aimed at identifying specific people will not be accepted. Avoid medical decision making (e.g., compliance with HIPAA; do not attempt to replace medical professionals, especially in anything related to mental health). These examples are representative, not exhaustive.
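The inter-annotator agreement asked about under "Data" is commonly checked with a chance-corrected measure such as Cohen's kappa. The sketch below is illustrative only (the label values are invented); it computes kappa for two raters who labeled the same items:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b) and rater_a, "need paired annotations"
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's own label distribution.
    dist_a, dist_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (dist_a[label] / n) * (dist_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)

# Illustrative annotations: the raters agree on 3 of 4 items.
print(cohens_kappa(["x", "x", "y", "y"], ["x", "x", "y", "x"]))  # 0.5
```

What counts as "meaningfully high" agreement is task-dependent; proposals should state and justify the threshold they aim for.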
Roles:
Lead Organizer - main point of contact; expected to ensure deliverables are met on time and to contribute to task duties (see below).
Co-Organizers - provide significant contributions to ensuring the task runs smoothly. Examples include maintaining communication with task participants, preparing data, creating and running evaluation scripts, and leading paper reviewing and acceptance decisions.
Advisory Organizers - more of a supervisor role, may not contribute to detailed tasks, but will provide guidance and support.
New Tasks vs. Task Reruns
We welcome both new tasks and task reruns. For a new task, the proposal should address whether the task would be able to attract participants. Preference will be given to novel tasks that have not received much attention yet.
For reruns of previous shared tasks (whether or not the previous task was part of SemEval), the proposal should address the need for another iteration of the task. Valid reasons include: a new form of evaluation (e.g., a new evaluation metric, a new application-oriented scenario), new genres or domains (e.g., social media, domain-specific corpora), or a significant expansion in scale. We further discourage carrying over a previous task and just adding new subtasks, as this can lead to the accumulation of too many subtasks. Evaluating on a different dataset with the same task formulation, or evaluating on the same dataset with a different evaluation metric, typically should not be considered a separate subtask.
Task Organization
We welcome people who have never organized a SemEval task before, as well as those who have. Apart from providing a dataset, task organizers are expected to:
- Verify the data annotations have sufficient inter-annotator agreement.
- Verify licenses for the data allow its use in the competition and afterwards. In particular, text that is publicly available online is not necessarily in the public domain; unless a license has been provided, the author retains all rights associated with their work, including copying, sharing and publishing. For more information, see: https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the data.
- Commit to making the data available after the task in a long-term repository under an appropriate license, preferably using Zenodo: https://zenodo.org/communities/semeval/
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting point (in order to lower the obstacles to participation). A baseline system typically contains code that reads the data, creates a baseline response (e.g., random guessing, majority class prediction), and outputs the evaluation results. Whenever possible, baseline systems should be written in widely used programming languages and/or should be implemented as a component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant information there.
- Create a CodaLab or other similar competition for the task and upload the evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings, and present it at the workshop.
- Manage participants’ submissions of system description papers, manage participants’ peer review of each other’s papers, and possibly shepherd papers that need additional help in improving the writing.
- Review other task description papers.
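The baseline system described above (read the data, produce a trivial prediction, output a score) can be sketched in a few lines. The following is a minimal illustration of a majority-class baseline with plain accuracy; the label names are invented, and a real task would distribute its official scorer alongside:

```python
from collections import Counter

def majority_baseline(train_labels):
    """The single most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]

def predict(majority_label, test_items):
    """Assign the majority label to every test instance."""
    return [majority_label] * len(test_items)

def accuracy(gold, predicted):
    """Simple accuracy; real tasks should use the official evaluation script."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Illustrative run on invented claim-detection labels.
train_labels = ["claim", "no-claim", "claim", "claim"]
gold = ["claim", "no-claim", "claim"]
preds = predict(majority_baseline(train_labels), gold)
print(f"majority-class accuracy: {accuracy(gold, preds):.3f}")
```

A stronger baseline (e.g., a fine-tuned Transformer, as suggested in the call) would replace `predict` with a trained model while keeping the same input and output formats.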
Desk Rejects
- To ensure tasks have sufficient support, we require a minimum of two organizers at the time of proposal submission. A task proposal with only one organizer will be desk-rejected. Running a SemEval task is a significant time commitment; therefore, we highly recommend that a task have at least three to four organizers.
- A person can be a lead organizer on only one task. The second mandatory organizer on the task must be committed to the task as a key co-organizer. Any other organizers (beyond the lead and co-organizer) can participate in other tasks.
- All data should have a research-friendly license. The licensing must be provided in the proposal.
- Task organizers must commit to keeping the data available after the task, either by keeping the task alive or by uploading the data to Zenodo or another permanent public storage location and sharing the link with the organizers.
=== Important dates ===
- Task proposals due 13 April 2026 (Anywhere on Earth)
- Task selection notification 25 May 2026
=== Preliminary timetable ===
- Sample data ready 15 July 2026
- Training data ready 1 September 2026
- Evaluation data ready 1 December 2026 (internal deadline; not for public release)
- Evaluation start 10 January 2027
- Evaluation end by 31 January 2027 (latest date; task organizers may choose an earlier date)
- Paper submission due February 2027
- Notification to authors March 2027
- Camera ready due April 2027
- SemEval workshop Summer 2027 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for having the task and CodaLab website up and dates for uploading sample, training, and evaluation data) may be cancelled at the discretion of SemEval organizers. While consideration will be given to extenuating circumstances, our goal is to provide sufficient time for the participants to develop strong and well-thought-out systems. Cancelled tasks will be encouraged to submit proposals for the subsequent year’s SemEval. To reduce the risk of tasks failing to meet the deadlines, we are unlikely to accept multiple tasks with overlap in the task organizers.
Submission Details
The task proposal should be a self-contained document of no longer than 3 pages (plus additional pages for references). All submissions must be in PDF format, following the ACL template: https://github.com/acl-org/acl-style-files
Each proposal should contain the following:
- Overview
- Summary of the task
- Why this task is needed and which communities would be interested in participating
- Expected impact of the task
- Data & Resources
- How the training/testing data will be produced. Please discuss whether existing corpora will be reused.
- Details of copyright and license, so that the data can be used by the research community both during the SemEval evaluation and afterwards
- How much data will be produced
- How data quality will be ensured and evaluated
- An example of what the data would look like
- Resources required to produce the data and prepare the task for participants (annotation cost, annotation time, computation time, etc.)
- Assessment of any concerns with respect to ethics, privacy, or security (e.g., personally identifiable information of private individuals; potential for systems to cause harm)
- Pilot Task (strongly recommended)
- Details of the pilot task
- What lessons were learned, and how these will impact the task design
- Evaluation
- The evaluation methodology to be used, including clear evaluation criteria
- For Task Reruns
- Justification for why a new iteration of the task is needed (see criteria above)
- What will differ from the previous iteration
- Expected impact of the rerun compared with the previous iteration
- Task organizers
- Names, affiliations, email addresses
- (optional) brief description of relevant experience or expertise
- (if applicable) years and task numbers of any SemEval tasks you have run in the past
Proposals will be reviewed by an independent group of area experts who may not have familiarity with recent SemEval tasks, and therefore, all proposals should be written in a self-explanatory manner and contain sufficient examples.
The submission webpage is: https://softconf.com/acl2026/semevaltasks2027/
=== Chairs ===
Debanjan Ghosh, Analog Devices, USA
Kai North, Cambium Assessment, USA
Ekaterina Kochmar, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE
Mamoru Komachi, Hitotsubashi University, Japan
Marcos Zampieri, George Mason University, USA
Contact: semevalorganizers(a)gmail.com
═══════════════════════════════════════════════════════════════════════
CALL FOR PAPERS
═══════════════════════════════════════════════════════════════════════
A DFG Programme Point Sud Workshop
Digital Humanities and Artificial Intelligence in African Studies:
Towards Sustainable and Equitable Practices
21–24 September 2026 · STIAS, Stellenbosch, South Africa
────────────────────────────────────────────────────────────
ABOUT THE WORKSHOP
The integration of digital humanities (DH) and artificial intelligence
(AI) is transforming the production of knowledge in African Studies,
offering new opportunities for innovative analysis, dynamic
visualisation and cross-cultural research. Yet this shift raises urgent
questions regarding equitable access, the representation of African
languages, and the suitability of methodologies. Current large language
models underrepresent African languages, digital scholarly
infrastructures remain optimised for English, and digitisation
pipelines
that produce AI-ready data are themselves shaped by political choices
about what to digitise, how to describe it, and who controls access.
While recent initiatives on digital sovereignty in Africa have centred
on policy and regulation, this workshop shifts attention to
methodological practice. It asks how DH methods and AI transform
research in African Studies, and how we can design, evaluate, and
sustain these methods under African conditions. By bringing together
scholars, independent researchers and practitioners from Africa,
Europe,
and beyond, the event will foster North–South and South–South dialogue
at the intersection of African epistemologies and digital methods,
moving from description to design.
────────────────────────────────────────────────────────────
CONVENORS
- Frédérick Madore, University of Bayreuth
- Vincent Hiribarren, King's College London
- Emmanuel Ngue Um, University of Yaoundé 1
- Menno van Zaanen, South African Centre for Digital Language
Resources (SADiLaR)
────────────────────────────────────────────────────────────
THEMATIC AXES
The programme is structured around the following thematic axes:
1. Transforming Research Methods through AI and Digital Tools in
African Studies
This axis asks a fundamental question: how are AI and DH methods
changing the study of African cultures, languages, and histories?
Participants will present concrete uses of AI to analyse multilingual
texts, employ computer vision to study visual culture and historical
artefacts, and develop digital mapping to trace cultural movements and
connections. We will evaluate what works for different kinds of African
cultural materials, identify adaptations required for local contexts,
and specify where computational approaches can complement—rather than
replace—interpretive scholarship. The goal is clear: practical guidance
for integrating these methods while preserving the interpretive
richness
that defines the humanities.
2. Building Sustainable Research Infrastructures from African
Perspectives
Moving beyond policy discourse, this axis asks what it takes to build
and sustain digital research capacity within African institutions and
communities. We will examine practical obstacles—limited connectivity,
unstable funding, and scarce training data for local languages—and
showcase South–South collaboration models that have navigated these
constraints. Participants will share strategies for developing tools
that utilise available resources rather than assuming high-end
infrastructure. Key questions include how to keep research outputs
accessible to the communities being studied, how to train the next
generation of African DH scholars, and how to secure sustainable
funding
that does not depend solely on institutions in the Global North. The
focus is on concrete, scalable approaches to durable capacity.
3. Centring African Knowledge Systems in Digital Research Design
This axis poses a methodological challenge: how can digital research
tools respect and incorporate African ways of knowing? Rather than
retrofitting existing techniques to African materials, we explore how
African epistemologies can shape the tools themselves. Case studies
will
show community knowledge informing database structures, oral traditions
testing text-centred analytical frameworks, and local classification
systems improving standard metadata schemas. We will consider protocols
for culturally sensitive materials, interface design that does not
privilege European languages, and criteria to ensure that AI systems
trained on African data primarily serve African research needs. Here,
decolonisation moves from critique to construction.
────────────────────────────────────────────────────────────
WORKSHOP FORMAT & LANGUAGE POLICY
The workshop will run in a hybrid format to maximise participation and
impact. In-person sessions at STIAS will be paired with remote access
via Zoom for those unable to travel. Participants will pre-circulate
draft papers in English or French one month in advance, each with a
bilingual abstract to support preparation. To address language
barriers,
the workshop will operate bilingually in English and French. Presenters
may speak in either language; where possible, a bilingual chair will
moderate discussion and provide brief consecutive interpretation where
needed. Recent advances in AI speech recognition and machine
translation
now enable near-real-time captioning; we will deploy these tools in the
room and on Zoom. All presenters will supply slides with bilingual
titles and key terms, and a one-page terminology handout in both
languages. Together, these measures encourage meaningful participation
across Africa’s Anglophone and Francophone communities, which are often
divided by institutional and linguistic boundaries, and provide
immediate, practical benefits for multilingual colleagues.
────────────────────────────────────────────────────────────
SUBMISSION GUIDELINES
We invite proposals for individual papers (20-minute presentations).
Submissions may be in English or French. Proposals of up to 500 words
should be emailed to the convenors by 30 April 2026. Each submission
must include: (i) a title; (ii) an abstract outlining the context,
central question, and methodological approach; and (iii) a 100-word
biographical note indicating the applicant’s discipline and
institutional affiliation.
Please send your proposals to the following addresses:
- Frédérick Madore: frederick.madore(a)uni-bayreuth.de
- Vincent Hiribarren: vincent.hiribarren(a)kcl.ac.uk
- Emmanuel Ngue Um: ngueum(a)gmail.com
- Menno van Zaanen: menno.vanzaanen(a)nwu.ac.za
────────────────────────────────────────────────────────────
PUBLICATION
Our goal is to publish selected papers from the workshop as a special
issue in the Journal of the Digital Humanities Association of Southern
Africa (JDHASA), subject to agreement with the journal’s editorial
board. All submitted full papers will undergo peer review. Authors
whose
papers are selected for the special issue will be expected to revise
their manuscripts in line with reviewer feedback before final
publication.
────────────────────────────────────────────────────────────
SELECTION CRITERIA & INCLUSIVITY
Selection will prioritise gender equity, support for early-career
scholars based in sub-Saharan Africa, and balance across disciplines
and
regions. In addition to scholars, we will include
practitioner-developers by directly engaging the teams behind DH tools.
Their participation will help us to assess user needs and the
feasibility of embedding African ways of knowing in tool design. DH
remains gender-imbalanced; accordingly, the open call will explicitly
encourage applications from women and weight gender equity in review.
We
will intentionally include Africa-based, diasporic, and returning
scholars. Recognising uneven DH capacity, particularly in several
Francophone regions, we will aim for a majority of Africa-based
participants and amplify Francophone voices through targeted outreach
and reserved places for early-career researchers. The workshop will
uphold equal opportunity regardless of gender, religion, or other
sociocultural differences.
────────────────────────────────────────────────────────────
KEY DATES
- Submission Deadline: 30 April 2026
- Notification of Acceptance: 15 May 2026
- Deadline for Full Papers: 15 August 2026
- Workshop Dates: 21–24 September 2026
═══════════════════════════════════════════════════════════════════════
https://fmadore.github.io/stias-dh-ai-workshop-2026
═══════════════════════════════════════════════════════════════════════
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org
ComputEL-9: Ninth Workshop on the Use of Computational Methods in the
Study of Endangered Languages
Second CALL FOR PAPERS
Submission deadline: March 20, 2026
Submission link: https://softconf.com/acl2026/ComputEL2026
ComputEL-9 will be co-located with ACL 2026 in San Diego, California. It
will be a one-day workshop, held on Friday, July 4, 2026. This time, we
are coordinating our activities with Americas-NLP, held on the previous
day.
We encourage submissions that explore the interface and intersection of
computational linguistics, documentary linguistics, and community-based
efforts in language revitalization and reclamation. This includes
submissions that:
(i) demonstrate new methods or technologies for tasks or applications
focused on low-resource settings, and in particular, endangered languages,
(ii) examine the use of specific methods in the analysis of data from
low-resource languages, or demonstrate new methods for analysis of such
data, oriented toward the goals of language reclamation and revitalization,
(iii) propose new models for the collection, management, and
mobilization of language data in community settings, with attention to
e.g. issues of data sovereignty and community protocols,
(iv) explore concrete steps for a more fruitful interaction among
computer scientists, documentary linguists, and language communities.
IMPORTANT DATES
20 March 2026 Deadline for submission of papers or extended abstracts
1 May 2026 Notification of Acceptance
4 July 2026 Workshop
PRESENTATIONS
Accepted papers will be presented in an oral session and a poster
session. The decision on whether a paper's presentation will be oral
and/or poster will be made by the Organizing Committee on the advice of
the Program Committee, taking into account the subject matter and how
the content might best be conveyed. Oral and poster presentations will
not be distinguished in the Proceedings.
SUBMISSIONS
We offer two submission lengths: short (up to 4 pages) or long (up to 8
pages) papers. The length of a submission does not influence the
likelihood of acceptance. Both paper types must include a section on
ethical considerations and a section on limitations; these sections are
not counted toward the page limit.
All submissions must be anonymous and will be peer-reviewed by the
scientific Program Committee. Papers must follow the style and
formatting guidelines provided by the ACL style files (LaTeX templates:
https://github.com/acl-org/acl-style-files).
Submissions that exceed the length requirements, or are missing a
limitations section, will be desk rejected.
Papers can be submitted to one of the workshop’s tracks: (a) language
community perspective and (b) academic perspective.
Submissions must be uploaded to SoftConf:
https://softconf.com/acl2026/ComputEL2026 by March 20, 2026 11:59PM
(UTC-12, “anywhere on earth”).
A. Short Papers:
Short paper submissions must describe original and unpublished work.
They are max. 4 pages excluding references. They must include sections
on ethical considerations and limitations; these sections are not
counted toward the page limit. Please note that a short paper is not
a shortened long paper. Instead, short papers should have a small,
focused contribution or describe work in progress (“working paper”).
Short papers might not necessarily be intended for publication. Some
common kinds of short papers are negative results, opinion pieces,
interesting application nuggets, or descriptions of ongoing
collaborative teamwork.
B. Long Paper:
Long papers must describe substantial, original, completed and
unpublished work. Wherever appropriate, concrete evaluation and analysis
should be included. Long papers are max. 8 pages excluding references
and appendices. They must include sections on ethical considerations
and limitations; these sections are not counted toward the page limit.
PROCEEDINGS
The Organizing Committee will select papers that have been accepted for
presentation for online publication via the open-access ACL Anthology.
Not all papers accepted for presentation are guaranteed inclusion in the
Anthology. Final versions of long and short papers that are accepted for
publication will be allotted one additional page (altogether 5 and 9
pages) excluding references. Papers accepted for inclusion in the
Anthology should be revised and improved versions of the work that was
submitted for, and which underwent, review. Any revisions should concern
responses to reviewer comments or the addition of relevant details and
clarifications, but not entirely new, unreviewed content.
FUNDING SUPPORT
Limited funding will be available for some accepted authors. A link to
apply for funding will be sent to submitters after the submission
deadline. Decisions on funding will be sent with notification of
acceptance. Priority will be given to individuals without institutional
support, for instance members of endangered language communities, other
unsponsored or under-sponsored presenters (e.g. student/faculty of
Linguistics Departments), and student presenters.
ADDITIONAL AND CONTACT INFORMATION
Please see the ComputEL-9 website for further information:
https://computel-workshop.org/computel-9/
Organizing Committee Email: computel.workshop(a)gmail.com
--
======================================================================
Antti Arppe - Ph.D (General Linguistics), M.Sc. (Engineering)
Professor of Quantitative Linguistics
Director, Alberta Language Technology Lab (ALTLab)
Project Director, 21st Century Tools for Indigenous Languages (21C)
Department of Linguistics, University of Alberta
Algonquian Studies Association - Secretary-Treasurer
E-mail: arppe(a)ualberta.ca - antti.arppe(a)iki.fi
WWW: www.ualberta.ca/~arppe - altlab.ualberta.ca - 21c.tools
Mānahtu ina rēdûti ihza ummânūti ihannaq - dulum ugulak úmun ingul
----------------------------------------------------------------------
I am looking for a postdoctoral researcher to join my group.
Some keywords: language learning in interaction; learning to interact; simulating situated language use; building agents; evaluating LLMs / LLM-agents in interaction; pragmatics of human/AI interaction.
Application deadline: April 7th 2026, for start in September 2026. For more information about the position and on how to apply, see: https://clp.ling.uni-potsdam.de/positions/ .
---
David Schlangen
Chair "Foundations of Computational Linguistics"
Department of Linguistics, University of Potsdam
Karl-Liebknecht-Strasse 24-25
14476 Potsdam, Germany
Campus Golm, Building 14, Room 2.18
Tel. +49 331 977 2692
Tel. Secretary +49 331 977 2016
http://clp.ling.uni-potsdam.de
In this newsletter:
LDC data and commercial technology development
New publications
Ancient Chinese WordNet (https://catalog.ldc.upenn.edu/LDC2026L03)
CALLHOME Spanish Second Edition (https://catalog.ldc.upenn.edu/LDC2026S04)
CALLHOME Spanish Lexicon Second Edition (https://catalog.ldc.upenn.edu/LDC2026L02)
________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.
________________________________
New publications:
Ancient Chinese WordNet<https://catalog.ldc.upenn.edu/LDC2026L03> was developed by Nanjing Normal University<https://www.njnu.edu.cn/> and contains lexical and semantic information for Ancient Chinese vocabulary from the Pre-Qin period (before 221 BCE). The WordNet comprises 38,781 word forms and 55,100 senses, each manually linked to a corresponding synset in Princeton WordNet 1.6<https://wordnet.princeton.edu/> and covering 22 noun categories, 15 verb categories, and additional adjective and adverb categories. The Ancient Chinese WordNet project began in 2012 with the goal of creating a structured lexical database to support linguistic research and natural language processing applications involving historical Chinese language materials.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
CALLHOME Spanish Second Edition<https://catalog.ldc.upenn.edu/LDC2026S04> was developed by LDC and contains 38 hours of speech from 120 unscripted telephone conversations between native Spanish speakers. This publication is a re-release of the original CALLHOME Spanish collection, combining CALLHOME Spanish Speech (LDC96S35)<https://catalog.ldc.upenn.edu/LDC96S35> and CALLHOME Spanish Transcripts (LDC96T17)<https://catalog.ldc.upenn.edu/LDC96T17>, with additional transcription and updated directory structure, file formats, and documentation.
This corpus contains the 120 calls from CALLHOME Spanish Speech which represented training and development data and a subset of evaluation data. Participants spoke on topics of their choice in a single telephone call lasting up to 30 minutes. Calls were manually audited for language, recording quality, channel characteristics, dialect, and region. For this second edition, all audio was converted from SPHERE files to FLAC format, and the original training/development/test partitioning was removed.
This release also features revised transcripts conforming to updated LDC transcription guidelines that addressed normalization of annotation formats, standardization of speaker-produced and background noises, application of foreign-language marking, whitespace cleanup, and corrections and consistency fixes.
The CALLHOME series consists of telephone conversations and transcripts developed by LDC and Rutgers, The State University of New Jersey, in support of research in speaker identification, language identification, and related technologies. Languages in the series include American English, Egyptian Arabic, German, Japanese, Mandarin Chinese, and Spanish.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
CALLHOME Spanish Lexicon Second Edition<https://catalog.ldc.upenn.edu/LDC2026L02> was developed by LDC and contains 45,547 Spanish words with morphological, phonological, stress, and frequency information. This second edition updates file formats, directory structure, and documentation. The first edition is available as CALLHOME Spanish Lexicon (LDC96L16)<https://catalog.ldc.upenn.edu/LDC96L16>. The words in the lexicon were derived from 80 transcripts representing unscripted telephone conversations between native Spanish speakers contained in CALLHOME Spanish Second Edition LDC2026S04 and from various Spanish news texts.
The lexicon contains nine tab-separated information fields: (1) headword: orthographic form; (2) morph: morphological analysis of the headword; (3) pron: pronunciation of the headword; (4) stress: primary stress information of the word; (5) callh freq: frequency of the headword in CALLHOME transcripts; (6) madrid freq: frequency of the headword in Madrid Radio transcripts; (7) ap freq: frequency of the headword in Associated Press newswire; (8) reut freq: frequency of the headword in Reuters newswire; and (9) norte freq: frequency of the headword in El Norte newswire.
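The nine-field layout described above can be read with the standard library alone. The sketch below is a starting point, not a definitive loader: the underscore field names are our own shorthand for the catalog's field labels, and the actual release may contain header or comment lines that need skipping:

```python
# Sketch: reading the nine tab-separated lexicon fields described above.
# Field names mirror the catalog description (spaces replaced by
# underscores); check the release documentation for the exact file layout.
import csv

FIELDS = ["headword", "morph", "pron", "stress",
          "callh_freq", "madrid_freq", "ap_freq", "reut_freq", "norte_freq"]

def read_lexicon(path):
    """Yield one dict per lexicon entry, keyed by the field names above."""
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            # Skip malformed or comment lines that don't have all nine fields
            if len(row) == len(FIELDS):
                yield dict(zip(FIELDS, row))
```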
This release also includes a pronunciation dictionary derived from the lexicon in CMUdict<https://stdlib.io/docs/api/latest/@stdlib/datasets/cmudict> format and the grapheme-to-phoneme (G2P) tools used to automatically generate pronunciations for the original lexicon.
2026 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Call for Papers
61st Linguistics Colloquium
LingColl 2026
Università di Pavia, Italy, September 9 to 12, 2026
https://lingcoll26.unipv.it
Scope: all fields of Linguistics
Conference languages: English and German
Deadline for abstract submission: May 3, 2026
Special theme:
Rethinking Language Comparison: Contrastive Linguistics between Corpora
and AI
The 61st Linguistics Colloquium (www.lingcoll.de) will take place at
the University of Pavia, Italy, from September 9 to 12, 2026.
Founded in Hamburg in 1966, the Linguistics Colloquium has since been
hosted in almost 20 countries. It provides a platform for the study of
language and languages in all areas of linguistics and warmly welcomes
researchers from diverse theoretical backgrounds. The colloquium is
distinguished by its cooperative and open culture of discussion:
innovative ideas meet critical reflection, and the exchange of research
results is actively promoted. Its aim is to create an inspiring space
where new approaches, methods, and perspectives can be jointly discussed
and developed.
In addition, contrastive linguistics will be a focal point at this
year’s colloquium. Since its beginnings, contrastive linguistics has
undergone significant development, expanding both its methodological and
conceptual scope. Today, language comparison is no longer limited to
language pairs but can involve multiple languages. It integrates
geographical and sociolinguistic dimensions, extends its focus to
semantic, pragmatic, textual, and discourse-linguistic levels, and also
takes into account historical stages and diachronic comparisons within a
single language.
Moreover, contrastive linguistics has increasingly established itself as
a theoretically reflective discipline: analysing a language in the light
of another allows for the identification of linguistic phenomena that
might otherwise remain unnoticed or inadequately explained. Recent
advances have been particularly driven by the use of large corpora and
digital methods. AI-supported analytical methods are expected to provide
further developments in the near future.
The planned conference will focus on current theoretical,
methodological, and applied approaches in contrastive linguistics, with
a particular emphasis on German in comparison with other languages. Its
aim is to bring together research that empirically investigates
systematic differences and similarities across languages and highlights
their relevance for applied contexts.
Thematic Focus (including, but not limited to):
Contrastive Analyses in the Areas of:
- Phonetics and phonology
- Morphology and syntax
- Semantics and lexicon
- Phraseology and pragmatics
- Text and discourse
Corpus-Based, Corpus-Driven, and AI-Supported Approaches:
- Contrastive corpus linguistics
- Comparative corpus annotation
- Corpus-based analyses of phraseological patterns, collocations, and
constructions
- Quantitative and qualitative methods
- Use of AI, NLP, and LLMs in contrastive research
Methodological and Theoretical Issues:
- Comparability of data and corpora
- Modelling linguistic differences at the word, phrase, and discourse levels
- Interfaces between linguistics, corpus linguistics, computational
linguistics, and AI
Applied Perspectives, Including:
- German as a foreign and second language (DaF/DaZ)
- Specialized and professional language
- Phraseodidactics and discourse-oriented language teaching
- Lexicography, phraseography, and terminology work
- Translation studies, interpreting, and contrastive discourse analysis
- Language teaching and language comparison in the classroom
We welcome contributions that are theoretically informed as well as
empirically oriented, including work that bridges basic research and
application. Submissions presenting innovative methods or new resources
are particularly encouraged.
In addition, in keeping with the tradition of the Linguistics
Colloquium, presentations from all other areas of linguistics may be
proposed.
Submission
Abstracts (approx. 300 words) can be submitted until May 3, 2026.
lingcoll2026(a)gmail.com
Notification of acceptance will be sent by May 15, 2026.
Conference Languages
The conference languages are German and English.
Registration
Registration deadline: 30 June 2026 lingcoll2026(a)gmail.com
Registration fee
Participants with a regular income: €200.00
Participants without a regular income (PhD candidates, scholarship
holders): €100.00
Please consider contributing and/or forwarding to appropriate colleagues and groups.
****We apologize for the multiple copies of this e-mail****
----------------------------------------------------------------------------------------------------
Call for Participation
----------------------------------------------------------------------------------------------------
Second Call for Participation:
EXIST 2026: Multimodal sexism identification with sensor data
Website: http://nlp.uned.es/exist2026/
EXIST is a series of scientific events and shared tasks on sexism identification in social networks. EXIST aims to foster the automatic detection of sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours (EXIST 2021, EXIST 2022, EXIST 2023, EXIST 2024, EXIST 2025). The sixth edition of the EXIST shared task will be held as a Lab in CLEF 2026, on September 21-24, 2026, at Friedrich-Schiller-Universität Jena, Germany.
In EXIST 2026, we take a significant step forward by integrating the principles of Human-Centered AI (HCAI) into the development of automatic tools for detecting sexism online. Recognizing that no single interpretation can fully capture the diversity of human perception, we go beyond traditional annotation paradigms by combining Learning With Disagreement (LeWiDi) with sensor-based data (EEG, heart rate, and eye-tracking signals) collected from subjects exposed to potentially sexist content, with the aim of capturing unconscious responses to sexism. This dual approach represents a breakthrough in dataset creation for sensitive and value-laden tasks: for the first time, datasets will include not only divergent judgments from annotators, but also the embodied traces of how such content affects those exposed to it. This richer, multidimensional annotation process will enable the development of more inclusive, equitable, and socially aware AI systems for detecting sexism in complex multimedia formats like memes and short videos, where ambiguity and affect play a critical role.
Similar to the approaches in the 2023, 2024 and 2025 editions, this edition will also embrace the Learning With Disagreement (LeWiDi) paradigm for both the development of the dataset and the evaluation of the systems. The LeWiDi paradigm doesn’t rely on a single “correct” label for each example. Instead, the model is trained to handle and learn from conflicting or diverse annotations. This enables the system to consider various annotators’ perspectives, biases, or interpretations, resulting in a fairer learning process.
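To make the LeWiDi idea concrete, here is a minimal sketch of how divergent annotations can be kept as a soft label distribution and scored with cross-entropy rather than collapsed to one gold label. The function names and the five-annotator example are illustrative assumptions, not part of the EXIST 2026 release:

```python
# Sketch of the LeWiDi idea: keep the full distribution of annotator
# judgements instead of a single majority label, and compare a system's
# predicted distribution against it.
import math
from collections import Counter

def soft_label(annotations, classes=("YES", "NO")):
    """Turn a list of per-annotator labels into a probability distribution."""
    counts = Counter(annotations)
    total = sum(counts[c] for c in classes)
    return {c: counts[c] / total for c in classes}

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a predicted distribution against the soft target."""
    return -sum(p * math.log(predicted[c] + eps) for c, p in target.items())

# Three of five (hypothetical) annotators judged the item sexist:
target = soft_label(["YES", "YES", "YES", "NO", "NO"])  # {'YES': 0.6, 'NO': 0.4}
loss = cross_entropy(target, {"YES": 0.6, "NO": 0.4})   # minimal when prediction matches
```

A system that outputs a confident hard label (e.g. 0.9/0.1) on such a genuinely contested item incurs a higher loss than one that reproduces the disagreement, which is exactly the behaviour LeWiDi-style evaluation rewards.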
Building upon the EXIST 2025 dataset, this edition focuses exclusively on multimedia formats, comprising six experimental subtasks applied to images (memes) and videos (TikToks). Participants are challenged to address three main objectives: sexism identification (x.1), source intention detection (x.2), and sexism categorization (x.3) (numbering of subtasks is consistent with EXIST 2025). Participants will be asked to classify memes and videos (in English and Spanish) according to the following tasks:
TASK 2: Sexism detection in Memes:
TASK 2.1 - Sexism Identification in Memes: this is a binary classification subtask consisting in determining whether a meme describes a sexist situation or criticizes a sexist behaviour, and classifying it into two categories: YES and NO.
Task 2.2: Source Intention in Memes: this subtask aims to categorize the meme according to the intention of its author. Due to the characteristics of memes, systems should only classify them into the DIRECT or JUDGEMENTAL categories.
Task 2.3: Sexism Categorization in Memes: once a message has been classified as sexist, the third subtask aims to categorize the message into different types of sexism (according to a categorization proposed by experts that takes into account the different facets of women that are undermined). In particular, each sexist meme must be categorized in one or more of the following categories: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE.
TASK 3: Sexism detection in Videos:
SUBTASK 3.1 - Sexism Identification in Videos: this is a binary classification task, as in Subtask 2.1.
SUBTASK 3.2: Source Intention in Videos: this subtask replicates Subtask 2.2 for memes, but takes videos as its source.
SUBTASK 3.3: This subtask aims to classify sexist videos according to the categorization provided for Subtask 2.3: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE.
Although we recommend participating in all subtasks and in both languages, participants may take part in just one subtask (e.g. Subtask 2.1) and in one language (e.g. English).
During the training phase, the task organizers will provide the participants with the manually-annotated EXIST 2026 dataset. For the evaluation of the systems, the unlabeled test data will be released.
We encourage participation from both academic institutions and industrial organizations. We invite participants to register for the lab at CLEF 2026 Labs Registration site (https://clef-labs-registration.dipintra.it/). You will receive information about how to join the Discord Group for the EXIST 2026 shared task.
Important Dates:
* 17 November 2025: Registration opens. ¡¡¡¡DONE!!!
* 26 February 2026: Training set available. ¡¡¡¡DONE!!!
* 9 April 2026: Test set available.
* 23 April 2026: Registration closes.
* 7 May 2026: Runs submission due to organizers.
* 28 May 2026: Results notification to participants.
* 4 June 2026: Submission of Working Notes by participants.
* 30 June 2026: Notification of acceptance (peer reviews).
* 6 July 2026: Camera-ready participant papers due to organizers.
* 21-24 September 2026: EXIST 2026 at CLEF Conference.
** Note: All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth") **
Organizers:
Laura Plaza, Universidad Nacional de Educación a Distancia (UNED)
Jorge Carrillo-de-Albornoz, Universidad Nacional de Educación a Distancia (UNED)
Iván Arcos, Universitat Politècnica de València (UPV)
Maria Aloy Mayo, Universitat Politècnica de València (UPV)
Paolo Rosso, Universitat Politècnica de València (UPV)
Damiano Spina, Royal Melbourne Institute of Technology (RMIT)
Contact:
Contact the organizers by writing to: jcalbornoz(a)lsi.uned.es
Website: http://nlp.uned.es/exist2026/
The 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026) (formerly CASE) @ ACL 2026
This year, the EEUCA workshop (formerly CASE) continues the tradition of the eight previous editions of our workshop on challenges and applications of event extraction.
Website: https://bit.ly/EEUCA2026
Submission page: https://openreview.net/group?id=aclweb.org/ACL/2026/Workshop/EEUCA
Paper submission deadline: March 29, 2026 (Updated!)
Pre-reviewed ARR commitment deadline: April 15, 2026
Notification of acceptance: April 28, 2026
Camera-ready paper due: May 12, 2026
Pre-recorded video due (hard deadline): June 4, 2026
Shared tasks and shared task papers:
Start of the Competition: Dec 10, 2025
Eval Phase Start: Dec 10, 2025
Test Phase Start: Jan 15, 2026
Test Phase End: March 15, 2026
Paper Submission Deadline: March 28, 2026
Notification of acceptance: April 28, 2026
Camera-ready paper due: May 12, 2026
We invite work on all aspects of automated coding and analysis of events from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:
1) Extracting events and their arguments in and beyond a sentence or document, event coreference resolution.
2) New datasets, training data collection and annotation for event information.
3) Event-event relations, e.g., subevents, main events, spatiotemporal relations, causal relations.
4) Event dataset evaluation in light of reliability and validity metrics.
5) Defining, populating, and facilitating event schemas and ontologies.
6) Automated tools and pipelines for event collection related tasks.
7) Lexical, syntactic, semantic, discursive, and pragmatic aspects of event manifestation.
8) Methodologies for development, evaluation, and analysis of event datasets.
9) Applications of event databases, e.g. early warning, conflict prediction, and policymaking.
10) Estimating what is missing in event datasets using internal and external information.
11) Detection of new event types, e.g. creative protests, cyber activism, COVID-19 related, terrorism, food safety, food security, climate change, extreme weather events, disasters.
12) Release of new event datasets.
13) Bias and fairness of the sources and event datasets.
14) Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets.
15) Copyright issues on event dataset creation, dissemination, and sharing.
16) Cross-lingual, multilingual, and multimodal aspects in event analysis.
17) Exploiting LLMs in Event Extraction.
18) Generative AI and event reports: detecting AI-generated news, exploiting generative AI for creating event corpora, etc.
Shared Task 1: Multimodal Identification of Vaccine Critical Content on Social Media
This shared task focuses on detecting vaccine-critical stance in multimodal social media memes. Using the VaxMeme dataset of over 10,000 annotated memes, participants will develop models that jointly leverage visual and textual signals to classify a meme’s stance as pro-vaccine, vaccine-critical, or neutral. The task encourages research on cross-modal understanding, sarcasm, implicit messaging, and misinformation dynamics in public health discourse. External data and transfer learning are permitted, and submissions will be evaluated using macro-F1. All system description papers will be published in the ACL Anthology.
Learn More: https://github.com/therealthapa/eeuca-vaccine
Shared Task 2: Understanding Toxic Behavioral Intent in Gaming Chat Logs for Healthy Online Interaction
This shared task tackles intent-level toxicity detection in online gaming communities using the GameTox dataset of 53,000 annotated chat utterances from World of Tanks. Participants will develop models that classify a player’s message into six fine-grained intent categories, including hate, threats, insults, extremism, and non-toxic communication. The challenge highlights contextual nuance, gaming slang, implicit aggression, and varied severity levels of toxicity. External datasets are allowed, and submissions are evaluated using macro-F1. All system description papers will be published in the ACL Anthology.
Learn More: https://github.com/therealthapa/eeuca-toxicity
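Both shared tasks above score submissions with macro-F1, i.e. F1 is computed per class and averaged unweighted, so rare intent categories count as much as frequent ones. A dependency-free sketch of the metric (the gold/pred label lists are illustrative, not from the GameTox or VaxMeme data):

```python
# Macro-F1: per-class F1 averaged with equal weight for every class,
# regardless of how often the class occurs in the gold labels.
def macro_f1(gold, pred):
    classes = sorted(set(gold) | set(pred))
    f1s = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Because the average is unweighted, a system that ignores a minority class (say, "extremism") pays the full price of that class's zero F1, which is why macro-F1 is the usual choice for imbalanced toxicity and stance labels.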
Keep an eye on the workshop page that is being updated: https://bit.ly/EEUCA2026 and contact us for any inquiries (submission, collaboration, contribution, or just saying Hi! ).
EEUCA Organization Committee