Starting from May 2026, the Data & Knowledge Engineering group at the
Computer Science department at Heinrich-Heine-University Düsseldorf
(https://www.cs.hhu.de/lehrstuehle-und-arbeitsgruppen/data-knowledge-enginee…),
affiliated with the Heine Center for Artificial Intelligence and Data
Science (HeiCAD) (https://www.heicad.hhu.de/), is looking for a
*PhD student – Information Extraction & Natural Language Processing*
(Salary group 13 TV-L, working time 100%, initially limited to 36 months
with the possibility of further extension)
In the context of the research project "WIEGE", we will investigate the
spread of claims and political narratives across different social media
platforms in an interdisciplinary consortium involving researchers from
Communication Science, Computational Social Science and Computer
Science. Our research will be concerned with novel Natural Language
Processing (NLP) methods for the detection, linking and classification
of claims and narratives in online discourse data. Challenges arise from
the heterogeneous nature of the different data sources; the development
of generalizable approaches will therefore be a key focus. The PhD
student will work in close collaboration with our project partners from
Communication Science, benefitting from their tailored expert
annotations and, vice versa, aiding their annotation efforts by
providing semi-automatic labeling approaches. Another task will entail
the modeling and publishing of generated data according to Semantic Web
principles.
Your tasks will be:
*******************
* Research in fields such as NLP, Machine Learning, Language Modeling
and Representation Learning, with the aim of extracting structured
information from online discourse data
* Development of NLP methods for (i) the detection and classification
of claims and narratives on social media, and (ii) the linking of
related claims and narratives within and across data sources; further,
(iii) assisting semi-automated data annotation and (iv) modeling and
publishing data according to Semantic Web standards
* Writing, publishing and presenting project results
* Collaboration with team members and project partners in an
interdisciplinary consortium
Your profile:
**************
* University degree (diploma/MSc) in Computer Science, Computational
Linguistics or related fields
* Research interests in NLP, machine learning, data mining, large
language models, Semantic Web
* Hands-on experience with Python, including knowledge of ML frameworks
such as TensorFlow or PyTorch
* Ability to communicate fluently in English (mandatory), good knowledge
of the German language (desirable)
What we offer:
***************
* Flexible working hours and home office arrangements
* A fast-growing and international working environment with a lot of
creative scientific freedom
* Access to unique research data, (social) web archives and behavioral data
* Support of collaborations with international research labs and experts
The PhD research will be supervised by Prof. Dr. Stefan Dietze
(Professor for Data & Knowledge Engineering at HHU and Scientific
Director of KTS at GESIS - Leibniz Institute for the Social Sciences,
Cologne) and mentored by Dr. Katarina Boland (Postdoctoral Researcher
at Data & Knowledge Engineering and HeiCAD).
For further information please contact Stefan Dietze
(stefan.dietze(a)hhu.de) and/or Katarina Boland (katarina.boland(a)hhu.de).
Interested?
*************
Please apply by sending your complete application documents as a single
PDF file to katarina.boland(a)hhu.de by 01 April 2026.
--
Prof. Dr. Stefan Dietze
Scientific Director Knowledge Technologies for the Social Sciences
GESIS - Leibniz Institute for the Social Sciences
Web: https://www.gesis.org/en/kts
Chair of Data & Knowledge Engineering
Heinrich-Heine-University Düsseldorf
Web: https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering
Phone: +49 (0)221-47694-421
Web: http://stefandietze.net
Introduction
We invite proposals for tasks to be run as part of SemEval-2027. SemEval (the International Workshop on Semantic Evaluation) is an ongoing series of evaluations of computational semantics systems, organized under the umbrella of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics.
SemEval tasks investigate the nature of meaning in natural languages, exploring how to characterize and compute meaning. This is achieved in practical terms, using shared datasets and standardized evaluation metrics to quantify the strengths and weaknesses of possible solutions. SemEval tasks encompass a broad range of semantic topics from the lexical level to the discourse level, including word sense identification, semantic parsing, coreference resolution, and sentiment analysis, among others.
For SemEval-2027, we welcome tasks that can test an automatic system for semantic analysis of text (e.g., intrinsic semantic evaluation, or an application-oriented evaluation). We especially encourage tasks for languages other than English, cross-lingual tasks, and tasks that develop novel applications of computational semantics. See the websites of previous editions of SemEval to get an idea about the range of tasks explored, e.g., SemEval-2020 (http://alt.qcri.org/semeval2020/) and SemEval-2021 through SemEval-2026 (https://semeval.github.io/).
We strongly encourage proposals based on pilot studies that have already generated initial data, evaluation measures, and baselines. In this way, we can avoid unforeseen challenges down the road that may delay the task. We suggest providing a reasonable baseline (e.g., providing a Transformer / LLM baseline for a classification task) apart from the majority vote / random guess.
If you are not sure whether a task is suitable for SemEval, please feel free to get in touch with the SemEval organizers at semevalorganizers(a)gmail.com to discuss your idea.
The submission webpage is: https://softconf.com/acl2026/semevaltasks2027/
Task Selection
Task proposals will be reviewed by experts, and reviews will serve as the basis for acceptance decisions. Everything else being equal, more innovative new tasks will be given preference over task reruns. Task proposals will be evaluated on:
Novelty: Is the task on a compelling new problem that has not been explored much in the community? Is the task a rerun, but covering substantially new ground (new subtasks, new types of data, new languages, etc. - one addition is not sufficient)?
Interest: Is the proposed task likely to attract a sufficient number of participants?
Data: Are the plans for collecting data convincing? Will the resulting data be of high quality? Will annotations have meaningfully high inter-annotator agreements? Have all appropriate licenses for use and re-use of the data after the evaluation been secured? Have all international privacy concerns been addressed? Will the data annotation be ready on time?
Evaluation: Is the methodology for evaluation sound? Is the necessary infrastructure available, or can it be built in time for the shared task? Will research inspired by this task be able to evaluate in the same manner and on the same data after the initial task? Is the task significantly challenging (e.g., room for improvement over the baselines)?
Impact: What is the expected impact of the data in this task on future research beyond the SemEval Workshop?
Ethics: Does the data comply with privacy policies, e.g., avoiding personally identifiable information (PII)? Tasks aimed at identifying specific people will not be accepted. Avoid medical decision making (e.g., compliance with HIPAA; do not attempt to replace medical professionals, especially in anything related to mental health). These examples are representative, not exhaustive.
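The inter-annotator agreement asked about under "Data" is commonly checked with a chance-corrected measure such as Cohen's kappa. The sketch below is illustrative only (the label values are invented); it computes kappa for two raters who labeled the same items:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b) and rater_a, "need paired annotations"
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's own label distribution.
    dist_a, dist_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (dist_a[label] / n) * (dist_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)

# Illustrative annotations: the raters agree on 3 of 4 items.
print(cohens_kappa(["x", "x", "y", "y"], ["x", "x", "y", "x"]))  # 0.5
```

What counts as "meaningfully high" agreement is task-dependent; proposals should state and justify the threshold they aim for.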
Roles:
Lead Organizer - main point of contact; expected to ensure deliverables are met on time and to contribute to task duties (see below).
Co-Organizers - provide significant contributions to ensuring the task runs smoothly. Examples include maintaining communication with task participants, preparing data, creating and running evaluation scripts, and leading paper reviewing and acceptance decisions.
Advisory Organizers - more of a supervisor role, may not contribute to detailed tasks, but will provide guidance and support.
New Tasks vs. Task Reruns
We welcome both new tasks and task reruns. For a new task, the proposal should address whether the task would be able to attract participants. Preference will be given to novel tasks that have not received much attention yet.
For reruns of previous shared tasks (whether or not the previous task was part of SemEval), the proposal should address the need for another iteration of the task. Valid reasons include: a new form of evaluation (e.g., a new evaluation metric, a new application-oriented scenario), new genres or domains (e.g., social media, domain-specific corpora), or a significant expansion in scale. We further discourage carrying over a previous task and just adding new subtasks, as this can lead to the accumulation of too many subtasks. Evaluating on a different dataset with the same task formulation, or evaluating on the same dataset with a different evaluation metric, typically should not be considered a separate subtask.
Task Organization
We welcome people who have never organized a SemEval task before, as well as those who have. Apart from providing a dataset, task organizers are expected to:
- Verify the data annotations have sufficient inter-annotator agreement.
- Verify licenses for the data allow its use in the competition and afterwards. In particular, text that is publicly available online is not necessarily in the public domain; unless a license has been provided, the author retains all rights associated with their work, including copying, sharing and publishing. For more information, see: https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the data.
- Commit to making the data available after the task in a long-term repository under an appropriate license, preferably using Zenodo: https://zenodo.org/communities/semeval/
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting point (in order to lower the obstacles to participation). A baseline system typically contains code that reads the data, creates a baseline response (e.g., random guessing, majority class prediction), and outputs the evaluation results. Whenever possible, baseline systems should be written in widely used programming languages and/or should be implemented as a component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant information there.
- Create a CodaLab or other similar competition for the task and upload the evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings, and present it at the workshop.
- Manage participants’ submissions of system description papers, manage participants’ peer review of each other’s papers, and possibly shepherd papers that need additional help in improving the writing.
- Review other task description papers.
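The baseline system described above (read the data, produce a trivial prediction, output a score) can be sketched in a few lines. The following is a minimal illustration of a majority-class baseline with plain accuracy; the label names are invented, and a real task would distribute its official scorer alongside:

```python
from collections import Counter

def majority_baseline(train_labels):
    """The single most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]

def predict(majority_label, test_items):
    """Assign the majority label to every test instance."""
    return [majority_label] * len(test_items)

def accuracy(gold, predicted):
    """Simple accuracy; real tasks should use the official evaluation script."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Illustrative run on invented claim-detection labels.
train_labels = ["claim", "no-claim", "claim", "claim"]
gold = ["claim", "no-claim", "claim"]
preds = predict(majority_baseline(train_labels), gold)
print(f"majority-class accuracy: {accuracy(gold, preds):.3f}")
```

A stronger baseline (e.g., a fine-tuned Transformer, as suggested in the call) would replace `predict` with a trained model while keeping the same input and output formats.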
Desk Rejects
- To ensure tasks have sufficient support, we require a minimum of two organizers at the time of proposal submission. A task proposal with only one organizer will be desk-rejected. Running a SemEval task is a significant time commitment; therefore, we highly recommend that a task have at least three to four organizers.
- A person can be a lead organizer on only one task. The second mandatory organizer on the task must be committed to the task as a key co-organizer. Any other organizers (beyond the lead and co-organizer) can participate in other tasks.
- All data should have a research-friendly license. The licensing must be provided in the proposal.
- Task organizers must commit to keeping the data available after the task, either by keeping the task alive or by uploading the data to Zenodo or another permanent public storage location and sharing the link with the organizers.
=== Important dates ===
- Task proposals due 13 April 2026 (Anywhere on Earth)
- Task selection notification 25 May 2026
=== Preliminary timetable ===
- Sample data ready 15 July 2026
- Training data ready 1 September 2026
- Evaluation data ready 1 December 2026 (internal deadline; not for public release)
- Evaluation start 10 January 2027
- Evaluation end by 31 January 2027 (latest date; task organizers may choose an earlier date)
- Paper submission due February 2027
- Notification to authors March 2027
- Camera ready due April 2027
- SemEval workshop Summer 2027 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for having the task and CodaLab website up and dates for uploading sample, training, and evaluation data) may be cancelled at the discretion of SemEval organizers. While consideration will be given to extenuating circumstances, our goal is to provide sufficient time for the participants to develop strong and well-thought-out systems. Cancelled tasks will be encouraged to submit proposals for the subsequent year’s SemEval. To reduce the risk of tasks failing to meet the deadlines, we are unlikely to accept multiple tasks with overlap in the task organizers.
Submission Details
The task proposal should be a self-contained document of no longer than 3 pages (plus additional pages for references). All submissions must be in PDF format, following the ACL template: https://github.com/acl-org/acl-style-files
Each proposal should contain the following:
- Overview
- Summary of the task
- Why this task is needed and which communities would be interested in participating
- Expected impact of the task
- Data & Resources
- How the training/testing data will be produced. Please discuss whether existing corpora will be reused.
- Details of copyright and license, so that the data can be used by the research community both during the SemEval evaluation and afterwards
- How much data will be produced
- How data quality will be ensured and evaluated
- An example of what the data would look like
- Resources required to produce the data and prepare the task for participants (annotation cost, annotation time, computation time, etc.)
- Assessment of any concerns with respect to ethics, privacy, or security (e.g., personally identifiable information of private individuals; potential for systems to cause harm)
- Pilot Task (strongly recommended)
- Details of the pilot task
- What lessons were learned, and how these will impact the task design
- Evaluation
- The evaluation methodology to be used, including clear evaluation criteria
- For Task Reruns
- Justification for why a new iteration of the task is needed (see criteria above)
- What will differ from the previous iteration
- Expected impact of the rerun compared with the previous iteration
- Task organizers
- Names, affiliations, email addresses
- (optional) brief description of relevant experience or expertise
- (if applicable) years and task numbers of any SemEval tasks you have run in the past
Proposals will be reviewed by an independent group of area experts who may not have familiarity with recent SemEval tasks, and therefore, all proposals should be written in a self-explanatory manner and contain sufficient examples.
The submission webpage is: https://softconf.com/acl2026/semevaltasks2027/
=== Chairs ===
Debanjan Ghosh, Analog Devices, USA
Kai North, Cambium Assessment, USA
Ekaterina Kochmar, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE
Mamoru Komachi, Hitotsubashi University, Japan
Marcos Zampieri, George Mason University, USA
Contact: semevalorganizers(a)gmail.com
═══════════════════════════════════════════════════════════════════════
CALL FOR PAPERS
═══════════════════════════════════════════════════════════════════════
A DFG Programme Point Sud Workshop
Digital Humanities and Artificial Intelligence in African Studies:
Towards Sustainable and Equitable Practices
21–24 September 2026 · STIAS, Stellenbosch, South Africa
────────────────────────────────────────────────────────────
ABOUT THE WORKSHOP
The integration of digital humanities (DH) and artificial intelligence
(AI) is transforming the production of knowledge in African Studies,
offering new opportunities for innovative analysis, dynamic
visualisation and cross-cultural research. Yet this shift raises urgent
questions regarding equitable access, the representation of African
languages, and the suitability of methodologies. Current large language
models underrepresent African languages, digital scholarly
infrastructures remain optimised for English, and digitisation
pipelines
that produce AI-ready data are themselves shaped by political choices
about what to digitise, how to describe it, and who controls access.
While recent initiatives on digital sovereignty in Africa have centred
on policy and regulation, this workshop shifts attention to
methodological practice. It asks how DH methods and AI transform
research in African Studies, and how we can design, evaluate, and
sustain these methods under African conditions. By bringing together
scholars, independent researchers and practitioners from Africa,
Europe,
and beyond, the event will foster North–South and South–South dialogue
at the intersection of African epistemologies and digital methods,
moving from description to design.
────────────────────────────────────────────────────────────
CONVENORS
- Frédérick Madore, University of Bayreuth
- Vincent Hiribarren, King's College London
- Emmanuel Ngue Um, University of Yaoundé 1
- Menno van Zaanen, South African Centre for Digital Language
Resources (SADiLaR)
────────────────────────────────────────────────────────────
THEMATIC AXES
The programme is structured around the following thematic axes:
1. Transforming Research Methods through AI and Digital Tools in
African Studies
This axis asks a fundamental question: how are AI and DH methods
changing the study of African cultures, languages, and histories?
Participants will present concrete uses of AI to analyse multilingual
texts, employ computer vision to study visual culture and historical
artefacts, and develop digital mapping to trace cultural movements and
connections. We will evaluate what works for different kinds of African
cultural materials, identify adaptations required for local contexts,
and specify where computational approaches can complement—rather than
replace—interpretive scholarship. The goal is clear: practical guidance
for integrating these methods while preserving the interpretive
richness
that defines the humanities.
2. Building Sustainable Research Infrastructures from African
Perspectives
Moving beyond policy discourse, this axis asks what it takes to build
and sustain digital research capacity within African institutions and
communities. We will examine practical obstacles—limited connectivity,
unstable funding, and scarce training data for local languages—and
showcase South–South collaboration models that have navigated these
constraints. Participants will share strategies for developing tools
that utilise available resources rather than assuming high-end
infrastructure. Key questions include how to keep research outputs
accessible to the communities being studied, how to train the next
generation of African DH scholars, and how to secure sustainable
funding
that does not depend solely on institutions in the Global North. The
focus is on concrete, scalable approaches to durable capacity.
3. Centring African Knowledge Systems in Digital Research Design
This axis poses a methodological challenge: how can digital research
tools respect and incorporate African ways of knowing? Rather than
retrofitting existing techniques to African materials, we explore how
African epistemologies can shape the tools themselves. Case studies
will
show community knowledge informing database structures, oral traditions
testing text-centred analytical frameworks, and local classification
systems improving standard metadata schemas. We will consider protocols
for culturally sensitive materials, interface design that does not
privilege European languages, and criteria to ensure that AI systems
trained on African data primarily serve African research needs. Here,
decolonisation moves from critique to construction.
────────────────────────────────────────────────────────────
WORKSHOP FORMAT & LANGUAGE POLICY
The workshop will run in a hybrid format to maximise participation and
impact. In-person sessions at STIAS will be paired with remote access
via Zoom for those unable to travel. Participants will pre-circulate
draft papers in English or French one month in advance, each with a
bilingual abstract to support preparation. To address language
barriers,
the workshop will operate bilingually in English and French. Presenters
may speak in either language; where possible, a bilingual chair will
moderate discussion and provide brief consecutive interpretation where
needed. Recent advances in AI speech recognition and machine
translation
now enable near-real-time captioning; we will deploy these tools in the
room and on Zoom. All presenters will supply slides with bilingual
titles and key terms, and a one-page terminology handout in both
languages. Together, these measures encourage meaningful participation
across Africa’s Anglophone and Francophone communities, which are often
divided by institutional and linguistic boundaries, and provide
immediate, practical benefits for multilingual colleagues.
────────────────────────────────────────────────────────────
SUBMISSION GUIDELINES
We invite proposals for individual papers (20-minute presentations).
Submissions may be in English or French. Proposals of up to 500 words
should be emailed to the convenors by 30 April 2026. Each submission
must include: (i) a title; (ii) an abstract outlining the context,
central question, and methodological approach; and (iii) a 100-word
biographical note indicating the applicant’s discipline and
institutional affiliation.
Please send your proposals to the following addresses:
- Frédérick Madore: frederick.madore(a)uni-bayreuth.de
- Vincent Hiribarren: vincent.hiribarren(a)kcl.ac.uk
- Emmanuel Ngue Um: ngueum(a)gmail.com
- Menno van Zaanen: menno.vanzaanen(a)nwu.ac.za
────────────────────────────────────────────────────────────
PUBLICATION
Our goal is to publish selected papers from the workshop as a special
issue in the Journal of the Digital Humanities Association of Southern
Africa (JDHASA), subject to agreement with the journal’s editorial
board. All submitted full papers will undergo peer review. Authors
whose
papers are selected for the special issue will be expected to revise
their manuscripts in line with reviewer feedback before final
publication.
────────────────────────────────────────────────────────────
SELECTION CRITERIA & INCLUSIVITY
Selection will prioritise gender equity, support for early-career
scholars based in sub-Saharan Africa, and balance across disciplines
and
regions. In addition to scholars, we will include
practitioner-developers by directly engaging the teams behind DH tools.
Their participation will help us to assess user needs and the
feasibility of embedding African ways of knowing in tool design. DH
remains gender-imbalanced; accordingly, the open call will explicitly
encourage applications from women and weight gender equity in review.
We
will intentionally include Africa-based, diasporic, and returning
scholars. Recognising uneven DH capacity, particularly in several
Francophone regions, we will aim for a majority of Africa-based
participants and amplify Francophone voices through targeted outreach
and reserved places for early-career researchers. The workshop will
uphold equal opportunity regardless of gender, religion, or other
sociocultural differences.
────────────────────────────────────────────────────────────
KEY DATES
- Submission Deadline: 30 April 2026
- Notification of Acceptance: 15 May 2026
- Deadline for Full Papers: 15 August 2026
- Workshop Dates: 21–24 September 2026
═══════════════════════════════════════════════════════════════════════
https://fmadore.github.io/stias-dh-ai-workshop-2026
═══════════════════════════════════════════════════════════════════════
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org
ComputEL-9: Ninth Workshop on the Use of Computational Methods in the
Study of Endangered Languages
Second CALL FOR PAPERS
Submission deadline: March 20, 2026
Submission link: https://softconf.com/acl2026/ComputEL2026
ComputEL-9 will be co-located with ACL 2026 in San Diego, California. It
will be a one-day workshop, held on Friday, July 4, 2026. This time, we
are coordinating our activities with Americas-NLP, held on the previous
day.
We encourage submissions that explore the interface and intersection of
computational linguistics, documentary linguistics, and community-based
efforts in language revitalization and reclamation. This includes
submissions that:
(i) demonstrate new methods or technologies for tasks or applications
focused on low-resource settings, and in particular, endangered languages,
(ii) examine the use of specific methods in the analysis of data from
low-resource languages, or demonstrate new methods for analysis of such
data, oriented toward the goals of language reclamation and revitalization,
(iii) propose new models for the collection, management, and
mobilization of language data in community settings, with attention to
e.g. issues of data sovereignty and community protocols,
(iv) explore concrete steps for a more fruitful interaction among
computer scientists, documentary linguists, and language communities.
IMPORTANT DATES
20 March 2026 Deadline for submission of papers or extended abstracts
1 May 2026 Notification of Acceptance
4 July 2026 Workshop
PRESENTATIONS
Accepted papers will be presented in an oral session and a poster
session. The decision on whether a paper's presentation will be oral
and/or poster will be made by the Organizing Committee on the advice of
the Program Committee, taking into account the subject matter and how
the content might best be conveyed. Oral and poster presentations will
not be distinguished in the Proceedings.
SUBMISSIONS
We offer two submission lengths: short (up to 4 pages) or long (up to 8
pages) papers. The length of a submission does not influence the
likelihood of acceptance. Both paper types must include a section on
ethical considerations and a section on limitations; these sections are
not counted toward the page limit.
All submissions must be anonymous and will be peer-reviewed by the
scientific Program Committee. Papers must follow the style and
formatting guidelines provided by the ACL style files (LaTeX templates:
https://github.com/acl-org/acl-style-files).
Submissions that exceed the length requirements, or are missing a
limitations section, will be desk rejected.
Papers can be submitted to one of the workshop’s tracks: (a) language
community perspective and (b) academic perspective.
Submissions must be uploaded to SoftConf:
https://softconf.com/acl2026/ComputEL2026 by March 20, 2026 11:59PM
(UTC-12, “anywhere on earth”).
A. Short Papers:
Short paper submissions must describe original and unpublished work.
They are max. 4 pages excluding references. They must include sections
on ethical considerations and limitations; these sections are not
counted toward the page limit. Please note that a short paper is not
a shortened long paper. Instead, short papers should have a small,
focused contribution or describe work in progress (“working paper”).
Short papers might not necessarily be intended for publication. Some
common kinds of short papers are negative results, opinion pieces,
interesting application nuggets, or descriptions of ongoing
collaborative teamwork.
B. Long Paper:
Long papers must describe substantial, original, completed and
unpublished work. Wherever appropriate, concrete evaluation and analysis
should be included. Long papers are max. 8 pages excluding references
and appendices. They must include sections on ethical considerations
and limitations; these sections are not counted toward the page limit.
PROCEEDINGS
The Organizing Committee will select papers that have been accepted for
presentation for online publication via the open-access ACL Anthology.
Not all papers accepted for presentation are guaranteed inclusion in the
Anthology. Final versions of long and short papers that are accepted for
publication will be allotted one additional page (altogether 5 and 9
pages) excluding references. Papers accepted for inclusion in the
Anthology should be revised and improved versions of the work that was
submitted for, and which underwent, review. Any revisions should concern
responses to reviewer comments or the addition of relevant details and
clarifications, but not entirely new, unreviewed content.
FUNDING SUPPORT
Limited funding will be available for some accepted authors. A link to
apply for funding will be sent to submitters after the submission
deadline. Decisions on funding will be sent with notification of
acceptance. Priority will be given to individuals without institutional
support, for instance members of endangered language communities, other
unsponsored or under-sponsored presenters (e.g. student/faculty of
Linguistics Departments), and student presenters.
ADDITIONAL AND CONTACT INFORMATION
Please see the ComputEL-9 website for further information:
https://computel-workshop.org/computel-9/
Organizing Committee Email: computel.workshop(a)gmail.com
--
======================================================================
Antti Arppe - Ph.D (General Linguistics), M.Sc. (Engineering)
Professor of Quantitative Linguistics
Director, Alberta Language Technology Lab (ALTLab)
Project Director, 21st Century Tools for Indigenous Languages (21C)
Department of Linguistics, University of Alberta
Algonquian Studies Association - Secretary-Treasurer
E-mail: arppe(a)ualberta.ca - antti.arppe(a)iki.fi
WWW: www.ualberta.ca/~arppe - altlab.ualberta.ca - 21c.tools
Mānahtu ina rēdûti ihza ummânūti ihannaq - dulum ugulak úmun ingul
----------------------------------------------------------------------
I am looking for a postdoctoral researcher to join my group.
Some keywords: language learning in interaction; learning to interact; simulating situated language use; building agents; evaluating LLMs / LLM-agents in interaction; pragmatics of human/AI interaction.
Application deadline: April 7th 2026, for start in September 2026. For more information about the position and on how to apply, see: https://clp.ling.uni-potsdam.de/positions/ .
---
David Schlangen
Chair "Foundations of Computational Linguistics"
Department of Linguistics, University of Potsdam
Karl-Liebknecht-Strasse 24-25
14476 Potsdam, Germany
Campus Golm, Building 14, Room 2.18
Tel. +49 331 977 2692
Tel. Secretary +49 331 977 2016
http://clp.ling.uni-potsdam.de
In this newsletter:
LDC data and commercial technology development
New publications
Ancient Chinese WordNet (https://catalog.ldc.upenn.edu/LDC2026L03)
CALLHOME Spanish Second Edition (https://catalog.ldc.upenn.edu/LDC2026S04)
CALLHOME Spanish Lexicon Second Edition (https://catalog.ldc.upenn.edu/LDC2026L02)
________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.
________________________________
New publications:
Ancient Chinese WordNet<https://catalog.ldc.upenn.edu/LDC2026L03> was developed by Nanjing Normal University<https://www.njnu.edu.cn/> and contains lexical and semantic information for Ancient Chinese vocabulary from the Pre-Qin period (before 221 BCE). The WordNet comprises 38,781 word forms and 55,100 senses, each manually linked to a corresponding synset in Princeton WordNet 1.6<https://wordnet.princeton.edu/> and covering 22 noun categories, 15 verb categories, and additional adjective and adverb categories. The Ancient Chinese WordNet project began in 2012 with the goal of creating a structured lexical database to support linguistic research and natural language processing applications involving historical Chinese language materials.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
CALLHOME Spanish Second Edition<https://catalog.ldc.upenn.edu/LDC2026S04> was developed by LDC and contains 38 hours of speech from 120 unscripted telephone conversations between native Spanish speakers. This publication is a re-release of the original CALLHOME Spanish collection, combining CALLHOME Spanish Speech (LDC96S35)<https://catalog.ldc.upenn.edu/LDC96S35> and CALLHOME Spanish Transcripts (LDC96T17)<https://catalog.ldc.upenn.edu/LDC96T17>, with additional transcription and updated directory structure, file formats, and documentation.
This corpus contains the 120 calls from CALLHOME Spanish Speech which represented training and development data and a subset of evaluation data. Participants spoke on topics of their choice in a single telephone call lasting up to 30 minutes. Calls were manually audited for language, recording quality, channel characteristics, dialect, and region. For this second edition, all audio was converted from SPHERE files to FLAC format, and the original training/development/test partitioning was removed.
This release also features revised transcripts conforming to updated LDC transcription guidelines that addressed normalization of annotation formats, standardization of speaker-produced and background noises, application of foreign-language marking, whitespace cleanup, and corrections and consistency fixes.
The CALLHOME series consists of telephone conversations and transcripts developed by LDC and Rutgers, The State University of New Jersey, in support of research in speaker identification, language identification, and related technologies. Languages in the series include American English, Egyptian Arabic, German, Japanese, Mandarin Chinese, and Spanish.
2026 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
CALLHOME Spanish Lexicon Second Edition<https://catalog.ldc.upenn.edu/LDC2026L02> was developed by LDC and contains 45,547 Spanish words with morphological, phonological, stress, and frequency information. This second edition updates file formats, directory structure, and documentation. The first edition is available as CALLHOME Spanish Lexicon (LDC96L16)<https://catalog.ldc.upenn.edu/LDC96L16>. The words in the lexicon were derived from 80 transcripts representing unscripted telephone conversations between native Spanish speakers contained in CALLHOME Spanish Second Edition LDC2026S04 and from various Spanish news texts.
The lexicon contains nine tab-separated information fields: (1) headword: orthographic form; (2) morph: morphological analysis of the headword; (3) pron: pronunciation of the headword; (4) stress: primary stress information of the word; (5) callh freq: frequency of the headword in CALLHOME transcripts; (6) madrid freq: frequency of the headword in Madrid Radio transcripts; (7) ap freq: frequency of the headword in Associated Press newswire; (8) reut freq: frequency of the headword in Reuters newswire; and (9) norte freq: frequency of the headword in El Norte newswire.
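The nine-field layout described above can be read with the standard library alone. The sketch below is a starting point, not a definitive loader: the underscore field names are our own shorthand for the catalog's field labels, and the actual release may contain header or comment lines that need skipping:

```python
# Sketch: reading the nine tab-separated lexicon fields described above.
# Field names mirror the catalog description (spaces replaced by
# underscores); check the release documentation for the exact file layout.
import csv

FIELDS = ["headword", "morph", "pron", "stress",
          "callh_freq", "madrid_freq", "ap_freq", "reut_freq", "norte_freq"]

def read_lexicon(path):
    """Yield one dict per lexicon entry, keyed by the field names above."""
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            # Skip malformed or comment lines that don't have all nine fields
            if len(row) == len(FIELDS):
                yield dict(zip(FIELDS, row))
```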
This release also includes a pronunciation dictionary derived from the lexicon in CMUdict<https://stdlib.io/docs/api/latest/@stdlib/datasets/cmudict> format and the grapheme-to-phoneme (G2P) tools used to automatically generate pronunciations for the original lexicon.
2026 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Call for Papers
61st Linguistics Colloquium
LingColl 2026
Università di Pavia, Italy, September 9 to 12, 2026
https://lingcoll26.unipv.it
Scope: all fields of Linguistics
Conference languages: English and German
Deadline for abstract submission: May 3, 2026
Special theme:
Rethinking Language Comparison: Contrastive Linguistics between Corpora
and AI
The 61st Linguistics Colloquium (www.lingcoll.de) will take place at
the University of Pavia, Italy, from September 9 to 12, 2026.
Founded in Hamburg in 1966, the Linguistics Colloquium has since been
hosted in almost 20 countries. It provides a platform for the study of
language and languages in all areas of linguistics and warmly welcomes
researchers from diverse theoretical backgrounds. The colloquium is
distinguished by its cooperative and open culture of discussion:
innovative ideas meet critical reflection, and the exchange of research
results is actively promoted. Its aim is to create an inspiring space
where new approaches, methods, and perspectives can be jointly discussed
and developed.
In addition, contrastive linguistics will be a focal point at this
year’s colloquium. Since its beginnings, contrastive linguistics has
undergone significant development, expanding both its methodological and
conceptual scope. Today, language comparison is no longer limited to
language pairs but can involve multiple languages. It integrates
geographical and sociolinguistic dimensions, extends its focus to
semantic, pragmatic, textual, and discourse-linguistic levels, and also
takes into account historical stages and diachronic comparisons within a
single language.
Moreover, contrastive linguistics has increasingly established itself as
a theoretically reflective discipline: analysing a language in the light
of another allows for the identification of linguistic phenomena that
might otherwise remain unnoticed or inadequately explained. Recent
advances have been particularly driven by the use of large corpora and
digital methods. AI-supported analytical methods are expected to provide
further developments in the near future.
The planned conference will focus on current theoretical,
methodological, and applied approaches in contrastive linguistics, with
a particular emphasis on German in comparison with other languages. Its
aim is to bring together research that empirically investigates
systematic differences and similarities across languages and highlights
their relevance for applied contexts.
Thematic Focus (including, but not limited to):
Contrastive Analyses in the Areas of:
- Phonetics and phonology
- Morphology and syntax
- Semantics and lexicon
- Phraseology and pragmatics
- Text and discourse
Corpus-Based, Corpus-Driven, and AI-Supported Approaches:
- Contrastive corpus linguistics
- Comparative corpus annotation
- Corpus-based analyses of phraseological patterns, collocations, and
constructions
- Quantitative and qualitative methods
- Use of AI, NLP, and LLMs in contrastive research
Methodological and Theoretical Issues:
- Comparability of data and corpora
- Modelling linguistic differences at the word, phrase, and discourse levels
- Interfaces between linguistics, corpus linguistics, computational
linguistics, and AI
Applied Perspectives, Including:
- German as a foreign and second language (DaF/DaZ)
- Specialized and professional language
- Phraseodidactics and discourse-oriented language teaching
- Lexicography, phraseography, and terminology work
- Translation studies, interpreting, and contrastive discourse analysis
- Language teaching and language comparison in the classroom
We welcome contributions that are theoretically informed as well as
empirically oriented, including work that bridges basic research and
application. Submissions presenting innovative methods or new resources
are particularly encouraged.
In addition, in keeping with the tradition of the Linguistics
Colloquium, presentations from all other areas of linguistics may be
proposed.
Submission
Abstracts (approx. 300 words) can be submitted until May 3, 2026.
lingcoll2026(a)gmail.com
Notification of acceptance will be sent by May 15, 2026.
Conference Languages
The conference languages are German and English.
Registration
Registration deadline: 30 June 2026 lingcoll2026(a)gmail.com
Registration fee
Participants with a regular income: €200.00
Participants without a regular income (PhD candidates, scholarship
holders): €100.00
Please consider contributing and/or forwarding to appropriate colleagues and groups.
****We apologize for the multiple copies of this e-mail****
----------------------------------------------------------------------------------------------------
Call for Participation
----------------------------------------------------------------------------------------------------
Second Call for Participation:
EXIST 2026: Multimodal sexism identification with sensor data
Website: http://nlp.uned.es/exist2026/
EXIST is a series of scientific events and shared tasks on sexism identification in social networks. EXIST aims to foster the automatic detection of sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours (EXIST 2021, EXIST 2022, EXIST 2023, EXIST 2024, EXIST 2025). The sixth edition of the EXIST shared task will be held as a Lab in CLEF 2026, on September 21-24, 2026, at Friedrich-Schiller-Universität Jena, Germany.
In EXIST 2026, we take a significant step forward by integrating the principles of Human-Centered AI (HCAI) into the development of automatic tools for detecting sexism online. Recognizing that no single interpretation can fully capture the diversity of human perception, we go beyond traditional annotation paradigms by combining Learning With Disagreement (LeWiDi) with sensor-based data (EEG, heart rate, and eye-tracking signals) collected from subjects exposed to potentially sexist content, with the aim of capturing unconscious responses to sexism. This dual approach represents a breakthrough in dataset creation for sensitive and value-laden tasks: for the first time, datasets will include not only divergent judgments from annotators, but also the embodied traces of how such content affects those exposed to it. This richer, multidimensional annotation process will enable the development of more inclusive, equitable, and socially aware AI systems for detecting sexism in complex multimedia formats like memes and short videos, where ambiguity and affect play a critical role.
Similar to the approaches in the 2023, 2024 and 2025 editions, this edition will also embrace the Learning With Disagreement (LeWiDi) paradigm for both the development of the dataset and the evaluation of the systems. The LeWiDi paradigm doesn’t rely on a single “correct” label for each example. Instead, the model is trained to handle and learn from conflicting or diverse annotations. This enables the system to consider various annotators’ perspectives, biases, or interpretations, resulting in a fairer learning process.
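To make the LeWiDi idea concrete, here is a minimal sketch of how divergent annotations can be kept as a soft label distribution and scored with cross-entropy rather than collapsed to one gold label. The function names and the five-annotator example are illustrative assumptions, not part of the EXIST 2026 release:

```python
# Sketch of the LeWiDi idea: keep the full distribution of annotator
# judgements instead of a single majority label, and compare a system's
# predicted distribution against it.
import math
from collections import Counter

def soft_label(annotations, classes=("YES", "NO")):
    """Turn a list of per-annotator labels into a probability distribution."""
    counts = Counter(annotations)
    total = sum(counts[c] for c in classes)
    return {c: counts[c] / total for c in classes}

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a predicted distribution against the soft target."""
    return -sum(p * math.log(predicted[c] + eps) for c, p in target.items())

# Three of five (hypothetical) annotators judged the item sexist:
target = soft_label(["YES", "YES", "YES", "NO", "NO"])  # {'YES': 0.6, 'NO': 0.4}
loss = cross_entropy(target, {"YES": 0.6, "NO": 0.4})   # minimal when prediction matches
```

A system that outputs a confident hard label (e.g. 0.9/0.1) on such a genuinely contested item incurs a higher loss than one that reproduces the disagreement, which is exactly the behaviour LeWiDi-style evaluation rewards.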
Building upon the EXIST 2025 dataset, this edition focuses exclusively on multimedia formats, comprising six experimental subtasks applied to images (memes) and videos (TikToks). Participants are challenged to address three main objectives: sexism identification (x.1), source intention detection (x.2), and sexism categorization (x.3) (numbering of subtasks is consistent with EXIST 2025). Participants will be asked to classify memes and videos (in English and Spanish) according to the following tasks:
TASK 2: Sexism detection in Memes:
TASK 2.1 - Sexism Identification in Memes: this is a binary classification subtask consisting in determining whether a meme describes a sexist situation or criticizes a sexist behaviour, and classifying it into two categories: YES and NO.
Task 2.2: Source Intention in Memes: this subtask aims to categorize the meme according to the intention of its author. Due to the characteristics of memes, systems should only classify them into the DIRECT or JUDGEMENTAL categories.
Task 2.3: Sexism Categorization in Memes: once a message has been classified as sexist, the third subtask aims to categorize the message into different types of sexism (according to a categorization proposed by experts that takes into account the different facets of women that are undermined). In particular, each sexist meme must be categorized in one or more of the following categories: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE.
TASK 3: Sexism detection in Videos:
SUBTASK 3.1 - Sexism Identification in Videos: this is a binary classification task, as in Subtask 2.1.
SUBTASK 3.2: Source Intention in Videos: this subtask replicates Subtask 2.2 for memes, but takes videos as its source.
SUBTASK 3.3: This subtask aims to classify sexist videos according to the categorization provided for Subtask 2.3: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE.
Although we recommend participating in all subtasks and in both languages, participants may take part in just one subtask (e.g. Subtask 2.1) and in one language (e.g. English).
During the training phase, the task organizers will provide the participants with the manually-annotated EXIST 2026 dataset. For the evaluation of the systems, the unlabeled test data will be released.
We encourage participation from both academic institutions and industrial organizations. We invite participants to register for the lab at CLEF 2026 Labs Registration site (https://clef-labs-registration.dipintra.it/). You will receive information about how to join the Discord Group for the EXIST 2026 shared task.
Important Dates:
* 17 November 2025: Registration opens. ¡¡¡¡DONE!!!
* 26 February 2026: Training set available. ¡¡¡¡DONE!!!
* 9 April 2026: Test set available.
* 23 April 2026: Registration closes.
* 7 May 2026: Runs submission due to organizers.
* 28 May 2026: Results notification to participants.
* 4 June 2026: Submission of Working Notes by participants.
* 30 June 2026: Notification of acceptance (peer reviews).
* 6 July 2026: Camera-ready participant papers due to organizers.
* 21-24 September 2026: EXIST 2026 at CLEF Conference.
** Note: All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth") **
Organizers:
Laura Plaza, Universidad Nacional de Educación a Distancia (UNED)
Jorge Carrillo-de-Albornoz, Universidad Nacional de Educación a Distancia (UNED)
Iván Arcos, Universitat Politècnica de València (UPV)
Maria Aloy Mayo, Universitat Politècnica de València (UPV)
Paolo Rosso, Universitat Politècnica de València (UPV)
Damiano Spina, Royal Melbourne Institute of Technology (RMIT)
Contact:
Contact the organizers by writing to: jcalbornoz(a)lsi.uned.es
Website: http://nlp.uned.es/exist2026/
The 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026) (formerly CASE) @ ACL 2026
This year, the EEUCA workshop (formerly CASE) continues the tradition of the eight previous editions of our workshop on challenges and applications of event extraction.
Website: https://bit.ly/EEUCA2026
Submission page: https://openreview.net/group?id=aclweb.org/ACL/2026/Workshop/EEUCA
Paper submission deadline: March 29, 2026 (Updated!)
Pre-reviewed ARR commitment deadline: April 15, 2026
Notification of acceptance: April 28, 2026
Camera-ready paper due: May 12, 2026
Pre-recorded video due (hard deadline): June 4, 2026
Shared tasks and shared task papers:
Start of the Competition: Dec 10, 2025
Eval Phase Start: Dec 10, 2025
Test Phase Start: Jan 15, 2026
Test Phase End: March 15, 2026
Paper Submission Deadline: March 28, 2026
Notification of acceptance: April 28, 2026
Camera-ready paper due: May 12, 2026
We invite work on all aspects of automated coding and analysis of events from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:
1) Extracting events and their arguments in and beyond a sentence or document, event coreference resolution.
2) New datasets, training data collection and annotation for event information.
3) Event-event relations, e.g., subevents, main events, spatiotemporal relations, causal relations.
4) Event dataset evaluation in light of reliability and validity metrics.
5) Defining, populating, and facilitating event schemas and ontologies.
6) Automated tools and pipelines for event collection related tasks.
7) Lexical, syntactic, semantic, discursive, and pragmatic aspects of event manifestation.
8) Methodologies for development, evaluation, and analysis of event datasets.
9) Applications of event databases, e.g. early warning, conflict prediction, and policymaking.
10) Estimating what is missing in event datasets using internal and external information.
11) Detection of new event types, e.g. creative protests, cyber activism, COVID-19 related, terrorism, food safety, food security, climate change, extreme weather events, disasters.
12) Release of new event datasets.
13) Bias and fairness of the sources and event datasets.
14) Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets.
15) Copyright issues on event dataset creation, dissemination, and sharing.
16) Cross-lingual, multilingual, and multimodal aspects in event analysis.
17) Exploiting LLMs in Event Extraction.
18) Generative AI and event reports: detecting AI-generated news, exploiting generative AI for creating event corpora, etc.
Shared Task 1: Multimodal Identification of Vaccine Critical Content on Social Media
This shared task focuses on detecting vaccine-critical stance in multimodal social media memes. Using the VaxMeme dataset of over 10,000 annotated memes, participants will develop models that jointly leverage visual and textual signals to classify a meme’s stance as pro-vaccine, vaccine-critical, or neutral. The task encourages research on cross-modal understanding, sarcasm, implicit messaging, and misinformation dynamics in public health discourse. External data and transfer learning are permitted, and submissions will be evaluated using macro-F1. All system description papers will be published in the ACL Anthology.
Learn More: https://github.com/therealthapa/eeuca-vaccine
Shared Task 2: Understanding Toxic Behavioral Intent in Gaming Chat Logs for Healthy Online Interaction
This shared task tackles intent-level toxicity detection in online gaming communities using the GameTox dataset of 53,000 annotated chat utterances from World of Tanks. Participants will develop models that classify a player’s message into six fine-grained intent categories, including hate, threats, insults, extremism, and non-toxic communication. The challenge highlights contextual nuance, gaming slang, implicit aggression, and varied severity levels of toxicity. External datasets are allowed, and submissions are evaluated using macro-F1. All system description papers will be published in the ACL Anthology.
Learn More: https://github.com/therealthapa/eeuca-toxicity
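Both shared tasks above score submissions with macro-F1, i.e. F1 is computed per class and averaged unweighted, so rare intent categories count as much as frequent ones. A dependency-free sketch of the metric (the gold/pred label lists are illustrative, not from the GameTox or VaxMeme data):

```python
# Macro-F1: per-class F1 averaged with equal weight for every class,
# regardless of how often the class occurs in the gold labels.
def macro_f1(gold, pred):
    classes = sorted(set(gold) | set(pred))
    f1s = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Because the average is unweighted, a system that ignores a minority class (say, "extremism") pays the full price of that class's zero F1, which is why macro-F1 is the usual choice for imbalanced toxicity and stance labels.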
Keep an eye on the workshop page that is being updated: https://bit.ly/EEUCA2026 and contact us for any inquiries (submission, collaboration, contribution, or just saying Hi! ).
EEUCA Organization Committee