15th meeting of Forum for Information Retrieval
Evaluation HASOC-2023
We are excited to announce the 5th edition of
HASOC, consisting of four interesting shared tasks. We invite
you to participate.
Task 1 focuses on identifying hate speech,
offensive language, and profanity in different languages
using natural language processing techniques.
- Task 1A deals with
identifying hate and offensive content in Sinhala, a
low-resource Indo-Aryan language spoken in Sri Lanka.
The task involves classifying tweets into Hate and
Offensive (HOF) or Non-Hate and Offensive (NOT). The
dataset for this task is based on the Sinhala Offensive
Language Detection dataset.
- Task 1B focuses on
identifying hate and offensive content in Gujarati,
another low-resource Indo-Aryan language spoken by
approximately 50 million people in India. Similarly,
participants need to classify tweets into HOF or NOT
categories. The training set for this task consists of
around 200 tweets.
For more details, please visit
the Task 1 webpage.
Task 2, Identification of Conversational
Hate-Speech in Code-Mixed Languages (ICHCL), addresses
the challenge of identifying hate speech and offensive
content in code-mixed conversations on social media.
Code-mixed text includes multiple languages within a
single conversation. The task is divided into two
subtasks.
- In Task 2a, participants need
to perform binary classification on conversational
tweets with tree-structured data. They must determine
whether a tweet, comment, or reply contains hate speech,
offensive language, or profanity (HOF) or is non-hate
and offensive (NOT). The classification should consider
both the individual content and support for hate
expressed in the parent tweet.
- Task 2b involves the
classification of conversational tweets with
tree-structured data into specific forms of hate.
Participants must identify if the tweet, comment, or
reply contains standalone hate (SHOF), contextual hate
(CHOF) that supports hate expressed in the parent, or if
it is non-hate (NONE).
For more details, please visit
the Task 2 webpage.
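To make the tree-structured setting more concrete, here is a minimal sketch of how a conversation could be held in memory so that a reply is classified together with its parent context. The `Node` class, its field names, the `with_context` helper, and the `[SEP]` separator are all assumptions made for illustration; they do not describe the official data format or any baseline system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False avoids recursive comparisons between parents and children.
@dataclass(eq=False)
class Node:
    """One tweet, comment, or reply in a conversation tree (hypothetical layout)."""
    text: str
    label: Optional[str] = None          # e.g. "HOF"/"NOT" or "SHOF"/"CHOF"/"NONE"
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_reply(self, text: str, label: Optional[str] = None) -> "Node":
        """Attach a reply to this node and return the new child."""
        child = Node(text=text, label=label, parent=self)
        self.children.append(child)
        return child


def with_context(node: Node, sep: str = " [SEP] ") -> str:
    """Join the texts on the path from the root tweet down to `node`."""
    chain = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current.text)
        current = current.parent
    return sep.join(reversed(chain))


# Placeholder texts stand in for real tweets.
root = Node("tweet")
comment = root.add_reply("comment")
reply = comment.add_reply("reply")
print(with_context(reply))
```

Feeding a classifier the output of `with_context` rather than the reply text alone is one simple way to let it see whether a reply supports hate expressed higher up in the thread.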
Task 3 aims to detect hateful spans within
a sentence already considered hateful. A hate span is a
set of continuous tokens that, in tandem, communicate
the explicit hatefulness in a sentence.
- For instance, in the
statement, "Women ... Can't live with them... Can't
shoot them," the portion "Can't shoot them" will be
considered a hateful span. This shared task aims to
extract all such spans from a hateful text.
- The input texts are all in
English. The detection of hateful spans is framed as a
sequence labeling problem. For every token of the
sequences, we have manually annotated the start and end
of a hateful span. This is done with BIO tagging, where
'B' represents the beginning of a hate span, 'I' marks
the continuation of a hate span, and 'O' represents the
non-hate tag. The task is then to learn the correct
sequence of BIO tags for a given sentence. For example,
for the above sentence, the tag sequence for the
preprocessed sentence will be of the form
"women can't live with them can't shoot them" →
"O O O O O B I I". An "I" tag cannot exist on its own
and will always be preceded by either an "I" or a "B".
A "B" tag, however, can be immediately followed by an
"O" when the span is just a single word.
For more details, please visit
the Task 3 webpage.
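The BIO scheme described for Task 3 can be illustrated with a short sketch that checks a tag sequence against the stated rule (an "I" must follow a "B" or an "I") and recovers the tagged spans. The function names `is_valid_bio` and `extract_spans` are hypothetical helpers written for this example, and whitespace tokenization is an assumption; the official preprocessing and scoring may differ.

```python
from typing import List, Tuple


def is_valid_bio(tags: List[str]) -> bool:
    """Return True if all tags are B/I/O and no 'I' directly follows an 'O' or starts the sequence."""
    previous = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and previous == "O":
            return False
        previous = tag
    return True


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, text) for every tagged span."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not is_valid_bio(tags):
        raise ValueError("invalid BIO sequence")
    spans = []
    start = None  # index where the currently open span began
    for i, tag in enumerate(tags):
        if tag in {"B", "O"} and start is not None:
            # A new 'B' or an 'O' closes the span that was open.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag == "B":
            start = i
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
print(extract_spans(tokens, tags))
```

For the example sentence this recovers the single span covering the last three tokens. A lone "B" followed by "O" yields a one-word span, which matches the single-word case described above.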
Task 4 aims to detect hate speech in
Bengali, Bodo, and Assamese languages. It is a binary
classification task. Each dataset (one per language)
consists of a list of sentences with their
corresponding class: hate or offensive (HOF) or not hate
(NOT). Data is primarily collected from Twitter,
Facebook, and YouTube comments.
The Macro F1 score will be the yardstick of the task.
Team rank will be determined based on the Macro F1 score of
the first part.
For more details, please visit
the Task 4 webpage.
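For readers unfamiliar with the Macro F1 score, the sketch below computes it from scratch: F1 is calculated separately for each class and the per-class values are averaged with equal weight, so the minority class counts as much as the majority class. The function name `macro_f1` and the toy labels are assumptions for illustration; this is not the organizers' evaluation script.

```python
from typing import List


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy gold labels and predictions (made up for this example).
gold = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "HOF"]
print(macro_f1(gold, pred))
```

In this toy case the HOF class has F1 = 0.5 and the NOT class has F1 = 0.75, so the Macro F1 is 0.625.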
Registration for all four tasks is open on our
registration page.
We believe that your expertise and contribution will be
invaluable in advancing the state of the art in hate speech
classification. We encourage you to participate in this
exciting shared task and contribute to the research
community.
Regards,
HASOC organizing team