AI for Italian Misogyny Detection

Fine-tuned BERT model for accurate detection of misogynistic content in Italian text, trained and evaluated on the AMI dataset.

Tech Stack:

NLP, Hugging Face Transformers, BERT, Italian Language, Python

Project Overview

This project fine-tunes the dbmdz/bert-base-italian-xxl-uncased model to detect misogynistic content in Italian text. The resulting model is a binary text classifier: it labels a given text as misogynistic (label 1) or not misogynistic (label 0).
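
A minimal sketch of how such a classifier might be queried is shown here. The fine-tuned checkpoint's location is not stated in this write-up, so `model_path` is a hypothetical placeholder, and the `max_length=128` truncation limit is likewise an assumption. The `decode` helper only turns a pair of logits into the label scheme described above.

```python
import math

# Label scheme taken from the project overview: 1 = misogynistic, 0 = not.
LABELS = {0: "not misogynistic", 1: "misogynistic"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label_id, label_name, probability) for one example's logits."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, LABELS[label_id], probs[label_id]


def classify(texts, model_path):
    """Classify a list of Italian texts.

    `model_path` is a HYPOTHETICAL local directory or Hub ID of the
    fine-tuned checkpoint; it is not given in the source document.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    model.eval()

    # max_length=128 is an assumed limit, not a documented setting.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [decode(row.tolist()) for row in logits]
```

Calling `classify(["..."], "path/to/fine-tuned-model")` would return one `(label_id, label_name, probability)` tuple per input text.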

Model Details

  • Fine-tuned from the pre-trained Italian BERT model: dbmdz/bert-base-italian-xxl-uncased
  • Specifically designed for misogyny detection in Italian text.
  • Trained and evaluated on the AMI (Automatic Misogyny Identification) dataset.
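
The setup described in this list could be reproduced roughly as follows with the Hugging Face `Trainer` API. This is a sketch rather than the project's actual training script: the learning rate, batch size, and epoch count are the values reported under Training and Evaluation, while the weight decay value, the output directory name, and the single-device reading of "batch size of 32" are assumptions.

```python
BASE_MODEL = "dbmdz/bert-base-italian-xxl-uncased"

# Binary label scheme from the project overview.
ID2LABEL = {0: "not misogynistic", 1: "misogynistic"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

HYPERPARAMS = {
    "learning_rate": 2e-5,               # reported in the source
    "per_device_train_batch_size": 32,   # reported; assumes a single device
    "num_train_epochs": 5,               # reported in the source
    "weight_decay": 0.01,                # ASSUMPTION: value not stated
}


def build_model_and_args(output_dir="ami-bert-it"):
    """Load the base checkpoint with a 2-label head and build training args.

    `output_dir` is a hypothetical name chosen for illustration.
    """
    # Imported lazily so the constants above can be inspected without
    # transformers installed.
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=2, id2label=ID2LABEL, label2id=LABEL2ID
    )
    # Trainer uses an AdamW optimizer by default, so weight_decay applies to it.
    args = TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
    return tokenizer, model, args
```

The returned objects can then be passed to `Trainer` together with tokenized AMI training and evaluation splits.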

Training and Evaluation

The model was trained on the AMI dataset, which contains labeled Italian texts. Key training hyperparameters were a learning rate of 2e-5, a batch size of 32, and 5 epochs, using the AdamW optimizer with weight decay. On a balanced test set drawn from the AMI dataset, the model reached an accuracy of 0.9412, an F1-score of 0.9420, a precision of 0.9291, and a recall of 0.9553.
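
As a quick sanity check, the reported metrics can be related to one another with a few lines of arithmetic. The sketch assumes the test set is exactly balanced (equal numbers of misogynistic and non-misogynistic examples), which is how the evaluation set is described. Under that assumption both F1 and accuracy follow from precision and recall alone.

```python
# Reported values from the evaluation.
precision = 0.9291
recall = 0.9553

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# ASSUMPTION: exactly balanced test set, so work with one unit of
# positives and one unit of negatives.
tp = recall                              # true positives per unit of positives
fp = tp * (1 - precision) / precision    # from precision = tp / (tp + fp)
tn = 1 - fp                              # true negatives per unit of negatives
accuracy = (tp + tn) / 2

print(f"F1       = {f1:.4f}")        # F1       = 0.9420
print(f"accuracy = {accuracy:.4f}")  # accuracy = 0.9412
```

Both derived values match the reported F1-score and accuracy to four decimal places, so the four figures are mutually consistent.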

Key Technologies

  • Natural Language Processing (NLP)
  • Hugging Face Transformers library for model fine-tuning and deployment
  • BERT (Bidirectional Encoder Representations from Transformers) architecture
  • Italian language-specific pre-trained model (dbmdz/bert-base-italian-xxl-uncased)

Potential Uses

  • Moderation of online content to identify and filter misogynistic text.
  • Social media analysis to study the prevalence and patterns of misogyny in Italian online discourse.
  • Sociolinguistic research on the linguistic features of misogynistic language in Italian.

Limitations and Considerations

The model's performance depends on the data it was trained on (the AMI dataset), and it may reproduce biases present in that data. It may struggle with subtle or implicit forms of misogyny. Human oversight is recommended for critical applications. The model is designed for Italian text and may not perform reliably on other languages.
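
One way to combine the classifier with human oversight is to act automatically only on confident predictions and send the rest to a reviewer. The sketch below illustrates that idea. The function name `route` and the thresholds `low` and `high` are hypothetical and would need to be tuned on held-out data for a real deployment.

```python
def route(prob_misogynistic, low=0.30, high=0.90):
    """Decide what to do with a text given the model's probability for label 1.

    The thresholds are illustrative ASSUMPTIONS, not values from the project.
    """
    if not 0.0 <= prob_misogynistic <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if prob_misogynistic >= high:
        return "flag"          # confident positive: flag for moderation
    if prob_misogynistic <= low:
        return "allow"         # confident negative: let through
    return "human_review"      # uncertain: queue for a human moderator


for p in (0.05, 0.55, 0.97):
    print(p, route(p))
```

Widening the gap between `low` and `high` sends more texts to human review, which trades reviewer workload for fewer unreviewed mistakes on subtle or implicit cases.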