
The French Court Decision Structure dataset — FCD12K

Download the latest version of the dataset WITH titles here! Download the latest version of the dataset WITHOUT titles here!

Task details

The challenge we want to address here is automatically identifying which section each part of a text document belongs to. Specifically, given a document that consists of pre-cut paragraphs, we want to classify these paragraphs into 6 predefined classes (see the section below for a description of each class). Typical use cases enabled by this classification are: navigating documents more easily, selecting only specific sections for summarization, or weighting sections differently when indexing these documents in a search engine.

Dataset overview

This dataset is intended for researchers working on natural language processing in French, especially on text classification that relies on sequential information.

The dataset contains French court decisions from the Appeal Court and the High Court. A French court decision is generally structured as follows:

Dataset specifications

This dataset contains 12k decisions from:

The decisions have been anonymized by the original sources.

Methodology

The dataset was built automatically from decisions that contained explicit section titles. We then removed those titles in order to simulate real cases, where sections have to be detected automatically from their contents.

In the example below, where the label associated with each paragraph is visible, we deleted the second line, which explicitly announces the new section.

I-MOTIFS Les arguments du juge ...
B-DISPOSITIF Par ces motifs ,
I-DISPOSITIF Condamne M. X aux dépens d' appel .
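The title-removal step can be illustrated with a minimal sketch. It assumes, based only on the example above, that each line starts with a tag followed by a space, and that a `B-` prefix marks the title line opening a section. The function name `strip_section_titles` is hypothetical and is not part of the dataset tooling.

```python
def strip_section_titles(tagged_lines):
    """Return the lines whose tag does not start with 'B-'.

    Assumption: a 'B-' tag marks the explicit title line of a section,
    as suggested by the example in this README.
    """
    kept = []
    for line in tagged_lines:
        # The tag is everything before the first space.
        tag, _, _text = line.partition(" ")
        if tag.startswith("B-"):
            continue  # drop the explicit section title
        kept.append(line)
    return kept


example = [
    "I-MOTIFS Les arguments du juge ...",
    "B-DISPOSITIF Par ces motifs ,",
    "I-DISPOSITIF Condamne M. X aux dépens d' appel .",
]
without_titles = strip_section_titles(example)
```

With this input, only the first and third lines remain, which mirrors the "WITHOUT titles" variant described above.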

Note that paragraphs related to the pleas in law are sometimes mixed with the facts. We have therefore gathered Facts and Pleas in Law under the same tag.

Format

The dataset uses a JSON format. Each court decision corresponds to a row in the JSON file.

Each row contains:

Example:

I-ENTETE La cour d' appel de Paris
I-COMPOSITION Madame Dupont , Présidente
I-COMPOSITION Monsieur Durand , Greffier
I-PARTIES EDF
I-FAITS ...
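As an illustration of how such labeled paragraphs might be consumed, here is a minimal sketch that splits each line into its tag and text, then groups consecutive paragraphs sharing a class. It works on the tag-prefixed lines as displayed in the example, not on the JSON file itself, since the JSON field names are not shown here. The helper names `parse_tagged_line` and `group_by_section` are hypothetical.

```python
from itertools import groupby


def parse_tagged_line(line):
    """Split 'I-ENTETE some text' into ('I', 'ENTETE', 'some text')."""
    tag, _, text = line.partition(" ")
    prefix, _, label = tag.partition("-")
    return prefix, label, text


def group_by_section(tagged_lines):
    """Group consecutive paragraphs that carry the same class label."""
    parsed = [parse_tagged_line(line) for line in tagged_lines]
    return [
        (label, [text for _, _, text in group])
        for label, group in groupby(parsed, key=lambda item: item[1])
    ]


sample = [
    "I-ENTETE La cour d' appel de Paris",
    "I-COMPOSITION Madame Dupont , Présidente",
    "I-COMPOSITION Monsieur Durand , Greffier",
    "I-PARTIES EDF",
    "I-FAITS ...",
]
sections = group_by_section(sample)
```

Grouping by consecutive runs rather than by label alone keeps the document order intact, which matters for tasks such as navigation or selecting a single section for summarization.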

License

This dataset is released under the LUDO v1.0 license, available at this address.
Personal data contained in this dataset is handled according to Doctrine's privacy policy.