The IUROPA CJEU Text Corpus, created by Michal Ovádek, Joshua Fjelstul, Daniel Naurin, and Johan Lindholm, contains the cleaned full text of all judgments, orders, and Advocate General (AG) opinions published by the Court of Justice and the General Court since 1952. The texts are collected from InfoCuria and EUR-Lex and have been processed to remove formatting artifacts and to standardize character encoding.
The corpus is split into Parquet files by court, document type, and language, making it easy to work with just the slice you need. Each file includes document metadata and the full text.
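
As a rough illustration of working with a single slice, the sketch below picks files by court, document type, and language, then reads one of them. The file naming scheme (`<court>_<doc_type>_<language>.parquet`), the example file names, and the helpers `parse_slice`, `select_files`, and `load_slice` are all assumptions made for this example, not something the corpus documents; adapt them to the actual file layout.

```python
from pathlib import Path

# ASSUMPTION: files are named "<court>_<doc_type>_<language>.parquet".
# The real naming scheme may differ; adjust parse_slice accordingly.


def parse_slice(path):
    """Split a hypothetical file name into (court, doc_type, language)."""
    court, doc_type, language = Path(path).stem.split("_")
    return court, doc_type, language


def select_files(paths, court=None, doc_type=None, language=None):
    """Return the paths whose slice matches every filter that is not None."""
    wanted = (court, doc_type, language)
    selected = []
    for path in paths:
        parts = parse_slice(path)
        if all(w is None or w == p for w, p in zip(wanted, parts)):
            selected.append(path)
    return selected


def load_slice(path):
    """Read one Parquet file into a DataFrame (needs pandas and pyarrow)."""
    import pandas as pd  # imported lazily so the helpers above work without it

    return pd.read_parquet(path)


# Hypothetical file listing, for illustration only.
example_files = [
    "cj_judgment_en.parquet",
    "cj_judgment_fr.parquet",
    "gc_order_en.parquet",
    "cj_opinion_en.parquet",
]

# Keep only English-language Court of Justice documents.
english_cj = select_files(example_files, court="cj", language="en")
```

Once the files are available locally, `load_slice(english_cj[0])` would return a DataFrame holding that file's metadata and full text. Filtering on file names first avoids reading slices you do not need.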