French Generative Models


ALMAnaCH, Inria


PAGnol is a collection of large French language models, geared towards free-form text generation. With 1.5 billion parameters, PAGnol-XL is the largest model available for French. PAGnol is based on the GPT architecture, and uses scaling laws predictions for efficient training. PAGnol is the first language model trained by LightOn, in cooperation with the ALMAnaCH team of Inria – increasingly large and powerful models will follow.

This is a pre-release: the paper and additional models will be available soon 🙂

Write with PAGnol in our interactive demo!

PAGnol was built by Julien Launay, E.L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Iacopo Poli, and Djamé Seddah. It is named after Marcel Pagnol (with PAG standing for pré-apprentissage génératif), and was trained on the IDRIS Jean Zay supercomputer thanks to a GENCI allocation.

Model card

We release PAGnol models trained on CCNet. We will also provide small and medium models trained on OSCAR for research purposes.

PAGnol is cutting-edge AI technology: you are responsible for using it thoughtfully and responsibly, in a way that benefits society.

Intended uses

PAGnol is geared towards free-form text generation. It is best used for creative writing, without strong constraints on its outputs. It may be fine-tuned to specific forms and styles of text generation. However, in line with previous work1, we found it is not competitive when fine-tuned on specific tasks (classification, question answering, etc.).
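For free-form generation, output quality depends heavily on the decoding strategy. As an illustration of the kind of stochastic decoding commonly used for creative writing (this is not PAGnol's actual decoder, and the toy distribution below is made up), here is a minimal nucleus (top-p) sampling sketch:

```python
import random

def nucleus_sample(probs, p=0.9, rng=random):
    """Sample a token id from the smallest set of tokens whose
    cumulative probability reaches p (top-p / nucleus sampling)."""
    # Rank token ids by descending probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for i in ranked:
        nucleus.append(i)
        total += probs[i]
        if total >= p:
            break
    # Renormalise over the nucleus and sample from it.
    weights = [probs[i] / total for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# Toy next-token distribution over a 5-token vocabulary (illustrative only).
probs = [0.5, 0.3, 0.1, 0.05, 0.05]
token = nucleus_sample(probs, p=0.8)  # samples from tokens 0 and 1 only
```

Lower values of p restrict sampling to the most likely tokens (safer, more repetitive text); higher values admit more of the tail (more diverse, riskier text).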

PAGnol trained on OSCAR is downloadable only for research purposes.

We encourage further research on PAGnol zero-shot abilities as well as bias and fairness issues. If you are interested in these topics, you can get in touch with us.

Limitations and bias

PAGnol is not grounded, and cannot distinguish between facts and fiction. To enhance output quality, we trained PAGnol on CCNet, a dataset filtered to match Wikipedia-like writing. However, PAGnol may still generate offensive or suggestive content.
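CCNet's filtering keeps web documents whose language-model perplexity is close to that of Wikipedia-style text. The following is a toy sketch of that idea only (a smoothed unigram model over a tiny made-up reference corpus, not the actual CCNet pipeline, which uses far stronger language models):

```python
import math
from collections import Counter

def unigram_model(reference_text):
    """Build an add-one-smoothed unigram word model from reference text."""
    words = reference_text.lower().split()
    counts = Counter(words)
    total = len(words)
    vocab = len(counts) + 1  # +1 bucket for unseen words
    def log_prob(word):
        return math.log((counts.get(word, 0) + 1) / (total + vocab))
    return log_prob

def perplexity(log_prob, text):
    """Per-word perplexity of text under the unigram model."""
    words = text.lower().split()
    return math.exp(-sum(log_prob(w) for w in words) / max(len(words), 1))

# Encyclopedic-style reference text standing in for Wikipedia (made up).
reference = "marcel pagnol was a french novelist playwright and filmmaker"
log_prob = unigram_model(reference)

clean = "pagnol was a french playwright"
noisy = "zzz buy cheap $$$ click here now"
# Filtering keeps the document whose perplexity is closer to the reference.
keep_clean = perplexity(log_prob, clean) < perplexity(log_prob, noisy)
```

The clean, Wikipedia-like sentence scores a lower perplexity than the spam-like one, so a perplexity threshold keeps the former and drops the latter.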

Use judgement and discretion before deploying PAGnol. In particular, we recommend performing a study of use-case specific biases and limitations.

  1. Raffel, Colin, et al. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” Journal of Machine Learning Research 21 (2020): 1-67. ↩︎

Use PAGnol

Using PAGnol with lairgpt

Head over to our GitHub repository to access our PyTorch inference code. Using PAGnol is as simple as running the following code:

from lairgpt.models import PAGnol

# Load the small (124M-parameter) model and generate a completion.
pagnol = PAGnol.small()
pagnol("Salut PAGnol, comment ça va ?")

> "Très bien! Les jours d’été sont là ! Bientôt les premiers festivals..."

This is all gibberish to me!

No worries, interact directly with PAGnol in our interactive demo!

Downloading PAGnol models

PAGnol is made available under the MIT licence: by downloading the models available below, you agree with the terms of the licence agreement. Under no circumstances will LightOn and/or Inria be held responsible or liable in any way for any claims, damages, losses, expenses, costs or liabilities whatsoever (including, without limitation, any direct or indirect damages for loss of profits, business interruption or loss of information) resulting or arising directly or indirectly from your use of or inability to use PAGnol.

For everyone: PAGnol-CCNet

We make these models available for research first. Use judgement and discretion before deploying PAGnol. In particular, we recommend performing a study of use-case specific limitations.
| PAGnol | Parameters | Download | Training data |
| --- | --- | --- | --- |
| tokenizer | – | tokenizer_ccnet.json.tar.gz | CCNet |
| S | 124M | | CCNet |
| M | 355M | coming soon | CCNet |
| L | 773M | | CCNet |
| XL | 1.5B | coming soon | CCNet |
We plan to make PAGnol available through 🤗 HuggingFace soon!

For researchers: PAGnol-OSCAR

Models trained on OSCAR are for research purposes only: they are significantly more likely to produce explicit and offensive content!
| PAGnol | Parameters | Download | Training data |
| --- | --- | --- | --- |
| S-OSC | 124M | coming soon | OSCAR |
| M-OSC | 355M | coming soon | OSCAR |

API and specific versions of PAGnol

If you are interested in accessing PAGnol through a dedicated API, or in tailoring PAGnol to your use case (different domain, training data, or language), get in touch with us.