This crate implements a parser for llguidance grammars.
The main entry point is the Constraint struct. You will need a token parser, built with TokenParser::from_llguidance_json. This in turn requires a JSON-encoded grammar; see the TopLevelGrammar struct.
If you're dealing with a completion (non-chat) model, call constraint.process_prompt() first.
Once you have a constraint, do the following in a loop:
- call constraint.compute_mask() to get the sampling mask for the next token
- sample a token using the mask and constraint.temperature
- pass the sampled token to constraint.commit_token() (with ff_tokens, more than one token can be returned)

If either compute_mask() or commit_token() returns a stop result, you need to terminate the sequence.
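The shape of that loop can be sketched as follows. This is a minimal illustration, not the crate's API: the MaskSource trait, the Step and Commit types, the FixedSequence toy constraint, and the first_allowed sampler are all hypothetical stand-ins, and the real Constraint methods have different signatures and result types. Temperature-aware sampling is left to the sampler closure.

```rust
/// Hypothetical stand-in for the result of a mask computation.
/// The crate's real result type differs; this only models "mask or stop".
enum Step {
    Mask(Vec<bool>), // true = token id is allowed
    Stop,
}

/// Hypothetical stand-in for the result of committing a token.
struct Commit {
    stop: bool,
    tokens: Vec<u32>, // tokens to append; may be more than one with fast-forwarding
}

/// Hypothetical trait mirroring the two calls named in the text.
trait MaskSource {
    fn compute_mask(&mut self) -> Step;
    fn commit_token(&mut self, token: u32) -> Commit;
}

/// Drive the loop: compute mask, sample, commit, extend; end on a stop result.
fn generate<C: MaskSource>(
    constraint: &mut C,
    mut sample: impl FnMut(&[bool]) -> u32,
    max_tokens: usize,
) -> Vec<u32> {
    let mut seq = Vec::new();
    while seq.len() < max_tokens {
        let mask = match constraint.compute_mask() {
            Step::Mask(m) => m,
            Step::Stop => break,
        };
        let token = sample(&mask);
        let commit = constraint.commit_token(token);
        seq.extend(commit.tokens);
        if commit.stop {
            break;
        }
    }
    seq
}

/// Toy constraint that only accepts the fixed token sequence `expected`.
struct FixedSequence {
    expected: Vec<u32>,
    pos: usize,
    vocab: usize,
}

impl MaskSource for FixedSequence {
    fn compute_mask(&mut self) -> Step {
        match self.expected.get(self.pos) {
            None => Step::Stop,
            Some(&t) => {
                let mut mask = vec![false; self.vocab];
                mask[t as usize] = true;
                Step::Mask(mask)
            }
        }
    }

    fn commit_token(&mut self, token: u32) -> Commit {
        self.pos += 1;
        Commit {
            stop: self.pos >= self.expected.len(),
            tokens: vec![token],
        }
    }
}

/// Greedy stand-in for a sampler: picks the first allowed token id.
fn first_allowed(mask: &[bool]) -> u32 {
    mask.iter().position(|&ok| ok).expect("mask allows no token") as u32
}
```

Both exit paths are handled: a stop from the mask computation ends the loop before sampling, and a stop from the commit ends it after the committed tokens have been appended.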
If you're accepting arbitrary grammars, you likely should stream the parser results to the user. The easiest way to do this is to set constraint.log_json_progress and then forward the results of constraint.flush_logs() after commit_token() and right before terminating the sequence.
The compute_mask() function can take more than a millisecond for larger tokenizers and/or grammars, so you should arrange for it to be executed in the background while the logits are computed on the GPU or other CPU cores. The commit_token() function is very fast and can be called in the main loop.
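One way to overlap the two computations is a scoped thread from the standard library, sketched below. Both compute_mask_stub and compute_logits_stub are hypothetical placeholders that just sleep and return dummy data; in a real integration the first would call into the constraint and the second would run the model's forward pass.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the CPU-side mask computation.
fn compute_mask_stub(vocab: usize) -> Vec<bool> {
    thread::sleep(Duration::from_millis(2)); // pretend this is slow
    (0..vocab).map(|t| t % 2 == 0).collect() // allow even token ids only
}

/// Hypothetical stand-in for the model's forward pass.
fn compute_logits_stub(vocab: usize) -> Vec<f32> {
    thread::sleep(Duration::from_millis(2));
    (0..vocab).map(|t| t as f32).collect() // logit grows with the token id
}

/// Run both computations concurrently, then pick the highest-logit allowed token.
fn masked_argmax_step(vocab: usize) -> Option<usize> {
    let (mask, logits) = thread::scope(|s| {
        // The mask is computed on a background thread...
        let mask_job = s.spawn(|| compute_mask_stub(vocab));
        // ...while the logits are computed on the current thread.
        let logits = compute_logits_stub(vocab);
        let mask = mask_job.join().expect("mask thread panicked");
        (mask, logits)
    });

    logits
        .iter()
        .enumerate()
        .filter(|(i, _)| mask[*i])
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

Scoped threads let the background job borrow local state without requiring 'static lifetimes, which is convenient if the mask computation needs mutable access to a constraint owned by the calling function. A long-lived worker thread or a thread pool would avoid the per-step spawn cost; which is preferable depends on the serving setup.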
See sample parser for an example of how to use this crate.