Shrink HTMLToken by 1.5 kB

HTMLToken has an inline vector for buffering characters during tokenization. We
originally picked the inline capacity of this buffer somewhat arbitrarily. This
CL tunes the capacity using data from an informal browse of several popular
web sites.

I instrumented content_shell to log the length of each complete DataVector. I
then browsed around a variety of web sites to collect data. The 99th percentile
of DataVector lengths was just shy of 250 characters. I rounded that up to 256
because powers of two are pretty. That means we'll malloc an external buffer
less than 1% of the time, which seems fine.
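
For illustration, the sizing rule described above (take the 99th percentile of
the logged lengths, then round up to a power of two) could be sketched as
follows. This is not the instrumentation used for this CL; the helper names
NearestRankPercentile and RoundUpToPowerOfTwo are hypothetical, and the
nearest-rank definition of a percentile is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: nearest-rank percentile of a set of logged lengths.
// Returns the smallest sample such that at least `percent` percent of all
// samples are less than or equal to it. Integer math avoids rounding issues.
std::size_t NearestRankPercentile(std::vector<std::size_t> samples,
                                  std::size_t percent) {
  assert(!samples.empty());
  assert(percent >= 1 && percent <= 100);
  std::sort(samples.begin(), samples.end());
  // rank = ceil(percent * n / 100), 1-based.
  std::size_t rank = (percent * samples.size() + 99) / 100;
  return samples[rank - 1];
}

// Hypothetical helper: smallest power of two that is >= n (for n >= 1).
std::size_t RoundUpToPowerOfTwo(std::size_t n) {
  std::size_t power = 1;
  while (power < n)
    power <<= 1;
  return power;
}
```

With a 99th percentile just under 250, RoundUpToPowerOfTwo yields 256, the
inline capacity chosen here.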


