Emoji frequency data supplied to the ESC must be in a plain text file, with each line in the following format:
<hex> ; <frequency>
where <hex> has the form \x{hex1 … hexN}. You can omit the FE0F values from the hex (see below); the hex will be normalized when processing the data. (This hex format is chosen to be compact yet durable in spreadsheets.)
Anything after # is a comment; you can put the plain text emoji and/or name on each line in a comment if that is easier to read/manage, but comments are completely optional and ignored.
Example:
# Data from ABC, 2019-03-17
\x{1F602} ; 57686831 # 😂 face with tears of joy
\x{31 FE0F 20E3} ; 139909 # 1️⃣ keycap: 1
\x{1F3CC FE0F} ; 53769 # 🏌️ person golfing
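The line format above can be read with a few lines of Java. This is only a minimal sketch: the class FrequencyLineSketch and its Entry type are hypothetical names, not part of EmojiFrequency.java, and it assumes exactly one ";" per data line. It keeps any FE0F values as written and does not attempt the normalization mentioned above.

```java
/** Hypothetical helper for illustration; not taken from EmojiFrequency.java. */
final class FrequencyLineSketch {

    /** One parsed data line: the emoji as a string, and its frequency. */
    static final class Entry {
        final String emoji;
        final long frequency;

        Entry(String emoji, long frequency) {
            this.emoji = emoji;
            this.frequency = frequency;
        }
    }

    /** Parses "<hex> ; <frequency>"; returns null for blank or comment-only lines. */
    static Entry parse(String line) {
        // Everything from the first '#' onward is a comment and is ignored.
        int hash = line.indexOf('#');
        String data = (hash < 0 ? line : line.substring(0, hash)).trim();
        if (data.isEmpty()) {
            return null;
        }
        String[] fields = data.split(";", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected <hex> ; <frequency>: " + line);
        }
        String hex = fields[0].trim();
        if (!hex.startsWith("\\x{") || !hex.endsWith("}")) {
            throw new IllegalArgumentException("Expected \\x{hex1 ... hexN}: " + line);
        }
        // Drop the leading \x{ and the trailing }, then read each hex code point.
        String body = hex.substring(3, hex.length() - 1).trim();
        StringBuilder emoji = new StringBuilder();
        for (String codePoint : body.split("\\s+")) {
            emoji.appendCodePoint(Integer.parseInt(codePoint, 16));
        }
        return new Entry(emoji.toString(), Long.parseLong(fields[1].trim()));
    }

    public static void main(String[] args) {
        Entry e = parse("\\x{31 FE0F 20E3} ; 139909 # keycap: 1");
        // Prints: 3 code points, frequency 139909
        System.out.println(e.emoji.codePointCount(0, e.emoji.length())
                + " code points, frequency " + e.frequency);
    }
}
```

Malformed hex or counts surface as IllegalArgumentException (NumberFormatException is a subclass), which makes bad vendor lines easy to spot before they reach the spreadsheet.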
Some emoji normally need variation selectors (FE0F) in their representation, such as \x{1F3CC FE0F} (🏌️ person golfing). However, vendors can override this behavior, and show (for example) \x{1F3CC} as an emoji. For such cases, the vendor can supply separate frequency information for the forms with and without the FE0F.
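When a vendor supplies both forms, one way to line them up is to strip FE0F and use the result as a shared key. The following is a minimal sketch of that idea only; the class and method names are hypothetical, and it is not a description of the normalization that EmojiFrequency.java actually performs.

```java
import java.util.Locale;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not taken from EmojiFrequency.java. */
final class VariationSelectorSketch {

    /** U+FE0F VARIATION SELECTOR-16, which requests emoji presentation. */
    static final int VS16 = 0xFE0F;

    /** Returns the string with every FE0F code point removed. */
    static String stripVs16(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().filter(cp -> cp != VS16).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Formats a string in the \x{hex1 ... hexN} style used by the data files. */
    static String toHex(String s) {
        StringJoiner joiner = new StringJoiner(" ", "\\x{", "}");
        s.codePoints().forEach(
                cp -> joiner.add(Integer.toHexString(cp).toUpperCase(Locale.ROOT)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Person golfing, written with the variation selector.
        String golfer = new String(Character.toChars(0x1F3CC)) + "\uFE0F";
        System.out.println(toHex(golfer));            // prints \x{1F3CC FE0F}
        System.out.println(toHex(stripVs16(golfer))); // prints \x{1F3CC}
    }
}
```

Grouping by the stripped key shows which entries are the same emoji in two forms, while the original strings keep the separate counts.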
Copy the data into a TSV (tab-separated values) file at DATA/frequency/emoji/<vendor>Raw.tsv, where <vendor> is one of gboard, facebook, twitter, etc.
Open EmojiFrequency.java and follow the instructions there, then run it.
You'll get files named: Generated/emoji/frequency/...
Copy those into the appropriate tab of the spreadsheet, RawVendorSnapshot.
Example:
Hex          Count      Rank   Emoji
\x{2B1A}     1565637    1      ⬚
\x{1F602}    1200855    2      😂
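Rows in this shape can be produced from a map of counts, as sketched below. This is an assumption-laden illustration rather than the logic of EmojiFrequency.java: RankTableSketch is a hypothetical class, rank is assumed to be the position after sorting by descending count, and ties simply keep their input order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not taken from EmojiFrequency.java. */
final class RankTableSketch {

    /** Formats a string in the \x{hex1 ... hexN} style. */
    static String toHex(String s) {
        StringJoiner joiner = new StringJoiner(" ", "\\x{", "}");
        s.codePoints().forEach(
                cp -> joiner.add(Integer.toHexString(cp).toUpperCase(Locale.ROOT)));
        return joiner.toString();
    }

    /** Returns a header row plus one tab-separated row per emoji, highest count first. */
    static List<String> toRows(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        // Assumed ranking rule: descending count (the sort is stable for ties).
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        List<String> rows = new ArrayList<>();
        rows.add("Hex\tCount\tRank\tEmoji");
        int rank = 0;
        for (Map.Entry<String, Long> entry : sorted) {
            rank++;
            rows.add(toHex(entry.getKey()) + "\t" + entry.getValue()
                    + "\t" + rank + "\t" + entry.getKey());
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put(new String(Character.toChars(0x1F602)), 1200855L);
        counts.put("\u2B1A", 1565637L);
        toRows(counts).forEach(System.out::println);
    }
}
```

Tab-separated output pastes directly into spreadsheet columns, which is why the sketch joins fields with tabs rather than spaces.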
When you do this for the first time after a new release, you'll get 2 kinds of failures. Here are the fixes:
Known problem: the “canonical values for couples” are single characters, but the code needs fixing to take care of \x{1F9D1 200D 2764 FE0F 200D 1F48B 200D 1F9D1} and \x{1F9D1 200D 2764 FE0F 200D 1F9D1}; plus the ‘unmarked gender’
TBD: fixing the PIE chart data