This directory contains models for unsupervised training of word embeddings using the model described in:
Mikolov et al., Efficient Estimation of Word Representations in Vector Space, ICLR 2013.
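
As a rough illustration only (this is not code from this directory), the skip-gram variant of the model described in the paper trains on (target, context) word pairs drawn from a sliding window over the corpus. The sketch below shows how such pairs can be generated; the window size of 2 is an arbitrary choice for the example.

```python
# Illustrative sketch: generate (target, context) skip-gram training pairs
# from a tokenized corpus using a fixed-size sliding window.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs("the quick brown fox jumps".split()))
```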
Detailed instructions on how to get started and use them are available in the tutorials. Brief instructions are below.
To download the example text and evaluation data:
```shell
wget http://mattmahoney.net/dc/text8.zip -O text8.zip
unzip text8.zip
wget http://download.tensorflow.org/data/questions-words.txt
```
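
The questions-words.txt file is the standard analogy evaluation set: each non-comment line holds four words `a b c d`, and the model is asked to recover `d` from `b - a + c`. The scripts in this directory run this evaluation themselves; the sketch below only illustrates the idea, assuming a hypothetical `embeddings` dict of unit-normalized numpy vectors.

```python
# Illustrative sketch: score analogy questions of the form "a b c d"
# (predict d from b - a + c) against a word -> vector map.
# `embeddings` is a hypothetical dict mapping words to unit-normalized
# numpy vectors; it is not produced directly by the scripts here.
import numpy as np

def analogy_accuracy(path, embeddings):
    words = list(embeddings)
    matrix = np.stack([embeddings[w] for w in words])  # rows are unit vectors
    correct = total = 0
    with open(path) as f:
        for line in f:
            if line.startswith(":"):          # skip section headers
                continue
            parts = line.lower().split()
            if len(parts) != 4:
                continue
            a, b, c, d = parts
            if not all(w in embeddings for w in (a, b, c, d)):
                continue
            query = embeddings[b] - embeddings[a] + embeddings[c]
            scores = matrix.dot(query)
            for w in (a, b, c):               # exclude question words
                scores[words.index(w)] = -np.inf
            total += 1
            correct += words[int(np.argmax(scores))] == d
    return correct / max(total, 1)
```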
Assuming you have installed TensorFlow via the pip package and have cloned the git repository, navigate into this directory and run the model with:
```shell
cd tensorflow/models/embedding
python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/
```
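
After training finishes, the script saves a checkpoint under `--save_path`. One hedged way to get at the learned embedding matrix is to read the checkpoint directly; in the sketch below the checkpoint prefix (`model.ckpt`) and the variable name (`w_in`) are assumptions, so list the variable map first and substitute whatever names you actually see.

```python
# Sketch only: inspect the checkpoint written to --save_path and pull out
# the embedding matrix. The checkpoint prefix ("model.ckpt") and the
# variable name ("w_in") are assumptions; print the map to find the real one.
import tensorflow as tf

reader = tf.train.NewCheckpointReader("/tmp/model.ckpt")
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)               # look for the [vocab_size, emb_dim] variable
emb = reader.get_tensor("w_in")      # assumed name; replace with the one printed above
print(emb.shape)
```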
To run the code from sources using bazel:
```shell
bazel run -c opt tensorflow/models/embedding/word2vec_optimized -- \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/
```
Here is a short overview of what is in this directory.
File | What's in it? |
---|---|
word2vec.py | A version of word2vec implemented using TensorFlow ops and minibatching. |
word2vec_test.py | Integration test for word2vec. |
word2vec_optimized.py | A version of word2vec implemented using custom C++ ops that does no minibatching. |
word2vec_optimized_test.py | Integration test for word2vec_optimized. |
word2vec_kernels.cc | Kernels for the custom input and training ops. |
word2vec_ops.cc | The declarations of the custom ops. |
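
The custom ops declared in word2vec_ops.cc and implemented in word2vec_kernels.cc are typically compiled into a shared library and loaded from Python with `tf.load_op_library`. The sketch below shows the general pattern; the library name `word2vec_ops.so` is an assumption about the build output.

```python
# Sketch only: loading custom ops compiled from word2vec_ops.cc and
# word2vec_kernels.cc. The shared-library name "word2vec_ops.so" is an
# assumption about what the build produces.
import os
import tensorflow as tf

word2vec = tf.load_op_library(
    os.path.join(os.path.dirname(os.path.realpath(__file__)), "word2vec_ops.so"))
# The returned module exposes the ops declared in word2vec_ops.cc,
# e.g. the corpus-reading and training ops used by word2vec_optimized.py.
```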