tensorflow/models/embedding/README.md

This directory contains models for unsupervised training of word embeddings using the model described in:

(Mikolov et al.) Efficient Estimation of Word Representations in Vector Space, ICLR 2013.

Detailed instructions for getting started with these models are available in the tutorials. Brief instructions are below.

To download the example text and evaluation data:

wget http://mattmahoney.net/dc/text8.zip -O text8.zip
unzip text8.zip
wget http://download.tensorflow.org/data/questions-words.txt
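Before training, it can be worth checking that the corpus downloaded correctly. The following is a minimal sketch, not part of this directory: it assumes the archive holds a single member named `text8` containing whitespace-separated words, and the helper names `read_words` and `corpus_summary` are made up for illustration.

```python
import zipfile
from collections import Counter


def read_words(zip_path, member="text8"):
    """Return the whitespace-separated tokens stored in one member of a zip archive.

    The default member name "text8" is an assumption about the archive layout.
    """
    with zipfile.ZipFile(zip_path) as archive:
        text = archive.read(member).decode("utf-8")
    return text.split()


def corpus_summary(words, top_k=5):
    """Return (token count, vocabulary size, the top_k most frequent words)."""
    counts = Counter(words)
    return len(words), len(counts), counts.most_common(top_k)
```

Calling `corpus_summary(read_words("text8.zip"))` in the directory where the archive was downloaded should give a quick picture of the corpus size and its most frequent words.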

If you installed TensorFlow from the pip package and have cloned the git repository, change into this directory and run:

cd tensorflow/models/embedding
python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/

To build and run the code from source using Bazel:

bazel run -c opt tensorflow/models/embedding/word2vec_optimized -- \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/
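Both commands pass `questions-words.txt` as the evaluation data. As a rough illustration of what that file holds, the sketch below groups its analogy questions by section. It assumes that lines starting with `:` name a section and that every other line holds four words; lowercasing the words (to match a lowercase corpus) is also an assumption, and `read_analogies` here is an illustrative helper, not the function used by the training scripts.

```python
def read_analogies(lines):
    """Group four-word analogy questions by section.

    Assumed layout: a line starting with ':' names a section; every other
    non-empty line holds four whitespace-separated words.
    Returns (sections, skipped), where skipped counts malformed lines.
    """
    sections = {}
    current = None
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            # Start a new section, e.g. ": family" -> "family".
            current = line[1:].strip()
            sections.setdefault(current, [])
            continue
        words = line.lower().split()
        if len(words) != 4 or current is None:
            # Not a well-formed question, or no section seen yet.
            skipped += 1
            continue
        sections[current].append(tuple(words))
    return sections, skipped
```

A file can be passed directly, for example `read_analogies(open("questions-words.txt"))`, since the function only iterates over lines.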

Here is a short overview of what is in this directory.

| File | What's in it? |
| --- | --- |
| `word2vec.py` | A version of word2vec implemented using TensorFlow ops and minibatching. |
| `word2vec_test.py` | Integration test for word2vec. |
| `word2vec_optimized.py` | A version of word2vec implemented using custom C++ ops that does no minibatching. |
| `word2vec_optimized_test.py` | Integration test for word2vec_optimized. |
| `word2vec_kernels.cc` | Kernels for the custom input and training ops. |
| `word2vec_ops.cc` | The declarations of the custom ops. |
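To make the minibatching distinction concrete, here is a small pure-Python sketch of how training examples might be grouped into minibatches. It assumes a skip-gram setup with a fixed symmetric context window; the function names are hypothetical, and the actual scripts may differ in details such as subsampling or window selection.

```python
def skipgram_pairs(tokens, window):
    """Yield (center, context) pairs for every context word within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def minibatches(pairs, batch_size):
    """Group an iterable of pairs into lists of at most `batch_size` items."""
    batch = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

A minibatched trainer would apply one update per yielded batch, whereas a trainer without minibatching would process the pairs one at a time.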