This directory contains models for unsupervised training of word embeddings using the model described in:
Mikolov et al., Efficient Estimation of Word Representations in Vector Space, ICLR 2013.
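
As a rough illustration only (this is not code from this directory), the skip-gram variant of the model described in the paper trains on (target, context) word pairs drawn from a sliding window over the corpus. The sketch below shows how such pairs can be generated; the window size of 2 is an arbitrary choice for the example.

```python
# Illustrative sketch: generate (target, context) skip-gram training pairs
# from a tokenized corpus using a fixed-size sliding window.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs("the quick brown fox jumps".split()))
```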
Detailed instructions on how to get started and use them are available in the tutorials. Brief instructions are below.
To download the example text and evaluation data:
```shell
wget http://mattmahoney.net/dc/text8.zip -O text8.zip
unzip text8.zip
wget http://download.tensorflow.org/data/questions-words.txt
```
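
The questions-words.txt file is the standard analogy evaluation set: each non-comment line holds four words `a b c d`, and the model is asked to recover `d` from `b - a + c`. The scripts in this directory run this evaluation themselves; the sketch below only illustrates the idea, assuming a hypothetical `embeddings` dict of unit-normalized numpy vectors.

```python
# Illustrative sketch: score analogy questions of the form "a b c d"
# (predict d from b - a + c) against a word -> vector map.
# `embeddings` is a hypothetical dict mapping words to unit-normalized
# numpy vectors; it is not produced directly by the scripts here.
import numpy as np

def analogy_accuracy(path, embeddings):
    words = list(embeddings)
    matrix = np.stack([embeddings[w] for w in words])  # rows are unit vectors
    correct = total = 0
    with open(path) as f:
        for line in f:
            if line.startswith(":"):          # skip section headers
                continue
            parts = line.lower().split()
            if len(parts) != 4:
                continue
            a, b, c, d = parts
            if not all(w in embeddings for w in (a, b, c, d)):
                continue
            query = embeddings[b] - embeddings[a] + embeddings[c]
            scores = matrix.dot(query)
            for w in (a, b, c):               # exclude question words
                scores[words.index(w)] = -np.inf
            total += 1
            correct += words[int(np.argmax(scores))] == d
    return correct / max(total, 1)
```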
Assuming you have installed TensorFlow via the pip package and have cloned the git repository, navigate into this directory and run the model with:
```shell
cd tensorflow/models/embedding
python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/
```
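
After training finishes, the script saves a checkpoint under `--save_path`. One hedged way to get at the learned embedding matrix is to read the checkpoint directly; in the sketch below the checkpoint prefix (`model.ckpt`) and the variable name (`w_in`) are assumptions, so list the variable map first and substitute whatever names you actually see.

```python
# Sketch only: inspect the checkpoint written to --save_path and pull out
# the embedding matrix. The checkpoint prefix ("model.ckpt") and the
# variable name ("w_in") are assumptions; print the map to find the real one.
import tensorflow as tf

reader = tf.train.NewCheckpointReader("/tmp/model.ckpt")
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)               # look for the [vocab_size, emb_dim] variable
emb = reader.get_tensor("w_in")      # assumed name; replace with the one printed above
print(emb.shape)
```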
To run the code from sources using bazel:
```shell
bazel run -c opt tensorflow/models/embedding/word2vec_optimized -- \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=/tmp/
```
Here is a short overview of what is in this directory.
File | What's in it? |
---|---|
word2vec.py | A version of word2vec implemented using TensorFlow ops and minibatching. |
word2vec_test.py | Integration test for word2vec. |
word2vec_optimized.py | A version of word2vec implemented using custom C++ ops that does no minibatching. |
word2vec_optimized_test.py | Integration test for word2vec_optimized. |
word2vec_kernels.cc | Kernels for the custom input and training ops. |
word2vec_ops.cc | The declarations of the custom ops. |
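
The custom ops declared in word2vec_ops.cc and implemented in word2vec_kernels.cc are typically compiled into a shared library and loaded from Python with `tf.load_op_library`. The sketch below shows the general pattern; the library name `word2vec_ops.so` is an assumption about the build output.

```python
# Sketch only: loading custom ops compiled from word2vec_ops.cc and
# word2vec_kernels.cc. The shared-library name "word2vec_ops.so" is an
# assumption about what the build produces.
import os
import tensorflow as tf

word2vec = tf.load_op_library(
    os.path.join(os.path.dirname(os.path.realpath(__file__)), "word2vec_ops.so"))
# The returned module exposes the ops declared in word2vec_ops.cc,
# e.g. the corpus-reading and training ops used by word2vec_optimized.py.
```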