hub / github.com/tensorlayer/TensorLayer / load_imdb_dataset

Function load_imdb_dataset

tensorlayer/files/utils.py:844–944

Load IMDB dataset.

Parameters
----------
path : str
    The path that the data is downloaded to, defaults to ``data/imdb/``.
nb_words : int
    Number of words to get.
skip_top : int
    Top most frequent words to ignore (they will appear as oov_char value in the sequence data).

(
    path='data', nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113, start_char=1, oov_char=2,
    index_from=3
)
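
The index-related parameters (`nb_words`, `skip_top`, `start_char`, `oov_char`, `index_from`) are easier to follow on a toy sequence. The sketch below is only an illustration built from the parameter descriptions in the docstring; the part of `load_imdb_dataset` that applies them is not shown in the listing on this page. The helper name `apply_index_params` is hypothetical, and treating `nb_words=None` as "no upper limit" is an assumption.

```python
def apply_index_params(seq, nb_words=None, skip_top=0,
                       start_char=1, oov_char=2, index_from=3):
    """Hypothetical helper: apply the index parameters to one sequence."""
    # Shift raw word indices so the low indices stay reserved
    # (0 for padding, start_char, oov_char).
    shifted = [w + index_from for w in seq]
    if start_char is not None:
        shifted = [start_char] + shifted

    def is_kept(w):
        # Assumption: nb_words=None means no upper limit on the index.
        too_rare = nb_words is not None and w >= nb_words
        too_common = w < skip_top
        return not (too_rare or too_common)

    # Every index outside the allowed range becomes oov_char.
    return [w if is_kept(w) else oov_char for w in shifted]


print(apply_index_params([1, 5, 100], nb_words=50))  # → [1, 4, 8, 2]
```

In this sketch the start marker is filtered like any other index, so a `skip_top` larger than `start_char` would also replace the marker with `oov_char`.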

Source from the content-addressed store, hash-verified

def load_imdb_dataset(
    path='data', nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113, start_char=1, oov_char=2,
    index_from=3
):
    """Load IMDB dataset.

    Parameters
    ----------
    path : str
        The path that the data is downloaded to, defaults to ``data/imdb/``.
    nb_words : int
        Number of words to get.
    skip_top : int
        Top most frequent words to ignore (they will appear as oov_char value in the sequence data).
    maxlen : int
        Maximum sequence length. Any longer sequence will be truncated.
    seed : int
        Seed for reproducible data shuffling.
    start_char : int
        The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.
    oov_char : int
        Words that were cut out because of the nb_words or skip_top limit will be replaced with this character.
    index_from : int
        Index actual words with this index and higher.

    Examples
    --------
    >>> X_train, y_train, X_test, y_test = tl.files.load_imdb_dataset(
    ...                                 nb_words=20000, test_split=0.2)
    >>> print('X_train.shape', X_train.shape)
    (20000,) [[1, 62, 74, ... 1033, 507, 27],[1, 60, 33, ... 13, 1053, 7]..]
    >>> print('y_train.shape', y_train.shape)
    (20000,) [1 0 0 ..., 1 0 1]

    References
    -----------
    - `Modified from keras. <https://github.com/fchollet/keras/blob/master/keras/datasets/imdb.py>`__

    """
    path = os.path.join(path, 'imdb')

    filename = "imdb.pkl"
    url = 'https://s3.amazonaws.com/text-datasets/'
    maybe_download_and_extract(filename, path, url)

    # Open with gzip only when the archive name ends in ".gz".
    if filename.endswith(".gz"):
        f = gzip.open(os.path.join(path, filename), 'rb')
    else:
        f = open(os.path.join(path, filename), 'rb')

    X, labels = cPickle.load(f)
    f.close()

    # Re-seeding before each shuffle applies the same permutation to X and
    # labels, so every sequence stays paired with its label.
    np.random.seed(seed)
    np.random.shuffle(X)
    np.random.seed(seed)
    np.random.shuffle(labels)
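
The load-then-shuffle steps in the listing can be tried on toy data without downloading anything. The following is a minimal sketch, not TensorLayer code: it swaps in the standard-library `random` and `pickle` modules for `np.random` and `cPickle`, and the helper names `load_pickled_pair`, `shuffle_in_lockstep`, and `demo` are hypothetical. It shows the same two ideas: choosing the opener from the file extension, and re-seeding before the second shuffle so both lists receive the same permutation.

```python
import gzip
import os
import pickle
import random
import tempfile


def load_pickled_pair(file_path):
    """Load an (X, labels) tuple, using gzip only for ``.gz`` files."""
    opener = gzip.open if file_path.endswith(".gz") else open
    # The context manager closes the file even if unpickling raises.
    with opener(file_path, "rb") as f:
        X, labels = pickle.load(f)
    return X, labels


def shuffle_in_lockstep(X, labels, seed=113):
    """Shuffle both lists in place with the same permutation."""
    random.seed(seed)
    random.shuffle(X)
    random.seed(seed)  # re-seeding replays the same permutation
    random.shuffle(labels)


def demo():
    # Toy data: each label equals the first token of its sequence,
    # which makes it easy to check that pairs stay aligned.
    X = [[i, i + 100] for i in range(10)]
    labels = list(range(10))
    with tempfile.TemporaryDirectory() as tmp:
        file_path = os.path.join(tmp, "toy.pkl")
        with open(file_path, "wb") as f:
            pickle.dump((X, labels), f)
        X_loaded, labels_loaded = load_pickled_pair(file_path)
    shuffle_in_lockstep(X_loaded, labels_loaded)
    return X_loaded, labels_loaded


X_shuffled, y_shuffled = demo()
# Every sequence is still paired with its original label.
print(all(x[0] == y for x, y in zip(X_shuffled, y_shuffled)))  # → True
```

The re-seeding trick relies on both lists having the same length; shuffling a single list of indices and indexing both containers with it is a common alternative.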

Callers

nothing calls this directly

Calls 3

maybe_download_and_extract · Function · 0.85

close · Method · 0.80

load · Method · 0.45

Tested by

no test coverage detected
