hub / github.com/tensorlayer/TensorLayer / read_analogies_file

Function read_analogies_file

tensorlayer/nlp.py:546–611

Reads an analogy question file and returns its questions in ID format.

Parameters:
eval_file : str. The file name.
word2id : dictionary. A dictionary that maps each word to its ID.

Returns:
numpy.array. A ``[n_examples, 4]`` numpy array containing the analogy questions' word IDs.

(eval_file='questions-words.txt', word2id=None)
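As a rough illustration of the "ID format", the sketch below converts a single analogy line into word IDs using the same per-line steps that appear in the source listing: strip, lowercase, split on spaces, then look each word up. `line_to_ids` and the toy `word2id` dictionary are hypothetical names made up for this example; they are not part of TensorLayer.

```python
def line_to_ids(line, word2id):
    """Map one analogy line (bytes) to a list of word IDs; unknown words give None."""
    # Hypothetical helper mirroring the per-line steps in the source listing.
    words = line.strip().lower().split(b" ")
    return [word2id.get(w.strip().decode()) for w in words]


# Toy vocabulary (assumed for illustration); keys are lowercase on purpose.
word2id = {"athens": 0, "greece": 1, "baghdad": 2, "iraq": 3}

print(line_to_ids(b"Athens Greece Baghdad Iraq\n", word2id))   # [0, 1, 2, 3]
print(line_to_ids(b"Athens Greece Hanoi Vietnam\n", word2id))  # [0, 1, None, None]
```

Because every line is lowercased before the lookup, a vocabulary whose keys keep their capitalization (for example `"Athens"`) would not match any question word.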

Source from the content-addressed store, hash-verified

def read_analogies_file(eval_file='questions-words.txt', word2id=None):
    """Reads through an analogy question file, return its id format.

    Parameters
    ----------
    eval_file : str
        The file name.
    word2id : dictionary
        a dictionary that maps word to ID.

    Returns
    --------
    numpy.array
        A ``[n_examples, 4]`` numpy array containing the analogy question's word IDs.

    Examples
    ---------
    The file should be in this format

    >>> : capital-common-countries
    >>> Athens Greece Baghdad Iraq
    >>> Athens Greece Bangkok Thailand
    >>> Athens Greece Beijing China
    >>> Athens Greece Berlin Germany
    >>> Athens Greece Bern Switzerland
    >>> Athens Greece Cairo Egypt
    >>> Athens Greece Canberra Australia
    >>> Athens Greece Hanoi Vietnam
    >>> Athens Greece Havana Cuba

    Get the tokenized analogy question data

    >>> words = tl.files.load_matt_mahoney_text8_dataset()
    >>> data, count, dictionary, reverse_dictionary = tl.nlp.build_words_dataset(words, vocabulary_size, True)
    >>> analogy_questions = tl.nlp.read_analogies_file(eval_file='questions-words.txt', word2id=dictionary)
    >>> print(analogy_questions)
    [[ 3068  1248  7161  1581]
     [ 3068  1248 28683  5642]
     [ 3068  1248  3878   486]
     ...,
     [ 1216  4309 19982 25506]
     [ 1216  4309  3194  8650]
     [ 1216  4309   140   312]]

    """
    if word2id is None:
        word2id = {}

    questions = []
    questions_skipped = 0

    with open(eval_file, "rb") as analogy_f:
        for line in analogy_f:
            if line.startswith(b":"):  # Skip comments.
                continue
            words = line.strip().lower().split(b" ")  # lowercase
            ids = [word2id.get(w.strip().decode()) for w in words]
            if None in ids or len(ids) != 4:
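The listing breaks off at the validity check, so the remainder of the body is not shown on this page. The following is a minimal sketch of one way such a reader could finish, assuming that invalid questions are only counted (in line with the `questions_skipped` counter) and valid ones are collected. `read_analogies_sketch` is a hypothetical name. To stay dependency-free it returns a plain list of 4-item lists plus the skip count, whereas the docstring states that the real function returns a ``[n_examples, 4]`` numpy array.

```python
import os
import tempfile


def read_analogies_sketch(eval_file, word2id):
    """Return (questions, questions_skipped) for an analogy question file.

    Hypothetical stand-in for illustration, not the TensorLayer implementation.
    """
    questions = []
    questions_skipped = 0
    with open(eval_file, "rb") as analogy_f:
        for line in analogy_f:
            if line.startswith(b":"):  # section header, e.g. ": capital-common-countries"
                continue
            words = line.strip().lower().split(b" ")
            ids = [word2id.get(w.strip().decode()) for w in words]
            if None in ids or len(ids) != 4:
                # Assumption: questions with unknown words or a wrong length are dropped.
                questions_skipped += 1
            else:
                questions.append(ids)
    return questions, questions_skipped


# Toy vocabulary and file contents, assumed for illustration only.
word2id = {"athens": 0, "greece": 1, "baghdad": 2, "iraq": 3}
content = (
    b": capital-common-countries\n"
    b"Athens Greece Baghdad Iraq\n"
    b"Athens Greece Hanoi Vietnam\n"
)

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(content)
    path = tmp.name

try:
    questions, skipped = read_analogies_sketch(path, word2id)
finally:
    os.remove(path)

print(questions)  # [[0, 1, 2, 3]]
print(skipped)    # 1
```

In this toy run the header line is ignored, the first question maps fully to IDs, and the second is skipped because `hanoi` and `vietnam` are missing from the vocabulary. With the default `word2id=None` (replaced by an empty dictionary in the listing), every lookup would return `None`, so under the same assumption every question would be skipped.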

Callers

nothing calls this directly

Calls 1

getMethod · 0.80

Tested by

no test coverage detected
