hub / github.com/tensorlayer/TensorLayer / read_analogies_file

Function read_analogies_file

tensorlayer/nlp.py:546–611

Reads an analogy question file and returns its questions as word IDs. Takes `eval_file` (str), the file name, and `word2id` (dictionary), a dictionary that maps each word to its ID. Returns a ``[n_examples, 4]`` numpy array containing the analogy questions' word IDs.

(eval_file='questions-words.txt', word2id=None)

Source from the content-addressed store, hash-verified

def read_analogies_file(eval_file='questions-words.txt', word2id=None):
    """Reads through an analogy question file, return its id format.

    Parameters
    ----------
    eval_file : str
        The file name.
    word2id : dictionary
        a dictionary that maps word to ID.

    Returns
    --------
    numpy.array
        A ``[n_examples, 4]`` numpy array containing the analogy question's word IDs.

    Examples
    ---------
    The file should be in this format

    >>> : capital-common-countries
    >>> Athens Greece Baghdad Iraq
    >>> Athens Greece Bangkok Thailand
    >>> Athens Greece Beijing China
    >>> Athens Greece Berlin Germany
    >>> Athens Greece Bern Switzerland
    >>> Athens Greece Cairo Egypt
    >>> Athens Greece Canberra Australia
    >>> Athens Greece Hanoi Vietnam
    >>> Athens Greece Havana Cuba

    Get the tokenized analogy question data

    >>> words = tl.files.load_matt_mahoney_text8_dataset()
    >>> data, count, dictionary, reverse_dictionary = tl.nlp.build_words_dataset(words, vocabulary_size, True)
    >>> analogy_questions = tl.nlp.read_analogies_file(eval_file='questions-words.txt', word2id=dictionary)
    >>> print(analogy_questions)
    [[ 3068  1248  7161  1581]
    [ 3068  1248 28683  5642]
    [ 3068  1248  3878   486]
    ...,
    [ 1216  4309 19982 25506]
    [ 1216  4309  3194  8650]
    [ 1216  4309   140   312]]

    """
    if word2id is None:
        word2id = {}

    questions = []
    questions_skipped = 0

    with open(eval_file, "rb") as analogy_f:
        for line in analogy_f:
            if line.startswith(b":"):  # Skip comments.
                continue
            words = line.strip().lower().split(b" ")  # lowercase
            ids = [word2id.get(w.strip().decode()) for w in words]
            if None in ids or len(ids) != 4:
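
The listing above stops at the validity check, so the rest of the function is not shown here. The following is a minimal, self-contained sketch of the same parsing idea. It is not the TensorLayer source. The name `read_analogies_sketch` is hypothetical, and the handling after the check (count the line as skipped, otherwise keep its four IDs) is an assumption inferred from the `questions` and `questions_skipped` variables.

```python
import os
import tempfile

import numpy as np


def read_analogies_sketch(eval_file, word2id):
    """Hypothetical stand-in for illustration, not the library function."""
    questions = []
    questions_skipped = 0
    with open(eval_file, "rb") as analogy_f:
        for line in analogy_f:
            # Lines such as ": capital-common-countries" are section headers.
            if line.startswith(b":"):
                continue
            # Lowercase the line, split on single spaces, look up each word.
            words = line.strip().lower().split(b" ")
            ids = [word2id.get(w.strip().decode()) for w in words]
            if None in ids or len(ids) != 4:
                # Assumption: lines with unknown words or a wrong word
                # count are counted and dropped.
                questions_skipped += 1
            else:
                questions.append(ids)
    # reshape(-1, 4) keeps the [n_examples, 4] shape even with zero rows.
    ids_array = np.array(questions, dtype=np.int32).reshape(-1, 4)
    return ids_array, questions_skipped


# Tiny made-up file: one header, one usable line, one line with unknown words.
sample = (
    b": capital-common-countries\n"
    b"Athens Greece Baghdad Iraq\n"
    b"Athens Greece Atlantis Nowhere\n"
)
# Keys are lowercase because every line is lowercased before lookup.
word2id = {"athens": 0, "greece": 1, "baghdad": 2, "iraq": 3}

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
ids, skipped = read_analogies_sketch(tmp.name, word2id)
os.remove(tmp.name)
```

Because each line is lowercased before the lookup, a `word2id` dictionary with capitalized keys would cause every question to be skipped. Passing the default `word2id=None` to the original function has the same effect, since the dictionary is then empty.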

Callers

nothing calls this directly

Calls 1

getMethod · 0.80

Tested by

no test coverage detected
