MCPcopy Index your code
hub / github.com/lisa-lab/DeepLearningTutorials / load_data

Function load_data

code/logistic_sgd.py:175–253  ·  view source on GitHub ↗

Loads the dataset :type dataset: string :param dataset: the path to the dataset (here MNIST)

(dataset)

Source from the content-addressed store, hash-verified

173
174
175def load_data(dataset):
176 ''' Loads the dataset
177
178 :type dataset: string
179 :param dataset: the path to the dataset (here MNIST)
180 '''
181
182 #############
183 # LOAD DATA #
184 #############
185
186 # Download the MNIST dataset if it is not present
187 data_dir, data_file = os.path.split(dataset)
188 if data_dir == "" and not os.path.isfile(dataset):
189 # Check if dataset is in the data directory.
190 new_path = os.path.join(
191 os.path.split(__file__)[0],
192 "..",
193 "data",
194 dataset
195 )
196 if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
197 dataset = new_path
198
199 if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
200 from six.moves import urllib
201 origin = (
202 'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
203 )
204 print('Downloading data from %s' % origin)
205 urllib.request.urlretrieve(origin, dataset)
206
207 print('... loading data')
208
209 # Load the dataset
210 with gzip.open(dataset, 'rb') as f:
211 try:
212 train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
213 except:
214 train_set, valid_set, test_set = pickle.load(f)
215 # train_set, valid_set, test_set format: tuple(input, target)
216 # input is a numpy.ndarray of 2 dimensions (a matrix)
217 # where each row corresponds to an example. target is a
218 # numpy.ndarray of 1 dimension (vector) that has the same length as
219 # the number of rows in the input. It should give the target
220 # to the example with the same index in the input.
221
222 def shared_dataset(data_xy, borrow=True):
223 """ Function that loads the dataset into shared variables
224
225 The reason we store our dataset in shared variables is to allow
226 Theano to copy it into the GPU memory (when code is run on GPU).
227 Since copying data into the GPU is slow, copying a minibatch everytime
228 is needed (the default behaviour if the data is not in a shared
229 variable) would lead to a large decrease in performance.
230 """
231 data_x, data_y = data_xy
232 shared_x = theano.shared(numpy.asarray(data_x,

Callers 10

test_rbmFunction · 0.90
test_dAFunction · 0.90
test_DBNFunction · 0.90
test_SdAFunction · 0.90
test_cAFunction · 0.90
evaluate_lenet5Function · 0.90
test_mlpFunction · 0.90
cg_optimization_mnistFunction · 0.90
sgd_optimization_mnistFunction · 0.70
predictFunction · 0.70

Calls 2

shared_datasetFunction · 0.85
loadMethod · 0.80

Tested by 6

test_rbmFunction · 0.72
test_dAFunction · 0.72
test_DBNFunction · 0.72
test_SdAFunction · 0.72
test_cAFunction · 0.72
test_mlpFunction · 0.72