hub / github.com/robertmartin8/MachineLearningStocks / backtest

Function backtest

backtesting.py:10–82

A simple backtest, which splits the dataset into a train set and a test set, then fits a Random Forest classifier to the train set. We print the precision and accuracy of the classifier on the test set, then run a backtest comparing this strategy's performance to passive investment in the S&P500.
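
To make the two reported metrics concrete, here is a minimal sketch that computes accuracy and precision by hand on made-up labels. The function names `accuracy` and `precision` and the toy lists are illustrative assumptions; the source itself relies on `clf.score` and scikit-learn's `precision_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred):
    """Of the rows predicted positive, the fraction that really are positive."""
    predicted_positive = [t for t, p in zip(y_true, y_pred) if p]
    if not predicted_positive:
        return 0.0  # assumption: report 0.0 when nothing is predicted positive
    return sum(1 for t in predicted_positive if t) / len(predicted_positive)


# Toy labels (hypothetical): True means "outperformed the index".
y_true = [True, False, True, False, False]
y_pred = [True, True, False, False, False]

print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"Precision: {precision(y_true, y_pred):.2f}")
```

For a strategy that only acts on positive predictions, precision is arguably the more relevant of the two figures, since it measures how often a "buy" call turned out to be right.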

()

Source

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# status_calc is defined elsewhere in this project.


def backtest():
    """
    A simple backtest, which splits the dataset into a train set and test set,
    then fits a Random Forest classifier to the train set. We print the precision and accuracy
    of the classifier on the test set, then run a backtest comparing this strategy's performance
    to passive investment in the S&P500.
    Please note that there is a methodological flaw in this backtest which will give deceptively
    good results, so the results here should not encourage you to live trade.
    """
    # Build the dataset, and drop any rows with missing values
    data_df = pd.read_csv("keystats.csv", index_col="Date")
    data_df.dropna(axis=0, how="any", inplace=True)

    features = data_df.columns[6:]
    X = data_df[features].values

    # The labels are generated by applying status_calc to the dataframe.
    # '1' if a stock beats the S&P500 by more than x%, else '0'. Here x is the
    # outperformance parameter, which is set to 10 by default but can be redefined.
    y = list(
        status_calc(
            data_df["stock_p_change"], data_df["SP500_p_change"], outperformance=10
        )
    )

    # z is required for us to track returns
    z = np.array(data_df[["stock_p_change", "SP500_p_change"]])

    # Generate the train set and test set by randomly splitting the dataset
    X_train, X_test, y_train, y_test, z_train, z_test = train_test_split(
        X, y, z, test_size=0.2
    )

    # Instantiate a RandomForestClassifier with 100 trees, then fit it to the training data
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)

    # Generate the predictions, then print test set accuracy and precision
    y_pred = clf.predict(X_test)
    print("Classifier performance\n", "=" * 20)
    print(f"Accuracy score: {clf.score(X_test, y_test): .2f}")
    print(f"Precision score: {precision_score(y_test, y_pred): .2f}")

    # Because y_pred is an array of 1s and 0s, the number of positive predictions
    # is equal to the sum of the array
    num_positive_predictions = sum(y_pred)
    # A sum of 0s and 1s can never be negative, so test for zero
    if num_positive_predictions == 0:
        print("No stocks predicted!")

    # Recall that z_test stores the change in stock price in column 0, and the
    # change in S&P500 price in column 1.
    # Whenever a stock is predicted to outperform (y_pred = 1), we 'buy' that stock
    # and simultaneously 'buy' the index for comparison.
    # Cast the predictions to a boolean mask so that rows are selected by
    # prediction; integer 0/1 values would be treated as row indices instead.
    buy_mask = np.asarray(y_pred, dtype=bool)
    stock_returns = 1 + z_test[buy_mask, 0] / 100
    market_returns = 1 + z_test[buy_mask, 1] / 100

    # Calculate the average growth for each stock we predicted 'buy'
    # and the corresponding index growth
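
The selection step above can be tried on a few made-up rows. This is a sketch under stated assumptions, not the remainder of the function: the values in `z_test` and `y_pred` are invented, and taking the mean growth factor is only one plausible reading of the "average growth" comment. It also shows why the predictions need to be a boolean mask rather than integer 0/1 values when used as an index.

```python
import numpy as np

# Hypothetical test rows: column 0 is the stock's % change, column 1 the index's.
z_test = np.array(
    [
        [12.0, 5.0],
        [-3.0, 4.0],
        [30.0, 10.0],
        [8.0, 9.0],
    ]
)
# Hypothetical predictions stored as integers (1 = predicted to outperform).
y_pred = np.array([1, 0, 1, 0])

# A boolean mask keeps only the rows where the prediction is positive.
buy_mask = y_pred.astype(bool)
stock_returns = 1 + z_test[buy_mask, 0] / 100
market_returns = 1 + z_test[buy_mask, 1] / 100

# Integer indexing instead picks rows 1, 0, 1, 0, which is not what we want.
wrong_selection = z_test[y_pred, 0]

# Assumption: "average growth" is read here as the mean growth factor.
avg_stock_growth = stock_returns.mean()
avg_market_growth = market_returns.mean()

print("Selected stock growth factors:", stock_returns)
print("Selected index growth factors:", market_returns)
print("Rows picked by integer indexing:", wrong_selection)
print(f"Average stock growth: {avg_stock_growth:.3f}")
print(f"Average index growth: {avg_market_growth:.3f}")
```

If the labels are already booleans, the `astype(bool)` call is a no-op, so the cast costs nothing and guards against the integer case.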

Callers 1

backtesting.py · File · 0.85

Calls 1

status_calc · Function · 0.90
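
The labelling rule described in the source comments ('1' if a stock beats the S&P500 by more than x%) can be sketched as follows. `label_outperformers` is a hypothetical stand-in written for illustration, not the project's `status_calc`. Two assumptions to note: the strict `>` comparison follows the wording of the comment, and the margin is taken as a difference in percentage points.

```python
def label_outperformers(stock_changes, index_changes, outperformance=10):
    """Return True where a stock's % change beats the index's by more than
    `outperformance` percentage points (hypothetical helper)."""
    return [
        stock - index > outperformance
        for stock, index in zip(stock_changes, index_changes)
    ]


# Made-up percentage changes for four stocks and the index over the same periods.
stock_p_change = [25.0, 12.0, -5.0, 40.0]
sp500_p_change = [10.0, 8.0, 3.0, 30.0]

labels = label_outperformers(stock_p_change, sp500_p_change)
print(labels)
```

Only the first row qualifies here: the last stock beats the index by exactly 10 points, which a strict comparison does not count.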

Tested by

no test coverage detected