| | linearly separable | non-linearly separable | problems | differences |
| perceptron | | | | |
| gradient descent delta rule | | | | |
| stochastic gradient descent | | | | |
The rule for the perceptron (page 88) and the delta rule (page 93) appear similar. What is the fundamental difference?
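As a hint toward the comparison, here is a minimal sketch of one update step under each rule, written in Python rather than in the course's own tool. The function names, the learning rate `eta`, and the sample weights and inputs are illustrative assumptions, not taken from the textbook. Both rules share the form w_i <- w_i + eta * (t - o) * x_i. The sketch differs only in how the output `o` is computed: thresholded for the perceptron rule, unthresholded (linear) for the delta rule.

```python
def dot(w, x):
    """Weighted sum w . x of the inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_update(w, x, t, eta=0.1):
    """Perceptron rule: the error uses the thresholded output sgn(w . x)."""
    o = 1 if dot(w, x) > 0 else -1
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]


def delta_update(w, x, t, eta=0.1):
    """Delta rule: the error uses the unthresholded linear output w . x."""
    o = dot(w, x)
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]


# Hypothetical example: the point is already classified correctly (w . x = 0.4 > 0).
w = [0.2, 0.2]
x = [1.0, 1.0]
t = 1

w_perc = perceptron_update(w, x, t)   # thresholded output equals t, so no change
w_delta = delta_update(w, x, t)       # linear output 0.4 differs from t, so weights move

print(w_perc, w_delta)
```

In this example the perceptron rule leaves the weights alone because the example is already on the correct side of the threshold, while the delta rule keeps adjusting them because the linear output has not yet reached the target value.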
Why will the gradient descent algorithm converge?
Why is there only one global minimum?
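One way to explore the last two questions is to run batch gradient descent on a single linear unit and watch where it ends up. For a linear unit the squared error E(w) = 1/2 * sum over examples of (t - w . x)^2 is a quadratic function of the weights, so its surface is bowl-shaped. The minimum is unique when the training inputs determine the weights uniquely, and a sufficiently small learning rate moves the weights downhill toward it. The sketch below is an illustration under stated assumptions: the data set, the learning rate `eta`, the step count, and the function names are all made up for the example.

```python
def error(w, data):
    """Squared error E(w) = 1/2 * sum (t - w . x)^2 for a linear unit."""
    return 0.5 * sum(
        (t - sum(wi * xi for wi, xi in zip(w, x))) ** 2 for x, t in data
    )


def gradient_descent(w, data, eta=0.05, steps=2000):
    """Batch gradient descent: w <- w - eta * dE/dw, summed over all examples."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in data:
            o = sum(wi * xi for wi, xi in zip(w, x))  # unthresholded output
            for i, xi in enumerate(x):
                grad[i] += -(t - o) * xi              # dE/dw_i contribution
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical noise-free data generated by t = 1 + 2 * x1 (x0 = 1 acts as the bias input).
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]

# Two very different starting points on the same error surface.
w_a = gradient_descent([0.0, 0.0], data)
w_b = gradient_descent([5.0, -3.0], data)

print(w_a, error(w_a, data))
print(w_b, error(w_b, data))
```

Both runs should settle on approximately the same weights, which is what a single-minimum error surface predicts. Raising `eta` well above the value used here can make the updates overshoot and diverge, which is why the convergence argument assumes a small learning rate.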
FAQs from demo.doc
| test2.neu | xor training |
| test3.neu | xor manual training |
| test4.neu | litho training and testing |
| test5.neu | litho with argument input |
| test6.neu | demo loops, arrays |
| test7.neu | operators |
| test8.neu | compound statement |
| test9.neu | if statement |
| test10.neu | while, print, newline |
| test13.neu | array, keyboard input |
| test14.neu | copy array |
| test15.neu | dump program state variables |
| test16.neu | create a network |
| test17.neu | manually fully connect network |
| test18.neu | create training and test sets |
| test20.neu | add data after training |
| test21.neu | if statement |
| test22.neu | find high, low |
| test23.neu | normalize data |
| test24.neu | manual vs. automatic find high/low |
| test25.neu | litho run, see outputs |
| test27.neu | automatically remove low weights during training |
| test28.neu | iterate each weight |