A friend of mine challenged me. After I told him how I was able to implement some decent neural network solutions with the help of LLMs, he asked: could the LLM write a neural network example in Commodore 64 BASIC?
You betcha.
Well, it took a few attempts: there were some syntax issues and some oversimplifications, so eventually I had the idea of asking the LLM to write the example in Python first and then use that as a reference implementation for the C64 version. That went well. Here’s the result:
As this screenshot shows, the program was able to learn the behavior of an XOR gate, the simplest problem that requires a hidden layer of perceptrons and, as such, a precursor to modern “deep learning” solutions.
I was able to run this test on Krisztián Tóth’s (no relation) excellent C64 emulator, which has the distinguishing feature of reliable copy-paste, making it possible to enter long BASIC programs without having to retype them or somehow transfer them to a VIC-1541 floppy image first.
In any case, this is the program that resulted from my little collaboration with the Claude 3.7-sonnet language model:
10 REM NEURAL NETWORK FOR XOR PROBLEM
20 REM BASED ON WORKING PYTHON IMPLEMENTATION
100 REM INITIALIZE VARIABLES
110 DIM X(3,1) : REM INPUT PATTERNS
120 DIM Y(3) : REM EXPECTED OUTPUTS
130 DIM W1(1,1) : REM WEIGHTS: INPUT TO HIDDEN
140 DIM B1(1) : REM BIAS: HIDDEN LAYER
150 DIM W2(1) : REM WEIGHTS: HIDDEN TO OUTPUT
160 DIM H(1) : REM HIDDEN LAYER OUTPUTS
170 DIM D1(1,1) : REM PREVIOUS DELTA FOR W1
180 DIM B2 : REM BIAS: OUTPUT LAYER
190 DIM D2(1) : REM PREVIOUS DELTA FOR W2
200 DIM DB1(1) : REM PREVIOUS DELTA FOR B1
210 DB2 = 0 : REM PREVIOUS DELTA FOR B2
220 LR = 0.5 : REM LEARNING RATE
230 M = 0.9 : REM MOMENTUM
300 REM SETUP TRAINING DATA (XOR PROBLEM)
310 X(0,0)=0 : X(0,1)=0 : Y(0)=0
320 X(1,0)=0 : X(1,1)=1 : Y(1)=1
330 X(2,0)=1 : X(2,1)=0 : Y(2)=1
340 X(3,0)=1 : X(3,1)=1 : Y(3)=0
400 REM INITIALIZE WEIGHTS RANDOMLY
410 FOR I=0 TO 1
420 FOR J=0 TO 1
430 W1(I,J) = RND(1)-0.5
440 NEXT J
450 B1(I) = RND(1)-0.5
460 W2(I) = RND(1)-0.5
470 NEXT I
480 B2 = RND(1)-0.5
510 REM INITIALIZE MOMENTUM TERMS TO ZERO
520 FOR I=0 TO 1
530 FOR J=0 TO 1
540 D1(I,J) = 0
550 NEXT J
560 D2(I) = 0
570 DB1(I) = 0
580 NEXT I
590 DB2 = 0
600 REM TRAINING LOOP
610 PRINT "TRAINING NEURAL NETWORK..."
620 PRINT "EP","ER"
630 FOR E = 1 TO 5000
640 ER = 0
650 FOR P = 0 TO 3
660 GOSUB 1000 : REM FORWARD PASS
670 GOSUB 2000 : REM BACKWARD PASS
680 ER = ER + ABS(O-Y(P))
690 NEXT P
700 IF (E/10) = INT(E/10) THEN PRINT E,ER
710 IF ER < 0.1 THEN E = 5000
720 NEXT E
800 REM TEST NETWORK
810 PRINT "TESTING NETWORK:"
820 FOR P = 0 TO 3
830 GOSUB 1000 : REM FORWARD PASS
840 PRINT X(P,0);X(P,1);"->"; INT(O+0.5);" (";O;")"
850 NEXT P
860 END
1000 REM FORWARD PASS SUBROUTINE
1010 REM CALCULATE HIDDEN LAYER
1020 FOR I = 0 TO 1
1030 S = 0
1040 FOR J = 0 TO 1
1050 S = S + X(P,J) * W1(J,I)
1060 NEXT J
1070 S = S + B1(I)
1080 H(I) = 1/(1+EXP(-S))
1090 NEXT I
1100 REM CALCULATE OUTPUT
1110 S = 0
1120 FOR I = 0 TO 1
1130 S = S + H(I) * W2(I)
1140 NEXT I
1150 S = S + B2
1160 O = 1/(1+EXP(-S))
1170 RETURN
2000 REM BACKWARD PASS SUBROUTINE
2010 REM OUTPUT LAYER ERROR
2020 DO = (Y(P)-O) * O * (1-O)
2030 REM UPDATE OUTPUT WEIGHTS WITH MOMENTUM
2040 FOR I = 0 TO 1
2050 DW = LR * DO * H(I)
2060 W2(I) = W2(I) + DW + M * D2(I)
2070 D2(I) = DW
2080 NEXT I
2090 DW = LR * DO
2100 B2 = B2 + DW + M * DB2
2110 DB2 = DW
2120 REM HIDDEN LAYER ERROR AND WEIGHT UPDATE
2130 FOR I = 0 TO 1
2140 DH = H(I) * (1-H(I)) * DO * W2(I)
2150 FOR J = 0 TO 1
2160 DW = LR * DH * X(P,J)
2170 W1(J,I) = W1(J,I) + DW + M * D1(J,I)
2180 D1(J,I) = DW
2190 NEXT J
2200 DW = LR * DH
2210 B1(I) = B1(I) + DW + M * DB1(I)
2220 DB1(I) = DW
2230 NEXT I
2240 RETURN
The one proverbial fly in the ointment is that it took about two hours for the network to be trained. The Python implementation? It runs to completion in about a second.
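For the curious, here is a minimal plain-Python sketch of the same network, mirroring the structure, learning rate, and momentum of the BASIC listing. This is a reconstruction for illustration, not necessarily the exact reference code the LLM produced:

import math
import random

# XOR training data
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

LR = 0.5   # learning rate
M = 0.9    # momentum

# weights and biases, initialized in -0.5 .. +0.5
w1 = [[random.random() - 0.5 for _ in range(2)] for _ in range(2)]  # input -> hidden
b1 = [random.random() - 0.5 for _ in range(2)]
w2 = [random.random() - 0.5 for _ in range(2)]                      # hidden -> output
b2 = random.random() - 0.5

# previous weight changes, for the momentum terms
d1 = [[0.0, 0.0], [0.0, 0.0]]
db1 = [0.0, 0.0]
d2 = [0.0, 0.0]
db2 = 0.0

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x):
    # hidden layer, then output, both with sigmoid activations
    h = [sigmoid(x[0] * w1[0][i] + x[1] * w1[1][i] + b1[i]) for i in range(2)]
    o = sigmoid(h[0] * w2[0] + h[1] * w2[1] + b2)
    return h, o

for epoch in range(1, 5001):
    err = 0.0
    for x, y in zip(X, Y):
        h, o = forward(x)
        err += abs(o - y)
        do = (y - o) * o * (1 - o)          # output delta (sigmoid derivative)
        for i in range(2):                  # hidden -> output weights
            dw = LR * do * h[i]
            w2[i] += dw + M * d2[i]
            d2[i] = dw
        dw = LR * do                        # output bias
        b2 += dw + M * db2
        db2 = dw
        for i in range(2):                  # input -> hidden weights
            dh = h[i] * (1 - h[i]) * do * w2[i]
            for j in range(2):
                dw = LR * dh * x[j]
                w1[j][i] += dw + M * d1[j][i]
                d1[j][i] = dw
            dw = LR * dh                    # hidden biases
            b1[i] += dw + M * db1[i]
            db1[i] = dw
    if err < 0.1:                           # same early stop as BASIC line 710
        break

for x, y in zip(X, Y):
    _, o = forward(x)
    print(x, "->", round(o), f"({o:.3f})")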
That’s very interesting, as it touches on a few things I am curious about too.
I wonder why the BASIC implementation is so slow. Of course I know computers of the era were not fast, and interpreted BASIC wasn’t either. Still, there seem to be 5000 iterations, each touching just a handful of neurons… more than a second per iteration! Perhaps I need to try it myself (and try this curious emulator as well).
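A quick back-of-envelope check based on the figures above: two hours is roughly 7200 seconds, so even if the loop ran all 5000 epochs, that is about 1.4 seconds per epoch, or some 0.36 seconds per training pattern. Each pattern walks through several dozen interpreted BASIC statements, including three EXP evaluations and a few dozen floating-point multiplications, so on a ~1 MHz 6502 running an interpreter, that kind of pace is plausible.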
Next, the task for the neural network looks a bit too simple, not in its simplicity itself but in the small number of distinct combinations of inputs and outputs. 5000 iterations seem a bit too many to train on so few combinations.
I remember conceiving a similar experiment, but in my case the NN was trained to predict the value of the sine function at X given the three values at X-3, X-2, and X-1. As I’m quite naive about the math of NNs, I trained it with a simple random search. Here the inputs can take any values in the range -1 … +1, but as far as I remember, training worked fine. Later I made a small problem based on this (link removed as unnecessary for my comment, since it “appears to be spam”).
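For illustration, here is a rough Python sketch of that kind of random-search training (as random hill-climbing: propose a random perturbation of the weights, keep it only if the error drops). This is a hypothetical reconstruction, not the commenter’s original code; the network shape (3 inputs, 4 tanh hidden units), step size, and iteration count are all assumptions:

import math
import random

def make_data(n=200):
    # predict sin(x) from the three preceding values sin(x-3), sin(x-2), sin(x-1)
    data = []
    for _ in range(n):
        x = random.uniform(0, 2 * math.pi)
        data.append(([math.sin(x - 3), math.sin(x - 2), math.sin(x - 1)], math.sin(x)))
    return data

def predict(p, inp):
    w1, b1, w2, b2 = p
    h = [math.tanh(sum(w * v for w, v in zip(w1[i], inp)) + b1[i]) for i in range(4)]
    return sum(w * v for w, v in zip(w2, h)) + b2

def loss(p, data):
    return sum((predict(p, inp) - t) ** 2 for inp, t in data) / len(data)

def random_params(s=1.0):
    return ([[random.uniform(-s, s) for _ in range(3)] for _ in range(4)],
            [random.uniform(-s, s) for _ in range(4)],
            [random.uniform(-s, s) for _ in range(4)],
            random.uniform(-s, s))

def perturb(p, step=0.1):
    w1, b1, w2, b2 = p
    return ([[w + random.uniform(-step, step) for w in row] for row in w1],
            [b + random.uniform(-step, step) for b in b1],
            [w + random.uniform(-step, step) for w in w2],
            b2 + random.uniform(-step, step))

data = make_data()
best = random_params()
best_loss = loss(best, data)
for _ in range(20000):
    cand = perturb(best)          # random step; keep it only if it helps
    l = loss(cand, data)
    if l < best_loss:
        best, best_loss = cand, l
print("mean squared error:", best_loss)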
But returning to simplicity: I guess a “neural network mimicking XOR” is a sample problem easily found all over the internet. Could we pick some other function for which, hopefully, the LLM is less likely to find a ready-made example?
I recently asked “deepseek” to compile assembly code (8051 syntax) into a hex file. To my surprise, it immediately produced output, even commenting richly on the process. Of course, my surprise subsided when I found the output to be quite buggy, as if glued together from pieces that looked remotely suitable. Lines have incorrect checksums or even lack them, and from some point the code completely diverges from the source. I asked it to fix the checksums; it said “yes, sure, ok” and produced the same output. I asked why it was the same, and got an answer like “sure, you were correct that it should be different, but I am correct to produce it the same”. Thus I grew more curious about cornering LLMs with task variations for which they seemingly are not able to correctly construct the output.
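For reference, the checksum in question is well defined: in an Intel HEX record, the final byte is the two’s complement of the low byte of the sum of all preceding record bytes (count, address, type, data), so the bytes of a valid record sum to zero modulo 256. A minimal Python checker (an illustration, not DeepSeek’s output):

def hexline_ok(record: str) -> bool:
    # record looks like ':LLAAAATT<data>CC' (count, address, type, data, checksum)
    if not record.startswith(":"):
        return False
    return sum(bytes.fromhex(record[1:])) % 256 == 0

print(hexline_ok(":00000001FF"))   # True: the standard EOF record
print(hexline_ok(":00000001FE"))   # False: checksum off by one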