A friend of mine challenged me. After I told him how I had been able to implement some decent neural network solutions with the help of LLMs, he asked: could the LLM write a neural network example in Commodore 64 BASIC?
You betcha.
Well, it took a few attempts. There were some syntax issues and some oversimplifications, so eventually I had the idea of asking the LLM to write the example in Python first and then use that as a reference implementation for the C64 version. That went well. Here’s the result:
As this screenshot shows, the program was able to learn the behavior of an XOR gate: the simplest problem that requires a hidden layer of perceptrons and, as such, a precursor of modern “deep learning” solutions.
I was able to run this test on Krisztián Tóth’s (no relation) excellent C64 emulator, which has the distinguishing feature of reliable copy-paste, making it possible to enter long BASIC programs without having to retype them or somehow transfer them to a VIC-1541 floppy image first.
In any case, this is the program that resulted from my little collaboration with the Claude 3.7 Sonnet language model:
10 REM NEURAL NETWORK FOR XOR PROBLEM
20 REM BASED ON WORKING PYTHON IMPLEMENTATION
100 REM INITIALIZE VARIABLES
110 DIM X(3,1) : REM INPUT PATTERNS
120 DIM Y(3) : REM EXPECTED OUTPUTS
130 DIM W1(1,1) : REM WEIGHTS: INPUT TO HIDDEN
140 DIM B1(1) : REM BIAS: HIDDEN LAYER
150 DIM W2(1) : REM WEIGHTS: HIDDEN TO OUTPUT
160 DIM H(1) : REM HIDDEN LAYER OUTPUTS
170 DIM D1(1,1) : REM PREVIOUS DELTA FOR W1
180 DIM B2 : REM BIAS: OUTPUT LAYER
190 DIM D2(1) : REM PREVIOUS DELTA FOR W2
200 DIM DB1(1) : REM PREVIOUS DELTA FOR B1
210 DB2 = 0 : REM PREVIOUS DELTA FOR B2
220 LR = 0.5 : REM LEARNING RATE
230 M = 0.9 : REM MOMENTUM
300 REM SETUP TRAINING DATA (XOR PROBLEM)
310 X(0,0)=0 : X(0,1)=0 : Y(0)=0
320 X(1,0)=0 : X(1,1)=1 : Y(1)=1
330 X(2,0)=1 : X(2,1)=0 : Y(2)=1
340 X(3,0)=1 : X(3,1)=1 : Y(3)=0
400 REM INITIALIZE WEIGHTS RANDOMLY
410 FOR I=0 TO 1
420 FOR J=0 TO 1
430 W1(I,J) = RND(1)-0.5
440 NEXT J
450 B1(I) = RND(1)-0.5
460 W2(I) = RND(1)-0.5
470 NEXT I
480 B2 = RND(1)-0.5
510 REM INITIALIZE MOMENTUM TERMS TO ZERO
520 FOR I=0 TO 1
530 FOR J=0 TO 1
540 D1(I,J) = 0
550 NEXT J
560 D2(I) = 0
570 DB1(I) = 0
580 NEXT I
590 DB2 = 0
600 REM TRAINING LOOP
610 PRINT "TRAINING NEURAL NETWORK..."
620 PRINT "EP","ER"
630 FOR E = 1 TO 5000
640 ER = 0
650 FOR P = 0 TO 3
660 GOSUB 1000 : REM FORWARD PASS
670 GOSUB 2000 : REM BACKWARD PASS
680 ER = ER + ABS(O-Y(P))
690 NEXT P
700 IF (E/10) = INT(E/10) THEN PRINT E,ER
710 IF ER < 0.1 THEN E = 5000
720 NEXT E
800 REM TEST NETWORK
810 PRINT "TESTING NETWORK:"
820 FOR P = 0 TO 3
830 GOSUB 1000 : REM FORWARD PASS
840 PRINT X(P,0);X(P,1);"->"; INT(O+0.5);" (";O;")"
850 NEXT P
860 END
1000 REM FORWARD PASS SUBROUTINE
1010 REM CALCULATE HIDDEN LAYER
1020 FOR I = 0 TO 1
1030 S = 0
1040 FOR J = 0 TO 1
1050 S = S + X(P,J) * W1(J,I)
1060 NEXT J
1070 S = S + B1(I)
1080 H(I) = 1/(1+EXP(-S))
1090 NEXT I
1100 REM CALCULATE OUTPUT
1110 S = 0
1120 FOR I = 0 TO 1
1130 S = S + H(I) * W2(I)
1140 NEXT I
1150 S = S + B2
1160 O = 1/(1+EXP(-S))
1170 RETURN
2000 REM BACKWARD PASS SUBROUTINE
2010 REM OUTPUT LAYER ERROR
2020 DO = (Y(P)-O) * O * (1-O)
2030 REM UPDATE OUTPUT WEIGHTS WITH MOMENTUM
2040 FOR I = 0 TO 1
2050 DW = LR * DO * H(I)
2060 W2(I) = W2(I) + DW + M * D2(I)
2070 D2(I) = DW
2080 NEXT I
2090 DW = LR * DO
2100 B2 = B2 + DW + M * DB2
2110 DB2 = DW
2120 REM HIDDEN LAYER ERROR AND WEIGHT UPDATE
2130 FOR I = 0 TO 1
2140 DH = H(I) * (1-H(I)) * DO * W2(I)
2150 FOR J = 0 TO 1
2160 DW = LR * DH * X(P,J)
2170 W1(J,I) = W1(J,I) + DW + M * D1(J,I)
2180 D1(J,I) = DW
2190 NEXT J
2200 DW = LR * DH
2210 B1(I) = B1(I) + DW + M * DB1(I)
2220 DB1(I) = DW
2230 NEXT I
2240 RETURN
The one proverbial fly in the ointment is that it took about two hours for the network to be trained. The Python implementation? It runs to completion in about a second.
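For reference, a Python equivalent of the BASIC listing looks roughly like this. It is a sketch of the same 2-2-1 sigmoid network with momentum, reconstructed from the BASIC code rather than the exact program the model wrote for me:

import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# XOR training data
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

LR, M = 0.5, 0.9   # learning rate and momentum, as in the BASIC version

# weights and biases, initialized in -0.5..0.5 like the BASIC program
w1 = [[random.random() - 0.5 for _ in range(2)] for _ in range(2)]  # input -> hidden
b1 = [random.random() - 0.5 for _ in range(2)]
w2 = [random.random() - 0.5 for _ in range(2)]                      # hidden -> output
b2 = random.random() - 0.5

# previous weight deltas for the momentum term
d1 = [[0.0, 0.0], [0.0, 0.0]]
db1 = [0.0, 0.0]
d2 = [0.0, 0.0]
db2 = 0.0

def forward(x):
    h = [sigmoid(x[0] * w1[0][i] + x[1] * w1[1][i] + b1[i]) for i in range(2)]
    o = sigmoid(h[0] * w2[0] + h[1] * w2[1] + b2)
    return h, o

for epoch in range(1, 5001):
    err = 0.0
    for x, y in zip(X, Y):
        h, o = forward(x)
        err += abs(o - y)
        # output layer gradient and weight update with momentum
        do = (y - o) * o * (1 - o)
        for i in range(2):
            dw = LR * do * h[i]
            w2[i] += dw + M * d2[i]
            d2[i] = dw
        dw = LR * do
        b2 += dw + M * db2
        db2 = dw
        # hidden layer gradients and updates (same order as the BASIC code)
        for i in range(2):
            dh = h[i] * (1 - h[i]) * do * w2[i]
            for j in range(2):
                dw = LR * dh * x[j]
                w1[j][i] += dw + M * d1[j][i]
                d1[j][i] = dw
            dw = LR * dh
            b1[i] += dw + M * db1[i]
            db1[i] = dw
    if epoch % 500 == 0:
        print(epoch, err)
    if err < 0.1:
        break

for x, y in zip(X, Y):
    _, o = forward(x)
    print(x, "->", round(o), f"({o:.3f})")

Even in plain interpreted Python with no NumPy, this finishes in about a second on a modern machine, which is where the two-hours-versus-one-second comparison comes from.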
That’s very interesting, as it touches on a few things I’m curious about too.
I wonder why the BASIC implementation is so slow. Of course I know computers of the era were not fast, and interpreted BASIC wasn’t either. Still, there seem to be 5,000 iterations, each of them touching just a handful of neurons… more than a second per iteration! Perhaps I need to try it myself (and also try this curious emulator).
Next, the task for the neural network looks a bit too simple – not in its simplicity as such, but in the number of possible combinations of inputs and outputs. 5,000 iterations seem a bit too many for training on so few combinations.
I remember conceiving a similar experiment, but in my case the NN was to be trained to predict the value of the sine function at X, given the three values at X-3, X-2 and X-1. As I’m quite naive about the math of NNs, I trained it with a simple random search. Here the inputs may take any values in the range -1 … +1, but as far as I remember the training worked fine. Later I made a small problem based on this (link removed as unnecessary for my comment, since with it the comment “appears to be spam”).
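Roughly, the idea was something like this – just a sketch of the approach, not my original code, and details such as the network size and the tanh activation are my guesses here:

import math
import random

def net(w, inp):
    # w packs the weights: 12 input->hidden, 4 hidden biases,
    # 4 hidden->output, 1 output bias (21 numbers in total)
    out = w[20]
    for i in range(4):
        s = w[12 + i]
        for j in range(3):
            s += w[i * 3 + j] * inp[j]
        out += w[16 + i] * math.tanh(s)
    return out

# training data: predict sin(x) from sin(x-3), sin(x-2), sin(x-1)
data = []
for x in range(3, 150):
    data.append(([math.sin(x - 3), math.sin(x - 2), math.sin(x - 1)], math.sin(x)))

def error(w):
    return sum((net(w, inp) - target) ** 2 for inp, target in data)

# naive random search: perturb the weights, keep the change only if the error drops
w = [random.uniform(-1, 1) for _ in range(21)]
best = error(w)
for step in range(5000):
    trial = [wi + random.gauss(0, 0.1) for wi in w]
    e = error(trial)
    if e < best:
        w, best = trial, e

print("final sum of squared errors:", best)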
But returning to simplicity – I guess “neural network mimicking XOR” is a sample problem easily found all over the internet. Could we pick some other function for which, hopefully, the LLM is less likely to find a ready-made example?
I recently asked “deepseek” to compile assembly code (8051 syntax) into a hex file. To my surprise, it immediately produced output, even commenting richly about the process. Of course my surprise subsided when I found the output quite buggy, as if glued together from pieces that only remotely looked suitable. Lines had incorrect checksums or even lacked them, and from some point the code diverged completely from the source. I asked it to fix the checksums – it said “yes, sure, ok” and produced the same output. I asked why it was the same, and got an answer like “sure, you were correct that it should be different, but I am correct to produce it the same”. Thus I grew more curious about cornering LLMs with task variations for which they seemingly are not able to construct correct output.
You might be underestimating just how slow 8-bit processors are. The 6510 in the C64 had a 1 MHz clock, and its fastest instructions took two clock cycles, some 3 or 4 as I recall. So at most maybe 300,000 or 400,000 machine instructions a second. And it had no floating-point hardware, so floating-point operations, even a simple multiplication, took many hundreds of clock cycles; division, a lot more. So at most, a few hundred floating-point operations per second.

And then there’s the BASIC interpreter: BASIC instructions were tokenized, but otherwise the program was read character by character, interpreted and executed. So do not expect, in the end, more than a few dozen BASIC instructions executed per second… which is consistent with the fact that in this example, each iteration took a couple of seconds, so 3,000+ iterations took a couple of hours.

A hand-optimized machine language implementation could have sped that up by at least an order of magnitude; replacing floating-point weights and biases with 8-bit integers might have sped things up by as much as two orders of magnitude. So a hand-optimized, machine-language version could have executed in seconds. But not the BASIC version.
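If you want the back-of-the-envelope arithmetic spelled out, it is roughly this (the cycle counts here are my rough assumptions, not measurements of the actual ROM routines):

clock = 1_000_000                 # 6510 clock: 1 MHz, i.e. cycles per second
cycles_per_instruction = 2.5      # fastest instructions take 2 cycles, some 3 or 4
print(clock / cycles_per_instruction)        # ~400,000 machine instructions per second

cycles_per_float_op = 2_500       # assumed cost of a software floating-point multiply
print(clock / cycles_per_float_op)           # ~400 floating-point operations per second

seconds_per_iteration = 2         # observed: a couple of seconds per training epoch
print(3000 * seconds_per_iteration / 3600)   # ~1.7 hours for 3,000+ epochs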
The significance of XOR has to do with a 1969 paper by Minsky and Papert who proved that a single-layer network of perceptrons cannot model the XOR operation. Many took it as an indication that neural networks are inherently limited, and this paper is often mentioned as one of the reasons why machine learning was held back by years if not decades. With a hidden layer, the neural network can learn XOR just fine, but this was missed, not so much by Minsky and Papert but by those who read and interpreted their paper.
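To make it concrete, here is one hand-picked set of weights for which a two-unit hidden layer computes XOR exactly, using step activations for clarity (this is an illustration of why a hidden layer suffices, not what the trained network above necessarily converges to):

# hidden unit 1 computes OR, hidden unit 2 computes NAND,
# and the output unit computes AND of the two -- which is XOR
def step(s):
    return 1 if s > 0 else 0

def xor_net(a, b):
    h1 = step(a + b - 0.5)        # OR(a, b)
    h2 = step(-a - b + 1.5)       # NAND(a, b)
    return step(h1 + h2 - 1.5)    # AND(h1, h2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))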
Language models are NOT reasoning engines or machine language compilers. They’ll happily oblige when you ask them, but they’ll likely produce garbage. Value them for their breadth of knowledge and their ability to understand concepts in context; do not expect them to be meticulous algorithmic engines. That, they are not.