GitHub - segault/hacky: Hack Assembler (EoCS)

Hacky

just a lil' assembler for the Hack assembly language from the book The Elements of Computing Systems

it's not implemented exactly the way the book lays it out, but the book also intends for it to be done in an OOP language like Java, and this one is written in C

work in progress (:

SYNOPSIS

Since I'm writing this, I may as well explain a little of what you'd be looking at if you followed the link. All the modules of the assembler live in 'src/'. This is an assembler that turns Hack (the made-up assembly language from the book) into binary; think of it as a compiler, but simpler. The 3 core modules are the parser (parser.c), the encoder (code.c), and the symbol table (table.c). First, though: the assembler module is essentially the main loop. It pulls the strings and defines the two stages of the assembler (more on that later). The parser takes a structure that is part of the assembler's current state and decides what the values in that structure are. The encoder is responsible for giving the writer module (the thing that writes to the output file) the binary values of the parsed structure. The table module keeps track of symbols, like labels (places to jump to in the program) and constants, and the addresses of those symbols (a program address for labels, a RAM address for constants).

With the core modules out of the way: the info module just implements a tiny function that lets me enable or disable debug printing. I think it's cool and a quality-of-life improvement. The args module does the command line argument parsing and defines the global args structure. The util(ities) module defines some small string parsing and printing functions; notably, that's where I clean comments out of the code. 'def.h' holds the definitions that most of the files need, like the enums (labels such as ACMD and CCMD) that determine how the assembler and encoder treat parsed values.
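To give an idea of what "cleaning comments" means, here is a minimal sketch. strip_comment() is a made-up name for illustration and is not necessarily how the util module does it; it assumes comments always start with "//" and run to the end of the line.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not the actual util code: cuts the line at the
 * first "//" and then trims trailing whitespace, all in place. */
void strip_comment(char *line)
{
    char *comment = strstr(line, "//");
    if (comment != NULL)
        *comment = '\0';

    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';
}
```

Given "D=M+1 // set D to i+1", this leaves just "D=M+1" in the buffer; a line that is only a comment becomes an empty string, which the caller can skip.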

essentially what i've done is break lines like

@10
D=M+1 // set D to i+1 (i is at address 10)
@LABEL
0;JMP

into structures like

{ACMD, 10}
{CCMD, 'D', "M+1", null}
{LCMD, <addr of 'LABEL'>}
{CCMD, null, "0", JMP}

note: ACMDs, or a-commands, load values into the A(ddress) register in the CPU. CCMDs, or c-commands, are computations (they do math and determine when to jump), and LCMDs are just label declarations. def.h defines one more enum, VARR, because the update_state() function in parser.c can't access the symbol table: when it sees a symbol, it can only mark the command as a variable reference/declaration (this is a misnomer, it's a constant; should fix that), and the assembler module then adds it to the symbol table.

these are then pretty easy to write to a file, because the enums make for easy control flow, and the encoder can translate those strings into binary. it can also write them as hex, which is implemented because the ROM in Logisim only takes hex.

the update_state() function in parser.c doesn't handle everything: the table is updated in the assembler/main module, which is annoying imo. it's bad practice, since too much ends up being handled in one function.

the two stages of the assembler are as follows: stage one, user_symbols, fills the symbol table with label declarations (places to jump to later in the program). stage two does all the actual parsing and encoding, and if it finds a symbol that isn't in the symbol table, it assumes it's a new constant being declared and adds it to the table (while also replacing the constant's name with its declared value).
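In C terms, the idea looks roughly like the sketch below. Only the enum names (ACMD, CCMD, LCMD, VARR) come from this README; the struct layout, the field sizes, and the parse_line() function are assumptions made up for illustration, standing in for what update_state() does. It expects a line that has already had comments and whitespace removed, and it assumes the standard Hack syntax where a label declaration is written as (LABEL).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Enum names are from the README; everything else here is assumed. */
enum cmd_type { ACMD, CCMD, LCMD, VARR };

struct command {
    enum cmd_type type;
    int  value;       /* ACMD: the number to load into A */
    char symbol[64];  /* LCMD / VARR: the symbol's name */
    char dest[4];     /* CCMD: e.g. "D", empty if absent */
    char comp[8];     /* CCMD: e.g. "M+1" */
    char jump[4];     /* CCMD: e.g. "JMP", empty if absent */
};

/* Bounded copy of n characters that always NUL-terminates. */
static void copy_n(char *dst, size_t cap, const char *src, size_t n)
{
    if (n >= cap)
        n = cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Hypothetical stand-in for update_state(): classifies one cleaned line.
 * It never touches the symbol table; symbols are only marked as VARR. */
void parse_line(const char *line, struct command *cmd)
{
    memset(cmd, 0, sizeof *cmd);

    if (line[0] == '(') {                        /* (LABEL) */
        cmd->type = LCMD;
        copy_n(cmd->symbol, sizeof cmd->symbol,
               line + 1, strcspn(line + 1, ")"));
    } else if (line[0] == '@') {
        if (isdigit((unsigned char)line[1])) {   /* @10 */
            cmd->type = ACMD;
            cmd->value = atoi(line + 1);
        } else {                                 /* @name */
            cmd->type = VARR;
            copy_n(cmd->symbol, sizeof cmd->symbol,
                   line + 1, strlen(line + 1));
        }
    } else {                                     /* dest=comp;jump */
        const char *eq   = strchr(line, '=');
        const char *comp = eq ? eq + 1 : line;
        const char *semi = strchr(comp, ';');

        cmd->type = CCMD;
        if (eq)
            copy_n(cmd->dest, sizeof cmd->dest, line, (size_t)(eq - line));
        copy_n(cmd->comp, sizeof cmd->comp, comp,
               semi ? (size_t)(semi - comp) : strlen(comp));
        if (semi)
            copy_n(cmd->jump, sizeof cmd->jump, semi + 1, strlen(semi + 1));
    }
}
```

With this sketch, "D=M+1" ends up as a CCMD with dest "D", comp "M+1" and an empty jump, while "0;JMP" ends up with an empty dest, comp "0" and jump "JMP". The caller (the assembler module, in this project) is the one that looks VARR symbols up in the table.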
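For the curious, here is a rough sketch of what the encoding step boils down to. The bit patterns follow the Hack machine language spec (an a-command is a 0 followed by a 15-bit value, a c-command is 111 a cccccc ddd jjj); the function names encode_a(), encode_c() and format_word() are invented for this example and are not the ones in code.c, and the uppercase 4-digit hex format is just an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hack comp mnemonics and their "a cccccc" bits, per the Hack spec. */
static const struct { const char *name, *bits; } COMP[] = {
    {"0",   "0101010"}, {"1",   "0111111"}, {"-1",  "0111010"},
    {"D",   "0001100"}, {"A",   "0110000"}, {"M",   "1110000"},
    {"!D",  "0001101"}, {"!A",  "0110001"}, {"!M",  "1110001"},
    {"-D",  "0001111"}, {"-A",  "0110011"}, {"-M",  "1110011"},
    {"D+1", "0011111"}, {"A+1", "0110111"}, {"M+1", "1110111"},
    {"D-1", "0001110"}, {"A-1", "0110010"}, {"M-1", "1110010"},
    {"D+A", "0000010"}, {"D+M", "1000010"},
    {"D-A", "0010011"}, {"D-M", "1010011"},
    {"A-D", "0000111"}, {"M-D", "1000111"},
    {"D&A", "0000000"}, {"D&M", "1000000"},
    {"D|A", "0010101"}, {"D|M", "1010101"},
};

/* Index in this array is the 3-bit jump code; "" means no jump. */
static const char *JUMP[] = {"", "JGT", "JEQ", "JGE",
                             "JLT", "JNE", "JLE", "JMP"};

/* A-command: top bit 0, then the 15-bit value. */
unsigned encode_a(unsigned value)
{
    return value & 0x7FFFu;
}

/* C-command: 111 a cccccc ddd jjj. Returns 0 for an unknown comp or jump
 * (0 can never be a valid c-command, since those start with 111). */
unsigned encode_c(const char *dest, const char *comp, const char *jump)
{
    unsigned word = 0xE000u;
    size_t i;

    for (i = 0; i < sizeof COMP / sizeof COMP[0]; i++)
        if (strcmp(comp, COMP[i].name) == 0)
            break;
    if (i == sizeof COMP / sizeof COMP[0])
        return 0;
    word |= (unsigned)strtoul(COMP[i].bits, NULL, 2) << 6;

    if (strchr(dest, 'A')) word |= 1u << 5;
    if (strchr(dest, 'D')) word |= 1u << 4;
    if (strchr(dest, 'M')) word |= 1u << 3;

    for (i = 0; i < 8; i++)
        if (strcmp(jump, JUMP[i]) == 0)
            return word | (unsigned)i;
    return 0;
}

/* Writes the 16-bit word into out as 16 binary digits or 4 hex digits. */
void format_word(unsigned word, int as_hex, char out[17])
{
    if (as_hex) {
        snprintf(out, 17, "%04X", word & 0xFFFFu);
        return;
    }
    for (int bit = 15; bit >= 0; bit--)
        *out++ = ((word >> bit) & 1u) ? '1' : '0';
    *out = '\0';
}
```

So "D=M+1" comes out as 1111110111010000 in binary, and "0;JMP" as EA87 in hex. A lookup table keeps the encoder dumb on purpose: all the decisions were already made when the line was parsed.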
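The two stages can be sketched like this. None of these names (struct table, first_pass(), resolve(), and so on) are taken from the actual source; the fixed-size linear table is the simplest thing that works for an example, the predefined Hack symbols (SP, SCREEN, KBD, ...) are left out, and starting new symbols at RAM address 16 is what the Hack spec does, which I'm assuming applies here too.

```c
#include <string.h>

#define MAX_SYMBOLS 256

/* Hypothetical symbol table: a flat array searched linearly. */
struct symbol { char name[64]; int addr; };
struct table  { struct symbol syms[MAX_SYMBOLS]; int count; int next_ram; };

void table_init(struct table *t)
{
    t->count = 0;
    t->next_ram = 16;   /* assumed: first free RAM slot, as in the Hack spec */
}

/* Returns the symbol's address, or -1 if it is not in the table. */
int table_find(const struct table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return t->syms[i].addr;
    return -1;
}

/* Returns 0 on success, -1 if the table is full. */
int table_add(struct table *t, const char *name, int addr)
{
    if (t->count >= MAX_SYMBOLS)
        return -1;
    struct symbol *s = &t->syms[t->count++];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->addr = addr;
    return 0;
}

/* Stage one: record every (LABEL) with the address of the next real
 * instruction. Label lines themselves take up no space in ROM. */
void first_pass(struct table *t, const char *lines[], int n)
{
    int rom = 0;
    for (int i = 0; i < n; i++) {
        if (lines[i][0] == '(') {
            char name[64];
            size_t len = strcspn(lines[i] + 1, ")");
            if (len >= sizeof name)
                len = sizeof name - 1;
            memcpy(name, lines[i] + 1, len);
            name[len] = '\0';
            table_add(t, name, rom);
        } else {
            rom++;
        }
    }
}

/* Stage two, per symbol: known symbols resolve to their address, unknown
 * ones are treated as new declarations and get the next RAM slot. */
int resolve(struct table *t, const char *name)
{
    int addr = table_find(t, name);
    if (addr < 0) {
        addr = t->next_ram++;
        table_add(t, name, addr);
    }
    return addr;
}
```

Running first_pass() over {"@i", "M=1", "(LOOP)", "@LOOP", "0;JMP"} puts LOOP at address 2; after that, resolve() returns 2 for "LOOP" and hands out 16 for "i" the first time it is seen. Doing labels in a separate first stage is what lets a jump refer to a label declared further down the file.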
