且听疯吟

【build a computer】 project 11

2022-11-20

week 11

目前已经开始了build a computer系列的最后几章了,后面的章节感觉越来越难,中间无数次感觉头大,不想再写这么恶心的程序了,还好坚持了下来,最终完整的通过了测试。这个project是该系列课程里面最难的一个toy project了,总共写了差不多快2000行代码,这个还是用python写的,当然代码写的一团狗屎,一个函数都差不多几百行了,全部是cound,代码水平还是稀烂,需要好好的学习一下设计模式。刚开始还好,写着写着就变味了。需要认真的设计程序结构,则需要好好花心思将整个代码重新设计一下,将所有的结构全部重新理顺一遍,整个编译器的设计还是非常有意思的,通过这个简易的编译器,可以对stack machine的设计有了初步的了解,也对面向对象语言的原理有了基本的了解。
完成该project的资源非常少,最终在github上找到了一个资源jack-compiler,这个国外的小哥代码写得非常漂亮,值得仔细学习一下,还剩下最后一个project就终于可以完成全部的build a modern computer系列课程了,课程真的是太有趣了,最有意思的是该课程讲出了计算机模型的本质,非常的推荐。

probelm

Data Translation

变量处理。在整个jack程序中,变量类型一种分为四种:

  • static:静态变量,实际存方在static segment中。
  • argument: 函数调用过程中的参数,每次调用函数时,需要指明传递的参数的个数,实际存放在stack中。
  • field: 类的成员变量,实际存放在heap中。
  • local:函数里面的过程变量,实际存在stack中。
    在做实际的解析时,需要根据不同的类型,从而将相关的数据存放在不同的区域。

symbol table

符号表每一行主要存放如下:

  • varname:变量的名称,实际变量的名称比如 var int a = 0, 符号a即代表变量的名称。
  • vartype:变量的类型,实际变量的类型比如 var int a = 0, 符号int即代表变量的类型。
  • varkind:变量的种类,即static,argument,field,local四种的一种。
  • varidx:变量的索引,实际可以作为变量存放的偏移位置。

jack compiler中,编译器中的符号表分类两类:

  • class符号表: 主要存放类的成员变量的映射符号表,该符号表生命周期在编译过程中一直存在。
  • local符号表: 主要存放函数调用过程中的映射符号表,该符号表仅在函数编译过程中存在。每一个函数会重新生成新的符号表。

Handling Variables

所有的变量处理全部用符号表来处理,所有的变量定义和函数的形参全部需要传入到符号表中,在特定的调用时,查找符号表,即可快速的定位到变量的存储位置,从而实现变量的内存读写。

Handling Arrays

我觉整个jack语言最有趣的就是处理数组和对象了。在这里它就用到了c语言指针的模型,因为jack语言本身有两个寄存器专门做为间接寻址的寄存器,所以我们每次只需要将数组的地址写入到间接寻址的寄存器即可,原理看似复杂实际非常的简单,处理数组时我们专用pointer 1寄存器。

  • pop操作: 直接将栈顶的数据写入到特定的地址。
  • push操作:直接将地址中存放的变量读取到堆栈中即可。

Handling Objects

处理成员变量跟数据也是同样的原因,我们实际调用对象时,也只是调用对象的首地址而已,将首地址写入到寄存器中,然后根据变量的索引求出变量与首地址的偏移,然后通过间接寻址,即可得到类的成员变量。我觉得这个思想非常有趣,可以利用这个思想实现C语言的面向对象的编程。处理类时,在类的构造函数运行时,则需要为该类在heap上单独开辟空间存放类的成员变量。

Commands Translation

当然在本次的函数调用过程中并没有考虑到符号优先级的问题,只是简单的用DFS来处理表达式。比如如下处理:
1
我们采用如下简单的dfs算法来处理表达式:
1
如果理解上述核心的算法,处理就非常简单。这部分没啥好谈的。

Translating Flow Control

jack语言目前只支持ifwhile的过程操作,处理也非常简单,我们只需要用if-gotogoto来处理所有的过程即可。

  • 对于if的处理:
  • 对于while的处理:

总结

主要的处理如上所示,其余还有部分比较麻烦的细节处理:

  • 对于class的成员方法的调用: 在实际调用过程中则需要将class的首地址也传入进去,则此时函数的形参应比实际要多一个,第一个参数应将类的首地址传入进去。
  • 对于函数调用:在函数调用时,应将函数的参数按照顺序依次压入到栈中,当然对于绝大部分编译器来说,则此时函数参数都是倒序依次压入栈中的,每次求出参数的偏移则直接非常的简单偏移即可。
  • 对于所有的函数类型都需要返回一个值,void类型也需要返回一个任意值即可。
  • 对于乘法和出发运算,则需要调用系统自带的函数mutilydiv,对于字符串常量则需要调用系统函数string.new,同时将字符一个一个的压入到类中,同时返回字符串类string的首地址,因此对于字符串常量实际是存放在heap中的。

project

本周的project还是非常有难度的project,本周的project花了差不多两个星期才完成,基本上都是利用中午休息和晚上的时间完成了这个project,还是非常的有意思。代码也同步放在github上,虽然写的非常稀烂,掩面哭一下。
poj11.

代码

import fileinput
import sys, getopt
import os
from enum import Enum, unique
import sys
from JackTokenizer import JackTokenizer,TOKEN_TYPE,KEYWORD_TYPE,tokentype,tokendict
from SymbolTable import SymbolTable
from VmWriter import VmWriter,SEG_TYPE,OP_TYPE,opdict
from CompileEngine import CompileEngine

class CompileCodeWriter:
def __init__(self,infile):
# read all source code string to the buffer
self.parser = JackTokenizer(infile)
self.engile = CompileEngine(infile)
self.out = VmWriter(infile)
# class global symbol table
self.classTable = SymbolTable()
# sub routine symbol table
self.subTable = SymbolTable()
# vm writer
self.vmout = VmWriter(infile)
# class name
self.classname = ""
self.subname = ""
self.subkind = ""
self.subtype = ""
self.iflabelcnt = 0
self.whilelabelcnt = 0
self.writeClassCode()
self.out.close()

def ifLabel(self):
res = []
res.append("IF_END" + str(self.iflabelcnt))
res.append("IF_" + str(self.iflabelcnt))
self.iflabelcnt += 1
return res

def whileLabel(self):
res = []
res.append("WHILE_EXP" + str(self.whilelabelcnt))
res.append("WHILE_END" + str(self.whilelabelcnt))
self.whilelabelcnt += 1
return res

def functionLabel(self):
return self.classname + "." + self.subname

def writeClassCode(self):
self.classTable.startSubroutine()
# parse class
self.parser.advance()
# parse class name
self.classname = self.parser.currToken()
self.parser.advance()
#parse symbol '{'
self.parser.advance()
#parse class val des
while self.parser.tokenType() == TOKEN_TYPE.TOKEN_KEYWORD and \
(self.parser.keyWord() == "static" or self.parser.keyWord() == "field"):
self.writeClassVarDecCode()

#parse class method
while self.parser.tokenType() == TOKEN_TYPE.TOKEN_KEYWORD and \
(self.parser.keyWord() == "method" or \
self.parser.keyWord() == "constructor" or \
self.parser.keyWord() == "function"):
self.subkind = self.parser.keyWord()
self.writeSubroutineCode()
#parse symbol '{'
self.parser.advance()

return True

def writeClassVarDecCode(self):
varname = ""
vartype = ""
varkind = ""

# parse key word
varkind = self.parser.keyWord()
self.parser.advance()
# parse val type
vartype = self.parser.currToken()
self.parser.advance()
# parse val name
varname = self.parser.currToken()
self.parser.advance()
#add one var define to the symbol table
self.classTable.define(varname,vartype,varkind)
# parse the left val name
while not (self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and \
self.parser.symbol() == ";"):
# parse symbol ','
self.parser.advance()
# parse val name
varname = self.parser.currToken()
self.classTable.define(varname,vartype,varkind)
self.parser.advance()
# parse the end symbol ';'
self.parser.advance()

return True

def writeSubroutineCode(self):
fieldcnt = 0
self.subTable.startSubroutine()

# parse key word
self.subkind = self.parser.currToken()
self.parser.advance()
if self.subkind == "method":
self.subTable.define("this",self.classname,"argument")
# parse type
self.subtype = self.parser.currToken()
self.parser.advance()
# parse subroutineName
self.subname = self.parser.currToken()
self.parser.advance()
# parse '('
self.parser.advance()
# parse param list
self.writeParameterListCode()
# parse ')'
self.parser.advance()
# parse body
self.writeSubroutineBodyCode()

return True

def writeSubroutineBodyCode(self):
nlocals = 0
# parse '{'
self.parser.advance()
# parse var
while self.parser.tokenType() == TOKEN_TYPE.TOKEN_KEYWORD and \
self.parser.keyWord() == "var":
nlocals += self.writeVarDecCode()

# function define
funcname = self.classname + "." + self.subname
self.out.writeFunction(funcname,nlocals)
if self.subkind == "constructor":
fieldcnt = self.classTable.varCount("field")
self.out.writePush("constant",fieldcnt)
self.out.writeCall("Memory.alloc",1)
self.out.writePop("pointer",0)
elif self.subkind == "method":
self.out.writePush("argument",0)
self.out.writePop("pointer",0)

# parse statements
self.writeStatementsCode()
# parse '}'
self.parser.advance()
return True

def writeParameterListCode(self):
varname = ""
varkind = "argument"
vartype = ""
paramcnt = 0

# parse rest param
while not (self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and\
self.parser.symbol() == ")"):
# parse element type
vartype = self.parser.currToken()
self.parser.advance()
# parse element varName
varname = self.parser.currToken()
paramcnt += 1
self.parser.advance()
# add argument variable into the symbol table
self.subTable.define(varname,vartype,varkind)
# parse ','
if self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL:
if self.parser.symbol() == ",":
self.parser.advance()
elif self.parser.symbol() == ")":
break

return paramcnt

def writeVarDecCode(self):
nlocals = 1
varname = ""
varkind = "var"
vartype = ""

# parse key word
varkind = self.parser.keyWord()
self.parser.advance()
# parse var type
vartype = self.parser.currToken()
self.parser.advance()
# parse var name
varname = self.parser.currToken()
self.parser.advance()
self.subTable.define(varname,vartype,varkind)

# parse the rest var name
while not (self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and \
self.parser.symbol() == ";"):
self.parser.advance()
# parse var name
nlocals += 1
varname = self.parser.currToken()
self.subTable.define(varname,vartype,varkind)
self.parser.advance()

# parse the end symbol
self.parser.advance()
return nlocals

def writeStatementsCode(self):
while self.parser.tokenType() == TOKEN_TYPE.TOKEN_KEYWORD and \
(self.parser.keyWord() == "do" or \
self.parser.keyWord() == "if" or \
self.parser.keyWord() == "while" or \
self.parser.keyWord() == "let" or \
self.parser.keyWord() == "return"):
if self.parser.keyWord() == "do":
self.writeDoCode()
elif self.parser.keyWord() == "if":
self.writeIfCode()
elif self.parser.keyWord() == "while":
self.writeWhileCode()
elif self.parser.keyWord() == "let":
self.writeLetCode()
elif self.parser.keyWord() == "return":
self.writeReturnCode()
else:
print("valid statement define!\n")
exit(1)

return True

def writeDoCode(self):
funcname = []
argsCnt = 0
varname = ""
vartype = ""
varkind = ""
varidx = ""

# parse do
self.parser.advance()
# parse 'call name'
while not (self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and \
self.parser.symbol() == "("):
# write code
if self.parser.tokenType() != TOKEN_TYPE.TOKEN_SYMBOL:
funcname.append(self.parser.currToken())
self.parser.advance()

# parse '('
self.parser.advance()

# sub call
callname = ""
varname = funcname[0]
if len(funcname) == 1:
self.out.writePush("pointer",0)
callname = self.classname + "." + funcname[0]
argsCnt += 1
elif len(funcname) == 2:
if self.subTable.indexOf(varname) >= 0:
vartype = self.subTable.typeOf(varname)
varkind = self.subTable.kindOf(varname)
varidx = self.subTable.indexOf(varname)
callname = vartype + "." + funcname[1]
argsCnt += 1
if varkind == "field":
self.out.writePush("this",varidx)
elif varkind == "var":
self.out.writePush("local",varidx)
elif varkind == "static":
self.out.writePush("static",varidx)
elif varkind == "argument":
self.out.writePush("argument",varidx)
elif self.classTable.indexOf(varname) >= 0:
vartype = self.classTable.typeOf(varname)
varkind = self.classTable.kindOf(varname)
varidx = self.classTable.indexOf(varname)
callname = vartype + "." + funcname[1]
argsCnt += 1
if varkind == "field":
self.out.writePush("this",varidx)
elif varkind == "var":
self.out.writePush("local",varidx)
elif varkind == "static":
self.out.writePush("static",varidx)
elif varkind == "argument":
self.out.writePush("argument",varidx)
else:
callname = funcname[0] + "." + funcname[1]
# parse expression list
argsCnt += self.writeExpressionListCode()
# parse ')'
self.parser.advance()
# parse ';'
self.parser.advance()
# write call name
self.out.writeCall(callname,argsCnt)
self.out.writePop("temp",0)

return True

def writeLetCode(self):
varname = ""
vartype = ""
varkind = ""
varidx = ""
isArray = False
# parse let
self.parser.advance()

# parse varname
varname = self.parser.currToken()
# search the key
if self.subTable.indexOf(varname) >= 0:
vartype = self.subTable.typeOf(varname)
varkind = self.subTable.kindOf(varname)
varidx = self.subTable.indexOf(varname)
elif self.classTable.indexOf(varname) >= 0:
vartype = self.classTable.typeOf(varname)
varkind = self.classTable.kindOf(varname)
varidx = self.classTable.indexOf(varname)
else:
print("invalid var statement!")
return False

# skip varname
self.parser.advance()

# parse `[expression]`
if self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and \
self.parser.symbol() == '[':

# we push the head arrdress of the array
isArray = True
# parse '['
self.parser.advance()
# parse expression
self.writeExpressionCode()
# we parse the array
if varkind == "field":
self.out.writePush("this",varidx)
elif varkind == "var":
self.out.writePush("local",varidx)
elif varkind == "static":
self.out.writePush("static",varidx)
elif varkind == "argument":
self.out.writePush("argument",varidx)
self.out.writeArithmetic("+")
# parse ']'
self.parser.advance()

# parse '='
self.parser.advance()
# parse expression
self.writeExpressionCode()
# pop value from the stack
if isArray:
self.out.writePop("temp",0)
self.out.writePop("pointer",1)
self.out.writePush("temp",0)
self.out.writePop("that",0)
else:
if varkind == "field":
self.out.writePop("this",varidx)
elif varkind == "var":
self.out.writePop("local",varidx)
elif varkind == "static":
self.out.writePop("static",varidx)
elif varkind == "argument":
self.out.writePop("argument",varidx)
# parse ';'
self.parser.advance()
return True

def writeWhileCode(self):
label = self.whileLabel()

# parse return
self.parser.advance()
# parse '('
self.parser.advance()
# parse expression
self.out.writeLabel(label[0])
self.writeExpressionCode()
self.out.writeSigArithmetic("~")
self.out.writeIf(label[1])
# parse ')'
self.parser.advance()
# parse '{'
self.parser.advance()
# parse statements
self.writeStatementsCode()
self.out.writeGoto(label[0])
# parse '}'
self.out.writeLabel(label[1])
self.parser.advance()

return True

def writeReturnCode(self):
# parse return
self.parser.advance()
# parse expression list
if not (self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and \
self.parser.symbol() == ';'):
self.writeExpressionCode()
# parse ';'
self.parser.advance()
if self.subtype == "void":
self.out.writePush("constant",0)
self.out.writeReturn()

return True

def writeIfCode(self):
label = self.ifLabel()

# parse if
self.parser.advance()
# parse '('
self.parser.advance()
# parse expression
self.writeExpressionCode()
self.out.writeSigArithmetic("~")
# parse ')'
self.parser.advance()
# parse '{'
self.parser.advance()
# parse statements
self.out.writeIf(label[1])
self.writeStatementsCode()
self.out.writeGoto(label[0])
# parse '}'
self.parser.advance()

# parse else
self.out.writeLabel(label[1])
if self.parser.tokenType() == TOKEN_TYPE.TOKEN_KEYWORD and \
self.parser.keyWord() == "else":

# parse 'else'
self.parser.advance()
# parse '{'
self.parser.advance()
# parse statements
self.writeStatementsCode()
# parse '}'
self.parser.advance()
self.out.writeLabel(label[0])

return True

def writeExpressionCode(self):
# parse term
self.writeTermCode()

# parse op
while self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and \
(self.parser.symbol() == "+" or self.parser.symbol() == "-" or \
self.parser.symbol() == "*" or self.parser.symbol() == "/" or \
self.parser.symbol() == "&" or self.parser.symbol() == "|" or \
self.parser.symbol() == ">" or self.parser.symbol() == "<" or \
self.parser.symbol() == "="):
# parse op
op = self.parser.symbol()
self.parser.advance()
# parse term
self.writeTermCode()
# code write op
if op == "*":
self.out.writeCall("Math.multiply",2)
elif op == '/':
self.out.writeCall("Math.divide",2)
else:
self.out.writeArithmetic(op)
return True

def writeTermCode(self):
varname = ""
vartype = ""
varkind = ""
varidx = 0
paramCnt = 0

if self.parser.tokenType() == TOKEN_TYPE.TOKEN_INT_CONST:
# push constant x
self.out.writePush("constant",self.parser.intVal())
#print(self.parser.intVal())
self.parser.advance()
elif self.parser.tokenType() == TOKEN_TYPE.TOKEN_STRING_CONST:
# push string x
val = self.parser.stringVal()
self.out.writePush("constant",len(val))
self.out.writeCall("String.new",1)
for c in val:
self.out.writePush("constant",ord(c))
self.out.writeCall("String.appendChar",2)
self.parser.advance()
elif self.parser.tokenType() == TOKEN_TYPE.TOKEN_KEYWORD:
# parse keword const
if self.parser.keyWord() == "true" or self.parser.keyWord() == "false" or \
self.parser.keyWord() == "null" or self.parser.keyWord() == "this":
if self.parser.keyWord() == "true":
self.out.writePush("constant",0)
self.out.writeSigArithmetic("~")
elif self.parser.keyWord() == "false":
self.out.writePush("constant",0)
elif self.parser.keyWord() == "null":
self.out.writePush("constant",0)
elif self.parser.keyWord() == "this":
self.out.writePush("pointer",0)
self.parser.advance()
elif self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL:
if self.parser.symbol() == "(":
# parse '('
self.parser.advance()
# parse expression
self.writeExpressionCode()
# parse ')'
self.parser.advance()
elif self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and \
(self.parser.symbol() == "-" or self.parser.symbol() == "~"):
# parse unaryOp
op = self.parser.symbol()
self.parser.advance()
# parse term
self.writeTermCode()
self.out.writeSigArithmetic(op)
elif self.parser.tokenType() == TOKEN_TYPE.TOKEN_IDENTIFIER:
# skip varname
termname = self.parser.currToken()
self.parser.advance()

# we push the head arrdress of the array
# parse expression
if self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and self.parser.symbol() == "[":
# parse subroutineName or varName
if self.subTable.indexOf(termname) >= 0:
vartype = self.subTable.typeOf(termname)
varkind = self.subTable.kindOf(termname)
varidx = self.subTable.indexOf(termname)
elif self.classTable.indexOf(termname) >= 0:
vartype = self.classTable.typeOf(termname)
varkind = self.classTable.kindOf(termname)
varidx = self.classTable.indexOf(termname)
else:
print("invalid var statement!")
return False

# parse '['
self.parser.advance()
# parse expression
self.writeExpressionCode()
# parse var type
if varkind == "field":
self.out.writePush("this",varidx)
elif varkind == "var":
self.out.writePush("local",varidx)
elif varkind == "static":
self.out.writePush("static",varidx)
elif varkind == "argument":
self.out.writePush("argument",varidx)
# add
self.out.writeArithmetic("+")
self.out.writePop("pointer",1)
self.out.writePush("that",0)

# parse ']'
self.parser.advance()
# parse subcall
elif self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and self.parser.symbol() == ".":
# parse '.'
self.parser.advance()

# parse subroutineName
termfunc = self.parser.currToken()
self.parser.advance()

# parse call name
if self.subTable.indexOf(termname) >= 0:
vartype = self.subTable.typeOf(termname)
varkind = self.subTable.kindOf(termname)
varidx = self.subTable.indexOf(termname)
paramCnt += 1
if varkind == "field":
self.out.writePush("this",varidx)
elif varkind == "var":
self.out.writePush("local",varidx)
elif varkind == "static":
self.out.writePush("static",varidx)
elif varkind == "argument":
self.out.writePush("argument",varidx)
elif self.classTable.indexOf(termname) >= 0:
vartype = self.classTable.typeOf(termname)
varkind = self.classTable.kindOf(termname)
varidx = self.classTable.indexOf(termname)
paramCnt += 1
if varkind == "field":
self.out.writePush("this",varidx)
elif varkind == "var":
self.out.writePush("local",varidx)
elif varkind == "static":
self.out.writePush("static",varidx)
elif varkind == "argument":
self.out.writePush("argument",varidx)
else:
vartype = termname

# parse '('
self.parser.advance()
# parse expressList
paramCnt += self.writeExpressionListCode()
# parse ')'
self.parser.advance()
# write var code
self.out.writeCall(vartype + "." + termfunc,paramCnt)
# parse call
elif self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL and self.parser.symbol() == "(":
# parse '('
self.parser.advance()
# parse expressList
self.out.writePush("pointer",0)
paramCnt = self.writeExpressionListCode() + 1
# parse ')'
self.parser.advance()
# write var code
self.out.writeCall(self.classname + "." + termfunc,paramCnt)
# print(self.classname + "." + termfunc + str(paramCnt))
else:
# parse subroutineName or varName
if self.subTable.indexOf(termname) >= 0:
vartype = self.subTable.typeOf(termname)
varkind = self.subTable.kindOf(termname)
varidx = self.subTable.indexOf(termname)
elif self.classTable.indexOf(termname) >= 0:
vartype = self.classTable.typeOf(termname)
varkind = self.classTable.kindOf(termname)
varidx = self.classTable.indexOf(termname)
else:
print("invalid var statement!")
return False

if varkind == "field":
self.out.writePush("this",varidx)
elif varkind == "var":
self.out.writePush("local",varidx)
elif varkind == "static":
self.out.writePush("static",varidx)
elif varkind == "argument":
self.out.writePush("argument",varidx)

return True

def writeExpressionListCode(self):
argsCnt = 0

if self.parser.symbol() == ')' and \
self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL:
return argsCnt

# parse expression
argsCnt += 1
self.writeExpressionCode()

# parse `, expression`
while self.parser.symbol() == ',' and \
self.parser.tokenType() == TOKEN_TYPE.TOKEN_SYMBOL:
# parse ','
self.parser.advance()
# parse expression
argsCnt += 1
self.writeExpressionCode()

return argsCnt

def main(input):
if os.path.exists(input):
if os.path.isdir(input):
files = os.listdir(input)
for f in files:
filename = input+f
if filename.find(".jack") >= 0:
CompileCodeWriter(input+f)
else:
CompileCodeWriter(input)
else:
print("invalid path")

if __name__ == "__main__":
main(sys.argv[1])

欢迎关注和打赏,感谢支持!

扫描二维码,分享此文章