caasih.net

Build Your Own Haskell Compiler #2

為了讓大家可以簡單地寫 desugar function , C 使用 Template Haskell ,以便編譯時自動產生程式碼,輸出給 Haskell.Src.Exts 用的 desugar function template ,於是做 if 的 desugar 時,只要寫:

-- 注意:這是最早的版本,沒有使用帶更多資訊的 `Haskell.Src.Exts.Annotated`
deIfExp (If exp1 exp2 exp3) = deIfExp $
	Case exp1
  	[ Alt (SrcLoc "" 0 0) (PApp (UnQual (Ident "False")) []) (UnGuardedRhs exp3) Nothing
    , Alt (SrcLoc "" 0 0) (PApp (UnQual (Ident "True")) []) (UnGuardedRhs exp2) Nothing
    ]

省掉手寫處理其他 Syntax 的三百多行。

但身為一個連 Monoid, Functor, Applicative 都還搞不清楚的笨蛋,怎麼可能一下子就懂 TH (Template Haskell)在搞什麼呢?

所以來拆解 7e37b79 吧!看看 TH 到底是怎麼幫助我們的 XD

文末附上的是只處理 Haskell.Src.Exts.Syntax.Module 的精簡版,沒有 7e37b79 那麼複雜,以下一點一點說明:

deriveDesugarTemplate

程式的進入點是 deriveDesugarTemplate :: String -> Q [Dec] ,吃的 String 是之後用來生 desugar template 的 function 名稱,吐出來的 Q [Dec]$() 裡面會變成程式,在 Main.hs 中這樣寫:

-- 前面有些 code

$(deriveDesugarTemplate "xx")

-- 後面有些 code

之後就有個叫 xx 的 function 可以用。寫 xx "DeIf" "deIf" 就會生出想要的三百多行程式。

deriveDesugarTemplate :: String -> Q [Dec]
deriveDesugarTemplate funName = do
  Just moduleName <- lookupTypeName "Module"

  moduleInfo <- reify moduleName
  runIO $ putStrLn $ show moduleInfo

  let code = genDataTransformer "deIf" moduleInfo
  runIO $ putStrLn $ code

  fmap pure $ funD (mkName funName) [clause (map (varP . mkName) ["modName", "funPrefix"]) (normalB (varE (mkName "undefined"))) []]

一開始的:

Just moduleName <- lookupTypeName "Module"

先從看得見的地方(在此例中是 Main.hs 和 DeriveDesugarTemplate import 的 modules),找出叫做 "Module"Name ,再靠:

moduleInfo <- reify moduleName

把 Module 的形狀拆出來。

runIO 則可以幫我們在 Q 中印出 moduleInfo ,看看接下來該怎麼寫。

Language.Haskell.Src.Exts.Syntax.Module

Module

Module SrcLoc ModuleName [ModulePragma] (Maybe WarningText) (Maybe [ExportSpec]) [ImportDecl] [Decl]

在 TH 中長這個樣子:

TyConI
  ( DataD
    [] -- Cxt
    Language.Haskell.Exts.Syntax.Module -- Name
    [] -- [TyVarBndr]
    [ NormalC -- [Con]
      Language.Haskell.Exts.Syntax.Module
      [ (NotStrict, ConT Language.Haskell.Exts.SrcLoc.SrcLoc) -- SrcLoc
      , (NotStrict, ConT Language.Haskell.Exts.Syntax.ModuleName) -- ModuleName
      , (NotStrict, AppT ListT (ConT Language.Haskell.Exts.Syntax.ModulePragma)) -- [ModulePragma]
      , (NotStrict, AppT (ConT GHC.Base.Maybe) (ConT Language.Haskell.Exts.Syntax.WarningText)) -- Maybe WarningText
      , (NotStrict, AppT (ConT GHC.Base.Maybe) (AppT ListT (ConT Language.Haskell.Exts.Syntax.ExportSpec))) -- Maybe [ExportSpec]
      , (NotStrict, AppT ListT (ConT Language.Haskell.Exts.Syntax.ImportDecl)) -- [ImportDecl]
      , (NotStrict, AppT ListT (ConT Language.Haskell.Exts.Syntax.Decl)) -- [Decl]
      ]
    ]
    [] -- [Name]
  )

我們關心的,主要是 [Con] 中那些 Types , genDataTransformer 會幫忙產生處理 Module 的 function 。

genDataTransformer

genDataTransformer 吃新函數的 prefix 和 Module 在 TH 中的形狀。可以拆成兩部分看,先看後半用 LambdaCase 寫的部分:

\case
  TyConI (DataD [] (Name (OccName tyNameStr) _) [] cons _) ->
    let
      typeSig = funPrefix ++ tyNameStr ++ " :: " ++ tyNameStr ++ " -> " ++ tyNameStr ++ "\n"
    in
      concat $ typeSig : map (conToDef tyNameStr) cons
  others -> error $ "genDataTransformer " ++ show others ++ " not implemented"

在 Module 的例子中, typeSig 會變成 "deIfModule :: Module -> Module\n" ,而下面的 map (conToDef tyNameStr) cons 生出來的就是 function body 。

conToDef tyNameStr (NormalC (Name (OccName conNameStr) _) slots) =
  let
    varNames = varNamesForNormalSlots slots
  in
    funPrefix ++ tyNameStr ++
      " (" ++ intercalate " " (conNameStr : varNames) ++ ") = " ++
      intercalate " " (conNameStr : zipWith (\varNameStr (_, ty) -> exprFromType varNameStr funPrefix ty) varNames slots) ++
      "\n"

其中:

funPrefix ++ tyNameStr ++
  " (" ++ intercalate " " (conNameStr : varNames) ++ ")"

會生出:

deIfModule (Module srcLoc moduleName modulePragma warningText exportSpec importDecl decl)

也就是說, varNamesForNormalSlots 會把 slots 收集起來,變成可以 pattern match 的樣子。

intercalate 在這裡做的事情只是在 name 之間插入空白,再接成 String 。


intercalate " " (conNameStr : zipWith (\varNameStr (_, ty) -> exprFromType varNameStr funPrefix ty) varNames slots)

會生出:

Module (deIfSrcLoc srcLoc) (deIfModuleName moduleName) (fmap (deIfModulePragma) modulePragma) (fmap (deIfWarningText) warningText) (fmap (fmap (deIfExportSpec)) exportSpec) (fmap (deIfImportDecl) importDecl) (fmap (deIfDecl) decl)

exprFromType 生出了遞迴遍歷各個 slot 需要寫的 code 。

varNamesForNormalSlots

finalNames = go M.empty rawNames where
  go nameCount (nameStr : others) =
    if S.member nameStr dupNames then
      let
        nameCount' = M.insertWith (+) nameStr 1 nameCount
        serial = nameCount' M.! nameStr
      in
        (nameStr ++ show serial) : go nameCount' others
    else
      nameStr : go nameCount others
  go _ _ = []

這段做的事情是,找出重複的名字,並在名字後面加上數字。這麼一來, Exp Exp Exp 就不會變成 exp exp exp ,而是可以區分彼此的 exp1 exp2 exp3 。先看現在的 nameStr 在不在放著重複名字的 dupNames 中,如果在的話,就用加上數字的名字(nameStr ++ show serial),不然用普通的名字就好(nameStr)。


dupNames :: S.Set String
dupNames = go S.empty S.empty rawNames where
  go res seen (nameStr : others) =
    if S.member nameStr seen then
      go (S.insert nameStr res) seen others
    else
      go res (S.insert nameStr seen) others
  go res _ _ = res

dupNames 也很好玩,一開始有兩個空的 Set ,前面的用來放重複的名字,後面的用來放看過的名字。從 rawNames 中把名字一個一個拿出來看,如果這個名字已經看過了(S.member nameStr seen),就塞到 res 中;如果沒看過,就塞到 seen 中。


rawNames 又是怎麼來的?靠 varNameFromType 從 slots 裡面拆出來的:

rawNames = map (\(_, ty) -> case varNameFromType ty of Name (OccName str) _ -> lowerHead str) slots

拆出來後把字首小寫(lowerHead)。

varNameFromType

varNameFromType :: Type -> Name
varNameFromType (ConT name) = name
varNameFromType (all @ (AppT f x)) = case maybeVarNameFromType all of
  Just o -> o
  Nothing -> varNameFromType x
varNameFromType others = error $ "varNameFromType " ++ show others ++ " not implemented"

拆 Name 出來時,有意思的是那個 maybeVarNameFromType 。會濾掉包裝或基本的 Type :

trivialTypes :: S.Set String
trivialTypes = S.fromList ["String", "Maybe", "Int", "Rational", "Char", "Integer", "Bool"]

做法是:

maybeVarNameFromType :: Type -> Maybe Name
maybevarNameFromType (ConT (name @ (Name (OccName nameStr) _)))
  | S.member nameStr trivialTypes = Nothing
  | otherwise = Just name
maybeVarNameFromType (AppT f x) = maybeVarNameFromType f <|> maybeVarNameFromType x
maybeVarNameFromType _ = Nothing

這樣就不用特地把 Maybe WarnText 寫成 mWarnText ,只要寫成 warnText 就好。該如何正確地處理它們,則交給 exprFromType

exprFromType 和 transExprFromType

exprFromType :: String -> String -> Type -> String
exprFromType name funPrefix ty = "(" ++ transExprFromType funPrefix ty ++ " " ++ name ++ ")"

exprFromType 套上了最外面的 () 與目標物(name)。重點在:

transExprFromType :: String -> Type -> String
transExprFromType funPrefix = genExpr where
  genExpr = \case
    ConT (Name (OccName name) _)
      | S.member name trivialTypes -> "id"
      | otherwise -> funPrefix ++ name
    AppT ListT x -> "fmap (" ++ genExpr x ++ ")"
    AppT (ConT (Name (OccName "Maybe") _)) x -> "fmap (" ++ genExpr x ++ ")"
    AppT (AppT (TupleT 2) a) b -> "(" ++ genExpr a ++ ") *** (" ++ genExpr b ++ ")"
    others -> error $ "exprFromType " ++ show others ++ " not implemented"

這邊才真的看每個 Type 的形狀,決定該不該加 fmap ,該加幾層。

那個 ***Control.Arrow 裡的東西,又提醒我,得把看到一半的文章看完,這樣才有機會搞懂 FRP 。

簡化過的 DerivieDesugarTemplate.hs 全文

module DeriveTemplate where

import Data.Functor
import Data.List
import Data.Char
import Data.Monoid
import Control.Applicative
import Control.Monad
import qualified Data.Set as S
import qualified Data.Map as M
import Language.Haskell.TH
import Language.Haskell.TH.Syntax

trivialTypes :: S.Set String
trivialTypes = S.fromList ["String", "Maybe", "Int", "Rational", "Char", "Integer", "Bool"]

deriveDesugarTemplate :: String -> Q [Dec]
deriveDesugarTemplate funName = do
  Just moduleName <- lookupTypeName "Module"

  moduleInfo <- reify moduleName
  runIO $ putStrLn $ show moduleInfo

  let code = genDataTransformer "deIf" moduleInfo
  runIO $ putStrLn $ code

  fmap pure $ funD (mkName funName) [clause (map (varP . mkName) ["modName", "funPrefix"]) (normalB (varE (mkName "undefined"))) []]

maybeVarNameFromType :: Type -> Maybe Name
maybevarNameFromType (ConT (name @ (Name (OccName nameStr) _)))
  | S.member nameStr trivialTypes = Nothing
  | otherwise = Just name
maybeVarNameFromType (AppT f x) = maybeVarNameFromType f <|> maybeVarNameFromType x
maybeVarNameFromType _ = Nothing

varNameFromType :: Type -> Name
varNameFromType (ConT name) = name
varNameFromType (all @ (AppT f x)) = case maybeVarNameFromType all of
  Just o -> o
  Nothing -> varNameFromType x
varNameFromType others = error $ "varNameFromType " ++ show others ++ " not implemented"

varNamesForNormalSlots :: [StrictType] -> [String]
varNamesForNormalSlots slots =
  let
    dupNames :: S.Set String
    dupNames = go S.empty S.empty rawNames where
      go res seen (nameStr : others) =
        if S.member nameStr seen then
          go (S.insert nameStr res) seen others
        else
          go res (S.insert nameStr seen) others
      go res _ _ = res

    rawNames = map (\(_, ty) -> case varNameFromType ty of Name (OccName str) _ -> lowerHead str) slots

    finalNames = go M.empty rawNames where
      go nameCount (nameStr : others) =
        if S.member nameStr dupNames then
          let
            nameCount' = M.insertWith (+) nameStr 1 nameCount
            serial = nameCount' M.! nameStr
          in
            (nameStr ++ show serial) : go nameCount' others
        else
          nameStr : go nameCount others
      go _ _ = []
  in
    finalNames

transExprFromType :: String -> Type -> String
transExprFromType funPrefix = genExpr where
  genExpr = \case
    ConT (Name (OccName name) _)
      | S.member name trivialTypes -> "id"
      | otherwise -> funPrefix ++ name
    AppT ListT x -> "fmap (" ++ genExpr x ++ ")"
    AppT (ConT (Name (OccName "Maybe") _)) x -> "fmap (" ++ genExpr x ++ ")"
    AppT (AppT (TupleT 2) a) b -> "(" ++ genExpr a ++ ") *** (" ++ genExpr b ++ ")"
    others -> error $ "exprFromType " ++ show others ++ " not implemented"

exprFromType :: String -> String -> Type -> String
exprFromType name funPrefix ty = "(" ++ transExprFromType funPrefix ty ++ " " ++ name ++ ")"

lowerHead :: String -> String
lowerHead (c:cs) = toLower c : cs

genDataTransformer :: String -> Info -> String
genDataTransformer funPrefix =
  let
    conToDef tyNameStr (NormalC (Name (OccName conNameStr) _) slots) =
      let
        varNames = varNamesForNormalSlots slots
      in
        funPrefix ++ tyNameStr ++
          " (" ++ intercalate " " (conNameStr : varNames) ++ ") = " ++
          intercalate " " (conNameStr : zipWith (\varNameStr (_, ty) -> exprFromType varNameStr funPrefix ty) varNames slots) ++
          "\n"
  in
    \case
      TyConI (DataD [] (Name (OccName tyNameStr) _) [] cons _) ->
        let
          typeSig = funPrefix ++ tyNameStr ++ " :: " ++ tyNameStr ++ " -> " ++ tyNameStr ++ "\n"
        in
          concat $ typeSig : map (conToDef tyNameStr) cons
      others -> error $ "genDataTransformer " ++ show others ++ " not implemented"

讀到這裡,認為有機會靠 TH 寫出簡潔的 pretty print ,但還是不確定從何下手才好。

順其自然吧。

Isaac Huang 發佈於