GNAT-Bootstrap-Compiler

Language and platform choice for a GNAT bootstrap compiler

(1) By root on 2022-12-10 22:17:32 [link] [source]

Dear all,

I think it is time for us to choose a primary programming language and platform/implementation with which to build the GNAT bootstrapping compiler.

We are following the live-bootstrap approach to creating a fully bootstrappable system. I recommend that everybody take a look at their current bootstrap path to get an idea of how that is done and how the problem has been approached. However, here is a quick summary:

  • No binaries are allowed. The only exception is at the very beginning, where a Linux kernel is needed and the first payload is a handwritten binary system that drives the entire chain.
  • No pregenerated files are allowed. Only human-written and human-checked code counts as bootstrappable. This means that GNAT is not bootstrappable, as we always need a previous GNAT to compile another version. This is true all the way back to the first version.
  • The programs have to be reproducible. This means that they always produce the exact same output given the same input, all the way down to the last bit.
  • If you take a look at the path, you will see that we start with a Scheme/C compiler called Mes + MesCC. Then we move to the Tiny C Compiler TCC and we can compile the Musl libc.
  • From that point on, we race to compile GCC 4.7, the last GCC version that can be compiled with just a C compiler.
  • The final program is GNU Guile which serves as the base for the GNU Guix OS.
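
The reproducibility rule above can be checked mechanically. Here is a minimal sketch in Python, assuming a build step can be modelled as a function from input bytes to output bytes. The names `is_reproducible`, `stable_build` and `stamped_build` are made up for illustration and are not part of live-bootstrap:

```python
import hashlib
import itertools


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(build, source: bytes, runs: int = 3) -> bool:
    """Run `build` several times on the same input and compare the digests.

    `build` is a hypothetical stand-in for any step in the chain
    (a compiler, an assembler, an archiver, ...).
    """
    digests = {sha256_hex(build(source)) for _ in range(runs)}
    return len(digests) == 1


def stable_build(source: bytes) -> bytes:
    """Toy build step whose output depends only on its input."""
    return source.upper()


_build_counter = itertools.count()


def stamped_build(source: bytes) -> bytes:
    """Toy build step that leaks hidden state (like a timestamp) into its output."""
    return source + str(next(_build_counter)).encode("ascii")


if __name__ == "__main__":
    print(is_reproducible(stable_build, b"procedure Hello is"))
    print(is_reproducible(stamped_build, b"procedure Hello is"))
```

The second toy step fails the check because each run embeds a different counter value, which is the same kind of problem that embedded timestamps or build paths cause in real toolchains.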

So... Where does the GNAT bootstrap compiler fit in this crazy world? Our current wish is to have a working bootstrap compiler before GCC 4.7 is compiled, so that we could add Ada support to it. If that is not possible, we would target GCC 4.0, which is also compiled before GCC 4.7.

How do we want to achieve this? The current options in the bootstrapping path would be:

  • To write a compiler in C, which could be compiled with TCC.
  • To write a compiler in Perl, which could be interpreted by Perl.
  • To write a compiler in Scheme using Mes, which is a very basic Scheme implementation (MesCC is the C compiler that runs on top of it).
  • To write a compiler using an intermediate step. This could be:
    • Write the compiler using GNU Guile. This would give us a ton of control and libraries, but it would be all the way after the main bootstrap process and after GCC 4.7 is compiled.
    • Write the compiler in a small Scheme implementation early in the bootstrapping path. More on this later.
    • Write the compiler in a small language early in the bootstrapping path. I have already carried out an early analysis on the matter. The choices would be Lua or Jim's TCL. However, this seems unlikely.
    • To use another bootstrapped system. This would be most likely after the entire bootstrapping path has already taken place. An example of this would be Camlboot.

With some preliminary analysis and discussions over at #ada, #bootstrappable and the systematic analysis of Scheme implementations, it seems that the two most likely solutions are:

However I would like to know your opinion on the topic. I will make a second post with my own point of view.

(2) By root on 2022-12-10 22:39:00 in reply to 1 [link] [source]

Welp, here is my personal opinion.

I have to start by saying I am biased towards TR7, but I will explain why later.

My take on the matter is that we should choose a Scheme interpreter, as Lisp/Scheme is a language family that is designed to create DSLs and compilers. There is a large body of literature and the community is very knowledgeable and robust.

This would leave us with four different options:

  • Mes + MesCC. This option would be ideal in the sense that we could create the bootstrap extremely early in the chain. However, this would take place in a very limited environment and in a run-down Scheme implementation. I do not think the benefits outweigh the costs.
  • Chibi Scheme. Chibi is a complete R7RS small and Red Scheme implementation. It can also compile Scheme code to C code and generate binaries, which would let us easily ship final binaries and make the compiler easier to use. Chibi also has a nice community and good documentation. However, I see two main issues with it:
    • If we start using its libraries, our life would be much easier, but then the compiler would be fully tied to it.
    • The C source code is bigger than TR7's and is less documented. Also, compiling Chibi is not trivial and requires GNU Make. This is not a bad thing however.
  • TR7 Scheme. It is a new, small Scheme interpreter written in a single C file (with a header). It is R7RS small Scheme compliant. I like it for a few reasons. The code is quite readable and it keeps improving quite a bit. It is smaller than Chibi, by about 10kSLOC and it seems to have better source code documentation. Performance seems to be slightly better than Chibi's. However, it also has a few drawbacks:
    • New and fresh out of the oven. It is not so well-known, it does not have a community backing it (single developer mostly) and there may be a few hidden bugs.
    • It is just an interpreter, so it cannot produce binaries.
    • It lacks libraries. This is bad if we wanted to use them, however, if we want to keep the code portable, this is not so terrible.
  • GNU Guile. It is a large and very performant R5RS, R6RS and R7RS Scheme implementation. It has a very large community backing it and it is the target of the bootstrapping project. However, there are a few potential issues for us:
    • It is the final objective in the bootstrapping path, so we would not be able to directly target GCC 4.7 or GCC 4.0.
    • It is very large, about 300kSLOC (compared to 80kSLOC of Chibi and 17kSLOC of TR7).
    • It is its own Scheme dialect, so if we use its tools and idioms, the code may not be portable to other implementations.

My final conclusion would be that we should use TR7 and try to keep the code usable with Chibi and Otus Lisp. That way, we could jump ship if we feel the need. This could also help us during development. We could also use third-party libraries that work on most Scheme implementations for our needs.
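
Keeping the code usable on more than one implementation is easier if the same program is routinely run on all of them and the outputs are compared. Here is a rough sketch of such a check in Python. The function names are hypothetical, and the interpreter commands mentioned in the comments (`tr7i`, `chibi-scheme`, `ol`) and the file `compiler.scm` are assumptions about a local setup, not something this thread specifies:

```python
import subprocess
import sys


def collect_outputs(commands, timeout=60):
    """Run each command (name -> argv list) and capture its standard output.

    Raises subprocess.CalledProcessError if any command exits with an error.
    """
    outputs = {}
    for name, argv in commands.items():
        result = subprocess.run(
            argv, capture_output=True, timeout=timeout, check=True
        )
        outputs[name] = result.stdout
    return outputs


def all_agree(outputs):
    """True when every implementation produced byte-identical output."""
    return len(set(outputs.values())) == 1


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere. In a real setup these
    # might look like ["tr7i", "compiler.scm"] and
    # ["chibi-scheme", "compiler.scm"] (assumed names, adjust as needed).
    commands = {
        "impl-a": [sys.executable, "-c", "print(6 * 7)"],
        "impl-b": [sys.executable, "-c", "print(42)"],
    }
    outputs = collect_outputs(commands)
    print("all implementations agree:", all_agree(outputs))
```

Comparing raw bytes is deliberately strict: any difference in printing or rounding between implementations shows up immediately instead of surfacing late in the bootstrap.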

So, how do you see this?

(3) By Andrius Štikonas (stikonas) on 2022-12-10 22:51:06 in reply to 2 [source]

Guile is not the final objective in the bootstrapping path.

At the moment it is close to the end, but that's mostly because the live-bootstrap maintainers didn't have enough time to add more stuff.

Guile is an intermediate goal, as it helps a bit when building GCC: it allows rebuilding some stuff with autogen, which depends on Guile. Without Guile we can't use the top-level configure script if we want to avoid pregenerated files.

We plan to build newer versions of GCC and we can always rebuild GCC 4.7.4 again if that is useful.

(10) By root on 2022-12-29 21:33:01 in reply to 3 [link] [source]

Thanks for the feedback Stikonas. I was not aware that autogen used Guile! I am thankful for your work. Hopefully we will be able to add to it in the mid/far future :)

(4) By theruran on 2022-12-27 02:29:32 in reply to 2 [link] [source]

It is worth noting that Ada-Ed is already in Guix. It is an interpreter of Ada 83, and while I have not reviewed the source in detail, given that it was designed as an educational tool, I would assume that the C source code is relatively easy to understand and (maybe) modify. It's possible this interpreter was extended to originally bootstrap GNAT, but we really don't know anything beyond the fact that some unreleased or proprietary compiler/interpreter was used. GNAT is mostly written in Ada 95 as far as we know, with the exception of a few Unicode characters in some source files that are technically Ada 2005. Our reasoning for not extending Ada-Ed to bootstrap GNAT seems to come down to a bias against C: 1) we don't want to write C, 2) C can be difficult to audit depending on the style and software architecture (even though there should be more auditors available overall), and 3) writing more C is contributing to the problems that Ada addresses. (@root can let me know if that is accurate.) So, I wouldn't rule it out completely unless we discover something unworkable about it. Perhaps someone else is more interested in taking this route, and having another way to bootstrap GNAT would be beneficial.

So, between Chibi, TR7, and Guile, I think we have good reason to drop Guile from the list. It is a huge codebase that seems to be under heavy development. It'd be nice to gather more concrete evidence on this point, but consider the recent changes from Guile 2 to 3. I have also heard talk of replacing more of its core, including possibly its garbage collector. So while I appreciate it is used widely in the community, from a bootstrapping point-of-view it is not ideal because it is hard to audit, even if it is bit-for-bit reproducible.

Between Chibi and TR7 it's hard for me to decide. I really like that TR7 is one source file, less than half the C source size of Chibi, and that the developer is so responsive. That last point makes me feel optimistic about any troubles we may run into with it. That it's not widely known or used may or may not be a disadvantage: one thing I can say is that it is probably not in any distro package repositories while Chibi is; however, it is easy to build, so building it manually or adding a port would be straightforward. Not depending on Make puts it earlier in the bootstrap path, and that is another advantage. I was able to build it and run it without issue using my Gentoo Hardened toolchain and with musl-gcc using -O3 -march=native and -flto. TR7 seems to fully support R7RS macros (and records), so I am not sure why you have it labeled "in the works" under macro support. I just tested some code from the R7RS-small spec, and I can see it has tests defined for these things.

Now I don't see what advantages Chibi has over TR7 other than its wider usage among the Scheme community and built-in packages. Since TR7 is R7RS-conforming, it should be able to use other SRFIs not already included. Well, for what it's worth, I was able to import (chibi match) by first reading the .sld and .scm files explicitly (I could not get TR7_LIB_PATH to work ???) and run a few tests from the documentation, although the record matching seems to be broken. I am not having immediate success with (chibi parse), which depends on SRFI-14, but this should work as well. Irvise - maybe you will have better luck. These two libraries jumped out at me from the Snow library list as immediately applicable.

As I mentioned, I'll be taking a decision support course next semester, and I may be able to make this into a project. Probably don't wait on that to get started, though. Anyway, this kind of decision is a common problem in software projects.

(6) By theruran on 2022-12-27 02:55:18 in reply to 4 [link] [source]

I forgot to add that speed is not much of an issue here, since gnat-bootstrap will only be used to bootstrap GNAT, and not used to compile arbitrary Ada packages. Memory usage is relevant though, since even though older/slower computers may be able to wait a long time on the CPU, they are limited by RAM available.

(8) By root on 2022-12-29 20:50:17 in reply to 6 [link] [source]

I agree. Speed is always good, but memory usage takes a much higher priority in our case. This is something we will need to take a look at.

(9) By root on 2022-12-29 21:30:52 in reply to 4 [link] [source]

I agree with your assessment of Ada-Ed. I would add one concern: the era in which it was built. I do not know how much effort would be required to make it work with modern tools and environments...

I also agree with your Guile comments.

Regarding Chibi and TR7. Since having a minimal codebase is a strong principle, I do prefer TR7 because of that. The developer's attitude is also very positive in my opinion. I am not saying that Chibi's community is bad however! TR7 is fairly new and that is why it is not packaged anywhere except Arch Linux (AUR). However, I can say that the Makefile is GNU-Make and BSD-Make compatible so that it can be easily packaged even in *BSDs. And since it is really only two/three C files, compiling it manually is extremely easy! Plus, being packaged is irrelevant for us, as we would be using Make or Kaem for its build.

It is true that it lacks quite a few modules. However, as you already saw, they can easily be imported if they are portable. Not all modules are portable, though, as they may require functionality that is not present in base R7RS-small or may depend on a low-level implementation of some tool (see green threads). Still, I am more than open to the possibility of including external Scheme code that would make our lives easier.

Even if I am biased towards TR7, Chibi does have some big advantages. The first one is that we can use it as a second testing playground. That in itself is useful during development, although it is not necessary in the final deployment/build. Another big advantage is its ability to compile Scheme code to C and produce self-contained binaries. Chibi can also consume C code. For us that may be an important feature, as GNAT requires C in order to compile Ada: some Ada values need to be taken from the POSIX/system variables that are normally only found in C code, so a GNAT bootstrap compiler should be able to do the same. However, we would already have a C compiler that we could leverage to do that job...

I think we should also test the memory requirements of both implementations in the future and potentially reconsider our decision.
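
For the memory comparison, one simple option on Unix-like systems is to run each interpreter on the same workload and read the peak resident set size the kernel reports for that child process. Here is a minimal sketch in Python, assuming a Unix host. `peak_rss` is a hypothetical helper name, and the interpreter commands in the comment are assumptions:

```python
import os
import subprocess
import sys


def peak_rss(argv):
    """Run `argv` to completion and return (exit_code, peak_rss).

    Unix-only, since it relies on os.wait4. The ru_maxrss field is reported
    in KiB on Linux and in bytes on macOS, so only compare numbers that were
    measured on the same system.
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # wait4 reaps this specific child and returns its resource usage.
    _pid, status, usage = os.wait4(proc.pid, 0)
    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        exit_code = -os.WTERMSIG(status)
    proc.returncode = exit_code  # tell Popen the child was already reaped
    return exit_code, usage.ru_maxrss


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere. A real comparison might
    # use ["tr7i", "bench.scm"] and ["chibi-scheme", "bench.scm"]
    # (assumed command and file names).
    code, peak = peak_rss([sys.executable, "-c", "data = list(range(10**6))"])
    print("exit code:", code, "peak RSS:", peak)
```

Peak RSS includes the interpreter's own startup footprint, so it is worth measuring an empty program as a baseline alongside the real workload.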

Regarding TR7 macros, the info I wrote may be outdated, as TR7 has advanced quite a lot in the past few months.

I would also like to try to import some external libraries into TR7 and see how it fares.

And now Otus Lisp is slowly becoming bootstrappable (see the recent development), so we may want to consider it as an alternative or as a secondary/tertiary implementation.

Best, Fer

(5) By theruran on 2022-12-27 02:32:34 in reply to 1 [link] [source]

For $FINDANAME what's wrong with gnat-bootstrap? I prefer descriptive names when possible.

(7) By root on 2022-12-29 20:49:30 in reply to 5 [link] [source]

Changed :)

I have used hyphens instead of underscores since this will be a Scheme project, not an Ada one >:D

(11) By root on 2024-02-19 19:42:27 in reply to 1 [link] [source]

I am leaving this just as a reference that may come in handy: x86 compiler written in Scheme