On Sun, 5 Feb 2023 at 04:56, Klemens Morgenstern via Boost
The formal review of the mustache starts today, Sunday the 5th, and ends next week on Tuesday the 14th.
Hi all, This is my review of Boost.Mustache. First of all, thanks Peter for submitting the library, and Klemens for volunteering as review manager.
- Does this library bring real benefit to C++ developers for real world use-case?
This is a utility library, and as such I do believe it has a place in Boost. Although Mustache is not super-popular, it comes up in certain contexts. I think having a robust Mustache engine, compatible with Describe and JSON, brings benefits to users.
- Do you have an application for this library?
Not right now. I'm thinking of writing a PoC pentesting software which may require templating HTTP requests with a list of payloads. If that happens to be the case, I'd use this library for that.
- Does the API match with current best practices?
It has a very small API surface, which I am really glad to see. It uses JSON as input format for data, which has both advantages (it's simple) and disadvantages (it doesn't allow to implement lambdas, it's not that extensible, and in its current form, it requires a lot of copying). I value simplicity a lot, so I'm alright with using JSON. However, I think the last point should be solved somehow. Template data is purely read-only, and efficiency should be a goal in a library like this. For these reasons, an API that allows the user to pass input data with minimal copies should exist. The partials argument doesn't seem to have the right type, as it must be a mapping from string to string, and the API declares it as a JSON object. If I'm not mistaken, by using JSON you're forcing the usage of UTF-8 for the input document. I think this is perfectly fine, but it should be documented. It may be worth exploring the option of providing a renderer::reset method that allows to recycle the same renderer, improving memory re-use. Is there any use case for passing a different output_ref to renderer::render_some and renderer::finish? If there isn't, output_ref could be passed to renderer's constructor.
- Did you try to use it? What problems or surprises did you encounter?
I've built the library, the tests and the examples with b2, Linux, gcc11 and with cmake, Linux, clang15 and libc++ with no problems. I had to add the example manually to the CMakeLists.txt, as cmake doesn't currently build the examples (the author is already aware of that). I've tried to break the library in every possible way I've imagined (especially with buffer overruns and XSS payloads) and I have failed, which is good. As described in previous emails, I found a couple of possible escaping-related problems that can constitute vulnerabilities. These are described in https://github.com/pdimov/mustache/issues/6 and should be easy to fix.
- What is your evaluation of the implementation?
I've navigated through most of it using a debugger, and I think it's fine. It seemed to me that it performed a lot of copies, so I performed some measurements. I measured the number and size of allocations performed through the storage_ptr's and I found it higher than expected. Input data being converted into json::value is one source for such allocations, and should be solved by the view-based API. The renderer class also performs a lot of copying while rendering. These allocations, however, seem to be avoidable without changing the interface. I've raised https://github.com/pdimov/mustache/issues/7 to address it.
- Is the documentation helpful and clear?
I've found reading the source code more helpful. This is what I would add: * A discussion section, similar to what URL does. I would include these sections: * Introductory usage example. Without Boost.Describe (as plain JSON is easier to understand for a newcomer). I would include some explanations on the Mustache features being used. I don't think a full section on Mustache is necessary, but saying "this is a tag", "this is a partial" would be useful. * Using the library with user-defined types - with Describe now. * Using renderer to render with render_some. * Using string types as output. * A section on escaping (the magic word "security" may help here). All other libraries have a bunch of open issues by users not understanding that escaping happens by default. Also, I would describe what "HTML escape" means in this context (the spec doesn't say it and different libs do different things). * A proper reference section, with links and human-readable explanations. Taking output_ref as an example, I would like to see: * A URL for output_ref, so you can reference it from other places in docs. * A human-readable explanation (e.g. "a polymorphic reference to either a string or a I/O stream that can be used to output a rendered Mustache template"). * Explanations of what St and Os mean (ideally, pointing to its own section). * When you reference other functions/classes (e.g. renderer in the render() description), please do provide links to them. Otherwise, the user ends up having to CTRL+F it, which is extremely frustrating (real story from mp11). * Doxygen comments (or similar). Pressing F12 and seeing comments on the function you're using helps. * Comparison to other libraries. Specially to the one listed by the official Mustache page, even if it seems to be unmaintained. * The documentation should state the exact version of the Mustache API that it's implementing. It would be good to briefly describe what "mandatory features" are in the spec, what optional features aren't implemented but are in the roadmap, and which ones won't happen. The spec is very difficult to read as a user, so don't expect people will read it.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I've dedicated around 10h to this review. I've read the docs, the Mustache discussion and spec, reviewed the Node library, built and run the examples, tried to break them, navigated through the impl via debugger and done the allocation study.
- Are you knowledgeable about the problem domain?
I briefly met Mustache when working with a swagger-codegen project (https://github.com/swagger-api/swagger-codegen). Given an OpenAPI spec, swagger-codegen generates stub code for you in different languages. I haven't used it from C++ or in production.
Please explicitly state that you either *accept* or *reject* the inclusion of this library into boost.
My recommendation is to **CONDITIONALLY ACCEPT** this library into Boost, with the following conditions: 1. The partials argument must be typed according to the data structure it requires. 2. A view-based API should be added or, at least, should be addable without breaking changes. 3. The found vulnerabilities must be fixed. 4. Proper documentation must be written. Some other minor points (these are not acceptance conditions): * The link to mustache.github.io in the docs is HTTP instead of HTTPS. * Calling a type in the examples "reference" may not be the best idea. I stared at it thinking "why is this not dangling" until I figured out what "reference" meant in that context. * Is the config header supposed to be included by users? If not, move to detail/ * Is output_ref::write supposed to be called by users? If not, make it private. * Not being able to load the CML without the superproject contributes to the feeling of living in a monolith. * Your copyright is out of date ;) Regards, Ruben.