Quoted strings in pass arguments

Posting here to get community feedback, but this question is spun out from this comment on #4511 and the now closed #4429. Currently, arguments to passes can be wrapped in quotation marks, with the quoted string being parsed as a single argument, including the quotation marks. For frontends (read_*) and backends (write_*), using the extra_args() method will automatically strip the quotation marks, allowing one to for example open files with spaces in the path (read ”file name.v”), similar to how it would be parsed from the command line (yosys “file name.v”). For non-filenames, the onus is placed on the pass developer to handle the quotation marks themself.

I want to add quote stripping for the log command, since it is useful in the documentation to be able to call e.g. log "yosys> flatten;;", and have it log the unquoted text yosys> flatten;;. My question is, would this be useful more generally? My current thoughts are:

  1. modify token generation to always strip quotation marks and convert escaped quotation marks to regular quotation marks,
  2. as-above but only for passes with Pass::strip_quotes_flag or similar set,
  3. a helper method string unquote(string) somewhere to do the above on demand on a per-argument basis, or
  4. re-open #4429 to add quote parsing to just the log command.

There are a couple passes which rely on the quotation marks to distinguish between strings and non-strings (setattr, setparam, etc), so 1 seems like a non-option. I’m leaning towards 2 or 3 but don’t really have a preference either way. In the absence of any other feedback I’ll probably go for 3, just because modifying register.h triggers a lot of files to rebuild so I try to avoid it when I can.

I would strongly prefer option 1, and would even go beyond just automatically stripping quotes with the current quoting behavior. In particular I’d want to reduce pass specific behavior in argument parsing and instead provide a uniform way that allows you to pass any possible list of strings as arguments to any pass (i.e. any std::vector<std::string>, so including empty strings or strings containing quotes and backslashes). That list of strings should also be the only information the pass gets about the arguments, i.e. any use of quoting or escaping should be invisible to the pass itself.

If the pass interprets the argument in a way that can be a string or something else (e.g. a numeric value or glob pattern), the pass should provide an internal way to disambiguate it (e.g. by using different flags or by using and documenting how to escape strings that would otherwise be interpreted). I do think we should also strive to keep this uniform and share code for handling the list of arguments across passes where it makes sense, but if we clean up how the command line gets turned into a list of arguments, we can address that separately.


From a developers perspective, this reduces complexity because the problem of how to pass a list of strings and the problem of how to interpret that list are entirely decoupled, and the first problem is solved in the same way across all passes. This would also end up simplifying the logic and/or improving the robustness for all code that generates Yosys scripts programmatically, which we have a lot of in the FV tools that build on top of Yosys.

This is also how we deal with this when using TCL scripts. There, the arguments that TCL provides as a const char *argv[] get turned into a std::vector<std::string> which the pass receives as is. I think this is another argument in favor of fully handling basic quoting and escaping for arguments in Yosys scripts outside of a pass, allowing the pass-internal command line handling to make sense for both TCL and Yosys scripts. (It is also possible for a TCL script to provide a full command line as a single string using yosys "pass_name -some arguments", which gets parsed like a Yosys script, but that seems mostly relevant for interpreting user provided commands in a Yosys script compatible way.)


From a users perspective, this is also how it works in Unix shells and in TCL. I expect the vast majority of Yosys users to be quite familiar with either of them and a significant portion even with both. I think that makes a strong UX argument that Yosys should behave similarly. This doesn’t mean we have to use the same rules for quoting and escaping, shell and TCL don’t agree here either, instead we should consider what avoids unnecessary quoting and escaping for commonly used arguments while still handling all corner cases and is easy to remember once you learned about it.

We should also avoid breaking the majority of existing scripts without good reason, but I don’t think concerns about that should get in the way of uniform and complete quoting and escaping of pass arguments. I also expect the things that would break to be relatively easy and straightforward to fix.


Finally, a related anecdote: Windows leaves the command line parsing to the individual program, but the strong influence of Unix on everything of course meant that there was the need to provide APIs that represent the command line arguments as a list of strings. After all even C expects an int main(int argc, char *argv[]).

Windows ended up with at least two APIs providing incompatible implementations of that, but no API to turn a list of strings back into a command line. There’s a ton of code that starts with the assumption that things work as they do in unix or that there is at least something conceptionally equivalent. Eventually such code runs into bugs related to this, often with no good ways to fix things. E.g. Rust’s #91991, #29494, #82227, #44650 and that is with Rust being serious about Windows support, so it’s not for a lack of trying or paying attention to it.

I think any change we make to this that isn’t a complete and well defined way to quote and escape arbitrary string arguments, and that isn’t parsed externally to passes, will lead to just more and more special cases and diverging behavior over time, and we would most likely end up in a similar mess.

1 Like

I’ve started a draft PR for this. I haven’t yet started on attrmap.cc or setattr.cc (which I think are the only places that still rely on quotation marks), but you think making it a breaking change is the best option rather than trying to make it backwards compatible?

Yeah I think it’s worth it. If testing shows that the fallout of that breaking change is too large, we could special case those two passes for backwards compatibility but still add a new forwards compatible way to disambiguate things and print a deprecation warning when they are called in ways that would change behavior when we remove the backwards compatibility workaround.

This ended up requiring a bunch of logger regex be adjusted, since backslashes no longer get passed verbatim. From my commit message:

Replaces double quotes on problematic regex strings (mostly ones that have escape sequences that are easier to preserve in single quotes). Necessitates also changing single quotes to ., i.e match any.
For some (mostly ones that only have a single escaped character, or were using \. to match a literal fullstop) keep the double quotes and fix the regex instead.

I’m not sure how likely that is to affect end-users, but we could potentially add an extra message to the current “Error in regex expression” advising of the change if we think that’s likely to help.

The changes to chparam and setattr are indirectly tested by their usage in the test infrastructure itself (i.e. tests were failing because of this change and needed to be updated). bugpoint and scratchpad already have tests that were previously relying on quoted args. While setparam is the only one that lacks both direct tests and indirect tests. May be worth adding some tests there just to verify.

The current implementation of attrmap relies on the quotes being present, but because the arguments are -map <old_name>=<old_value> <new_name>=<new_value> (e.g. -map a=”a” b=2) the changes to quoting don’t affect it. While looking into it I also noticed that while it provides -imap which says Like -map, but use case-insensitive match for <old_value> when it is a string value, except as far as I can tell the imap value is unused and both -map and -imap are case-insensitive. There are also no tests for attrmap, and the only usage in the Yosys codebase is two cases of attrmap -remove init (which could have just as easily been setattr -unset init)