Fixing `IdString` refcounting

I brought this up during the Dev JF to get more input besides what we’ve been already discussing in the other thread. In particular I was curious whether we’d be able to go with the GC based approach right away, skipping the introduction of a separate IdRef, assuming we first make NEW_ID cheap enough that churning through those within a pass doesn’t become an issue, or that we provide a way to trigger a GC from within passes where that becomes a problem.

The important points are

  • Everyone would be on board with the general IdRef approach. (We didn’t discuss just using more const IdString & but that has been uncontroversial and merged before, here and there.)
  • Everyone would be on board with the GC based approach if we expect only very few things to break and have good and easy to apply recommendations for how to fix the fallout of that.
  • No one was aware of any third-party plugins storing IdStrings outside of the design between passes. For existing flows (including flows for third-party plugins) that we know about, where identifiers do survive outside of the design across passes, that’s done using TCL scripts where those identifiers are stored as actual strings, not ID strings. There might be exceptions but they are likely rare enough that we could impose required changes to separately mark those IDs as in-use.
  • There was some concern that the GC approach without either cheap NEW_ID or ways to trigger GC during safe points within passes would lead to space leaks for longer running passes. (I think we want a cheap NEW_ID in any case, so that shouldn’t prevent us from going with this.)

Afterwards it also occurred to me that we didn’t think of the python bindings, which are currently being rewritten and which we haven’t supported as part of TabbyCAD so far. Since python scripts are free to do whatever, they can also keep IdStrings around outside of passes. In the context of the rewrite, we had already discussed that the python bindings would need to perform some explicit in-use-marking to fix current bugs or rough edges around object lifetimes, so I’m pretty sure extending that requirement to python referenced IdStrings is not an issue.

With that, my suggestion would be to directly aim for the GC approach where we replace IdString with IdRef, getting rid of reference counts entirely. We would still call it IdString, but I’ll keep using IdRef below for clarity while discussing the alternatives. To make sure that the python bindings and the rare exceptional plugin we might have missed can be easily updated to keep working, we could still have a PersistentIdString that causes a reference count to be maintained in a separate global dict<IdRef, int> persistent_ref_counts (or a shared_mutex protected version or some variant of that) where the explicit GC would consider that as an additional GC root in addition to the list of designs. Since this would only be used for the python bindings and for storing IdStrings across pass invocations, it’s not an issue if that comes with significant overhead compared to the current IdString reference counting.

If you’d prefer to keep reference counting for now, and just want to add the minimum necessary to have IdRef as an alternative for use within passes that are updated to use multi-threading, I’m also not opposed to that, but I don’t think we should go ahead and change the vast majority of IdStrings to IdRefs if there is a good chance that we can make them equivalent when we move to the primarily GC based approach in the future.