SSA: Add a shared signature for SSA and a module to implement it.#20646
SSA: Add a shared signature for SSA and a module to implement it.#20646aschackmull merged 3 commits intogithub:mainfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a shared signature for SSA (Static Single Assignment) and a module to implement it, as a step towards a unified SSA API. The change enables code sharing across languages by providing standardized signatures and implementations that eliminate the need for individual languages to cache SSA predicates and add wrapper classes.
Key changes:
- Added a comprehensive
SsaSigsignature defining the complete SSA class hierarchy and predicates - Introduced a
MakeSsamodule that implements theSsaSigsignature using existing SSA infrastructure - Updated the
InputSigto use generic type signatures instead of specificBasicBlockSig
caaaf36 to
b196714
Compare
hvitved
left a comment
There was a problem hiding this comment.
Overall LGTM, some comments.
| * For SSA definitions occurring at the beginning of a basic block, such as | ||
| * phi definitions, this will get the first control flow node of the basic block. |
There was a problem hiding this comment.
I'm not sure that is the case for existing languages. Would it be problematic for this predicate to not return a value for phi nodes (as well as SsaImplicitEntryDefinitions)?
There was a problem hiding this comment.
I know that this is a change for several languages, but Java depends on this in a couple of places, and I actually also made the change independently for C# over in #20558, since I needed it for nullness.
And in a vacuum I think it does make sense to always return a value - not returning a value for phis and entries present a bit of a gotcha, I think.
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| * variable would be visible. | ||
| */ | ||
| class SsaPhiDefinition extends SsaDefinition { | ||
| /** Holds if `inp` is an input to the phi definition along the edge originating in `bb`. */ |
| * point in the flow graph where otherwise two or more definitions for the | ||
| * variable would be visible. | ||
| */ | ||
| class SsaPhiDefinition extends SsaDefinition { |
There was a problem hiding this comment.
I suggest copying the example from
codeql/ruby/ql/lib/codeql/ruby/dataflow/SSA.qll
Lines 370 to 381 in 7b32cd4
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| /** Holds if `inp` is an input to the phi definition along the edge originating in `bb`. */ | ||
| predicate hasInputFromBlock(SsaDefinition inp, BasicBlock bb); | ||
|
|
||
| /** Gets an input of this phi definition. */ |
There was a problem hiding this comment.
Copy example from
?
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| */ | ||
| Expr getValue(); | ||
|
|
||
| /** Holds if this write is an initialization of a parameter. */ |
There was a problem hiding this comment.
a parameter -> parameter p.
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| * pseudo-read of the captured variable at the point of capture. | ||
| */ | ||
| predicate variableCapture( | ||
| SourceVariable captured, SourceVariable closureVar, BasicBlock bb, int i |
There was a problem hiding this comment.
I know that in C# and Java, we tag local variables with a callable to account for captures, but I forgot why that is needed? In both Ruby and Rust, we simply have SourceVariable = LocalVariable, and that works perfectly fine, but would that work with the signature above (i.e., captured = closureVar)?
There was a problem hiding this comment.
I seem to recall that it was necessary - at least at some point, but I also forgot exactly why. Looking at the shared SSA lib, I don't think it's necessary there. It might be necessary (or perhaps just convenient) in the language-specific portions for C# and Java.
This signature is indeed an example where it would break if SourceVariables were not tied to a callable - we could fix it by adding a column with e.g. the first BasicBlock of the closure to implicitly identify the callable. Do you think we should do that here? Otherwise we'll indeed require tagging if this is to be implemented with anything other than none().
There was a problem hiding this comment.
Yeah, I was also thinking if we should replace closureVar with the callable representing the closure or, as you suggest, its entry basic block.
| * phi nodes, this will get the first control flow node of the basic block. | ||
| */ | ||
| ControlFlowNode getControlFlowNode() { | ||
| exists(BasicBlock bb, int i | this.definesAt(_, bb, i) | result = bb.getNode(0.maximum(i))) |
There was a problem hiding this comment.
Again, I'm not convinced that we want 0.maximum(i).
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| */ | ||
| class SsaWriteDefinition extends SsaDefinition instanceof WriteDefinition { } | ||
|
|
||
| private predicate explicitWrite(WriteDefinition def, VariableWrite write) { |
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| Expr getValue() { result = this.getDefinition().getValue() } | ||
| } | ||
|
|
||
| private predicate parameterInit(WriteDefinition def, Parameter p) { |
shared/ssa/codeql/ssa/Ssa.qll
Outdated
| SsaImplicitEntryDefinition() { this.definesAt(_, any(EntryBasicBlock bb), -1) } | ||
|
|
||
| /** Holds if this is a closure definition that captures the value of `capturedvar`. */ | ||
| predicate captures(SsaDefinition capturedvar) { captures(this, capturedvar) } |
There was a problem hiding this comment.
I worry that this predicate may give the wrong impression that this definition will always read the same value as capturedvar (I suspect it you included it, because that is actually the case in Java?).
There was a problem hiding this comment.
I was a bit back and forth on this. Firstly, I think it's useful to be able to get to the values of the outer scope in ad-hoc def-use chains like getAnUltimateDefinition, and Java certainly makes good use of it, even though it's technically a bit wrong for Kotlin (Kotlin allows post-capture modifications IIRC).
Another option would be to extend it to get all the possible inputs from the outer scope in case the variable is modified post-capture - that way the Java case would stay the same and other languages would see potentially multiple origins for a capture init - a bit like a phi node.
x = 0
f = () => { .. x .. } // currently sees x=0, could be extended to also see x=1
x = 1
What do you think? OTOH maybe that's silly, if we don't also include the following
x = 0
f () => { .. x .. } // can see x=0, but what about x=2 from g?
g () => { .. x=2; .. }
There was a problem hiding this comment.
C# used to have rather complex logic for doing this kind of thing, but I got rid of it when switching over to the shared capture library. I think it is hard to make a good analysis for this kind of thing, without ultimately resolving to saying that all other SSA definitions could potentially be a source, which would make it useless.
So, if you still want it for Java, I think I would prefer to have that a Java-specific thing.
There was a problem hiding this comment.
Alright, then I think I'll just delete it from the shared class.
Unused so far, but this is intended as a step towards a unified SSA api, and a way to share more code by removing the need for individual languages to cache the SSA predicates and add wrapper classes.