| /* |
| * Copyright 2022 WebAssembly Community Group participants |
| * |
| * Licensed under the Apache License, Version 2.0 (the "License"); |
| * you may not use this file except in compliance with the License. |
| * You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| |
| // |
| // Monomorphization of code based on callsite context: When we see a call, see |
| // if the information at the callsite can help us optimize. For example, if a |
| // parameter is constant, then using that constant in the called function may |
| // unlock a lot of improvements. We may benefit from monomorphizing in the |
| // following cases: |
| // |
| // * If a call provides a more refined type than the function declares for a |
| // parameter. |
| // * If a call provides a constant as a parameter. |
| // * If a call provides a GC allocation as a parameter. TODO |
| // * If a call is dropped. TODO also other stuff on the outside, e.g. eqz? |
| // |
| // We realize the benefit by creating a monomorphized (specialized/refined) |
| // version of the function, and call that instead. For example, if we have |
| // |
| // function foo(x) { return x + 22; } |
| // foo(7); |
| // |
| // then monomorphization leads to this: |
| // |
| // function foo(x) { return x + 22; } // original unmodified function |
| // foo_b(); // now calls foo_b |
| // function foo_b() { return 7 + 22; } // monomorphized, constant 7 applied |
| // |
| // This is related to inlining both conceptually and practically. Conceptually, |
| // one of inlining's big advantages is that we then optimize the called code |
| // together with the code around the call, and monomorphization does something |
| // similar. And, this pass does so by "reverse-inlining" content from the |
| // caller to the monomorphized function: the constant 7 in the example above has |
| // been "pulled in" from the caller into the callee. Larger amounts of code can |
| // be moved in that manner, both values sent to the function, and the code that |
| // receives it (see the mention of dropped calls, before). |
| // |
| // As this monormophization uses callsite context (the parameters, where the |
| // result flows to), we call it "Contextual Monomorphization." The full name is |
| // "Empirical Contextural Monomorphization" because we decide where to optimize |
| // based on a "try it and see" (empirical) approach, that measures the benefit. |
| // That is, we generate the monomorphized function as explained, then optimize |
| // that function, which contains the original code + code from the callsite |
| // context that we pulled in. If the optimizer manages to improve that combined |
| // code in a useful way then we apply the optimization, and if not then we undo. |
| // |
| // The empirical approach significantly reduces the need for heuristics. For |
| // example, rather than have a heuristic for "see if a constant parameter flows |
| // into a conditional branch," we simply run the optimizer and let it optimize |
| // that case. All other cases handled by the optimizer work as well, without |
| // needing to specify them as heuristics, so this gets smarter as the optimizer |
| // does. |
| // |
| // Aside from the main version of this pass there is also a variant useful for |
| // testing that always monomorphizes non-trivial callsites, without checking if |
| // the optimizer can help or not (that makes writing testcases simpler). |
| // |
| // This pass takes an argument: |
| // |
| // --pass-arg=monomorphize-min-benefit@N |
| // |
| // The minimum amount of benefit we require in order to decide to optimize, |
| // as a percentage of the original cost. If this is 0 then we always |
| // optimize, if the cost improves by even 0.0001%. If this is 50 then we |
| // optimize only when the optimized monomorphized function has half the |
| // cost of the original, and so forth, that is higher values are more |
| // careful (and 100 will only optimize when the cost goes to nothing at |
| // all). |
| // |
| // TODO: We may also want more arguments here: |
| // * Max function size on which to operate (to not even try to optimize huge |
| // functions, which could be slow). |
| // * Max optimized size (if the max optimized size is less than the size we |
| // inline, then all optimized cases would end up inlined; that would also |
| // put a limit on the size increase). |
| // * Max absolute size increase (total of all added code). |
| // |
| // TODO: When we optimize we could run multiple cycles: A calls B calls C might |
| // end up with the refined+optimized B now having refined types in its |
| // call to C, which it did not have before. This is in fact the expected |
| // pattern of incremental monomorphization. Doing it in the pass could be |
| // more efficient as later cycles can focus only on what was just |
| // optimized and changed. Also, operating on functions just modified would |
| // help the case of A calls B and we end up optimizing A after we consider |
| // A->B, and the optimized version sends more refined types to B, which |
| // could unlock more potential. |
| // TODO: We could sort the functions so that we start from root functions and |
| // end on leaves. That would make it more likely for a single iteration to |
| // do more work, as if A->B->C then we'd do A->B and optimize B and only |
| // then look at B->C. |
| // TODO: If this is too slow, we could "group" things, for example we could |
| // compute the LUB of a bunch of calls to a target and then investigate |
| // that one case and use it in all those callers. |
| // TODO: Not just direct calls? But updating vtables is complex. |
| // TODO: Should we look at no-inline flags? We do move code between functions, |
| // but it isn't normal inlining. |
| // |
| |
| #include "ir/cost.h" |
| #include "ir/effects.h" |
| #include "ir/find_all.h" |
| #include "ir/iteration.h" |
| #include "ir/manipulation.h" |
| #include "ir/module-utils.h" |
| #include "ir/names.h" |
| #include "ir/properties.h" |
| #include "ir/return-utils.h" |
| #include "ir/type-updating.h" |
| #include "ir/utils.h" |
| #include "pass.h" |
| #include "support/hash.h" |
| #include "wasm-limits.h" |
| #include "wasm-type.h" |
| #include "wasm.h" |
| |
| namespace wasm { |
| |
| namespace { |
| |
| // Pass arguments. See descriptions in the comment above. |
| Index MinPercentBenefit = 95; |
| |
| // A limit on the number of parameters we are willing to have on monomorphized |
| // functions. Large numbers can lead to large stack frames, which can be slow |
| // and lead to stack overflows. |
| // TODO: Tune this and perhaps make it a flag. |
| const Index MaxParams = 20; |
| // This must be less than the corresponding Web limitation, so we do not emit |
| // invalid binaries. |
| static_assert(MaxParams < WebLimitations::MaxFunctionParams); |
| |
| // Core information about a call: the call itself, and if it is dropped, the |
| // drop. |
| struct CallInfo { |
| Call* call; |
| // Store a reference to the drop's pointer so that we can replace it, as when |
| // we optimize a dropped call we need to replace (drop (call)) with (call). |
| // Or, if the call is not dropped, this is nullptr. |
| Expression** drop; |
| }; |
| |
| // Finds the calls and whether each one of them is dropped. |
| struct CallFinder : public PostWalker<CallFinder> { |
| std::vector<CallInfo> infos; |
| |
| void visitCall(Call* curr) { |
| // Add the call as not having a drop, and update the drop later if we are. |
| infos.push_back(CallInfo{curr, nullptr}); |
| } |
| |
| void visitDrop(Drop* curr) { |
| if (curr->value->is<Call>()) { |
| // The call we just added to |infos| is dropped. |
| assert(!infos.empty()); |
| auto& back = infos.back(); |
| assert(back.call == curr->value); |
| back.drop = getCurrentPointer(); |
| } |
| } |
| }; |
| |
| // Relevant information about a callsite for purposes of monomorphization. |
| struct CallContext { |
| // The operands of the call, processed to leave the parts that make sense to |
| // keep in the context. That is, the operands of the CallContext are the exact |
| // code that will appear at the start of the monomorphized function. For |
| // example: |
| // |
| // (call $foo |
| // (i32.const 10) |
| // (..something complicated..) |
| // ) |
| // (func $foo (param $int i32) (param $complex f64) |
| // .. |
| // |
| // The context operands are |
| // |
| // [ |
| // (i32.const 10) ;; Unchanged: this can be pulled into the called |
| // ;; function, and removed from the caller side. |
| // (local.get $0) ;; The complicated child cannot; keep it as a value |
| // ;; sent from the caller, which we will local.get. |
| // ] |
| // |
| // Both the const and the local.get are simply used in the monomorphized |
| // function, like this: |
| // |
| // (func $foo-monomorphized (param $0 ..) |
| // (..local defs..) |
| // ;; Apply the first operand, which was pulled into here. |
| // (local.set $int |
| // (i32.const 10) |
| // ) |
| // ;; Read the second, which remains a parameter to the function. |
| // (local.set $complex |
| // (local.get $0) |
| // ) |
| // ;; The original body. |
| // .. |
| // |
| // The $int param is no longer a parameter, and it is set in a local at the |
| // top: we have "reverse-inlined" code from the calling function into the |
| // caller, pulling the constant 10 into here. The second parameter cannot be |
| // pulled in, so we must still send it, but we still have a local.set there to |
| // copy it into a local (this does not matter in this case, but does if the |
| // sent value is more refined; always using a local.set is simpler and more |
| // regular). |
| std::vector<Expression*> operands; |
| |
| // Whether the call is dropped. TODO |
| bool dropped; |
| |
| bool operator==(const CallContext& other) const { |
| if (dropped != other.dropped) { |
| return false; |
| } |
| |
| // We consider logically equivalent expressions as equal (rather than raw |
| // pointers), so that contexts with functionally identical shape are |
| // treated the same. |
| if (operands.size() != other.operands.size()) { |
| return false; |
| } |
| for (Index i = 0; i < operands.size(); i++) { |
| if (!ExpressionAnalyzer::equal(operands[i], other.operands[i])) { |
| return false; |
| } |
| } |
| |
| return true; |
| } |
| |
| bool operator!=(const CallContext& other) const { return !(*this == other); } |
| |
| // Build the context from a given call. This builds up the context operands as |
| // as explained in the comments above, and updates the call to send any |
| // remaining values by updating |newOperands| (for example, if all the values |
| // sent are constants, then |newOperands| will end up empty, as we have |
| // nothing left to send). |
| // |
| // The approach implemented here tries to move as much code into the call |
| // context as possible. That may not always be helpful, say in situations like |
| // this: |
| // |
| // (call $foo |
| // (i32.add |
| // (local.get $x) |
| // (local.get $y) |
| // ) |
| // ) |
| // |
| // If we move the i32.add into $foo then it will still be adding two unknown |
| // values (which will be parameters rather than locals). Moving the add might |
| // just increase code size if so. However, there are many other situations |
| // where the more code, the better: |
| // |
| // (call $foo |
| // (i32.eqz |
| // (local.get $x) |
| // ) |
| // ) |
| // |
| // While the value remains unknown, after moving the i32.eqz into the target |
| // function we may be able to use the fact that it has at most 1 bit set. |
| // Even larger benefits can happen in WasmGC: |
| // |
| // (call $foo |
| // (struct.new $T |
| // (local.get $x) |
| // (local.get $y) |
| // ) |
| // ) |
| // |
| // If the struct never escapes then we may be able to remove the allocation |
| // after monomorphization, even if we know nothing about the values in its |
| // fields. |
| // |
| // TODO: Explore other options that are more careful about how much code is |
| // moved. |
| void buildFromCall(CallInfo& info, |
| std::vector<Expression*>& newOperands, |
| Module& wasm, |
| const PassOptions& options) { |
| Builder builder(wasm); |
| |
| // First, find the things we can move into the context and the things we |
| // cannot. Some things simply cannot be moved out of the calling function, |
| // such as a local.set, but also we need to handle effect interactions among |
| // the operands, because each time we move code into the context we are |
| // pushing it into the called function, which changes the order of |
| // operations, for example: |
| // |
| // (call $foo |
| // (first |
| // (a) |
| // ) |
| // (second |
| // (b) |
| // ) |
| // ) |
| // |
| // (func $foo (param $first) (param $second) |
| // ) |
| // |
| // If we move |first| and |a| into the context then we get this: |
| // |
| // (call $foo |
| // ;; |first| and |a| were removed from here. |
| // (second |
| // (b) |
| // ) |
| // ) |
| // |
| // (func $foo (param $second) |
| // ;; |first| is now a local, and we assign it inside the called func. |
| // (local $first) |
| // (local.set $first |
| // (first |
| // (a) |
| // ) |
| // ) |
| // ) |
| // |
| // After this code motion we execute |second| and |b| *before* the call, and |
| // |first| and |a| after, so we cannot do this transformation if the order |
| // of operations between them matters. |
| // |
| // The key property here is that all things that are moved into the context |
| // (moved into the monomorphized function) remain ordered with respect to |
| // each other, but must be moved past all non-moving things after them. For |
| // example, say we want to move B and D in this list (of expressions in |
| // execution order): |
| // |
| // A, B, C, D, E |
| // |
| // After moving B and D we end up with this: |
| // |
| // A, C, E and executing later in the monomorphized function: B, D |
| // |
| // Then we must be able to move B past C and E, and D past E. It is simplest |
| // to compute this in reverse order, starting from E and going back, and |
| // then each time we want to move something we can check if it can cross |
| // over all the non-moving effects we've seen so far. To compute this, first |
| // list out the post-order of the expressions, and then we'll iterate in |
| // reverse. |
| struct Lister |
| : public PostWalker<Lister, UnifiedExpressionVisitor<Lister>> { |
| std::vector<Expression*> list; |
| void visitExpression(Expression* curr) { list.push_back(curr); } |
| } lister; |
| // As a quick estimate, we need space for at least the operands. |
| lister.list.reserve(operands.size()); |
| |
| for (auto* operand : info.call->operands) { |
| lister.walk(operand); |
| } |
| |
| // Go in reverse post-order as explained earlier, noting what cannot be |
| // moved into the context, and while accumulating the effects that are not |
| // moving. |
| std::unordered_set<Expression*> immovable; |
| EffectAnalyzer nonMovingEffects(options, wasm); |
| for (auto i = int64_t(lister.list.size()) - 1; i >= 0; i--) { |
| auto* curr = lister.list[i]; |
| |
| // This may have been marked as immovable because of the parent. We do |
| // that because if a parent is immovable then we can't move the children |
| // into the context (if we did, they would execute after the parent, but |
| // it needs their values). |
| bool currImmovable = immovable.count(curr) > 0; |
| if (!currImmovable) { |
| // This might be movable or immovable. Check both effect interactions |
| // (as described before, we want to move this past immovable code) and |
| // reasons intrinsic to the expression itself that might prevent moving. |
| ShallowEffectAnalyzer currEffects(options, wasm, curr); |
| if (currEffects.invalidates(nonMovingEffects) || |
| !canBeMovedIntoContext(curr, currEffects)) { |
| immovable.insert(curr); |
| currImmovable = true; |
| } |
| } |
| |
| if (currImmovable) { |
| // Regardless of whether this was marked immovable because of the |
| // parent, or because we just found it cannot be moved, accumulate the |
| // effects, and also mark its immediate children (so that we do the same |
| // when we get to them). |
| nonMovingEffects.visit(curr); |
| for (auto* child : ChildIterator(curr)) { |
| immovable.insert(child); |
| } |
| } |
| } |
| |
| // We now know which code can be moved and which cannot, so we can do the |
| // final processing of the call operands. We do this as a copy operation, |
| // copying as much as possible into the call context. Code that cannot be |
| // moved ends up as values sent to the monomorphized function. |
| // |
| // The copy operation works in pre-order, which allows us to override |
| // entire children as needed: |
| // |
| // (call $foo |
| // (problem |
| // (a) |
| // ) |
| // (later) |
| // ) |
| // |
| // We visit |problem| first, and if there is a problem that prevents us |
| // moving it into the context then we override the copy and then it and |
| // its child |a| remain in the caller (and |a| is never visited in the |
| // copy). |
| for (auto* operand : info.call->operands) { |
| operands.push_back(ExpressionManipulator::flexibleCopy( |
| operand, wasm, [&](Expression* child) -> Expression* { |
| if (!child) { |
| // This is an optional child that is not present. Let the copy of |
| // the nullptr happen. |
| return nullptr; |
| } |
| |
| if (!immovable.count(child)) { |
| // This can be moved; let the copy happen. |
| return nullptr; |
| } |
| |
| // This cannot be moved. Do not copy it into the call context. In the |
| // example above, |problem| remains as an operand on the call (so we |
| // add it to |newOperands|), and in the call context all we have is a |
| // local.get that reads that sent value. |
| auto paramIndex = newOperands.size(); |
| newOperands.push_back(child); |
| // TODO: If one operand is a tee and another a get, we could actually |
| // reuse the local, effectively showing the monomorphized |
| // function that the values are the same. EquivalentSets may |
| // help here. |
| return builder.makeLocalGet(paramIndex, child->type); |
| })); |
| } |
| |
| dropped = !!info.drop; |
| } |
| |
| // Checks whether an expression can be moved into the context. |
| bool canBeMovedIntoContext(Expression* curr, |
| const ShallowEffectAnalyzer& effects) { |
| // Pretty much everything can be moved into the context if we can copy it |
| // between functions, such as constants, globals, etc. The things we cannot |
| // copy are now checked for. |
| if (effects.branchesOut || effects.hasExternalBreakTargets()) { |
| // This branches or returns. We can't move control flow between functions. |
| return false; |
| } |
| if (effects.accessesLocal()) { |
| // Reads/writes to local state cannot be moved around. |
| return false; |
| } |
| if (effects.calls) { |
| // We can in principle move calls, but for simplicity we avoid such |
| // situations (which might involve recursion etc.). |
| return false; |
| } |
| if (Properties::isControlFlowStructure(curr)) { |
| // We can in principle move entire control flow structures with their |
| // children, but for simplicity stop when we see one rather than look |
| // inside to see if we could transfer all its contents. (We would also |
| // need to be careful when handling If arms, etc.) |
| return false; |
| } |
| for (auto* child : ChildIterator(curr)) { |
| if (child->type.isTuple()) { |
| // Consider this: |
| // |
| // (call $target |
| // (tuple.extract 2 1 |
| // (local.get $tuple) |
| // ) |
| // ) |
| // |
| // We cannot move the tuple.extract into the context, because then the |
| // call would have a tuple param. While it is possible to split up the |
| // tuple, or to check if we can also move the children with the parent, |
| // for simplicity just ignore this rare situation. |
| return false; |
| } |
| } |
| return true; |
| } |
| |
| // Check if a context is trivial relative to a call, that is, the context |
| // contains no information that can allow optimization at all. Such trivial |
| // contexts can be dismissed early. |
| bool isTrivial(Call* call, Module& wasm) { |
| // Dropped contexts are not trivial. |
| if (dropped) { |
| return false; |
| } |
| |
| // The context must match the call for us to compare them. |
| assert(operands.size() == call->operands.size()); |
| |
| // If an operand is not simply passed through, then we are not trivial. |
| auto callParams = wasm.getFunction(call->target)->getParams(); |
| for (Index i = 0; i < operands.size(); i++) { |
| // A local.get of the same type implies we just pass through the value. |
| // Anything else is not trivial. |
| if (!operands[i]->is<LocalGet>() || operands[i]->type != callParams[i]) { |
| return false; |
| } |
| } |
| |
| // We found nothing interesting, so this is trivial. |
| return true; |
| } |
| }; |
| |
| } // anonymous namespace |
| |
| } // namespace wasm |
| |
| namespace std { |
| |
| template<> struct hash<wasm::CallContext> { |
| size_t operator()(const wasm::CallContext& info) const { |
| size_t digest = hash<bool>{}(info.dropped); |
| |
| wasm::rehash(digest, info.operands.size()); |
| for (auto* operand : info.operands) { |
| wasm::hash_combine(digest, wasm::ExpressionAnalyzer::hash(operand)); |
| } |
| |
| return digest; |
| } |
| }; |
| |
| // Useful for debugging. |
| [[maybe_unused]] void dump(std::ostream& o, wasm::CallContext& context) { |
| o << "CallContext{\n"; |
| for (auto* operand : context.operands) { |
| o << " " << *operand << '\n'; |
| } |
| if (context.dropped) { |
| o << " dropped\n"; |
| } |
| o << "}\n"; |
| } |
| |
| } // namespace std |
| |
| namespace wasm { |
| |
| namespace { |
| |
| struct Monomorphize : public Pass { |
| // If set, we run some opts to see if monomorphization helps, and skip cases |
| // where we do not help out. |
| bool onlyWhenHelpful; |
| |
| Monomorphize(bool onlyWhenHelpful) : onlyWhenHelpful(onlyWhenHelpful) {} |
| |
| void run(Module* module) override { |
| // TODO: parallelize, see comments below |
| |
| applyArguments(); |
| |
| // Find all the return-calling functions. We cannot remove their returns |
| // (because turning a return call into a normal call may break the program |
| // by using more stack). |
| auto returnCallersMap = ReturnUtils::findReturnCallers(*module); |
| |
| // Note the list of all functions. We'll be adding more, and do not want to |
| // operate on those. |
| std::vector<Name> funcNames; |
| ModuleUtils::iterDefinedFunctions( |
| *module, [&](Function* func) { funcNames.push_back(func->name); }); |
| |
| // Find the calls in each function and optimize where we can, changing them |
| // to call the monomorphized targets. |
| for (auto name : funcNames) { |
| auto* func = module->getFunction(name); |
| |
| CallFinder callFinder; |
| callFinder.walk(func->body); |
| for (auto& info : callFinder.infos) { |
| if (info.call->type == Type::unreachable) { |
| // Ignore unreachable code. |
| // TODO: return_call? |
| continue; |
| } |
| |
| if (info.call->target == name) { |
| // Avoid recursion, which adds some complexity (as we'd be modifying |
| // ourselves if we apply optimizations). |
| continue; |
| } |
| |
| // If the target function does a return call, then as noted earlier we |
| // cannot remove its returns, so do not consider the drop as part of the |
| // context in such cases (as if we reverse-inlined the drop into the |
| // target then we'd be removing the returns). |
| if (returnCallersMap[module->getFunction(info.call->target)]) { |
| info.drop = nullptr; |
| } |
| |
| processCall(info, *module); |
| } |
| } |
| } |
| |
| void applyArguments() { |
| MinPercentBenefit = std::stoul(getArgumentOrDefault( |
| "monomorphize-min-benefit", std::to_string(MinPercentBenefit))); |
| } |
| |
| // Try to optimize a call. |
| void processCall(CallInfo& info, Module& wasm) { |
| auto* call = info.call; |
| auto target = call->target; |
| auto* func = wasm.getFunction(target); |
| if (func->imported()) { |
| // Nothing to do since this calls outside of the module. |
| return; |
| } |
| |
| // TODO: ignore calls with unreachable operands for simplicty |
| |
| // Compute the call context, and the new operands that the call would send |
| // if we use that context. |
| CallContext context; |
| std::vector<Expression*> newOperands; |
| context.buildFromCall(info, newOperands, wasm, getPassOptions()); |
| |
| // See if we've already evaluated this function + call context. If so, then |
| // we've memoized the result. |
| auto iter = funcContextMap.find({target, context}); |
| if (iter != funcContextMap.end()) { |
| auto newTarget = iter->second; |
| if (newTarget != target) { |
| // We saw benefit to optimizing this case. Apply that. |
| updateCall(info, newTarget, newOperands, wasm); |
| } |
| return; |
| } |
| |
| // This is the first time we see this situation. First, check if the context |
| // is trivial and has no opportunities for optimization. |
| if (context.isTrivial(call, wasm)) { |
| // Memoize the failure, and stop. |
| funcContextMap[{target, context}] = target; |
| return; |
| } |
| |
| // Create the monomorphized function that includes the call context. |
| std::unique_ptr<Function> monoFunc = |
| makeMonoFunctionWithContext(func, context, wasm); |
| |
| // If we ended up with too many params, give up. In theory we could try to |
| // monomorphize in ways that use less params, but this is a rare situation |
| // that is not easy to handle (when we move something into the context, it |
| // *removes* a param, which is good, but if it has many children and end up |
| // not moved, that is where the problem happens, so we'd need to backtrack). |
| // TODO: Consider doing more here. |
| if (monoFunc->getNumParams() >= MaxParams) { |
| return; |
| } |
| |
| // Decide whether it is worth using the monomorphized function. |
| auto worthwhile = true; |
| if (onlyWhenHelpful) { |
| // Run the optimizer on both functions, hopefully just enough to see if |
| // there is a benefit to the context. We optimize both to avoid confusion |
| // from the function benefiting from simply running another cycle of |
| // optimization. |
| // |
| // Note that we do *not* discard the optimizations to the original |
| // function if we decide not to optimize. We've already done them, and the |
| // function is improved, so we may as well keep them. |
| // |
| // TODO: Atm this can be done many times per function as it is once per |
| // function and per set of types sent to it. Perhaps have some |
| // total limit to avoid slow runtimes. |
| // TODO: We can end up optimizing |func| more than once. It may be |
| // different each time if the previous optimization helped, but |
| // often it will be identical. We could save the original version |
| // and use that as the starting point here (and cache the optimized |
| // version), but then we'd be throwing away optimization results. Or |
| // we could see if later optimizations do not further decrease the |
| // cost, and if so, use a cached value for the cost on such |
| // "already maximally optimized" functions. The former approach is |
| // more amenable to parallelization, as it avoids path dependence - |
| // the other approaches are deterministic but they depend on the |
| // order in which we see things. But it does require saving a copy |
| // of the function, which uses memory, which is avoided if we just |
| // keep optimizing from the current contents as we go. It's not |
| // obvious which approach is best here. |
| doOpts(func); |
| |
| // The cost before monomorphization is the old body + the context |
| // operands. The operands will be *removed* from the calling code if we |
| // optimize, and moved into the monomorphized function, so the proper |
| // comparison is the context + the old body, versus the new body (which |
| // includes the reverse-inlined call context). |
| // |
| // Note that we use a double here because we are going to subtract and |
| // multiply this value later (and want to avoid unsigned integer overflow, |
| // etc.). |
| double costBefore = CostAnalyzer(func->body).cost; |
| for (auto* operand : context.operands) { |
| // Note that a slight oddity is that we have *not* optimized the |
| // operands before. We optimize func before and after, but the operands |
| // are in the calling function, which we are not modifying here. In |
| // theory that might lead to false positives, if the call's operands are |
| // very unoptimized. |
| costBefore += CostAnalyzer(operand).cost; |
| } |
| if (costBefore == 0) { |
| // Nothing to optimize away here. (And it would be invalid to divide by |
| // this amount in the code below.) |
| worthwhile = false; |
| } else { |
| // There is a point to optimizing the monomorphized function, do so. |
| doOpts(monoFunc.get()); |
| |
| double costAfter = CostAnalyzer(monoFunc->body).cost; |
| |
| // Compute the percentage of benefit we see here. |
| auto benefit = 100 - ((100 * costAfter) / costBefore); |
| if (benefit <= MinPercentBenefit) { |
| worthwhile = false; |
| } |
| } |
| } |
| |
| // Memoize what we decided to call here. |
| funcContextMap[{target, context}] = worthwhile ? monoFunc->name : target; |
| |
| if (worthwhile) { |
| // We are using the monomorphized function, so update the call and add it |
| // to the module. |
| updateCall(info, monoFunc->name, newOperands, wasm); |
| |
| wasm.addFunction(std::move(monoFunc)); |
| } |
| } |
| |
| // Create a monomorphized function from the original + the call context. It |
| // may have different parameters, results, and may include parts of the call |
| // context. |
| std::unique_ptr<Function> makeMonoFunctionWithContext( |
| Function* func, const CallContext& context, Module& wasm) { |
| |
| // The context has an operand for each one in the old function, each of |
| // which may contain reverse-inlined content. A mismatch here means we did |
| // not build the context right, or are using it with the wrong function. |
| assert(context.operands.size() == func->getNumParams()); |
| |
| // Pick a new name. |
| auto newName = Names::getValidFunctionName(wasm, func->name); |
| |
| // Copy the function as the base for the new one. |
| auto newFunc = ModuleUtils::copyFunctionWithoutAdd(func, wasm, newName); |
| |
| // A local.get is a value that arrives in a parameter. Anything else is |
| // something that we are reverse-inlining into the function, so we don't |
| // need a param for it. Note that we might have multiple gets nested here, |
| // if we are copying part of the original parameter but not all children, so |
| // we scan each operand for all such local.gets. |
| // |
| // Use this information to generate the new signature, and apply it to the |
| // new function. |
| std::vector<Type> newParams; |
| for (auto* operand : context.operands) { |
| FindAll<LocalGet> gets(operand); |
| for (auto* get : gets.list) { |
| newParams.push_back(get->type); |
| } |
| } |
| // If we were dropped then we are pulling the drop into the monomorphized |
| // function, which means we return nothing. |
| auto newResults = context.dropped ? Type::none : func->getResults(); |
| newFunc->type = Signature(Type(newParams), newResults); |
| |
| // We must update local indexes: the new function has a potentially |
| // different number of parameters, and parameters are at the very bottom of |
| // the local index space. We are also replacing old params with vars. To |
| // track this, map each old index to the new one. |
| std::unordered_map<Index, Index> mappedLocals; |
| auto newParamsMinusOld = |
| newFunc->getParams().size() - func->getParams().size(); |
| for (Index i = 0; i < func->getNumLocals(); i++) { |
| if (func->isParam(i)) { |
| // Old params become new vars inside the function. Below we'll copy the |
| // proper values into these vars. |
| // TODO: We could avoid a var + copy when it is trivial (atm we rely on |
| // optimizations to remove it). |
| auto local = Builder::addVar(newFunc.get(), func->getLocalType(i)); |
| mappedLocals[i] = local; |
| } else { |
| // This is a var. The only thing to adjust here is that the parameters |
| // are changing. |
| mappedLocals[i] = i + newParamsMinusOld; |
| } |
| } |
| |
| // Copy over local names to help debugging. |
| if (!func->localNames.empty()) { |
| newFunc->localNames.clear(); |
| newFunc->localIndices.clear(); |
| for (Index i = 0; i < func->getNumLocals(); i++) { |
| auto oldName = func->getLocalNameOrDefault(i); |
| if (oldName.isNull()) { |
| continue; |
| } |
| |
| auto newIndex = mappedLocals[i]; |
| auto newName = Names::getValidLocalName(*newFunc.get(), oldName); |
| newFunc->localNames[newIndex] = newName; |
| newFunc->localIndices[newName] = newIndex; |
| } |
| }; |
| |
| Builder builder(wasm); |
| |
| // Surrounding the main body is the reverse-inlined content from the call |
| // context, like this: |
| // |
| // (func $monomorphized |
| // (..reverse-inlined parameter..) |
| // (..old body..) |
| // ) |
| // |
| // For example, if a function that simply returns its input is called with a |
| // constant parameter, it will end up like this: |
| // |
| // (func $monomorphized |
| // (local $param i32) |
| // (local.set $param (i32.const 42)) ;; reverse-inlined parameter |
| // (local.get $param) ;; copied old body |
| // ) |
| // |
| // We need to add such a local.set in the prelude of the function for each |
| // operand in the context. |
| std::vector<Expression*> pre; |
| for (Index i = 0; i < context.operands.size(); i++) { |
| auto* operand = context.operands[i]; |
| |
| // Write the context operand (the reverse-inlined content) to the local |
| // we've allocated for this. |
| auto local = mappedLocals.at(i); |
| auto* value = ExpressionManipulator::copy(operand, wasm); |
| pre.push_back(builder.makeLocalSet(local, value)); |
| } |
| |
| // Map locals. |
| struct LocalUpdater : public PostWalker<LocalUpdater> { |
| const std::unordered_map<Index, Index>& mappedLocals; |
| LocalUpdater(const std::unordered_map<Index, Index>& mappedLocals) |
| : mappedLocals(mappedLocals) {} |
| void visitLocalGet(LocalGet* curr) { updateIndex(curr->index); } |
| void visitLocalSet(LocalSet* curr) { updateIndex(curr->index); } |
| void updateIndex(Index& index) { |
| auto iter = mappedLocals.find(index); |
| assert(iter != mappedLocals.end()); |
| index = iter->second; |
| } |
| } localUpdater(mappedLocals); |
| localUpdater.walk(newFunc->body); |
| |
| if (!pre.empty()) { |
| // Add the block after the prelude. |
| pre.push_back(newFunc->body); |
| newFunc->body = builder.makeBlock(pre); |
| } |
| |
| if (context.dropped) { |
| ReturnUtils::removeReturns(newFunc.get(), wasm); |
| } |
| |
| return newFunc; |
| } |
| |
| // Given a call and a new target it should be calling, apply that new target, |
| // including updating the operands and handling dropping. |
| void updateCall(const CallInfo& info, |
| Name newTarget, |
| const std::vector<Expression*>& newOperands, |
| Module& wasm) { |
| info.call->target = newTarget; |
| info.call->operands.set(newOperands); |
| |
| if (info.drop) { |
| // Replace (drop (call)) with (call), that is, replace the drop with the |
| // (updated) call which now has type none. Note we should have handled |
| // unreachability before getting here. |
| assert(info.call->type != Type::unreachable); |
| info.call->type = Type::none; |
| *info.drop = info.call; |
| } |
| } |
| |
| // Run some function-level optimizations on a function. Ideally we would run a |
| // minimal amount of optimizations here, but we do want to give the optimizer |
| // as much of a chance to work as possible, so for now do all of -O3 (in |
| // particular, we really need to run --precompute-propagate so constants are |
| // applied in the optimized function). |
| // TODO: Perhaps run -O2 or even -O1 if the function is large (or has many |
| // locals, etc.), to ensure linear time, but we could miss out. |
| void doOpts(Function* func) { |
| PassRunner runner(getPassRunner()); |
| runner.options.optimizeLevel = 3; |
| runner.addDefaultFunctionOptimizationPasses(); |
| runner.setIsNested(true); |
| runner.runOnFunction(func); |
| } |
| |
| // Maps [func name, call info] to the name of a new function which is a |
| // monomorphization of that function, specialized to that call info. |
| // |
| // Note that this can contain funcContextMap{A, ...} = A, that is, that maps |
| // a function name to itself. That indicates we found no benefit from |
| // monomorphizing with that context, and saves us from computing it again |
| // later on. |
| std::unordered_map<std::pair<Name, CallContext>, Name> funcContextMap; |
| }; |
| |
| } // anonymous namespace |
| |
| Pass* createMonomorphizePass() { return new Monomorphize(true); } |
| |
| Pass* createMonomorphizeAlwaysPass() { return new Monomorphize(false); } |
| |
| } // namespace wasm |