gh-132657: Add lock-free set contains implementation (#132290)

This roughly follows what was done for dictobject to make a lock-free
lookup operation. With this change, the set contains operation scales much
better when used from multiple threads. The frozenset contains performance
appears unchanged, since that path was already lock-free.

Summary of changes:

* refactor set_lookkey() into set_do_lookup() which now takes a function
  pointer that does the entry comparison. This is similar to dictobject and
  do_lookup(). In an optimized build, the comparison function is inlined and
  there should be no performance cost to this.

* change set_do_lookup() to return a status separately from the entry value

* add set_compare_frozenset() and use it if the object is a frozenset. For
  the free-threaded build, this avoids some overhead (locking, atomic
  operations, incref/decref on key)

* use FT_ATOMIC_* macros as needed for atomic loads and stores

* use a deferred free on the set table array, if shared (only on the
  free-threaded build; the normal build always does an immediate free)

* for the free-threaded build, use an explicit for loop to zero the table,
  rather than memset()

* when mutating the set, assign so->table to NULL while the change is
  happening. Assign the real table array after the change is done.
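
For illustration only, here is a minimal sketch of the first two items: a
lookup routine that takes the entry comparison as a function pointer and
returns a status separately from the entry. It is not the code in
setobject.c. The names entry_t, compare_fn, do_lookup, compare_by_string,
sketch_hash, sketch_add and sketch_contains, the LOOKUP_* values, and the
use of C strings as keys with linear probing are all assumptions made for
this example. It is single-threaded and leaves out the atomics and the
deferred free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry; not CPython's setentry. */
typedef struct {
    const char *key;    /* NULL marks an unused slot */
    uint32_t hash;
} entry_t;

/* Status is reported separately from the entry pointer. */
enum { LOOKUP_ERROR = -1, LOOKUP_ABSENT = 0, LOOKUP_FOUND = 1 };

/* Comparison callback: 1 on match, 0 on mismatch, -1 on error. */
typedef int (*compare_fn)(const entry_t *ep, const char *key, uint32_t hash);

static int
compare_by_string(const entry_t *ep, const char *key, uint32_t hash)
{
    if (ep->hash != hash) {
        return 0;
    }
    return strcmp(ep->key, key) == 0;
}

/* Linear-probing lookup. On LOOKUP_FOUND, *epp is the matching entry.
   On LOOKUP_ABSENT, *epp is the free slot where the key could be stored,
   or NULL if the table is full. On LOOKUP_ERROR, *epp is NULL. */
static inline int
do_lookup(entry_t *table, size_t mask, const char *key, uint32_t hash,
          entry_t **epp, compare_fn compare)
{
    assert((mask & (mask + 1)) == 0);   /* table size is a power of two */
    size_t i = hash & mask;
    for (size_t probes = 0; probes <= mask; probes++) {
        entry_t *ep = &table[i];
        if (ep->key == NULL) {
            *epp = ep;
            return LOOKUP_ABSENT;
        }
        int cmp = compare(ep, key, hash);
        if (cmp < 0) {
            *epp = NULL;
            return LOOKUP_ERROR;
        }
        if (cmp > 0) {
            *epp = ep;
            return LOOKUP_FOUND;
        }
        i = (i + 1) & mask;
    }
    *epp = NULL;
    return LOOKUP_ABSENT;
}

/* Toy string hash, used only to drive the example. */
static uint32_t
sketch_hash(const char *s)
{
    uint32_t h = 5381;
    while (*s) {
        h = h * 33u + (unsigned char)*s++;
    }
    return h;
}

/* Returns the lookup status for key. */
static int
sketch_contains(entry_t *table, size_t mask, const char *key)
{
    entry_t *ep;
    return do_lookup(table, mask, key, sketch_hash(key), &ep,
                     compare_by_string);
}

/* Returns 1 if added, 0 if already present, -1 on error or full table. */
static int
sketch_add(entry_t *table, size_t mask, const char *key)
{
    entry_t *ep;
    uint32_t hash = sketch_hash(key);
    int status = do_lookup(table, mask, key, hash, &ep, compare_by_string);
    if (status == LOOKUP_FOUND) {
        return 0;
    }
    if (status == LOOKUP_ERROR || ep == NULL) {
        return -1;
    }
    ep->key = key;
    ep->hash = hash;
    return 1;
}
```

Because the callback is passed to a static inline function at a call site
where it is a known constant, an optimizing compiler can inline the
comparison, which is the property the first item relies on.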
5 files changed