Equality and Identity in Python
The is keyword in Python is a common source of confusion, especially next to the == operator. The common answer is: “== tests for equality, is tests for identity.” I will try to explain what this means, literally and in the broader sense.
Equality
Equality means that two objects are considered to have the same value. This does not mean that they are one and the same object—two different strings, for example, could very well have the same content. Determining equality is done by the object itself:
x == y
is, roughly speaking, syntactic sugar for:
x.__eq__(y)
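Strictly speaking, Python can also try y.__eq__(x) when the first call returns NotImplemented, but the point stands: the objects decide. So the == operator, when it is invoked on an object that could be of any type (or rather, one whose .__eq__() method could be overridden), cannot be trusted.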
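As a small illustration of an object deciding its own equality, here is a sketch using a hypothetical Money class (the class and its fields are invented for this example and do not come from any library):

```python
class Money:
    """Hypothetical value type, used only to illustrate __eq__."""

    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __eq__(self, other):
        # The object itself decides what "same value" means.
        if not isinstance(other, Money):
            return NotImplemented  # let Python ask the other operand
        return (self.amount, self.currency) == (other.amount, other.currency)


a = Money(5, "EUR")
b = Money(5, "EUR")

print(a == b)        # True: same value
print(a is b)        # False: two distinct objects
print(a.__eq__(b))   # True: the call that == makes first
print(a == "5 EUR")  # False: both sides decline, so Python falls back to identity
```

Returning NotImplemented instead of False for foreign types is a design choice: it gives the other operand a chance to answer the comparison.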
Identity
Identity simply means that every object in Python has an ID, an integer that is unique among all objects alive at the same time. You can use the id() function to find the ID of a particular object. In other words:
x is y
is, in effect, shorthand for:
id(x) == id(y)
That’s all there is to the is keyword.
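A minimal sketch of the difference, using plain lists (the variable names are arbitrary):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal content, but a separate list object
c = a           # a second name for the very same object

print(a == b)   # True: same value
print(a is b)   # False: different objects
print(a is c)   # True: one object, two names

# The identity check agrees with comparing the IDs.
print((a is b) == (id(a) == id(b)))   # True
print((a is c) == (id(a) == id(c)))   # True

# Because a and c are the same object, a change through one name
# is visible through the other.
c.append(4)
print(a)        # [1, 2, 3, 4]
```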
There is always only one None object. Every time you encounter a variable that points to None, it is that same object. So, if you get a random object and you want to check whether it is None, you can (and should) use an identity check. Imagine you get an object that always says it equals everything (i.e., its equality method is overridden to always return True). Now you write x == None, and it tells you that x equals None, because x said so. The is keyword cannot be fooled so easily. Why this is so is left as an exercise for the reader.
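The scenario can be sketched as follows. AlwaysEqual and describe are hypothetical names made up for this illustration:

```python
class AlwaysEqual:
    """Hypothetical object that claims to equal everything."""

    def __eq__(self, other):
        return True


def describe(x):
    # Identity check: only the real None singleton passes.
    if x is None:
        return "really None"
    # Equality check: answered by x itself, so it can be fooled.
    if x == None:  # noqa: E711 (deliberate, to show the pitfall)
        return "claims to equal None"
    return "something else"


print(describe(None))           # really None
print(describe(AlwaysEqual()))  # claims to equal None
print(describe(0))              # something else
```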
Comments
> Imagine if you get an object that always says it equals everything
Invalid argument, I say. That is a bug in that code, and not your problem. The object in this case is violating the contract of the equality method it is overriding. You are supposed to assume that the API contract is honored. Otherwise, what’s the point in having an API contract? Imagine if you get an object which exits the program, or formats your drive, in its __str__ overload. No more printing objects?
In the current Python implementation, ‘is’ is only useful if you want to ensure the two references point to the same object in memory. In general, you wouldn’t need to do that. Using it can have amusing side effects for value objects (a term I coined for objects which are equal if all their data is equal, like integers; it includes all non-resource objects, such as a Rectangle or Point object). Example: http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers
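To make the value-object case concrete, here is a sketch with a Point built on the standard dataclasses module. The fields x and y are assumptions chosen for the illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Illustrative value object: equal when all its data is equal."""
    x: int
    y: int


p = Point(1, 2)
q = Point(1, 2)

print(p == q)   # True: the generated __eq__ compares the fields
print(p is q)   # False: two separate objects in memory
print(p is p)   # True: an object is always identical to itself
```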
Feel free to poke me by IM or email if you wish to argue.
waqas said on: Saturday, May 23, 2009 3:32
Thanks for your reply! To bring some closure to the discussion, I will answer here.
I want to make two points: the example I chose was a bit extreme, albeit not impossible, and there are a few more reasons for doing it this way that I did not mention.
First: the “extreme” example. Obviously, you do not really have to guard yourself against malicious objects in Python (any bytecode executed by the interpreter can already make it do whatever it wants, so there is no need for malicious magic methods). Malicious attempts are not the only concern, though: what about a benign object that evaluates to True when compared with None? Say, a database library object that is an abstraction for SQL’s NULL, or a Wildcard object, representing a pattern, used by an object-oriented database? These are all arguably legitimate reasons to make a comparison with Python’s None evaluate to True. Sometimes, however, you want to distinguish them from Python’s None nonetheless (e.g., debugging, serialization, verifying a return value from any function that can return such an object, etc.).
There is always exactly one None instance, just as with True and False, so you do not have the problem that integers may give you, where CPython caches only small values and two equal larger integers can be distinct objects.
The second point is that this abstraction is not the only reason to use identity testing. For None, what I just mentioned is a reason to use it, indeed, but what is more: there are no real reasons not to use it. There is no downside to using “is None” (if you really want to check for None identity). But there are more advantages than just that: it is (slightly) faster, for one (since you do not need to invoke the equality testing method).
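For anyone who wants to check the speed claim on their own machine, a rough sketch with the standard timeit module follows. The iteration count is an arbitrary choice, and the numbers depend on the interpreter and the hardware, so no expected output is given:

```python
import timeit

value = object()  # any object that is not None

# Time the identity check and the equality check on the same object.
t_is = timeit.timeit("value is None", globals={"value": value}, number=200_000)
t_eq = timeit.timeit("value == None", globals={"value": value}, number=200_000)

print(f"is None: {t_is:.4f}s")
print(f"== None: {t_eq:.4f}s")
```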
But in all fairness, one of the most compelling reasons to use “is None” in Python is simply that PEP 8 recommends it (see the “Programming Recommendations” header). And strictly following PEP 8 is what makes Python bearable. For everyone. :)
Hraban Luyat said on: Saturday, June 27, 2009 22:42