Purity in the EVM

This document seeks to define purity in the context of an Ethereum smart contract where the transaction data supplied to it in a call is considered the "input". Based upon this definition it then identifies which opcodes are impure and those which are pure or impure depending on their use.

This document is not official advice. Errors may be present.

This document is available as a Git repository at github.com/sigp/opcode-purity.

Background

This document is the result of "reverse engineering" the following two contracts, and the majority of any credit attributed to this document is due to their authors:

Definition of Impurity

A contract is considered pure if it will always return the same result given sufficient gas for execution and the same transaction data. Specifically, it may read the data field of a transaction but no other transaction context, it may not read block information and it must not read from or write to storage.

Impure Opcode Table

Opcode Value   Mnemonic                         Impurity Category
0x31           BALANCE                          Always Impure
0x32           ORIGIN                           Always Impure
0x33           CALLER                           Always Impure
0x3a           GASPRICE                         Always Impure
0x3b           EXTCODESIZE                      Always Impure
0x3c           EXTCODECOPY                      Always Impure
0x40           BLOCKHASH                        Always Impure
0x41           COINBASE                         Always Impure
0x42           TIMESTAMP                        Always Impure
0x43           NUMBER                           Always Impure
0x44           DIFFICULTY                       Always Impure
0x45           GASLIMIT                         Always Impure
0x46 - 0x4F    Range of future impure opcodes   Future Impure Opcodes
0x54           SLOAD                            Always Impure
0x55           SSTORE                           Always Impure
0xf0           CREATE                           Always Impure
0xff           SELFDESTRUCT                     Always Impure
0xf1           CALL                             Potentially Impure Call-Type
0xf2           CALLCODE                         Potentially Impure Call-Type
0xf4           DELEGATECALL                     Potentially Impure Call-Type
0xfa           STATICCALL                       Potentially Impure Call-Type
* 0xfb         CREATE2                          Always Impure

* Opcodes which were not implemented at the time of writing, but the author has an expectation they will be implemented in the future.

Impurity Categories

There are three classifications for impure opcodes: always impure, potentially impure call-type and future impure opcodes. Each category is described below.

Always Impure

These opcodes have no use other than to mutate state, return mutable state or provide context about the execution environment. Any contract which includes an "always impure" opcode should be immediately considered impure.

Future Impure Opcodes

These opcodes are assumed to be reserved for future impure opcodes. At the time of writing, there is no formal declaration that this is the case and this judgement is based solely on the author's informal conversations with the Ethereum community.
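As an illustration of how the "always impure" and "future impure" categories might be checked, the following is a minimal sketch of a linear scan over bytecode. The function name find_impure_opcodes and the representation of bytecode as a sequence of integers are assumptions made for this example; the opcode values are taken from the table above. The scan skips the data bytes that follow a PUSH instruction, since those are not opcodes. It treats every remaining byte as code, so any data appended after the code may be flagged, which errs on the side of reporting impurity.

```python
# Opcode values copied from the Impure Opcode Table in this document.
ALWAYS_IMPURE = {
    0x31, 0x32, 0x33, 0x3a, 0x3b, 0x3c,  # BALANCE .. EXTCODECOPY
    0x40, 0x41, 0x42, 0x43, 0x44, 0x45,  # BLOCKHASH .. GASLIMIT
    0x54, 0x55,                          # SLOAD, SSTORE
    0xf0, 0xff, 0xfb,                    # CREATE, SELFDESTRUCT, CREATE2
}
FUTURE_IMPURE = range(0x46, 0x50)        # 0x46 - 0x4F inclusive
PUSH1, PUSH32 = 0x60, 0x7f


def find_impure_opcodes(bytecode):
    """Return (byte_offset, opcode) pairs for each flagged opcode.

    Hypothetical helper for illustration only; call-type opcodes are
    not handled here because their purity depends on the called address.
    """
    hits = []
    i = 0
    while i < len(bytecode):
        op = bytecode[i]
        if op in ALWAYS_IMPURE or op in FUTURE_IMPURE:
            hits.append((i, op))
        # PUSHn is followed by n bytes of data which are not opcodes.
        if PUSH1 <= op <= PUSH32:
            i += op - PUSH1 + 1
        i += 1
    return hits
```

A contract for which this returns a non-empty list would be treated as impure under the definition above.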

Potentially Impure Call-Type

Call-type opcodes (see the table for a listing) may execute code at some other address. It is possible for an external call to be either pure or impure, depending on the address specified for the call. The use of a call-type opcode can only be considered pure if the address specified is:

  • An address that has already been determined to be pure.
  • Any of the precompile addresses within the range of 0x0000000000000000000000000000000000000001 to 0x0000000000000000000000000000000000000008. Note: the purity of these contracts is yet to be confirmed.

See the Address Detection Techniques section for some techniques for extracting the address supplied to a call-type opcode from bytecode.

Any call to an externally-owned (non-contract) address should be considered impure. This is because it can potentially have impure code deployed to it.
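The two conditions above can be sketched as a small predicate. This is only an illustration: the name is_permitted_call_target, the use of integers for addresses and the known_pure_addresses set are assumptions made for this example, not part of any existing implementation.

```python
# Precompile range as stated in this document (0x...01 to 0x...08).
PRECOMPILE_FIRST = 0x01
PRECOMPILE_LAST = 0x08


def is_permitted_call_target(address, known_pure_addresses):
    """Return True if a call to `address` may be considered pure.

    `address` is an integer; `known_pure_addresses` is a hypothetical
    set of addresses already determined to be pure by some other means.
    """
    if PRECOMPILE_FIRST <= address <= PRECOMPILE_LAST:
        return True
    return address in known_pure_addresses
```

Any other address, including an externally-owned account, falls through to False and the call is treated as impure.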

Address Detection Techniques

Call-type opcodes (see the table for a listing) can only be considered pure if they call a specific set of addresses (see Potentially Impure Call-Type). Therefore, in order to permit some call-type opcodes it is necessary to determine the called address from the bytecode. This section describes methods which may be used to find, with certainty, the address supplied to a call-type opcode.

The code which places an address on the stack for a call-type opcode can be arbitrarily complex and may only be discoverable by executing said code. To allow purity checking within a single Ethereum transaction, the techniques here are deliberately simple and may produce false positives (flagging a pure contract as impure). However, these techniques should never produce false negatives (flagging an impure contract as pure).

Convenience Functions

First, two convenience functions are declared: get_opcode(n) and get_last_opcode_param(n).

Convenience Function get_opcode(n)

Returns the n'th opcode declared in the subject bytecode[].

If n is out of bounds of bytecode[] the function returns None.

Example:

ADD = 0x01
PUSH2 = 0x61

bytecode = [PUSH2, 2, 1, ADD]
get_opcode(0)
# 0x61 (PUSH2)
get_opcode(1)
# 0x01 (ADD)
get_opcode(2)
# None
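One possible way to implement get_opcode is sketched below. It is an assumption-laden illustration rather than the original implementation: the bytecode is passed as an explicit first argument instead of being implicit, and opcodes are counted by walking the bytecode from the start and skipping the data bytes that follow each PUSH.

```python
PUSH1 = 0x60
PUSH32 = 0x7f


def get_opcode(bytecode, n):
    """Return the n'th opcode in `bytecode`, or None if out of bounds.

    Sketch only: `bytecode` is a sequence of integers and is passed
    explicitly, unlike the one-argument form used in this document.
    """
    if n < 0:
        return None
    i = 0      # byte offset into the bytecode
    count = 0  # index of the opcode found at offset i
    while i < len(bytecode):
        op = bytecode[i]
        if count == n:
            return op
        # Skip the immediate data of PUSH1..PUSH32.
        if PUSH1 <= op <= PUSH32:
            i += op - PUSH1 + 1
        i += 1
        count += 1
    return None
```

Returning None for a negative index matters because the address detection functions look at positions such as c-2, which may fall before the start of the bytecode.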

Convenience Function get_last_opcode_param(n)

Returns the final parameter supplied to the n'th opcode declared in the subject bytecode[].

If n is out of bounds of bytecode[] or the n'th opcode does not have parameters the function returns None.

Example:

ADD = 0x01
PUSH2 = 0x61

bytecode = [PUSH2, 2, 1, ADD]
get_last_opcode_param(0)
# 1
get_last_opcode_param(1)
# None
get_last_opcode_param(2)
# None
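A matching sketch of get_last_opcode_param follows. It assumes, based on the example above, that the "final parameter" is the last data byte following a PUSH opcode and that only PUSH opcodes have parameters. As before, passing the bytecode explicitly is an assumption made so that the example is self-contained.

```python
PUSH1 = 0x60
PUSH32 = 0x7f


def get_last_opcode_param(bytecode, n):
    """Return the last data byte of the n'th opcode, or None.

    Sketch only: None is returned when n is out of bounds, when the
    opcode is not a PUSH, or when the PUSH data is truncated.
    """
    if n < 0:
        return None
    i = 0      # byte offset into the bytecode
    count = 0  # index of the opcode found at offset i
    while i < len(bytecode):
        op = bytecode[i]
        # Number of data bytes that follow this opcode.
        width = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        if count == n:
            if width == 0 or i + width >= len(bytecode):
                return None
            return bytecode[i + width]
        i += 1 + width
        count += 1
    return None
```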

Address Detection Functions

Four functions are now declared which return an address if a specific pattern of opcodes is found to precede a call-type opcode. If all of these functions return None, then the contract should be assumed to be impure.

Each function takes an input c which is the index of the call-type opcode in question.

Address Detection Function #1

PUSH1 = 0x60
PUSH32 = 0x7f

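def address_detector_1(c):
    # Match a PUSH two opcodes before the call; get_opcode may return None.
    opcode = get_opcode(c-2)
    if opcode is not None and PUSH1 <= opcode <= PUSH32:
        return get_last_opcode_param(c-2)
    else:
        return None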

Address Detection Function #2

SUB = 0x03
GAS = 0x5a
PUSH1 = 0x60
PUSH32 = 0x7f

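def address_detector_2(c):
    # Match the sequence PUSH, GAS, SUB immediately before the call.
    pushed = get_opcode(c-3)
    if (get_opcode(c-1) == SUB and
       get_opcode(c-2) == GAS and
       pushed is not None and
       PUSH1 <= pushed <= PUSH32):
        return get_last_opcode_param(c-3)
    else:
        return None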

Address Detection Function #3

GAS = 0x5a
SWAP1 = 0x90

def address_detector_3(c):
    # Match GAS or SWAP1 immediately before the call.
    if (get_opcode(c-1) == GAS or
       get_opcode(c-1) == SWAP1):
        return get_last_opcode_param(c-2)
    else:
        return None

Address Detection Function #4

DUP1 = 0x80
DUP16 = 0x8f

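def address_detector_4(c):
    # Match any DUP immediately before the call; get_opcode may return None.
    opcode = get_opcode(c-1)
    if opcode is not None and DUP1 <= opcode <= DUP16:
        return get_last_opcode_param(c-2)
    else:
        return None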
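The rule stated at the start of this section (try each detector, and assume impurity if all return None) could be combined into a single routine along the following lines. This is a self-contained sketch under several assumptions: the helper names decode and detect_call_address are hypothetical, the bytecode is decoded up front into (opcode, last_param) pairs, and the four patterns are re-expressed inline rather than calling the functions above.

```python
GAS, SUB, SWAP1 = 0x5a, 0x03, 0x90
PUSH1, PUSH32 = 0x60, 0x7f
DUP1, DUP16 = 0x80, 0x8f


def decode(bytecode):
    """Hypothetical helper: split bytecode into (opcode, last_param) pairs.

    last_param is the final data byte of a complete PUSH, otherwise None.
    """
    ops, i = [], 0
    while i < len(bytecode):
        op = bytecode[i]
        width = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        data = bytecode[i + 1:i + 1 + width]
        complete = width > 0 and len(data) == width
        ops.append((op, data[-1] if complete else None))
        i += 1 + width
    return ops


def detect_call_address(ops, c):
    """Return the first address found by detectors #1 to #4, else None."""
    def opcode(n):
        return ops[n][0] if 0 <= n < len(ops) else None

    def last_param(n):
        return ops[n][1] if 0 <= n < len(ops) else None

    def in_range(n, low, high):
        op = opcode(n)
        return op is not None and low <= op <= high

    candidates = [
        # #1: PUSH two opcodes before the call.
        last_param(c - 2) if in_range(c - 2, PUSH1, PUSH32) else None,
        # #2: PUSH, GAS, SUB immediately before the call.
        last_param(c - 3) if (opcode(c - 1) == SUB and opcode(c - 2) == GAS
                              and in_range(c - 3, PUSH1, PUSH32)) else None,
        # #3: GAS or SWAP1 immediately before the call.
        last_param(c - 2) if opcode(c - 1) in (GAS, SWAP1) else None,
        # #4: any DUP immediately before the call.
        last_param(c - 2) if in_range(c - 1, DUP1, DUP16) else None,
    ]
    for address in candidates:
        if address is not None:
            return address
    return None  # no pattern matched: assume the contract is impure
```

For example, decoding the bytes for PUSH1 0x04, GAS, STATICCALL and calling detect_call_address with c=2 yields 4, whereas a bare GAS followed by STATICCALL yields None.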

Opcode Listing

This section contains an opcode-by-opcode listing of each defined opcode. For each opcode the following is provided:

  • Summary: a brief description of what the opcode does.
  • References: a pointer to an implementation or specification of the opcode.
  • Impurity Reasoning: a brief statement of why the opcode is (or may be) impure.
  • Potential Attack: a scenario which assumes some attacker has deployed a contract and wishes to have some pre-determined or ad hoc control of the return result of the contract. This does not exhaustively list potential attacks; it simply provides an example for demonstrative purposes.

Specifications of opcodes can be found in Appendix H of the Ethereum Yellow Paper.

BALANCE

Summary: Returns the balance of some address.
References: py-evm/evm/vm/logic/context.py: balance()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by altering the balance of some external account.

ORIGIN

Summary: Returns the address of the sender of the transaction which triggered execution. In Solidity, this is tx.origin.
References: py-evm/evm/vm/logic/context.py: origin()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract call by varying the private key with which a transaction is signed.

CALLER

Summary: Returns the address directly responsible for the execution. In Solidity, this is msg.sender.
References: py-evm/evm/vm/logic/context.py: caller()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract call by varying the private key with which a transaction is signed or using an intermediary contract to alter the CALLER value.

GASPRICE

Summary: Returns the gas price of the transaction which triggered execution.
References: py-evm/evm/vm/logic/context.py: gasprice()
Impurity Reasoning: reads illegal transaction context.
Potential Attack: An attacker may influence the return value of a contract call by using some means to alter the gas price (e.g., directly controlling block proposers).

EXTCODESIZE

Summary: Returns the size of the code held at some address.
References: py-evm/evm/vm/logic/context.py: extcodesize()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by deploying code to some pre-computed address.

EXTCODECOPY

Summary: Copies some amount of code at some address to some position in memory.
References: py-evm/evm/vm/logic/context.py: extcodecopy()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by deploying code to some pre-computed address.

BLOCKHASH

Summary: Returns the hash of some past block (within the previous 256 complete blocks).
References: py-evm/evm/vm/logic/block.py: blockhash()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by controlling some portion of block proposers and selecting block hashes based upon how they will influence the contract call.

COINBASE

Summary: Returns the beneficiary address of the block.
References: py-evm/evm/vm/logic/block.py: coinbase()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by controlling some portion of block proposers and declaring the beneficiary address based upon how it will influence the contract call.

TIMESTAMP

Summary: Returns the timestamp of the block.
References: py-evm/evm/vm/logic/block.py: timestamp()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by controlling some portion of block proposers and declaring the timestamp based upon how it will influence the contract call.

NUMBER

Summary: Returns the number of the block (count of blocks in the chain since genesis).
References: py-evm/evm/vm/logic/block.py: number()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by selecting in which block a transaction should be included.

DIFFICULTY

Summary: Returns the block difficulty.
References: py-evm/evm/vm/logic/block.py: difficulty()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by assuming some control of the collective hash rate and modifying it based upon how it will influence the contract call.

GASLIMIT

Summary: Returns the block gas limit.
References: py-evm/evm/vm/logic/block.py: gaslimit()
Impurity Reasoning: reads state.
Potential Attack: An attacker may influence the return value of a contract call by using some means to alter the gas limit (e.g., directly controlling block proposers or spamming the network).

SLOAD

Summary: Returns a word from storage.
References: py-evm/evm/vm/logic/storage.py: sload()
Impurity Reasoning: reads state.
Potential Attack: At the time of writing the author is not aware of any attack using SLOAD if all other purity directives are followed. However, attacks could be imagined if combined with the SSTORE opcode (other attacks may be possible).

SSTORE

Summary: Saves some word to storage.
References: py-evm/evm/vm/logic/storage.py: sstore()
Impurity Reasoning: reads and mutates state.
Potential Attack: At the time of writing the author is not aware of any attack using SSTORE if all other purity directives are followed. However, attacks could be imagined if combined with the SLOAD or GAS opcodes (other attacks may be possible).

CREATE

Summary: Creates a new account given some code.
References: py-evm/evm/vm/logic/system.py: Create.__call__()
Impurity Reasoning: reads and mutates state.
Potential Attack: At the time of writing the author is not aware of any attack using CREATE if all other purity directives are followed. However, attacks could be imagined if combined with the EXTCODESIZE opcode (other attacks may be possible).

SELFDESTRUCT

Summary: Registers the account for deletion, sending remaining Ether to some address.
References: py-evm/evm/vm/logic/system.py: _selfdestruct()
Impurity Reasoning: reads and mutates state.
Potential Attack: An attacker may self-destruct a contract, causing all future calls to it to fail.

CALL

Summary: Message-calls to some address.
References: py-evm/evm/vm/logic/call.py: Call()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may call an impure contract and use its return data.

CALLCODE

Summary: Execute the code of some other account using the state of this account.
References: py-evm/evm/vm/logic/call.py: CallCode()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may callcode an impure contract and read or mutate state.

DELEGATECALL

Summary: Execute the code of some other account using the state of this account whilst retaining the same values for sender and value.
References: py-evm/evm/vm/logic/call.py: DelegateCall()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may delegatecall an impure contract and read or mutate state.

STATICCALL

Summary: Message-calls to some address without persisting state modifications.
References: py-evm/evm/vm/logic/call.py: StaticCall()
Potential Impurity Reasoning: Executes code from another account.
Potential Attack: An attacker may call an impure contract and use its return data.

CREATE2

This opcode has not been implemented at the time of writing.

Summary: Creates a new account given some code and some nonce (as opposed to CREATE which uses the current account nonce).
References: EIP86.
Impurity Reasoning: Reads and mutates state.
Potential Attack: An attacker could craft a contract which succeeds the first time it is called, but fails all other times.