Introduction
In this article, we dive into the storage system of the NEAR blockchain. We'll explore how storage works on NEAR and how to use it securely, and we'll highlight some of the common pitfalls.
This is our second post about the NEAR blockchain. If you missed the first post, it offers an in-depth look at sharding and cross-contract calls.
How NEAR storage works
Every account on NEAR has storage that can be used to persist data. It is commonly used for values such as balances or NFT metadata. Under the hood, storage on NEAR is a key-value store. Developers rarely interact with the kv-store directly, however; instead, they access storage through the NEAR SDK.
The NEAR SDK abstracts the kv-store away and provides a more developer-friendly interface to storage. The SDK allows developers to define a storage layout. This is a struct marked with the #[near(contract_state)] macro, and it delineates which variables are in storage for this contract. Only one storage layout struct can be defined for a contract. For example, the contract below has two variables in its storage: a string and a vector of bytes.
#[near(contract_state)]
pub struct Contract {
    greeting: String,
    vector: Vector<u8>,
}
All regular Rust datatypes like i32 and String can be used; however, the NEAR SDK offers replacements for collection types (more on this later). The default() function can be used to assign default values to the state variables.
When instantiating an SDK collection type, such as Vector, you must give it a unique prefix. This prefix is used to calculate the storage location of the collection.
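impl Default for Contract {
    fn default() -> Self {
        Self {
            greeting: "hi".to_string(),
            // the prefix is passed as bytes, as in the later examples
            vector: Vector::new("vecPref".as_bytes()),
        }
    }
}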
The SDK will do the heavy lifting of serializing and deserializing these high-level variables to and from the low level kv-store. When doing so, the SDK uses the following structure for the kv-store:
- The main contract state is serialized to bytes (using Borsh) and stored in the kv-store with key STATE.
- Collection types (such as Vector) are stored separately using the user-defined prefix. Each element of the collection gets stored in its own key-value pair with key prefix|index of element.
In our example, given that greeting = 'hi' and vector has one element equal to 1, the contract will have the following in its kv-store:
[
  {
    "key": "STATE",
    // u32(greeting.length)|greeting|u64(vector.length)|u32(vector.prefix.length)|vector.prefix
    "value": "\x02\x00\x00\x00hi\x01\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00vecPref"
  },
  {
    // vector.prefix|u64(index to access)
    "key": "vecPref\x00\x00\x00\x00\x00\x00\x00\x00",
    // vector[0]
    "value": "\x01"
  }
]
// Modified from https://docs.near.org/build/smart-contracts/anatomy/serialization#modifying-the-state
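To make the byte layout above concrete, here is a minimal sketch that rebuilds the two entries by hand using only the Rust standard library. It does not use the Borsh crate or the SDK; the function names encode_state and element_key are hypothetical, and the integer widths are simply copied from the example above (the widths a given SDK version actually uses may differ).

```rust
/// Hand-rolled, Borsh-style encoding of the example `STATE` value.
/// Layout (all integers little-endian):
/// u32(greeting length) | greeting | u64(vector length) | u32(prefix length) | prefix
/// ASSUMPTION: widths mirror the example in the text, not a specific SDK version.
fn encode_state(greeting: &str, vector_len: u64, prefix: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(greeting.len() as u32).to_le_bytes());
    out.extend_from_slice(greeting.as_bytes());
    out.extend_from_slice(&vector_len.to_le_bytes());
    out.extend_from_slice(&(prefix.len() as u32).to_le_bytes());
    out.extend_from_slice(prefix);
    out
}

/// Key under which element `index` of a collection is stored: prefix | u64(index).
fn element_key(prefix: &[u8], index: u64) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(&index.to_le_bytes());
    key
}
```

Calling encode_state("hi", 1, b"vecPref") yields the 25 bytes shown under the STATE key, and element_key(b"vecPref", 0) yields the key of the first element.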
When a function that takes self as a parameter is invoked, the main contract state (with key STATE) is deserialized and loaded into memory in its entirety. The contents of SDK collection types, in contrast, are only deserialized and loaded into memory when needed (e.g. during a vec.get() operation).
Similarly, the main contract state is serialized from memory and gets stored back into the kv-store at the end of contract execution. In contrast, the contents of collection types are serialized from memory and stored in storage when the collection is dropped (goes out of scope). This subtle difference has some implications that are explained later.
SDK Types
Now that it is clear how storage works at a lower level, we can explain the important distinction between SDK collection types (Vector) and native Rust collection types (Vec): SDK collections are stored in separate key-value pairs as mentioned above, whereas native Rust collections are stored in the main contract state (STATE). Since the main contract state is always loaded in its entirety, Rust collections are also always loaded as a whole. This can be very resource intensive for large collections and may even lead to DoS issues. As a result, SDK types should be used for most collections, while native Rust collections should only be used for very small collections.
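The difference in loading cost can be illustrated with a toy model. The sketch below is not SDK code: KvStore is a hypothetical in-memory store that counts the bytes it hands out, and the two layouts are simplified stand-ins for "everything inside STATE" versus "one key-value pair per element".

```rust
use std::collections::HashMap;

/// Toy key-value store that counts how many value bytes were read.
#[derive(Default)]
struct KvStore {
    data: HashMap<Vec<u8>, Vec<u8>>,
    bytes_read: usize,
}

impl KvStore {
    fn write(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.data.insert(key, value);
    }

    fn read(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        let value = self.data.get(key).cloned();
        if let Some(v) = &value {
            self.bytes_read += v.len();
        }
        value
    }
}

/// Native-style layout: the whole list lives inside the single `STATE` value.
fn write_native(store: &mut KvStore, items: &[u32]) {
    let mut state = (items.len() as u32).to_le_bytes().to_vec();
    for item in items {
        state.extend_from_slice(&item.to_le_bytes());
    }
    store.write(b"STATE".to_vec(), state);
}

/// SDK-style layout: one key-value pair per element, keyed by prefix | index.
fn write_split(store: &mut KvStore, prefix: &[u8], items: &[u32]) {
    for (i, item) in items.iter().enumerate() {
        let mut key = prefix.to_vec();
        key.extend_from_slice(&(i as u32).to_le_bytes());
        store.write(key, item.to_le_bytes().to_vec());
    }
}

/// Reading one element from the native layout loads the whole `STATE` value.
fn read_native(store: &mut KvStore, index: usize) -> Option<u32> {
    let state = store.read(b"STATE")?;
    let start = 4 + index * 4;
    let b = state.get(start..start + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Reading one element from the split layout loads only that element.
fn read_split(store: &mut KvStore, prefix: &[u8], index: usize) -> Option<u32> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(&(index as u32).to_le_bytes());
    let b = store.read(&key)?;
    if b.len() != 4 {
        return None;
    }
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

With 1000 stored u32 values, a single read_native call reads 4004 bytes in this model, while read_split reads 4. On chain the gap shows up as gas, which is why large native collections can become a DoS risk.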
Another case where SDK types should be used over native types is for u64 and u128 values in function inputs and outputs. JSON is used to encode the inputs and outputs of function calls, and many JSON consumers (JavaScript in particular, which parses numbers as 64-bit floats) can’t represent integers of this size exactly. For that reason, the SDK types U64 and U128 should be used instead of the native Rust types u64 and u128. The SDK types are automatically represented as strings when necessary.
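The following sketch shows why a string representation is the safer choice. JsonU128 is a hypothetical stand-in for the SDK wrapper, written with the standard library only, and through_f64 imitates a consumer that parses JSON numbers as 64-bit floats; neither is the SDK's actual implementation.

```rust
/// Hypothetical stand-in for a string-encoded integer wrapper such as the SDK's `U128`.
#[derive(Debug, PartialEq, Clone, Copy)]
struct JsonU128(u128);

impl JsonU128 {
    /// Encode as a quoted decimal string so no precision is lost in transit.
    fn to_json(self) -> String {
        format!("\"{}\"", self.0)
    }

    /// Decode from a quoted decimal string; returns None on malformed input.
    fn from_json(json: &str) -> Option<Self> {
        let inner = json.strip_prefix('"')?.strip_suffix('"')?;
        inner.parse::<u128>().ok().map(JsonU128)
    }
}

/// What a consumer that stores numbers as 64-bit floats would hand back.
fn through_f64(value: u64) -> u64 {
    (value as f64) as u64
}
```

The value 2^53 + 1 does not survive a trip through through_f64, whereas a JsonU128 round-trips exactly through to_json and from_json, even for u128::MAX.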
Upgrading contracts
NEAR has built-in functionality to upgrade contracts, which allows the code of a contract to be modified after deployment. The storage, however, is not modified during this process, so the storage layout must remain the same. If the new storage layout does not match what is already in storage, a deserialization error may occur and the contract will no longer work. If the storage layout must be changed, a migration function can be implemented that migrates the existing storage to the new layout.
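As a rough illustration of why a layout change breaks deserialization and what a migration does, consider the sketch below. It uses a hand-rolled, Borsh-like encoding and hypothetical OldState and NewState layouts; in an actual contract this logic would typically live in a dedicated method that reads the old state (for example via env::state_read) rather than in free functions.

```rust
/// Old layout (hypothetical): a single string.
#[derive(Debug, PartialEq)]
struct OldState {
    greeting: String,
}

/// New layout (hypothetical): the same string plus a counter added in the upgrade.
#[derive(Debug, PartialEq)]
struct NewState {
    greeting: String,
    counter: u64,
}

/// Read a length-prefixed string (u32 length, then bytes); returns it and the remaining bytes.
fn read_string(bytes: &[u8]) -> Option<(String, &[u8])> {
    if bytes.len() < 4 {
        return None;
    }
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    let rest = &bytes[4..];
    if rest.len() < len {
        return None;
    }
    let text = String::from_utf8(rest[..len].to_vec()).ok()?;
    Some((text, &rest[len..]))
}

fn decode_old(bytes: &[u8]) -> Option<OldState> {
    let (greeting, rest) = read_string(bytes)?;
    if !rest.is_empty() {
        return None;
    }
    Some(OldState { greeting })
}

/// Fails on bytes written with the old layout: the counter is missing.
fn decode_new(bytes: &[u8]) -> Option<NewState> {
    let (greeting, rest) = read_string(bytes)?;
    if rest.len() != 8 {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(rest);
    Some(NewState { greeting, counter: u64::from_le_bytes(raw) })
}

fn encode_new(state: &NewState) -> Vec<u8> {
    let mut out = (state.greeting.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(state.greeting.as_bytes());
    out.extend_from_slice(&state.counter.to_le_bytes());
    out
}

/// Migration: decode with the OLD layout, fill in the new field, re-encode with the NEW layout.
fn migrate(old_bytes: &[u8]) -> Option<Vec<u8>> {
    let old = decode_old(old_bytes)?;
    Some(encode_new(&NewState { greeting: old.greeting, counter: 0 }))
}
```

Decoding the old bytes with decode_new returns None, which corresponds to the deserialization error described above; after migrate, the same data decodes cleanly with the new layout.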
Storage Staking
NEAR implements storage staking. This means that an amount of NEAR is locked based on the amount of storage used in a contract. When the storage is freed, the NEAR is unlocked from the contract and can be transferred out. This means that a contract should always have sufficient NEAR tokens to cover any storage allocations it may perform, otherwise a panic occurs when attempting to write. The cost of storage should be passed on to the user, otherwise, a contract opens itself up to griefing and DoS attacks (covered in the next section).
NEP-145 defines the Storage Management standard, a common interface for passing the storage cost on to users. When using this standard, users have to register with a contract and deposit some NEAR to pay for their storage. They may top up or withdraw their storage balance depending on their usage of the contract.
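The bookkeeping behind such an interface can be sketched as a small per-account ledger. The model below is a simplification under stated assumptions: it borrows the storage_deposit and storage_withdraw names and the total/available split from NEP-145, but StorageLedger and its lock method are hypothetical, and no real token transfers or account validation take place.

```rust
use std::collections::HashMap;

/// Per-account storage balance, loosely modelled on NEP-145's total/available pair.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct StorageBalance {
    total: u128,     // everything the account has deposited
    available: u128, // the part not currently locked by used storage
}

/// Hypothetical in-memory ledger; a contract would keep this in its own storage.
#[derive(Default)]
struct StorageLedger {
    balances: HashMap<String, StorageBalance>,
}

impl StorageLedger {
    /// Register the account if needed and credit the deposit.
    fn storage_deposit(&mut self, account: &str, amount: u128) -> StorageBalance {
        let entry = self.balances.entry(account.to_string()).or_default();
        entry.total += amount;
        entry.available += amount;
        *entry
    }

    /// Lock part of the available balance when the account's data grows.
    fn lock(&mut self, account: &str, cost: u128) -> Result<(), String> {
        let entry = self.balances.get_mut(account).ok_or("account not registered")?;
        if entry.available < cost {
            return Err("insufficient storage balance".to_string());
        }
        entry.available -= cost;
        Ok(())
    }

    /// Withdraw from the unlocked part of the balance only.
    fn storage_withdraw(&mut self, account: &str, amount: u128) -> Result<StorageBalance, String> {
        let entry = self.balances.get_mut(account).ok_or("account not registered")?;
        if entry.available < amount {
            return Err("amount exceeds available storage balance".to_string());
        }
        entry.available -= amount;
        entry.total -= amount;
        Ok(*entry)
    }
}
```

The key property is that a user can never withdraw the portion of their deposit that currently backs data stored on their behalf.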
Security Aspects
Storage Staking Security
The storage staking mechanism means that NEAR gets locked in accordance with how much storage the contract uses. As a result, the contract must keep track of exactly how much storage each user consumes and must pass this cost onto the user. If a contract does not do this, it opens up several attack vectors:
- If a contract does not have enough NEAR to pay for its storage it will panic, causing DoS issues.
- If this panic occurs during a critical part of execution, such as a callback, this may cause loss of funds.
- If a contract passes only a part of the storage cost to the user, then the user can launch a griefing attack. By using a large amount of storage the user forces the contract to pay a large amount of NEAR while only paying a fraction themselves. This may eventually lead to the above Denial-of-Service condition.
- Additionally, if the contract needs some NEAR to pay for other things (e.g. to pay out rewards to the user), this balance may be locked by storage staking, meaning that the contract no longer has enough balance to cover the rewards.
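One common way to pass the full cost on is to measure storage usage before and after a write and charge the caller for the difference. The sketch below shows only the arithmetic. STORAGE_BYTE_COST is an assumed value (1 NEAR per 100 kB, expressed in yoctoNEAR) that should be checked against the current protocol configuration, and required_deposit and settle are hypothetical helpers; a real contract would likely obtain the measurements from the SDK (for example env::storage_usage()) instead of taking them as parameters.

```rust
/// ASSUMPTION: price of one byte of storage in yoctoNEAR (1 NEAR per 100 kB).
/// A contract should query the actual value at runtime rather than hard-code it.
const STORAGE_BYTE_COST: u128 = 10_000_000_000_000_000_000;

/// Deposit needed to cover the growth in storage between two measurements (in bytes).
/// Shrinking storage requires no deposit.
fn required_deposit(usage_before: u64, usage_after: u64) -> u128 {
    u128::from(usage_after.saturating_sub(usage_before)) * STORAGE_BYTE_COST
}

/// Returns the refund owed to the caller, or an error if the attached deposit
/// does not cover the full storage cost.
fn settle(attached: u128, usage_before: u64, usage_after: u64) -> Result<u128, String> {
    let required = required_deposit(usage_before, usage_after);
    if attached < required {
        return Err(format!("need {} yoctoNEAR, got {}", required, attached));
    }
    Ok(attached - required)
}
```

Rejecting the call when the deposit is too small, rather than covering the shortfall from the contract's own balance, is what closes the griefing vector described in the list above.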
Prefix Collisions
When using SDK collection types the developer must specify a prefix. This prefix is used to calculate the key of where the collection is stored in the key-value store.
- As a result, it is crucial that this prefix is unique. If the prefix is not unique, then those collections will read/write from the same storage and will overwrite each other.
- The best practice here is to use an enum to define storage keys. This guarantees their uniqueness. For example:
#[near]
#[derive(BorshStorageKey)]
pub enum Prefix {
    Owners,
    Users,
}

#[near(contract_state)]
pub struct StorageExample {
    pub owners: Vector<String>,
    pub users: LookupSet<String>,
}

impl Default for StorageExample {
    fn default() -> Self {
        Self {
            owners: Vector::new(Prefix::Owners),
            users: LookupSet::new(Prefix::Users),
        }
    }
}
// Modified from https://docs.near.org/build/smart-contracts/anatomy/collections#sdk-collections
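To see what a collision looks like at the key-value level, the sketch below models storage as a plain HashMap. The Prefix enum here is a hypothetical std-only imitation that maps each variant to one distinct byte; it is meant to convey the idea behind enum-derived storage keys, not to reproduce the bytes the SDK's derive actually produces.

```rust
use std::collections::HashMap;

type KvStore = HashMap<Vec<u8>, Vec<u8>>;

/// Hypothetical prefix enum: each variant maps to a distinct byte,
/// so two collections cannot accidentally share a prefix.
#[derive(Clone, Copy)]
enum Prefix {
    Owners,
    Users,
}

impl Prefix {
    fn as_bytes(self) -> Vec<u8> {
        vec![self as u8]
    }
}

/// Element key: prefix | u32(index), little-endian.
fn element_key(prefix: &[u8], index: u32) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(&index.to_le_bytes());
    key
}

fn put(store: &mut KvStore, prefix: &[u8], index: u32, value: &str) {
    store.insert(element_key(prefix, index), value.as_bytes().to_vec());
}

fn get(store: &KvStore, prefix: &[u8], index: u32) -> Option<String> {
    store
        .get(&element_key(prefix, index))
        .map(|v| String::from_utf8_lossy(v).into_owned())
}
```

Writing element 0 of two collections that both use the prefix b"a" leaves only the second value in the store, while the enum-derived prefixes keep the two entries apart.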
Unsafe Collections
As explained above, when modifying variables, some state (e.g. the contents of a Vector) can be saved to storage before other state (e.g. the length of a Vector) is saved. This can result in unexpected behaviour in some circumstances.
When instantiating two objects with the same prefix, they can have the same contents (since those are saved to storage) but not the same length (which is kept in memory):
let mut m = UnorderedMap::<u8, String>::new(b"m");
m.insert(1, "test".to_string()); // this writes "test" to storage
assert_eq!(m.len(), 1); // length is saved in memory
assert_eq!(m.get(&1), Some(&"test".to_string()));
m = UnorderedMap::new(b"m"); // instantiate second object with same prefix
assert!(m.is_empty()); // not same length
assert_eq!(m.get(&1), Some(&"test".to_string())); // but same contents
// From https://docs.near.org/build/smart-contracts/anatomy/collections#error-prone-patterns
Collections should be attached to the main state. Otherwise, the contents are written to storage, but metadata such as length is lost. This is demonstrated in the following example:
use near_sdk::near;
use near_sdk::store::Vector;

#[near(contract_state)]
pub struct Contract {
    vector: Vector<u32>,
}

impl Default for Contract {
    fn default() -> Self {
        Self {
            vector: Vector::new("pref".as_bytes()),
        }
    }
}

#[near]
impl Contract {
    pub fn test(&mut self) {
        {
            // v is 'detached from the state'
            let mut v = Vector::<u32>::new("pref".as_bytes());
            v.push(1);
            assert_eq!(v.len(), 1);
            assert_eq!(v.get(0).unwrap(), &1);
        } // when v is dropped its contents are saved, but its metadata (length) is lost

        assert_eq!(self.vector.len(), 0); // length was not saved

        // when reading the storage manually, the content is intact
        let storage_key = &[b"pref".as_slice(), &0u32.to_le_bytes()].concat();
        assert_eq!(
            near_sdk::env::storage_read(storage_key).unwrap(),
            1u32.to_le_bytes()
        );
    }
}
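The mechanism behind this behaviour can be reproduced without the SDK. The following toy model is an assumption-laden simplification: ToyVector is hypothetical, the shared store is an Rc<RefCell<HashMap>>, elements are flushed in Drop, and the length lives only in the struct, much as a collection's length lives in the serialized main state of whichever struct owns it.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

type KvStore = Rc<RefCell<HashMap<Vec<u8>, Vec<u8>>>>;

/// Element key: prefix | u32(index), little-endian.
fn element_key(prefix: &[u8], index: u32) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(&index.to_le_bytes());
    key
}

/// Toy collection: pushed elements are cached in memory and flushed on drop.
/// The length is only a field of the struct; nothing here persists it.
struct ToyVector {
    store: KvStore,
    prefix: Vec<u8>,
    len: u32,
    pending: Vec<(u32, u32)>, // (index, value) pairs not yet written
}

impl ToyVector {
    fn new(store: &KvStore, prefix: &[u8]) -> Self {
        ToyVector {
            store: Rc::clone(store),
            prefix: prefix.to_vec(),
            len: 0,
            pending: Vec::new(),
        }
    }

    fn push(&mut self, value: u32) {
        self.pending.push((self.len, value));
        self.len += 1;
    }
}

impl Drop for ToyVector {
    /// Flush cached elements to the store; `len` is deliberately not written.
    fn drop(&mut self) {
        let mut store = self.store.borrow_mut();
        for (index, value) in self.pending.drain(..) {
            store.insert(element_key(&self.prefix, index), value.to_le_bytes().to_vec());
        }
    }
}
```

A detached ToyVector that pushes one element and is then dropped leaves that element in the store, yet a second ToyVector with the same prefix still reports a length of 0, mirroring the contract example above.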
NEAR SDK's collections vs store
This post assumes usage of near_sdk::store for collections. This is the new version of the old near_sdk::collections module. store features better gas efficiency, among other advantages. Additionally, collections has some behaviour that can cause issues when nesting. As such, it is recommended to use store over collections.
Conclusion
Account storage in NEAR features some interesting mechanics such as storage staking and collection management. It is important to understand how these mechanics and their nuances work in order to create secure and robust smart contracts.