I left a comment on HN about how I teach new users not to clone, and someone asked me if I’d written more, so here we go!
When new users are writing Rust code, it’s reasonably common to “fight with the borrow checker”. I
personally find this to be quite a misnomer, as it is not a fight so much as a misunderstanding of
the nature of Rust’s semantics. In an attempt to make the errors go away, users will understandably
take the path of least resistance, according to the docs they’ve got. They dutifully read up, and
eventually land on the clone
method. This makes the errors go away! Great! Except if you were my newbie, in the code review I’d tell you clone
is banned unless you can tell me why you need it.
A simple example
Let’s consider a simple example. I’m going to write some C, and then write the same thing in Rust, and we’ll discuss how they differ.
#include <string.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
/* A thing struct that holds a pointer to some data */
typedef struct {
char* name;
} AThing;
/* A method to operate on that data */
size_t thinglen(const AThing *const thing) {
return strlen(thing->name);
}
int main() {
char* name = (char*)malloc(11);
if (!name) return -1;
strcpy(name, "rico suave");
AThing my_thing = { .name = name };
printf("%s is %lu chars long\n", name, thinglen(&my_thing));
free(name);
return 0;
}
Okay, so that’s … not the best C ever written. But it gets the job done and it’s intended to be a simple illustration. Let’s try writing a version in naive Rust:
/// Our thing
struct AThing {
/// Some data associated with our thing
pub name: String,
}
impl AThing {
/// Do some operation upon this thing
pub fn thing_len(&self) -> usize {
self.name.len()
}
}
fn main() {
let name = String::from("don juan");
let my_thing = AThing { name: name };
println!("{} is {} chars long", name, my_thing.thing_len());
}
Great! You know that you used a pointer in the C code, but trying to make the type of AThing::name
a &str
made the compiler mad and you fell down a rabbit hole reading about lifetimes. Rust has
String
, so we’ll use that. You probably predicted this, but the compiler will further complain
about the above example:
error[E0382]: borrow of moved value: `name`
--> src/main.rs:12:37
|
10 | let name = String::from("don juan");
| ---- move occurs because `name` has type `std::string::String`, which does not implement the `Copy` trait
11 | let my_thing = AThing { name: name };
| ---- value moved here
12 | println!("{} is {} chars long", name, thinglen(my_thing));
| ^^^^ value borrowed here after move
Oh bother. So you go and you read up on what the Copy
trait is, and the docs say you can derive
it. Easy! So you try, and it fails. So you do some more reading, and you fight with the borrow
checker a bit, and maybe eventually you land on
let my_thing = AThing { name: name.clone() };
and the code compiles and runs like you expect. This is admittedly a contrived example, and only really illustrates the kind of error that every tutorial covers in better depth than this post is intended to do.
The important part is this: cloning is almost always the wrong solution, but it’s an alluringly easy way to make the compiler stop bugging you and let you get on with your work. So how can you, as a novice Rust programmer, know when it’s the right call and when it’s not?
Update: I’ve gotten some questions about what this example would look like without clone, so I’ve added a source code listing in the first addendum.
What Does clone
Do?
I’m not going to regurgitate the docs here. They’re exceptionally well-written. Give them a read after we have our chat.
At its simplest, cloning makes a copy of the data, in a way that makes sure any references and
pointers internal to the data are valid (usually by copying the data behind the pointer). Most of
the time, you’re only going to care about library types like String
and Vec
when you’re deciding
whether to clone
. This means the stdlib has taken care of making sure “any references and pointers
internal to the data are valid”. If you’re implementing Clone
yourself, it’s almost certainly
going to consist of you calling clone
on some private objects that themselves implement Clone
.
It’s clones all the way down. Either way, you end up with a new copy of the data.
This is the first big clue: do you really need a whole new copy of the data? Irrespective of the demands
of the compiler, does it make sense from a design perspective to have two copies of the data? Take a
look at the above code examples. The C code has the AThing
struct keep a pointer to the data. That
means you can pass the pointer around. This makes sense, right? You only really need the string to
exist in one place at a time in this simple example. The Rust code clones the string. Is that
sensical? Do you need multiple copies of “don juan” in memory? Maybe you do! But probably you don’t.
It works similarly for vectors. If you’re cloning the vector, you’re saying that you need two entire copies of everything in the vector. Do you, really? Likewise for HashMaps, or any other collection.
What Does a Bad Clone Look Like?
So now I’ve told you all this, but the compiler is still refusing to build your code, or I am still refusing to approve your pull request. You’re stuck! I’ve encountered a few anti-patterns that Rust newbies encounter that can be instructive.
Copying a String
when a &str
Will Do
The first pattern is calling clone
on a String
. This is occasionally a totally reasonable thing
to do, but very often you don’t actually need it. Let’s consider two examples:
fn make_lipographic(banned: char, line: String) -> String {
line.as_str().chars().filter(|&c| c != banned).collect()
}
fn main() {
let passage = String::from("If Youth, throughout all history, had had a champion to stand up for it");
assert_eq!(make_lipographic('e', passage.clone()), passage);
}
The problem with this code is pretty straightforward: as in C, the make_lipographic
function
doesn’t need to own the line
. You wouldn’t pass it by value in C (ignoring, for a moment, the
difficulty in doing such a thing), so you shouldn’t move it in Rust. Instead, have it take a
reference (in C, a const * const
):
fn make_lipographic(banned: char, line: &str) -> String {
line.as_str().chars().filter(|&c| c != banned).collect()
}
Okay, but what about more complicted situations? Let’s say you want to store a value in a HashMap
,
and then retrieve it later.
use std::collections::HashMap;
fn main() {
// this String e.g. is read from a file or from stdin.
let quote = String::from("Up to about its primary school days a child thinks, naturally, only of play.");
// A map to store example quotes and the character it avoids.
let mut lipogram_map: HashMap<String, char> = HashMap::new();
// store the quote
lipogram_map.insert(quote, 'e');
// get the quote back out... oops!
println!("quote [{}] is missing character '{}'", quote, lipogram_map.get("e).unwrap());
}
This example is, of course, a bit contrived, but compiling it will fail:
error[E0382]: borrow of moved value: `quote`
--> src/main.rs:10:54
|
6 | let quote = String::from("Up to about its primary school days a child thinks, naturally, only of play.");
| ----- move occurs because `quote` has type `std::string::String`, which does not implement the `Copy` trait
7 |
8 | lipogram_map.insert(quote, 'e');
| ----- value moved here
9 |
10 | println!("quote [{}] is missing character '{}'", quote, lipogram_map.get("e).unwrap());
| ^^^^^ value borrowed here after move
This error message probably looks a bit similar, since it also complains about not implementing
Copy
. In this case, the HashMap
owns the String
s it uses as keys. That is, in order to
insert a value, you must move the key into it… but once you do that, you don’t have it any more!
Cloning the data is, in this case, possibly a good choice. If you have confidence that the key
(which, remember, the HashMap
owns) can be recreated, or doesn’t need to be unique in memory, then
you may find cloning to be a better option than trying to use some type of smart pointer to
eliminate duplication of data in memory.
Cloning a Vec
when iterating
Another common pattern is when a user will clone a collection when trying to iterate over it. Here’s an example of an attempt at iteration that will fail to compile:
fn main() {
let mut bad_letters = vec!['e', 't', 'o', 'i'];
for l in bad_letters {
// do something here
}
bad_letters.push('s');
}
This results in the following error:
error[E0382]: borrow of moved value: `bad_letters`
--> src/main.rs:6:5
|
2 | let mut bad_letters = vec!['e', 't', 'o', 'i'];
| --------------- move occurs because `bad_letters` has type `std::vec::Vec<char>`, which does not implement the `Copy` trait
3 | for l in bad_letters {
| ----------- value moved here
...
6 | bad_letters.push('s');
| ^^^^^^^^^^^ value borrowed here after move
Once again, the compiler appears to be telling us that we need to clone the Vec
before we can
iterate over it:
fn main() {
let mut bad_letters = vec!['e', 't', 'o', 'i'];
for l in bad_letters.clone() {
// do something here
}
bad_letters.push('s');
}
…but that doesn’t make sense from a design perspective. Why would we need to copy it if we’re not doing anything to, or even with, it? The reason for this is that the for _ in expr
syntax is syntactic sugar that uses IntoIterator::into_iter
.
That is to say, a for
loop is the equivalent to:
fn main() {
let mut bad_letters = vec!['e', 't', 'o', 'i'];
let mut iter = bad_letters.into_iter();
loop {
match iter.next() {
Some(c) => {
// do something here
},
None => break,
}
}
bad_letters.push('n')
}
The relevant signature is:
fn into_iter(mut self) -> IntoIter<T>;
Note that this function moves self
. When we add a clone
into the mix, what we’ve written is:
let iter = bad_letters.clone().into_iter();
loop {
match iter.next() { // ...
This way, the iterator is created over its own copy (an “owned copy”) of bad_letters
that
into_iter
can consume. Put another way, we move a new owned copy of bad_letters
into
into_iter
. It makes the compiler error go away, but it changes the meaning of our program
considerably.
The need, therefore, is to find a way to construct the loop so that bad_letters
is not moved. The
solution is easy: move a reference. References in Rust are free and can be created and destroyed
freely! We can make a reference, and move it into into_iter
, and still own the original vector!
for l in &bad_letters { // that's it! the code now compiles and does what you expect
// ...
}
Or, the desugared version:
let mut iter = (&bad_letters).into_iter();
// ...
I had a whole write up about this, but then I found this blog
post by Karol Kuczmarski, which is excellent. I
highly recommend reading and understanding it. It gets pretty deep into the weeds, but is a good
case study in how traits and types compose in the Rust standard library. Understanding this (deep,
but straightfoward) relationship will help to frame your thinking about Rust in the future. As a
reader’s guide, I recommend you pay close attention to the different function signatures for
into_iter
when implemented for Vec<T>
and &Vec<T>
.
Cloning when Handling an Option
or a Result
Another tempting place to clone is when dealing with Option
and Result
. These are specific cases
of any enumeration type that carries internal data, but as a new Rust programmer, you’re likely to
encounter them quickly.
As before, let’s contrive an example. Here, we’ve got a collection of Strings
that should not
contain a particular char
. Because we may not yet have some text, we will keep it inside
an Option
.
pub struct LipogramCorpora {
selections: Vec<(char, Option<String>)>,
}
impl LipogramCorpora {
pub fn validate_all(&mut self) -> Result<(), char> {
for selection in &self.selections {
if selection.1.is_some() {
if selection.1.unwrap().contains(selection.0) {
return Err(selection.0);
}
}
}
Ok(())
}
}
Excellent! We’ve got a way to store example bodies of text, alongside the character they shouldn’t
contain. And because C has trained us to check for null pointers, and Option::None
is kinda-sorta
like a null pointer, we even check to make sure our text is there before we unwrap it.
Except, our old friend “cannot move out of borrowed content” shows up again:
error[E0507]: cannot move out of borrowed content
--> src/lib.rs:10:20
|
10 | if selection.1.unwrap().contains(selection.0) {
| ^^^^^^^^^^^ cannot move out of borrowed content
error: aborting due to previous error
But you were so careful! No worries, you know what to do here. You’ll just clone
!
if selection.1.is_some() {
if selection.1.clone().unwrap().contains(selection.0) {
return Err(selection.0);
And all is well. Until I get your code review and remind you that we’ll be running this method on
billions of texts that could well be in the tens of GiBs and copying that much data would murder our
performance. So how do we accomplish this with no clone
? Let’s consider two options, one more
idiomatic than the other.
First we consider what unwrap
actually does:
pub fn unwrap(self) -> T;
This function moves self, and returns an owned copy of T
. Remember, that means it consumes
self
and gives you the actual data inside the Option
. When you clone it, you’re consuming the
clone, and getting another copy of that “actual data”.
In Rust, we really like references, right? Happily, Option
provides a method called
as_ref
, with the signature:
pub fn as_ref(&self) -> Option<&T>;
This borrows self (the Option
) and returns a reference to the data inside the Option
. Huzzah! Now our code
can look like this:
if selection.1.is_some() {
if selection.1.as_ref().unwrap().contains(selection.0) {
return Err(selection.0);
The key here is that immutable references are Copy
- they can be… copied. And Option
is Copy
if the thing it contains is Copy
(which it is, because it holds a reference to our String
). So when you call unwrap
, it gets a copy of an Option<&String>
that it can consume and all is well.
A note for meditation: calling fn foo(thing)
moves thing
if it can’t make a copy of it. If it can (if thing
’s type is Copy
) it will make a copy instead. There is no way to move a variable that is Copy
.
A more idiomatic way is to dispense with checking that the Option
is not None
and stop using
unwrap
entirely. We do this using if let
:
if let Some(text) = selection.1 {
if text.contains(selection.0) {
return Err(selection.0);
Much better.
Now Go Forth and Clone
Hopefully showing you some counterexamples will help you think about why you’re doing what you do in Rust, which will help reduce the amount of time you spend writing ill-formed code. If you have any more examples that you’ve seen in the wild, please send them along. I’ll add some more as I come up with them too.
Addenda
That first example, but without clone
/// Our thing
struct AThing<'a> {
/// A reference to some data associated with our thing
pub name: &'a str,
}
impl<'a> AThing<'a> {
/// Do some operation upon this thing
pub fn thing_len(&self) -> usize {
self.name.len()
}
}
fn main() {
let dj_name = String::from("don juan");
let my_thing = AThing { name: &dj_name };
println!("{} is {} chars long", name, my_thing.thing_len());
}
It’s a fairly straightforward change. We change the name
member to a &str
and give a name ('a
) to the lifetime of the borrow. We borrow dj_name
because it’s a String
. If we had written:
let dj_name = "don juan";
let my_thing = AThing { name: dj_name };
Then borrow isn’t necessary because the type of dj_name
would be &'static str
. Because references are Copy
, we just copy the “pointer” to the data right into our new AThing
. (The 'static
just means that the reference is always valid aka has a static lifetime aka is statically declared).