'Why does `std::input_iterator<In>` requires a `value_type`?
I am trying to create a data structure for arrays of dynamic-sized arrays. Multiple choices are possible, the simplest one being std::vector<std::vector<T>>. However it is often not efficient and we would like to compress the data of all the inner vectors into one big vector, and have a vector of offsets to tell where each element begins.
Example:
// encoding of : | 4.,5.,1. | 7.,8.,9.,2 |
std::vector<double> v = {4.,5.,1., 7.,8.,9.,2};
std::vector<int> offsets = {0 , 3 , 7};
Let's encapsulate it ! Consider the following data structure:
(note: the code is neither complete, general or precise, at this point this is just to give an idea of what is going on):
class vblock_vector {
private:
std::vector<double> v;
std::vector<int> offsets;
public:
using iterator = vblock_iterator;
auto begin() -> iterator {
return {v.data(),offsets.data()};
}
auto end() -> iterator {
return {v.data(),offsets.data()+offsets.size()};
}
};
An basic implementation of the iterator type is the following:
struct vblock_iterator {
private:
double* ptr;
int* offsets_ptr;
public:
using reference = span_ref<double>; // see notes (0) and (1)
// using value_type = ???; // See below
auto operator++() {
++offsets_ptr;
return *this;
}
auto operator*() const {
return span_ref<double,int>(ptr+offsets_ptr[0],ptr+offsets_ptr[1]);
}
auto operator<=>(const vblock_iterator&) const = default;
// ... other iterator interface stuff that is trivial
};
This iterator works with e.g. std::copy. (4)
Now let's say that I want to replace my old std::copy calls with std::ranges::copy. For that, vblock_iterator needs to satisfy the std::input_iterator concept. In order to do that, vblock_iterator needs to have an associated value_type (required by the intermediate std::indirectly_readable concept).
An obvious choice would be using value_type = std::vector<double>(2), but I surely don't want to give std::ranges::copy the freedom to use this type at its discretion in its implementation: it would be inefficient.
My question is the following : why does std::input_iterator<In> requires In to have a value_type? At least for copying it is not needed (the fact that I can use std::copy and that it does the right thing proves it). Of course, one can say : "define value_type to be anything, it won't be used by std::range::copy implementations anyway", but then why require it?
I am currently under the impression that value_type is mandatory for e.g. std::swappable, but not for std::input_iterator (nor even std::random_access_iterator dare I say). But the standard committee decided otherwise: what is the reason behind this choice? (3)
Notes:
(0) span_ref is just like a std::span with reference semantics (its operator= is "assign-through" and not "rebind to new array").
(1) In reality, the reference type needs to be a tad more complex to account for offsets, but it is not the subject here. Suffice to say, that it is possible to have an efficient reference type for this structure.
(2) And I think this is the only reasonable choice. At least a container is needed (vector, deque...). E.g. a std::span won't do because if we bother to save the value pointed to by the iterator, it is because we will modify the original memory, and std::span won't help us with that.
(3) In the presentation of the std::indirectly_readable concept (then called Readable), Eric Niebler goes into some detail of why we need value_type to be related in some form to reference to work well with proxy references, but I still don't see why we would would even need value_type for algorithms that don't need to swap elements (or store them somewhere). Yes, there is mathematically a value_type for vblock_iterator, but why require it if it is not meant to be used? (similarly, there is also mathematical operator+= for forward ranges : but since it is inefficient, it is simply not required).
(4) And other algorithms: std::move, std::find, std::find_if, std::any_of, std::partition_point, std::lower_bound, std::unique... So I think that there is something more fundamental going on than: "we are just lucky with std::copy".
Solution 1:[1]
std::copy requires a LegacyInputIterator for its iterator types. It does not check this requirement. If you fail to provide a LegacyInputIterator, your program is ill-formed, no diagnostic required.
A LegacyInputIterator requires that std::iterator_traits<X>::value_type exists because it subsumes LegacyIterator.
So your program was ill-formed once you passed it to std::copy. The behavior of your ill-formed program is not determined by the C++ standard in any way; the compiler can legally provide you a program that emails your browser history to your great aunt Eustice and be standard compliant. Or it could do something that happens to align with what you think the program "should" do. Or it could fail to compile.
The std::ranges algorithms have slightly different requirements. These requirements are far more likely to be checked by concepts than the old style algorithms are, telling the user with a compile time error.
You are running into such a case.
To be even more clear, you cannot rely on the implementation of std code to enforce the standard.
These types are required partly to make it easier to talk about the types in question and what operations on them mean, semantically.
Beyond the simple requirements like std::iterator_traits<X>::value_type exist, there are semantic requirements on what *it does, what x = *it++ does, etc. Most of those requirements cannot be checked by the compiler (due to Rice's theorem, they cannot be checked in theory); but the algorithms in the std namespace rely on those semantic meanings being correct for any iterator passed in.
Because the compiler can assume the semantic meanings are correct, the algorithms can be cleaner, simpler and faster than if they had to check them. And it means that multiple different compiler vendors can write different std algorithm implementations, improving the algorithm over each other, and there is an objective standard to argue against.
For a LegacyInputIterator and types value_type and reference from std::iterator_traits<X>, we must have:
value_type v = *it;
is a valid expression, *it must return reference, and
*it++
must return a type convertible to value_type.
Not every algorithm need use every property of every iterator it requires that iterator to have. The goal here is to have semantically meaningful categories that do not demand too much in the way of overhead.
Requiring that an iterator over stuff actually have a type it is an iterator over is not a large overhead. And it makes talking about that the iterator is insanely easier.
You could refactor it and remove that concept, or cut the concept up into smaller pieces so that the value_type is only required in the narrow cases where it is required, but that would make the concepts harder to write about and harder to understand.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
