Physical streams
A Tydi physical stream carries a stream of elements, dimensionality information for said elements, and (optionally) user-defined transfer information from a source to a sink. The contents of these three groups and to some extent the way in which they are transferred are defined in a parameterized way, to provide the necessary flexibility for the higher levels of this specification, and to give hardware developers the freedom to optimize their designs. Based on these parameters, this specification defines which signals must exist on the interface, and the significance of these signals.
Parameters
A physical stream is parameterized as \(\textrm{PhysicalStream}(E, N, D, C, U)\). The significance of these parameters is defined in the following subsections.
The parameters are defined on the interfaces of the source and the sink rather than the stream between them, because the complexity parameter need not be the same on the source and sink. All other parameters must match.
Element content (E) and user/transfer content (U)
\(E\) and \(U\) are of the form \(\textrm{Fields}(N_1 : b_1, N_2 : b_2, ..., N_n : b_n)\), where \(N\) are names, \(b\) are positive integers representing bit counts, and \(n\) is a nonnegative integer, describing a list of \(n\) named bit vector signals.
The element type can be given zero fields to make a "null" stream. Such streams can still be useful, as physical streams also carry metadata. The bit counts are however not allowed to be zero, because many synthesis tools handle null ranges poorly.
The difference between the element content and the user/transfer content of a stream is that a transfer may encode zero or more elements, but always encodes one instance of the user/transfer content.
The user/transfer signals are intended for adding lower-level user-defined protocol signals to the stream, similar to the use of the *USER signals in AXI4. For instance, physical streams could be transferred over a network-on-chip-like structure by attaching routing information such as source and destination addresses through this method.
The name of each field is a string consisting of letters, numbers, and/or underscores. It may be empty, represented in this specification as \(\varnothing\).
The name cannot start or end with an underscore.
The name cannot start with a digit.
It is illegal to start or end with an underscore or start with a number to prevent confusion when the name is prefixed to form the signal name, and for compatibility with VHDL.
The name must be case-insensitively unique within the set of named fields.
The identifier is case-insensitive because compatibility with VHDL is desired.
\(|\textrm{Fields}(N_1 : b_1, N_2 : b_2, ..., N_n : b_n)|\) is a shorthand defined to equal \(\sum_{i=1}^{n} b_i\); that is, the sum of the field bit count over all fields in the element.
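The naming rules and the \(|\textrm{Fields}(...)|\) shorthand above can be sketched in a few lines of Python. This is only an illustration: the function name check_fields is hypothetical, and the sketch assumes that "letters" and "numbers" mean ASCII letters and digits, which the text does not state explicitly.

```python
import re

# Assumption: field names are limited to ASCII letters, digits, and underscores.
_NAME_RE = re.compile(r"[A-Za-z0-9_]*")


def check_fields(fields):
    """Validate a list of (name, bit_count) pairs and return |Fields|.

    check_fields is a hypothetical helper for illustration only.
    """
    seen = set()
    total = 0
    for name, bits in fields:
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"illegal character in field name {name!r}")
        if name.startswith("_") or name.endswith("_"):
            raise ValueError(f"{name!r} starts or ends with an underscore")
        if name[:1].isdigit():
            raise ValueError(f"{name!r} starts with a digit")
        key = name.lower()  # uniqueness is case-insensitive
        if key in seen:
            raise ValueError(f"duplicate field name {name!r}")
        seen.add(key)
        if not isinstance(bits, int) or bits < 1:
            raise ValueError(f"bit count of {name!r} must be a positive integer")
        total += bits
    return total
```

For example, check_fields([("re", 8), ("im", 8)]) yields 16, and an empty field list (a "null" stream) yields 0.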
Number of element lanes (N)
\(N\) must be an integer greater than or equal to one. The signals used for describing a data element are replicated this many times, allowing multiple elements to be transferred per stream handshake. Each replication is called a lane.
Dimensionality (D)
\(D\) must be an integer greater than or equal to zero. It specifies the number of last bits needed to represent the data.
Intuitively, each sequence nesting level adds one to this number. For instance, to stream two-dimensional sequences, two last bits are needed: one to mark the boundaries of the inner sequence, and one to mark the boundary of each two-dimensional sequence.
Complexity (C)
\(C\) must be a nonempty list of nonnegative integers. It encodes the guarantees a source makes about how elements are transferred. Equivalently, it encodes the assumptions a sink can safely make.
Higher \(C\) | Lower \(C\) |
---|---|
Fewer guarantees made by source | More guarantees made by source |
Source is easier to implement | Source is harder to implement |
Sink can make fewer assumptions | Sink can make more assumptions |
Sink is harder to implement | Sink is easier to implement |
\(C\) is usually specified as a period-separated list, like a version number or paragraph/section number, because like those things, it is orderable.
The complexity number carries the following intuitive significance.
\(C\) | Description |
---|---|
< 8 | Only one sequence can be terminated per transfer. |
< 7 | The indices of the active data lanes can be described with a simple range. |
< 6 | The range of active data lanes must always start with lane zero. |
< 5 | All lanes must be active for all but the last transfer of the innermost sequence. |
< 4 | The last flag cannot be postponed until after the transfer of the last element. |
< 3 | Innermost sequences must be transferred in consecutive cycles. |
< 2 | Whole outermost instances must be transferred in consecutive cycles. |
The exact requirements imposed for each \(C\) can be found along with the unconditional signal requirements in later sections.
The complexity levels and signals are defined such that a source with complexity \(C_a\) can be connected to a sink with complexity \(C_b \ge C_a\). A complexity number is higher than another when the leftmost integer of either is greater, and lower when the leftmost integer is lower. If the leftmost integer is equal, the next integer is checked. If one complexity number has more entries than another, the shorter number is padded with zeros on the right.
That is, it works a bit like a version number. The idea behind this is to allow future versions of this specification (or any other kind of later addition) to insert new complexity levels between the ones defined here. For instance, 3 < 3.1 < 3.1.1 < 3.2 < 4.
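The ordering rule for complexity numbers can be illustrated with a short Python sketch. The names parse_complexity, compare_complexity, and can_connect are hypothetical and chosen for this example only; the comparison follows the zero-padding rule described above.

```python
from itertools import zip_longest


def parse_complexity(text):
    """Parse a period-separated complexity such as '3.1.1' into a list of ints."""
    parts = [int(p) for p in text.split(".")]
    if any(p < 0 for p in parts):
        raise ValueError("complexity entries must be nonnegative")
    return parts


def compare_complexity(a, b):
    """Return -1, 0, or 1 for a < b, a == b, a > b.

    The shorter list is padded with zeros on the right before comparing.
    """
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def can_connect(source_c, sink_c):
    """A source may drive a sink whose complexity is greater than or equal to its own."""
    return compare_complexity(source_c, sink_c) <= 0
```

With these helpers, can_connect(parse_complexity("2"), parse_complexity("8")) holds, while the reverse direction does not.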
A stream is considered to be normalized when \(C < 4\). At this complexity level, there is a one-to-one mapping between the transferred data and the actual transfers for any given \(N\). This is called the canonical representation of the data for a given \(N\).
Signals
A physical stream comprises the following signals.
Name | Origin | Purpose |
---|---|---|
valid | Source | Stalling the data stream due to the source not being ready. |
ready | Sink | Stalling the data stream due to the sink not being ready. |
data | Source | Data transfer of \(N\) \(|E|\)-bit elements. |
last | Source | Indicating the last transfer for \(D\) levels of nested sequences. |
stai | Source | Start index; encodes the index of the first valid lane. |
endi | Source | End index; encodes the index of the last valid lane. |
strb | Source | Strobe; encodes individual lane validity for \(C \ge 8\), empty sequences otherwise. |
user | Source | Additional control information carried along with the stream. |
The valid and ready signals are scalars, while all the other signals are bit vectors with the following widths.
Name | Width |
---|---|
valid | scalar |
ready | scalar |
data | \(N \times |E|\) |
last | \(N \times D\) |
stai | \(\lceil \log_2{N} \rceil\) |
endi | \(\lceil \log_2{N} \rceil\) |
strb | \(N\) |
user | \(|U|\) |
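As a quick cross-check of the width table, the following Python sketch computes the vector widths from the stream parameters. The function name signal_widths and its argument names are assumptions made for this example; e_bits and u_bits stand for \(|E|\) and \(|U|\).

```python
def signal_widths(n, e_bits, d, u_bits):
    """Return the bit-vector widths of a physical stream.

    n: number of lanes (N), e_bits: |E|, d: dimensionality (D), u_bits: |U|.
    valid and ready are scalars and therefore not listed.
    """
    if n < 1 or d < 0 or e_bits < 0 or u_bits < 0:
        raise ValueError("invalid stream parameters")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    index_bits = (n - 1).bit_length()
    return {
        "data": n * e_bits,
        "last": n * d,
        "stai": index_bits,
        "endi": index_bits,
        "strb": n,
        "user": u_bits,
    }
```

For \(N = 6\), \(|E| = 8\), \(D = 2\), and \(|U| = 0\), this gives a 48-bit data signal, a 12-bit last signal, and 3-bit stai and endi signals.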
Clock
All signals are synchronous to a single clock signal, shared between the source and sink.
It is up to the user to manage clock domains and clock domain crossing where needed, and specify the edge sensitivity of the clock(s).
Reset
This specification places no constraints on the reset signal. The requirements on the valid and ready signals ensure that no transfers occur when either the source or the sink is held under reset, so the source, synchronicity, and sensitivity of the reset signal(s) can be chosen by the user.
Detailed signal description
valid signal description
The valid signal has the same semantics as the TVALID AXI4-stream signal. These semantics are repeated here.
The valid signal indicates the presence of a valid transfer payload of the other downstream signals. It is active-high.
The state of the valid signal is always significant.
That is, if the source is in some undefined or "uninitialized" state, valid must nonetheless be low.
The downstream signals, including valid, must be stable (maintain their state) from the first active clock edge in which valid is asserted until acknowledgement of the transfer. Note that the validating and acknowledging clock edge can be one and the same if ready is already asserted during the first clock cycle; in this case there are no stability requirements.
In other words, a transfer may not be cancelled or modified after validation. This allows a sink to start doing some sort of multicycle processing as soon as valid is asserted, only asserting ready when it is done, without having to worry about cancellation or changes in the transfer content. This prevents the need for extra holding registers or latches otherwise needed to keep the original transfer.
When a transfer is acknowledged, either the next transfer is presented on the downstream signals and valid remains asserted, or valid is released.
[\(C < 3\)] valid may only be released when lane \(N - 1\) of the last signal in the acknowledged transfer is nonzero.
[\(C < 2\)] valid may only be released when lane \(N - 1\) of the last signal in the acknowledged transfer is all ones.
valid must be released while the source is being reset.
This prevents spurious transfers when the sink has a different reset source.
ready signal description
The ready signal has the same semantics as the TREADY AXI4-stream signal. These semantics are repeated here.
The ready signal indicates acknowledgement of a transfer initiated through valid. It is active-high.
The state of the ready signal is significant only while valid is asserted.
That is, a source must never wait for ready to assert before it asserts valid. Doing so can lead to deadlock. However, a sink may wait for valid to assert before asserting ready.
A transfer is considered "handshaked" when both valid and ready are asserted during the active clock edge of the clock domain common to the source and the sink.
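The handshake definition and the stability requirement on the downstream signals can be checked against a recorded trace. The Python sketch below is illustrative only: check_trace and the trace format (one (valid, ready, payload) tuple per active clock edge, with payload standing in for all downstream signals other than valid) are assumptions made for this example.

```python
def check_trace(trace):
    """Return the payloads of all handshaked transfers in a trace.

    trace: list of (valid, ready, payload) tuples, one per active clock edge.
    Raises ValueError if a transfer is cancelled or modified after validation.
    """
    transfers = []
    pending = None        # payload offered but not yet acknowledged
    has_pending = False
    for cycle, (valid, ready, payload) in enumerate(trace):
        if has_pending and (not valid or payload != pending):
            raise ValueError(
                f"cycle {cycle}: transfer changed or cancelled before acknowledgement"
            )
        if valid and ready:
            # Both asserted on the same active edge: the transfer is handshaked.
            transfers.append(payload)
            has_pending = False
        elif valid:
            # Offered but stalled by the sink: payload must now remain stable.
            pending, has_pending = payload, True
    return transfers
```

Note that the sketch never requires ready to be asserted before valid, in line with the rule that a source must not wait for ready.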
ready must be released while the sink is being reset.
This prevents transfers from being lost when the source has a different reset source.
data signal description
The data signal consists of \(N\) concatenated lanes, each carrying a number of data element fields, totaling \(|E|\) bits per lane.
The state of lane \(i\) in the data signal is significant only while valid is asserted and data lane \(i\) is active. Data lane \(i\) is considered active if and only if
- bit \(i\) of strb is asserted,
- the unsigned integer interpretation of endi is greater than or equal to \(i\), and
- the unsigned integer interpretation of stai is less than or equal to \(i\).
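The three conditions above translate directly into code. The following Python sketch is a minimal illustration; active_lanes is a hypothetical helper, strb is assumed to be passed as an integer whose bit \(i\) belongs to lane \(i\), and the range check reflects the legality rules for stai and endi given in their own sections.

```python
def active_lanes(n, strb, stai, endi):
    """Return the indices of the active data lanes of one valid transfer.

    n: number of lanes, strb: integer with bit i for lane i,
    stai/endi: unsigned start and end indices.
    """
    if not 0 <= stai <= endi < n:
        raise ValueError("stai and endi must satisfy 0 <= stai <= endi < N")
    return [
        i
        for i in range(n)
        if (strb >> i) & 1 and stai <= i <= endi
    ]
```

For instance, with \(N = 6\), strb "000011", stai 0, and endi 5, only lanes 0 and 1 are active.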
The redundancy of the endi and stai signals given the presence of the strb signal (only applicable for \(C \ge 7\)) has to do with the rule that it must always be possible to connect a source with low complexity to a sink with high complexity. That is, while a source that desires individual control over the lanes and thus has \(C \ge 7\) would probably always drive endi to \(N-1\) and stai to 0, a sink with complexity \(C \ge 7\) still needs to accept input from sources that do use endi and stai.
The overhead incurred by this should be negligible in practice. First, a design with \(C \ge 7\) and significant \(N\) will be quite large to begin with, so managing two additional signals of size \(\lceil \log_2{N} \rceil\) probably doesn't matter much. Secondly, in most cases, constant propagation will remove the unneeded endi and stai signals whenever both source and sink have \(C \ge 7\). Finally, even if the above fails, going from stai + endi + strb to the internal lane enable signals in a sink is trivial for modern FPGAs and realistic \(N\), as the less-than-or-equal and greater-than-or-equal comparisons fit in a single 6-LUT up to and including \(N = 64\), resulting in just three decoding LUTs per lane (one for stai, one for endi, and one 3-input AND gate to merge the outputs of the first two and the strb bit).
The significant lanes are interpreted in order of increasing index for each transfer.
last signal description
The last signal consists of \(N\) concatenated lanes, each carrying a termination bit for each nested sequence represented by the physical stream.
The last bit for lane \(i\) and dimension \(j\) (i.e., bit \(i \cdot D + j\)) being asserted indicates completion of a sequence with nesting level \(D - j - 1\), containing all elements streamed between the previous lane/transfer this occurred for dimension \(j' \ge j\) (or system reset, if never) exclusively and this point inclusively, where nesting level is defined to be 0 for the outermost sequence, 1 for its inner sequence, up to \(D - 1\) for the innermost sequence. It is active-high.
For example, one way to represent the value ["Hello", "World"], ["Tydi", "is", "nice"], [""], [] with \(N = 6, C \ge 8\) is as follows. Note that \(D = 2\) follows from the data type, \(|E|\) depends on the character encoding, and the example does not depend on \(U\).

    reset released here
    |                  transfer -->
    v           A                B                C                D

    last  "000100000000"   "000011000000"   "000001000100"   "101110010000"
    data     "WolleH"         "yTdlro"         "insiid"         "----ec"
    strb     "111111"         "111111"         "111111"         "000011"

             .------------.   .------------.   .------------.   .------------.
           0 | data: 'H'  |   | data: 'o'  |   | data: 'd'  |   | data: 'c'  |
             | last: 0 0  |   | last: 0 0  |   | last: 0 0  |   | last: 0 0  |
             |------------|   |------------|   |------------|   |------------|
           1 | data: 'e'  |   | data: 'r'  |   | data: 'i'  |   | data: 'e'  |
             | last: 0 0  |   | last: 0 0  |   | last: 0 1  |   | last: 0 0  |
             |------------|   |------------|   |------------|   |------------|
     lane  2 | data: 'l'  |   | data: 'l'  |   | data: 'i'  |   | data: -    |
     index   | last: 0 0  |   | last: 0 0  |   | last: 0 0  |   | last: 0 1  |
       |     |------------|   |------------|   |------------|   |------------|
       v   3 | data: 'l'  |   | data: 'd'  |   | data: 's'  |   | data: -    |
             | last: 0 0  |   | last: 1 1  |   | last: 0 1  |   | last: 1 0  |
             |------------|   |------------|   |------------|   |------------|
           4 | data: 'o'  |   | data: 'T'  |   | data: 'n'  |   | data: -    |
             | last: 0 1  |   | last: 0 0  |   | last: 0 0  |   | last: 1 1  |
             |------------|   |------------|   |------------|   |------------|
           5 | data: 'W'  |   | data: 'y'  |   | data: 'i'  |   | data: -    |
             | last: 0 0  |   | last: 0 0  |   | last: 0 0  |   | last: 1 0  |
             '------------'   '------------'   '------------'   '------------'

"Hello" is delimited by the reset condition and the asserted last bit in transfer A, lane 4, dimension 0 (innermost). "World" is delimited by the aforementioned and the last bit in transfer B, lane 3, also for dimension 0. The last bit for B3/dimension 1 is also asserted, which delimits ["Hello", "World"] along with the reset condition; both the "Hello" and "World" sequences are fully contained by the delimited portion of the stream, so the outer sequence has those as its two entries.
"Tydi" is delimited by B3 and C1, "is" by C1 and C3, and "nice" by C3 and D2 for dimension 0. Note that the last flag in D2 is postponed beyond the last element (D1) by means of deactivating data lane D2. The last flag for the surrounding sequence is similarly postponed to D3; note though that data lane D3 must be inactive, or the last bit for D3/dimension 0 must be asserted for the data stream to be legal.
The next "" is delimited by D3/dimension 1 and D4/dimension 0, and [""] is delimited by D3 and D4 for dimension 1. The final [] is delimited by D4 and D5 for dimension 1. It does not contain an inner sequence item because D5/dimension 0 is not asserted, so the outer sequence is empty.
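To make the interpretation of the last bits concrete, the Python sketch below decodes the transfers of this example back into nested sequences. It is a simplified illustration, not a reference decoder: decode and pack_last are hypothetical names, stai and endi are assumed to be 0 and \(N - 1\), the last vectors are built from the per-lane bits shown in the lane boxes, and the sketch conservatively rejects any outer termination that arrives while an inner level still holds unterminated content.

```python
def pack_last(per_lane, d):
    """Build a last vector from per-lane bit strings (highest dimension first),
    so that lane i, dimension j ends up in bit i * d + j."""
    return sum(int(bits, 2) << (i * d) for i, bits in enumerate(per_lane))


def decode(transfers, n, d):
    """Decode (data, last, strb) transfers into nested lists.

    data[i] is the element on lane i; last and strb are integers.
    """
    levels = [[] for _ in range(d)]  # levels[j] collects items closed by dimension j
    out = []
    for data, last, strb in transfers:
        for i in range(n):
            if (strb >> i) & 1:
                (levels[0] if d else out).append(data[i])
            for j in range(d):
                if not (last >> (i * d + j)) & 1:
                    continue
                if any(levels[k] for k in range(j)):
                    raise ValueError(f"lane {i}: dimension {j} closed before inner ones")
                done, levels[j] = levels[j], []
                (levels[j + 1] if j + 1 < d else out).append(done)
    return out


# Transfers A to D of the example, with data listed lane 0 first.
EXAMPLE = [
    ("HelloW", pack_last(["00", "00", "00", "00", "01", "00"], 2), 0b111111),
    ("orldTy", pack_last(["00", "00", "00", "11", "00", "00"], 2), 0b111111),
    ("diisni", pack_last(["00", "01", "00", "01", "00", "00"], 2), 0b111111),
    ("ce----", pack_last(["00", "00", "01", "10", "11", "10"], 2), 0b000011),
]
decoded = decode(EXAMPLE, n=6, d=2)
words = [["".join(chars) for chars in seq] for seq in decoded]
```

After running the sketch, words holds the four outer sequences of the example, including the sequence containing one empty string and the final empty sequence.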
The state of lane \(i\) in the last signal is significant only while valid is asserted.
Note that the validity of the last lanes is thus not controlled by which data lanes are active. This allows sequences to be terminated without sending an element; this is in fact required in order to send empty sequences, but also allows postponing the end-of-sequence marker until after the last element (if complexity allows) as the previous example shows.
It is illegal to assert a last bit for dimension \(j\) without first terminating all dimensions \(j' < j\), if any data lanes were active since the previous assertion of a last bit for dimension 0.
For example, the following is illegal because of the above:

             .------------.
           0 | data: '1'  |
             | last: 0 0  |
             |------------|
           1 | data: '2'  |
             | last: 0 1  |
             |------------|
     lane  2 | data: '3'  |
     index   | last: 0 0  |
       |     |------------|
       v   3 | data: '4'  |
             | last: 1 0  |
             |------------|
           4 | data: '5'  |
             | last: 0 0  |
             |------------|
           5 | data: '6'  |
             | last: 1 1  |
             '------------'

An argument could be made that this encodes [[1, 2], 3, 4], [[5, 6]], but allowing this interpretation would make the stream significantly less type-safe, and there would be no way to encode non-sequence elements before the last inner sequence in the outer sequence without introducing a start signal as well.
An argument could also be made that this encodes [[1, 2]], [[3, 4, 5, 6]], because the last flag delimiting [1, 2] falls within the first outer sequence and the last flag delimiting [3, 4, 5, 6] falls within the second. However, this would unnecessarily complicate sink logic, make manual interpretation of the data structure more context-sensitive, and so on.
One could also argue that it encodes [[1, 2]], [[5, 6]], because 3 and 4 are not enclosed by any inner sequence, but then it makes more sense to just deactivate the lanes.
Ultimately, this representation is ambiguous or at least leads to confusion, and is therefore illegal.
[\(C < 8\)] All last bits for lanes 0 to \(N - 2\) inclusive must be driven low by the source, and may be ignored by the sink.
The above rule ultimately means that the last information is transferred on a per-transfer basis instead of a per-element basis, similar to AXI4-stream. This can significantly decrease decoding complexity, but only allows one innermost sequence to be transferred per cycle regardless of \(N\) and the length of the sequence.
[\(C < 4\)] It is illegal to assert a last bit for dimension \(j\) without also asserting the last bits for dimensions \(j' < j\) in the same lane.
[\(C < 4\)] It is illegal to assert the last bit for dimension 0 when the respective data lane is inactive, except for empty sequences.
The above two rules mean that the last flags cannot be postponed.
stai signal description
The stai signal (start index) is used to deactivate data lanes with an index lower than the encoded binary number. It is a bit vector of size \(\lceil \log_2{N} \rceil\) to encode the full index range.
The state of the stai signal is significant only while valid is asserted.
It is illegal to drive stai to the value \(N\) or greater.
This implies that stai cannot be used to disable the last lane.
[\(C < 6\)] stai must always be driven to 0 by the source, and may be ignored by the sink.
endi signal description
The endi signal (end index) is used to deactivate data lanes with an index greater than the encoded binary number. It is a bit vector of size \(\lceil \log_2{N} \rceil\) to encode the full index range.
Note that endi cannot be used to disable the first lane.
The state of the endi signal is significant only while valid is asserted.
It is illegal to drive endi to the value \(N\) or greater.
This would semantically not be different from driving \(N - 1\), so there is no reason to burden the sink logic by allowing this case.
It is illegal to drive endi to a value less than stai.
This would essentially encode a VHDL null range, which, if allowed, could be used to encode empty sequences. However, detecting this condition is relatively hard on FPGA resources (timing in particular), typically requiring a carry chain for \(N > 8\). Instead, use strb for this purpose; at lower complexities all strb bits must be equal, so only a single bit has to be checked to determine whether a transfer carries any data.
[\(C < 5\)] endi must be driven to \(N - 1\) by the source when last is zero, and may be ignored by the sink in this case.
Together with the other complexity rules up to this level, this means that all lanes must be used for all but the last transfer containing elements for the innermost sequence. This furthermore implies that the element with innermost index \(i\) is always transferred on data lane \(i \mod N\).
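The consequence that element \(i\) always lands on lane \(i \mod N\) can be seen by splitting one innermost sequence into transfers. The Python sketch below is a simplified illustration for a single nonempty sequence; chunk_sequence and the dictionary keys are hypothetical, and "last" is reduced to a single flag marking the final transfer of the sequence.

```python
def chunk_sequence(elements, n):
    """Split one nonempty innermost sequence into transfers of up to n lanes.

    Every transfer except the final one uses all n lanes (endi == n - 1).
    """
    if n < 1:
        raise ValueError("N must be at least one")
    if not elements:
        # Empty sequences are signalled through strb and are out of scope here.
        raise ValueError("empty sequences are not handled by this sketch")
    transfers = []
    for start in range(0, len(elements), n):
        chunk = list(elements[start:start + n])
        transfers.append({
            "data": chunk,                        # chunk[l] travels on lane l
            "stai": 0,
            "endi": len(chunk) - 1,
            "last": start + n >= len(elements),   # final transfer of the sequence
        })
    return transfers
```

Splitting seven elements over \(N = 3\) lanes yields endi values 2, 2, and 0, with only the final transfer marked as last.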
strb signal description
The strb signal (strobe) is used to deactivate individual data lanes. It is a bit vector of size \(N\). When a strb bit is low, the associated data lane is deactivated.
Note that the opposite (strb high -> activated) is not necessarily true due to endi and stai.
The state of the strb signal is significant only while valid is asserted.
[\(C < 8\)] All strb bits must be driven to the same value by the source. The sink only needs to interpret one of the bits.
This effectively reduces the strb signal to a single bit indicating whether the transfer is empty (low) or not (high), as it is illegal to deactivate all data lanes by driving endi < stai.
user signal description
The user signal is the concatenation of a number of user-specified fields, carrying additional transfer-oriented information. The significance of this is user-defined.
The state of the user signal is not significant when valid is not asserted.
The opposite is not necessarily true; it is up to the user's specifications if (part of) the user signal can be insignificant despite valid being asserted.
Signal omission
Not all signals are always needed. The following table shows the condition for a signal to be relevant. If the condition is false, the signal may be omitted on the interface. When two interfaces with differing but compatible complexities are connected together, the default value specified in the table below must be driven for the omitted signals.
Name | Condition | Default |
---|---|---|
valid | see below | '1' |
ready | see below | '1' |
data | \(|E| > 0\) | all '0' |
last | \(D \ge 1\) | all '1' |
stai | \(C \ge 6 \wedge N > 1\) | 0 |
endi | \((C \ge 5 \vee D \ge 1) \wedge N > 1\) | \(N-1\) |
strb | \(C \ge 7 \vee D \ge 1\) | all '1' |
user | \(|U| > 0\) | all '0' |
valid may be omitted for sources that are always valid.
For example, a constant or status signal source (the latter must be valid during reset as well for this to apply).
ready may be omitted for sinks that never block.
For example, a sink that voids all incoming data in order to measure performance of the source.
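The conditions in the omission table can be evaluated mechanically. The Python sketch below is illustrative; present_signals is a hypothetical helper, the complexity is passed as a list of integers, and valid and ready are left out because their omission depends on the behavior of the source or sink rather than on the parameters.

```python
def present_signals(e_bits, n, d, c, u_bits):
    """Return which parameter-dependent signals are relevant on an interface.

    e_bits: |E|, n: N, d: D, c: complexity as a list of ints, u_bits: |U|.
    """
    c = list(c)
    # Comparing against a one-entry list such as [6] matches the zero-padded
    # ordering: any c whose first entry is 6 or more is >= [6].
    return {
        "data": e_bits > 0,
        "last": d >= 1,
        "stai": c >= [6] and n > 1,
        "endi": (c >= [5] or d >= 1) and n > 1,
        "strb": c >= [7] or d >= 1,
        "user": u_bits > 0,
    }
```

A single-lane, zero-dimensional stream of 8-bit elements at complexity 1, for example, needs only the data signal besides the handshake.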
Interface conventions
Streams may be named, in order to prevent name conflicts due to multiple streams existing within the same namespace. Such a name is to be prefixed to the signal names using a double underscore.
Double underscores are used as a form of unambiguous hierarchy separation to allow user-specified field names in the logical stream types (defined later) to contain (non-consecutive) underscores without risk of name conflicts.
The canonical representation for the data and user signals is the LSB-first concatenation of the contained fields. For \(N > 1\), the lanes are concatenated LSB first, such that the lane index is major and the field index is minor.
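The LSB-first, lane-major layout can be sketched as follows. This Python example is an illustration under assumptions: pack_lsb_first and pack_data are hypothetical names, and field values are modeled as unsigned integers.

```python
def pack_lsb_first(values_and_widths):
    """Concatenate (value, width) pairs LSB-first.

    The first pair occupies the lowest bits. Returns (word, total_width).
    """
    word = 0
    offset = 0
    for value, width in values_and_widths:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << offset
        offset += width
    return word, offset


def pack_data(lanes, widths):
    """Pack the data signal: fields LSB-first within a lane, lanes LSB-first.

    lanes: one list of field values per lane, lane 0 first.
    widths: bit count of each field, in field order.
    """
    lane_words = [pack_lsb_first(zip(fields, widths)) for fields in lanes]
    return pack_lsb_first(lane_words)
```

With two 4-bit fields per lane, lanes [[0x1, 0x2], [0x3, 0x4]] pack to the 16-bit word 0x4321: lane 0 sits in the low byte, and within each lane the first field sits in the low nibble.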
Where applicable, the signals must be listed in the following order: dn, up, valid, ready, data, last, stai, endi, strb, user (dn and up are part of the alternative representation defined below).
This is mainly for consistency: a single ordering is easier to get used to when interpreting simulation waveforms. It may also simplify tooling.
The signal names must be lowercase.
This is for interoperability between languages that differ in case sensitivity.
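The naming, ordering, and case conventions can be combined into a small helper that produces the port list for a stream. The Python sketch below is illustrative only; port_names and CANONICAL_ORDER are hypothetical names, and an empty stream name is taken to mean that no prefix is applied.

```python
# Signal order required by the interface conventions.
CANONICAL_ORDER = (
    "dn", "up", "valid", "ready", "data", "last", "stai", "endi", "strb", "user",
)


def port_names(stream_name, present):
    """Return the lowercase signal names of a stream in canonical order.

    stream_name: name of the stream, or '' for an unnamed stream.
    present: collection of signal names that exist on the interface.
    """
    wanted = {s.lower() for s in present}
    unknown = wanted - set(CANONICAL_ORDER)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    # The stream name is prefixed using a double underscore.
    prefix = f"{stream_name}__" if stream_name else ""
    return [prefix + s for s in CANONICAL_ORDER if s in wanted]
```

For a stream named a carrying valid, ready, and data, this yields a__valid, a__ready, and a__data, in that order.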
Alternative representation
To improve code readability in hardware definition languages supporting array and aggregate constructs (record, struct, ...), the following changes are permissible. However, it is recommended to fall back to the conventions above for interoperability with other streamlets on the "outer" interfaces of an IP block.
- valid, data, last, stai, endi, strb, and user may be bundled in an aggregate type named <stream-name>__dn__type with signal name <stream-name>__dn (dn is short for "downstream").
- ready may be "bundled" in an aggregate type named <stream-name>__up__type with signal name <stream-name>__up for symmetry (up is short for "upstream").
- If the language allows for signal direction reversal within a bundle, all stream signals may also be bundled into a single type named <stream-name>__type, with ready in the reverse direction.
- The data and user fields may be bundled in aggregate types named <stream-name>__data__type and <stream-name>__user__type respectively. The data signal becomes an array of <stream-name>__data__type elements from 0 to \(N - 1\) if \(N > 1\) or when otherwise desirable.
- Data and user fields consisting of a single bit may be interpreted as either a bit vector of size one or a scalar bit depending on context.
- Fields with a common double-underscore-delimited prefix may be aggregated recursively using <stream-name>__data__<common-prefix>__type, in such a way that the double underscores in the canonical signal name are essentially replaced with the hierarchy separator of the hardware definition language.
Arrays, vectors, and concatenations
Where applicable, the bitrange for bit vector signals is always n-1 downto 0, where n is the number of bits in the signal.
Concatenations of bit vector signals are done LSB-first.
That is, lower indexed entries use lower bit indices and thus occur on the right-hand side when the bit vector is written as a binary number. Note that this results in the inverse order when using the concatenation operator in VHDL (and possibly other languages). LSB-first order is chosen because it is more intuitive when indexing the vector. Ultimately this is just a convention.
Where applicable, the range for arrays is always 0 to n-1, where n is the number of entries in the array.
This is the more natural order for array-like structures, actually putting the first entry on the left-hand side.