Peter Schmidt-Nielsen

Strong agree here -- thinking of stuff of shape [n] as secretly being of shape [1, n] or [n, 1] usually leads to more confusion, especially once you have rank-three tensors involved. I just write outer products and inner products explicitly.

As an example, try deriving backprop for a linear layer y = Ax. When you compute dy/dA, you get a rank-three tensor (one index on y, two on A)!

Trying to "apply" dy/dA to some "row vector"/"column vector" easily leads to confusion -- just write out indices.
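Here is a rough sketch of what "just write out indices" looks like for y = Ax, in plain Python with nested lists so nothing gets silently broadcast. The function names (`linear`, `jacobian_wrt_A`, `backprop`), the upstream gradient `g` = dL/dy, and the scalar loss L are all my own illustrative assumptions, not anything from a particular library.

```python
# Backprop for y = A x with every index written out (pure Python, no broadcasting).
# Shapes: A is [m, n], x is [n], y is [m]. All names here are illustrative.

def linear(A, x):
    # y_i = sum_k A_ik x_k
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def jacobian_wrt_A(A, x):
    # dy_i / dA_jk = delta_ij * x_k: a rank-three tensor of shape [m, m, n].
    m, n = len(A), len(x)
    return [[[x[k] if i == j else 0.0 for k in range(n)]
             for j in range(m)]
            for i in range(m)]

def backprop(A, x, g):
    # g_i = dL/dy_i is the (assumed) upstream gradient, shape [m].
    m, n = len(A), len(x)
    J = jacobian_wrt_A(A, x)
    # dL/dA_jk = sum_i g_i * dy_i/dA_jk, which collapses to g_j * x_k.
    grad_A = [[sum(g[i] * J[i][j][k] for i in range(m)) for k in range(n)]
              for j in range(m)]
    # dL/dx_k = sum_i g_i * A_ik
    grad_x = [sum(g[i] * A[i][k] for i in range(m)) for k in range(n)]
    return grad_A, grad_x

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.5, -1.0]
g = [2.0, -1.0]

y = linear(A, x)
grad_A, grad_x = backprop(A, x, g)
print(grad_A)  # same entries as the outer product g_j * x_k
print(grad_x)
```

Contracting the rank-three tensor against g over the y index leaves exactly the outer product of g and x, with no row or column vectors anywhere. In practice you would skip materializing the Jacobian and write g_j x_k directly; it is built here only to make the three indices visible.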

@francoisfleuret I will bite the bullet: the idea of transposing rank-one tensors is also confusing and unnecessary. Instead of writing x x^T or x^T x, we can write x⨂x or x⋅x, or we can write xᵢxʲ or xᵢxⁱ.

– @davidad

Aug 22, 2023, 7:53:21 AM
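
The transpose-free notation in the quoted post maps onto code fairly directly. A small sketch in plain Python, where `outer` and `inner` are hypothetical helper names chosen for illustration: the outer product keeps both indices, the inner product sums over a shared one, and x stays a plain shape-[n] list throughout.

```python
# Outer and inner products of rank-one tensors, with no transposes.
# `outer` and `inner` are illustrative helper names, not library functions.

def outer(x, y):
    # (x ⊗ y)_ij = x_i * y_j: two free indices, result has shape [n, m]
    return [[xi * yj for yj in y] for xi in x]

def inner(x, y):
    # x · y = sum_i x_i * y_i: one shared index summed away, result is a scalar
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]
print(outer(x, x))  # rank-two result
print(inner(x, x))  # scalar result
```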