SIMD getting started

SIMD (Single Instruction, Multiple Data) refers to CPU instructions that apply the same operation to several data elements at once. A scalar instruction operates on one value at a time, while a SIMD instruction operates on a whole vector of values. For example, a 128-bit register holds eight 16-bit integers, so a single instruction can process 8 data points together, which can significantly improve both performance and power efficiency.
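To make the "8 data points at once" idea concrete, here is a minimal sketch of adding two arrays of eight `u16` values. The function name `add_u16x8` is my own, made up for illustration. On aarch64 it uses the NEON intrinsics from `std::arch::aarch64`; on any other target it falls back to a plain scalar loop so the snippet still compiles. Treat it as an illustration under those assumptions, not a tuned implementation.

```rust
// Hypothetical helper: adds eight u16 lanes in one go (wrapping on overflow).

#[cfg(target_arch = "aarch64")]
fn add_u16x8(a: &[u16; 8], b: &[u16; 8]) -> [u16; 8] {
    use std::arch::aarch64::{vaddq_u16, vld1q_u16, vst1q_u16};

    let mut out = [0u16; 8];
    // SAFETY: each pointer refers to exactly eight valid u16 values, and
    // this sketch assumes NEON is available (it is enabled by default on
    // the common aarch64 targets).
    unsafe {
        let va = vld1q_u16(a.as_ptr()); // load 8 lanes into a 128-bit register
        let vb = vld1q_u16(b.as_ptr());
        let sum = vaddq_u16(va, vb); // one instruction adds all 8 lanes
        vst1q_u16(out.as_mut_ptr(), sum); // store the 8 results back to memory
    }
    out
}

#[cfg(not(target_arch = "aarch64"))]
fn add_u16x8(a: &[u16; 8], b: &[u16; 8]) -> [u16; 8] {
    // Scalar fallback: one addition per loop iteration.
    let mut out = [0u16; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

Calling `add_u16x8(&[1, 2, 3, 4, 5, 6, 7, 8], &[10; 8])` returns `[11, 12, 13, 14, 15, 16, 17, 18]` on either path. The scalar version needs eight separate additions, while the NEON version does the arithmetic with a single `vaddq_u16`.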
This post focuses on NEON (ARM's SIMD architecture) rather than Rust's standard SIMD library, simply because I prefer the control of specialization over generalization.
