# NumPy: Numerical [[python|Python]]

- Good [[c|C]] API, making NumPy both efficient and ideal for wrapping [[c|C]], [[c++|C++]], and legacy [[fortran|Fortran]] code.
- NumPy internally stores data in contiguous blocks of memory. This takes much less memory, and operations on it avoid the overhead of regular interpreted Python code.
- NumPy is designed to work with very large arrays, so basic slicing returns a view instead of a copy. (Boolean indexing creates a copy, though.)
- [[numba|Numba]]

## Usage

- "Fancy indexing" selects with "coordinate tuples", not rectangular regions, e.g. `arr[[1, 2, 3, 4], [5, 6, 7, 8]]` actually returns `arr[1, 5], arr[2, 6], ...` instead of the rectangular selection `arr[[1, 2, 3, 4]][:, [5, 6, 7, 8]]`.
- NumPy _universal functions_ (`ufunc`) are functions that operate element-wise on the whole array.
- Basic slicing and most reshaping operations return a view instead of a copy; fancy and boolean indexing are the exceptions.
- Reshaping
  - `reshape` method.
  - `ravel` and `flatten` methods: the former returns a view when possible, the latter always returns a **copy**!
  - `order='C'` or `'F'` can be passed for [[c|C]] (row-major: traverse higher dimensions _first_) or [[fortran|Fortran]] (column-major: traverse higher dimensions _last_) style order.
  - `vstack` and `hstack` are good shorthands for `concatenate` on 2D arrays. `np.c_` and `np.r_` are even more concise.
  - `repeat` and `tile`: `repeat` repeats each element, `tile` repeats the whole array. With `tile`, the arg specifies the "layout" of the tiling.
- Broadcasting
  - _Vectorization_ and _broadcasting_: performing operations on arrays without writing loops, e.g. with `xs, ys = np.meshgrid(...)` or `np.where(cond, xarr, yarr)` <!-- cSpell:words xarr yarr -->
  - Broadcasting rule: two arrays are compatible if, for each trailing dimension, the axis lengths match or either of them is `1`. Broadcasting is then performed over the missing or length-`1` dimensions.
  - `np.newaxis` can be used to easily create a new axis: `arr = arr[:, np.newaxis, :]`.
- "Local reduce" `reduceat` is similar to grouping: `np.add.reduceat(arr, [0, 5, 8])` aggregates `arr[0:5]`, `arr[5:8]`, and `arr[8:]`.
- _Structured array_
  - Can be used to hold somewhat heterogeneous data.
  - `dtype = [('x', np.float64), ('y', np.int64)]`
  - Can also add a shape: `('x', np.int64, 3)`. In this case `arr['x']` returns an array for each record.
  - This can be further nested.
- Sorting
  - `argsort` and `np.lexsort((field1, field2))` both return indexers.
  - `kind='stable'` (or the older `kind='mergesort'`) is the way to request a stable sort.
  - `np.partition(arr, 3)` places the 3 smallest elements at the beginning (in no particular order). `argpartition` is similar but returns an indexer.
- `arr.searchsorted()` performs binary search on sorted data.
  - `labels = bins.searchsorted(data)` can be used to categorize data.
- Use `np.memmap()` to load a memory-mapped file. Any changes are buffered in memory until the `flush` method is invoked.
- Memory contiguity is important for performance
  - `arr.flags` has `C_CONTIGUOUS` and `F_CONTIGUOUS` fields.
  - When an array is `C_CONTIGUOUS`, aggregating along rows is generally faster.
  - `arr.copy('F')` can create a copy in Fortran order.
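
A minimal sketch of several of the notes above, run on small made-up arrays. The variable names (`pairs`, `block`, `labels`, and so on) and the sample values are illustrative assumptions only; the NumPy calls themselves are the ones named in the notes, plus `np.shares_memory` to check view versus copy.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)

# Fancy indexing pairs the index lists element by element ...
pairs = arr[[1, 2, 3, 4], [5, 6, 7, 8]]        # arr[1, 5], arr[2, 6], ...
# ... while chained indexing selects the rectangular region.
block = arr[[1, 2, 3, 4]][:, [5, 6, 7, 8]]     # shape (4, 4)

# Basic slicing gives a view; boolean indexing gives a copy.
slice_is_view = np.shares_memory(arr, arr[:2])
mask_is_view = np.shares_memory(arr, arr[arr > 50])

# ravel returns a view when it can; flatten always copies.
ravel_is_view = np.shares_memory(arr, arr.ravel())
flatten_is_view = np.shares_memory(arr, arr.flatten())

# Broadcasting: row means have shape (10,), so add a length-1 axis
# to subtract them from every column of the matching row.
row_centered = arr - arr.mean(axis=1)[:, np.newaxis]

# "Local reduce": sums of data[0:5], data[5:8], and data[8:].
data = np.arange(10)
group_sums = np.add.reduceat(data, [0, 5, 8])  # [10, 18, 17]

# Structured array with two named fields.
dtype = [('x', np.float64), ('y', np.int64)]
sarr = np.array([(1.5, 6), (2.5, -2)], dtype=dtype)
xs = sarr['x']

# partition: the first 3 entries are the 3 smallest, in no fixed order.
rng = np.random.default_rng(0)
values = rng.standard_normal(20)
smallest3 = np.sort(np.partition(values, 3)[:3])

# searchsorted as a binning tool (bin edges must be sorted).
bins = np.array([0, 10, 100])
labels = bins.searchsorted(np.array([3, 10, 50, 500]))  # [1, 1, 2, 3]

# Memory layout flags and a Fortran-ordered copy.
f_copy = arr.copy('F')
layout = (arr.flags['C_CONTIGUOUS'], f_copy.flags['F_CONTIGUOUS'])
```

Note that `searchsorted` defaults to `side='left'`, which is why the value `10` lands in the same bin as `3` here; pass `side='right'` if bin edges should belong to the upper bin.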