# NumPy: Numerical [[python|Python]]
- Good [[c|C]] API, making NumPy both efficient and ideal for wrapping [[c|C]],
[[c++|C++]], and legacy [[fortran|Fortran]] code.
- NumPy internally stores data in contiguous blocks of memory. This takes much
less memory than built-in Python sequences, and operations on it avoid the
overhead of regular interpreted Python code.
- NumPy is designed to work with very large arrays, so basic slicing returns a
view instead of a copy. (Boolean indexing creates a copy, though.)
- [[numba|Numba]]
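
The view-versus-copy note above can be checked with a minimal sketch like the following. The array contents and variable names are illustrative only.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view: writing through it changes `arr`.
view = arr[2:5]
view[:] = 0

# Boolean indexing returns a copy: writing to it leaves `arr` untouched.
picked = arr[arr > 6]
picked[:] = -1

# np.shares_memory reports whether two arrays overlap in memory.
shares_slice = np.shares_memory(arr, view)    # True
shares_mask = np.shares_memory(arr, picked)   # False
```
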
## Usage
- "Fancy indexing" selects with "coordinates tuple", not rectangular regions,
e.g. `arr[[1, 2, 3, 4], [5, 6, 7, 8]]` actually returns
`arr[1, 5], arr[2, 6]...` instead of `arr[[1, 2, 3, 4]][:, [5, 6, 7, 8]]`.
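
A small sketch of the difference between paired coordinates and a rectangular selection. The 10x10 array is made up for illustration, and `np.ix_` is an addition not mentioned in these notes; it is one way to get the rectangular block in a single step.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)  # arr[i, j] == 10 * i + j
rows = [1, 2, 3, 4]
cols = [5, 6, 7, 8]

# Paired coordinates: arr[1, 5], arr[2, 6], arr[3, 7], arr[4, 8].
points = arr[rows, cols]              # [15, 26, 37, 48]

# Rectangular block: every selected row crossed with every selected column.
block = arr[rows][:, cols]            # shape (4, 4)

# np.ix_ builds the same block with one indexing operation.
block_ix = arr[np.ix_(rows, cols)]
```
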
- NumPy _universal functions_ (`ufunc`) are functions that operate element-wise
on the whole array.
- Where possible, NumPy returns a view instead of a copy.
- Reshaping
- `reshape` method.
- `ravel` and `flatten` methods: `ravel` returns a view when possible, while
`flatten` always returns a **copy**!
- `order='C'` or `'F'` can be passed for [[c|C]] (traverse higher dimensions
_first_) or [[fortran|Fortran]] (traverse higher dimensions _last_) style
order.
- `vstack` and `hstack` are good shorthands for `concatenate` for 2D arrays.
`np.c_` and `np.r_` are even more concise.
- `repeat` and `tile`: `repeat` repeats each element, while `tile` stacks copies
of the whole array, and its argument specifies the "layout" of the tiling.
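
The reshaping helpers above, sketched on a tiny 2x3 array (the values are arbitrary and only meant to make the ordering visible):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

c_order = arr.ravel()                 # default order='C': [0, 1, 2, 3, 4, 5]
f_order = arr.ravel(order='F')        # [0, 3, 1, 4, 2, 5]

flat_copy = arr.flatten()             # always a copy
flat_view = arr.ravel()               # a view here, since arr is C-contiguous

stacked_v = np.vstack([arr, arr])     # shape (4, 3), like concatenate(axis=0)
stacked_h = np.hstack([arr, arr])     # shape (2, 6), like concatenate(axis=1)
stacked_r = np.r_[arr, arr]           # shape (4, 3)
stacked_c = np.c_[arr, arr]           # shape (2, 6)

repeated = np.repeat([1, 2], 2)       # [1, 1, 2, 2]
tiled = np.tile([1, 2], (2, 2))       # [[1, 2, 1, 2], [1, 2, 1, 2]]
```
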
- Broadcasting
- _Vectorization_ and _broadcast_: performing operations on arrays without
writing loops, e.g. with `xs, ys = np.meshgrid(...)` or
`np.where(cond, xarr, yarr)` <!-- cSpell:words xarr yarr -->
- Broadcasting rule: for each trailing dimension, the axis lengths match or
either is `1`. The broadcast is made over the missing or length `1`
dimensions.
- `np.newaxis` can be used to easily create a new axis:
`arr = arr[:, np.newaxis, :]`.
- "Local reduce" `reduceat` is similar to grouping:
`np.add.reduceat(arr, [0, 5, 8])` aggregates `arr[0:5]`, `arr[5:8]`, and
`arr[8:]`.
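
A short sketch tying the broadcasting points together. The demeaning example and all array values are illustrative choices, not taken from these notes.

```python
import numpy as np

arr = np.arange(12, dtype=float).reshape(4, 3)

# (4, 3) - (3,): trailing dimensions match, so the column means are
# broadcast over the missing leading axis.
demeaned_cols = arr - arr.mean(axis=0)

# (4, 3) - (4, 1): np.newaxis adds the length-1 axis that makes the
# row means broadcast across the columns.
demeaned_rows = arr - arr.mean(axis=1)[:, np.newaxis]

# np.where picks element-wise between two sources; the scalar is broadcast.
clipped = np.where(arr > 5, 5.0, arr)

# meshgrid expands two 1-D coordinate vectors into 2-D grids.
xs, ys = np.meshgrid(np.arange(3), np.arange(2))   # both have shape (2, 3)

# reduceat sums arr[0:5], arr[5:8], and arr[8:].
group_sums = np.add.reduceat(np.arange(10), [0, 5, 8])   # [10, 18, 17]
```
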
- _Structured array_
- Can be used to hold somewhat heterogeneous data.
- `dtype = [('x', np.float64), ('y', np.int64)]`
- Can also add shape: `('x', np.int64, 3)`, in this case `arr['x']` returns an
array.
- Field dtypes can be further nested.
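
A minimal sketch of the structured dtypes listed above. The record values and the nested field names (`pos`, `a`, `b`) are hypothetical.

```python
import numpy as np

dtype = [('x', np.float64), ('y', np.int64)]
sarr = np.array([(1.5, 6), (np.pi, -2)], dtype=dtype)

xs = sarr['x']             # plain float64 array with one value per record
first_y = sarr[0]['y']     # field of a single record: 6

# A field with a shape: each record holds three int64 values under 'x'.
shaped = np.zeros(4, dtype=[('x', np.int64, 3), ('y', np.int32)])
shaped_x = shaped['x']     # 2-D array of shape (4, 3)

# A nested dtype: the 'pos' field is itself a structured type.
nested_dtype = [('pos', [('a', np.float64), ('b', np.float32)]),
                ('y', np.int32)]
nested = np.array([((1, 2), 5), ((3, 4), 6)], dtype=nested_dtype)
nested_a = nested['pos']['a']   # [1.0, 3.0]
```
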
- Sorting
- `argsort` and `np.lexsort((field1, field2))` both return indexers.
- `kind='stable'` (or `'mergesort'`, which maps to the same implementation)
selects a stable sort.
- `np.partition(arr, 3)` moves the 3 smallest elements to the beginning (in no
particular order); `argpartition` is similar but returns an indexer.
- `arr.searchsorted()` performs binary search on sorted data.
- `labels = bins.searchsorted(data)` can be used to categorize data.
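
The sorting helpers above, sketched on small made-up arrays. Note that `np.lexsort` treats the last key in the tuple as the primary sort key.

```python
import numpy as np

values = np.array([5, 0, 1, 3, 2])
order = values.argsort()                    # [1, 2, 4, 3, 0]
stable_order = values.argsort(kind='stable')

# Sort by last name first, then by first name within equal last names.
first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])
last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])
sorter = np.lexsort((first_name, last_name))   # [1, 2, 3, 0, 4]

# After partitioning at index 3, the three smallest values come first,
# in no guaranteed order, and index 3 holds its final sorted value.
data = np.array([9, 4, 7, 1, 8, 3, 6])
part = np.partition(data, 3)
smallest3 = np.sort(part[:3])               # [1, 3, 4]
idx = np.argpartition(data, 3)              # indexer version of the same idea

# Categorize samples into bins with a binary search.
bins = np.array([0, 100, 1000, 5000, 10000])
samples = np.array([9940, 6768, 7908, 1709, 268])
labels = bins.searchsorted(samples)         # [4, 4, 4, 3, 2]
```
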
- Use `np.memmap()` to open a memory-mapped file. Any changes will be buffered
in memory until the `flush` method is invoked.
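
A hedged sketch of the memmap round trip, using a temporary directory so nothing is left behind. The file name `demo.dat`, the shape, and the dtype are arbitrary assumptions. The dtype and shape must be supplied again when reopening, because the file holds only raw bytes.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.dat')   # hypothetical file name

    # mode='w+' creates (or overwrites) the file, initially all zeros.
    mmap = np.memmap(path, dtype=np.float64, mode='w+', shape=(100, 10))
    mmap[:5] = 1.0      # changes may sit in memory at this point
    mmap.flush()        # write pending changes to disk
    del mmap            # drop the reference so the mapping can be released

    # Reopen read-only with the same dtype and shape.
    reopened = np.memmap(path, dtype=np.float64, mode='r', shape=(100, 10))
    total = float(reopened.sum())             # 5 rows * 10 columns * 1.0
    first_row = reopened[0].tolist()
    del reopened
```
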
- Memory contiguity is important for performance
- `arr.flags` has `C_CONTIGUOUS` and `F_CONTIGUOUS` fields.
- When an array is `C_CONTIGUOUS`, aggregation along the rows is much faster
than along the columns.
- `arr.copy('F')` can create a copy in Fortran order.
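
A small sketch for inspecting contiguity. The array shape is arbitrary, and the example only checks flags and results; actual timing differences depend on array size and hardware, so none are claimed here.

```python
import numpy as np

arr_c = np.ones((200, 300), order='C')
arr_f = arr_c.copy('F')          # same values, Fortran (column-major) layout

c_flags = (arr_c.flags.c_contiguous, arr_c.flags.f_contiguous)   # (True, False)
f_flags = (arr_f.flags.c_contiguous, arr_f.flags.f_contiguous)   # (False, True)

# Transposing returns a view with the opposite layout, without copying.
t_flags = (arr_c.T.flags.c_contiguous, arr_c.T.flags.f_contiguous)

# The results are identical; only the memory access pattern differs.
row_sums_c = arr_c.sum(axis=1)
row_sums_f = arr_f.sum(axis=1)
```
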