Last updated November 15, 2022
NumPy is a Python library for applying functions to multi-dimensional arrays.
The package includes routines for mathematical, statistical, and logical operations, shape manipulation, sorting, selecting, and random simulation.
NumPy processes arrays quickly thanks to vectorization, which applies a single operation across an entire array at once. Under the hood, these operations run in compiled code that can take advantage of ‘single instruction, multiple data’ (SIMD) processing, where one instruction is applied to several values in parallel. This is much faster than writing a Python loop, which handles only one element at a time.
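As a rough illustration of the difference, the sketch below doubles every value in a small array twice: once with an explicit loop and once with a single vectorized expression. The array contents and variable names are made up for this example.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Loop version: Python visits one element at a time
doubled_loop = [v * 2 for v in values]

# Vectorized version: one expression applied to the whole array
doubled_vec = values * 2

print(doubled_vec.tolist())  # [2, 4, 6, 8, 10]
```

Both versions produce the same numbers. On large arrays the vectorized form is usually the faster of the two, although the exact speedup depends on the machine and the operation.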
Assuming you have Python already installed on your computer, install the NumPy package from the command line, if you haven’t already (the example is on a Windows machine):
c:\> python -m pip install numpy
To use NumPy, first import the library into your program:
import numpy as np
LOADING DATA
A standard Python list looks like this:
a_list = ["a", "b", "c", "d"]
To work with a list in NumPy, convert it into a NumPy array with np.array():
an_array = np.array(a_list)
In the line above, ‘an_array’ is a variable that holds the new array.
You can also create the list inline:
an_array = np.array(["a", "b", "c", "d"])
Multi-dimensional arrays, or nested arrays, can be created as well, with each inner array separated by a comma and a set of outer brackets enclosing the array as a whole.
>>> ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])
Data can also be loaded from a CSV file with np.genfromtxt():
vehicles = np.genfromtxt('vehicles.csv', delimiter=',')
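To try np.genfromtxt() without a file on disk, the sketch below feeds it an in-memory text buffer instead. The CSV contents are invented for this example and are not the real vehicles.csv.

```python
import io
import numpy as np

# Hypothetical CSV contents standing in for a real file
csv_text = "4,120,2500\n6,200,3400\n8,310,4100\n"

# genfromtxt accepts a file-like object as well as a file name
vehicles = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

print(vehicles.shape)  # (3, 3)
```

By default the values are read as floating-point numbers, so 120 comes back as 120.0.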
Once your data is loaded into NumPy, you can enjoy a great variety of operations. The examples below use IDLE, the IDE that ships with Python from the Python Software Foundation. The basic information here has been lifted from Codecademy's NumPy series of lessons, as well as from NumPy's own introduction.
Individual elements of an array can be queried and operated on.
The first element of an_array (above) can be accessed by its index, which starts at 0, as in an_array[0].
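A minimal sketch of element access, reusing the an_array example from earlier. Negative indexes, which count from the end, are standard Python behavior that NumPy also supports.

```python
import numpy as np

an_array = np.array(["a", "b", "c", "d"])

first = an_array[0]   # index 0 is the first element
last = an_array[-1]   # index -1 counts back from the end

print(first, last)  # a d
```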
To select a range of elements in an array, use Python's slice syntax, my_array[start:end]. The element at start is included and the element at end is not.
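For instance, slicing the an_array example from earlier might look like the sketch below. Leaving out start or end means "from the beginning" or "to the end".

```python
import numpy as np

an_array = np.array(["a", "b", "c", "d"])

middle = an_array[1:3]    # elements at index 1 and 2
first_two = an_array[:2]  # omitted start means "from the beginning"

print(middle.tolist())     # ['b', 'c']
print(first_two.tolist())  # ['a', 'b']
```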
To select elements in a two-dimensional array, use the syntax my_array[row, column], where the first index specifies the row of the array and the second specifies the column. Either index can be a range, and a bare : means "choose everything along that axis".
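Using the ratings array from earlier, two-dimensional selection could be sketched like this. The variable names are chosen for this example only.

```python
import numpy as np

ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])

one_value = ratings[1, 2]       # row 1, column 2
first_row = ratings[0, :]       # every column of row 0
second_col = ratings[:, 1]      # every row of column 1
top_left = ratings[0:2, 0:2]    # rows 0-1, columns 0-1

print(one_value)            # 7
print(first_row.tolist())   # [7, 10, 8]
print(second_col.tolist())  # [10, 6, 3]
print(top_left.tolist())    # [[7, 10], [6, 6]]
```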
A subset of an array can be defined as a variable, which can be operated on mathematically.
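As a small sketch of that idea, the code below stores two subsets of the ratings array in variables and then does arithmetic on them. The names and the operations are picked purely for illustration.

```python
import numpy as np

ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])

# Store a subset in its own variable
top_rows = ratings[0:2, :]
first_col = ratings[:, 0]

# Arithmetic is applied to every element of the subset
doubled = top_rows * 2
bumped = first_col + 1

print(doubled.tolist())  # [[14, 20, 16], [12, 12, 14]]
print(bumped.tolist())   # [8, 7, 6]
```

The arithmetic returns new arrays, so ratings itself is left unchanged.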
In NumPy, each dimension of an array is called an axis. In a two-dimensional array, axis 0 runs down the rows and axis 1 runs across the columns. To execute an operation on individual columns or rows, pass axis=0 to get one result per column, or axis=1 to get one result per row.
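The axis argument is easiest to see with a sum. This sketch reuses the ratings array from earlier:

```python
import numpy as np

ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])

# axis=0 collapses the rows: one total per column
column_totals = np.sum(ratings, axis=0)

# axis=1 collapses the columns: one total per row
row_totals = np.sum(ratings, axis=1)

print(column_totals.tolist())  # [18, 19, 16]
print(row_totals.tolist())     # [25, 19, 9]
```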
NumPy provides many built-in functions that operate on arrays directly.
Take this one-dimensional array as an example:
collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])
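A few of the built-in functions applied to that array are sketched below. This is a small selection rather than a full list.

```python
import numpy as np

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

total = np.sum(collection)            # 53
smallest = np.min(collection)         # 1
largest = np.max(collection)          # 10
mean_value = np.mean(collection)      # 53 / 9, roughly 5.89
median_value = np.median(collection)  # 6.0
ordered = np.sort(collection)         # ascending copy of the array

print(ordered.tolist())  # [1, 3, 5, 6, 6, 7, 7, 8, 10]
```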
A bit more on percentiles: A percentile is a "comparison score between a particular score and the scores of the rest of a group. It shows the percentage of scores that a particular score surpassed." A score in the "90th percentile" means that score did better than 90% of all the other scores in that group.
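In NumPy, percentiles are computed with np.percentile(). The sketch below uses the collection array from earlier. By default NumPy interpolates between neighboring values, so the result is not always a number that appears in the array.

```python
import numpy as np

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

# The 50th percentile is the same as the median
p50 = np.percentile(collection, 50)

# The 90th percentile falls between the two largest values, 8 and 10
p90 = np.percentile(collection, 90)

print(p50)  # 6.0
```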
The "interquartile range" (IQR) measures the spread of the middle 50% of the values in an array. It is found by calculating the 25th and the 75th percentiles (the "first" and "third" quartiles, respectively), then subtracting the first from the third.
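A minimal sketch of that calculation on the collection array, using np.percentile() with its default interpolation:

```python
import numpy as np

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

first_quartile = np.percentile(collection, 25)  # 5.0
third_quartile = np.percentile(collection, 75)  # 7.0

# IQR is the third quartile minus the first
iqr = third_quartile - first_quartile

print(iqr)  # 2.0
```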