Numerical computing in Python¶

Mihai Andrei

scientific computing
numpy
simulation
optimisation, machine learning

Programs¶

control hardware: C,asm
record keeping: Cobol, Java
measurement processing: Fortran, Matlab

Data in record keeping¶

About people
data is created by user input: do you wear glasses?
data is precise and discrete: yes/no
validation is possible in principle: jk4##! is not a person name
strings and ints (language and counting)
data is a graph of relationships:

{
  'name': 'mihai',
  'presentations': [
     {'where': 'ropython', 'topic': 'numpy'},
     {'where': 'tvb node berlin', 'topic': 'tvb gui'}
   ]
}

Data in measurement¶

About the world
data is produced by sensors: batroom scale
data is imprecise and continuous: bathroom scale is not microgram precise
validation is hard: what is a valid cat picture?
data is imprecise, large and flat
floats: (nature no speak english)

array([[-1.        , -0.86666667, -0.73333333, -0.6       ],
       [-0.46666667, -0.33333333, -0.2       , -0.06666667],
       [ 0.06666667,  0.2       ,  0.33333333,  0.46666667],
       [ 0.6       ,  0.73333333,  0.86666667,  1.        ]])

procedures¶

record keeping¶

CRUD : Mihai wears no glasses!
search : does Mihai have glasses?
presentation: fancy text and media
model relationships

measurement¶

finding structure: Is Mihai wearing glasses in this picture?
visualization: find structure yourself human
model reality: heat transfer
simulation: how will the stove heat the room?

science¶

algorithms and data structures¶

Directed_graph

linear algebra, calculus, statistics¶

numpy¶

a new datatype suitable for tables of numbers
the foundation of all numerics in Python
n-dimensional, homogeneous, fixed-size, contigous array
vectorized operations

Installing

$ pip install numpy matplotlib jupyter

like lists but typed and fixed sized¶

In [1]:

# let's meet the beast
import numpy as np

In [2]:

# ndarrays look like lists
a = np.array([0, 1, 2, 3])
a

Out[2]:

array([0, 1, 2, 3])

In [3]:

# I can index
a[1] + a[-1]

Out[3]:

In [4]:

# and slice
a[1:-1]

Out[4]:

array([1, 2])

In [5]:

# mutable
a[0] = 4
a

Out[5]:

array([4, 1, 2, 3])

In [6]:

# ok, so they are lists ... meh
a[0] = 'nope'

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-8bf143d5311d> in <module>()
      1 # ok, so they are lists ... meh
----> 2 a[0] = 'nope'

ValueError: invalid literal for int() with base 10: 'nope'

In [7]:

# typed
a.dtype

Out[7]:

dtype('int64')

In [8]:

# fixed size
a.append(43)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-f8af5ece7ee3> in <module>()
      1 # fixed size
----> 2 a.append(43)

AttributeError: 'numpy.ndarray' object has no attribute 'append'

But why ??¶

Speed:

Memory compactness
no indirection or allocation
optimized typed operations


header	4	1	2	3

In [9]:

# dtypes have defaults
a = np.zeros(4)
print(repr(a.dtype))
print(a)

dtype('float64')
[0. 0. 0. 0.]

In [10]:

# but we can ask for a dtype
a = np.zeros(4, dtype=np.uint8)
a

Out[10]:

array([0, 0, 0, 0], dtype=uint8)

multidimensional¶

In [11]:

# multidimensional
a = np.array([
    [0, 1, 2],
    [3, 4, 5]
])

In [12]:

a.ndim

Out[12]:

In [13]:

# the shape is fixed, no appends
a.shape

Out[13]:

(2, 3)

In [14]:

# indexing in 2d
a[1]

Out[14]:

array([3, 4, 5])

In [15]:

# this works
a[1][1]

Out[15]:

In [16]:

# this is idiomatic
a[1, 1]

Out[16]:

In [17]:

# a cube
a = np.arange(2*3*2).reshape((2, 3, 2))
a.shape

Out[17]:

(2, 3, 2)

In [18]:

Out[18]:

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])

In [19]:

a[1, :, 1]

Out[19]:

array([ 7,  9, 11])

slices¶

In [20]:

a = np.arange(4*4)
# reshape 
a = a.reshape((4, 4))
a

Out[20]:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [21]:

a[1]

Out[21]:

array([4, 5, 6, 7])

In [22]:

# all from dim 0, third from dim 1
a[:,2]

Out[22]:

array([ 2,  6, 10, 14])

In [23]:

# the interior
a[1:-1, 1:-1]

Out[23]:

array([[ 5,  6],
       [ 9, 10]])

In [24]:

# a more informative a[1]
a[1, :]

Out[24]:

array([4, 5, 6, 7])

In [25]:

# have defaults
a[:2, 2:]

Out[25]:

array([[2, 3],
       [6, 7]])

In [26]:

# Slices *ARE VIEWS* in numpy!
a = np.zeros((2, 2))
a

Out[26]:

array([[0., 0.],
       [0., 0.]])

In [27]:

b = a[:, 1]
b

Out[27]:

array([0., 0.])

In [28]:

b[0] = 23
b

Out[28]:

array([23.,  0.])

In [29]:

Out[29]:

array([[ 0., 23.],
       [ 0.,  0.]])

vectorized operations¶

In [30]:

a = np.array([[2, 1], [3, 0]])
a

Out[30]:

array([[2, 1],
       [3, 0]])

In [31]:

b = np.array([[-1, 1], [0, 1]])
b

Out[31]:

array([[-1,  1],
       [ 0,  1]])

In [32]:

a + b

Out[32]:

array([[1, 2],
       [3, 1]])

In [33]:

# ufuncs are functions that work in this vectorized fashion
np.sin(a)

Out[33]:

array([[0.90929743, 0.84147098],
       [0.14112001, 0.        ]])

vectorized operations make numpy fast¶

replace loops

a = range(3) # integers are objects on a heap
b = range(3)
ret = []  # containter of generic objects

for u, v in zip(a, b): # happens in python
    ret.append(u + v)

a = np.arange(3) # native machine integers
b = np.arange(3) # fixed size containter, no indirection
ret = a + b # happens in C

broadcasting¶

In [34]:

a = np.array([[0], [10], [20], [30]])
b = np.array([0, 1, 2])
print('a', a.shape)
print(repr(a))
print('b', b.shape)
print(repr(b))
a + b

a (4, 1)
array([[ 0],
       [10],
       [20],
       [30]])
b (3,)
array([0, 1, 2])

Out[34]:

array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])

In [35]:

# why not :?
c = np.array([[0, 10, 20, 30]])

print('c', c.shape)

c + b

c (1, 4)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-2645cb6512bc> in <module>()
      4 print('c', c.shape)
      5
----> 6 c + b

ValueError: operands could not be broadcast together with shapes (1,4) (3,)

broadcasting rule is: trailing dimensions must have same size or be 1¶

boolean indexing¶

In [36]:

a = np.random.randint(3, size=(3, 3))
a

Out[36]:

array([[2, 1, 1],
       [1, 1, 2],
       [0, 2, 1]])

In [37]:

a > 1

Out[37]:

array([[ True, False, False],
       [False, False,  True],
       [False,  True, False]])

In [38]:

a[a>1]

Out[38]:

array([2, 2, 2])

In [39]:

a[a>1] = 4
a

Out[39]:

array([[4, 1, 1],
       [1, 1, 4],
       [0, 4, 1]])

In [40]:

np.any(a == 0)

Out[40]:

True

In [41]:

np.all(a < 8)

Out[41]:

True

In [42]:

# cant convert to bool directly
bool(a < 8)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-56527d2ed582> in <module>()
      1 # cant convert to bool directly
----> 2 bool(a < 8)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Integer indexing

In [43]:

a = np.array([[1, 2], [3, 4], [5, 6]])
a

Out[43]:

array([[1, 2],
       [3, 4],
       [5, 6]])

In [44]:

rows = [0, 2]
columns = [0, 1]
# for each row pick the values from columns
a[rows, columns]

Out[44]:

array([1, 6])

In [45]:

rows = np.array([[0, 0], [-1, -1]])
columns = np.array([[0, -1], [0, -1]])
# same logoc repeating for each dimension in the index arrays
corners = a[rows, columns]
corners

Out[45]:

array([[1, 2],
       [5, 6]])

In [46]:

color_palette = np.array([
    [0, 0, 0], [11, 11, 11], [23, 23, 23],
    [3, 33, 3], [4, 4, 14], [51, 51, 51]])
color_palette[2]

Out[46]:

array([23, 23, 23])

In [47]:

a = np.arange(2*3).reshape((2,3))
a

Out[47]:

array([[0, 1, 2],
       [3, 4, 5]])

In [48]:

# transform all values to colors
a_colors = color_palette[a]
a_colors

Out[48]:

array([[[ 0,  0,  0],
        [11, 11, 11],
        [23, 23, 23]],

       [[ 3, 33,  3],
        [ 4,  4, 14],
        [51, 51, 51]]])

In [49]:

a_colors[1, 1]

Out[49]:

array([ 4,  4, 14])

Why this complex indexing ?¶

replaces loops

Hybrid data¶

both structure and measurements
excell like tabular data
use pandas, like numpy but with excell and database like operations

In [53]:

import pandas as pd

In [54]:

f = pd.read_csv('lala.csv')
f

Out[54]:

	Names	Births	Country
0	Bob	968	US
1	Jessica	155	US
2	Mary	77	US
3	John	578	US
4	Mel	973	US
5	Ion	32	RO
6	Petrica	4	RO
7	Mihai	123	RO

In [55]:

# index by column names
f['Names']

Out[55]:

0        Bob
1    Jessica
2       Mary
3       John
4        Mel
5        Ion
6    Petrica
7      Mihai
Name: Names, dtype: object

In [56]:

f.loc[2:4, 'Names']

Out[56]:

2    Mary
3    John
4     Mel
Name: Names, dtype: object

In [57]:

f.columns

Out[57]:

Index(['Names', 'Births', 'Country'], dtype='object')

In [58]:

group = f.groupby('Country').sum()
group

Out[58]:

	Births
Country
RO	159
US	2751