Some notes about NumPy and SciPy Python 14.12.2013

NumPy (Numerical Python) is a package for scientific computing and data analysis. Its core routines are written in C, so it performs well.

Main features:

  • ndarray - a powerful N-dimensional array object with vectorized arithmetic operations;
  • a set of mathematical functions for working with vectors and matrices;
  • reading and writing arrays to files.

Install

The following installation instructions were tested under Arch Linux.

I use Tcl/Tk to display matplotlib output (the TkAgg backend).

yaourt -S tk tcl

Create a virtual environment

mkvirtualenv -p /usr/bin/python2.7 science

Install NumPy, SciPy, Matplotlib and IPython

pip install numpy scipy matplotlib ipython

Check which backend matplotlib uses and make sure it is TkAgg.

ipython --pylab

In [1]: import matplotlib
In [2]: matplotlib.matplotlib_fname()
Out[2]: '/path/to/matplotlibrc'

grep 'backend' /path/to/matplotlibrc
backend      : tkagg

Basic

At the heart of NumPy is the N-dimensional array (ndarray), which can hold large amounts of data. All items of an ndarray must have the same type.

In [26]: from numpy import *
In [27]: a = array([1,2,3])
In [28]: a
Out[28]: array([1, 2, 3])

Each array also carries information such as its shape and the type of its elements (dtype).

In [29]: a.shape
Out[29]: (3,)
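A short standalone sketch of the same metadata, assuming Python 3 and a recent NumPy imported as np (rather than the `from numpy import *` style of the session). It also shows that mixing integers and floats upcasts the whole array to one common dtype, since all items must share a type:

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.shape, a.ndim, a.dtype)  # integer dtype; its width is platform dependent

# Mixing ints and floats: NumPy converts everything to one common type
b = np.array([1, 2.5, 3])
print(b.dtype)  # float64

# A dtype can also be requested explicitly
c = np.array([1, 2, 3], dtype=np.float32)
print(c.dtype)  # float32
```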

Creating an array from a list of lists

In [31]: a = array([[1, 2, 3, 4], [5, 6, 7, 8]])
In [32]: a
Out[32]: 
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Creating an array filled with zeros

In [33]: zeros(7)
Out[33]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [35]: zeros((3,3))
Out[35]: 
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Creating an array filled with ones

In [36]: ones(7)
Out[36]: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.])

Creating an array filled with a sequence (similar to Python's range)

In [37]: arange(7)
Out[37]: array([0, 1, 2, 3, 4, 5, 6])

Creating a sequence from 2 up to (but not including) 10 with step 3

In [39]: arange(2,10,3)
Out[39]: array([2, 5, 8])

Creating an array of evenly spaced numbers over a specified interval

In [46]: start = 1
In [47]: stop = 10
In [48]: n = 3
In [49]: linspace(start, stop, n)
Out[49]: array([  1. ,   5.5,  10. ])
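The difference between linspace and arange can be sketched as follows (Python 3, NumPy as np): linspace fixes the number of points and includes stop by default, so the step is (stop - start) / (n - 1); arange fixes the step and excludes stop.

```python
import numpy as np

start, stop, n = 1, 10, 3
x = np.linspace(start, stop, n)      # [1.0, 5.5, 10.0], stop included
step = (stop - start) / (n - 1)      # 4.5 for these values

# endpoint=False excludes stop, so the step becomes (stop - start) / n
y = np.linspace(start, stop, n, endpoint=False)  # [1.0, 4.0, 7.0]

# arange with the same step stops before reaching stop
z = np.arange(start, stop, step)     # [1.0, 5.5]
```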

Creating an uninitialized array (it holds whatever happened to be in memory) and filling it with NaN

In [54]: a = empty((3, 3))
In [55]: a[:] = nan
In [56]: a
Out[56]: 
array([[ nan,  nan,  nan],
       [ nan,  nan,  nan],
       [ nan,  nan,  nan]])
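A sketch of two equivalent ways to get a NaN-filled array, assuming Python 3 and NumPy 1.8 or later (where np.full is available). np.empty only allocates memory, so its values are meaningless until assigned:

```python
import numpy as np

# np.empty only allocates; the contents are arbitrary until assigned
a = np.empty((3, 3))
a.fill(np.nan)               # same effect as a[:] = np.nan

# np.full allocates and fills in a single call
b = np.full((3, 3), np.nan)

print(np.isnan(a).all(), np.isnan(b).all())  # True True
```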

Reshaping a vector into a matrix

In [57]: arange(32).reshape((8, 4))
Out[57]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

Creating an array of random integers

In [70]: size = 7
In [71]: a = random.randint(0, 10, size)
In [72]: a
Out[72]: array([0, 5, 5, 7, 9, 3, 3])
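Two details of randint that are easy to miss: the upper bound is exclusive (randint(0, 10) never returns 10), and results are only reproducible with a fixed seed. A minimal sketch using the seeded RandomState interface (Python 3; newer NumPy also offers np.random.default_rng, not used here):

```python
import numpy as np

rng = np.random.RandomState(42)   # seeded generator -> reproducible results
a = rng.randint(0, 10, size=7)    # integers from the half-open range [0, 10)

rng2 = np.random.RandomState(42)  # same seed...
b = rng2.randint(0, 10, size=7)   # ...same sequence
```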

Creating an array of random real numbers (standard normal distribution)

In [78]: random.randn(7)
Out[78]: 
array([ 0.41864199, -0.97131428, -2.05212359, -0.56811645,  0.21215915,
        0.17165842,  0.06305309])

Indexing items

In [72]: a = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [73]: a[2][1]
Out[73]: 8
In [74]: a[2, 1]
Out[74]: 8
In [75]: a[:, 0] # first column
Out[75]: array([1, 4, 7])
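One detail worth knowing when indexing: a[2, 1] reaches the element in one step, while a[2][1] first builds the row a[2]; both give the same value. Also, slices such as a[:, 0] are views of the original data, so assigning through them modifies the array. A minimal Python 3 sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
same = a[2][1] == a[2, 1]    # both are 8

col = a[:, 0]                # basic slicing returns a view, not a copy
col[:] = 0                   # so writing to the view changes a itself

col_copy = a[:, 1].copy()    # an explicit copy is independent
col_copy[:] = -1             # a is not affected by this
```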

Fancy indexing (indexing with a list of integers)

In [97]: a = arange(10) * 10
In [98]: a[[2, 8, 6]]
Out[98]: array([20, 80, 60])

In [101]: a = arange(32).reshape((8, 4))
In [102]: a
Out[102]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
In [103]: a[[1, 5, 7, 2], [0, 3, 1, 2]]
Out[103]: array([ 4, 23, 29, 10])
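Passing two index lists picks single elements pairwise: (1, 0), (5, 3), (7, 1) and (2, 2). To select the whole rectangular block instead (those rows, with the columns in the given order), np.ix_ can be used. A sketch in Python 3:

```python
import numpy as np

a = np.arange(32).reshape((8, 4))

# Pairwise selection: one element per (row, column) pair
pairs = a[[1, 5, 7, 2], [0, 3, 1, 2]]

# np.ix_ builds an open mesh, so the result is a 4x4 sub-block:
# rows 1, 5, 7, 2 and, within each, columns 0, 3, 1, 2
block = a[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(block)
```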

Plotting a line, a bar chart and a scatter plot (plt is provided by ipython --pylab)

In [104]: a = random.randint(0, 10, 10)
In [105]: fig = plt.figure()
In [106]: plt.plot(range(10), a, 'k--')
In [107]: plt.bar(range(10), a)
In [108]: plt.scatter(range(10), a)
In [109]: plt.savefig('figpath.svg') # save to file
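plt.bar draws one bar per value; a histogram instead counts how many values fall into each bin. The counts can be computed with np.histogram, sketched here in Python 3 with a small hand-picked array:

```python
import numpy as np

a = np.array([0, 1, 1, 3, 5, 9])

# 5 equal-width bins over [0, 10]; the last bin also includes its right edge
counts, edges = np.histogram(a, bins=5, range=(0, 10))
print(edges)   # [ 0.  2.  4.  6.  8. 10.]
print(counts)  # [3 1 1 0 1]
```

In the pylab session, plt.hist(a, bins=5, range=(0, 10)) computes the same counts and draws them.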

Conditions

A comparison returns a boolean array that is True wherever the condition holds

In [86]: a = random.randint(2, 15, size=10) 
In [87]: a
Out[87]: array([12,  7,  7,  8,  5,  9,  2, 10,  3, 14])
In [89]: a > 5
Out[89]: array([ True,  True,  True,  True, False,  True, False,  True, False,  True], dtype=bool)

Combining conditions

In [91]: (a > 5) & (a < 10)
Out[91]: array([False,  True,  True,  True, False,  True, False, False, False, False], dtype=bool)

Setting values where the condition is True

In [92]: a[(a > 5) & (a < 10)] = -1
In [93]: a
Out[93]: array([12, -1, -1, -1,  5, -1,  2, 10,  3, 14])
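Boolean arrays are combined with & (and), | (or) and ~ (not). The parentheses are required because these operators bind more tightly than comparisons, and Python's own and/or do not work on arrays. A sketch in Python 3, reusing the values from the session before the assignment:

```python
import numpy as np

a = np.array([12, 7, 7, 8, 5, 9, 2, 10, 3, 14])

outside = (a < 5) | (a > 10)   # element-wise OR
inside = ~outside              # element-wise NOT

# `and`/`or` try to reduce the whole array to a single bool and fail
try:
    (a > 5) and (a < 10)
    raised = False
except ValueError:
    raised = True
```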

Creating a new array that takes items from a where cond is True and from b otherwise

In [8]: a = random.randint(-5, 5, size=5)
In [9]: a
Out[9]: array([ 0, -5,  1,  4, -3])

In [14]: b = random.randint(-5, 5, size=5)
In [15]: b
Out[15]: array([-1,  2,  2, -1,  0])

In [5]: cond = random.rand(5) > .5
In [6]: cond
Out[6]: array([ True,  True, False, False, False], dtype=bool)

In [19]: where(cond, a, b)
Out[19]: array([ 0, -5,  2, -1,  0])

We can also put '+' where an item is greater than 0 and '-' otherwise

In [20]: where(a > 0, '+', '-')
Out[20]: 
array(['-', '-', '+', '+', '-'], dtype='|S1')

Counting the items greater than 0 (True counts as 1, False as 0)

In [43]: (a > 0).sum()
Out[43]: 2
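Summing the boolean mask counts matching items; to add up the matching values themselves, index with the mask first. any() and all() answer "at least one" and "every" questions. A Python 3 sketch using the same a as above:

```python
import numpy as np

a = np.array([0, -5, 1, 4, -3])

n_positive = (a > 0).sum()     # number of positive items
sum_positive = a[a > 0].sum()  # sum of the positive items themselves
has_negative = (a < 0).any()   # is at least one item negative?
all_nonzero = (a != 0).all()   # is every item non-zero?
```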

Random sequence

Creating a 4x4 array of samples from the standard normal distribution

a = random.normal(size=(4, 4))

Other functions

permutation - return a random permutation of a sequence, or return a permuted range

shuffle - randomly permute a sequence in place

rand - draw samples from a uniform distribution

randint - draw random integers from the half-open range [low, high)

randn - draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)

binomial - draw samples from a binomial distribution

normal - draw samples from a normal (Gaussian) distribution

beta - draw samples from a beta distribution

chisquare - draw samples from a chi-square distribution

gamma - draw samples from a gamma distribution

uniform - draw samples from a uniform distribution over [low, high), [0, 1) by default
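A few of these functions in action, as a Python 3 sketch using a seeded RandomState so the run is reproducible. Note the difference between permutation (returns a new array) and shuffle (modifies its argument and returns None):

```python
import numpy as np

rng = np.random.RandomState(0)            # seeded for reproducibility

p = rng.permutation(5)                    # new array: shuffled arange(5)
x = np.arange(5)
rng.shuffle(x)                            # shuffles x in place, returns None

u = rng.uniform(-1, 1, size=1000)         # uniform on [-1, 1)
g = rng.normal(loc=10, scale=2, size=1000)  # mean 10, standard deviation 2
k = rng.binomial(n=10, p=0.5, size=1000)    # successes in 10 fair trials
```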

Functions

Universal functions (ufuncs) perform fast element-wise operations on arrays. Unary ufuncs take a single array:

In [110]: a = random.randint(2, 15, size=5)
In [111]: a
Out[111]: array([11,  3,  6,  6, 10])
In [112]: sqrt(a)
Out[112]: array([ 3.31662479,  1.73205081,  2.44948974,  2.44948974,  3.16227766])

List of unary functions

abs, fabs - compute the absolute value element-wise for integer, floating-point or complex values (fabs does not handle complex values)

sqrt - compute the square root of each element

square - compute the square of each element

exp - compute the exponential e^x of each element

log, log10, log2, log1p - natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively

sign - compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative)

ceil - compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element

floor - compute the floor of each element, i.e. the largest integer less than or equal to each element

rint - round elements to the nearest integer, preserving the dtype

cos, sin, tan - trigonometric functions
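A Python 3 sketch of several unary functions on the same input. rint rounds halves to the nearest even integer and keeps the float dtype; log1p stays accurate for tiny arguments where log(1 + x) loses precision:

```python
import numpy as np

x = np.array([-2.5, -0.5, 0.0, 1.5, 2.7])

signs = np.sign(x)     # [-1., -1.,  0.,  1.,  1.]
up = np.ceil(x)        # [-2., -0.,  0.,  2.,  3.]
down = np.floor(x)     # [-3., -1.,  0.,  1.,  2.]
nearest = np.rint(x)   # [-2., -0.,  0.,  2.,  3.], halves go to the even integer

tiny = 1e-12
precise = np.log1p(tiny)   # accurate: about 1e-12
crude = np.log(1 + tiny)   # 1 + tiny is rounded first, so the error is visible
```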

List of binary functions

add - add corresponding elements in arrays

subtract - subtract elements in second array from first array

multiply - multiply array elements

divide, floor_divide - divide, or floor divide (round the quotient down)

power - raise elements in first array to powers indicated in second array

maximum - element-wise maximum

minimum - element-wise minimum

mod - element-wise modulus (remainder of division)
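A Python 3 sketch of the binary functions on two small integer arrays. floor_divide and mod match Python's // and %, so quotient * y + remainder reproduces x; a scalar argument is broadcast against the array:

```python
import numpy as np

x = np.array([1, 7, -3, 4])
y = np.array([2, 2, 2, 5])

total = np.add(x, y)              # same as x + y: [3, 9, -1, 9]
biggest = np.maximum(x, y)        # [2, 7, 2, 5]
powers = np.power(x, y)           # [1, 49, 9, 1024]
quotient = np.floor_divide(x, y)  # [0, 3, -2, 0], same as x // y
remainder = np.mod(x, y)          # [1, 1, 1, 4], same as x % y
ratio = np.divide(x, y)           # true division in Python 3

clipped = np.maximum(x, 0)        # scalar broadcast: [1, 7, 0, 4]
```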

Statistical functions

Array initialization

In [24]: a = random.randn(5)
In [25]: a
Out[25]: array([ 1.98311057,  0.42013985,  2.77981702,  0.82146552, -0.86014544])

Calculate arithmetic mean

In [26]: a.mean()
Out[26]: 1.0288775028049444

Calculate median

In [41]: median(a)
Out[41]: 0.82146551999999995

Calculate standard deviation

In [27]: a.std()
Out[27]: 1.2616131660043575

Calculate variance

In [28]: a.var()
Out[28]: 1.5916677806355384
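var() is the mean squared deviation from the mean and std() is its square root. Both divide by N by default (ddof=0); passing ddof=1 divides by N - 1, which gives the sample variance. A Python 3 sketch using the values shown above (rounded to 8 decimals, so results match only approximately):

```python
import numpy as np

a = np.array([1.98311057, 0.42013985, 2.77981702, 0.82146552, -0.86014544])

mean = a.mean()
var_manual = ((a - mean) ** 2).mean()  # average squared deviation
var_sample = a.var(ddof=1)             # divides by N - 1 instead of N
std = a.std()                          # square root of a.var()
```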

Sets

Array initialization

In [47]: a = random.randint(2, 10, 6)
In [48]: a
Out[48]: array([6, 3, 9, 4, 8, 9])

Testing whether each element is contained in another set

In [51]: in1d(a, [3,9])
Out[51]: array([False,  True,  True, False, False,  True], dtype=bool)

Other functions

unique(x) - compute the sorted, unique elements in x

intersect1d(x, y) - compute the sorted, common elements in x and y

union1d(x, y) - compute the sorted union of elements of x and y

in1d(x, y) - compute a boolean array indicating whether each element of x is contained in y

setdiff1d(x, y) - set difference, elements in x that are not in y
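The set functions applied to the array from the session, as a Python 3 sketch. It uses np.isin, the newer name for in1d (NumPy 1.13 and later), and the return_counts option of unique:

```python
import numpy as np

a = np.array([6, 3, 9, 4, 8, 9])

values, counts = np.unique(a, return_counts=True)  # [3 4 6 8 9], [1 1 1 1 2]
common = np.intersect1d(a, [3, 9, 10])             # [3 9]
union = np.union1d(a, [1, 3])                      # [1 3 4 6 8 9]
only_in_a = np.setdiff1d(a, [3, 9])                # [4 6 8]
member = np.isin(a, [3, 9])                        # same result as in1d(a, [3, 9])
```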

Saving and loading arrays

Save an ndarray to a text file

savetxt('/home/proft/temp/a.txt', a, delimiter=',')

Load from file

a = loadtxt('/home/proft/temp/a.txt', delimiter=',')
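A round-trip sketch in Python 3. It writes to a temporary directory instead of the path above, and shows that a text file written with savetxt's default format comes back as float64, while the binary .npy format (np.save / np.load) keeps the original dtype and shape:

```python
import os
import tempfile
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "a.txt")
    np.savetxt(txt_path, a, delimiter=",")   # default fmt is '%.18e'
    b = np.loadtxt(txt_path, delimiter=",")  # loaded as float64

    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)                     # binary, keeps dtype and shape
    c = np.load(npy_path)
```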

SciPy

SciPy is a collection of packages addressing a number of different standard problem domains in scientific computing. Here is a sampling of the packages included:

  • scipy.integrate - numerical integration routines and differential equation solvers
  • scipy.linalg - linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg
  • scipy.optimize - function optimizers (minimizers) and root finding algorithms
  • scipy.signal - signal processing tools
  • scipy.sparse - sparse matrices and sparse linear system solvers
  • scipy.special - wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
  • scipy.stats - standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics
  • scipy.weave - tool for using inline C++ code to accelerate array computations (Python 2 only; later split out of SciPy into a separate package)

Compute the determinant of a matrix

In [1]: import numpy as np
In [2]: from scipy import linalg
In [3]: arr = np.array([[1, 2], [3, 4]])
In [4]: linalg.det(arr)
Out[4]: -2.0
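For a 2x2 matrix [[a, b], [c, d]] the determinant is ad - bc, which gives 1*4 - 2*3 = -2 here. A Python 3 cross-check, using numpy.linalg.det instead of SciPy so it only needs NumPy; since det is computed through an LU factorization, the result may differ from -2 in the last bits:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

d_formula = arr[0, 0] * arr[1, 1] - arr[0, 1] * arr[1, 0]  # ad - bc
d_numpy = np.linalg.det(arr)                                # LU-based, floating point
```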

The stats sub-package contains a wealth of functions for probability and statistics. At the time of writing it featured 81 continuous and 12 discrete distributions. The following example computes the probability that a normal (Gaussian) random variable with mean 0 and standard deviation 3 takes a value less than 5, and then draws five random samples from the same distribution.

In [1]: from scipy.stats import norm
In [2]: norm.cdf(5, 0, 3)
Out[2]: 0.9522096477271853
In [3]: norm.rvs(0, 3, size=5)
Out[3]: array([ 4.85229537,  3.0104119 ,  1.13189841,  5.19688369, -2.97970912])
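The value of norm.cdf(5, 0, 3) can be checked by hand: the normal CDF is 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))). The Python 3 sketch below uses math.erf for that formula (normal_cdf is a hypothetical helper, not a SciPy function) and a seeded NumPy simulation as a rough second check:

```python
import math
import numpy as np

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma**2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

p = normal_cdf(5, 0, 3)   # standardized value z = 5 / 3

# Monte Carlo estimate: fraction of N(0, 3**2) samples below 5
samples = np.random.RandomState(0).normal(0, 3, size=100000)
frac_below = (samples < 5).mean()
```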

Web access to IPython notebook

Here we set up the IPython notebook so that it can be accessed over the web.

Installation

pip install numpy ipython matplotlib jinja2 pyzmq tornado scipy

Generate a hashed password

# ipython

In [1]: from IPython.lib import passwd
In [2]: passwd()
Enter password: 
Verify password:

Create a profile and edit its config, setting c.NotebookApp.password to the 'sha1:...' string returned by passwd()

$ ipython profile create myserver
$ vim ~/.config/ipython/profile_myserver/ipython_notebook_config.py

c = get_config()
c.IPKernelApp.pylab = 'inline'
#c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = u'sha1:yourhashedpassword'
c.NotebookApp.port = 9999
c.FileNotebookManager.notebook_dir = u'/home/www/ipynotebook'

Test run

ipython notebook --profile=myserver

nginx configuration

# sudo vim /etc/nginx/sites-enabled/notebook.conf

server {
    server_name notebook.example.com www.notebook.example.com;

    location / {
        proxy_pass http://127.0.0.1:9999;
        #include /etc/nginx/proxy.conf;        
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Reload nginx

sudo service nginx reload

Install supervisor

sudo apt-get install supervisor

Create a supervisor config file for the notebook server

# sudo vim /etc/supervisor/conf.d/sentry.conf

[program:exp]
command=/home/www/.virtualenvs/science/bin/ipython notebook --ipython-dir=/home/proft/.ipython/ --profile=myserver 
directory=/home/www/ipynotebook
user=www
group=www
autostart=True
autorestart=True
redirect_stderr=True
daemon = False
debug = False
stdout_logfile=/home/www/ipynotebook/logs/supervisor_exp.log
loglevel = "info"
environment = PYTHON_EGG_CACHE="/home/www/ipynotebook/.python-eggs"

Reload supervisor's configuration

sudo supervisorctl reread
sudo supervisorctl update

Additional information