14/12

2013

NumPy (Numerical Python) is a library for scientific computing and data analysis. Its core routines are written in C, so it performs well.

Main features:

- ndarray, a powerful N-dimensional array object that supports vectorized arithmetic operations;
- a set of mathematical functions for working with vectors and matrices;
- reading and writing arrays to files.

**Install**

The following installation instructions were tested on Arch Linux.

I use Tcl/Tk to display matplotlib output.

    yaourt -S tk tcl

Create a virtual environment

    mkvirtualenv -p /usr/bin/python2.7 science

Install NumPy, SciPy, Matplotlib and IPython

    pip install numpy scipy matplotlib ipython

Check which *backend* *matplotlib* uses and make sure it is *tkagg*.

    ipython --pylab
    In [1]: import matplotlib
    In [2]: matplotlib.matplotlib_fname()
    Out[2]: '/path/to/matplotlibrc'

    grep 'backend' /path/to/matplotlibrc
    backend : tkagg

**Basic**

At the heart of NumPy is the N-dimensional array (ndarray), which can hold large data sets. All items of an ndarray must be of the same type.

    In [26]: from numpy import *
    In [27]: a = array([1,2,3])
    In [28]: a
    Out[28]: array([1, 2, 3])

Each array carries metadata such as its *shape* and the *type of its elements* (dtype)

    In [29]: a.shape
    Out[29]: (3,)
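The session above shows only `shape`. As a minimal sketch, the other metadata (`ndim`, `dtype`) can be inspected the same way, and mixing integers with floats shows how NumPy settles on one common element type. The variable names here are illustrative, and the sketch uses an explicit `import numpy as np` rather than the star import.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])        # the integers are upcast to float

print(ints.shape, ints.ndim)         # shape tuple and number of dimensions
print(mixed.dtype)                   # float64

# The element type can also be requested explicitly.
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)                # float64
```

The integer dtype of `ints` depends on the platform, so it is not printed here.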

Creating an array from a list

    In [31]: a = array([[1, 2, 3, 4], [5, 6, 7, 8]])
    In [32]: a
    Out[32]:
    array([[1, 2, 3, 4],
           [5, 6, 7, 8]])

Creating an array filled with zeros

    In [33]: zeros(7)
    Out[33]: array([ 0., 0., 0., 0., 0., 0., 0.])
    In [35]: zeros((3,3))
    Out[35]:
    array([[ 0., 0., 0.],
           [ 0., 0., 0.],
           [ 0., 0., 0.]])

Creating an array filled with ones

    In [36]: ones(7)
    Out[36]: array([ 1., 1., 1., 1., 1., 1., 1.])

Creating an array filled with a sequence (similar to Python's *range*)

    In [37]: arange(7)
    Out[37]: array([0, 1, 2, 3, 4, 5, 6])

Creating an array filled with a sequence with step *3*

    In [39]: arange(2,10,3)
    Out[39]: array([2, 5, 8])

Creating an array of evenly spaced numbers over a specified interval

    In [46]: start = 1
    In [47]: stop = 10
    In [48]: n = 3
    In [49]: linspace(start, stop, n)
    Out[49]: array([ 1. , 5.5, 10. ])

Creating an uninitialized array (*empty* leaves whatever happened to be in memory), then filling it with NaN

    In [54]: a = empty((3,3,))
    In [55]: a[:] = nan
    In [56]: a
    Out[56]:
    array([[ nan, nan, nan],
           [ nan, nan, nan],
           [ nan, nan, nan]])

Creating a 2-D array by reshaping a vector

    In [57]: arange(32).reshape((8, 4))
    Out[57]:
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15],
           [16, 17, 18, 19],
           [20, 21, 22, 23],
           [24, 25, 26, 27],
           [28, 29, 30, 31]])

Creating an array of random integers

    In [70]: size = 7
    In [71]: a = random.randint(0, 10, size)
    In [72]: a
    Out[72]: array([0, 5, 5, 7, 9, 3, 3])

Creating an array of random real numbers (standard normal distribution)

    In [78]: randn(7)
    Out[78]:
    array([ 0.41864199, -0.97131428, -2.05212359, -0.56811645,
            0.21215915,  0.17165842,  0.06305309])

Indexing items

    In [72]: a = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    In [73]: a[2][1]
    Out[73]: 8
    In [74]: a[2, 1]
    Out[74]: 8
    In [75]: a[:, 0] # first column
    Out[75]: array([1, 4, 7])

Indexing items with *fancy indexing* (a list of indices). In the first call *a* is the 1-D array shown in Out[93] of the Conditions section, not the 3x3 array above.

    In [98]: a[[2,8,6]]
    Out[98]: array([-1, 3, 2])
    In [101]: a = arange(32).reshape((8, 4))
    In [102]: a
    Out[102]:
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15],
           [16, 17, 18, 19],
           [20, 21, 22, 23],
           [24, 25, 26, 27],
           [28, 29, 30, 31]])
    In [103]: a[[1, 5, 7, 2], [0, 3, 1, 2]]
    Out[103]: array([ 4, 23, 29, 10])
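Passing two index lists, as in the last call, pairs them up element by element; it does not select a rectangular block. The following sketch contrasts the two behaviours, using `np.ix_` for the block selection. The names `rows`, `cols`, `pairs` and `block` are illustrative.

```python
import numpy as np

a = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Paired lookup: elements (1,0), (5,3), (7,1), (2,2).
pairs = a[rows, cols]

# Rectangular selection: every listed row combined with every listed column.
block = a[np.ix_(rows, cols)]

print(pairs)         # the four values 4, 23, 29, 10
print(block.shape)   # (4, 4)
```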

Plotting a line, a bar chart and a scatter plot

    In [104]: a = random.randint(0, 10, 10)
    In [105]: fig = plt.figure()
    In [106]: plt.plot(range(10), a, 'k--')
    In [107]: plt.bar(range(10), a)
    In [108]: plt.scatter(range(10), a)
    In [109]: plt.savefig('figpath.svg') # save to file

**Conditions**

Comparing an array with a value returns a boolean array that is True wherever the condition holds

    In [86]: a = random.randint(2, 15, size=10)
    In [87]: a
    Out[87]: array([12, 7, 7, 8, 5, 9, 2, 10, 3, 14])
    In [89]: a > 5
    Out[89]:
    array([ True, True, True, True, False, True, False, True, False,
            True], dtype=bool)

Combining conditions

    In [91]: (a > 5) & (a < 10)
    Out[91]:
    array([False, True, True, True, False, True, False, False, False,
           False], dtype=bool)

Setting a value where the condition is True

    In [92]: a[(a > 5) & (a < 10)] = -1
    In [93]: a
    Out[93]: array([12, -1, -1, -1, 5, -1, 2, 10, 3, 14])
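Conditions can also be combined with `|` (or) and inverted with `~` (not). The parentheses around each comparison are required because `&` and `|` bind more tightly than `<` and `>`. A small sketch, reusing the values from Out[87]; the names `mask`, `outside` and `extremes` are illustrative.

```python
import numpy as np

a = np.array([12, 7, 7, 8, 5, 9, 2, 10, 3, 14])

mask = (a > 5) & (a < 10)            # True for 7, 7, 8, 9
outside = a[~mask]                   # items that do NOT satisfy the mask
extremes = a[(a < 3) | (a > 12)]     # items below 3 or above 12

print(outside)    # 12, 5, 2, 10, 3, 14
print(extremes)   # 2, 14
```

Selecting with a boolean mask returns a new array, so `outside` and `extremes` can be modified without touching `a`.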

Creating a new array whose items come from *a* where *cond* is True and from *b* otherwise

    In [8]: a = random.randint(-5, 5, size=5)
    In [9]: a
    Out[9]: array([ 0, -5, 1, 4, -3])
    In [14]: b = random.randint(-5, 5, size=5)
    In [15]: b
    Out[15]: array([-1, 2, 2, -1, 0])
    In [5]: cond = random.rand(5) > .5
    In [6]: cond
    Out[6]: array([ True, True, False, False, False], dtype=bool)
    In [19]: where(cond, a, b)
    Out[19]: array([ 0, -5, 2, -1, 0])

Or we can set '+' where an item is greater than 0 and '-' otherwise (zero included)

    In [20]: where(a > 0, '+', '-')
    Out[20]: array(['-', '-', '+', '+', '-'], dtype='|S1')

Counting the items greater than 0

    In [43]: (a > 0).sum()
    Out[43]: 2
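Note that `(a > 0).sum()` counts the matching items, because each True counts as 1; it does not add up their values. A minimal sketch of both operations, reusing the values of `a` from Out[9]; the names `positive`, `how_many` and `total` are illustrative.

```python
import numpy as np

a = np.array([0, -5, 1, 4, -3])      # same values as Out[9]

positive = a > 0                     # boolean mask
how_many = positive.sum()            # True counts as 1, so this is a count
total = a[positive].sum()            # sum of the selected values: 1 + 4

print(how_many, total)               # 2 5
```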

**Random sequence**

Creating an array of samples from the normal distribution

    a = random.normal(size=(4, 4))

Other functions

`permutation`

- return a random permutation of a sequence, or return a permuted range

`shuffle`

- randomly permute a sequence in place

`rand`

- draw samples from a uniform distribution

`randint`

- draw random integers from a given low-to-high range

`randn`

- draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)

`binomial`

- draw samples from a binomial distribution

`normal`

- draw samples from a normal (Gaussian) distribution

`beta`

- draw samples from a beta distribution

`chisquare`

- draw samples from a chi-square distribution

`gamma`

- draw samples from a gamma distribution

`uniform`

- draw samples from a uniform [0, 1) distribution
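A brief sketch of a few of the functions listed above. Seeding the generator with `np.random.seed` makes the draws repeatable; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(42)                    # fixed seed so reruns give the same draws

perm = np.random.permutation(5)       # new array: 0..4 in random order

deck = np.arange(5)
np.random.shuffle(deck)               # reorders deck in place, returns None

u = np.random.uniform(-1, 1, size=3)            # uniform on [-1, 1)
b = np.random.binomial(n=10, p=0.5, size=3)     # successes in 10 fair trials

print(perm, deck, u, b)
```

The difference between `permutation` and `shuffle` is only in where the result goes: the first returns a new array, the second modifies its argument.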

**Functions**

Unary functions are used for elementwise operations.

    In [110]: a = random.randint(2, 15, size=5)
    In [111]: a
    Out[111]: array([11, 3, 6, 6, 10])
    In [112]: sqrt(a)
    Out[112]: array([ 3.31662479, 1.73205081, 2.44948974, 2.44948974, 3.16227766])

List of unary functions

`abs, fabs`

- compute the absolute value element-wise; `abs` accepts integer, floating-point and complex values, `fabs` only real (non-complex) values

`sqrt`

- compute the square root of each element

`square`

- compute the square of each element

`exp`

- compute the exponent *e^x* of each element

`log, log10, log2, log1p`

- natural logarithm (base *e*), log base 10, log base 2, and log(1 + x), respectively

`sign`

- compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative)

`ceil`

- compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element

`floor`

- compute the floor of each element, i.e. the largest integer less than or equal to each element

`rint`

- round elements to the nearest integer, preserving the dtype

`cos, sin, tan`

- trigonometric functions
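The rounding functions are easy to confuse, so here is a small sketch on one hand-picked array. Note that `rint` rounds halves to the nearest even integer, and that all results keep the floating-point dtype. The variable names are illustrative.

```python
import numpy as np

x = np.array([-1.7, -0.2, 0.0, 1.5, 2.5])

up = np.ceil(x)        # -1, -0, 0, 2, 3
down = np.floor(x)     # -2, -1, 0, 1, 2
nearest = np.rint(x)   # -2, -0, 0, 2, 2  (1.5 -> 2 and 2.5 -> 2)
signs = np.sign(x)     # -1, -1, 0, 1, 1

print(up, down, nearest, signs)
print(nearest.dtype)   # float64: the dtype is preserved
```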

List of binary functions

`add`

- add corresponding elements in arrays

`subtract`

- subtract elements in second array from first array

`multiply`

- multiply array elements

`divide, floor_divide`

- divide or floor divide

`power`

- raise elements in first array to powers indicated in second array

`maximum`

- element-wise maximum

`minimum`

- element-wise minimum

`mod`

- element-wise modulus (remainder of division)
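A short sketch of the binary functions on two small integer arrays. Each function works element by element, and most have an operator equivalent (`+`, `//`, `%`, `**`). The arrays `x` and `y` are made up for illustration.

```python
import numpy as np

x = np.array([7, 2, 9])
y = np.array([3, 5, 4])

sums = np.add(x, y)                 # 10, 7, 13   (same as x + y)
larger = np.maximum(x, y)           # 7, 5, 9
quotients = np.floor_divide(x, y)   # 2, 0, 2     (same as x // y)
remainders = np.mod(x, y)           # 1, 2, 1     (same as x % y)
squares = np.power(x, 2)            # 49, 4, 81   (scalar applied to every item)

print(sums, larger, quotients, remainders, squares)
```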

**Statistical functions**

Array initialization

    In [24]: a = random.randn(5)
    In [25]: a
    Out[25]: array([ 1.98311057, 0.42013985, 2.77981702, 0.82146552, -0.86014544])

Calculate arithmetic mean

    In [26]: a.mean()
    Out[26]: 1.0288775028049444

Calculate median

    In [41]: median(a)
    Out[41]: 0.82146551999999995

Calculate standard deviation

    In [27]: a.std()
    Out[27]: 1.2616131660043575

Calculate variance

    In [28]: a.var()
    Out[28]: 1.5916677806355384
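By default `std` and `var` divide by the number of items (the population formulas). Passing `ddof=1` switches to the sample formulas, and the `axis` argument computes a statistic per column or per row of a 2-D array. A minimal sketch with hand-checkable numbers; the arrays are made up for illustration.

```python
import numpy as np

a = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_std = a.std()              # divides by n:     sqrt(32 / 8) = 2.0
sample_std = a.std(ddof=1)     # divides by n - 1: sqrt(32 / 7)

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])
col_means = m.mean(axis=0)     # one mean per column: 2.0, 3.0
row_means = m.mean(axis=1)     # one mean per row:    1.5, 3.5

print(pop_std, sample_std, col_means, row_means)
```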

**Sets**

Array initialization

    In [47]: a = random.randint(2, 10, 6)
    In [48]: a
    Out[48]: array([6, 3, 9, 4, 8, 9])

Testing which elements of *a* are contained in another sequence

    In [51]: in1d(a, [3,9])
    Out[51]: array([False, True, True, False, False, True], dtype=bool)

Other functions

`unique(x)`

- compute the sorted, unique elements in x

`intersect1d(x, y)`

- compute the sorted, common elements in x and y

`union1d(x, y)`

- compute the sorted union of elements

`in1d(x, y)`

- compute a boolean array indicating whether each element of x is contained in y

`setdiff1d(x, y)`

- set difference, elements in x that are not in y
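The set functions listed above can be tried on the array from Out[48]. This is a small sketch; the second array `y` is made up for illustration.

```python
import numpy as np

x = np.array([6, 3, 9, 4, 8, 9])     # same values as Out[48]
y = np.array([3, 9, 1])

uniq = np.unique(x)                  # 3, 4, 6, 8, 9
common = np.intersect1d(x, y)        # 3, 9
either = np.union1d(x, y)            # 1, 3, 4, 6, 8, 9
only_x = np.setdiff1d(x, y)          # 4, 6, 8

print(uniq, common, either, only_x)
```

All four results are sorted and free of duplicates, regardless of the order of the inputs.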

**Save and load array to file**

Save an ndarray to a text file

    savetxt('/home/proft/temp/a.txt', a, delimiter=',')

Load it back from the file

    a = loadtxt('/home/proft/temp/a.txt', delimiter=',')
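A round trip through a text file can be checked as in the sketch below. It assumes Python 3 and writes to a temporary directory instead of the fixed path used above; the file name `a.txt` and the variable names are illustrative.

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=float).reshape((2, 3))

# Write to a throwaway directory that is removed when the block ends.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    np.savetxt(path, a, delimiter=",")           # one text row per array row
    restored = np.loadtxt(path, delimiter=",")   # values come back as floats

print(np.array_equal(a, restored))   # True
```

`loadtxt` returns floating-point values by default, so an integer array would come back with a float dtype unless `dtype` is passed.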

**SciPy**

SciPy is a collection of packages addressing a number of different standard problem domains in scientific computing. Here is a sampling of the packages included:

`scipy.integrate`

- numerical integration routines and differential equation solvers

`scipy.linalg`

- linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg

`scipy.optimize`

- function optimizers (minimizers) and root finding algorithms

`scipy.signal`

- signal processing tools

`scipy.sparse`

- sparse matrices and sparse linear system solvers

`scipy.special`

- wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function

`scipy.stats`

- standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics

`scipy.weave`

- tool for using inline C++ code to accelerate array computations

Compute the determinant of a matrix

    In [1]: from scipy import linalg
    In [2]: arr = np.array([[1, 2], [3, 4]])
    In [3]: linalg.det(arr)
    Out[3]: -2.0

The *stats* sub-package contains a wealth of functions for probability and statistics. The library currently features 81 continuous distributions and 12 discrete distributions. The following example shows how to compute the probability that a normal (Gaussian) random variable with mean 0 and standard deviation 3 takes on a value less than 5. It also shows how to generate five random samples from the same distribution.

    In [1]: from scipy.stats import norm
    In [2]: norm.cdf(5, 0, 3)
    Out[2]: 0.9522096477271853
    In [3]: norm.rvs(0, 3, size=5)
    Out[3]: array([ 4.85229537, 3.0104119 , 1.13189841, 5.19688369, -2.97970912])
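The probability returned by `norm.cdf(5, 0, 3)` can be cross-checked without SciPy, since the normal CDF can be written with the error function: Phi(x) = 0.5 * (1 + erf((x - mean) / (std * sqrt(2)))). The helper `normal_cdf` below is a hypothetical name for this sketch, not part of any library.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """Probability that a normal(mean, std) variable is less than x."""
    z = (x - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


p = normal_cdf(5, mean=0, std=3)
print(round(p, 4))   # 0.9522, matching the SciPy result above
```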

**Web access to IPython notebook**

We will set up the IPython notebook so that it can be accessed over the web.

Installation

    pip install numpy ipython matplotlib jinja2 pyzmq scipy

Generate password

    # ipython
    In [1]: from IPython.lib import passwd
    In [2]: passwd()
    Enter password:
    Verify password:

Create profile and update it

    $ ipython profile create myserver
    $ vim ~/.config/ipython/profile_myserver/ipython_notebook_config.py

    c = get_config()
    c.IPKernelApp.pylab = 'inline'
    #c.NotebookApp.ip = '*'
    c.NotebookApp.open_browser = False
    c.NotebookApp.password = u'sha1:yourhashedpassword'
    c.NotebookApp.port = 9999
    c.FileNotebookManager.notebook_dir = u'/home/www/ipynotebook'

Test run

    ipython notebook --profile=myserver

Configuration for nginx

    # sudo vim /etc/nginx/sites-enabled/notebook.conf
    server {
        server_name notebook.example.com www.notebook.example.com;

        location / {
            proxy_pass http://127.0.0.1:9999;
            #include /etc/nginx/proxy.conf;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_set_header X-NginX-Proxy true;
            proxy_redirect off;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }

Reload nginx

    sudo service nginx reload

Install supervisor

    sudo apt-get install supervisor

Edit supervisor's config file

    # sudo vim /etc/supervisor/conf.d/sentry.conf
    [program:exp]
    command=/home/www/.virtualenvs/science/bin/ipython notebook --ipython-dir=/home/proft/.ipython/ --profile=myserver
    directory=/home/www/ipynotebook
    user=www
    group=www
    autostart=True
    autorestart=True
    redirect_stderr=True
    daemon = False
    debug = False
    stdout_logfile=/home/www/ipynotebook/logs/supervisor_exp.log
    loglevel = "info"
    environment = PYTHON_EGG_CACHE="/home/www/ipynotebook/.python-eggs"

Update supervisor

    sudo supervisorctl reread
    sudo supervisorctl update

**Additional information**