NumPy (Numerical Python) is a library for scientific computing and data analysis. Its key features are written in C, so it performs well.
Main features:
Install
The following installation instructions were tested on Arch Linux.
I use Tcl/Tk to display output from matplotlib.
yaourt -S tk tcl
Make virtual environment
mkvirtualenv -p /usr/bin/python2.7 science
Install NumPy, SciPy, Matplotlib, IPython
pip install numpy scipy matplotlib ipython
Check which backend matplotlib uses and make sure that it is tkagg.
ipython --pylab
In [1]: import matplotlib
In [2]: matplotlib.matplotlib_fname()
Out[2]: '/path/to/matplotlibrc'

grep 'backend' /path/to/matplotlibrc
backend : tkagg
Basic
At the heart of NumPy is the N-dimensional array (ndarray), which can hold large amounts of data. All items of an ndarray must have the same type.
In [26]: from numpy import *
In [27]: a = array([1,2,3])
In [28]: a
Out[28]: array([1, 2, 3])
Each array carries information such as its shape and the type of its elements (dtype)
In [29]: a.shape
Out[29]: (3,)
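As a small sketch of the attributes mentioned above: the array values are made up for illustration, and the `np` alias is my own choice here instead of the star import used in the rest of this post.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(a.shape)   # (2, 4): 2 rows, 4 columns
print(a.ndim)    # 2 dimensions
print(a.size)    # 8 items in total
print(a.dtype)   # an integer type; the exact width depends on the platform

# Mixing integers and floats upcasts every item to one common type
b = np.array([1, 2.5, 3])
print(b.dtype)   # float64
```

The second array shows why all items share one type: NumPy picks a single dtype that can hold every value.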
Creating an array from list
In [31]: a = array([[1, 2, 3, 4], [5, 6, 7, 8]])
In [32]: a
Out[32]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
Creating an array, filled with zeroes
In [33]: zeros(7)
Out[33]: array([ 0., 0., 0., 0., 0., 0., 0.])
In [35]: zeros((3,3))
Out[35]:
array([[ 0., 0., 0.],
       [ 0., 0., 0.],
       [ 0., 0., 0.]])
Creating an array, filled with ones
In [36]: ones(7)
Out[36]: array([ 1., 1., 1., 1., 1., 1., 1.])
Creating an array, filled with a sequence (similar to Python's range)
In [37]: arange(7)
Out[37]: array([0, 1, 2, 3, 4, 5, 6])
Creating an array, filled with sequence and step 3
In [39]: arange(2,10,3)
Out[39]: array([2, 5, 8])
Creating an array with evenly spaced numbers over a specified interval
In [46]: start = 1
In [47]: stop = 10
In [48]: n = 3
In [49]: linspace(start, stop, n)
Out[49]: array([ 1. , 5.5, 10. ])
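The difference between arange and linspace can be sketched like this. The numbers are picked only for illustration, and the variable names are mine.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
by_step = np.arange(1, 10, 3)        # 1, 4, 7

# linspace: fixed number of points, the stop value is included
by_count = np.linspace(1, 10, 4)     # 1., 4., 7., 10.

# The spacing linspace uses is (stop - start) / (n - 1)
step = (10 - 1) / float(4 - 1)       # 3.0

print(by_step)
print(by_count)
```

So arange is handy when the step matters, and linspace when the number of points and both ends matter.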
Creating an uninitialized array (it holds whatever was in RAM) and filling it with NaN
In [54]: a = empty((3,3,))
In [55]: a[:] = nan
In [56]: a
Out[56]:
array([[ nan, nan, nan],
       [ nan, nan, nan],
       [ nan, nan, nan]])
Creating a two-dimensional array by reshaping a vector
In [57]: arange(32).reshape((8, 4))
Out[57]:
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
Creating an array with random integer sequence
In [70]: size = 7
In [71]: a = random.randint(0, 10, size)
In [72]: a
Out[72]: array([0, 5, 5, 7, 9, 3, 3])
Creating an array of random real numbers (standard normal distribution)
In [78]: random.randn(7)
Out[78]:
array([ 0.41864199, -0.97131428, -2.05212359, -0.56811645,
        0.21215915, 0.17165842, 0.06305309])
Indexing items
In [72]: a = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [73]: a[2][1]
Out[73]: 8
In [74]: a[2, 1]
Out[74]: 8
In [75]: a[:, 0] # first column
Out[75]: array([1, 4, 7])
Indexing items with fancy indexing
# here a is the 1-D array array([12, -1, -1, -1, 5, -1, 2, 10, 3, 14])
In [98]: a[[2,8,6]]
Out[98]: array([-1, 3, 2])
In [101]: a = arange(32).reshape((8, 4))
In [102]: a
Out[102]:
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
In [103]: a[[1, 5, 7, 2], [0, 3, 1, 2]]
Out[103]: array([ 4, 23, 29, 10])
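It may help to see what the two index lists actually do. Below is a minimal sketch with the same 8x4 array; the names rows, cols, paired, same, picked_rows and block are mine, chosen for illustration.

```python
import numpy as np

a = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Two index lists are paired up: items (1, 0), (5, 3), (7, 1), (2, 2)
paired = a[rows, cols]                                   # 4, 23, 29, 10
same = np.array([a[r, c] for r, c in zip(rows, cols)])   # the same items, picked one by one

# A single index list selects whole rows, in the order given
picked_rows = a[rows]                                    # shape (4, 4)

# np.ix_ builds the full rows x cols rectangle instead of pairs
block = a[np.ix_(rows, cols)]                            # shape (4, 4)

print(paired)
print(block[0])                                          # row 1 with columns 0, 3, 1, 2
```

If you expected a rectangle of rows and columns and got a flat array instead, np.ix_ is probably what you wanted.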
Plotting a line, a bar chart and a scatter plot
In [104]: a = random.randint(0, 10, 10)
In [105]: fig = plt.figure()
In [106]: plt.plot(range(10), a, 'k--')
In [107]: plt.bar(range(10), a)
In [108]: plt.scatter(range(10), a)
In [109]: plt.savefig('figpath.svg') # save to file
Conditions
A comparison returns a boolean array that is True where the condition holds
In [86]: a = random.randint(2, 15, size=10)
In [87]: a
Out[87]: array([12, 7, 7, 8, 5, 9, 2, 10, 3, 14])
In [89]: a > 5
Out[89]: array([ True, True, True, True, False, True, False, True, False, True], dtype=bool)
Combining conditions
In [91]: (a > 5) & (a < 10)
Out[91]: array([False, True, True, True, False, True, False, False, False, False], dtype=bool)
Set value if condition is True
In [92]: a[(a > 5) & (a < 10)] = -1
In [93]: a
Out[93]: array([12, -1, -1, -1, 5, -1, 2, 10, 3, 14])
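The boolean array can be kept in a variable and reused as a mask. Here is a small sketch using the same values as above; the names mask, selected, a_marked, either and inverted are mine.

```python
import numpy as np

a = np.array([12, 7, 7, 8, 5, 9, 2, 10, 3, 14])
mask = (a > 5) & (a < 10)     # & is element-wise "and"; the parentheses are required

selected = a[mask]            # a copy of the matching items: 7, 7, 8, 9

a_marked = a.copy()
a_marked[mask] = -1           # assignment through the mask changes items in place

either = (a < 5) | (a > 10)   # | is element-wise "or"
inverted = ~mask              # ~ is element-wise "not"

print(selected)
print(a_marked)
```

Python's own and, or and not do not work element by element on arrays, which is why the &, | and ~ operators are used here.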
Creating a new array whose items come from a where cond is True and from b otherwise
In [8]: a = random.randint(-5, 5, size=5)
In [9]: a
Out[9]: array([ 0, -5, 1, 4, -3])
In [14]: b = random.randint(-5, 5, size=5)
In [15]: b
Out[15]: array([-1, 2, 2, -1, 0])
In [5]: cond = random.rand(5) > .5
In [6]: cond
Out[6]: array([ True, True, False, False, False], dtype=bool)
In [19]: where(cond, a, b)
Out[19]: array([ 0, -5, 2, -1, 0])
Or we can put '+' where an item is greater than 0 and '-' otherwise
In [20]: where(a > 0, '+', '-')
Out[20]:
array(['-', '-', '+', '+', '-'],
      dtype='|S1')
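A few more ways to use where, as a rough sketch with the same values of a as above. The names clipped, signs and indices are mine.

```python
import numpy as np

a = np.array([0, -5, 1, 4, -3])

# Scalars and arrays can be mixed: keep positive items, replace the rest with 0
clipped = np.where(a > 0, a, 0)        # 0, 0, 1, 4, 0

# Both branches may be scalars
signs = np.where(a > 0, '+', '-')      # '-', '-', '+', '+', '-'

# With only a condition, where returns the positions of the True items
indices = np.where(a > 0)[0]           # 2, 3

print(clipped)
print(indices)
```

Note that an item equal to 0 falls into the "otherwise" branch, because the condition is a strict comparison.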
Count the items greater than 0
In [43]: (a > 0).sum()
Out[43]: 2
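Summing a boolean array counts the True values, which is not the same as summing the items themselves. A minimal sketch with the same a; the variable names are mine.

```python
import numpy as np

a = np.array([0, -5, 1, 4, -3])

count_positive = (a > 0).sum()    # True counts as 1, so this is 2
sum_positive = a[a > 0].sum()     # sum of the selected items: 1 + 4 = 5

any_positive = (a > 0).any()      # True if at least one item matches
all_positive = (a > 0).all()      # True only if every item matches

print(count_positive)
print(sum_positive)
```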
Random sequence
Creating an array of samples drawn from a normal distribution
a = random.normal(size=(4, 4))
Other functions
permutation
- return a random permutation of a sequence, or return a permuted range
shuffle
- randomly permute a sequence in place
rand
- draw samples from a uniform distribution
randint
- draw random integers from a given low-to-high range
randn
- draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)
binomial
- draw samples from a binomial distribution
normal
- draw samples from a normal (Gaussian) distribution
beta
- draw samples from a beta distribution
chisquare
- draw samples from a chi-square distribution
gamma
- draw samples from a gamma distribution
uniform
- draw samples from a uniform [0, 1) distribution
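Some of the functions listed above in use, as a sketch. The seed value, sizes and distribution parameters are arbitrary choices of mine, and the output is left out because it depends on the seed and the NumPy version.

```python
import numpy as np

np.random.seed(42)                            # fix the seed so runs are repeatable

perm = np.random.permutation(5)               # a new array with 0..4 in random order
deck = np.arange(5)
np.random.shuffle(deck)                       # shuffles deck in place and returns None

u = np.random.uniform(-1, 1, size=1000)       # uniform on [-1, 1)
n = np.random.normal(10, 2, size=1000)        # mean 10, standard deviation 2
b = np.random.binomial(10, 0.5, size=1000)    # 10 trials with success probability 0.5

print(perm)
print(deck)
print(n.mean())                               # close to 10, but not exactly 10
```

The difference between permutation and shuffle is easy to miss: the first returns a new array, the second changes the one you pass in.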
Functions
Unary functions are used for elementwise operations.
In [110]: a = random.randint(2, 15, size=5)
In [111]: a
Out[111]: array([11, 3, 6, 6, 10])
In [112]: sqrt(a)
Out[112]: array([ 3.31662479, 1.73205081, 2.44948974, 2.44948974, 3.16227766])
List of unary functions
abs, fabs
- compute the absolute value element-wise for integer or floating-point values
sqrt
- compute the square root of each element
square
- compute the square of each element
exp
- compute the exponential e^x of each element
log, log10, log2, log1p
- natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively
sign
- compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative)
ceil
- compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element
floor
- compute the floor of each element, i.e. the largest integer less than or equal to each element
rint
- round elements to the nearest integer, preserving the dtype
cos, sin, tan
- trigonometric functions
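The rounding functions are easy to mix up, so here is a small sketch that compares them on the same input. The values in x are picked by me to show the behaviour on negative numbers and on halves.

```python
import numpy as np

x = np.array([-2.5, -1.7, 0.0, 1.5, 2.5])

signs = np.sign(x)      # -1, -1, 0, 1, 1
up = np.ceil(x)         # -2, -1, 0, 2, 3
down = np.floor(x)      # -3, -2, 0, 1, 2
nearest = np.rint(x)    # -2, -2, 0, 2, 2 (halves go to the nearest even integer)

print(up)
print(down)
print(nearest)
print(nearest.dtype)    # still float64: rint keeps the dtype
```

The results are floats that happen to hold whole numbers; use astype(int) if you need an integer array.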
List of binary functions
add
- add corresponding elements in arrays
subtract
- subtract elements in second array from first array
multiply
- multiply array elements
divide, floor_divide
- divide or floor divide
power
- raise elements in first array to powers indicated in second array
maximum
- element-wise maximum.
minimum
- element-wise minimum
mod
- element-wise modulus (remainder of division)
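A quick sketch of the binary functions from the list above. The two input arrays are made up, and the result names are mine.

```python
import numpy as np

x = np.array([1, 5, 3, 8])
y = np.array([4, 2, 3, 2])

total = np.add(x, y)               # same as x + y: 5, 7, 6, 10
larger = np.maximum(x, y)          # 4, 5, 3, 8
smaller = np.minimum(x, y)         # 1, 2, 3, 2
powers = np.power(x, y)            # 1, 25, 27, 64
remainders = np.mod(x, y)          # 1, 1, 0, 0
quotients = np.floor_divide(x, y)  # 0, 2, 1, 4

print(larger)
print(powers)
```

Most of these also have an operator form (+, -, *, **, %, //), which gives the same result.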
Statistical functions
Array initialization
In [24]: a = random.randn(5)
In [25]: a
Out[25]: array([ 1.98311057, 0.42013985, 2.77981702, 0.82146552, -0.86014544])
Calculate arithmetic mean
In [26]: a.mean()
Out[26]: 1.0288775028049444
Calculate median
In [41]: median(a)
Out[41]: 0.82146551999999995
Calculate standard deviation
In [27]: a.std()
Out[27]: 1.2616131660043575
Calculate variance
In [28]: a.var()
Out[28]: 1.5916677806355384
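Two details that the calls above do not show are the ddof argument and the axis argument. A sketch, using the rounded values printed above for a; the 2x3 array m is my own example.

```python
import numpy as np

a = np.array([1.98311057, 0.42013985, 2.77981702, 0.82146552, -0.86014544])

# The standard deviation is the square root of the variance
std_from_var = np.sqrt(a.var())

# By default var and std divide by N (population); ddof=1 divides by N - 1 (sample)
sample_var = a.var(ddof=1)

# For a 2-D array, axis selects the direction of the calculation
m = np.arange(6.0).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]
column_means = m.mean(axis=0)        # 1.5, 2.5, 3.5
row_means = m.mean(axis=1)           # 1., 4.

print(std_from_var)
print(sample_var)
```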
Sets
Array initialization
In [47]: a = random.randint(2, 10, 6)
In [48]: a
Out[48]: array([6, 3, 9, 4, 8, 9])
Testing which items of a are contained in another array
In [51]: in1d(a, [3,9])
Out[51]: array([False, True, True, False, False, True], dtype=bool)
Other functions
unique(x)
- compute the sorted, unique elements in x
intersect1d(x, y)
- compute the sorted, common elements in x and y
union1d(x, y)
- compute the sorted union of elements
in1d(x, y)
- compute a boolean array indicating whether each element of x is contained in y
setdiff1d(x, y)
- set difference, elements in x that are not in y
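The set functions from the list, sketched on the same x as the array a above. The second array y is made up for illustration.

```python
import numpy as np

x = np.array([6, 3, 9, 4, 8, 9])
y = np.array([3, 9, 1])

uniq = np.unique(x)              # 3, 4, 6, 8, 9
common = np.intersect1d(x, y)    # 3, 9
merged = np.union1d(x, y)        # 1, 3, 4, 6, 8, 9
only_x = np.setdiff1d(x, y)      # 4, 6, 8

print(uniq)
print(common)
print(only_x)
```

All of these return sorted arrays without duplicates, so the original order of x is not kept.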
Save and load array to file
Save ndarray to file
savetxt('/home/proft/temp/a.txt', a, delimiter=',')
Load from file
a = loadtxt('/home/proft/temp/a.txt', delimiter=',')
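A round trip can be sketched like this. It writes to a temporary directory instead of the path above, and the file names are hypothetical. I also show np.save and np.load, which are not covered in this post but keep the dtype.

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()                 # temporary directory for the example files
ints = np.arange(6).reshape((2, 3))

# Text format: readable, but everything comes back as float by default
txt_path = os.path.join(tmp_dir, 'a.txt')    # hypothetical file name
np.savetxt(txt_path, ints, delimiter=',')
from_txt = np.loadtxt(txt_path, delimiter=',')

# Binary .npy format: keeps shape and dtype exactly
npy_path = os.path.join(tmp_dir, 'a.npy')    # hypothetical file name
np.save(npy_path, ints)
from_npy = np.load(npy_path)

print(from_txt.dtype)                        # float64
print(from_npy.dtype == ints.dtype)          # True
```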
SciPy
SciPy is a collection of packages addressing a number of different standard problem domains in scientific computing. Here is a sampling of the packages included:
scipy.integrate
- numerical integration routines and differential equation solvers
scipy.linalg
- linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg
scipy.optimize
- function optimizers (minimizers) and root finding algorithms
scipy.signal
- signal processing tools
scipy.sparse
- sparse matrices and sparse linear system solvers
scipy.special
- wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
scipy.stats
- standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics
scipy.weave
- tool for using inline C++ code to accelerate array computations
Compute the determinant of a matrix
In [1]: import numpy as np
In [2]: from scipy import linalg
In [3]: arr = np.array([[1, 2], [3, 4]])
In [4]: linalg.det(arr)
Out[4]: -2.0
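The result is easy to check by hand for a 2x2 matrix. The sketch below uses numpy.linalg instead of scipy.linalg so that it runs with NumPy alone; the variable names are mine.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# For a 2x2 matrix [[a, b], [c, d]] the determinant is a*d - b*c
by_hand = 1 * 4 - 2 * 3            # -2

det = np.linalg.det(arr)           # about -2.0, up to floating-point rounding

# A non-zero determinant means the matrix can be inverted
inv = np.linalg.inv(arr)
identity = arr.dot(inv)            # close to the 2x2 identity matrix

print(by_hand)
print(det)
```

Because of rounding, compare determinants with a tolerance (for example np.isclose) and not with ==.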
The stats sub-package contains a wealth of functions for probability and statistics. The library currently features 81 continuous distributions and 12 discrete distributions. The following example shows how to compute the probability that a normal (Gaussian) random variable with mean 0 and standard deviation 3 takes on a value less than 5. It also shows how to generate five random samples from the same distribution.
In [1]: from scipy.stats import norm
In [2]: norm.cdf(5, 0, 3)
Out[2]: 0.9522096477271853
In [3]: norm.rvs(0, 3, size=5)
Out[3]: array([ 4.85229537, 3.0104119 , 1.13189841, 5.19688369, -2.97970912])
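The probability above can be cross-checked without SciPy, using the error function from the standard library. This is only a sketch: normal_cdf is a hypothetical helper of mine, not part of scipy.stats.

```python
import math


def normal_cdf(x, mean, std):
    """Probability that a normal variable with the given mean and std is below x."""
    z = (x - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


p = normal_cdf(5, 0, 3)
print(p)   # about 0.9522, in line with norm.cdf(5, 0, 3) above
```

The formula is the usual relation between the normal cumulative distribution function and erf.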
Web access to IPython notebook
We will set up the IPython notebook so that it can be reached from the web.
Installation
pip install numpy ipython matplotlib jinja2 pyzmq scipy
Generate password
# ipython
In [1]: from IPython.lib import passwd
In [2]: passwd()
Enter password:
Verify password:
Create profile and update it
$ ipython profile create myserver
$ vim ~/.config/ipython/profile_myserver/ipython_notebook_config.py
c = get_config()
c.IPKernelApp.pylab = 'inline'
#c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = u'sha1:yourhashedpassword'
c.NotebookApp.port = 9999
c.FileNotebookManager.notebook_dir = u'/home/www/ipynotebook'
Test run
ipython notebook --profile=myserver
Configurations for nginx
# sudo vim /etc/nginx/sites-enabled/notebook.conf
server {
    server_name notebook.example.com www.notebook.example.com;
    location / {
        proxy_pass http://127.0.0.1:9999;
        #include /etc/nginx/proxy.conf;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Reload nginx
sudo service nginx reload
Install supervisor
sudo apt-get install supervisor
Edit supervisor's config file
# sudo vim /etc/supervisor/conf.d/sentry.conf
[program:exp]
command=/home/www/.virtualenvs/science/bin/ipython notebook --ipython-dir=/home/proft/.ipython/ --profile=myserver
directory=/home/www/ipynotebook
user=www
group=www
autostart=True
autorestart=True
redirect_stderr=True
daemon = False
debug = False
stdout_logfile=/home/www/ipynotebook/logs/supervisor_exp.log
loglevel = "info"
environment = PYTHON_EGG_CACHE="/home/www/ipynotebook/.python-eggs"
Update supervisor
sudo supervisorctl reread
sudo supervisorctl update
Additional information