Mateusz's notes

Cause of compulsive curiosity

Deeplearn week 5

Bias and variance In machine learning we talk about the trade-off between bias and variance: the training set not performing well indicates high bias, while the test set not performing well indicates high variance. Whether a model performs well we judge by its accuracy; however, we need a baseline accuracy to compare against (e.g. human or other algorithms). For high bias we look at training set performance and can try a bigger network (more layers, units), training longer, or a NN architecture better suited to the problem. For high variance we look at test set performance and can try …
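A minimal sketch of this diagnosis with made-up numbers; the thresholds and accuracies below are my own assumptions, not values from the course:

```python
# Rough bias/variance diagnosis from accuracies (hypothetical numbers).
baseline_acc = 0.95  # assumed baseline, e.g. human-level performance
train_acc = 0.88
test_acc = 0.80

if baseline_acc - train_acc > 0.02:   # far from the baseline
    print("high bias: the model underfits the training set")
if train_acc - test_acc > 0.02:       # large train/test gap
    print("high variance: the model does not generalise")
```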

Deeplearn week 4

Fourth week This is just a reminder that the material is not my own and comes from the course Deep learning by Andrew Ng. Posts are just my notes and digressions which help me memorise the material. This week is about stepping up from the shallow network, which we used for classifying images, to deep networks with more than two layers. Deep neural networks Deep neural networks are like shallow networks but with more layers \(l\).
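As a sketch of what "more layers" means computationally, here is a hypothetical forward pass through \(l\) layers; the activation choices and parameter layout are my assumptions, not the course's exact code:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Forward pass through l layers; params is a list of (W, b) pairs.

    ReLU on the hidden layers, sigmoid on the output layer.
    """
    a = x
    for W, b in params[:-1]:
        a = relu(W @ a + b)       # hidden layers
    W, b = params[-1]
    return sigmoid(W @ a + b)     # output layer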

Deeplearn week 3

Third week This is just a reminder that the material is not my own and comes from the course Deep learning by Andrew Ng. Posts are just notes I made while taking this course. Two layer network In the previous week we used simple logistic regression as a toy example: a simple neural network with just a single output node. For input \(x\) and parameters \(w,b\) we had the following computational graph $$z=\mathbf{w}^T x+b$$ $$a=\sigma(z)$$ $$\mathcal{L}(a,y)$$
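In code the same graph might look like this; a small sketch, where the loss is the cross-entropy the course uses for logistic regression:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x, y):
    z = w.T @ x + b                                     # z = w^T x + b
    a = sigmoid(z)                                      # a = sigma(z)
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # L(a, y)
    return a, loss
```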

Deeplearn week 2

Notes from Deep learning course I recently started the Deep learning course by Andrew Ng. In the second week we talk about logistic regression as a toy example to introduce basic concepts such as forward and backward propagation, the computational graph, and gradient descent. I usually remember better if I try explaining concepts to someone else, so these notes serve that purpose and follow my understanding of the content in the course.
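A minimal sketch of those concepts combined: one forward and backward pass of logistic regression followed by a gradient descent update (the variable names and layout are my own, not the course's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, b, X, y, lr=0.01):
    """One forward/backward pass over X (features x examples)."""
    m = X.shape[1]
    a = sigmoid(w.T @ X + b)          # forward propagation
    dz = a - y                        # gradient of the loss w.r.t. z
    dw = (X @ dz.T) / m               # backward propagation to the parameters
    db = dz.sum() / m
    return w - lr * dw, b - lr * db   # gradient descent update
```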

Dask and local Kubernetes (minikube)

Motivation I want to use Dask, which implements the NumPy and Pandas APIs, to operate on data distributed over a Kubernetes compute cluster. This approach enables easy scaling of available resources on demand using cloud services. Setup First,
sudo apt-get install qemu-kvm libvirt-clients libvirt-daemon-system
sudo adduser $USER libvirt
sudo adduser $USER libvirt-qemu
virsh --connect qemu:///system list --all # To check that all is fine
Now install the single-node cluster variant of Kubernetes
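Once the cluster is up, the appeal of Dask is that the code reads like NumPy; a small sketch (the array sizes and chunking are arbitrary):

```python
import dask.array as da

# A NumPy-like array split into chunks, so pieces of the computation
# can be scheduled across the workers of the cluster.
x = da.random.random((10000, 10000), chunks=(1000, 1000))
result = (x @ x.T).mean()    # builds a lazy task graph, nothing runs yet
print(result.compute())      # executes the graph on the available workers
```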

Docker and Apache Airflow

Objective I am going to show how to set up Apache Airflow inside Docker as a base container for a data analysis pipeline. Docker Install docker
sudo apt-get install docker.io
sudo adduser $USER docker
sudo apt-get install debootstrap
# Relog
Now bootstrap minbase, the smallest variant of a GNU/Linux Debian system
mkdir debian_root && cd debian_root
sudo debootstrap --variant=minbase sid .
From the debian_root directory, import it as a docker image while tagging it 'raw'
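For context, this is the kind of pipeline definition Airflow will run inside the container; a minimal sketch with a placeholder DAG id and command, using the Airflow 1.x import path:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x path

# A placeholder DAG with a single task, just to verify that the
# scheduler inside the container picks something up.
dag = DAG(
    dag_id="example_pipeline",       # hypothetical name
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
)

hello = BashOperator(
    task_id="say_hello",
    bash_command="echo 'hello from the container'",
    dag=dag,
)
```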

Genome wide association studies

Background Genome Wide Association Studies (GWAS) make use of genetic data that are a collection of genetic variants, such as single nucleotide polymorphisms (SNPs), across a large population. Such data can be used to find which genetic variants are associated with certain phenotype traits. The analysis requires two sources of information: genetic variant data, for instance SNPs for multiple individuals, and data about their phenotypes. The goal is to find which regions in the genome, measured by means of SNPs, co-vary with a certain trait.
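As an illustration of the per-variant association test (not a complete GWAS pipeline), one could regress the phenotype on each SNP's genotype dosage; the data below are simulated, not from any real study:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_individuals, n_snps = 500, 1000

# Simulated genotypes coded 0/1/2 (allele dosage) and a phenotype
# that truly depends on SNP 0.
genotypes = rng.integers(0, 3, size=(n_individuals, n_snps))
phenotype = 0.5 * genotypes[:, 0] + rng.normal(size=n_individuals)

# Per-SNP linear regression of phenotype on dosage; keep the p-values.
pvals = np.array([
    stats.linregress(genotypes[:, j], phenotype).pvalue
    for j in range(n_snps)
])
print("most associated SNP:", pvals.argmin(), "p =", pvals.min())
```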

Algebra backends for R

Goal Matrix operations in R are mostly executed by a system-wide linear algebra library. Here I compare three different implementations: BLAS, ATLAS and OpenBLAS. The benchmark is performed using microbenchmark 1.4 for R 3.4.4. All operations are carried out on a 1000x1000 matrix with random entries, which occupies approximately 7.6 MB of RAM. Computations were performed on an Intel i7-6700HQ CPU @ 2.60GHz (8 logical cores). OpenBLAS and ATLAS utilized all 8 cores, whereas BLAS ran only on a single core, as noted by htop.
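The same kind of measurement can be sketched in Python, since NumPy also hands matrix products to the system BLAS; this is only an analogue of the benchmark described above, and timings will of course differ from the R runs:

```python
import time

import numpy as np

# NumPy, like R, delegates matrix products to the system BLAS, so the
# same backend comparison can be made by swapping the BLAS library.
a = np.random.rand(1000, 1000)        # ~7.6 MB of doubles, as in the R setup

start = time.perf_counter()
for _ in range(10):
    a @ a                             # matrix multiplication hits the BLAS
elapsed = (time.perf_counter() - start) / 10
print(f"mean matmul time: {elapsed:.4f} s")
```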

Building tensorflow from source

Build tensorflow-gpu on GNU/Linux Debian I am running GNU/Linux Debian with Nvidia 9.1 drivers and cuDNN 7.1. Tensorflow-gpu pip packages are built for a specific CUDA version. CUDA Toolkit files from Debian packages are placed in a different location than those from the conventional installer. That means that with GNU/Linux Debian packages I cannot use pip directly, since the versions of the libraries might not match. I also cannot easily compile the tensorflow source against the latest CUDA Toolkit installed from packages, because of the misplaced header files and shared objects.
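After a successful source build, a quick way to check that the GPU is actually visible is the TensorFlow 1.x device listing; the exact output depends on the driver setup:

```python
# Lists the devices TensorFlow can see; a CUDA-enabled build should
# report a GPU entry alongside the CPU (TensorFlow 1.x API).
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)
```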

Migration from Jekyll to Hugo

Introduction Initially I started my blog using Jekyll, but it was problematic to use with Org-mode and required quite a few hacks. Hugo can render Org files natively, so from now on I will stick to this platform.