Monday, February 25, 2013

AWK, Gauss and Coin Toss

Hi all,

It has been ages. We have moved out of college, and are living in different places. So, no more regular night long discussion about random stuffs. Things were getting awkward and I am trying to get them back to normal. Now, I don't remember whether I read it somewhere, or my friend told me about it, or my friend told me about reading it somewhere. However, I do remember that hidden within nature is Normal or Gaussian Distribution.

A normal distribution curve looks something like what you see in the image here ->

If you want to know a lot more about Gaussian distribution, go here. If you want to know more about the church, go here.

Imagine that you toss an unbiased coin 10 times and increase counter by 1 when head. Record the net value of the counter. Repeat the game 100 times. Plot the frequency of each of the records. You get a bell-curve.

Few days back, I was reading about awk. I needed some problem to try, neither easy, nor hard. The above simulation seemed to be the perfect fit (awk isn't meant to be used for such problems usually).

Following is the output and code:

Wednesday, April 11, 2012

Syncing files of two folders in different computers (dirty way)

Recently I submitted my assignment for Distributed System class. I took it too lightly ,and so procrastinated. I started working on it in the 11th hour ,and quickly realized how much FUBAR my situation was. I set up a team of  two Ubuntu virtual machines on my system, and connected them using virtual LAN. 135 lines later I apparently had something which bore a slight resemblance to the thing I was required to make.

I compared the snapshots of the filelist of the source in the previous sync, of the source in present time, and of the target in the present time. Syncing is done accordingly. I used rsync over ssh to efficiently and securely transfer files between the two VMs. The missing files are deleted. One necessary condition for this to work is that the system clock of the two machines must have almost same time. Another problem is that the synchronizations must happen quite frequently. Also, it has not been extended for more than two machines, and for sub-folders.

Luckily, it worked fine enough to fetch marks.

Following is the python script which compared the filelists


Following script drives the main program (does the comparison, transfer, and deletion)


Saturday, March 3, 2012

Fibonacci from golden ratio using power series



While reading about Fibonacci series, I came across the formula which gives the Fibonacci number using golden ratios. I was curious about how did such an equation was derived and how could it give the answer so correctly. This problem introduced me to the world of generating function. I will start with a few definitions.

A formal power series is an expression of the form
a+ a1x + a2x+ …
where a is a coefficient.

A generating function is a formal power series whose coefficients give the sequence { a0, a1, a2, … }. A general idea is as follows. In counting problems we are often interested in counting the number of objects of size , which we represent represent as a. By varying n, we get different values of an. In this way, we get a sequence of real numbers a0, a1, a2, … 
from which we can define a power series
G(x) = a+ a1x + a2x+ …

The above G(x) is the generating function for the sequence a0, a1, a2, … 

Let us find an explicit formula for finding nth element of the fibonacci series
First, we will define some useful results.
1 and -ϕ2 be roots of the equation (x+ x – 1)
ϕ= (1 + √5)/2 and ϕ2 = (1  √5)/2
Therefore, ϕ ϕ√5  .......(1)
and ϕ1.ϕ1 .........(2)
We know that for a fibonacci number Fn,
Fn = Fn-1 + Fn-2
or Fn  Fn-1 – Fn-2  = 0
F0 = 0, F1 = 1.
Let the generating function for Fn be  
F(x) = F0 + F1x + F2x2 + F3x+ ...
Now, multiplying F(x) with 1, -x and -xand, adding them
                   F(x) = F1x + F2x+ F3x+ ...
                –xF(x) =       – F1x2 – F2x–  F3x4 ...
               –x2F(x) =                  – F1x3– F2x– F3x5 ...
_____________________________________________
(1 – x  – x2).F(x) = x    + 0       + 0     + 0 ....
or F(x) = –x/(x+ x – 1)

Using partial fractions, we decompose 
F(x) = A/(x + ϕ1) + B/(x + ϕ2)
or –x/(x+ x – 1) = A/(x + ϕ1) + B/(x + ϕ2)
or -x = A(x + ϕ2) + B(x + ϕ1)
Putting x = -ϕ2, we have ϕ= B(-ϕ2 + ϕ1),
or B = ϕ2/√5
Putting x = -ϕ1, we have ϕ1 = A(-ϕ1 + ϕ2),
or A= -ϕ1/√5


Therefore,
F(x) = (-ϕ1/√5)/(x + ϕ1) + (ϕ2/√5)/(x + ϕ2)
       = -(1/√5)/(1 + x/ϕ1) + (1/√5)/(1 + x/ϕ2)
       = -(1/√5)/(1 - xϕ2) + (1/√5)/(1 - xϕ1)             [From 2]
       = -(1/√5)Σ(xϕ2)+ (1/√5)Σ(xϕ1)n                 [From the equation for summation of G.P series] 
       = (1/√5)Σ(ϕ1n - ϕ2n)xn


Therefore, the coefficient of xwill give Fn


Finally, we have
Fn = (1/√5)(ϕ1ϕ2n)

To learn more about generating functions check this out.

Wednesday, February 29, 2012

!Lets Get Deranged!


Imagine a movie show is sold out. The viewers don't sit in the seats indicated by the tickets, but chose their seats randomly. What is the probability that atleast one guy sits on the correct seat?

For that we need to first know what is the total number of permutations possible ( which is N! ), and what is the total number of permutations where none of the viewers take their designated seats, i.e. the number of derangements.

A derangement is a permutation of the elements of a set such that none of the elements appear in their original position. For example, the derangements for set { 1, 2, 3 } are { 3, 1, 2 } and { 2, 3, 1 }.  Number of derangements is called subfactorials. Subfactorial of N is represented as !N.

We start with counting the number of derangements. Suppose there are 'n' seats, the viewer with ticket number takes seat j. There are (n-1) ways of doing it. Now, the viewer with ticket j has two options - he can either take seat i or leave it. If he takes seat i the case reduces to solving the problem for (n-2) people with (n-2) seats, where each has a forbidden choice. If he does not take the seat, the case reduces the problem of (n-1) people with (n-1) seats. Therefore, we get a recurring equation:-
!n = ( n - 1 ) ( !( n - 1 ) + !( n - 2 ) )

Thus, the probability is !n/n!


Now, imagine some of the viewers sat in their allotted seat. Such a configuration can be referred to as partial derangement. Let the number of viewers who take their correct place be k. Therefore, the number of such permutations can be given by R( n, k ) where, 
R( n, n ) = 1
R ( n, n-1 ) = 0
R ( n, k ) = nCk.!(n-k)

Tuesday, February 28, 2012

Algorithmic Games - Part 1 of N

Hello again, 

If I had to say one thing I learned most from my school pal Mridul, I would say it was his way of keeping an organised record of things that were important to him. He maintained a journal of all the bikes and cars he dreamed of owning, all pens he had (there were 100s of them), B'days, tele. no. and address of his all his classmates .......
He was referred to as the most neat and meticulous person in our batch and this inspired me towards keeping records and writing blogs or as I see it my taking a step into being a less messy me.


This post is my first attempt towards organizing the details of Game-Theory that I have come across till now. Someone who's just introduced to Game-theory may refer to this for more info and to those who have spent some time with it may refresh there basics as they go throug this.

So here we go:

First of all we define the basic properties of the games that we will be looking at, and they are :

    1. There are two players.
    2. There is a set, usually finite, of possible positions of the game.
    3. The rules of the game specify for both players. If rules make no distinction between the players i.e. if players have same options of moving from each position, the game is called impartial :otherwise its called partial.
    4. The players alternate moving.
    5. The game ends when a position is reached from which no moves are possible for the player whose turn it is to move. Under the normal play rule, the last player to move wins. Under the misere play rule the last player to move loses.
    6. The game ends in a finite number of moves no matter how it is played.



Simplest of such games are the Take away games. Here is an example explaining what a Take-away game is:
  1. There are two players.
  2. There is a pile of chips in the center of a table (say 30).
  3. A move consists of removing one, two, or three chips from the pile. At least one chip must be removed, but no more than three may be removed.
  4. Players alternate moves with Player I starting.
  5. The player that removes the last chip wins.

We analyze this game from the end back to the beginning. This method is sometimes called backward induction.
We note that when there are no chips left its the loosing position for the person who's turn it is now. When there are 1,2 or 3 chips left, the person who's turn is next will definitely will (since he can take all the coins that are left). Now if we look at the situation where 4 chips are left, it appears to be a loosing one for the person who's turn is next, because no matter how many chips the player (who's turn it is) take away from the pile (he can take either 1,2 or 3 chips), he always ends up giving the other player a position from where we found he would win. Similarly we find that 5,6 and 7 are all winning positions for the person who's  turn is next.
Thus we see that the Take-away game ends up with a pattern of winning and loosing positions.
More elaborately, we define these winning positions and loosing positions as N positions and P positions respectivly.



P-positions and N-positions are defined recursively by the following three statements.
(1) All terminal positions are P-positions.
(2) From every N-position, there is at least one move to a P-position.
(3) From every P-position, every move is to an N-position



There  are innumerable games based on the above mentioned P-position N-position logic.
Describing each of them would make this reminder node too long, hence i have just mentioned their names.


Some Games based on P-position N-position logic are:

  1. The Subtraction Game: This is a modified Take-away Game where the number of chips that can be removed from the pile of coin, is taken from a given set.
  2. The Thirty-one Game:  Another modified Take-away Game where the number of times that a particular number can be subtracted from the pile is restricted to a given value.
  3. Empty and Divide: In this there are two boxes containing (m,n) coins. at each turn a player needs to empty one box and divide the content of the other between the two boxes according to his wish.
  4. Chomp: This game involves removing squares from a rectangular board (m*n) A move consists in taking a square and removing it and all squares to the right and above. The player who takes square (1, 1) loses.
  5. Dynamic subtraction: Another modified Take -away Game with restriction on the subtracting numbers varying dynamicly.
  6. The SOS Game: This consist of a square (n*n), initially empty, where player write S or O alternately. The first person who completes the word SOS wins.

Thursday, December 22, 2011

Markov Chains or random text generation


and then at the end of every line: 'Speak roughly to your tea; it's getting late.' So Alice began to tremble. Alice looked round, eager to see if she were looking over their shoulders, that all the jurymen on to himself as he said in a solemn tone, only changing the order of the jurors were all crowded round it, panting, and asking, 'But who is to do this, so she began again: 'Ou est ma chatte?' which was immediately suppressed by the English, who wanted leaders,

The above lines appear to be lifted from Alice in Wonderland. However, on closer observation it is random rubbish. It is just an implementation of Markov Chains (to which we got introduced while working on our mandatory Project required in partial fulfillment of degree). Markov chains work on the principle that the future depends on the present and not on the past. Consider the example "What is the last word at the end of this ______ ?". Well, you must have guessed the word is "sentence" or "line". Markov chains work in a similar way. We have a set of possible succeeding states and the probability of choosing a state depends on the data, the chain is trained on. In other words, the two words "line" and "sentence" represent two different states. Now, if a person usually uses the word "sentence" more then "line", he is more likely to choose "sentence" as his next state.

I will introduce another term "n-grams". Here, "n" refers to the number of states that will be considered for determining the the future state.

In the following program, the training filename is is given as argument. The n-gram model provides an approximation of the book. Books can be downloaded from http://www.gutenberg.org/

import sys,random
F=open(sys.argv[1])
dic={}
eoS=[]
N=2  #N is equal to (n-1) from the n-gram
W=tuple(['']*N)
for line in F:
    words = line.split()
    for word in words:
        if '' not in W:
            dic.setdefault( W, [] ).append(word)
            if W[-1][-1] in (',' , '.' , '?' , '!'):
                eoS.append(W)
        W =  W[1:] + tuple([(word)])
#
key = ()
count = 10 #number of sentences to be generated
while True:
    if dic.has_key(key):
        word = random.choice(dic[key])
        print word,
        key = (key[1:] + tuple([(word)]))
        if key in eoS:
            count = count - 1
            if count <= 0:
                break
    else:
        key = random.choice(eoS)


Popular implementations are Mark V Shaney and Dissociated press.

Tuesday, December 20, 2011

Sudoku Solver or How to solve Sudoku from its pic | Part 3

Hello.
In this post, I will discuss about the first few phases of the work. The system needs lots of improvements, which include technology used and the approach followed. Anyways, it was all part of a learning process.

First the image is converted to greyscale, then thresholding is performed on the image and we obtain a binary image. Thresholding can be used to distinguish background from foreground. Usually in simple images, the threshold value is the mean or median of all the pixel values present in it. Pixels whose value is greater than the threshold is assigned a value 1 or otherwise 0. The situation can also be reversed. However, setting a global thresholding value has a drawback - it cannot handle changing lighting conditions e.g. those happening due to strong illumination gradient or shadows. This problem can be solved using adaptive thresholding (also known as local or dynamic thresholding). It is based on the assumption that the illumination throughout a small region will be uniform. This method selects individual thresholds for each pixel based on its neighborhood.

After we get the binary image, we search for the largest blob (based on the assumption that the grid will be the largest blob). Using hough transform on the grid, we can get an estimate of the amount of rotation. We rotate the image to get an upright puzzle area. We crop out the puzzle area and divide it into 9x9 parts. Using template matching, we identify the numbers. template matching is a very naive method. It got confused between 3 and 8, 1 and 4, 2 and 9, 5 and 6. We solved this problem by finding the number of holes in the figure. For example, 8 and 3 has two and zero holes respectively.

The code is as follows. It is rough at the edges and a lot can be improved.


warning('off') 
%% adaptive threshholding
I=imread('sudoku_1.jpg');
I=imrotate(I,30,'bilinear');
J=mat2gray(I);
K=imfilter(J,fspecial('average',10),'replicate');
C = 0.05;
sIM=K-J-C;
bw=im2bw(sIM,0);
%%imshow(bw)

%% get biggest blob
[label, num] = bwlabel(bw,4);
stats = regionprops (label, 'basic');
mx=max([stats.Area]);
mp = find([stats.Area]==mx);
X=label==mp;
%%imshow(X) 
 
%% crop and get the puzzle area
xy=max(1,floor(stats(mp).BoundingBox));
fI=bw(xy(2):xy(2)+xy(4),xy(1):xy(1)+xy(3));
%%imshow(fI) 

%% Rotation
cleanI=X;
cleanI=edge(cleanI,'canny');
[H,theta,rho] = hough(cleanI);
P = houghpeaks(H,5,'threshold',ceil(0.3*max(H(:))));
lines = houghlines(cleanI,theta,rho,P,'FillGap',5,'MinLength',7);
angl=max([lines.theta]);
fI=imrotate(fI,angl-90,'bilinear');
[label, num] = bwlabel(fI,4);
stats = regionprops (label, 'basic');
mx=max([stats.Area]);
mp = find([stats.Area]==mx);
X=label==mp;
xy=max(1,floor(stats(mp).BoundingBox));
fI=fI(xy(2):xy(2)+xy(4),xy(1):xy(1)+xy(3));
fI=fI-X(xy(2):xy(2)+xy(4),xy(1):xy(1)+xy(3)); 

%% segment and identify
sz=size(fI);
s=round(sz/9);
XY=zeros(9,26,18);
box=zeros(9,9);
perc=round(s(1).*s(2)*0.04);
for i=1:9
    filename=strcat('',num2str(i),'.jpg');
    XY(i,:,:)=im2bw(imread(filename));
end
for i =0:8
    for j=0:8
        %subplot(9,9,(9*i+j+1))
        xm=min(sz(1),(i+1)*s(1));
        ym=min(sz(2),(j+1)*s(2));
        gh=fI(i*s(1)+1:xm,j*s(2)+1:ym);
        gh=bwareaopen(gh,perc);
        [label n]=bwlabel(gh,8);
     
        if n >0
            stats=regionprops(label,'BoundingBox','PixelIdxList');
         
            for region =1:length(stats)
                xy=stats(region).BoundingBox;
                if (xy(3) <10 || xy(4) < 10 )
                    label(stats(region).PixelIdxList) = 0;
                end
            end
            gh = (label~=0);
            [label n]=bwlabel(gh,8);
         
            if n < 1
                %imshow(gh)
                continue
            end
            stats=regionprops(label,'basic');
            mx=max([stats.Area]);
            mp = find([stats.Area]==mx);
            gh=label;
            gh(gh==mp)=1;
            xy=stats(mp).BoundingBox;
            if (xy(3)/xy(4) > 0.75 || xy(3)/xy(4) <0.25)
                %imshow(gh)
                continue
            end
            szx=size(gh);
            gh=gh(xy(2):min(xy(2)+xy(4),szx(1)), xy(1):min(xy(1)+xy(3),szx(2)));
            gh=im2bw(imresize(gh,[26,18]));
            gh=bwareaopen(gh,120);
            mx_corr=0;
            for k=1:9
                temp_(:,:)=XY(k,:,:) ;
                temp=sum(sum(gh.*temp_));
                if mx_corr < temp
                    mx_corr=temp;
                    box(i+1,j+1)=k;
                end
            end
            se1=strel('line',2,0);
            se2=strel('line',2,90);
            gh=imdilate(gh,[se1 se2]);
            gh=imerode(gh,[se1 se2]);
            if (box(i+1,j+1)==8 || box(i+1,j+1)==3)
               nx=regionprops(gh,'EulerNumber');
               if (nx(1).EulerNumber==1)
                   box(i+1,j+1)=3;
               else
                   box(i+1,j+1)=8;
               end
            end
            if (box(i+1,j+1)==4 || box(i+1,j+1)==1)
               nx=regionprops(gh,'EulerNumber');
               if (nx(1).EulerNumber==0)
                   box(i+1,j+1)=4;
               else
                   box(i+1,j+1)=1;
               end
            end
         
            if (box(i+1,j+1)==5 || box(i+1,j+1)==6)
               nx=regionprops(gh,'EulerNumber');
               if (nx(1).EulerNumber<=0)
                   box(i+1,j+1)=6;
               else
                   box(i+1,j+1)=5;
               end
            end
            if (box(i+1,j+1)==9 || box(i+1,j+1)==2)
                nx=regionprops(gh,'EulerNumber');
                if (nx(1).EulerNumber<=0)
                    box(i+1,j+1)=9;
                else
                    box(i+1,j+1)=2;
                end
            end
        end
        %imshow(gh)
    end
end
file_=fopen('out_dip.txt','w');
fprintf(file_,'%d',box');
fclose(file_);
[status result]=system('python do.py');
clc
disp(result)
warning on