Tuesday, September 4, 2012

An overview of my thesis document

A summary of what I've wrote
Wordle: My Ph.D Thesis in Computer Science
The document has been delivered and is now under review.

Saturday, July 21, 2012

Y todo comenzó con un grito

El día más importante para los colombianos es el 20 de Julio, día de la Independencia. Nos sentimos orgullosos de nuestra tierra y de nuestra patria, celebramos lo que somos. Y hay en verdad muchas cosas de las cuales podemos sentirnos orgullosos. En medio de un mundo en crisis, Colombia es un país en crecimiento, por ejemplo.

Es el día para reflexionar qué tan buenos colombianos somos, y qué podemos hacer mejor por nuestro país. Porque el 20 de Julio celebramos el grito de independencia, y dar el grito fue solo el comienzo. Desde entonces, los colombianos iniciaron muchas batallas por lograr esa anhelada independencia, para ver a este país grande, respetado y libre. Y en mi sentir, 202 años después del grito, las batallas continúan y deben continuar.

En aquella época, se ganaron muchas batallas contra el régimen español para empoderar a los criollos en la administración de sus territorios y sus políticas. Pero el 20 de Julio no se celebra la victoria de esas batallas. Se celebra el grito de independencia, ese deseo por construir nación para los colombianos, esas ganas de labrar nuestro proprio destino. Y es por eso que hoy las batallas deben continuar. Porque esos grandes ideales no se logran con unas cuantas batallas que ya terminaron, sino que se van construyendo con el trabajo de todo un pueblo, que por muchas generaciones ha contribuído a hacer de Colombia un país cada vez mejor, para sus hijos y sus futuras generaciones.

Quiero hacer un homenaje a todos los colombianos que aún están en batalla para construir este país. Y esos somos prácticamente todos, que aportamos por asumir un rol o trabajo en esta sociedad, y de su compromiso y sus resultados con seguridad dependemos muchas personas diariamente. De manera muy especial quiero resaltar el trabajo de la comunidad académica en Colombia, desde los profesores de primaria y bachillerato, hasta los profesores universitarios e investigadores que dedican esfuerzos a fortalecer y mejorar el sistema educativo nacional. En ellos recae la responsabilidad de educar y formar el futuro de la patria.


Tuesday, May 29, 2012

Linear Algebra in Java

I run a quick benchmark of Java libraries for linear algebra. I tested two libraries: jblas (uses native interfaces for BLAS and LAPACK) and parallel colt (the multithreaded version of colt). The following are the results running in my laptop (Intel Centrino, 4GB RAM):

Matrix Solver Benchmark: Solve for X in AX = B
Problem Dimensions: A(1000,1000) and B(1000,500)
Number of problems to solve: 100
Starting with JBLAS...
Time to solve 100 random problems: 57099 ms
Computing with PColt...
Time to solve 100 random problems: 659838 ms

The results are in favor of jblas (57 secs vs 659 secs!). However, since my laptop does not fully exploit multithreaded capabilities, I run another experiment in a server computer for fairness with pcolt (12 cores, 32GB RAM):

Matrix Solver Benchmark: Solve for X in AX = B
Problem Dimensions: A(1000,1000) and B(1000,500)
Number of problems to solve: 100
Starting with JBLAS...
Time to solve 100 random problems: 48350 ms
Computing with PColt...
Time to solve 100 random problems: 200456 ms

which are in favor of jblas again. (48 secs vs. 200 secs). In addition to the evident speed up of jblass, there are two more cool things you have to know about it:
  1. That speed is achieved using only 1 (one) CPU. It means we can make further parallelization for obtaining even faster results for certain applications.
  2. The API is pretty simple and easy to use, compared to pcolt. Look at the code I wrote for this benchmark and make your own conclussions:

      public static void benchmark(String[] args){  
           System.out.println("Matrix Solver Benchmark. Solve for X in AX = B");  
           int a1 = 1000, a2 = 1000, b1 = 1000, b2 = 500;  
           int problems = 100;  
           System.out.println("Problem Dimensions: A("+a1+","+a2+") and B("+b1+","+b2+")");  
           System.out.println("Number of problems to solve: " + problems);  
           System.out.println("Starting with JBLAS...");  
           long s = System.currentTimeMillis();  
           for(int i = 0; i < problems; i++){  
                DoubleMatrix A = DoubleMatrix.rand(a1, a2);  
                DoubleMatrix B = DoubleMatrix.rand(b1, b2);  
                DoubleMatrix X = Solve.solve(A, B);  
           System.out.println("Time to solve "+problems+" random problems: " + (System.currentTimeMillis()-s) + " ms");  
           System.out.println("Computing with PColt...");  
           s = System.currentTimeMillis();  
           for(int i = 0; i < problems; i++){  
                DoubleMatrix2D A = new DenseDoubleMatrix2D(a1, a2);  
                DoubleMatrix2D B = new DenseDoubleMatrix2D(b1, b2);  
                DenseDoubleAlgebra alg = new DenseDoubleAlgebra();  
                DoubleMatrix2D X = alg.solve(A, B);  
           System.out.println("Time to solve "+problems+" random problems: " + (System.currentTimeMillis()-s) + " ms");  

Thursday, January 26, 2012

Histogram Intersection

Consider the problem of calculating the histogram intersection, defined as:

where a and b are two histograms with n bins each. This is a useful similarity measure for images and is even proved to be an effective kernel for machine learning. An alternative way to compute it, using a non-branching rule is:

where non-branching means that we do not need to explicitly compare the values of each bin using an if statement. That might be an advantage, considering that in some programming languages, the evaluation of the abs() function is more efficient than the evaluation of an if statement. For instance, when programming for a GPU.

In my experiments using python (numpy), I have found the latter to be a little bit faster. In matlab, the difference seems to be smaller, although still in favor of the alternative formulation.

Here I have code snippets:

def intersection(A,B):
  K = []
  for v in A.T:
    K.append( 0.5*N.sum( v.A + B.T.A - N.abs(v.A - B.T.A), axis=1 ) )
  return N.asmatrix(K).T

function K = intersection(A, B)
a = size(A,2); b = size(B,2); 
K = zeros(a, b);
for i = 1:a
  Va = repmat(A(:,i),1,b);
  K(i,:) = 0.5*sum(Va + B - abs(Va - B));

If you know how to optimize that code, I appreciate your comments.

Wednesday, December 28, 2011

Parameters for Gradient Descent

I'm implementing a machine learning algorithm using gradient descent. The cost function is a standard error measure, so the derivatives are used to define the updating rules. Besides from regularization, which is an important parameter for learning, the other important parameters are related to moving the solution (step size) and stopping the search algorithm.

Some strategies to select these parameters:

  • Choose a step size that is decreasing with respect to time. Which is that function? step = step/t ? or step = step/2? Also, what's the initial value of step?
  • Stop after a fixed number of iterations: easy to interpret, just update the model parameters N times, that's it. However, what if we actually needed less iterations, or even worse, what if we might reach a better solution by computing a bit more of iterations.
  • Stop after a desired error has been reached: it's not so obvious if the algorithm can reach the value that we desire. This mainly depends on the input data and the objective function.
In my implementation and using real data, I have found the following heuristics to be useful (assuming M training examples):
  1. Set the step size as a small fixed number (with no decreasing value). More specifically, set step = 1/M.
  2. Set the maximum number of iterations equal to twice the number of examples (2*M).
  3. Set the desired error as a hundredth of the error in the initial iteration.
  4. Combine the two criteria and stop the algorithm when the first of them is met.
I spent a lot of time trying to find useful values, with many trials guessing by hand, and also exploring parameters in certain ranges. Only using these rules I have found the best solutions. 

I am specially surprised by the accuracy of the step size. Using a smaller value leads to very slow convergence, whereas using a bigger value leads to fast convergence on a local minima. Another surprising result is the max number of iterations. No more iterations are really needed, and less than that, can prevent the algorithm to reach a steady solution.

I have to say this works for datasets of moderate size, on the order of thousands. I haven't tried on bigger datasets, not sure if this works on the order of millions, for instance. It's hard to imagine two million iterations to reach a good solution. Also, this heuristics work from an optimization perspective. I haven't measured the effects of overfitting, which turns out to be a serious problem.

Wednesday, December 14, 2011


Simple scripts to normalize column vectors in a matrix using Matlab:

% L2-Norm
X = X./( ones(size(X)) * sqrt(diag(diag(X'*X))) );
% L1-Norm
X = X./( ones(size(X)) * diag(sum(X)) );

Saturday, June 11, 2011

Search Engine for Visual Patterns

    How to design an image indexing system to search visual patterns in an image collection? Here some basic principles:
    • The way to go may be using sparse dictionaries and sparse representations. The reconstruction scores for a particular block are the key to evaluate whether an image matches other images or not.
    • The visual dictionary should be sparse enough to allow high discrimination of patterns, highly sparse representations and of course, sublinear access to visual patterns.
    • Index all visual patterns per image. That is, do not try to make only a global image representation, but try to index each individual pattern in the image. That is, if you extract 10,000 blocks from one image, each block should be indexed.
    • Blocks can be rejected from the image (as stopwords) using a sparsity criteria according to the elements in the visual dictionary that are required to reconstruct the original signal.
    • The index of individual patterns should take the sparse representation based on the visual dictionary and also the scale and position in which the pattern is found in the source image.
    • An image index can be also designed using a summary of the sparse representations of all its blocks. That full image representation might have features related to the "more frequent" visual patterns, the location of those patterns, spatial information such as pyramid-based features, and so on and so forth.
    • Using the index of visual patterns, we can explain to the user how the images are matching the query. This can be regarded as snipetts for image search.
    • Two different search tasks may be solved using this index: search based on a image examples (full match) and pattern-based or subimage based search.
    • Search algorithms to solve each of these two tasks might be very sophisticated. For instance, to solve the image example search, one can try to analyze spatial positions of matching patterns, scales in which patterns appear and so on. The same analysis can be followed to design a pattern-based search.
    The most important thing is to produce a highly sparse image representation, even though each sub-pattern is being indexed. Also, to achieve this we need a massively distributed computing infrastructure to allow this kind of indexing to get done in reasonable amounts of time. 

One of the possible flaws in this image indexing method is the role of afine transformations in images, that is rotations and translations. Even though translations are easily modelled, rotations are a big deal. May be this new paper "TILT: Transform Invariant Low-rank Textures" could be interesting to bring an invariance property for a visual pattern search engine.

Jul 20, 2010
Redmond, WA, USA

Industrial-strength Audio Search Algorithm

    The following paper catches my attention since the search algorithms perspective: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf I haven't checked it thoroughly, but just staring at the titles and results, it looks great. First of all, it searches based on a sample of music, that is, it does not need a complete song to find other ones, it just needs a piece of music to identify and "almost perfectly match" the right song. So I wonder how does it index audio, without exhaustively process each "window" in the source? Also, some hash techniques are being used to compute the subset of possible matching elements, in a similar way as my current internship mate (Yong Jae) was proposing. He talked about hashing algorithms to find the more similar shape in a shape database, based on an example shape. Also, this paper talks about constellation strategies. I'm not really sure what that actually means, but it's somehow related to the pedestrian tracking work presented here by a researcher in MSR. I have to check the lecture out, to get at least some notion of the work he is doing. It could be interesting to see this paper in detail, so please, consider it when you start designing a visual search engine.

July 4, 2010
Redmond, WA, USA

The Future of Image Search


    This talk is about computer vision applied to recognition of objects in images. The speaker has shown a time line of object recognition starting with character recognition to calltech 101 challenges and face recognition. They say face recognition has been already solved, so that approach might be extended to the 30,000 visual categories. 

Sorry, it is not about face recognition, it is about face detection, since the problem of image analysis is to detect semantic concepts.

May 19, 2010
Bogotá, Colombia

Query Paradigms

    When accessing image collections in a content-based image retrieval system, it is often pointed out that the user is not aware of the low-level image representation, in terms of colors, textures and shapes. Blobworld did some effort to make the process more understandable since the user view point and that was one of their main contributions in terms of concept and paradigm. Sketching shapes, presenting image examples and so on and so forth, the main problem with these approaches for accessing images by content is that users do not know exactly what are they searching for and cannot figure out how the system works to provide more meaningful queries. I think a pixel-based index might provide a more natural way to understand what the system is actually doing and also can be useful to show the exact visual matching between image query (or region) and result images. 

May 13, 2010
Bogotá, Colombia