Sunday, 21 December 2014

On Statistical Machine Translation

Statistical Machine Translation is one of my main research interests. To that end, I have gone through the rite of passage of setting up Moses, the phrase-based statistical machine translation software, which means that I can now use Moses to build a French to English translation system. I am now aiming to build statistical translation systems based on Moses that translate between Ugandan languages and English. There is already reasonable progress on Luganda -> English (not by me). Before anything can be done for other languages, large amounts of parallel sentences (sentences paired with their translations in another language) are needed to train translation models that produce reasonable translations. In short, parallel sentences are essential for statistical machine translation, and for languages that have not yet been studied, language-specific software for pre-processing the sentences before they are used to build translation models is critical.
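
As a rough illustration of the kind of pre-processing meant here, the sketch below filters a sentence-aligned corpus, keeping only pairs where both sides are non-empty, not overly long, and of comparable length. It is similar in spirit to the clean-corpus-n.perl script that ships with Moses, but the function name clean_parallel, its default thresholds (80 words, a length ratio of 9) and its arguments are assumptions for illustration, not that script's interface. It assumes one sentence per line, that line N of one file translates line N of the other, and that no sentence contains a tab character.

```shell
#!/usr/bin/env bash
# clean_parallel SRC TGT SRC_OUT TGT_OUT [MAX_WORDS] [MAX_RATIO]
# Hypothetical helper: keeps a sentence pair only if both sides are
# non-empty, neither side exceeds MAX_WORDS words, and the word-count
# ratio between the two sides is at most MAX_RATIO.
clean_parallel() {
  local src_in=$1 tgt_in=$2 src_out=$3 tgt_out=$4
  local max_words=${5:-80} max_ratio=${6:-9}
  : > "$src_out"; : > "$tgt_out"     # start from empty output files
  # paste joins line N of both files with a tab; awk then checks each pair.
  paste "$src_in" "$tgt_in" |
    awk -F '\t' -v max="$max_words" -v ratio="$max_ratio" \
        -v s="$src_out" -v t="$tgt_out" '
      BEGIN { max += 0; ratio += 0 }                   # force numeric values
      {
        ns = split($1, a, " "); nt = split($2, b, " ")  # word counts
        if (ns == 0 || nt == 0) next                     # an empty side
        if (ns > max || nt > max) next                   # overlong sentence
        if (ns / nt > ratio || nt / ns > ratio) next     # lopsided lengths
        print $1 > s; print $2 > t                       # keep the pair
      }'
}
```

For example, with hypothetical Luganda and English files: clean_parallel corpus.lg corpus.en corpus.clean.lg corpus.clean.en. Language-specific steps such as tokenisation would still be needed on top of this generic filtering.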

Saturday, 18 October 2014

Web Engineering

This week I began teaching Web engineering methods. I was surprised to find that there are many of them, and that they have implementations that automate them! This post is meant to help me record these methods and their implementations.

1. Object-oriented Web Solutions [based on the OO Method and implemented by OlivaNova (commercial tool)]

Monday, 7 July 2014

Bar plots and histograms in R

Last week I had to analyse some data and present some statistics about it. To make my statistical presentation more appealing, I needed to plot the distributions as either bar plots or histograms. I know Microsoft Excel can do the job, but I wanted to do it with the R statistical package. This is how it went:

Data entry

The first step is to have the data that will be used to generate the plots. In R, data can be entered manually or imported. For very small data sets, I preferred to enter the data manually, but it is often the case that we have to generate plots for large data sets. In both cases, the data has to be stored in a specific data structure such as a vector (the simplest case) or a matrix.

My first attempt was to plot a histogram, but this turned out to be very confusing, whereas bar plots were relatively easy to implement.

Bar Plots

Manual Process

Consider some frequencies for 10 interval ranges between 0 and 100%:
0-9:  2
10-19:  4
20-29:  8
30-39:  18
40-49:  30
50-59:  50
60-69:  40
70-79:  28
80-89:  15
90-100:  5

A bar plot suits this case. The idea is to plot the ranges along the horizontal axis and the frequencies along the vertical axis. First, put the frequencies in a vector (call it freqs). This is how it goes in R:

R-prompt> freqs <- c(2, 4, 8, 18, 30, 50, 40, 28, 15, 5)

Then name each frequency with its range; barplot() uses these names as the labels on the horizontal axis:

R-prompt> names(freqs) <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-100")

And finally do the plot:

R-prompt> barplot(freqs, xlab="range", ylab="frequency")
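
When the data set is large, counting the frequencies for each range by hand becomes error-prone. Assuming the raw percentages are stored one per line in a plain-text file (the file name scores.txt and the helper bin_scores are hypothetical), a small awk sketch can count them into the same ten ranges and print a comma-separated list ready to go inside c(...):

```shell
# bin_scores: hypothetical helper. Reads percentages (0-100), one per line,
# from stdin and prints the counts for 0-9, 10-19, ..., 80-89, 90-100 as a
# comma-separated list. Lines that are not numbers in 0-100 are skipped.
bin_scores() {
  awk '
    $1 ~ /^[0-9]+(\.[0-9]+)?$/ && $1 <= 100 {
      b = int($1 / 10)      # 0-9 -> bin 0, 10-19 -> bin 1, ...
      if (b > 9) b = 9      # 100 belongs to the last range, 90-100
      n[b]++
    }
    END {
      out = ""
      for (i = 0; i < 10; i++) out = out (i ? ", " : "") (n[i] + 0)
      print out
    }'
}
```

Running prompt> bin_scores < scores.txt prints the ten counts, which can then be pasted into freqs <- c(...) in R. Values with decimals are binned by their tens digit, so 9.5 counts towards 0-9.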

Wednesday, 28 May 2014

Dangerous Linux operations

This week I decided to do some natural language processing (NLP) research, which mainly involves running NLP experiments in Ubuntu. While installing the required tools, I started tampering with permissions on several files, including those in the root directory. Specifically, I applied chmod -R [NNN] to several directories, and suddenly many applications and functions just vanished! I tried to find a solution but ended up wasting a whole evening and night. In the end, I had to reinstall Ubuntu, this time the latest version (14.04), so that I could get on with my experiments as soon as possible. This means I have started from scratch again.

LESSON: Don't mess around with system files. Apply chmod -R in the root directory and you are very likely to have to reinstall.
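
One way to put this lesson into practice is to wrap chmod -R in a guard that refuses to touch anything outside your home directory. This is a minimal sketch assuming a bash shell; safe_chmod is a hypothetical helper, not a standard command.

```shell
# safe_chmod MODE DIR: hypothetical wrapper around chmod -R that refuses to
# run on any directory that is not strictly inside $HOME (so never on /).
safe_chmod() {
  local mode=$1 dir home
  [ -d "$2" ] || { echo "no such directory: $2" >&2; return 1; }
  # Resolve DIR and $HOME to physical paths so symlinks cannot slip past.
  dir=$(CDPATH= cd -- "$2" && pwd -P) || return 1
  home=$(CDPATH= cd -- "$HOME" && pwd -P) || return 1
  case "$dir/" in
    "$home"/?*) chmod -R -- "$mode" "$dir" ;;
    *) echo "refusing to chmod $dir: not inside $home" >&2; return 1 ;;
  esac
}
```

For example, safe_chmod u+rwX,go+rX ~/experiments (a hypothetical directory) gives read access to everyone and, thanks to the capital X, adds execute permission only to directories and to files that are already executable, which is usually safer than one blanket numeric mode.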

Sunday, 30 March 2014

Useful Linux operations (inspired by linux commando)

Inspired by linux commando's blog posts (http://linuxcommando.blogspot.com), I will be adding below a list of commands that have helped me in my research experiments.

1. Convert a file's content to all lower case

Using Perl

prompt> perl -pe '$_=lc($_)' < inputfile > outputfile

Using sed (\L is a GNU sed extension)

prompt> sed -e 's/\(.*\)/\L\1/' < inputfile > outputfile
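
A third option is tr, which maps each character in one set to the matching character in another. The wrapper name to_lower is only for illustration. Note that GNU tr works byte by byte, so it will not lower-case non-ASCII letters in UTF-8 text.

```shell
# to_lower: hypothetical name for a tr-based filter that maps every
# character in the [:upper:] class to its [:lower:] counterpart.
to_lower() {
  tr '[:upper:]' '[:lower:]'
}
# usage: to_lower < inputfile > outputfile
```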

2. Convert a file from DOS or MS Windows, whose lines end in ^M\n (\r\n), into a single line of text, with each ^M\n replaced by a space

Using Perl

prompt> perl -pe 's/\r\n/ /g' < inputfile > outputfile
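
The Perl one-liner above joins everything into one line. If the goal is only to strip the ^M characters and keep the line breaks, deleting every carriage return with tr is enough. strip_cr is a hypothetical name for that filter; the dos2unix utility, where installed, does the same job.

```shell
# strip_cr: hypothetical name for a filter that deletes every carriage
# return (^M, \r), so \r\n line endings become plain \n.
strip_cr() {
  tr -d '\r'
}
# usage: strip_cr < inputfile > outputfile
```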

Tuesday, 18 March 2014

Theories on the location of a 'dead rat' in my CIT office room

Since yesterday, I have not managed to find a 'dead rat' (if that is what it is) in my School of Computing and Informatics Technology (SCIT) office room.

Since then I have developed, and come across, several theories about the dead rat's location. One of my colleagues suggested that the still-unseen rat was brought in from elsewhere in a boxed package. So we got rid of the big boxes and their contents, but the smell remained. Another theory is that it could have died under one of the bookshelves. We checked and noticed that the space under the bookshelves had not been cleaned in a long while. The "under bookshelf theory" was extended to suggest that the dead rat's remains could have turned into liquid, hence the bad smell. I made sure that the cleaning of the room's floor was improved by moving 'suspect furniture' to expose places that are not regularly cleaned. However, the bad smell has stuck.

I then developed a theory that the bad smell in my room could be coming from outside (maybe from a dead bird). I must add that the smell is not only in my room but also in the two rooms on either side of mine. My theory was trashed by one of my colleagues, who reasoned that the smell of a dead rat is distinctive and that it is exactly the smell in the three rooms. I have looked out of the window and around it, and I don't see anything. As I continue imagining the dead rat's whereabouts, another theory has emerged: it could even be more than one rat! These rats could have eaten something poisonous and died. But the central question remains: where are these rats? The last theory, which is costly to verify, is that the rats could be somewhere in the rooms' cable covers. That means we have to unscrew and remove the covers to check, but it's very difficult to find the tools to unscrew them. It now seems very likely that I will have to put up with the smell until it is gone. I can only hope that the bad smell goes away soon. Or maybe an air freshener will work here. And what about a disinfectant? Oh, let me see how this ends ...