Fun with PHP data types

February 2nd, 2012

All data stored by PHP is put into one of eight different types of buckets, technically called data types. The data type determines what operations can be carried out on that piece of data.

Four of the data types are scalar: they hold only a single value. These are integer (a whole number), float (a number with a decimal point), string (a sequence of characters) and boolean (true or false).

PHP also has two compound data types. One is an array, an ordered collection of multiple values, each stored under a key. The other is an object, an instance of a class that bundles data (properties) together with functions (methods). Finally, PHP has two special data types. One is a resource, which is a reference to some external data source, such as an open file or database connection. The other, null, has only one value, null.

PHP is a loosely typed language, meaning that variables do not need to have their data types declared before they are used. If you code $sample = 4.5; then PHP will assume the $sample variable is a floating point data type. It is totally up to you to make sure that you pass the correct data type to any operation (e.g. don’t ask PHP to multiply two text strings).

That said, PHP offers some tools to help get a handle on all your loose data types. PHP’s gettype() function tells you the data type of the provided variable, e.g. gettype( $sample ) returns “double” (PHP’s historical name for a float). You can also change a variable’s data type with PHP’s settype(), e.g. settype( $sample, "integer" ) converts $sample in place, leaving it with the value 4; the function itself returns true on success. PHP tries to preserve as much of the original value as possible during the conversion process.

Finally, you can convert a value to a particular type at the moment you use it. This is called type casting. Here you place the name of the data type, in parentheses, before the value or variable. So if $price holds 4.5, “$whole = (int) $price;” stores the integer 4 in $whole, while $price itself remains the float 4.5.

From the book




All mistakes are my own, however…-Joab Jackson

Calculus: The Shape of Equations

December 1st, 2011

The nature of an equation can be revealed by using a graph.

A linear equation produces a straight line, which may be sloped. A quadratic equation produces a parabola.

A linear equation takes the form of y = mx + b, where x is the independent variable and m and b are constants. Graphed, the result is always a straight line. See example.

The straight line of a linear equation usually has a slope in relation to the x axis (a horizontal line has a slope of 0). A vertical line has an infinite, or undefined, slope, and cannot be written in the form y = mx + b.

The slope is defined as the vertical distance divided by the horizontal distance between any two points on the line: (y2 - y1) / (x2 - x1).

Miles per gallon is a type of linear equation. If you drive 20 miles (y axis) and burn 1 gallon of gas (x axis), the slope in essence represents 20 mpg.

If the linear expression comes in the form of y = mx + b, then the slope is m. A positive slope ascends left to right; a negative one descends left to right.
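The slope formula translates directly into a few lines of Python. This is a minimal sketch, assuming points are given as (x, y) pairs; the function name slope is hypothetical. It treats a vertical line as having no defined slope, and reproduces the mpg example by putting gallons on the x axis and miles on the y axis.

```python
def slope(p1, p2):
    """Return (y2 - y1) / (x2 - x1) for two points on a line."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        # A vertical line: the horizontal distance is zero.
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)


# Gallons burned on the x axis, miles driven on the y axis.
print(slope((0, 0), (1, 20)))   # 20.0 -> 20 mpg
print(slope((1, 20), (3, 60)))  # 20.0: same line, same slope

# For y = mx + b, any two points give back m (here y = -2x + 5).
print(slope((0, 5), (2, 1)))    # -2.0: the line descends left to right
```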

****

A quadratic equation, on the other hand, produces a curve in the shape of a parabola, one with its axis parallel to the y axis.

A quadratic equation is a polynomial equation of the second degree, meaning the highest exponent of x is 2. The form is

ax^2 + bx + c = 0

Where a, b and c are constants, and x is the variable. The constant a cannot equal 0. Graphing y = ax^2 + bx + c draws the parabola.

If a is greater than zero, the parabola will open upward; if less than zero, it will open downward. The a coefficient also determines how steeply the sides of the parabola rise: the larger the absolute value of a, the narrower the parabola.

The constant c determines where the parabola intersects the y axis: at the point (0, c).

Because it is a second-degree polynomial, when it is solved for x, it will have two answers, or roots (which may be the same number). On the graph, the real roots are the places where the parabola crosses or touches the x axis.
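Solving for the two roots can be sketched with the quadratic formula, x = (-b ± sqrt(b^2 - 4ac)) / 2a. In this minimal Python sketch (the function name quadratic_roots is hypothetical), the term under the square root, b^2 - 4ac, decides the kind of roots: positive gives two real roots, zero gives one repeated root, and negative gives a pair of complex roots, computed with the cmath module.

```python
import cmath
import math


def quadratic_roots(a, b, c):
    """Solve ax^2 + bx + c = 0 with the quadratic formula."""
    if a == 0:
        raise ValueError("a cannot equal 0 in a quadratic equation")
    disc = b * b - 4 * a * c  # the discriminant, b^2 - 4ac
    # Real square root when the parabola meets the x axis, complex otherwise.
    root = math.sqrt(disc) if disc >= 0 else cmath.sqrt(disc)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


print(quadratic_roots(1, -3, 2))  # (2.0, 1.0): crosses the x axis at 1 and 2
print(quadratic_roots(1, 2, 1))   # (-1.0, -1.0): just touches the x axis
print(quadratic_roots(1, 0, 1))   # complex pair, +1j and -1j: never touches it
```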

The roots are complex numbers when the parabola doesn’t touch the x axis at all, which happens when b^2 - 4ac is less than zero.–Joab Jackson

JavaScript’s Document Object Model (DOM)

October 20th, 2011

To the JavaScript programming language, the Web page is a single object, called the document object. The Document Object Model (DOM), therefore, is the way Web programmers can read and update elements on a page using JavaScript.

To step back for a moment, JavaScript works with browsers through object models. The window object is the global object for browser operations. Variables assigned at the window level are global properties, e.g. window.counter = 97. In essence, this means that the variable “counter” can be accessed from anywhere in the window.

The document object, which represents the DOM, is a property of the window object, and it is the one that JavaScript programmers work with most often. The DOM contains the properties that determine how the Web page is displayed by the browser. This provides the basis for JavaScript to modify the Web page. JavaScript in the browser is event-driven: the program waits for an event to happen, in this case some action by the user of the browser.

Commonly used elements, such as images and forms, are represented as array-like collections, e.g. document.images[3] refers to a specific image (the fourth on the page, since counting starts at 0), one marked up with the HTML <img> tag.

Material taken from the following books (all mistakes are my own)….

Hadoop for the single node

April 5th, 2011

Hadoop has been hyped quite a bit in the trade press lately, where it is pitched as a way to analyze data residing across many different servers.

The curious can also set up a Hadoop instance on a single Linux server.

On a single machine, Hadoop offers no distinct advantage in data processing over using any one of a number of other Unix tools, such as awk.

Nonetheless, installing a single-node Hadoop instance will give the administrator a feeling for how Hadoop works. It will also come in handy as more MapReduce-based programs become available. And if you are developing MapReduce jobs, testing them on a single node is the easiest way to go.

Looking to set up my own Hadoop instance on my Ubuntu Linux server, I used Michael Noll’s guide. Tom White’s O’Reilly book, “Hadoop: The Definitive Guide” also came in handy.

All in all, installing the Hadoop package is fairly easy, as far as setting up any program from the Linux command line goes. You download the tarball from an Apache mirror, unpack it in the desired parent directory, tweak a few configuration files and you are set. I went with the latest stable version, which was 0.20.2 at the time of this writing.

Myself, I placed the contents of the tarball in the /usr/local/hadoop directory.

Prerequisites

You do need to do some preparatory work. A Java Virtual Machine (JVM) needs to be installed on your machine, if it isn’t already.

Secondly, Noll suggests setting up a separate Hadoop user account for running Hadoop jobs. Keep in mind, however, that setting up a user account named Hadoop provides another possible point of entry for malicious attackers, who are always trying to log in to servers’ SSH ports with the user names of programs, such as MySQL, Nagios, Oracle and others.

I haven’t seen any attempted log-ins using the Hadoop account name yet on my own server, but it’s just a matter of time, no doubt. (Another security red flag with this set-up is that you create a key pair for SSH with an empty password. SSH is needed so that Hadoop can log into other nodes, though the empty password also makes break-ins less traceable.)

Another peculiarity of Noll’s installation is that he asks you to disable the server’s IPv6 support, due to the way Hadoop interacts with IPv6. He argues that if your server is not using IPv6, then this shouldn’t be a problem.

And this is probably true for most systems now, though disabling IPv6 might be the exact sort of thing you forget about a few years down the road when you are trying to hook your server to an Internet Service Provider via IPv6.

As an alternative, Noll suggests disabling IPv6 on Hadoop itself, which seems like a more reasonable idea.

Lastly, you need to create a dedicated directory where Hadoop can store its work files, and give the Hadoop user full permissions to use that directory. I chose, for instance, /usr/local/hadoop/data-store, which resides in the directory of my Hadoop instance.

Configuration

Once Hadoop is downloaded and unpacked, the first thing you need to do is make some changes in the configuration files.

In the hadoop-env.sh file, you must specify where the JVM resides.

In the core-site.xml file, you must specify where the working directory is. In my case, it was /usr/local/hadoop/data-store/hadoop-${user.name}, with ${user.name} filled in with the name of the user running Hadoop (“Hadoop” in my case).

Noll also offers some configuration additions for the mapred-site.xml file (for MapReduce settings) and the hdfs-site.xml file (for file system settings). All these files are found in the Hadoop “conf” subdirectory.

Finally, you need to format the working directory in HDFS (the Hadoop Distributed File System). This file system is laid over your current working file system for this directory. The command for that operation is:

$ /usr/local/hadoop/bin/hadoop namenode -format

This command will format the directory specified in the core-site.xml file (/usr/local/hadoop/data-store in this case).

And that is pretty much it, installation-wise.

Starting and Stopping

Starting Hadoop can be done through the command line:

$ /usr/local/hadoop/bin/start-all.sh

This will start a number of services, including the NameNode, DataNode, JobTracker and TaskTracker. If all is working properly, the command line will respond with a set of messages indicating each program has been started.

Stopping Hadoop can be done thusly:

$ /usr/local/hadoop/bin/stop-all.sh

–Joab Jackson







Math: Inequalities describe the problem

March 20th, 2011

An inequality is a statement comparing two (or more) numbers in terms of their sizes, or their relative order along the number line.

At their simplest, inequalities are stated as two numbers, using the greater-than (>), less-than (<), greater-than-or-equal-to (≥) or less-than-or-equal-to (≤) symbols: “7 > 3” and so on.

Inequalities can be used to express many different kinds of problems. Answers.com offers an example of someone who has $100 and wishes to buy two $40 pairs of shoes and as many pairs of $4 tights as possible. The problem is: how many pairs of tights can be purchased? The inequality would be stated as “(2 × $40) + ($4 × x) ≤ $100,” and the answer is 5 or fewer.
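The shoes-and-tights budget can be checked with a few lines of Python. This is a minimal sketch; the function name max_items and its parameters are hypothetical. It subtracts the fixed cost of the shoes from the budget, floor-divides what remains by the price of a pair of tights, and then confirms that the answer satisfies the inequality while one more pair would not.

```python
def max_items(budget, fixed_cost, unit_price):
    """Largest whole x with fixed_cost + unit_price * x <= budget."""
    remaining = budget - fixed_cost
    if remaining < 0:
        raise ValueError("the fixed purchases alone exceed the budget")
    return remaining // unit_price  # floor division: only whole pairs count


x = max_items(100, 2 * 40, 4)
print(x)                            # 5
print(2 * 40 + 4 * x <= 100)        # True: exactly $100 spent
print(2 * 40 + 4 * (x + 1) <= 100)  # False: a sixth pair would cost $104
```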

In these cases, inequalities are incomplete, meaning that they include at least one as-yet unknown value, such as x in the example above. These inequalities can have one or more unknown variables, or an unknown quantity or range of quantities. To “solve” an inequality means using the rules of algebra to find all the possible values that would work in these variable placeholders, thereby identifying all the values that would make the inequality a true statement.

You solve an inequality much like you solve a linear equation, namely by isolating the variable on one side of the inequality.

For instance, subtracting 5 from both sides isolates x here:

x + 5 < 8
=
x < 3

…can stand for any number less than 3.

One thing to keep in mind: when isolating the variable, if you multiply or divide both sides of the inequality by a negative number, the inequality sign flips, e.g.

-10x > 40
=
-10x/-10 < 40/-10
=
x < -4
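Both rules, isolating the variable and flipping the sign for a negative divisor, can be captured in a short Python function. This is a minimal sketch for single-variable inequalities of the form a*x + b (op) c with integer coefficients; the function name solve_linear_inequality and the FLIPPED table are hypothetical. Fractions keep the bound exact instead of rounding it.

```python
from fractions import Fraction

# Direction of each inequality sign after multiplying or dividing by a negative.
FLIPPED = {"<": ">", ">": "<", "<=": ">=", ">=": "<="}


def solve_linear_inequality(a, b, op, c):
    """Solve a*x + b (op) c for x, returning (op, bound): x (op) bound."""
    if a == 0:
        raise ValueError("a must be nonzero to isolate x")
    # Subtract b from both sides, then divide both sides by a.
    bound = Fraction(c - b, a)
    if a < 0:
        op = FLIPPED[op]  # dividing by a negative number flips the sign
    return op, bound


print(solve_linear_inequality(1, 5, "<", 8))     # x + 5 < 8  means  x < 3
print(solve_linear_inequality(-10, 0, ">", 40))  # -10x > 40  means  x < -4
```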

Some material borrowed from the book:




Mistakes are my own, however–Joab Jackson





Perl: two ways of extracting MySQL data

February 12th, 2011

With Perl, you can extract data from a MySQL database using the DBI module, with one of two methods it offers, fetchrow_array() and fetchall_arrayref().

When interacting with the Perl DBI module, MySQL returns one row of data at a time, one element of data per column. fetchrow_array() allows you to work with each row of data as it is called, usually by using a loop of some sort. fetchall_arrayref() stores the entire result of a query, which may have multiple rows, in a reference to an array of rows, which you can unroll later.

fetchrow_array() is best for calls where you want to process the results immediately. fetchall_arrayref() is more suited for those times when you have to draw multiple sets of data from a database. Using fetchall_arrayref(), your program fetches everything it needs first, through multiple SQL calls, and then parses the results later.

Here’s how to implement it. First, for a Perl program to draw MySQL data, you load the DBI module:

use DBI;

Next you open a connection with the database, filling in the values for the database, log-in name and password, thusly:

$dbh = DBI->connect('DBI:mysql:[NAME OF DATABASE]', '[LOG IN NAME]', '[LOG IN PASSWORD]')
|| die "ERROR: $DBI::errstr";

Then you write the SQL statement:

$query = "[SQL QUERY (w/o trailing semicolon)]";

Here you prepare and execute:

$sth = $dbh->prepare($query);
$sth->execute();

From here, you use either fetchrow_array() or fetchall_arrayref().

Using fetchrow_array(), you’d set up a loop that catches each column of data as a separate variable. Here is an example using a MySQL table with 3 columns per row:

while ( @row = $sth->fetchrow_array) {
$variable1 = $row[0];
$variable2 = $row[1];
$variable3 = $row[2];
}

Finally, after you’ve completed your database calls, you should close the connection:

$sth->finish();
$dbh->disconnect();

(A working template for a Perl fetchrow_array-based program can be found here).

In the second approach, fetchall_arrayref() stores the entire set of results from the query in a single array reference, with one inner array per row. Then you extract data from the array.

As in the previous example, you prepare and execute the query the same way, but then you use fetchall_arrayref() to store the results in a Perl array reference. So:

$data = $sth->fetchall_arrayref();
$sth->finish;

foreach $row (@$data) {
# each $row is a reference to one row's array of column values
($variable1, $variable2, $variable3) = @$row;
print "$variable1\n";
print "$variable2\n";
print "$variable3\n";
}

Note that you finish “$sth” before the data is processed here. This allows you to make multiple calls before processing the data with Perl. Don’t forget to close the database connection, though (“$dbh->disconnect()”).

(A working template for a fetchall_arrayref()-based program can be found here).
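The same two styles exist in Python’s database API. As a rough analogy only (this is not the DBI code itself), here is a minimal sketch using Python’s built-in sqlite3 module with an in-memory SQLite database standing in for MySQL. The table, its columns, the sample rows and the function names are all hypothetical. fetchone() plays the role of fetchrow_array(), handling one row per call, while fetchall() plays the role of fetchall_arrayref(), returning every row at once so the cursor can be closed before processing.

```python
import sqlite3


def make_demo_db():
    """In-memory SQLite stand-in for the MySQL table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT, color TEXT, qty INTEGER)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [("apple", "red", 3), ("pear", "green", 5), ("plum", "purple", 2)],
    )
    return conn


def one_row_at_a_time(conn):
    """Like fetchrow_array(): handle each row as soon as it is fetched."""
    cur = conn.execute("SELECT name, color, qty FROM items ORDER BY name")
    names = []
    while True:
        row = cur.fetchone()  # returns None once the rows run out
        if row is None:
            break
        name, color, qty = row
        names.append(name)
    cur.close()
    return names


def all_rows_first(conn):
    """Like fetchall_arrayref(): pull the whole result set, then process it."""
    cur = conn.execute("SELECT name, color, qty FROM items ORDER BY name")
    rows = cur.fetchall()  # a list of tuples, one per row
    cur.close()  # the cursor is released before the rows are processed
    return [name for name, color, qty in rows]


conn = make_demo_db()
print(one_row_at_a_time(conn))  # ['apple', 'pear', 'plum']
print(all_rows_first(conn))     # ['apple', 'pear', 'plum']
conn.close()
```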

Material taken from the book…




All mistakes are my own, though.–Joab Jackson





Regex: Three quantifiers

February 6th, 2011

This is a blog post about how three regular expression metacharacters, namely ?, * and +, can help describe complex patterns. The differences between them are subtle, but useful.

When used in a regular expression, the ? metacharacter signifies that the character immediately preceding it is optional. For instance, the expression “Jeffre?y” would match either “Jeffry” or “Jeffrey.”

To make more than one character optional, the ? metacharacter can be attached to a parenthesized expression. For instance, the expression “Jeff(rey)?” would match either “Jeff” or “Jeffrey.”

The + quantifier looks for strings that have one or more instances of the character preceding it. For instance, “sto+p” will match “stop,” “stoop” and so on, because the expression looks for the string s-t-(one or more occurrences of o)-p.

The + metacharacter also works with parentheses, meaning, for instance, the search “(aei)+” would match the phrase “aeiaeiaei.”

Note that unlike ?, + needs to match at least once in order to return a result.

Serving as a middle ground between ? and + is the * metacharacter, which matches zero or more instances of the preceding character. This means that, like the ?, the character before it may or may not be there. And like the +, there may be multiple instances of the preceding character.

For instance, say you are looking for a string that may have no space, one space, or multiple spaces between two words. In other words, the phrase could be “Live Free” or “Live   Free” or perhaps “LiveFree.” You would use the * thusly in order to match any of those occurrences: “Live *Free”
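All three quantifiers can be tried out with Python’s standard re module. In this minimal sketch (the helper full_matches is hypothetical), re.fullmatch() requires the whole candidate string to match, which makes the difference between ?, + and * easy to see; in an ordinary search, a pattern can also match a piece of a longer string.

```python
import re


def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


# ? : the preceding character (or group) is optional
print(full_matches(r"Jeffre?y", ["Jeffry", "Jeffrey", "Jeffy"]))   # ['Jeffry', 'Jeffrey']
print(full_matches(r"Jeff(rey)?", ["Jeff", "Jeffrey", "Jeffre"]))  # ['Jeff', 'Jeffrey']

# + : one or more of the preceding character (or group)
print(full_matches(r"sto+p", ["stp", "stop", "stoop"]))            # ['stop', 'stoop']
print(full_matches(r"(aei)+", ["", "aei", "aeiaeiaei"]))           # ['aei', 'aeiaeiaei']

# * : zero or more of the preceding character
print(full_matches(r"Live *Free", ["LiveFree", "Live Free", "Live   Free"]))
# ['LiveFree', 'Live Free', 'Live   Free']
```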

Material taken from the book:




all mistakes are my own however…–Joab Jackson





Regex: Encompassing your needs with parentheses

February 2nd, 2011

While character classes can be used to sum up the possible variations within a single character position, the regular expression language also provides a way to look for alternative multi-character expressions, through the use of parentheses, (), together with the | symbol, a technique called alternation.

For instance, if you are looking, in a particular location, for either the word “train” or “bus” you would express that as “(train|bus).”

Alternation can also be used to match alternative word spellings. If you are looking for either the word “color” or “colour,” one way to build the expression would be “col(o|ou)r.”
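Both alternation examples can be run with Python’s re module. In this minimal sketch, the pattern names TRANSPORT and COLOR and the helper whole_matches are hypothetical. One detail worth knowing: when a pattern contains a capturing group, re.findall() returns only the text of the group, so the helper uses finditer() and group(0) to get each complete match.

```python
import re

TRANSPORT = re.compile(r"(train|bus)")
COLOR = re.compile(r"col(o|ou)r")


def whole_matches(regex, text):
    """Return every complete match, not just the parenthesized group."""
    return [m.group(0) for m in regex.finditer(text)]


print(whole_matches(TRANSPORT, "Take the bus, or the train."))  # ['bus', 'train']
print(whole_matches(COLOR, "color, colour, colr"))              # ['color', 'colour']
print(COLOR.findall("color, colour"))                           # ['o', 'ou']
```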

Material taken from the book:




all mistakes are my own however…–Joab Jackson





Regex: character classes bracket the possible

January 23rd, 2011

One of the ways in which regular expressions (regex) are more powerful than simple pattern matching filters is that the regex syntax offers a wide set of metacharacters that can be used to identify complex patterns.

For instance, regex uses a set of square brackets, [], to hold a character class, or a set of possible characters that could fit within a single position.

In other words, using a character class, you can match an expression that could have one of a number of characters in a given position.

For instance, the regex H[eu]llo World would match either Hello World or Hullo World.

Character classes have a range of metacharacters to help with advanced searching.

Within a character class, the - character represents a range of characters: <H[1-6]> would match <H1> through <H6>.

Ranges within character classes also work for letters, though they are case sensitive: [a-z] covers only lowercase letters, while [a-zA-Z] would match any letter, upper or lower case.

Character classes can consist of a combination of ranges and literal characters: [a-z7!].

Note, however, that each character class matches just one character position: [acquainted] will match any single one of the letters a, c, q, u, i, n, t, e or d, not the word acquainted itself.

You can also exclude characters, by placing ^ at the start of a character class: [^c] matches any single character that is not the letter c. s[^k] will highlight any instances where an “s” is followed by a character other than a “k,” and ignore those where it is followed by a “k” (such as “sky”).

The dot, “.”, is a placeholder. Outside a character class, it represents any character. For instance, if you are looking for a word with an unknown second character (“h7llo” or “hxllo”), you could use h.llo, which would match either of them. Inside brackets, as in h[.]llo, the dot loses its special meaning and matches only a literal period.

Keep in mind that regex metacharacters such as “^” and “-” have different meanings when they are placed inside character classes than when they are outside them.
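Each of these character-class rules can be checked with Python’s re module. This minimal sketch uses re.findall(), which returns every non-overlapping match in a string; the sample strings are made up for illustration.

```python
import re

# One position that may hold either "e" or "u".
print(re.findall(r"H[eu]llo World", "Hello World, Hullo World, Hallo World"))
# ['Hello World', 'Hullo World']

# A range inside the class.
print(re.findall(r"<H[1-6]>", "<H1> <H7> <H6>"))
# ['<H1>', '<H6>']

# A class matches one character at a time, never the whole word.
print(re.findall(r"[acquainted]", "cat dog"))
# ['c', 'a', 't', 'd']

# Negated class: an "s" followed by anything except "k".
print(re.findall(r"s[^k]", "sky sun ask"))
# ['su']

# Outside brackets the dot matches any character; inside, only a literal dot.
print(re.findall(r"h.llo", "h7llo hxllo h.llo"))    # ['h7llo', 'hxllo', 'h.llo']
print(re.findall(r"h[.]llo", "h7llo hxllo h.llo"))  # ['h.llo']
```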

Material taken from the book:




all mistakes are my own however…–Joab Jackson





Nailing down the Uncertainty Principle

January 17th, 2011

The Uncertainty Principle is not difficult to understand. It is not weirdly spooky and beyond explanation. It does not involve electrons copying themselves, appearing in two locations at once, or somehow sensing our presence.

The Uncertainty Principle simply states that, at quantum levels, you cannot measure something without disturbing what you are measuring.

I’m listening to a set of 1962 Caltech lectures given by famed physicist Richard Feynman (offered by Audible as a series of recordings called “The Feynman Lectures on Physics”), and he offers the definitive explanation of this phenomenon.

Werner Heisenberg introduced the Uncertainty Principle in 1927, as a way to explain why it is impossible to specify both the position and the momentum of a moving electron precisely at the same time.

The need to determine the coordinates of electrons came about as an attempt to better understand why particles, when assembled en masse, can act like waves, showing interference patterns.

Now, typically, wave interference patterns can be found when the output from two sources of wave-like energy, such as light, intersect. When the two sets of waves meet, they cancel each other out in some places and reinforce each other in others.

Typically, with two adjacent sources of streaming particles, you will not see an interference pattern, just two adjacent bands of streaming particles.

However, at the quantum level, electrons do produce an interference pattern. The famed double-slit experiment illustrates this, showing a wave pattern when electrons from a single source are shot through two adjacent slits.

Even if electrons are shot out one at a time, so that they don’t bump into one another, they still produce an interference pattern.

But, and this is the weirdest part, they produce this pattern only when they are not being observed. When the passage of each electron through the slits is recorded, the interference pattern disappears, and they go back to making non-interfering patterns.

While many explanations conclude that the electron somehow knows it is being observed, Feynman doesn’t go in for such a mystical explanation.

Instead, he talks about the effect that physical measurement has on the test. At these levels of precision, you cannot measure something without disturbing what you are measuring.

“Looking at electrons disturbs them. And the light waves we’re shining on them as they come through, it is like hitting them with a hammer,” he told the students. The photons are in effect hitting the electrons, changing their courses. “The light makes a big influence on the electrons.”

This interference isn’t seen when measuring large objects, such as baseballs or bullets, because the effect of the measurement is so small in relation to what is being measured. It only becomes apparent at the quantum level.

At first glance, one might have a number of solutions to beat this problem. One is turning down the intensity of the light used to observe the electrons, in effect reducing the number of observation photons. This approach reduces the number of electrons you detect, though.

Another idea: use light with longer wavelengths, whose photons give the electrons a gentler kick. This approach, however, reduces the ability to pinpoint the exact location of the electron.

“It is impossible to arrange the light in such a way that … you can tell which hole it went through, but at the same time … won’t disturb the pattern,” he said. “No one has ever designed a way to beat this game, to beat the uncertainty principle.”

As a result of the uncertainty principle, physicists must be careful to explain phenomena in such a way that they either account for the influence of the equipment measuring the phenomena, or make no statements about the phenomena in their unmeasured state.

“This is the logical tightrope on which we have to walk if we wish to describe nature successfully,” he said. “If you don’t look don’t say it has to do this or that; only when you look, and then it does.”

Instead, at this level of measurement, physicists use statistical probabilities as a measurement tool, namely probability amplitude. For an event that could occur more than one way, “Probability amplitude is the sum of each way it could occur, should it occur separately,” Feynman said.
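Feynman’s rule can be made concrete in a few lines of Python. This is a minimal sketch with hypothetical amplitude values, using the standard convention from the same lectures that the probability of an outcome is the squared magnitude of its amplitude. When the electron’s path is not determined, the amplitudes for the two slits are added before squaring, which produces interference; when the path is observed, the two probabilities are added instead, and the interference disappears. The function name two_slit_probabilities is hypothetical.

```python
import cmath
import math


def two_slit_probabilities(phi1, phi2):
    """Compare adding amplitudes with adding probabilities for two paths.

    phi1 and phi2 are complex amplitudes for "through slit 1" and
    "through slit 2"; a probability is |amplitude| squared.
    """
    unwatched = abs(phi1 + phi2) ** 2          # add amplitudes, then square
    watched = abs(phi1) ** 2 + abs(phi2) ** 2  # square each, then add
    return unwatched, watched


# Hypothetical amplitudes of equal size; only their relative phase differs.
for phase in (0.0, math.pi / 2, math.pi):
    phi1 = 0.5
    phi2 = 0.5 * cmath.exp(1j * phase)
    unwatched, watched = two_slit_probabilities(phi1, phi2)
    # Amplitudes added: about 1.00, 0.50, 0.00; probabilities added: 0.50 each.
    print(f"phase {phase:.2f}: amplitudes added {unwatched:.2f}, "
          f"probabilities added {watched:.2f}")
```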
–Joab Jackson

Some material taken from the following lecture…