Unix: Converting files between DOS and Unix

February 28th, 2010

Recently I found that, after uploading a file from a Windows computer to a Linux one and opening it from the command line, Ubuntu would notify me that it was converting the file from the DOS format.

Even if it was a standard text file (.txt) filled with ASCII characters, it still needed converting.

Why? Aren’t text files the same across different operating systems? Evidently not.

Unix handles end-of-line signifiers differently than Windows/DOS does, according to Sumitabha Das’s book “Your Unix”.

Specifically, DOS uses two characters in sequence, “\r” (for Carriage Return [CR], or simply “enter”) and “\n” (for Line Feed [LF]), to signify the end of a line.

Unix uses only one, namely LF.

These markers can both be seen by examining text files with Octal Dump.

Ubuntu, at any rate, seems to handle DOS text files easily in day-to-day operation. Nonetheless, most variants of Unix/Linux have a pair of utilities to convert files from the Windows/DOS format to the Unix format, and back again. They are called dos2unix and unix2dos, respectively.
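To make the difference concrete, here is a minimal sketch in Python of what such a conversion boils down to. It is not how dos2unix or unix2dos are actually implemented; the function names are my own, and it simply swaps the CR LF pair for a lone LF, and back.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace each DOS line ending (CR LF) with the Unix one (LF)."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """Replace each Unix line ending (LF) with the DOS one (CR LF)."""
    # Normalize first, so an existing CR LF pair does not become CR CR LF.
    return dos_to_unix(data).replace(b"\n", b"\r\n")


# A hypothetical two-line file as it would arrive from a Windows machine.
sample = b"first line\r\nsecond line\r\n"
print(dos_to_unix(sample))
```

Working on bytes rather than text keeps Python from quietly translating the line endings on its own.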


Taken from this book:


…as well as a class I’m taking on Unix. All mistakes are my own, however.–Joab Jackson

Unix: Decoding binary files with Octal Dump

February 22nd, 2010

In many cases with Unix/Linux, if you want to view a file, using the cat command works just fine. Typing “cat samplescript.txt” at the command line will reveal the content of that file.

Cat won’t work for binary files, because binary files contain non-printing (or non-ASCII) characters. Running cat on a binary program, such as sed, will only get you a screen full of gibberish, and may even scramble the terminal session itself.

(Storing programs as binary files is more efficient than storing them in ASCII, largely because binary programs use all eight bits in a byte [256 possible combinations], whereas ASCII uses only seven [128 combinations], leaving the eighth bit unused or, historically, available as a parity bit.)

What Octal Dump (od from the command line) does is display the contents of a binary file, including an executable file, as sets of octal numbers.

As the name suggests, the octal numbering system is a numbering system in base eight. When used with the “-bc” option, the od program renders each byte of the file in octal (-b), along with its character equivalent (-c).

For instance, running this command from the command line in the /bin directory of binary files:

od -bc sed

will return rows of three-digit octals, one per byte, each row preceded by a seven-digit number that is the offset, or position, of the first byte in the line. Below each octal is its conversion into an ASCII character, if the byte falls in the printable range (decimal 32 through 126). A few non-printing bytes, such as the newline, are shown as backslash escapes like \n instead.
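The layout described above can be imitated in a few lines of Python. This is only a rough sketch, assuming a simplified format: the function name is my own, it prints a dot for every non-printing byte rather than od's backslash escapes, and its spacing will not match real od output exactly.

```python
def octal_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of three-digit octals with characters below."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # One three-digit octal number per byte.
        octals = " ".join(f"{byte:03o}" for byte in chunk)
        # Printable ASCII (decimal 32 to 126) is shown as is, the rest as ".".
        chars = " ".join(
            f"{chr(byte):>3}" if 32 <= byte <= 126 else "  ."
            for byte in chunk
        )
        # Seven-digit offset, itself written in octal, leads each row.
        lines.append(f"{offset:07o} {octals}")
        lines.append(f"{'':7} {chars}")
    return "\n".join(lines)


print(octal_dump(b"Hi!\n"))
```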

As an aside, to convert from octal to decimal yourself, simply multiply each digit of the octal number by a successive power of eight, going from right to left. So, if the octal is 114, then you would calculate (1 * [8^2] + 1 * [8^1] + 4 * [8^0]), which equals (64 + 8 + 4), or 76.
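The same right-to-left calculation, written out as a short Python sketch (the function name is my own; Python's built-in int() with a base of 8 does the same job in one call):

```python
def octal_to_decimal(octal_digits: str) -> int:
    """Multiply each octal digit by a successive power of eight."""
    total = 0
    # reversed() walks the digits from right to left: 8^0, 8^1, 8^2, ...
    for power, digit in enumerate(reversed(octal_digits)):
        total += int(digit) * 8 ** power
    return total


print(octal_to_decimal("114"))  # 64 + 8 + 4
print(int("114", 8))            # the built-in equivalent
```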


Taken from this book:


…as well as a class I’m taking on Unix. All mistakes are my own, however.–Joab Jackson

Unix: Indexing files with inode

February 16th, 2010

In Unix, an inode is a data structure that holds information about a file, or set of data blocks. You can think of it as an index, or a collection of metadata about a file. It contains info such as the owner, the permissions, the date created and last modified, as well as the location of the data blocks that contain the information. It is kept on a disk in a separate location from the data blocks themselves.

“When users search for or access a file, the UNIX system searches through the inode table for the correct inode number. When the inode number is found, the command in question can access the inode and make the appropriate changes if applicable,” according to the online paper about inodes posted by IBM.

Each time a user creates a file, a corresponding inode is created. It is possible to run out of inode numbers. Typically, however, a disk will run out of space before it runs out of inode numbers, according to one instructional site. Although the number of inodes is typically set by the operating system, it can also be specified when the file system is set up.

By using numerical inode numbers as identifiers, the OS can have multiple file names, in different directories, point to the same file (this is called hard linking). Inodes are also handy during file system maintenance or recovery operations, such as fsck. fsck checks for lost inodes, or inodes with no pointers, and attempts to repair them.

One can use the “df” command to check how many inodes are used and how many remain free on a file system. For Ubuntu Linux, the command is “df -i”. To find the inode numbers of all the files in a directory, type “ls -i”.
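The same information is available from a program. The following is a small sketch in Python, assuming a Unix-like system (os.statvfs is not available on Windows) and a temporary directory that allows hard links; the function and file names are made up for the illustration.

```python
import os
import tempfile


def inode_demo():
    """Create a file plus a hard link and report what the inode says."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "original.txt")
        linked = os.path.join(folder, "hardlink.txt")

        with open(original, "w") as handle:
            handle.write("some data\n")

        # A hard link is a second file name pointing at the same inode.
        os.link(original, linked)

        first = os.stat(original)
        second = os.stat(linked)

        # File system totals, roughly what "df -i" reports.
        fs = os.statvfs(folder)
        print("total inodes:", fs.f_files, "free inodes:", fs.f_ffree)

        return first.st_ino, second.st_ino, first.st_nlink


ino_a, ino_b, link_count = inode_demo()
print("same inode:", ino_a == ino_b, "link count:", link_count)
```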

–Joab Jackson

Windows: Troubleshooting a non-working Hosts file

January 30th, 2010

What do you do when your Windows XP computer isn’t recognizing the Hosts file? Here are a few possible solutions.

Recently, I ran into this devil of a problem. I wanted to do some internal testing of a Web site, from a browser on a Windows XP machine. So I added an entry in the Hosts file on the XP machine that would redirect joabj.com to the internal IP address of the server (“192.168.0.33 joabj.com” in this case). (Typically, in WinXP, the Hosts file is located in the C:\WINDOWS\system32\drivers\etc folder.)

Yet the browser kept consulting the external DNS service first, and returning the wrong page (my cable modem page in this case).

Most infuriatingly, Windows command line tools (namely, ping and SSH) recognized the Hosts file entries, but the browser did not!! If I ping’ed the domain name entered into the Hosts file (“ping -a joabj.com” in this case), it pinged the correct IP number (“reply from 192.168.0.33:” etc…).

Surfing the Web, I came across a number of different solutions to this problem:

*Reboot: Not only rebooting the machine (duh!), but also emptying the browser caches and flushing the DNS cache (from the command line, type “ipconfig /flushdns”).

*Extra empty characters in the Hosts file: Evidently, Windows doesn’t like an empty space behind the entry, i.e. “192.168.0.33 joabj.com ” rather than “192.168.0.33 joabj.com” –make sure you don’t add in an empty space.

*Corrupt Hosts file: This could be the case even if it opens in Notepad o.k. Try replacing the existing Hosts file with a new one.

*Specify exact subdomain in Hosts file: This is the solution that ultimately worked for me, after trying all these other more complicated solutions, described below.

In a nutshell, if you plan on using the address “www.YOURDOMAIN.com” you should type “www.YOURDOMAIN.com” into the Hosts file, rather than just “YOURDOMAIN.com”.

So, for me, once I replaced “192.168.0.33 joabj.com” with “192.168.0.33 www.joabj.com” then using http://www.joabj.com worked fine, whereas before it wouldn’t.
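Why the exact name matters is easier to see if you treat the Hosts file as a plain lookup table. Below is a rough sketch in Python of that idea. It is not how the Windows resolver is actually implemented, and the function name is my own; it only shows that an entry for one name says nothing about a different name.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Build a name-to-address table from Hosts-file style text."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are names for it.
        address, *names = line.split()
        for name in names:
            mapping[name.lower()] = address
    return mapping


hosts = parse_hosts("192.168.0.33 joabj.com\n")
print(hosts.get("joabj.com"))      # the entry that was typed in
print(hosts.get("www.joabj.com"))  # a different name, so no match
```

Listing both names on one line (“192.168.0.33 joabj.com www.joabj.com”) would cover either form of the address.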

*Editor adding extension to Hosts file name: Sometimes a text editor will add .txt to the file name during a save, making it Hosts.txt rather than just Hosts. Windows then won’t recognize the Hosts file, and Explorer, by default, won’t show the suffixes of file names, so the problem is easy to miss.

If perusing from Explorer, set the folder view options to show suffixes. From Explorer, go Tools–>Folder Options–>View and uncheck “Hide extensions for known file types.” If Hosts is a .txt, remove the .txt from the file name.

*XP’s DNS Cache service taking priority: One troubleshooting site suggested this as a probable cause. It didn’t make any difference in my case.

To disable this service, go Start–>Control Panel–>Administrative Tools–>Component Services–>Services(Local). Then look for the DNS Client service (its internal service name is Dnscache) and disable it. You could just stop the service to check if it has any effect, though it will start up again on reboot. The Manual setting just means that the service will start up once a browser is fired up. The Disabled option turns it off altogether, until you turn it back on again.

According to other people who’ve tried this, disabling DNS Cache should have no ill-effect on your DNSing.

*Reorder the DNS lookup sequence: Typically, Windows XP will consult the local Hosts file before checking with a DNS server to resolve a domain name. But, sometimes not.

You change the order of the lookup in the registry. (STANDARD DISCLAIMER: Do not mess w/ the registry unless you know what you are doing.)

To fire up the registry editor, do Start–>Run and put “regedit” in the box.

Once in regedit, go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\ServiceProvider . Once there, you will see a number of entries, including “DnsPriority” “HostsPriority” “LocalPriority” and “NetbtPriority” –which are the entries for DNS-based lookup, Hosts file-based lookup, Computer-based lookup and NetBios-based look-up. (More info on that here).

In the data column for each you see a number in parentheses. This number is the priority for that lookup. The lower the number, the earlier in the domain name resolution sequence it is consulted (evidently the range is between -32768 and 32767). If the DNS number is lower than the Hosts number, then DNS is consulted first, so you want to give Hosts a lower number than DNS.

So if DnsPriority is 5000 and HostsPriority is 7000, you may want to change HostsPriority to, say, 4500.

Keep in mind that when you set the number, by right clicking on the entry and choosing “modify,” you can’t just type the number in as is: you will have to enter the new number in hexadecimal, unless you first switch the dialog’s base setting to decimal.

One easy way to convert a number into hexadecimal is to call up Windows calculator, switch the view from standard to scientific, then enter the number into the field for entering a number. After the number is entered, look for where “dec” is selected on top of the calculator, and switch that to “hex.”
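If a Python prompt is closer to hand than the calculator, the same conversion is a one-liner. A minimal sketch, using the example value of 4500 from above (the function name is my own):

```python
def to_registry_hex(value: int) -> str:
    """Return the hexadecimal digits for a decimal priority value."""
    return format(value, "x")


print(to_registry_hex(4500))  # hexadecimal digits to type into regedit
print(int("1194", 16))        # and back to decimal, as a check
```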

–Joab Jackson

Functional dependencies: How databases relate

December 8th, 2009

“The single most important concept in relational schema design theory is that of a functional dependency,” write Ramez Elmasri and Shamkant Navathe in “Fundamentals of Database Systems.”

But what is a functional dependency? It is a constraint between two sets of attributes within a table: whenever two rows agree on the first set, they must also agree on the second. Functional dependencies are the actual relationships in the relational database, and they hold permanently, for every state the table can legally be in.

For instance, a table may have two attributes, or columns. If one is the primary key, we can then say we can always determine the value of the second attribute using the primary key. This means we can use the primary key as an index to look up the second attribute. It is a mathematical certainty. The primary key is the determinant and the other is the dependent, in the parlance of database-speak.

(Keep in mind that this relationship does not work in reverse. You can not use a dependent value to definitively determine the primary key, chiefly because the dependent value may not be unique in a given table).
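One way to get a feel for this is to test a dependency against some sample rows. The sketch below, in Python, uses a made-up employee table and a function name of my own. Keep in mind that a functional dependency is a rule about the table design, so sample data can only show that a dependency is violated, never prove that it always holds.

```python
def holds(rows, determinant, dependent):
    """Check that rows agreeing on the determinant agree on the dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        value = tuple(row[name] for name in dependent)
        # setdefault stores the first value seen for this key; any later
        # row with the same key must carry the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical data: emp_id is the primary key.
employees = [
    {"emp_id": 1, "dept": "Sales"},
    {"emp_id": 2, "dept": "Sales"},
    {"emp_id": 3, "dept": "IT"},
]

print(holds(employees, ["emp_id"], ["dept"]))  # key determines dept
print(holds(employees, ["dept"], ["emp_id"]))  # but not the reverse
```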

Beyond the simple connection drawn between the primary key and the dependents, a number of other inferences can be made as well, using Armstrong’s inference rules, which formalize laws of logic such as reflexivity, augmentation and transitivity.

All material taken from a class I’m attending at UMUC on relational databases, as well as from the book….



…All mistakes are my own, though..
–Joab Jackson

SQL: Constraining the Database

October 30th, 2009

Constraining what gets entered into a relational database is a good thing. It maintains the data integrity that is so important for database use. You don’t want users to enter the wrong types of information (e.g., letters instead of numbers).

My database class teaches that there are three different ways to constrain input for a database:

1. Declarative Integrity
2. Procedural Application Code
3. Business Procedure

Declarative Integrity means you put constraints directly in the table design. The constraints specify what input the database accepts. More on them below.

When you constrain by using Procedural Application Code, you put the constraints not in the database tables, but in the programming logic that handles the input of the data (PL/SQL), with mechanisms such as “triggers.”

Business Procedures are for those cases when you cannot (easily) check the integrity of the data by computer, so you put the rules in the employee handbook (“Enter your real birth date, not a false birth date” would be a silly example).


Here are some basics on Declarative Integrity. When you create a data table, you can add in constraints on what data is accepted. This can be done as part of a column definition, or at the end of the “create table” statement.

Here are the basic types of Integrity Constraints you can use, at least for Oracle databases:

NOT NULL: When an insert is made, a column defined as NOT NULL must be given some data. The NOT NULL declaration goes right after the data type. For instance, when creating the column with the DATE data type, you would add:

[NameOfColumn] DATE NOT NULL,

UNIQUE: UNIQUE requires each new value entered into that column to be different from all the values entered before.

PRIMARY KEY: The PRIMARY KEY is the one column that identifies each row from all the others. As such it is considered UNIQUE, meaning each value entered will be different from all the other values entered. However, you don’t use the UNIQUE qualifier when declaring the PRIMARY KEY (it is implied). Here is an example:

[NameOfColumn] [DataType] PRIMARY KEY,

The PRIMARY KEY can be a composite of multiple columns, which means each key must be composed of a unique combination of values from the participating columns. The composite key is defined at the end of the table creation statement, after the last column definition:

constraint [Name_Of_Composite_Primary_Key] PRIMARY KEY ([Name_of_1st_Participating_column], [Name_of_2nd_Participating_column], [...] )

FOREIGN KEY: A FOREIGN KEY uses as its domain of possible values a PRIMARY KEY from another table. It is written as a constraint clause that names the local column and the column it refers to:

constraint [Name_Of_Foreign_Key] foreign key ([Name_of_Local_Column])

references [Name_Of_External_Table]([Name_of_External_Column])

Note, you can not refer to a table in another database, only to another table in the same database. But you can refer to the primary key even in the same table.

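Adding “on delete cascade” to the references clause tells the database to delete the dependent row when the corresponding row in the parent table is deleted. Without it, the database refuses to delete a parent row that still has dependent rows.

CHECK: The CHECK constraint allows you to specify that only certain values can be inserted, as such: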

[Name_Of_Column] [Datatype] Check([Name_Of_Column] [operator] [value]),

For example,

Stats NUMBER(2) CHECK (Stats >= 0)

….means that any values entered for the Stats column must be 0 or higher.
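To watch these constraints reject bad input, here is a small sketch in Python. It uses SQLite (through Python's built-in sqlite3 module) rather than Oracle, so the data types differ and foreign keys have to be switched on explicitly; the table names, column names and helper functions are all made up for the illustration.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create two hypothetical tables that use each kind of constraint."""
    db = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute(
        "CREATE TABLE teams ("
        " team_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    db.execute(
        "CREATE TABLE players ("
        " player_id INTEGER PRIMARY KEY,"
        " team_id INTEGER NOT NULL,"
        " stats INTEGER CHECK (stats >= 0),"
        " CONSTRAINT fk_team FOREIGN KEY (team_id)"
        " REFERENCES teams (team_id))"
    )
    db.execute("INSERT INTO teams VALUES (1, 'Orioles')")
    return db


def rejected(db: sqlite3.Connection, statement: str) -> bool:
    """Return True if the database refuses the statement."""
    try:
        db.execute(statement)
    except sqlite3.IntegrityError:
        return True
    return False


db = build_demo()
print(rejected(db, "INSERT INTO players VALUES (1, 1, -5)"))  # CHECK
print(rejected(db, "INSERT INTO players VALUES (2, 99, 3)"))  # FOREIGN KEY
print(rejected(db, "INSERT INTO players VALUES (3, 1, 10)"))  # valid row
```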

Material taken from this book….



…As well as from a class I’m taking on database design. All mistakes are my own…–Joab Jackson

PHP: Post the results of a simple MySql Query on a Web Page

October 24th, 2009

Say you want to post the results of a simple query from a MySQL database on a Web page, using PHP. You would think that all you’d need to do is assign a variable name to the results of the SQL query, and then ask PHP to print the variable.

It doesn’t work that way. Instead of PHP printing the result, what gets printed is a mysterious message, like “Resource id #3”.

As explained here, the variable itself points to a place holder of sorts. To get the actual value, you have to use another MySQL function.

In this case that function would be mysql_fetch_row.

For example, within the PHP body of code, you do something like this:

$QueryResult = mysql_query("select avg(Height) from Boys");

$ResultInBetweenStep = mysql_fetch_row($QueryResult);

$ResultPresent = $ResultInBetweenStep[0];

echo($ResultPresent);

In the code above, we’re getting the result of a query from a table called Boys that is the average of all the entries in the Height column. It is assigned to the variable $QueryResult.

In order to get the actual data from the query, the function mysql_fetch_row is applied to $QueryResult, and the results are stored in another variable, $ResultInBetweenStep.

The final step is to assign a variable to the first element of the $ResultInBetweenStep array only (which holds the first, and here the *only*, column of the row, as the average function will return a single number), which, here, is called $ResultPresent.

$ResultPresent can then be printed.
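The same handle-then-fetch pattern turns up in other database interfaces too. As a point of comparison only, here is a sketch in Python using its built-in sqlite3 module and SQLite instead of PHP and MySQL. The Boys table and Height column follow the example above; the function name and the sample heights are made up.

```python
import sqlite3


def average_height(heights):
    """Run the avg() query and dig the single number out of the result."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Boys (Height REAL)")
    db.executemany("INSERT INTO Boys VALUES (?)", [(h,) for h in heights])

    # Like $QueryResult: a handle on the result, not the value itself.
    query_result = db.execute("SELECT avg(Height) FROM Boys")

    # Like mysql_fetch_row: pull one row out of the result.
    result_in_between_step = query_result.fetchone()

    # The first (and only) column of that row is the number we want.
    return result_in_between_step[0]


print(average_height([60, 70]))
```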

There are other MySql functions that allow you to extract more complex bits of information from a MySQL query. Check the mysql_fetch_* entries here for more info.–Joab Jackson

Web: HTTP, the King of URIs

October 15th, 2009

The Web is based on Universal Resource Identifiers, or URI. “HTTP” is a URI. It is the most popular one, in fact. “FTP” is a URI. There is no centralized organization that manages “official” URIs. There is an unspoken assumption that URIs are not created lightly. Generalizing, the URI specification defines a space in which resources can be organized by some means. “The principle that anything, absolutely anything, ‘on the Web’ should identified distinctly by an otherwise opaque string of characters,” according to the document.

The resources can be formally defined in relation to one another. Or not.

HTTP has proved to be the most popular URI. It is a protocol for getting stuff, hence the name (Hypertext Transfer Protocol). It makes no assumptions about what you want to get, be it an HTML Web page, PDF, Word document, etc. HTML, the most used format it serves on the Web, has “no special place architecturally” within the Web architecture. It is just another format. The format of the data is notated by its MIME type.
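Both pieces, the “http” part of an address and the MIME type of what it points to, can be picked apart with Python's standard library. This is only an illustrative sketch: the function name and the example.com addresses are made up, and the MIME type here is guessed from the file extension, whereas on the real Web the server announces it when it answers the request.

```python
import mimetypes
from urllib.parse import urlsplit


def describe(uri: str):
    """Return the scheme of a URI and a guess at its MIME type."""
    parts = urlsplit(uri)
    # guess_type only looks at the file extension in the path.
    mime_type, _encoding = mimetypes.guess_type(parts.path)
    return parts.scheme, mime_type


print(describe("http://www.example.com/index.html"))
print(describe("ftp://ftp.example.com/report.pdf"))
```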

The intro also states an interesting, though, thus far in my mind, unconnected factoid: That HTTP runs up against some compatibility problems with distributed object-oriented systems developed by the software development community (As opposed to the Web developers), namely CORBA, DCOM and RMI. Remote Procedure Calls could be a bridge: RDF can work with RPC, namely by considering RPC a structured document.

Taken from a W3C document on design issues for the Web, notably this overview section.

–Joab Jackson

Databases: Further Defining Entities

October 11th, 2009

In a previous post, I sketched out the basic structure of a database model. The basic elements are entities, relationships and attributes. This post will describe the further definitions that can be made under this model, using the Enhanced Entity-Relationship (EER) model.

First, we can specify different types of entities a bit further, into subclasses and superclasses. Subclasses are groupings within a set of entities, clustered by the different roles they play within the category of the entity. If “employee” is an entity, then “technician” and “manager” could be two different subclasses. Inversely, “employee” is a superclass of “technician” and “manager.”

Defining subclasses is a process called specialization. The employee entity can be broken into specialization subclasses, such as “technician” or “secretary.” Some attributes can only be applied to certain subclasses (a secretary’s typing speed, for instance). These are called specific attributes or local attributes.

In some cases, inclusion into a subclass can be determined by the existence of a certain attribute. These subclasses are called predicate-defined subclasses. Membership is defined by a particular value of an attribute. (Subclasses can also be defined without any particular attribute, though membership must then be assigned manually. Those are called user-defined subclasses.)

The reverse operation of specialization, namely summarizing a set of subclasses into a master entity, is called generalization.
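Programmers may recognize this as close kin to class inheritance. The sketch below, in Python, is just an analogy and not part of the EER model itself. It uses the employee example from above; the job_type attribute, the grade attribute and the function name are made up to show a predicate-defined subclass, where the value of one attribute decides membership.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """The superclass: attributes every employee has."""
    name: str
    job_type: str


@dataclass
class Secretary(Employee):
    """A subclass with a specific (local) attribute."""
    typing_speed: int = 0


@dataclass
class Technician(Employee):
    """Another subclass; grade is a hypothetical local attribute."""
    grade: str = ""


# Predicate-defined: the job_type value determines the subclass.
SUBCLASS_BY_JOB_TYPE = {"secretary": Secretary, "technician": Technician}


def specialize(name: str, job_type: str) -> Employee:
    """Place an employee in the subclass its job_type calls for."""
    subclass = SUBCLASS_BY_JOB_TYPE.get(job_type, Employee)
    return subclass(name=name, job_type=job_type)


print(specialize("Ann", "secretary"))
print(specialize("Bob", "manager"))
```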

All material taken from a class I’m attending at UMUC on relational databases, as well as from the book….



…All mistakes are my own, though..
–Joab Jackson

Databases: Entities, Relationships and the Attributes That Describe Them

October 4th, 2009

At their most basic, relational databases are composed of three kinds of elements: entities, relationships and attributes. Relationships tie together different entities, and both can have attributes.

This rule-of-thumb comes from the Entity-Relationship (ER) model, which sets the formal rules for how different entities relate to one another, at an abstract level.

An entity can be anything, or, to be more precise, any thing. An entity can be a person, or a type of motorbike, or a particular motorbike. An entity’s attributes are a set of properties used to describe the entity. A person may have a first name, last name and birthdate, attributes all.

Attributes can be single-valued or have multiple values (the names of the person’s siblings, for instance). An attribute can be a composite of multiple values: A person’s address is itself an attribute. It could be composed of a street address, city name, zip code, and other elements.

Attributes may also be derived, meaning that they don’t actually reside anywhere on the database, but can be calculated when needed, e.g. how many people live within a certain zip code.

Also, in many cases, one of the attributes will be what is known as a key attribute. This attribute must have a unique value for each and every entity kept in the database. The key attribute allows a query to pick out an individual entry, among all the entries. A person’s social security number, for instance, can work as a primary key, because every person’s SSN is different.

Relationships are used to tie together different entities, explaining the relation between the two. A PERSON (one entity) WORKS FOR (a relationship) a COMPANY (another entity).

Relationships can have different levels of cardinality, meaning that one entity can have one or more than one relationships with other entities. A PERSON can work for more than one COMPANY. And a COMPANY can employ more than one person. Such rules are usually defined in the database model.

Relationships, like entities, can have attributes. In the example above, the WORKS FOR relationship can have an attribute such as DATE STARTED.
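To tie the pieces together, here is a small sketch in Python of the PERSON WORKS FOR COMPANY example. It is only an illustration of the vocabulary, not a database design: the class and function names, the company names and the sample SSN are all made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """An entity; ssn plays the part of the key attribute."""
    ssn: str
    first_name: str
    last_name: str


@dataclass(frozen=True)
class Company:
    """Another entity."""
    name: str


@dataclass(frozen=True)
class WorksFor:
    """A relationship between two entities, with its own attribute."""
    person: Person
    company: Company
    date_started: date


def employers_of(person, relationships):
    """A person can work for more than one company."""
    return [r.company.name for r in relationships if r.person == person]


pat = Person("000-00-0000", "Pat", "Smith")
jobs = [
    WorksFor(pat, Company("Acme"), date(2008, 1, 7)),
    WorksFor(pat, Company("Globex"), date(2009, 6, 1)),
]
print(employers_of(pat, jobs))
```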

All material taken from a class I’m attending at UMUC on relational databases, as well as from the book….



…All mistakes are my own, though..
–Joab Jackson