Pixel Coordinates

Mathematics is remarkably efficient language for describing images and how to process them to computers, due to its high precision. As such, we're going to have to learn a bit ourselves, if not just to satisfy the machines.

Thankfully, most of image processing can be thought of using only two dimensions. Intuitively this makes sense: Images are flat.

'most' of?

Occasionally we'll add the concept of 'layers' - colors, bit depth - but many fundamentals are a 2D affair.

Conventions and lack thereof

In typical Cartesian coordinates we're used to 'right' being the positive x-direction and 'up' being the positive y-direction.

Unfortunately, not typically so for image processing. Even worse, it tends not to be terribly consistent. For instance, Woods and Gonzalez have the convention (that I will follow) that 'down' is x-positive and 'right' is y-positive. MATLAB and HTML's SVG use a 'flipped' version where 'down' is y-positive and 'right' is x-positive. The one consistency is that the top left corner is usually always considered the origin, (0,0)

Cartesian coordinates

MATLAB coordinates

Woods and Gonzalez coordinates

Why did I choose Woods and Gonzalez' version?

First, since that's what I'm learning from. Second, because I think of images like a `data.frame` in R, where you index by (row, column) from the top left. If I remember correctly, this is like MATLAB, but unlike Excel.

Don't be seduced by the grid shape of the coordinate system: Our pixels lay with their centers on the intersections of the lines, not neatly slotted between the grid lines.

One additional source of friction is that the top left pixel is at (0, 0), not (1, 1). As a side effect, the bottom right corner of an image that is 6 pixels long and 6 pixels wide will have the coordinates (5, 5), rather than (6, 6).

Put more mathematically (and more generally), an image whose top left pixel rests on the origin (0, 0), and is M pixels long and N pixels wide will have its bottom right corner at (M-1, N-1). From now on, we will often use M and N to denote image length and width.