
linear algebra

Original author: 
The Physicist

Physicist: This is a question that comes up a lot when you’re first studying linear algebra.  The determinant has a lot of tremendously useful properties, but it’s a weird operation.  You start with a matrix, take one number from every column and multiply them together, then do that in every possible combination, and half of the time you subtract, and there doesn’t seem to be any rhyme or reason why.  This particular math post will be a little math heavy.

If you have a matrix, {\bf M} = \left(\begin{array}{cccc}a_{11} & a_{21} & \cdots & a_{n1} \\a_{12} & a_{22} & \cdots & a_{n2} \\\vdots & \vdots & \ddots & \vdots \\a_{1n} & a_{2n} & \cdots & a_{nn}\end{array}\right) (so a_{ij} is the entry in column i, row j), then the determinant is det({\bf M}) = \sum_{\vec{p}}\sigma(\vec{p}) a_{1p_1}a_{2p_2}\cdots a_{np_n}, where \vec{p} = (p_1, p_2, \cdots, p_n) is a rearrangement of the numbers 1 through n, and \sigma(\vec{p}) is the “signature” or “parity” of that arrangement.  The signature is (-1)^k, where k is the number of times that pairs of numbers in \vec{p} have to be switched to get to \vec{p} = (1,2,\cdots,n).

For example, if {\bf M} = \left(\begin{array}{ccc}a_{11} & a_{21} & a_{31} \\a_{12} & a_{22} & a_{32} \\a_{13} & a_{23} & a_{33} \\\end{array}\right) = \left(\begin{array}{ccc}4 & 2 & 1 \\2 & 7 & 3 \\5 & 2 & 2 \\\end{array}\right), then

\begin{array}{ll}det({\bf M}) \\= \sum_{\vec{p}}\sigma(\vec{p}) a_{1p_1}a_{2p_2}a_{3p_3} \\=\left\{\begin{array}{ll}\sigma(1,2,3)a_{11}a_{22}a_{33}+\sigma(1,3,2)a_{11}a_{23}a_{32}+\sigma(2,1,3)a_{12}a_{21}a_{33}\\+\sigma(2,3,1)a_{12}a_{23}a_{31}+\sigma(3,1,2)a_{13}a_{21}a_{32}+\sigma(3,2,1)a_{13}a_{22}a_{31}\end{array}\right.\\=a_{11}a_{22}a_{33}-a_{11}a_{23}a_{32}-a_{12}a_{21}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}-a_{13}a_{22}a_{31}\\= 4 \cdot 7 \cdot 2 - 4 \cdot 2 \cdot 3 - 2 \cdot 2 \cdot 2 +2 \cdot 2 \cdot 1 + 5 \cdot 2 \cdot 3 - 5 \cdot 7 \cdot 1\\=23\end{array}
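A direct transcription of the formula into R reproduces the worked example. The helper names permutations(), perm_sign(), and leibniz_det() are made up for this sketch; only det() is base R. It visits all n! permutations, so it is only meant for checking small examples.

```r
# All permutations of a vector, built recursively.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

# Signature (-1)^k, where k counts the pairwise swaps needed to sort p.
perm_sign <- function(p) {
  k <- 0
  for (i in seq_along(p)) {
    while (p[i] != i) {            # send the value p[i] to its home slot
      j <- p[i]
      p[c(i, j)] <- p[c(j, i)]
      k <- k + 1
    }
  }
  (-1)^k
}

# Following the post, a_{ij} sits in column i, row j, so a_{i p_i} is
# M[p[i], i] in R's [row, column] indexing.
leibniz_det <- function(M) {
  n <- ncol(M)
  total <- 0
  for (p in permutations(seq_len(n))) {
    total <- total + perm_sign(p) * prod(M[cbind(p, seq_len(n))])
  }
  total
}

M <- matrix(c(4, 2, 1,
              2, 7, 3,
              5, 2, 2), nrow = 3, byrow = TRUE)
leibniz_det(M)   # 23
det(M)           # 23 as well (computed via LU, so up to rounding)
```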

Turns out (and this is the answer to the question) that the determinant of a matrix can be thought of as the volume of the parallelepiped created by the vectors that are columns of that matrix.  In the last example, these vectors are \vec{v}_1 = \left(\begin{array}{c}4\\2\\5\end{array}\right), \vec{v}_2 = \left(\begin{array}{c}2\\7\\2\end{array}\right), and \vec{v}_3 = \left(\begin{array}{c}1\\3\\2\end{array}\right).


The parallelepiped created by the vectors a, b, and c.

Say the volume of the parallelepiped created by \vec{v}_1, \cdots,\vec{v}_n is given by D\left(\vec{v}_1, \cdots, \vec{v}_n\right).  Here come some properties:

1) D\left(\vec{v}_1, \cdots, \vec{v}_n\right)=0, if any pair of the vectors are the same, because that corresponds to the parallelepiped being flat.

2) D\left(a\vec{v}_1,\cdots, \vec{v}_n\right)=aD\left(\vec{v}_1,\cdots,\vec{v}_n\right), which is just a fancy math way of saying that doubling the length of any of the sides doubles the volume.  This works the same for all of the vectors in D.

3) D\left(\vec{v}_1+\vec{w},\cdots, \vec{v}_n\right) = D\left(\vec{v}_1,\cdots, \vec{v}_n\right) + D\left(\vec{w},\cdots, \vec{v}_n\right).  Together with property 2, this is what it means for the determinant to be “linear” in each column.  This also works the same for all of the vectors in D.

Check this out!  By using these properties we can see that switching two vectors in the determinant swaps the sign.

\begin{array}{ll}    D\left(\vec{v}_1,\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right)\\    =D\left(\vec{v}_1,\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right)+D\left(\vec{v}_1,\vec{v}_1, \vec{v}_3\cdots, \vec{v}_n\right) & \textrm{Prop. 1}\\    =D\left(\vec{v}_1,\vec{v}_1+\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right) & \textrm{Prop. 3} \\    =D\left(\vec{v}_1,\vec{v}_1+\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right)-D\left(\vec{v}_1+\vec{v}_2,\vec{v}_1+\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right) & \textrm{Prop. 1} \\    =D\left(-\vec{v}_2,\vec{v}_1+\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right) & \textrm{Prop. 3} \\    =-D\left(\vec{v}_2,\vec{v}_1+\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right) & \textrm{Prop. 2} \\    =-D\left(\vec{v}_2,\vec{v}_1, \vec{v}_3\cdots, \vec{v}_n\right)-D\left(\vec{v}_2,\vec{v}_2, \vec{v}_3\cdots, \vec{v}_n\right) & \textrm{Prop. 3} \\    =-D\left(\vec{v}_2,\vec{v}_1, \vec{v}_3\cdots, \vec{v}_n\right) & \textrm{Prop. 1}    \end{array}

4) D\left(\vec{v}_1,\vec{v}_2, \vec{v}_3, \cdots, \vec{v}_n\right)=-D\left(\vec{v}_2,\vec{v}_1, \vec{v}_3, \cdots, \vec{v}_n\right), so switching two of the vectors flips the sign.  This is true for any pair of vectors in D.  Another way to think about this property is to say that when you exchange two directions you turn the parallelepiped inside-out.  It also means that D is really a signed volume: its absolute value is the ordinary volume, and its sign records the orientation of the vectors.

Next, if \vec{e}_1 = \left(\begin{array}{c}1\\0\\\vdots\\0\end{array}\right), \vec{e}_2 = \left(\begin{array}{c}0\\1\\\vdots\\0\end{array}\right), … \vec{e}_n = \left(\begin{array}{c}0\\0\\\vdots\\1\end{array}\right), then

5) D\left(\vec{e}_1,\vec{e}_2, \vec{e}_3\cdots, \vec{e}_n\right) = 1, because a 1 by 1 by 1 by … box has a volume of 1.
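Properties 1 through 5 can be spot-checked numerically with base R's det(), using the columns of the worked example. The function vol() is a hypothetical helper standing in for the post's D, and w is an arbitrary extra vector chosen for the additivity check; each line should print TRUE.

```r
# vol() plays the role of D: the determinant of the matrix whose columns
# are the given vectors.
vol <- function(...) det(cbind(...))
approx_equal <- function(x, y) abs(x - y) < 1e-9   # det() works in floating point

v1 <- c(4, 2, 5); v2 <- c(2, 7, 2); v3 <- c(1, 3, 2)   # columns of the worked example
w  <- c(1, -1, 3)                                       # arbitrary extra vector

approx_equal(vol(v1, v1, v3), 0)                                    # 1) repeated vector gives 0
approx_equal(vol(2 * v1, v2, v3), 2 * vol(v1, v2, v3))              # 2) scaling one vector
approx_equal(vol(v1 + w, v2, v3), vol(v1, v2, v3) + vol(w, v2, v3)) # 3) additivity
approx_equal(vol(v2, v1, v3), -vol(v1, v2, v3))                     # 4) a swap flips the sign
approx_equal(vol(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1)), 1)            # 5) unit cube
```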

Also notice that, for example, \vec{v}_2 = \left(\begin{array}{c}v_{21}\\v_{22}\\\vdots\\v_{2n}\end{array}\right) = \left(\begin{array}{c}v_{21}\\0\\\vdots\\0\end{array}\right)+\left(\begin{array}{c}0\\v_{22}\\\vdots\\0\end{array}\right)+\cdots+\left(\begin{array}{c}0\\0\\\vdots\\v_{2n}\end{array}\right) = v_{21}\vec{e}_1+v_{22}\vec{e}_2+\cdots+v_{2n}\vec{e}_n

Finally, with all of that math in place,

\begin{array}{ll}  D\left(\vec{v}_1,\vec{v}_2, \cdots, \vec{v}_n\right) \\  = D\left(v_{11}\vec{e}_1+v_{12}\vec{e}_2+\cdots+v_{1n}\vec{e}_n,\vec{v}_2, \cdots, \vec{v}_n\right) \\  = D\left(v_{11}\vec{e}_1,\vec{v}_2, \cdots, \vec{v}_n\right) + D\left(v_{12}\vec{e}_2,\vec{v}_2, \cdots, \vec{v}_n\right) + \cdots + D\left(v_{1n}\vec{e}_n,\vec{v}_2, \cdots, \vec{v}_n\right) \\= v_{11}D\left(\vec{e}_1,\vec{v}_2, \cdots, \vec{v}_n\right) + v_{12}D\left(\vec{e}_2,\vec{v}_2, \cdots, \vec{v}_n\right) + \cdots + v_{1n}D\left(\vec{e}_n,\vec{v}_2, \cdots, \vec{v}_n\right) \\    =\sum_{j=1}^n v_{1j}D\left(\vec{e}_j,\vec{v}_2, \cdots, \vec{v}_n\right)  \end{array}

Doing the same thing to the second vector in D,

=\sum_{j=1}^n\sum_{k=1}^n v_{1j}v_{2k}D\left(\vec{e}_j,\vec{e}_k, \vec{v}_3, \cdots, \vec{v}_n\right)

The same thing can be done to all of the vectors in D.  But rather than writing n different summations we can write, =\sum_{\vec{p}}\, v_{1p_1}v_{2p_2}\cdots v_{np_n}D\left(\vec{e}_{p_1},\vec{e}_{p_2}, \cdots, \vec{e}_{p_n}\right), where each entry p_i of \vec{p} = \left(\begin{array}{c}p_1\\p_2\\\vdots\\p_n\end{array}\right) independently runs from 1 to n, so there are n^n terms in all.

When any two of the \vec{e}’s left in D are the same, D=0 (property 1).  This means that the only non-zero terms left in the summation are the rearrangements (permutations), where the elements of \vec{p} are each a number from 1 to n, with no repeats.

All but one of the D\left(\vec{e}_{p_1},\vec{e}_{p_2}, \cdots, \vec{e}_{p_n}\right) will be in a weird order.  Every time you switch two of the vectors in D the sign flips (property 4), so the overall sign is given by the signature, \sigma(\vec{p}).  So, D\left(\vec{e}_{p_1},\vec{e}_{p_2}, \cdots, \vec{e}_{p_n}\right) = \sigma(\vec{p})D\left(\vec{e}_{1},\vec{e}_{2}, \cdots, \vec{e}_{n}\right), with \sigma(\vec{p})=(-1)^k, where k is the number of times that the e’s have to be switched to get to D(\vec{e}_1, \cdots,\vec{e}_n).

So,

\begin{array}{ll}    det({\bf M})\\    = D\left(\vec{v}_{1},\vec{v}_{2}, \cdots, \vec{v}_{n}\right)\\    =\sum_{\vec{p}}\, v_{1p_1}v_{2p_2}\cdots v_{np_n}D\left(\vec{e}_{p_1},\vec{e}_{p_2}, \cdots, \vec{e}_{p_n}\right) \\    =\sum_{\vec{p}}\, v_{1p_1}v_{2p_2}\cdots v_{np_n}\sigma(\vec{p})D\left(\vec{e}_{1},\vec{e}_{2}, \cdots, \vec{e}_{n}\right) \\    =\sum_{\vec{p}}\, \sigma(\vec{p})v_{1p_1}v_{2p_2}\cdots v_{np_n}    \end{array}

Which is exactly the definition of the determinant!  The other uses for the determinant, from finding eigenvectors and eigenvalues, to determining whether a set of vectors is linearly independent, to handling the coordinates in complicated integrals, all come from defining the determinant as the volume of the parallelepiped created from the columns of the matrix.  It’s just not always obvious how.
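Two steps of that argument are easy to check numerically for n = 3: only n! of the n^n expanded terms survive property 1, and each surviving D(e_{p1}, ..., e_{pn}) equals ±1 with the sign of the permutation. This R sketch reads the sign off by counting inversions (pairs i < j with p[i] > p[j]) rather than swaps; the two counts always have the same parity, so they give the same σ(p). The helper inversions() is made up for the example.

```r
n <- 3

# Expanding every column into basis vectors gives n^n index tuples
# (p1, ..., pn); property 1 removes every tuple with a repeated index.
tuples <- expand.grid(rep(list(seq_len(n)), n))
nrow(tuples)                                             # 27 = 3^3
survivors <- tuples[apply(tuples, 1, function(p) !anyDuplicated(p)), ]
nrow(survivors)                                          # 6 = 3!

# Number of pairs i < j that are out of order in p.
inversions <- function(p) {
  sum(outer(seq_along(p), seq_along(p), "<") & outer(p, p, ">"))
}

# D(e_{p1}, ..., e_{pn}) is the determinant of the identity matrix with its
# columns rearranged by p; print it next to (-1)^(inversions).
for (r in seq_len(nrow(survivors))) {
  p <- unlist(survivors[r, ])
  d <- det(diag(n)[, p])
  cat(paste(p, collapse = ","), " D =", round(d), " sign =", (-1)^inversions(p), "\n")
}
```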

For example: The determinant of the matrix {\bf M} = \left(\begin{array}{cc}2&3\\1&5\end{array}\right) is the same as the area of this parallelogram, by definition.

The parallelepiped (in this case a 2-d parallelogram) created by (2,1) and (3,5).


Using the tricks defined in the post:

\begin{array}{ll}  D\left(\left(\begin{array}{c}2\\1\end{array}\right),\left(\begin{array}{c}3\\5\end{array}\right)\right) \\[2mm]  = D\left(2\vec{e}_1+\vec{e}_2,3\vec{e}_1+5\vec{e}_2\right) \\[2mm]  = D\left(2\vec{e}_1,3\vec{e}_1+5\vec{e}_2\right) + D\left(\vec{e}_2,3\vec{e}_1+5\vec{e}_2\right) \\[2mm]  = D\left(2\vec{e}_1,3\vec{e}_1\right) + D\left(2\vec{e}_1,5\vec{e}_2\right) + D\left(\vec{e}_2,3\vec{e}_1\right) + D\left(\vec{e}_2,5\vec{e}_2\right) \\[2mm]  = 2\cdot3D\left(\vec{e}_1,\vec{e}_1\right) + 2\cdot5D\left(\vec{e}_1,\vec{e}_2\right) + 3D\left(\vec{e}_2,\vec{e}_1\right) + 5D\left(\vec{e}_2,\vec{e}_2\right) \\[2mm]  = 0 + 2\cdot5D\left(\vec{e}_1,\vec{e}_2\right) + 3D\left(\vec{e}_2,\vec{e}_1\right) + 0 \\[2mm]  = 2\cdot5D\left(\vec{e}_1,\vec{e}_2\right) - 3D\left(\vec{e}_1,\vec{e}_2\right) \\[2mm]  = 2\cdot5 - 3 \\[2mm]  =7  \end{array}

Or, using the usual determinant-finding-technique, det\left|\begin{array}{cc}2&3\\1&5\end{array}\right| = 2\cdot5 - 3\cdot1 = 7.
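The claim that the determinant is the area can also be checked without any determinant algebra, by estimating the parallelogram's area with random points. This is a Monte Carlo sketch in R; the seed, the sample size, and the bounding box [0, 5] × [0, 6] (which contains all four corners (0,0), (2,1), (3,5), and (5,6)) are choices made for this example.

```r
set.seed(42)
M <- matrix(c(2, 1,
              3, 5), nrow = 2)           # columns are (2,1) and (3,5)
n <- 1e5
pts <- rbind(runif(n, 0, 5), runif(n, 0, 6))   # random points in the bounding box

# A point x is inside exactly when x = s*(2,1) + t*(3,5) with 0 <= s, t <= 1,
# so solve M %*% c(s, t) = x for every point at once.
st <- solve(M, pts)
inside <- colSums(st >= 0 & st <= 1) == 2

area_estimate <- mean(inside) * 5 * 6    # fraction inside times box area
area_estimate                            # close to 7
det(M)                                   # 7
```

With 100,000 points the estimate typically lands within about 0.1 of 7; more points tighten it.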

 

Original author: 
Jakub

Image by Denisse Garcia






Last week Bibio shared a video sampler of what his album will sound like, and I've been geeking out to this one.

Little People was sent to me over email.  I love the use of moss, glass and projection; if only I could get large amounts of moss for live shows, I'd be so content on stage.

This Sparkles and Wine teaser video has been floating around the web the last few days, and it shows how much of a difference the way you light a face in a video makes.

Last but not least, more great content from the guys at Yourstru.ly, sharing an inside look at the Solar Year studio time.


Getting started with charts in R

So you want to make some charts in R, but you don't know where to begin. This straightforward tutorial should teach you the basics, and give you a good idea of what you want to do next.

Install R

Of course, you need to actually have R installed and running on your computer before you do anything. You can download R for Windows, OS X, or Linux here. For Windows, download the base and the latest version. It's a .exe file and quick installation. For OS X, download the latest .pkg, which is also a one-click install. For the Linux folk, I'll leave you to your own devices.

(I know, I know, the R homepage looks horrible, but just ignore that part.)

Loading and handling data

You have to load and be able to handle data before you can chart it. No data means no chart. Enter the vector. It's a structure in R that you use to store data, and you use it often. Use the c() function to create one, as shown in the line of code below. (Hopefully, you've opened R by now. Enter this in the window that opened up, aka the console.)

A function is a way to tell the computer what to do. The c() function is short for "combine", so you essentially tell the computer (in the R language) to combine the values.

# Vector
c(1,2,3,4,5)

Imagine that the values 1 through 5 are data points that you want to access later. When you enter the above, you create a vector of values, and it's just sort of gone. To save it for later, assign the vector to a variable.

# Assign to variable
fakedata <- c(1,2,3,4,5)

In this example, the variable name is "fakedata." Now if you want to access the first value in the fakedata vector, use the syntax below, where 1 is the index. You get back the value in that spot in the vector. Try using other indices and see what values come up.

# Access a value from vector
fakedata[1]

Did you try using an index greater than five? That would give you an "NA" value, which is R's way of saying that there's nothing in that place (because the vector only has a length of five).

Create another vector, morefake, of all a's. Notice the quotes around the a's, which indicate that those are characters and not variables, and notice that this new vector is the same length as fakedata. Then use the cbind() function to combine the two to see what you get.

# Matrix, values converted to characters
morefake <- c("a", "a", "a", "a", "a")
cbind(fakedata, morefake)

When you combine the two vectors, you get a matrix with two columns and five rows. The numbers have quotes around them too now, because a matrix can only have one data type. The fakedata vector has numeric values and the morefake vector has character values, so R converts the numbers to characters.

However, in many cases (as you'll see soon), you want a structure that has all your values, but with columns of different data types. The data frame in R lets you do this, and it's where most of your CSV-formatted data will go. Create a data frame from multiple vectors as follows:

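fake.df <- data.frame(fakedata, morefake, stringsAsFactors=FALSE)
colnames(fake.df)

You pass the vectors straight to data.frame(), which keeps each column's own type: fakedata stays numeric and morefake stays character. (Combining them with cbind() first would turn everything into characters, as you saw above.) Setting stringsAsFactors to FALSE keeps text columns as plain characters instead of factors; converting strings to factors was the default before R 4.0.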

The dollar sign ($) syntax is important here. The data frame is assigned to the variable fake.df. The column names are automatically assigned the variable names of the vectors, so to access the morefake column, follow the data frame variable, fake.df, with a dollar sign and the column name.
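To see the difference between the two structures for yourself, typeof() and class() report what R actually stored. This sketch recreates the two vectors so it runs on its own; stringsAsFactors = FALSE only changes anything on R versions before 4.0.

```r
fakedata <- c(1, 2, 3, 4, 5)
morefake <- c("a", "a", "a", "a", "a")

# A matrix holds a single type, so cbind() coerces the numbers to character.
typeof(cbind(fakedata, morefake))      # "character"

# A data frame keeps one type per column.
fake.df <- data.frame(fakedata, morefake, stringsAsFactors = FALSE)
sapply(fake.df, class)                 # fakedata "numeric", morefake "character"
fake.df$fakedata * 2                   # arithmetic works on the numeric column
```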

Loading a CSV file

With data frame and vectors in mind, load "2009education.csv" with read.csv(). The data is assigned to the education variable as a data frame, so you can access rows and columns using index values. However, unlike the vector, the data frame is two-dimensional (rows and columns), so use two indices separated with a comma. The first index is the row number, and the second is the column number.

(The CSV file is included in the downloadable source linked at the beginning of this tutorial. Be sure to set your working directory in R to the directory where the file is, via the Misc > Change Working Directory... menu or with setwd().)

education <- read.csv("2009education.csv", header=TRUE, sep=",", as.is=TRUE)
education[1,]       # First row
education[1:10,]    # First ten rows
education$state     # First column
education[,1]       # Also first column
education[1,1]      # First cell

The data are US state estimates of the percentage of people with at least a high school degree, at least a bachelor's degree, or an advanced degree, with one column for each education level.

It's often useful to sort rows by a certain column. For example, you can sort states by the percentage of people with at least high school diplomas, least to greatest, using the order() function. The function gives you a vector of indices, which you pass to the education data frame and assign to education.high.

# Sort least to greatest
high.order <- order(education$high)			
education.high <- education[high.order,]

Similarly, you can order from greatest to least by setting decreasing to TRUE in order().

# Sort greatest to least
high.order <- order(education$high, decreasing=TRUE)
education.high <- education[high.order,]
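Because the education CSV isn't reproduced here, this sketch shows what order() does on a tiny made-up vector and data frame; the names scores and toy.df and their values are hypothetical.

```r
# order() returns positions, not values: the first element is the index of
# the smallest value, the second the index of the next smallest, and so on.
scores <- c(b = 91.2, a = 85.0, c = 88.7)   # hypothetical toy values
idx <- order(scores)
idx                                          # 2 3 1
scores[idx]                                  # a, c, b: smallest to largest
scores[order(scores, decreasing = TRUE)]     # b, c, a: largest first

# Indexing rows with order() sorts a whole data frame by one column.
toy.df <- data.frame(state = c("B", "A", "C"), high = c(91.2, 85.0, 88.7),
                     stringsAsFactors = FALSE)
toy.df[order(toy.df$high), ]
```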

Okay, you got the data. On to charts.

Basic plotting

R has a plot() function that is kind of smart in that it adapts to the data that you pass it. (It's not that smart, though, but at least it won't crash on you.) For example, plot fakedata.

plot(fakedata)

It's only a one-dimensional vector, so by default, R uses a dot plot with the values of the vector on the vertical axis and indices on the horizontal.

Dot plot and fake data

However, try to plot the education data frame, as shown below, and you get an error.

plot(education)

The plot() function gets mixed up, because it doesn't know what to do with the first column, which is state names, and the other columns which are numeric values. What if you plot just one column?

plot(education.high$high)

Scatter plot

As you might expect, you get a dot plot, where each dot represents a state. Again, indices are on the horizontal, and high school estimates are on the vertical. This shouldn't surprise you, because when you passed education.high$high to plot(), you gave it a vector, just like in the fakedata example.

Want a scatter plot with high school on the horizontal and bachelors on the vertical? Pass plot() both columns of data.

plot(education$high, education$bs)
plot(education[,2:3])

The two lines of code above give you the same plot, since columns 2 and 3 of the data frame are the high and bs columns. What if you pass three columns?

# Passing multiple columns
plot(education[,2:4])

You get a scatter plot matrix. The bachelor degree and advanced degree rates are strongly correlated.

Scatter plot matrix

Using arguments

When you pass different values to functions, you actually set the values of arguments. Change the values, and you get different output and charts. For example, the plot() function has a type argument, which you use to specify the type of chart you want. If you don't specify it, R will guess, and it uses dots by default.

Set type to "l" and you get a line chart.

# Line
plot(education.high$high, type="l")

You would never use a line chart for this particular dataset, but for the sake of simplicity, let's pretend that it's useful.

Line chart

Set it to "h" and you get a high density chart, or essentially a bar chart with skinny bars.

# High-density
plot(education.high$high, type="h")

High density chart

Set it to "s" and you get a step chart.

# Step
plot(education.high$high, type="s")

Step chart

There are several other types that you can set it to. Simply enter "?plot" in the console to see documentation for the function. Most R functions offer pretty good documentation, which you can access with a question mark followed by the function name. Use this. A lot. It might be a little confusing at first, but the sooner you can read documentation, the easier learning R (or any code really) will be.

There are quite a few other arguments to tinker with. For example, all the charts you made so far rotate vertical axis labels ninety degrees. Set las to 1 to change the label positions so that they are horizontal.

# Changing argument values
plot(education.high$high, las=1)

Default dot plot

Or change labels for the axes and title with xlab, ylab, and main.

plot(education.high$high, las=1, xlab="States", ylab="Percent", main="At least high school degree or equivalent by state")

Dot plot with title and labels

You can also set the size of shapes (cex) and axis labels (cex.axis), remove the box around the plot (bty), or change the symbols used (pch).

plot(education.high$high, las=1, xlab="States", ylab="Percent", main="At least high school degree or equivalent", bty="n", cex=0.5, cex.axis=0.6, pch=19)

You get a plot that looks a little cleaner.

Dot plot with changed options

Additional charts

Although plot() can handle a good bit, there will be times when you want to use other chart types that the function doesn't offer. For example, the function doesn't provide a basic bar chart. Instead, use barplot().

# Bar plot
barplot(education$high)

And you get a basic bar plot.

Default bar chart

As with plot(), barplot() has arguments to fiddle with to get what you want.

# Bar plot with changed parameters
barplot(education$high, names.arg=education$state, horiz=TRUE, las=1, cex.names=0.5, border=NA)

# Documentation for function
?barplot

Horizontal bar chart

Similarly, you can use boxplot() to see distributions.

# Single box plot
boxplot(education$high)

A box plot

Just as easily, you can make multiple boxplots in a single space.

# Multiple box plots for comparison
boxplot(education[,2:4])

Multiple box plots

Finally, you can set some universal parameters with par() so that you don't have to specify them in every function. For example, mfrow sets the number of rows and columns if you want to show multiple charts in the same window, and mar sets the margins around each chart (bottom, left, top, right, in lines of text). You've already seen las and bty in previous examples, but now you don't have to plug them in everywhere.

# Multiple charts in one window
par(mfrow=c(3,3), mar=c(2,5,2,1), las=1, bty="n")
plot(education.high$high)
plot(education$high, education$bs)
plot(education.high$high, type="l")	# Line
plot(education.high$high, type="h")	# High-density
plot(education.high$high, type="s")	# Step
barplot(education$high)
barplot(education$high, names.arg=education$state, horiz=TRUE, las=1, cex.names=0.5, border=NA)
boxplot(education$high)
boxplot(education[,2:4])

Bam. You get a grid of charts.

Grid of charts
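Because par() settings stick around for every chart you draw afterwards, it's common to save the old values and restore them when you're done. par() returns the previous values of whatever you set, which makes that easy. A small sketch using the fakedata vector from earlier (the variable name op is just a convention):

```r
fakedata <- c(1, 2, 3, 4, 5)

# Change the settings and keep the old values in op.
op <- par(mfrow = c(1, 2), mar = c(2, 5, 2, 1), las = 1, bty = "n")
plot(fakedata)
plot(fakedata, type = "l")

# Put everything back: one chart per window, original margins and labels.
par(op)
par("mfrow")
```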

Wrapping up

Most basic charts only require a couple of lines of code in R, and you can make customizations by changing argument values. Use function documentation, which usually includes code snippets at the end, to learn how to use a new function. Want more? The examples in this tutorial are just the tip of the iceberg.


Today, Sameer Agarwal and Keir Mierle (as well as a couple of others, I'm sure) at Google open sourced the Ceres Non-Linear Least Squares Solver.

Since coming to Google, this is probably the most interesting code library that I have had a chance to work with. And now, you can use it too.   So, what exactly is a "non-linear least squares solver"? And why should you care?

It turns out that a solver like Ceres is at the heart of many modern computer vision and robotics algorithms. Anywhere you have a bunch of observed sensor data about the world and you want to create an explanation of all those observations, a non-linear least squares solver can probably do that for you. For example, if you have a bunch of distance sensors and you want to figure out where you are relative to the walls. Like this:

Or if you have a camera, and you want to figure out the position of the camera and objects in view:

Or say you have a quad copter, and you want to model how it will respond to thrust on different propellers:

or (as in the case of Google Street View) combining vehicle sensors in the cars with GPS data:

or even figure out the best way to position your plant so it gets the most sun (assuming you could accurately measure the amount of sun hitting the leaves):


Non-linear least squares solvers, like Ceres, are a tool for optimizing many variables simultaneously in a complex model/formula to fit some target data. Many of the advanced engineering problems today come down to this.  It's basically a fancy version of your typical line fitting problem:


This is linear least-squares. The model here is:

y = m*x + b
This is "linear" because the model is the simple addition of a scaled variable m*x and a constant b, so it is linear in the unknowns m and b.  It is "least-squares" because it minimizes the sum of the squared vertical distances between the line and each of the data points.  In this simple case, the algorithm is simply solving for m and b in the line equation, and there are methods for directly computing these values.  But, if the equation were non-linear, such as:

y = (m*x - cos(x))^2/b
You now need a non-linear least squares solver.  Many real world problems are non-linear, such as anything that involves rotation, camera projection, multiplicative effects, or compounding/exponential behavior.  You might be able to devise a clever way to calculate the optimal values for m and b directly, or you can use an iterative algorithm and let the computer tweak the values of m and b until the squared error to your data is minimized.  While this example only has two variables, Ceres can handle optimizing thousands of variables simultaneously and uses techniques for reaching an error-minimizing solution quickly.

However, it's important to note that it can only iteratively crawl toward the lowest-error solution starting from the initial values of m and b you provide... like a drop of water sliding down to the bottom of a bowl.  If the bottom of the bowl is very bumpy, it can get stuck in one of the smaller divots and never reach the lowest part of the bowl.  This is known as getting stuck in a "local minimum" and never finding the "global minimum", and the shape of the bowl is called the "cost surface".  When the cost surface of a problem is not very bowl-like, it can lead to problems.
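To make the contrast concrete, this R sketch fits both models from the post on made-up data; the true values (m = 2, b = 1 for the line, m = 1.5, b = 2 for the non-linear model), the noise levels, and the starting guess are all assumptions for the example. The line is solved directly with the textbook closed-form formulas. The non-linear model is solved iteratively with base R's nls(), a Gauss-Newton solver, used here only as a small stand-in for what Ceres does at much larger scale; this is not Ceres's API.

```r
# 1) Linear model y = m*x + b: least squares has a closed-form answer.
set.seed(1)
x1 <- seq(0, 10, length.out = 50)
y1 <- 2 * x1 + 1 + rnorm(length(x1), sd = 0.5)    # hypothetical data, m = 2, b = 1
m_lin <- sum((x1 - mean(x1)) * (y1 - mean(y1))) / sum((x1 - mean(x1))^2)
b_lin <- mean(y1) - m_lin * mean(x1)
c(m = m_lin, b = b_lin)        # matches coef(lm(y1 ~ x1))

# 2) Non-linear model y = (m*x - cos(x))^2 / b: instead of hunting for a
#    formula, start from a guess and let nls() iterate toward lower error.
set.seed(2)
x2 <- seq(0.5, 5, length.out = 40)
y2 <- (1.5 * x2 - cos(x2))^2 / 2 + rnorm(length(x2), sd = 0.02)   # hypothetical, m = 1.5, b = 2
fit <- nls(y2 ~ (m * x2 - cos(x2))^2 / b, start = list(m = 1.3, b = 1.8))
coef(fit)                      # should land close to m = 1.5, b = 2
```

Like Ceres, nls() needs a starting guess; starting far from the answer can make it fail to converge or settle somewhere worse, which is the bumpy-bowl problem described above.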

Ceres can also handle something called "sparsity" efficiently.  This occurs when you have many, many variables, but only a few of them interact with each other at a time.  For example, the current position of a flying quad copter depends on the previous position and previous velocity.  But the current velocity doesn't really depend that much on the previous position.  Imagine you made a giant table with all your input variables as the column names and all of your output values as the row names, and then put a check mark in the table wherever the input was used to compute the output.  If most of the table is empty, then you have a "sparse matrix", and Ceres can take advantage of this emptiness (which indicates independence of the variables) to dramatically increase the speed of computation.
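A toy version of that table in R, for a hypothetical chain of quad copter positions x1 to x6: output r1 only uses x1, and every later output r_t only uses the current and previous positions. The model and the names are made up to illustrate the sparsity pattern; TRUE marks a check mark.

```r
n <- 6
uses <- matrix(FALSE, nrow = n, ncol = n,
               dimnames = list(paste0("r", 1:n), paste0("x", 1:n)))
uses[1, 1] <- TRUE                  # r1 anchors the first position
for (t in 2:n) {
  uses[t, c(t - 1, t)] <- TRUE      # r_t links x[t-1] and x[t]
}
uses
mean(uses)                          # fraction of the table filled in: 11/36
```

With 1,000 time steps instead of 6, the same pattern fills only about 0.2% of the table, which is the kind of emptiness a sparse solver can exploit.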

Anywhere that you have data, and you have a model (which is just a fancy term for a complicated formula) that should be able to generate that data, and you want to tweak the values inside your generative model to best fit your data... a tool like Ceres might do the job.

For many problems, mathematicians and engineers have spent decades devising clever and complex formulas to solve them directly. But in many fields, having a computer perform non-linear optimization on data is becoming the preferred method because it makes it much easier to tackle very complicated problems with many variables, and often the result can be more stable to noisy input.

The neat thing about using a non-linear solver in a real-time system is that the computer can respond to feedback in much the same way you do.  If an object is too far to the left of a target position, it knows to move it right.  If the wind starts blowing and it drifts backwards, it will automatically respond by pushing forward.  As long as you have an equation to explain how the output will be affected by the controls, it can figure out the best way to fiddle with the controls to minimize the distance from a target value.
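A toy sketch of that feedback idea in R. At each step it picks the control u that minimizes the predicted squared distance to the target, using base R's optimize() on a one-variable cost as a stand-in for a full non-linear solver. The motion model (position moves by u plus wind), the control limit of 1, and the headwind are all invented for illustration.

```r
target   <- 5
pos      <- 0
wind_est <- 0
for (t in 1:30) {
  wind <- if (t > 10) -0.3 else 0            # hypothetical steady headwind from step 11
  # Predicted squared error for a candidate control, given the model and
  # the disturbance seen on the previous step.
  cost <- function(u) (pos + u + wind_est - target)^2
  u <- optimize(cost, interval = c(-1, 1))$minimum
  new_pos  <- pos + u + wind                 # what actually happens
  wind_est <- new_pos - pos - u              # infer the disturbance from the drift
  pos      <- new_pos
}
pos   # back near 5 despite the headwind
```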

If I find the time, I might try to post some tutorials on using Ceres, because I believe this is one of the most powerful tools in modern engineering, and no one ever taught it to me in undergrad or high school.  It's like the difference between doing long division by hand and then being handed a calculator.
