Here you go! I'll try to condense 4 years of college computer architecture classes down to 1000 words. I just hope that some of it makes sense when I'm done.
The first thing I should probably do is explain what a "front side bus" is in the first place.
A "front side bus" is the link between the CPU and the rest of the system, specifically what is typically known as the "Northbridge".
Most chipsets (which are on the motherboard itself) consist of two parts, the Northbridge and the Southbridge. The Northbridge has historically contained the memory controller (SDRAM, DDR SDRAM, Rambus, etc.) and, more recently, the controller for the AGP slot as well. The Southbridge typically controls just about everything else in the system (PS/2 ports, USB ports, LPT port, onboard sound, IDE controller, floppy controller, onboard network, etc.). The Northbridge and Southbridge are typically linked by the PCI bus, which most of the expansion cards in the PC also connect to. (There are exceptions to this, as some single-chip solutions now exist (SiS chipsets), and sometimes a separate bus links the NB and SB, as is the case with the nForce chipsets, which use a HyperTransport link, and VIA chipsets, which use "V-Link".)

Why does this matter? Basically, the front side bus is the critical link between the CPU and the entire system. This means that the faster the FSB is, the faster the CPU can communicate with everything else in the system. If the CPU wants data or instructions from system RAM, that data travels over the FSB. If the CPU needs data from the hard drive, that data travels to the Southbridge, over the PCI bus to the Northbridge, and then over the FSB to the CPU. As you would expect, the faster this link is, the faster the system will be. So if this is true, why would I say that a faster FSB gives diminishing returns in system speed beyond a certain point? I'll get to that.
(BTW, for all you tech historians: there used to be a "back side bus" which linked the CPU to its Level 2 (L2) cache. The term is now obsolete, because just about every modern CPU since the Coppermine Pentium 3 core has had its L2 cache as part of the CPU itself, meaning the BSB is part of the CPU as well. If you want to get really technical, the term "front side bus" is no longer really valid in its original context, because it is now the only bus.)
Perhaps the first thing I should cover when trying to explain why a faster FSB doesn't always result in a corresponding increase in system performance is the case when the CPU needs data from the hard drive. (Which happens quite a bit when loading programs and when the data the CPU needs does not fit into system memory.) I'm sure all of you know that the hard drive is many orders of magnitude slower at transferring data than system memory is. The delay imposed by the data traveling over the FSB is nearly negligible compared to the amount of time it takes for the hard drive to retrieve and store information. This makes the FSB speed itself very much a non-factor in this case.
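To put some very rough numbers on that, here's a quick back-of-the-envelope calculation. The figures (a ~9 ms average hard drive access, a 200 MHz, 64-bit wide FSB, and a 4 KB chunk of data) are just ballpark assumptions I picked for the example, not measurements from any real system:

```python
# Back-of-the-envelope comparison; all numbers are ballpark assumptions.
hdd_access_time_s = 9e-3   # ~9 ms average seek + rotational delay for a desktop drive
fsb_clock_hz = 200e6       # assume a 200 MHz FSB clock
fsb_width_bytes = 8        # 64-bit wide FSB
transfer_bytes = 4096      # one 4 KB chunk of data

# Time for the FSB alone to move the chunk, ignoring everything else
fsb_time_s = transfer_bytes / (fsb_clock_hz * fsb_width_bytes)

print(f"Hard drive access: {hdd_access_time_s * 1e6:9.1f} microseconds")
print(f"FSB transfer:      {fsb_time_s * 1e6:9.1f} microseconds")
print(f"The drive is roughly {hdd_access_time_s / fsb_time_s:,.0f}x slower")
```

Even if you doubled the FSB speed, that couple of microseconds is lost in the noise next to the milliseconds the drive takes.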
The next case is when the CPU needs data from main memory. There are two key concepts to understand here: "Latency" and "Bandwidth".
Latency is essentially the amount of time the CPU must wait between issuing a request for information and when that information is actually available to the processor. This time is generally measured in nanoseconds, but it's far more useful to look at it in terms of the clock cycles the CPU executes. This is because the CPU is essentially wasting time during the clock cycles where it is waiting for data and/or instructions from memory. I'll come back to this later, because it is probably the most important thing to understand.
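To make that concrete, here's a tiny calculation. The 3 GHz CPU and ~100 ns total memory latency are just round numbers I assumed for the example:

```python
# How many CPU clock cycles one memory access "costs" (assumed round numbers).
cpu_clock_hz = 3.0e9        # a 3 GHz CPU
memory_latency_s = 100e-9   # ~100 ns from request to first data

wasted_cycles = memory_latency_s * cpu_clock_hz
print(f"A {memory_latency_s * 1e9:.0f} ns memory access stalls the CPU for "
      f"roughly {wasted_cycles:.0f} clock cycles")
```

That's a few hundred cycles in which the CPU could have been doing real work.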
Bandwidth is the amount of data that can be transferred in a given unit of time.
Let's look at this with a more intuitive example. Consider a highway where vehicles travel from one point to another. In this example, bandwidth is essentially the number of lanes on the highway, and latency is essentially its length. Let's say you have a contest to get the most vehicles from one end of the highway to the other. Unfortunately, only a certain number of vehicles can enter the highway per second. The start of the highway is roughly analogous to main memory in a computer. The end of the highway is the CPU itself, which, compared to main memory, is far faster. As you can well imagine, if you make the highway shorter (lower the latency), you can get more vehicles to the end (data to the CPU) in the same amount of time as with a longer highway. Provided you can get enough vehicles onto the highway, having more lanes will also get more vehicles to the end. Consider this, though: what happens when you have 800 lanes on your freeway, but only 400 cars can enter it at any given time? Basically, 400 lanes are wasted. (OK, enough car talk. I'm getting bored with it...)
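If you'd rather poke at the analogy in code, here's a toy sketch. It isn't meant to model real hardware at all; it just shows that lanes beyond the entry rate are wasted, while a shorter highway (lower latency) actually delivers more:

```python
# Toy model of the highway analogy: deliveries are capped by whichever is
# smaller, the entry rate ("memory") or the number of lanes ("bus width").
def cars_delivered(lanes, entry_rate_per_sec, trip_time_sec, total_time_sec):
    cars_per_sec = min(lanes, entry_rate_per_sec)              # cars that actually get on
    productive_time = max(0, total_time_sec - trip_time_sec)   # nothing arrives until the first trip finishes
    return cars_per_sec * productive_time

print(cars_delivered(lanes=400, entry_rate_per_sec=400, trip_time_sec=2, total_time_sec=10))  # 3200
print(cars_delivered(lanes=800, entry_rate_per_sec=400, trip_time_sec=2, total_time_sec=10))  # still 3200: extra lanes wasted
print(cars_delivered(lanes=400, entry_rate_per_sec=400, trip_time_sec=1, total_time_sec=10))  # 3600: shorter highway wins
```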
Real memory in a computer cannot transmit data continuously. It takes a certain amount of time from when the CPU (or, more correctly, the memory controller in the Northbridge acting on behalf of the CPU) requests data until the memory can begin sending that information. This amount of time is the memory latency. To read data from DDR SDRAM, which is arranged as a giant grid of rows and columns, it takes a certain number of clock cycles to precharge the part of the memory the data is in (precharge), a certain amount of time to activate the row the data is in (RAS - row address strobe), and a certain amount of time to access the column the data is in (CAS - column address strobe, a term most people who buy memory have heard). The final factor is the command rate (the time from when a command is issued to the memory to when it is executed, usually only a cycle or two). All of this is collectively known as memory latency. (You see it printed on memory and on review sites as a string of 4 or 5 numbers.) The lower the latency, the less time it takes for the memory to begin transferring data to the Northbridge.

DDR memory currently runs at 100 MHz (PC1600), 133 MHz (PC2100), 166 MHz (PC2700), and 200 MHz (PC3200) as standard rates, and the latency is measured in memory clock cycles. (You probably think I'm wrong here, and that PC3200 memory runs at 400 MHz. That's not actually true, and I'm getting to that.) DDR (double data rate) memory has the capability of transferring data on both the rising edge (low to high) of the clock pulse and on the falling edge (high to low). If it could do this all the time, it would have the same bandwidth as regular SDRAM running at twice the clock speed, since regular SDRAM transfers data only on the rising (low to high) clock edge. This is why PC3200 is also known as DDR400: it is capable of transferring, at a maximum, at the same bandwidth as SDRAM running at 400 MHz. This also explains why you sometimes see DDR memory with a CAS latency of 2.5 cycles; it means the column can be accessed after 5 clock edges (rising or falling).

DDR memory can only transfer data on both the rising and falling edges when it is performing a burst transfer of more than one location in memory. Most of the time it does, and there is a very good reason for this. Typically, when a CPU wants data from memory, the next access will be from a location very close to that of the first access. For this reason, SDRAM (and the older fast page memory) will transfer the entire contents of the memory row. This boosts performance, because if the CPU does end up needing data in the next cell, that data has already been transferred. If it turns out the access is not from the same row, nothing is really lost, as the CPU just discards the data it doesn't need. Note that I've hugely simplified this. This is what is known as "spatial locality" in computer architecture classes, which basically says that a CPU will, most of the time, request data from a location near that of its last access. Basically, SDRAM and DDR SDRAM assume this and just transfer all the data near what the CPU requests. Wow, that's a lot of information to try to condense and "dumb" down, but hopefully those of you who stuck with it now better understand what memory latency is.
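If you want to see how those timing numbers turn into actual time, here's a rough calculation. The 2.5-3-3 timings with a 1-cycle command rate and the 200 MHz clock are just example values, and this is a worst-case figure that ignores details like whether the right row is already open:

```python
# Rough worst-case access latency for a DDR module from its timing numbers.
# Example values only; real latency depends heavily on the access pattern.
memory_clock_hz = 200e6   # PC3200 / "DDR400": the clock itself runs at 200 MHz
trp = 3                   # precharge: cycles before a new row can be opened
trcd = 3                  # RAS-to-CAS delay: cycles to activate the row
cas_latency = 2.5         # CL: cycles to access the column
command_rate = 1          # cycles from issuing a command to it executing

total_cycles = trp + trcd + cas_latency + command_rate
latency_s = total_cycles / memory_clock_hz
print(f"{total_cycles} memory clock cycles = {latency_s * 1e9:.0f} ns "
      f"before the first chunk of data shows up")
```

At 200 MHz each memory clock cycle is 5 ns, so those 9.5 cycles come to roughly 48 ns, and that's before the request has even crossed the Northbridge and the FSB.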
Now, let's briefly touch on bandwidth. Individual DDR memory modules in modern computers are 64 bits wide, meaning they transfer data in 64-bit chunks on both the rising and falling edges of their data clock. This is the amount of data transferred over a single channel. If we are talking about a DDR400 module, this bandwidth works out to 3.2 gigabytes per second. If we have two independent channels (dual channel) transferring data at the same time, that becomes 6.4 gigabytes per second.
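Here's the simple arithmetic behind those numbers, assuming the standard 64-bit module width:

```python
# Peak theoretical bandwidth of a DDR400 (PC3200) module.
memory_clock_hz = 200e6   # DDR400 clock
transfers_per_clock = 2   # DDR: data on both the rising and falling edge
width_bytes = 8           # 64-bit wide module

single_channel = memory_clock_hz * transfers_per_clock * width_bytes
dual_channel = single_channel * 2

print(f"Single channel: {single_channel / 1e9:.1f} GB/s")  # 3.2 GB/s
print(f"Dual channel:   {dual_channel / 1e9:.1f} GB/s")    # 6.4 GB/s
```

Keep in mind these are peak theoretical numbers; real transfers never sustain them, partly because of all that latency we just went through.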