Using ROW_NUMBER to get the most recent record from a table

Using the ROW_NUMBER() SQL Server function to assign a row number based on a set of criteria

Quite a while ago I wrote about using SQL Server CLR to add "earliest" and "latest" aggregate functions that could be used to retrieve the, well, earliest and latest Id values from a table based on a specific DATETIME column. These were originally put together, even though I didn't blog about them until 2017, much earlier and used against SQL Server 2015 which I don't think supported the SQL Server ROW_NUMBER() function.

Using this function instead is definitely the way to go now as it eliminates the need to use (and support / bug fix) a CLR assembly, it's also likely that it's more performant - though I've not measured this - as it will allow SQL Servers query optimiser to be fully aware of everything that's going on and make fully qualified decisions about the entirety of the query rather than the CLR functions being black boxes which it may not be able to reason about fully.

Getting the most recent Exchange Rate

Since November I've been working for a company that buys and sells currency and one of the things that tends to accumulate is a lot of data about the rates at which different currency pairs (e.g. the rate at which selling 1 GBP would get you EUR). This data isn't always complete either, so for a given pair we might not have data stored at the same point in time as for others, i.e. at 08:30 this morning we might have a rate stored for GBP/EUR but the most recent rate for GBP/AUD (Australian Dollars) might be from 08:28. 

Using the ROW_NUMBER() function is a good way to retrieve the most recent rate for a pair of currencies, irrespective of when it was received and stored. Here's a bit of SQL that does this:

SELECT	[PartitionedRates].BidRate, 
	[PartitionedRates].RateDate, 
	[PartitionedRates].CurrencyCode, 
	[PartitionedRates].BaseCurrencyCode
FROM
	(
		SELECT	BidRate, 
			RateDate, 
			CurrencyCode, 
			BaseCurrencyCode,
			ROW_NUMBER() OVER (PARTITION BY CurrencyCode, BaseCurrencyCode ORDER BY RateDate DESC) [RowNumber]
		FROM	[dbo].[RateData]
	) [PartitionedRates]
WHERE	[PartitionedRates].[RowNumber] = 1

This query is operating against a pretty simple table structure ([dbo].[RateData]) which looks a little like this, though the real table is a little bit different I've simplified it for this example:

CREATE TABLE [dbo].[RateData]
(
	RateId INT NOT NULL IDENTITY(1, 1),
	BidRate DECIMAL(10, 10),
	AskRate DECIMAL(10, 10),
	CurrencyCode CHAR(3),
	BaseCurrencyCode CHAR(3)
	RateDate DATETIME,
)

The key bit of the query is the sub-query PartitionedRates which retrieves four columns from the table itself and then uses the ROW_NUMBER() function to return a value into the column named, rather boringly, RowNumber. To break the defintion of this column apart, lets start with the definition:

ROW_NUMBER() OVER (PARTITION BY CurrencyCode, BaseCurrencyCode ORDER BY RateDate DESC) [RowNumber]

The PARTITION BY clause breaks the data retrieved by the FROM into partitions, which you can pretty much think of as being equivalent to GROUP BY - so it takes the data in the table and puts it into groups/partitions/buckets based on their unique combination of CurrencyCode and BaseCurrencyCode.

The ORDER BY clause puts the data in each of the partitions into the specified order, in this instance based on RateDate which is the date and time that the exchange rate for the two currencies was received with the most recent first.

If I pick on rates for the Fuijian Dollar (FJD) to the Euro (EUR) and only run the sub-query we can see what the data that's returned looks like once it's been pumped through the ROW_NUMBER() function:

A list of exchange rates for the Fijian Dollar to the Euro, numbered by how recent they are

As you can see from the data, we don't have a particularly recent rate for this currency pair but what we do have has now been (a) ordered, and (b) assigned a RowNumber based on how recent it is (I retrieved this data by adding WHERE CurrencyCode = 'FJD' to the sub-query and running it in isolation). The assigned row number increases based on the ORDER BY with the most recent row given the value 1, the next most recent row given the value 2 and so on, all the way through to 21 for each of the 21 records that we have for the FJD/EUR pair.

Another snippet from the data returned by the sub-query shows what happens when the last record for FJD/EUR is returned and we move on to FJD/GBP:

The row numbers restart when we move from the FJD/EUR partition to the FJD/GBP partition

Here you can see the RowNumber column starting again from 1 because we've moved on to the next partition, i.e. we're now numbering the rows in the FJD/GBP partition instead of the FJD/EUR partition.

The last thing to see is the result of running the overall query, where the only mildly interesting thing that happens is we apply a WHERE clause of [RowNumber] = 1 to restrict ourselves to the most recent exchange rate held for each of the currency pairs:

The final rates, showing what a GBP was worth against various different currencies

So, there you go - using the ROW_NUMBER function to retrieve the most recent row based on a set of criteria.

About Rob

I've been interested in computing since the day my Dad purchased his first business PC (an Amstrad PC 1640 for anyone interested) which introduced me to MS-DOS batch programming and BASIC.

My skillset has matured somewhat since then, which you'll probably see from the posts here. You can read a bit more about me on the about page of the site, or check out some of the other posts on my areas of interest.

No Comments

Add a Comment