How To Retrieve the Last Non-NULL Value in SQL Server

By:   |   Updated: 2022-09-14   |   Comments (2)   |   Related: More > TSQL


Problem

Sometimes there are gaps in the result set of my SQL query. For example, a specific state was not known for a period, and thus the query returns NULL for that period. In that case, I would like to return the previous known value. Is this possible in SQL?

Solution

Suppose we have the following sample data:

We have a couple of months worth of timesheet data for two employees. We retrieved the contract type for each employee. However, employee A has left the company in March, but in the contract type lookup table, the contact type was only listed until the end of February. This means NULL is returned for March, which ideally should have been the value "Permanent". Employee B on the other hand, has switched from a temporary contract to a permanent position, but this change only went in effect in May. For March and April, NULL is returned instead of "Temporary".

The goal of the solution is to return the following result set:

The "last known contract type" column returns the current contract type. If it is NULL, the last known contract type is fetched. It keeps going back in time until a non-NULL value is found. This solution can be created in SQL using window functions. In this tip, we propose two solutions. The first solution will work on older versions of SQL Server (SQL Server 2012 – 2019), while the second one will work on SQL Server 2022 or Azure SQL Database.

At the time of writing, SQL Server 2022 is still in preview (currently CTP 2.1 has been released). This means functionality or features of SQL Server might change, disappear, or be added in the final release.

Older Versions of SQL Server

This solution has been proposed by Itzik Ben-Gan in this article: The Last non NULL Puzzle. The solution uses multiple steps and the LAG function to get the desired result.

Let’s generate the sample data and store it into a temporary table:

 DROP TABLE IF EXISTS #SampleData;
 
 WITH CTE_SampleData AS
 (
 SELECT DateKey = 20210101, EmployeeCode ='A', ContractType ='Permanent', DaysWorked = 20
 UNION ALL
 SELECT DateKey = 20210201, EmployeeCode ='A', ContractType ='Permanent', DaysWorked = 18
 UNION ALL
 SELECT DateKey = 20210301, EmployeeCode ='A', ContractType = NULL, DaysWorked = 1
 UNION ALL
 SELECT DateKey = 20210101, EmployeeCode ='B', ContractType ='Temporary', DaysWorked = 20
 UNION ALL
 SELECT DateKey = 20210201, EmployeeCode ='B', ContractType ='Temporary', DaysWorked = 18
 UNION ALL
 SELECT DateKey = 20210301, EmployeeCode ='B', ContractType = NULL, DaysWorked = 0
 UNION ALL
 SELECT DateKey = 20210401, EmployeeCode ='B', ContractType = NULL, DaysWorked = 0
 UNION ALL
 SELECT DateKey = 20210501, EmployeeCode ='B', ContractType ='Permanent', DaysWorked = 19
 )
 SELECT *
 INTO #SampleData
 FROM CTE_SampleData;
 

The first step is to group the rows with NULL values together with the last non-NULL value. This can be done by using a MAX window function:

 WITH cte_grp AS
 (
 SELECT
 *
 ,grp = MAX(IIF(ContractType IS NOT NULL, DateKey,NULL)) OVER (PARTITION BY EmployeeCode ORDER BY DateKey ROWS UNBOUNDED PRECEDING)
 FROM #SampleData
 )
 SELECT * FROM cte_grp
 

The following result is returned:

To make this work, you need a column with unique and incrementing values per employee. This can be an identity column, but in this example the dates work as well. As you can see, for employee A, 20210201 is returned as the month with the last non-NULL value in row 3, while for employee B, 20210201 is returned as well for both rows 6 and 7.

In the next step, we can use the MAX window function again to retrieve the effective last known non-NULL value within each group, giving us the desired end result.

 WITH cte_grp AS
 (
 SELECT
 *
 ,grp = MAX(IIF(ContractType IS NOT NULL, DateKey,NULL)) OVER (PARTITION BY EmployeeCode ORDER BY DateKey ROWS UNBOUNDED PRECEDING)
 FROM #SampleData
 )
 SELECT
 *
 ,LastKnownContractType = MAX(ContractType) OVER (PARTITION BY EmployeeCode, grp ORDER BY DateKey ROWS UNBOUNDED PRECEDING)
 FROM cte_grp
 ORDER BY EmployeeCode, DateKey
 

SQL Server 2022 (and Newer) and Azure SQL Database

In the ANSI SQL standard, there’s an optional IGNORE NULLS option, which tells the database engine to ignore null values while executing the function. This would be ideal for our use case here. However, in older versions of SQL Server, this feature was not implemented. The opposite of IGNORE NULLS is RESPECT NULLS, which you could say has been the default since window functions were introduced.

In the preview of SQL Server 2022, and a bit earlier in Azure SQL DB, this option has been implemented. Unfortunately only in the FIRST_VALUE and LAST_VALUE functions. Hopefully, it gets implemented in other functions as well, such as LAG and LEAD.

Fortunately, we can solve our use case with the LAST_VALUE function. The query is shorter than the previous one:

 SELECT
 *
 ,LastKnownContractType = LAST_VALUE(ContractType) IGNORE NULLS OVER (PARTITION BY EmployeeCode ORDER BY DateKey)
 FROM #SampleData
 ORDER BY EmployeeCode, DateKey
 

The result:

As you can see, the SQL statement is shorter and perhaps easier to understand.

Next Steps
  • Stay tuned for more SQL Server 2022 tips on the MSSQLTips website!
  • If you want to try it out yourself, you can create a pre-configured virtual machine in Azure. You can create a free trial here. More info about the SQL Server 2022 release can be found here. Or you can try it out in Azure SQL Database as well.
  • You can find more SQL Server 2022 tips in this overview.
  • If you want to learn more about window functions in T-SQL, check out this tutorial.



Get Started Now - Click here to get your free 14 day trial of SolarWinds Database Insights








About the author
Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips


Article Last Updated: 2022-09-14

Comments For This Article




Friday, September 16, 2022 - 3:42:07 AM - Koen VerbeeckBack To Top(90490)
Hi,

as a general rule, I try to avoid correlated subqueries. They tend to be less scalable, as in a worst case scenario you end up with one subquery for each row. I tested the queries from this tip on a larger sample data set, and typically the logical reads for a solution using window functions were much lower than the query using a correlated subquery.

Before SQL Server 2012, the correlated subquery was your only option though, but I hope that most of us are working on a more recent version of SQL Server :)

Regards,
Koen

Thursday, September 15, 2022 - 2:32:31 AM - KBack To Top(90482)
for older version is posssible:

SELECT *, LastKnownContractType = (SELECT TOP 1 ContractType FROM #SampleData WHERE EmployeeCode = t1.EmployeeCode AND ContractType IS NOT NULL AND DateKey <= t1.DateKey ORDER BY DateKey DESC)
FROM #SampleData t1