
TempDB latch timeouts with lots of RAM


Recently I had an interesting problem where a SQL Server 2008 R2 instance would randomly, in the middle of the morning, start hitting latch timeouts on various tempdb database pages. The first assumption was that these were GAM pages and that adding more tempdb data files would solve the problem. However, looking at the pages, they weren't GAM pages; they were normal data pages.

2013-07-27 08:29:31.50 spid745 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000056D5708 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:31.54 spid1107 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x0000000026EA8988 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:31.56 spid672 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000272154C8 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:31.65 spid1919 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x000000265E681048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:32.05 spid819 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x0000000027509048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
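If you want to confirm for yourself what kind of page something like 103:8 is, one way (a sketch, not necessarily how we checked it here) is the undocumented DBCC PAGE command, which dumps the page header; the m_type field tells you the page type (1 = data, 8 = GAM, 9 = SGAM, 11 = PFS).

DBCC TRACEON (3604);        -- send DBCC PAGE output to the client session
DBCC PAGE (2, 103, 8, 0);   -- database id 2 (tempdb), file 103, page 8, header only

GAM pages also sit at predictable offsets (page 2 of each file and every 511,232 pages after that, with PFS at page 1 and every 8,088 pages), so page 8 couldn't have been an allocation page in any case.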

Looking at the server's memory usage, in this case via Spotlight for SQL Server, we could see that SQL Server was allocating huge amounts of memory to its process, but it wasn't actually using that memory for anything; it was just allocating it.
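Spotlight happened to be what we were looking at, but the same allocated-versus-in-use picture can be pulled straight from the DMVs; a minimal sketch:

SELECT physical_memory_in_use_kb / 1024           AS physical_memory_in_use_mb,
       virtual_address_space_committed_kb / 1024  AS virtual_address_space_committed_mb,
       locked_page_allocations_kb / 1024          AS locked_page_allocations_mb,
       large_page_allocations_kb / 1024           AS large_page_allocations_mb
FROM sys.dm_os_process_memory;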

To make things more interesting, this problem first started happening after we upgraded the server from 256 Gigs of RAM to 1 TB. While trying to figure out what was happening, we could simply reduce the maximum amount of RAM that SQL Server could access to below 256 Gigs and the problem would just go away.
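For reference, that workaround is nothing more than capping max server memory; a sketch of the change (the 250000 MB figure is just an example of "something below 256 Gigs", not the exact value we used):

EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 250000;
RECONFIGURE;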

To make things worse, management wouldn't allow the server to remain broken long enough for any sort of proper diagnosis. So basically we could try a change, and if the problem came back, all we could do was set the memory back down to 256 Gigs and wait for the next window to try the next fix.

After a bit of trial and error with different trace flags and settings we found the right combination. We turned on trace flag 834, which enables large page allocations. This trace flag requires that the Lock Pages in Memory setting is enabled, so that was turned on as well. We also turned on AWE within SQL Server based on this blog post from Microsoft.
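Roughly what those changes look like in practice (a sketch, not the exact commands we ran): trace flag 834 is startup-only, so it goes into the service's startup parameters rather than DBCC TRACEON; Lock Pages in Memory is a Windows user right granted to the service account rather than an sp_configure option; and "awe enabled" is the sp_configure setting the Microsoft blog post refers to.

-- Startup parameter, added via SQL Server Configuration Manager (requires a service restart):
--   -T834
-- Windows user right, granted via secpol.msc > User Rights Assignment > Lock pages in memory:
--   add the SQL Server service account, then restart the service
-- sp_configure change for AWE:
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'awe enabled', 1;
RECONFIGURE;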

After making these changes and setting the max server memory back up to 900+ Gigs of RAM, everything began working as expected, without the page latch timeout errors shown above.
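One quick way to sanity-check that large pages actually took effect after the restart (again just a sketch, using the undocumented xp_readerrorlog): the error log reports it near startup, and the process memory DMV shows non-zero large page allocations.

EXEC sys.xp_readerrorlog 0, 1, N'large pages';   -- look for "Using large pages for the buffer pool"
SELECT large_page_allocations_kb FROM sys.dm_os_process_memory;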

