Interface Summary
• File objects start out unadorned
• CcInitializeCacheMap to initiate caching via Cc on a file object (see the sketch after this slide)
  – sets up the Shared/Private Cache Maps, and the Mm section if necessary
• Access methods (Copy, Mdl, Mapping/Pinning)
• Maintenance Functions
• CcUninitializeCacheMap to terminate caching on a file object
  – tears down the Shared/Private Cache Maps
  – Mm lives on! Its data section is the cache!
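A minimal sketch of the setup/teardown pair above, assuming a hypothetical FSD with a made-up FCB type. CcInitializeCacheMap, CcUninitializeCacheMap, CC_FILE_SIZES, CACHE_MANAGER_CALLBACKS and the FILE_OBJECT fields are the real ntifs.h interfaces; everything prefixed My* is illustrative.

#include <ntifs.h>

typedef struct _MY_FCB {                  /* hypothetical per-stream context */
    CC_FILE_SIZES           FileSizes;    /* AllocationSize / FileSize / VDL */
    CACHE_MANAGER_CALLBACKS Callbacks;    /* Acquire/Release for lazy write
                                             and readahead                   */
} MY_FCB, *PMY_FCB;

VOID MyStartCaching(PFILE_OBJECT FileObject, PMY_FCB Fcb)
{
    if (FileObject->PrivateCacheMap == NULL) {
        /* Adorns the file object: creates the Shared Cache Map for the
           stream (if not already present), a Private Cache Map for this
           file object, and the Mm data section if one does not exist.    */
        CcInitializeCacheMap(FileObject,
                             &Fcb->FileSizes,
                             FALSE,              /* no pin access needed   */
                             &Fcb->Callbacks,
                             Fcb);               /* lazy-write context     */
    }
}

VOID MyStopCaching(PFILE_OBJECT FileObject)
{
    /* Tears down this file object's Private Cache Map, and the Shared
       Cache Map when the last user goes away.  Mm lives on: the data
       section's pages are the cache.                                     */
    CcUninitializeCacheMap(FileObject, NULL, NULL);
}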
The Cache Manager Doesn’t Stand Alone
• Cc is an extension of either Mm or the FS, depending on how you look at it
• Cc is intimately tied into the filesystem model
• Understanding Cc means we have to take a slight detour to mention some concepts filesystem folks think are interesting. Raise your hand if you're a filesystem person :-)
The Slight Filesystem Digression
• Three basic types of IO in NT: cached, noncached and "paging" (see the dispatch sketch after this slide)
• Paging IO is simply IO generated by Mm – flushing or faulting
  – the data section implies the file is big enough
  – it can never extend a file
• A filesystem will re-enter itself on the same callstack as Mm dispatches cache pagefaults
• This makes things exciting! (ERESOURCEs)
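A rough dispatch-path sketch of how an FSD tells the three IO types apart. IRP_PAGING_IO and IRP_NOCACHE are the real IRP flags; the My* routines are hypothetical placeholders for the FSD's own paths.

#include <ntifs.h>

NTSTATUS MyNonCachedRead(PDEVICE_OBJECT DeviceObject, PIRP Irp);  /* hypothetical */
NTSTATUS MyCachedRead(PDEVICE_OBJECT DeviceObject, PIRP Irp);     /* hypothetical */

NTSTATUS MyReadDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    if (Irp->Flags & IRP_PAGING_IO) {
        /* Generated by Mm (fault or flush).  It can never extend the file,
           and may arrive on the same callstack that took a cache pagefault,
           so lock acquisition must be re-entrancy aware (ERESOURCEs).       */
        return MyNonCachedRead(DeviceObject, Irp);
    }

    if (Irp->Flags & IRP_NOCACHE) {
        /* Caller asked for noncached IO: go straight to the device.         */
        return MyNonCachedRead(DeviceObject, Irp);
    }

    /* Ordinary cached IO: satisfied through Cc's copy interfaces.           */
    return MyCachedRead(DeviceObject, Irp);
}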
The Three File Sizes
• FileSize – how big the file looks to the user
  – 1 byte, 102 bytes, 1040592 bytes
• AllocationSize – how much backing store is allocated on the volume
  – a multiple of cluster size, which is 2^n * sector size
  – ... a more practical definition shortly
• ValidDataLength – how much of the file has been written by the user through the cache; zeros are seen beyond it (some OSes use sparse allocation)
• ValidDataLength <= FileSize <= AllocationSize (see the sketch after this slide)
• Why not use Fast IO all the time?
  – file locks
  – oplocks
  – extending files (and so forth)
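A small sketch of the three-sizes invariant and of how a size change reaches Cc. CC_FILE_SIZES, CcSetFileSizes and the SectionObjectPointer fields are the real ntifs.h interfaces; the wrapper routine is hypothetical.

#include <ntifs.h>

VOID MyUpdateCachedSizes(PFILE_OBJECT FileObject,
                         LONGLONG     AllocationSize,
                         LONGLONG     FileSize,
                         LONGLONG     ValidDataLength)
{
    CC_FILE_SIZES Sizes;

    /* ValidDataLength <= FileSize <= AllocationSize must always hold;
       reads beyond ValidDataLength see zeros.                            */
    ASSERT(ValidDataLength <= FileSize && FileSize <= AllocationSize);

    Sizes.AllocationSize.QuadPart  = AllocationSize;  /* cluster multiple */
    Sizes.FileSize.QuadPart        = FileSize;        /* user-visible EOF */
    Sizes.ValidDataLength.QuadPart = ValidDataLength; /* written so far   */

    /* Cc (and through it, Mm's section) learns the new sizes.  Paging IO
       can never extend the file, so extension always comes this way.     */
    if (FileObject->SectionObjectPointer != NULL &&
        FileObject->SectionObjectPointer->SharedCacheMap != NULL) {
        CcSetFileSizes(FileObject, &Sizes);
    }
}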
Pagefault Cluster Hints
• Taking a pagefault can result in Mm opportunistically bringing in surrounding pages (up to 7/15, depending)
• Since Cc takes pagefaults on streams, but knows a lot about which pages are useful, Mm provides a hinting mechanism in the TLS
  – MmSetPageFaultReadAhead()
• Not exposed to usermode …
Readahead
• CcScheduleReadAhead detects patterns on a handle and schedules readahead into the next suspected ranges
  – regular motion, backwards and forwards, with gaps
  – the Private Cache Map contains the per-handle info
  – called by CcCopyRead and CcMdlRead (see the cached-read sketch after this slide)
• Readahead granularity (64KB) controls the scheduling trigger points and length
  – small IOs – don't want readahead every 4KB
  – large IOs – ya get what ya need (up to 8MB, thanks to Jim Gray)
• CcPerformReadAhead maps and touch-faults pages in a Cc worker thread; it will use the new Mm prefetch APIs in a future release
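A simplified cached-read fragment, assuming a hypothetical FSD helper: the copy itself goes through CcCopyRead, and it is inside these copy interfaces that Cc consults the Private Cache Map and calls CcScheduleReadAhead.

#include <ntifs.h>

NTSTATUS MyCachedReadBytes(PFILE_OBJECT     FileObject,
                           PLARGE_INTEGER   FileOffset,
                           ULONG            Length,
                           BOOLEAN          CanWait,
                           PVOID            Buffer,
                           PIO_STATUS_BLOCK IoStatus)
{
    /* CcCopyRead maps the needed views, copies into the caller's buffer,
       and records the access pattern used for readahead scheduling.  It
       returns FALSE if it could not finish without blocking and CanWait
       was FALSE; the request is then typically posted to a worker thread
       that can wait.                                                      */
    if (!CcCopyRead(FileObject, FileOffset, Length, CanWait, Buffer, IoStatus)) {
        return STATUS_PENDING;   /* re-issue later with CanWait == TRUE    */
    }

    return IoStatus->Status;
}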
Unmap Behind
• Recall how views are managed (misses)
• On a view miss, Cc will unmap the two views behind the current (missed) view before mapping
• Unmapped valid pages go to the standby list in LRU order and can be soft-faulted back in. In practice, this is where much of the actual cache is as of Windows 2000.
• Unmap-behind is the default because large file read/write operations cause huge swings in working set. Mm's working set trim falls down at the speed a disk can produce pages, so Cc must help.
Write Throttling
• Avoids out-of-memory problems by delaying writes to the cache
  – filling memory faster than writeback speed is not useful; we may as well run into the limit sooner
• The throttle limit is twofold
  – CcDirtyPageThreshold – dynamic, but ~1500 on all current machines (small, but see above)
  – MmAvailablePages & pagefile page backlog
• CcCanIWrite checks whether the write is OK, optionally blocking; it also serves as the restart test (see the sketch after this slide)
• CcDeferWrite sets up a callback for when the write should be allowed (async case)
• The !defwrites debugger extension triages and shows the state of the throttle
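A hedged sketch of the throttle on the cached write path. CcCanIWrite, CcDeferWrite and CcCopyWrite are the real interfaces; the post routine and contexts are made up.

#include <ntifs.h>

VOID MyDeferredWritePost(PVOID Context1, PVOID Context2);  /* hypothetical:
                              called by Cc once the throttle clears        */

NTSTATUS MyThrottledCachedWrite(PFILE_OBJECT   FileObject,
                                PLARGE_INTEGER FileOffset,
                                ULONG          Length,
                                PVOID          Buffer,
                                PVOID          WriteContext)
{
    /* Synchronous callers could simply pass Wait == TRUE here and block
       until the dirty-page and available-page tests pass.                 */
    if (!CcCanIWrite(FileObject, Length, FALSE /* don't block */, FALSE)) {

        /* Async case: ask Cc to call us back when the write is allowed;
           the same CcCanIWrite test serves as the restart test.           */
        CcDeferWrite(FileObject,
                     MyDeferredWritePost,
                     WriteContext,        /* Context1       */
                     NULL,                /* Context2       */
                     Length,
                     FALSE);              /* not a retry    */
        return STATUS_PENDING;
    }

    /* Throttle says go: copy the data into the cache.                     */
    if (!CcCopyWrite(FileObject, FileOffset, Length, TRUE, Buffer)) {
        return STATUS_UNSUCCESSFUL;
    }
    return STATUS_SUCCESS;
}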
Writing Cached Data
• There are three basic sets of threads involved, only one of which is Cc's
  – Mm's modified page writer
    • the paging file
  – Mm's mapped page writer
    • almost anything else
  – Cc's lazy writer pool
    • executing in the kernel critical work queue
    • writes data produced through Cc interfaces
The Lazy Writer
• The name is misleading; it's really delayed
• All files with dirty data have been queued onto CcDirtySharedCacheMapList
• Work queueing – CcLazyWriteScan()
  – once per second, queues work to arrive at writing 1/8th of the dirty data, given current dirty and production rates
  – fairness considerations are interesting
• CcLazyWriterCursor is rotated around the list, pointing at the next file to operate on (fairness)
  – 16th-pass rule for user and metadata streams
• Work issuing – CcWriteBehind()
  – uses a special mode of CcFlushCache(), which flushes front to back (HotSpots – fairness again; see the flush sketch after this slide)
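For reference, the same flush engine is reachable from the filesystem side; a minimal sketch with a hypothetical wrapper around the real CcFlushCache.

#include <ntifs.h>

NTSTATUS MyFlushRange(PFILE_OBJECT FileObject,
                      LONGLONG     Offset,
                      ULONG        Length)
{
    IO_STATUS_BLOCK IoStatus;
    LARGE_INTEGER   FileOffset;

    FileOffset.QuadPart = Offset;

    /* Writes the dirty pages in [Offset, Offset + Length) back through
       the filesystem as paging IO.  Passing a NULL offset would flush
       the entire stream.                                                  */
    CcFlushCache(FileObject->SectionObjectPointer,
                 &FileOffset,
                 Length,
                 &IoStatus);

    return IoStatus.Status;
}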
Letting the Filesystem Into The Cache
• Two distinct access interfaces (see the sketch after this slide)
  – Map – given File+FileOffset, return a cache address
  – Pin – same, but acquires synchronization – this is a range lock on the stream
• The lazy writer acquires synchronization, allowing it to serialize metadata production with metadata writing
• Pinning also allows setting a log sequence number (LSN) on the update, for transactional FSes
  – the FS receives an LSN callback from the lazy writer prior to a range flush
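A hedged sketch of the two interfaces from the filesystem's side. CcMapData, CcPinRead, CcSetDirtyPinnedData and CcUnpinData (with MAP_WAIT/PIN_WAIT) are the real calls; the record update and LSN handling are illustrative only.

#include <ntifs.h>

/* Read-only peek at metadata: Map just returns a cache address.          */
BOOLEAN MyReadMetadata(PFILE_OBJECT Stream, LONGLONG Offset, ULONG Length,
                       PVOID *Bcb, PVOID *Buffer)
{
    LARGE_INTEGER FileOffset;
    FileOffset.QuadPart = Offset;

    return CcMapData(Stream, &FileOffset, Length, MAP_WAIT, Bcb, Buffer);
}

/* Update metadata: Pin acquires the range lock on the stream, so the
   update is serialized against the lazy writer's flush of that range.    */
BOOLEAN MyUpdateMetadata(PFILE_OBJECT Stream, LONGLONG Offset, ULONG Length,
                         LARGE_INTEGER Lsn)
{
    LARGE_INTEGER FileOffset;
    PVOID Bcb;
    PVOID Buffer;

    FileOffset.QuadPart = Offset;

    if (!CcPinRead(Stream, &FileOffset, Length, PIN_WAIT, &Bcb, &Buffer)) {
        return FALSE;
    }

    RtlZeroMemory(Buffer, Length);     /* placeholder for the real record
                                          modification                     */

    /* Mark the pinned range dirty and stamp it with the log sequence
       number; a transactional FS gets an LSN callback from the lazy
       writer before the range is flushed.                                 */
    CcSetDirtyPinnedData(Bcb, &Lsn);
    CcUnpinData(Bcb);
    return TRUE;
}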
Remember FsContext2?
• Synchronization on the Pin interfaces requires that Cc be the writer of the data
• Mm provides a method to turn off the mapped page writer for a stream, MmDisableModifiedWriteOfSection() (see the sketch after this slide)
  – confusing name, I know (the modified writer is not involved)
• Serves as the trigger for Cc to perform synchronization on write
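A sketch, assuming the stream is already cached: a filesystem that wants Cc to be the only writer of a metadata stream turns off Mm's mapped page writer for that section. The call is the real ntifs.h export; the wrapper name is hypothetical.

#include <ntifs.h>

VOID MyMakeCcTheOnlyWriter(PFILE_OBJECT MetadataStream)
{
    /* Despite the name, this disables the *mapped* page writer for the
       section, so only Cc's lazy writer (which takes the Pin range
       locks) writes the stream's dirty pages back.                       */
    MmDisableModifiedWriteOfSection(MetadataStream->SectionObjectPointer);
}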
BCBs and Lies Thereof
• Mapping and Pinning interfaces return opaque Buffer Control Block (BCB) pointers
• Unpin receives BCBs to indicate regions
• BCBs for Map interfaces are usually VACB pointers
• BCBs for Pin interfaces are pointers to a real BCB structure in Cc, which references a VACB for the cache address
Cache Manager Summary
• Virtual block cache for files, not a logical block cache for disks
• The memory manager is the ACTUAL cache manager
• Cache Manager context is integrated into FileObjects
• The Cache Manager manages views of files in kernel virtual address space
• I/O has a special fast path for cached accesses
• The lazy writer periodically flushes dirty data to disk
• Filesystems need two interfaces to Cc: map and pin