VxFS tuning comments and questions from reading the admin guide

Here are some notes on tuning VxFS I/O from pages 44-53 of the Veritas File System Administrator's Guide, Solaris, 5.0 (SPARC). Other docs are here. I also added my comments and questions about an application I am interested in, which has a random I/O access pattern. Please let me know if you have any comments or corrections. Thanks.

  • The /etc/vx/tunefstab file sets I/O parameters for a VxFS file system, overriding the default values. See the man page for more information, and the sample entries after the parameter list below.
  • The vxtunefs command can take parameters on the command line or read them from the /etc/vx/tunefstab file, and it can print the current values of the I/O parameters. To print them, run # vxtunefs -p mount_point. See the man page for more information, and the usage sketch after the parameter list below.
  • Tunable I/O Parameters

    • read_pref_io: The preferred read request size. The default value is 64K. For a random I/O workload, this size should match the application's data block size; for example, if the data block size is 8K, this value should be 8K.
    • write_pref_io: The preferred write request size. The default value is 64K. Since most read/write activity uses the same data block size, my comment about the preferred read request size applies to the write size as well.
    • read_nstream: “The number of parallel read requests of size read_pref_io to have outstanding at one time.” The default value is 1. For RAID-5, the document recommends setting this value to “the number of columns in the stripe”. However, this may conflict with the formula presented in the same document on the same page: read requests = read_nstream x read_pref_io. So what should be used? (See the arithmetic sketch after this list.)
    • write_nstream: “The number of parallel write requests of size write_pref_io to have outstanding at one time.” The default value is 1. See my comments on read_nstream.
    • default_indir_size: This parameter defines the default indirect extent size (default 8K). The default may be too small for large files, but if it is set too large, a write may fail when the file system cannot allocate an extent of the specified indirect size. “This parameter should generally be set to some multiple of the read_pref_io parameter.” Since the application I am thinking of has large files, this value should probably be set to some multiple of 8K. The question is: what multiple of 8K?
    • discovered_direct_iosz: Any file I/O request larger than this value results in unbuffered I/O without “a synchronous commit of the inode when the file is extended or blocks are allocated.” This is similar to direct I/O. The default value is 256K. For a random I/O access pattern, unbuffered or direct I/O is preferred, so to make 100% of the I/O activity unbuffered, I would expect that setting this parameter to less than 8K will do the trick.
    • fcl_keeptime: “Specifies the minimum amount of time, in seconds, that the VxFS File Change Log (FCL) keeps records in the log.” If no application using VxFS depends on the FCL, I expect this feature can be turned off to reduce load on the system. If the FCL is part of the VxFS journaling feature, then it should not be turned off. If the FCL must stay on, then I guess this should be set near the application's I/O spike interval, or to a value between one and two intervals, the reasoning being that the log should be relatively empty before each regular I/O spike. The interesting part is how to time this interval and the system's I/O spikes so that they do not overlap. Perhaps a better way to control this is the next parameter.
    • fcl_maxalloc: “Specifies the maximum amount of space that can be allocated to the VxFS File Change Log (FCL).” Again, if no application depends on the FCL, I expect it can be turned off to reduce load; if it is part of journaling, it should not be. If the FCL must stay on, then whoever tunes the system should measure how much log space an application with regular I/O spikes needs, and set this parameter to a value that can sustain one spike and lets the file system reclaim the space soon after the spike and before the next one.
    • fcl_winterval: “Specifies the time, in seconds, that must elapse before the VxFS File Change Log (FCL) records a data overwrite, data extending write, or data truncate for a file.” If no application depends on the FCL, I expect it can be turned off. If the FCL must stay on, this parameter should be set to “the shortest interval between reads of the FCL by any application.” See also the vxfs_fcl_sync man page and the FCL sketch after this list.
    • hsm_write_prealloc: This parameter is specifically designed for hierarchical storage management (HSM) applications. If it is turned on (i.e. set to 1), “a sufficient number of disk blocks are allocated on the first write to the empty file so that no disk block allocation is required for subsequent writes.” The random-I/O application I have discussed so far is not an HSM application, so I assume it does not need this parameter turned on.
    • initial_extent_size: This parameter determines the size of the first extent allocated to a file. I would expect the default value to be fine, except for applications that always write a large number of blocks at the start of a file; the benefit of a larger value is that the file system does not allocate a large number of small extents at the start of a file. I guess a larger value will help the application I am interested in, because I know it causes the file system to grow files constantly.
    • inode_aging_count: This parameter defines how big the aging list should be. The aging list “is used in conjunction with file system Storage Checkpoints to allow quick restoration of large, recently deleted files.” The default is 2048. The document recommends aging only a small number of large files. Since the application I am interested in doesn't delete files, the default or a smaller value should be sufficient.
    • inode_aging_size: “Specifies the minimum size to qualify a deleted inode for inode aging.” See my comment on inode_aging_count.
    • max_direct_iosz: “The maximum size of a direct I/O request that are issued by the file system.” The definition says that larger I/O requests are broken down into chunks of this size before being sent to disk. So logically, I assume this value should be the same as the data block size, which is 8K for the application I am interested in.
    • max_diskq: “Limits the maximum disk queue generated by a single file.” The default is 1 MB. Since the application I am interested in generates a large volume of write requests, I would assume this value should be as big as possible. What is a reasonable limit? Should it be less than the amount of cache on the actual disk?
    • max_seqio_extent_size: “Increases or decreases the maximum size of an extent.” Setting this parameter to a large value should help applications that frequently increase file size; a potential consequence of a larger extent size is wasted free space.
    • qio_cache_enable: “Enables or disables caching on Quick I/O files.” The default is disabled. This parameter is useful for systems with a lot of spare memory, and could be worth a try on such higher-end systems.
    • read_ahead: This parameter controls whether VxFS performs read-ahead. Read-ahead is good for sequential access patterns; for random access, it should be less useful. VxFS is smart enough to detect sequential access done by multiple threads (see vx_era_nthreads), and it even has an enhanced read ahead feature. I wonder what is special about that feature?
    • write_throttle: This parameter limits “the number of dirty pages per file that a file system generates before flushing the pages to disk.” The default is zero, which means unlimited. The document states that this parameter is useful for systems with large memory and slow disks, where it breaks fsync operations down into smaller chunks for the disk to process, reducing response time. For a system with large memory and a fast storage array, this parameter does not need to be modified.
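
To make some of the settings above concrete, here is a minimal sketch of what my /etc/vx/tunefstab entries might look like for the 8K random-I/O application, assuming a made-up volume path (/dev/vx/dsk/mydg/myvol) and a 4-column stripe. The values follow my reading of the guide, not a tested recommendation.

    # /etc/vx/tunefstab -- hypothetical entries; the format is
    #   device  parameter=value[,parameter=value...]
    # Match the preferred I/O sizes to the application's 8K block size,
    # and set the nstream values to the assumed number of stripe columns.
    /dev/vx/dsk/mydg/myvol  read_pref_io=8192,write_pref_io=8192,read_nstream=4,write_nstream=4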
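
Likewise, a sketch of inspecting and changing parameters at run time with vxtunefs, assuming /mnt/data is the VxFS mount point (again made up). As I read the man page, values set on the command line do not survive a remount unless they are also placed in /etc/vx/tunefstab.

    # Print the current I/O parameters for a mounted VxFS file system.
    vxtunefs -p /mnt/data

    # Change a single parameter on the fly (hypothetical value).
    vxtunefs -o discovered_direct_iosz=8192 /mnt/data

    # Re-apply the values stored in /etc/vx/tunefstab (my understanding of -s).
    vxtunefs -s /mnt/data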
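
As for my read_nstream question, here is the arithmetic as I understand it, assuming a 4-column RAID-5 stripe with a 64K stripe unit. If this reading is right, the formula describes the aggregate outstanding read I/O, so the two recommendations complement rather than contradict each other:

    read_pref_io      = stripe unit size         = 64K
    read_nstream      = number of stripe columns = 4
    outstanding reads = read_nstream x read_pref_io = 4 x 64K = 256K (one full stripe)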
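
Finally, a hypothetical sketch of FCL housekeeping, based on my reading of the guide (and see the second comment below: the FCL is not part of the intent log and is off by default). The fcladm subcommands and the interval values here are my assumptions, not tested commands.

    # Check, enable, or disable the File Change Log on a mount point.
    fcladm state /mnt/data   # report whether the FCL is currently on
    fcladm on /mnt/data      # turn it on only if some application reads it
    fcladm off /mnt/data     # otherwise leave it off to avoid the extra load

    # If the FCL stays on, its tunables are set like any other I/O parameter
    # (the values below are placeholders, not recommendations).
    vxtunefs -o fcl_winterval=3600 /mnt/data
    vxtunefs -o fcl_keeptime=86400 /mnt/data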

2 thoughts on “VxFS tuning comments and questions from reading the admin guide”

  1. The enhanced read ahead detects patterns in your I/O, such as:
    strided reads – read 100k, skip 1MB, read 100k, skip 1MB, …
    plus backward reads and backward strided reads (maybe a few others that slip my mind). If your application is truly random, this won’t help you, but it might be interesting to try.

  2. In terms of the FCL settings, the FCL is not part of the file system intent log/journal. Also, it is not (currently) turned on by default.
