diff options
Diffstat (limited to 'Documentation/admin-guide')
51 files changed, 875 insertions, 498 deletions
diff --git a/Documentation/admin-guide/bcache.rst b/Documentation/admin-guide/bcache.rst index 8d3a2d045c0a..bb5032a99234 100644 --- a/Documentation/admin-guide/bcache.rst +++ b/Documentation/admin-guide/bcache.rst @@ -204,7 +204,7 @@ For example:: This should present your unmodified backing device data in /dev/loop0 If your cache is in writethrough mode, then you can safely discard the -cache device without loosing data. +cache device without losing data. E) Wiping a cache device diff --git a/Documentation/admin-guide/blockdev/paride.rst b/Documentation/admin-guide/blockdev/paride.rst index e1ce90af602a..e85ad37cc0e5 100644 --- a/Documentation/admin-guide/blockdev/paride.rst +++ b/Documentation/admin-guide/blockdev/paride.rst @@ -3,6 +3,7 @@ Linux and parallel port IDE devices =================================== PARIDE v1.03 (c) 1997-8 Grant Guenther <grant@torque.net> +PATA_PARPORT (c) 2023 Ondrej Zary 1. Introduction =============== @@ -51,27 +52,15 @@ parallel port IDE subsystem, including: as well as most of the clone and no-name products on the market. -To support such a wide range of devices, PARIDE, the parallel port IDE -subsystem, is actually structured in three parts. There is a base -paride module which provides a registry and some common methods for -accessing the parallel ports. The second component is a set of -high-level drivers for each of the different types of supported devices: +To support such a wide range of devices, pata_parport is actually structured +in two parts. There is a base pata_parport module which provides an interface +to kernel libata subsystem, registry and some common methods for accessing +the parallel ports. - === ============= - pd IDE disk - pcd ATAPI CD-ROM - pf ATAPI disk - pt ATAPI tape - pg ATAPI generic - === ============= - -(Currently, the pg driver is only used with CD-R drives). - -The high-level drivers function according to the relevant standards. -The third component of PARIDE is a set of low-level protocol drivers -for each of the parallel port IDE adapter chips. Thanks to the interest -and encouragement of Linux users from many parts of the world, -support is available for almost all known adapter protocols: +The second component is a set of low-level protocol drivers for each of the +parallel port IDE adapter chips. Thanks to the interest and encouragement of +Linux users from many parts of the world, support is available for almost all +known adapter protocols: ==== ====================================== ==== aten ATEN EH-100 (HK) @@ -91,251 +80,87 @@ support is available for almost all known adapter protocols: ==== ====================================== ==== -2. Using the PARIDE subsystem -============================= +2. Using pata_parport subsystem +=============================== While configuring the Linux kernel, you may choose either to build -the PARIDE drivers into your kernel, or to build them as modules. +the pata_parport drivers into your kernel, or to build them as modules. In either case, you will need to select "Parallel port IDE device support" -as well as at least one of the high-level drivers and at least one -of the parallel port communication protocols. If you do not know -what kind of parallel port adapter is used in your drive, you could -begin by checking the file names and any text files on your DOS +and at least one of the parallel port communication protocols. +If you do not know what kind of parallel port adapter is used in your drive, +you could begin by checking the file names and any text files on your DOS installation floppy. Alternatively, you can look at the markings on the adapter chip itself. That's usually sufficient to identify the correct device. -You can actually select all the protocol modules, and allow the PARIDE +You can actually select all the protocol modules, and allow the pata_parport subsystem to try them all for you. For the "brand-name" products listed above, here are the protocol and high-level drivers that you would use: - ================ ============ ====== ======== - Manufacturer Model Driver Protocol - ================ ============ ====== ======== - MicroSolutions CD-ROM pcd bpck - MicroSolutions PD drive pf bpck - MicroSolutions hard-drive pd bpck - MicroSolutions 8000t tape pt bpck - SyQuest EZ, SparQ pd epat - Imation Superdisk pf epat - Maxell Superdisk pf friq - Avatar Shark pd epat - FreeCom CD-ROM pcd frpw - Hewlett-Packard 5GB Tape pt epat - Hewlett-Packard 7200e (CD) pcd epat - Hewlett-Packard 7200e (CD-R) pg epat - ================ ============ ====== ======== - -2.1 Configuring built-in drivers ---------------------------------- - -We recommend that you get to know how the drivers work and how to -configure them as loadable modules, before attempting to compile a -kernel with the drivers built-in. - -If you built all of your PARIDE support directly into your kernel, -and you have just a single parallel port IDE device, your kernel should -locate it automatically for you. If you have more than one device, -you may need to give some command line options to your bootloader -(eg: LILO), how to do that is beyond the scope of this document. - -The high-level drivers accept a number of command line parameters, all -of which are documented in the source files in linux/drivers/block/paride. -By default, each driver will automatically try all parallel ports it -can find, and all protocol types that have been installed, until it finds -a parallel port IDE adapter. Once it finds one, the probe stops. So, -if you have more than one device, you will need to tell the drivers -how to identify them. This requires specifying the port address, the -protocol identification number and, for some devices, the drive's -chain ID. While your system is booting, a number of messages are -displayed on the console. Like all such messages, they can be -reviewed with the 'dmesg' command. Among those messages will be -some lines like:: - - paride: bpck registered as protocol 0 - paride: epat registered as protocol 1 - -The numbers will always be the same until you build a new kernel with -different protocol selections. You should note these numbers as you -will need them to identify the devices. + ================ ============ ======== + Manufacturer Model Protocol + ================ ============ ======== + MicroSolutions CD-ROM bpck + MicroSolutions PD drive bpck + MicroSolutions hard-drive bpck + MicroSolutions 8000t tape bpck + SyQuest EZ, SparQ epat + Imation Superdisk epat + Maxell Superdisk friq + Avatar Shark epat + FreeCom CD-ROM frpw + Hewlett-Packard 5GB Tape epat + Hewlett-Packard 7200e (CD) epat + Hewlett-Packard 7200e (CD-R) epat + ================ ============ ======== + +All parports and all protocol drivers are probed automatically unless probe=0 +parameter is used. So just "modprobe epat" is enough for a Imation SuperDisk +drive to work. + +Manual device creation:: + + # echo "port protocol mode unit delay" >/sys/bus/pata_parport/new_device + +where: + + ======== ================================================ + port parport name (or "auto" for all parports) + protocol protocol name (or "auto" for all protocols) + mode mode number (protocol-specific) or -1 for probe + unit unit number (for backpack only, see below) + delay I/O delay (see troubleshooting section below) + ======== ================================================ If you happen to be using a MicroSolutions backpack device, you will also need to know the unit ID number for each drive. This is usually the last two digits of the drive's serial number (but read MicroSolutions' documentation about this). -As an example, let's assume that you have a MicroSolutions PD/CD drive -with unit ID number 36 connected to the parallel port at 0x378, a SyQuest -EZ-135 connected to the chained port on the PD/CD drive and also an -Imation Superdisk connected to port 0x278. You could give the following -options on your boot command:: - - pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36 - -In the last option, pf.drive1 configures device /dev/pf1, the 0x378 -is the parallel port base address, the 0 is the protocol registration -number and 36 is the chain ID. - -Please note: while PARIDE will work both with and without the -PARPORT parallel port sharing system that is included by the -"Parallel port support" option, PARPORT must be included and enabled -if you want to use chains of devices on the same parallel port. - -2.2 Loading and configuring PARIDE as modules ----------------------------------------------- - -It is much faster and simpler to get to understand the PARIDE drivers -if you use them as loadable kernel modules. - -Note 1: - using these drivers with the "kerneld" automatic module loading - system is not recommended for beginners, and is not documented here. - -Note 2: - if you build PARPORT support as a loadable module, PARIDE must - also be built as loadable modules, and PARPORT must be loaded before - the PARIDE modules. - -To use PARIDE, you must begin by:: - - insmod paride - -this loads a base module which provides a registry for the protocols, -among other tasks. - -Then, load as many of the protocol modules as you think you might need. -As you load each module, it will register the protocols that it supports, -and print a log message to your kernel log file and your console. For -example:: - - # insmod epat - paride: epat registered as protocol 0 - # insmod kbic - paride: k951 registered as protocol 1 - paride: k971 registered as protocol 2 - -Finally, you can load high-level drivers for each kind of device that -you have connected. By default, each driver will autoprobe for a single -device, but you can support up to four similar devices by giving their -individual coordinates when you load the driver. - -For example, if you had two no-name CD-ROM drives both using the -KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc -you could give the following command:: - - # insmod pcd drive0=0x378,1 drive1=0x3bc,1 - -For most adapters, giving a port address and protocol number is sufficient, -but check the source files in linux/drivers/block/paride for more -information. (Hopefully someone will write some man pages one day !). - -As another example, here's what happens when PARPORT is installed, and -a SyQuest EZ-135 is attached to port 0x378:: - - # insmod paride - paride: version 1.0 installed - # insmod epat - paride: epat registered as protocol 0 - # insmod pd - pd: pd version 1.0, major 45, cluster 64, nice 0 - pda: Sharing parport1 at 0x378 - pda: epat 1.0, Shuttle EPAT chip c3 at 0x378, mode 5 (EPP-32), delay 1 - pda: SyQuest EZ135A, 262144 blocks [128M], (512/16/32), removable media - pda: pda1 - -Note that the last line is the output from the generic partition table -scanner - in this case it reports that it has found a disk with one partition. - -2.3 Using a PARIDE device --------------------------- - -Once the drivers have been loaded, you can access PARIDE devices in the -same way as their traditional counterparts. You will probably need to -create the device "special files". Here is a simple script that you can -cut to a file and execute:: - - #!/bin/bash - # - # mkd -- a script to create the device special files for the PARIDE subsystem - # - function mkdev { - mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1 - } - # - function pd { - D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) ) - mkdev pd$D b 45 $[ $1 * 16 ] - for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - do mkdev pd$D$P b 45 $[ $1 * 16 + $P ] - done - } - # - cd /dev - # - for u in 0 1 2 3 ; do pd $u ; done - for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done - for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done - for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done - for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done - for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done - # - # end of mkd - -With the device files and drivers in place, you can access PARIDE devices -like any other Linux device. For example, to mount a CD-ROM in pcd0, use:: - - mount /dev/pcd0 /cdrom - -If you have a fresh Avatar Shark cartridge, and the drive is pda, you -might do something like:: - - fdisk /dev/pda -- make a new partition table with - partition 1 of type 83 - - mke2fs /dev/pda1 -- to build the file system - - mkdir /shark -- make a place to mount the disk - - mount /dev/pda1 /shark - -Devices like the Imation superdisk work in the same way, except that -they do not have a partition table. For example to make a 120MB -floppy that you could share with a DOS system:: - - mkdosfs /dev/pf0 - mount /dev/pf0 /mnt - - -2.4 The pf driver ------------------- - -The pf driver is intended for use with parallel port ATAPI disk -devices. The most common devices in this category are PD drives -and LS-120 drives. Traditionally, media for these devices are not -partitioned. Consequently, the pf driver does not support partitioned -media. This may be changed in a future version of the driver. - -2.5 Using the pt driver ------------------------- - -The pt driver for parallel port ATAPI tape drives is a minimal driver. -It does not yet support many of the standard tape ioctl operations. -For best performance, a block size of 32KB should be used. You will -probably want to set the parallel port delay to 0, if you can. - -2.6 Using the pg driver ------------------------- - -The pg driver can be used in conjunction with the cdrecord program -to create CD-ROMs. Please get cdrecord version 1.6.1 or later -from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media -your parallel port should ideally be set to EPP mode, and the "port delay" -should be set to 0. With those settings it is possible to record at 2x -speed without any buffer underruns. If you cannot get the driver to work -in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only. +If you omit the parameters from the end, defaults will be used, e.g.: + +Probe all parports with all protocols:: + + # echo auto >/sys/bus/pata_parport/new_device + +Probe parport0 using protocol epat and mode 4 (EPP-16):: + + # echo "parport0 epat 4" >/sys/bus/pata_parport/new_device + +Probe parport0 using all protocols:: + + # echo "parport0 auto" >/sys/bus/pata_parport/new_device + +Probe all parports using protoocol epat:: + + # echo "auto epat" >/sys/bus/pata_parport/new_device + +Deleting devices:: + + # echo pata_parport.0 >/sys/bus/pata_parport/delete_device 3. Troubleshooting @@ -344,9 +169,9 @@ in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only. 3.1 Use EPP mode if you can ---------------------------- -The most common problems that people report with the PARIDE drivers +The most common problems that people report with the pata_parport drivers concern the parallel port CMOS settings. At this time, none of the -PARIDE protocol modules support ECP mode, or any ECP combination modes. +protocol modules support ECP mode, or any ECP combination modes. If you are able to do so, please set your parallel port into EPP mode using your CMOS setup procedure. @@ -354,17 +179,14 @@ using your CMOS setup procedure. ------------------------- Some parallel ports cannot reliably transfer data at full speed. To -offset the errors, the PARIDE protocol modules introduce a "port +offset the errors, the protocol modules introduce a "port delay" between each access to the i/o ports. Each protocol sets a default value for this delay. In most cases, the user can override the default and set it to 0 - resulting in somewhat higher transfer rates. In some rare cases (especially with older 486 systems) the default delays are not long enough. if you experience corrupt data transfers, or unexpected failures, you may wish to increase the -port delay. The delay can be programmed using the "driveN" parameters -to each of the high-level drivers. Please see the notes above, or -read the comments at the beginning of the driver source files in -linux/drivers/block/paride. +port delay. 3.3 Some drives need a printer reset ------------------------------------- @@ -374,66 +196,12 @@ that do not always power up correctly. We have noticed this with some drives based on OnSpec and older Freecom adapters. In these rare cases, the adapter can often be reinitialised by issuing a "printer reset" on the parallel port. As the reset operation is potentially disruptive in -multiple device environments, the PARIDE drivers will not do it +multiple device environments, the pata_parport drivers will not do it automatically. You can however, force a printer reset by doing:: insmod lp reset=1 rmmod lp If you have one of these marginal cases, you should probably build -your paride drivers as modules, and arrange to do the printer reset -before loading the PARIDE drivers. - -3.4 Use the verbose option and dmesg if you need help ------------------------------------------------------- - -While a lot of testing has gone into these drivers to make them work -as smoothly as possible, problems will arise. If you do have problems, -please check all the obvious things first: does the drive work in -DOS with the manufacturer's drivers ? If that doesn't yield any useful -clues, then please make sure that only one drive is hooked to your system, -and that either (a) PARPORT is enabled or (b) no other device driver -is using your parallel port (check in /proc/ioports). Then, load the -appropriate drivers (you can load several protocol modules if you want) -as in:: - - # insmod paride - # insmod epat - # insmod bpck - # insmod kbic - ... - # insmod pd verbose=1 - -(using the correct driver for the type of device you have, of course). -The verbose=1 parameter will cause the drivers to log a trace of their -activity as they attempt to locate your drive. - -Use 'dmesg' to capture a log of all the PARIDE messages (any messages -beginning with paride:, a protocol module's name or a driver's name) and -include that with your bug report. You can submit a bug report in one -of two ways. Either send it directly to the author of the PARIDE suite, -by e-mail to grant@torque.net, or join the linux-parport mailing list -and post your report there. - -3.5 For more information or help ---------------------------------- - -You can join the linux-parport mailing list by sending a mail message -to: - - linux-parport-request@torque.net - -with the single word:: - - subscribe - -in the body of the mail message (not in the subject line). Please be -sure that your mail program is correctly set up when you do this, as -the list manager is a robot that will subscribe you using the reply -address in your mail headers. REMOVE any anti-spam gimmicks you may -have in your mail headers, when sending mail to the list server. - -You might also find some useful information on the linux-parport -web pages (although they are not always up to date) at - - http://web.archive.org/web/%2E/http://www.torque.net/parport/ +your pata_parport drivers as modules, and arrange to do the printer reset +before loading the pata_parport drivers. diff --git a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst index 16253eda192e..dabb80cdd25a 100644 --- a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst +++ b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst @@ -106,7 +106,7 @@ Proportional weight policy files see Documentation/block/bfq-iosched.rst. blkio.bfq.weight_device - Specifes per cgroup per device weights, overriding the default group + Specifies per cgroup per device weights, overriding the default group weight. For more details, see Documentation/block/bfq-iosched.rst. Following is the format:: diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 5db4c4dd5bb4..f67c0829350b 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -624,7 +624,7 @@ and is an example of this type. Limits ------ -A child can only consume upto the configured amount of the resource. +A child can only consume up to the configured amount of the resource. Limits can be over-committed - the sum of the limits of children can exceed the amount of resource available to the parent. @@ -642,11 +642,11 @@ on an IO device and is an example of this type. Protections ----------- -A cgroup is protected upto the configured amount of the resource +A cgroup is protected up to the configured amount of the resource as long as the usages of all its ancestors are under their protected levels. Protections can be hard guarantees or best effort soft boundaries. Protections can also be over-committed in which case -only upto the amount available to the parent is protected among +only up to the amount available to the parent is protected among children. Protections are in the range [0, max] and defaults to 0, which is @@ -1079,7 +1079,7 @@ All time durations are in microseconds. $MAX $PERIOD - which indicates that the group may consume upto $MAX in each + which indicates that the group may consume up to $MAX in each $PERIOD duration. "max" for $MAX indicates no limit. If only one number is written, $MAX is updated. @@ -2289,7 +2289,7 @@ Cpuset Interface Files For a valid partition root with the sibling cpu exclusivity rule enabled, changes made to "cpuset.cpus" that violate the exclusivity rule will invalidate the partition as well as its - sibiling partitions with conflicting cpuset.cpus values. So + sibling partitions with conflicting cpuset.cpus values. So care must be taking in changing "cpuset.cpus". A valid non-root parent partition may distribute out all its CPUs diff --git a/Documentation/admin-guide/cifs/usage.rst b/Documentation/admin-guide/cifs/usage.rst index ed3b8dc854ec..2e151cd8c2e4 100644 --- a/Documentation/admin-guide/cifs/usage.rst +++ b/Documentation/admin-guide/cifs/usage.rst @@ -399,7 +399,7 @@ A partial list of the supported mount options follows: sep if first mount option (after the -o), overrides the comma as the separator between the mount - parms. e.g.:: + parameters. e.g.:: -o user=myname,password=mypassword,domain=mydom @@ -765,7 +765,7 @@ cifsFYI If set to non-zero value, additional debug information Some debugging statements are not compiled into the cifs kernel unless CONFIG_CIFS_DEBUG2 is enabled in the kernel configuration. cifsFYI may be set to one or - nore of the following flags (7 sets them all):: + more of the following flags (7 sets them all):: +-----------------------------------------------+------+ | log cifs informational messages | 0x01 | diff --git a/Documentation/admin-guide/device-mapper/cache-policies.rst b/Documentation/admin-guide/device-mapper/cache-policies.rst index b17fe352fc41..13da4d831d46 100644 --- a/Documentation/admin-guide/device-mapper/cache-policies.rst +++ b/Documentation/admin-guide/device-mapper/cache-policies.rst @@ -70,7 +70,7 @@ the entries (each hotspot block covers a larger area than a single cache block). All this means smq uses ~25bytes per cache block. Still a lot of -memory, but a substantial improvement nontheless. +memory, but a substantial improvement nonetheless. Level balancing ^^^^^^^^^^^^^^^ diff --git a/Documentation/admin-guide/device-mapper/dm-ebs.rst b/Documentation/admin-guide/device-mapper/dm-ebs.rst index 534fa38e8862..c09f66db5621 100644 --- a/Documentation/admin-guide/device-mapper/dm-ebs.rst +++ b/Documentation/admin-guide/device-mapper/dm-ebs.rst @@ -31,7 +31,7 @@ Mandatory parameters: Optional parameter: - <underyling sectors>: + <underlying sectors>: Number of sectors defining the logical block size of <dev path>. 2^N supported, e.g. 8 = emulate 8 sectors of 512 bytes = 4KiB. If not provided, the logical block size of <dev path> will be used. diff --git a/Documentation/admin-guide/device-mapper/dm-zoned.rst b/Documentation/admin-guide/device-mapper/dm-zoned.rst index 0fac051caeac..932383fe6e88 100644 --- a/Documentation/admin-guide/device-mapper/dm-zoned.rst +++ b/Documentation/admin-guide/device-mapper/dm-zoned.rst @@ -46,7 +46,7 @@ just like conventional zones. The zones of the device(s) are separated into 2 types: 1) Metadata zones: these are conventional zones used to store metadata. -Metadata zones are not reported as useable capacity to the user. +Metadata zones are not reported as usable capacity to the user. 2) Data zones: all remaining zones, the vast majority of which will be sequential zones used exclusively to store user data. The conventional diff --git a/Documentation/admin-guide/device-mapper/unstriped.rst b/Documentation/admin-guide/device-mapper/unstriped.rst index 0a8d3eb3f072..5772ccdd1f5f 100644 --- a/Documentation/admin-guide/device-mapper/unstriped.rst +++ b/Documentation/admin-guide/device-mapper/unstriped.rst @@ -35,7 +35,7 @@ An example of undoing an existing dm-stripe This small bash script will setup 4 loop devices and use the existing striped target to combine the 4 devices into one. It then will use -the unstriped target ontop of the striped device to access the +the unstriped target on top of the striped device to access the individual backing loop devices. We write data to the newly exposed unstriped devices and verify the data written matches the correct underlying device on the striped array:: @@ -110,8 +110,8 @@ to get a 92% reduction in read latency using this device mapper target. Example dmsetup usage ===================== -unstriped ontop of Intel NVMe device that has 2 cores ------------------------------------------------------ +unstriped on top of Intel NVMe device that has 2 cores +------------------------------------------------------ :: @@ -124,8 +124,8 @@ respectively:: /dev/mapper/nvmset0 /dev/mapper/nvmset1 -unstriped ontop of striped with 4 drives using 128K chunk size --------------------------------------------------------------- +unstriped on top of striped with 4 drives using 128K chunk size +--------------------------------------------------------------- :: diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst index faa22f77847a..8dc668cc1216 100644 --- a/Documentation/admin-guide/dynamic-debug-howto.rst +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -330,7 +330,7 @@ Examples // boot-args example, with newlines and comments for readability Kernel command line: ... - // see whats going on in dyndbg=value processing + // see what's going on in dyndbg=value processing dynamic_debug.verbose=3 // enable pr_debugs in the btrfs module (can be builtin or loadable) btrfs.dyndbg="+p" diff --git a/Documentation/admin-guide/gpio/gpio-sim.rst b/Documentation/admin-guide/gpio/gpio-sim.rst index d8a90c81b9ee..1cc5567a4bbe 100644 --- a/Documentation/admin-guide/gpio/gpio-sim.rst +++ b/Documentation/admin-guide/gpio/gpio-sim.rst @@ -123,7 +123,7 @@ Each simulated GPIO chip creates a separate sysfs group under its device directory for each exposed line (e.g. ``/sys/devices/platform/gpio-sim.X/gpiochipY/``). The name of each group is of the form: ``'sim_gpioX'`` where X is the offset of the line. Inside each -group there are two attibutes: +group there are two attributes: ``pull`` - allows to read and set the current simulated pull setting for every line, when writing the value must be one of: ``'pull-up'``, diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst index 2d19c9f4c1fe..f491de74ea79 100644 --- a/Documentation/admin-guide/hw-vuln/mds.rst +++ b/Documentation/admin-guide/hw-vuln/mds.rst @@ -64,8 +64,8 @@ architecture section: :ref:`Documentation/x86/mds.rst <mds>`. Attack scenarios ---------------- -Attacks against the MDS vulnerabilities can be mounted from malicious non -priviledged user space applications running on hosts or guest. Malicious +Attacks against the MDS vulnerabilities can be mounted from malicious non- +privileged user space applications running on hosts or guest. Malicious guest OSes can obviously mount attacks as well. Contrary to other speculation based vulnerabilities the MDS vulnerability diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index c4dcdb3d0d45..3fe6511c5405 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -610,9 +610,9 @@ kernel command line. retpoline,generic Retpolines retpoline,lfence LFENCE; indirect branch retpoline,amd alias for retpoline,lfence - eibrs enhanced IBRS - eibrs,retpoline enhanced IBRS + Retpolines - eibrs,lfence enhanced IBRS + LFENCE + eibrs Enhanced/Auto IBRS + eibrs,retpoline Enhanced/Auto IBRS + Retpolines + eibrs,lfence Enhanced/Auto IBRS + LFENCE ibrs use IBRS to protect kernel Not specifying this option is equivalent to diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 0571938ecdc8..0ad7e7ec0d27 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -56,6 +56,17 @@ ABI will be found here. sysfs-rules +This is the beginning of a section with information of interest to +application developers and system integrators doing analysis of the +Linux kernel for safety critical applications. Documents supporting +analysis of kernel interactions with applications, and key kernel +subsystems expectations will be found here. + +.. toctree:: + :maxdepth: 1 + + workload-tracing + The rest of this manual consists of various unordered guides on how to configure specific aspects of kernel behavior to your liking. diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt index 82aecdcae8a6..030de95e3e6b 100644 --- a/Documentation/admin-guide/kdump/gdbmacros.txt +++ b/Documentation/admin-guide/kdump/gdbmacros.txt @@ -312,10 +312,10 @@ define dmesg set var $prev_flags = $info->flags end - set var $id = ($id + 1) & $id_mask if ($id == $end_id) loop_break end + set var $id = ($id + 1) & $id_mask end end document dmesg diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index 959f73a32712..19600c50277b 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -142,7 +142,6 @@ parameter is applicable:: NFS Appropriate NFS support is enabled. OF Devicetree is enabled. PV_OPS A paravirtualized kernel is enabled. - PARIDE The ParIDE (parallel port IDE) subsystem is enabled. PARISC The PA-RISC architecture is enabled. PCI PCI bus support is enabled. PCIE PCI Express support is enabled. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 8870a29f92a8..276a793168a6 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -378,18 +378,16 @@ autoconf= [IPV6] See Documentation/networking/ipv6.rst. - show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller - Limit apic dumping. The parameter defines the maximal - number of local apics being dumped. Also it is possible - to set it to "all" by meaning -- no limit here. - Format: { 1 (default) | 2 | ... | all }. - The parameter valid if only apic=debug or - apic=verbose is specified. - Example: apic=debug show_lapic=all - apm= [APM] Advanced Power Management See header of arch/x86/kernel/apm_32.c. + apparmor= [APPARMOR] Disable or enable AppArmor at boot time + Format: { "0" | "1" } + See security/apparmor/Kconfig help text + 0 -- disable. + 1 -- enable. + Default value is set via kernel config option. + arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards Format: <io>,<irq>,<nodeID> @@ -480,8 +478,10 @@ See Documentation/block/cmdline-partition.rst boot_delay= Milliseconds to delay each printk during boot. - Values larger than 10 seconds (10000) are changed to - no delay (0). + Only works if CONFIG_BOOT_PRINTK_DELAY is enabled, + and you may also have to specify "lpj=". Boot_delay + values larger than 10 seconds (10000) are assumed + erroneous and ignored. Format: integer bootconfig [KNL] @@ -557,6 +557,7 @@ Format: <string> nosocket -- Disable socket memory accounting. nokmem -- Disable kernel memory accounting. + nobpf -- Disable BPF memory accounting. checkreqprot= [SELINUX] Set initial checkreqprot flag value. Format: { "0" | "1" } @@ -672,7 +673,7 @@ Sets the size of kernel per-numa memory area for contiguous memory allocations. A value of 0 disables per-numa CMA altogether. And If this option is not - specificed, the default value is 0. + specified, the default value is 0. With per-numa CMA enabled, DMA users on node nid will first try to allocate buffer from the pernuma area which is located in node nid, if the allocation fails, @@ -944,7 +945,7 @@ driver code when a CPU writes to (or reads from) a random memory location. Note that there exists a class of memory corruptions problems caused by buggy H/W or - F/W or by drivers badly programing DMA (basically when + F/W or by drivers badly programming DMA (basically when memory is written at bus level and the CPU MMU is bypassed) which are not detectable by CONFIG_DEBUG_PAGEALLOC, hence this option will not help @@ -1045,26 +1046,12 @@ can be useful when debugging issues that require an SLB miss to occur. - stress_slb [PPC] - Limits the number of kernel SLB entries, and flushes - them frequently to increase the rate of SLB faults - on kernel addresses. - - stress_hpt [PPC] - Limits the number of kernel HPT entries in the hash - page table to increase the rate of hash page table - faults on kernel addresses. - disable= [IPV6] See Documentation/networking/ipv6.rst. disable_radix [PPC] Disable RADIX MMU mode on POWER9 - radix_hcall_invalidate=on [PPC/PSERIES] - Disable RADIX GTSE feature and use hcall for TLB - invalidate. - disable_tlbie [PPC] Disable TLBIE instruction. Currently does not work with KVM, with HASH MMU, or with coherent accelerators. @@ -1166,16 +1153,6 @@ Documentation/admin-guide/dynamic-debug-howto.rst for details. - nopku [X86] Disable Memory Protection Keys CPU feature found - in some Intel CPUs. - - <module>.async_probe[=<bool>] [KNL] - If no <bool> value is specified or if the value - specified is not a valid <bool>, enable asynchronous - probe on this module. Otherwise, enable/disable - asynchronous probe on this module as indicated by the - <bool> value. See also: module.async_probe - early_ioremap_debug [KNL] Enable debug messages in early_ioremap support. This is useful for tracking down temporary early mappings @@ -1532,6 +1509,15 @@ boot up that is likely to be overridden by user space start up functionality. + Optionally, the snapshot can also be defined for a tracing + instance that was created by the trace_instance= command + line parameter. + + trace_instance=foo,sched_switch ftrace_boot_snapshot=foo + + The above will cause the "foo" tracing instance to trigger + a snapshot at the end of boot up. + ftrace_dump_on_oops[=orig_cpu] [FTRACE] will dump the trace buffers on oops. If no parameter is passed, ftrace will dump @@ -1752,7 +1738,7 @@ boot-time allocation of gigantic hugepages is skipped. hugetlb_free_vmemmap= - [KNL] Reguires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP + [KNL] Requires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP enabled. Control if HugeTLB Vmemmap Optimization (HVO) is enabled. Allows heavy hugetlb users to free up some more @@ -1791,12 +1777,6 @@ which allow the hypervisor to 'idle' the guest on lock contention. - keep_bootcon [KNL] - Do not unregister boot console at start. This is only - useful for debugging when something happens in the window - between unregistering the boot console and initializing - the real console. - i2c_bus= [HW] Override the default board specific I2C bus speed or register an additional I2C bus that is not registered from board initialization code. @@ -2366,17 +2346,18 @@ js= [HW,JOY] Analog joystick See Documentation/input/joydev/joystick.rst. - nokaslr [KNL] - When CONFIG_RANDOMIZE_BASE is set, this disables - kernel and module base offset ASLR (Address Space - Layout Randomization). - kasan_multi_shot [KNL] Enforce KASAN (Kernel Address Sanitizer) to print report on every invalid memory access. Without this parameter KASAN will print report only for the first invalid access. + keep_bootcon [KNL] + Do not unregister boot console at start. This is only + useful for debugging when something happens in the window + between unregistering the boot console and initializing + the real console. + keepinitrd [HW,ARM] kernelcore= [KNL,X86,IA-64,PPC] @@ -2816,6 +2797,9 @@ * [no]setxfer: Indicate if transfer speed mode setting should be skipped. + * [no]fua: Disable or enable FUA (Force Unit Access) + support for devices supporting this feature. + * dump_id: Dump IDENTIFY data. * disable: Disable this device. @@ -3325,6 +3309,13 @@ For details see: Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst + <module>.async_probe[=<bool>] [KNL] + If no <bool> value is specified or if the value + specified is not a valid <bool>, enable asynchronous + probe on this module. Otherwise, enable/disable + asynchronous probe on this module as indicated by the + <bool> value. See also: module.async_probe + module.async_probe=<bool> [KNL] When set to true, modules will use async probing by default. To enable/disable async probing for a @@ -3708,7 +3699,7 @@ implementation; requires CONFIG_GENERIC_IDLE_POLL_SETUP to be effective. This is useful on platforms where the sleep(SH) or wfi(ARM,ARM64) instructions do not work - correctly or when doing power measurements to evalute + correctly or when doing power measurements to evaluate the impact of the sleep instructions. This is also useful when using JTAG debugger. @@ -3779,6 +3770,11 @@ nojitter [IA-64] Disables jitter checking for ITC timers. + nokaslr [KNL] + When CONFIG_RANDOMIZE_BASE is set, this disables + kernel and module base offset ASLR (Address Space + Layout Randomization). + no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page @@ -3824,6 +3820,19 @@ nopcid [X86-64] Disable the PCID cpu feature. + nopku [X86] Disable Memory Protection Keys CPU feature found + in some Intel CPUs. + + nopv= [X86,XEN,KVM,HYPER_V,VMWARE] + Disables the PV optimizations forcing the guest to run + as generic guest with no PV drivers. Currently support + XEN HVM, KVM, HYPER_V and VMWARE guest. + + nopvspin [X86,XEN,KVM] + Disables the qspinlock slow path using PV optimizations + which allow the hypervisor to 'idle' the guest on lock + contention. + norandmaps Don't use address space randomization. Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space @@ -4117,10 +4126,6 @@ pcbit= [HW,ISDN] - pcd. [PARIDE] - See header of drivers/block/paride/pcd.c. - See also Documentation/admin-guide/blockdev/paride.rst. - pci=option[,option...] [PCI] various PCI subsystem options. Some options herein operate on a specific device @@ -4383,9 +4388,6 @@ for debug and development, but should not be needed on a platform with proper driver support. - pd. [PARIDE] - See Documentation/admin-guide/blockdev/paride.rst. - pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at boot time. Format: { 0 | 1 } @@ -4398,12 +4400,6 @@ allocator. This parameter is primarily for debugging and performance comparison. - pf. [PARIDE] - See Documentation/admin-guide/blockdev/paride.rst. - - pg. [PARIDE] - See Documentation/admin-guide/blockdev/paride.rst. - pirq= [SMP,APIC] Manual mp-table setup See Documentation/x86/i386/IO-APIC.rst. @@ -4565,9 +4561,6 @@ pstore.backend= Specify the name of the pstore backend to use - pt. [PARIDE] - See Documentation/admin-guide/blockdev/paride.rst. - pti= [X86-64] Control Page Table Isolation of user and kernel address spaces. Disabling this feature removes hardening, but improves performance of @@ -4591,6 +4584,10 @@ r128= [HW,DRM] + radix_hcall_invalidate=on [PPC/PSERIES] + Disable RADIX GTSE feature and use hcall for TLB + invalidate. + raid= [HW,RAID] See Documentation/admin-guide/md.rst. @@ -5583,13 +5580,6 @@ 1 -- enable. Default value is 1. - apparmor= [APPARMOR] Disable or enable AppArmor at boot time - Format: { "0" | "1" } - See security/apparmor/Kconfig help text - 0 -- disable. - 1 -- enable. - Default value is set via kernel config option. - serialnumber [BUGS=X86-32] sev=option[,option...] [X86-64] See Documentation/x86/x86_64/boot-options.rst @@ -5597,6 +5587,15 @@ shapers= [NET] Maximal number of shapers. + show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller + Limit apic dumping. The parameter defines the maximal + number of local apics being dumped. Also it is possible + to set it to "all" by meaning -- no limit here. + Format: { 1 (default) | 2 | ... | all }. + The parameter valid if only apic=debug or + apic=verbose is specified. + Example: apic=debug show_lapic=all + simeth= [IA-64] simscsi= @@ -5740,9 +5739,9 @@ retpoline,generic - Retpolines retpoline,lfence - LFENCE; indirect branch retpoline,amd - alias for retpoline,lfence - eibrs - enhanced IBRS - eibrs,retpoline - enhanced IBRS + Retpolines - eibrs,lfence - enhanced IBRS + LFENCE + eibrs - Enhanced/Auto IBRS + eibrs,retpoline - Enhanced/Auto IBRS + Retpolines + eibrs,lfence - Enhanced/Auto IBRS + LFENCE ibrs - use IBRS to protect kernel Not specifying this option is equivalent to @@ -6036,6 +6035,16 @@ be used to filter out binaries which have not yet been made aware of AT_MINSIGSTKSZ. + stress_hpt [PPC] + Limits the number of kernel HPT entries in the hash + page table to increase the rate of hash page table + faults on kernel addresses. + + stress_slb [PPC] + Limits the number of kernel SLB entries, and flushes + them frequently to increase the rate of SLB faults + on kernel addresses. + sunrpc.min_resvport= sunrpc.max_resvport= [NFS,SUNRPC] @@ -6283,13 +6292,33 @@ comma-separated list of trace events to enable. See also Documentation/trace/events.rst + trace_instance=[instance-info] + [FTRACE] Create a ring buffer instance early in boot up. + This will be listed in: + + /sys/kernel/tracing/instances + + Events can be enabled at the time the instance is created + via: + + trace_instance=<name>,<system1>:<event1>,<system2>:<event2> + + Note, the "<system*>:" portion is optional if the event is + unique. + + trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall + + will enable the "sched_switch" event (note, the "sched:" is optional, and + the same thing would happen if it was left off). The irq_handler_entry + event, and all events under the "initcall" system. + trace_options=[option-list] [FTRACE] Enable or disable tracer options at boot. The option-list is a comma delimited list of options that can be enabled or disabled just as if you were to echo the option name into - /sys/kernel/debug/tracing/trace_options + /sys/kernel/tracing/trace_options For example, to enable stacktrace option (to dump the stack trace of each event), add to the command line: @@ -6322,7 +6351,7 @@ [FTRACE] enable this option to disable tracing when a warning is hit. This turns off "tracing_on". Tracing can be enabled again by echoing '1' into the "tracing_on" - file located in /sys/kernel/debug/tracing/ + file located in /sys/kernel/tracing/ This option is useful, as it disables the trace before the WARNING dump is called, which prevents the trace to @@ -6777,11 +6806,11 @@ functions are at fixed addresses, they make nice targets for exploits that can control RIP. - emulate [default] Vsyscalls turn into traps and are - emulated reasonably safely. The vsyscall - page is readable. + emulate Vsyscalls turn into traps and are emulated + reasonably safely. The vsyscall page is + readable. - xonly Vsyscalls turn into traps and are + xonly [default] Vsyscalls turn into traps and are emulated reasonably safely. The vsyscall page is not readable. @@ -6978,16 +7007,6 @@ fairer and the number of possible event channels is much higher. Default is on (use fifo events). - nopv= [X86,XEN,KVM,HYPER_V,VMWARE] - Disables the PV optimizations forcing the guest to run - as generic guest with no PV drivers. Currently support - XEN HVM, KVM, HYPER_V and VMWARE guest. - - nopvspin [X86,XEN,KVM] - Disables the qspinlock slow path using PV optimizations - which allow the hypervisor to 'idle' the guest on lock - contention. - xirc2ps_cs= [NET,PCMCIA] Format: <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]] diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst index e4a5fc26f1a9..993c2a05f5ee 100644 --- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst +++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst @@ -25,7 +25,7 @@ References - In order to locate kernel-generated OS jitter on CPU N: - cd /sys/kernel/debug/tracing + cd /sys/kernel/tracing echo 1 > max_graph_depth # Increase the "1" for more detail echo function_graph > current_tracer # run workload diff --git a/Documentation/admin-guide/laptops/thinkpad-acpi.rst b/Documentation/admin-guide/laptops/thinkpad-acpi.rst index 475eb0e81e4a..e27a1c3f634e 100644 --- a/Documentation/admin-guide/laptops/thinkpad-acpi.rst +++ b/Documentation/admin-guide/laptops/thinkpad-acpi.rst @@ -1488,7 +1488,7 @@ Example of command to set keyboard language is mentioned below:: Text corresponding to keyboard layout to be set in sysfs are: be(Belgian), cz(Czech), da(Danish), de(German), en(English), es(Spain), et(Estonian), fr(French), fr-ch(French(Switzerland)), hu(Hungarian), it(Italy), jp (Japan), -nl(Dutch), nn(Norway), pl(Polish), pt(portugese), sl(Slovenian), sv(Sweden), +nl(Dutch), nn(Norway), pl(Polish), pt(portuguese), sl(Slovenian), sv(Sweden), tr(Turkey) WWAN Antenna type diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst index d8fc9a59c086..4ff2cc291d18 100644 --- a/Documentation/admin-guide/md.rst +++ b/Documentation/admin-guide/md.rst @@ -317,7 +317,7 @@ All md devices contain: suspended (not supported yet) All IO requests will block. The array can be reconfigured. - Writing this, if accepted, will block until array is quiessent + Writing this, if accepted, will block until array is quiescent readonly no resync can happen. no superblocks get written. diff --git a/Documentation/admin-guide/media/bttv.rst b/Documentation/admin-guide/media/bttv.rst index 125f6f47123d..58cbaf6df694 100644 --- a/Documentation/admin-guide/media/bttv.rst +++ b/Documentation/admin-guide/media/bttv.rst @@ -909,7 +909,7 @@ DE hat diverse Treiber fuer diese Modelle (Stand 09/2002): - TVPhone98 (Bt878) - AVerTV und TVCapture98 w/VCR (Bt 878) - AVerTVStudio und TVPhone98 w/VCR (Bt878) - - AVerTV GO Serie (Kein SVideo Input) + - AVerTV GO Series (Kein SVideo Input) - AVerTV98 (BT-878 chip) - AVerTV98 mit Fernbedienung (BT-878 chip) - AVerTV/FM98 (BT-878 chip) diff --git a/Documentation/admin-guide/media/building.rst b/Documentation/admin-guide/media/building.rst index 2d660b76caea..a06473429916 100644 --- a/Documentation/admin-guide/media/building.rst +++ b/Documentation/admin-guide/media/building.rst @@ -137,7 +137,7 @@ The ``LIRC user interface`` option adds enhanced functionality when using the from remote controllers. The ``Support for eBPF programs attached to lirc devices`` option allows -the usage of special programs (called eBPF) that would allow aplications +the usage of special programs (called eBPF) that would allow applications to add extra remote controller decoding functionality to the Linux Kernel. The ``Remote controller decoders`` option allows selecting the diff --git a/Documentation/admin-guide/media/si476x.rst b/Documentation/admin-guide/media/si476x.rst index 87062301d6a1..c8882ee9f208 100644 --- a/Documentation/admin-guide/media/si476x.rst +++ b/Documentation/admin-guide/media/si476x.rst @@ -142,7 +142,7 @@ The drivers exposes following files: indicator 0x18 lassi Signed Low side adjacent Channel Strength indicator - 0x19 hassi ditto fpr High side + 0x19 hassi ditto for High side 0x20 mult Multipath indicator 0x21 dev Frequency deviation 0x24 assi Adjacent channel SSI diff --git a/Documentation/admin-guide/media/vivid.rst b/Documentation/admin-guide/media/vivid.rst index 672a8371f6ad..58ac25b2c385 100644 --- a/Documentation/admin-guide/media/vivid.rst +++ b/Documentation/admin-guide/media/vivid.rst @@ -580,7 +580,7 @@ Metadata Capture ---------------- The Metadata capture generates UVC format metadata. The PTS and SCR are -transmitted based on the values set in vivid contols. +transmitted based on the values set in vivid controls. The Metadata device will only work for the Webcam input, it will give back an error for all other inputs. diff --git a/Documentation/admin-guide/mm/concepts.rst b/Documentation/admin-guide/mm/concepts.rst index c79f1e336222..e796b0a7e4a5 100644 --- a/Documentation/admin-guide/mm/concepts.rst +++ b/Documentation/admin-guide/mm/concepts.rst @@ -1,5 +1,3 @@ -.. _mm_concepts: - ================= Concepts overview ================= @@ -86,16 +84,15 @@ memory with the huge pages. The first one is `HugeTLB filesystem`, or hugetlbfs. It is a pseudo filesystem that uses RAM as its backing store. For the files created in this filesystem the data resides in the memory and mapped using huge pages. The hugetlbfs is described at -:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`. +Documentation/admin-guide/mm/hugetlbpage.rst. Another, more recent, mechanism that enables use of the huge pages is called `Transparent HugePages`, or THP. Unlike the hugetlbfs that requires users and/or system administrators to configure what parts of the system memory should and can be mapped by the huge pages, THP manages such mappings transparently to the user and hence the -name. See -:ref:`Documentation/admin-guide/mm/transhuge.rst <admin_guide_transhuge>` -for more details about THP. +name. See Documentation/admin-guide/mm/transhuge.rst for more details +about THP. Zones ===== @@ -125,8 +122,8 @@ processor. Each bank is referred to as a `node` and for each node Linux constructs an independent memory management subsystem. A node has its own set of zones, lists of free and used pages and various statistics counters. You can find more details about NUMA in -:ref:`Documentation/mm/numa.rst <numa>` and in -:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`. +Documentation/mm/numa.rst` and in +Documentation/admin-guide/mm/numa_memory_policy.rst. Page cache ========== diff --git a/Documentation/admin-guide/mm/damon/lru_sort.rst b/Documentation/admin-guide/mm/damon/lru_sort.rst index c09cace80651..7b0775d281b4 100644 --- a/Documentation/admin-guide/mm/damon/lru_sort.rst +++ b/Documentation/admin-guide/mm/damon/lru_sort.rst @@ -54,7 +54,7 @@ that is built with ``CONFIG_DAMON_LRU_SORT=y``. To let sysadmins enable or disable it and tune for the given system, DAMON_LRU_SORT utilizes module parameters. That is, you can put ``damon_lru_sort.<parameter>=<value>`` on the kernel boot command line or write -proper values to ``/sys/modules/damon_lru_sort/parameters/<parameter>`` files. +proper values to ``/sys/module/damon_lru_sort/parameters/<parameter>`` files. Below are the description of each parameter. @@ -283,7 +283,7 @@ doesn't make progress and therefore the free memory rate becomes lower than 20%, it asks DAMON_LRU_SORT to do nothing again, so that we can fall back to the LRU-list based page granularity reclamation. :: - # cd /sys/modules/damon_lru_sort/parameters + # cd /sys/module/damon_lru_sort/parameters # echo 500 > hot_thres_access_freq # echo 120000000 > cold_min_age # echo 10 > quota_ms diff --git a/Documentation/admin-guide/mm/damon/reclaim.rst b/Documentation/admin-guide/mm/damon/reclaim.rst index 4f1479a11e63..d2ccd9c21b9a 100644 --- a/Documentation/admin-guide/mm/damon/reclaim.rst +++ b/Documentation/admin-guide/mm/damon/reclaim.rst @@ -46,7 +46,7 @@ that is built with ``CONFIG_DAMON_RECLAIM=y``. To let sysadmins enable or disable it and tune for the given system, DAMON_RECLAIM utilizes module parameters. That is, you can put ``damon_reclaim.<parameter>=<value>`` on the kernel boot command line or write -proper values to ``/sys/modules/damon_reclaim/parameters/<parameter>`` files. +proper values to ``/sys/module/damon_reclaim/parameters/<parameter>`` files. Below are the description of each parameter. @@ -251,7 +251,7 @@ therefore the free memory rate becomes lower than 20%, it asks DAMON_RECLAIM to do nothing again, so that we can fall back to the LRU-list based page granularity reclamation. :: - # cd /sys/modules/damon_reclaim/parameters + # cd /sys/module/damon_reclaim/parameters # echo 30000000 > min_age # echo $((1 * 1024 * 1024 * 1024)) > quota_sz # echo 1000 > quota_reset_interval_ms diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst index 19f27c0d92e0..bca00cb6f43a 100644 --- a/Documentation/admin-guide/mm/hugetlbpage.rst +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -1,5 +1,3 @@ -.. _hugetlbpage: - ============= HugeTLB Pages ============= @@ -86,7 +84,7 @@ by increasing or decreasing the value of ``nr_hugepages``. Note: When the feature of freeing unused vmemmap pages associated with each hugetlb page is enabled, we can fail to free the huge pages triggered by -the user when ths system is under memory pressure. Please try again later. +the user when the system is under memory pressure. Please try again later. Pages that are used as huge pages are reserved inside the kernel and cannot be used for other purposes. Huge pages cannot be swapped out under @@ -313,7 +311,7 @@ memory policy mode--bind, preferred, local or interleave--may be used. The resulting effect on persistent huge page allocation is as follows: #. Regardless of mempolicy mode [see - :ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`], + Documentation/admin-guide/mm/numa_memory_policy.rst], persistent huge pages will be distributed across the node or nodes specified in the mempolicy as if "interleave" had been specified. However, if a node in the policy does not contain sufficient contiguous diff --git a/Documentation/admin-guide/mm/idle_page_tracking.rst b/Documentation/admin-guide/mm/idle_page_tracking.rst index df9394fb39c2..b5a285bd73fd 100644 --- a/Documentation/admin-guide/mm/idle_page_tracking.rst +++ b/Documentation/admin-guide/mm/idle_page_tracking.rst @@ -1,5 +1,3 @@ -.. _idle_page_tracking: - ================== Idle Page Tracking ================== @@ -70,9 +68,8 @@ If the tool is run initially with the appropriate option, it will mark all the queried pages as idle. Subsequent runs of the tool can then show which pages have their idle flag cleared in the interim. -See :ref:`Documentation/admin-guide/mm/pagemap.rst <pagemap>` for more -information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and -``/proc/kpagecgroup``. +See Documentation/admin-guide/mm/pagemap.rst for more information about +``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``. .. _impl_details: diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst index d1064e0ba34a..1f883abf3f00 100644 --- a/Documentation/admin-guide/mm/index.rst +++ b/Documentation/admin-guide/mm/index.rst @@ -16,8 +16,7 @@ are described in Documentation/admin-guide/sysctl/vm.rst and in `man 5 proc`_. .. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html Linux memory management has its own jargon and if you are not yet -familiar with it, consider reading -:ref:`Documentation/admin-guide/mm/concepts.rst <mm_concepts>`. +familiar with it, consider reading Documentation/admin-guide/mm/concepts.rst. Here we document in detail how to interact with various mechanisms in the Linux memory management. diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst index fb6ba2002a4b..eed51a910c94 100644 --- a/Documentation/admin-guide/mm/ksm.rst +++ b/Documentation/admin-guide/mm/ksm.rst @@ -1,5 +1,3 @@ -.. _admin_guide_ksm: - ======================= Kernel Samepage Merging ======================= diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index a3c9e8ad8fa0..1b02fe5807cc 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -1,5 +1,3 @@ -.. _admin_guide_memory_hotplug: - ================== Memory Hot(Un)Plug ================== diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst index 5a6afecbb0d0..46515ad2337f 100644 --- a/Documentation/admin-guide/mm/numa_memory_policy.rst +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst @@ -1,5 +1,3 @@ -.. _numa_memory_policy: - ================== NUMA Memory Policy ================== @@ -246,7 +244,7 @@ MPOL_INTERLEAVED interleaved system default policy works in this mode. MPOL_PREFERRED_MANY - This mode specifices that the allocation should be preferrably + This mode specifies that the allocation should be preferably satisfied from the nodemask specified in the policy. If there is a memory pressure on all nodes in the nodemask, the allocation can fall back to all existing numa nodes. This is effectively @@ -360,7 +358,7 @@ and NUMA nodes. "Usage" here means one of the following: 2) examination of the policy to determine the policy mode and associated node or node lists, if any, for page allocation. This is considered a "hot path". Note that for MPOL_BIND, the "usage" extends across the entire - allocation process, which may sleep during page reclaimation, because the + allocation process, which may sleep during page reclamation, because the BIND policy nodemask is used, by reference, to filter ineligible nodes. We can avoid taking an extra reference during the usages listed above as diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst index 166697325947..24e63e740420 100644 --- a/Documentation/admin-guide/mm/numaperf.rst +++ b/Documentation/admin-guide/mm/numaperf.rst @@ -1,5 +1,3 @@ -.. _numaperf: - ============= NUMA Locality ============= diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst index 6e2e416af783..1a22674ab18e 100644 --- a/Documentation/admin-guide/mm/pagemap.rst +++ b/Documentation/admin-guide/mm/pagemap.rst @@ -1,5 +1,3 @@ -.. _pagemap: - ============================= Examining Process Page Tables ============================= @@ -19,10 +17,10 @@ There are four components to pagemap: * Bits 0-4 swap type if swapped * Bits 5-54 swap offset if swapped * Bit 55 pte is soft-dirty (see - :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`) + Documentation/admin-guide/mm/soft-dirty.rst) * Bit 56 page exclusively mapped (since 4.2) * Bit 57 pte is uffd-wp write-protected (since 5.13) (see - :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`) + Documentation/admin-guide/mm/userfaultfd.rst) * Bits 58-60 zero * Bit 61 page is file-page or shared-anon (since 3.5) * Bit 62 page swapped @@ -105,8 +103,7 @@ Short descriptions to the page flags A compound page with order N consists of 2^N physically contiguous pages. A compound page with order 2 takes the form of "HTTT", where H donates its head page and T donates its tail page(s). The major consumers of compound - pages are hugeTLB pages - (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`), + pages are hugeTLB pages (Documentation/admin-guide/mm/hugetlbpage.rst), the SLUB etc. memory allocators and various device drivers. However in this interface, only huge/giga pages are made visible to end users. @@ -128,7 +125,7 @@ Short descriptions to the page flags Zero page for pfn_zero or huge_zero page. 25 - IDLE The page has not been accessed since it was marked idle (see - :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`). + Documentation/admin-guide/mm/idle_page_tracking.rst). Note that this flag may be stale in case the page was accessed via a PTE. To make sure the flag is up-to-date one has to read ``/sys/kernel/mm/page_idle/bitmap`` first. diff --git a/Documentation/admin-guide/mm/shrinker_debugfs.rst b/Documentation/admin-guide/mm/shrinker_debugfs.rst index 3887f0b294fe..c582033bd113 100644 --- a/Documentation/admin-guide/mm/shrinker_debugfs.rst +++ b/Documentation/admin-guide/mm/shrinker_debugfs.rst @@ -1,5 +1,3 @@ -.. _shrinker_debugfs: - ========================== Shrinker Debugfs Interface ========================== diff --git a/Documentation/admin-guide/mm/soft-dirty.rst b/Documentation/admin-guide/mm/soft-dirty.rst index cb0cfd6672fa..aeea936caa44 100644 --- a/Documentation/admin-guide/mm/soft-dirty.rst +++ b/Documentation/admin-guide/mm/soft-dirty.rst @@ -1,5 +1,3 @@ -.. _soft_dirty: - =============== Soft-Dirty PTEs =============== diff --git a/Documentation/admin-guide/mm/swap_numa.rst b/Documentation/admin-guide/mm/swap_numa.rst index e0466f2db8fa..2e630627bcee 100644 --- a/Documentation/admin-guide/mm/swap_numa.rst +++ b/Documentation/admin-guide/mm/swap_numa.rst @@ -1,5 +1,3 @@ -.. _swap_numa: - =========================================== Automatically bind swap device to numa node =========================================== diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 8ee78ec232eb..b0cc8243e093 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -1,5 +1,3 @@ -.. _admin_guide_transhuge: - ============================ Transparent Hugepage Support ============================ diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 83f31919ebb3..7dc823b56ca4 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -1,5 +1,3 @@ -.. _userfaultfd: - =========== Userfaultfd =========== diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst index 6dd74a18268b..c5c2c7dbb155 100644 --- a/Documentation/admin-guide/mm/zswap.rst +++ b/Documentation/admin-guide/mm/zswap.rst @@ -1,5 +1,3 @@ -.. _zswap: - ===== zswap ===== diff --git a/Documentation/admin-guide/perf/hns3-pmu.rst b/Documentation/admin-guide/perf/hns3-pmu.rst index 578407e487d6..75a40846d47f 100644 --- a/Documentation/admin-guide/perf/hns3-pmu.rst +++ b/Documentation/admin-guide/perf/hns3-pmu.rst @@ -53,7 +53,7 @@ two events have same value of bits 0~15 of config, that means they are event pair. And the bit 16 of config indicates getting counter 0 or counter 1 of hardware event. -After getting two values of event pair in usersapce, the formula of +After getting two values of event pair in userspace, the formula of computation to calculate real performance data is::: counter 0 / counter 1 diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index d143e72cf93e..6e5298b521b1 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -473,7 +473,7 @@ Unit Tests for amd-pstate * We can introduce more functional or performance tests to align the result together, it will benefit power and performance scale optimization. -1. Test case decriptions +1. Test case descriptions 1). Basic tests diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst index d5043cd8d2f5..bf13ad25a32f 100644 --- a/Documentation/admin-guide/pm/intel_pstate.rst +++ b/Documentation/admin-guide/pm/intel_pstate.rst @@ -712,7 +712,7 @@ it works in the `active mode <Active Mode_>`_. The following sequence of shell commands can be used to enable them and see their output (if the kernel is generally configured to support event tracing):: - # cd /sys/kernel/debug/tracing/ + # cd /sys/kernel/tracing/ # echo 1 > events/power/pstate_sample/enable # echo 1 > events/power/cpu_frequency/enable # cat trace @@ -732,7 +732,7 @@ The ``ftrace`` interface can be used for low-level diagnostics of P-state is called, the ``ftrace`` filter can be set to :c:func:`intel_pstate_set_pstate`:: - # cd /sys/kernel/debug/tracing/ + # cd /sys/kernel/tracing/ # cat available_filter_functions | grep -i pstate intel_pstate_set_pstate intel_pstate_cpu_init diff --git a/Documentation/admin-guide/spkguide.txt b/Documentation/admin-guide/spkguide.txt index 1265c1eab31c..74ea7f391942 100644 --- a/Documentation/admin-guide/spkguide.txt +++ b/Documentation/admin-guide/spkguide.txt @@ -1105,8 +1105,8 @@ speakup load Alternatively, you can add the above line to your file ~/.bashrc or ~/.bash_profile. -If your system administrator ran himself the script, all the users will be able -to change from English to the language choosed by root and do directly +If your system administrator himself ran the script, all the users will be able +to change from English to the language chosen by root and do directly speakupconf load (or add this to the ~/.bashrc or ~/.bash_profile file). If there are several languages to handle, the administrator (or every user) will have to run the first steps until speakupconf diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 6394f5dc2303..466c560b0c30 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -215,6 +215,12 @@ rmem_max The maximum receive socket buffer size in bytes. +rps_default_mask +---------------- + +The default RPS CPU mask used on newly created network devices. An empty +mask means RPS disabled by default. + tstamp_allow_data ----------------- Allow processes to receive tx timestamps looped together with the original diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 988f6a4c8084..45ba1f4dc004 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -356,7 +356,7 @@ The lowmem_reserve_ratio is an array. You can see them by reading this file:: But, these values are not used directly. The kernel calculates # of protection pages for each zones from them. These are shown as array of protection pages -in /proc/zoneinfo like followings. (This is an example of x86-64 box). +in /proc/zoneinfo like the following. (This is an example of x86-64 box). Each zone has an array of protection pages like this:: Node 0, zone DMA @@ -433,7 +433,7 @@ a 2bit error in a memory module) is detected in the background by hardware that cannot be handled by the kernel. In some cases (like the page still having a valid copy on disk) the kernel will handle the failure transparently without affecting any applications. But if there is -no other uptodate copy of the data it will kill to prevent any data +no other up-to-date copy of the data it will kill to prevent any data corruptions from propagating. 1: Kill all processes that have the corrupted and not reloadable page mapped diff --git a/Documentation/admin-guide/sysrq.rst b/Documentation/admin-guide/sysrq.rst index 0a178ef0111d..51906e47327b 100644 --- a/Documentation/admin-guide/sysrq.rst +++ b/Documentation/admin-guide/sysrq.rst @@ -138,7 +138,7 @@ Command Function ``v`` Forcefully restores framebuffer console ``v`` Causes ETM buffer dump [ARM-specific] -``w`` Dumps tasks that are in uninterruptable (blocked) state. +``w`` Dumps tasks that are in uninterruptible (blocked) state. ``x`` Used by xmon interface on ppc/powerpc platforms. Show global PMU Registers on sparc64. diff --git a/Documentation/admin-guide/thermal/intel_powerclamp.rst b/Documentation/admin-guide/thermal/intel_powerclamp.rst index 3ce96043af17..08509b978af4 100644 --- a/Documentation/admin-guide/thermal/intel_powerclamp.rst +++ b/Documentation/admin-guide/thermal/intel_powerclamp.rst @@ -87,7 +87,7 @@ migrated, unless the CPU is taken offline. In this case, threads belong to the offlined CPUs will be terminated immediately. Running as SCHED_FIFO and relatively high priority, also allows such -scheme to work for both preemptable and non-preemptable kernels. +scheme to work for both preemptible and non-preemptible kernels. Alignment of idle time around jiffies ensures scalability for HZ values. This effect can be better visualized using a Perf timechart. The following diagram shows the behavior of kernel thread diff --git a/Documentation/admin-guide/workload-tracing.rst b/Documentation/admin-guide/workload-tracing.rst new file mode 100644 index 000000000000..b2e254ec8ee8 --- /dev/null +++ b/Documentation/admin-guide/workload-tracing.rst @@ -0,0 +1,606 @@ +.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) + +====================================================== +Discovering Linux kernel subsystems used by a workload +====================================================== + +:Authors: - Shuah Khan <skhan@linuxfoundation.org> + - Shefali Sharma <sshefali021@gmail.com> +:maintained-by: Shuah Khan <skhan@linuxfoundation.org> + +Key Points +========== + + * Understanding system resources necessary to build and run a workload + is important. + * Linux tracing and strace can be used to discover the system resources + in use by a workload. The completeness of the system usage information + depends on the completeness of coverage of a workload. + * Performance and security of the operating system can be analyzed with + the help of tools such as: + `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_, + `stress-ng <https://www.mankier.com/1/stress-ng>`_, + `paxtest <https://github.com/opntr/paxtest-freebsd>`_. + * Once we discover and understand the workload needs, we can focus on them + to avoid regressions and use it to evaluate safety considerations. + +Methodology +=========== + +`strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_ is a +diagnostic, instructional, and debugging tool and can be used to discover +the system resources in use by a workload. Once we discover and understand +the workload needs, we can focus on them to avoid regressions and use it +to evaluate safety considerations. We use strace tool to trace workloads. + +This method of tracing using strace tells us the system calls invoked by +the workload and doesn't include all the system calls that can be invoked +by it. In addition, this tracing method tells us just the code paths within +these system calls that are invoked. As an example, if a workload opens a +file and reads from it successfully, then the success path is the one that +is traced. Any error paths in that system call will not be traced. If there +is a workload that provides full coverage of a workload then the method +outlined here will trace and find all possible code paths. The completeness +of the system usage information depends on the completeness of coverage of a +workload. + +The goal is tracing a workload on a system running a default kernel without +requiring custom kernel installs. + +How do we gather fine-grained system information? +================================================= + +strace tool can be used to trace system calls made by a process and signals +it receives. System calls are the fundamental interface between an +application and the operating system kernel. They enable a program to +request services from the kernel. For instance, the open() system call in +Linux is used to provide access to a file in the file system. strace enables +us to track all the system calls made by an application. It lists all the +system calls made by a process and their resulting output. + +You can generate profiling data combining strace and perf record tools to +record the events and information associated with a process. This provides +insight into the process. "perf annotate" tool generates the statistics of +each instruction of the program. This document goes over the details of how +to gather fine-grained information on a workload's usage of system resources. + +We used strace to trace the perf, stress-ng, paxtest workloads to illustrate +our methodology to discover resources used by a workload. This process can +be applied to trace other workloads. + +Getting the system ready for tracing +==================================== + +Before we can get started we will show you how to get your system ready. +We assume that you have a Linux distribution running on a physical system +or a virtual machine. Most distributions will include strace command. Let’s +install other tools that aren’t usually included to build Linux kernel. +Please note that the following works on Debian based distributions. You +might have to find equivalent packages on other Linux distributions. + +Install tools to build Linux kernel and tools in kernel repository. +scripts/ver_linux is a good way to check if your system already has +the necessary tools:: + + sudo apt-get build-essentials flex bison yacc + sudo apt install libelf-dev systemtap-sdt-dev libaudit-dev libslang2-dev libperl-dev libdw-dev + +cscope is a good tool to browse kernel sources. Let's install it now:: + + sudo apt-get install cscope + +Install stress-ng and paxtest:: + + apt-get install stress-ng + apt-get install paxtest + +Workload overview +================= + +As mentioned earlier, we used strace to trace perf bench, stress-ng and +paxtest workloads to show how to analyze a workload and identify Linux +subsystems used by these workloads. Let's start with an overview of these +three workloads to get a better understanding of what they do and how to +use them. + +perf bench (all) workload +------------------------- + +The perf bench command contains multiple multi-threaded microkernel +benchmarks for executing different subsystems in the Linux kernel and +system calls. This allows us to easily measure the impact of changes, +which can help mitigate performance regressions. It also acts as a common +benchmarking framework, enabling developers to easily create test cases, +integrate transparently, and use performance-rich tooling subsystems. + +Stress-ng netdev stressor workload +---------------------------------- + +stress-ng is used for performing stress testing on the kernel. It allows +you to exercise various physical subsystems of the computer, as well as +interfaces of the OS kernel, using "stressor-s". They are available for +CPU, CPU cache, devices, I/O, interrupts, file system, memory, network, +operating system, pipelines, schedulers, and virtual machines. Please refer +to the `stress-ng man-page <https://www.mankier.com/1/stress-ng>`_ to +find the description of all the available stressor-s. The netdev stressor +starts specified number (N) of workers that exercise various netdevice +ioctl commands across all the available network devices. + +paxtest kiddie workload +----------------------- + +paxtest is a program that tests buffer overflows in the kernel. It tests +kernel enforcements over memory usage. Generally, execution in some memory +segments makes buffer overflows possible. It runs a set of programs that +attempt to subvert memory usage. It is used as a regression test suite for +PaX, but might be useful to test other memory protection patches for the +kernel. We used paxtest kiddie mode which looks for simple vulnerabilities. + +What is strace and how do we use it? +==================================== + +As mentioned earlier, strace which is a useful diagnostic, instructional, +and debugging tool and can be used to discover the system resources in use +by a workload. It can be used: + + * To see how a process interacts with the kernel. + * To see why a process is failing or hanging. + * For reverse engineering a process. + * To find the files on which a program depends. + * For analyzing the performance of an application. + * For troubleshooting various problems related to the operating system. + +In addition, strace can generate run-time statistics on times, calls, and +errors for each system call and report a summary when program exits, +suppressing the regular output. This attempts to show system time (CPU time +spent running in the kernel) independent of wall clock time. We plan to use +these features to get information on workload system usage. + +strace command supports basic, verbose, and stats modes. strace command when +run in verbose mode gives more detailed information about the system calls +invoked by a process. + +Running strace -c generates a report of the percentage of time spent in each +system call, the total time in seconds, the microseconds per call, the total +number of calls, the count of each system call that has failed with an error +and the type of system call made. + + * Usage: strace <command we want to trace> + * Verbose mode usage: strace -v <command> + * Gather statistics: strace -c <command> + +We used the “-c” option to gather fine-grained run-time statistics in use +by three workloads we have chose for this analysis. + + * perf + * stress-ng + * paxtest + +What is cscope and how do we use it? +==================================== + +Now let’s look at `cscope <https://cscope.sourceforge.net/>`_, a command +line tool for browsing C, C++ or Java code-bases. We can use it to find +all the references to a symbol, global definitions, functions called by a +function, functions calling a function, text strings, regular expression +patterns, files including a file. + +We can use cscope to find which system call belongs to which subsystem. +This way we can find the kernel subsystems used by a process when it is +executed. + +Let’s checkout the latest Linux repository and build cscope database:: + + git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux + cd linux + cscope -R -p10 # builds cscope.out database before starting browse session + cscope -d -p10 # starts browse session on cscope.out database + +Note: Run "cscope -R -p10" to build the database and c"scope -d -p10" to +enter into the browsing session. cscope by default cscope.out database. +To get out of this mode press ctrl+d. -p option is used to specify the +number of file path components to display. -p10 is optimal for browsing +kernel sources. + +What is perf and how do we use it? +================================== + +Perf is an analysis tool based on Linux 2.6+ systems, which abstracts the +CPU hardware difference in performance measurement in Linux, and provides +a simple command line interface. Perf is based on the perf_events interface +exported by the kernel. It is very useful for profiling the system and +finding performance bottlenecks in an application. + +If you haven't already checked out the Linux mainline repository, you can do +so and then build kernel and perf tool:: + + git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux + cd linux + make -j3 all + cd tools/perf + make + +Note: The perf command can be built without building the kernel in the +repository and can be run on older kernels. However matching the kernel +and perf revisions gives more accurate information on the subsystem usage. + +We used "perf stat" and "perf bench" options. For a detailed information on +the perf tool, run "perf -h". + +perf stat +--------- +The perf stat command generates a report of various hardware and software +events. It does so with the help of hardware counter registers found in +modern CPUs that keep the count of these activities. "perf stat cal" shows +stats for cal command. + +Perf bench +---------- +The perf bench command contains multiple multi-threaded microkernel +benchmarks for executing different subsystems in the Linux kernel and +system calls. This allows us to easily measure the impact of changes, +which can help mitigate performance regressions. It also acts as a common +benchmarking framework, enabling developers to easily create test cases, +integrate transparently, and use performance-rich tooling. + +"perf bench all" command runs the following benchmarks: + + * sched/messaging + * sched/pipe + * syscall/basic + * mem/memcpy + * mem/memset + +What is stress-ng and how do we use it? +======================================= + +As mentioned earlier, stress-ng is used for performing stress testing on +the kernel. It allows you to exercise various physical subsystems of the +computer, as well as interfaces of the OS kernel, using stressor-s. They +are available for CPU, CPU cache, devices, I/O, interrupts, file system, +memory, network, operating system, pipelines, schedulers, and virtual +machines. + +The netdev stressor starts N workers that exercise various netdevice ioctl +commands across all the available network devices. The following ioctls are +exercised: + + * SIOCGIFCONF, SIOCGIFINDEX, SIOCGIFNAME, SIOCGIFFLAGS + * SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFMETRIC, SIOCGIFMTU + * SIOCGIFHWADDR, SIOCGIFMAP, SIOCGIFTXQLEN + +The following command runs the stressor:: + + stress-ng --netdev 1 -t 60 --metrics command. + +We can use the perf record command to record the events and information +associated with a process. This command records the profiling data in the +perf.data file in the same directory. + +Using the following commands you can record the events associated with the +netdev stressor, view the generated report perf.data and annotate the to +view the statistics of each instruction of the program:: + + perf record stress-ng --netdev 1 -t 60 --metrics command. + perf report + perf annotate + +What is paxtest and how do we use it? +===================================== + +paxtest is a program that tests buffer overflows in the kernel. It tests +kernel enforcements over memory usage. Generally, execution in some memory +segments makes buffer overflows possible. It runs a set of programs that +attempt to subvert memory usage. It is used as a regression test suite for +PaX, and will be useful to test other memory protection patches for the +kernel. + +paxtest provides kiddie and blackhat modes. The paxtest kiddie mode runs +in normal mode, whereas the blackhat mode tries to get around the protection +of the kernel testing for vulnerabilities. We focus on the kiddie mode here +and combine "paxtest kiddie" run with "perf record" to collect CPU stack +traces for the paxtest kiddie run to see which function is calling other +functions in the performance profile. Then the "dwarf" (DWARF's Call Frame +Information) mode can be used to unwind the stack. + +The following command can be used to view resulting report in call-graph +format:: + + perf record --call-graph dwarf paxtest kiddie + perf report --stdio + +Tracing workloads +================= + +Now that we understand the workloads, let's start tracing them. + +Tracing perf bench all workload +------------------------------- + +Run the following command to trace perf bench all workload:: + + strace -c perf bench all + +**System Calls made by the workload** + +The below table shows the system calls invoked by the workload, number of +times each system call is invoked, and the corresponding Linux subsystem. + ++-------------------+-----------+-----------------+-------------------------+ +| System Call | # calls | Linux Subsystem | System Call (API) | ++===================+===========+=================+=========================+ +| getppid | 10000001 | Process Mgmt | sys_getpid() | ++-------------------+-----------+-----------------+-------------------------+ +| clone | 1077 | Process Mgmt. | sys_clone() | ++-------------------+-----------+-----------------+-------------------------+ +| prctl | 23 | Process Mgmt. | sys_prctl() | ++-------------------+-----------+-----------------+-------------------------+ +| prlimit64 | 7 | Process Mgmt. | sys_prlimit64() | ++-------------------+-----------+-----------------+-------------------------+ +| getpid | 10 | Process Mgmt. | sys_getpid() | ++-------------------+-----------+-----------------+-------------------------+ +| uname | 3 | Process Mgmt. | sys_uname() | ++-------------------+-----------+-----------------+-------------------------+ +| sysinfo | 1 | Process Mgmt. | sys_sysinfo() | ++-------------------+-----------+-----------------+-------------------------+ +| getuid | 1 | Process Mgmt. | sys_getuid() | ++-------------------+-----------+-----------------+-------------------------+ +| getgid | 1 | Process Mgmt. | sys_getgid() | ++-------------------+-----------+-----------------+-------------------------+ +| geteuid | 1 | Process Mgmt. | sys_geteuid() | ++-------------------+-----------+-----------------+-------------------------+ +| getegid | 1 | Process Mgmt. | sys_getegid | ++-------------------+-----------+-----------------+-------------------------+ +| close | 49951 | Filesystem | sys_close() | ++-------------------+-----------+-----------------+-------------------------+ +| pipe | 604 | Filesystem | sys_pipe() | ++-------------------+-----------+-----------------+-------------------------+ +| openat | 48560 | Filesystem | sys_opennat() | ++-------------------+-----------+-----------------+-------------------------+ +| fstat | 8338 | Filesystem | sys_fstat() | ++-------------------+-----------+-----------------+-------------------------+ +| stat | 1573 | Filesystem | sys_stat() | ++-------------------+-----------+-----------------+-------------------------+ +| pread64 | 9646 | Filesystem | sys_pread64() | ++-------------------+-----------+-----------------+-------------------------+ +| getdents64 | 1873 | Filesystem | sys_getdents64() | ++-------------------+-----------+-----------------+-------------------------+ +| access | 3 | Filesystem | sys_access() | ++-------------------+-----------+-----------------+-------------------------+ +| lstat | 1880 | Filesystem | sys_lstat() | ++-------------------+-----------+-----------------+-------------------------+ +| lseek | 6 | Filesystem | sys_lseek() | ++-------------------+-----------+-----------------+-------------------------+ +| ioctl | 3 | Filesystem | sys_ioctl() | ++-------------------+-----------+-----------------+-------------------------+ +| dup2 | 1 | Filesystem | sys_dup2() | ++-------------------+-----------+-----------------+-------------------------+ +| execve | 2 | Filesystem | sys_execve() | ++-------------------+-----------+-----------------+-------------------------+ +| fcntl | 8779 | Filesystem | sys_fcntl() | ++-------------------+-----------+-----------------+-------------------------+ +| statfs | 1 | Filesystem | sys_statfs() | ++-------------------+-----------+-----------------+-------------------------+ +| epoll_create | 2 | Filesystem | sys_epoll_create() | ++-------------------+-----------+-----------------+-------------------------+ +| epoll_ctl | 64 | Filesystem | sys_epoll_ctl() | ++-------------------+-----------+-----------------+-------------------------+ +| newfstatat | 8318 | Filesystem | sys_newfstatat() | ++-------------------+-----------+-----------------+-------------------------+ +| eventfd2 | 192 | Filesystem | sys_eventfd2() | ++-------------------+-----------+-----------------+-------------------------+ +| mmap | 243 | Memory Mgmt. | sys_mmap() | ++-------------------+-----------+-----------------+-------------------------+ +| mprotect | 32 | Memory Mgmt. | sys_mprotect() | ++-------------------+-----------+-----------------+-------------------------+ +| brk | 21 | Memory Mgmt. | sys_brk() | ++-------------------+-----------+-----------------+-------------------------+ +| munmap | 128 | Memory Mgmt. | sys_munmap() | ++-------------------+-----------+-----------------+-------------------------+ +| set_mempolicy | 156 | Memory Mgmt. | sys_set_mempolicy() | ++-------------------+-----------+-----------------+-------------------------+ +| set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() | ++-------------------+-----------+-----------------+-------------------------+ +| set_robust_list | 1 | Futex | sys_set_robust_list() | ++-------------------+-----------+-----------------+-------------------------+ +| futex | 341 | Futex | sys_futex() | ++-------------------+-----------+-----------------+-------------------------+ +| sched_getaffinity | 79 | Scheduler | sys_sched_getaffinity() | ++-------------------+-----------+-----------------+-------------------------+ +| sched_setaffinity | 223 | Scheduler | sys_sched_setaffinity() | ++-------------------+-----------+-----------------+-------------------------+ +| socketpair | 202 | Network | sys_socketpair() | ++-------------------+-----------+-----------------+-------------------------+ +| rt_sigprocmask | 21 | Signal | sys_rt_sigprocmask() | ++-------------------+-----------+-----------------+-------------------------+ +| rt_sigaction | 36 | Signal | sys_rt_sigaction() | ++-------------------+-----------+-----------------+-------------------------+ +| rt_sigreturn | 2 | Signal | sys_rt_sigreturn() | ++-------------------+-----------+-----------------+-------------------------+ +| wait4 | 889 | Time | sys_wait4() | ++-------------------+-----------+-----------------+-------------------------+ +| clock_nanosleep | 37 | Time | sys_clock_nanosleep() | ++-------------------+-----------+-----------------+-------------------------+ +| capget | 4 | Capability | sys_capget() | ++-------------------+-----------+-----------------+-------------------------+ + +Tracing stress-ng netdev stressor workload +------------------------------------------ + +Run the following command to trace stress-ng netdev stressor workload:: + + strace -c stress-ng --netdev 1 -t 60 --metrics + +**System Calls made by the workload** + +The below table shows the system calls invoked by the workload, number of +times each system call is invoked, and the corresponding Linux subsystem. + ++-------------------+-----------+-----------------+-------------------------+ +| System Call | # calls | Linux Subsystem | System Call (API) | ++===================+===========+=================+=========================+ +| openat | 74 | Filesystem | sys_openat() | ++-------------------+-----------+-----------------+-------------------------+ +| close | 75 | Filesystem | sys_close() | ++-------------------+-----------+-----------------+-------------------------+ +| read | 58 | Filesystem | sys_read() | ++-------------------+-----------+-----------------+-------------------------+ +| fstat | 20 | Filesystem | sys_fstat() | ++-------------------+-----------+-----------------+-------------------------+ +| flock | 10 | Filesystem | sys_flock() | ++-------------------+-----------+-----------------+-------------------------+ +| write | 7 | Filesystem | sys_write() | ++-------------------+-----------+-----------------+-------------------------+ +| getdents64 | 8 | Filesystem | sys_getdents64() | ++-------------------+-----------+-----------------+-------------------------+ +| pread64 | 8 | Filesystem | sys_pread64() | ++-------------------+-----------+-----------------+-------------------------+ +| lseek | 1 | Filesystem | sys_lseek() | ++-------------------+-----------+-----------------+-------------------------+ +| access | 2 | Filesystem | sys_access() | ++-------------------+-----------+-----------------+-------------------------+ +| getcwd | 1 | Filesystem | sys_getcwd() | ++-------------------+-----------+-----------------+-------------------------+ +| execve | 1 | Filesystem | sys_execve() | ++-------------------+-----------+-----------------+-------------------------+ +| mmap | 61 | Memory Mgmt. | sys_mmap() | ++-------------------+-----------+-----------------+-------------------------+ +| munmap | 3 | Memory Mgmt. | sys_munmap() | ++-------------------+-----------+-----------------+-------------------------+ +| mprotect | 20 | Memory Mgmt. | sys_mprotect() | ++-------------------+-----------+-----------------+-------------------------+ +| mlock | 2 | Memory Mgmt. | sys_mlock() | ++-------------------+-----------+-----------------+-------------------------+ +| brk | 3 | Memory Mgmt. | sys_brk() | ++-------------------+-----------+-----------------+-------------------------+ +| rt_sigaction | 21 | Signal | sys_rt_sigaction() | ++-------------------+-----------+-----------------+-------------------------+ +| rt_sigprocmask | 1 | Signal | sys_rt_sigprocmask() | ++-------------------+-----------+-----------------+-------------------------+ +| sigaltstack | 1 | Signal | sys_sigaltstack() | ++-------------------+-----------+-----------------+-------------------------+ +| rt_sigreturn | 1 | Signal | sys_rt_sigreturn() | ++-------------------+-----------+-----------------+-------------------------+ +| getpid | 8 | Process Mgmt. | sys_getpid() | ++-------------------+-----------+-----------------+-------------------------+ +| prlimit64 | 5 | Process Mgmt. | sys_prlimit64() | ++-------------------+-----------+-----------------+-------------------------+ +| arch_prctl | 2 | Process Mgmt. | sys_arch_prctl() | ++-------------------+-----------+-----------------+-------------------------+ +| sysinfo | 2 | Process Mgmt. | sys_sysinfo() | ++-------------------+-----------+-----------------+-------------------------+ +| getuid | 2 | Process Mgmt. | sys_getuid() | ++-------------------+-----------+-----------------+-------------------------+ +| uname | 1 | Process Mgmt. | sys_uname() | ++-------------------+-----------+-----------------+-------------------------+ +| setpgid | 1 | Process Mgmt. | sys_setpgid() | ++-------------------+-----------+-----------------+-------------------------+ +| getrusage | 1 | Process Mgmt. | sys_getrusage() | ++-------------------+-----------+-----------------+-------------------------+ +| geteuid | 1 | Process Mgmt. | sys_geteuid() | ++-------------------+-----------+-----------------+-------------------------+ +| getppid | 1 | Process Mgmt. | sys_getppid() | ++-------------------+-----------+-----------------+-------------------------+ +| sendto | 3 | Network | sys_sendto() | ++-------------------+-----------+-----------------+-------------------------+ +| connect | 1 | Network | sys_connect() | ++-------------------+-----------+-----------------+-------------------------+ +| socket | 1 | Network | sys_socket() | ++-------------------+-----------+-----------------+-------------------------+ +| clone | 1 | Process Mgmt. | sys_clone() | ++-------------------+-----------+-----------------+-------------------------+ +| set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() | ++-------------------+-----------+-----------------+-------------------------+ +| wait4 | 2 | Time | sys_wait4() | ++-------------------+-----------+-----------------+-------------------------+ +| alarm | 1 | Time | sys_alarm() | ++-------------------+-----------+-----------------+-------------------------+ +| set_robust_list | 1 | Futex | sys_set_robust_list() | ++-------------------+-----------+-----------------+-------------------------+ + +Tracing paxtest kiddie workload +------------------------------- + +Run the following command to trace paxtest kiddie workload:: + + strace -c paxtest kiddie + +**System Calls made by the workload** + +The below table shows the system calls invoked by the workload, number of +times each system call is invoked, and the corresponding Linux subsystem. + ++-------------------+-----------+-----------------+----------------------+ +| System Call | # calls | Linux Subsystem | System Call (API) | ++===================+===========+=================+======================+ +| read | 3 | Filesystem | sys_read() | ++-------------------+-----------+-----------------+----------------------+ +| write | 11 | Filesystem | sys_write() | ++-------------------+-----------+-----------------+----------------------+ +| close | 41 | Filesystem | sys_close() | ++-------------------+-----------+-----------------+----------------------+ +| stat | 24 | Filesystem | sys_stat() | ++-------------------+-----------+-----------------+----------------------+ +| fstat | 2 | Filesystem | sys_fstat() | ++-------------------+-----------+-----------------+----------------------+ +| pread64 | 6 | Filesystem | sys_pread64() | ++-------------------+-----------+-----------------+----------------------+ +| access | 1 | Filesystem | sys_access() | ++-------------------+-----------+-----------------+----------------------+ +| pipe | 1 | Filesystem | sys_pipe() | ++-------------------+-----------+-----------------+----------------------+ +| dup2 | 24 | Filesystem | sys_dup2() | ++-------------------+-----------+-----------------+----------------------+ +| execve | 1 | Filesystem | sys_execve() | ++-------------------+-----------+-----------------+----------------------+ +| fcntl | 26 | Filesystem | sys_fcntl() | ++-------------------+-----------+-----------------+----------------------+ +| openat | 14 | Filesystem | sys_openat() | ++-------------------+-----------+-----------------+----------------------+ +| rt_sigaction | 7 | Signal | sys_rt_sigaction() | ++-------------------+-----------+-----------------+----------------------+ +| rt_sigreturn | 38 | Signal | sys_rt_sigreturn() | ++-------------------+-----------+-----------------+----------------------+ +| clone | 38 | Process Mgmt. | sys_clone() | ++-------------------+-----------+-----------------+----------------------+ +| wait4 | 44 | Time | sys_wait4() | ++-------------------+-----------+-----------------+----------------------+ +| mmap | 7 | Memory Mgmt. | sys_mmap() | ++-------------------+-----------+-----------------+----------------------+ +| mprotect | 3 | Memory Mgmt. | sys_mprotect() | ++-------------------+-----------+-----------------+----------------------+ +| munmap | 1 | Memory Mgmt. | sys_munmap() | ++-------------------+-----------+-----------------+----------------------+ +| brk | 3 | Memory Mgmt. | sys_brk() | ++-------------------+-----------+-----------------+----------------------+ +| getpid | 1 | Process Mgmt. | sys_getpid() | ++-------------------+-----------+-----------------+----------------------+ +| getuid | 1 | Process Mgmt. | sys_getuid() | ++-------------------+-----------+-----------------+----------------------+ +| getgid | 1 | Process Mgmt. | sys_getgid() | ++-------------------+-----------+-----------------+----------------------+ +| geteuid | 2 | Process Mgmt. | sys_geteuid() | ++-------------------+-----------+-----------------+----------------------+ +| getegid | 1 | Process Mgmt. | sys_getegid() | ++-------------------+-----------+-----------------+----------------------+ +| getppid | 1 | Process Mgmt. | sys_getppid() | ++-------------------+-----------+-----------------+----------------------+ +| arch_prctl | 2 | Process Mgmt. | sys_arch_prctl() | ++-------------------+-----------+-----------------+----------------------+ + +Conclusion +========== + +This document is intended to be used as a guide on how to gather fine-grained +information on the resources in use by workloads using strace. + +References +========== + + * `Discovery Linux Kernel Subsystems used by OpenAPS <https://elisa.tech/blog/2022/02/02/discovery-linux-kernel-subsystems-used-by-openaps>`_ + * `ELISA-White-Papers-Discovering Linux kernel subsystems used by a workload <https://github.com/elisa-tech/ELISA-White-Papers/blob/master/Processes/Discovering_Linux_kernel_subsystems_used_by_a_workload.md>`_ + * `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_ + * `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_ + * `paxtest README <https://github.com/opntr/paxtest-freebsd/blob/hardenedbsd/0.9.14-hbsd/README>`_ + * `stress-ng <https://www.mankier.com/1/stress-ng>`_ + * `Monitoring and managing system status and performance <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/index>`_ diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst index 8de008c0c5ad..e2561416391c 100644 --- a/Documentation/admin-guide/xfs.rst +++ b/Documentation/admin-guide/xfs.rst @@ -296,7 +296,7 @@ The following sysctls are available for the XFS filesystem: XFS_ERRLEVEL_LOW: 1 XFS_ERRLEVEL_HIGH: 5 - fs.xfs.panic_mask (Min: 0 Default: 0 Max: 256) + fs.xfs.panic_mask (Min: 0 Default: 0 Max: 511) Causes certain error conditions to call BUG(). Value is a bitmask; OR together the tags which represent errors which should cause panics: |