HP Tru64 UNIX and TruCluster Server Version 5.1B-5: Patch Summary and Release Notes > Chapter 4 TruCluster Server Patches

Summary of TruCluster Server Software Patches

 


The following sections provide brief descriptions of the changes delivered in this patch kit and in previous Version 5.1B patch kits for the TruCluster Server software products.

Each patch provides fixes to subsets of the operating system. Subset names (listed in italic font in the following list) consist of three parts; for example, for subset TCRBASE540, the TCR indicates that the subset is part of the TruCluster Server product, the BASE indicates a category, and the 540 indicates that the subset belongs to the Version 5.1B operating system.
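The three-part naming convention described above can be illustrated with a small sketch. It assumes, based only on the subset names that appear on this page (TCRBASE540 and TCRMAN540), that the product code is three uppercase letters, the category is one or more uppercase letters, and the version code is three digits. The function and type names are hypothetical and are not part of any Tru64 UNIX tool.

```python
import re
from typing import NamedTuple


class SubsetName(NamedTuple):
    """Parts of a subset name (illustrative type, not a Tru64 UNIX API)."""
    product: str   # for example "TCR" (TruCluster Server)
    category: str  # for example "BASE" or "MAN"
    version: str   # for example "540" (Version 5.1B)


# Assumed layout: 3-letter product code, uppercase category, 3-digit version code.
_SUBSET_RE = re.compile(r"^([A-Z]{3})([A-Z]+)(\d{3})$")


def parse_subset_name(name: str) -> SubsetName:
    """Split a subset name such as TCRBASE540 into its three parts."""
    match = _SUBSET_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized subset name: {name!r}")
    return SubsetName(*match.groups())


if __name__ == "__main__":
    for subset in ("TCRBASE540", "TCRMAN540"):
        print(subset, "->", parse_subset_name(subset))
```

Subset names that follow a different layout would need a different pattern; the sketch rejects them with a ValueError rather than guessing.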

New Patches

The patch summaries in this section describe changes to the TruCluster Server software products that are new in this release.

Patch 28001.00

TCRBASE540

  • Fixes a problem in which a CFS client read operation returns the wrong data due to stale metadata associated with the file frag.

  • Adds a check to prevent the caller from binding to a cluster alias address that the node has not joined.

  • Fixes an infinite loop under certain circumstances in cms_do_mount_rpc().

  • Adds an option to unset all flags for a service in /etc/clua_services.

  • Corrects a reference count issue in the KGS subsystem.

  • Fixes a node panic with the message "ics_unable_to_make_progress: netisrs stalled" that occurs even though the netisr thread is not actually stalled.

  • Provides a fix for a domain panic caused by hung I/Os on a busy or faulty disk drive. The panic can happen after all but one path to the disk drive are disabled and then re-enabled.

  • Corrects a problem where a 'local open' on a previously opened tape drive results in an erroneous "no such device" message.

  • Provides a fix for a cluster boot-time hang caused by a faulty quorum disk.

  • Fixes multiple issues with the RDG (Reliable DataGram) component in a LAN cluster.

  • Fixes an issue with the CFS failover subsystem in which, under certain domain configurations, the failover process may hang.

  • Fixes a problem with fuser(8) where usage of the -a option leaves the filesystem incapable of unmounting even if no files or directories on the filesystem are in use.

  • Fixes a problem where, under certain circumstances, a close on a socket of type AF_UNIX may result in a system panic.

  • Provides enhancements to the DRD trace framework.

  • Optimizes the performance of the ics0 interface in a LAN cluster.

  • Fixes an issue with aliasd routing in a cluster.

  • Avoids a panic due to a bad quorum disk during the boot process.

  • Fixes an issue wherein the Internode Communication Subsystem panics when it receives messages for an unknown service.

  • Updates the volstat utility and kernel to report cluster-wide LSM statistics.

  • Adds support in cluster alias to handle socket unlisten.

Patch 28002.00

TCRMAN540

  • Provides the latest reference pages for sys_attrs_cfs(5), sys_attrs_clubase(5), and sys_attrs_rdg(5).

  • Updates the clu_alias.config(4) and exports.aliases(4) reference pages.

  • Updates the sys_attrs_icsnet(5) reference page to reflect the icsnet_mtu attribute.

  • Updates the following reference pages: clua_services(4), cfsd.conf(4), sys_attrs_ics_ll_tcp(5).

  • Updates the following reference pages: imcs(1), dlm_rd_collect(3), dlm_rd_validate(3), imc_rderrcnt(3), sys_attrs_cms(5), sys_attrs_drd(5), sys_attrs_icsnet(5).

Patches Delivered in Previous Kits

The following TruCluster Server patches were delivered in previous Version 5.1B patch kits. These patches will be installed on your system if you did not install the previous kit.

Patch 27001.00

TCRBASE540

  • Eliminates numerous panics and hung devices by fixing drd so it no longer accesses a device that has a deletion pending or in progress.

  • Fixes an RM simple lock timeout issue that may occur in noisy Memory Channel rails.

  • Enhances the error message generated when the clu_bdmgr command cannot access a member boot disk.

  • Fixes a configuration issue found in non-CAM devices and CD-ROM devices.

  • Fixes the cause of potential cluster hangs during some Memory Channel hardware failures that result in an MC rail failover.

  • Fixes the CFS AIO write error path so the I/O completion steps are not repeated.

  • Fixes a flaw in CFS file locking code that causes a "vrele: bad ref count" panic.

  • Fixes the cause of an assertion failure in cfs_vnops.c.

  • Corrects a problem in which the simultaneous booting of multiple nodes results in a panic due to an unknown node in a remote member node list.

  • Corrects a problem in a Memory Channel cluster in which a panic occurs in a booted member when a booting member goes down because of panic/halt/shutdown.

  • Fixes a problem in which a thread enters dio code while an extent map is being refreshed.

  • Fixes a problem in which v_numoutput is not decremented in AIO direct I/O error paths.

  • Removes the cause of a panic that may occur in CFS at boot time if a remote node goes down.

  • Corrects several ICS signal-forwarding issues.

  • Fixes a race between the close system call for a block device file and the recovery process for the file system.

  • Clarifies a usage message seen with the cfsstat command.

  • Corrects a problem in the clu_mibs daemon that can cause various eSNMP sub agents, such as pmgrd and os_mibs, to terminate.

  • Fixes a problem to prevent the relocation of a UFS read/write file system to the original node.

  • Provides a new option to the mountd daemon to specify a port number for mountd to bind to.

  • Corrects a problem in which a DRD event thread may run infinitely while responding to a bid server transaction.

  • Fixes an AdvFS domain panic caused by cfsd.

  • Corrects a problem in CAA in which a resource does not fail over when two resources have the same values for the FAILOVER_DELAY and REQUIRED_RESOURCES attributes.

  • Fixes a hang during cluster bootup caused by early reservation conflicts.

  • Provides enhancements to the caa_relocate command.

  • Provides a new command, clu_ping, to determine the status of the interconnects in a stretched cluster environment.

  • Improves CFS client writing to do the following:

    • reduce the logging of ERROR 69 for user disk space quota exhausted.

    • support partial write success.

    • increase the interconnect transfer size for multi-page synchronous writes.

    • prevent read ahead past the end of a file.

  • Helps ensure more accurate block reservation accounting in CFS.

  • Addresses an issue seen on Tru64 UNIX LAN clusters, whereby a booting node may panic with "lock_wait" while spawning threads for cluster interconnect channels.

  • Provides a solution to display a warning message if deleting a particular cluster member would cease NTP services for the rest of the cluster.

  • Improves the routing fail-over mechanism when one or more network interfaces on any cluster member fails.

  • Fixes a "kernel memory fault" panic in cfs_fo_failover_done().

  • Fixes a problem wherein the DRD subsystem may cause a system panic when strategy routines are called from a light weight context (LWC).

  • Fixes display errors in the cfsstat command when using the icschanbps option.

  • Fixes a deadlock issue between cluster nodes because of cfs_async_io_thread running on them.

  • Corrects an erroneous error message displayed by drdmgr.

  • Fixes a cnx_qdisk_thread hang problem.

  • Fixes a memory leak in CFS.

  • Fixes disk I/O hang in DRD.

  • Fixes a hang with disklabel that occurs if a local open fails for the same disk simultaneously.

  • Fixes incorrect CFS token structure warnings.

  • Prevents file inconsistency due to a race between lookup and remove.

  • Provides a new cluster-specific link aggregation distribution algorithm when using LAG in a LAN cluster.

  • Fixes a simple lock timeout panic issue in kch and a possible hang at boot time.

  • Prevents AIO direct I/O from returning invalid data while reading a fragged file.

  • Fixes a cluster hang issue during cluster boot-up, when local disk open operations fail while disklabel is in progress.

  • Fixes an error in the DRD subsystem wherein un-initialized disk attributes can cause a system panic.

  • Fixes a kernel memory fault (KMF) in the rdg_get_completion() routine.

  • Fixes a problem in which the cluster alias subsystem tries to free an mbuf that has already been freed by the ICS subsystem.

  • Corrects reference counting issues within the DRD subsystem that can prevent the deletion of hwids.

  • Adds a new option, custom_gated, to cluamgr and aliasd.

  • Fixes a deadlock that can happen during failover of global root and var file systems when vfast is enabled on them.

  • Fixes resource leaks seen after a locked device file is revoked.

  • Fixes system panics seen on relocating file systems with locked revoked devices.

  • Fixes a problem with CAA placement policy when host names in "HOSTING MEMBERS" are in uppercase letters.

  • Corrects a problem in which CAA is incorrectly showing the status of network resources on a halted member.

  • Fixes a problem in the CFS block reservation code in which CFS attempts to release a lock more than once.

  • Introduces a code tracing capability of the aliasd and aliasd_niff daemons to improve troubleshooting.

  • Prevents a race that can occur during the planned relocation of a file system.

  • Improves the reliability of the DRD subsystem when faced with tape devices and tape device failures.

  • Introduces a mechanism to improve reliability for synchronizing cluster alias ID sets among cluster members.

  • Fixes the cause of the following CNX panic in cluster reconfiguration:

    cnx_change_cluster_tx_state: illegal transaction state

  • Fixes an ICS panic issue that occurs early in the boot process.

  • Fixes a problem that causes the cluster alias manager SUITlet to falsely interpret any cluster alias with virtual={t|f} configured as a virtual alias regardless of its actual setting.

  • Corrects problems in which SysMan drdmgr dumps tcl stack when a user tries to manage devices or file systems of a cluster node that is down.

  • Corrects an issue to allow the Device Request Dispatcher, DRD, to retry to get disk attributes when EINPROGRESS is returned from the disk driver.

  • Addresses issues with “address already in use” messages from klogin and kshell.

  • Corrects a potential security vulnerability in CAA.

  • Fixes a kernel memory fault.

  • Corrects a problem in which the MC-API call imc_ckerrcnt_mr() incorrectly returns an error status, although the function's error count parameter is not increasing.

  • Preserves the error code from an asynchronous write error on a CFS client and returns the error from the close() system call.

  • Fixes a Distributed Lock Manager panic when calling the dlm_get_lkinfo() routine passing an lkid of a lock block that has already been declared dead by the deadlock detection thread.

  • Corrects a problem to allow the use of 255 in the LAN Interconnect IP address.

  • Fixes a CFS client panic during a file system read operation in which the server goes down and the client itself becomes the server and attempts to release a direct I/O token that has already been released.

  • Fixes a panic during a forced unmount of a nonfailoverable file system (that is, NFS and AutoFS) when the initiator is down.

  • Enables a cluster to boot even if the cluster root domain devices are private to different cluster members. Although this is not a recommended configuration, it should not result in an unbootable cluster. Currently, this is with respect to cluster root domains not under LSM control.

  • Corrects a potential data inconsistency caused by a problem in the CFS block reservation code, which calculates incorrectly the amount of space requested and used by direct I/O writes.

  • Resolves a kernel memory fault in m_copym.

  • Fixes a problem with the -b option of caa_report.

  • Fixes a problem with caa_stop -f by allowing the administrator to reset a resource state from UNKNOWN to OFFLINE even if the hosting member is down.

  • Corrects a potential data inconsistency that may occur when a domain is nearly full. Client write requests shipped synchronously to the server will no longer have subsets of pages written asynchronously due to a race with virtual memory.

  • Improves the scaling of IP reassembly code on large SMP machines. NFS servers are especially susceptible when a large number of clients attempt to write at the same time.

  • Helps to close a race where synchronous writes may obtain disk allocations that were promised to cached client writes.

  • Fixes a problem in which CAA might prevent alias-based services from functioning properly by binding to one of the cluster alias reserved ports.

  • Corrects a problem in a Memory Channel cluster where rebooting a node without performing a hardware reset can crash other members with a RM_AUDIT_ACK_BLOCK panic.

  • Fixes a problem in the Memory Channel driver.

  • Improves the responsiveness of EINPROGRESS handling during the issuing of I/O barriers by removing a possible infinite loop scenario that could occur due to the deletion of a storage device.

  • Fixes a problem that causes a panic with the message "CNX MGR: Invalid configuration for cluster seq disk" during simultaneous booting of cluster nodes.

  • Fixes a possible race condition between a SCSI reservation conflict and an I/O drain that can result in a hang.

  • Alleviates a condition in which a cluster member takes an extremely long time to boot when using LSM.

  • Fixes a problem in which relocating AutoFS with caa_relocate does not kill the autofsd daemon.

  • Allows rewrites when the domain is close to out of space.

  • Ensures correct processing in the close() system call.

  • Provides a CAA action script that can be used by a running NIS slave to help assign a crontab entry to update NIS maps.

  • Fixes a problem in which a cluster member leaves the cluster alias yet continues to respond to it.

  • Corrects a problem that causes applications (including cluamgr) to get a dummy cluster alias reported from the cluaioc_get_nextalias() call. The IP address for this alias is 0.0.0.0.

  • Fixes a problem in which aliasd creates multiple similar virtual subnet static routes in the gated.conf.memberX, thereby causing gated to fail to load.

  • Fixes issues associated with the initialization of the Memory Channel driver.

  • Provides a function to query the status of aliasd.

  • Fixes an IPv6 bind problem in a cluster environment.

  • Fixes multiple disable or enable problems with cluamgr.

  • Fixes a tok_wait hang problem on Sierra Clusters.

  • Adds the ability to change the default interconnect interface name.

  • Corrects several problems in the cluster install and upgrade utilities.

  • Fixes a problem in which an RDG (Reliable DataGram) kernel thread can starve other timeshare threads on a uniprocessor cluster member. In particular, system services such as networking threads can be affected.

  • Fixes minor issues with cfsstat command-line options and return values.

  • Prevents panics seen with cluster server-only (for example, MFS) mounts.

  • Fixes a condition that causes the panic pg_nwriters going negative when ubc_page_release() is called from cfs_getpage().

  • Corrects a problem in the RDG component in which multiple Oracle instances are unable to be properly configured when using RDG over a LAN rather than Memory Channel.

  • Provides a sticky connection feature for a cluster alias.

  • Updates sysconfig to use the cluster interconnect, allowing for a greater SSI collaboration. This will help with changing variables on hung systems, single user systems, and normal running systems.

  • Improves device error processing in drd.

  • Corrects a boot hang problem seen on large-scale Sierra Cluster configurations caused by a missed wake up in the kernel group services code.

  • Alters the behavior of the cluster NFS client with TCP mounts so that when a remote server is down, the cluster NFS client will use nonreserved ports to see if the remote server is up.

  • Introduces a new CFS tunable attribute that may benefit the performance of client reads of clone files under certain circumstances.

  • Addresses an assertion caused by a bad user pointer passed to the kernel via sys_call.

  • Corrects a condition that results in excessive context switching and CPU load due to heavy use of the cluster alias on large SMP and NUMA machines.

  • Enhances /sbin/advfs/tag2name to print out the name of the associated directory, given the tag of an index file.

  • Increases performance scalability and extends the reliability of the Internode Communications Subsystem in a cluster configured with Memory Channel as the cluster interconnect.

  • Improves detection of possible race conditions during CFS recovery.

  • Adds a cluster panic facility to the kernel.

  • Addresses the following:

    • An issue in which new ICS server daemons and handles are created one at a time each time the low water mark for each is reached, thereby causing a nanny daemon to be called more frequently than it needs to.

    • An issue in which no mechanism exists for the user to adjust the high and low water marks for ICS free handles, which can result in poor performance during rapidly increasing loads.

  • Fixes a problem in which cluster alias connections are not distributed among cluster members according to the defined selection weight.

  • Fixes a memory leak in the cluster alias subsystem.

  • Fixes an issue with ICS (Internode Communication Services) on a NUMA-based system in a cluster.

  • Fixes a problem in the cluster kernel in which a cluster member panics while doing remote I/O over the interconnect.

  • Fixes a hang that occurs when multiple nodes are shutting down simultaneously; fixes a Cluster File System panic that occurs when using raw Asynchronous I/O; and provides additional code to assist in problem diagnosis.

  • Corrects a problem in which a panic displaying the message “error CNX MGR: cnx_comm_error: invalid node state” occurs on a LAN cluster running under load when other members are rebooting.

  • Addresses an error in which caa_register -u produces with no balance data.

  • Addresses a resource inaccessibility issue that can occur if the hosting member crashes during a remote caa_stop operation.

  • Updates the attributes on a directory when files are removed by a cluster node that is not the file system server.

  • Fixes a problem associated with non-SCSI storage.

  • Corrects a potential security vulnerability in the cluster interconnect security configuration that may result in a denial of service (DoS) on systems running TruCluster Server software.

  • Causes UDP datagrams that do not come from the correct port to be discarded.

  • Addresses a node hang that occurs during the testing of Memory Channel cable pulls. A cluster member may hang when a Memory Channel cable is pulled, the node is taken down, the cable is plugged back in, and the node is rebooted.

  • Fixes a cluster deadlock that may occur during a failover and recovery when direct I/O is in use.

  • Fixes a race condition in the Device Request Dispatcher.

  • Corrects a condition that can cause excessive FIDS_LOCK contention when a large number of files are using system-based file locking.

  • Fixes a problem with cfsd core dumping shortly after startup if it is enabled or shortly after enabling it. The problem fixed by this patch is only seen after applying a recent dsfmgr patch.

  • Corrects diagnostic code that could cause a panic during a kernel boot.

  • Eliminates a performance problem when a node acting as CFS server of an NFS client file system is write-appending to an external NFS server.

  • Prevents a panic when an AutoFS file system is auto-unmounted.

  • Corrects the cause of a cluster member panic with a kernel memory fault when running nmap or nessus targeting the cluster alias.

  • Resolves a problem in which the caa_register command allows a CAA resource to be registered even when its profile contains an unknown attribute. This fix prevents the caa_register command from registering a resource with an unknown attribute and will cause it to return an error message that includes the unknown attribute information.

  • Fixes a condition in which uptimes greater than 100 percent are reported for resources by caa_report.

  • Fixes a problem in which resources that never started have an ending timestamp.

  • Fixes a problem in which CAA dumps core when trying to deal with cluster member ID 63.

  • Fixes a problem where access to the quorum disk can be lost if the quorum disk is on a parallel SCSI bus and multiple bus resets are encountered.

  • Relieves pressure on the CMS global DLM lock by allowing AutoFS auto unmounts to back off.

  • Fixes cfsmgr to properly return a failure status when a relocation request has failed.

  • Fixes a race condition where stale name cache entries allow file access after file unlink.

  • Corrects a problem in which cfsd will terminate prematurely and core dump when a node leaves the cluster very shortly after joining the cluster.

  • Fixes a timing window during asynchronous reads on a CFS client.

  • Fixes a panic that may occur during an unmount.

  • Corrects several problems with various installation commands and utilities.

  • Fixes a memory leak in the clu_get_info interface.

  • Enhances cluster file system performance when using file locks to coordinate file access.

  • Causes the correct error message for freezefs -q to be displayed on a non-AdvFS file system.

  • Fixes a problem in one of the shipped rc scripts whereby Oracle fails during startup on a clustered system.

  • Addresses a panic that occurs on a booting node.

  • Fixes a coding error, a memory leak, and a deinitialization problem in the cluster interconnect networking layer.

  • Fixes a problem in the Device Request Dispatcher.

  • Provides clu_upgrade enhancements.

  • Increases performance by reducing the lock miss rate in the ics_mct_llnode_info_lock.

  • Addresses the panic “Assert Failed: (cp->c_flags & CDIRECTIO) = 0” in the cluster file system.

  • Corrects a problem where a CFS lookup for a mount could leave stale state behind that could adversely affect subsequent NFS operations.

  • Fixes an internal problem in the kernel's AdvFS, UFS, and NFS file systems where extended attributes with extremely long names, greater than 247 characters, could not be set on files. The new limit is 254 + a null string terminator.

  • Corrects problems with LSM disks and the cluster quorum tools. When a member having LSM disks local to it is down, the quorum tools fail to update quorum. This causes other cluster commands to fail.

  • Corrects a problem in which mounting on a directory in a clone fileset fails with the message "Device Busy."

  • Prevents a Kernel Memory Fault Panic in some cases where AdvFS administration commands are performed on a mounted fileset of an inaccessible AdvFS domain.

  • Fixes a problem in which CAAD might dump core due to a race condition when multiple events to which it subscribes arrive simultaneously.

  • Improves the fragment gathering mechanism to boost performance.

  • Fixes a panic that occurs when attempting to unload clua.mod.

  • Fixes a condition that causes a boot-time panic when ipport_userreserved is 1000 or less.

  • Fixes a cfsmgr core dump when passing the incorrect number of arguments upon force unmounting a served file system.

  • Fixes a problem in which a CFS client for a file with a hole preceding a frag might drop the frag.

  • Optimizes cluster file system lock recovery, potentially speeding up the time required to failover a file system to a new server.

  • Corrects a condition in which superfluous "rm_event, index too big" messages may appear on system consoles.

  • Addresses a panic that may occur when a node is joining the cluster. A node recognizing the joining node panics while it is trying to establish a preboot channel connection with the peer node, causing the following message to be displayed on the console or in /var/adm/messages:

    panic (cpu x): ics_mct: rx conn 3

  • Corrects the LSM partition types in the CNX partition of boot disk for the clu_partmgr utility.

  • Modifies the aliasd daemon to include interface aliases when determining whether or not an interface is appropriate for use as the ARP address for a cluster alias when selecting the proxy ARP master.

  • Fixes the potential of multiple assert_wait and timeout panics due to kernel EVM threads not properly preempting.

  • Fixes a problem in the Memory Channel driver.

  • Corrects a condition that occurs during a rolling upgrade in which the clu_ifaccess script removes the tag file for /etc/ifaccess and sends out a warning message.

  • Forces a reboot to resolve communications problems in a two node cluster rather than hang.

  • Corrects lock acquires after mpsleep.

  • Causes a rebuild delay remainder to be minimally second.

  • Allows the cluster to provide new functions to the dupatch command before a member is rolled, and also provides a mechanism for backing out the added functions.

  • Addresses a memory leak in the Memory Channel transport layer.

  • Fixes a problem in which a system may panic with a kernel memory fault when a device that is being opened by one program is being deleted with the hwmgr utility.

  • Fixes a condition that causes a panic when a valid NFS packet with corrupted embedded length field is received.

  • Fixes a condition that causes an unnecessary panic due to request connection deregistration with an invalid IP address.

  • Provides performance improvement for CFS filesets mounted with the server_only option. A log sync for create transactions is not needed for such filesets.

  • Fixes a problem with single physical rail Memory Channel configurations and cleans up stale data left on an off-line physical rail by the Memory Channel driver.

  • Fixes a rare cluster hang caused by deadlocks that occur between the CFS client and server during multiple write operations.

  • Fixes multiple problems seen with the TruCluster RDG component, including panics of the following types "rdg: unwiring", "vl_unwire: page is not wired", and "KMF: from _otsmove."

  • Allows users to add new members and create a cluster with different netmasks.

  • Removes member0-specific installation files on an undo install, which could prevent the reinstallation of the patch.

  • Allows users to continue forward when they add a member to a one-node cluster during a rolling upgrade or rolling patch.

  • Enables CAA to start up and fail over system services before any of the user services.

  • Fixes an unaligned kernel access in the cluster I/O stack.

  • Addresses a potential hang in the NFS server that occurs when file systems are being relocated in a cluster.

  • Provides the ability to lower the cluster_rebuild_delay.

  • Fixes the long delay during an NFS connection failover when the servicing cluster member dies.

  • Fixes a panic in clua.mod that is caused by receiving a delete-cnx-request from a member when that cnx is in the UNREGISTER state.

  • Fixes a reconnection problem when an interface comes down and then goes up.

  • Fixes a panic problem in clua.mod that occurs when max_aliasid is increased and aliases are added.

  • Fixes a situation that causes a core dump in aliasd when all interfaces are removed on a cluster member that is set up with at least one cluster alias that was added with virtual=t and without a subnet.

  • Fixes a problem when disabling and re-enabling cluster alias source route on a given interface.

  • Fixes a problem where clua.mod does not handle TCP RST messages appropriately.

  • Fixes a problem of restoring static routes when an interface revives.

  • Corrects a problem in which a rolling upgrade stops advancing when adding a cluster member to a one-node cluster.

  • Fixes an initialization issue with the internode communications subsystem.

  • Corrects a problem in which a domain panic on the cluster_root does not result as it should in a regular panic for the cluster node on which the domain panic occurs.

  • Fixes several small issues with clu_upgrade:

    • A "process not found" message displayed when finishing the setup stage of clu_upgrade has been removed.

    • The ability to roll on a one-node cluster is maintained.

  • Addresses a problem on LAN clusters related to improper keep-alive timeouts that can be identified when the following console message is displayed during normal operations (that is, no known failures and no nodes are rebooting):

    • WARNING: ics_socket_event: error 60 on channel 0, assume node # is down

  • Fixes a problem that occurs when the interconnect is configured using NetRAIN, cluster_rebuild_delay is set significantly below the default value, and members are rebooting or failures are occurring on the active links. The console message seen when this occurs is “CNX QDISK: Yielding to foreign owner with provisional quorum.”

  • Fixes a problem in which I/O barriers may be stalled when a drive becomes hung.

  • Prevents write failures from a cluster NFS client that may occur when a second user without write access is concurrently reading the file.

  • Fixes a problem that occurs during reboots on a heavily loaded cluster using the LAN interconnect and generates the following messages:

    • WARNING: ics_socket_event: error 54 on channel 0

    • WARNING: ics_socket_event: error 60 on channel 0

  • Fixes a kernel memory fault (KMF) in drd_kgs_bid_stop_server_io_drained when a node leaves during a drd kgs transaction.

  • Corrects a problem in which drd continually tries to perform a munsa unreject on the drive when a device is deleted while it is in the munsa reject state.

  • Corrects a problem in which multiple path failures cause drd to return ENODEV even when a server is available in the cluster.

  • Fixes several error-handling problems in drd for device error conditions.

  • Fixes a problem in which a device cannot be opened due to heavy load on the device.

  • Fixes a problem in which a CD-ROM is not mountable in a cluster.

  • Fixes loss of quorum disk.

  • Makes quorum disk parameters configurable.

  • Eliminates a window for kernel memory fault panics on AdvFS system calls that are performed via function shipping using the clu_msfs_syscall_fship routine.

  • Improves drd tracing.

  • Fixes a Sierra Cluster KCH set free race condition.

  • Fixes two errors in clu_upgrade that prevent completing the setup stage.

  • Prevents a get_cs_toks() KMF/assert crash.

  • Fixes a rm_audit_sync_block panic that occurs when using a long fiber as the Memory Channel interconnect.

  • Fixes a timing window in the Internode Communications Subsystem ddr device error handling.

  • Fixes the rm_audit_sync_block panic when using a long fiber with VHUB as the Memory Channel interconnect.

  • Fixes clu_bdmgr to facilitate CLSM sliced disks for cluster_root domain.

  • Modifies the manner of checking for user file limits for CFS remote DIO writes.

  • Ensures that signals for EFBIG writes are properly generated on a client.

  • Ensures the correct processing of CFS in future releases.

  • Fixes a multiple free problem of 32-byte memory bucket caused by multiple callbacks from KCH to CLUA.

  • Fixes an incorrect if statement, which, although a low-risk problem, could block access to a disk device.

  • Corrects a confusing error message.

  • Fixes a problem seen in a LAN cluster when the CPUs on a member system are not installed contiguously in the lower order slots.

  • Allows the quorum disk to be used in spite of transient errors with the quorum disk hardware.

  • Corrects an internal logic error that causes the performance of file deletion to be suboptimal.

  • Fixes a deadlock that occurs when no members have valid paths to a device and all the nodes in the cluster are attempting failover at the same time.

  • Fixes problems seen in the TruCluster RDG component.

  • Fixes a race condition in a routine that allocates memory for Memory Channel logical rail and physical rail use. It prevents a KMF during boot, occasionally seen on some AlphaServer GS1280 systems.

  • Fixes a race condition which leads to a panic that occurs when a device is deleted on a busy system.

  • Adds the ability to log enabled DRD events to a circular memory buffer.

  • Corrects an Invalid Current Server panic.

  • Increases tolerance for intermittent boot disk errors early in the boot process.

  • Corrects a problem in which I/O operations hang when I/O barriers fail due to the loss of access to drives.

  • Fixes a TruCluster NFS server failure that occurs when clients access file systems forcibly removed with the cfsmgr -u command.

  • Fixes an incorrect return status for asynchronous direct I/O reads in a cluster if the read request goes beyond the end of the file (EOF).

  • Fixes the problem of unintentional loading of gated when nogated is specified with other requested cluamgr operations.

  • Fixes a problem in which backplane RAID devices can become inaccessible.

  • Provides the following tape-related fixes:

    • Corrects a problem in which hwmgr redirect commands fail on tape devices.

    • Prevents the reuse of a dsk number upon deleting and adding a new tape.

    • Corrects a problem in which drdmgr commands can hang on tapes.

    • Updates the code base to make failbacks more proactive.

  • Improves defenses against user error during the roll stage of rolling upgrade.

  • Fixes a TruCluster Distributed Lock Manager (DLM) system panic caused by lock transaction IDs being out of sync after a rebuild.

  • Corrects a problem in which the TruCluster component DRD (Device Request Dispatcher) does not always return standard error codes.

  • Prevents a kernel memory fault panic when drd_open is called on a device with a valid local path that has no local devt passed in, and this member has the lowest cluster ID of any member in the cluster.

  • Prevents CFS token sequence number reuse errors on fast systems.

  • Prevents a domain panic on a file system that is local to a failed cluster member.

  • Prevents CFS write() from updating file access time or panicking on a directory.

  • Modifies the way the clu_upgrade command behaves regarding the availability of backup space in the setup and preinstall stages and adds an appropriate error message.

  • Corrects a problem within the TruCluster Kernel Group Services (kgs/kch) subsystem in which the simultaneous booting of multiple nodes may result in a panic due to an unknown node in a remote member node list.

  • Removes a delay in the TruCluster component DRD (Device Request Dispatcher) event threads during system booting.

  • Corrects a kernel memory fault in drd_local_device_close.

  • Fixes a kernel memory fault issue on LAN-based clusters that do not have a Memory Channel adapter installed on the systems.

  • Fixes a problem in which non-root users cannot execute the caa_stat command.

  • Provides enhancements to CAA commands and the caad daemon.

  • Resolves a resource exhaustion problem in the TruCluster kgs/kch subsystem on high-end clusters, typically with large storage configurations.

  • Fixes an assert failure in the CFS server.

  • Resolves a problem that occurs when adjusting sysconfig clua attributes sticky_entry_timeout and sticky_db_cleanup_interval.

  • Ensures that if only a portion of an AIO/DIO write completes, the correct number of bytes written will be returned.

  • Allows CFS to correctly handle a token race condition without creating a panic.

  • Prevents a node in a cluster from hanging at boot time.

  • Corrects misspellings of "file system" in the cfsmgr utility.

  • Implements the fast fail policy within DRD.

  • Corrects a problem in which backplane RAID devices can become inaccessible when installed on systems running Version 5.1B-2 (Patch Kit 4).

  • Enhances the fuser command to provide cluster-wide query capability.

  • Ensures that the number of icsmct receive threads does not exceed the number of CPUs.

  • Corrects a condition in which drd_get_disk_attributes hangs if too many errors are encountered, causing new devices to be inaccessible in a cluster from some cluster members.

  • Corrects a problem in which a cluster CFS client would panic in cfscall_writepages, reporting ASSERT (error != EDQUOT). This correction eliminates that failure and allows proper writing up to the fileset quota and to the end of space for a domain.

  • Fixes a rare, three-way deadlock condition when Internode Communication Services (ICS) traffic is in a throttled state and a cluster member that is participating in the throttled traffic is halted.

  • Fixes a kernel memory fault in strlen on a cluster member during a mount of an AdvFS file system with an improperly specified file system.

  • Allows the ulimit -f command to function correctly in a cluster.

  • Prevents a kernel memory fault panic that may occur with client writes on nearly full domains.

  • Prevents a panic on a device close when device connectivity is lost.

  • Fixes a kernel memory fault (KMF) that occurs when mounting a partitioning file system in a cluster.

  • Fixes a problem in which a CAA resource and its dependents become inaccessible when the resource fails to start on the node to which it is failed over and there are no more nodes to consider for failover.

  • Fixes an Oracle socket connection problem.

  • Fixes incorrect error handling that could result in a memory leak.

  • Provides event definitions for traps in cluster MIB files to support OpenView NMS.

  • Modifies ics_tcp to check the response buffer for NULL before freeing it.

  • Fixes a problem in which booting times in excess of 2 hours occur in a two-node LAN cluster using an ee (DE6xx) adapter as the cluster interconnect and connected directly by a crossover cable.

  • Corrects a scenario during a cluster member boot in which a booting member may cause booted members to panic on a kernel memory fault shortly after the messages "Registering CMS Services" and "rm slave" are printed to the booting console for each MC card.

  • Fixes a problem that could cause the system panic "clua_realloc_port: corrupt list pointers panic".

  • Corrects the trapOID for traps generated from the clu_mibs subagent and provides event definitions for traps in MIB files to support OpenView.

  • Fixes an inappropriate message that is displayed during CAA resource relocation when invoked from SysMan.

  • Fixes a 64-byte memory leak in the drd/kgs interface.

  • Modifies CNX to check for communication errors while a node joins the cluster.

  • Fixes a synchronization issue with a cluster alias ID set among cluster members.

  • Prevents a panic from occurring during a failover mount if the AdvFS on-disk file system ID (fsid) does not match the current cluster-wide fsid for the file system.

  • Fixes an intermittent core issue in the aliasd daemon caused by improper handling of the interface list.

  • Fixes an assertion panic "set-num_rmt_mbr_nodes = 0".

  • Prevents a single-node panic in a cluster that can occur under the following conditions:

    • A memory file system of size 4GB or greater is created with the default 512-byte sector size.

    • A memory file system of size 2GB or greater is created with a 1024-byte sector size and other sector sizes.

  • Prevents a kernel memory fault panic that may be seen under certain error conditions with MFS file systems.

  • Corrects a problem with kch memory usage.

Patch 27002.00

TCRMAN540

  • Provides a new command, clu_ping, to determine the status of the interconnects in a stretched cluster environment.

  • Updates the caa_relocate(8) and cluamgr(8) reference pages.

  • Revises the clua_services(4) and sys_attrs_clua(5) TruCluster reference pages.

2009 Hewlett-Packard Development Company, L.P.