android_kernel_xiaomi_sm7250

Author	SHA1	Message	Date
Christoph Lameter	42a9fdbb12	SLUB: Optimize cacheline use for zeroing We touch a cacheline in the kmem_cache structure for zeroing to get the size. However, the hot paths in slab_alloc and slab_free do not reference any other fields in kmem_cache, so we may have to just bring in the cacheline for this one access. Add a new field to kmem_cache_cpu that contains the object size. That cacheline must already be used in the hotpaths. So we save one cacheline on every slab_alloc if we zero. We need to update the kmem_cache_cpu object size if an aliasing operation changes the objsize of a non-debug slab. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:01 -07:00
Christoph Lameter	4c93c355d5	SLUB: Place kmem_cache_cpu structures in a NUMA aware way The kmem_cache_cpu structures introduced are currently an array placed in the kmem_cache struct, meaning that the kmem_cache_cpu structures are overwhelmingly on the wrong node for systems with a higher number of nodes. These are performance critical structures since the per node information has to be touched for every alloc and free in a slab. In order to place the kmem_cache_cpu structure optimally we put an array of pointers to kmem_cache_cpu structs in kmem_cache (similar to SLAB). However, the kmem_cache_cpu structures can now be allocated in a more intelligent way. We would like to put per cpu structures for the same cpu but different slab caches in cachelines together to save space and decrease the cache footprint. However, the slab allocators themselves control only allocations per node. We set up a simple per cpu array for every processor with 100 per cpu structures which is usually enough to get them all set up right. If we run out then we fall back to kmalloc_node. This also solves the bootstrap problem since we do not have to use slab allocator functions early in boot to get memory for the small per cpu structures. Pro: - NUMA aware placement improves memory performance - All global structures in struct kmem_cache become readonly - Dense packing of per cpu structures reduces cacheline footprint in SMP and NUMA. - Potential avoidance of exclusive cacheline fetches on the free and alloc hotpath since multiple kmem_cache_cpu structures are in one cacheline. This is particularly important for the kmalloc array. Cons: - Additional reference to one read only cacheline (per cpu array of pointers to kmem_cache_cpu) in both slab_alloc() and slab_free(). 
[akinobu.mita@gmail.com: fix cpu hotplug offline/online path] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "Pekka Enberg" <penberg@cs.helsinki.fi> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:01 -07:00
Christoph Lameter	b3fba8da65	SLUB: Move page->offset to kmem_cache_cpu->offset We need the offset from the page struct during slab_alloc and slab_free. In both cases we also reference the cacheline of the kmem_cache_cpu structure. We can therefore move the offset field into the kmem_cache_cpu structure freeing up 16 bits in the page struct. Moving the offset allows an allocation from slab_alloc() without touching the page struct in the hot path. The only thing left in slab_free() that touches the page struct cacheline for per cpu freeing is the checking of SlabDebug(page). The next patch deals with that. Use the available 16 bits to broaden page->inuse. More than 64k objects per slab become possible and we can get rid of the checks for that limitation. No need anymore to shrink the order of slabs if we boot with 2M sized slabs (slub_min_order=9). No need anymore to switch off the offset calculation for very large slabs since the field in the kmem_cache_cpu structure is 32 bits and so the offset field can now handle slab sizes of up to 8GB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:01 -07:00
Christoph Lameter	8e65d24c7c	SLUB: Do not use page->mapping After moving the lockless_freelist to kmem_cache_cpu we no longer need page->lockless_freelist. Restructure the use of the struct page fields in such a way that we never touch the mapping field. This in turn allows us to remove the special casing of SLUB when determining the mapping of a page (needed for corner cases of machines with virtual caches that need to flush caches of processors mapping a page). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:01 -07:00
Christoph Lameter	dfb4f09609	SLUB: Avoid page struct cacheline bouncing due to remote frees to cpu slab A remote free may access the same page struct that also contains the lockless freelist for the cpu slab. If objects have a short lifetime and are freed by a different processor then remote frees back to the slab from which we are currently allocating are frequent. The cacheline with the page struct needs to be repeatedly acquired in exclusive mode by both the allocating thread and the freeing thread. If this is frequent enough then performance will suffer because of cacheline bouncing. This patchset puts the lockless_freelist pointer in its own cacheline. In order to make that happen we introduce a per cpu structure called kmem_cache_cpu. Instead of keeping an array of pointers to page structs we now keep an array of per cpu structures that--among other things--contain the pointer to the lockless freelist. The freeing thread can then keep possession of exclusive access to the page struct cacheline while the allocating thread keeps its exclusive access to the cacheline containing the per cpu structure. This works as long as the allocating cpu is able to service its request from the lockless freelist. If the lockless freelist runs empty then the allocating thread needs to acquire exclusive access to the cacheline with the page struct to lock the slab. The allocating thread will then check if new objects were freed to the per cpu slab. If so it will keep the slab as the cpu slab and continue with the recently remote freed objects. So the allocating thread can take a series of just freed remote pages and dish them out again. Ideally allocations could be just recycling objects in the same slab this way which will lead to an ideal allocation / remote free pattern. The number of objects that can be handled in this way is limited by the capacity of one slab. 
Increasing slab size via slub_min_objects/ slub_max_order may increase the number of objects and therefore performance. If the allocating thread runs out of objects and finds that no objects were put back by the remote processor then it will retrieve a new slab (from the partial lists or from the page allocator) and start with a whole new set of objects while the remote thread may still be freeing objects to the old cpu slab. This may then repeat until the new slab is also exhausted. If remote freeing has freed objects in the earlier slab then that earlier slab will now be on the partial freelist and the allocating thread will pick that slab next for allocation. So the loop is extended. However, both threads need to take the list_lock to make the swizzling via the partial list happen. It is likely that this kind of scheme will keep the objects being passed around to a small set that can be kept in the cpu caches leading to increased performance. More code cleanups become possible: - Instead of passing a cpu we can now pass a kmem_cache_cpu structure around. Allows reducing the number of parameters to various functions. - Can define a new node_match() function for NUMA to encapsulate locality checks. Effect on allocations: Cachelines touched before this patch: Write: page cache struct and first cacheline of object Cachelines touched after this patch: Write: kmem_cache_cpu cacheline and first cacheline of object Read: page cache struct (but see later patch that avoids touching that cacheline) The handling when the lockless alloc list runs empty gets to be a bit more complicated since another cacheline has now to be written to. But that is halfway out of the hot path. 
Effect on freeing: Cachelines touched before this patch: Write: page_struct and first cacheline of object Cachelines touched after this patch depending on how we free: Write(to cpu_slab): kmem_cache_cpu struct and first cacheline of object Write(to other): page struct and first cacheline of object Read(to cpu_slab): page struct to id slab etc. (but see later patch that avoids touching the page struct on free) Read(to other): cpu local kmem_cache_cpu struct to verify it's not the cpu slab. Summary: Pro: - Distinct cachelines so that concurrent remote frees and local allocs on a cpuslab can occur without cacheline bouncing. - Avoids potential bouncing cachelines because of neighboring per cpu pointer updates in kmem_cache's cpu_slab structure since it now grows to a cacheline (Therefore remove the comment that talks about that concern). Cons: - Freeing objects now requires the reading of one additional cacheline. That can be mitigated for some cases by the following patches but it's not possible to completely eliminate these references. - Memory usage grows slightly. The size of each per cpu object is blown up from one word (pointing to the page_struct) to one cacheline with various data. So this is NR_CPUS*NR_SLABS*L1_BYTES more memory use. Let's say NR_SLABS is 100 and a cache line size of 128 then we have just increased SLAB metadata requirements by 12.8k per cpu. (Another later patch reduces these requirements) Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:01 -07:00
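The memory estimate in the message above is a product of three quantities. A minimal sketch of that arithmetic, assuming the formula is the number of CPUs times the number of slab caches times the cacheline size (the function and parameter names are illustrative, not kernel identifiers):

```python
def kmem_cache_cpu_overhead(nr_cpus: int, nr_slabs: int, l1_bytes: int) -> int:
    """Bytes used if every per-cpu structure occupies one full cacheline."""
    return nr_cpus * nr_slabs * l1_bytes


# Figures quoted in the commit message: 100 slab caches, 128-byte cachelines.
per_cpu = kmem_cache_cpu_overhead(nr_cpus=1, nr_slabs=100, l1_bytes=128)
print(per_cpu)  # 12800 bytes, the "12.8k per cpu" quoted in the message
```

Multiplying by the CPU count gives the system-wide figure; for example, 4 CPUs would come to 51200 bytes under the same assumptions.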
Mel Gorman	467c996c1e	Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo This patch provides fragmentation avoidance statistics via /proc/pagetypeinfo. The information is collected only on request so there is no runtime overhead. The statistics are in three parts: The first part prints information on the size of blocks that pages are being grouped on and looks like
Page block order: 10
Pages per block: 1024
The second part is a more detailed version of /proc/buddyinfo and looks like
Free pages count per migrate type at order   0   1   2   3   4   5   6   7   8   9  10
Node 0, zone DMA, type Unmovable             0   0   0   0   0   0   0   0   0   0   0
Node 0, zone DMA, type Reclaimable           1   0   0   0   0   0   0   0   0   0   0
Node 0, zone DMA, type Movable               0   0   0   0   0   0   0   0   0   0   0
Node 0, zone DMA, type Reserve               0   4   4   0   0   0   0   1   0   1   0
Node 0, zone Normal, type Unmovable        111   8   4   4   2   3   1   0   0   0   0
Node 0, zone Normal, type Reclaimable      293  89   8   0   0   0   0   0   0   0   0
Node 0, zone Normal, type Movable            1   6  13   9   7   6   3   0   0   0   0
Node 0, zone Normal, type Reserve            0   0   0   0   0   0   0   0   0   0   4
The third part looks like
Number of blocks type   Unmovable Reclaimable     Movable     Reserve
Node 0, zone DMA                0           1           2           1
Node 0, zone Normal             3          17          94           4
To walk the zones within a node with interrupts disabled, walk_zones_in_node() is introduced and shared between /proc/buddyinfo, /proc/zoneinfo and /proc/pagetypeinfo to reduce code duplication. It seems specific to what vmstat.c requires but could be broken out as a general utility function in mmzone.c if there were other potential users. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
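The per-order counts in the second part of that output can be turned into a total number of free base pages, since a free block at order n covers 2**n pages. The following is a rough sketch of such a reader, assuming rows keep the "Node N, zone Z, type T counts..." layout shown in the message; the helper names are made up for illustration:

```python
def parse_free_counts(line: str):
    """Split one 'Free pages count' row into (node, zone, type, counts)."""
    node_part, zone_part, rest = [p.strip() for p in line.split(",", 2)]
    fields = rest.split()  # e.g. ["type", "Unmovable", "111", "8", ...]
    node = int(node_part.split()[1])
    zone = zone_part.split()[1]
    migrate_type = fields[1]
    counts = [int(x) for x in fields[2:]]
    return node, zone, migrate_type, counts


def free_pages(counts):
    """Total base pages: a free block at order n holds 2**n pages."""
    return sum(count << order for order, count in enumerate(counts))


row = "Node 0, zone Normal, type Unmovable 111 8 4 4 2 3 1 0 0 0 0"
node, zone, mtype, counts = parse_free_counts(row)
print(node, zone, mtype, free_pages(counts))
```

Splitting on whitespace rather than fixed columns keeps the sketch tolerant of the padding used in the real file.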
Mel Gorman	d9c2340052	Do not depend on MAX_ORDER when grouping pages by mobility Currently mobility grouping works at the MAX_ORDER_NR_PAGES level. This makes sense for the majority of users where this is also the huge page size. However, on platforms like ia64 where the huge page size is runtime configurable it is desirable to group at a lower order. On x86_64 and occasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES. This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It uses a compile-time constant if possible and a variable where the huge page size is runtime configurable. It is assumed that grouping should be done at the lowest sensible order and that the user would not want to override this. If this is not true, page_block order could be forced to a variable initialised via a boot-time kernel parameter. One potential issue with this patch is that IA64 now parses hugepagesz with early_param() instead of __setup(). __setup() is called after the memory allocator has been initialised and the pageblock bitmaps already setup. In tests on one IA64 there did not seem to be any problem with using early_param() and in fact may be more correct as it guarantees the parameter is handled before the parsing of hugepages=. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Mel Gorman	64c5e135bf	don't group high order atomic allocations Grouping high-order atomic allocations together was intended to allow bursty users of atomic allocations to work such as e1000 in situations where their preallocated buffers were depleted. This did not work in at least one case with a wireless network adapter needing order-1 allocations frequently. To resolve that, the free pages used for min_free_kbytes were moved to separate contiguous blocks with the patch bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks. It is felt that keeping the free pages in the same contiguous blocks should be sufficient for bursty short-lived high-order atomic allocations to succeed, maybe even with the e1000. Even if there is a failure, increasing the value of min_free_kbytes will free pages as contiguous blocks in contrast to the standard buddy allocator which makes no attempt to keep the minimum number of free pages contiguous. This patch backs out grouping high order atomic allocations together to determine if it is really needed or not. If a new report comes in about high-order atomic allocations failing, the feature can be reintroduced to determine if it fixes the problem or not. As a side-effect, this patch reduces by 1 the number of bits required to track the mobility type of pages within a MAX_ORDER_NR_PAGES block. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Mel Gorman	ac0e5b7a6b	remove PAGE_GROUP_BY_MOBILITY Grouping pages by mobility can be disabled at compile-time. This was considered undesirable by a number of people. However, in the current stack of patches, it is not a simple case of just dropping the configurable patch as it would cause merge conflicts. This patch backs out the configuration option. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Mel Gorman	56fd56b868	Bias the location of pages freed for min_free_kbytes in the same MAX_ORDER_NR_PAGES blocks The standard buddy allocator always favours the smallest block of pages. The effect of this is that the pages free to satisfy min_free_kbytes tend to be preserved since boot time at the same location of memory for a very long time and as a contiguous block. When an administrator sets the reserve at 16384 at boot time, it tends to be the same MAX_ORDER blocks that remain free. This allows the occasional high atomic allocation to succeed up until the point the blocks are split. In practice, it is difficult to split these blocks but when they do split, the benefit of having min_free_kbytes for contiguous blocks disappears. Additionally, increasing min_free_kbytes once the system has been running for some time has no guarantee of creating contiguous blocks. On the other hand, CONFIG_PAGE_GROUP_BY_MOBILITY favours splitting large blocks when there are no free pages of the appropriate type available. A side-effect of this is that all blocks in memory tend to be used up and the contiguous free blocks from boot time are not preserved like in the vanilla allocator. This can cause a problem if a new caller is unwilling to reclaim or does not reclaim for long enough. A failure scenario was found for a wireless network device allocating order-1 atomic allocations but the allocations were not intense or frequent enough for a whole block of pages to be preserved for MIGRATE_HIGHALLOC. This was reproduced on a desktop by booting with mem=256mb, forcing the driver to allocate at order-1, running a bittorrent client (downloading a debian ISO) and building a kernel with -j2. This patch addresses the problem on the desktop machine booted with mem=256mb. It works by setting aside a reserve of MAX_ORDER_NR_PAGES blocks, the number of which depends on the value of min_free_kbytes. These blocks are only fallen back to when there are no other free pages. 
Then the smallest possible page is used just like the normal buddy allocator instead of the largest possible page to preserve contiguous pages. The pages in free lists in the reserve blocks are never taken for another migrate type. The result is that even if min_free_kbytes is set to a low value, contiguous blocks will be preserved in the MIGRATE_RESERVE blocks. This works better than the vanilla allocator because if min_free_kbytes is increased, a new reserve block will be chosen based on the location of reclaimable pages and the block will free up as contiguous pages. In the vanilla allocator, no effort is made to target a block of pages to free as contiguous pages and min_free_kbytes pages are scattered randomly. This effect has been observed on the test machine. min_free_kbytes was set initially low but it was kept as a contiguous free block within MIGRATE_RESERVE. min_free_kbytes was then set to a higher value and over a period of time, the free blocks were within the reserve and coalescing. How long it takes to free up depends on how quickly LRU is rotating. Amusingly, this means that more activity will free the blocks faster. This mechanism potentially replaces MIGRATE_HIGHALLOC as it may be more effective than grouping contiguous free pages together. It all depends on whether the number of active atomic high allocations exceeds min_free_kbytes or not. If the number of active allocations exceeds min_free_kbytes, it's worth it but maybe in that situation, min_free_kbytes should be set higher. Once there are no more reports of allocation failures, a patch will be submitted that backs out MIGRATE_HIGHALLOC and see if the reports stay missing. Credit to Mariusz Kozlowski for discovering the problem, describing the failure scenario and testing patches and scenarios. 
[akpm@linux-foundation.org: cleanups] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Mel Gorman	5c0e306647	Fix corruption of memmap on IA64 SPARSEMEM when mem_section is not a power of 2 There are problems in the use of SPARSEMEM and pageblock flags that cause problems on ia64. The first part of the problem is that units are incorrect in SECTION_BLOCKFLAGS_BITS computation. This results in a mem_section's section_mem_map being treated as part of a bitmap which isn't good. This was evident with an invalid virtual address when mem_init attempted to free bootmem pages while relinquishing control from the bootmem allocator. The second part of the problem occurs because the pageblock flags bitmap is located with the mem_section. The SECTIONS_PER_ROOT computation using sizeof (mem_section) may not be a power of 2 depending on the size of the bitmap. This renders masks and other such things not power of 2 base. This issue was seen with SPARSEMEM_EXTREME on ia64. This patch moves the bitmap outside of mem_section and uses a pointer instead in the mem_section. The bitmaps are allocated when the section is being initialised. Note that sparse_early_usemap_alloc() does not use alloc_remap() like sparse_early_mem_map_alloc(). The allocation required for the bitmap on x86, the only architecture that uses alloc_remap, is typically smaller than a cache line. alloc_remap() pads out allocations to the cache size which would be a needless waste. Credit to Bob Picco for identifying the original problem and effecting a fix for the SECTION_BLOCKFLAGS_BITS calculation. Credit to Andy Whitcroft for devising the best way of allocating the bitmaps only when required for the section. 
[wli@holomorphy.com: warning fix] Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Mel Gorman	e010487dbe	Group high-order atomic allocations In rare cases, the kernel needs to allocate a high-order block of pages without sleeping. For example, this is the case with e1000 cards configured to use jumbo frames. Migrating or reclaiming pages in this situation is not an option. This patch groups these allocations together as much as possible by adding a new MIGRATE_TYPE. Allocations of the MIGRATE_HIGHATOMIC type are exactly what they sound like. Care is taken that pages of other migrate types do not use the same blocks as high-order atomic allocations. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Mel Gorman	e12ba74d8f	Group short-lived and reclaimable kernel allocations This patch marks a number of allocations that are either short-lived such as network buffers or are reclaimable such as inode allocations. When something like updatedb is called, long-lived and unmovable kernel allocations tend to be spread throughout the address space which increases fragmentation. This patch groups these allocations together as much as possible by adding a new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be reclaimed on demand, but not moved. i.e. they can be migrated by deleting them and re-reading the information from elsewhere. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Mel Gorman	b92a6edd4b	Add a configure option to group pages by mobility The grouping mechanism has some memory overhead and a more complex allocation path. This patch allows the strategy to be disabled for small memory systems or if it is known the workload is suffering because of the strategy. It also acts to show where the page groupings strategy interacts with the standard buddy allocator. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:59 -07:00
Mel Gorman	b2a0ac8875	Split the free lists for movable and unmovable allocations This patch adds the core of the fragmentation reduction strategy. It works by grouping pages together based on their ability to migrate or be reclaimed. Basically, it works by breaking the list in zone->free_area list into MIGRATE_TYPES number of lists. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:59 -07:00
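The split described above can be pictured as one free list per order and per migrate type instead of one list per order. A simplified model, assuming orders 0..10 and only two types; the class and method names are illustrative, and buddy splitting and fallback between types are deliberately left out:

```python
from collections import deque

MAX_ORDER = 11  # assumed: orders 0..10
MIGRATE_TYPES = ("unmovable", "movable")  # illustrative subset


class FreeArea:
    def __init__(self):
        # One list per migrate type where the unsplit design had a single list.
        self.free_list = {t: deque() for t in MIGRATE_TYPES}


class Zone:
    def __init__(self):
        self.free_area = [FreeArea() for _ in range(MAX_ORDER)]

    def free_block(self, pfn, order, migratetype):
        """Return a block to the list matching its order and type."""
        self.free_area[order].free_list[migratetype].append(pfn)

    def alloc_block(self, order, migratetype):
        """Take the smallest suitable block from the requested type only."""
        for current in range(order, MAX_ORDER):
            lst = self.free_area[current].free_list[migratetype]
            if lst:
                return lst.popleft(), current
        return None


zone = Zone()
zone.free_block(pfn=0, order=3, migratetype="movable")
print(zone.alloc_block(0, "unmovable"))  # None: nothing free of that type
print(zone.alloc_block(0, "movable"))    # (0, 3)
```

Because a request only searches its own type's lists, blocks freed as movable stay grouped with other movable blocks, which is the effect the patch is after.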
Mel Gorman	835c134ec4	Add a bitmap that is used to track flags affecting a block of pages Here is the latest revision of the anti-fragmentation patches. Of particular note in this version is special treatment of high-order atomic allocations. Care is taken to group them together and avoid grouping pages of other types near them. Artificial tests imply that it works. I'm trying to get the hardware together that would allow setting up of a "real" test. If anyone already has a setup and test that can trigger the atomic-allocation problem, I'd appreciate a test of these patches and a report. The second major change is that these patches will apply cleanly with patches that implement anti-fragmentation through zones. kernbench shows effectively no performance difference varying between -0.2% and +2% on a variety of test machines. Success rates for huge page allocation are dramatically increased. For example, on a ppc64 machine, the vanilla kernel was only able to allocate 1% of memory as a hugepage and this was due to a single hugepage reserved as min_free_kbytes. With these patches applied, 17% was allocatable as superpages. With reclaim-related fixes from Andy Whitcroft, it was 40% and further reclaim-related improvements should increase this further. Changelog Since V28 o Group high-order atomic allocations together o It is no longer required to set min_free_kbytes to 10% of memory. A value of 16384 in most cases will be sufficient o Now applied with zone-based anti-fragmentation o Fix incorrect VM_BUG_ON within buffered_rmqueue() o Reorder the stack so later patches do not back out work from earlier patches o Fix bug where journal pages were being treated as movable o Bias placement of non-movable pages to lower PFNs o More aggressive clustering of reclaimable pages in reactions to workloads like updatedb that flood the size of inode caches Changelog Since V27 o Renamed anti-fragmentation to Page Clustering. 
Anti-fragmentation was giving the mistaken impression that it was the 100% solution for high order allocations. Instead, it greatly increases the chances high-order allocations will succeed and lays the foundation for defragmentation and memory hot-remove to work properly o Redefine page groupings based on ability to migrate or reclaim instead of basing on reclaimability alone o Get rid of spurious inits o Per-cpu lists are no longer split up per-type. Instead the per-cpu list is searched for a page of the appropriate type o Added more explanation commentary o Fix up bug in pageblock code where bitmap was used before being initialised Changelog Since V26 o Fix double init of lists in setup_pageset Changelog Since V25 o Fix loop order of for_each_rclmtype_order so that order of loop matches args o gfpflags_to_rclmtype uses gfp_t instead of unsigned long o Rename get_pageblock_type() to get_page_rclmtype() o Fix alignment problem in move_freepages() o Add mechanism for assigning flags to blocks of pages instead of page->flags o On fallback, do not examine the preferred list of free pages a second time The purpose of these patches is to reduce external fragmentation by grouping pages of related types together. When pages are migrated (or reclaimed under memory pressure), large contiguous pages will be freed. This patch works by categorising allocations by their ability to migrate; Movable - The pages may be moved with the page migration mechanism. These are generally userspace pages. Reclaimable - These are allocations for some kernel caches that are reclaimable or allocations that are known to be very short-lived. Unmovable - These are pages that are allocated by the kernel that are not trivially reclaimed. For example, the memory allocated for a loaded module would be in this category. By default, allocations are considered to be of this type HighAtomic - These are high-order allocations belonging to callers that cannot sleep or perform any IO. 
In practice, this is restricted to jumbo frame allocation for network receive. It is assumed that the allocations are short-lived. Instead of having one MAX_ORDER-sized array of free lists in struct free_area, there is one for each type of reclaimability. Once a 2^MAX_ORDER block of pages is split for a type of allocation, it is added to the free-lists for that type, in effect reserving it. Hence, over time, pages of the different types can be clustered together. When the preferred freelists are exhausted, the largest possible block is taken from an alternative list. Buddies that are split from that large block are placed on the preferred allocation-type freelists to mitigate fragmentation. This implementation gives best-effort for low fragmentation in all zones. Ideally, min_free_kbytes needs to be set to a value equal to 4 * (1 << (MAX_ORDER-1)) pages in most cases. This would be 16384 on x86 and x86_64 for example. Our tests show that about 60-70% of physical memory can be allocated on a desktop after a few days uptime. In benchmarks and stress tests, we are finding that 80% of memory is available as contiguous blocks at the end of the test. To compare, a standard kernel was getting < 1% of memory as large pages on a desktop and about 8-12% of memory as large pages at the end of stress tests. Following this email are 12 patches that implement the page grouping feature. The first patch introduces a mechanism for storing flags related to a whole block of pages. Then allocations are split between movable and all other allocations. Following that are patches to deal with per-cpu pages and make the mechanism configurable. The next patch moves free pages between lists when partially allocated blocks are used for pages of another migrate type. The second last patch groups reclaimable kernel allocations such as inode caches together. The final patch related to groupings keeps high-order atomic allocations. 
The last two patches are more concerned with control of fragmentation. The second last patch biases placement of non-movable allocations towards the start of memory. This is with a view of supporting memory hot-remove of DIMMs with higher PFNs in the future. The biasing could be enforced a lot heavier but it would cost. The last patch aggressively clusters reclaimable pages like inode caches together. The fragmentation reduction strategy needs to track if pages within a block can be moved or reclaimed so that pages are freed to the appropriate list. This patch adds a bitmap for flags affecting a whole MAX_ORDER block of pages. In non-SPARSEMEM configurations, the bitmap is stored in the struct zone and allocated during initialisation. SPARSEMEM statically allocates the bitmap in a struct mem_section so that bitmaps do not have to be resized during memory hotadd. This wastes a small amount of memory per unused section (usually sizeof(unsigned long)) but the complexity of dynamically allocating the memory is quite high. Additional credit to Andy Whitcroft who reviewed an earlier implementation of the mechanism and suggested how to make it a lot cleaner. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:59 -07:00
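The suggested min_free_kbytes value in the message above, 4 * (1 << (MAX_ORDER-1)) pages, can be checked against the quoted figure of 16384. A small worked example, assuming MAX_ORDER is 11 and the base page size is 4 KiB on x86 and x86_64 (neither value is stated in the message; the function name is illustrative):

```python
def suggested_min_free_kbytes(max_order: int, page_size_bytes: int) -> int:
    """Convert 4 * (1 << (MAX_ORDER - 1)) pages into kilobytes."""
    pages = 4 * (1 << (max_order - 1))
    return pages * page_size_bytes // 1024


# Assumed values: MAX_ORDER = 11, 4096-byte pages.
print(suggested_min_free_kbytes(max_order=11, page_size_bytes=4096))  # 16384
```

Under these assumptions the formula amounts to four of the largest buddy blocks, each 1024 pages or 4 MiB.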
Christoph Lameter	6cb062296f	Categorize GFP flags The function of GFP_LEVEL_MASK seems to be unclear. In order to clear up the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP flags: GFP_RECLAIM_MASK Flags used to control page allocator reclaim behavior. GFP_CONSTRAINT_MASK Flags used to limit where allocations can occur. GFP_SLAB_BUG_MASK Flags that the slab allocator BUG()s on. These replace the uses of GFP_LEVEL mask in the slab allocators and in vmalloc.c. The use of the flags not included in these sets may occur as a result of a slab allocation standing in for a page allocation when constructing scatter gather lists. Extraneous flags are cleared and not passed through to the page allocator. __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will now be ignored if passed to a slab allocator. Change the allocation of allocator meta data in SLAB and vmalloc to not pass through flags listed in GFP_CONSTRAINT_MASK. SLAB already removes the __GFP_THISNODE flag for such allocations. Generalize that to also cover vmalloc. The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL. The impact of allocator metadata placement on access latency to the cachelines of the object itself is minimal since metadata is only referenced on alloc and free. The attempt is still made to place the meta data optimally but we consistently allow fallback both in SLAB and vmalloc (SLUB does not need to allocate metadata like that). Allocator metadata may serve multiple in kernel users and thus should not be subject to the limitations arising from a single allocation context. [akpm@linux-foundation.org: fix fallback_alloc()] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:59 -07:00
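The flag handling described above amounts to masking. The sketch below uses made-up bit positions and only a subset of flags; which flags sit in GFP_RECLAIM_MASK and GFP_SLAB_BUG_MASK here is an assumption for illustration, while the membership of the hardwall and thisnode flags in GFP_CONSTRAINT_MASK follows the message:

```python
# Bit positions are invented for this sketch; real values live in kernel headers.
GFP_WAIT, GFP_IO, GFP_FS = 0x01, 0x02, 0x04
GFP_HARDWALL, GFP_THISNODE = 0x08, 0x10
GFP_MOVABLE, GFP_COLD, GFP_COMP = 0x20, 0x40, 0x80
GFP_HIGHMEM = 0x100

GFP_RECLAIM_MASK = GFP_WAIT | GFP_IO | GFP_FS      # assumed subset
GFP_CONSTRAINT_MASK = GFP_HARDWALL | GFP_THISNODE
GFP_SLAB_BUG_MASK = GFP_HIGHMEM                    # assumed member


def slab_page_flags(flags: int) -> int:
    """Flags a slab allocator would hand on to the page allocator."""
    if flags & GFP_SLAB_BUG_MASK:
        raise ValueError("flag not allowed for slab allocations")
    # Extraneous flags (movable, cold, comp, ...) are silently dropped.
    return flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK)


def metadata_flags(flags: int) -> int:
    """Flags for allocator metadata: placement constraints are removed."""
    return flags & ~GFP_CONSTRAINT_MASK


request = GFP_WAIT | GFP_MOVABLE | GFP_THISNODE
print(hex(slab_page_flags(request)))                 # 0x11
print(hex(metadata_flags(GFP_WAIT | GFP_THISNODE)))  # 0x1
```

Dropping the constraint bits for metadata is what lets those allocations fall back to other nodes even when the object request itself is pinned.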
Christoph Lameter	0e1e7c7a73	Memoryless nodes: Use N_HIGH_MEMORY for cpusets cpusets try to ensure that any node added to a cpuset's mems_allowed is on-line and contains memory. The assumption was that online nodes contained memory. Thus, it is possible to add memoryless nodes to a cpuset and then add tasks to this cpuset. This results in continuous series of oom-kill and apparent system hang. Change cpusets to use node_states[N_HIGH_MEMORY] [a.k.a. node_memory_map] in place of node_online_map when vetting memories. Return error if admin attempts to write a non-empty mems_allowed node mask containing only memoryless-nodes. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:59 -07:00
Christoph Lameter	523b945855	Memoryless nodes: Fix GFP_THISNODE behavior GFP_THISNODE checks that the zone selected is within the pgdat (node) of the first zone of a nodelist. That only works if the node has memory. A memoryless node will have its first zone on another pgdat (node). GFP_THISNODE currently will simply return memory on the first pgdat. Thus it is returning memory on other nodes. GFP_THISNODE should fail if there is no local memory on a node. Add a new set of zonelists for each node that only contain the zones that belong to the node itself so that no fallback is possible. Then modify gfp_type to pick up the right zone based on the presence of __GFP_THISNODE. Drop the existing GFP_THISNODE checks from the page allocator's hot path. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:59 -07:00
Christoph Lameter	37c0708dbe	Memoryless nodes: Add N_CPU node state We need the check for a node with cpu in zone reclaim. Zone reclaim will not allow remote zone reclaim if a node has a cpu. [Lee.Schermerhorn@hp.com: Move setup of N_CPU node state mask] Signed-off-by: Christoph Lameter <clameter@sgi.com> Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:58 -07:00
Christoph Lameter	7ea1530ab3	Memoryless nodes: introduce mask of nodes with memory It is necessary to know if nodes have memory since we have recently begun to add support for memoryless nodes. For that purpose we introduce two new node states: N_HIGH_MEMORY and N_NORMAL_MEMORY. A node has its bit in N_HIGH_MEMORY set if it has any memory regardless of the type of memory. If a node has memory then it has at least one zone defined in its pgdat structure that is located in the pgdat itself. A node has its bit in N_NORMAL_MEMORY set if it has a lower zone than ZONE_HIGHMEM. This means it is possible to allocate memory that is not subject to kmap. N_HIGH_MEMORY and N_NORMAL_MEMORY can then be used in various places to ensure that we do the right thing when we encounter a memoryless node. [akpm@linux-foundation.org: build fix] [Lee.Schermerhorn@hp.com: update N_HIGH_MEMORY node state for memory hotadd] [y-goto@jp.fujitsu.com: Fix memory hotplug + sparsemem build] Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:58 -07:00
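The two node states described in this commit can be pictured as one bitmask per state, indexed by node number. The following is a minimal userspace sketch, not the kernel's nodemask implementation: the `sketch_` names, the 32-node limit, and the `sketch_node_add_memory()` helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two states named in the commit message. */
enum sketch_node_state {
    SKETCH_N_NORMAL_MEMORY, /* node has a zone below ZONE_HIGHMEM */
    SKETCH_N_HIGH_MEMORY,   /* node has memory of any type */
    SKETCH_NR_NODE_STATES
};

#define SKETCH_MAX_NODES 32 /* assumption: fits in one unsigned long */

/* One bitmask per state; bit n is set when node n is in that state. */
static unsigned long sketch_node_states[SKETCH_NR_NODE_STATES];

static inline void sketch_node_set_state(int node, enum sketch_node_state state)
{
    assert(node >= 0 && node < SKETCH_MAX_NODES);
    sketch_node_states[state] |= 1UL << node;
}

static inline bool sketch_node_state(int node, enum sketch_node_state state)
{
    assert(node >= 0 && node < SKETCH_MAX_NODES);
    return (sketch_node_states[state] >> node) & 1UL;
}

/* Record that a node gained memory: any memory sets N_HIGH_MEMORY, and a
 * zone below highmem additionally sets N_NORMAL_MEMORY. */
static inline void sketch_node_add_memory(int node, bool has_zone_below_highmem)
{
    sketch_node_set_state(node, SKETCH_N_HIGH_MEMORY);
    if (has_zone_below_highmem)
        sketch_node_set_state(node, SKETCH_N_NORMAL_MEMORY);
}
```

A node that never calls `sketch_node_add_memory()` stays clear in both masks, which is how a memoryless node would be told apart from an online node with memory.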
Christoph Lameter	1380891071	Memoryless nodes: Generic management of nodemasks for various purposes Why do we need to support memoryless nodes? KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > For fujitsu, problem is called "empty" node. > > When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init) > creates nodes, which includes no memory, no cpu. > > I tried to remove empty-node in past, but that was denied. > It was because we can hot-add cpu to the empty node. > (node-hotplug triggered by cpu is not implemented now. and it will be ugly.) > > > For HP, (Lee can comment on this later), they have memory-less-node. > As far as I hear, HP's machine can have following configration. > > (example) > Node0: CPU0 memory AAA MB > Node1: CPU1 memory AAA MB > Node2: CPU2 memory AAA MB > Node3: CPU3 memory AAA MB > Node4: Memory XXX GB > > AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap. > After boot, only Node 4 has valid memory (but have no cpu.) > > Maybe this is memory-interleave by firmware config. Christoph Lameter <clameter@sgi.com> wrote: > Future SGI platforms (actually also current one can have but nothing like > that is deployed to my knowledge) have nodes with only cpus. Current SGI > platforms have nodes with just I/O that we so far cannot manage in the > core. So the arch code maps them to the nearest memory node. Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote: > For the HP platforms, we can configure each cell with from 0% to 100% > "cell local memory". When we configure with <100% CLM, the "missing > percentages" are interleaved by hardware on a cache-line granularity to > improve bandwidth at the expense of latency for numa-challenged > applications [and OSes, but not our problem ;-)]. When we boot Linux on > such a config, all of the real nodes have no memory--it all resides in a > single interleaved pseudo-node. 
> > When we boot Linux on a 100% CLM configuration [== NUMA], we still have > the interleaved pseudo-node. It contains a few hundred MB stolen from > the real nodes to contain the DMA zone. [Interleaved memory resides at > phys addr 0]. The memoryless-nodes patches, along with the zoneorder > patches, support this config as well. > > Also, when we boot a NUMA config with the "mem=" command line, > specifying less memory than actually exists, Linux takes the excluded > memory "off the top" rather than distributing it across the nodes. This > can result in memoryless nodes, as well. > This patch: Preparation for memoryless node patches. Provide a generic way to keep nodemasks describing various characteristics of NUMA nodes. Remove the node_online_map and the node_possible_map and realize the same functionality using two node states: N_POSSIBLE and N_ONLINE. [Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config] Signed-off-by: Christoph Lameter <clameter@sgi.com> Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:58 -07:00
Nick Piggin	55144768e1	fs: remove some AOP_TRUNCATED_PAGE prepare/commit_write no longer returns AOP_TRUNCATED_PAGE since OCFS2 and GFS2 were converted to the new aops, so we can make some simplifications for that. [michal.k.k.piotrowski@gmail.com: fix warning] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:58 -07:00
Nick Piggin	03158cd7eb	fs: restore nobh Implement nobh in new aops. This is a bit tricky. FWIW, nobh_truncate is now implemented in a way that does not create blocks in sparse regions, which is a silly thing for it to have been doing (isn't it?) ext2 survives fsx and fsstress. jfs is converted as well... ext3 should be easy to do (but not done yet). [akpm@linux-foundation.org: coding-style fixes] Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:58 -07:00
Nick Piggin	a20fa20c54	With reiserfs no longer using the weird generic_cont_expand, remove it completely. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:56 -07:00
Nick Piggin	89e107877b	fs: new cont helpers Rework the generic block "cont" routines to handle the new aops. Supporting cont_prepare_write would take quite a lot of code to support, so remove it instead (and we later convert all filesystems to use it). write_begin gets passed AOP_FLAG_CONT_EXPAND when called from generic_cont_expand, so filesystems can avoid the old hacks they used. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:55 -07:00
Nick Piggin	afddba49d1	fs: introduce write_begin, write_end, and perform_write aops These are intended to replace prepare_write and commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do). [mark.fasheh@oracle.com: API design contributions, code review and fixes] [akpm@linux-foundation.org: various fixes] [dmonakhov@sw.ru: new aop block_write_begin fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:55 -07:00
Nick Piggin	2f718ffc16	mm: buffered write iterator Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:55 -07:00
Nick Piggin	08291429cf	mm: fix pagecache write deadlocks Modify the core write() code so that it won't take a pagefault while holding a lock on the pagecache page. There are a number of different deadlocks possible if we try to do such a thing: 1. generic_buffered_write 2. lock_page 3. prepare_write 4. unlock_page+vmtruncate 5. copy_from_user 6. mmap_sem(r) 7. handle_mm_fault 8. lock_page (filemap_nopage) 9. commit_write 10. unlock_page a. sys_munmap / sys_mlock / others b. mmap_sem(w) c. make_pages_present d. get_user_pages e. handle_mm_fault f. lock_page (filemap_nopage) 2,8 - recursive deadlock if page is same 2,8;2,8 - ABBA deadlock if page is different 2,6;b,f - ABBA deadlock if page is same The solution is as follows: 1. If we find the destination page is uptodate, continue as normal, but use atomic usercopies which do not take pagefaults and do not zero the uncopied tail of the destination. The destination is already uptodate, so we can commit_write the full length even if there was a partial copy: it does not matter that the tail was not modified, because if it is dirtied and written back to disk it will not cause any problems (uptodate means that the destination page is as new or newer than the copy on disk). 1a. The above requires that fault_in_pages_readable correctly returns access information, because atomic usercopies cannot distinguish between non-present pages in a readable mapping, from lack of a readable mapping. 2. If we find the destination page is non uptodate, unlock it (this could be made slightly more optimal), then allocate a temporary page to copy the source data into. Relock the destination page and continue with the copy. However, instead of a usercopy (which might take a fault), copy the data from the pinned temporary page via the kernel address space. (also, rename maxlen to seglen, because it was confusing) This increases the CPU/memory copy cost by almost 50% on the affected workloads.
That will be solved by introducing a new set of pagecache write aops in a subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:54 -07:00
Lee Schermerhorn	754af6f5a8	Mem Policy: add MPOL_F_MEMS_ALLOWED get_mempolicy() flag Allow an application to query the memories allowed by its context. Updated numa_memory_policy.txt to mention that applications can use this to obtain allowed memories for constructing valid policies. TODO: update out-of-tree libnuma wrapper[s], or maybe add a new wrapper--e.g., numa_get_mems_allowed() ? Also, update numa syscall man pages. Tested with memtoy V>=0.13. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:54 -07:00
Martin Schwidefsky	c92ff1bde0	move mm_struct and vm_area_struct Move the definitions of struct mm_struct and struct vm_area_struct to include/linux/mm_types.h. This allows more functions in asm/pgtable.h and friends to be defined with inline assemblies instead of macros. Compile tested on i386, powerpc, powerpc64, s390-32, s390-64 and x86_64. [aurelien@aurel32.net: build fix] Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:53 -07:00
Nick Piggin	c0bc9875b7	radix-tree: use indirect bit Rather than sign direct radix-tree pointers with a special bit, sign the indirect one that hangs off the root. This means that, given a lookup_slot operation, the invalid result will be differentiated from the valid (previously, valid results could have the bit either set or clear). This does not affect slot lookups which occur under lock -- they can never return an invalid result. Is needed in future for lockless pagecache. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:53 -07:00
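The "indirect bit" idea in the radix-tree commit above is a form of pointer tagging: the low bit of an aligned pointer is free, so it can mark the pointer hanging off the root as pointing to an internal node rather than to a stored item. A minimal sketch follows, assuming all tagged pointers are at least 2-byte aligned; the `sketch_` helper names are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bit used as the tag; free because node pointers are aligned. */
#define SKETCH_INDIRECT_BIT ((uintptr_t)1)

/* Tag a node pointer as "indirect" (it points to an internal node). */
static inline void *sketch_ptr_to_indirect(void *node)
{
    assert(((uintptr_t)node & SKETCH_INDIRECT_BIT) == 0);
    return (void *)((uintptr_t)node | SKETCH_INDIRECT_BIT);
}

/* Strip the tag to recover the usable node pointer. */
static inline void *sketch_indirect_to_ptr(void *ptr)
{
    return (void *)((uintptr_t)ptr & ~SKETCH_INDIRECT_BIT);
}

/* A direct (untagged) pointer is a stored item, never an internal node. */
static inline bool sketch_is_indirect(const void *ptr)
{
    return ((uintptr_t)ptr & SKETCH_INDIRECT_BIT) != 0;
}
```

With the tag on the indirect pointer, a lockless reader that loads a slot can tell a valid item (bit clear) from a pointer it must not treat as an item (bit set), which is the distinction the commit message says lookup_slot needs.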
Nick Piggin	557ed1fa26	remove ZERO_PAGE The commit `b5810039a5` contains the note A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!). There are several broad ways to fix this problem: 1. add back some special casing to avoid refcounting ZERO_PAGE 2. per-node or per-cpu ZERO_PAGES 3. remove the ZERO_PAGE completely I will argue for 3. The others should also fix the problem, but they result in more complex code than does 3, with little or no real benefit that I can see. Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a false optimisation: if an application is performance critical, it would not be doing many read faults of new memory, or at least it could be expected to write to that memory soon afterwards. If cache or memory use is critical, it should not be working with a significant number of ZERO_PAGEs anyway (a more compact representation of zeroes should be used). As a sanity check -- measuring on my desktop system, there are never many mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not increase much without it. When running a make -j4 kernel compile on my dual core system, there are about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000 ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second is torn down without being COWed).
So removing ZERO_PAGE will save 1,000 page faults per second when running kbuild, while keeping it only saves less than 1 page clearing operation per second. 1 page clear is cheaper than a thousand faults, presumably, so there isn't an obvious loss. Neither the logical argument nor these basic tests give a guarantee of no regressions. However, this is a reasonable opportunity to try to remove the ZERO_PAGE from the pagefault path. If it is found to cause regressions, we can reintroduce it and just avoid refcounting it. The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see much use to them except on benchmarks. All other users of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity. We can look at replacing them all and maybe ripping out ZERO_PAGE completely when we are more satisfied with this solution. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:53 -07:00
Christoph Lameter	aadb4bc4a1	SLUB: direct pass through of page size or higher kmalloc requests This gets rid of all kmalloc caches larger than page size. A kmalloc request larger than PAGE_SIZE / 2 is going to be passed through to the page allocator. This works both inline where we will call __get_free_pages instead of kmem_cache_alloc and in __kmalloc. kfree is modified to check if the object is in a slab page. If not then the page is freed via the page allocator instead. Roughly similar to what SLOB does. Advantages: - Reduces memory overhead for kmalloc array - Large kmalloc operations are faster since they do not need to pass through the slab allocator to get to the page allocator. - Performance increase of 10%-20% on alloc and 50% on free for PAGE_SIZEd allocations. SLUB must call page allocator for each alloc anyways since the higher order pages which allowed avoiding the page alloc calls are not available in a reliable way anymore. So we are basically removing useless slab allocator overhead. - Large kmallocs yield page aligned objects which is what SLAB did. Bad things like using page sized kmalloc allocations to stand in for page allocator allocs can be transparently handled and are not distinguishable from page allocator uses. - Checking for too large objects can be removed since it is done by the page allocator. Drawbacks: - No accounting for large kmalloc slab allocations anymore - No debugging of large kmalloc slab allocations. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:53 -07:00
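The size-based dispatch described in this SLUB commit can be sketched in userspace as below. This is only an illustration under stated assumptions: 4 KiB pages, a threshold of half a page (one reading of the threshold given in the message), and hypothetical `sketch_` names. It does not allocate anything; it only shows which backend a request size would be routed to and which page order a pass-through request would need.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

enum sketch_backend {
    SKETCH_SLAB,           /* served from a kmalloc slab cache */
    SKETCH_PAGE_ALLOCATOR  /* passed straight to the page allocator */
};

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static inline unsigned int sketch_get_order(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Requests above the threshold bypass the slab allocator entirely. */
static inline enum sketch_backend sketch_kmalloc_backend(size_t size)
{
    return size > SKETCH_PAGE_SIZE / 2 ? SKETCH_PAGE_ALLOCATOR : SKETCH_SLAB;
}
```

Because pass-through objects always start at a page boundary, the free path can tell them apart from slab objects by inspecting the page they live in, which matches the kfree change the message describes.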
Fengguang Wu	57f6b96c09	filemap: convert some unsigned long to pgoff_t Convert some 'unsigned long' to pgoff_t. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:53 -07:00
Fengguang Wu	535443f515	readahead: remove several readahead macros Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
Fengguang Wu	6df8ba4f8a	radixtree: introduce radix_tree_next_hole() Introduce radix_tree_next_hole(root, index, max_scan) to scan radix tree for the first hole. It will be used in interleaved readahead. The implementation is dumb and obviously correct. It can help debug(and document) the possible smart one in future. Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
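The "dumb and obviously correct" scan that the radix_tree_next_hole() commit describes amounts to probing successive indices until one is empty or the scan budget runs out. A minimal sketch, assuming a flat boolean array as a hypothetical stand-in for the radix tree and assuming the function returns the index reached when no hole is found within `max_scan`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a radix tree: present[i] is true when index i
 * holds an item; indices at or beyond nr_slots are treated as empty. */
struct sketch_tree {
    const bool *present;
    unsigned long nr_slots;
};

static inline bool sketch_lookup(const struct sketch_tree *t, unsigned long index)
{
    return index < t->nr_slots && t->present[index];
}

/* Probe up to max_scan indices starting at index and return the first
 * empty one; if none is found, return the index where scanning stopped. */
static inline unsigned long sketch_next_hole(const struct sketch_tree *t,
                                             unsigned long index,
                                             unsigned long max_scan)
{
    unsigned long i;

    for (i = 0; i < max_scan; i++) {
        if (!sketch_lookup(t, index))
            break;
        index++;
        if (index == 0) /* wrapped around the index space */
            break;
    }
    return index;
}
```

The cost is one lookup per scanned index, which is the trade-off the message accepts in exchange for an implementation that is easy to verify.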
Fengguang Wu	f4e6b498d6	readahead: combine file_ra_state.prev_index/prev_offset into prev_pos Combine the file_ra_state members unsigned long prev_index unsigned int prev_offset into loff_t prev_pos It is more consistent and better supports huge files. Thanks to Peter for the nice proposal! [akpm@linux-foundation.org: fix shift overflow] Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
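Folding a page index and an in-page offset into one byte position, as the prev_pos commit above does, is a shift and an OR. The sketch below assumes 4 KiB pages and uses hypothetical `sketch_` names. The "fix shift overflow" note in the message presumably refers to widening the index before shifting, which is what the cast here illustrates.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

typedef int64_t sketch_loff_t;                    /* stand-in for loff_t */

/* Combine (index, offset) into one byte position. The index is widened to
 * 64 bits before shifting so a 32-bit unsigned long cannot overflow. */
static inline sketch_loff_t sketch_make_prev_pos(unsigned long index,
                                                 unsigned int offset)
{
    assert(offset < SKETCH_PAGE_SIZE);
    assert((uint64_t)index <= (uint64_t)INT64_MAX >> SKETCH_PAGE_SHIFT);
    return (sketch_loff_t)(((uint64_t)index << SKETCH_PAGE_SHIFT) | offset);
}

/* Recover the page index from a non-negative position. */
static inline unsigned long sketch_prev_index(sketch_loff_t pos)
{
    assert(pos >= 0);
    return (unsigned long)((uint64_t)pos >> SKETCH_PAGE_SHIFT);
}

/* Recover the in-page offset from a non-negative position. */
static inline unsigned int sketch_prev_offset(sketch_loff_t pos)
{
    assert(pos >= 0);
    return (unsigned int)((uint64_t)pos & (SKETCH_PAGE_SIZE - 1));
}
```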
Fengguang Wu	0bb7ba6b9c	readahead: mmap read-around simplification Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss and make it an int. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
Fengguang Wu	937085aa35	readahead: compacting file_ra_state Use 'unsigned int' instead of 'unsigned long' for readahead sizes. This helps reduce memory consumption on 64bit CPU when a lot of files are opened. CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
Jesper Juhl	39e91e4331	Clean up duplicate includes in include/linux/memory_hotplug.h This patch cleans up duplicate includes in include/linux/memory_hotplug.h Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
Andy Whitcroft	29c71111d0	vmemmap: generify initialisation via helpers Convert the common vmemmap population into initialisation helpers for use by architecture vmemmap populators. All architecture implementing the SPARSEMEM_VMEMMAP variant supply an architecture specific vmemmap_populate() initialiser, which may make use of the helpers. This allows us to clean up and remove the initialisation Kconfig entries. With this patch there is a single SPARSEMEM_VMEMMAP_ENABLE Kconfig option to indicate use of that variant. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:51 -07:00
Christoph Lameter	8f6aac419b	Generic Virtual Memmap support for SPARSEMEM SPARSEMEM is a pretty nice framework that unifies quite a bit of code over all the arches. It would be great if it could be the default so that we can get rid of various forms of DISCONTIG and other variations on memory maps. So far what has hindered this are the additional lookups that SPARSEMEM introduces for virt_to_page and page_address. This goes so far that the code to do this has to be kept in a separate function and cannot be used inline. This patch introduces a virtual memmap mode for SPARSEMEM, in which the memmap is mapped into a virtually contiguous area, only the active sections are physically backed. This allows virt_to_page, page_address and cohorts to become simple shift/add operations. No page flag fields, no table lookups, nothing involving memory is required. The two key operations pfn_to_page and page_to_pfn become: #define __pfn_to_page(pfn) (vmemmap + (pfn)) #define __page_to_pfn(page) ((page) - vmemmap) By having a virtual mapping for the memmap we allow simple access without wasting physical memory. As kernel memory is typically already mapped 1:1 this introduces no additional overhead. The virtual mapping must be big enough to allow a struct page to be allocated and mapped for all valid physical pages. This will make a virtual memmap difficult to use on 32 bit platforms that support 36 address bits. However, if there is enough virtual space available and the arch already maps its 1-1 kernel space using TLBs (f.e. true of IA64 and x86_64) then this technique makes SPARSEMEM lookups even more efficient than CONFIG_FLATMEM. FLATMEM needs to read the contents of the mem_map variable to get the start of the memmap and then add the offset to the required entry. vmemmap is a constant to which we can simply add the offset. This patch has the potential to allow us to make SPARSEMEM the default (and even the only) option for most systems.
It should be optimal on UP, SMP and NUMA on most platforms. Then we may even be able to remove the other memory models: FLATMEM, DISCONTIG etc. [apw@shadowen.org: config cleanups, resplit code etc] [kamezawa.hiroyu@jp.fujitsu.com: Fix sparsemem_vmemmap init] [apw@shadowen.org: vmemmap: remove excess debugging] [apw@shadowen.org: simplify initialisation code and reduce duplication] [apw@shadowen.org: pull out the vmemmap code into its own file] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:51 -07:00
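The two macros quoted in the virtual memmap commit above reduce pfn/page conversion to pointer arithmetic on a fixed base. The userspace sketch below mirrors them with a static array standing in for the virtually contiguous memmap. The one-field `struct page`, the array size, and the `sketch_`/`SKETCH_` names are assumptions for illustration; in the kernel, vmemmap is an architecture-chosen virtual address rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    unsigned long flags;
};

#define SKETCH_MAX_PFN 1024UL

/* Stand-in for the virtually contiguous memmap region. */
static struct page sketch_memmap[SKETCH_MAX_PFN];
#define sketch_vmemmap (sketch_memmap)

/* Same shape as the macros in the commit message: add or subtract a base. */
#define SKETCH_PFN_TO_PAGE(pfn)  (sketch_vmemmap + (pfn))
#define SKETCH_PAGE_TO_PFN(page) ((unsigned long)((page) - sketch_vmemmap))

static inline struct page *sketch_pfn_to_page(unsigned long pfn)
{
    assert(pfn < SKETCH_MAX_PFN);
    return SKETCH_PFN_TO_PAGE(pfn);
}

static inline unsigned long sketch_page_to_pfn(const struct page *page)
{
    return SKETCH_PAGE_TO_PFN(page);
}
```

Neither conversion reads memory: the base is a compile-time constant, which is the advantage over FLATMEM's load of the mem_map variable that the message points out.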
Andy Whitcroft	540557b943	sparsemem: record when a section has a valid mem_map We have flags to indicate whether a section actually has a valid mem_map associated with it. This is never set and we rely solely on the present bit to indicate a section is valid. By definition a section is not valid if it has no mem_map and there is a window during init where the present bit is set but there is no mem_map, during which pfn_valid() will return true incorrectly. Use the existing SECTION_HAS_MEM_MAP flag to indicate the presence of a valid mem_map. Switch valid_section{,_nr} and pfn_valid() to this bit. Add new present_section{,_nr} and pfn_present() interfaces for those users who care to know that a section is going to be valid. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:51 -07:00
Guennadi Liakhovetski	b3b708fa27	wake up from a serial port Enable wakeup from serial ports, make it run-time configurable over sysfs, e.g., echo enabled > /sys/devices/platform/serial8250.0/tty/ttyS0/power/wakeup Requires # CONFIG_SYSFS_DEPRECATED is not set Following suggestions from Alan and Russell moved the may_wake_up checks to serial_core.c. This time actually tested - it does even work. Could someone, please, verify, that put_device after device_find_child is correct? Also would be nice to test with a Natsemi UART, that can wake up the system, if such systems exist. For this you just have to apply the patch below, issue the above "echo" command to one of your Natsemi port, suspend and resume your system, and verify that your Natsemi port still works. If you are actually capable of waking up the system from that port, would be nice to test that as well. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:50 -07:00
Guennadi Liakhovetski	aa5346a212	provide stubs for enable_irq_wake() and disable_irq_wake() Provide {enable,disable}_irq_wake dummies for undefined cross-compilers for platforms without CONFIG_GENERIC_IRQ. Needed by wake-up-from-a-serial-port.patch Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:50 -07:00
Alan Cox	bf0df636e5	8250_pci: Autodetect mainpine cards Add support for a whole range of boards. Some are partly autodetected but not fully correctly others (PCI Express notably) not at all. Stick all the right entries in. Thanks to Mainpine for information and testing. Signed-off-by: Alan Cox <alan@redhat.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:50 -07:00
James Bottomley	32e8f70230	introduce DMA_MASK_NONE as a signal for unable to do DMA Some devices are incapable of DMA and need to be recognised as such. Introduce a NONE dma mask to facilitate this plus an inline function: is_device_dma_capable() to check this. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tejun Heo <htejun@gmail.com> Cc: Natalie Protasevich <protasnb@gmail.com> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:50 -07:00
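The DMA capability check introduced by the DMA_MASK_NONE commit above can be sketched as follows. The one-field `struct sketch_device` and the zero value chosen for the NONE mask are assumptions for illustration; only the idea (a device with no mask pointer, or with the NONE mask, cannot do DMA) is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the "no DMA" mask is all zeroes (no addressable bits). */
#define SKETCH_DMA_MASK_NONE ((uint64_t)0)

/* Hypothetical stand-in for struct device; only the mask pointer matters. */
struct sketch_device {
    uint64_t *dma_mask;
};

/* A device can do DMA only if it has a mask and the mask is not NONE. */
static inline bool sketch_is_device_dma_capable(const struct sketch_device *dev)
{
    assert(dev != NULL);
    return dev->dma_mask != NULL && *dev->dma_mask != SKETCH_DMA_MASK_NONE;
}
```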
Ralf Baechle	0322a2b840	Add assembler equivalents to __init{,data}_refok I need __INIT_REFOK to fix a MODPOST warning for a few MIPS configs which have to call init code from .text very early in the game due to bootloader issues. __INITDATA_REFOK is just for consistency. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:49 -07:00
Randy Dunlap	bfe8df3d31	slow down printk during boot Optionally add a boot delay after each kernel printk() call, crudely measured in milliseconds, with a maximum delay of 10 seconds per printk. Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.): "lpj=loops_per_jiffy boot_delay=100" to the kernel command line. It has been useful in cases like "during boot, my machine just reboots or the screen goes black" by slowing down printk, (and adding initcall_debug), we can usually see the last thing that happened before the lights went out which is usually a valuable clue. [akpm@linux-foundation.org: not all architectures implement CONFIG_HZ] [akpm@linux-foundation.org: fix lots of stuff] [bunk@stusta.de: kernel/printk.c: make 2 variables static] [heiko.carstens@de.ibm.com: fix slow down printk on boot compile error] Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:49 -07:00
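The boot-delay arithmetic in the commit above can be modelled roughly as follows. This is a user-space Python sketch, not the kernel code. Only the millisecond unit and the 10-second ceiling come from the commit message; the function names, the `lpj * hz / 1000` conversion, and the choice to reject out-of-range values are assumptions made for illustration.

```python
MAX_BOOT_DELAY_MS = 10 * 1000  # ceiling from the commit message: 10 s per printk


def parse_boot_delay(value: str) -> int:
    """Parse a boot_delay= argument given in milliseconds (hypothetical helper)."""
    ms = int(value)
    if ms < 0 or ms > MAX_BOOT_DELAY_MS:
        # Assumption: reject out-of-range values; the commit message does not
        # say how the kernel itself treats them.
        raise ValueError(f"boot_delay must be 0..{MAX_BOOT_DELAY_MS} ms")
    return ms


def delay_loops(boot_delay_ms: int, lpj: int, hz: int) -> int:
    """Busy-loop iterations per printk, assuming lpj loops per jiffy and hz jiffies per second."""
    loops_per_msec = lpj * hz // 1000
    return boot_delay_ms * loops_per_msec
```

With `boot_delay=100`, an assumed `lpj` of 4000000 and an assumed tick rate of 250 Hz, the sketch yields 100000000 loop iterations per message.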
Timur Tabi	b0c813ceee	[ALSA] ASoC CS4270 codec device driver This patch adds ALSA SoC support for the Cirrus Logic CS4270 codec. The following features are supported: 1) Stand-alone and software mode 2) Software mode via I2C only 3) Master mode, not Slave 4) No power management Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-10-16 15:58:19 +02:00
Hans-Christian Egtvedt	eafe570847	[ALSA] ALSA sound driver for the AT73C213 DAC using Atmel SSC driver This patch adds support for the AT73C213 DAC using the misc Atmel SSC driver in I2S mode. The driver also requires a SPI to setup the registers and control volume. It has been tested with an AT32AP7000 on the ATSTK1000 development board. The driver should also work with any Atmel device with an SSC module supported by the Atmel SSC driver (atmel-ssc). The atmel-ssc driver is just submitted to the Linux kernel. Please see mail thread http://lkml.org/lkml/2007/7/16/32 Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-10-16 15:57:50 +02:00
Jens Axboe	55c16a7004	IDE: sg chaining support Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:21:00 +02:00
Jens Axboe	ba2da2f8d6	i2o: sg chaining support Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:21:00 +02:00
Jens Axboe	8726021626	libata: convert to using sg helpers This converts libata to using the sg helpers for looking up sg elements, instead of doing it manually. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:14:12 +02:00
Jens Axboe	70eb8040dc	Add chained sg support to linux/scatterlist.h The core of the patch - allow the last sg element in a scatterlist table to point to the start of a new table. We overload the LSB of the page pointer to indicate whether this is a valid sg entry, or merely a link to the next list. Includes a fix from Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> correcting the ifdef ARCH_HAS_SG_CHAIN guarding sg_last(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:08:51 +02:00
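The link-marking trick described in the commit above (the low bit of the page pointer tells a real entry from a link to the next table) can be illustrated with a small model. This is a Python sketch using plain integers as stand-in addresses, not the kernel's `struct scatterlist`; the names `make_link`, `is_link`, `link_target` and `walk` are hypothetical.

```python
from typing import Dict, Iterator, List

CHAIN_BIT = 0x1  # low bit of the stand-in "page pointer" marks a link entry


def make_link(table_addr: int) -> int:
    """Encode a link to another table; the address must have a free low bit."""
    if table_addr & CHAIN_BIT:
        raise ValueError("table address must be aligned so the low bit is free")
    return table_addr | CHAIN_BIT


def is_link(entry: int) -> bool:
    return bool(entry & CHAIN_BIT)


def link_target(entry: int) -> int:
    return entry & ~CHAIN_BIT


def walk(tables: Dict[int, List[int]], start: int) -> Iterator[int]:
    """Yield every real entry, following a link found at the end of a table."""
    addr = start
    while addr is not None:
        next_addr = None
        for entry in tables[addr]:
            if is_link(entry):
                next_addr = link_target(entry)
                break  # a link is the last element of its table
            yield entry
        addr = next_addr
```

The scheme works because aligned addresses never use their lowest bit, so the bit is free to carry the "this is a link" flag without enlarging the entry.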
Jens Axboe	96b418c960	Add sg helpers for iterating over a scatterlist table First step to being able to change the scatterlist setup without having to modify drivers (a lot :-) Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:07:10 +02:00
Adrian Bunk	bb879463b5	remove ide_get_error_location() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:05:06 +02:00
Jens Axboe	fd5d806266	block: convert blkdev_issue_flush() to use empty barriers Then we can get rid of ->issue_flush_fn() and all the driver private implementations of that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:05:02 +02:00
Jens Axboe	bf2de6f5a4	block: Initial support for data-less (or empty) barrier support This implements functionality to pass down or insert a barrier in a queue, without having data attached to it. The ->prepare_flush_fn() infrastructure from data barriers are reused to provide this functionality. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:03:56 +02:00
Jens Axboe	a0cd128542	block: add end_queued_request() and end_dequeued_request() helpers We can use this helper in the elevator core for BLKPREP_KILL, and it'll also be useful for the empty barrier patch. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:03:53 +02:00
Randy Dunlap	e6716b87d5	docbook: fix filesystems content Fix filesystems docbook warnings. Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value' Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-15 17:56:36 -07:00
Randy Dunlap	fd39c86b3d	docbook: fix usb content Fix USB docbook warnings. Warning(linux-2.6.23-git8//include/linux/usb/gadget.h:487): No description found for parameter 'g' Warning(linux-2.6.23-git8//include/linux/usb/gadget.h:506): No description found for parameter 'g' Warning(linux-2.6.23-git8//drivers/usb/core/hub.c:1416): No description found for parameter 'usb_dev' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-15 17:56:36 -07:00
Linus Torvalds	65a6ec0d72	Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm * 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (95 commits) [ARM] 4578/1: CM-x270: PCMCIA support [ARM] 4577/1: ITE 8152 PCI bridge support [ARM] 4576/1: CM-X270 machine support [ARM] pxa: Avoid pxa_gpio_mode() in gpio_direction_{in,out}put() [ARM] pxa: move pxa_set_mode() from pxa2xx_mainstone.c to mainstone.c [ARM] pxa: move pxa_set_mode() from pxa2xx_lubbock.c to lubbock.c [ARM] pxa: Make cpu_is_pxaXXX dependent on configuration symbols [ARM] pxa: PXA3xx base support [NET] smc91x: fix PXA DMA support code [SERIAL] Fix console initialisation ordering [ARM] pxa: tidy up arch/arm/mach-pxa/Makefile [ARM] Update arch/arm/Kconfig for drivers/Kconfig changes [ARM] 4600/1: fix kernel build failure with build-id-supporting binutils [ARM] 4599/1: Preserve ATAG list for use with kexec (2.6.23) [ARM] Rename consistent_sync() as dma_cache_maint() [ARM] 4572/1: ep93xx: add cirrus logic edb9307 support [ARM] 4596/1: S3C2412: Correct IRQs for SDI+CF and add decoding support [ARM] 4595/1: ns9xxx: define registers as void __iomem * instead of volatile u32 [ARM] 4594/1: ns9xxx: use the new gpio functions [ARM] 4593/1: ns9xxx: implement generic clockevents ...	2007-10-15 16:08:50 -07:00
Linus Torvalds	541010e4b8	Merge branch 'locks' of git://linux-nfs.org/~bfields/linux * 'locks' of git://linux-nfs.org/~bfields/linux: nfsd: remove IS_ISMNDLCK macro Rework /proc/locks via seq_files and seq_list helpers fs/locks.c: use list_for_each_entry() instead of list_for_each() NFS: clean up explicit check for mandatory locks AFS: clean up explicit check for mandatory locks 9PFS: clean up explicit check for mandatory locks GFS2: clean up explicit check for mandatory locks Cleanup macros for distinguishing mandatory locks Documentation: move locks.txt in filesystems/ locks: add warning about mandatory locking races Documentation: move mandatory locking documentation to filesystems/ locks: Fix potential OOPS in generic_setlease() Use list_first_entry in locks_wake_up_blocks locks: fix flock_lock_file() comment Memory shortage can result in inconsistent flocks state locks: kill redundant local variable locks: reverse order of posix_locks_conflict() arguments	2007-10-15 16:07:40 -07:00
Linus Torvalds	a52cefc80f	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) [IPV6]: Consolidate the ip6_pol_route_(input\|output) pair [TCP]: Make snd_cwnd_cnt 32-bit [TCP]: Update the /proc/net/tcp documentation [NETNS]: Don't panic on creating the namespace's loopback [NEIGH]: Ensure that pneigh_lookup is protected with RTNL [INET]: kmalloc+memset -> kzalloc in frag_alloc_queue [ISDN]: Fix compile with CONFIG_ISDN_X25 disabled. [IPV6]: Replace sk_buff ** with sk_buff * in input handlers [SELINUX]: Update for netfilter ->hook() arg changes. [INET]: Consolidate the xxx_put [INET]: Small cleanup for xxx_put after evictor consolidation [INET]: Consolidate the xxx_evictor [INET]: Consolidate the xxx_frag_destroy [INET]: Consolidate xxx_the secret_rebuild [INET]: Consolidate the xxx_frag_kill [INET]: Collect common frag sysctl variables together [INET]: Collect frag queues management objects together [INET]: Move common fields from frag_queues in one place. [TG3]: Fix performance regression on 5705. [ISDN]: Remove local copy of device name to make sure renames work. ...	2007-10-15 14:06:58 -07:00
Linus Torvalds	f2e1d89f9b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits) Input: use full RCU API Input: remove tsdev interface Input: add support for Blackfin BF54x Keypad controller Input: appletouch - another fix for idle reset logic HWMON: hdaps - switch to using input-polldev Input: add support for SEGA Dreamcast keyboard Input: omap-keyboard - don't pretend we support changing keymap Input: lifebook - fix X and Y axis range Input: usbtouchscreen - add support for GeneralTouch devices Input: fix open count handling in input interfaces Input: keyboard - add CapsShift lock Input: adbhid - produce all CapsLock key events Input: ALPS - add signature for ThinkPad R61 Input: jornada720_kbd - send MSC_SCAN events Input: add support for the HP Jornada 7xx (710/720/728) touchscreen Input: add support for HP Jornada 7xx onboard keyboard Input: add support for HP Jornada onboard keyboard (HP6XX) Input: ucb1400_ts - use schedule_timeout_uninterruptible Input: xpad - fix dependancy on LEDS class Input: auto-select INPUT for MAC_EMUMOUSEBTN option ... Resolved conflicts manually in drivers/hwmon/applesmc.c: converting from a class device to a device and converting to use input-polldev created a few apparently trivial clashes..	2007-10-15 13:41:39 -07:00
Ilpo Järvinen	f78a1b3892	[TCP]: Make snd_cwnd_cnt 32-bit Very little point in having 32-bit snd_cwnd if this is not 32-bit as well, as a number of snd_cwnd incrementation formulas assume that snd_cwnd_cnt can be at least as large as snd_cwnd. Whether 32-bit is useful was discussed when `e0ef57cc56` was made: http://marc.info/?l=linux-netdev&m=117218144409825&w=2 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:59:43 -07:00
Karsten Keil	faca94ffae	[ISDN]: Remove local copy of device name to make sure renames work. Signed-off-by: Karsten Keil <kkeil@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:37 -07:00
Herbert Xu	3db05fea51	[NETFILTER]: Replace sk_buff ** with sk_buff * With all the users of the double pointers removed, this patch mops up by finally replacing all occurrences of sk_buff ** in the netfilter API by sk_buff *. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:29 -07:00
Herbert Xu	37d4187922	[NETFILTER]: Do not copy skb in skb_make_writable Now that all callers of netfilter can guarantee that the skb is not shared, we no longer have to copy the skb in skb_make_writable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:27 -07:00
Herbert Xu	e0053ec07e	[SKBUFF]: Add skb_morph This patch creates a new function skb_morph that's just like skb_clone except that it lets user provide the spare skb that will be overwritten by the one that's to be cloned. This will be used by IP fragment reassembly so that we get back the same skb that went in last (rather than the head skb that we get now which requires us to carry around double pointers all over the place). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:24 -07:00
Brice Goglin	eabd7e35c0	Add skb_is_gso_v6 Add skb_is_gso_v6(). Signed-off-by: Brice Goglin <brice@myri.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-10-15 14:24:07 -04:00
Mike Rapoport	a8fc078955	[ARM] 4577/1: ITE 8152 PCI bridge support This patch provides driver for ITE 8152 PCI bridge. Signed-off-by: Mike Rapoport <mike@compulab.co.il> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-10-15 18:53:59 +01:00
Linus Torvalds	f4921aff5b	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits) NFSv4: Fix a typo in nfs_inode_reclaim_delegation NFS: Add a boot parameter to disable 64 bit inode numbers NFS: nfs_refresh_inode should clear cache_validity flags on success NFS: Fix a connectathon regression in NFSv3 and NFSv4 NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode SUNRPC: Don't call xprt_release in call refresh SUNRPC: Don't call xprt_release() if call_allocate fails SUNRPC: Fix buggy UDP transmission [23/37] Clean up duplicate includes in [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static SUNRPC: Use correct type in buffer length calculations SUNRPC: Fix default hostname created in rpc_create() nfs: add server port to rpc_pipe info file NFS: Get rid of some obsolete macros NFS: Simplify filehandle revalidation NFS: Ensure that nfs_link() returns a hashed dentry NFS: Be strict about dentry revalidation when doing exclusive create NFS: Don't zap the readdir caches upon error NFS: Remove the redundant nfs_reval_fsid() NFSv3: Always use directory post-op attributes in nfs3_proc_lookup ... Fix up trivial conflict due to sock_owned_by_user() cleanup manually in net/sunrpc/xprtsock.c	2007-10-15 10:47:35 -07:00
Linus Torvalds	419217cb1d	Merge branch 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: lockdep: annotate dir vs file i_mutex lockdep: per filesystem inode lock class lockdep: annotate kprobes irq fiddling lockdep: annotate rcu_read_{,un}lock{,_bh} lockdep: annotate journal_start() lockdep: s390: connect the sysexit hook lockdep: x86_64: connect the sysexit hook lockdep: i386: connect the sysexit hook lockdep: syscall exit check lockdep: fixup mutex annotations lockdep: fix mismatched lockdep_depth/curr_chain_hash lockdep: Avoid /proc/lockdep & lock_stat infinite output lockdep: maintainers	2007-10-15 10:40:41 -07:00
Linus Torvalds	b5869ce7f6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (140 commits) sched: sync wakeups preempt too sched: affine sync wakeups sched: guest CPU accounting: maintain guest state in KVM sched: guest CPU accounting: maintain stats in account_system_time() sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields sched: guest CPU accounting: add guest-CPU /proc/stat field sched: domain sysctl fixes: add terminator comment sched: domain sysctl fixes: do not crash on allocation failure sched: domain sysctl fixes: unregister the sysctl table before domains sched: domain sysctl fixes: use for_each_online_cpu() sched: domain sysctl fixes: use kcalloc() Make scheduler debug file operations const sched: enable wake-idle on CONFIG_SCHED_MC=y sched: reintroduce topology.h tunings sched: allow the immediate migration of cache-cold tasks sched: debug, improve migration statistics sched: debug: increase width of debug line sched: activate task_hot() only on fair-scheduled tasks sched: reintroduce cache-hot affinity sched: speed up context-switches a bit ...	2007-10-15 08:22:16 -07:00
Linus Torvalds	df3d80f5a5	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits) [SCSI] gdth: fix CONFIG_ISA build failure [SCSI] esp_scsi: remove __dev{init,exit} [SCSI] gdth: !use_sg cleanup and use of scsi accessors [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2 [SCSI] gdth: Setup proper per-command private data [SCSI] gdth: Remove gdth_ctr_tab[] [SCSI] gdth: switch to modern scsi host registration [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes [SCSI] gdth: clean up host private data [SCSI] gdth: Remove virt hosts [SCSI] gdth: Reorder scsi_host_template intitializers [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers [SCSI] gdth: Remove 2.4.x support, in-kernel changelog [SCSI] gdth: split out pci probing [SCSI] gdth: split out eisa probing [SCSI] gdth: split out isa probing gdth: Make one abuse of scsi_cmnd less obvious [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE ...	2007-10-15 08:19:33 -07:00
Linus Torvalds	37ca506adc	Merge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux * 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux: knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME knfsd: nfsv4 delegation recall should take reference on client knfsd: don't shutdown callbacks until nfsv4 client is freed knfsd: let nfsd manage timing out its own leases knfsd: Add source address to sunrpc svc errors knfsd: 64 bit ino support for NFS server svcgss: move init code into separate function knfsd: remove code duplication in nfsd4_setclientid() nfsd warning fix knfsd: fix callback rpc cred knfsd: move nfsv4 slab creation/destruction to module init/exit knfsd: spawn kernel thread to probe callback channel knfsd: nfs4 name->id mapping not correctly parsing negative downcall knfsd: demote some printk()s to dprintk()s knfsd: cleanup of nfsd4 cmp_* functions knfsd: delete code made redundant by map_new_errors nfsd: fix horrible indentation in nfsd_setattr nfsd: remove unused cache_for_each macro nfsd: tone down inaccurate dprintk	2007-10-15 08:16:53 -07:00
Jiri Kosina	57d292bd7e	HID: fix HIDIOCGRDESC memory access in hidraw Fix bogus copying of data into userspace when HIDIOCGRDESC is issued. HID-transport layer makes sure that dev->hid->rdesc is not larger than HID_MAX_DESCRIPTOR_SIZE. Noticed-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-15 08:12:00 -07:00
Laurent Vivier	94886b84b1	sched: guest CPU accounting: maintain stats in account_system_time() modify account_system_time() to add cputime to cpustat->guest if we are running a VCPU. We add this cputime to cpustat->user instead of cpustat->system because this part of KVM code is in fact user code although it is executed in the kernel. We duplicate VCPU time between guest and user to allow an unmodified "top(1)" to display correct value. A modified "top(1)" is able to display good cpu user time and cpu guest time by subtracting cpu guest time from cpu user time. Update "gtime" in task_struct accordingly. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:19 +02:00
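The double accounting described in the commit above can be checked with a short model. This is a Python sketch of the bookkeeping only, not the kernel's `account_system_time()`; the `CpuStat` class and the `running_vcpu` flag are hypothetical stand-ins for `cpustat` and the kernel's own VCPU check.

```python
from dataclasses import dataclass


@dataclass
class CpuStat:
    user: int = 0
    system: int = 0
    guest: int = 0


def account_system_time(stat: CpuStat, cputime: int, running_vcpu: bool) -> None:
    """Charge kernel-mode time; VCPU time goes to both guest and user."""
    if running_vcpu:
        stat.user += cputime   # keeps an unmodified top(1) correct
        stat.guest += cputime  # lets a modified top(1) separate guest time
    else:
        stat.system += cputime


def user_time_excluding_guest(stat: CpuStat) -> int:
    """What a modified top(1) would show as plain user time."""
    return stat.user - stat.guest
```

Starting from 50 units of ordinary user time, charging 30 units while running a VCPU and 20 units otherwise leaves `user` at 80, `guest` at 30 and `system` at 20; subtracting guest from user recovers the original 50.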
Laurent Vivier	9ac52315d4	sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields like for cpustat, introduce the "gtime" (guest time of the task) and "cgtime" (guest time of the task children) fields for the tasks. Modify signal_struct and task_struct. Modify /proc/<pid>/stat to display these new fields. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:19 +02:00
Laurent Vivier	5e84cfde51	sched: guest CPU accounting: add guest-CPU /proc/stat field as recent CPUs introduce a third running state, after "user" and "system", we need a new field, "guest", in cpustat to store the time used by the CPU to run virtual CPU. Modify /proc/stat to display this new field. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:19 +02:00
Ingo Molnar	7a6c6bcee0	sched: enable wake-idle on CONFIG_SCHED_MC=y most multicore CPUs today have shared L2 caches, so tune things so that the spreading amongst cores is more aggressive. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:19 +02:00
Ingo Molnar	95dbb421d1	sched: reintroduce topology.h tunings reintroduce the 2.6.22 topology.h tunings again - they result in slightly better balancing. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:19 +02:00
Ingo Molnar	cc367732ff	sched: debug, improve migration statistics add new migration statistics when SCHED_DEBUG and SCHEDSTATS is enabled. Available in /proc/<PID>/sched. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:18 +02:00
Ingo Molnar	da84d96176	sched: reintroduce cache-hot affinity reintroduce a simplified version of cache-hot/cold scheduling affinity. This improves performance with certain SMP workloads, such as sysbench. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:18 +02:00
Mike Galbraith	95938a35c5	sched: prevent wakeup over-scheduling Prevent wakeup over-scheduling. Once a task has been preempted by a task of the same or lower priority, it becomes ineligible for repeated preemption by same until it has been ticked, or slept. Instead, the task is marked for preemption at the next tick. Tasks of higher priority still preempt immediately. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:14 +02:00
Dhaval Giani	5cb350baf5	sched: group scheduling, sysfs tunables Add tunables in sysfs to modify a user's cpu share. A directory is created in sysfs for each new user in the system. /sys/kernel/uids/<uid>/cpu_share Reading this file returns the cpu shares granted for the user. Writing into this file modifies the cpu share for the user. Only an administrator is allowed to modify a user's cpu share. Ex: # cd /sys/kernel/uids/ # cat 512/cpu_share 1024 # echo 2048 > 512/cpu_share # cat 512/cpu_share 2048 # Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:14 +02:00
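The shell session in the commit above could also be scripted. The following is a minimal Python sketch that assumes the `/sys/kernel/uids/<uid>/cpu_share` layout quoted in the commit message; the helper names and the `base` parameter are hypothetical, and writing needs administrator rights on a kernel that exposes these files.

```python
from pathlib import Path

DEFAULT_BASE = "/sys/kernel/uids"  # directory named in the commit message


def read_cpu_share(uid: int, base: str = DEFAULT_BASE) -> int:
    """Return the cpu share currently granted to the given user."""
    return int((Path(base) / str(uid) / "cpu_share").read_text().strip())


def write_cpu_share(uid: int, share: int, base: str = DEFAULT_BASE) -> None:
    """Set the cpu share for the given user."""
    (Path(base) / str(uid) / "cpu_share").write_text(f"{share}\n")
```

Passing a different `base` makes it possible to exercise the helpers against an ordinary directory tree without touching sysfs.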
Ingo Molnar	4cf86d77f5	sched: cleanup: rename task_grp to task_group cleanup: rename task_grp to task_group. No need to save two characters and 'grp' is annoying to read. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:14 +02:00
Mike Galbraith	af92723262	sched: cleanup, remove the TASK_NONINTERACTIVE flag Here's another piece of low hanging obsolete fruit. Remove obsolete TASK_NONINTERACTIVE. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:13 +02:00
Ingo Molnar	5522d5d5f7	sched: mark scheduling classes as const mark scheduling classes as const. This speeds up the code a bit and shrinks it: text data bss dec hex filename 40027 4018 292 44337 ad31 sched.o.before 40190 3842 292 44324 ad24 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:12 +02:00
Peter Zijlstra	5f6d858ecc	sched: speed up and simplify vslice calculations speed up and simplify vslice calculations. [ From: Mike Galbraith <efault@gmx.de>: build fix ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:12 +02:00
Ingo Molnar	2d72376b3a	sched: clean up schedstats, cnt -> count rename all 'cnt' fields and variables to the less yucky 'count' name. yuckage noticed by Andrew Morton. no change in code, other than the /proc/sched_debug bkl_count string got a bit larger: text data bss dec hex filename 38236 3506 24 41766 a326 sched.o.before 38240 3506 24 41770 a32a sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:12 +02:00
Ingo Molnar	94359f05cb	sched: undo some of the recent changes undo some of the recent changes that are not needed after all, such as last_min_vruntime. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-10-15 17:00:11 +02:00
Peter Zijlstra	67e9fb2a39	sched: add vslice add vslice: the load-dependent "virtual slice" a task should run ideally, so that the observed latency stays within the sched_latency window. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:10 +02:00
Ingo Molnar	c18b8a7cbc	sched: remove unneeded tunables remove unneeded tunables. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:10 +02:00
Ingo Molnar	b8efb56172	sched debug: BKL usage statistics add per task and per rq BKL usage statistics. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:10 +02:00
Srivatsa Vaddagiri	24e377a832	sched: add fair-user scheduler Enable user-id based fair group scheduling. This is useful for anyone who wants to test the group scheduler w/o having to enable CONFIG_CGROUPS. A separate scheduling group (i.e struct task_grp) is automatically created for every new user added to the system. Upon uid change for a task, it is made to move to the corresponding scheduling group. A /proc tunable (/proc/root_user_share) is also provided to tune root user's quota of cpu bandwidth. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:09 +02:00
Srivatsa Vaddagiri	9b5b77512d	sched: clean up code under CONFIG_FAIR_GROUP_SCHED With the view of supporting user-id based fair scheduling (and not just container-based fair scheduling), this patch renames several functions and makes them independent of whether they are being used for container or user-id based fair scheduling. Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating less-sized array for tg->cfs_rq[] and tf->se[]). Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:09 +02:00
Srivatsa Vaddagiri	83b699ed20	sched: revert recent removal of set_curr_task() Revert removal of set_curr_task. Use put_prev_task/set_curr_task when changing groups/policies. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-10-15 17:00:08 +02:00
Dmitry Adamushko	f6b53205e1	sched: rework enqueue/dequeue_entity() to get rid of set_curr_task() rework enqueue/dequeue_entity() to get rid of sched_class::set_curr_task(). This simplifies sched_setscheduler(), rt_mutex_setprio() and sched_move_tasks(). text data bss dec hex filename 24330 2734 20 27084 69cc sched.o.before 24233 2730 20 26983 6967 sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:08 +02:00
Dmitry Adamushko	4530d7ab0f	sched: simplify sched_class::yield_task() the 'p' (task_struct) parameter in the sched_class :: yield_task() is redundant as the caller is always the 'current'. Get rid of it. text data bss dec hex filename 24341 2734 20 27095 69d7 sched.o.before 24330 2734 20 27084 69cc sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:08 +02:00
Dmitry Adamushko	30cfdcfc5f	sched: do not keep current in the tree and get rid of sched_entity::fair_key Get rid of 'sched_entity::fair_key'. As a side effect, 'current' is not kept within the tree for SCHED_NORMAL/BATCH tasks anymore. This simplifies some parts of code (e.g. entity_tick() and yield_task_fair()) and also somewhat optimizes them (e.g. a single update_curr() now vs. dequeue/enqueue() before in entity_tick()). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:07 +02:00
Ingo Molnar	bbdba7c0e1	sched: remove wait_runtime fields and features remove wait_runtime based fields and features, now that the CFS math has been changed over to the vruntime metric. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:06 +02:00
Ingo Molnar	e22f5bbf86	sched: remove wait_runtime limit remove the wait_runtime-limit fields and the code depending on it, now that the math has been changed over to rely on the vruntime metric. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:06 +02:00
Ingo Molnar	e9acbff648	sched: introduce se->vruntime introduce se->vruntime as a sum of weighted delta-exec's, and use that as the key into the tree. the idea to use absolute virtual time as the basic metric of scheduling has been first raised by William Lee Irwin, advanced by Tong Li and first prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset. also see: http://lkml.org/lkml/2007/9/2/76 for a simpler variant of this patch. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:04 +02:00
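The idea in the commit above, a per-entity virtual runtime accumulated from weighted execution deltas and used as the ordering key, can be sketched as below. This Python model is an illustration under stated assumptions, not the patch itself: the inverse-weight scaling, the `NICE_0_LOAD` value of 1024, and the use of a binary heap in place of the kernel's tree are all assumptions, and the names are hypothetical.

```python
import heapq
from typing import List, Tuple

NICE_0_LOAD = 1024  # assumed reference weight for a nice-0 task


def charge(vruntime: int, delta_exec: int, weight: int) -> int:
    """Add one weighted execution delta; heavier tasks accrue vruntime more slowly."""
    return vruntime + delta_exec * NICE_0_LOAD // weight


class RunQueue:
    """Orders entities by vruntime; a heap stands in for the kernel's tree."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []

    def enqueue(self, name: str, vruntime: int) -> None:
        heapq.heappush(self._heap, (vruntime, name))

    def pick_next(self) -> Tuple[int, str]:
        """Return the entity with the smallest vruntime."""
        return heapq.heappop(self._heap)
```

Under these assumptions, two tasks that each run for 10 time units end up with different keys: a weight-1024 task reaches vruntime 10, a weight-2048 task only 5, so the heavier task is picked next.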
Ingo Molnar	8ebc91d936	sched: remove stat_gran remove the stat_gran code - it was disabled by default and it causes unnecessary overhead. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:03 +02:00
Ingo Molnar	2bd8e6d422	sched: use constants if !CONFIG_SCHED_DEBUG use constants if !CONFIG_SCHED_DEBUG. this speeds up the code and reduces code-size: text data bss dec hex filename 27464 3014 16 30494 771e sched.o.before 26929 3010 20 29959 7507 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Ingo Molnar	eba1ed4b7e	sched: debug: track maximum 'slice' track the maximum amount of time a task has executed while the CPU load was at least 2x. (i.e. at least two nice-0 tasks were runnable) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-15 17:00:02 +02:00
Thomas Gleixner	1595f452f3	clockevents: introduce force broadcast notifier The 64bit SMP bootup is slightly different to the 32bit one. It enables the boot CPU local APIC timer before all CPUs are brought up. Some AMD C1E systems have the C1E feature flag only set in the secondary CPU. Due to the early enable of the boot CPU local APIC timer the APIC timer is registered as a fully functional device. When we detect the wreckage during the bringup of the secondary CPU, we need to force the boot CPU into broadcast mode. Add a new notifier reason and implement the force broadcast in the clock events layer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-14 22:57:45 +02:00
Linus Torvalds	4fa435018d	Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6 * 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (53 commits) hwmon: (vt8231) fix sparse warning hwmon: (sis5595) fix sparse warning hwmon: (w83627hf) don't assume bank 0 hwmon: (w83627hf) Fix setting fan min right after driver load hwmon: (w83627hf) De-macro sysfs callback functions hwmon: Add new combined driver for FSC chips hwmon: (ibmpex) Release IPMI user if hwmon registration fails hwmon: (dme1737) Add sch311x support hwmon: (dme1737) group functions logically hwmon: (dme1737) cleanups hwmon: IBM power meter driver hwmon: (coretemp) Add support for Celeron 4xx hwmon: (lm87) Disable VID when it should be hwmon: (w83781d) Add individual alarm and beep files hwmon: VRM is not read from registers MAINTAINERS: update hwmon subsystem git trees hwmon: Fix the code examples in documentation hwmon: update sysfs interface document - error handling hwmon: (thmc50) Fix a debug message hwmon: (thmc50) Don't create temp3 if not enabled ...	2007-10-14 12:50:19 -07:00
Al Viro	f53f4137ba	fix endianness bug in inet_lro all uses of and almost all assignments to lro_desc->tcp_ack assume that it's net-endian; one converts net-endian to host-endian and sticks it in lro_desc->tcp_ack. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-14 12:41:52 -07:00
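The class of bug described in the entry above, one assignment storing a host-endian value in a field that every other user treats as net-endian, can be modelled in a few lines. This is a hedged illustration only: the helper names are hypothetical, a little-endian CPU is assumed, and none of this is the actual inet_lro code.

```python
# Model of a 32-bit big-endian (network order) field as a little-endian CPU
# sees it. Helper names are hypothetical and chosen for illustration.


def wire_to_net_field(raw: bytes) -> int:
    """Raw 4 wire bytes loaded into a register without conversion."""
    return int.from_bytes(raw, "little")


def net_to_host(field: int) -> int:
    """Byte-swap a net-endian field into the CPU's native order."""
    return int.from_bytes(field.to_bytes(4, "little"), "big")


if __name__ == "__main__":
    ack_on_wire = bytes([0x01, 0x02, 0x03, 0x04])  # ACK number 0x01020304

    stored_net = wire_to_net_field(ack_on_wire)   # consistent: net-endian
    stored_host = net_to_host(stored_net)         # the odd one out: host-endian

    incoming = wire_to_net_field(ack_on_wire)     # next packet, same ACK

    print(hex(stored_net), stored_net == incoming)    # matches
    print(hex(stored_host), stored_host == incoming)  # spurious mismatch
```

Endianness annotations, as in the follow-up commit, let static checking flag the mixed assignment instead of leaving it to be found at run time.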
Al Viro	9df7c98a0f	inet_lro: trivial endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-14 12:41:52 -07:00
Al Viro	5ba253313d	more low-hanging fruits - kernel, fs, lib signedness Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-14 12:41:52 -07:00
Al Viro	64b33619a3	long vs. unsigned long - low-hanging fruits in drivers deal with signedness of the stuff passed to set_bit() et.al. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-14 12:41:51 -07:00
Jiri Kosina	d057fd4cb8	Merge branch 'hidraw' into for-linus	2007-10-14 14:47:56 +02:00
Jiri Kosina	86166b7bcd	HID: add hidraw interface hidraw is an interface that is going to obsolete hiddev one day. Many userland applications are using libusb instead of using kernel-provided hiddev interface. This is caused by various reasons - the HID parser in kernel doesn't handle all the HID hardware on the planet properly, some devices might require its own specific quirks/drivers, etc. hiddev interface tries to do its best to parse all the received reports properly, and presents only parsed usages into userspace. This is however often not enough, and that's the reason why many userland applications just don't use hiddev at all, and rather use libusb to read raw USB events and process them on their own. Another drawback of hiddev is that it is USB-specific. hidraw interface provides userspace readers with really raw HID reports, no matter what the low-level transport layer is (USB/BT), and gives the userland applications all the freedom to process the HID reports in a way they wish to. Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2007-10-14 14:47:26 +02:00
Khelben Blackstaff	e2bca0749c	Input: add KEY_LOGOFF HUT 1.12 defines Logoff usage 0x19c in Consumer page. There are keyboards out there emitting this usage code (for example Microsoft Wireless Laser Keyboard 6000). Add this key so that HID code could map usages to it. Signed-off-by: Khelben Blackstaff <eye.of.the.8eholder@gmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2007-10-14 13:40:02 +02:00
Tomoya Adachi	08f06177f4	USBHID: report descriptor fix for MacBook JIS keyboard This patch fixes the problem that the Japanese MacBook doesn't recognize some keys like '\'(yen, or backslash), '\|'(pipe), and '_'(underscore). This happens because the MacBook JIS keyboard (jp106) sends a wrong report descriptor. It says "logical maximum = 0x65", so Keyboard.0089 is mapped to Key.Unknown, while it should be accepted as Key.Yen. Signed-off-by: Tomoya Adachi <adachi@il.is.s.u-tokyo.ac.jp> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2007-10-14 13:40:01 +02:00
Stelian Pop	0ce91cf9ce	HID: enable hiddev for the SantaRosa MacBookPro IR receiver The infrared remote receiver found in the SantaRosa MacBookPro laptops (MacBookPro3,1) needs to be forced to expose a HIDDEV interface (instead of HIDINPUT) so that lirc can access it using the 'macmini' driver. The patch below adds the required quirk for forcing the HIDDEV interface to be activated (HID_QUIRK_HIDDEV) and introduces a new quirk which forces the HIDINPUT interface to be ignored (HID_QUIRK_IGNORE_HIDINPUT). Note that Apple calls this receiver 'IRController4' (info taken from Apple's driver Info.plist). Older Mac{Book,Mini,Pro}s seem to all use the 'IRController1' device (USB id 05ac:8240) which doesn't need those quirks. Signed-off-by: Stelian Pop <stelian@popies.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2007-10-14 13:40:01 +02:00
Jiri Kosina	4dc21a8005	Input: add KEY_SPELLCHECK HUT 1.12 defines Spell Check usage 0x1ab in Consumer page. There are keyboards out there emitting this usage code (for example Microsoft Natural Ergonomic Keyboard 4000). Add this key so that HID code could map usages to it. Acked-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2007-10-14 13:40:00 +02:00
Peter Zijlstra	14358e6dda	lockdep: annotate dir vs file i_mutex On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote: > The circular lock seems to be this: > > #1: > > sys_mmap2: down_write(&mm->mmap_sem); > nfs_revalidate_mapping: mutex_lock(&inode->i_mutex); > > > #0: > > vfs_readdir: mutex_lock(&inode->i_mutex); > - during the readdir (filldir64), we take a user fault (missing page?) > and call do_page_fault - > do_page_fault: down_read(&mm->mmap_sem); > > > So it does indeed look like a circular locking. Now the question is, "is > this a bug?". Looking like the inode of #1 must be a file or something > else that you can mmap and the inode of #0 seems it must be a directory. > I would say "no". > > Now if you can readdir on a file or mmap a directory, then this could be > an issue. > > Otherwise, I'd love to see someone teach lockdep about this issue! ;-) Make a distinction between file and dir usage of i_mutex. The inode should be complete and unused at unlock_new_inode(), re-init i_mutex depending on its type. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-10-14 01:38:33 +02:00
Peter Zijlstra	d475fd428c	lockdep: per filesystem inode lock class Give each filesystem its own inode lock class. The various filesystems have different locking order wrt the inode locks; esp. the pseudo filesystems differ from the rest. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-10-15 14:51:31 +02:00
David Brownell	6662cbb989	i2c: Rename the PEC functionality bit Rename I2C_FUNC_SMBUS_HWPEC_CALC as I2C_FUNC_SMBUS_PEC, and list that functionality as always available through the software implementation. Update documentation accordingly (and list similar requirements). The way it's currently packaged doesn't present the capability in a useful way. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-10-13 23:56:33 +02:00
David Brownell	08fb68bb4b	i2c: Move i2c-dev interfaces to i2c-dev.h Move the i2c-dev support into <linux/i2c-dev.h> where it should always have lived. Now <linux/i2c.h> no longer holds stuff related to the optional userspace /dev/i2c-X interface. Improve the descriptions for these ioctl requests. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-10-13 23:56:32 +02:00
David Brownell	53be795934	i2c: Remove i2c_algorithm.algo_control() This removes: - An effectively unused hook: i2c_algorithm.algo_control. - The i2c_control() call, used only by i2c-dev to call that unused hook or set two barely supported adapter params. (That param setting moves into i2c-dev.c ... still iffy due to lack of locking, but no other changes.) As shown by diffstat, this is a net code shrink. It also reduces the complexity of the I2C adapter and /dev interfaces. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-10-13 23:56:32 +02:00
David Brownell	a64ec07d3d	i2c: Document struct i2c_msg Clarify use of the I2C_M_* flags by highlighting the fact that most of them depend on I2C_FUNC_PROTOCOL_MANGLING. Also provide kerneldoc for i2c_smbus_read_block_data() and also for "struct i2c_msg". Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-10-13 23:56:31 +02:00
Adrian Bunk	83eaaed0d0	i2c-core: Make some code static After the i2c-isa removal some code can become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-10-13 23:56:30 +02:00
David Brownell	3bbb835d4c	i2c: New-style devices can support driver model wakeup flags We need to be able to flag I2C devices, such as RTCs, which can issue wake events (usually through IRQ lines). This adds an i2c_board_info.flags bit, and uses it to initialize the i2c device node. (And shrinks a few lines that were overly long.) Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-10-13 23:56:29 +02:00
Jean Delvare	cee37ae407	i2c: Kill struct i2c_device_id I2C devices do not have any form of ID as PCI or USB devices have. No driver uses "MODULE_DEVICE_TABLE(i2c, ...)" because it doesn't make sense. So we can get rid of struct i2c_device_id and the associated support code. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Greg KH <greg@kroah.com>	2007-10-13 23:56:29 +02:00
Linus Torvalds	bcd11eaa22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (27 commits) alim15x3: remove redundant m5229_revision check sc1200: fix ->dma_base equal zero handling cs5520: fix ->dma_base equal zero handling sgiioc4: add missing ->dma_base check cs5535: add missing ->dma_base check ide: remove CONFIG_IDEDMA_IVB config option ide: change master/slave IDENTIFY order ide: move ide_config_drive_speed() calls to upper layers (take 2) pdc202xx_new: check ide_config_drive_speed() return value cs5535: check ide_config_drive_speed() return value amd74xx/via82cxxx: check ide_config_drive_speed() return value au1xxx: fix au1xxx_set_pio_mode() icside: use ide_tune_dma() ide-pmac: fix PIO setup and enable autotune ide-pmac: use ide_tune_dma() (take 2) ide-pmac: remove pmac_ide_do_setfeature() (take 2) ide-pmac: remove nIEN clearing from pmac_ide_do_setfeature() ide-pmac: use __ide_wait_stat() ide-pmac: remove extra good status wait from pmac_ide_do_setfeature() ide: add __ide_wait_stat() helper ...	2007-10-13 10:13:27 -07:00
Linus Torvalds	c8c55bcb43	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (91 commits) [MTD] [NAND] Blackfin on-chip NAND Flash Controller driver [MTD] [NOR] fix ctrl-alt-del can't reboot for intel flash bug [MTD] [NAND] Fix compiler warning in Alauda driver [JFFS2] Remove stray debugging printk [JFFS2] Handle dirents on the flash with embedded zero bytes in names. [JFFS2] Check for creation of dirents with embedded zero bytes in name. [JFFS2] Don't count all 'very dirty' blocks except in debug mode [JFFS2] Check whether garbage-collection actually obsoleted its victim. [JFFS2] Relax threshold for triggering GC due to dirty blocks. [MTD] [OneNAND] Fix typo related with recent commit [JFFS2] Trigger garbage collection when very_dirty_list size becomes excessive [MTD] [NAND] Avoid deadlock in erase callback; release chip lock first. [MTD] [NAND] Resume method for CAFÉ NAND controller [MTD] [NAND] Fix PCI ident table for CAFÉ NAND controller. [MTD] [NAND] s3c2410: fix arch moves [MTD] [OneNAND] fix numerous races [MTD] map driver for NOR flash on the Intel Vermilion Range chipset [JFFS2] Fix unpoint length [MTD] fix CFI point method for discontiguous maps [MTD] MAPS: Merge Lubbock and Mainstone drivers into common PXA2xx driver ...	2007-10-13 10:12:15 -07:00
Linus Torvalds	3749c66c67	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (106 commits) KVM: Replace enum by #define KVM: Skip pio instruction when it is emulated, not executed KVM: x86 emulator: popf KVM: x86 emulator: fix src, dst value initialization KVM: x86 emulator: jmp abs KVM: x86 emulator: lea KVM: X86 emulator: jump conditional short KVM: x86 emulator: imlpement jump conditional relative KVM: x86 emulator: sort opcodes into ascending order KVM: Improve emulation failure reporting KVM: x86 emulator: pushf KVM: x86 emulator: call near KVM: x86 emulator: push imm8 KVM: VMX: Fix exit qualification width on i386 KVM: Move main vcpu loop into subarch independent code KVM: VMX: Move vm entry failure handling to the exit handler KVM: MMU: Don't do GFP_NOWAIT allocations KVM: Rename kvm_arch_ops to kvm_x86_ops KVM: Simplify memory allocation KVM: Hoist SVM's get_cs_db_l_bits into core code. ...	2007-10-13 10:02:11 -07:00
Randy Dunlap	c4ea43c552	net core: fix kernel-doc for new function parameters Fix networking code kernel-doc for newly added parameters. Warning(linux-2.6.23-git2//net/core/sock.c:879): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:570): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:594): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:617): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:641): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:667): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:722): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:959): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:1195): No description found for parameter 'dev' Warning(linux-2.6.23-git2//net/core/dev.c:2105): No description found for parameter 'n' Warning(linux-2.6.23-git2//net/core/dev.c:3272): No description found for parameter 'net' Warning(linux-2.6.23-git2//net/core/dev.c:3445): No description found for parameter 'net' Warning(linux-2.6.23-git2//include/linux/netdevice.h:1301): No description found for parameter 'cpu' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-13 09:52:26 -07:00
Linus Torvalds	dcf397f037	Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (124 commits) sh: allow building for both r2d boards in same binary. sh: fix r2d board detection sh: Discard .exit.text/.exit.data at runtime. sh: Fix up some section alignments in linker script. sh: Fix SH-4 DMAC CHCR masking. sh: Rip out left-over nommu cond syscall cruft. sh: Make kgdb i-cache flushing less inept. sh: kgdb section mismatches and tidying. sh: cleanup struct irqaction initializers. sh: early_printk tidying. video: pvr2fb: Add TV (RGB) support to Dreamcast PVR driver. sh: Conditionalize gUSA support. sh: Follow gUSA preempt changes in __switch_to(). sh: Tidy up gUSA preempt handling. sh: __copy_user() optimizations for small copies. sh: clkfwk: Support multi-level clock propagation. sh: Fix URAM start address on SH7785. sh: Use boot_cpu_data for CPU probe. sh: Support extended mode TLB on SH-X3. sh: Bump MAX_ACTIVE_REGIONS for SH7785. ...	2007-10-13 09:49:04 -07:00
Bartlomiej Zolnierkiewicz	88b2b32bab	ide: move ide_config_drive_speed() calls to upper layers (take 2) * Convert {ide_hwif_t,ide_pci_device_t}->host_flag to be u16. * Add IDE_HFLAG_POST_SET_MODE host flag to indicate the need to program the host for the transfer mode after programming the device. Set it in au1xxx-ide, amd74xx, cs5530, cs5535, pdc202xx_new, sc1200, pmac and via82cxxx host drivers. * Add IDE_HFLAG_NO_SET_MODE host flag to indicate the need to completely skip programming of host/device for the transfer mode ("smart" hosts). Set it in it821x host driver and check it in ide_tune_dma(). * Add ide_set_pio_mode()/ide_set_dma_mode() helpers and convert all direct ->set_pio_mode/->speedproc users to use these helpers. * Move ide_config_drive_speed() calls from ->set_pio_mode/->speedproc methods to callers. * Rename ->speedproc method to ->set_dma_mode, make it void and update all implementations accordingly. * Update ide_set_xfer_rate() comments. * Unexport ide_config_drive_speed(). 
v2: * Fix issues noticed by Sergei: - export ide_set_dma_mode() instead of moving ->set_pio_mode abuse wrt to setting DMA modes from sc1200_set_pio_mode() to do_special() - check IDE_HFLAG_NO_SET_MODE in ide_tune_dma() - check for (hwif->set_pio_mode) == NULL in ide_set_pio_mode() - check for (hwif->set_dma_mode) == NULL in ide_set_dma_mode() - return -1 from ide_set_{pio,dma}_mode() if ->set_{pio,dma}_mode == NULL - don't set ->set_{pio,dma}_mode on it821x in "smart" mode - fix build problem in pmac.c - minor fixes in au1xxx-ide.c/cs5530.c/siimage.c - improve patch description Changes in behavior caused by this patch: - HDIO_SET_PIO_MODE ioctl would now return -ENOSYS for attempts to change PIO mode if it821x controller is in "smart" mode - removal of two debugging printk-s (from cs5530.c and sc1200.c) - transfer modes 0x00-0x07 passed from user space may be programmed twice on the device (not really an issue since 0x00 is not supported correctly by any host driver ATM, 0x01 is not supported at all and 0x02-0x07 are invalid) Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-13 17:47:51 +02:00
Bartlomiej Zolnierkiewicz	75d7d963e3	icside: use ide_tune_dma() * Add "good DMA drives" hack for icside to ide-dma.c::ide_find_dma_mode() (in the long-term it should be either removed or generalized for all hosts). * Use ide_tune_dma() in icside.c::icside_dma_check(). This results in the following changes in behavior: - pre-EIDE SWDMA modes are now also respected - drive->autodma is checked instead of hwif->autodma (doesn't really matter as icside sets both to "1") * Make ide-dma.c::__ide_dma_good_drive() static and drop "__" prefix. Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-13 17:47:50 +02:00
Bartlomiej Zolnierkiewicz	aedea5910c	ide-pmac: remove pmac_ide_do_setfeature() (take 2) Use ide_config_drive_speed() instead of pmac_ide_do_setfeature() and remove the latter, also ide-iops.c::__ide_wait_stat() could be static again. Since for the IDE PMAC host driver IDE_CONTROL_REG is always true, the device's ->quirk_list is always zero and ->ide_dma_host_{on,off} are nops, the only changes in behavior are: * if PIO mode is set then ->dma_off_quietly is called to disable DMA * if setting transfer mode fails ide_dump_status() is called to dump status v2: * IDE PMAC controllers allow separate PIO and DMA timings and PPC userland depends on this fact, and calls "hdparm -p" without calling "hdparm -d". Therefore to compensate for DMA being disabled by ide_config_drive_speed() for PIO modes: - add IDE_HFLAG_SET_PIO_MODE_KEEP_DMA flag and set it in PMAC host driver - add handling of the new flag to ide-io.c::do_special() Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-13 17:47:50 +02:00
Bartlomiej Zolnierkiewicz	ddf151026a	ide-pmac: use __ide_wait_stat() * Use __ide_wait_stat() instead of wait_for_ready() in pmac_ide_do_setfeature(). While at it do following changes to match __ide_wait_stat() call in ide_config_drive_speed(): * Wait WAIT_CMD time (20 sec) instead of 2 sec for device to clear BUSY_STAT. * Check DRQ_STAT bit (shouldn't be set for good device status). Also remove no longer needed wait_for_ready() from ide-iops.c. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-13 17:47:49 +02:00
Bartlomiej Zolnierkiewicz	74af21cf4d	ide: add __ide_wait_stat() helper * Split off checking of the status register from ide_wait_stat() to __ide_wait_stat() helper. * Use the new helper in ide_config_drive_speed(). The only change in the functionality is that the function now fails if after 20 sec (WAIT_CMD) device is still busy (BUSY_STAT bit is set) while previously instead of failing the function continued with checking for the correct device status (which would give the device additional 10 usec to clear BUSY_STAT bit). * Remove stale comment for ide_config_drive_speed(). * Remove duplicate comment for ide_wait_stat() from <linux/ide.h>. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-13 17:47:49 +02:00
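The behaviour change described in the entry above (fail when the device is still busy once the wait budget is spent, otherwise check the status bits) can be sketched as a small polling model. This is an illustration under assumptions, not the kernel helper: `wait_stat`, `fake_device`, and the poll-count budget are hypothetical, and only the ATA status bit values are taken as given.

```python
# Polling model of a "wait for not-busy, then check status" helper.
# Function names and the poll-count budget are illustrative assumptions.

BUSY_STAT = 0x80   # ATA status register: device busy
READY_STAT = 0x40  # ATA status register: device ready
DRQ_STAT = 0x08    # ATA status register: data request


def wait_stat(read_status, good, bad, max_polls):
    """Poll while BUSY is set; fail if the budget runs out first.

    Returns (ok, last_status). After BUSY clears, ok is True only if all
    'good' bits are set and no 'bad' bits are set.
    """
    stat = read_status()
    polls = 0
    while stat & BUSY_STAT:
        if polls >= max_polls:
            return False, stat  # still busy after the whole budget
        polls += 1
        stat = read_status()
    return (stat & (good | bad)) == good, stat


def fake_device(statuses):
    """Return a reader that yields each status once, then repeats the last."""
    it = iter(statuses)
    last = statuses[-1]

    def read_status():
        nonlocal last
        last = next(it, last)
        return last

    return read_status


if __name__ == "__main__":
    dev = fake_device([BUSY_STAT, BUSY_STAT, READY_STAT])
    print(wait_stat(dev, READY_STAT, BUSY_STAT | DRQ_STAT, max_polls=5))
```

A real driver would bound the wait by elapsed time rather than by a poll count; the count keeps the model deterministic.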
David Woodhouse	ebf8889bd1	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2007-10-13 14:58:23 +01:00
David Woodhouse	b160292cc2	Merge Linux 2.6.23	2007-10-13 14:43:54 +01:00
Kevin Hao	c4a9f88daf	[MTD] [NOR] fix ctrl-alt-del can't reboot for intel flash bug When we press ctrl-alt-del, kernel_restart_prepare will invoke cfi_intelext_reboot, which sets the flash to read array mode. But device_shutdown is invoked later and may put the current work queue to sleep, so another process may be scheduled to run and program the flash, leaving it in a mode other than FL_READY again. So we can't boot up if this flash is used for the bootloader. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-10-13 14:36:18 +01:00
Avi Kivity	8a45450d0a	KVM: Replace enum by #define Easier for existence test (#ifdef) in userspace. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:29 +02:00
Eddie Dong	96ad2cc613	KVM: in-kernel LAPIC save and restore support This patch adds a new vcpu-based IOCTL to save and restore the local apic registers for a single vcpu. The kernel only copies the apic page as a whole, extraction of registers is left to userspace side. On restore, the APIC timer is restarted from the initial count, this introduces a little delay, but works fine. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
He, Qing	6bf9e962d1	KVM: in-kernel IOAPIC save and restore support This patch adds support for in-kernel ioapic save and restore (to and from userspace). It uses the same get/set_irqchip ioctl as in-kernel PIC. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
He, Qing	6ceb9d791e	KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support This patch adds two new ioctls to dump and write kernel irqchips for save/restore and live migration. PIC s/r and l/m is implemented in this patch. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	b6958ce44a	KVM: Emulate hlt in the kernel By sleeping in the kernel when hlt is executed, we simplify the in-kernel guest interrupt path considerably. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	97222cc831	KVM: Emulate local APIC in kernel Because lightweight exits (exits which don't involve userspace) are many times faster than heavyweight exits, it makes sense to emulate high usage devices in the kernel. The local APIC is one such device, especially for Windows and for SMP, so we add an APIC model to kvm. It also allows in-kernel host-side drivers to inject interrupts without going through userspace. [compile fix on i386 from Jindrich Makovicka] Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	85f455f7dd	KVM: Add support for in-kernel PIC emulation Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:24 +02:00
Yang, Sheng	253abdee5e	KVM: Communicate cr8 changes to userspace This allows running 64-bit Windows. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:23 +02:00
Jeff Dike	519ef35341	KVM: add hypercall nr to kvm_run Add the hypercall number to kvm_run and initialize it. This changes the ABI, but as this particular ABI was unusable before this no users are affected. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:20 +02:00
Rusty Russell	9eb829ced8	KVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed header Creating one's own BITMAP macro seems suboptimal: if we use manual arithmetic in the one place exposed to userspace, we can use standard macros elsewhere. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:18 +02:00
Rusty Russell	dea8caee7b	KVM: Trivial: /dev/kvm interface is no longer experimental. KVM interface is no longer experimental. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:17 +02:00
Avi Kivity	24cbc7e9cb	KVM: Future-proof the exit information union ABI Note that as the size of struct kvm_run is not part of the ABI, we can add things at the end. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:17 +02:00
Dmitry Torokhov	b981d8b3f5	Merge master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/macintosh/adbhid.c	2007-10-12 21:27:47 -04:00
Linus Torvalds	ab9c232286	Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (119 commits) [libata] struct pci_dev related cleanups libata: use ata_exec_internal() for PMP register access libata: implement ATA_PFLAG_RESETTING libata: add @timeout to ata_exec_internal[_sg]() ahci: fix notification handling ahci: clean up PORT_IRQ_BAD_PMP enabling ahci: kill leftover from enabling NCQ over PMP libata: wrap schedule_timeout_uninterruptible() in loop libata: skip suppress reporting if ATA_EHI_QUIET libata: clear ehi description after initial host report pata_jmicron: match vendor and class code only libata: add ST9160821AS / 3.ALD to NCQ blacklist pata_acpi: ACPI driver support libata-core: Expose gtm methods for driver use libata: add HDT722516DLA380 to NCQ blacklist libata: blacklist NCQ on Seagate Barracuda ST380817AS [libata] Turn on ACPI by default libata_scsi: Fix ATAPI transfer lengths libata: correct handling of SRST reset sequences libata: Integrate ACPI-based PATA/SATA hotplug - version 5 ...	2007-10-12 16:16:41 -07:00
Linus Torvalds	6a84258e5f	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (37 commits) PCI: merge almost all of pci_32.h and pci_64.h together PCI: X86: Introduce and enable PCI domain support PCI: Add 'nodomains' boot option, and pci_domains_supported global PCI: modify PCI bridge control ISA flag for clarity PCI: use _CRS for PCI resource allocation PCI: avoid P2P prefetch window for expansion ROMs PCI: skip ISA ioresource alignment on some systems PCI: remove transparent bridge sizing pci: write file size to inode on proc bus file write pci: use size stored in proc_dir_entry for proc bus files pci: implement "pci=noaer" PCI: fix IDE legacy mode resources MSI: Use correct data offset for 32-bit MSI in read_msi_msg() PCI: Fix incorrect argument order to list_add_tail() in PCI dynamic ID code PCI: i386: Compaq EVO N800c needs PCI bus renumbering PCI: Remove no longer correct documentation regarding MSI vector assignment PCI: re-enable onboard sound on "MSI K8T Neo2-FIR" PCI: quirk_vt82c586_acpi: Omit reading PCI revision ID PCI: quirk amd_8131_mmrbc: Omit reading pci revision ID cpqphp: Use PCI_CLASS_REVISION instead of PCI_REVISION_ID for read ...	2007-10-12 15:50:23 -07:00
Linus Torvalds	efefc6eb38	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits) PM: merge device power-management source files sysfs: add copyrights kobject: update the copyrights kset: add some kerneldoc to help describe what these strange things are Driver core: rename ktype_edd and ktype_efivar Driver core: rename ktype_driver Driver core: rename ktype_device Driver core: rename ktype_class driver core: remove subsystem_init() sysfs: move sysfs file poll implementation to sysfs_open_dirent sysfs: implement sysfs_open_dirent sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir sysfs: make sysfs_root a regular directory dirent sysfs: open code sysfs_attach_dentry() sysfs: make s_elem an anonymous union sysfs: make bin attr open get active reference of parent too sysfs: kill unnecessary NULL pointer check in sysfs_release() sysfs: kill unnecessary sysfs_get() in open paths sysfs: reposition sysfs_dirent->s_mode. sysfs: kill sysfs_update_file() ...	2007-10-12 15:49:37 -07:00
Linus Torvalds	117494a1b6	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6: (142 commits) USB: fix race in autosuspend reschedule atmel_usba_udc: Keep track of the device status USB: Nikon D40X unusual_devs entry USB: serial core should respect driver requirements USB: documentation for USB power management USB: skip autosuspended devices during system resume USB: mutual exclusion for EHCI init and port resets USB: allow usbstorage to have LUNS greater than 2Tb USB: Adding support for SHARP WS011SH to ipaq.c USB: add atmel_usba_udc driver USB: ohci SSB bus glue USB: ehci build fixes on au1xxx, ppc-soc USB: add runtime frame_no quirk for big-endian OHCI USB: funsoft: Fix termios USB: visor: termios bits USB: unusual_devs entry for Nikon DSC D2Xs USB: re-remove <linux/usb_sl811.h> USB: move <linux/usb_gadget.h> to <linux/usb/gadget.h> USB: Export URB statistics for powertop USB: serial gadget: Disable endpoints on unload ...	2007-10-12 15:49:10 -07:00
Linus Torvalds	4d5709a7b7	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq * master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Don't take semaphore in cpufreq_quick_get() [CPUFREQ] Support different families in fid/did to frequency conversion [CPUFREQ] cpufreq_stats: misc cpuinit section annotations [CPUFREQ] implement !CONFIG_CPU_FREQ stub for cpufreq_unregister_notifier() [CPUFREQ] mark hotplug notifier callback as __cpuinit [CPUFREQ] Only check for transition latency on problematic governors (kconfig fix) [CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default [CPUFREQ] move policy's governor initialisation out of low-level drivers into cpufreq core [CPUFREQ] Longhaul - Add support for PM133 northbridge [CPUFREQ] x86: use num_online_nodes to get physical cpus numbers for	2007-10-12 15:42:01 -07:00
Linus Torvalds	57c5b9998e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86 * git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (40 commits) x86: HPET add another ICH7 PCI id x86: HPET force enable ICH5 suspend/resume fix x86: HPET force enable for ICH5 x86: HPET try to activate force detected hpet x86: HPET force enable o ICH7 and later x86: HPET restructure hpet code for hpet force enable clock events: allow replacement of broadcast timer i386/x8664: cleanup the shared hpet code i386: Remove the useless #ifdef in i8253.h ACPI: remove the now unused ifdef code jiffies: remove unused macros x86_64: cleanup apic.c after clock events switch x86_64: remove now unused code x86: unify timex.h variants x86: kill 8253pit.h x86: disable apic timer for AMD C1E enabled CPUs x86: Fix irq0 / local apic timer accounting x86_64: convert to clock events x86_64: Add (not yet used) clock event functions x86_64: prepare idle loop for dynamic ticks ...	2007-10-12 15:39:39 -07:00
Jeff Garzik	32a2eea795	PCI: Add 'nodomains' boot option, and pci_domains_supported global * Introduce pci_domains_supported global, hardcoded to zero if !CONFIG_PCI_DOMAINS. * Introduce 'nodomains' boot option, which clears pci_domains_supported on platforms that enable it by default (x86, x86-64, and others when they are converted to use this). Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 15:03:18 -07:00
Gary Hade	11949255d9	PCI: modify PCI bridge control ISA flag for clarity Modify PCI Bridge Control ISA flag for clarity This patch changes PCI_BRIDGE_CTL_NO_ISA to PCI_BRIDGE_CTL_ISA and modifies its clarifying comment and locations where used. The change reduces the chance of future confusion since it makes the set/unset meaning of the bit the same in both the bridge control register and bridge_ctl field of the pci_bus struct. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Acked-by: Linas Vepstas <linas@austin.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 15:03:18 -07:00
Alex Chiang	9f672153ba	PCI: Add missing PCI capability IDs These IDs are in pciutils, but haven't been added to the kernel yet. Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 15:03:16 -07:00
Thomas Backlund	b205f6b267	i386: add support for picopower irq router Add support for PicoPower PT86C523 IRQ router to be used with the in-kernel yenta driver for CardBus. With this patch cardbus works on e.g. Dell Latitude XPi P150CD. Initial patch for kernel 2.4 series by Sune Mølgaard http://molgaard.org/code/linux-2.4.31-picopower.patch Ported to 2.6.20 by Chmouel Boudjnah (http://www.chmouel.com) Testing and confirmation that it works by Austin Acton Cleaned up a little for inclusion in a 2.6.21-rc7 based kernel. Added some more cleanups according to CodingStyle, as noted by Randy Dunlap on LKML. [akpm@linux-foundation.org: build fixes] Signed-off-by: Thomas Backlund <tmb@mandriva.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 15:03:15 -07:00
Alan Stern	063a2da8f0	USB: serial core should respect driver requirements This patch (as997) fixes a bug in the USB serial core. The core needs to pay attention to drivers' requirements regarding the number and type of endpoints a device has. At the same time, the patch changes the NUM_DONT_CARE constant (which is stored in a single-byte field) from -1 to a safer, unsigned value. It also improves the kerneldoc for several fields in the usb_serial_driver structure. Finally, the patch replaces a list_for_each() with list_for_each_entry(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:34 -07:00
Alan Stern	271f9e68f3	USB: skip autosuspended devices during system resume System suspends and hibernation are supposed to be as transparent as possible. By this reasoning, if a USB device is already autosuspended before the system sleep begins then it should remain autosuspended after the system wakes up. This patch (as1001) adds a skip_sys_resume flag to the usb_device structure and uses it to avoid waking up devices which were suspended when a system sleep began. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:34 -07:00
David Brownell	27f5d75afa	USB: re-remove <linux/usb_sl811.h> Remove <linux/usb_sl811.h> ... somehow this was recreated when the Blackfin arch was merged, instead of using <linux/usb/sl811.h> which is the correct header. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:31 -07:00
David Brownell	9454a57ab5	USB: move <linux/usb_gadget.h> to <linux/usb/gadget.h> Move <linux/usb_gadget.h> to <linux/usb/gadget.h>, reducing some of the clutter in the main include directory. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:31 -07:00
Sarah Sharp	4d59d8a113	USB: Export URB statistics for powertop powertop currently tracks interrupts generated by uhci, ehci, and ohci, but it has no way of telling which USB device to blame USB bus activity on. This patch exports the number of URBs that are submitted for a given device. Cat the file 'urbnum' in /sys/bus/usb/devices/.../ Signed-off-by: Sarah Sharp <sarah.a.sharp@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:30 -07:00
Alan Stern	a691efa988	USB: remove USB_QUIRK_NO_AUTOSUSPEND This patch (as995) cleans up the remains of the former NO_AUTOSUSPEND quirk. Since autosuspend is disabled by default, we will let userspace worry about which devices can safely be suspended. Thus the lengthy series of quirk entries is no longer needed, and neither is the quirk ID. I suppose someone might eventually run across a hub that can't be suspended; let's ignore the possibility for now. The patch also cleans up the hasty way in which autosuspend gets disabled. Setting udev->autosuspend_delay to -1 wasn't quite right, because the value is always supposed to be a multiple of HZ. It's better to leave the delay value alone and set autosuspend_disabled, which is what the quirk routine used to do. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:28 -07:00
Alan Stern	6840d2555a	USB: flush outstanding URBs when suspending This patch (as989) makes usbcore flush all outstanding URBs for each device as the device is suspended. This will be true even when CONFIG_USB_SUSPEND is not enabled. In addition, an extra can_submit flag is added to the usb_device structure. That flag will be turned off whenever a suspend request has been received for the device, even if the device isn't actually suspended because CONFIG_USB_SUSPEND isn't set. It's no longer necessary to check for the device state being equal to USB_STATE_SUSPENDED during URB submission; that check can be replaced by a check of the can_submit flag. This also permits us to remove some questionable references to the deprecated power.power_state field. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:26 -07:00
Alan Stern	1431d2a44c	USB: get rid of urb->lock Now that urb->status isn't used, urb->lock doesn't protect anything. This patch (as980) removes it and replaces it with a private mutex in the one remaining place it was still used: usb_kill_urb. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:23 -07:00
Alan Stern	eb23105462	USB: add urb->unlinked field This patch (as970) adds a new urb->unlinked field, which is used to store the status of unlinked URBs since we can't use urb->status for that purpose any more. To help simplify the HCDs, usbcore will check urb->unlinked before calling the completion handler; if the value is set it will automatically override the status reported by the HCD. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: David Brownell <david-b@pacbell.net> CC: Olav Kongas <ok@artecdesign.ee> CC: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com> CC: Tony Olech <tony.olech@elandigitalsystems.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:19 -07:00
Inaky Perez-Gonzalez	da04b7a427	usb: introduce usb_device authorization bits This just modifies 'struct usb_device' to contain the 'authorized' bit. It also adds a 'wusb' bit. This is needed because nonauthorized (and thus non-authenticated) wusb devices will fail certain kind of simple requests (such as string descriptors). By knowing the device is WUSB, we just avoid them. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:04 -07:00
David Brownell	a4e3ef5597	USB: gadget: gadget_is_{dualspeed,otg} predicates and cleanup This adds two small inlines to the gadget stack, which will often evaluate to compile-time constants. That can help shrink object code and remove #ifdeffery. - gadget_is_dualspeed(), currently always a compile-time constant (depending on which controller is selected). - gadget_is_otg(), usually a compile time "false", but this is a runtime test if the platform enables OTG (since it's reasonable to populate boards with different USB sockets). It also updates two peripheral controller drivers to use these: - fsl_usb2_udc, mostly OTG-related bugfixes: non-OTG devices must follow the rules about drawing VBUS power, and OTG ones need to reject invalid SET_FEATURE requests. - omap_udc, just scrubbing a bit of #ifdeffery. And also gadgetfs, which lost some #ifdefs and moved to a more standard handling of DEBUG and VERBOSE_DEBUG. The main benefits come from patches which will follow. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:03 -07:00
Alan Stern	d617bc83ff	USB: cleanup for previous patches This patch (as951) cleans up a few loose ends from earlier patches. Redundant checks for non-NULL urb->dev are removed, as are checks of urb->dev->bus (which can never be NULL). Conversely, a check for non-NULL urb->ep is added to the unlink paths. A homegrown round-down-to-power-of-2 loop is simplified by using the ilog2 routine. The comparison in usb_urb_dir_in() is made more transparent. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:01 -07:00
Alan Stern	5e60a16139	USB: avoid using urb->pipe in usbcore This patch (as946) eliminates many of the uses of urb->pipe in usbcore. Unfortunately there will have to be a significant API change, affecting all USB drivers, before we can remove it entirely. This patch contents itself with changing only the interface to usb_buffer_map_sg() and friends: The pipe argument is replaced with a direction flag. That can be done easily because those routines get used in only one place. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:00 -07:00
Alan Stern	fea3409112	USB: add direction bit to urb->transfer_flags This patch (as945) adds a bit to urb->transfer_flags for recording the direction of the URB. The bit is set/cleared automatically in usb_submit_urb() so drivers don't have to worry about it (although as a result, it isn't valid until the URB has been submitted). Inline routines are added for easily checking an URB's direction. They replace calls to usb_pipein in the DMA-mapping parts of hcd.c. For non-control endpoints, the direction is determined directly from the endpoint descriptor. However control endpoints are bi-directional; for them the direction is determined from the bRequestType byte and the wLength value in the setup packet. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:00 -07:00
Alan Stern	bdd016ba64	USB: add ep->enable This patch (as944) adds an explicit "enabled" field to the usb_host_endpoint structure and uses it in place of the current mechanism. This is merely a time-space tradeoff; it makes checking whether URBs may be submitted to an endpoint simpler. The existing mechanism is efficient when converting urb->pipe to an endpoint pointer, but it's not so efficient when urb->ep is used instead. As a side effect, the procedure for enabling an endpoint is now a little more complicated. The ad-hoc inline code in usb.c and hub.c for enabling ep0 is now replaced with calls to usb_enable_endpoint, which is no longer static. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:00 -07:00
Alan Stern	5b653c79c0	USB: add urb->ep This patch (as943) prepares the way for eliminating urb->pipe by introducing an endpoint pointer into struct urb. For now urb->ep is set by usb_submit_urb() from the pipe value; eventually drivers will set it themselves and we will remove urb->pipe completely. The patch also adds new inline routines to retrieve an endpoint descriptor's number and transfer type, essentially as replacements for usb_pipeendpoint and usb_pipetype. usb_submit_urb(), usb_hcd_submit_urb(), and usb_hcd_unlink_urb() are converted to use the new field and new routines. Other parts of usbcore will be converted in later patches. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:55:00 -07:00
David Brownell	efc9052e01	USB: usb_gadget.h whitespace fixes This just fixes some whitespace bugs in <linux/usb_gadget.h>, mostly extraneous spaces where a single tab suffices. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:54:59 -07:00
Tejun Heo	6d66f5cd26	sysfs: add copyrights Sysfs has gone through considerable amount of reimplementation. Add copyrights. Any objections? :-) Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:12 -07:00
Greg Kroah-Hartman	f0e7e1bd77	kobject: update the copyrights I've been hacking on these files for a while now, might as well make it official... Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:12 -07:00
Greg Kroah-Hartman	6adf7554b9	kset: add some kerneldoc to help describe what these strange things are Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:12 -07:00
Greg Kroah-Hartman	e4bc16621d	driver core: remove subsystem_init() There is only one user of it, and it is only a wrapper for kset_init(). Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:11 -07:00
Tejun Heo	a4e8b91254	sysfs: move sysfs file poll implementation to sysfs_open_dirent Sysfs file poll implementation is scattered over sysfs and kobject. Event numbering is done in sysfs_dirent but wait itself is done on kobject. This not only unnecessarily bloats both kobject and sysfs_dirent but is also buggy - if a sysfs_dirent is removed while there still are pollers, the association between the kobject and sysfs_dirent breaks and kobject may be freed with the pollers still sleeping on it. This patch moves whole poll implementation into sysfs_open_dirent. Each time a sysfs_open_dirent is created, event number restarts from 1 and pollers sleep on sysfs_open_dirent. As event sequence number is meaningless without any open file and pollers should have open file and thus sysfs_open_dirent, this ephemeral event counting works and is a saner implementation. This patch fixes the dangling sleepers bug and reduces the sizes of kobject and sysfs_dirent by one pointer. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:11 -07:00
Tejun Heo	5a7ad7f044	sysfs: kill sysfs_update_file() sysfs_update_file() depends on inode->i_mtime but sysfs inodes are now reclaimable making the reported modification time unreliable. There's only one user (pci hotplug) of this notification mechanism and it reportedly isn't utilized from userland. Kill sysfs_update_file(). Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:09 -07:00
Tejun Heo	59f6901568	sysfs: clean up header files sysfs is about to go through major overhaul making this a pretty good opportunity to clean up (out-of-tree changes and pending patches will need regeneration anyway). Clean up headers. * Kill space between * and symbolname. * Move SYSFS_* type constants and flags into fs/sysfs/sysfs.h. They're internal to sysfs. * Reformat function prototypes and add argument symbol names. * Make dummy function definition order match that of function prototypes. * Add some comments. * Reorganize fs/sysfs/sysfs.h according to which file the declared variable or feature lives in. This patch does not introduce any behavior change. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:09 -07:00
Kay Sievers	dc8c85871c	PTY: add kernel parameter to overwrite legacy pty count Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:09 -07:00
Jean Delvare	1359555eb7	Driver core: Make platform_device.id an int While platform_device.id is a u32, platform_device_add() handles "-1" as a special id value. This has potential for confusion and bugs. Making it an int instead should prevent problems from happening in the future. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:07 -07:00
Kay Sievers	5c5daf657c	Driver core: exclude kobject_uevent.c for !CONFIG_HOTPLUG Move uevent specific logic from the core into kobject_uevent.c, which does no longer require to link the unused string array if hotplug is not compiled in. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:06 -07:00
tonyj@suse.de	60043428a5	Convert from class_device to device for drivers/video Convert from class_device to device for drivers/video. Signed-off-by: Tony Jones <tonyj@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:04 -07:00
Eric W. Biederman	90bc61359d	sysfs: Remove first pass at shadow directory support While shadow directories appear to be a good idea, the current scheme of controlling their creation and destruction outside of sysfs appears to be a locking and maintenance nightmare in the face of sysfs directories dynamically coming and going. Which can now occur for directories containing network devices when CONFIG_SYSFS_DEPRECATED is not set. This patch removes everything from the initial shadow directory support that allowed the shadow directory creation to be controlled at a higher level. So except for a few bits of sysfs_rename_dir everything from commit `b592fcfe7f` is now gone. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:03 -07:00
Robin Getz	2ebefc5016	debugfs: helper for decimal challenged Allows debugfs helper functions to have a hex output, rather than just decimal Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:03 -07:00
Greg Kroah-Hartman	ce2c9cb025	kobject: remove the static array for the name Due to historical reasons, struct kobject contained a static array for the name, and a dynamic pointer in case the name got bigger than the array. That's just dumb, as people didn't always know which variable to reference, even with the accessor for the kobject name. This patch removes the static array, potentially saving a lot of memory as the majority of kobjects do not have a very long name. Thanks to Kay for the idea to do this. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:02 -07:00
Greg Kroah-Hartman	1ef4cfac01	Driver core: remove subsys_get() There are no more subsystems, it's a kset now so remove the function and the only two users, which are in the driver core. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:01 -07:00
Greg Kroah-Hartman	6e9d930d16	Driver core: remove subsys_put() There are no more subsystems, it's a kset now so remove the function and the only two users, which are in the driver core. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-10-12 14:51:01 -07:00
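
Two commits in this listing (063a2da8f0, which moves NUM_DONT_CARE away from -1 because it is stored in a single-byte field, and 1359555eb7, which makes platform_device.id an int because "-1" is a special value) turn on the same C pitfall: a negative sentinel stored in an unsigned field. The sketch below is only an illustration of that pitfall and is not kernel code; the struct, macro, and function names are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the kernel's definitions. */
#define DEMO_DONT_CARE_SIGNED   (-1)    /* negative sentinel */
#define DEMO_DONT_CARE_UNSIGNED 0xffu   /* sentinel that fits in one byte */

struct demo_driver {
	uint8_t num_ports_wanted;       /* single-byte field */
};

/*
 * The uint8_t field is promoted to int (range 0..255) before the
 * comparison, so it can never compare equal to -1.
 */
int demo_matches_signed(const struct demo_driver *d)
{
	return d->num_ports_wanted == DEMO_DONT_CARE_SIGNED;
}

/* Comparing against an unsigned sentinel that fits the field works. */
int demo_matches_unsigned(const struct demo_driver *d)
{
	return d->num_ports_wanted == DEMO_DONT_CARE_UNSIGNED;
}

/* Usage: storing -1 into the byte yields 255, which only the
 * unsigned sentinel recognises. */
void demo_check(void)
{
	struct demo_driver d = { .num_ports_wanted = (uint8_t)DEMO_DONT_CARE_SIGNED };

	assert(d.num_ports_wanted == 255);
	assert(!demo_matches_signed(&d));
	assert(demo_matches_unsigned(&d));
}
```

Changing the sentinel to an unsigned value (as the USB serial commit describes) or widening the field to a signed type (as the platform_device commit describes) are two ways out of the same mismatch.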

8007 Commits