* [PATCH 0/8] add mTHP support for anonymous shmem
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

Multi-size THP (mTHP) allocation has already been supported for anonymous
pages since commit 19eaf44954df, which allows THP to be configured through the
sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/enabled'.

However, anonymous shared pages ignore the mTHP rules configured through the
sysfs interface and can only use PMD-mapped THP, which is not reasonable. Many
applications implement anonymous page sharing through
mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios, so
users expect a unified mTHP strategy for anonymous pages, including anonymous
shared pages, in order to enjoy the benefits of mTHP: for example, lower
allocation latency than PMD-mapped THP, smaller memory bloat than PMD-mapped
THP, and contiguous PTEs on ARM architectures to reduce TLB misses.

The primary strategy is similar to the anonymous mTHP support. Introduce a new
interface '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled',
which can take all the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus a new additional
"inherit" option. By default all sizes are set to "never" except the PMD size,
which is set to "inherit". This ensures backward compatibility with the
top-level 'shmem_enabled' setting, while also allowing independent control of
shmem enablement for each mTHP size.
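
For example (illustrative, assuming a 4K base page size), 64K mTHP can be
enabled for anonymous shmem independently of the top-level setting with:

  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled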

I used a page fault latency tool to measure the performance of faulting in 1G
of anonymous shmem with 32 threads, on my test machine (ARM64 architecture, 32
cores, 125G memory):
base: mm-unstable
user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
0.04s        3.10s         83516.416                  2669684.890

mm-unstable + patchset, anon shmem mTHP disabled
user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
0.02s        3.14s         82936.359                  2630746.027

mm-unstable + patchset, anon shmem 64K mTHP enabled
user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
0.08s        0.31s         678630.231                 17082522.495

From the data above, it can be observed that the patchset has minimal impact
when mTHP is not enabled (some fluctuation was observed during testing). When
64K mTHP is enabled, there is a significant improvement in page fault latency.

TODO:
 - Support mTHP for tmpfs.
 - Do not split large folios when shared memory is swapped out.
 - Swap in large folios for shared memory.

Changes from RFC:
 - Rebase the patch set against the new mm-unstable branch, per Lance.
 - Add a new patch to export highest_order() and next_order().
 - Add a new patch to align mTHP size in shmem_get_unmapped_area().
 - Handle the uffd case and the VMA limits case when building mapping for
   large folio in the finish_fault() function, per Ryan.
 - Remove unnecessary 'order' variable in patch 3, per Kefeng.
 - Keep the anon shmem counters' name consistency.
 - Modify the strategy to support mTHP for anonymous shmem, discussed with
   Ryan and David.
 - Add reviewed tag from Barry.
 - Update the commit message.

Baolin Wang (8):
  mm: move highest_order() and next_order() out of the THP config
  mm: memory: extend finish_fault() to support large folio
  mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio()
  mm: shmem: add THP validation for PMD-mapped THP related statistics
  mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  mm: shmem: add mTHP support for anonymous shmem
  mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
  mm: shmem: add mTHP counters for anonymous shmem

 Documentation/admin-guide/mm/transhuge.rst |  29 ++
 include/linux/huge_mm.h                    |  35 ++-
 mm/huge_memory.c                           |  17 +-
 mm/memory.c                                |  43 ++-
 mm/shmem.c                                 | 335 ++++++++++++++++++---
 5 files changed, 387 insertions(+), 72 deletions(-)

-- 
2.39.3



* [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

Move highest_order() and next_order() out of the CONFIG_TRANSPARENT_HUGEPAGE
conditional, so that they can be used as common helpers.
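
For reference, these helpers walk a bitmask of folio orders from the highest
set order down to the lowest, e.g. (illustrative only):

	unsigned long orders = BIT(9) | BIT(4) | BIT(2);
	int order;

	for (order = highest_order(orders); orders;
	     order = next_order(&orders, order)) {
		/* visits order 9, then 4, then 2 */
	}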

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/huge_mm.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 017cee864080..e49b56c40a11 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -106,6 +106,17 @@ extern struct kobj_attribute shmem_enabled_attr;
 #define HPAGE_PUD_MASK	(~(HPAGE_PUD_SIZE - 1))
 #define HPAGE_PUD_SIZE	((1UL) << HPAGE_PUD_SHIFT)
 
+static inline int highest_order(unsigned long orders)
+{
+	return fls_long(orders) - 1;
+}
+
+static inline int next_order(unsigned long *orders, int prev)
+{
+	*orders &= ~BIT(prev);
+	return highest_order(*orders);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
 extern unsigned long transparent_hugepage_flags;
@@ -138,17 +149,6 @@ static inline bool hugepage_flags_enabled(void)
 	       huge_anon_orders_madvise;
 }
 
-static inline int highest_order(unsigned long orders)
-{
-	return fls_long(orders) - 1;
-}
-
-static inline int next_order(unsigned long *orders, int prev)
-{
-	*orders &= ~BIT(prev);
-	return highest_order(*orders);
-}
-
 /*
  * Do the below checks:
  *   - For file vma, check if the linear page offset of vma is
-- 
2.39.3



* [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

Add support for establishing large folio mappings in finish_fault(), as a
preparation for supporting multi-size THP allocation of anonymous shmem pages
in the following patches.
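
To illustrate the VMA clamping logic below (hypothetical numbers, 4K base
pages): for a 16-page (64K) folio faulted at address 0x405000 inside a VMA
spanning [0x403000, 0x40a000):

	start    = ALIGN_DOWN(0x405000, 0x10000)                = 0x400000
	end      = start + 16 * PAGE_SIZE                       = 0x410000
	addr     = max(start, vma->vm_start)                    = 0x403000
	nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT = 7

so only the 7 pages of the folio that lie within the VMA are mapped, starting
from folio page (0x403000 - 0x400000) >> PAGE_SHIFT = 3.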

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index eea6e4984eae..936377220b77 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page;
+	struct folio *folio;
 	vm_fault_t ret;
 	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
 		      !(vma->vm_flags & VM_SHARED);
+	int type, nr_pages, i;
+	unsigned long addr = vmf->address;
 
 	/* Did we COW the page? */
 	if (is_cow)
@@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
+	folio = page_folio(page);
+	nr_pages = folio_nr_pages(folio);
+
+	if (unlikely(userfaultfd_armed(vma))) {
+		nr_pages = 1;
+	} else if (nr_pages > 1) {
+		unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
+		unsigned long end = start + nr_pages * PAGE_SIZE;
+
+		/* In case the folio size in page cache beyond the VMA limits. */
+		addr = max(start, vma->vm_start);
+		nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
+
+		page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
+	}
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-				      vmf->address, &vmf->ptl);
+				       addr, &vmf->ptl);
 	if (!vmf->pte)
 		return VM_FAULT_NOPAGE;
 
 	/* Re-check under ptl */
-	if (likely(!vmf_pte_changed(vmf))) {
-		struct folio *folio = page_folio(page);
-		int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
-
-		set_pte_range(vmf, folio, page, 1, vmf->address);
-		add_mm_counter(vma->vm_mm, type, 1);
-		ret = 0;
-	} else {
-		update_mmu_tlb(vma, vmf->address, vmf->pte);
+	if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
+		update_mmu_tlb(vma, addr, vmf->pte);
+		ret = VM_FAULT_NOPAGE;
+		goto unlock;
+	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
+		for (i = 0; i < nr_pages; i++)
+			update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
 		ret = VM_FAULT_NOPAGE;
+		goto unlock;
 	}
 
+	set_pte_range(vmf, folio, page, nr_pages, addr);
+	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
+	add_mm_counter(vma->vm_mm, type, nr_pages);
+	ret = 0;
+
+unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
 }
-- 
2.39.3



* [PATCH 3/8] mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio()
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

Add a new parameter to specify the huge page order for shmem_alloc_hugefolio(),
as a preparation to support mTHP.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/shmem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index fa2a0ed97507..e4483c4596a8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1604,14 +1604,14 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
 }
 
 static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
-		struct shmem_inode_info *info, pgoff_t index)
+		struct shmem_inode_info *info, pgoff_t index, int order)
 {
 	struct mempolicy *mpol;
 	pgoff_t ilx;
 	struct page *page;
 
-	mpol = shmem_get_pgoff_policy(info, index, HPAGE_PMD_ORDER, &ilx);
-	page = alloc_pages_mpol(gfp, HPAGE_PMD_ORDER, mpol, ilx, numa_node_id());
+	mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
+	page = alloc_pages_mpol(gfp, order, mpol, ilx, numa_node_id());
 	mpol_cond_put(mpol);
 
 	return page_rmappable_folio(page);
@@ -1660,7 +1660,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 				index + HPAGE_PMD_NR - 1, XA_PRESENT))
 			return ERR_PTR(-E2BIG);
 
-		folio = shmem_alloc_hugefolio(gfp, info, index);
+		folio = shmem_alloc_hugefolio(gfp, info, index, HPAGE_PMD_ORDER);
 		if (!folio)
 			count_vm_event(THP_FILE_FALLBACK);
 	} else {
-- 
2.39.3



* [PATCH 4/8] mm: shmem: add THP validation for PMD-mapped THP related statistics
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

In order to extend support to mTHP, check that an allocation is actually
PMD-sized before updating the PMD-mapped THP related statistics, to avoid
statistical confusion.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/shmem.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index e4483c4596a8..a383ea9a89a5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1661,7 +1661,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 			return ERR_PTR(-E2BIG);
 
 		folio = shmem_alloc_hugefolio(gfp, info, index, HPAGE_PMD_ORDER);
-		if (!folio)
+		if (!folio && pages == HPAGE_PMD_NR)
 			count_vm_event(THP_FILE_FALLBACK);
 	} else {
 		pages = 1;
@@ -1679,7 +1679,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 		if (xa_find(&mapping->i_pages, &index,
 				index + pages - 1, XA_PRESENT)) {
 			error = -EEXIST;
-		} else if (huge) {
+		} else if (pages == HPAGE_PMD_NR) {
 			count_vm_event(THP_FILE_FALLBACK);
 			count_vm_event(THP_FILE_FALLBACK_CHARGE);
 		}
@@ -2045,7 +2045,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		folio = shmem_alloc_and_add_folio(huge_gfp,
 				inode, index, fault_mm, true);
 		if (!IS_ERR(folio)) {
-			count_vm_event(THP_FILE_ALLOC);
+			if (folio_test_pmd_mappable(folio))
+				count_vm_event(THP_FILE_ALLOC);
 			goto alloced;
 		}
 		if (PTR_ERR(folio) == -EEXIST)
-- 
2.39.3



* [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

To support the use of mTHP with anonymous shmem, add a new sysfs interface
'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/'
directory for each mTHP size, to control whether shmem is enabled for that size.
It takes values similar to the top-level 'shmem_enabled' and can be set to:
"always", "inherit" (to inherit the top-level setting), "within_size", "advise",
"never", "deny" and "force". These values follow the same semantics as the top
level, except that 'deny' is equivalent to 'never' and 'force' is equivalent to
'always', to keep compatibility.

By default, PMD-sized hugepages have enabled="inherit" and all other hugepage
sizes have enabled="never" in
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled'.

In addition, if the top-level value is 'force', then only PMD-sized hugepages
may have enabled="inherit"; setting 'inherit' on any other size fails, and vice
versa (setting the top level to 'force' fails while a non-PMD size is set to
'inherit'). This means non-PMD-sized THP can no longer override the global huge
allocation.
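
For example (illustrative, 4K base page size):

  echo inherit > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled
  # the top level can now no longer be forced; this write fails with -EINVAL:
  echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled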

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 29 +++++++
 include/linux/huge_mm.h                    | 10 +++
 mm/huge_memory.c                           | 11 +--
 mm/shmem.c                                 | 96 ++++++++++++++++++++++
 4 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 076443cc10a6..a28496e15bdb 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -332,6 +332,35 @@ deny
 force
     Force the huge option on for all - very useful for testing;
 
+Anonymous shmem can also use "multi-size THP" (mTHP) by configuring the new
+sysfs knob /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled,
+which controls mTHP allocation. Its value for each mTHP size is essentially
+consistent with the global setting, with 'inherit' added for compatibility.
+always
+    Attempt to allocate <size> huge pages every time we need a new page;
+
+inherit
+    Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
+    have enabled="inherit" and all other hugepage sizes have enabled="never";
+
+never
+    Do not allocate <size> huge pages;
+
+within_size
+    Only allocate <size> huge page if it will be fully within i_size.
+    Also respect fadvise()/madvise() hints;
+
+advise
+    Only allocate <size> huge pages if requested with fadvise()/madvise();
+
+deny
+    Has the same semantics as 'never'; for now the mTHP allocation policy is
+    only used for anonymous shmem and does not override tmpfs.
+
+force
+    Has the same semantics as 'always'; for now the mTHP allocation policy is
+    only used for anonymous shmem and does not override tmpfs.
+
 Need of application restart
 ===========================
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e49b56c40a11..dbd6b3f56210 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -6,6 +6,7 @@
 #include <linux/mm_types.h>
 
 #include <linux/fs.h> /* only for vma_is_dax() */
+#include <linux/kobject.h>
 
 vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -63,6 +64,7 @@ ssize_t single_hugepage_flag_show(struct kobject *kobj,
 				  struct kobj_attribute *attr, char *buf,
 				  enum transparent_hugepage_flag flag);
 extern struct kobj_attribute shmem_enabled_attr;
+extern struct kobj_attribute thpsize_shmem_enabled_attr;
 
 /*
  * Mask of all large folio orders supported for anonymous THP; all orders up to
@@ -265,6 +267,14 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 	return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
 }
 
+struct thpsize {
+	struct kobject kobj;
+	struct list_head node;
+	int order;
+};
+
+#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
+
 enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_ALLOC,
 	MTHP_STAT_ANON_FAULT_FALLBACK,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9efb6fefc391..d3080a8843f2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -449,14 +449,6 @@ static void thpsize_release(struct kobject *kobj);
 static DEFINE_SPINLOCK(huge_anon_orders_lock);
 static LIST_HEAD(thpsize_list);
 
-struct thpsize {
-	struct kobject kobj;
-	struct list_head node;
-	int order;
-};
-
-#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
-
 static ssize_t thpsize_enabled_show(struct kobject *kobj,
 				    struct kobj_attribute *attr, char *buf)
 {
@@ -517,6 +509,9 @@ static struct kobj_attribute thpsize_enabled_attr =
 
 static struct attribute *thpsize_attrs[] = {
 	&thpsize_enabled_attr.attr,
+#ifdef CONFIG_SHMEM
+	&thpsize_shmem_enabled_attr.attr,
+#endif
 	NULL,
 };
 
diff --git a/mm/shmem.c b/mm/shmem.c
index a383ea9a89a5..59cc26d44344 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -131,6 +131,14 @@ struct shmem_options {
 #define SHMEM_SEEN_QUOTA 32
 };
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static unsigned long huge_anon_shmem_orders_always __read_mostly;
+static unsigned long huge_anon_shmem_orders_madvise __read_mostly;
+static unsigned long huge_anon_shmem_orders_inherit __read_mostly;
+static unsigned long huge_anon_shmem_orders_within_size __read_mostly;
+static DEFINE_SPINLOCK(huge_anon_shmem_orders_lock);
+#endif
+
 #ifdef CONFIG_TMPFS
 static unsigned long shmem_default_max_blocks(void)
 {
@@ -4687,6 +4695,12 @@ void __init shmem_init(void)
 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
 	else
 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
+
+	/*
+	 * Default to setting PMD-sized THP to inherit the global setting and
+	 * disable all other multi-size THPs, when anonymous shmem uses mTHP.
+	 */
+	huge_anon_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
 #endif
 	return;
 
@@ -4746,6 +4760,11 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
 			huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY)
 		return -EINVAL;
 
+	/* Do not override huge allocation policy with non-PMD sized mTHP */
+	if (huge == SHMEM_HUGE_FORCE &&
+	    huge_anon_shmem_orders_inherit != BIT(HPAGE_PMD_ORDER))
+		return -EINVAL;
+
 	shmem_huge = huge;
 	if (shmem_huge > SHMEM_HUGE_DENY)
 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
@@ -4753,6 +4772,83 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
 }
 
 struct kobj_attribute shmem_enabled_attr = __ATTR_RW(shmem_enabled);
+
+static ssize_t thpsize_shmem_enabled_show(struct kobject *kobj,
+					  struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+	const char *output;
+
+	if (test_bit(order, &huge_anon_shmem_orders_always))
+		output = "[always] inherit within_size advise never deny [force]";
+	else if (test_bit(order, &huge_anon_shmem_orders_inherit))
+		output = "always [inherit] within_size advise never deny force";
+	else if (test_bit(order, &huge_anon_shmem_orders_within_size))
+		output = "always inherit [within_size] advise never deny force";
+	else if (test_bit(order, &huge_anon_shmem_orders_madvise))
+		output = "always inherit within_size [advise] never deny force";
+	else
+		output = "always inherit within_size advise [never] [deny] force";
+
+	return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t thpsize_shmem_enabled_store(struct kobject *kobj,
+					   struct kobj_attribute *attr,
+					   const char *buf, size_t count)
+{
+	int order = to_thpsize(kobj)->order;
+	ssize_t ret = count;
+
+	if (sysfs_streq(buf, "always") || sysfs_streq(buf, "force")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		set_bit(order, &huge_anon_shmem_orders_always);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "inherit")) {
+		/* Do not override huge allocation policy with non-PMD sized mTHP */
+		if (shmem_huge == SHMEM_HUGE_FORCE &&
+		    order != HPAGE_PMD_ORDER)
+			return -EINVAL;
+
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		set_bit(order, &huge_anon_shmem_orders_inherit);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "within_size")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		set_bit(order, &huge_anon_shmem_orders_within_size);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "madvise")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		set_bit(order, &huge_anon_shmem_orders_madvise);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "never") || sysfs_streq(buf, "deny")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else {
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+struct kobj_attribute thpsize_shmem_enabled_attr =
+	__ATTR(shmem_enabled, 0644, thpsize_shmem_enabled_show, thpsize_shmem_enabled_store);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */
 
 #else /* !CONFIG_SHMEM */
-- 
2.39.3



* [PATCH 6/8] mm: shmem: add mTHP support for anonymous shmem
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

Commit 19eaf44954df added multi-size THP (mTHP) support for anonymous pages,
allowing THP to be configured through the sysfs interface located at
'/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/enabled'.

However, anonymous shared pages ignore the mTHP rules configured through the
sysfs interface and can only use PMD-mapped THP, which is not reasonable. Users
expect the mTHP rules to apply to all anonymous pages, including anonymous
shared pages, in order to enjoy the benefits of mTHP: for example, lower
allocation latency than PMD-mapped THP, smaller memory bloat than PMD-mapped
THP, and contiguous PTEs on ARM architectures to reduce TLB misses.

The primary strategy is similar to the anonymous mTHP support. Introduce a new
interface '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled',
which can take all the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus a new additional
"inherit" option. By default all sizes are set to "never" except the PMD size,
which is set to "inherit". This ensures backward compatibility with the
top-level 'shmem_enabled' setting, while also allowing independent control of
shmem enablement for each mTHP size.
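
To illustrate how the allowable orders are computed below (hypothetical
settings, 4K base pages):

	/*
	 * hugepages-64kB/shmem_enabled   = always  -> BIT(4) set in the
	 *                                             'always' orders mask
	 * hugepages-2048kB/shmem_enabled = inherit -> BIT(9) set in the
	 *                                             'inherit' orders mask
	 *
	 * global_huge == true  -> allowable orders = BIT(4) | BIT(9)
	 * global_huge == false -> allowable orders = BIT(4)
	 */

The result is then narrowed by thp_vma_suitable_orders() and the page cache
conflict check before attempting the actual allocation.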

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/shmem.c | 177 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 150 insertions(+), 27 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 59cc26d44344..08ccea5170a1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1611,6 +1611,106 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
 	return result;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
+				struct vm_area_struct *vma, pgoff_t index,
+				bool global_huge)
+{
+	unsigned long mask = READ_ONCE(huge_anon_shmem_orders_always);
+	unsigned long within_size_orders = READ_ONCE(huge_anon_shmem_orders_within_size);
+	unsigned long vm_flags = vma->vm_flags;
+	/*
+	 * Check all the (large) orders below HPAGE_PMD_ORDER + 1 that
+	 * are enabled for this vma.
+	 */
+	unsigned long orders = BIT(PMD_ORDER + 1) - 1;
+	loff_t i_size;
+	int order;
+
+	if ((vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return 0;
+
+	/* If the hardware/firmware marked hugepage support disabled. */
+	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+		return 0;
+
+	/*
+	 * Following the 'deny' semantics of the top level, force the huge
+	 * option off from all mounts.
+	 */
+	if (shmem_huge == SHMEM_HUGE_DENY)
+		return 0;
+	/*
+	 * Only allow inherit orders if the top-level value is 'force', which
+	 * means non-PMD sized THP can not override 'huge' mount option now.
+	 */
+	if (shmem_huge == SHMEM_HUGE_FORCE)
+		return READ_ONCE(huge_anon_shmem_orders_inherit);
+
+	/* Allow mTHP that will be fully within i_size. */
+	order = highest_order(within_size_orders);
+	while (within_size_orders) {
+		index = round_up(index + 1, 1UL << order);
+		i_size = round_up(i_size_read(inode), PAGE_SIZE);
+		if (i_size >> PAGE_SHIFT >= index) {
+			mask |= within_size_orders;
+			break;
+		}
+
+		order = next_order(&within_size_orders, order);
+	}
+
+	if (vm_flags & VM_HUGEPAGE)
+		mask |= READ_ONCE(huge_anon_shmem_orders_madvise);
+
+	if (global_huge)
+		mask |= READ_ONCE(huge_anon_shmem_orders_inherit);
+
+	return orders & mask;
+}
+
+static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
+					struct address_space *mapping, pgoff_t index,
+					unsigned long orders)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long pages;
+	int order;
+
+	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+	if (!orders)
+		return 0;
+
+	/* Find the highest order that can add into the page cache */
+	order = highest_order(orders);
+	while (orders) {
+		pages = 1UL << order;
+		index = round_down(index, pages);
+		if (!xa_find(&mapping->i_pages, &index,
+			     index + pages - 1, XA_PRESENT))
+			break;
+		order = next_order(&orders, order);
+	}
+
+	return orders;
+}
+#else
+static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
+				struct vm_area_struct *vma, pgoff_t index,
+				bool global_huge)
+{
+	return 0;
+}
+
+static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
+					struct address_space *mapping, pgoff_t index,
+					unsigned long orders)
+{
+	return 0;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
 		struct shmem_inode_info *info, pgoff_t index, int order)
 {
@@ -1639,38 +1739,55 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
 	return (struct folio *)page;
 }
 
-static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
-		struct inode *inode, pgoff_t index,
-		struct mm_struct *fault_mm, bool huge)
+static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
+		gfp_t gfp, struct inode *inode, pgoff_t index,
+		struct mm_struct *fault_mm, bool huge, unsigned long orders)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
+	unsigned long suitable_orders;
 	struct folio *folio;
 	long pages;
-	int error;
+	int error, order;
 
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		huge = false;
 
-	if (huge) {
-		pages = HPAGE_PMD_NR;
-		index = round_down(index, HPAGE_PMD_NR);
+	if (huge || orders > 0) {
+		if (vma && vma_is_anon_shmem(vma) && orders) {
+			suitable_orders = anon_shmem_suitable_orders(inode, vmf,
+							mapping, index, orders);
+		} else {
+			pages = HPAGE_PMD_NR;
+			suitable_orders = BIT(HPAGE_PMD_ORDER);
+			index = round_down(index, HPAGE_PMD_NR);
 
-		/*
-		 * Check for conflict before waiting on a huge allocation.
-		 * Conflict might be that a huge page has just been allocated
-		 * and added to page cache by a racing thread, or that there
-		 * is already at least one small page in the huge extent.
-		 * Be careful to retry when appropriate, but not forever!
-		 * Elsewhere -EEXIST would be the right code, but not here.
-		 */
-		if (xa_find(&mapping->i_pages, &index,
+			/*
+			 * Check for conflict before waiting on a huge allocation.
+			 * Conflict might be that a huge page has just been allocated
+			 * and added to page cache by a racing thread, or that there
+			 * is already at least one small page in the huge extent.
+			 * Be careful to retry when appropriate, but not forever!
+			 * Elsewhere -EEXIST would be the right code, but not here.
+			 */
+			if (xa_find(&mapping->i_pages, &index,
 				index + HPAGE_PMD_NR - 1, XA_PRESENT))
-			return ERR_PTR(-E2BIG);
+				return ERR_PTR(-E2BIG);
+		}
 
-		folio = shmem_alloc_hugefolio(gfp, info, index, HPAGE_PMD_ORDER);
-		if (!folio && pages == HPAGE_PMD_NR)
-			count_vm_event(THP_FILE_FALLBACK);
+		order = highest_order(suitable_orders);
+		while (suitable_orders) {
+			pages = 1 << order;
+			index = round_down(index, pages);
+			folio = shmem_alloc_hugefolio(gfp, info, index, order);
+			if (folio)
+				goto allocated;
+
+			if (pages == HPAGE_PMD_NR)
+				count_vm_event(THP_FILE_FALLBACK);
+			order = next_order(&suitable_orders, order);
+		}
 	} else {
 		pages = 1;
 		folio = shmem_alloc_folio(gfp, info, index);
@@ -1678,6 +1795,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 	if (!folio)
 		return ERR_PTR(-ENOMEM);
 
+allocated:
 	__folio_set_locked(folio);
 	__folio_set_swapbacked(folio);
 
@@ -1972,7 +2090,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	struct mm_struct *fault_mm;
 	struct folio *folio;
 	int error;
-	bool alloced;
+	bool alloced, huge;
+	unsigned long orders = 0;
 
 	if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
 		return -EINVAL;
@@ -2044,14 +2163,18 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		return 0;
 	}
 
-	if (shmem_is_huge(inode, index, false, fault_mm,
-			  vma ? vma->vm_flags : 0)) {
+	huge = shmem_is_huge(inode, index, false, fault_mm,
+			     vma ? vma->vm_flags : 0);
+	/* Find hugepage orders that are allowed for anonymous shmem. */
+	if (vma && vma_is_anon_shmem(vma))
+		orders = anon_shmem_allowable_huge_orders(inode, vma, index, huge);
+	if (huge || orders > 0) {
 		gfp_t huge_gfp;
 
 		huge_gfp = vma_thp_gfp_mask(vma);
 		huge_gfp = limit_gfp_mask(huge_gfp, gfp);
-		folio = shmem_alloc_and_add_folio(huge_gfp,
-				inode, index, fault_mm, true);
+		folio = shmem_alloc_and_add_folio(vmf, huge_gfp,
+				inode, index, fault_mm, true, orders);
 		if (!IS_ERR(folio)) {
 			if (folio_test_pmd_mappable(folio))
 				count_vm_event(THP_FILE_ALLOC);
@@ -2061,7 +2184,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 			goto repeat;
 	}
 
-	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
+	folio = shmem_alloc_and_add_folio(vmf, gfp, inode, index, fault_mm, false, 0);
 	if (IS_ERR(folio)) {
 		error = PTR_ERR(folio);
 		if (error == -EEXIST)
@@ -2072,7 +2195,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 
 alloced:
 	alloced = true;
-	if (folio_test_pmd_mappable(folio) &&
+	if (folio_test_large(folio) &&
 	    DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
 					folio_next_index(folio) - 1) {
 		struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-- 
2.39.3



* [PATCH 7/8] mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

Although the top-level hugepage allocation can be turned off, anonymous shmem
can still use mTHP by configuring the sysfs interface located at
'/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled'. Therefore,
take the enabled mTHP sizes into account and provide a suitably aligned address
in shmem_get_unmapped_area().
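
As a sketch of the alignment arithmetic (hypothetical numbers, 4K base pages,
with 64K being the highest enabled mTHP size, so hpage_size = 0x10000): for a
1M request at pgoff 0 (offset = 0), the area is re-searched with

	inflated_len = len + hpage_size - PAGE_SIZE = 0x10f000

and if that search returns inflated_addr = 0x7f1234567000:

	inflated_offset = inflated_addr & (hpage_size - 1)  /* 0x7000 */
	inflated_addr  += offset - inflated_offset          /* 0x7f1234560000 */
	/* inflated_offset > offset, so: */
	inflated_addr  += hpage_size                        /* 0x7f1234570000 */

yielding a 64K-aligned address that still lies within the inflated range.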

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/shmem.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 08ccea5170a1..27107afdff9e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2404,6 +2404,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 	unsigned long inflated_len;
 	unsigned long inflated_addr;
 	unsigned long inflated_offset;
+	unsigned long hpage_size = HPAGE_PMD_SIZE;
 
 	if (len > TASK_SIZE)
 		return -ENOMEM;
@@ -2422,8 +2423,6 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 
 	if (shmem_huge == SHMEM_HUGE_DENY)
 		return addr;
-	if (len < HPAGE_PMD_SIZE)
-		return addr;
 	if (flags & MAP_FIXED)
 		return addr;
 	/*
@@ -2437,6 +2436,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 
 	if (shmem_huge != SHMEM_HUGE_FORCE) {
 		struct super_block *sb;
+		unsigned long __maybe_unused hpage_orders;
+		int order = 0;
 
 		if (file) {
 			VM_BUG_ON(file->f_op != &shmem_file_operations);
@@ -2449,18 +2450,34 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 			if (IS_ERR(shm_mnt))
 				return addr;
 			sb = shm_mnt->mnt_sb;
+
+			/*
+			 * Find the highest mTHP order used for anonymous shmem to
+			 * provide a suitable alignment address.
+			 */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			hpage_orders = READ_ONCE(huge_anon_shmem_orders_always);
+			hpage_orders |= READ_ONCE(huge_anon_shmem_orders_within_size);
+			hpage_orders |= READ_ONCE(huge_anon_shmem_orders_madvise);
+			hpage_orders |= READ_ONCE(huge_anon_shmem_orders_inherit);
+			order = highest_order(hpage_orders);
+			hpage_size = PAGE_SIZE << order;
+#endif
 		}
-		if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
+		if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER && !order)
 			return addr;
 	}
 
-	offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
-	if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
+	if (len < hpage_size)
+		return addr;
+
+	offset = (pgoff << PAGE_SHIFT) & (hpage_size - 1);
+	if (offset && offset + len < 2 * hpage_size)
 		return addr;
-	if ((addr & (HPAGE_PMD_SIZE-1)) == offset)
+	if ((addr & (hpage_size - 1)) == offset)
 		return addr;
 
-	inflated_len = len + HPAGE_PMD_SIZE - PAGE_SIZE;
+	inflated_len = len + hpage_size - PAGE_SIZE;
 	if (inflated_len > TASK_SIZE)
 		return addr;
 	if (inflated_len < len)
@@ -2473,10 +2490,10 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 	if (inflated_addr & ~PAGE_MASK)
 		return addr;
 
-	inflated_offset = inflated_addr & (HPAGE_PMD_SIZE-1);
+	inflated_offset = inflated_addr & (hpage_size - 1);
 	inflated_addr += offset - inflated_offset;
 	if (inflated_offset > offset)
-		inflated_addr += HPAGE_PMD_SIZE;
+		inflated_addr += hpage_size;
 
 	if (inflated_addr > TASK_SIZE - len)
 		return addr;
-- 
2.39.3



* [PATCH 8/8] mm: shmem: add mTHP counters for anonymous shmem
From: Baolin Wang @ 2024-05-06  8:46 UTC
  To: akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	ryan.roberts, shy828301, ziy, baolin.wang, linux-mm,
	linux-kernel

Add mTHP counters for anonymous shmem.
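
Once enabled, the new per-size counters can be read from sysfs, e.g.
(illustrative, assuming a 4K base page size and the existing per-size mTHP
'stats' directory):

  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/file_alloc
  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/file_fallback
  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/file_fallback_charge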

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/huge_mm.h |  3 +++
 mm/huge_memory.c        |  6 ++++++
 mm/shmem.c              | 18 +++++++++++++++---
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index dbd6b3f56210..c15bebb2cf53 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -281,6 +281,9 @@ enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
 	MTHP_STAT_ANON_SWPOUT,
 	MTHP_STAT_ANON_SWPOUT_FALLBACK,
+	MTHP_STAT_FILE_ALLOC,
+	MTHP_STAT_FILE_FALLBACK,
+	MTHP_STAT_FILE_FALLBACK_CHARGE,
 	__MTHP_STAT_COUNT
 };
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d3080a8843f2..fcda6ae604f6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -555,6 +555,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
 DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
 
 static struct attribute *stats_attrs[] = {
 	&anon_fault_alloc_attr.attr,
@@ -562,6 +565,9 @@ static struct attribute *stats_attrs[] = {
 	&anon_fault_fallback_charge_attr.attr,
 	&anon_swpout_attr.attr,
 	&anon_swpout_fallback_attr.attr,
+	&file_alloc_attr.attr,
+	&file_fallback_attr.attr,
+	&file_fallback_charge_attr.attr,
 	NULL,
 };
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 27107afdff9e..1af2f0aa384d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1786,6 +1786,9 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 
 			if (pages == HPAGE_PMD_NR)
 				count_vm_event(THP_FILE_FALLBACK);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+#endif
 			order = next_order(&suitable_orders, order);
 		}
 	} else {
@@ -1805,9 +1808,15 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 		if (xa_find(&mapping->i_pages, &index,
 				index + pages - 1, XA_PRESENT)) {
 			error = -EEXIST;
-		} else if (pages == HPAGE_PMD_NR) {
-			count_vm_event(THP_FILE_FALLBACK);
-			count_vm_event(THP_FILE_FALLBACK_CHARGE);
+		} else if (pages > 1) {
+			if (pages == HPAGE_PMD_NR) {
+				count_vm_event(THP_FILE_FALLBACK);
+				count_vm_event(THP_FILE_FALLBACK_CHARGE);
+			}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
+			count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+#endif
 		}
 		goto unlock;
 	}
@@ -2178,6 +2187,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		if (!IS_ERR(folio)) {
 			if (folio_test_pmd_mappable(folio))
 				count_vm_event(THP_FILE_ALLOC);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+#endif
 			goto alloced;
 		}
 		if (PTR_ERR(folio) == -EEXIST)
-- 
2.39.3



* Re: [PATCH 0/8] add mTHP support for anonymous shmem
From: Lance Yang @ 2024-05-06 10:54 UTC
  To: baolin.wang
  Cc: 21cnbao, akpm, david, hughd, ioworker0, linux-kernel, linux-mm,
	ryan.roberts, shy828301, wangkefeng.wang, willy, ying.huang, ziy

Hey Baolin,

I found a compilation issue that failed one[1] of my configurations
after applying this series. The error message is as follows:

mm/shmem.c: In function ‘shmem_get_unmapped_area’:
././include/linux/compiler_types.h:460:45: error: call to ‘__compiletime_assert_481’ declared with attribute error: BUILD_BUG failed
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                            ^
././include/linux/compiler_types.h:441:25: note: in definition of macro ‘__compiletime_assert’
                         prefix ## suffix();                             \
                         ^~~~~~
././include/linux/compiler_types.h:460:9: note: in expansion of macro ‘_compiletime_assert’
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:59:21: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
 #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
                     ^~~~~~~~~~~~~~~~
./include/linux/huge_mm.h:97:28: note: in expansion of macro ‘BUILD_BUG’
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
                            ^~~~~~~~~
./include/linux/huge_mm.h:104:35: note: in expansion of macro ‘HPAGE_PMD_SHIFT’
 #define HPAGE_PMD_SIZE  ((1UL) << HPAGE_PMD_SHIFT)
                                   ^~~~~~~~~~~~~~~
mm/shmem.c:2419:36: note: in expansion of macro ‘HPAGE_PMD_SIZE’
        unsigned long hpage_size = HPAGE_PMD_SIZE;
                                   ^~~~~~~~~~~~~~~

It seems like we need to handle the case where CONFIG_PGTABLE_HAS_HUGE_LEAVES
is undefined.

[1] export ARCH=arm64 && make allnoconfig && make olddefconfig && make -j$(nproc)

Thanks,
Lance


* Re: [PATCH 0/8] add mTHP support for anonymous shmem
From: Baolin Wang @ 2024-05-07  1:47 UTC
  To: Lance Yang
  Cc: 21cnbao, akpm, david, hughd, linux-kernel, linux-mm,
	ryan.roberts, shy828301, wangkefeng.wang, willy, ying.huang, ziy

Hi Lance,

On 2024/5/6 18:54, Lance Yang wrote:
> Hey Baolin,
> 
> I found a compilation issue that failed one[1] of my configurations
> after applying this series. The error message is as follows:
> 
> mm/shmem.c: In function ‘shmem_get_unmapped_area’:
> ././include/linux/compiler_types.h:460:45: error: call to ‘__compiletime_assert_481’ declared with attribute error: BUILD_BUG failed
>          _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>                                              ^
> ././include/linux/compiler_types.h:441:25: note: in definition of macro ‘__compiletime_assert’
>                           prefix ## suffix();                             \
>                           ^~~~~~
> ././include/linux/compiler_types.h:460:9: note: in expansion of macro ‘_compiletime_assert’
>          _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>          ^~~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
>   #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>                                       ^~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:59:21: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
>   #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
>                       ^~~~~~~~~~~~~~~~
> ./include/linux/huge_mm.h:97:28: note: in expansion of macro ‘BUILD_BUG’
>   #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
>                              ^~~~~~~~~
> ./include/linux/huge_mm.h:104:35: note: in expansion of macro ‘HPAGE_PMD_SHIFT’
>   #define HPAGE_PMD_SIZE  ((1UL) << HPAGE_PMD_SHIFT)
>                                     ^~~~~~~~~~~~~~~
> mm/shmem.c:2419:36: note: in expansion of macro ‘HPAGE_PMD_SIZE’
>          unsigned long hpage_size = HPAGE_PMD_SIZE;
>                                     ^~~~~~~~~~~~~~~
> 
> It seems like we need to handle the case where CONFIG_PGTABLE_HAS_HUGE_LEAVES
> is undefined.
> 
> [1] export ARCH=arm64 && make allnoconfig && make olddefconfig && make -j$(nproc)

Thanks for reporting. I can move the use of HPAGE_PMD_SIZE to after the
check for CONFIG_TRANSPARENT_HUGEPAGE, which avoids the build error:

diff --git a/mm/shmem.c b/mm/shmem.c
index 1af2f0aa384d..d603e36e0f4f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2416,7 +2416,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
         unsigned long inflated_len;
         unsigned long inflated_addr;
         unsigned long inflated_offset;
-       unsigned long hpage_size = HPAGE_PMD_SIZE;
+       unsigned long hpage_size;

         if (len > TASK_SIZE)
                 return -ENOMEM;
@@ -2446,6 +2446,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
         if (uaddr == addr)
                 return addr;

+       hpage_size = HPAGE_PMD_SIZE;
         if (shmem_huge != SHMEM_HUGE_FORCE) {
                 struct super_block *sb;
                 unsigned long __maybe_unused hpage_orders;


* Re: [PATCH 0/8] add mTHP support for anonymous shmem
From: Lance Yang @ 2024-05-07  6:50 UTC
  To: Baolin Wang
  Cc: 21cnbao, akpm, david, hughd, linux-kernel, linux-mm,
	ryan.roberts, shy828301, wangkefeng.wang, willy, ying.huang, ziy

On Tue, May 7, 2024 at 9:47 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> Hi Lance,
>
> On 2024/5/6 18:54, Lance Yang wrote:
> > Hey Baolin,
> >
> > I found a compilation issue that failed one[1] of my configurations
> > after applying this series. The error message is as follows:
> >
> > mm/shmem.c: In function ‘shmem_get_unmapped_area’:
> > ././include/linux/compiler_types.h:460:45: error: call to ‘__compiletime_assert_481’ declared with attribute error: BUILD_BUG failed
> >          _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >                                              ^
> > ././include/linux/compiler_types.h:441:25: note: in definition of macro ‘__compiletime_assert’
> >                           prefix ## suffix();                             \
> >                           ^~~~~~
> > ././include/linux/compiler_types.h:460:9: note: in expansion of macro ‘_compiletime_assert’
> >          _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >          ^~~~~~~~~~~~~~~~~~~
> > ./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
> >   #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> >                                       ^~~~~~~~~~~~~~~~~~
> > ./include/linux/build_bug.h:59:21: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
> >   #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> >                       ^~~~~~~~~~~~~~~~
> > ./include/linux/huge_mm.h:97:28: note: in expansion of macro ‘BUILD_BUG’
> >   #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
> >                              ^~~~~~~~~
> > ./include/linux/huge_mm.h:104:35: note: in expansion of macro ‘HPAGE_PMD_SHIFT’
> >   #define HPAGE_PMD_SIZE  ((1UL) << HPAGE_PMD_SHIFT)
> >                                     ^~~~~~~~~~~~~~~
> > mm/shmem.c:2419:36: note: in expansion of macro ‘HPAGE_PMD_SIZE’
> >          unsigned long hpage_size = HPAGE_PMD_SIZE;
> >                                     ^~~~~~~~~~~~~~~
> >
> > It seems like we need to handle the case where CONFIG_PGTABLE_HAS_HUGE_LEAVES
> > is undefined.
> >
> > [1] export ARCH=arm64 && make allnoconfig && make olddefconfig && make -j$(nproc)
>
> Thanks for reporting. I can move the use of HPAGE_PMD_SIZE to after the
> check for CONFIG_TRANSPARENT_HUGEPAGE, which can avoid the building error:

I confirmed that the issue I reported before has disappeared after applying
this change. For the fix,

Tested-by: Lance Yang <ioworker0@gmail.com>

Thanks,
Lance

>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1af2f0aa384d..d603e36e0f4f 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2416,7 +2416,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>          unsigned long inflated_len;
>          unsigned long inflated_addr;
>          unsigned long inflated_offset;
> -       unsigned long hpage_size = HPAGE_PMD_SIZE;
> +       unsigned long hpage_size;
>
>          if (len > TASK_SIZE)
>                  return -ENOMEM;
> @@ -2446,6 +2446,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>          if (uaddr == addr)
>                  return addr;
>
> +       hpage_size = HPAGE_PMD_SIZE;
>          if (shmem_huge != SHMEM_HUGE_FORCE) {
>                  struct super_block *sb;
>                  unsigned long __maybe_unused hpage_orders;


* Re: [PATCH 0/8] add mTHP support for anonymous shmem
From: Ryan Roberts @ 2024-05-07 10:20 UTC
  To: Baolin Wang, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 06/05/2024 09:46, Baolin Wang wrote:
> Anonymous pages have already been supported for multi-size (mTHP) allocation
> through commit 19eaf44954df, that can allow THP to be configured through the
> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
> 
> However, the anonymous shared pages will ignore the anonymous mTHP rule
> configured through the sysfs interface, and can only use the PMD-mapped
> THP, that is not reasonable. Many implement anonymous page sharing through
> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
> also including the anonymous shared pages, in order to enjoy the benefits of
> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
> 
> The primary strategy is similar to supporting anonymous mTHP. Introduce
> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> which can have all the same values as the top-level

Didn't we agree that "force" would not be supported for now, and would return an
error when attempting to set it for a non-PMD-size hugepages-XXkB/shmem_enabled
(or indirectly through inheritance)?

> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> additional "inherit" option. By default all sizes will be set to "never"
> except PMD size, which is set to "inherit". This ensures backward compatibility
> with the shmem enabled of the top level, meanwhile also allows independent
> control of shmem enabled for each mTHP.
> 
> Use the page fault latency tool to measure the performance of 1G anonymous shmem
> with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
> 125G memory:
> base: mm-unstable
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.04s        3.10s         83516.416                  2669684.890
> 
> mm-unstable + patchset, anon shmem mTHP disabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.02s        3.14s         82936.359                  2630746.027
> 
> mm-unstable + patchset, anon shmem 64K mTHP enabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.08s        0.31s         678630.231                 17082522.495
> 
> From the data above, it is observed that the patchset has a minimal impact when
> mTHP is not enabled (some fluctuations observed during testing). When enabling 64K
> mTHP, there is a significant improvement of the page fault latency.
> 
> TODO:
>  - Support mTHP for tmpfs.
>  - Do not split the large folio when share memory swap out.
>  - Can swap in a large folio for share memory.
> 
> Changes from RFC:
>  - Rebase the patch set against the new mm-unstable branch, per Lance.
>  - Add a new patch to export highest_order() and next_order().
>  - Add a new patch to align mTHP size in shmem_get_unmapped_area().
>  - Handle the uffd case and the VMA limits case when building mapping for
>    large folio in the finish_fault() function, per Ryan.
>  - Remove unnecessary 'order' variable in patch 3, per Kefeng.
>  - Keep the anon shmem counters' name consistency.
>  - Modify the strategy to support mTHP for anonymous shmem, discussed with
>    Ryan and David.
>  - Add reviewed tag from Barry.
>  - Update the commit message.
> 
> Baolin Wang (8):
>   mm: move highest_order() and next_order() out of the THP config
>   mm: memory: extend finish_fault() to support large folio
>   mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio()
>   mm: shmem: add THP validation for PMD-mapped THP related statistics
>   mm: shmem: add multi-size THP sysfs interface for anonymous shmem
>   mm: shmem: add mTHP support for anonymous shmem
>   mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
>   mm: shmem: add mTHP counters for anonymous shmem
> 
>  Documentation/admin-guide/mm/transhuge.rst |  29 ++
>  include/linux/huge_mm.h                    |  35 ++-
>  mm/huge_memory.c                           |  17 +-
>  mm/memory.c                                |  43 ++-
>  mm/shmem.c                                 | 335 ++++++++++++++++++---
>  5 files changed, 387 insertions(+), 72 deletions(-)
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config
  2024-05-06  8:46 ` [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config Baolin Wang
@ 2024-05-07 10:21   ` Ryan Roberts
  2024-05-08  2:13     ` Baolin Wang
  0 siblings, 1 reply; 52+ messages in thread
From: Ryan Roberts @ 2024-05-07 10:21 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 06/05/2024 09:46, Baolin Wang wrote:
> Move highest_order() and next_order() out of the CONFIG_TRANSPARENT_HUGEPAGE
> ifdef so that they can be used as common helpers.

Sorry if I haven't kept up with the discussion, but why is this needed? I
wouldn't expect a need to iterate over orders if THP is compile-time disabled
because we will never try to allocate THP?
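
For reference, the usage pattern these two helpers enable is roughly the
below: a minimal userspace sketch with local stand-ins for the kernel
helpers (the real ones use fls_long()), not code from the series:

#include <stdio.h>

/* userspace stand-in for the kernel helper; orders must be non-zero */
static int highest_order(unsigned long orders)
{
	return (int)(sizeof(orders) * 8 - 1) - __builtin_clzl(orders);
}

/* clear the previously tried order and return the next-highest one */
static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return *orders ? highest_order(*orders) : -1;
}

int main(void)
{
	/* candidate folio orders 9, 4 and 2, tried largest-first */
	unsigned long orders = (1UL << 9) | (1UL << 4) | (1UL << 2);
	int order = highest_order(orders);

	while (orders) {
		printf("try order %d\n", order);	/* prints 9, 4, 2 */
		order = next_order(&orders, order);
	}
	return 0;
}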

> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  include/linux/huge_mm.h | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 017cee864080..e49b56c40a11 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -106,6 +106,17 @@ extern struct kobj_attribute shmem_enabled_attr;
>  #define HPAGE_PUD_MASK	(~(HPAGE_PUD_SIZE - 1))
>  #define HPAGE_PUD_SIZE	((1UL) << HPAGE_PUD_SHIFT)
>  
> +static inline int highest_order(unsigned long orders)
> +{
> +	return fls_long(orders) - 1;
> +}
> +
> +static inline int next_order(unsigned long *orders, int prev)
> +{
> +	*orders &= ~BIT(prev);
> +	return highest_order(*orders);
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  
>  extern unsigned long transparent_hugepage_flags;
> @@ -138,17 +149,6 @@ static inline bool hugepage_flags_enabled(void)
>  	       huge_anon_orders_madvise;
>  }
>  
> -static inline int highest_order(unsigned long orders)
> -{
> -	return fls_long(orders) - 1;
> -}
> -
> -static inline int next_order(unsigned long *orders, int prev)
> -{
> -	*orders &= ~BIT(prev);
> -	return highest_order(*orders);
> -}
> -
>  /*
>   * Do the below checks:
>   *   - For file vma, check if the linear page offset of vma is


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-06  8:46 ` [PATCH 2/8] mm: memory: extend finish_fault() to support large folio Baolin Wang
@ 2024-05-07 10:37   ` Ryan Roberts
  2024-05-08  3:44     ` Baolin Wang
  0 siblings, 1 reply; 52+ messages in thread
From: Ryan Roberts @ 2024-05-07 10:37 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 06/05/2024 09:46, Baolin Wang wrote:
> Add support for establishing large folio mappings in finish_fault(), as
> preparation for supporting multi-size THP allocation of anonymous shmem pages
> in the following patches.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index eea6e4984eae..936377220b77 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct page *page;
> +	struct folio *folio;
>  	vm_fault_t ret;
>  	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>  		      !(vma->vm_flags & VM_SHARED);
> +	int type, nr_pages, i;
> +	unsigned long addr = vmf->address;
>  
>  	/* Did we COW the page? */
>  	if (is_cow)
> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  			return VM_FAULT_OOM;
>  	}
>  
> +	folio = page_folio(page);
> +	nr_pages = folio_nr_pages(folio);
> +
> +	if (unlikely(userfaultfd_armed(vma))) {
> +		nr_pages = 1;
> +	} else if (nr_pages > 1) {
> +		unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
> +		unsigned long end = start + nr_pages * PAGE_SIZE;
> +
> +		/* In case the folio size in page cache beyond the VMA limits. */
> +		addr = max(start, vma->vm_start);
> +		nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
> +
> +		page = folio_page(folio, (addr - start) >> PAGE_SHIFT);

I still don't really follow the logic in this else if block. Isn't it possible
that finish_fault() gets called with a page from a folio that isn't aligned with
vmf->address?

For example, let's say we have a file who's size is 64K and which is cached in a
single large folio in the page cache. But the file is mapped into a process at
VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will calculate
start=0 and end=64K I think?
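
(As a quick sanity check of that arithmetic, here is a throwaway userspace
calculation assuming 4K pages and a 16-page folio; not code from the series:

#include <stdio.h>

int main(void)
{
	unsigned long page = 4096, nr = 16;
	unsigned long address = 16 * 1024;	/* faulting VA           */
	unsigned long vm_start = 16 * 1024;	/* VMA covers [16K, 80K) */
	unsigned long vm_end = 80 * 1024;

	unsigned long start = address & ~(nr * page - 1);	/* ALIGN_DOWN */
	unsigned long end = start + nr * page;
	unsigned long addr = start > vm_start ? start : vm_start;
	unsigned long pages = ((end < vm_end ? end : vm_end) - addr) / page;

	printf("start=%luK end=%luK addr=%luK nr_pages=%lu folio_idx=%lu\n",
	       start / 1024, end / 1024, addr / 1024, pages,
	       (addr - start) / page);
	return 0;
}

It prints start=0K end=64K addr=16K nr_pages=12 folio_idx=4, even though
VA 16K corresponds to file offset 0, i.e. folio page 0.)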

Additionally, I think this path will end up mapping the entire folio (as long as
it fits in the VMA). But this bypasses the fault-around configuration. As I
think I mentioned against the RFC, this will inflate the RSS of the process and
can cause behavioural changes as a result. I believe the current advice is to
disable fault-around to prevent this kind of bloat when needed.

It might be that you need a special variant of finish_fault() for shmem?


> +	}
>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -				      vmf->address, &vmf->ptl);
> +				       addr, &vmf->ptl);
>  	if (!vmf->pte)
>  		return VM_FAULT_NOPAGE;
>  
>  	/* Re-check under ptl */
> -	if (likely(!vmf_pte_changed(vmf))) {
> -		struct folio *folio = page_folio(page);
> -		int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> -
> -		set_pte_range(vmf, folio, page, 1, vmf->address);
> -		add_mm_counter(vma->vm_mm, type, 1);
> -		ret = 0;
> -	} else {
> -		update_mmu_tlb(vma, vmf->address, vmf->pte);
> +	if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> +		update_mmu_tlb(vma, addr, vmf->pte);
> +		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
> +	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> +		for (i = 0; i < nr_pages; i++)
> +			update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
>  		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
>  	}
>  
> +	set_pte_range(vmf, folio, page, nr_pages, addr);
> +	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> +	add_mm_counter(vma->vm_mm, type, nr_pages);
> +	ret = 0;
> +
> +unlock:
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	return ret;
>  }


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/8] mm: shmem: add mTHP support for anonymous shmem
  2024-05-06  8:46 ` [PATCH 6/8] mm: shmem: add mTHP support " Baolin Wang
@ 2024-05-07 10:46   ` kernel test robot
  2024-05-08  6:03     ` Baolin Wang
  0 siblings, 1 reply; 52+ messages in thread
From: kernel test robot @ 2024-05-07 10:46 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: llvm, oe-kbuild-all, willy, david, ioworker0, wangkefeng.wang,
	ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, baolin.wang,
	linux-mm, linux-kernel

Hi Baolin,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on next-20240506]
[cannot apply to linus/master v6.9-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/mm-move-highest_order-and-next_order-out-of-the-THP-config/20240506-164838
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/adc64bf0f150bdc614c6c06fc313adeef7dbbbff.1714978902.git.baolin.wang%40linux.alibaba.com
patch subject: [PATCH 6/8] mm: shmem: add mTHP support for anonymous shmem
config: s390-allnoconfig (https://download.01.org/0day-ci/archive/20240507/202405071820.2KY0UnDu-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 0ab4458df0688955620b72cc2c72a32dffad3615)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240507/202405071820.2KY0UnDu-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405071820.2KY0UnDu-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from mm/shmem.c:28:
   In file included from include/linux/ramfs.h:5:
   In file included from include/linux/fs_parser.h:11:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:33:
   In file included from include/linux/mm.h:2253:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
>> mm/shmem.c:1780:10: warning: variable 'folio' is used uninitialized whenever 'while' loop exits because its condition is false [-Wsometimes-uninitialized]
    1780 |                 while (suitable_orders) {
         |                        ^~~~~~~~~~~~~~~
   mm/shmem.c:1795:7: note: uninitialized use occurs here
    1795 |         if (!folio)
         |              ^~~~~
   mm/shmem.c:1780:10: note: remove the condition if it is always true
    1780 |                 while (suitable_orders) {
         |                        ^~~~~~~~~~~~~~~
         |                        1
   mm/shmem.c:1750:21: note: initialize the variable 'folio' to silence this warning
    1750 |         struct folio *folio;
         |                            ^
         |                             = NULL
   mm/shmem.c:1564:20: warning: unused function 'shmem_show_mpol' [-Wunused-function]
    1564 | static inline void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
         |                    ^~~~~~~~~~~~~~~
   3 warnings generated.


vim +1780 mm/shmem.c

  1741	
  1742	static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
  1743			gfp_t gfp, struct inode *inode, pgoff_t index,
  1744			struct mm_struct *fault_mm, bool huge, unsigned long orders)
  1745	{
  1746		struct address_space *mapping = inode->i_mapping;
  1747		struct shmem_inode_info *info = SHMEM_I(inode);
  1748		struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
  1749		unsigned long suitable_orders;
  1750		struct folio *folio;
  1751		long pages;
  1752		int error, order;
  1753	
  1754		if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
  1755			huge = false;
  1756	
  1757		if (huge || orders > 0) {
  1758			if (vma && vma_is_anon_shmem(vma) && orders) {
  1759				suitable_orders = anon_shmem_suitable_orders(inode, vmf,
  1760								mapping, index, orders);
  1761			} else {
  1762				pages = HPAGE_PMD_NR;
  1763				suitable_orders = BIT(HPAGE_PMD_ORDER);
  1764				index = round_down(index, HPAGE_PMD_NR);
  1765	
  1766				/*
  1767				 * Check for conflict before waiting on a huge allocation.
  1768				 * Conflict might be that a huge page has just been allocated
  1769				 * and added to page cache by a racing thread, or that there
  1770				 * is already at least one small page in the huge extent.
  1771				 * Be careful to retry when appropriate, but not forever!
  1772				 * Elsewhere -EEXIST would be the right code, but not here.
  1773				 */
  1774				if (xa_find(&mapping->i_pages, &index,
  1775					index + HPAGE_PMD_NR - 1, XA_PRESENT))
  1776					return ERR_PTR(-E2BIG);
  1777			}
  1778	
  1779			order = highest_order(suitable_orders);
> 1780			while (suitable_orders) {
  1781				pages = 1 << order;
  1782				index = round_down(index, pages);
  1783				folio = shmem_alloc_hugefolio(gfp, info, index, order);
  1784				if (folio)
  1785					goto allocated;
  1786	
  1787				if (pages == HPAGE_PMD_NR)
  1788					count_vm_event(THP_FILE_FALLBACK);
  1789				order = next_order(&suitable_orders, order);
  1790			}
  1791		} else {
  1792			pages = 1;
  1793			folio = shmem_alloc_folio(gfp, info, index);
  1794		}
  1795		if (!folio)
  1796			return ERR_PTR(-ENOMEM);
  1797	
  1798	allocated:
  1799		__folio_set_locked(folio);
  1800		__folio_set_swapbacked(folio);
  1801	
  1802		gfp &= GFP_RECLAIM_MASK;
  1803		error = mem_cgroup_charge(folio, fault_mm, gfp);
  1804		if (error) {
  1805			if (xa_find(&mapping->i_pages, &index,
  1806					index + pages - 1, XA_PRESENT)) {
  1807				error = -EEXIST;
  1808			} else if (pages == HPAGE_PMD_NR) {
  1809				count_vm_event(THP_FILE_FALLBACK);
  1810				count_vm_event(THP_FILE_FALLBACK_CHARGE);
  1811			}
  1812			goto unlock;
  1813		}
  1814	
  1815		error = shmem_add_to_page_cache(folio, mapping, index, NULL, gfp);
  1816		if (error)
  1817			goto unlock;
  1818	
  1819		error = shmem_inode_acct_blocks(inode, pages);
  1820		if (error) {
  1821			struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
  1822			long freed;
  1823			/*
  1824			 * Try to reclaim some space by splitting a few
  1825			 * large folios beyond i_size on the filesystem.
  1826			 */
  1827			shmem_unused_huge_shrink(sbinfo, NULL, 2);
  1828			/*
  1829			 * And do a shmem_recalc_inode() to account for freed pages:
  1830			 * except our folio is there in cache, so not quite balanced.
  1831			 */
  1832			spin_lock(&info->lock);
  1833			freed = pages + info->alloced - info->swapped -
  1834				READ_ONCE(mapping->nrpages);
  1835			if (freed > 0)
  1836				info->alloced -= freed;
  1837			spin_unlock(&info->lock);
  1838			if (freed > 0)
  1839				shmem_inode_unacct_blocks(inode, freed);
  1840			error = shmem_inode_acct_blocks(inode, pages);
  1841			if (error) {
  1842				filemap_remove_folio(folio);
  1843				goto unlock;
  1844			}
  1845		}
  1846	
  1847		shmem_recalc_inode(inode, pages, 0);
  1848		folio_add_lru(folio);
  1849		return folio;
  1850	
  1851	unlock:
  1852		folio_unlock(folio);
  1853		folio_put(folio);
  1854		return ERR_PTR(error);
  1855	}
  1856	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-06  8:46 ` [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem Baolin Wang
@ 2024-05-07 10:52   ` Ryan Roberts
  2024-05-08  4:45     ` Baolin Wang
  0 siblings, 1 reply; 52+ messages in thread
From: Ryan Roberts @ 2024-05-07 10:52 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 06/05/2024 09:46, Baolin Wang wrote:
> To support the use of mTHP with anonymous shmem, add a new sysfs interface
> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
> directory for each mTHP to control whether shmem is enabled for that mTHP,
> with a value similar to the top level 'shmem_enabled', which can be set to:
> "always", "inherit (to inherit the top level setting)", "within_size", "advise",
> "never", "deny", "force". These values follow the same semantics as the top
> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
> to 'always' to keep compatibility.

We decided at [1] to not allow 'force' for non-PMD-sizes.

[1]
https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/

However, thinking about this a bit more, I wonder if the decision we made to
allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
for legacy back compat after all, I don't think there is any use case where
changing multiple mTHP size controls atomically is actually useful). Applying
that pattern here, it means the top level can always take "force" without any
weird error checking. And we would allow "force" on the PMD-sized control but
not on the others - again this is easy to error check.

Does this pattern make more sense? If so, is it too late to change
hugepages-xxkB/enabled interface?

> 
> By default, PMD-sized hugepages have enabled="inherit" and all other hugepage
> sizes have enabled="never" for '/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/shmem_enabled'.
> 
> In addition, if the top level value is 'force', then only PMD-sized hugepages
> may have enabled="inherit"; otherwise the configuration will fail, and vice versa.
> That means we now avoid using non-PMD sized THP to override the global
> huge allocation.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 29 +++++++
>  include/linux/huge_mm.h                    | 10 +++
>  mm/huge_memory.c                           | 11 +--
>  mm/shmem.c                                 | 96 ++++++++++++++++++++++
>  4 files changed, 138 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 076443cc10a6..a28496e15bdb 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -332,6 +332,35 @@ deny
>  force
>      Force the huge option on for all - very useful for testing;
>  
> +Anonymous shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob
> +to control mTHP allocation: /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled.
> +Its value for each mTHP is essentially consistent with the global setting, except
> +for the addition of 'inherit' to ensure compatibility with the global settings.
> +always
> +    Attempt to allocate <size> huge pages every time we need a new page;
> +
> +inherit
> +    Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
> +    have enabled="inherit" and all other hugepage sizes have enabled="never";
> +
> +never
> +    Do not allocate <size> huge pages;
> +
> +within_size
> +    Only allocate <size> huge page if it will be fully within i_size.
> +    Also respect fadvise()/madvise() hints;
> +
> +advise
> +    Only allocate <size> huge pages if requested with fadvise()/madvise();
> +
> +deny
> +    Has the same semantics as 'never'; now the mTHP allocation policy is
> +    only used for anonymous shmem and does not override tmpfs.
> +
> +force
> +    Has the same semantics as 'always'; now the mTHP allocation policy is
> +    only used for anonymous shmem and does not override tmpfs.
> +
>  Need of application restart
>  ===========================
>  
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index e49b56c40a11..dbd6b3f56210 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -6,6 +6,7 @@
>  #include <linux/mm_types.h>
>  
>  #include <linux/fs.h> /* only for vma_is_dax() */
> +#include <linux/kobject.h>
>  
>  vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
>  int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> @@ -63,6 +64,7 @@ ssize_t single_hugepage_flag_show(struct kobject *kobj,
>  				  struct kobj_attribute *attr, char *buf,
>  				  enum transparent_hugepage_flag flag);
>  extern struct kobj_attribute shmem_enabled_attr;
> +extern struct kobj_attribute thpsize_shmem_enabled_attr;
>  
>  /*
>   * Mask of all large folio orders supported for anonymous THP; all orders up to
> @@ -265,6 +267,14 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>  	return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
>  }
>  
> +struct thpsize {
> +	struct kobject kobj;
> +	struct list_head node;
> +	int order;
> +};
> +
> +#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
> +
>  enum mthp_stat_item {
>  	MTHP_STAT_ANON_FAULT_ALLOC,
>  	MTHP_STAT_ANON_FAULT_FALLBACK,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9efb6fefc391..d3080a8843f2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -449,14 +449,6 @@ static void thpsize_release(struct kobject *kobj);
>  static DEFINE_SPINLOCK(huge_anon_orders_lock);
>  static LIST_HEAD(thpsize_list);
>  
> -struct thpsize {
> -	struct kobject kobj;
> -	struct list_head node;
> -	int order;
> -};
> -
> -#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
> -
>  static ssize_t thpsize_enabled_show(struct kobject *kobj,
>  				    struct kobj_attribute *attr, char *buf)
>  {
> @@ -517,6 +509,9 @@ static struct kobj_attribute thpsize_enabled_attr =
>  
>  static struct attribute *thpsize_attrs[] = {
>  	&thpsize_enabled_attr.attr,
> +#ifdef CONFIG_SHMEM
> +	&thpsize_shmem_enabled_attr.attr,
> +#endif
>  	NULL,
>  };
>  
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a383ea9a89a5..59cc26d44344 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -131,6 +131,14 @@ struct shmem_options {
>  #define SHMEM_SEEN_QUOTA 32
>  };
>  
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static unsigned long huge_anon_shmem_orders_always __read_mostly;
> +static unsigned long huge_anon_shmem_orders_madvise __read_mostly;
> +static unsigned long huge_anon_shmem_orders_inherit __read_mostly;
> +static unsigned long huge_anon_shmem_orders_within_size __read_mostly;
> +static DEFINE_SPINLOCK(huge_anon_shmem_orders_lock);
> +#endif
> +
>  #ifdef CONFIG_TMPFS
>  static unsigned long shmem_default_max_blocks(void)
>  {
> @@ -4687,6 +4695,12 @@ void __init shmem_init(void)
>  		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
>  	else
>  		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
> +
> +	/*
> +	 * Default to setting PMD-sized THP to inherit the global setting and
> +	 * disable all other multi-size THPs, when anonymous shmem uses mTHP.
> +	 */
> +	huge_anon_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
>  #endif
>  	return;
>  
> @@ -4746,6 +4760,11 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
>  			huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY)
>  		return -EINVAL;
>  
> +	/* Do not override huge allocation policy with non-PMD sized mTHP */
> +	if (huge == SHMEM_HUGE_FORCE &&
> +	    huge_anon_shmem_orders_inherit != BIT(HPAGE_PMD_ORDER))
> +		return -EINVAL;
> +
>  	shmem_huge = huge;
>  	if (shmem_huge > SHMEM_HUGE_DENY)
>  		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
> @@ -4753,6 +4772,83 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
>  }
>  
>  struct kobj_attribute shmem_enabled_attr = __ATTR_RW(shmem_enabled);
> +
> +static ssize_t thpsize_shmem_enabled_show(struct kobject *kobj,
> +					  struct kobj_attribute *attr, char *buf)
> +{
> +	int order = to_thpsize(kobj)->order;
> +	const char *output;
> +
> +	if (test_bit(order, &huge_anon_shmem_orders_always))
> +		output = "[always] inherit within_size advise never deny [force]";
> +	else if (test_bit(order, &huge_anon_shmem_orders_inherit))
> +		output = "always [inherit] within_size advise never deny force";
> +	else if (test_bit(order, &huge_anon_shmem_orders_within_size))
> +		output = "always inherit [within_size] advise never deny force";
> +	else if (test_bit(order, &huge_anon_shmem_orders_madvise))
> +		output = "always inherit within_size [advise] never deny force";
> +	else
> +		output = "always inherit within_size advise [never] [deny] force";
> +
> +	return sysfs_emit(buf, "%s\n", output);
> +}
> +
> +static ssize_t thpsize_shmem_enabled_store(struct kobject *kobj,
> +					   struct kobj_attribute *attr,
> +					   const char *buf, size_t count)
> +{
> +	int order = to_thpsize(kobj)->order;
> +	ssize_t ret = count;
> +
> +	if (sysfs_streq(buf, "always") || sysfs_streq(buf, "force")) {
> +		spin_lock(&huge_anon_shmem_orders_lock);
> +		clear_bit(order, &huge_anon_shmem_orders_inherit);
> +		clear_bit(order, &huge_anon_shmem_orders_madvise);
> +		clear_bit(order, &huge_anon_shmem_orders_within_size);
> +		set_bit(order, &huge_anon_shmem_orders_always);
> +		spin_unlock(&huge_anon_shmem_orders_lock);
> +	} else if (sysfs_streq(buf, "inherit")) {
> +		/* Do not override huge allocation policy with non-PMD sized mTHP */
> +		if (shmem_huge == SHMEM_HUGE_FORCE &&
> +		    order != HPAGE_PMD_ORDER)
> +			return -EINVAL;
> +
> +		spin_lock(&huge_anon_shmem_orders_lock);
> +		clear_bit(order, &huge_anon_shmem_orders_always);
> +		clear_bit(order, &huge_anon_shmem_orders_madvise);
> +		clear_bit(order, &huge_anon_shmem_orders_within_size);
> +		set_bit(order, &huge_anon_shmem_orders_inherit);
> +		spin_unlock(&huge_anon_shmem_orders_lock);
> +	} else if (sysfs_streq(buf, "within_size")) {
> +		spin_lock(&huge_anon_shmem_orders_lock);
> +		clear_bit(order, &huge_anon_shmem_orders_always);
> +		clear_bit(order, &huge_anon_shmem_orders_inherit);
> +		clear_bit(order, &huge_anon_shmem_orders_madvise);
> +		set_bit(order, &huge_anon_shmem_orders_within_size);
> +		spin_unlock(&huge_anon_shmem_orders_lock);
> +	} else if (sysfs_streq(buf, "madvise")) {
> +		spin_lock(&huge_anon_shmem_orders_lock);
> +		clear_bit(order, &huge_anon_shmem_orders_always);
> +		clear_bit(order, &huge_anon_shmem_orders_inherit);
> +		clear_bit(order, &huge_anon_shmem_orders_within_size);
> +		set_bit(order, &huge_anon_shmem_orders_madvise);
> +		spin_unlock(&huge_anon_shmem_orders_lock);
> +	} else if (sysfs_streq(buf, "never") || sysfs_streq(buf, "deny")) {
> +		spin_lock(&huge_anon_shmem_orders_lock);
> +		clear_bit(order, &huge_anon_shmem_orders_always);
> +		clear_bit(order, &huge_anon_shmem_orders_inherit);
> +		clear_bit(order, &huge_anon_shmem_orders_within_size);
> +		clear_bit(order, &huge_anon_shmem_orders_madvise);
> +		spin_unlock(&huge_anon_shmem_orders_lock);
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
> +struct kobj_attribute thpsize_shmem_enabled_attr =
> +	__ATTR(shmem_enabled, 0644, thpsize_shmem_enabled_show, thpsize_shmem_enabled_store);
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */
>  
>  #else /* !CONFIG_SHMEM */


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config
  2024-05-07 10:21   ` Ryan Roberts
@ 2024-05-08  2:13     ` Baolin Wang
  2024-05-08  9:06       ` Ryan Roberts
  0 siblings, 1 reply; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  2:13 UTC (permalink / raw)
  To: Ryan Roberts, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/7 18:21, Ryan Roberts wrote:
> On 06/05/2024 09:46, Baolin Wang wrote:
>> Move highest_order() and next_order() out of the CONFIG_TRANSPARENT_HUGEPAGE
>> ifdef so that they can be used as common helpers.
> 
> Sorry if I haven't kept up with the discussion, but why is this needed? I
> wouldn't expect a need to iterate over orders if THP is compile-time disabled
> because we will never try to allocate THP?

Because I don't want to add dummy functions in patch 6 just to avoid build
errors when CONFIG_TRANSPARENT_HUGEPAGE is not enabled. Another thought is
that the page cache can also allocate a large folio even when THP is not
enabled, so these helpers may be used there in the future (not sure though).

Anyway, I have no strong preference for this patch; the dummy functions
below would also work for me:
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c15bebb2cf53..7aa802ee2ce5 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -586,6 +586,16 @@ static inline bool thp_migration_supported(void)
  {
         return false;
  }
+
+static inline int highest_order(unsigned long orders)
+{
+        return 0;
+}
+
+static inline int next_order(unsigned long *orders, int prev)
+{
+        return 0;
+}
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-07 10:37   ` Ryan Roberts
@ 2024-05-08  3:44     ` Baolin Wang
  2024-05-08  7:15       ` David Hildenbrand
  2024-05-08  8:53       ` Ryan Roberts
  0 siblings, 2 replies; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  3:44 UTC (permalink / raw)
  To: Ryan Roberts, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/7 18:37, Ryan Roberts wrote:
> On 06/05/2024 09:46, Baolin Wang wrote:
>> Add support for establishing large folio mappings in finish_fault(), as
>> preparation for supporting multi-size THP allocation of anonymous shmem pages
>> in the following patches.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>   1 file changed, 33 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index eea6e4984eae..936377220b77 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>   {
>>   	struct vm_area_struct *vma = vmf->vma;
>>   	struct page *page;
>> +	struct folio *folio;
>>   	vm_fault_t ret;
>>   	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>   		      !(vma->vm_flags & VM_SHARED);
>> +	int type, nr_pages, i;
>> +	unsigned long addr = vmf->address;
>>   
>>   	/* Did we COW the page? */
>>   	if (is_cow)
>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>   			return VM_FAULT_OOM;
>>   	}
>>   
>> +	folio = page_folio(page);
>> +	nr_pages = folio_nr_pages(folio);
>> +
>> +	if (unlikely(userfaultfd_armed(vma))) {
>> +		nr_pages = 1;
>> +	} else if (nr_pages > 1) {
>> +		unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>> +		unsigned long end = start + nr_pages * PAGE_SIZE;
>> +
>> +		/* In case the folio size in page cache beyond the VMA limits. */
>> +		addr = max(start, vma->vm_start);
>> +		nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>> +
>> +		page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
> 
> I still don't really follow the logic in this else if block. Isn't it possible
> that finish_fault() gets called with a page from a folio that isn't aligned with
> vmf->address?
> 
> For example, let's say we have a file who's size is 64K and which is cached in a
> single large folio in the page cache. But the file is mapped into a process at
> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will calculate

For shmem, this doesn't happen because the VA is aligned with the 
hugepage size in the shmem_get_unmapped_area() function. See patch 7.

> start=0 and end=64K I think?

Yes. Unfortunately, some file systems that support large mappings do not
perform alignment for multi-size THP (non-PMD sized, for example 64K).
I think this requires modifying
__get_unmapped_area() ---> thp_get_unmapped_area_vmflags() or
file->f_op->get_unmapped_area() to align the VA for multi-size THP in the future.

So before adding those VA alignment changes, only allow building the
large folio mapping for anonymous shmem:

diff --git a/mm/memory.c b/mm/memory.c
index 936377220b77..9e4d51826d23 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4786,7 +4786,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
         folio = page_folio(page);
         nr_pages = folio_nr_pages(folio);

-       if (unlikely(userfaultfd_armed(vma))) {
+       if (unlikely(userfaultfd_armed(vma)) || !vma_is_anon_shmem(vma)) {
                 nr_pages = 1;
         } else if (nr_pages > 1) {
                unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);

> Additionally, I think this path will end up mapping the entire folio (as long as
> it fits in the VMA). But this bypasses the fault-around configuration. As I
> think I mentioned against the RFC, this will inflate the RSS of the process and
> can cause behavioural changes as a result. I believe the current advice is to
> disable fault-around to prevent this kind of bloat when needed.

With the above change, I do not think this is a problem, since users have
already opted in to mTHP for anonymous shmem.

> It might be that you need a special variant of finish_fault() for shmem?
> 
> 
>> +	}
>>   	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>> -				      vmf->address, &vmf->ptl);
>> +				       addr, &vmf->ptl);
>>   	if (!vmf->pte)
>>   		return VM_FAULT_NOPAGE;
>>   
>>   	/* Re-check under ptl */
>> -	if (likely(!vmf_pte_changed(vmf))) {
>> -		struct folio *folio = page_folio(page);
>> -		int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> -
>> -		set_pte_range(vmf, folio, page, 1, vmf->address);
>> -		add_mm_counter(vma->vm_mm, type, 1);
>> -		ret = 0;
>> -	} else {
>> -		update_mmu_tlb(vma, vmf->address, vmf->pte);
>> +	if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>> +		update_mmu_tlb(vma, addr, vmf->pte);
>> +		ret = VM_FAULT_NOPAGE;
>> +		goto unlock;
>> +	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>> +		for (i = 0; i < nr_pages; i++)
>> +			update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
>>   		ret = VM_FAULT_NOPAGE;
>> +		goto unlock;
>>   	}
>>   
>> +	set_pte_range(vmf, folio, page, nr_pages, addr);
>> +	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> +	add_mm_counter(vma->vm_mm, type, nr_pages);
>> +	ret = 0;
>> +
>> +unlock:
>>   	pte_unmap_unlock(vmf->pte, vmf->ptl);
>>   	return ret;
>>   }

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-07 10:52   ` Ryan Roberts
@ 2024-05-08  4:45     ` Baolin Wang
  2024-05-08  7:08       ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  4:45 UTC (permalink / raw)
  To: Ryan Roberts, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/7 18:52, Ryan Roberts wrote:
> On 06/05/2024 09:46, Baolin Wang wrote:
>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>> with a value similar to the top level 'shmem_enabled', which can be set to:
>> "always", "inherit (to inherit the top level setting)", "within_size", "advise",
>> "never", "deny", "force". These values follow the same semantics as the top
>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>> to 'always' to keep compatibility.
> 
> We decided at [1] to not allow 'force' for non-PMD-sizes.
> 
> [1]
> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
> 
> However, thinking about this a bit more, I wonder if the decision we made to
> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
> for legacy back compat after all, I don't think there is any use case where
> changing multiple mTHP size controls atomically is actually useful). Applying

Agree. This is also our usage of 'inherit'.

> that pattern here, it means the top level can always take "force" without any
> weird error checking. And we would allow "force" on the PMD-sized control but
> not on the others - again this is easy to error check.
> 
> Does this pattern make more sense? If so, is it too late to change
> hugepages-xxkB/enabled interface?

IMO, this sounds reasonable. Let's see what others think. David?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-07 10:20 ` Ryan Roberts
@ 2024-05-08  5:45   ` Baolin Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  5:45 UTC (permalink / raw)
  To: Ryan Roberts, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/7 18:20, Ryan Roberts wrote:
> On 06/05/2024 09:46, Baolin Wang wrote:
>> Anonymous pages have already been supported for multi-size (mTHP) allocation
>> through commit 19eaf44954df, that can allow THP to be configured through the
>> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>
>> However, the anonymous shared pages will ignore the anonymous mTHP rule
>> configured through the sysfs interface, and can only use the PMD-mapped
>> THP, that is not reasonable. Many implement anonymous page sharing through
>> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
>> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
>> also including the anonymous shared pages, in order to enjoy the benefits of
>> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
>> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
>>
>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>> which can have all the same values as the top-level
> 
> Didn't we agree that "force" would not be supported for now, and would return an
> error when attempting to set it for a non-PMD-size hugepage-XXkb/shmem_enabled
> (or indirectly through inheritance)?

Yes. Sorry, I did not explain it in detail in the cover letter. Please
see patch 5, which you have already commented on.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/8] mm: shmem: add mTHP support for anonymous shmem
  2024-05-07 10:46   ` kernel test robot
@ 2024-05-08  6:03     ` Baolin Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  6:03 UTC (permalink / raw)
  To: kernel test robot, akpm, hughd
  Cc: llvm, oe-kbuild-all, willy, david, ioworker0, wangkefeng.wang,
	ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, linux-mm,
	linux-kernel

Hi,

On 2024/5/7 18:46, kernel test robot wrote:
>>> mm/shmem.c:1780:10: warning: variable 'folio' is used uninitialized whenever 'while' loop exits because its condition is false [-Wsometimes-uninitialized]
>      1780 |                 while (suitable_orders) {
>           |                        ^~~~~~~~~~~~~~~
>     mm/shmem.c:1795:7: note: uninitialized use occurs here
>      1795 |         if (!folio)
>           |              ^~~~~
>     mm/shmem.c:1780:10: note: remove the condition if it is always true
>      1780 |                 while (suitable_orders) {
>           |                        ^~~~~~~~~~~~~~~
>           |                        1
>     mm/shmem.c:1750:21: note: initialize the variable 'folio' to silence this warning
>      1750 |         struct folio *folio;
>           |                            ^
>           |                             = NULL
>     mm/shmem.c:1564:20: warning: unused function 'shmem_show_mpol' [-Wunused-function]
>      1564 | static inline void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)

Thanks for reporting. I will add the change below to avoid the warning:
diff --git a/mm/shmem.c b/mm/shmem.c
index d603e36e0f4f..fd2cb2e73a21 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1747,7 +1747,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
         struct shmem_inode_info *info = SHMEM_I(inode);
         struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
         unsigned long suitable_orders;
-       struct folio *folio;
+       struct folio *folio = NULL;
         long pages;
         int error, order;

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08  4:45     ` Baolin Wang
@ 2024-05-08  7:08       ` David Hildenbrand
  2024-05-08  7:12         ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08  7:08 UTC (permalink / raw)
  To: Baolin Wang, Ryan Roberts, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 06:45, Baolin Wang wrote:
> 
> 
> On 2024/5/7 18:52, Ryan Roberts wrote:
>> On 06/05/2024 09:46, Baolin Wang wrote:
>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>> "always", "inherit (to inherit the top level setting)", "within_size", "advise",
>>> "never", "deny", "force". These values follow the same semantics as the top
>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>> to 'always' to keep compatibility.
>>
>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>
>> [1]
>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>
>> However, thinking about this a bit more, I wonder if the decision we made to
>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
>> for legacy back compat after all, I don't think there is any use case where
>> changing multiple mTHP size controls atomically is actually useful). Applying
> 
> Agree. This is also our usage of 'inherit'.
> 
>> that pattern here, it means the top level can always take "force" without any
>> weird error checking. And we would allow "force" on the PMD-sized control but
>> not on the others - again this is easy to error check.
>>
>> Does this pattern make more sense? If so, is it too late to change
>> hugepages-xxkB/enabled interface?
> 
> IMO, this sounds reasonable. Let's see what others think. David?

Likely too late, and we should try not to diverge too much from "enabled"
for "shmem_enabled".

That said, to me it's much cleaner to just have "inherit" for all
sizes and disallow invalid configurations, as discussed.

Error checking cannot be too hard unless I am missing something important :)

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08  7:08       ` David Hildenbrand
@ 2024-05-08  7:12         ` David Hildenbrand
  2024-05-08  9:02           ` Ryan Roberts
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08  7:12 UTC (permalink / raw)
  To: Baolin Wang, Ryan Roberts, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 09:08, David Hildenbrand wrote:
> On 08.05.24 06:45, Baolin Wang wrote:
>>
>>
>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>> "always", "inherit (to inherit the top level setting)", "within_size", "advise",
>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>>> to 'always' to keep compatibility.
>>>
>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>
>>> [1]
>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>
>>> However, thinking about this a bit more, I wonder if the decision we made to
>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
>>> for legacy back compat after all, I don't think there is any use case where
>>> changing multiple mTHP size controls atomically is actually useful). Applying
>>
>> Agree. This is also our usage of 'inherit'.

Missed that one: there might be use cases in the future once we start
defaulting to "inherit" for all knobs (a distro might default to that) and
default-enable THP in the global knob. Then it would be easy to disable any
THP by disabling the global knob. (I think that's the future we're heading
towards, where we'd have an "auto" mode that can be set on the global
toggle.)

But I am just making up use cases ;) I think it will be valuable and 
just doing it consistently now might be cleaner.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-08  3:44     ` Baolin Wang
@ 2024-05-08  7:15       ` David Hildenbrand
  2024-05-08  9:06         ` Baolin Wang
  2024-05-08  8:53       ` Ryan Roberts
  1 sibling, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08  7:15 UTC (permalink / raw)
  To: Baolin Wang, Ryan Roberts, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 05:44, Baolin Wang wrote:
> 
> 
> On 2024/5/7 18:37, Ryan Roberts wrote:
>> On 06/05/2024 09:46, Baolin Wang wrote:
>>> Add support for establishing large folio mappings in finish_fault(), as
>>> preparation for supporting multi-size THP allocation of anonymous shmem pages
>>> in the following patches.
>>>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>    mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>    1 file changed, 33 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index eea6e4984eae..936377220b77 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>    {
>>>    	struct vm_area_struct *vma = vmf->vma;
>>>    	struct page *page;
>>> +	struct folio *folio;
>>>    	vm_fault_t ret;
>>>    	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>    		      !(vma->vm_flags & VM_SHARED);
>>> +	int type, nr_pages, i;
>>> +	unsigned long addr = vmf->address;
>>>    
>>>    	/* Did we COW the page? */
>>>    	if (is_cow)
>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>    			return VM_FAULT_OOM;
>>>    	}
>>>    
>>> +	folio = page_folio(page);
>>> +	nr_pages = folio_nr_pages(folio);
>>> +
>>> +	if (unlikely(userfaultfd_armed(vma))) {
>>> +		nr_pages = 1;
>>> +	} else if (nr_pages > 1) {
>>> +		unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>> +		unsigned long end = start + nr_pages * PAGE_SIZE;
>>> +
>>> +		/* In case the folio size in page cache beyond the VMA limits. */
>>> +		addr = max(start, vma->vm_start);
>>> +		nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>> +
>>> +		page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>
>> I still don't really follow the logic in this else if block. Isn't it possible
>> that finish_fault() gets called with a page from a folio that isn't aligned with
>> vmf->address?
>>
>> For example, let's say we have a file who's size is 64K and which is cached in a
>> single large folio in the page cache. But the file is mapped into a process at
>> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will calculate
> 
> For shmem, this doesn't happen because the VA is aligned with the
> hugepage size in the shmem_get_unmapped_area() function. See patch 7.

Does that cover mremap() and MAP_FIXED as well?

We should try doing this as cleanly as possible, to prepare for the 
future / corner cases.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-08  3:44     ` Baolin Wang
  2024-05-08  7:15       ` David Hildenbrand
@ 2024-05-08  8:53       ` Ryan Roberts
  2024-05-08  9:31         ` Baolin Wang
  1 sibling, 1 reply; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08  8:53 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 04:44, Baolin Wang wrote:
> 
> 
> On 2024/5/7 18:37, Ryan Roberts wrote:
>> On 06/05/2024 09:46, Baolin Wang wrote:
>>> Add support for establishing large folio mappings in finish_fault(), as
>>> preparation for supporting multi-size THP allocation of anonymous shmem pages
>>> in the following patches.
>>>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>   mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>   1 file changed, 33 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index eea6e4984eae..936377220b77 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>   {
>>>       struct vm_area_struct *vma = vmf->vma;
>>>       struct page *page;
>>> +    struct folio *folio;
>>>       vm_fault_t ret;
>>>       bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>                 !(vma->vm_flags & VM_SHARED);
>>> +    int type, nr_pages, i;
>>> +    unsigned long addr = vmf->address;
>>>         /* Did we COW the page? */
>>>       if (is_cow)
>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>               return VM_FAULT_OOM;
>>>       }
>>>   +    folio = page_folio(page);
>>> +    nr_pages = folio_nr_pages(folio);
>>> +
>>> +    if (unlikely(userfaultfd_armed(vma))) {
>>> +        nr_pages = 1;
>>> +    } else if (nr_pages > 1) {
>>> +        unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>> +        unsigned long end = start + nr_pages * PAGE_SIZE;
>>> +
>>> +        /* In case the folio size in page cache beyond the VMA limits. */
>>> +        addr = max(start, vma->vm_start);
>>> +        nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>> +
>>> +        page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>
>> I still don't really follow the logic in this else if block. Isn't it possible
>> that finish_fault() gets called with a page from a folio that isn't aligned with
>> vmf->address?
>>
>> For example, let's say we have a file who's size is 64K and which is cached in a
>> single large folio in the page cache. But the file is mapped into a process at
>> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will calculate
> 
> For shmem, this doesn't happen because the VA is aligned with the hugepage size
> in the shmem_get_unmapped_area() function. See patch 7.

Certainly agree that shmem can always make sure that it packs a vma in a way
such that its folios are naturally aligned in VA when faulting in memory. If you
mremap it, that alignment will be lost; I don't think that would be a problem
for a single process; mremap will take care of moving the ptes correctly and
this path is not involved.

But what about the case when a process mmaps a shmem region, then forks, then
the child mremaps the shmem region. Then the parent faults in a THP into the
region (nicely aligned). Then the child faults in the same offset in the region
and gets the THP that the parent allocated; that THP will be aligned in the
parent's VM space but not in the child's.
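
Something like this hypothetical userspace sketch illustrates that flow
(not a test from the series; the sleep() is a crude stand-in for proper
parent/child synchronisation, and the mremap() target is merely
page-aligned, so almost certainly not THP-aligned):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	size_t len = 2UL << 20;		/* one PMD-sized extent */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	if (fork() == 0) {
		/* Child: move its copy of the shared region to an address
		 * that is page-aligned but not THP-aligned. */
		char *area = mmap(NULL, 2 * len, PROT_NONE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		char *q;

		if (area == MAP_FAILED)
			return 1;
		q = mremap(p, len, len, MREMAP_MAYMOVE | MREMAP_FIXED,
			   area + 4096);
		if (q == MAP_FAILED)
			return 1;
		sleep(1);	/* let the parent fault in the THP first */
		q[0] = 1;	/* same file offset, now VA-misaligned   */
		return 0;
	}

	p[0] = 1;		/* parent fault: folio allocated VA-aligned */
	wait(NULL);
	return 0;
}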

> 
>> start=0 and end=64K I think?
> 
> Yes. Unfortunately, some file systems that support large mappings do not perform
> alignment for multi-size THP (non-PMD sized, for example 64K). I think this
> requires modifying __get_unmapped_area() ---> thp_get_unmapped_area_vmflags()
> or file->f_op->get_unmapped_area() to align the VA for multi-size THP in the
> future.

By nature of the fact that a file mapping is shared between multiple processes
and each process can map it wherever it wants down to 1-page granularity, it's
impossible for any THP containing a part of that file to be VA-aligned in every
process it is mapped in.

> 
> So before adding those VA alignment changes, only allow building the large folio
> mapping for anonymous shmem:
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 936377220b77..9e4d51826d23 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4786,7 +4786,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>         folio = page_folio(page);
>         nr_pages = folio_nr_pages(folio);
> 
> -       if (unlikely(userfaultfd_armed(vma))) {
> +       if (unlikely(userfaultfd_armed(vma)) || !vma_is_anon_shmem(vma)) {

If the above theoretical flow for fork & mremap is valid, then I don't think
this is sufficient.

>                 nr_pages = 1;
>         } else if (nr_pages > 1) {
>                 unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
> 
>> Additionally, I think this path will end up mapping the entire folio (as long as
>> it fits in the VMA). But this bypasses the fault-around configuration. As I
>> think I mentioned against the RFC, this will inflate the RSS of the process and
>> can cause behavioural changes as a result. I believe the current advice is to
>> disable fault-around to prevent this kind of bloat when needed.
> 
> With the above change, I do not think this is a problem, since users have already
> opted in to mTHP for anonymous shmem.
> 
>> It might be that you need a special variant of finish_fault() for shmem?
>>
>>
>>> +    }
>>>       vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>>> -                      vmf->address, &vmf->ptl);
>>> +                       addr, &vmf->ptl);
>>>       if (!vmf->pte)
>>>           return VM_FAULT_NOPAGE;
>>>         /* Re-check under ptl */
>>> -    if (likely(!vmf_pte_changed(vmf))) {
>>> -        struct folio *folio = page_folio(page);
>>> -        int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>>> -
>>> -        set_pte_range(vmf, folio, page, 1, vmf->address);
>>> -        add_mm_counter(vma->vm_mm, type, 1);
>>> -        ret = 0;
>>> -    } else {
>>> -        update_mmu_tlb(vma, vmf->address, vmf->pte);
>>> +    if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>>> +        update_mmu_tlb(vma, addr, vmf->pte);
>>> +        ret = VM_FAULT_NOPAGE;
>>> +        goto unlock;
>>> +    } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>>> +        for (i = 0; i < nr_pages; i++)
>>> +            update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
>>>           ret = VM_FAULT_NOPAGE;
>>> +        goto unlock;
>>>       }
>>>   +    set_pte_range(vmf, folio, page, nr_pages, addr);
>>> +    type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>>> +    add_mm_counter(vma->vm_mm, type, nr_pages);
>>> +    ret = 0;
>>> +
>>> +unlock:
>>>       pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>       return ret;
>>>   }


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08  7:12         ` David Hildenbrand
@ 2024-05-08  9:02           ` Ryan Roberts
  2024-05-08  9:56             ` Baolin Wang
  2024-05-08 12:02             ` David Hildenbrand
  0 siblings, 2 replies; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08  9:02 UTC (permalink / raw)
  To: David Hildenbrand, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 08:12, David Hildenbrand wrote:
> On 08.05.24 09:08, David Hildenbrand wrote:
>> On 08.05.24 06:45, Baolin Wang wrote:
>>>
>>>
>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>> "advise",
>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>>>> to 'always' to keep compatibility.
>>>>
>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>
>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
>>>> for legacy back compat after all, I don't think there is any use case where
>>>> changing multiple mTHP size controls atomically is actually useful). Applying
>>>
>>> Agree. This is also our usage of 'inherit'.
> 
> Missed that one: there might be use cases in the future once we would start
> defaulting to "inherit" for all knobs (a distro might default to that) and
> default-enable THP in the global knob. Then, it would be easy to disable any THP
> by disabling the global knob. (I think that's the future we're heading to, where
> we'd have an "auto" mode that can be set on the global toggle).
> 
> But I am just making up use cases ;) I think it will be valuable and just doing
> it consistently now might be cleaner.

I agree that consistency between enabled and shmem_enabled is top priority. And
yes, I had forgotten about the glorious "auto" future. So probably continuing to
allow all sizes to select "inherit" is best.

But for shmem_enabled, that means we need the following error checking:

 - It is an error to set "force" for any size except PMD-size

 - It is an error to set "force" for the global control if any size except PMD-
   size is set to "inherit"

 - It is an error to set "inherit" for any size except PMD-size if the global
   control is set to "force".

Certainly not too difficult to code and prove to be correct, but not the nicest
UX from the user's point of view when they start seeing errors.
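
A rough sketch of those three checks, using hypothetical helper and
constant names (SHMEM_HUGE_INHERIT is made up here; this is not the
actual patch):

/* Validating a write to a per-size shmem_enabled control (rules 1, 3). */
static int check_per_size_val(bool is_pmd_size, int new_val, int global_val)
{
	if (new_val == SHMEM_HUGE_FORCE && !is_pmd_size)
		return -EINVAL;
	if (new_val == SHMEM_HUGE_INHERIT && !is_pmd_size &&
	    global_val == SHMEM_HUGE_FORCE)
		return -EINVAL;
	return 0;
}

/* Validating a write to the global control (rule 2). */
static int check_global_val(int new_val, bool any_non_pmd_inherits)
{
	if (new_val == SHMEM_HUGE_FORCE && any_non_pmd_inherits)
		return -EINVAL;
	return 0;
}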

I think we previously said this would likely be temporary, and if/when tmpfs
gets mTHP support, we could simplify and allow all sizes to be set to "force".
But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
more suited to the approach the page cache takes to transparently ramp up the
folio size as it faults more in. (Just saying there is a chance that this error
checking becomes permanent).


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config
  2024-05-08  2:13     ` Baolin Wang
@ 2024-05-08  9:06       ` Ryan Roberts
  2024-05-08  9:40         ` Baolin Wang
  0 siblings, 1 reply; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08  9:06 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 03:13, Baolin Wang wrote:
> 
> 
> On 2024/5/7 18:21, Ryan Roberts wrote:
>> On 06/05/2024 09:46, Baolin Wang wrote:
>>> Move highest_order() and next_order() out of the CONFIG_TRANSPARENT_HUGEPAGE
>>> macro, which can be common functions to be used.
>>
>> Sorry if I haven't kept up with the discussion, but why is this needed? I
>> wouldn't expect a need to iterate over orders if THP is compile-time disabled
>> because we will never try to allocate THP?
> 
> Because I don't want to add dummy functions just to avoid build errors when
> CONFIG_TRANSPARENT_HUGEPAGE is not enabled in patch 6. Another thought is that
> the pagecache can also allocate a large folio even when THP is not enabled, so
> these helpers may be used in the future (not sure though).

OK, I'll admit I haven't looked at the later patches yet - I'd like to conclude
on the interface and mapping/alignment strategy first.

But it wasn't necessary to access these functions for the anon/private case
without CONFIG_TRANSPARENT_HUGEPAGE, so I'm wondering why it's needed for the
shmem case. I would expect that they don't need to be defined at all.

> 
> Anyway, I also have no strong perference for this patch, below dummy functions
> can also work for me:
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index c15bebb2cf53..7aa802ee2ce5 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -586,6 +586,16 @@ static inline bool thp_migration_supported(void)
>  {
>         return false;
>  }
> +
> +static inline int highest_order(unsigned long orders)
> +{
> +        return 0;
> +}
> +
> +static inline int next_order(unsigned long *orders, int prev)
> +{
> +        return 0;
> +}
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-08  7:15       ` David Hildenbrand
@ 2024-05-08  9:06         ` Baolin Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  9:06 UTC (permalink / raw)
  To: David Hildenbrand, Ryan Roberts, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/8 15:15, David Hildenbrand wrote:
> On 08.05.24 05:44, Baolin Wang wrote:
>>
>>
>> On 2024/5/7 18:37, Ryan Roberts wrote:
>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>> Add large folio mapping establishment support for finish_fault() as 
>>>> a preparation,
>>>> to support multi-size THP allocation of anonymous shmem pages in the 
>>>> following
>>>> patches.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>    mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>>    1 file changed, 33 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index eea6e4984eae..936377220b77 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>    {
>>>>        struct vm_area_struct *vma = vmf->vma;
>>>>        struct page *page;
>>>> +    struct folio *folio;
>>>>        vm_fault_t ret;
>>>>        bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>>                  !(vma->vm_flags & VM_SHARED);
>>>> +    int type, nr_pages, i;
>>>> +    unsigned long addr = vmf->address;
>>>>        /* Did we COW the page? */
>>>>        if (is_cow)
>>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>                return VM_FAULT_OOM;
>>>>        }
>>>> +    folio = page_folio(page);
>>>> +    nr_pages = folio_nr_pages(folio);
>>>> +
>>>> +    if (unlikely(userfaultfd_armed(vma))) {
>>>> +        nr_pages = 1;
>>>> +    } else if (nr_pages > 1) {
>>>> +        unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * 
>>>> PAGE_SIZE);
>>>> +        unsigned long end = start + nr_pages * PAGE_SIZE;
>>>> +
>>>> +        /* In case the folio size in page cache beyond the VMA 
>>>> limits. */
>>>> +        addr = max(start, vma->vm_start);
>>>> +        nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>>> +
>>>> +        page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>>
>>> I still don't really follow the logic in this else if block. Isn't it 
>>> possible
>>> that finish_fault() gets called with a page from a folio that isn't 
>>> aligned with
>>> vmf->address?
>>>
>>> For example, let's say we have a file who's size is 64K and which is 
>>> cached in a
>>> single large folio in the page cache. But the file is mapped into a 
>>> process at
>>> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You 
>>> will calculate
>>
>> For shmem, this doesn't happen because the VA is aligned with the
>> hugepage size in the shmem_get_unmapped_area() function. See patch 7.
> 
> Does that cover mremap() and MAP_FIXED as well?

Good point. Thanks for pointing this out.

> We should try doing this as cleanly as possible, to prepare for the 
> future / corner cases.

Sure. Let me re-think about the algorithm.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-08  8:53       ` Ryan Roberts
@ 2024-05-08  9:31         ` Baolin Wang
  2024-05-08 10:47           ` Ryan Roberts
  0 siblings, 1 reply; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  9:31 UTC (permalink / raw)
  To: Ryan Roberts, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/8 16:53, Ryan Roberts wrote:
> On 08/05/2024 04:44, Baolin Wang wrote:
>>
>>
>> On 2024/5/7 18:37, Ryan Roberts wrote:
>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>> Add large folio mapping establishment support for finish_fault() as a
>>>> preparation,
>>>> to support multi-size THP allocation of anonymous shmem pages in the following
>>>> patches.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>    mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>>    1 file changed, 33 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index eea6e4984eae..936377220b77 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>    {
>>>>        struct vm_area_struct *vma = vmf->vma;
>>>>        struct page *page;
>>>> +    struct folio *folio;
>>>>        vm_fault_t ret;
>>>>        bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>>                  !(vma->vm_flags & VM_SHARED);
>>>> +    int type, nr_pages, i;
>>>> +    unsigned long addr = vmf->address;
>>>>          /* Did we COW the page? */
>>>>        if (is_cow)
>>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>                return VM_FAULT_OOM;
>>>>        }
>>>>    +    folio = page_folio(page);
>>>> +    nr_pages = folio_nr_pages(folio);
>>>> +
>>>> +    if (unlikely(userfaultfd_armed(vma))) {
>>>> +        nr_pages = 1;
>>>> +    } else if (nr_pages > 1) {
>>>> +        unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>>> +        unsigned long end = start + nr_pages * PAGE_SIZE;
>>>> +
>>>> +        /* In case the folio size in page cache beyond the VMA limits. */
>>>> +        addr = max(start, vma->vm_start);
>>>> +        nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>>> +
>>>> +        page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>>
>>> I still don't really follow the logic in this else if block. Isn't it possible
>>> that finish_fault() gets called with a page from a folio that isn't aligned with
>>> vmf->address?
>>>
>>> For example, let's say we have a file who's size is 64K and which is cached in a
>>> single large folio in the page cache. But the file is mapped into a process at
>>> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will calculate
>>
>> For shmem, this doesn't happen because the VA is aligned with the hugepage size
>> in the shmem_get_unmapped_area() function. See patch 7.
> 
> Certainly agree that shmem can always make sure that it packs a vma in a way
> such that its folios are naturally aligned in VA when faulting in memory. If you
> mremap it, that alignment will be lost; I don't think that would be a problem

When mremapping it, it will also call shmem_get_unmapped_area() to align 
the VA, but for mremap() with the MAP_FIXED flag, as David pointed out, 
yes, this patch may not work perfectly.

> for a single process; mremap will take care of moving the ptes correctly and
> this path is not involved.
> 
> But what about the case when a process mmaps a shmem region, then forks, then
> the child mremaps the shmem region. Then the parent faults in a THP into the
> region (nicely aligned). Then the child faults in the same offset in the region
> and gets the THP that the parent allocated; that THP will be aligned in the
> parent's VM space but not in the child's.

Sorry, I did not get your point here. IIUC, the child's VA will also be 
aligned if the child's mremap() does not use MAP_FIXED, since the 
child's mremap will still call shmem_get_unmapped_area() to find an 
aligned new VA. Please correct me if I missed your point.

>>> start=0 and end=64K I think?
>>
>> Yes. Unfortunately, some file systems that support large mappings do not perform
>> alignment for multi-size THP (non-PMD sized, for example: 64K). I think this
>> requires modification to __get_unmapped_area--->thp_get_unmapped_area_vmflags()
>> or file->f_op->get_unmapped_area() to align VA for multi-size THP in future.
> 
> By nature of the fact that a file mapping is shared between multiple processes
> and each process can map it wherever it wants down to 1 page granularity, it's
> impossible for any THP containing a part of that file to be VA-aligned in every
> process it is mapped in.

Yes, so let me re-polish this patch. Thanks.

>> So before adding that VA alignment changes, only allow building the large folio
>> mapping for anonymous shmem:
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 936377220b77..9e4d51826d23 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4786,7 +4786,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>          folio = page_folio(page);
>>          nr_pages = folio_nr_pages(folio);
>>
>> -       if (unlikely(userfaultfd_armed(vma))) {
>> +       if (unlikely(userfaultfd_armed(vma)) || !vma_is_anon_shmem(vma)) {
> 
> If the above theoretical flow for fork & mremap is valid, then I don't think
> this is sufficient.
> 
>>                  nr_pages = 1;
>>          } else if (nr_pages > 1) {
>>                  unsigned long start = ALIGN_DOWN(vmf->address, nr_pages *
>> PAGE_SIZE);
>>
>>> Additionally, I think this path will end up mapping the entire folio (as long as
>>> it fits in the VMA). But this bypasses the fault-around configuration. As I
>>> think I mentioned against the RFC, this will inflate the RSS of the process and
>>> can cause behavioural changes as a result. I believe the current advice is to
>>> disable fault-around to prevent this kind of bloat when needed.
>>
>> With the above change, I do not think this is a problem, since users already
>> want to use mTHP for anonymous shmem.
>>
>>> It might be that you need a special variant of finish_fault() for shmem?
>>>
>>>
>>>> +    }
>>>>        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>>>> -                      vmf->address, &vmf->ptl);
>>>> +                       addr, &vmf->ptl);
>>>>        if (!vmf->pte)
>>>>            return VM_FAULT_NOPAGE;
>>>>          /* Re-check under ptl */
>>>> -    if (likely(!vmf_pte_changed(vmf))) {
>>>> -        struct folio *folio = page_folio(page);
>>>> -        int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>>>> -
>>>> -        set_pte_range(vmf, folio, page, 1, vmf->address);
>>>> -        add_mm_counter(vma->vm_mm, type, 1);
>>>> -        ret = 0;
>>>> -    } else {
>>>> -        update_mmu_tlb(vma, vmf->address, vmf->pte);
>>>> +    if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>>>> +        update_mmu_tlb(vma, addr, vmf->pte);
>>>> +        ret = VM_FAULT_NOPAGE;
>>>> +        goto unlock;
>>>> +    } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>>>> +        for (i = 0; i < nr_pages; i++)
>>>> +            update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
>>>>            ret = VM_FAULT_NOPAGE;
>>>> +        goto unlock;
>>>>        }
>>>>    +    set_pte_range(vmf, folio, page, nr_pages, addr);
>>>> +    type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>>>> +    add_mm_counter(vma->vm_mm, type, nr_pages);
>>>> +    ret = 0;
>>>> +
>>>> +unlock:
>>>>        pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>>        return ret;
>>>>    }

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config
  2024-05-08  9:06       ` Ryan Roberts
@ 2024-05-08  9:40         ` Baolin Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  9:40 UTC (permalink / raw)
  To: Ryan Roberts, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/8 17:06, Ryan Roberts wrote:
> On 08/05/2024 03:13, Baolin Wang wrote:
>>
>>
>> On 2024/5/7 18:21, Ryan Roberts wrote:
>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>> Move highest_order() and next_order() out of the CONFIG_TRANSPARENT_HUGEPAGE
>>>> macro, which can be common functions to be used.
>>>
>>> Sorry if I haven't kept up with the discussion, but why is this needed? I
>>> wouldn't expect a need to iterate over orders if THP is compile-time disabled
>>> because we will never try to allocate THP?
>>
>> Because I don't want to add dummy functions just to avoid build errors when
>> CONFIG_TRANSPARENT_HUGEPAGE is not enabled in patch 6. Another thought is that
>> the pagecache can also allocate a large folio even when THP is not enabled, so
>> these helpers may be used in the future (not sure though).
> 
> OK, I'll admit I haven't looked at the later patches yet - I'd like to conclude
> on the interface and mapping/alignment strategy first.
> 
> But it wasn't necessary to access these functions for the anon/private case
> without CONFIG_TRANSPARENT_HUGEPAGE, so I'm wondering why it's needed for the
> shmem case. I would expect that they don't need to be defined at all.

Currently in the shmem_alloc_and_add_folio() function, the hugepage 
allocation is not guarded by '#ifdef CONFIG_TRANSPARENT_HUGEPAGE', but 
rather by 'IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)', which can lead to 
some build errors when CONFIG_TRANSPARENT_HUGEPAGE is not enabled. 
However, this is not a big issue, and I will make some adjustments to 
avoid defining dummy functions.
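
To illustrate the difference in a standalone way (CONFIG_THP and
highest_order() below are stand-ins for the real kernel symbols): an
IS_ENABLED()-style condition is still compiled and only later dead-code
eliminated, so every function it calls must be declared, while '#ifdef'd
code is discarded by the preprocessor:

#include <stdio.h>

#define IS_ENABLED(option)	(option)
#define CONFIG_THP		0  /* pretend CONFIG_TRANSPARENT_HUGEPAGE=n */

static int highest_order(unsigned long orders)
{
	return 63 - __builtin_clzl(orders);
}

int main(void)
{
	unsigned long orders = 1UL << 4;
	int order = -1;

	/* Compiled in both configurations: highest_order() must exist even
	 * though this branch is dead when CONFIG_THP is 0. */
	if (IS_ENABLED(CONFIG_THP))
		order = highest_order(orders);

#if CONFIG_THP
	/* Discarded by the preprocessor when CONFIG_THP is 0, so no
	 * definition of highest_order() would be needed for this line. */
	order = highest_order(orders);
#endif

	printf("order=%d\n", order);
	return 0;
}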

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08  9:02           ` Ryan Roberts
@ 2024-05-08  9:56             ` Baolin Wang
  2024-05-08 10:48               ` Ryan Roberts
  2024-05-08 12:02             ` David Hildenbrand
  1 sibling, 1 reply; 52+ messages in thread
From: Baolin Wang @ 2024-05-08  9:56 UTC (permalink / raw)
  To: Ryan Roberts, David Hildenbrand, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/8 17:02, Ryan Roberts wrote:
> On 08/05/2024 08:12, David Hildenbrand wrote:
>> On 08.05.24 09:08, David Hildenbrand wrote:
>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>> "advise",
>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>>>>> to 'always' to keep compatibility.
>>>>>
>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>
>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>> changing multiple mTHP size controls atomically is actually useful). Applying
>>>>
>>>> Agree. This is also our usage of 'inherit'.
>>
>> Missed that one: there might be use cases in the future once we would start
>> defaulting to "inherit" for all knobs (a distro might default to that) and
>> default-enable THP in the global knob. Then, it would be easy to disable any THP
>> by disabling the global knob. (I think that's the future we're heading to, where
>> we'd have an "auto" mode that can be set on the global toggle).
>>
>> But I am just making up use cases ;) I think it will be valuable and just doing
>> it consistently now might be cleaner.
> 
> I agree that consistency between enabled and shmem_enabled is top priority. And
> yes, I had forgotten about the glorious "auto" future. So probably continuing
> all sizes to select "inherit" is best.
> 
> But for shmem_enabled, that means we need the following error checking:
> 
>   - It is an error to set "force" for any size except PMD-size
> 
>   - It is an error to set "force" for the global control if any size except PMD-
>     size is set to "inherit"
> 
>   - It is an error to set "inherit" for any size except PMD-size if the global
>     control is set to "force".
> 
> Certainly not too difficult to code and prove to be correct, but not the nicest
> UX from the user's point of view when they start seeing errors.
> 
> I think we previously said this would likely be temporary, and if/when tmpfs
> gets mTHP support, we could simplify and allow all sizes to be set to "force".
> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
> more suited to the approach the page cache takes to transparently ramp up the
> folio size as it faults more in. (Just saying there is a chance that this error
> checking becomes permanent).

The strategy for tmpfs supporting mTHP will require more discussion and 
evaluation in the future. However, regardless of the strategy (explicit 
mTHP control or page cache control), I think it would be possible to use 
'force' to override previous strategies for testing purposes. This 
appears to be permissible according to the explanation in the current 
documentation: "force the huge option on for all - very useful for 
testing". So it seems this would not be permanent?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-08  9:31         ` Baolin Wang
@ 2024-05-08 10:47           ` Ryan Roberts
  2024-05-09  1:10             ` Baolin Wang
  0 siblings, 1 reply; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08 10:47 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 10:31, Baolin Wang wrote:
> 
> 
> On 2024/5/8 16:53, Ryan Roberts wrote:
>> On 08/05/2024 04:44, Baolin Wang wrote:
>>>
>>>
>>> On 2024/5/7 18:37, Ryan Roberts wrote:
>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>> Add large folio mapping establishment support for finish_fault() as a
>>>>> preparation,
>>>>> to support multi-size THP allocation of anonymous shmem pages in the following
>>>>> patches.
>>>>>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> ---
>>>>>    mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>>>    1 file changed, 33 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>> index eea6e4984eae..936377220b77 100644
>>>>> --- a/mm/memory.c
>>>>> +++ b/mm/memory.c
>>>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>>    {
>>>>>        struct vm_area_struct *vma = vmf->vma;
>>>>>        struct page *page;
>>>>> +    struct folio *folio;
>>>>>        vm_fault_t ret;
>>>>>        bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>>>                  !(vma->vm_flags & VM_SHARED);
>>>>> +    int type, nr_pages, i;
>>>>> +    unsigned long addr = vmf->address;
>>>>>          /* Did we COW the page? */
>>>>>        if (is_cow)
>>>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>>                return VM_FAULT_OOM;
>>>>>        }
>>>>>    +    folio = page_folio(page);
>>>>> +    nr_pages = folio_nr_pages(folio);
>>>>> +
>>>>> +    if (unlikely(userfaultfd_armed(vma))) {
>>>>> +        nr_pages = 1;
>>>>> +    } else if (nr_pages > 1) {
>>>>> +        unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>>>> +        unsigned long end = start + nr_pages * PAGE_SIZE;
>>>>> +
>>>>> +        /* In case the folio size in page cache beyond the VMA limits. */
>>>>> +        addr = max(start, vma->vm_start);
>>>>> +        nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>>>> +
>>>>> +        page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>>>
>>>> I still don't really follow the logic in this else if block. Isn't it possible
>>>> that finish_fault() gets called with a page from a folio that isn't aligned
>>>> with
>>>> vmf->address?
>>>>
>>>> For example, let's say we have a file who's size is 64K and which is cached
>>>> in a
>>>> single large folio in the page cache. But the file is mapped into a process at
>>>> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will
>>>> calculate
>>>
>>> For shmem, this doesn't happen because the VA is aligned with the hugepage size
>>> in the shmem_get_unmapped_area() function. See patch 7.
>>
>> Certainly agree that shmem can always make sure that it packs a vma in a way
>> such that its folios are naturally aligned in VA when faulting in memory. If you
>> mremap it, that alignment will be lost; I don't think that would be a problem
> 
> When mremapping it, it will also call shmem_get_unmapped_area() to align the
> VA, but for mremap() with the MAP_FIXED flag, as David pointed out, yes, this
> patch may not work perfectly.

Assuming this works similarly to anon mTHP, remapping to an arbitrary address
shouldn't be a problem within a single process; the previously allocated folios
will now be unaligned, but they will be correctly mapped so it doesn't break
anything. And new faults will allocate folios so that they are as large as
allowed by the sysfs interface AND which do not overlap with any non-none pte
AND which are naturally aligned. It's when you start sharing with other
processes that the fun and games start...
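
A simplified sketch of that allocation rule (hypothetical names; the
real logic lives in the fault path): take the largest enabled order
whose naturally aligned range fits in the VMA and covers only none ptes:

#include <stdbool.h>

#define PAGE_SIZE	4096UL

static int pick_order(unsigned long addr, unsigned long vm_start,
		      unsigned long vm_end, unsigned long enabled_orders,
		      bool (*range_is_none)(unsigned long start,
					    unsigned long end))
{
	if (!enabled_orders)
		return 0;

	for (int order = 63 - __builtin_clzl(enabled_orders); order > 0;
	     order--) {
		unsigned long size, start;

		if (!(enabled_orders & (1UL << order)))
			continue;
		size = (1UL << order) * PAGE_SIZE;
		start = addr & ~(size - 1);	/* natural alignment */
		if (start < vm_start || start + size > vm_end)
			continue;		/* exceeds the VMA limits */
		if (!range_is_none(start, start + size))
			continue;		/* overlaps mapped ptes */
		return order;
	}
	return 0;				/* fall back to a single page */
}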

> 
>> for a single process; mremap will take care of moving the ptes correctly and
>> this path is not involved.
>>
>> But what about the case when a process mmaps a shmem region, then forks, then
>> the child mremaps the shmem region. Then the parent faults in a THP into the
>> region (nicely aligned). Then the child faults in the same offset in the region
>> and gets the THP that the parent allocated; that THP will be aligned in the
>> parent's VM space but not in the child's.
> 
> Sorry, I did not get your point here. IIUC, the child's VA will also be aligned
> if the child's mremap() does not use MAP_FIXED, since the child's mremap will
> still call shmem_get_unmapped_area() to find an aligned new VA.

In general, you shouldn't be relying on the vma bounds being aligned to a THP
boundary.

> Please correct me if I missed your point.

(I'm not 100% sure this is definitely how it works, but it seems the only sane
way to me):

Let's imagine we have a process that maps 4 pages of shared anon memory at VA=64K:

  mmap(64K, 16K, PROT_X, MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, ...)

Then it forks a child, and the child moves the mapping to VA=68K:

  mremap(64K, 16K, 16K, MREMAP_FIXED | MREMAP_MAYMOVE, 68K)

Then the parent writes to address 64K (offset 0 in the shared region); this will
fault and cause a 16K mTHP to be allocated and mapped, covering the whole region
at 64K-80K in the parent.

Then the child reads address 68K (offset 0 in the shared region); this will
fault and cause the previously allocated 16K folio to be looked up and it must
be mapped in the child between 68K-84K. This is not naturally aligned in the child.

For the child, your code will incorrectly calculate start/end as 64K-80K.
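
To make the arithmetic concrete, here is a sketch that plugs the values
from this example into the patch's calculation (PAGE_SIZE assumed 4K):

#include <stdio.h>

#define PAGE_SIZE		4096UL
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

int main(void)
{
	unsigned long vm_start = 68 << 10, vm_end = 84 << 10; /* child VMA */
	unsigned long address = 68 << 10;	/* fault at region offset 0 */
	unsigned long nr_pages = 4;		/* 16K folio */
	unsigned long start = ALIGN_DOWN(address, nr_pages * PAGE_SIZE);
	unsigned long end = start + nr_pages * PAGE_SIZE;
	unsigned long addr = start > vm_start ? start : vm_start;
	unsigned long nr = ((end < vm_end ? end : vm_end) - addr) / PAGE_SIZE;
	unsigned long first = (addr - start) / PAGE_SIZE;

	/* Prints start=64K end=80K addr=68K nr=3 first=1: VA 68K holds file
	 * offset 0, yet folio page 1 would be mapped there. */
	printf("start=%luK end=%luK addr=%luK nr=%lu first=%lu\n",
	       start >> 10, end >> 10, addr >> 10, nr, first);
	return 0;
}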

> 
>>>> start=0 and end=64K I think?
>>>
>>> Yes. Unfortunately, some file systems that support large mappings do not perform
>>> alignment for multi-size THP (non-PMD sized, for example: 64K). I think this
>>> requires modification to __get_unmapped_area--->thp_get_unmapped_area_vmflags()
>>> or file->f_op->get_unmapped_area() to align VA for multi-size THP in future.
>>
>> By nature of the fact that a file mapping is shared between multiple processes
>> and each process can map it where ever it wants down to 1 page granularity, its
>> impossible for any THP containing a part of that file to be VA-aligned in every
>> process it is mapped in.
> 
> Yes, so let me re-polish this patch. Thanks.
> 
>>> So before adding that VA alignment changes, only allow building the large folio
>>> mapping for anonymous shmem:
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 936377220b77..9e4d51826d23 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4786,7 +4786,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>          folio = page_folio(page);
>>>          nr_pages = folio_nr_pages(folio);
>>>
>>> -       if (unlikely(userfaultfd_armed(vma))) {
>>> +       if (unlikely(userfaultfd_armed(vma)) || !vma_is_anon_shmem(vma)) {
>>
>> If the above theoretical flow for fork & mremap is valid, then I don't think
>> this is sufficient.
>>
>>>                  nr_pages = 1;
>>>          } else if (nr_pages > 1) {
>>>                  unsigned long start = ALIGN_DOWN(vmf->address, nr_pages *
>>> PAGE_SIZE);
>>>
>>>> Additionally, I think this path will end up mapping the entire folio (as
>>>> long as
>>>> it fits in the VMA). But this bypasses the fault-around configuration. As I
>>>> think I mentioned against the RFC, this will inflate the RSS of the process and
>>>> can cause behavioural changes as a result. I believe the current advice is to
>>>> disable fault-around to prevent this kind of bloat when needed.
>>>
>>> With the above change, I do not think this is a problem, since users already
>>> want to use mTHP for anonymous shmem.
>>>
>>>> It might be that you need a special variant of finish_fault() for shmem?
>>>>
>>>>
>>>>> +    }
>>>>>        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>>>>> -                      vmf->address, &vmf->ptl);
>>>>> +                       addr, &vmf->ptl);
>>>>>        if (!vmf->pte)
>>>>>            return VM_FAULT_NOPAGE;
>>>>>          /* Re-check under ptl */
>>>>> -    if (likely(!vmf_pte_changed(vmf))) {
>>>>> -        struct folio *folio = page_folio(page);
>>>>> -        int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>>>>> -
>>>>> -        set_pte_range(vmf, folio, page, 1, vmf->address);
>>>>> -        add_mm_counter(vma->vm_mm, type, 1);
>>>>> -        ret = 0;
>>>>> -    } else {
>>>>> -        update_mmu_tlb(vma, vmf->address, vmf->pte);
>>>>> +    if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>>>>> +        update_mmu_tlb(vma, addr, vmf->pte);
>>>>> +        ret = VM_FAULT_NOPAGE;
>>>>> +        goto unlock;
>>>>> +    } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>>>>> +        for (i = 0; i < nr_pages; i++)
>>>>> +            update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
>>>>>            ret = VM_FAULT_NOPAGE;
>>>>> +        goto unlock;
>>>>>        }
>>>>>    +    set_pte_range(vmf, folio, page, nr_pages, addr);
>>>>> +    type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>>>>> +    add_mm_counter(vma->vm_mm, type, nr_pages);
>>>>> +    ret = 0;
>>>>> +
>>>>> +unlock:
>>>>>        pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>>>        return ret;
>>>>>    }


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08  9:56             ` Baolin Wang
@ 2024-05-08 10:48               ` Ryan Roberts
  0 siblings, 0 replies; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08 10:48 UTC (permalink / raw)
  To: Baolin Wang, David Hildenbrand, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 10:56, Baolin Wang wrote:
> 
> 
> On 2024/5/8 17:02, Ryan Roberts wrote:
>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>> "advise",
>>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>>>>>> to 'always' to keep compatibility.
>>>>>>
>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>
>>>>>> [1]
>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>
>>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong
>>>>>> one.
>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is
>>>>>> just
>>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>>> changing multiple mTHP size controls atomically is actually useful). Applying
>>>>>
>>>>> Agree. This is also our usage of 'inherit'.
>>>
>>> Missed that one: there might be use cases in the future once we would start
>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>> default-enable THP in the global knob. Then, it would be easy to disable any THP
>>> by disabling the global knob. (I think that's the future we're heading to, where
>>> we'd have an "auto" mode that can be set on the global toggle).
>>>
>>> But I am just making up use cases ;) I think it will be valuable and just doing
>>> it consistently now might be cleaner.
>>
>> I agree that consistency between enabled and shmem_enabled is top priority. And
>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>> all sizes to select "inherit" is best.
>>
>> But for shmem_enabled, that means we need the following error checking:
>>
>>   - It is an error to set "force" for any size except PMD-size
>>
>>   - It is an error to set "force" for the global control if any size except PMD-
>>     size is set to "inherit"
>>
>>   - It is an error to set "inherit" for any size except PMD-size if the global
>>     control is set to "force".
>>
>> Certainly not too difficult to code and prove to be correct, but not the nicest
>> UX from the user's point of view when they start seeing errors.
>>
>> I think we previously said this would likely be temporary, and if/when tmpfs
>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>> more suited to the approach the page cache takes to transparently ramp up the
>> folio size as it faults more in. (Just saying there is a chance that this error
>> checking becomes permanent).
> 
> The strategy for tmpfs supporting mTHP will require more discussion and
> evaluation in the future. However, regardless of the strategy (explicit mTHP
> control or page cache control), I think it would be possible to use 'force' to
> override previous strategies for testing purposes. This appears to be
> permissible according to the explanation in the current documentation: "force
> the huge option on for all - very useful for testing". So it seems this would
> not be permanent?

Yeah ok, makes sense to me.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/8] add mTHP support for anonymous shmem
       [not found] ` <CGME20240508113934eucas1p13a3972f3f9955365f40155e084a7c7d5@eucas1p1.samsung.com>
@ 2024-05-08 11:39   ` Daniel Gomez
  2024-05-08 11:58     ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Daniel Gomez @ 2024-05-08 11:39 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, hughd, willy, david, ioworker0, wangkefeng.wang,
	ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, linux-mm,
	linux-kernel

On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
> Anonymous pages have already been supported for multi-size (mTHP) allocation
> through commit 19eaf44954df, that can allow THP to be configured through the
> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
> 
> However, the anonymous shared pages will ignore the anonymous mTHP rule
> configured through the sysfs interface, and can only use the PMD-mapped
> THP, that is not reasonable. Many implement anonymous page sharing through
> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
> also including the anonymous shared pages, in order to enjoy the benefits of
> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
> 
> The primary strategy is similar to supporting anonymous mTHP. Introduce
> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> which can have all the same values as the top-level
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> additional "inherit" option. By default all sizes will be set to "never"
> except PMD size, which is set to "inherit". This ensures backward compatibility
> with the shmem enabled of the top level, meanwhile also allows independent
> control of shmem enabled for each mTHP.

I'm trying to understand the adoption of mTHP and how it fits into the adoption
of (large) folios that the kernel is moving towards. Can you, or anyone involved
here, explain this? How much do they overlap, and can we benefit from having
both? Is there any argument against the adoption of large folios here that I
might have missed?

> 
> Use the page fault latency tool to measure the performance of 1G anonymous shmem
> with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
> 125G memory:
> base: mm-unstable
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.04s        3.10s         83516.416                  2669684.890
> 
> mm-unstable + patchset, anon shmem mTHP disabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.02s        3.14s         82936.359                  2630746.027
> 
> mm-unstable + patchset, anon shmem 64K mTHP enabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.08s        0.31s         678630.231                 17082522.495
> 
> From the data above, it is observed that the patchset has a minimal impact when
> mTHP is not enabled (some fluctuations observed during testing). When enabling 64K
> mTHP, there is a significant improvement of the page fault latency.
> 
> TODO:
>  - Support mTHP for tmpfs.
>  - Do not split the large folio when share memory swap out.
>  - Can swap in a large folio for share memory.
> 
> Changes from RFC:
>  - Rebase the patch set against the new mm-unstable branch, per Lance.
>  - Add a new patch to export highest_order() and next_order().
>  - Add a new patch to align mTHP size in shmem_get_unmapped_area().
>  - Handle the uffd case and the VMA limits case when building mapping for
>    large folio in the finish_fault() function, per Ryan.
>  - Remove unnecessary 'order' variable in patch 3, per Kefeng.
>  - Keep the anon shmem counters' name consistency.
>  - Modify the strategy to support mTHP for anonymous shmem, discussed with
>    Ryan and David.
>  - Add reviewed tag from Barry.
>  - Update the commit message.
> 
> Baolin Wang (8):
>   mm: move highest_order() and next_order() out of the THP config
>   mm: memory: extend finish_fault() to support large folio
>   mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio()
>   mm: shmem: add THP validation for PMD-mapped THP related statistics
>   mm: shmem: add multi-size THP sysfs interface for anonymous shmem
>   mm: shmem: add mTHP support for anonymous shmem
>   mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
>   mm: shmem: add mTHP counters for anonymous shmem
> 
>  Documentation/admin-guide/mm/transhuge.rst |  29 ++
>  include/linux/huge_mm.h                    |  35 ++-
>  mm/huge_memory.c                           |  17 +-
>  mm/memory.c                                |  43 ++-
>  mm/shmem.c                                 | 335 ++++++++++++++++++---
>  5 files changed, 387 insertions(+), 72 deletions(-)
> 
> -- 
> 2.39.3
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-08 11:39   ` Daniel Gomez
@ 2024-05-08 11:58     ` David Hildenbrand
  2024-05-08 14:28       ` Daniel Gomez
  2024-05-08 19:23       ` Luis Chamberlain
  0 siblings, 2 replies; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08 11:58 UTC (permalink / raw)
  To: Daniel Gomez, Baolin Wang
  Cc: akpm, hughd, willy, ioworker0, wangkefeng.wang, ying.huang,
	21cnbao, ryan.roberts, shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 13:39, Daniel Gomez wrote:
> On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
>> Anonymous pages have already been supported for multi-size (mTHP) allocation
>> through commit 19eaf44954df, that can allow THP to be configured through the
>> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>
>> However, the anonymous shared pages will ignore the anonymous mTHP rule
>> configured through the sysfs interface, and can only use the PMD-mapped
>> THP, that is not reasonable. Many implement anonymous page sharing through
>> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
>> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
>> also including the anonymous shared pages, in order to enjoy the benefits of
>> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
>> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
>>
>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>> which can have all the same values as the top-level
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
>> additional "inherit" option. By default all sizes will be set to "never"
>> except PMD size, which is set to "inherit". This ensures backward compatibility
>> with the shmem enabled of the top level, meanwhile also allows independent
>> control of shmem enabled for each mTHP.
> 
> I'm trying to understand the adoption of mTHP and how it fits into the adoption
> of (large) folios that the kernel is moving towards. Can you, or anyone involved
> here, explain this? How much do they overlap, and can we benefit from having
> both? Is there any argument against the adoption of large folios here that I
> might have missed?

mTHP are implemented using large folios, just like traditional PMD-sized 
THP are. (you really should explore the history of mTHP and how it all 
works internally)

The biggest challenge with memory that cannot be evicted and reclaimed 
under memory pressure (in contrast to your ordinary files in the 
pagecache) is memory waste, well, and the placement of large chunks of 
memory in general during page faults.

In the worst case (no swap), you allocate a large chunk of memory once 
and it will stick around until freed: no reclaim of that memory.

That's the reason why THP for anonymous memory and SHMEM have toggles to 
manually enable and configure them, in contrast to the pagecache. The 
same was done for mTHP for anonymous memory, and now (anon) shmem follows.

There are plans to, at some point, have it all working automatically, 
but a lot of what that requires for anonymous memory (and similarly for 
shmem) is still missing and unclear.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08  9:02           ` Ryan Roberts
  2024-05-08  9:56             ` Baolin Wang
@ 2024-05-08 12:02             ` David Hildenbrand
  2024-05-08 12:10               ` David Hildenbrand
  1 sibling, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08 12:02 UTC (permalink / raw)
  To: Ryan Roberts, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 11:02, Ryan Roberts wrote:
> On 08/05/2024 08:12, David Hildenbrand wrote:
>> On 08.05.24 09:08, David Hildenbrand wrote:
>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>> "advise",
>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>>>>> to 'always' to keep compatibility.
>>>>>
>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>
>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>> changing multiple mTHP size controls atomically is actually useful). Applying
>>>>
>>>> Agree. This is also our usage of 'inherit'.
>>
>> Missed that one: there might be use cases in the future once we would start
>> defaulting to "inherit" for all knobs (a distro might default to that) and
>> default-enable THP in the global knob. Then, it would be easy to disable any THP
>> by disabling the global knob. (I think that's the future we're heading to, where
>> we'd have an "auto" mode that can be set on the global toggle).
>>
>> But I am just making up use cases ;) I think it will be valuable and just doing
>> it consistently now might be cleaner.
> 
> I agree that consistency between enabled and shmem_enabled is top priority. And
> yes, I had forgotten about the glorious "auto" future. So probably continuing
> all sizes to select "inherit" is best.
> 
> But for shmem_enabled, that means we need the following error checking:
> 
>   - It is an error to set "force" for any size except PMD-size
> 
>   - It is an error to set "force" for the global control if any size except PMD-
>     size is set to "inherit"
> 
>   - It is an error to set "inherit" for any size except PMD-size if the global
>     control is set to "force".
> 
> Certainly not too difficult to code and prove to be correct, but not the nicest
> UX from the user's point of view when they start seeing errors.
> 
> I think we previously said this would likely be temporary, and if/when tmpfs
> gets mTHP support, we could simplify and allow all sizes to be set to "force".
> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
> more suited to the approach the page cache takes to transparently ramp up the
> folio size as it faults more in. (Just saying there is a chance that this error
> checking becomes permanent).

Note that with shmem you're inherently facing the same memory waste 
issues etc as you would with anonymous memory. (sometimes even worse, if 
you're running shmem that's configured to be unswappable!).

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08 12:02             ` David Hildenbrand
@ 2024-05-08 12:10               ` David Hildenbrand
  2024-05-08 12:43                 ` Ryan Roberts
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08 12:10 UTC (permalink / raw)
  To: Ryan Roberts, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 14:02, David Hildenbrand wrote:
> On 08.05.24 11:02, Ryan Roberts wrote:
>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>> "advise",
>>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>>>>>> to 'always' to keep compatibility.
>>>>>>
>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>
>>>>>> [1]
>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>
>>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
>>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>>> changing multiple mTHP size controls atomically is actually useful). Applying
>>>>>
>>>>> Agree. This is also our usage of 'inherit'.
>>>
>>> Missed that one: there might be use cases in the future once we would start
>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>> default-enable THP in the global knob. Then, it would be easy to disable any THP
>>> by disabling the global knob. (I think that's the future we're heading to, where
>>> we'd have an "auto" mode that can be set on the global toggle).
>>>
>>> But I am just making up use cases ;) I think it will be valuable and just doing
>>> it consistently now might be cleaner.
>>
>> I agree that consistency between enabled and shmem_enabled is top priority. And
>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>> all sizes to select "inherit" is best.
>>
>> But for shmem_enabled, that means we need the following error checking:
>>
>>    - It is an error to set "force" for any size except PMD-size
>>
>>    - It is an error to set "force" for the global control if any size except PMD-
>>      size is set to "inherit"
>>
>>    - It is an error to set "inherit" for any size except PMD-size if the global
>>      control is set to "force".
>>
>> Certainly not too difficult to code and prove to be correct, but not the nicest
>> UX from the user's point of view when they start seeing errors.
>>
>> I think we previously said this would likely be temporary, and if/when tmpfs
>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>> more suited to the approach the page cache takes to transparently ramp up the
>> folio size as it faults more in. (Just saying there is a chance that this error
>> checking becomes permanent).
> 
> Note that with shmem you're inherently facing the same memory waste
> issues etc as you would with anonymous memory. (sometimes even worse, if
> you're running shmem that's configured to be unswappable!).

Also noting that memory waste is not really a problem when a write to a 
shmem file allocates a large folio that stays within the boundaries of 
that write; issues only pop up if you end up over-allocating, especially 
during page faults, where you have not much of a clue about what to do 
(single address, no real range provided).

There is the other issue of wasting large chunks of contiguous memory 
on stuff that barely benefits from it. With memory that may never get 
evicted, there is no automatic "handing back" of that memory to the 
system to be used by something else. With ordinary files, that's a bit 
different. But I did not look closer into that issue yet; it's one of 
the reasons MADV_HUGEPAGE was added IIRC.
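
For reference, a minimal userspace sketch of the MADV_HUGEPAGE opt-in
mentioned above (the size is illustrative):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Ask the kernel to prefer huge pages for this range only. */
	if (madvise(p, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	p[0] = 1;	/* the fault path may now allocate a THP here */
	return 0;
}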

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08 12:10               ` David Hildenbrand
@ 2024-05-08 12:43                 ` Ryan Roberts
  2024-05-08 12:44                   ` Ryan Roberts
  2024-05-08 12:45                   ` David Hildenbrand
  0 siblings, 2 replies; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08 12:43 UTC (permalink / raw)
  To: David Hildenbrand, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 13:10, David Hildenbrand wrote:
> On 08.05.24 14:02, David Hildenbrand wrote:
>> On 08.05.24 11:02, Ryan Roberts wrote:
>>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>>> "advise",
>>>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is
>>>>>>>> equivalent
>>>>>>>> to 'always' to keep compatibility.
>>>>>>>
>>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>>
>>>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong
>>>>>>> one.
>>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is
>>>>>>> just
>>>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>>>> changing multiple mTHP size controls atomically is actually useful).
>>>>>>> Applying
>>>>>>
>>>>>> Agree. This is also our usage of 'inherit'.
>>>>
>>>> Missed that one: there might be use cases in the future once we would start
>>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>>> default-enable THP in the global knob. Then, it would be easy to disable any
>>>> THP
>>>> by disabling the global knob. (I think that's the future we're heading to,
>>>> where
>>>> we'd have an "auto" mode that can be set on the global toggle).
>>>>
>>>> But I am just making up use cases ;) I think it will be valuable and just doing
>>>> it consistently now might be cleaner.
>>>
>>> I agree that consistency between enabled and shmem_enabled is top priority. And
>>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>>> all sizes to select "inherit" is best.
>>>
>>> But for shmem_enabled, that means we need the following error checking:
>>>
>>>    - It is an error to set "force" for any size except PMD-size
>>>
>>>    - It is an error to set "force" for the global control if any size except
>>> PMD-
>>>      size is set to "inherit"
>>>
>>>    - It is an error to set "inherit" for any size except PMD-size if the global
>>>      control is set to "force".
>>>
>>> Certainly not too difficult to code and prove to be correct, but not the nicest
>>> UX from the user's point of view when they start seeing errors.
>>>
>>> I think we previously said this would likely be temporary, and if/when tmpfs
>>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>>> more suited to the approach the page cache takes to transparently ramp up the
>>> folio size as it faults more in. (Just saying there is a chance that this error
>>> checking becomes permanent).
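
Roughly, I'd imagine the checks looking something like the sketch below; 
all of these names are made up for illustration, they are not what the 
patch actually adds:

enum thp_mode { MODE_NEVER, MODE_INHERIT, MODE_ADVISE, MODE_FORCE };

extern enum thp_mode global_mode;       /* hypothetical: global knob */
extern bool any_non_pmd_inherit(void);  /* hypothetical: is any non-PMD
                                         * size set to "inherit"? */

static int check_per_size_write(int order, enum thp_mode new_mode)
{
        if (new_mode == MODE_FORCE && order != PMD_ORDER)
                return -EINVAL;                         /* rule 1 */
        if (new_mode == MODE_INHERIT && order != PMD_ORDER &&
            global_mode == MODE_FORCE)
                return -EINVAL;                         /* rule 3 */
        return 0;
}

static int check_global_write(enum thp_mode new_mode)
{
        if (new_mode == MODE_FORCE && any_non_pmd_inherit())
                return -EINVAL;                         /* rule 2 */
        return 0;
}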
>>
>> Note that with shmem you're inherently facing the same memory waste
>> issues etc as you would with anonymous memory. (sometimes even worse, if
>> you're running shmem that's configured to be unswappable!).
> 
> Also noting that memory waste is not really a problem when a write to a shmem
> file allocates a large folio that stays within the boundaries of that write;
> issues only pop up if you end up over-allocating, especially during page
> faults, where you have little clue about what to do (a single address, no
> real range provided).
> 
> There is the other issue of wasting large chunks of contiguous memory on
> stuff that barely benefits from it. With memory that maybe never gets
> evicted, there is no automatic "handing back" of that memory to the system
> to be used by something else. With ordinary files, that's a bit different.
> But I have not looked closer into that issue yet; it's one of the reasons
> MADV_HUGEPAGE was added, IIRC.

OK, understood. Although with tmpfs you're not going to mmap it and then
randomly extend the file through page faults - mmap doesn't permit that, I
don't think? So presumably the user must explicitly set the size of the file
first? Are you suggesting there are a lot of use cases where a large tmpfs
file is created, mmapped and then only accessed sparsely?
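
For reference, the pattern I have in mind - the file has to be sized 
before it can be faulted through the mapping; a minimal sketch with 
memfd (which should behave like a tmpfs file), error handling omitted:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 1ul << 30;                 /* 1 GiB */
        int fd = memfd_create("shmem-demo", 0);

        ftruncate(fd, len);                     /* size set up front */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);

        p[0] = 1;       /* fine: within i_size */
        /* touching pages beyond EOF would SIGBUS, not grow the file */
        return 0;
}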



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08 12:43                 ` Ryan Roberts
@ 2024-05-08 12:44                   ` Ryan Roberts
  2024-05-08 12:45                   ` David Hildenbrand
  1 sibling, 0 replies; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08 12:44 UTC (permalink / raw)
  To: David Hildenbrand, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 13:43, Ryan Roberts wrote:
> On 08/05/2024 13:10, David Hildenbrand wrote:
>> On 08.05.24 14:02, David Hildenbrand wrote:
>>> On 08.05.24 11:02, Ryan Roberts wrote:
>>>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>>>> "advise",
>>>>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is
>>>>>>>>> equivalent
>>>>>>>>> to 'always' to keep compatibility.
>>>>>>>>
>>>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>>>
>>>>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong
>>>>>>>> one.
>>>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is
>>>>>>>> just
>>>>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>>>>> changing multiple mTHP size controls atomically is actually useful).
>>>>>>>> Applying
>>>>>>>
>>>>>>> Agree. This is also our usage of 'inherit'.
>>>>>
>>>>> Missed that one: there might be use cases in the future once we would start
>>>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>>>> default-enable THP in the global knob. Then, it would be easy to disable any
>>>>> THP
>>>>> by disabling the global knob. (I think that's the future we're heading to,
>>>>> where
>>>>> we'd have an "auto" mode that can be set on the global toggle).
>>>>>
>>>>> But I am just making up use cases ;) I think it will be valuable and just doing
>>>>> it consistently now might be cleaner.
>>>>
>>>> I agree that consistency between enabled and shmem_enabled is top priority. And
>>>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>>>> all sizes to select "inherit" is best.
>>>>
>>>> But for shmem_enabled, that means we need the following error checking:
>>>>
>>>>    - It is an error to set "force" for any size except PMD-size
>>>>
>>>>    - It is an error to set "force" for the global control if any size except
>>>> PMD-
>>>>      size is set to "inherit"
>>>>
>>>>    - It is an error to set "inherit" for any size except PMD-size if the global
>>>>      control is set to "force".
>>>>
>>>> Certainly not too difficult to code and prove to be correct, but not the nicest
>>>> UX from the user's point of view when they start seeing errors.
>>>>
>>>> I think we previously said this would likely be temporary, and if/when tmpfs
>>>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>>>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>>>> more suited to the approach the page cache takes to transparently ramp up the
>>>> folio size as it faults more in. (Just saying there is a chance that this error
>>>> checking becomes permanent).
>>>
>>> Note that with shmem you're inherently facing the same memory waste
>>> issues etc as you would with anonymous memory. (sometimes even worse, if
>>> you're running shmem that's configured to be unswappable!).
>>
>> Also noting that memory waste is not really a problem when a write to a shmem
>> file allocates a large folio that stays within the boundaries of that write;
>> issues only pop up if you end up over-allocating, especially during page
>> faults, where you have little clue about what to do (a single address, no
>> real range provided).
>>
>> There is the other issue of wasting large chunks of contiguous memory on
>> stuff that barely benefits from it. With memory that maybe never gets
>> evicted, there is no automatic "handing back" of that memory to the system
>> to be used by something else. With ordinary files, that's a bit different.
>> But I have not looked closer into that issue yet; it's one of the reasons
>> MADV_HUGEPAGE was added, IIRC.
> 
> OK, understood. Although with tmpfs you're not going to mmap it and then
> randomly extend the file through page faults - mmap doesn't permit that, I
> don't think? So presumably the user must explicitly set the size of the file
> first? Are you suggesting there are a lot of use cases where a large tmpfs
> file is created, mmapped and then only accessed sparsely?

I know that's often the case for anon memory, but I'm not sure if you would
expect the same pattern with an explicit file?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08 12:43                 ` Ryan Roberts
  2024-05-08 12:44                   ` Ryan Roberts
@ 2024-05-08 12:45                   ` David Hildenbrand
  2024-05-08 12:54                     ` Ryan Roberts
  1 sibling, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08 12:45 UTC (permalink / raw)
  To: Ryan Roberts, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 14:43, Ryan Roberts wrote:
> On 08/05/2024 13:10, David Hildenbrand wrote:
>> On 08.05.24 14:02, David Hildenbrand wrote:
>>> On 08.05.24 11:02, Ryan Roberts wrote:
>>>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>>>> "advise",
>>>>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is
>>>>>>>>> equivalent
>>>>>>>>> to 'always' to keep compatibility.
>>>>>>>>
>>>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>>>
>>>>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong
>>>>>>>> one.
>>>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is
>>>>>>>> just
>>>>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>>>>> changing multiple mTHP size controls atomically is actually useful).
>>>>>>>> Applying
>>>>>>>
>>>>>>> Agree. This is also our usage of 'inherit'.
>>>>>
>>>>> Missed that one: there might be use cases in the future once we would start
>>>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>>>> default-enable THP in the global knob. Then, it would be easy to disable any
>>>>> THP
>>>>> by disabling the global knob. (I think that's the future we're heading to,
>>>>> where
>>>>> we'd have an "auto" mode that can be set on the global toggle).
>>>>>
>>>>> But I am just making up use cases ;) I think it will be valuable and just doing
>>>>> it consistently now might be cleaner.
>>>>
>>>> I agree that consistency between enabled and shmem_enabled is top priority. And
>>>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>>>> all sizes to select "inherit" is best.
>>>>
>>>> But for shmem_enabled, that means we need the following error checking:
>>>>
>>>>     - It is an error to set "force" for any size except PMD-size
>>>>
>>>>     - It is an error to set "force" for the global control if any size except
>>>> PMD-
>>>>       size is set to "inherit"
>>>>
>>>>     - It is an error to set "inherit" for any size except PMD-size if the global
>>>>       control is set to "force".
>>>>
>>>> Certainly not too difficult to code and prove to be correct, but not the nicest
>>>> UX from the user's point of view when they start seeing errors.
>>>>
>>>> I think we previously said this would likely be temporary, and if/when tmpfs
>>>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>>>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>>>> more suited to the approach the page cache takes to transparently ramp up the
>>>> folio size as it faults more in. (Just saying there is a chance that this error
>>>> checking becomes permanent).
>>>
>>> Note that with shmem you're inherently facing the same memory waste
>>> issues etc as you would with anonymous memory. (sometimes even worse, if
>>> you're running shmem that's configured to be unswappable!).
>>
>> Also noting that memory waste is not really a problem when a write to a shmem
>> file allocates a large folio that stays within the boundaries of that write;
>> issues only pop up if you end up over-allocating, especially during page
>> faults, where you have little clue about what to do (a single address, no
>> real range provided).
>>
>> There is the other issue of wasting large chunks of contiguous memory on
>> stuff that barely benefits from it. With memory that maybe never gets
>> evicted, there is no automatic "handing back" of that memory to the system
>> to be used by something else. With ordinary files, that's a bit different.
>> But I have not looked closer into that issue yet; it's one of the reasons
>> MADV_HUGEPAGE was added, IIRC.
> 
> OK, understood. Although with tmpfs you're not going to mmap it and then
> randomly extend the file through page faults - mmap doesn't permit that, I
> don't think? So presumably the user must explicitly set the size of the file
> first? Are you suggesting there are a lot of use cases where a large tmpfs
> file is created, mmapped and then only accessed sparsely?

I don't know about "a lot of use cases", but for VMs that's certainly 
how it's used.
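
Typical shape, error handling omitted: create and size the whole "guest 
RAM" up front, then let page faults populate only what the guest 
actually touches:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 8ul << 30;                 /* "guest RAM": 8 GiB */
        int fd = memfd_create("vm-ram", 0);

        ftruncate(fd, len);                     /* fully sized, but sparse */
        char *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

        ram[0] = 1;             /* only the pages the guest touches ... */
        ram[len / 2] = 1;       /* ... actually get backed by memory */
        return 0;
}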

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08 12:45                   ` David Hildenbrand
@ 2024-05-08 12:54                     ` Ryan Roberts
  2024-05-08 13:07                       ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08 12:54 UTC (permalink / raw)
  To: David Hildenbrand, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 13:45, David Hildenbrand wrote:
> On 08.05.24 14:43, Ryan Roberts wrote:
>> On 08/05/2024 13:10, David Hildenbrand wrote:
>>> On 08.05.24 14:02, David Hildenbrand wrote:
>>>> On 08.05.24 11:02, Ryan Roberts wrote:
>>>>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>>>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs
>>>>>>>>>> interface
>>>>>>>>>> 'shmem_enabled' in the
>>>>>>>>>> '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>>>>> directory for each mTHP to control whether shmem is enabled for that
>>>>>>>>>> mTHP,
>>>>>>>>>> with a value similar to the top level 'shmem_enabled', which can be
>>>>>>>>>> set to:
>>>>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>>>>> "advise",
>>>>>>>>>> "never", "deny", "force". These values follow the same semantics as
>>>>>>>>>> the top
>>>>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is
>>>>>>>>>> equivalent
>>>>>>>>>> to 'always' to keep compatibility.
>>>>>>>>>
>>>>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>>>>
>>>>>>>>> However, thinking about this a bit more, I wonder if the decision we
>>>>>>>>> made to
>>>>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong
>>>>>>>>> one.
>>>>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is
>>>>>>>>> just
>>>>>>>>> for legacy back compat after all, I don't think there is any use case
>>>>>>>>> where
>>>>>>>>> changing multiple mTHP size controls atomically is actually useful).
>>>>>>>>> Applying
>>>>>>>>
>>>>>>>> Agree. This is also our usage of 'inherit'.
>>>>>>
>>>>>> Missed that one: there might be use cases in the future once we would start
>>>>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>>>>> default-enable THP in the global knob. Then, it would be easy to disable any
>>>>>> THP
>>>>>> by disabling the global knob. (I think that's the future we're heading to,
>>>>>> where
>>>>>> we'd have an "auto" mode that can be set on the global toggle).
>>>>>>
>>>>>> But I am just making up use cases ;) I think it will be valuable and just
>>>>>> doing
>>>>>> it consistently now might be cleaner.
>>>>>
>>>>> I agree that consistency between enabled and shmem_enabled is top priority.
>>>>> And
>>>>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>>>>> all sizes to select "inherit" is best.
>>>>>
>>>>> But for shmem_enabled, that means we need the following error checking:
>>>>>
>>>>>     - It is an error to set "force" for any size except PMD-size
>>>>>
>>>>>     - It is an error to set "force" for the global control if any size except
>>>>> PMD-
>>>>>       size is set to "inherit"
>>>>>
>>>>>     - It is an error to set "inherit" for any size except PMD-size if the
>>>>> global
>>>>>       control is set to "force".
>>>>>
>>>>> Certainly not too difficult to code and prove to be correct, but not the
>>>>> nicest
>>>>> UX from the user's point of view when they start seeing errors.
>>>>>
>>>>> I think we previously said this would likely be temporary, and if/when tmpfs
>>>>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>>>>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>>>>> more suited to the approach the page cache takes to transparently ramp up the
>>>>> folio size as it faults more in. (Just saying there is a chance that this
>>>>> error
>>>>> checking becomes permanent).
>>>>
>>>> Note that with shmem you're inherently facing the same memory waste
>>>> issues etc as you would with anonymous memory. (sometimes even worse, if
>>>> you're running shmem that's configured to be unswappable!).
>>>
>>> Also noting that memory waste is not really a problem when a write to a shmem
>>> file allocates a large folio that stays within the boundaries of that write;
>>> issues only pop up if you end up over-allocating, especially during page
>>> faults, where you have little clue about what to do (a single address, no
>>> real range provided).
>>>
>>> There is the other issue of wasting large chunks of contiguous memory on
>>> stuff that barely benefits from it. With memory that maybe never gets
>>> evicted, there is no automatic "handing back" of that memory to the system
>>> to be used by something else. With ordinary files, that's a bit different.
>>> But I have not looked closer into that issue yet; it's one of the reasons
>>> MADV_HUGEPAGE was added, IIRC.
>>
>> OK, understood. Although with tmpfs you're not going to mmap it and then
>> randomly extend the file through page faults - mmap doesn't permit that, I
>> don't think? So presumably the user must explicitly set the size of the file
>> first? Are you suggesting there are a lot of use cases where a large tmpfs
>> file is created, mmapped and then only accessed sparsely?
> 
> I don't know about "a lot of use cases", but for VMs that's certainly how it's
> used.

Gotcha, thanks. And out of curiosity, what's the benefit of using tmpfs rather
than private (or shared) anonymous memory for VMs?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08 12:54                     ` Ryan Roberts
@ 2024-05-08 13:07                       ` David Hildenbrand
  2024-05-08 13:44                         ` Ryan Roberts
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08 13:07 UTC (permalink / raw)
  To: Ryan Roberts, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08.05.24 14:54, Ryan Roberts wrote:
> On 08/05/2024 13:45, David Hildenbrand wrote:
>> On 08.05.24 14:43, Ryan Roberts wrote:
>>> On 08/05/2024 13:10, David Hildenbrand wrote:
>>>> On 08.05.24 14:02, David Hildenbrand wrote:
>>>>> On 08.05.24 11:02, Ryan Roberts wrote:
>>>>>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>>>>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>>>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs
>>>>>>>>>>> interface
>>>>>>>>>>> 'shmem_enabled' in the
>>>>>>>>>>> '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>>>>>> directory for each mTHP to control whether shmem is enabled for that
>>>>>>>>>>> mTHP,
>>>>>>>>>>> with a value similar to the top level 'shmem_enabled', which can be
>>>>>>>>>>> set to:
>>>>>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>>>>>> "advise",
>>>>>>>>>>> "never", "deny", "force". These values follow the same semantics as
>>>>>>>>>>> the top
>>>>>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is
>>>>>>>>>>> equivalent
>>>>>>>>>>> to 'always' to keep compatibility.
>>>>>>>>>>
>>>>>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>>>>>
>>>>>>>>>> However, thinking about this a bit more, I wonder if the decision we
>>>>>>>>>> made to
>>>>>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong
>>>>>>>>>> one.
>>>>>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is
>>>>>>>>>> just
>>>>>>>>>> for legacy back compat after all, I don't think there is any use case
>>>>>>>>>> where
>>>>>>>>>> changing multiple mTHP size controls atomically is actually useful).
>>>>>>>>>> Applying
>>>>>>>>>
>>>>>>>>> Agree. This is also our usage of 'inherit'.
>>>>>>>
>>>>>>> Missed that one: there might be use cases in the future once we would start
>>>>>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>>>>>> default-enable THP in the global knob. Then, it would be easy to disable any
>>>>>>> THP
>>>>>>> by disabling the global knob. (I think that's the future we're heading to,
>>>>>>> where
>>>>>>> we'd have an "auto" mode that can be set on the global toggle).
>>>>>>>
>>>>>>> But I am just making up use cases ;) I think it will be valuable and just
>>>>>>> doing
>>>>>>> it consistently now might be cleaner.
>>>>>>
>>>>>> I agree that consistency between enabled and shmem_enabled is top priority.
>>>>>> And
>>>>>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>>>>>> all sizes to select "inherit" is best.
>>>>>>
>>>>>> But for shmem_enabled, that means we need the following error checking:
>>>>>>
>>>>>>      - It is an error to set "force" for any size except PMD-size
>>>>>>
>>>>>>      - It is an error to set "force" for the global control if any size except
>>>>>> PMD-
>>>>>>        size is set to "inherit"
>>>>>>
>>>>>>      - It is an error to set "inherit" for any size except PMD-size if the
>>>>>> global
>>>>>>        control is set to "force".
>>>>>>
>>>>>> Certainly not too difficult to code and prove to be correct, but not the
>>>>>> nicest
>>>>>> UX from the user's point of view when they start seeing errors.
>>>>>>
>>>>>> I think we previously said this would likely be temporary, and if/when tmpfs
>>>>>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>>>>>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>>>>>> more suited to the approach the page cache takes to transparently ramp up the
>>>>>> folio size as it faults more in. (Just saying there is a chance that this
>>>>>> error
>>>>>> checking becomes permanent).
>>>>>
>>>>> Note that with shmem you're inherently facing the same memory waste
>>>>> issues etc as you would with anonymous memory. (sometimes even worse, if
>>>>> you're running shmem that's configured to be unswappable!).
>>>>
>>>> Also noting that memory waste is not really a problem when a write to a shmem
>>>> file allocates a large folio that stays within the boundaries of that write;
>>>> issues only pop up if you end up over-allocating, especially during page
>>>> faults, where you have little clue about what to do (a single address, no
>>>> real range provided).
>>>>
>>>> There is the other issue of wasting large chunks of contiguous memory on
>>>> stuff that barely benefits from it. With memory that maybe never gets
>>>> evicted, there is no automatic "handing back" of that memory to the system
>>>> to be used by something else. With ordinary files, that's a bit different.
>>>> But I have not looked closer into that issue yet; it's one of the reasons
>>>> MADV_HUGEPAGE was added, IIRC.
>>>
>>> OK, understood. Although with tmpfs you're not going to mmap it and then
>>> randomly extend the file through page faults - mmap doesn't permit that, I
>>> don't think? So presumably the user must explicitly set the size of the file
>>> first? Are you suggesting there are a lot of use cases where a large tmpfs
>>> file is created, mmapped and then only accessed sparsely?
>>
>> I don't know about "a lot of use cases", but for VMs that's certainly how it's
>> used.
> 

There are more details around that and the sparsity (memory ballooning, 
virtio-mem, free page reporting), but they might be a distraction here :) 
I'll note that shmem+THP is known to be problematic with memory 
ballooning.

> Gotcha, thanks. And out of curiosity, what's the benefit of using tmpfs rather
> than private (or shared) anonymous memory for VMs?

The primary use case I know of is sharing VM memory with other processes 
(usually not child processes): DPDK/SPDK and other vhost-user variants 
(such as virtiofs) mmap() all guest memory to access it directly (some 
sort of multi-process hypervisors). They either use real-file-based 
shmem or memfd (essentially the same without a named file) for that.

Then, there is live-hypervisor upgrade, whereby you start a second 
hypervisor process that will take over. People use shmem for that, so 
you can minimize downtime by migrating guest memory simply by mmap'ing 
the shmem file into the new hypervisor.

Shared anonymous memory is basically never used (I only know of one 
corner case in QEMU).

I would assume that there are also DBs making use of rather sparse 
shmem? But I'm no expert on that.
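
As a toy sketch of the sharing part (a real vhost-user setup passes the 
fd to an unrelated process over a unix socket via SCM_RIGHTS; fork() is 
only used here for brevity):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        size_t len = 4096;
        int fd = memfd_create("guest-mem", 0);

        ftruncate(fd, len);
        char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

        if (fork() == 0) {
                /* "the other hypervisor process": same fd, same memory */
                char *peer = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
                peer[0] = 'X';
                _exit(0);
        }
        wait(NULL);
        return mem[0] == 'X' ? 0 : 1;   /* we observe the peer's write */
}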

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  2024-05-08 13:07                       ` David Hildenbrand
@ 2024-05-08 13:44                         ` Ryan Roberts
  0 siblings, 0 replies; 52+ messages in thread
From: Ryan Roberts @ 2024-05-08 13:44 UTC (permalink / raw)
  To: David Hildenbrand, Baolin Wang, akpm, hughd
  Cc: willy, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel

On 08/05/2024 14:07, David Hildenbrand wrote:
> On 08.05.24 14:54, Ryan Roberts wrote:
>> On 08/05/2024 13:45, David Hildenbrand wrote:
>>> On 08.05.24 14:43, Ryan Roberts wrote:
>>>> On 08/05/2024 13:10, David Hildenbrand wrote:
>>>>> On 08.05.24 14:02, David Hildenbrand wrote:
>>>>>> On 08.05.24 11:02, Ryan Roberts wrote:
>>>>>>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>>>>>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>>>>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs
>>>>>>>>>>>> interface
>>>>>>>>>>>> 'shmem_enabled' in the
>>>>>>>>>>>> '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>>>>>>> directory for each mTHP to control whether shmem is enabled for that
>>>>>>>>>>>> mTHP,
>>>>>>>>>>>> with a value similar to the top level 'shmem_enabled', which can be
>>>>>>>>>>>> set to:
>>>>>>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>>>>>>> "advise",
>>>>>>>>>>>> "never", "deny", "force". These values follow the same semantics as
>>>>>>>>>>>> the top
>>>>>>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is
>>>>>>>>>>>> equivalent
>>>>>>>>>>>> to 'always' to keep compatibility.
>>>>>>>>>>>
>>>>>>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>>>>>>
>>>>>>>>>>> However, thinking about this a bit more, I wonder if the decision we
>>>>>>>>>>> made to
>>>>>>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the
>>>>>>>>>>> wrong
>>>>>>>>>>> one.
>>>>>>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit
>>>>>>>>>>> (this is
>>>>>>>>>>> just
>>>>>>>>>>> for legacy back compat after all, I don't think there is any use case
>>>>>>>>>>> where
>>>>>>>>>>> changing multiple mTHP size controls atomically is actually useful).
>>>>>>>>>>> Applying
>>>>>>>>>>
>>>>>>>>>> Agree. This is also our usage of 'inherit'.
>>>>>>>>
>>>>>>>> Missed that one: there might be use cases in the future once we would start
>>>>>>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>>>>>>> default-enable THP in the global knob. Then, it would be easy to disable
>>>>>>>> any
>>>>>>>> THP
>>>>>>>> by disabling the global knob. (I think that's the future we're heading to,
>>>>>>>> where
>>>>>>>> we'd have an "auto" mode that can be set on the global toggle).
>>>>>>>>
>>>>>>>> But I am just making up use cases ;) I think it will be valuable and just
>>>>>>>> doing
>>>>>>>> it consistently now might be cleaner.
>>>>>>>
>>>>>>> I agree that consistency between enabled and shmem_enabled is top priority.
>>>>>>> And
>>>>>>> yes, I had forgotten about the glorious "auto" future. So probably
>>>>>>> continuing
>>>>>>> all sizes to select "inherit" is best.
>>>>>>>
>>>>>>> But for shmem_enabled, that means we need the following error checking:
>>>>>>>
>>>>>>>      - It is an error to set "force" for any size except PMD-size
>>>>>>>
>>>>>>>      - It is an error to set "force" for the global control if any size
>>>>>>> except
>>>>>>> PMD-
>>>>>>>        size is set to "inherit"
>>>>>>>
>>>>>>>      - It is an error to set "inherit" for any size except PMD-size if the
>>>>>>> global
>>>>>>>        control is set to "force".
>>>>>>>
>>>>>>> Certainly not too difficult to code and prove to be correct, but not the
>>>>>>> nicest
>>>>>>> UX from the user's point of view when they start seeing errors.
>>>>>>>
>>>>>>> I think we previously said this would likely be temporary, and if/when tmpfs
>>>>>>> gets mTHP support, we could simplify and allow all sizes to be set to
>>>>>>> "force".
>>>>>>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it
>>>>>>> would be
>>>>>>> more suited to the approach the page cache takes to transparently ramp up
>>>>>>> the
>>>>>>> folio size as it faults more in. (Just saying there is a chance that this
>>>>>>> error
>>>>>>> checking becomes permanent).
>>>>>>
>>>>>> Note that with shmem you're inherently facing the same memory waste
>>>>>> issues etc as you would with anonymous memory. (sometimes even worse, if
>>>>>> you're running shmem that's configured to be unswappable!).
>>>>>
>>>>> Also noting that memory waste is not really a problem when a write to a shmem
>>>>> file allocates a large folio that stays within the boundaries of that write;
>>>>> issues only pop up if you end up over-allocating, especially during page
>>>>> faults, where you have little clue about what to do (a single address, no
>>>>> real range provided).
>>>>>
>>>>> There is the other issue of wasting large chunks of contiguous memory on
>>>>> stuff that barely benefits from it. With memory that maybe never gets
>>>>> evicted, there is no automatic "handing back" of that memory to the system
>>>>> to be used by something else. With ordinary files, that's a bit different.
>>>>> But I have not looked closer into that issue yet; it's one of the reasons
>>>>> MADV_HUGEPAGE was added, IIRC.
>>>>
>>>> OK, understood. Although with tmpfs you're not going to mmap it and then
>>>> randomly extend the file through page faults - mmap doesn't permit that, I
>>>> don't think? So presumably the user must explicitly set the size of the file
>>>> first? Are you suggesting there are a lot of use cases where a large tmpfs
>>>> file is created, mmapped and then only accessed sparsely?
>>>
>>> I don't know about "a lot of use cases", but for VMs that's certainly how it's
>>> used.
>>
> 
> There are more details around that and the sparsity (memory ballooning,
> virtio-mem, free page reporting), but it might distract here :) I'll note that
> shmem+THP is known to be problematic with memory ballooning.
> 
>> Gotcha, thanks. And out of curiosity, what's the benefit of using tmpfs rather
>> than private (or shared) anonymous memory for VMs?
> 
> The primary use case I know of is sharing VM memory with other processes
> (usually not child processes): DPDK/SPDK and other vhost-user variants (such as
> virtiofs) mmap() all guest memory to access it directly (some sort of
> multi-process hypervisors). They either use real-file-based shmem or memfd
> (essentially the same without a named file) for that.
> 
> Then, there is live-hypervisor upgrade, whereby you start a second hypervisor
> process that will take over. People use shmem for that, so you can minimize
> downtime by migrating guest memory simply by mmap'ing the shmem file into the
> new hypervisor.
> 
> Shared anonymous memory is basically never used (I only know of one corner case
> in QEMU).
> 
> I would assume that there are also DBs making use of rather sparse shmem? But no
> expert on that.
> 

That all makes sense - thanks for the lesson!


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-08 11:58     ` David Hildenbrand
@ 2024-05-08 14:28       ` Daniel Gomez
  2024-05-08 17:03         ` David Hildenbrand
  2024-05-09  3:08         ` Baolin Wang
  2024-05-08 19:23       ` Luis Chamberlain
  1 sibling, 2 replies; 52+ messages in thread
From: Daniel Gomez @ 2024-05-08 14:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Baolin Wang, akpm, hughd, willy, ioworker0, wangkefeng.wang,
	ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, linux-mm,
	linux-kernel

On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
> On 08.05.24 13:39, Daniel Gomez wrote:
> > On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
> > > Anonymous pages have already been supported for multi-size (mTHP) allocation
> > > through commit 19eaf44954df, that can allow THP to be configured through the
> > > sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
> > > 
> > > However, the anonymous shared pages will ignore the anonymous mTHP rule
> > > configured through the sysfs interface, and can only use the PMD-mapped
> > > THP, that is not reasonable. Many implement anonymous page sharing through
> > > mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
> > > therefore, users expect to apply an unified mTHP strategy for anonymous pages,
> > > also including the anonymous shared pages, in order to enjoy the benefits of
> > > mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
> > > than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
> > > 
> > > The primary strategy is similar to supporting anonymous mTHP. Introduce
> > > a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> > > which can have all the same values as the top-level
> > > '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> > > additional "inherit" option. By default all sizes will be set to "never"
> > > except PMD size, which is set to "inherit". This ensures backward compatibility
> > > with the shmem enabled of the top level, meanwhile also allows independent
> > > control of shmem enabled for each mTHP.
> > 
> > I'm trying to understand the adoption of mTHP and how it fits into the adoption
> > of (large) folios that the kernel is moving towards. Can you, or anyone involved
> > here, explain this? How much do they overlap, and can we benefit from having
> > both? Is there any argument against the adoption of large folios here that I
> > might have missed?
> 
> mTHP are implemented using large folios, just like traditional PMD-sized THP
> are. (you really should explore the history of mTHP and how it all works
> internally)

I'll look into the code in more depth. By any chance, are any of you going to
be at LSFMM this year? I have this session [1] scheduled for Wednesday, and it
would be nice to get your feedback on it, and to hear whether you see this
working together with mTHP/THP.

[1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/

> 
> The biggest challenge with memory that cannot be evicted on memory pressure
> to be reclaimed (in contrast to your ordinary files in the pagecache) is
> memory waste, well, and placement of large chunks of memory in general,
> during page faults.
> 
> In the worst case (no swap), you allocate a large chunk of memory once and
> it will stick around until freed: no reclaim of that memory.

I can see that path being triggered by some fstests but only for THP (where we
can actually reclaim memory).

> 
> That's the reason why THP for anonymous memory and SHMEM have toggles to
> manually enable and configure them, in contrast to the pagecache. The same
> was done for mTHP for anonymous memory, and now (anon) shmem follows.
> 
> There are plans to have, at some point, have it all working automatically,
> but a lot for that for anonymous memory (and shmem similarly) is still
> missing and unclear.

Thanks.

> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-08 14:28       ` Daniel Gomez
@ 2024-05-08 17:03         ` David Hildenbrand
  2024-05-09 19:18           ` Daniel Gomez
  2024-05-09  3:08         ` Baolin Wang
  1 sibling, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-08 17:03 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: Baolin Wang, akpm, hughd, willy, ioworker0, wangkefeng.wang,
	ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, linux-mm,
	linux-kernel

On 08.05.24 16:28, Daniel Gomez wrote:
> On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
>> On 08.05.24 13:39, Daniel Gomez wrote:
>>> On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
>>>> Anonymous pages have already been supported for multi-size (mTHP) allocation
>>>> through commit 19eaf44954df, that can allow THP to be configured through the
>>>> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>>>
>>>> However, the anonymous shared pages will ignore the anonymous mTHP rule
>>>> configured through the sysfs interface, and can only use the PMD-mapped
>>>> THP, that is not reasonable. Many implement anonymous page sharing through
>>>> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
>>>> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
>>>> also including the anonymous shared pages, in order to enjoy the benefits of
>>>> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
>>>> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
>>>>
>>>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>>>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>>>> which can have all the same values as the top-level
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
>>>> additional "inherit" option. By default all sizes will be set to "never"
>>>> except PMD size, which is set to "inherit". This ensures backward compatibility
>>>> with the shmem enabled of the top level, meanwhile also allows independent
>>>> control of shmem enabled for each mTHP.
>>>
>>> I'm trying to understand the adoption of mTHP and how it fits into the adoption
>>> of (large) folios that the kernel is moving towards. Can you, or anyone involved
>>> here, explain this? How much do they overlap, and can we benefit from having
>>> both? Is there any argument against the adoption of large folios here that I
>>> might have missed?
>>
>> mTHP are implemented using large folios, just like traditional PMD-sized THP
>> are. (you really should explore the history of mTHP and how it all works
>> internally)
> 
> I'll look into the code in more depth. By any chance, are any of you going to
> be at LSFMM this year? I have this session [1] scheduled for Wednesday, and it
> would be nice to get your feedback on it, and to hear whether you see this
> working together with mTHP/THP.
>

I'll be around and will attend that session! But note that I am still 
scratching my head about what to do with "ordinary" shmem, especially 
because of the weird way shmem behaves in contrast to real files 
(below). Some input from Hugh might be very helpful.

Example: you write() to a shmem file and populate a 2M THP. Then nobody 
touches that file for a long time. There are certainly other mmap() 
users that could benefit more from that THP ... and without swap, that 
THP will be trapped there, possibly for a long time (unless I am missing 
an important piece of shmem THP design :) )? Sure, if we only have THPs 
it's nice; that's just not the reality, unfortunately. IIRC, that's one 
of the reasons why THP for shmem can be enabled/disabled. But again, 
still scratching my head ...
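
(Something as small as the sketch below can pin such a THP, assuming a 
tmpfs mount with huge pages enabled and no swap; the mount point is made 
up and error handling is omitted:)

#include <fcntl.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
        static char buf[2ul << 20];     /* one PMD-sized write: 2 MiB */

        /* hypothetical mount point; needs CAP_SYS_ADMIN */
        mount("tmpfs", "/mnt/thp", "tmpfs", 0, "huge=always");

        int fd = open("/mnt/thp/file", O_CREAT | O_RDWR, 0600);
        write(fd, buf, sizeof(buf));    /* can allocate a 2M THP ... */
        close(fd);
        /* ... which then sticks around until the file is truncated or
         * deleted: without swap, nothing will reclaim it. */
        return 0;
}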


Note that this patch set only tackles anonymous shmem 
(MAP_SHARED|MAP_ANON), which is in 99.999% of all cases only accessed 
via page tables (memory allocated during page faults). I think there are 
ways to grab the fd (/proc/self/fd), but IIRC only corner cases 
read/write it that way.

So in that sense, anonymous shmem (this patch set) behaves mostly like 
ordinary anonymous memory, and likely there is not much overlap with the 
"allocate large folios during read/write/fallocate" work as in [1]. 
Swap might have an overlap.
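
(For reference, the anonymous shmem flavor this series targets is 
literally just the following - no fd, no name, populated purely via page 
faults:)

#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        char *mem = mmap(NULL, 1ul << 20, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        if (fork() == 0) {
                mem[0] = 'X';           /* allocated by this page fault */
                _exit(0);
        }
        wait(NULL);
        return mem[0] == 'X' ? 0 : 1;   /* shared with the child */
}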


The real confusion begins when we have ordinary shmem: some users never 
mmap it and only read/write, some users never read/write it and only 
mmap it, and some (less common?) users do both.

And shmem really is special: it looks like "just another file", but 
memory-consumption- and reclaim-wise it behaves just like anonymous 
memory. It might be swappable ("usually very limited backing disk space 
available") or it might not.

In a subthread here we are discussing what to do with that special 
"shmem_enabled = force" mode ... and it's all complicated, I think.

> [1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/
> 
>>
>> The biggest challenge with memory that cannot be evicted on memory pressure
>> to be reclaimed (in contrast to your ordinary files in the pagecache) is
>> memory waste, well, and placement of large chunks of memory in general,
>> during page faults.
>>
>> In the worst case (no swap), you allocate a large chunk of memory once and
>> it will stick around until freed: no reclaim of that memory.
> 
> I can see that path being triggered by some fstests but only for THP (where we
> can actually reclaim memory).

Is that when we punch-hole a partial THP and split it? I'd be interested 
in what that test does.
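
(I'd guess the pattern is roughly the sketch below - punching a sub-THP 
hole only frees pages if the large folio gets split first; sizes picked 
arbitrarily, error handling omitted:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 2ul << 20;                 /* one PMD-size worth */
        int fd = memfd_create("thp-split", 0);

        ftruncate(fd, len);
        fallocate(fd, 0, 0, len);       /* populate; a THP if enabled */

        /* partial punch: the THP must be split so the 64K can be freed */
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, 64 * 1024);
        return 0;
}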



-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-08 11:58     ` David Hildenbrand
  2024-05-08 14:28       ` Daniel Gomez
@ 2024-05-08 19:23       ` Luis Chamberlain
  2024-05-09 17:48         ` David Hildenbrand
  1 sibling, 1 reply; 52+ messages in thread
From: Luis Chamberlain @ 2024-05-08 19:23 UTC (permalink / raw)
  To: David Hildenbrand, Matthew Wilcox, Christoph Lameter,
	Christoph Hellwig, Dave Chinner
  Cc: Daniel Gomez, Baolin Wang, akpm, hughd, ioworker0,
	wangkefeng.wang, ying.huang, 21cnbao, ryan.roberts, shy828301,
	ziy, linux-mm, linux-kernel, Linux FS Devel

On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
> On 08.05.24 13:39, Daniel Gomez wrote:
> > On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
> > > The primary strategy is similar to supporting anonymous mTHP. Introduce
> > > a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> > > which can have all the same values as the top-level
> > > '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> > > additional "inherit" option. By default all sizes will be set to "never"
> > > except PMD size, which is set to "inherit". This ensures backward compatibility
> > > with the shmem enabled of the top level, meanwhile also allows independent
> > > control of shmem enabled for each mTHP.
> > 
> > I'm trying to understand the adoption of mTHP and how it fits into the adoption
> > of (large) folios that the kernel is moving towards. Can you, or anyone involved
> > here, explain this? How much do they overlap, and can we benefit from having
> > both? Is there any argument against the adoption of large folios here that I
> > might have missed?
> 
> mTHP are implemented using large folios, just like traditional PMD-sized THP
> are.
> 
> The biggest challenge with memory that cannot be evicted on memory pressure
> to be reclaimed (in contrast to your ordinary files in the pagecache) is
> memory waste, well, and placement of large chunks of memory in general,
> during page faults.
> 
> In the worst case (no swap), you allocate a large chunk of memory once and
> it will stick around until freed: no reclaim of that memory.
> 
> That's the reason why THP for anonymous memory and SHMEM have toggles to
> manually enable and configure them, in contrast to the pagecache. The same
> was done for mTHP for anonymous memory, and now (anon) shmem follows.
> 
> There are plans to have, at some point, have it all working automatically,
> but a lot for that for anonymous memory (and shmem similarly) is still
> missing and unclear.

Whereas the use of large folios in filesystems is already automatic, so
long as the filesystem supports it. We already do this in the readahead
and write paths for iomap: we opportunistically use large folios if we
can, otherwise we fall back to smaller folios.

So an approach recommended by Matthew was to use the readahead and write
paths, just as iomap does, to determine the size of the folio to use [0].
The use of large folios would then be automatic and not require any
knobs at all.
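
Conceptually the opportunistic part is just "try big, fall back"; a
sketch of the idea, not the actual iomap/readahead code:

#include <linux/gfp.h>

/* Allocate the largest folio we can for this spot in the file, falling
 * back to smaller orders under memory pressure, order-0 as last resort. */
static struct folio *alloc_biggest_folio(unsigned int max_order)
{
        unsigned int order;

        for (order = max_order; order > 0; order--) {
                struct folio *folio;

                folio = folio_alloc(GFP_KERNEL | __GFP_NORETRY, order);
                if (folio)
                        return folio;
        }
        return folio_alloc(GFP_KERNEL, 0);
}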

The mTHP approach would grow "THP" use in filesystems through the one
and only filesystem that uses THP. Meanwhile, the use of large folios is
already automatic with the approach taken by iomap.

We're at a crux, and it raises the question of whether we should
continue to chug on with tmpfs being special and doing things
differently, extending the old THP interface with mTHP, or whether it
should just use large folios, following the same approach as iomap.

From my perspective, the more shared code the better, and the more
shared paths the better. There is a chance to help test swap with large
folios instead of splitting the folios for swap, and that could be done
first with tmpfs. I have not evaluated the difference in testing, or how
we could get the most out of shared code if we take the mTHP approach or
the iomap approach for tmpfs; that should be considered.

Are there other things to consider? Does this require some dialog at
LSFMM?

[0] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/

  Luis

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
  2024-05-08 10:47           ` Ryan Roberts
@ 2024-05-09  1:10             ` Baolin Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Baolin Wang @ 2024-05-09  1:10 UTC (permalink / raw)
  To: Ryan Roberts, akpm, hughd
  Cc: willy, david, ioworker0, wangkefeng.wang, ying.huang, 21cnbao,
	shy828301, ziy, linux-mm, linux-kernel



On 2024/5/8 18:47, Ryan Roberts wrote:
> On 08/05/2024 10:31, Baolin Wang wrote:
>>
>>
>> On 2024/5/8 16:53, Ryan Roberts wrote:
>>> On 08/05/2024 04:44, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/5/7 18:37, Ryan Roberts wrote:
>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>> Add large folio mapping establishment support for finish_fault() as a
>>>>>> preparation,
>>>>>> to support multi-size THP allocation of anonymous shmem pages in the following
>>>>>> patches.
>>>>>>
>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>> ---
>>>>>>     mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>>>>     1 file changed, 33 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>>> index eea6e4984eae..936377220b77 100644
>>>>>> --- a/mm/memory.c
>>>>>> +++ b/mm/memory.c
>>>>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>>>     {
>>>>>>         struct vm_area_struct *vma = vmf->vma;
>>>>>>         struct page *page;
>>>>>> +    struct folio *folio;
>>>>>>         vm_fault_t ret;
>>>>>>         bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>>>>                   !(vma->vm_flags & VM_SHARED);
>>>>>> +    int type, nr_pages, i;
>>>>>> +    unsigned long addr = vmf->address;
>>>>>>           /* Did we COW the page? */
>>>>>>         if (is_cow)
>>>>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>>>                 return VM_FAULT_OOM;
>>>>>>         }
>>>>>>     +    folio = page_folio(page);
>>>>>> +    nr_pages = folio_nr_pages(folio);
>>>>>> +
>>>>>> +    if (unlikely(userfaultfd_armed(vma))) {
>>>>>> +        nr_pages = 1;
>>>>>> +    } else if (nr_pages > 1) {
>>>>>> +        unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>>>>> +        unsigned long end = start + nr_pages * PAGE_SIZE;
>>>>>> +
>>>>>> +        /* In case the folio size in page cache beyond the VMA limits. */
>>>>>> +        addr = max(start, vma->vm_start);
>>>>>> +        nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>>>>> +
>>>>>> +        page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>>>>
>>>>> I still don't really follow the logic in this else if block. Isn't it possible
>>>>> that finish_fault() gets called with a page from a folio that isn't aligned
>>>>> with
>>>>> vmf->address?
>>>>>
>>>>> For example, let's say we have a file who's size is 64K and which is cached
>>>>> in a
>>>>> single large folio in the page cache. But the file is mapped into a process at
>>>>> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will
>>>>> calculate
>>>>
>>>> For shmem, this doesn't happen because the VA is aligned with the hugepage size
>>>> in the shmem_get_unmapped_area() function. See patch 7.
>>>
>>> Certainly agree that shmem can always make sure that it packs a vma in a way
>>> such that its folios are naturally aligned in VA when faulting in memory. If you
>>> mremap it, that alignment will be lost; I don't think that would be a problem
>>
>> When mremap it, it will also call shmem_get_unmapped_area() to align the VA, but
>> for mremap() with MAP_FIXED flag as David pointed out, yes, this patch may be
>> not work perfectly.
> 
> Assuming this works similarly to anon mTHP, remapping to an arbitrary address
> shouldn't be a problem within a single process; the previously allocated folios
> will now be unaligned, but they will be correctly mapped so it doesn't break
> anything. And new faults will allocate folios so that they are as large as
> allowed by the sysfs interface AND which do not overlap with any non-none pte
> AND which are naturally aligned. It's when you start sharing with other
> processes that the fun and games start...
> 
>>
>>> for a single process; mremap will take care of moving the ptes correctly and
>>> this path is not involved.
>>>
>>> But what about the case when a process mmaps a shmem region, then forks, then
>>> the child mremaps the shmem region. Then the parent faults in a THP into the
>>> region (nicely aligned). Then the child faults in the same offset in the region
>>> and gets the THP that the parent allocated; that THP will be aligned in the
>>> parent's VM space but not in the child's.
>>
>> Sorry, I did not get your point here. IIUC, the child's VA will also be aligned
>> if the child mremap is not set MAP_FIXED, since the child's mremap will still
>> call shmem_get_unmapped_area() to find an aligned new VA.
> 
> In general, you shouldn't be relying on the vma bounds being aligned to a THP
> boundary.
> 
>> Please correct me if I missed your point.
> 
> (I'm not 100% sure this is definitely how it works, but seems the only sane way
> to me):
> 
> Let's imagine we have a process that maps 4 pages of shared anon memory at VA=64K:
> 
>    mmap(64K, 16K, PROT_X, MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, ...)
> 
> Then it forks a child, and the child moves the mapping to VA=68K:
> 
>    mremap(64K, 16K, 16K, MREMAP_FIXED | MREMAP_MAYMOVE, 68K)
> 
> Then the parent writes to address 64K (offset 0 in the shared region); this will
> fault and cause a 16K mTHP to be allocated and mapped, covering the whole region
> at 64K-80K in the parent.
> 
> Then the child reads address 68K (offset 0 in the shared region); this will
> fault and cause the previously allocated 16K folio to be looked up and it must
> be mapped in the child between 68K-84K. This is not naturally aligned in the child.
> 
> For the child, your code will incorrectly calculate start/end as 64K-80K.

OK, so you set the MREMAP_FIXED flag, just as David pointed out. Yes, it 
will not be aligned in the child in this case. Thanks for the explanation.
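
(For the record, the problematic sequence is easy to reproduce; a sketch 
of Ryan's example, error handling omitted:)

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 16 * 1024;
        /* parent: 16K of shared anon at VA 64K (naturally aligned) */
        char *p = mmap((void *)0x10000, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

        if (fork() == 0) {
                /* child: move it to VA 68K - no longer 16K-aligned */
                mremap(p, len, len, MREMAP_FIXED | MREMAP_MAYMOVE,
                       (void *)0x11000);
                _exit(0);
        }
        p[0] = 1;       /* parent-side fault: may allocate an aligned 16K
                         * folio that the child then maps unaligned */
        return 0;
}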

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-08 14:28       ` Daniel Gomez
  2024-05-08 17:03         ` David Hildenbrand
@ 2024-05-09  3:08         ` Baolin Wang
  1 sibling, 0 replies; 52+ messages in thread
From: Baolin Wang @ 2024-05-09  3:08 UTC (permalink / raw)
  To: Daniel Gomez, David Hildenbrand
  Cc: akpm, hughd, willy, ioworker0, wangkefeng.wang, ying.huang,
	21cnbao, ryan.roberts, shy828301, ziy, linux-mm, linux-kernel



On 2024/5/8 22:28, Daniel Gomez wrote:
> On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
>> On 08.05.24 13:39, Daniel Gomez wrote:
>>> On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
>>>> Anonymous pages have already been supported for multi-size (mTHP) allocation
>>>> through commit 19eaf44954df, that can allow THP to be configured through the
>>>> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>>>
>>>> However, the anonymous shared pages will ignore the anonymous mTHP rule
>>>> configured through the sysfs interface, and can only use the PMD-mapped
>>>> THP, that is not reasonable. Many implement anonymous page sharing through
>>>> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
>>>> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
>>>> also including the anonymous shared pages, in order to enjoy the benefits of
>>>> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
>>>> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
>>>>
>>>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>>>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>>>> which can have all the same values as the top-level
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
>>>> additional "inherit" option. By default all sizes will be set to "never"
>>>> except PMD size, which is set to "inherit". This ensures backward compatibility
>>>> with the shmem enabled of the top level, meanwhile also allows independent
>>>> control of shmem enabled for each mTHP.
>>>
>>> I'm trying to understand the adoption of mTHP and how it fits into the adoption
>>> of (large) folios that the kernel is moving towards. Can you, or anyone involved
>>> here, explain this? How much do they overlap, and can we benefit from having
>>> both? Is there any argument against the adoption of large folios here that I
>>> might have missed?
>>
>> mTHP are implemented using large folios, just like traditional PMD-sized THP
>> are. (you really should explore the history of mTHP and how it all works
>> internally)
> 
> I'll check the code in more depth. By any chance, are any of you going to be at
> LSFMM this year? I have this session [1] scheduled for Wednesday, and it would
> be nice to get your feedback on it and to hear whether you see this working
> together with mTHP/THP.
> 
> [1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/

Great. I'm also interested in tmpfs support for large folios (or mTHP), 
so please CC me if you plan to send a new version.

As David mentioned, this patchset is mainly about adding mTHP support 
for anonymous shmem, and I think some of the work on swap support for 
large folios could be shared.


* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-08 19:23       ` Luis Chamberlain
@ 2024-05-09 17:48         ` David Hildenbrand
  2024-05-10 18:53           ` Luis Chamberlain
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2024-05-09 17:48 UTC (permalink / raw)
  To: Luis Chamberlain, Matthew Wilcox, Christoph Lameter,
	Christoph Hellwig, Dave Chinner
  Cc: Daniel Gomez, Baolin Wang, akpm, hughd, ioworker0,
	wangkefeng.wang, ying.huang, 21cnbao, ryan.roberts, shy828301,
	ziy, linux-mm, linux-kernel, Linux FS Devel

On 08.05.24 21:23, Luis Chamberlain wrote:
> On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
>> On 08.05.24 13:39, Daniel Gomez wrote:
>>> On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
>>>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>>>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>>>> which can have all the same values as the top-level
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
>>>> additional "inherit" option. By default all sizes will be set to "never"
>>>> except PMD size, which is set to "inherit". This ensures backward compatibility
>>>> with the shmem enabled of the top level, meanwhile also allows independent
>>>> control of shmem enabled for each mTHP.
>>>
>>> I'm trying to understand the adoption of mTHP and how it fits into the adoption
>>> of (large) folios that the kernel is moving towards. Can you, or anyone involved
>>> here, explain this? How much do they overlap, and can we benefit from having
>>> both? Is there any argument against the adoption of large folios here that I
>>> might have missed?
>>
>> mTHP are implemented using large folios, just like traditional PMD-sized THP
>> are.
>>
>> The biggest challenge with memory that cannot be evicted and reclaimed under
>> memory pressure (in contrast to ordinary files in the pagecache) is memory
>> waste, and the placement of large chunks of memory in general during page
>> faults.
>>
>> In the worst case (no swap), you allocate a large chunk of memory once and
>> it will stick around until freed: no reclaim of that memory.
>>
>> That's the reason why THP for anonymous memory and SHMEM have toggles to
>> manually enable and configure them, in contrast to the pagecache. The same
>> was done for mTHP for anonymous memory, and now (anon) shmem follows.
>>
>> There are plans to have it all working automatically at some point, but a
>> lot of what that requires for anonymous memory (and similarly for shmem) is
>> still missing and unclear.
> 
> Whereas the use of large folios for filesystems is already automatic,
> so long as the filesystem supports it. We already do this in the readahead
> and write paths for iomap: we opportunistically use large folios if we can,
> otherwise we use smaller folios.
> 
> So the approach recommended by Matthew was to use the readahead and write
> paths, just as iomap does, to determine the size of the folio to use [0].
> The use of large folios would also be automatic and not require any
> knobs at all.

Yes, I remember discussing that with Willy at some point, including why 
shmem is unfortunately a bit more "special", because you might not even 
have a disk backend ("swap") at all where you could easily reclaim memory.

In the extreme form, you can consider SHMEM as memory that might always be 
mlocked, even without the user requiring special mlock limits ...

> 
> The mTHP approach would grow the "THP" use in filesystems through the
> only filesystem that uses THP. Meanwhile, the use of large folios is already
> automatic with the approach taken by iomap.

Yes, it's an extension of the existing shmem_enabled (which -- I'm afraid -- 
was added for good reasons).

> 
> We're at a crux where it does beg the question if we should continue to
> chug on with tmpfs being special and doing things differently extending
> the old THP interface with mTHP, or if it should just use large folios
> using the same approach as iomap did.

I'm afraid shmem will remain to some degree special. Fortunately it's 
not alone, hugetlbfs is even more special ;)

> 
> From my perspective the more shared code the better, and the more shared
> paths the better. There is a chance to help test swap with large folios
> instead of splitting the folios for swap, and that could be done
> first with tmpfs. I have not evaluated the difference in testing, or how
> we could get the most shared code out of a mTHP approach versus the
> iomap approach for tmpfs; that should be considered.

I don't have a clear picture yet of what might be best for ordinary 
shmem (IOW, not MAP_SHARED|MAP_ANONYMOUS), and I'm afraid there is no 
easy answer.

As long as we don't end up wasting memory, it's not obviously bad. But 
some things might be tricky (see my example about large folios getting 
stranded in shmem and never being reclaimed and reused for better 
purposes).

I'll note that mTHP really is just (supposed to be) a user interface to 
enable the various folio sizes (well, and to expose better per-size 
stats), not more.

From that point of view, it's just a filter. Enable all, and you get 
the same behavior as you likely would in the pagecache mode.

From a shared-code and testing point of view, there really wouldn't be 
a lot of differences. Again, essentially just a filter.
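
To make the "filter" concrete: the per-size knobs end up as a bitmask of
enabled orders, and the fault path simply walks it largest-first. A
self-contained sketch, modelled on the highest_order()/next_order() helpers
that patch 1 exposes (the sysfs plumbing and the actual allocation fallback
are simplified away here):

#include <stdio.h>

#define BIT(n) (1UL << (n))

/* Userspace mirrors of the kernel helpers from patch 1: pick the
 * highest enabled order, then clear it and move to the next one. */
static int highest_order(unsigned long orders)
{
	return orders ? 63 - __builtin_clzl(orders) : -1;
}

static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~BIT(prev);
	return highest_order(*orders);
}

int main(void)
{
	/* Suppose sysfs enabled 2M (order 9), 64K (order 4) and 16K
	 * (order 2) on a 4K-page system: try them largest-first. */
	unsigned long orders = BIT(9) | BIT(4) | BIT(2);
	int order;

	for (order = highest_order(orders); orders;
	     order = next_order(&orders, order))
		printf("try to allocate an order-%d folio (%lu KB)\n",
		       order, 4UL << order);
	return 0;
}

Enable every order and the walk degenerates to "largest folio that fits",
which is the pagecache-mode behavior.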


> 
> Are there other things to consider? Does this require some dialog at
> LSFMM?

As raised in my reply to Daniel, I'll be at LSF/MM and happy to discuss. 
I'm also not a SHMEM expert, so I'm hoping at some point we'd get 
feedback from Hugh.

-- 
Cheers,

David / dhildenb



* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-08 17:03         ` David Hildenbrand
@ 2024-05-09 19:18           ` Daniel Gomez
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Gomez @ 2024-05-09 19:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Baolin Wang, akpm, hughd, willy, ioworker0, wangkefeng.wang,
	ying.huang, 21cnbao, ryan.roberts, shy828301, ziy, linux-mm,
	linux-kernel

On Wed, May 08, 2024 at 07:03:57PM +0200, David Hildenbrand wrote:
> On 08.05.24 16:28, Daniel Gomez wrote:
> > On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
> > > On 08.05.24 13:39, Daniel Gomez wrote:
> > > > On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
> > > > > Anonymous pages have already been supported for multi-size (mTHP) allocation
> > > > > through commit 19eaf44954df, that can allow THP to be configured through the
> > > > > sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
> > > > > 
> > > > > However, the anonymous shared pages will ignore the anonymous mTHP rule
> > > > > configured through the sysfs interface, and can only use the PMD-mapped
> > > > > THP, that is not reasonable. Many implement anonymous page sharing through
> > > > > mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
> > > > > therefore, users expect to apply an unified mTHP strategy for anonymous pages,
> > > > > also including the anonymous shared pages, in order to enjoy the benefits of
> > > > > mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
> > > > > than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
> > > > > 
> > > > > The primary strategy is similar to supporting anonymous mTHP. Introduce
> > > > > a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> > > > > which can have all the same values as the top-level
> > > > > '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> > > > > additional "inherit" option. By default all sizes will be set to "never"
> > > > > except PMD size, which is set to "inherit". This ensures backward compatibility
> > > > > with the shmem enabled of the top level, meanwhile also allows independent
> > > > > control of shmem enabled for each mTHP.
> > > > 
> > > > I'm trying to understand the adoption of mTHP and how it fits into the adoption
> > > > of (large) folios that the kernel is moving towards. Can you, or anyone involved
> > > > here, explain this? How much do they overlap, and can we benefit from having
> > > > both? Is there any argument against the adoption of large folios here that I
> > > > might have missed?
> > > 
> > > mTHP are implemented using large folios, just like traditional PMD-sized THP
> > > are. (you really should explore the history of mTHP and how it all works
> > > internally)
> > 
> > I'll check the code in more depth. By any chance, are any of you going to be at
> > LSFMM this year? I have this session [1] scheduled for Wednesday, and it would
> > be nice to get your feedback on it and to hear whether you see this working
> > together with mTHP/THP.
> > 
> 
> I'll be around and will attend that session! But note that I am still
> scratching my head about what to do with "ordinary" shmem, especially because
> of the weird way shmem behaves in contrast to real files (below). Some input
> from Hugh might be very helpful.

I'm looking forward to meeting you there and getting your feedback!

> 
> Example: you write() to a shmem file and populate a 2M THP. Then, nobody
> touches that file for a long time. There are certainly other mmap() users
> that could benefit more from that THP ... and without swap, that THP will
> be trapped there, possibly for a long time (unless I am missing an important
> piece of shmem THP design :) )? Sure, if we only have THPs it's nice;
> that's just not the reality, unfortunately. IIRC, that's one of the reasons
> why THP for shmem can be enabled/disabled. But again, still scratching my
> head ...
> 
> 
> Note that this patch set only tackles anonymous shmem (MAP_SHARED|MAP_ANON),
> which is in 99.999% of all cases only accessed via page tables (memory
> allocated during page faults). I think there are ways to grab the fd
> (/proc/self/fd), but IIRC only corner cases read/write that.
> 
> So in that sense, anonymous shmem (this patch set) behaves mostly like
> ordinary anonymous memory, and likely there is not much overlap with other
> "allocate large folios during read/write/fallocate" work as in [1]. Swap
> might have an overlap.
> 
> 
> The real confusion begins when we have ordinary shmem: some users never mmap
> it and only read/write, some users never read/write it and only mmap it and
> some (less common?) users do both.
> 
> And shmem really is special: it looks like "just another file", but
> memory-consumption and reclaim wise it behaves just like anonymous memory.
> It might be swappable ("usually very limited backing disk space available")
> or it might not.
> 
> In a subthread here we are discussing what to do with that special
> "shmem_enabled = force" mode ... and it's all complicated I think.
> 
> > [1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/
> > 
> > > 
> > > The biggest challenge with memory that cannot be evicted and reclaimed
> > > under memory pressure (in contrast to ordinary files in the pagecache)
> > > is memory waste, and the placement of large chunks of memory in general
> > > during page faults.
> > > 
> > > In the worst case (no swap), you allocate a large chunk of memory once and
> > > it will stick around until freed: no reclaim of that memory.
> > 
> > I can see that path being triggered by some fstests but only for THP (where we
> > can actually reclaim memory).
> 
> Is that when we punch-hole a partial THP and split it? I'd be interested in
> what that test does.

The reclaim path I'm referring to is triggered when we reach max capacity
(-ENOSPC) in shmem_alloc_and_add_folio(). We reclaim space by splitting large
folios (regardless of whether they are dirty or uptodate).

One of the tests that hits this path is generic/100 (with the huge option
enabled):
- First, it creates a directory structure in $TEMP_DIR (/tmp). The directory
size is around 26M.
- Then, it tars it up into $TEMP_DIR/temp.tar.
- Finally, it untars the archive into $TEST_DIR (/media/test, which is the
huge tmpfs mount dir). What happens in generic/100 under the huge=always case
is that you fill up the dedicated space very quickly (this is 1G in xfstests
for tmpfs) and then you start reclaiming.
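
For anyone who wants to poke at this path without running xfstests, the
trigger can be approximated with a few lines of C; the mount point and size
below are assumptions matching the xfstests tmpfs setup (something like
mount -t tmpfs -o size=1g,huge=always tmpfs /media/test), not anything the
test itself ships:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 20];	/* write 1M at a time */
	int fd = open("/media/test/fill", O_CREAT | O_WRONLY | O_TRUNC, 0600);

	if (fd < 0)
		return 1;
	memset(buf, 'x', sizeof(buf));
	for (;;) {
		ssize_t n = write(fd, buf, sizeof(buf));

		if (n < 0 && errno == ENOSPC) {
			/* From here on, further shmem allocations have to
			 * split large folios to reclaim space. */
			puts("hit -ENOSPC on the huge tmpfs");
			break;
		}
		if (n <= 0)
			break;
	}
	close(fd);
	return 0;
}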

> 
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 


* Re: [PATCH 0/8] add mTHP support for anonymous shmem
  2024-05-09 17:48         ` David Hildenbrand
@ 2024-05-10 18:53           ` Luis Chamberlain
  0 siblings, 0 replies; 52+ messages in thread
From: Luis Chamberlain @ 2024-05-10 18:53 UTC (permalink / raw)
  To: David Hildenbrand, Hugh Dickins
  Cc: Matthew Wilcox, Christoph Lameter, Christoph Hellwig,
	Dave Chinner, Daniel Gomez, Baolin Wang, akpm, hughd, ioworker0,
	wangkefeng.wang, ying.huang, 21cnbao, ryan.roberts, shy828301,
	ziy, linux-mm, linux-kernel, Linux FS Devel

On Thu, May 09, 2024 at 07:48:46PM +0200, David Hildenbrand wrote:
> On 08.05.24 21:23, Luis Chamberlain wrote:
> > From my perspective the more shared code the better, and the more shared
> > paths the better. There is a chance to help test swap with large folios
> > instead of splitting the folios for swap, and that could be done
> > first with tmpfs. I have not evaluated the difference in testing, or how
> > we could get the most shared code out of a mTHP approach versus the
> > iomap approach for tmpfs; that should be considered.
> 
> I don't have a clear picture yet of what might be best for ordinary shmem
> (IOW, not MAP_SHARED|MAP_ANONYMOUS), and I'm afraid there is no easy answer.

OK, so it sounds like the different options need to be thought out and
reviewed.

> As long as we don't end up wasting memory, it's not obviously bad.

Sure.

> But some
> things might be tricky (see my example about large folios getting stranded
> in shmem and never being reclaimed and reused for better purposes).

Where is that stated BTW? Could that be resolved?

> I'll note that mTHP really is just (supposed to be) a user interface to
> enable the various folio sizes (well, and to expose better per-size stats),
> not more.

Sure, but given that filesystems using large folios don't have silly APIs for
selecting which large folio sizes to enable, it just seems odd for tmpfs to
take a different approach.
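
To illustrate how little API there is on the filesystem side, the whole
opt-in is typically a single call at inode setup time. A minimal sketch
(mapping_set_large_folios() is the real pagemap helper; the surrounding
example_fs_init_inode() function is a placeholder, not code from any
particular filesystem):

#include <linux/pagemap.h>

/* Illustrative inode-setup hook: once the address_space is marked
 * large-folio capable, the readahead and write paths pick folio
 * sizes automatically, with no per-size knobs involved. */
static void example_fs_init_inode(struct inode *inode)
{
	mapping_set_large_folios(inode->i_mapping);
}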

> From that point of view, it's just a filter. Enable all, and you get the
> same behavior as you likely would in the pagecache mode.

Which begs the question: *why* have an API just to constrain usage to certain
large folio sizes, diverging from what filesystems are doing with large
folios?

> > Are there other things to consider? Does this require some dialog at
> > LSFMM?
> 
> As raised in my reply to Daniel, I'll be at LSF/MM and happy to discuss. I'm
> also not a SHMEM expert, so I'm hoping at some point we'd get feedback from
> Hugh.

Hugh, will you be at LSFMM?

  Luis


end of thread

Thread overview: 52+ messages
2024-05-06  8:46 [PATCH 0/8] add mTHP support for anonymous shmem Baolin Wang
2024-05-06  8:46 ` [PATCH 1/8] mm: move highest_order() and next_order() out of the THP config Baolin Wang
2024-05-07 10:21   ` Ryan Roberts
2024-05-08  2:13     ` Baolin Wang
2024-05-08  9:06       ` Ryan Roberts
2024-05-08  9:40         ` Baolin Wang
2024-05-06  8:46 ` [PATCH 2/8] mm: memory: extend finish_fault() to support large folio Baolin Wang
2024-05-07 10:37   ` Ryan Roberts
2024-05-08  3:44     ` Baolin Wang
2024-05-08  7:15       ` David Hildenbrand
2024-05-08  9:06         ` Baolin Wang
2024-05-08  8:53       ` Ryan Roberts
2024-05-08  9:31         ` Baolin Wang
2024-05-08 10:47           ` Ryan Roberts
2024-05-09  1:10             ` Baolin Wang
2024-05-06  8:46 ` [PATCH 3/8] mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio() Baolin Wang
2024-05-06  8:46 ` [PATCH 4/8] mm: shmem: add THP validation for PMD-mapped THP related statistics Baolin Wang
2024-05-06  8:46 ` [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for anonymous shmem Baolin Wang
2024-05-07 10:52   ` Ryan Roberts
2024-05-08  4:45     ` Baolin Wang
2024-05-08  7:08       ` David Hildenbrand
2024-05-08  7:12         ` David Hildenbrand
2024-05-08  9:02           ` Ryan Roberts
2024-05-08  9:56             ` Baolin Wang
2024-05-08 10:48               ` Ryan Roberts
2024-05-08 12:02             ` David Hildenbrand
2024-05-08 12:10               ` David Hildenbrand
2024-05-08 12:43                 ` Ryan Roberts
2024-05-08 12:44                   ` Ryan Roberts
2024-05-08 12:45                   ` David Hildenbrand
2024-05-08 12:54                     ` Ryan Roberts
2024-05-08 13:07                       ` David Hildenbrand
2024-05-08 13:44                         ` Ryan Roberts
2024-05-06  8:46 ` [PATCH 6/8] mm: shmem: add mTHP support " Baolin Wang
2024-05-07 10:46   ` kernel test robot
2024-05-08  6:03     ` Baolin Wang
2024-05-06  8:46 ` [PATCH 7/8] mm: shmem: add mTHP size alignment in shmem_get_unmapped_area Baolin Wang
2024-05-06  8:46 ` [PATCH 8/8] mm: shmem: add mTHP counters for anonymous shmem Baolin Wang
2024-05-06 10:54 ` [PATCH 0/8] add mTHP support " Lance Yang
2024-05-07  1:47   ` Baolin Wang
2024-05-07  6:50     ` Lance Yang
2024-05-07 10:20 ` Ryan Roberts
2024-05-08  5:45   ` Baolin Wang
     [not found] ` <CGME20240508113934eucas1p13a3972f3f9955365f40155e084a7c7d5@eucas1p1.samsung.com>
2024-05-08 11:39   ` Daniel Gomez
2024-05-08 11:58     ` David Hildenbrand
2024-05-08 14:28       ` Daniel Gomez
2024-05-08 17:03         ` David Hildenbrand
2024-05-09 19:18           ` Daniel Gomez
2024-05-09  3:08         ` Baolin Wang
2024-05-08 19:23       ` Luis Chamberlain
2024-05-09 17:48         ` David Hildenbrand
2024-05-10 18:53           ` Luis Chamberlain
