* [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
@ 2024-04-29 17:04 John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 01/12] famfs: Introduce famfs documentation John Groves
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)
  To: John Groves, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, Matthew Wilcox, linux-cxl, linux-fsdevel, nvdimm
  Cc: John Groves, john, Dave Chinner, Christoph Hellwig, dave.hansen,
	gregory.price, Randy Dunlap, Jerome Glisse, Aravind Ramesh,
	Ajay Joshi, Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang, John Groves

This patch set introduces famfs[1] - a special-purpose fs-dax file system
for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
CXL-specific in any way.

* Famfs creates a simple access method for storing and sharing data in
  sharable memory. The memory is exposed and accessed as memory-mappable
  dax files.
* Famfs supports multiple hosts mounting the same file system from the
  same memory (something existing fs-dax file systems don't do).
* A famfs file system can be created on a /dev/dax device in devdax mode,
  which rests on dax functionality added in patches 2-7 of this series.

The famfs kernel file system is part of the famfs framework; additional
components in user space[2] handle metadata and direct the famfs kernel
module to instantiate files that map to specific memory. The famfs user
space has documentation and a reasonably thorough test suite.

The famfs kernel module never accesses the shared memory directly (either
data or metadata). Because of this, shared memory managed by the famfs
framework does not create a RAS "blast radius" problem that could crash
or destabilize the kernel. Poison or timeouts in famfs memory
can be expected to kill apps via SIGBUS and cause mounts to be disabled
due to memory failure notifications.

Famfs does not attempt to solve concurrency or coherency problems for apps,
although it does solve these problems in regard to its own data structures.
Apps may encounter hard concurrency problems, but there are use cases that
are eminently useful and uncomplicated from a concurrency perspective:
serial sharing is one (only one host at a time has access), and read-only
concurrent sharing is another (all hosts can read-cache without worry).
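
As a minimal sketch of the read-only sharing case from an application's
point of view: a consumer opens an existing file and maps it shared and
read-only. The helper name map_readonly() is hypothetical and nothing in
it is famfs-specific; it uses only open(2), fstat(2) and mmap(2).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map an existing file read-only and shared. Returns the mapping and
 * stores its length in *len, or returns NULL on any failure.
 * (Hypothetical helper for illustration; not part of famfs.)
 */
const void *map_readonly(const char *path, size_t *len)
{
	struct stat st;
	void *addr;
	int fd;

	assert(path && len);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	/* A zero-length file cannot be mapped */
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return NULL;
	}

	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (addr == MAP_FAILED)
		return NULL;

	*len = (size_t)st.st_size;
	return addr;
}
```

A reader on any host that has the file system mounted might call
map_readonly("/mnt/famfs/dataset", &len); that path is only an example.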

Contents:

* famfs kernel documentation [patch 1]. Note that evolving famfs user
  documentation is at [2]
* dev_dax_iomap patchset [patches 2-7] - This enables fs-dax to use the
  iomap interface via a character /dev/dax device (e.g. /dev/dax0.0). For
  historical reasons the iomap infrastructure was enabled only for
  /dev/pmem devices (which are dax block devices). As famfs is the first
  fs-dax file system that works on /dev/dax, this patch series fills in
  the bare minimum infrastructure to enable iomap api usage with /dev/dax.
* famfs patchset [patches 8-12] - this introduces the kernel component of
  famfs.

Note that there is a developing consensus that /dev/dax requires
some fundamental re-factoring (e.g. [3]) that is related but outside the
scope of this series.

Some observations about using sharable memory

* It does not make sense to online sharable memory as system-ram.
  System-ram gets zeroed when it is onlined, so sharing is basically
  nonsense.
* It does not make sense to put struct page's in sharable memory, because
  those can't be shared. However, separately providing non-sharable
  capacity to be used for struct page's might be a sensible approach if the
  size of struct page array for sharable memory is too large to put in
  conventional system-ram (albeit with possible RAS implications).
* Sharable memory is pmem-like, in that a host is likely to connect in
  order to gain access to data that is already in the memory. Moreover
  the power domain for shared memory is separate from that of the server.
  Having observed that, famfs is not intended for persistent storage. It is
  intended for sharing data sets in memory during a time frame where the
  memory and the compute nodes are expected to remain operational - such
  as during a clustered data analytics job.

Could we do this with FUSE?

The key performance requirement for famfs is efficient handling of VMA
faults. This requires caching the complete dax extent lists for all active
files so faults can be handled without upcalls, which FUSE does not do.
It would probably be possible to put this capability into FUSE, but we think
that keeping famfs separate from FUSE is the simpler approach.
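
To illustrate why a fully cached extent list lets a fault be resolved
without an upcall, here is a small userspace model. The struct and
function names (sketch_extent, sketch_file_map, sketch_resolve) are
hypothetical and are not the famfs kernel data structures; the sketch
only shows the lookup from a file offset to a device offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a contiguous run of a file backed by device memory */
struct sketch_extent {
	uint64_t dev_offset;	/* byte offset into the dax device */
	uint64_t len;		/* length of the run in bytes */
};

/* Hypothetical per-file map, cached in full so lookups need no upcall */
struct sketch_file_map {
	size_t nr_extents;
	const struct sketch_extent *extents;
};

/*
 * Resolve a file offset to a device offset by walking the cached extents.
 * Returns 0 and sets *dev_offset on success, or -1 if the offset is past
 * the end of the mapped file.
 */
int sketch_resolve(const struct sketch_file_map *map, uint64_t file_offset,
		   uint64_t *dev_offset)
{
	uint64_t start = 0;	/* file offset at which extent i begins */
	size_t i;

	assert(map && dev_offset);

	for (i = 0; i < map->nr_extents; i++) {
		const struct sketch_extent *ext = &map->extents[i];

		if (file_offset - start < ext->len) {
			*dev_offset = ext->dev_offset + (file_offset - start);
			return 0;
		}
		start += ext->len;
	}
	return -1;
}
```

Because the whole list is already in memory, the lookup is a short loop
with no round trip to user space.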

We will be discussing this topic at LSFMM 2024 [5] in a topic called "Famfs:
new userspace filesystem driver vs. improving FUSE/DAX" - but other famfs
related discussion will also be welcome!

This patch set is available as a branch at [6]

References

[1] https://lpc.events/event/17/contributions/1455/
[2] https://github.com/cxl-micron-reskit/famfs
[3] https://lore.kernel.org/all/166630293549.1017198.3833687373550679565.stgit@dwillia2-xfh.jf.intel.com/
[4] https://www.computeexpresslink.org/download-the-specification
[5] https://events.linuxfoundation.org/lsfmmbpf/program/schedule-at-a-glance/
[6] https://github.com/cxl-micron-reskit/famfs-linux/tree/famfs-v2


Changes since RFC v1:


* This patch series is a from-scratch refactor of the original. The code
  that maps a file to a dax device is almost identical, but a lot of
  cleanup has been done.
* The get_tree and backing device handling code has been ripped up and
  re-done (in the get-tree case, based on suggestions from Christian
  Brauner - thanks Christian; I hope I haven't done any new dumb stuff!)
  (Note this code has been extensively tested; after all known error cases
  famfs can be umounted and the module can be unloaded)
* Famfs now 'shuts down' if the dax device reports any memory errors. I/O
  and faults start reporting SIGBUS. Famfs detects memory errors via an
  iomap_ops->notify_failure() call from the devdax layer. This has been tested
  and appears to disable the famfs file system while leaving it able to
  dismount cleanly.
* Dropped fault counters
* Dropped support for symlinks within a famfs file system; we don't think
  supporting symlinks makes sense with famfs, and it has some undesirable
  side effects, so it's out.
* Dropped support for mknod within a famfs file system (other than regular
  files and directories)
* Famfs magic number moved to magic.h
* Famfs ioctl opcodes now documented in
  Documentation/userspace-api/ioctl/ioctl-number.rst
* Dodgy kerneldoc comments cleaned up or removed; hopefully none added...
* Kconfig formatting cleaned up
* Dropped /dev/pmem support. Prior patch series would mount on either
  /dev/pmem or /dev/dax devices. This is unnecessary complexity since
  /dev/pmem devices can be converted to /dev/dax. Famfs is, however, the
  first file system we know of that mounts from a character device.
* Famfs no longer does a filp_open() of the dax device. It finds the
  device by its dev_t and uses fs_dax_get() to effect exclusivity.
* Added a read-only module param famfs_kabi_version for checking
  that user space was compiled for the same ABI version
* The famfs kernel module (the code in fs/famfs plus the uapi file
  famfs_ioctl.h) dropped from 1030 lines of code in v1 to 760 in v2,
  according to "cloc".
* Fixed issues reported by the kernel test robot
* Many minor improvements in response to v1 code reviews


John Groves (12):
  famfs: Introduce famfs documentation
  dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c
  dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage
  dev_dax_iomap: Save the kva from memremap
  dev_dax_iomap: Add dax_operations for use by fs-dax on devdax
  dev_dax_iomap: export dax_dev_get()
  famfs prep: Add fs/super.c:kill_char_super()
  famfs: module operations & fs_context
  famfs: Introduce inode_operations and super_operations
  famfs: Introduce file_operations read/write
  famfs: Introduce mmap and VM fault handling
  famfs: famfs_ioctl and core file-to-memory mapping logic & iomap_ops

 Documentation/filesystems/famfs.rst           | 135 ++++
 Documentation/filesystems/index.rst           |   1 +
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 MAINTAINERS                                   |  11 +
 drivers/dax/Kconfig                           |   6 +
 drivers/dax/bus.c                             | 144 ++++-
 drivers/dax/dax-private.h                     |   1 +
 drivers/dax/device.c                          |  38 +-
 drivers/dax/super.c                           |  33 +-
 fs/Kconfig                                    |   2 +
 fs/Makefile                                   |   1 +
 fs/famfs/Kconfig                              |  10 +
 fs/famfs/Makefile                             |   5 +
 fs/famfs/famfs_file.c                         | 605 ++++++++++++++++++
 fs/famfs/famfs_inode.c                        | 452 +++++++++++++
 fs/famfs/famfs_internal.h                     |  52 ++
 fs/namei.c                                    |   1 +
 fs/super.c                                    |   9 +
 include/linux/dax.h                           |   6 +
 include/linux/fs.h                            |   1 +
 include/uapi/linux/famfs_ioctl.h              |  61 ++
 include/uapi/linux/magic.h                    |   1 +
 22 files changed, 1547 insertions(+), 29 deletions(-)
 create mode 100644 Documentation/filesystems/famfs.rst
 create mode 100644 fs/famfs/Kconfig
 create mode 100644 fs/famfs/Makefile
 create mode 100644 fs/famfs/famfs_file.c
 create mode 100644 fs/famfs/famfs_inode.c
 create mode 100644 fs/famfs/famfs_internal.h
 create mode 100644 include/uapi/linux/famfs_ioctl.h


base-commit: ed30a4a51bb196781c8058073ea720133a65596f
-- 
2.43.0



* [RFC PATCH v2 01/12] famfs: Introduce famfs documentation
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-30  6:46   ` Bagas Sanjaya
  2024-04-29 17:04 ` [RFC PATCH v2 02/12] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c John Groves
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

* Introduce Documentation/filesystems/famfs.rst into the Documentation
  tree
* Add famfs.rst to the filesystems doc index
* Add famfs' ioctl opcodes to ioctl-number.rst
* Update the MAINTAINERS file

Signed-off-by: John Groves <john@groves.net>
---
 Documentation/filesystems/famfs.rst           | 135 ++++++++++++++++++
 Documentation/filesystems/index.rst           |   1 +
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 MAINTAINERS                                   |   9 ++
 4 files changed, 146 insertions(+)
 create mode 100644 Documentation/filesystems/famfs.rst

diff --git a/Documentation/filesystems/famfs.rst b/Documentation/filesystems/famfs.rst
new file mode 100644
index 000000000000..792785598d6a
--- /dev/null
+++ b/Documentation/filesystems/famfs.rst
@@ -0,0 +1,135 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _famfs_index:
+
+==================================================================
+famfs: The kernel component of the famfs shared memory file system
+==================================================================
+
+- Copyright (C) 2024 Micron Technology, Inc.
+
+Introduction
+============
+Compute Express Link (CXL) provides a mechanism for disaggregated or
+fabric-attached memory (FAM). This creates opportunities for data sharing;
+clustered apps that would otherwise have to shard or replicate data can
+share one copy in disaggregated memory.
+
+Famfs, which is not CXL-specific in any way, provides a mechanism for
+multiple hosts to use data in shared memory, by giving it a file system
+interface. With famfs, any app that understands files (which is almost
+all apps) can access data sets in shared memory. Although famfs
+supports read and write, the real point is to support mmap, which
+provides direct (dax) access to the memory - either writable or read-only.
+
+Shared memory can pose complex coherency and synchronization issues, but
+there are also simple cases. Two simple and eminently useful patterns that
+occur frequently in data analytics and AI are:
+
+* Serial Sharing - Only one host or process at a time has access to a file
+* Read-only Sharing - Multiple hosts or processes share read-only access
+  to a file
+
+The famfs kernel file system is part of the famfs framework; user space
+components [1] handle metadata allocation and distribution, and direct the
+famfs kernel module to instantiate files that map to specific memory.
+
+The famfs framework manages coherency of its own metadata and structures,
+but does not attempt to manage coherency for applications.
+
+Famfs also provides data isolation between files. That is, even though
+the host has access to an entire memory "device" (as a dax device), apps
+cannot write to memory for which the file is read-only, and mapping one
+file provides isolation from the memory of all other files. This is pretty
+basic, but some experimental shared memory usage patterns provide no such
+isolation.
+
+Principles of Operation
+=======================
+
+Without its user space components, the famfs kernel module doesn't do
+anything useful. The user space components maintain superblocks and
+metadata logs, and use the famfs kernel component to provide a file system
+view of shared memory across multiple hosts.
+
+Each host has an independent instance of the famfs kernel module. After
+mount, files are not visible until the user space component instantiates
+them (normally by playing the famfs metadata log).
+
+Once instantiated, files on each host can point to the same shared memory,
+but in-memory metadata (inodes, etc.) is ephemeral on each host that has a
+famfs instance mounted. Like ramfs, the famfs in-kernel file system has no
+backing store for metadata modifications. If metadata mutations are ever
+persisted, that must be done by the user space components. However,
+mutations to file data are saved to the shared memory - subject to write
+permission and processor cache behavior.
+
+
+Famfs is Not a Conventional File System
+---------------------------------------
+
+Famfs files can be accessed by conventional means, but there are
+limitations. The kernel component of famfs is not involved in the
+allocation of backing memory for files at all; the famfs user space
+creates files and passes the allocation extent lists into the kernel via
+the per-file FAMFSIOC_MAP_CREATE ioctl. A file that lacks this metadata is
+treated as invalid by the famfs kernel module. As a practical matter files
+must be created via the famfs library or cli, but they can be consumed as
+if they were conventional files.
+
+Famfs differs in some important ways from conventional file systems:
+
+* Files must be pre-allocated by the famfs framework; allocation is never
+  performed on (or after) write.
+* Any operation that changes a file's size is considered to put the file
+  in an invalid state, disabling access to the data. It may be possible to
+  revisit this in the future. (Typically the famfs user space can restore
+  files to a valid state by replaying the famfs metadata log.)
+
+Famfs exists to apply the existing file system abstractions to shared
+memory so applications and workflows can more easily adapt to an
+environment with disaggregated shared memory.
+
+Memory Error Handling
+=====================
+
+Possible memory errors include timeouts, poison and unexpected
+reconfiguration of an underlying dax device. In all of these cases, famfs
+receives a call via its iomap_ops->notify_failure() function. If any
+memory errors have been detected, access to the affected famfs mount is
+disabled to avoid further errors or corruption. Testing indicates that
+a famfs instance that has encountered errors can be unmounted cleanly, but
+repairing memory errors or corruption is outside the scope of famfs.
+
+Key Requirements
+================
+
+The primary requirements for famfs are:
+
+1. Must support a file system abstraction backed by sharable dax memory
+2. Files must efficiently handle VMA faults
+3. Must support metadata distribution in a sharable way
+4. Must handle clients with a stale copy of metadata
+
+The famfs kernel component takes care of 1-2 above by caching each file's
+mapping metadata in the kernel.
+
+Requirements 3 and 4 are handled by the user space components, and are
+largely orthogonal to the functionality of the famfs kernel module.
+
+Requirements 3 and 4 cannot be met by conventional fs-dax file systems
+(e.g. xfs and ext4) because they use write-back metadata; it is not valid
+to mount such a file system on two hosts from the same in-memory image.
+
+
+Famfs Usage
+===========
+
+Famfs usage is documented at [1].
+
+
+References
+==========
+
+- [1] Famfs user space repository and documentation
+      https://github.com/cxl-micron-reskit/famfs
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 1f9b4c905a6a..0fe2c70a106f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -87,6 +87,7 @@ Documentation for filesystem implementations.
    ext3
    ext4/index
    f2fs
+   famfs
    gfs2
    gfs2-uevents
    gfs2-glocks
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index c472423412bf..ac407802cf10 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -289,6 +289,7 @@ Code  Seq#    Include File                                           Comments
 'u'   00-1F  linux/smb_fs.h                                          gone
 'u'   20-3F  linux/uvcvideo.h                                        USB video class host driver
 'u'   40-4f  linux/udmabuf.h                                         userspace dma-buf misc device
+'u'   50-5F  linux/famfs_ioctl.h                                     famfs shared memory file system
 'v'   00-1F  linux/ext2_fs.h                                         conflict!
 'v'   00-1F  linux/fs.h                                              conflict!
 'v'   00-0F  linux/sonypi.h                                          conflict!
diff --git a/MAINTAINERS b/MAINTAINERS
index ebf03f5f0619..3f2d847dcf01 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8180,6 +8180,15 @@ F:	Documentation/networking/failover.rst
 F:	include/net/failover.h
 F:	net/core/failover.c
 
+FAMFS
+M:	John Groves <jgroves@micron.com>
+M:	John Groves <John@Groves.net>
+M:	John Groves <john@jagalactic.com>
+L:	linux-cxl@vger.kernel.org
+L:	linux-fsdevel@vger.kernel.org
+S:	Supported
+F:	Documentation/filesystems/famfs.rst
+
 FANOTIFY
 M:	Jan Kara <jack@suse.cz>
 R:	Amir Goldstein <amir73il@gmail.com>
-- 
2.43.0



* [RFC PATCH v2 02/12] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 01/12] famfs: Introduce famfs documentation John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 03/12] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

No changes to the function - just moved it.

dev_dax_iomap needs to call this function from
drivers/dax/bus.c.

drivers/dax/bus.c can't call functions in drivers/dax/device.c -
that creates a circular linkage dependency - but device.c can
call functions in bus.c. Also exports dax_pgoff_to_phys() since
both bus.c and device.c now call it.

Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/bus.c    | 24 ++++++++++++++++++++++++
 drivers/dax/device.c | 23 -----------------------
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 797e1ebff299..f894272beab8 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1447,6 +1447,30 @@ static const struct device_type dev_dax_type = {
 	.groups = dax_attribute_groups,
 };
 
+/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c  */
+__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
+			      unsigned long size)
+{
+	int i;
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct dev_dax_range *dax_range = &dev_dax->ranges[i];
+		struct range *range = &dax_range->range;
+		unsigned long long pgoff_end;
+		phys_addr_t phys;
+
+		pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
+		if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
+			continue;
+		phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
+		if (phys + size - 1 <= range->end)
+			return phys;
+		break;
+	}
+	return -1;
+}
+EXPORT_SYMBOL_GPL(dax_pgoff_to_phys);
+
 static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 {
 	struct dax_region *dax_region = data->dax_region;
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 93ebedc5ec8c..40ba660013cf 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -50,29 +50,6 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
 	return 0;
 }
 
-/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
-__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
-		unsigned long size)
-{
-	int i;
-
-	for (i = 0; i < dev_dax->nr_range; i++) {
-		struct dev_dax_range *dax_range = &dev_dax->ranges[i];
-		struct range *range = &dax_range->range;
-		unsigned long long pgoff_end;
-		phys_addr_t phys;
-
-		pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
-		if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
-			continue;
-		phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
-		if (phys + size - 1 <= range->end)
-			return phys;
-		break;
-	}
-	return -1;
-}
-
 static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn,
 			      unsigned long fault_size)
 {
-- 
2.43.0



* [RFC PATCH v2 03/12] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 01/12] famfs: Introduce famfs documentation John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 02/12] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 04/12] dev_dax_iomap: Save the kva from memremap John Groves
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

This function should be called by fs-dax file systems after opening the
devdax device. This adds holder_operations, which effects exclusivity
between callers of fs_dax_get().
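
The exclusivity described above comes from a compare-and-swap on the
holder pointer: only the caller that replaces NULL with its own holder
succeeds. A rough userspace model of that idea follows, using C11 atomics
in place of the kernel's cmpxchg(). The names sketch_dev, sketch_claim()
and sketch_release() are hypothetical, and the release path is an
assumption added for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the holder slot of a dax device */
struct sketch_dev {
	_Atomic(void *) holder;
};

/*
 * Claim the device for @holder. Only the caller that swaps NULL for its
 * own pointer wins; every other caller gets -EBUSY.
 */
int sketch_claim(struct sketch_dev *dev, void *holder)
{
	void *expected = NULL;

	assert(dev && holder);

	if (!atomic_compare_exchange_strong(&dev->holder, &expected, holder))
		return -EBUSY;
	return 0;
}

/* Release the device, but only if @holder is the current owner */
int sketch_release(struct sketch_dev *dev, void *holder)
{
	void *expected = holder;

	assert(dev && holder);

	if (!atomic_compare_exchange_strong(&dev->holder, &expected, NULL))
		return -EINVAL;
	return 0;
}
```

With this model a second claim fails with -EBUSY until the first holder
releases the device.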

This function serves the same role as fs_dax_get_by_bdev(), which dax
file systems call after opening the pmem block device.

This also adds the CONFIG_DEV_DAX_IOMAP Kconfig parameter

Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/Kconfig |  6 ++++++
 drivers/dax/super.c | 30 ++++++++++++++++++++++++++++++
 include/linux/dax.h |  5 +++++
 3 files changed, 41 insertions(+)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index a88744244149..b1ebcc77120b 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -78,4 +78,10 @@ config DEV_DAX_KMEM
 
 	  Say N if unsure.
 
+config DEV_DAX_IOMAP
+       depends on DEV_DAX && DAX
+       def_bool y
+       help
+         Support iomap mapping of devdax devices (for FS-DAX file
+         systems that reside on character /dev/dax devices)
 endif
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index aca71d7fccc1..4b55f79849b0 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -122,6 +122,36 @@ void fs_put_dax(struct dax_device *dax_dev, void *holder)
 EXPORT_SYMBOL_GPL(fs_put_dax);
 #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
+#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP)
+/**
+ * fs_dax_get()
+ *
+ * fs-dax file systems call this function to prepare to use a devdax device for
+ * fsdax. This is like fs_dax_get_by_bdev(), but the caller already has struct
+ * dev_dax (and there is no bdev). The holder makes this exclusive.
+ *
+ * @dax_dev: dev to be prepared for fs-dax usage
+ * @holder: filesystem or mapped device inside the dax_device
+ * @hops: operations for the inner holder
+ *
+ * Returns: 0 on success, <0 on failure
+ */
+int fs_dax_get(struct dax_device *dax_dev, void *holder,
+	const struct dax_holder_operations *hops)
+{
+	if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
+		return -ENODEV;
+
+	if (cmpxchg(&dax_dev->holder_data, NULL, holder))
+		return -EBUSY;
+
+	dax_dev->holder_ops = hops;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fs_dax_get);
+#endif /* DEV_DAX_IOMAP */
+
 enum dax_device_flags {
 	/* !alive + rcu grace period == no new operations / mappings */
 	DAXDEV_ALIVE,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9d3e3327af4c..4a86716f932a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -57,6 +57,11 @@ struct dax_holder_operations {
 
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
+
+#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP)
+int fs_dax_get(struct dax_device *dax_dev, void *holder, const struct dax_holder_operations *hops);
+struct dax_device *inode_dax(struct inode *inode);
+#endif
 void *dax_holder(struct dax_device *dax_dev);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
-- 
2.43.0



* [RFC PATCH v2 04/12] dev_dax_iomap: Save the kva from memremap
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (2 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 03/12] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 05/12] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax John Groves
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

Save the kva from memremap because we need it for iomap rw support.

Prior to famfs, there were no iomap users of /dev/dax - so the virtual
address from memremap was not needed.

Also: in some cases dev_dax_probe() is called with the first
dev_dax->range offset past the start of pgmap[0].range. In those cases
we need to add the difference to virt_addr in order to have the physaddr's
in dev_dax->ranges match dev_dax->virt_addr.

This happens with devdax devices that started as pmem and got converted
to devdax. I'm not sure whether the offset is due to label storage, or
page tables, but this works in all known cases.
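
The offset adjustment can be modelled in user space as plain address
arithmetic. The function names below are hypothetical; the sketch assumes
only what is stated above, namely that the saved virtual address should
correspond to the physical start of the first dev_dax range.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of the data range from the start of the pagemap range.
 * As in the patch, fall back to 0 if the pagemap starts after the data
 * (the patch also warns in that case; the sketch does not).
 */
uint64_t sketch_data_offset(uint64_t pgmap_phys, uint64_t data_phys)
{
	if (pgmap_phys > data_phys)
		return 0;
	return data_phys - pgmap_phys;
}

/*
 * Translate a physical address inside the data range to a virtual
 * address, given virt_addr = (memremap address) + data offset.
 */
void *sketch_phys_to_kva(void *virt_addr, uint64_t data_phys, uint64_t phys)
{
	assert(phys >= data_phys);
	return (char *)virt_addr + (phys - data_phys);
}
```

For example, if the pagemap range starts at 0x100000000 and the first
data range starts at 0x100200000, the data offset is 0x200000, and
virt_addr points 0x200000 bytes past the address returned by memremap.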

Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/dax-private.h |  1 +
 drivers/dax/device.c      | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 446617b73aea..df5b3d975df4 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -63,6 +63,7 @@ struct dax_mapping {
 struct dev_dax {
 	struct dax_region *region;
 	struct dax_device *dax_dev;
+	void *virt_addr;
 	unsigned int align;
 	int target_node;
 	bool dyn_id;
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 40ba660013cf..17323b5f6f57 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -372,6 +372,7 @@ static int dev_dax_probe(struct dev_dax *dev_dax)
 	struct dax_device *dax_dev = dev_dax->dax_dev;
 	struct device *dev = &dev_dax->dev;
 	struct dev_pagemap *pgmap;
+	u64 data_offset = 0;
 	struct inode *inode;
 	struct cdev *cdev;
 	void *addr;
@@ -426,6 +427,20 @@ static int dev_dax_probe(struct dev_dax *dev_dax)
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
 
+	/* Detect whether the data is at a non-zero offset into the memory */
+	if (pgmap->range.start != dev_dax->ranges[0].range.start) {
+		u64 phys = dev_dax->ranges[0].range.start;
+		u64 pgmap_phys = dev_dax->pgmap[0].range.start;
+		u64 vmemmap_shift = dev_dax->pgmap[0].vmemmap_shift;
+
+		if (!WARN_ON(pgmap_phys > phys))
+			data_offset = phys - pgmap_phys;
+
+		pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx shift=%llx\n",
+		       __func__, phys, pgmap_phys, data_offset, vmemmap_shift);
+	}
+	dev_dax->virt_addr = addr + data_offset;
+
 	inode = dax_inode(dax_dev);
 	cdev = inode->i_cdev;
 	cdev_init(cdev, &dax_fops);
-- 
2.43.0



* [RFC PATCH v2 05/12] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (3 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 04/12] dev_dax_iomap: Save the kva from memremap John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 06/12] dev_dax_iomap: export dax_dev_get() John Groves
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

Notes about this commit:

* These methods are based on pmem_dax_ops from drivers/nvdimm/pmem.c

* dev_dax_direct_access() returns the hpa (host physical address), pfn and
  kva (kernel virtual address). The kva is the one newly stored as
  dev_dax->virt_addr by dev_dax_probe().

* The hpa/pfn are used for mmap (dax_iomap_fault()), and the kva is used
  for read/write (dax_iomap_rw())

* dev_dax_recovery_write() and dev_dax_zero_page_range() have not been
  tested yet. I'm looking for suggestions as to how to test those.

Signed-off-by: John Groves <john@groves.net>
---
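Illustrative note (not part of the patch): the return value of
__dev_dax_direct_access() is the number of pages valid at the requested
offset, clamped to the end of the device. A minimal user-space sketch of
that arithmetic follows. The helper valid_pages() and the 4 KiB page size
are assumptions, and the sketch assumes the offset lies within the device.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Model of PHYS_PFN(min(size, dax_size - offset)).
 * dax_size is the device size in bytes; pgoff and nr_pages are in pages.
 * The caller must keep the byte offset <= dax_size.
 */
static long valid_pages(size_t dax_size, unsigned long pgoff, long nr_pages)
{
	size_t size = (size_t)nr_pages << SKETCH_PAGE_SHIFT;
	size_t offset = (size_t)pgoff << SKETCH_PAGE_SHIFT;
	size_t avail = dax_size - offset;

	/* Clamp the request to what remains, then convert back to pages. */
	return (long)((size < avail ? size : avail) >> SKETCH_PAGE_SHIFT);
}
```

For a 16-page device, a 4-page request at page 14 yields 2 valid pages.
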
 drivers/dax/bus.c | 120 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 115 insertions(+), 5 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index f894272beab8..9c57d4139b74 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -7,6 +7,10 @@
 #include <linux/slab.h>
 #include <linux/dax.h>
 #include <linux/io.h>
+#include <linux/backing-dev.h>
+#include <linux/pfn_t.h>
+#include <linux/range.h>
+#include <linux/uio.h>
 #include "dax-private.h"
 #include "bus.h"
 
@@ -1471,6 +1475,105 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 }
 EXPORT_SYMBOL_GPL(dax_pgoff_to_phys);
 
+#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP)
+
+static void write_dax(void *pmem_addr, struct page *page,
+		unsigned int off, unsigned int len)
+{
+	unsigned int chunk;
+	void *mem;
+
+	while (len) {
+		mem = kmap_local_page(page);
+		chunk = min_t(unsigned int, len, PAGE_SIZE - off);
+		memcpy_flushcache(pmem_addr, mem + off, chunk);
+		kunmap_local(mem);
+		len -= chunk;
+		off = 0;
+		page++;
+		pmem_addr += chunk;
+	}
+}
+
+static long __dev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
+			long nr_pages, enum dax_access_mode mode, void **kaddr,
+			pfn_t *pfn)
+{
+	struct dev_dax *dev_dax = dax_get_private(dax_dev);
+	size_t size = nr_pages << PAGE_SHIFT;
+	size_t offset = pgoff << PAGE_SHIFT;
+	void *virt_addr = dev_dax->virt_addr + offset;
+	u64 flags = PFN_DEV|PFN_MAP;
+	phys_addr_t phys;
+	pfn_t local_pfn;
+	size_t dax_size;
+
+	WARN_ON(!dev_dax->virt_addr);
+
+	if (down_read_interruptible(&dax_dev_rwsem))
+		return 0; /* no valid data since we were killed */
+	dax_size = dev_dax_size(dev_dax);
+	up_read(&dax_dev_rwsem);
+
+	phys = dax_pgoff_to_phys(dev_dax, pgoff, nr_pages << PAGE_SHIFT);
+
+	if (kaddr)
+		*kaddr = virt_addr;
+
+	local_pfn = phys_to_pfn_t(phys, flags); /* are flags correct? */
+	if (pfn)
+		*pfn = local_pfn;
+
+	/* This is the valid size at the specified address */
+	return PHYS_PFN(min_t(size_t, size, dax_size - offset));
+}
+
+static int dev_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
+				    size_t nr_pages)
+{
+	long resid = nr_pages << PAGE_SHIFT;
+	long offset = pgoff << PAGE_SHIFT;
+
+	/* Break into one write per dax region */
+	while (resid > 0) {
+		void *kaddr;
+		pgoff_t poff = offset >> PAGE_SHIFT;
+		long len = __dev_dax_direct_access(dax_dev, poff,
+						   nr_pages, DAX_ACCESS, &kaddr, NULL);
+		len = min_t(long, len, PAGE_SIZE);
+		write_dax(kaddr, ZERO_PAGE(0), offset, len);
+
+		offset += len;
+		resid  -= len;
+	}
+	return 0;
+}
+
+static long dev_dax_direct_access(struct dax_device *dax_dev,
+		pgoff_t pgoff, long nr_pages, enum dax_access_mode mode,
+		void **kaddr, pfn_t *pfn)
+{
+	return __dev_dax_direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn);
+}
+
+static size_t dev_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	size_t off;
+
+	off = offset_in_page(addr);
+
+	return _copy_from_iter_flushcache(addr, bytes, i);
+}
+
+static const struct dax_operations dev_dax_ops = {
+	.direct_access = dev_dax_direct_access,
+	.zero_page_range = dev_dax_zero_page_range,
+	.recovery_write = dev_dax_recovery_write,
+};
+
+#endif /* IS_ENABLED(CONFIG_DEV_DAX_IOMAP) */
+
 static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 {
 	struct dax_region *dax_region = data->dax_region;
@@ -1526,11 +1629,18 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 		}
 	}
 
-	/*
-	 * No dax_operations since there is no access to this device outside of
-	 * mmap of the resulting character device.
-	 */
-	dax_dev = alloc_dax(dev_dax, NULL);
+	if (IS_ENABLED(CONFIG_DEV_DAX_IOMAP))
+		/* holder_ops currently populated separately in a slightly
+		 * hacky way
+		 */
+		dax_dev = alloc_dax(dev_dax, &dev_dax_ops);
+	else
+		/*
+		 * No dax_operations since there is no access to this device
+		 * outside of mmap of the resulting character device.
+		 */
+		dax_dev = alloc_dax(dev_dax, NULL);
+
 	if (IS_ERR(dax_dev)) {
 		rc = PTR_ERR(dax_dev);
 		goto err_alloc_dax;
-- 
2.43.0



* [RFC PATCH v2 06/12] dev_dax_iomap: export dax_dev_get()
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (4 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 05/12] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super() John Groves
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

famfs needs access to dax_dev_get().

Signed-off-by: John Groves <john@groves.net>
---
 drivers/dax/super.c | 3 ++-
 include/linux/dax.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 4b55f79849b0..8475093ba973 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -452,7 +452,7 @@ static int dax_set(struct inode *inode, void *data)
 	return 0;
 }
 
-static struct dax_device *dax_dev_get(dev_t devt)
+struct dax_device *dax_dev_get(dev_t devt)
 {
 	struct dax_device *dax_dev;
 	struct inode *inode;
@@ -475,6 +475,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
 
 	return dax_dev;
 }
+EXPORT_SYMBOL_GPL(dax_dev_get);
 
 struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
 {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 4a86716f932a..29d3dd6452c3 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -61,6 +61,7 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
 #if IS_ENABLED(CONFIG_DEV_DAX_IOMAP)
 int fs_dax_get(struct dax_device *dax_dev, void *holder, const struct dax_holder_operations *hops);
 struct dax_device *inode_dax(struct inode *inode);
+struct dax_device *dax_dev_get(dev_t devt);
 #endif
 void *dax_holder(struct dax_device *dax_dev);
 void put_dax(struct dax_device *dax_dev);
-- 
2.43.0



* [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super()
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (5 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 06/12] dev_dax_iomap: export dax_dev_get() John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-05-02 18:17   ` Al Viro
  2024-04-29 17:04 ` [RFC PATCH v2 08/12] famfs: module operations & fs_context John Groves
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

Famfs needs a kill_super variant that differs slightly from the existing
ones. Keeping it local to famfs would require exporting d_genocide();
adding the helper to fs/super.c seemed a bit cleaner.

Signed-off-by: John Groves <john@groves.net>
---
 fs/super.c         | 9 +++++++++
 include/linux/fs.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 69ce6c600968..cd276d30b522 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1236,6 +1236,15 @@ void kill_litter_super(struct super_block *sb)
 }
 EXPORT_SYMBOL(kill_litter_super);
 
+void kill_char_super(struct super_block *sb)
+{
+	if (sb->s_root)
+		d_genocide(sb->s_root);
+	generic_shutdown_super(sb);
+	kill_super_notify(sb);
+}
+EXPORT_SYMBOL(kill_char_super);
+
 int set_anon_super_fc(struct super_block *sb, struct fs_context *fc)
 {
 	return set_anon_super(sb, NULL);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8dfd53b52744..cc586f30397d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2511,6 +2511,7 @@ void generic_shutdown_super(struct super_block *sb);
 void kill_block_super(struct super_block *sb);
 void kill_anon_super(struct super_block *sb);
 void kill_litter_super(struct super_block *sb);
+void kill_char_super(struct super_block *sb);
 void deactivate_super(struct super_block *sb);
 void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
-- 
2.43.0



* [RFC PATCH v2 08/12] famfs: module operations & fs_context
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (6 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super() John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-30 11:01   ` Christian Brauner
  2024-05-02 18:23   ` Al Viro
  2024-04-29 17:04 ` [RFC PATCH v2 09/12] famfs: Introduce inode_operations and super_operations John Groves
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

Start building up from the famfs module operations. This commit
includes the following:

* Register as a file system
* Parse mount parameters
* Allocate or find (and initialize) a superblock via famfs_get_tree()
* Look up the host dax device, and bail if it's in use (or not dax)
* Register as the holder of the dax device if it's available
* Add Kconfig and Makefile misc to build famfs
* Add FAMFS_SUPER_MAGIC to include/uapi/linux/magic.h
* Add export of fs/namei.c:may_open_dev(), which famfs needs to call
* Update MAINTAINERS file for the fs/famfs/ path

The following exports had to happen to enable famfs:

* This uses the new fs/super.c:kill_char_super() - the other kill*super
  helpers were not quite right.
* This uses the dev_dax_iomap export of dax_dev_get()

This commit builds but is otherwise too incomplete to run.

Signed-off-by: John Groves <john@groves.net>
---
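Illustrative note (not part of the patch): the "mode" mount option is
parsed as an octal value and masked with S_IALLUGO before being stored.
A minimal user-space sketch of that step follows. parse_mode_opt() is a
hypothetical helper, strtoul() stands in for the kernel's fs_parse(), and
SKETCH_S_IALLUGO mirrors the kernel's S_IALLUGO value (07777).

```c
#include <assert.h>
#include <stdlib.h>

/* Permission bits plus setuid, setgid and sticky: same value as S_IALLUGO */
#define SKETCH_S_IALLUGO 07777

/* Parse an octal mode string and drop any file-type bits. */
static unsigned int parse_mode_opt(const char *val)
{
	return (unsigned int)strtoul(val, NULL, 8) & SKETCH_S_IALLUGO;
}
```

With this masking, a value such as "0100644" is reduced to 0644, so
file-type bits cannot leak into the root directory mode.
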
 MAINTAINERS                |   1 +
 fs/Kconfig                 |   2 +
 fs/Makefile                |   1 +
 fs/famfs/Kconfig           |  10 ++
 fs/famfs/Makefile          |   5 +
 fs/famfs/famfs_inode.c     | 345 +++++++++++++++++++++++++++++++++++++
 fs/famfs/famfs_internal.h  |  36 ++++
 fs/namei.c                 |   1 +
 include/uapi/linux/magic.h |   1 +
 9 files changed, 402 insertions(+)
 create mode 100644 fs/famfs/Kconfig
 create mode 100644 fs/famfs/Makefile
 create mode 100644 fs/famfs/famfs_inode.c
 create mode 100644 fs/famfs/famfs_internal.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3f2d847dcf01..365d678e2f40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8188,6 +8188,7 @@ L:	linux-cxl@vger.kernel.org
 L:	linux-fsdevel@vger.kernel.org
 S:	Supported
 F:	Documentation/filesystems/famfs.rst
+F:	fs/famfs
 
 FANOTIFY
 M:	Jan Kara <jack@suse.cz>
diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..53b4629e92a0 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -140,6 +140,8 @@ source "fs/autofs/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
 
+source "fs/famfs/Kconfig"
+
 menu "Caches"
 
 source "fs/netfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index 6ecc9b0a53f2..3393f399a9e9 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
+obj-$(CONFIG_FAMFS)             += famfs/
diff --git a/fs/famfs/Kconfig b/fs/famfs/Kconfig
new file mode 100644
index 000000000000..edb8980820f7
--- /dev/null
+++ b/fs/famfs/Kconfig
@@ -0,0 +1,10 @@
+
+
+config FAMFS
+       tristate "famfs: shared memory file system"
+       depends on DEV_DAX && FS_DAX && DEV_DAX_IOMAP
+       help
+	  Support for the famfs file system. Famfs is a dax file system that
+	  can support scale-out shared access to fabric-attached memory
+	  (e.g. CXL shared memory). Famfs is not a general purpose file system;
+	  it is an enabler for data sets in shared memory.
diff --git a/fs/famfs/Makefile b/fs/famfs/Makefile
new file mode 100644
index 000000000000..62230bcd6793
--- /dev/null
+++ b/fs/famfs/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_FAMFS) += famfs.o
+
+famfs-y := famfs_inode.o
diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c
new file mode 100644
index 000000000000..61306240fc0b
--- /dev/null
+++ b/fs/famfs/famfs_inode.c
@@ -0,0 +1,345 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * famfs - dax file system for shared fabric-attached memory
+ *
+ * Copyright 2023-2024 Micron Technology, Inc.
+ *
+ * This file system, originally based on ramfs plus the dax support from xfs,
+ * is intended to allow multiple host systems to mount a common file system
+ * view of dax files that map to shared memory.
+ */
+
+#include <linux/fs.h>
+#include <linux/time.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/parser.h>
+#include <linux/magic.h>
+#include <linux/slab.h>
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
+#include <linux/dax.h>
+#include <linux/hugetlb.h>
+#include <linux/iomap.h>
+#include <linux/path.h>
+#include <linux/namei.h>
+
+#include "famfs_internal.h"
+
+#define FAMFS_DEFAULT_MODE	0755
+
+static struct inode *famfs_get_inode(struct super_block *sb,
+				     const struct inode *dir,
+				     umode_t mode, dev_t dev)
+{
+	struct inode *inode = new_inode(sb);
+	struct timespec64 tv;
+
+	if (!inode)
+		return NULL;
+
+	inode->i_ino = get_next_ino();
+	inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
+	inode->i_mapping->a_ops = &ram_aops;
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_unevictable(inode->i_mapping);
+	tv = inode_set_ctime_current(inode);
+	inode_set_mtime_to_ts(inode, tv);
+	inode_set_atime_to_ts(inode, tv);
+
+	switch (mode & S_IFMT) {
+	default:
+		init_special_inode(inode, mode, dev);
+		break;
+	case S_IFREG:
+		inode->i_op = NULL /* famfs_file_inode_operations */;
+		inode->i_fop = NULL /* &famfs_file_operations */;
+		break;
+	case S_IFDIR:
+		inode->i_op = NULL /* famfs_dir_inode_operations */;
+		inode->i_fop = &simple_dir_operations;
+
+		/* Directory inodes start off with i_nlink == 2 (for ".") */
+		inc_nlink(inode);
+		break;
+	case S_IFLNK:
+		inode->i_op = &page_symlink_inode_operations;
+		inode_nohighmem(inode);
+		break;
+	}
+	return inode;
+}
+
+/*
+ * famfs dax_operations  (for char dax)
+ */
+static int
+famfs_dax_notify_failure(struct dax_device *dax_dev, u64 offset,
+			u64 len, int mf_flags)
+{
+	struct super_block *sb = dax_holder(dax_dev);
+	struct famfs_fs_info *fsi = sb->s_fs_info;
+
+	pr_err("%s: rootdev=%s offset=%lld len=%llu flags=%x\n", __func__,
+	       fsi->rootdev, offset, len, mf_flags);
+
+	return 0;
+}
+
+static const struct dax_holder_operations famfs_dax_holder_ops = {
+	.notify_failure		= famfs_dax_notify_failure,
+};
+
+/*****************************************************************************
+ * fs_context_operations
+ */
+
+static int
+famfs_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+	int rc = 0;
+
+	sb->s_maxbytes		= MAX_LFS_FILESIZE;
+	sb->s_blocksize		= PAGE_SIZE;
+	sb->s_blocksize_bits	= PAGE_SHIFT;
+	sb->s_magic		= FAMFS_SUPER_MAGIC;
+	sb->s_op		= NULL /* famfs_super_ops */;
+	sb->s_time_gran		= 1;
+
+	return rc;
+}
+
+static int
+lookup_daxdev(const char *pathname, dev_t *devno)
+{
+	struct inode *inode;
+	struct path path;
+	int err;
+
+	if (!pathname || !*pathname)
+		return -EINVAL;
+
+	err = kern_path(pathname, LOOKUP_FOLLOW, &path);
+	if (err)
+		return err;
+
+	inode = d_backing_inode(path.dentry);
+	if (!S_ISCHR(inode->i_mode)) {
+		err = -EINVAL;
+		goto out_path_put;
+	}
+
+	if (!may_open_dev(&path)) { /* had to export this */
+		err = -EACCES;
+		goto out_path_put;
+	}
+
+	/* if it's dax, i_rdev is the dev_t of the dax device */
+	*devno = inode->i_rdev;
+
+out_path_put:
+	path_put(&path);
+	return err;
+}
+
+static int
+famfs_get_tree(struct fs_context *fc)
+{
+	struct famfs_fs_info *fsi = fc->s_fs_info;
+	struct dax_device *dax_devp;
+	struct super_block *sb;
+	struct inode *inode;
+	dev_t daxdevno;
+	int err;
+
+	/* TODO: clean up chatty messages */
+
+	err = lookup_daxdev(fc->source, &daxdevno);
+	if (err)
+		return err;
+
+	fsi->daxdevno = daxdevno;
+
+	/* This will set sb->s_dev=daxdevno */
+	sb = sget_dev(fc, daxdevno);
+	if (IS_ERR(sb)) {
+		pr_err("%s: sget_dev error\n", __func__);
+		return PTR_ERR(sb);
+	}
+
+	if (sb->s_root) {
+		pr_info("%s: found a matching superblock for %s\n",
+			__func__, fc->source);
+
+		/* We don't expect to find a match by dev_t; if we do, it must
+		 * already be mounted, so we bail
+		 */
+		err = -EBUSY;
+		goto deactivate_out;
+	} else {
+		pr_info("%s: initializing new superblock for %s\n",
+			__func__, fc->source);
+		err = famfs_fill_super(sb, fc);
+		if (err)
+			goto deactivate_out;
+	}
+
+	/* This will fail if it's not a dax device */
+	dax_devp = dax_dev_get(daxdevno);
+	if (!dax_devp) {
+		pr_warn("%s: device %s not found or not dax\n",
+		       __func__, fc->source);
+		err = -ENODEV;
+		goto deactivate_out;
+	}
+
+	err = fs_dax_get(dax_devp, sb, &famfs_dax_holder_ops);
+	if (err) {
+		pr_err("%s: fs_dax_get(%lld) failed\n", __func__, (u64)daxdevno);
+		err = -EBUSY;
+		goto deactivate_out;
+	}
+	fsi->dax_devp = dax_devp;
+
+	inode = famfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
+	sb->s_root = d_make_root(inode);
+	if (!sb->s_root) {
+		pr_err("%s: d_make_root() failed\n", __func__);
+		err = -ENOMEM;
+		fs_put_dax(fsi->dax_devp, sb);
+		goto deactivate_out;
+	}
+
+	sb->s_flags |= SB_ACTIVE;
+
+	WARN_ON(fc->root);
+	fc->root = dget(sb->s_root);
+	return err;
+
+deactivate_out:
+	pr_debug("%s: deactivating sb=%llx\n", __func__, (u64)sb);
+	deactivate_locked_super(sb);
+	return err;
+}
+
+/*****************************************************************************/
+
+enum famfs_param {
+	Opt_mode,
+	Opt_dax,
+};
+
+const struct fs_parameter_spec famfs_fs_parameters[] = {
+	fsparam_u32oct("mode",	  Opt_mode),
+	fsparam_string("dax",     Opt_dax),
+	{}
+};
+
+static int famfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+	struct famfs_fs_info *fsi = fc->s_fs_info;
+	struct fs_parse_result result;
+	int opt;
+
+	opt = fs_parse(fc, famfs_fs_parameters, param, &result);
+	if (opt == -ENOPARAM) {
+		opt = vfs_parse_fs_param_source(fc, param);
+		if (opt != -ENOPARAM)
+			return opt;
+
+		return 0;
+	}
+	if (opt < 0)
+		return opt;
+
+	switch (opt) {
+	case Opt_mode:
+		fsi->mount_opts.mode = result.uint_32 & S_IALLUGO;
+		break;
+	case Opt_dax:
+		if (strcmp(param->string, "always"))
+			pr_notice("%s: invalid dax mode %s\n",
+				  __func__, param->string);
+		break;
+	}
+
+	return 0;
+}
+
+static void famfs_free_fc(struct fs_context *fc)
+{
+	struct famfs_fs_info *fsi = fc->s_fs_info;
+
+	if (fsi && fsi->rootdev)
+		kfree(fsi->rootdev);
+
+	kfree(fsi);
+}
+
+static const struct fs_context_operations famfs_context_ops = {
+	.free		= famfs_free_fc,
+	.parse_param	= famfs_parse_param,
+	.get_tree	= famfs_get_tree,
+};
+
+static int famfs_init_fs_context(struct fs_context *fc)
+{
+	struct famfs_fs_info *fsi;
+
+	fsi = kzalloc(sizeof(*fsi), GFP_KERNEL);
+	if (!fsi)
+		return -ENOMEM;
+
+	fsi->mount_opts.mode = FAMFS_DEFAULT_MODE;
+	fc->s_fs_info        = fsi;
+	fc->ops              = &famfs_context_ops;
+	return 0;
+}
+
+static void famfs_kill_sb(struct super_block *sb)
+{
+	struct famfs_fs_info *fsi = sb->s_fs_info;
+
+	if (fsi->dax_devp)
+		fs_put_dax(fsi->dax_devp, sb);
+	if (fsi && fsi->rootdev)
+		kfree(fsi->rootdev);
+	kfree(fsi);
+	sb->s_fs_info = NULL;
+
+	kill_char_super(sb); /* new */
+}
+
+#define MODULE_NAME "famfs"
+static struct file_system_type famfs_fs_type = {
+	.name		  = MODULE_NAME,
+	.init_fs_context  = famfs_init_fs_context,
+	.parameters	  = famfs_fs_parameters,
+	.kill_sb	  = famfs_kill_sb,
+	.fs_flags	  = FS_USERNS_MOUNT,
+};
+
+/******************************************************************************
+ * Module stuff
+ */
+static int __init init_famfs_fs(void)
+{
+	int rc;
+
+	rc = register_filesystem(&famfs_fs_type);
+
+	return rc;
+}
+
+static void
+__exit famfs_exit(void)
+{
+	unregister_filesystem(&famfs_fs_type);
+	pr_info("%s: unregistered\n", __func__);
+}
+
+fs_initcall(init_famfs_fs);
+module_exit(famfs_exit);
+
+MODULE_AUTHOR("John Groves, Micron Technology");
+MODULE_LICENSE("GPL");
diff --git a/fs/famfs/famfs_internal.h b/fs/famfs/famfs_internal.h
new file mode 100644
index 000000000000..951b32ec4fbd
--- /dev/null
+++ b/fs/famfs/famfs_internal.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * famfs - dax file system for shared fabric-attached memory
+ *
+ * Copyright 2023-2024 Micron Technology, Inc.
+ *
+ * This file system, originally based on ramfs plus the dax support from xfs,
+ * is intended to allow multiple host systems to mount a common file system
+ * view of dax files that map to shared memory.
+ */
+#ifndef FAMFS_INTERNAL_H
+#define FAMFS_INTERNAL_H
+
+struct famfs_mount_opts {
+	umode_t mode;
+};
+
+/**
+ * @famfs_fs_info
+ *
+ * @mount_opts: the mount options
+ * @dax_devp:   The underlying character devdax device
+ * @rootdev:    Dax device path used in mount
+ * @daxdevno:   Dax device dev_t
+ * @deverror:   True if the dax device has called our notify_failure entry
+ *              point, or if other "shutdown" conditions exist
+ */
+struct famfs_fs_info {
+	struct famfs_mount_opts  mount_opts;
+	struct dax_device       *dax_devp;
+	char                    *rootdev;
+	dev_t                    daxdevno;
+	bool                     deverror;
+};
+
+#endif /* FAMFS_INTERNAL_H */
diff --git a/fs/namei.c b/fs/namei.c
index c5b2a25be7d0..f24b268473cd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3229,6 +3229,7 @@ bool may_open_dev(const struct path *path)
 	return !(path->mnt->mnt_flags & MNT_NODEV) &&
 		!(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
 }
+EXPORT_SYMBOL(may_open_dev);
 
 static int may_open(struct mnt_idmap *idmap, const struct path *path,
 		    int acc_mode, int flag)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 1b40a968ba91..e9bdd6a415e2 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -37,6 +37,7 @@
 #define HOSTFS_SUPER_MAGIC	0x00c0ffee
 #define OVERLAYFS_SUPER_MAGIC	0x794c7630
 #define FUSE_SUPER_MAGIC	0x65735546
+#define FAMFS_SUPER_MAGIC	0x87b282ff
 
 #define MINIX_SUPER_MAGIC	0x137F		/* minix v1 fs, 14 char names */
 #define MINIX_SUPER_MAGIC2	0x138F		/* minix v1 fs, 30 char names */
-- 
2.43.0



* [RFC PATCH v2 09/12] famfs: Introduce inode_operations and super_operations
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (7 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 08/12] famfs: module operations & fs_context John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write John Groves
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)

The famfs inode and super operations are pretty much generic.

This commit builds but is still too incomplete to run.

Signed-off-by: John Groves <john@groves.net>
---
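Illustrative note (not part of the patch): famfs_show_options() prints the
mode option only when it differs from the default. A minimal user-space
sketch of that behavior follows. show_mode_opt() is a hypothetical helper,
snprintf() stands in for seq_printf(), and SKETCH_DEFAULT_MODE mirrors
FAMFS_DEFAULT_MODE.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_DEFAULT_MODE 0755	/* mirrors FAMFS_DEFAULT_MODE */

/*
 * Write ",mode=<octal>" into buf only for a non-default mode.
 * Returns the number of characters written; len must be at least 1.
 */
static int show_mode_opt(char *buf, size_t len, unsigned int mode)
{
	buf[0] = '\0';
	if (mode == SKETCH_DEFAULT_MODE)
		return 0;

	return snprintf(buf, len, ",mode=%o", mode);
}
```

A mount with the default mode therefore shows no mode entry in
/proc/mounts, while mode=0700 would appear as ",mode=700".
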
 fs/famfs/famfs_inode.c | 113 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 110 insertions(+), 3 deletions(-)

diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c
index 61306240fc0b..e00e9cdecadf 100644
--- a/fs/famfs/famfs_inode.c
+++ b/fs/famfs/famfs_inode.c
@@ -28,6 +28,9 @@
 
 #define FAMFS_DEFAULT_MODE	0755
 
+static const struct inode_operations famfs_file_inode_operations;
+static const struct inode_operations famfs_dir_inode_operations;
+
 static struct inode *famfs_get_inode(struct super_block *sb,
 				     const struct inode *dir,
 				     umode_t mode, dev_t dev)
@@ -52,11 +55,11 @@ static struct inode *famfs_get_inode(struct super_block *sb,
 		init_special_inode(inode, mode, dev);
 		break;
 	case S_IFREG:
-		inode->i_op = NULL /* famfs_file_inode_operations */;
+		inode->i_op = &famfs_file_inode_operations;
 		inode->i_fop = NULL /* &famfs_file_operations */;
 		break;
 	case S_IFDIR:
-		inode->i_op = NULL /* famfs_dir_inode_operations */;
+		inode->i_op = &famfs_dir_inode_operations;
 		inode->i_fop = &simple_dir_operations;
 
 		/* Directory inodes start off with i_nlink == 2 (for ".") */
@@ -70,6 +73,110 @@ static struct inode *famfs_get_inode(struct super_block *sb,
 	return inode;
 }
 
+/***************************************************************************
+ * famfs inode_operations: these are currently pretty much boilerplate
+ */
+
+static const struct inode_operations famfs_file_inode_operations = {
+	/* All generic */
+	.setattr	   = simple_setattr,
+	.getattr	   = simple_getattr,
+};
+
+/*
+ * File creation. Allocate an inode, and we're done..
+ */
+static int
+famfs_mknod(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry,
+	    umode_t mode, dev_t dev)
+{
+	struct famfs_fs_info *fsi = dir->i_sb->s_fs_info;
+	struct timespec64 tv;
+	struct inode *inode;
+
+	if (fsi->deverror)
+		return -ENODEV;
+
+	inode = famfs_get_inode(dir->i_sb, dir, mode, dev);
+	if (!inode)
+		return -ENOSPC;
+
+	d_instantiate(dentry, inode);
+	dget(dentry);	/* Extra count - pin the dentry in core */
+	tv = inode_set_ctime_current(inode);
+	inode_set_mtime_to_ts(inode, tv);
+	inode_set_atime_to_ts(inode, tv);
+
+	return 0;
+}
+
+static int famfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
+{
+	struct famfs_fs_info *fsi = dir->i_sb->s_fs_info;
+	int rc;
+
+	if (fsi->deverror)
+		return -ENODEV;
+
+	rc = famfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFDIR, 0);
+	if (rc)
+		return rc;
+
+	inc_nlink(dir);
+
+	return 0;
+}
+
+static int famfs_create(struct mnt_idmap *idmap, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
+{
+	struct famfs_fs_info *fsi = dir->i_sb->s_fs_info;
+
+	if (fsi->deverror)
+		return -ENODEV;
+
+	return famfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFREG, 0);
+}
+
+static const struct inode_operations famfs_dir_inode_operations = {
+	.create		= famfs_create,
+	.lookup		= simple_lookup,
+	.link		= simple_link,
+	.unlink		= simple_unlink,
+	.mkdir		= famfs_mkdir,
+	.rmdir		= simple_rmdir,
+	.rename		= simple_rename,
+};
+
+/*****************************************************************************
+ * famfs super_operations
+ *
+ * TODO: implement a famfs_statfs() that shows size, free and available space,
+ * etc.
+ */
+
+/*
+ * famfs_show_options() - Display the mount options in /proc/mounts.
+ */
+static int famfs_show_options(struct seq_file *m, struct dentry *root)
+{
+	struct famfs_fs_info *fsi = root->d_sb->s_fs_info;
+
+	if (fsi->mount_opts.mode != FAMFS_DEFAULT_MODE)
+		seq_printf(m, ",mode=%o", fsi->mount_opts.mode);
+
+	return 0;
+}
+
+static const struct super_operations famfs_super_ops = {
+	.statfs		= simple_statfs,
+	.drop_inode	= generic_delete_inode,
+	.show_options	= famfs_show_options,
+};
+
+/*****************************************************************************/
+
 /*
  * famfs dax_operations  (for char dax)
  */
@@ -103,7 +210,7 @@ famfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_blocksize		= PAGE_SIZE;
 	sb->s_blocksize_bits	= PAGE_SHIFT;
 	sb->s_magic		= FAMFS_SUPER_MAGIC;
-	sb->s_op		= NULL /* famfs_super_ops */;
+	sb->s_op		= &famfs_super_ops;
 	sb->s_time_gran		= 1;
 
 	return rc;
-- 
2.43.0



* [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (8 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 09/12] famfs: Introduce inode_operations and super_operations John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-05-02 18:29   ` Al Viro
  2024-04-29 17:04 ` [RFC PATCH v2 11/12] famfs: Introduce mmap and VM fault handling John Groves
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)
  To: John Groves, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, Matthew Wilcox, linux-cxl, linux-fsdevel, nvdimm
  Cc: John Groves, john, Dave Chinner, Christoph Hellwig, dave.hansen,
	gregory.price, Randy Dunlap, Jerome Glisse, Aravind Ramesh,
	Ajay Joshi, Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang, John Groves

This commit introduces fs/famfs/famfs_file.c and the famfs
file_operations for read/write.

This is not usable yet because:

* It calls dax_iomap_rw() with NULL iomap_ops (which will be
  introduced in a subsequent commit).
* famfs_ioctl() is coming in a later commit, and it is necessary
  to map a file to a memory allocation.

Signed-off-by: John Groves <john@groves.net>
---
 fs/famfs/Makefile         |   2 +-
 fs/famfs/famfs_file.c     | 122 ++++++++++++++++++++++++++++++++++++++
 fs/famfs/famfs_inode.c    |   2 +-
 fs/famfs/famfs_internal.h |   2 +
 4 files changed, 126 insertions(+), 2 deletions(-)
 create mode 100644 fs/famfs/famfs_file.c

diff --git a/fs/famfs/Makefile b/fs/famfs/Makefile
index 62230bcd6793..8cac90c090a4 100644
--- a/fs/famfs/Makefile
+++ b/fs/famfs/Makefile
@@ -2,4 +2,4 @@
 
 obj-$(CONFIG_FAMFS) += famfs.o
 
-famfs-y := famfs_inode.o
+famfs-y := famfs_inode.o famfs_file.o
diff --git a/fs/famfs/famfs_file.c b/fs/famfs/famfs_file.c
new file mode 100644
index 000000000000..48036c71d4ed
--- /dev/null
+++ b/fs/famfs/famfs_file.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * famfs - dax file system for shared fabric-attached memory
+ *
+ * Copyright 2023-2024 Micron Technology, Inc.
+ *
+ * This file system, originally based on ramfs and the dax support from xfs,
+ * is intended to allow multiple host systems to mount a common file system
+ * view of dax files that map to shared memory.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/dax.h>
+#include <linux/iomap.h>
+
+#include "famfs_internal.h"
+
+/*********************************************************************
+ * file_operations
+ */
+
+/* Reject I/O to files that aren't in a valid state */
+static ssize_t
+famfs_file_invalid(struct inode *inode)
+{
+	if (!IS_DAX(inode)) {
+		pr_debug("%s: inode %llx IS_DAX is false\n", __func__, (u64)inode);
+		return -ENXIO;
+	}
+	return 0;
+}
+
+static ssize_t
+famfs_rw_prep(struct kiocb *iocb, struct iov_iter *ubuf)
+{
+	struct inode *inode = iocb->ki_filp->f_mapping->host;
+	struct super_block *sb = inode->i_sb;
+	struct famfs_fs_info *fsi = sb->s_fs_info;
+	size_t i_size = i_size_read(inode);
+	size_t count = iov_iter_count(ubuf);
+	size_t max_count;
+	ssize_t rc;
+
+	if (fsi->deverror)
+		return -ENODEV;
+
+	rc = famfs_file_invalid(inode);
+	if (rc)
+		return rc;
+
+	max_count = max_t(size_t, 0, i_size - iocb->ki_pos);
+
+	if (count > max_count)
+		iov_iter_truncate(ubuf, max_count);
+
+	if (!iov_iter_count(ubuf))
+		return 0;
+
+	return rc;
+}
+
+static ssize_t
+famfs_dax_read_iter(struct kiocb *iocb, struct iov_iter	*to)
+{
+	ssize_t rc;
+
+	rc = famfs_rw_prep(iocb, to);
+	if (rc)
+		return rc;
+
+	if (!iov_iter_count(to))
+		return 0;
+
+	rc = dax_iomap_rw(iocb, to, NULL /*&famfs_iomap_ops */);
+
+	file_accessed(iocb->ki_filp);
+	return rc;
+}
+
+/**
+ * famfs_dax_write_iter()
+ *
+ * We need our own write-iter in order to prevent append
+ *
+ * @iocb:
+ * @from: iterator describing the user memory source for the write
+ */
+static ssize_t
+famfs_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	ssize_t rc;
+
+	rc = famfs_rw_prep(iocb, from);
+	if (rc)
+		return rc;
+
+	if (!iov_iter_count(from))
+		return 0;
+
+	return dax_iomap_rw(iocb, from, NULL /*&famfs_iomap_ops*/);
+}
+
+const struct file_operations famfs_file_operations = {
+	.owner             = THIS_MODULE,
+
+	/* Custom famfs operations */
+	.write_iter	   = famfs_dax_write_iter,
+	.read_iter	   = famfs_dax_read_iter,
+	.unlocked_ioctl    = NULL /*famfs_file_ioctl*/,
+	.mmap		   = NULL /* famfs_file_mmap */,
+
+	/* Force PMD alignment for mmap */
+	.get_unmapped_area = thp_get_unmapped_area,
+
+	/* Generic Operations */
+	.fsync		   = noop_fsync,
+	.splice_read	   = filemap_splice_read,
+	.splice_write	   = iter_file_splice_write,
+	.llseek		   = generic_file_llseek,
+};
+
diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c
index e00e9cdecadf..490a2c0fd326 100644
--- a/fs/famfs/famfs_inode.c
+++ b/fs/famfs/famfs_inode.c
@@ -56,7 +56,7 @@ static struct inode *famfs_get_inode(struct super_block *sb,
 		break;
 	case S_IFREG:
 		inode->i_op = &famfs_file_inode_operations;
-		inode->i_fop = NULL /* &famfs_file_operations */;
+		inode->i_fop = &famfs_file_operations;
 		break;
 	case S_IFDIR:
 		inode->i_op = &famfs_dir_inode_operations;
diff --git a/fs/famfs/famfs_internal.h b/fs/famfs/famfs_internal.h
index 951b32ec4fbd..36efaef425e7 100644
--- a/fs/famfs/famfs_internal.h
+++ b/fs/famfs/famfs_internal.h
@@ -11,6 +11,8 @@
 #ifndef FAMFS_INTERNAL_H
 #define FAMFS_INTERNAL_H
 
+extern const struct file_operations famfs_file_operations;
+
 struct famfs_mount_opts {
 	umode_t mode;
 };
-- 
2.43.0



* [RFC PATCH v2 11/12] famfs: Introduce mmap and VM fault handling
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (9 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 17:04 ` [RFC PATCH v2 12/12] famfs: famfs_ioctl and core file-to-memory mapping logic & iomap_ops John Groves
  2024-04-29 18:32 ` [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system Matthew Wilcox
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)
  To: John Groves, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, Matthew Wilcox, linux-cxl, linux-fsdevel, nvdimm
  Cc: John Groves, john, Dave Chinner, Christoph Hellwig, dave.hansen,
	gregory.price, Randy Dunlap, Jerome Glisse, Aravind Ramesh,
	Ajay Joshi, Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang, John Groves

This commit adds vm_operations, plus famfs_mmap() and fault handlers.
It is still missing iomap_ops, iomap mapping resolution, and
famfs_ioctl() for setting up file-to-memory mappings.

Signed-off-by: John Groves <john@groves.net>
---
 fs/famfs/famfs_file.c | 108 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/fs/famfs/famfs_file.c b/fs/famfs/famfs_file.c
index 48036c71d4ed..585b776dd73c 100644
--- a/fs/famfs/famfs_file.c
+++ b/fs/famfs/famfs_file.c
@@ -16,6 +16,88 @@
 
 #include "famfs_internal.h"
 
+/*********************************************************************
+ * vm_operations
+ */
+static vm_fault_t
+__famfs_filemap_fault(struct vm_fault *vmf, unsigned int pe_size,
+		      bool write_fault)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct super_block *sb = inode->i_sb;
+	struct famfs_fs_info *fsi = sb->s_fs_info;
+	vm_fault_t ret;
+	pfn_t pfn;
+
+	if (fsi->deverror)
+		return VM_FAULT_SIGBUS;
+
+	if (!IS_DAX(file_inode(vmf->vma->vm_file))) {
+		pr_err("%s: file not marked IS_DAX!!\n", __func__);
+		return VM_FAULT_SIGBUS;
+	}
+
+	if (write_fault) {
+		sb_start_pagefault(inode->i_sb);
+		file_update_time(vmf->vma->vm_file);
+	}
+
+	ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, NULL /*&famfs_iomap_ops */);
+	if (ret & VM_FAULT_NEEDDSYNC)
+		ret = dax_finish_sync_fault(vmf, pe_size, pfn);
+
+	if (write_fault)
+		sb_end_pagefault(inode->i_sb);
+
+	return ret;
+}
+
+static inline bool
+famfs_is_write_fault(struct vm_fault *vmf)
+{
+	return (vmf->flags & FAULT_FLAG_WRITE) &&
+	       (vmf->vma->vm_flags & VM_SHARED);
+}
+
+static vm_fault_t
+famfs_filemap_fault(struct vm_fault *vmf)
+{
+	return __famfs_filemap_fault(vmf, 0, famfs_is_write_fault(vmf));
+}
+
+static vm_fault_t
+famfs_filemap_huge_fault(struct vm_fault *vmf, unsigned int pe_size)
+{
+	return __famfs_filemap_fault(vmf, pe_size, famfs_is_write_fault(vmf));
+}
+
+static vm_fault_t
+famfs_filemap_page_mkwrite(struct vm_fault *vmf)
+{
+	return __famfs_filemap_fault(vmf, 0, true);
+}
+
+static vm_fault_t
+famfs_filemap_pfn_mkwrite(struct vm_fault *vmf)
+{
+	return __famfs_filemap_fault(vmf, 0, true);
+}
+
+static vm_fault_t
+famfs_filemap_map_pages(struct vm_fault	*vmf, pgoff_t start_pgoff,
+			pgoff_t	end_pgoff)
+{
+	return filemap_map_pages(vmf, start_pgoff, end_pgoff);
+}
+
+const struct vm_operations_struct famfs_file_vm_ops = {
+	.fault		= famfs_filemap_fault,
+	.huge_fault	= famfs_filemap_huge_fault,
+	.map_pages	= famfs_filemap_map_pages,
+	.page_mkwrite	= famfs_filemap_page_mkwrite,
+	.pfn_mkwrite	= famfs_filemap_pfn_mkwrite,
+};
+
 /*********************************************************************
  * file_operations
  */
@@ -25,7 +107,8 @@ static ssize_t
 famfs_file_invalid(struct inode *inode)
 {
 	if (!IS_DAX(inode)) {
-		pr_debug("%s: inode %llx IS_DAX is false\n", __func__, (u64)inode);
+		pr_debug("%s: inode %llx IS_DAX is false\n",
+			 __func__, (u64)inode);
 		return -ENXIO;
 	}
 	return 0;
@@ -101,6 +184,27 @@ famfs_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	return dax_iomap_rw(iocb, from, NULL /*&famfs_iomap_ops*/);
 }
 
+static int
+famfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct famfs_fs_info *fsi = sb->s_fs_info;
+	ssize_t rc;
+
+	if (fsi->deverror)
+		return -ENODEV;
+
+	rc = famfs_file_invalid(inode);
+	if (rc)
+		return (int)rc;
+
+	file_accessed(file);
+	vma->vm_ops = &famfs_file_vm_ops;
+	vm_flags_set(vma, VM_HUGEPAGE);
+	return 0;
+}
+
 const struct file_operations famfs_file_operations = {
 	.owner             = THIS_MODULE,
 
@@ -108,7 +212,7 @@ const struct file_operations famfs_file_operations = {
 	.write_iter	   = famfs_dax_write_iter,
 	.read_iter	   = famfs_dax_read_iter,
 	.unlocked_ioctl    = NULL /*famfs_file_ioctl*/,
-	.mmap		   = NULL /* famfs_file_mmap */,
+	.mmap		   = famfs_file_mmap,
 
 	/* Force PMD alignment for mmap */
 	.get_unmapped_area = thp_get_unmapped_area,
-- 
2.43.0



* [RFC PATCH v2 12/12] famfs: famfs_ioctl and core file-to-memory mapping logic & iomap_ops
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (10 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 11/12] famfs: Introduce mmap and VM fault handling John Groves
@ 2024-04-29 17:04 ` John Groves
  2024-04-29 18:32 ` [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system Matthew Wilcox
  12 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-04-29 17:04 UTC (permalink / raw)
  To: John Groves, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, Matthew Wilcox, linux-cxl, linux-fsdevel, nvdimm
  Cc: John Groves, john, Dave Chinner, Christoph Hellwig, dave.hansen,
	gregory.price, Randy Dunlap, Jerome Glisse, Aravind Ramesh,
	Ajay Joshi, Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang, John Groves

* Add uapi include file famfs_ioctl.h. The famfs user space uses ioctl on
  individual files to pass in mapping information and file size. This
  would be hard to do via sysfs or other means, since it's file-specific.
* Add the per-file ioctl function famfs_file_ioctl() into
  struct file_operations, and introduce the famfs_file_init_dax()
  function (which is called by famfs_file_ioctl()).
* Add the famfs iomap_ops. When either dax_iomap_fault() or dax_iomap_rw()
  is called, we get a callback via our iomap_begin() handler. The question
  being asked is "please resolve (file, offset) to (daxdev, offset)". The
  function famfs_meta_to_dax_offset() does this.
* Expose the famfs ABI version as
  /sys/module/famfs/parameters/famfs_kabi_version
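
The (file, offset) to (daxdev, offset) resolution described above can be
illustrated outside the kernel. The following is only a minimal user-space
sketch of the extent walk; struct sketch_extent and sketch_resolve() are
hypothetical names chosen for illustration, not part of this patch, and the
real famfs_meta_to_dax_offset() fills in a struct iomap rather than
returning a length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct famfs_extent */
struct sketch_extent {
	uint64_t offset;	/* offset within the dax device */
	uint64_t len;		/* extent length in bytes */
};

/*
 * Walk the extent list to find the extent containing file_offset.
 * On success, store the dax device offset in *dax_offset and return the
 * number of contiguous bytes available there (at most len).
 * Return 0 if file_offset lies past the end of the extent list.
 */
static uint64_t
sketch_resolve(const struct sketch_extent *ext, size_t ext_count,
	       uint64_t file_offset, uint64_t len, uint64_t *dax_offset)
{
	/* file_offset minus the size of the extents skipped so far */
	uint64_t local_offset = file_offset;
	size_t i;

	for (i = 0; i < ext_count; i++) {
		if (local_offset < ext[i].len) {
			uint64_t remainder = ext[i].len - local_offset;

			*dax_offset = ext[i].offset + local_offset;
			/* Trim the length at the end of this extent */
			return len < remainder ? len : remainder;
		}
		local_offset -= ext[i].len;
	}

	*dax_offset = 0;	/* no valid dax device offset */
	return 0;
}
```

A request that straddles two extents is trimmed to the end of the first
one; the caller is expected to ask again for the remainder, which is how
the iomap iteration handles discontiguous mappings.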

The current ioctls are:

FAMFSIOC_MAP_CREATE  - famfs_file_init_dax() associates a dax extent
                       list with a file, making it into a proper famfs
                       file. Starting with an empty file (which is not
                       useful), this turns the file into a DAX file backed
                       by the specified extent list from devdax memory.
FAMFSIOC_NOP         - A convenient way for user space to verify it's a
                       famfs file
FAMFSIOC_MAP_GET     - Get the header of the metadata for a file
FAMFSIOC_MAP_GETEXT  - Get the extents for a file

The last two, together, are comparable to xfs_bmap. Our user space tools
use them primarily in testing.
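
As a rough illustration of the sanity checks that FAMFSIOC_MAP_CREATE
applies to an extent list before accepting it, here is a simplified
user-space sketch. The names struct map_extent, MAP_PMD_SIZE and
map_extents_valid() are hypothetical, and 2MiB is assumed as the PMD size
(as on x86-64). The kernel handler in this patch additionally enforces
FAMFS_MAX_EXTENTS and rejects offset 0 for non-superblock files, which
this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2MiB PMD size, as on x86-64 */
#define MAP_PMD_SIZE (2ULL * 1024 * 1024)

/* Hypothetical user-space stand-in for struct famfs_extent */
struct map_extent {
	uint64_t offset;	/* offset within the dax device */
	uint64_t len;		/* extent length in bytes */
};

/*
 * Return true if the extent list could back a file of file_size bytes:
 * - there is at least one extent
 * - every extent offset is PMD aligned
 * - every extent length except the last is a PMD multiple
 * - file_size does not exceed the total extent length
 */
static bool
map_extents_valid(const struct map_extent *ext, size_t ext_count,
		  uint64_t file_size)
{
	uint64_t total = 0;
	size_t i;

	if (ext_count < 1)
		return false;

	for (i = 0; i < ext_count; i++) {
		if (ext[i].offset % MAP_PMD_SIZE)
			return false;
		if (i < ext_count - 1 && ext[i].len % MAP_PMD_SIZE)
			return false;
		total += ext[i].len;
	}

	return file_size <= total;
}
```

User space could run a check like this before issuing the ioctl, so that a
malformed extent list is caught before the kernel rejects it with -EINVAL.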

Signed-off-by: John Groves <john@groves.net>
---
 MAINTAINERS                      |   1 +
 fs/famfs/famfs_file.c            | 391 ++++++++++++++++++++++++++++++-
 fs/famfs/famfs_internal.h        |  14 ++
 include/uapi/linux/famfs_ioctl.h |  61 +++++
 4 files changed, 461 insertions(+), 6 deletions(-)
 create mode 100644 include/uapi/linux/famfs_ioctl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 365d678e2f40..29d81be488bc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8189,6 +8189,7 @@ L:	linux-fsdevel@vger.kernel.org
 S:	Supported
 F:	Documentation/filesystems/famfs.rst
 F:	fs/famfs
+F:	include/uapi/linux/famfs_ioctl.h
 
 FANOTIFY
 M:	Jan Kara <jack@suse.cz>
diff --git a/fs/famfs/famfs_file.c b/fs/famfs/famfs_file.c
index 585b776dd73c..ac34e606ca1b 100644
--- a/fs/famfs/famfs_file.c
+++ b/fs/famfs/famfs_file.c
@@ -14,8 +14,371 @@
 #include <linux/dax.h>
 #include <linux/iomap.h>
 
+#include <linux/famfs_ioctl.h>
 #include "famfs_internal.h"
 
+/* Expose famfs kernel abi version as a read-only module parameter */
+static int famfs_kabi_version = FAMFS_KABI_VERSION;
+module_param(famfs_kabi_version, int, 0444);
+MODULE_PARM_DESC(famfs_kabi_version, "famfs kernel abi version");
+
+/**
+ * famfs_meta_alloc() - Allocate famfs file metadata
+ * @metap:      Pointer to a famfs_file_meta pointer
+ * @ext_count:  The number of extents needed
+ */
+static int
+famfs_meta_alloc(struct famfs_file_meta **metap, size_t ext_count)
+{
+	struct famfs_file_meta *meta;
+
+	meta = kzalloc(struct_size(meta, tfs_extents, ext_count), GFP_KERNEL);
+	if (!meta)
+		return -ENOMEM;
+
+	meta->tfs_extent_ct = ext_count;
+	meta->error = false;
+	*metap = meta;
+
+	return 0;
+}
+
+static void
+famfs_meta_free(struct famfs_file_meta *map)
+{
+	kfree(map);
+}
+
+/**
+ * famfs_file_init_dax() - FAMFSIOC_MAP_CREATE ioctl handler
+ * @file: the un-initialized file
+ * @arg:  ptr to struct famfs_ioc_map in user space
+ *
+ * Set up the dax mapping for a file. Files are created empty; this function is
+ * called by famfs_file_ioctl() to set up the mapping and set the file size.
+ */
+static int
+famfs_file_init_dax(struct file *file, void __user *arg)
+{
+	struct famfs_file_meta *meta = NULL;
+	struct famfs_ioc_map imap;
+	struct famfs_fs_info *fsi;
+	size_t extent_total = 0;
+	int alignment_errs = 0;
+	struct super_block *sb;
+	struct inode *inode;
+	size_t ext_count;
+	int rc;
+	int i;
+
+	inode = file_inode(file);
+	if (!inode) {
+		rc = -EBADF;
+		goto errout;
+	}
+
+	sb  = inode->i_sb;
+	fsi = sb->s_fs_info;
+	if (fsi->deverror)
+		return -ENODEV;
+
+	rc = copy_from_user(&imap, arg, sizeof(imap));
+	if (rc)
+		return -EFAULT;
+
+	ext_count = imap.ext_list_count;
+	if (ext_count < 1) {
+		rc = -ENOSPC;
+		goto errout;
+	}
+
+	if (ext_count > FAMFS_MAX_EXTENTS) {
+		rc = -E2BIG;
+		goto errout;
+	}
+
+	rc = famfs_meta_alloc(&meta, ext_count);
+	if (rc)
+		goto errout;
+
+	meta->file_type = imap.file_type;
+	meta->file_size = imap.file_size;
+
+	/* Fill in the internal file metadata structure */
+	for (i = 0; i < imap.ext_list_count; i++) {
+		size_t len;
+		off_t  offset;
+
+		offset = imap.ext_list[i].offset;
+		len    = imap.ext_list[i].len;
+
+		extent_total += len;
+
+		if (WARN_ON(offset == 0 && meta->file_type != FAMFS_SUPERBLOCK)) {
+			rc = -EINVAL;
+			goto errout;
+		}
+
+		meta->tfs_extents[i].offset = offset;
+		meta->tfs_extents[i].len    = len;
+
+		/* All extent addresses/offsets must be 2MiB aligned,
+		 * and all but the last length must be a 2MiB multiple.
+		 */
+		if (!IS_ALIGNED(offset, PMD_SIZE)) {
+			pr_err("%s: error ext %d hpa %lx not aligned\n",
+			       __func__, i, offset);
+			alignment_errs++;
+		}
+		if (i < (imap.ext_list_count - 1) && !IS_ALIGNED(len, PMD_SIZE)) {
+			pr_err("%s: error ext %d length %ld not aligned\n",
+			       __func__, i, len);
+			alignment_errs++;
+		}
+	}
+
+	/*
+	 * File size can be <= ext list size, since extent sizes are constrained
+	 * to PMD multiples
+	 */
+	if (imap.file_size > extent_total) {
+		pr_err("%s: file size %lld larger than ext list size %lld\n",
+		       __func__, (u64)imap.file_size, (u64)extent_total);
+		rc = -EINVAL;
+		goto errout;
+	}
+
+	if (alignment_errs > 0) {
+		pr_err("%s: there were %d alignment errors in the extent list\n",
+		       __func__, alignment_errs);
+		rc = -EINVAL;
+		goto errout;
+	}
+
+	/* Publish the famfs metadata on inode->i_private */
+	inode_lock(inode);
+	if (inode->i_private) {
+		rc = -EEXIST; /* file already has famfs metadata */
+	} else {
+		inode->i_private = meta;
+		i_size_write(inode, imap.file_size);
+		inode->i_flags |= S_DAX;
+	}
+	inode_unlock(inode);
+
+ errout:
+	if (rc)
+		famfs_meta_free(meta);
+
+	return rc;
+}
+
+/**
+ * famfs_file_ioctl() - Top-level famfs file ioctl handler
+ * @file: the file
+ * @cmd:  ioctl opcode
+ * @arg:  ioctl opcode argument (if any)
+ */
+static long
+famfs_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct inode *inode = file_inode(file);
+	struct famfs_fs_info *fsi = inode->i_sb->s_fs_info;
+	long rc;
+
+	if (fsi->deverror && (cmd != FAMFSIOC_NOP))
+		return -ENODEV;
+
+	switch (cmd) {
+	case FAMFSIOC_NOP:
+		rc = 0;
+		break;
+
+	case FAMFSIOC_MAP_CREATE:
+		rc = famfs_file_init_dax(file, (void *)arg);
+		break;
+
+	case FAMFSIOC_MAP_GET: {
+		struct inode *inode = file_inode(file);
+		struct famfs_file_meta *meta = inode->i_private;
+		struct famfs_ioc_map umeta;
+
+		memset(&umeta, 0, sizeof(umeta));
+
+		if (meta) {
+			/* TODO: do more to harmonize these structures */
+			umeta.extent_type    = meta->tfs_extent_type;
+			umeta.file_size      = i_size_read(inode);
+			umeta.ext_list_count = meta->tfs_extent_ct;
+
+			rc = copy_to_user((void __user *)arg, &umeta,
+					  sizeof(umeta));
+			if (rc)
+				pr_err("%s: copy_to_user returned %ld\n",
+				       __func__, rc);
+
+		} else {
+			rc = -EINVAL;
+		}
+		break;
+	}
+	case FAMFSIOC_MAP_GETEXT: {
+		struct inode *inode = file_inode(file);
+		struct famfs_file_meta *meta = inode->i_private;
+
+		if (meta)
+			rc = copy_to_user((void __user *)arg, meta->tfs_extents,
+			      meta->tfs_extent_ct * sizeof(struct famfs_extent));
+		else
+			rc = -EINVAL;
+		break;
+	}
+	default:
+		rc = -ENOTTY;
+		break;
+	}
+
+	return rc;
+}
+
+/*********************************************************************
+ * iomap_operations
+ *
+ * This stuff uses the iomap (dax-related) helpers to resolve file offsets to
+ * offsets within a dax device.
+ */
+
+static ssize_t famfs_file_invalid(struct inode *inode);
+
+/**
+ * famfs_meta_to_dax_offset() - Resolve (file, offset, len) to (daxdev, offset, len)
+ *
+ * This function is called by famfs_iomap_begin() to resolve an offset in a
+ * file to an offset in a dax device. This is upcalled from dax from calls to
+ * both dax_iomap_fault() and dax_iomap_rw(). Dax finishes the job, resolving
+ * a fault to a specific physical page (the fault case) or doing a memcpy
+ * variant (the rw case).
+ *
+ * Pages can be PTE (4k), PMD (2MiB) or (theoretically) PUD (1GiB)
+ * (these sizes are for x86; they may vary on other cpu architectures).
+ *
+ * @inode:  The file where the fault occurred
+ * @iomap:       To be filled in to indicate where to find the right memory,
+ *               relative  to a dax device.
+ * @file_offset: Within the file where the fault occurred (will be page boundary)
+ * @len:         The length of the faulted mapping (will be a page multiple)
+ *               (will be trimmed in *iomap if it's disjoint in the extent list)
+ * @flags:
+ *
+ * Return values: 0. (info is returned in a modified @iomap struct)
+ */
+static int
+famfs_meta_to_dax_offset(struct inode *inode, struct iomap *iomap,
+			 loff_t file_offset, off_t len, unsigned int flags)
+{
+	struct famfs_file_meta *meta = inode->i_private;
+	int i;
+	loff_t local_offset = file_offset;
+	struct famfs_fs_info  *fsi = inode->i_sb->s_fs_info;
+
+	if (fsi->deverror || famfs_file_invalid(inode))
+		goto err_out;
+
+	iomap->offset = file_offset;
+
+	for (i = 0; i < meta->tfs_extent_ct; i++) {
+		loff_t dax_ext_offset = meta->tfs_extents[i].offset;
+		loff_t dax_ext_len    = meta->tfs_extents[i].len;
+
+		if ((dax_ext_offset == 0) &&
+		    (meta->file_type != FAMFS_SUPERBLOCK))
+			pr_warn("%s: zero offset on non-superblock file!!\n",
+				__func__);
+
+		/* local_offset is the offset minus the size of extents skipped
+		 * so far; If local_offset < dax_ext_len, the data of interest
+		 * starts in this extent
+		 */
+		if (local_offset < dax_ext_len) {
+			loff_t ext_len_remainder = dax_ext_len - local_offset;
+
+			/*
+			 * OK, we found the file metadata extent where this
+			 * data begins
+			 * @local_offset      - The offset within the current
+			 *                      extent
+			 * @ext_len_remainder - Remaining length of ext after
+			 *                      skipping local_offset
+			 * Outputs:
+			 * iomap->addr:   the offset within the dax device where
+			 *                the  data starts
+			 * iomap->offset: the file offset
+			 * iomap->length: the valid length resolved here
+			 */
+			iomap->addr    = dax_ext_offset + local_offset;
+			iomap->offset  = file_offset;
+			iomap->length  = min_t(loff_t, len, ext_len_remainder);
+			iomap->dax_dev = fsi->dax_devp;
+			iomap->type    = IOMAP_MAPPED;
+			iomap->flags   = flags;
+
+			return 0;
+		}
+		local_offset -= dax_ext_len; /* Get ready for the next extent */
+	}
+
+ err_out:
+	/* We fell out the end of the extent list.
+	 * Set iomap to zero length in this case, and return 0
+	 * This just means that the r/w is past EOF
+	 */
+	iomap->addr    = 0; /* there is no valid dax device offset */
+	iomap->offset  = file_offset; /* file offset */
+	iomap->length  = 0; /* this had better result in no access to dax mem */
+	iomap->dax_dev = fsi->dax_devp;
+	iomap->type    = IOMAP_MAPPED;
+	iomap->flags   = flags;
+
+	return 0;
+}
+
+/**
+ * famfs_iomap_begin() - Handler for iomap_begin upcall from dax
+ *
+ * This function is pretty simple because files are
+ * * never partially allocated
+ * * never have holes (never sparse)
+ * * never "allocate on write"
+ *
+ * @inode:  inode for the file being accessed
+ * @offset: offset within the file
+ * @length: Length being accessed at offset
+ * @flags:
+ * @iomap:  iomap struct to be filled in, resolving (offset, length) to
+ *          (daxdev, offset, len)
+ * @srcmap:
+ */
+static int
+famfs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+		  unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+{
+	struct famfs_file_meta *meta = inode->i_private;
+	size_t size;
+
+	size = i_size_read(inode);
+
+	WARN_ON(size != meta->file_size);
+
+	return famfs_meta_to_dax_offset(inode, iomap, offset, length, flags);
+}
+
+/* Note: We never need a special set of write_iomap_ops because famfs never
+ * performs allocation on write.
+ */
+const struct iomap_ops famfs_iomap_ops = {
+	.iomap_begin		= famfs_iomap_begin,
+};
+
 /*********************************************************************
  * vm_operations
  */
@@ -42,7 +405,7 @@ __famfs_filemap_fault(struct vm_fault *vmf, unsigned int pe_size,
 		file_update_time(vmf->vma->vm_file);
 	}
 
-	ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, NULL /*&famfs_iomap_ops */);
+	ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &famfs_iomap_ops);
 	if (ret & VM_FAULT_NEEDDSYNC)
 		ret = dax_finish_sync_fault(vmf, pe_size, pfn);
 
@@ -106,9 +469,25 @@ const struct vm_operations_struct famfs_file_vm_ops = {
 static ssize_t
 famfs_file_invalid(struct inode *inode)
 {
+	struct famfs_file_meta *meta = inode->i_private;
+	size_t i_size = i_size_read(inode);
+
+	if (!meta) {
+		pr_debug("%s: un-initialized famfs file\n", __func__);
+		return -EIO;
+	}
+	if (meta->error) {
+		pr_debug("%s: previously detected metadata errors\n", __func__);
+		return -EIO;
+	}
+	if (i_size != meta->file_size) {
+		pr_warn("%s: i_size overwritten from %ld to %ld\n",
+		       __func__, meta->file_size, i_size);
+		meta->error = true;
+		return -ENXIO;
+	}
 	if (!IS_DAX(inode)) {
-		pr_debug("%s: inode %llx IS_DAX is false\n",
-			 __func__, (u64)inode);
+		pr_debug("%s: inode %llx IS_DAX is false\n", __func__, (u64)inode);
 		return -ENXIO;
 	}
 	return 0;
@@ -155,7 +534,7 @@ famfs_dax_read_iter(struct kiocb *iocb, struct iov_iter	*to)
 	if (!iov_iter_count(to))
 		return 0;
 
-	rc = dax_iomap_rw(iocb, to, NULL /*&famfs_iomap_ops */);
+	rc = dax_iomap_rw(iocb, to, &famfs_iomap_ops);
 
 	file_accessed(iocb->ki_filp);
 	return rc;
@@ -181,7 +560,7 @@ famfs_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (!iov_iter_count(from))
 		return 0;
 
-	return dax_iomap_rw(iocb, from, NULL /*&famfs_iomap_ops*/);
+	return dax_iomap_rw(iocb, from, &famfs_iomap_ops);
 }
 
 static int
@@ -211,7 +590,7 @@ const struct file_operations famfs_file_operations = {
 	/* Custom famfs operations */
 	.write_iter	   = famfs_dax_write_iter,
 	.read_iter	   = famfs_dax_read_iter,
-	.unlocked_ioctl    = NULL /*famfs_file_ioctl*/,
+	.unlocked_ioctl    = famfs_file_ioctl,
 	.mmap		   = famfs_file_mmap,
 
 	/* Force PMD alignment for mmap */
diff --git a/fs/famfs/famfs_internal.h b/fs/famfs/famfs_internal.h
index 36efaef425e7..a45757d4cdea 100644
--- a/fs/famfs/famfs_internal.h
+++ b/fs/famfs/famfs_internal.h
@@ -11,8 +11,22 @@
 #ifndef FAMFS_INTERNAL_H
 #define FAMFS_INTERNAL_H
 
+#include <linux/famfs_ioctl.h>
+
 extern const struct file_operations famfs_file_operations;
 
+/*
+ * Each famfs dax file has this hanging from its inode->i_private.
+ */
+struct famfs_file_meta {
+	bool                   error;
+	enum famfs_file_type   file_type;
+	size_t                 file_size;
+	enum famfs_extent_type tfs_extent_type;
+	size_t                 tfs_extent_ct;
+	struct famfs_extent    tfs_extents[];
+};
+
 struct famfs_mount_opts {
 	umode_t mode;
 };
diff --git a/include/uapi/linux/famfs_ioctl.h b/include/uapi/linux/famfs_ioctl.h
new file mode 100644
index 000000000000..97ff5a2a8d13
--- /dev/null
+++ b/include/uapi/linux/famfs_ioctl.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * famfs - dax file system for shared fabric-attached memory
+ *
+ * Copyright 2023-2024 Micron Technology, Inc.
+ *
+ * This file system, originally based on ramfs and the dax support from xfs,
+ * is intended to allow multiple host systems to mount a common file system
+ * view of dax files that map to shared memory.
+ */
+#ifndef FAMFS_IOCTL_H
+#define FAMFS_IOCTL_H
+
+#include <linux/ioctl.h>
+#include <linux/uuid.h>
+
+#define FAMFS_KABI_VERSION 42
+#define FAMFS_MAX_EXTENTS 2
+
+/* We anticipate the possibility of supporting additional types of extents */
+enum famfs_extent_type {
+	SIMPLE_DAX_EXTENT,
+	INVALID_EXTENT_TYPE,
+};
+
+struct famfs_extent {
+	__u64              offset;
+	__u64              len;
+};
+
+enum famfs_file_type {
+	FAMFS_REG,
+	FAMFS_SUPERBLOCK,
+	FAMFS_LOG,
+};
+
+/**
+ * struct famfs_ioc_map - the famfs per-file metadata structure
+ * @extent_type: what type of extents are in this ext_list
+ * @file_type: Mark the superblock and log as special files. Maybe more later.
+ * @file_size: Size of the file, which is <= the size of the ext_list
+ * @ext_list_count: Number of extents
+ * @ext_list: 1 or more extents
+ */
+struct famfs_ioc_map {
+	enum famfs_extent_type    extent_type;
+	enum famfs_file_type      file_type;
+	__u64                     file_size;
+	__u64                     ext_list_count;
+	struct famfs_extent       ext_list[FAMFS_MAX_EXTENTS];
+};
+
+#define FAMFSIOC_MAGIC 'u'
+
+/* famfs file ioctl opcodes */
+#define FAMFSIOC_MAP_CREATE    _IOW(FAMFSIOC_MAGIC, 0x50, struct famfs_ioc_map)
+#define FAMFSIOC_MAP_GET       _IOR(FAMFSIOC_MAGIC, 0x51, struct famfs_ioc_map)
+#define FAMFSIOC_MAP_GETEXT    _IOR(FAMFSIOC_MAGIC, 0x52, struct famfs_extent)
+#define FAMFSIOC_NOP           _IO(FAMFSIOC_MAGIC,  0x53)
+
+#endif /* FAMFS_IOCTL_H */
-- 
2.43.0



* Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
  2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
                   ` (11 preceding siblings ...)
  2024-04-29 17:04 ` [RFC PATCH v2 12/12] famfs: famfs_ioctl and core file-to-memory mapping logic & iomap_ops John Groves
@ 2024-04-29 18:32 ` Matthew Wilcox
  2024-04-29 23:08   ` Kent Overstreet
  2024-04-30  2:11   ` John Groves
  12 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2024-04-29 18:32 UTC (permalink / raw)
  To: John Groves
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Alexander Viro, Christian Brauner, Jan Kara,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> This patch set introduces famfs[1] - a special-purpose fs-dax file system
> for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> CXL-specific in anyway way.
> 
> * Famfs creates a simple access method for storing and sharing data in
>   sharable memory. The memory is exposed and accessed as memory-mappable
>   dax files.
> * Famfs supports multiple hosts mounting the same file system from the
>   same memory (something existing fs-dax file systems don't do).

Yes, but we do already have two filesystems that support shared storage,
and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
the pros and cons of improving either of those to support DAX rather
than starting again with a new filesystem?



* Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
  2024-04-29 18:32 ` [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system Matthew Wilcox
@ 2024-04-29 23:08   ` Kent Overstreet
  2024-04-30  2:24     ` John Groves
  2024-04-30  2:11   ` John Groves
  1 sibling, 1 reply; 32+ messages in thread
From: Kent Overstreet @ 2024-04-29 23:08 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: John Groves, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On Mon, Apr 29, 2024 at 07:32:55PM +0100, Matthew Wilcox wrote:
> On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > CXL-specific in anyway way.
> > 
> > * Famfs creates a simple access method for storing and sharing data in
> >   sharable memory. The memory is exposed and accessed as memory-mappable
> >   dax files.
> > * Famfs supports multiple hosts mounting the same file system from the
> >   same memory (something existing fs-dax file systems don't do).
> 
> Yes, but we do already have two filesystems that support shared storage,
> and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> the pros and cons of improving either of those to support DAX rather
> than starting again with a new filesystem?

I could see a shared memory filesystem as being a completely different
beast than a shared block storage filesystem - and I've never heard
anyone talking about gfs2 or ocfs2 as codebases we particularly liked.

This looks like it might not even be persistent? Does it survive a
reboot? If not, that means it'll be much smaller than a conventional
filesystem.

But yeah, a bit more on where this is headed would be nice.

Another concern is that every filesystem tends to be another huge
monolithic codebase without a lot of code sharing between them - how
much are we going to be adding in the end?

Can we start looking for more code sharing, more library code to factor
out?

Some description of the internal data structures would really help here.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
  2024-04-29 18:32 ` [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system Matthew Wilcox
  2024-04-29 23:08   ` Kent Overstreet
@ 2024-04-30  2:11   ` John Groves
  2024-04-30 21:01     ` Matthew Wilcox
  1 sibling, 1 reply; 32+ messages in thread
From: John Groves @ 2024-04-30  2:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Alexander Viro, Christian Brauner, Jan Kara,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On 24/04/29 07:32PM, Matthew Wilcox wrote:
> On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > CXL-specific in anyway way.
> > 
> > * Famfs creates a simple access method for storing and sharing data in
> >   sharable memory. The memory is exposed and accessed as memory-mappable
> >   dax files.
> > * Famfs supports multiple hosts mounting the same file system from the
> >   same memory (something existing fs-dax file systems don't do).
> 
> Yes, but we do already have two filesystems that support shared storage,
> and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> the pros and cons of improving either of those to support DAX rather
> than starting again with a new filesystem?
> 

Thanks for paying attention to this, Willy.

This is a fair question; I'll share some thoughts on the rationale, but it's
probably something that should be an ongoing dialog. We already have an LSFMM
session planned that will discuss whether the famfs functionality should be
merged into fuse, but GFS2 and OCFS2 are also potential candidates.

(I've already seen Kent's reply and will get to that next)

I work for a memory company, and the motivation here is to make disaggregated
shared memory practically usable. Any approach that moves in that direction 
is goodness as far as we're concerned -- provided it doesn't insert years of 
delay. 

Some thoughts on famfs:

* Famfs is not, not, not a general purpose file system.
* One can think of famfs as a shared memory allocator where allocations can be
  accessed as files. For certain data analytics work flows (especially 
  involving Apache Arrow data frames) this is really powerful. Consumers of
  data frames commonly use mmap(MAP_SHARED), and can benefit from the memory
  de-duplication of shared memory and don't need any new abstractions.
* Famfs is not really a data storage tool. It's more of a shared-memory
  allocation tool that has the benefit of allocations being accessible
  (and memory-mappable) as files. So a lot of software can automatically use
  it.
* Famfs is oriented to dumping sharable data into files and then allowing a
  scale-out cluster to share it (often read-only) to access a single copy in
  shared memory.
* Although this audience probably already understands this, please forgive me
  for putting a fine point on it: memory mapping a famfs/fs-dax file does
  not use system-ram as a cache - it directly accesses the memory associated
  with a file. This would be true of all file systems with proper fs-dax
  support (of which there are not many, and of which currently only famfs
  supports shared access to media/memory).
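
As a rough illustration of the consumer side described above, here is a
minimal user-space sketch, assuming only standard POSIX calls. Nothing in
it is famfs-specific, which is the point being made: a reader that already
uses mmap(MAP_SHARED) should work unchanged. The helper name
map_file_shared() and any path passed to it are hypothetical and do not
come from the famfs code base.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of famfs): map an existing file
 * read-only with MAP_SHARED. On success, returns the mapping and
 * stores its length in *len_out. On any failure, returns NULL.
 */
const void *map_file_shared(const char *path, size_t *len_out)
{
	struct stat st;
	void *addr;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	/* A zero-length file cannot be mapped, so treat it as an error. */
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}

	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping remains valid after the fd is closed */
	if (addr == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return addr;
}
```

A caller would pass the path of a file on the mounted file system and
munmap() the result when done. On a file system with fs-dax support the
loads would go directly to the backing memory; on an ordinary file system
the same code runs through the page cache.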

Some thoughts on shared-storage file systems:

* I'm no expert on GFS2 or OCFS2, but I've been around memory, file systems 
  and storage since well before the turn of the century...
* If you had brought up the existing fs-dax file systems, I would have pointed
  out that they use write-back metadata, which does not reconcile with shared
  access to media - but these file systems do handle that.
* The shared media file systems are still oriented to block devices that
  provide durable storage and page-oriented access. CXL DRAM is a character 
  dax (devdax) device and does not provide durable storage.
* fs-dax-style memory mapping for volatile cxl memory requires the 
  dev_dax_iomap portion of this patch set - or something similar. 
* A scale-out shared media file system presumably requires some commitment to
  configure and manage some complexity in a distributed environment; whether
  that should be mandatory for enablement of shared memory is worthy of
  discussion.
* Adding memory to the storage tier for GFS2/OCFS2 would add non-persistent
  media to the storage tier; whether this makes sense would be a topic that
  GFS2/OCFS2 developers/architects should get involved in if they're 
  interested.

Although disaggregated shared memory is not commercially available yet, famfs 
is being actively tested by multiple companies for several use cases and 
patterns with real and simulated shared memory. Demonstrations will start to
surface in the coming weeks & months.

Regards,
John



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
  2024-04-29 23:08   ` Kent Overstreet
@ 2024-04-30  2:24     ` John Groves
  2024-04-30  3:11       ` Kent Overstreet
  0 siblings, 1 reply; 32+ messages in thread
From: John Groves @ 2024-04-30  2:24 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On 24/04/29 07:08PM, Kent Overstreet wrote:
> On Mon, Apr 29, 2024 at 07:32:55PM +0100, Matthew Wilcox wrote:
> > On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > > CXL-specific in anyway way.
> > > 
> > > * Famfs creates a simple access method for storing and sharing data in
> > >   sharable memory. The memory is exposed and accessed as memory-mappable
> > >   dax files.
> > > * Famfs supports multiple hosts mounting the same file system from the
> > >   same memory (something existing fs-dax file systems don't do).
> > 
> > Yes, but we do already have two filesystems that support shared storage,
> > and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> > the pros and cons of improving either of those to support DAX rather
> > than starting again with a new filesystem?
> 
> I could see a shared memory filesystem as being a completely different
> beast than a shared block storage filesystem - and I've never heard
> anyone talking about gfs2 or ocfs2 as codebases we particularly liked.

Thanks for your attention on famfs, Kent.

I think of it as a completely different beast. See my reply to Willy re:
famfs being more of a memory allocator with the benefit of allocations 
being accessible (and memory-mappable) as files.

> 
> This looks like it might not even be persistent? Does it survive a
> reboot? If not, that means it'll be much smaller than a conventional
> filesystem.

Right; cxl memory *can* be persistent, but most of the future products
I'm aware of will not be persistent. Those of us who work at memory
companies have been educated in recent years as to the value (or
lack thereof) of persistence (see 3DX / Optane).

But since shared memory is probably on a separate power domain from
a server, it is likely to persist across reboots. But it still ain't
storage.

> 
> But yeah, a bit more on where this is headed would be nice.

The famfs user space repo has some good documentation as to the on-
media structure of famfs. Scroll down on [1] (the documentation from
the famfs user space repo). There is quite a bit of info in the docs
from that repo.

The other docs from the cover letter are also useful...

> 
> Another concern is that every filesystem tends to be another huge
> monolithic codebase without a lot of code sharing between them - how
> much are we going to be adding in the end?

A fair concern. Famfs is kinda fuse-like, in that the metadata handling
is mostly in user space. Famfs is currently <1 KLOC of code in the 
kernel. That may grow, but it's not clear that there is a risk of
"huge monolithic". 

But it's something we should consider - and I'll be at LSFMM and 
happy to engage about this.

> 
> Can we start looking for more code sharing, more library code to factor
> out?
> 
> Some description of the internal data structures would really help here.


[1] https://github.com/cxl-micron-reskit/famfs/blob/master/README.md

Best regards,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
  2024-04-30  2:24     ` John Groves
@ 2024-04-30  3:11       ` Kent Overstreet
  2024-05-01  2:09         ` John Groves
  0 siblings, 1 reply; 32+ messages in thread
From: Kent Overstreet @ 2024-04-30  3:11 UTC (permalink / raw)
  To: John Groves
  Cc: Matthew Wilcox, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On Mon, Apr 29, 2024 at 09:24:19PM -0500, John Groves wrote:
> On 24/04/29 07:08PM, Kent Overstreet wrote:
> > On Mon, Apr 29, 2024 at 07:32:55PM +0100, Matthew Wilcox wrote:
> > > On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > > > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > > > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > > > CXL-specific in anyway way.
> > > > 
> > > > * Famfs creates a simple access method for storing and sharing data in
> > > >   sharable memory. The memory is exposed and accessed as memory-mappable
> > > >   dax files.
> > > > * Famfs supports multiple hosts mounting the same file system from the
> > > >   same memory (something existing fs-dax file systems don't do).
> > > 
> > > Yes, but we do already have two filesystems that support shared storage,
> > > and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> > > the pros and cons of improving either of those to support DAX rather
> > > than starting again with a new filesystem?
> > 
> > I could see a shared memory filesystem as being a completely different
> > beast than a shared block storage filesystem - and I've never heard
> > anyone talking about gfs2 or ocfs2 as codebases we particularly liked.
> 
> Thanks for your attention on famfs, Kent.
> 
> I think of it as a completely different beast. See my reply to Willy re:
> famfs being more of a memory allocator with the benefit of allocations 
> being accessible (and memory-mappable) as files.

That's pretty much what I expected.

I would suggest talking to RDMA people; RDMA does similar things with
exposing address spaces across machines, and an "external" memory
allocator is a basic building block there as well - it'd be great if we
could get that turned into some clean library code.

GPU people as well, possibly.

> The famfs user space repo has some good documentation as to the on-
> media structure of famfs. Scroll down on [1] (the documentation from
> the famfs user space repo). There is quite a bit of info in the docs
> from that repo.

Ok, looking through that now.

So you've got a metadata log; that looks more like a conventional
filesystem than a conventional purely in-memory thing.

But you say it's a shared filesystem, and it doesn't say anything about
that. Inter node locking?

Perhaps the ocfs2/gfs2 comparison is appropriate, after all.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 01/12] famfs: Introduce famfs documentation
  2024-04-29 17:04 ` [RFC PATCH v2 01/12] famfs: Introduce famfs documentation John Groves
@ 2024-04-30  6:46   ` Bagas Sanjaya
  0 siblings, 0 replies; 32+ messages in thread
From: Bagas Sanjaya @ 2024-04-30  6:46 UTC (permalink / raw)
  To: John Groves, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, Matthew Wilcox, Linux CXL,
	Linux Filesystems Development, Linux NVIDMM, Linux Documentation,
	Linux Kernel Mailing List
  Cc: John Groves, john, Dave Chinner, Christoph Hellwig, dave.hansen,
	gregory.price, Randy Dunlap, Jerome Glisse, Aravind Ramesh,
	Ajay Joshi, Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang, Mao Zhu, Ran Sun,
	Xiang wangx, Shaomin Deng, Charles Han, Attreyee M

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

On Mon, Apr 29, 2024 at 12:04:17PM -0500, John Groves wrote:
> * Introduce Documentation/filesystems/famfs.rst into the Documentation
>   tree and filesystems index
> * Add famfs famfs.rst to the filesystems doc index
> * Add famfs' ioctl opcodes to ioctl-number.rst
> * Update MAINTAINERS FILE
> 

The doc LGTM, thanks!

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 08/12] famfs: module operations & fs_context
  2024-04-29 17:04 ` [RFC PATCH v2 08/12] famfs: module operations & fs_context John Groves
@ 2024-04-30 11:01   ` Christian Brauner
  2024-05-02 15:51     ` John Groves
  2024-05-03 14:15     ` John Groves
  2024-05-02 18:23   ` Al Viro
  1 sibling, 2 replies; 32+ messages in thread
From: Christian Brauner @ 2024-04-30 11:01 UTC (permalink / raw)
  To: John Groves
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Alexander Viro, Jan Kara, Matthew Wilcox, linux-cxl,
	linux-fsdevel, nvdimm, John Groves, john, Dave Chinner,
	Christoph Hellwig, dave.hansen, gregory.price, Randy Dunlap,
	Jerome Glisse, Aravind Ramesh, Ajay Joshi, Eishan Mirakhur,
	Ravi Shankar, Srinivasulu Thanneeru, Luis Chamberlain,
	Amir Goldstein, Chandan Babu R, Bagas Sanjaya, Darrick J . Wong,
	Kent Overstreet, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On Mon, Apr 29, 2024 at 12:04:24PM -0500, John Groves wrote:
> Start building up from the famfs module operations. This commit
> includes the following:
> 
> * Register as a file system
> * Parse mount parameters
> * Allocate or find (and initialize) a superblock via famfs_get_tree()
> * Lookup the host dax device, and bail if it's in use (or not dax)
> * Register as the holder of the dax device if it's available
> * Add Kconfig and Makefile misc to build famfs
> * Add FAMFS_SUPER_MAGIC to include/uapi/linux/magic.h
> * Add export of fs/namei.c:may_open_dev(), which famfs needs to call
> * Update MAINTAINERS file for the fs/famfs/ path
> 
> The following exports had to happen to enable famfs:
> 
> * This uses the new fs/super.c:kill_char_super() - the other kill*super
>   helpers were not quite right.
> * This uses the dev_dax_iomap export of dax_dev_get()
> 
> This commit builds but is otherwise too incomplete to run
> 
> Signed-off-by: John Groves <john@groves.net>
> ---
>  MAINTAINERS                |   1 +
>  fs/Kconfig                 |   2 +
>  fs/Makefile                |   1 +
>  fs/famfs/Kconfig           |  10 ++
>  fs/famfs/Makefile          |   5 +
>  fs/famfs/famfs_inode.c     | 345 +++++++++++++++++++++++++++++++++++++
>  fs/famfs/famfs_internal.h  |  36 ++++
>  fs/namei.c                 |   1 +
>  include/uapi/linux/magic.h |   1 +
>  9 files changed, 402 insertions(+)
>  create mode 100644 fs/famfs/Kconfig
>  create mode 100644 fs/famfs/Makefile
>  create mode 100644 fs/famfs/famfs_inode.c
>  create mode 100644 fs/famfs/famfs_internal.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3f2d847dcf01..365d678e2f40 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8188,6 +8188,7 @@ L:	linux-cxl@vger.kernel.org
>  L:	linux-fsdevel@vger.kernel.org
>  S:	Supported
>  F:	Documentation/filesystems/famfs.rst
> +F:	fs/famfs
>  
>  FANOTIFY
>  M:	Jan Kara <jack@suse.cz>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index a46b0cbc4d8f..53b4629e92a0 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -140,6 +140,8 @@ source "fs/autofs/Kconfig"
>  source "fs/fuse/Kconfig"
>  source "fs/overlayfs/Kconfig"
>  
> +source "fs/famfs/Kconfig"
> +
>  menu "Caches"
>  
>  source "fs/netfs/Kconfig"
> diff --git a/fs/Makefile b/fs/Makefile
> index 6ecc9b0a53f2..3393f399a9e9 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
>  obj-$(CONFIG_EROFS_FS)		+= erofs/
>  obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
>  obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
> +obj-$(CONFIG_FAMFS)             += famfs/
> diff --git a/fs/famfs/Kconfig b/fs/famfs/Kconfig
> new file mode 100644
> index 000000000000..edb8980820f7
> --- /dev/null
> +++ b/fs/famfs/Kconfig
> @@ -0,0 +1,10 @@
> +
> +
> +config FAMFS
> +       tristate "famfs: shared memory file system"
> +       depends on DEV_DAX && FS_DAX && DEV_DAX_IOMAP
> +       help
> +	  Support for the famfs file system. Famfs is a dax file system that
> +	  can support scale-out shared access to fabric-attached memory
> +	  (e.g. CXL shared memory). Famfs is not a general purpose file system;
> +	  it is an enabler for data sets in shared memory.
> diff --git a/fs/famfs/Makefile b/fs/famfs/Makefile
> new file mode 100644
> index 000000000000..62230bcd6793
> --- /dev/null
> +++ b/fs/famfs/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_FAMFS) += famfs.o
> +
> +famfs-y := famfs_inode.o
> diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c
> new file mode 100644
> index 000000000000..61306240fc0b
> --- /dev/null
> +++ b/fs/famfs/famfs_inode.c
> @@ -0,0 +1,345 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * famfs - dax file system for shared fabric-attached memory
> + *
> + * Copyright 2023-2024 Micron Technology, inc
> + *
> + * This file system, originally based on ramfs the dax support from xfs,
> + * is intended to allow multiple host systems to mount a common file system
> + * view of dax files that map to shared memory.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/time.h>
> +#include <linux/init.h>
> +#include <linux/string.h>
> +#include <linux/parser.h>
> +#include <linux/magic.h>
> +#include <linux/slab.h>
> +#include <linux/fs_context.h>
> +#include <linux/fs_parser.h>
> +#include <linux/dax.h>
> +#include <linux/hugetlb.h>
> +#include <linux/iomap.h>
> +#include <linux/path.h>
> +#include <linux/namei.h>
> +
> +#include "famfs_internal.h"
> +
> +#define FAMFS_DEFAULT_MODE	0755
> +
> +static struct inode *famfs_get_inode(struct super_block *sb,
> +				     const struct inode *dir,
> +				     umode_t mode, dev_t dev)
> +{
> +	struct inode *inode = new_inode(sb);
> +	struct timespec64 tv;
> +
> +	if (!inode)
> +		return NULL;
> +
> +	inode->i_ino = get_next_ino();
> +	inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
> +	inode->i_mapping->a_ops = &ram_aops;
> +	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> +	mapping_set_unevictable(inode->i_mapping);
> +	tv = inode_set_ctime_current(inode);
> +	inode_set_mtime_to_ts(inode, tv);
> +	inode_set_atime_to_ts(inode, tv);
> +
> +	switch (mode & S_IFMT) {
> +	default:
> +		init_special_inode(inode, mode, dev);
> +		break;
> +	case S_IFREG:
> +		inode->i_op = NULL /* famfs_file_inode_operations */;
> +		inode->i_fop = NULL /* &famfs_file_operations */;
> +		break;
> +	case S_IFDIR:
> +		inode->i_op = NULL /* famfs_dir_inode_operations */;
> +		inode->i_fop = &simple_dir_operations;
> +
> +		/* Directory inodes start off with i_nlink == 2 (for ".") */
> +		inc_nlink(inode);
> +		break;
> +	case S_IFLNK:
> +		inode->i_op = &page_symlink_inode_operations;
> +		inode_nohighmem(inode);
> +		break;
> +	}
> +	return inode;
> +}
> +
> +/*
> + * famfs dax_operations  (for char dax)
> + */
> +static int
> +famfs_dax_notify_failure(struct dax_device *dax_dev, u64 offset,
> +			u64 len, int mf_flags)
> +{
> +	struct super_block *sb = dax_holder(dax_dev);
> +	struct famfs_fs_info *fsi = sb->s_fs_info;
> +
> +	pr_err("%s: rootdev=%s offset=%lld len=%llu flags=%x\n", __func__,
> +	       fsi->rootdev, offset, len, mf_flags);
> +
> +	return 0;
> +}
> +
> +static const struct dax_holder_operations famfs_dax_holder_ops = {
> +	.notify_failure		= famfs_dax_notify_failure,
> +};
> +
> +/*****************************************************************************
> + * fs_context_operations
> + */
> +
> +static int
> +famfs_fill_super(struct super_block *sb, struct fs_context *fc)
> +{
> +	int rc = 0;
> +
> +	sb->s_maxbytes		= MAX_LFS_FILESIZE;
> +	sb->s_blocksize		= PAGE_SIZE;
> +	sb->s_blocksize_bits	= PAGE_SHIFT;
> +	sb->s_magic		= FAMFS_SUPER_MAGIC;
> +	sb->s_op		= NULL /* famfs_super_ops */;
> +	sb->s_time_gran		= 1;
> +
> +	return rc;
> +}
> +
> +static int
> +lookup_daxdev(const char *pathname, dev_t *devno)
> +{
> +	struct inode *inode;
> +	struct path path;
> +	int err;
> +
> +	if (!pathname || !*pathname)
> +		return -EINVAL;
> +
> +	err = kern_path(pathname, LOOKUP_FOLLOW, &path);
> +	if (err)
> +		return err;
> +
> +	inode = d_backing_inode(path.dentry);
> +	if (!S_ISCHR(inode->i_mode)) {
> +		err = -EINVAL;
> +		goto out_path_put;
> +	}
> +
> +	if (!may_open_dev(&path)) { /* had to export this */
> +		err = -EACCES;
> +		goto out_path_put;
> +	}
> +
> +	 /* if it's dax, i_rdev is struct dax_device */
> +	*devno = inode->i_rdev;
> +
> +out_path_put:
> +	path_put(&path);
> +	return err;
> +}
> +
> +static int
> +famfs_get_tree(struct fs_context *fc)
> +{
> +	struct famfs_fs_info *fsi = fc->s_fs_info;
> +	struct dax_device *dax_devp;
> +	struct super_block *sb;
> +	struct inode *inode;
> +	dev_t daxdevno;
> +	int err;
> +
> +	/* TODO: clean up chatty messages */
> +
> +	err = lookup_daxdev(fc->source, &daxdevno);
> +	if (err)
> +		return err;
> +
> +	fsi->daxdevno = daxdevno;
> +
> +	/* This will set sb->s_dev=daxdevno */
> +	sb = sget_dev(fc, daxdevno);

This will open the dax device as a block device. However, nothing in
your ->kill_sb method or kill_char_super() closes it again. So you're
leaking block device references and leaving uninitialized memory around as
you've claimed that device but never ended your claim.

> +	if (IS_ERR(sb)) {
> +		pr_err("%s: sget_dev error\n", __func__);
> +		return PTR_ERR(sb);
> +	}
> +
> +	if (sb->s_root) {
> +		pr_info("%s: found a matching suerblock for %s\n",
> +			__func__, fc->source);
> +
> +		/* We don't expect to find a match by dev_t; if we do, it must
> +		 * already be mounted, so we bail
> +		 */
> +		err = -EBUSY;
> +		goto deactivate_out;
> +	} else {
> +		pr_info("%s: initializing new superblock for %s\n",
> +			__func__, fc->source);
> +		err = famfs_fill_super(sb, fc);
> +		if (err)
> +			goto deactivate_out;
> +	}
> +
> +	/* This will fail if it's not a dax device */
> +	dax_devp = dax_dev_get(daxdevno);
> +	if (!dax_devp) {
> +		pr_warn("%s: device %s not found or not dax\n",
> +		       __func__, fc->source);
> +		err = -ENODEV;
> +		goto deactivate_out;
> +	}
> +
> +	err = fs_dax_get(dax_devp, sb, &famfs_dax_holder_ops);
> +	if (err) {
> +		pr_err("%s: fs_dax_get(%lld) failed\n", __func__, (u64)daxdevno);
> +		err = -EBUSY;
> +		goto deactivate_out;
> +	}
> +	fsi->dax_devp = dax_devp;
> +
> +	inode = famfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
> +	sb->s_root = d_make_root(inode);
> +	if (!sb->s_root) {
> +		pr_err("%s: d_make_root() failed\n", __func__);
> +		err = -ENOMEM;
> +		fs_put_dax(fsi->dax_devp, sb);
> +		goto deactivate_out;
> +	}
> +
> +	sb->s_flags |= SB_ACTIVE;
> +
> +	WARN_ON(fc->root);
> +	fc->root = dget(sb->s_root);
> +	return err;
> +
> +deactivate_out:
> +	pr_debug("%s: deactivating sb=%llx\n", __func__, (u64)sb);
> +	deactivate_locked_super(sb);
> +	return err;
> +}
> +
> +/*****************************************************************************/
> +
> +enum famfs_param {
> +	Opt_mode,
> +	Opt_dax,
> +};
> +
> +const struct fs_parameter_spec famfs_fs_parameters[] = {
> +	fsparam_u32oct("mode",	  Opt_mode),
> +	fsparam_string("dax",     Opt_dax),
> +	{}
> +};
> +
> +static int famfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
> +{
> +	struct famfs_fs_info *fsi = fc->s_fs_info;
> +	struct fs_parse_result result;
> +	int opt;
> +
> +	opt = fs_parse(fc, famfs_fs_parameters, param, &result);
> +	if (opt == -ENOPARAM) {
> +		opt = vfs_parse_fs_param_source(fc, param);
> +		if (opt != -ENOPARAM)
> +			return opt;

This shouldn't be needed. The VFS will handle all that for you.

> +
> +		return 0;
> +	}
> +	if (opt < 0)
> +		return opt;
> +
> +	switch (opt) {
> +	case Opt_mode:
> +		fsi->mount_opts.mode = result.uint_32 & S_IALLUGO;
> +		break;
> +	case Opt_dax:
> +		if (strcmp(param->string, "always"))
> +			pr_notice("%s: invalid dax mode %s\n",
> +				  __func__, param->string);
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static void famfs_free_fc(struct fs_context *fc)
> +{
> +	struct famfs_fs_info *fsi = fc->s_fs_info;
> +
> +	if (fsi && fsi->rootdev)
> +		kfree(fsi->rootdev);

Dead code since rootdev is unused and unset?

> +
> +	kfree(fsi);
> +}
> +
> +static const struct fs_context_operations famfs_context_ops = {
> +	.free		= famfs_free_fc,
> +	.parse_param	= famfs_parse_param,
> +	.get_tree	= famfs_get_tree,
> +};
> +
> +static int famfs_init_fs_context(struct fs_context *fc)
> +{
> +	struct famfs_fs_info *fsi;
> +
> +	fsi = kzalloc(sizeof(*fsi), GFP_KERNEL);
> +	if (!fsi)
> +		return -ENOMEM;
> +
> +	fsi->mount_opts.mode = FAMFS_DEFAULT_MODE;
> +	fc->s_fs_info        = fsi;
> +	fc->ops              = &famfs_context_ops;
> +	return 0;
> +}
> +
> +static void famfs_kill_sb(struct super_block *sb)
> +{
> +	struct famfs_fs_info *fsi = sb->s_fs_info;
> +
> +	if (fsi->dax_devp)
> +		fs_put_dax(fsi->dax_devp, sb);
> +	if (fsi && fsi->rootdev)
> +		kfree(fsi->rootdev);
> +	kfree(fsi);
> +	sb->s_fs_info = NULL;
> +
> +	kill_char_super(sb); /* new */
> +}

Can likely just be

static void famfs_kill_sb(struct super_block *sb)
{
	struct famfs_fs_info *fsi = sb->s_fs_info;

	generic_shutdown_super(sb);

	if (sb->s_bdev_file)
		bdev_fput(sb->s_bdev_file);

	if (fsi->dax_devp)
		fs_put_dax(fsi->dax_devp, sb);

	kfree(fsi);
}

and then you don't need any custom helpers at all.

> +
> +#define MODULE_NAME "famfs"
> +static struct file_system_type famfs_fs_type = {
> +	.name		  = MODULE_NAME,
> +	.init_fs_context  = famfs_init_fs_context,
> +	.parameters	  = famfs_fs_parameters,
> +	.kill_sb	  = famfs_kill_sb,
> +	.fs_flags	  = FS_USERNS_MOUNT,

Sorry, no. This should not be mountable by unprivileged users and
containers if it's using a real device and especially not since it's not
even a mature filesystem.

> +};
> +
> +/******************************************************************************
> + * Module stuff
> + */
> +static int __init init_famfs_fs(void)
> +{
> +	int rc;
> +
> +	rc = register_filesystem(&famfs_fs_type);
> +
> +	return rc;
> +}
> +
> +static void
> +__exit famfs_exit(void)
> +{
> +	unregister_filesystem(&famfs_fs_type);
> +	pr_info("%s: unregistered\n", __func__);
> +}
> +
> +fs_initcall(init_famfs_fs);
> +module_exit(famfs_exit);
> +
> +MODULE_AUTHOR("John Groves, Micron Technology");
> +MODULE_LICENSE("GPL");
> diff --git a/fs/famfs/famfs_internal.h b/fs/famfs/famfs_internal.h
> new file mode 100644
> index 000000000000..951b32ec4fbd
> --- /dev/null
> +++ b/fs/famfs/famfs_internal.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * famfs - dax file system for shared fabric-attached memory
> + *
> + * Copyright 2023-2024 Micron Technology, Inc.
> + *
> + * This file system, originally based on ramfs the dax support from xfs,
> + * is intended to allow multiple host systems to mount a common file system
> + * view of dax files that map to shared memory.
> + */
> +#ifndef FAMFS_INTERNAL_H
> +#define FAMFS_INTERNAL_H
> +
> +struct famfs_mount_opts {
> +	umode_t mode;
> +};
> +
> +/**
> + * @famfs_fs_info
> + *
> + * @mount_opts: the mount options
> + * @dax_devp:   The underlying character devdax device
> + * @rootdev:    Dax device path used in mount
> + * @daxdevno:   Dax device dev_t
> + * @deverror:   True if the dax device has called our notify_failure entry
> + *              point, or if other "shutdown" conditions exist
> + */
> +struct famfs_fs_info {
> +	struct famfs_mount_opts  mount_opts;
> +	struct dax_device       *dax_devp;
> +	char                    *rootdev;
> +	dev_t                    daxdevno;
> +	bool                     deverror;
> +};
> +
> +#endif /* FAMFS_INTERNAL_H */
> diff --git a/fs/namei.c b/fs/namei.c
> index c5b2a25be7d0..f24b268473cd 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3229,6 +3229,7 @@ bool may_open_dev(const struct path *path)
>  	return !(path->mnt->mnt_flags & MNT_NODEV) &&
>  		!(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
>  }
> +EXPORT_SYMBOL(may_open_dev);
>  
>  static int may_open(struct mnt_idmap *idmap, const struct path *path,
>  		    int acc_mode, int flag)
> diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> index 1b40a968ba91..e9bdd6a415e2 100644
> --- a/include/uapi/linux/magic.h
> +++ b/include/uapi/linux/magic.h
> @@ -37,6 +37,7 @@
>  #define HOSTFS_SUPER_MAGIC	0x00c0ffee
>  #define OVERLAYFS_SUPER_MAGIC	0x794c7630
>  #define FUSE_SUPER_MAGIC	0x65735546
> +#define FAMFS_SUPER_MAGIC	0x87b282ff
>  
>  #define MINIX_SUPER_MAGIC	0x137F		/* minix v1 fs, 14 char names */
>  #define MINIX_SUPER_MAGIC2	0x138F		/* minix v1 fs, 30 char names */
> -- 
> 2.43.0
> 

* Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
  2024-04-30  2:11   ` John Groves
@ 2024-04-30 21:01     ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2024-04-30 21:01 UTC (permalink / raw)
  To: John Groves
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Alexander Viro, Christian Brauner, Jan Kara,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On Mon, Apr 29, 2024 at 09:11:52PM -0500, John Groves wrote:
> On 24/04/29 07:32PM, Matthew Wilcox wrote:
> > On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > > CXL-specific in any way.
> > > 
> > > * Famfs creates a simple access method for storing and sharing data in
> > >   sharable memory. The memory is exposed and accessed as memory-mappable
> > >   dax files.
> > > * Famfs supports multiple hosts mounting the same file system from the
> > >   same memory (something existing fs-dax file systems don't do).
> > 
> > Yes, but we do already have two filesystems that support shared storage,
> > and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> > the pros and cons of improving either of those to support DAX rather
> > than starting again with a new filesystem?
> > 
> 
> Thanks for paying attention to this Willy.

Well, don't mistake this for an endorsement!  I remain convinced that
this is a science project, not a product.  I am hugely sceptical of
disaggregated systems, mostly because I've seen so many fail.  And they
rarely attempt to answer the "janitor tripped over the cable" problem,
the "we need to upgrade the firmware on the switch" problem, or a bunch
of other problems I've outlined in the past on this list.

So I am not supportive of any changes you want to make to the core kernel
to support this kind of adventure.  Play in your own sandbox all you
like, but not one line of code change in the core.  Unless it's something
generally beneficial, of course; you mentioned refactoring DAX and that
might be a good thing for everybody.

> * Famfs is not, not, not a general purpose file system.
> * One can think of famfs as a shared memory allocator where allocations can be
>   accessed as files. For certain data analytics work flows (especially 
>   involving Apache Arrow data frames) this is really powerful. Consumers of
>   data frames commonly use mmap(MAP_SHARED), and can benefit from the memory
>   de-duplication of shared memory and don't need any new abstractions.

... and are OK with the extra latency?

> * Famfs is not really a data storage tool. It's more of a shared-memory
>   allocation tool that has the benefit of allocations being accessible
>   (and memory-mappable) as files. So a lot of software can automatically use 
>   it.
> * Famfs is oriented to dumping sharable data into files and then allowing a
>   scale-out cluster to share it (often read-only) to access a single copy in
>   shared memory.

Depending on the exact workload, I can see this being more efficient
than replicating the data to each member of the cluster.  In other
workloads, it'll be a loss, of course.

> * I'm no expert on GFS2 or OCFS2, but I've been around memory, file systems 
>   and storage since well before the turn of the century...
> * If you had brought up the existing fs-dax file systems, I would have pointed
>   out that they use write-back metadata, which does not reconcile with shared
>   access to media - but these file systems do handle that.
> * The shared media file systems are still oriented to block devices that
>   provide durable storage and page-oriented access. CXL DRAM is a character 

I'd say "block oriented" rather than page oriented, but I agree.

>   dax (devdax) device and does not provide durable storage.
> * fs-dax-style memory mapping for volatile cxl memory requires the 
>   dev_dax_iomap portion of this patch set - or something similar. 
> * A scale-out shared media file system presumably requires some commitment to
>   configure and manage some complexity in a distributed environment; whether
>   that should be mandatory for enablement of shared memory is worthy of
>   discussion.
> * Adding memory to the storage tier for GFS2/OCFS2 would add non-persistent
>   media to the storage tier; whether this makes sense would be a topic that
>   GFS2/OCFS2 developers/architects should get involved in if they're 
>   interested.
> 
> Although disaggregated shared memory is not commercially available yet, famfs 
> is being actively tested by multiple companies for several use cases and 
> patterns with real and simulated shared memory. Demonstrations will start to
> surface in the coming weeks & months.

I guess we'll see.  SGI died for a reason.

* Re: [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system
  2024-04-30  3:11       ` Kent Overstreet
@ 2024-05-01  2:09         ` John Groves
  0 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-05-01  2:09 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner,
	Jan Kara, linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On 24/04/29 11:11PM, Kent Overstreet wrote:
> On Mon, Apr 29, 2024 at 09:24:19PM -0500, John Groves wrote:
> > On 24/04/29 07:08PM, Kent Overstreet wrote:
> > > On Mon, Apr 29, 2024 at 07:32:55PM +0100, Matthew Wilcox wrote:
> > > > On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > > > > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > > > > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > > > > > CXL-specific in any way.
> > > > > 
> > > > > * Famfs creates a simple access method for storing and sharing data in
> > > > >   sharable memory. The memory is exposed and accessed as memory-mappable
> > > > >   dax files.
> > > > > * Famfs supports multiple hosts mounting the same file system from the
> > > > >   same memory (something existing fs-dax file systems don't do).
> > > > 
> > > > Yes, but we do already have two filesystems that support shared storage,
> > > > and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> > > > the pros and cons of improving either of those to support DAX rather
> > > > than starting again with a new filesystem?
> > > 
> > > I could see a shared memory filesystem as being a completely different
> > > beast than a shared block storage filesystem - and I've never heard
> > > anyone talking about gfs2 or ocfs2 as codebases we particularly liked.
> > 
> > Thanks for your attention on famfs, Kent.
> > 
> > I think of it as a completely different beast. See my reply to Willy re:
> > famfs being more of a memory allocator with the benefit of allocations 
> > being accessible (and memory-mappable) as files.
> 
> That's pretty much what I expected.
> 
> I would suggest talking to RDMA people; RDMA does similar things with
> > exposing address spaces across machines, and an "external" memory
> allocator is a basic building block there as well - it'd be great if we
> could get that turned into some clean library code.
> 
> GPU people as well, possibly.

Thanks for your attention Kent.

I'm on it. Part of the core idea behind famfs is that page-oriented data
movement can be avoided with actual shared memory. Yes, the memory is likely to 
be slower (either BW or latency or both) but it's cacheline access rather than 
full-page (or larger) retrieval, which is a win for some access patterns (and
not so for others).

Part of the issue is communicating the fact that shared access to cachelines
is possible.
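
As a rough user-space illustration of the cacheline-granular access described above, here is a minimal sketch that maps a file MAP_SHARED and copies out a single 64-byte line. It uses only standard POSIX calls; the helper names (map_shared, read_line), the CACHELINE constant and the assumption of a 64-byte line are made up for this example and are not part of famfs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CACHELINE 64	/* assumed line size; hypothetical constant */

/*
 * Map an entire file MAP_SHARED. Returns NULL on failure, otherwise the
 * mapping address, with its length stored in *len_out.
 */
static void *map_shared(const char *path, size_t *len_out)
{
	struct stat st;
	void *addr;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (addr == MAP_FAILED)
		return NULL;
	*len_out = (size_t)st.st_size;
	return addr;
}

/*
 * Copy one cacheline-sized slice out of the mapping. Only that slice is
 * touched by the caller; no read() of a full page is issued.
 */
static int read_line(const void *base, size_t len, size_t index,
		     uint8_t out[CACHELINE])
{
	assert(base != NULL);
	if (index >= len / CACHELINE)
		return -1;
	memcpy(out, (const uint8_t *)base + index * CACHELINE, CACHELINE);
	return 0;
}
```

On an ordinary page-cache-backed file the kernel still faults in whole pages behind the scenes; the point of the sketch is only the access pattern a consumer would use on a dax-mapped file.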

There are some interesting possibilities with GPUs retrieving famfs files
(or portions thereof), but I have no insight as to the motivations of GPU 
vendors.

> 
> > The famfs user space repo has some good documentation as to the on-
> > media structure of famfs. Scroll down on [1] (the documentation from
> > the famfs user space repo). There is quite a bit of info in the docs
> > from that repo.
> 
> Ok, looking through that now.
> 
> So you've got a metadata log; that looks more like a conventional
> filesystem than a conventional purely in-memory thing.
> 
> But you say it's a shared filesystem, and it doesn't say anything about
> that. Inter node locking?
> 
> Perhaps the ocfs2/gfs2 comparison is appropriate, after all.

Famfs is intended to be mounted from more than one host from the same in-memory
image. A metadata log is kinda the simplest approach to make that work (let me
know your thoughts if you disagree on that). When a client mounts, playing the 
log from the shared memory brings that client mount into sync with the source 
(the Master).

No inter-node locking is currently needed because only the node that created
the file system (the Master) can write the log. Famfs is not intended to be 
a general-purpose FS...

The famfs log is currently append-only, and I think of it as a "code-first"
implementation of a shared memory FS that gets the job done in something
approaching the simplest possible way.
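
To make the single-writer, replay-on-mount idea concrete, here is a minimal user-space sketch. It is not the famfs on-media format: the struct layouts, the fixed LOG_CAPACITY, and the helper names (log_append, log_replay) are all hypothetical, and real shared memory would also need cache flushing and ordering that this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_CAPACITY	16	/* hypothetical fixed log size */
#define NAME_LEN	32	/* hypothetical max name length, including NUL */

/* One "file created" record: a name mapped to an extent of shared memory. */
struct log_entry {
	char		name[NAME_LEN];
	uint64_t	offset;
	uint64_t	len;
};

/* Lives in the shared memory; only the Master ever writes it. */
struct shared_log {
	uint64_t		next_index;	/* number of valid entries */
	struct log_entry	entries[LOG_CAPACITY];
};

/* Private to each host: what that host has instantiated so far. */
struct client_view {
	uint64_t		replayed;	/* entries consumed so far */
	struct log_entry	files[LOG_CAPACITY];
};

/* Master side: append one entry; existing entries are never modified. */
static int log_append(struct shared_log *log, const char *name,
		      uint64_t offset, uint64_t len)
{
	struct log_entry *e;

	if (log->next_index >= LOG_CAPACITY || strlen(name) >= NAME_LEN)
		return -1;
	e = &log->entries[log->next_index];
	memset(e, 0, sizeof(*e));
	strcpy(e->name, name);
	e->offset = offset;
	e->len = len;
	log->next_index++;	/* publish only after the entry is filled in */
	return 0;
}

/* Client side: apply every entry not yet seen; returns how many were new. */
static uint64_t log_replay(const struct shared_log *log,
			   struct client_view *view)
{
	uint64_t applied = 0;

	assert(log->next_index <= LOG_CAPACITY);
	while (view->replayed < log->next_index) {
		view->files[view->replayed] = log->entries[view->replayed];
		view->replayed++;
		applied++;
	}
	return applied;
}
```

Because clients only read the log and remember how far they have replayed, a second replay after the Master appends picks up just the new entries, and no inter-node lock is involved.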

If the approach evolves to full allocate-on-write, then moving to a file system
platform that handles that would make sense. If it remains (as I suspect will
make sense) a way to share collections of data sets, or indexes, or other 
data that is published and then consumed [all or mostly] read-only, this
simple approach may be long-term sufficient.

Regards,
John




* Re: [RFC PATCH v2 08/12] famfs: module operations & fs_context
  2024-04-30 11:01   ` Christian Brauner
@ 2024-05-02 15:51     ` John Groves
  2024-05-03 14:15     ` John Groves
  1 sibling, 0 replies; 32+ messages in thread
From: John Groves @ 2024-05-02 15:51 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Alexander Viro, Jan Kara, Matthew Wilcox, linux-cxl,
	linux-fsdevel, nvdimm, John Groves, john, Dave Chinner,
	Christoph Hellwig, dave.hansen, gregory.price, Randy Dunlap,
	Jerome Glisse, Aravind Ramesh, Ajay Joshi, Eishan Mirakhur,
	Ravi Shankar, Srinivasulu Thanneeru, Luis Chamberlain,
	Amir Goldstein, Chandan Babu R, Bagas Sanjaya, Darrick J . Wong,
	Kent Overstreet, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On 24/04/30 01:01PM, Christian Brauner wrote:
> On Mon, Apr 29, 2024 at 12:04:24PM -0500, John Groves wrote:
> > Start building up from the famfs module operations. This commit
> > includes the following:
> > 
> > * Register as a file system
> > * Parse mount parameters
> > * Allocate or find (and initialize) a superblock via famfs_get_tree()
> > * Lookup the host dax device, and bail if it's in use (or not dax)
> > * Register as the holder of the dax device if it's available
> > * Add Kconfig and Makefile misc to build famfs
> > * Add FAMFS_SUPER_MAGIC to include/uapi/linux/magic.h
> > * Add export of fs/namei.c:may_open_dev(), which famfs needs to call
> > * Update MAINTAINERS file for the fs/famfs/ path
> > 
> > The following exports had to happen to enable famfs:
> > 
> > * This uses the new fs/super.c:kill_char_super() - the other kill*super
> >   helpers were not quite right.
> > * This uses the dev_dax_iomap export of dax_dev_get()
> > 
> > This commit builds but is otherwise too incomplete to run
> > 
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> >  MAINTAINERS                |   1 +
> >  fs/Kconfig                 |   2 +
> >  fs/Makefile                |   1 +
> >  fs/famfs/Kconfig           |  10 ++
> >  fs/famfs/Makefile          |   5 +
> >  fs/famfs/famfs_inode.c     | 345 +++++++++++++++++++++++++++++++++++++
> >  fs/famfs/famfs_internal.h  |  36 ++++
> >  fs/namei.c                 |   1 +
> >  include/uapi/linux/magic.h |   1 +
> >  9 files changed, 402 insertions(+)
> >  create mode 100644 fs/famfs/Kconfig
> >  create mode 100644 fs/famfs/Makefile
> >  create mode 100644 fs/famfs/famfs_inode.c
> >  create mode 100644 fs/famfs/famfs_internal.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 3f2d847dcf01..365d678e2f40 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8188,6 +8188,7 @@ L:	linux-cxl@vger.kernel.org
> >  L:	linux-fsdevel@vger.kernel.org
> >  S:	Supported
> >  F:	Documentation/filesystems/famfs.rst
> > +F:	fs/famfs
> >  
> >  FANOTIFY
> >  M:	Jan Kara <jack@suse.cz>
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index a46b0cbc4d8f..53b4629e92a0 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -140,6 +140,8 @@ source "fs/autofs/Kconfig"
> >  source "fs/fuse/Kconfig"
> >  source "fs/overlayfs/Kconfig"
> >  
> > +source "fs/famfs/Kconfig"
> > +
> >  menu "Caches"
> >  
> >  source "fs/netfs/Kconfig"
> > diff --git a/fs/Makefile b/fs/Makefile
> > index 6ecc9b0a53f2..3393f399a9e9 100644
> > --- a/fs/Makefile
> > +++ b/fs/Makefile
> > @@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
> >  obj-$(CONFIG_EROFS_FS)		+= erofs/
> >  obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
> >  obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
> > +obj-$(CONFIG_FAMFS)             += famfs/
> > diff --git a/fs/famfs/Kconfig b/fs/famfs/Kconfig
> > new file mode 100644
> > index 000000000000..edb8980820f7
> > --- /dev/null
> > +++ b/fs/famfs/Kconfig
> > @@ -0,0 +1,10 @@
> > +
> > +
> > +config FAMFS
> > +       tristate "famfs: shared memory file system"
> > +       depends on DEV_DAX && FS_DAX && DEV_DAX_IOMAP
> > +       help
> > +	  Support for the famfs file system. Famfs is a dax file system that
> > +	  can support scale-out shared access to fabric-attached memory
> > +	  (e.g. CXL shared memory). Famfs is not a general purpose file system;
> > +	  it is an enabler for data sets in shared memory.
> > diff --git a/fs/famfs/Makefile b/fs/famfs/Makefile
> > new file mode 100644
> > index 000000000000..62230bcd6793
> > --- /dev/null
> > +++ b/fs/famfs/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +obj-$(CONFIG_FAMFS) += famfs.o
> > +
> > +famfs-y := famfs_inode.o
> > diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c
> > new file mode 100644
> > index 000000000000..61306240fc0b
> > --- /dev/null
> > +++ b/fs/famfs/famfs_inode.c
> > @@ -0,0 +1,345 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * famfs - dax file system for shared fabric-attached memory
> > + *
> > + * Copyright 2023-2024 Micron Technology, inc
> > + *
> > + * This file system, originally based on ramfs and the dax support from xfs,
> > + * is intended to allow multiple host systems to mount a common file system
> > + * view of dax files that map to shared memory.
> > + */
> > +
> > +#include <linux/fs.h>
> > +#include <linux/time.h>
> > +#include <linux/init.h>
> > +#include <linux/string.h>
> > +#include <linux/parser.h>
> > +#include <linux/magic.h>
> > +#include <linux/slab.h>
> > +#include <linux/fs_context.h>
> > +#include <linux/fs_parser.h>
> > +#include <linux/dax.h>
> > +#include <linux/hugetlb.h>
> > +#include <linux/iomap.h>
> > +#include <linux/path.h>
> > +#include <linux/namei.h>
> > +
> > +#include "famfs_internal.h"
> > +
> > +#define FAMFS_DEFAULT_MODE	0755
> > +
> > +static struct inode *famfs_get_inode(struct super_block *sb,
> > +				     const struct inode *dir,
> > +				     umode_t mode, dev_t dev)
> > +{
> > +	struct inode *inode = new_inode(sb);
> > +	struct timespec64 tv;
> > +
> > +	if (!inode)
> > +		return NULL;
> > +
> > +	inode->i_ino = get_next_ino();
> > +	inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
> > +	inode->i_mapping->a_ops = &ram_aops;
> > +	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> > +	mapping_set_unevictable(inode->i_mapping);
> > +	tv = inode_set_ctime_current(inode);
> > +	inode_set_mtime_to_ts(inode, tv);
> > +	inode_set_atime_to_ts(inode, tv);
> > +
> > +	switch (mode & S_IFMT) {
> > +	default:
> > +		init_special_inode(inode, mode, dev);
> > +		break;
> > +	case S_IFREG:
> > +		inode->i_op = NULL /* famfs_file_inode_operations */;
> > +		inode->i_fop = NULL /* &famfs_file_operations */;
> > +		break;
> > +	case S_IFDIR:
> > +		inode->i_op = NULL /* famfs_dir_inode_operations */;
> > +		inode->i_fop = &simple_dir_operations;
> > +
> > +		/* Directory inodes start off with i_nlink == 2 (for ".") */
> > +		inc_nlink(inode);
> > +		break;
> > +	case S_IFLNK:
> > +		inode->i_op = &page_symlink_inode_operations;
> > +		inode_nohighmem(inode);
> > +		break;
> > +	}
> > +	return inode;
> > +}
> > +
> > +/*
> > + * famfs dax_operations  (for char dax)
> > + */
> > +static int
> > +famfs_dax_notify_failure(struct dax_device *dax_dev, u64 offset,
> > +			u64 len, int mf_flags)
> > +{
> > +	struct super_block *sb = dax_holder(dax_dev);
> > +	struct famfs_fs_info *fsi = sb->s_fs_info;
> > +
> > +	pr_err("%s: rootdev=%s offset=%lld len=%llu flags=%x\n", __func__,
> > +	       fsi->rootdev, offset, len, mf_flags);
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct dax_holder_operations famfs_dax_holder_ops = {
> > +	.notify_failure		= famfs_dax_notify_failure,
> > +};
> > +
> > +/*****************************************************************************
> > + * fs_context_operations
> > + */
> > +
> > +static int
> > +famfs_fill_super(struct super_block *sb, struct fs_context *fc)
> > +{
> > +	int rc = 0;
> > +
> > +	sb->s_maxbytes		= MAX_LFS_FILESIZE;
> > +	sb->s_blocksize		= PAGE_SIZE;
> > +	sb->s_blocksize_bits	= PAGE_SHIFT;
> > +	sb->s_magic		= FAMFS_SUPER_MAGIC;
> > +	sb->s_op		= NULL /* famfs_super_ops */;
> > +	sb->s_time_gran		= 1;
> > +
> > +	return rc;
> > +}
> > +
> > +static int
> > +lookup_daxdev(const char *pathname, dev_t *devno)
> > +{
> > +	struct inode *inode;
> > +	struct path path;
> > +	int err;
> > +
> > +	if (!pathname || !*pathname)
> > +		return -EINVAL;
> > +
> > +	err = kern_path(pathname, LOOKUP_FOLLOW, &path);
> > +	if (err)
> > +		return err;
> > +
> > +	inode = d_backing_inode(path.dentry);
> > +	if (!S_ISCHR(inode->i_mode)) {
> > +		err = -EINVAL;
> > +		goto out_path_put;
> > +	}
> > +
> > +	if (!may_open_dev(&path)) { /* had to export this */
> > +		err = -EACCES;
> > +		goto out_path_put;
> > +	}
> > +
> > +	 /* if it's dax, i_rdev is struct dax_device */
> > +	*devno = inode->i_rdev;
> > +
> > +out_path_put:
> > +	path_put(&path);
> > +	return err;
> > +}
> > +
> > +static int
> > +famfs_get_tree(struct fs_context *fc)
> > +{
> > +	struct famfs_fs_info *fsi = fc->s_fs_info;
> > +	struct dax_device *dax_devp;
> > +	struct super_block *sb;
> > +	struct inode *inode;
> > +	dev_t daxdevno;
> > +	int err;
> > +
> > +	/* TODO: clean up chatty messages */
> > +
> > +	err = lookup_daxdev(fc->source, &daxdevno);
> > +	if (err)
> > +		return err;
> > +
> > +	fsi->daxdevno = daxdevno;
> > +
> > +	/* This will set sb->s_dev=daxdevno */
> > +	sb = sget_dev(fc, daxdevno);
> 
> This will open the dax device as a block device. However, nothing in
> your ->kill_sb method or kill_char_super() closes it again. So you're
> leaking block device references and leaving uninitialized memory around as
> you've claimed that device but never ended your claim.

My uptake is admittedly a bit slow with the superblock handling code; I'm
still working to get my head around it. Thank you for your help and
patience on this.

By "This will open the dax device as a block device", are you referring to
the call from sget_fc() to get_filesystem() - which then does a
__module_get(fs->owner)? That's the only thing I see that one might refer
to as "open" - and I do see [I think] that the famfs code is not doing a
module_put. Or maybe I'm missing something else...

Looking at xfs as an example, bdev_file_open_by_path() is called from 
xfs_fill_super(), which is called back from get_tree_bdev() after the
superblock is found or allocated.

In famfs, I'm not currently using a get_tree helper because there doesn't 
appear to be one that's quite right for a character-backed FS (?). The 
call in famfs that I think is most analogous to bdev_file_open_by_path() 
is dax_dev_get(), which is called after we've found or allocated a 
superblock via sget_dev()->sget_fc().

Using sget_dev() looked ok to me, but it does put a devdax (char device)
dev_t in superblock->s_dev - which may be a bit squirrely because it's
usually a block dev_t. But I don't see s_dev being used except in the
setup_bdev_super() path, which famfs is not using.

So if there is an open that I'm not closing, I'm not seeing it yet (other
than the apparent missing module_put()). Can you elaborate a bit?

> 
> > +	if (IS_ERR(sb)) {
> > +		pr_err("%s: sget_dev error\n", __func__);
> > +		return PTR_ERR(sb);
> > +	}
> > +
> > +	if (sb->s_root) {
> > +		pr_info("%s: found a matching superblock for %s\n",
> > +			__func__, fc->source);
> > +
> > +		/* We don't expect to find a match by dev_t; if we do, it must
> > +		 * already be mounted, so we bail
> > +		 */
> > +		err = -EBUSY;
> > +		goto deactivate_out;
> > +	} else {
> > +		pr_info("%s: initializing new superblock for %s\n",
> > +			__func__, fc->source);
> > +		err = famfs_fill_super(sb, fc);
> > +		if (err)
> > +			goto deactivate_out;
> > +	}
> > +
> > +	/* This will fail if it's not a dax device */
> > +	dax_devp = dax_dev_get(daxdevno);
> > +	if (!dax_devp) {
> > +		pr_warn("%s: device %s not found or not dax\n",
> > +		       __func__, fc->source);
> > +		err = -ENODEV;
> > +		goto deactivate_out;
> > +	}
> > +
> > +	err = fs_dax_get(dax_devp, sb, &famfs_dax_holder_ops);
> > +	if (err) {
> > +		pr_err("%s: fs_dax_get(%lld) failed\n", __func__, (u64)daxdevno);
> > +		err = -EBUSY;
> > +		goto deactivate_out;
> > +	}
> > +	fsi->dax_devp = dax_devp;
> > +
> > +	inode = famfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
> > +	sb->s_root = d_make_root(inode);
> > +	if (!sb->s_root) {
> > +		pr_err("%s: d_make_root() failed\n", __func__);
> > +		err = -ENOMEM;
> > +		fs_put_dax(fsi->dax_devp, sb);
> > +		goto deactivate_out;
> > +	}
> > +
> > +	sb->s_flags |= SB_ACTIVE;
> > +
> > +	WARN_ON(fc->root);
> > +	fc->root = dget(sb->s_root);
> > +	return err;
> > +
> > +deactivate_out:
> > +	pr_debug("%s: deactivating sb=%llx\n", __func__, (u64)sb);
> > +	deactivate_locked_super(sb);
> > +	return err;
> > +}
> > +
> > +/*****************************************************************************/
> > +
> > +enum famfs_param {
> > +	Opt_mode,
> > +	Opt_dax,
> > +};
> > +
> > +const struct fs_parameter_spec famfs_fs_parameters[] = {
> > +	fsparam_u32oct("mode",	  Opt_mode),
> > +	fsparam_string("dax",     Opt_dax),
> > +	{}
> > +};
> > +
> > +static int famfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > +{
> > +	struct famfs_fs_info *fsi = fc->s_fs_info;
> > +	struct fs_parse_result result;
> > +	int opt;
> > +
> > +	opt = fs_parse(fc, famfs_fs_parameters, param, &result);
> > +	if (opt == -ENOPARAM) {
> > +		opt = vfs_parse_fs_param_source(fc, param);
> > +		if (opt != -ENOPARAM)
> > +			return opt;
> 
> This shouldn't be needed. The VFS will handle all that for you.
> 
> > +
> > +		return 0;
> > +	}
> > +	if (opt < 0)
> > +		return opt;
> > +
> > +	switch (opt) {
> > +	case Opt_mode:
> > +		fsi->mount_opts.mode = result.uint_32 & S_IALLUGO;
> > +		break;
> > +	case Opt_dax:
> > +		if (strcmp(param->string, "always"))
> > +			pr_notice("%s: invalid dax mode %s\n",
> > +				  __func__, param->string);
> > +		break;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void famfs_free_fc(struct fs_context *fc)
> > +{
> > +	struct famfs_fs_info *fsi = fc->s_fs_info;
> > +
> > +	if (fsi && fsi->rootdev)
> > +		kfree(fsi->rootdev);
> 
> Dead code since rootdev is unused and unset?

Good catch, thank you. rootdev was used in v1 but not in v2.

> 
> > +
> > +	kfree(fsi);
> > +}
> > +
> > +static const struct fs_context_operations famfs_context_ops = {
> > +	.free		= famfs_free_fc,
> > +	.parse_param	= famfs_parse_param,
> > +	.get_tree	= famfs_get_tree,
> > +};
> > +
> > +static int famfs_init_fs_context(struct fs_context *fc)
> > +{
> > +	struct famfs_fs_info *fsi;
> > +
> > +	fsi = kzalloc(sizeof(*fsi), GFP_KERNEL);
> > +	if (!fsi)
> > +		return -ENOMEM;
> > +
> > +	fsi->mount_opts.mode = FAMFS_DEFAULT_MODE;
> > +	fc->s_fs_info        = fsi;
> > +	fc->ops              = &famfs_context_ops;
> > +	return 0;
> > +}
> > +
> > +static void famfs_kill_sb(struct super_block *sb)
> > +{
> > +	struct famfs_fs_info *fsi = sb->s_fs_info;
> > +
> > +	if (fsi->dax_devp)
> > +		fs_put_dax(fsi->dax_devp, sb);
> > +	if (fsi && fsi->rootdev)
> > +		kfree(fsi->rootdev);
> > +	kfree(fsi);
> > +	sb->s_fs_info = NULL;
> > +
> > +	kill_char_super(sb); /* new */
> > +}
> 
> Can likely just be
> 
> static void famfs_kill_sb(struct super_block *sb)
> {
> 	struct famfs_fs_info *fsi = sb->s_fs_info;
> 
> 	generic_shutdown_super(sb);
> 
> 	if (sb->s_bdev_file)
> 		bdev_fput(sb->s_bdev_file);
> 
> 	if (fsi->dax_devp)
> 		fs_put_dax(fsi->dax_devp, sb);
> 
> 	kfree(fsi);
> }
> 
> and then you don't need any custom helpers at all.

Thanks; will give this a try

> 
> > +
> > +#define MODULE_NAME "famfs"
> > +static struct file_system_type famfs_fs_type = {
> > +	.name		  = MODULE_NAME,
> > +	.init_fs_context  = famfs_init_fs_context,
> > +	.parameters	  = famfs_fs_parameters,
> > +	.kill_sb	  = famfs_kill_sb,
> > +	.fs_flags	  = FS_USERNS_MOUNT,
> 
> Sorry, no. This should not be mountable by unprivileged users and
> containers if it's using a real device and especially not since it's not
> even a mature filesystem.

I'm a derp. That should be:

	.fs_flags = FS_REQUIRES_DEV,

Right?

<snip>

Thank you for your help!
John


* Re: [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super()
  2024-04-29 17:04 ` [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super() John Groves
@ 2024-05-02 18:17   ` Al Viro
  2024-05-02 22:25     ` John Groves
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2024-05-02 18:17 UTC (permalink / raw)
  To: John Groves
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On Mon, Apr 29, 2024 at 12:04:23PM -0500, John Groves wrote:
> Famfs needs a slightly different kill_super variant than already existed.
> Putting it local to famfs would require exporting d_genocide(); this
> seemed a bit cleaner.

What's wrong with kill_litter_super()?

* Re: [RFC PATCH v2 08/12] famfs: module operations & fs_context
  2024-04-29 17:04 ` [RFC PATCH v2 08/12] famfs: module operations & fs_context John Groves
  2024-04-30 11:01   ` Christian Brauner
@ 2024-05-02 18:23   ` Al Viro
  2024-05-02 21:50     ` John Groves
  1 sibling, 1 reply; 32+ messages in thread
From: Al Viro @ 2024-05-02 18:23 UTC (permalink / raw)
  To: John Groves
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On Mon, Apr 29, 2024 at 12:04:24PM -0500, John Groves wrote:
> +	case S_IFREG:
> +		inode->i_op = NULL /* famfs_file_inode_operations */;
> +		inode->i_fop = NULL /* &famfs_file_operations */;

Don't.  We should never, ever store NULL in either.  
	inode->i_op = &empty_iops;
	inode->i_fop = &no_open_fops;
in inode_init_always() is there precisely to avoid doing that.

IOW, the right thing would be something along the lines of
		/* inode->i_op = famfs_file_inode_operations */;
if you want a placeholder for a patch later in the series - or
simply /* methods will be set here in a commit or two */
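
The general pattern behind this advice can be sketched in plain user-space C: initialize every object with a harmless default method table, so callers can always dereference the pointer, and override it only once the real table exists. The types and names below (struct node, node_ops, default_ops, real_ops) are invented for the illustration and are not kernel APIs; only the -ENXIO return is modelled on what the kernel's default open does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct node;

/* A method table, playing the role of an operations struct. */
struct node_ops {
	int (*open)(struct node *n);
};

struct node {
	const struct node_ops *ops;	/* invariant: never NULL */
	int opened;
};

/* Default method: refuse politely instead of crashing. */
static int no_open(struct node *n)
{
	(void)n;
	return -ENXIO;
}

static const struct node_ops default_ops = { .open = no_open };

/* Generic init installs the safe default, as a common constructor would. */
static void node_init(struct node *n)
{
	n->ops = &default_ops;
	n->opened = 0;
}

/* Callers never need a NULL check because of the invariant above. */
static int node_open(struct node *n)
{
	assert(n->ops != NULL && n->ops->open != NULL);
	return n->ops->open(n);
}

/* The "real" methods, added by a later change in this hypothetical series. */
static int real_open(struct node *n)
{
	n->opened = 1;
	return 0;
}

static const struct node_ops real_ops = { .open = real_open };
```

A placeholder patch that simply leaves n->ops alone keeps node_open() safe; storing NULL there would turn every caller into a potential NULL dereference.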

* Re: [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write
  2024-04-29 17:04 ` [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write John Groves
@ 2024-05-02 18:29   ` Al Viro
  2024-05-02 21:51     ` John Groves
  0 siblings, 1 reply; 32+ messages in thread
From: Al Viro @ 2024-05-02 18:29 UTC (permalink / raw)
  To: John Groves
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On Mon, Apr 29, 2024 at 12:04:26PM -0500, John Groves wrote:
> +const struct file_operations famfs_file_operations = {
> +	.owner             = THIS_MODULE,

Not needed, unless you are planning something really weird
(using it for misc device, etc.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 08/12] famfs: module operations & fs_context
  2024-05-02 18:23   ` Al Viro
@ 2024-05-02 21:50     ` John Groves
  0 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-05-02 21:50 UTC (permalink / raw)
  To: Al Viro
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On 24/05/02 07:23PM, Al Viro wrote:
> On Mon, Apr 29, 2024 at 12:04:24PM -0500, John Groves wrote:
> > +	case S_IFREG:
> > +		inode->i_op = NULL /* famfs_file_inode_operations */;
> > +		inode->i_fop = NULL /* &famfs_file_operations */;
> 
> Don't.  We should never, ever store NULL in either.  
> 	inode->i_op = &empty_iops;
> 	inode->i_fop = &no_open_fops;
> in inode_init_always() is there precisely to avoid doing that.
> 
> IOW, the right thing would be something along the lines of
> 		/* inode->i_op = famfs_file_inode_operations */;
> if you want a placeholder for a patch later in the series - or
> simply /* methods will be set here in a commit or two */

OK, will do - thanks.

John


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write
  2024-05-02 18:29   ` Al Viro
@ 2024-05-02 21:51     ` John Groves
  0 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-05-02 21:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On 24/05/02 07:29PM, Al Viro wrote:
> On Mon, Apr 29, 2024 at 12:04:26PM -0500, John Groves wrote:
> > +const struct file_operations famfs_file_operations = {
> > +	.owner             = THIS_MODULE,
> 
> Not needed, unless you are planning something really weird
> (using it for misc device, etc.)

Got it - thanks!

John


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super()
  2024-05-02 18:17   ` Al Viro
@ 2024-05-02 22:25     ` John Groves
  2024-05-03  9:04       ` Christian Brauner
  0 siblings, 1 reply; 32+ messages in thread
From: John Groves @ 2024-05-02 22:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Christian Brauner, Jan Kara, Matthew Wilcox,
	linux-cxl, linux-fsdevel, nvdimm, John Groves, john,
	Dave Chinner, Christoph Hellwig, dave.hansen, gregory.price,
	Randy Dunlap, Jerome Glisse, Aravind Ramesh, Ajay Joshi,
	Eishan Mirakhur, Ravi Shankar, Srinivasulu Thanneeru,
	Luis Chamberlain, Amir Goldstein, Chandan Babu R, Bagas Sanjaya,
	Darrick J . Wong, Kent Overstreet, Steve French, Nathan Lynch,
	Michael Ellerman, Thomas Zimmermann, Julien Panis,
	Stanislav Fomichev, Dongsheng Yang

On 24/05/02 07:17PM, Al Viro wrote:
> On Mon, Apr 29, 2024 at 12:04:23PM -0500, John Groves wrote:
> > Famfs needs a slightly different kill_super variant than already existed.
> > Putting it local to famfs would require exporting d_genocide(); this
> > seemed a bit cleaner.
> 
> What's wrong with kill_litter_super()?

I struggled with that; I don't have my head fully around the superblock
handling code.

But when I replace kill_char_super() with kill_litter_super()...

- first mount works
- first umount works
- second mount works
- second umount does this (which I don't properly understand):

May 02 17:21:58 f39-dev1 kernel: ------------[ cut here ]------------
May 02 17:21:58 f39-dev1 kernel: ida_free called for id=1 which is not allocated.
May 02 17:21:58 f39-dev1 kernel: WARNING: CPU: 1 PID: 1173 at lib/idr.c:525 ida_free+0xe3/0x140
May 02 17:21:58 f39-dev1 kernel: Modules linked in: famfs rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs qrtr rfkill snd_hda_codec_generic intel_rapl_msr sunrpc snd_hda_intel snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi snd_hda_codec kmem snd_hda_core device_dax kvm_intel snd_hwdep iTCO_wdt kvm intel_pmc_bxt snd_seq iTCO_vendor_support dax_hmem snd_seq_device cxl_acpi cxl_core rapl snd_pcm snd_timer snd einj pcspkr soundcore i2c_i801 lpc_ich i2c_smbus vfat fat virtio_balloon joydev fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_net virtio_console net_failover virtio_gpu failover virtio_blk virtio_dma_buf serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath qemu_fw_cfg
May 02 17:21:58 f39-dev1 kernel: CPU: 1 PID: 1173 Comm: umount Tainted: G        W          6.9.0-rc5+ #266
May 02 17:21:58 f39-dev1 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-26.fc38 03/01/2023
May 02 17:21:58 f39-dev1 kernel: RIP: 0010:ida_free+0xe3/0x140
May 02 17:21:58 f39-dev1 kernel: Code: 8d 7d a0 e8 9f 2e 02 00 eb 62 41 83 fe 3e 76 3c 48 8b 7d a0 4c 89 ee e8 5b 73 04 00 89 de 48 c7 c7 60 51 be 82 e8 3d 03 0b ff <0f> 0b 48 8b 45 d8 65 48 2b 04 25 28 00 00 00 75 3f 48 83 c4 40 5b
May 02 17:21:58 f39-dev1 kernel: RSP: 0018:ffffc90000c37c50 EFLAGS: 00010286
May 02 17:21:58 f39-dev1 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
May 02 17:21:58 f39-dev1 kernel: RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
May 02 17:21:58 f39-dev1 kernel: RBP: ffffc90000c37cb0 R08: 0000000000000000 R09: 0000000000000003
May 02 17:21:58 f39-dev1 kernel: R10: ffffc90000c37aa0 R11: ffffffff82f3c3a8 R12: 00c7fffffffffffc
May 02 17:21:58 f39-dev1 kernel: R13: 0000000000000202 R14: 0000000000000001 R15: 0000000000000000
May 02 17:21:58 f39-dev1 kernel: FS:  00007f0ff81c0800(0000) GS:ffff88886fc80000(0000) knlGS:0000000000000000
May 02 17:21:58 f39-dev1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 02 17:21:58 f39-dev1 kernel: CR2: 00007f6b841c95a8 CR3: 00000001254c8001 CR4: 0000000000170ef0
May 02 17:21:58 f39-dev1 kernel: Call Trace:
May 02 17:21:58 f39-dev1 kernel:  <TASK>
May 02 17:21:58 f39-dev1 kernel:  ? show_regs+0x64/0x70
May 02 17:21:58 f39-dev1 kernel:  ? __warn+0x88/0x130
May 02 17:21:58 f39-dev1 kernel:  ? ida_free+0xe3/0x140
May 02 17:21:58 f39-dev1 kernel:  ? report_bug+0x192/0x1c0
May 02 17:21:58 f39-dev1 kernel:  ? handle_bug+0x44/0x90
May 02 17:21:58 f39-dev1 kernel:  ? exc_invalid_op+0x18/0x70
May 02 17:21:58 f39-dev1 kernel:  ? asm_exc_invalid_op+0x1b/0x20
May 02 17:21:58 f39-dev1 kernel:  ? ida_free+0xe3/0x140
May 02 17:21:58 f39-dev1 kernel:  kill_litter_super+0x4c/0x60
May 02 17:21:58 f39-dev1 kernel:  famfs_kill_sb+0x57/0x60 [famfs]
May 02 17:21:58 f39-dev1 kernel:  deactivate_locked_super+0x35/0xb0
May 02 17:21:58 f39-dev1 kernel:  deactivate_super+0x40/0x50
May 02 17:21:58 f39-dev1 kernel:  cleanup_mnt+0xc3/0x160
May 02 17:21:58 f39-dev1 kernel:  __cleanup_mnt+0x12/0x20
May 02 17:21:58 f39-dev1 kernel:  task_work_run+0x60/0x90
May 02 17:21:58 f39-dev1 kernel:  syscall_exit_to_user_mode+0x21a/0x220
May 02 17:21:58 f39-dev1 kernel:  do_syscall_64+0x8d/0x180
May 02 17:21:58 f39-dev1 kernel:  ? do_faccessat+0x1b8/0x2e0
May 02 17:21:58 f39-dev1 kernel:  ? syscall_exit_to_user_mode+0x7c/0x220
May 02 17:21:58 f39-dev1 kernel:  ? do_syscall_64+0x8d/0x180
May 02 17:21:58 f39-dev1 kernel:  ? syscall_exit_to_user_mode+0x7c/0x220
May 02 17:21:58 f39-dev1 kernel:  ? do_syscall_64+0x8d/0x180
May 02 17:21:58 f39-dev1 kernel:  ? do_syscall_64+0x8d/0x180
May 02 17:21:58 f39-dev1 kernel:  ? do_user_addr_fault+0x315/0x6e0
May 02 17:21:58 f39-dev1 kernel:  ? irqentry_exit_to_user_mode+0x71/0x220
May 02 17:21:58 f39-dev1 kernel:  ? irqentry_exit+0x3b/0x50
May 02 17:21:58 f39-dev1 kernel:  ? exc_page_fault+0x90/0x190
May 02 17:21:58 f39-dev1 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 02 17:21:58 f39-dev1 kernel: RIP: 0033:0x7f0ff83df41b
May 02 17:21:58 f39-dev1 kernel: Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e1 19 0c 00 f7 d8
May 02 17:21:58 f39-dev1 kernel: RSP: 002b:00007fffe039cfd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
May 02 17:21:58 f39-dev1 kernel: RAX: 0000000000000000 RBX: 0000555ad6c2fb90 RCX: 00007f0ff83df41b
May 02 17:21:58 f39-dev1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000555ad6c34ba0
May 02 17:21:58 f39-dev1 kernel: RBP: 00007fffe039d0b0 R08: 0000000000000020 R09: 0000000000000001
May 02 17:21:58 f39-dev1 kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 0000555ad6c2fc90
May 02 17:21:58 f39-dev1 kernel: R13: 0000000000000000 R14: 0000555ad6c34ba0 R15: 0000555ad6c2ffa0
May 02 17:21:58 f39-dev1 kernel:  </TASK>
May 02 17:21:58 f39-dev1 kernel: ---[ end trace 0000000000000000 ]---


With kill_char_super(), it can mount and dismount for days with no
issues that I have seen.

Thanks,
John



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super()
  2024-05-02 22:25     ` John Groves
@ 2024-05-03  9:04       ` Christian Brauner
  2024-05-03 15:38         ` John Groves
  0 siblings, 1 reply; 32+ messages in thread
From: Christian Brauner @ 2024-05-03  9:04 UTC (permalink / raw)
  To: John Groves
  Cc: Al Viro, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Jan Kara, Matthew Wilcox, linux-cxl,
	linux-fsdevel, nvdimm, John Groves, john, Dave Chinner,
	Christoph Hellwig, dave.hansen, gregory.price, Randy Dunlap,
	Jerome Glisse, Aravind Ramesh, Ajay Joshi, Eishan Mirakhur,
	Ravi Shankar, Srinivasulu Thanneeru, Luis Chamberlain,
	Amir Goldstein, Chandan Babu R, Bagas Sanjaya, Darrick J . Wong,
	Kent Overstreet, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On Thu, May 02, 2024 at 05:25:33PM -0500, John Groves wrote:
> On 24/05/02 07:17PM, Al Viro wrote:
> > On Mon, Apr 29, 2024 at 12:04:23PM -0500, John Groves wrote:
> > > Famfs needs a slightly different kill_super variant than already existed.
> > > Putting it local to famfs would require exporting d_genocide(); this
> > > seemed a bit cleaner.
> > 
> > What's wrong with kill_litter_super()?
> 
> I struggled with that, I don't have my head fully around the superblock
> handling code.

Fyi, see my other mail where I point out what's wrong and one way to fix it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v2 08/12] famfs: module operations & fs_context
  2024-04-30 11:01   ` Christian Brauner
  2024-05-02 15:51     ` John Groves
@ 2024-05-03 14:15     ` John Groves
  1 sibling, 0 replies; 32+ messages in thread
From: John Groves @ 2024-05-03 14:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jonathan Corbet, Jonathan Cameron, Dan Williams, Vishal Verma,
	Dave Jiang, Alexander Viro, Jan Kara, Matthew Wilcox, linux-cxl,
	linux-fsdevel, nvdimm, John Groves, john, Dave Chinner,
	Christoph Hellwig, dave.hansen, gregory.price, Randy Dunlap,
	Jerome Glisse, Aravind Ramesh, Ajay Joshi, Eishan Mirakhur,
	Ravi Shankar, Srinivasulu Thanneeru, Luis Chamberlain,
	Amir Goldstein, Chandan Babu R, Bagas Sanjaya, Darrick J . Wong,
	Kent Overstreet, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On 24/04/30 01:01PM, Christian Brauner wrote:
> On Mon, Apr 29, 2024 at 12:04:24PM -0500, John Groves wrote:
> > Start building up from the famfs module operations. This commit
> > includes the following:
> > 
> > * Register as a file system
> > * Parse mount parameters
> > * Allocate or find (and initialize) a superblock via famfs_get_tree()
> > * Lookup the host dax device, and bail if it's in use (or not dax)
> > * Register as the holder of the dax device if it's available
> > * Add Kconfig and Makefile misc to build famfs
> > * Add FAMFS_SUPER_MAGIC to include/uapi/linux/magic.h
> > * Add export of fs/namei.c:may_open_dev(), which famfs needs to call
> > * Update MAINTAINERS file for the fs/famfs/ path
> > 
> > The following exports had to happen to enable famfs:
> > 
> > * This uses the new fs/super.c:kill_char_super() - the other kill*super
> >   helpers were not quite right.
> > * This uses the dev_dax_iomap export of dax_dev_get()
> > 
> > This commit builds but is otherwise too incomplete to run
> > 
> > Signed-off-by: John Groves <john@groves.net>
> > ---
> >  MAINTAINERS                |   1 +
> >  fs/Kconfig                 |   2 +
> >  fs/Makefile                |   1 +
> >  fs/famfs/Kconfig           |  10 ++
> >  fs/famfs/Makefile          |   5 +
> >  fs/famfs/famfs_inode.c     | 345 +++++++++++++++++++++++++++++++++++++
> >  fs/famfs/famfs_internal.h  |  36 ++++
> >  fs/namei.c                 |   1 +
> >  include/uapi/linux/magic.h |   1 +
> >  9 files changed, 402 insertions(+)
> >  create mode 100644 fs/famfs/Kconfig
> >  create mode 100644 fs/famfs/Makefile
> >  create mode 100644 fs/famfs/famfs_inode.c
> >  create mode 100644 fs/famfs/famfs_internal.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 3f2d847dcf01..365d678e2f40 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8188,6 +8188,7 @@ L:	linux-cxl@vger.kernel.org
> >  L:	linux-fsdevel@vger.kernel.org
> >  S:	Supported
> >  F:	Documentation/filesystems/famfs.rst
> > +F:	fs/famfs
> >  
> >  FANOTIFY
> >  M:	Jan Kara <jack@suse.cz>
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index a46b0cbc4d8f..53b4629e92a0 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -140,6 +140,8 @@ source "fs/autofs/Kconfig"
> >  source "fs/fuse/Kconfig"
> >  source "fs/overlayfs/Kconfig"
> >  
> > +source "fs/famfs/Kconfig"
> > +
> >  menu "Caches"
> >  
> >  source "fs/netfs/Kconfig"
> > diff --git a/fs/Makefile b/fs/Makefile
> > index 6ecc9b0a53f2..3393f399a9e9 100644
> > --- a/fs/Makefile
> > +++ b/fs/Makefile
> > @@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
> >  obj-$(CONFIG_EROFS_FS)		+= erofs/
> >  obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
> >  obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
> > +obj-$(CONFIG_FAMFS)             += famfs/
> > diff --git a/fs/famfs/Kconfig b/fs/famfs/Kconfig
> > new file mode 100644
> > index 000000000000..edb8980820f7
> > --- /dev/null
> > +++ b/fs/famfs/Kconfig
> > @@ -0,0 +1,10 @@
> > +
> > +
> > +config FAMFS
> > +       tristate "famfs: shared memory file system"
> > +       depends on DEV_DAX && FS_DAX && DEV_DAX_IOMAP
> > +       help
> > +	  Support for the famfs file system. Famfs is a dax file system that
> > +	  can support scale-out shared access to fabric-attached memory
> > +	  (e.g. CXL shared memory). Famfs is not a general purpose file system;
> > +	  it is an enabler for data sets in shared memory.
> > diff --git a/fs/famfs/Makefile b/fs/famfs/Makefile
> > new file mode 100644
> > index 000000000000..62230bcd6793
> > --- /dev/null
> > +++ b/fs/famfs/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +obj-$(CONFIG_FAMFS) += famfs.o
> > +
> > +famfs-y := famfs_inode.o
> > diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c
> > new file mode 100644
> > index 000000000000..61306240fc0b
> > --- /dev/null
> > +++ b/fs/famfs/famfs_inode.c
> > @@ -0,0 +1,345 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * famfs - dax file system for shared fabric-attached memory
> > + *
> > + * Copyright 2023-2024 Micron Technology, inc
> > + *
> > + * This file system, originally based on ramfs and the dax support from xfs,
> > + * is intended to allow multiple host systems to mount a common file system
> > + * view of dax files that map to shared memory.
> > + */
> > +
> > +#include <linux/fs.h>
> > +#include <linux/time.h>
> > +#include <linux/init.h>
> > +#include <linux/string.h>
> > +#include <linux/parser.h>
> > +#include <linux/magic.h>
> > +#include <linux/slab.h>
> > +#include <linux/fs_context.h>
> > +#include <linux/fs_parser.h>
> > +#include <linux/dax.h>
> > +#include <linux/hugetlb.h>
> > +#include <linux/iomap.h>
> > +#include <linux/path.h>
> > +#include <linux/namei.h>
> > +
> > +#include "famfs_internal.h"
> > +
> > +#define FAMFS_DEFAULT_MODE	0755
> > +
> > +static struct inode *famfs_get_inode(struct super_block *sb,
> > +				     const struct inode *dir,
> > +				     umode_t mode, dev_t dev)
> > +{
> > +	struct inode *inode = new_inode(sb);
> > +	struct timespec64 tv;
> > +
> > +	if (!inode)
> > +		return NULL;
> > +
> > +	inode->i_ino = get_next_ino();
> > +	inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
> > +	inode->i_mapping->a_ops = &ram_aops;
> > +	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> > +	mapping_set_unevictable(inode->i_mapping);
> > +	tv = inode_set_ctime_current(inode);
> > +	inode_set_mtime_to_ts(inode, tv);
> > +	inode_set_atime_to_ts(inode, tv);
> > +
> > +	switch (mode & S_IFMT) {
> > +	default:
> > +		init_special_inode(inode, mode, dev);
> > +		break;
> > +	case S_IFREG:
> > +		inode->i_op = NULL /* famfs_file_inode_operations */;
> > +		inode->i_fop = NULL /* &famfs_file_operations */;
> > +		break;
> > +	case S_IFDIR:
> > +		inode->i_op = NULL /* famfs_dir_inode_operations */;
> > +		inode->i_fop = &simple_dir_operations;
> > +
> > +		/* Directory inodes start off with i_nlink == 2 (for ".") */
> > +		inc_nlink(inode);
> > +		break;
> > +	case S_IFLNK:
> > +		inode->i_op = &page_symlink_inode_operations;
> > +		inode_nohighmem(inode);
> > +		break;
> > +	}
> > +	return inode;
> > +}
> > +
> > +/*
> > + * famfs dax_operations  (for char dax)
> > + */
> > +static int
> > +famfs_dax_notify_failure(struct dax_device *dax_dev, u64 offset,
> > +			u64 len, int mf_flags)
> > +{
> > +	struct super_block *sb = dax_holder(dax_dev);
> > +	struct famfs_fs_info *fsi = sb->s_fs_info;
> > +
> > +	pr_err("%s: rootdev=%s offset=%lld len=%llu flags=%x\n", __func__,
> > +	       fsi->rootdev, offset, len, mf_flags);
> > +
> > +	return 0;
> > +}
> > +
> > +static const struct dax_holder_operations famfs_dax_holder_ops = {
> > +	.notify_failure		= famfs_dax_notify_failure,
> > +};
> > +
> > +/*****************************************************************************
> > + * fs_context_operations
> > + */
> > +
> > +static int
> > +famfs_fill_super(struct super_block *sb, struct fs_context *fc)
> > +{
> > +	int rc = 0;
> > +
> > +	sb->s_maxbytes		= MAX_LFS_FILESIZE;
> > +	sb->s_blocksize		= PAGE_SIZE;
> > +	sb->s_blocksize_bits	= PAGE_SHIFT;
> > +	sb->s_magic		= FAMFS_SUPER_MAGIC;
> > +	sb->s_op		= NULL /* famfs_super_ops */;
> > +	sb->s_time_gran		= 1;
> > +
> > +	return rc;
> > +}
> > +
> > +static int
> > +lookup_daxdev(const char *pathname, dev_t *devno)
> > +{
> > +	struct inode *inode;
> > +	struct path path;
> > +	int err;
> > +
> > +	if (!pathname || !*pathname)
> > +		return -EINVAL;
> > +
> > +	err = kern_path(pathname, LOOKUP_FOLLOW, &path);
> > +	if (err)
> > +		return err;
> > +
> > +	inode = d_backing_inode(path.dentry);
> > +	if (!S_ISCHR(inode->i_mode)) {
> > +		err = -EINVAL;
> > +		goto out_path_put;
> > +	}
> > +
> > +	if (!may_open_dev(&path)) { /* had to export this */
> > +		err = -EACCES;
> > +		goto out_path_put;
> > +	}
> > +
> > +	 /* if it's dax, i_rdev is struct dax_device */
> > +	*devno = inode->i_rdev;
> > +
> > +out_path_put:
> > +	path_put(&path);
> > +	return err;
> > +}
> > +
> > +static int
> > +famfs_get_tree(struct fs_context *fc)
> > +{
> > +	struct famfs_fs_info *fsi = fc->s_fs_info;
> > +	struct dax_device *dax_devp;
> > +	struct super_block *sb;
> > +	struct inode *inode;
> > +	dev_t daxdevno;
> > +	int err;
> > +
> > +	/* TODO: clean up chatty messages */
> > +
> > +	err = lookup_daxdev(fc->source, &daxdevno);
> > +	if (err)
> > +		return err;
> > +
> > +	fsi->daxdevno = daxdevno;
> > +
> > +	/* This will set sb->s_dev=daxdevno */
> > +	sb = sget_dev(fc, daxdevno);
> 
> This will open the dax device as a block device. However, nothing in
> your ->kill_sb method or kill_char_super() closes it again. So you're
> leaking block device references and leaving uninitialized memory around as
> you've claimed that device but never ended your claim.
> 
> > +	if (IS_ERR(sb)) {
> > +		pr_err("%s: sget_dev error\n", __func__);
> > +		return PTR_ERR(sb);
> > +	}
> > +
> > +	if (sb->s_root) {
> > +		pr_info("%s: found a matching superblock for %s\n",
> > +			__func__, fc->source);
> > +
> > +		/* We don't expect to find a match by dev_t; if we do, it must
> > +		 * already be mounted, so we bail
> > +		 */
> > +		err = -EBUSY;
> > +		goto deactivate_out;
> > +	} else {
> > +		pr_info("%s: initializing new superblock for %s\n",
> > +			__func__, fc->source);
> > +		err = famfs_fill_super(sb, fc);
> > +		if (err)
> > +			goto deactivate_out;
> > +	}
> > +
> > +	/* This will fail if it's not a dax device */
> > +	dax_devp = dax_dev_get(daxdevno);
> > +	if (!dax_devp) {
> > +		pr_warn("%s: device %s not found or not dax\n",
> > +		       __func__, fc->source);
> > +		err = -ENODEV;
> > +		goto deactivate_out;
> > +	}
> > +
> > +	err = fs_dax_get(dax_devp, sb, &famfs_dax_holder_ops);
> > +	if (err) {
> > +		pr_err("%s: fs_dax_get(%lld) failed\n", __func__, (u64)daxdevno);
> > +		err = -EBUSY;
> > +		goto deactivate_out;
> > +	}
> > +	fsi->dax_devp = dax_devp;
> > +
> > +	inode = famfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
> > +	sb->s_root = d_make_root(inode);
> > +	if (!sb->s_root) {
> > +		pr_err("%s: d_make_root() failed\n", __func__);
> > +		err = -ENOMEM;
> > +		fs_put_dax(fsi->dax_devp, sb);
> > +		goto deactivate_out;
> > +	}
> > +
> > +	sb->s_flags |= SB_ACTIVE;
> > +
> > +	WARN_ON(fc->root);
> > +	fc->root = dget(sb->s_root);
> > +	return err;
> > +
> > +deactivate_out:
> > +	pr_debug("%s: deactivating sb=%llx\n", __func__, (u64)sb);
> > +	deactivate_locked_super(sb);
> > +	return err;
> > +}
> > +
> > +/*****************************************************************************/
> > +
> > +enum famfs_param {
> > +	Opt_mode,
> > +	Opt_dax,
> > +};
> > +
> > +const struct fs_parameter_spec famfs_fs_parameters[] = {
> > +	fsparam_u32oct("mode",	  Opt_mode),
> > +	fsparam_string("dax",     Opt_dax),
> > +	{}
> > +};
> > +
> > +static int famfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > +{
> > +	struct famfs_fs_info *fsi = fc->s_fs_info;
> > +	struct fs_parse_result result;
> > +	int opt;
> > +
> > +	opt = fs_parse(fc, famfs_fs_parameters, param, &result);
> > +	if (opt == -ENOPARAM) {
> > +		opt = vfs_parse_fs_param_source(fc, param);
> > +		if (opt != -ENOPARAM)
> > +			return opt;
> 
> This shouldn't be needed. The VFS will handle all that for you.
> 
> > +
> > +		return 0;
> > +	}
> > +	if (opt < 0)
> > +		return opt;
> > +
> > +	switch (opt) {
> > +	case Opt_mode:
> > +		fsi->mount_opts.mode = result.uint_32 & S_IALLUGO;
> > +		break;
> > +	case Opt_dax:
> > +		if (strcmp(param->string, "always"))
> > +			pr_notice("%s: invalid dax mode %s\n",
> > +				  __func__, param->string);
> > +		break;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void famfs_free_fc(struct fs_context *fc)
> > +{
> > +	struct famfs_fs_info *fsi = fc->s_fs_info;
> > +
> > +	if (fsi && fsi->rootdev)
> > +		kfree(fsi->rootdev);
> 
> Dead code since rootdev is unused and unset?
> 
> > +
> > +	kfree(fsi);
> > +}
> > +
> > +static const struct fs_context_operations famfs_context_ops = {
> > +	.free		= famfs_free_fc,
> > +	.parse_param	= famfs_parse_param,
> > +	.get_tree	= famfs_get_tree,
> > +};
> > +
> > +static int famfs_init_fs_context(struct fs_context *fc)
> > +{
> > +	struct famfs_fs_info *fsi;
> > +
> > +	fsi = kzalloc(sizeof(*fsi), GFP_KERNEL);
> > +	if (!fsi)
> > +		return -ENOMEM;
> > +
> > +	fsi->mount_opts.mode = FAMFS_DEFAULT_MODE;
> > +	fc->s_fs_info        = fsi;
> > +	fc->ops              = &famfs_context_ops;
> > +	return 0;
> > +}
> > +
> > +static void famfs_kill_sb(struct super_block *sb)
> > +{
> > +	struct famfs_fs_info *fsi = sb->s_fs_info;
> > +
> > +	if (fsi->dax_devp)
> > +		fs_put_dax(fsi->dax_devp, sb);
> > +	if (fsi && fsi->rootdev)
> > +		kfree(fsi->rootdev);
> > +	kfree(fsi);
> > +	sb->s_fs_info = NULL;
> > +
> > +	kill_char_super(sb); /* new */
> > +}
> 
> Can likely just be
> 
> static void famfs_kill_sb(struct super_block *sb)
> {
> 	struct famfs_fs_info *fsi = sb->s_fs_info;
> 
> 	generic_shutdown_super(sb);
> 
> 	if (sb->s_bdev_file)
> 		bdev_fput(sb->s_bdev_file);
> 
> 	if (fsi->dax_devp)
> 		fs_put_dax(fsi->dax_devp, sb);
> 
> 	kfree(fsi);
> }
> 
> and then you don't need any custom helpers at all.

I replaced famfs_kill_sb() with this function.  On [the first] umount, I
get one of these dentry bugs for each file in the file system:


    May 03 07:27:03 f39-dev1 kernel: ------------[ cut here ]------------
    May 03 07:27:03 f39-dev1 kernel: BUG: Dentry 0000000033362594{i=217d,n=smoke_loop4.log}  still in use (1) [unmount of famfs famfs]
    May 03 07:27:03 f39-dev1 kernel: WARNING: CPU: 0 PID: 1138 at fs/dcache.c:1524 umount_check+0x56/0x70
    May 03 07:27:03 f39-dev1 kernel: Modules linked in: famfs rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs qrtr rfkill intel_rapl_msr sunrpc snd_hda_codec_generic intel_rapl_common snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hda_codec kmem snd_hda_core iTCO_wdt intel_pmc_bxt snd_hwdep device_dax iTCO_vendor_support kvm snd_seq snd_seq_device rapl dax_hmem cxl_acpi snd_pcm cxl_core i2c_i801 snd_timer einj pcspkr i2c_smbus snd lpc_ich soundcore virtio_balloon joydev vfat fat fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_blk virtio_console virtio_gpu virtio_net net_failover virtio_dma_buf failover serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath qemu_fw_cfg
    May 03 07:27:03 f39-dev1 kernel: CPU: 0 PID: 1138 Comm: umount Tainted: G        W          6.9.0-rc5+ #266
    May 03 07:27:03 f39-dev1 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-26.fc38 03/01/2023
    May 03 07:27:03 f39-dev1 kernel: RIP: 0010:umount_check+0x56/0x70
    May 03 07:27:03 f39-dev1 kernel: Code: 03 00 00 48 8b 40 28 48 89 e5 4c 8b 08 48 8b 46 30 48 85 c0 74 04 48 8b 50 40 51 48 c7 c7 b0 a6 ae 82 48 89 f1 e8 ba 56 c4 ff <0f> 0b 58 31 c0 c9 c3 cc cc cc cc 41 83 f8 01 75 ba eb a8 0f 1f 80
    May 03 07:27:03 f39-dev1 kernel: RSP: 0018:ffffc90000717bd0 EFLAGS: 00010282
    May 03 07:27:03 f39-dev1 kernel: RAX: 0000000000000000 RBX: 0000000000000f34 RCX: 0000000000000000
    May 03 07:27:03 f39-dev1 kernel: RDX: 0000000000000004 RSI: ffffffff82b1c111 RDI: 00000000ffffffff
    May 03 07:27:03 f39-dev1 kernel: RBP: ffffc90000717bd8 R08: 0000000000000000 R09: 0000000000000003
    May 03 07:27:03 f39-dev1 kernel: R10: ffffc90000717a20 R11: ffffffff82f3c3a8 R12: ffff8881007be840
    May 03 07:27:03 f39-dev1 kernel: R13: ffffffff814d8ae0 R14: ffff8881007be8a0 R15: ffff88810d82ab40
    May 03 07:27:03 f39-dev1 kernel: FS:  00007f3163f71800(0000) GS:ffff88886fc00000(0000) knlGS:0000000000000000
    May 03 07:27:03 f39-dev1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May 03 07:27:03 f39-dev1 kernel: CR2: 00007fff84673f78 CR3: 000000010c338002 CR4: 0000000000170ef0
    May 03 07:27:03 f39-dev1 kernel: Call Trace:
    May 03 07:27:03 f39-dev1 kernel:  <TASK>
    May 03 07:27:03 f39-dev1 kernel:  ? show_regs+0x64/0x70
    May 03 07:27:03 f39-dev1 kernel:  ? __warn+0x88/0x130
    May 03 07:27:03 f39-dev1 kernel:  ? umount_check+0x56/0x70
    May 03 07:27:03 f39-dev1 kernel:  ? report_bug+0x192/0x1c0
    May 03 07:27:03 f39-dev1 kernel:  ? handle_bug+0x44/0x90
    May 03 07:27:03 f39-dev1 kernel:  ? exc_invalid_op+0x18/0x70
    May 03 07:27:03 f39-dev1 kernel:  ? asm_exc_invalid_op+0x1b/0x20
    May 03 07:27:03 f39-dev1 kernel:  ? __pfx_umount_check+0x10/0x10
    May 03 07:27:03 f39-dev1 kernel:  ? umount_check+0x56/0x70
    May 03 07:27:03 f39-dev1 kernel:  d_walk+0xc3/0x280
    May 03 07:27:03 f39-dev1 kernel:  shrink_dcache_for_umount+0x4e/0x130
    May 03 07:27:03 f39-dev1 kernel:  generic_shutdown_super+0x1f/0x120
    May 03 07:27:03 f39-dev1 kernel:  famfs_kill_sb+0x1b/0x70 [famfs]
    May 03 07:27:03 f39-dev1 kernel:  deactivate_locked_super+0x35/0xb0
    May 03 07:27:03 f39-dev1 kernel:  deactivate_super+0x40/0x50
    May 03 07:27:03 f39-dev1 kernel:  cleanup_mnt+0xc3/0x160
    May 03 07:27:03 f39-dev1 kernel:  __cleanup_mnt+0x12/0x20
    May 03 07:27:03 f39-dev1 kernel:  task_work_run+0x60/0x90
    May 03 07:27:03 f39-dev1 kernel:  syscall_exit_to_user_mode+0x21a/0x220
    May 03 07:27:03 f39-dev1 kernel:  do_syscall_64+0x8d/0x180
    May 03 07:27:03 f39-dev1 kernel:  ? mntput+0x24/0x40
    May 03 07:27:03 f39-dev1 kernel:  ? path_put+0x1e/0x30
    May 03 07:27:03 f39-dev1 kernel:  ? do_faccessat+0x1b8/0x2e0
    May 03 07:27:03 f39-dev1 kernel:  ? syscall_exit_to_user_mode+0x7c/0x220
    May 03 07:27:03 f39-dev1 kernel:  ? do_syscall_64+0x8d/0x180
    May 03 07:27:03 f39-dev1 kernel:  ? putname+0x55/0x70
    May 03 07:27:03 f39-dev1 kernel:  ? syscall_exit_to_user_mode+0x7c/0x220
    May 03 07:27:03 f39-dev1 kernel:  ? do_syscall_64+0x8d/0x180
    May 03 07:27:03 f39-dev1 kernel:  ? do_user_addr_fault+0x315/0x6e0
    May 03 07:27:03 f39-dev1 kernel:  ? irqentry_exit_to_user_mode+0x71/0x220
    May 03 07:27:03 f39-dev1 kernel:  ? irqentry_exit+0x3b/0x50
    May 03 07:27:03 f39-dev1 kernel:  ? exc_page_fault+0x90/0x190
    May 03 07:27:03 f39-dev1 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    May 03 07:27:03 f39-dev1 kernel: RIP: 0033:0x7f316419041b
    May 03 07:27:03 f39-dev1 kernel: Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e1 19 0c 00 f7 d8
    May 03 07:27:03 f39-dev1 kernel: RSP: 002b:00007fff84675728 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
    May 03 07:27:03 f39-dev1 kernel: RAX: 0000000000000000 RBX: 00005648cab2eb90 RCX: 00007f316419041b
    May 03 07:27:03 f39-dev1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005648cab33ba0
    May 03 07:27:03 f39-dev1 kernel: RBP: 00007fff84675800 R08: 0000000000000020 R09: 0000000000000001
    May 03 07:27:03 f39-dev1 kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 00005648cab2ec90
    May 03 07:27:03 f39-dev1 kernel: R13: 0000000000000000 R14: 00005648cab33ba0 R15: 00005648cab2efa0
    May 03 07:27:03 f39-dev1 kernel:  </TASK>
    May 03 07:27:03 f39-dev1 kernel: ---[ end trace 0000000000000000 ]---

After one of the above for every file:

    May 03 07:27:03 f39-dev1 kernel: VFS: Busy inodes after unmount of famfs (famfs)

    May 03 07:27:03 f39-dev1 kernel: ------------[ cut here ]------------
    May 03 07:27:03 f39-dev1 kernel: kernel BUG at fs/super.c:649!
    May 03 07:27:03 f39-dev1 kernel: invalid opcode: 0000 [#1] PREEMPT SMP PTI
    May 03 07:27:03 f39-dev1 kernel: CPU: 3 PID: 1138 Comm: umount Tainted: G        W          6.9.0-rc5+ #266
    May 03 07:27:03 f39-dev1 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-26.fc38 03/01/2023
    May 03 07:27:03 f39-dev1 kernel: RIP: 0010:generic_shutdown_super+0x112/0x120
    May 03 07:27:03 f39-dev1 kernel: Code: cc cc e8 e1 4f f0 ff 48 8b bb 00 01 00 00 eb d9 48 8b 43 28 48 8d b3 c0 03 00 00 48 c7 c7 c0 98 ae 82 48 8b 10 e8 5e 2d d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
    May 03 07:27:03 f39-dev1 kernel: RSP: 0018:ffffc90000717c70 EFLAGS: 00010246
    May 03 07:27:03 f39-dev1 kernel: RAX: 000000000000002f RBX: ffff8881215d5000 RCX: 0000000000000000
    May 03 07:27:03 f39-dev1 kernel: RDX: 0000000000000000 RSI: ffffffff82b1c111 RDI: 00000000ffffffff
    May 03 07:27:03 f39-dev1 kernel: RBP: ffffc90000717c80 R08: 0000000000000000 R09: 0000000000000003
    May 03 07:27:03 f39-dev1 kernel: R10: ffffc90000717ad8 R11: ffffffff82f3c3a8 R12: ffffffffa0cf4380
    May 03 07:27:03 f39-dev1 kernel: R13: ffff888124ed359c R14: 0000000000000000 R15: 0000000000000000
    May 03 07:27:03 f39-dev1 kernel: FS:  00007f3163f71800(0000) GS:ffff88886fd80000(0000) knlGS:0000000000000000
    May 03 07:27:03 f39-dev1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May 03 07:27:03 f39-dev1 kernel: CR2: 000055fd5fc6a370 CR3: 000000010c338003 CR4: 0000000000170ef0
    May 03 07:27:03 f39-dev1 kernel: Call Trace:
    May 03 07:27:03 f39-dev1 kernel:  <TASK>
    May 03 07:27:03 f39-dev1 kernel:  ? show_regs+0x64/0x70
    May 03 07:27:03 f39-dev1 kernel:  ? die+0x37/0x90
    May 03 07:27:03 f39-dev1 kernel:  ? do_trap+0xca/0xe0
    May 03 07:27:03 f39-dev1 kernel:  ? do_error_trap+0x73/0xa0
    May 03 07:27:03 f39-dev1 kernel:  ? generic_shutdown_super+0x112/0x120
    May 03 07:27:03 f39-dev1 kernel:  ? exc_invalid_op+0x52/0x70
    May 03 07:27:03 f39-dev1 kernel:  ? generic_shutdown_super+0x112/0x120
    May 03 07:27:03 f39-dev1 kernel:  ? asm_exc_invalid_op+0x1b/0x20
    May 03 07:27:03 f39-dev1 kernel:  ? generic_shutdown_super+0x112/0x120
    May 03 07:27:03 f39-dev1 kernel:  famfs_kill_sb+0x1b/0x70 [famfs]
    May 03 07:27:03 f39-dev1 kernel:  deactivate_locked_super+0x35/0xb0
    May 03 07:27:03 f39-dev1 kernel:  deactivate_super+0x40/0x50
    May 03 07:27:03 f39-dev1 kernel:  cleanup_mnt+0xc3/0x160
    May 03 07:27:03 f39-dev1 kernel:  __cleanup_mnt+0x12/0x20
    May 03 07:27:03 f39-dev1 kernel:  task_work_run+0x60/0x90
    May 03 07:27:03 f39-dev1 kernel:  syscall_exit_to_user_mode+0x21a/0x220
    May 03 07:27:03 f39-dev1 kernel:  do_syscall_64+0x8d/0x180
    May 03 07:27:03 f39-dev1 kernel:  ? mntput+0x24/0x40
    May 03 07:27:03 f39-dev1 kernel:  ? path_put+0x1e/0x30
    May 03 07:27:03 f39-dev1 kernel:  ? do_faccessat+0x1b8/0x2e0
    May 03 07:27:03 f39-dev1 kernel:  ? syscall_exit_to_user_mode+0x7c/0x220
    May 03 07:27:03 f39-dev1 kernel:  ? do_syscall_64+0x8d/0x180
    May 03 07:27:03 f39-dev1 kernel:  ? putname+0x55/0x70
    May 03 07:27:03 f39-dev1 kernel:  ? syscall_exit_to_user_mode+0x7c/0x220
    May 03 07:27:03 f39-dev1 kernel:  ? do_syscall_64+0x8d/0x180
    May 03 07:27:03 f39-dev1 kernel:  ? do_user_addr_fault+0x315/0x6e0
    May 03 07:27:03 f39-dev1 kernel:  ? irqentry_exit_to_user_mode+0x71/0x220
    May 03 07:27:03 f39-dev1 kernel:  ? irqentry_exit+0x3b/0x50
    May 03 07:27:03 f39-dev1 kernel:  ? exc_page_fault+0x90/0x190
    May 03 07:27:03 f39-dev1 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    May 03 07:27:03 f39-dev1 kernel: RIP: 0033:0x7f316419041b
    May 03 07:27:03 f39-dev1 kernel: Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e1 19 0c 00 f7 d8
    May 03 07:27:03 f39-dev1 kernel: RSP: 002b:00007fff84675728 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
    May 03 07:27:03 f39-dev1 kernel: RAX: 0000000000000000 RBX: 00005648cab2eb90 RCX: 00007f316419041b
    May 03 07:27:03 f39-dev1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005648cab33ba0
    May 03 07:27:03 f39-dev1 kernel: RBP: 00007fff84675800 R08: 0000000000000020 R09: 0000000000000001
    May 03 07:27:03 f39-dev1 kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 00005648cab2ec90
    May 03 07:27:03 f39-dev1 kernel: R13: 0000000000000000 R14: 00005648cab33ba0 R15: 00005648cab2efa0
    May 03 07:27:03 f39-dev1 kernel:  </TASK>
    May 03 07:27:03 f39-dev1 kernel: Modules linked in: famfs rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs qrtr rfkill intel_rapl_msr sunrpc snd_hda_codec_generic intel_rapl_common snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hda_codec kmem snd_hda_core iTCO_wdt intel_pmc_bxt snd_hwdep device_dax iTCO_vendor_support kvm snd_seq snd_seq_device rapl dax_hmem cxl_acpi snd_pcm cxl_core i2c_i801 snd_timer einj pcspkr i2c_smbus snd lpc_ich soundcore virtio_balloon joydev vfat fat fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_blk virtio_console virtio_gpu virtio_net net_failover virtio_dma_buf failover serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath qemu_fw_cfg
    May 03 07:27:03 f39-dev1 kernel: ---[ end trace 0000000000000000 ]---

    May 03 07:27:03 f39-dev1 kernel: RIP: 0010:generic_shutdown_super+0x112/0x120
    May 03 07:27:03 f39-dev1 kernel: Code: cc cc e8 e1 4f f0 ff 48 8b bb 00 01 00 00 eb d9 48 8b 43 28 48 8d b3 c0 03 00 00 48 c7 c7 c0 98 ae 82 48 8b 10 e8 5e 2d d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
    May 03 07:27:03 f39-dev1 kernel: RSP: 0018:ffffc90000717c70 EFLAGS: 00010246
    May 03 07:27:03 f39-dev1 kernel: RAX: 000000000000002f RBX: ffff8881215d5000 RCX: 0000000000000000
    May 03 07:27:03 f39-dev1 kernel: RDX: 0000000000000000 RSI: ffffffff82b1c111 RDI: 00000000ffffffff
    May 03 07:27:03 f39-dev1 kernel: RBP: ffffc90000717c80 R08: 0000000000000000 R09: 0000000000000003
    May 03 07:27:03 f39-dev1 kernel: R10: ffffc90000717ad8 R11: ffffffff82f3c3a8 R12: ffffffffa0cf4380
    May 03 07:27:03 f39-dev1 kernel: R13: ffff888124ed359c R14: 0000000000000000 R15: 0000000000000000
    May 03 07:27:03 f39-dev1 kernel: FS:  00007f3163f71800(0000) GS:ffff88886fd80000(0000) knlGS:0000000000000000
    May 03 07:27:03 f39-dev1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May 03 07:27:03 f39-dev1 kernel: CR2: 000055fd5fc6a370 CR3: 000000010c338003 CR4: 0000000000170ef0

These BUG dumps are familiar; famfs_kill_sb()/kill_char_super() in this
patch set are clean in this regard. (I'm not saying they're "right", just
clean.) But to be clear, blowing away the dentries is appropriate in the
famfs case.

An important thing, I think, is that instantiation of a famfs file
(which happens when user space plays the log) looks a lot like creating
ramfs files - except that after an empty ramfs-like file is created,
an ioctl is called to "tell the file where its backing memory is".
And famfs does not persist metadata changes, which is a feature and
not a bug...
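
To make the two-step instantiation concrete, here is a toy user-space
model of it. It is only a sketch under stated assumptions: the struct
layouts and the toy_* names are hypothetical, not the famfs log format
or the famfs ioctl, and nothing here touches a real dax device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical log entry; NOT the real famfs log format. */
struct toy_log_entry {
	const char *path;   /* file name relative to the mount point */
	size_t offset;      /* where the file's memory starts on the device */
	size_t len;         /* size of the file in bytes */
};

/* Toy in-memory "inode": exists only on this host, never persisted. */
struct toy_file {
	char path[64];
	int mapped;         /* 0 until the backing memory is known */
	size_t offset;
	size_t len;
};

/* Step 1: create an empty, ramfs-like file with no backing memory. */
static void toy_create(struct toy_file *f, const char *path)
{
	assert(f != NULL && path != NULL);
	memset(f, 0, sizeof(*f));
	strncpy(f->path, path, sizeof(f->path) - 1);
}

/*
 * Step 2: stand-in for the ioctl that "tells the file where its backing
 * memory is". Returns 0 on success, -1 if the file is already mapped or
 * the extent is empty.
 */
static int toy_set_map(struct toy_file *f, size_t offset, size_t len)
{
	if (f->mapped || len == 0)
		return -1;
	f->offset = offset;
	f->len = len;
	f->mapped = 1;
	return 0;
}

/*
 * Play the log: one create plus one map call per entry.
 * Returns the number of files successfully instantiated.
 */
static size_t toy_play_log(const struct toy_log_entry *log, size_t n,
			   struct toy_file *files)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		toy_create(&files[done], log[i].path);
		if (toy_set_map(&files[done], log[i].offset, log[i].len) == 0)
			done++;
	}
	return done;
}
```

Calling toy_play_log() on an array of entries fills the files[] table,
and a second toy_set_map() on an already-mapped file is rejected. In the
model, as in the description above, all per-file state lives in memory
and is rebuilt by replaying the log rather than being written back.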

I think the d_genocide() call is what cleans up the dentry cache with
famfs_kill_sb() from the patch (which calls the new kill_char_super()).

Thanks for any suggestions,
John



* Re: [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super()
  2024-05-03  9:04       ` Christian Brauner
@ 2024-05-03 15:38         ` John Groves
  0 siblings, 0 replies; 32+ messages in thread
From: John Groves @ 2024-05-03 15:38 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Al Viro, Jonathan Corbet, Jonathan Cameron, Dan Williams,
	Vishal Verma, Dave Jiang, Jan Kara, Matthew Wilcox, linux-cxl,
	linux-fsdevel, nvdimm, John Groves, john, Dave Chinner,
	Christoph Hellwig, dave.hansen, gregory.price, Randy Dunlap,
	Jerome Glisse, Aravind Ramesh, Ajay Joshi, Eishan Mirakhur,
	Ravi Shankar, Srinivasulu Thanneeru, Luis Chamberlain,
	Amir Goldstein, Chandan Babu R, Bagas Sanjaya, Darrick J . Wong,
	Kent Overstreet, Steve French, Nathan Lynch, Michael Ellerman,
	Thomas Zimmermann, Julien Panis, Stanislav Fomichev,
	Dongsheng Yang

On 24/05/03 11:04AM, Christian Brauner wrote:
> On Thu, May 02, 2024 at 05:25:33PM -0500, John Groves wrote:
> > On 24/05/02 07:17PM, Al Viro wrote:
> > > On Mon, Apr 29, 2024 at 12:04:23PM -0500, John Groves wrote:
> > > > Famfs needs a slightly different kill_super variant than already existed.
> > > > Putting it local to famfs would require exporting d_genocide(); this
> > > > seemed a bit cleaner.
> > > 
> > > What's wrong with kill_litter_super()?
> > 
> > I struggled with that, I don't have my head fully around the superblock
> > handling code.
> 
> Fyi, see my other mail where I point out what's wrong and one way to fix it.

No luck with that, but please let me know if I did it wrong.

https://lore.kernel.org/linux-fsdevel/cover.1714409084.git.john@groves.net/T/#m98890d9b46d9c83d2d144c07e6de7ae7f64a595d

Thank you,
John



end of thread, other threads:[~2024-05-03 15:38 UTC | newest]

Thread overview: 32+ messages
2024-04-29 17:04 [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system John Groves
2024-04-29 17:04 ` [RFC PATCH v2 01/12] famfs: Introduce famfs documentation John Groves
2024-04-30  6:46   ` Bagas Sanjaya
2024-04-29 17:04 ` [RFC PATCH v2 02/12] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c John Groves
2024-04-29 17:04 ` [RFC PATCH v2 03/12] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2024-04-29 17:04 ` [RFC PATCH v2 04/12] dev_dax_iomap: Save the kva from memremap John Groves
2024-04-29 17:04 ` [RFC PATCH v2 05/12] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax John Groves
2024-04-29 17:04 ` [RFC PATCH v2 06/12] dev_dax_iomap: export dax_dev_get() John Groves
2024-04-29 17:04 ` [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super() John Groves
2024-05-02 18:17   ` Al Viro
2024-05-02 22:25     ` John Groves
2024-05-03  9:04       ` Christian Brauner
2024-05-03 15:38         ` John Groves
2024-04-29 17:04 ` [RFC PATCH v2 08/12] famfs: module operations & fs_context John Groves
2024-04-30 11:01   ` Christian Brauner
2024-05-02 15:51     ` John Groves
2024-05-03 14:15     ` John Groves
2024-05-02 18:23   ` Al Viro
2024-05-02 21:50     ` John Groves
2024-04-29 17:04 ` [RFC PATCH v2 09/12] famfs: Introduce inode_operations and super_operations John Groves
2024-04-29 17:04 ` [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write John Groves
2024-05-02 18:29   ` Al Viro
2024-05-02 21:51     ` John Groves
2024-04-29 17:04 ` [RFC PATCH v2 11/12] famfs: Introduce mmap and VM fault handling John Groves
2024-04-29 17:04 ` [RFC PATCH v2 12/12] famfs: famfs_ioctl and core file-to-memory mapping logic & iomap_ops John Groves
2024-04-29 18:32 ` [RFC PATCH v2 00/12] Introduce the famfs shared-memory file system Matthew Wilcox
2024-04-29 23:08   ` Kent Overstreet
2024-04-30  2:24     ` John Groves
2024-04-30  3:11       ` Kent Overstreet
2024-05-01  2:09         ` John Groves
2024-04-30  2:11   ` John Groves
2024-04-30 21:01     ` Matthew Wilcox
