* [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up
@ 2024-05-02 12:23 Jiri Olsa
  2024-05-02 12:23 ` [PATCHv4 bpf-next 1/7] uprobe: Wire up uretprobe system call Jiri Olsa
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

hi,
as part of the effort on speeding up uprobes [0], this patchset adds
a return uprobe optimization that uses a syscall instead of the trap
on the uretprobe trampoline.

The speed-up depends on the type of instruction the uprobe is installed
on and on the specific HW type; please check patch 2 for details.

Patches 1-6 are based on bpf-next/master, but patches 1 and 2 also
apply on the linux-trace.git tree probes/for-next branch.
Patch 7 is based on man-pages master.

v4 changes:
  - added acks [Oleg,Andrii,Masami]
  - reworded the man page and added more info to the NOTE section [Masami]
  - rewrote bpf tests not to use trace_pipe [Andrii]
  - cc-ed linux-man list

Also available at:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  uretprobe_syscall

thanks,
jirka


Notes on the checklist items in Documentation/process/adding-syscalls.rst:

- System Call Alternatives
  A new syscall seems like the best way here, because we just need
  to enter the kernel quickly with no extra argument processing,
  which we'd have to do if we decided to reuse an existing syscall.

- Designing the API: Planning for Extension
  The uretprobe syscall is very specific and most likely won't be
  extended in the future.

  At the moment it does not take any arguments, and even if it does
  in the future, it's allowed to be called only from the trampoline
  prepared by the kernel, so no user will be broken.

- Designing the API: Other Considerations
  N/A, because the uretprobe syscall does not return a reference to
  a kernel object.

- Proposing the API
  Wiring up of the uretprobe system call is in a separate change;
  selftests and man page changes are part of the patchset.

- Generic System Call Implementation
  There's no CONFIG option for the new functionality, because it
  keeps the same behaviour from the user's POV.

- x86 System Call Implementation
  It's a 64-bit only syscall.

- Compatibility System Calls (Generic)
  N/A, the uretprobe syscall has no arguments and is not supported
  for compat processes.

- Compatibility System Calls (x86)
  N/A, the uretprobe syscall is not supported for compat processes.

- System Calls Returning Elsewhere
  N/A.

- Other Details
  N/A.

- Testing
  Added new bpf selftests and ran LTP on top of this change.

- Man Page
  Attached.

- Do not call System Calls in the Kernel
  N/A.


[0] https://lore.kernel.org/bpf/ZeCXHKJ--iYYbmLj@krava/
---
Jiri Olsa (6):
      uprobe: Wire up uretprobe system call
      uprobe: Add uretprobe syscall to speed up return probe
      selftests/bpf: Add uretprobe syscall test for regs integrity
      selftests/bpf: Add uretprobe syscall test for regs changes
      selftests/bpf: Add uretprobe syscall call from user space test
      selftests/bpf: Add uretprobe compat test

 arch/x86/entry/syscalls/syscall_64.tbl                      |   1 +
 arch/x86/kernel/uprobes.c                                   | 115 ++++++++++++++++++++++++++++
 include/linux/syscalls.h                                    |   2 +
 include/linux/uprobes.h                                     |   3 +
 include/uapi/asm-generic/unistd.h                           |   5 +-
 kernel/events/uprobes.c                                     |  24 ++++--
 kernel/sys_ni.c                                             |   2 +
 tools/include/linux/compiler.h                              |   4 +
 tools/testing/selftests/bpf/.gitignore                      |   1 +
 tools/testing/selftests/bpf/Makefile                        |   7 +-
 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c       | 123 ++++++++++++++++++++++++++++-
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c     | 382 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/uprobe_syscall.c          |  15 ++++
 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c |  17 +++++
 14 files changed, 691 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c

Jiri Olsa (1):
      man2: Add uretprobe syscall page

 man2/uretprobe.2 | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 man2/uretprobe.2

* [PATCHv4 bpf-next 1/7] uprobe: Wire up uretprobe system call
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
@ 2024-05-02 12:23 ` Jiri Olsa
  2024-05-02 12:23 ` [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe Jiri Olsa
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

Wire up the uretprobe system call, which is implemented in the following
change. The wiring needs to come first, because the uretprobe
implementation needs the syscall number.

Note that at the moment the uretprobe syscall is supported only for
native 64-bit processes.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 include/linux/syscalls.h               | 2 ++
 include/uapi/asm-generic/unistd.h      | 5 ++++-
 kernel/sys_ni.c                        | 2 ++
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 7e8d46f4147f..af0a33ab06ee 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -383,6 +383,7 @@
 459	common	lsm_get_self_attr	sys_lsm_get_self_attr
 460	common	lsm_set_self_attr	sys_lsm_set_self_attr
 461	common	lsm_list_modules	sys_lsm_list_modules
+462	64	uretprobe		sys_uretprobe
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e619ac10cd23..5318e0e76799 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -972,6 +972,8 @@ asmlinkage long sys_lsm_list_modules(u64 *ids, u32 *size, u32 flags);
 /* x86 */
 asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on);
 
+asmlinkage long sys_uretprobe(void);
+
 /* pciconfig: alpha, arm, arm64, ia64, sparc */
 asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
 				unsigned long off, unsigned long len,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 75f00965ab15..8a747cd1d735 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
 
+#define __NR_uretprobe 462
+__SYSCALL(__NR_uretprobe, sys_uretprobe)
+
 #undef __NR_syscalls
-#define __NR_syscalls 462
+#define __NR_syscalls 463
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index faad00cce269..be6195e0d078 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
 
 /* restartable sequence */
 COND_SYSCALL(rseq);
+
+COND_SYSCALL(uretprobe);
-- 
2.44.0


* [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
  2024-05-02 12:23 ` [PATCHv4 bpf-next 1/7] uprobe: Wire up uretprobe system call Jiri Olsa
@ 2024-05-02 12:23 ` Jiri Olsa
  2024-05-03 11:34   ` Peter Zijlstra
  2024-05-02 12:23 ` [PATCHv4 bpf-next 3/7] selftests/bpf: Add uretprobe syscall test for regs integrity Jiri Olsa
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

Add the uretprobe syscall, used instead of a trap, to speed up return probes.

At the moment the uretprobe setup/path is:

  - install an entry uprobe

  - when the uprobe is hit, it overwrites the probed function's return
    address on the stack with the address of a trampoline that contains
    a breakpoint instruction

  - the breakpoint trap code handles the uretprobe consumers' execution
    and jumps back to the original return address
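
The existing return-address hijacking can be modelled in a few lines of
plain C. This is a simplified user-space sketch, not kernel code:
TOY_TRAMPOLINE, struct toy_ret_instance and both helpers are hypothetical
stand-ins for the [uprobes] trampoline address, the per-task
return_instance chain, prepare_uretprobe() and the trampoline handler.
It shows why the chain is LIFO: nested probed functions return in
reverse order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical address of the trampoline page mapped as [uprobes]. */
#define TOY_TRAMPOLINE 0x7ffff7ff0000ULL

/* Simplified stand-in for the kernel's per-task return_instance list. */
struct toy_ret_instance {
	uint64_t orig_ret;		/* return address saved on entry */
	struct toy_ret_instance *next;
};

/*
 * Entry uprobe hit: save the original return address and overwrite the
 * stack slot with the trampoline address. Returns 0 on success.
 */
static int toy_prepare_uretprobe(struct toy_ret_instance **chain,
				 uint64_t *ret_slot)
{
	struct toy_ret_instance *ri = malloc(sizeof(*ri));

	if (!ri)
		return -1;
	ri->orig_ret = *ret_slot;
	ri->next = *chain;
	*chain = ri;
	*ret_slot = TOY_TRAMPOLINE;
	return 0;
}

/*
 * Trampoline hit: run the return consumers (here just counted), pop the
 * newest instance and return the address execution continues at.
 */
static uint64_t toy_handle_trampoline(struct toy_ret_instance **chain,
				      unsigned int *consumer_runs)
{
	struct toy_ret_instance *ri;
	uint64_t ret;

	assert(chain && consumer_runs);
	ri = *chain;
	if (!ri)
		return 0;	/* nothing pending: the kernel raises SIGILL here */
	(*consumer_runs)++;
	ret = ri->orig_ret;
	*chain = ri->next;
	free(ri);
	return ret;
}
```

With two nested probed calls, the inner function's original return
address comes back first, then the outer one's.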

This patch replaces the above trampoline's breakpoint instruction with a
call to the new uretprobe syscall. This syscall does exactly the same job
as the trap, with some extra work:

  - the syscall trampoline must save the original values of the
    rax/r11/rcx registers on the stack: rax is set to the syscall number,
    and r11/rcx are clobbered by the syscall instruction (it stores
    rflags in r11 and the return rip in rcx)

  - the syscall code reads the original values of those registers and
    restores them in the task's pt_regs area

  - only a call from the trampoline exposed in '[uprobes]' is allowed;
    otherwise the process receives a SIGILL signal
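
The register handling above can be illustrated with a user-space toy
model of the syscall body below. It is a sketch under stated assumptions,
not kernel code: struct toy_regs, toy_uretprobe() and the consumers are
hypothetical stand-ins for pt_regs, sys_uretprobe() and uprobe consumers,
the stack pointer is an index in 8-byte slots rather than a byte address,
and setting regs->ip to the original return address (done by the real
trampoline handler) is done explicitly before calling the consumer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for x86-64 struct pt_regs. */
struct toy_regs {
	uint64_t ax, cx, r11, ip, flags;
	uint64_t sp;	/* index into the toy stack, in 8-byte slots */
};

/* Hypothetical return consumer hook. */
typedef void (*toy_consumer_fn)(struct toy_regs *regs);

/* Does nothing, like a consumer that only reads registers. */
static void toy_noop_consumer(struct toy_regs *regs)
{
	(void)regs;
}

/* Same register changes as uprobe_ret_handler() in the bpf_testmod patch. */
static void toy_testmod_consumer(struct toy_regs *regs)
{
	regs->ax  = 0x12345678deadbeefULL;
	regs->cx  = 0x87654321feebdaedULL;
	regs->r11 = UINT64_MAX;
}

/*
 * On entry stack[regs->sp] holds [r11, rcx, rax] as pushed by the
 * trampoline and regs->ip points just past its 'syscall' instruction.
 * 'orig_ret' is the probed function's real return address.
 * Returns the value handed back to user space in rax.
 */
static uint64_t toy_uretprobe(struct toy_regs *regs, uint64_t *stack,
			      uint64_t orig_ret, toy_consumer_fn consumer)
{
	uint64_t r11_cx_ax[3];
	uint64_t ip, sp;

	assert(regs && stack && consumer);
	memcpy(r11_cx_ax, &stack[regs->sp], sizeof(r11_cx_ax));

	/* expose the values the probed function really returned with */
	regs->r11 = r11_cx_ax[0];
	regs->cx  = r11_cx_ax[1];
	regs->ax  = r11_cx_ax[2];
	regs->sp += 3;

	ip = regs->ip;
	sp = regs->sp;

	/* trampoline handling resumes at the original return address */
	regs->ip = orig_ret;
	consumer(regs);

	if (regs->sp != sp)
		return regs->ax;	/* sp changed: the kernel returns via iret */
	regs->sp -= 3;

	/* write back possibly changed r11/rcx, reuse the rax slot for ret */
	r11_cx_ax[0] = regs->r11;
	r11_cx_ax[1] = regs->cx;
	r11_cx_ax[2] = regs->ip;
	regs->ip = ip;
	memcpy(&stack[regs->sp], r11_cx_ax, sizeof(r11_cx_ax));

	/* sysret is only used when rcx == rip and r11 == rflags */
	regs->r11 = regs->flags;
	regs->cx  = regs->ip;
	return regs->ax;
}
```

After the syscall returns, the trampoline's popq %r11, popq %rcx and retq
consume exactly the three slots written back, so execution continues at
the original return address with the (possibly consumer-modified)
register values.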

Even with the extra work, using the uretprobe syscall shows a speed
improvement (compared to using the standard breakpoint):

  On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)

  current:
    uretprobe-nop  :    1.498 ± 0.000M/s
    uretprobe-push :    1.448 ± 0.001M/s
    uretprobe-ret  :    0.816 ± 0.001M/s

  with the fix:
    uretprobe-nop  :    1.969 ± 0.002M/s  < 31% speed up
    uretprobe-push :    1.910 ± 0.000M/s  < 31% speed up
    uretprobe-ret  :    0.934 ± 0.000M/s  < 14% speed up

  On AMD (AMD Ryzen 7 5700U)

  current:
    uretprobe-nop  :    0.778 ± 0.001M/s
    uretprobe-push :    0.744 ± 0.001M/s
    uretprobe-ret  :    0.540 ± 0.001M/s

  with the fix:
    uretprobe-nop  :    0.860 ± 0.001M/s  < 10% speed up
    uretprobe-push :    0.818 ± 0.001M/s  < 10% speed up
    uretprobe-ret  :    0.578 ± 0.000M/s  <  7% speed up
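
The percentages are relative throughput gains, (new - current) / current.
As a quick sanity check, a minimal C helper that recomputes them from the
M/s values quoted above (no new measurements). To one decimal place it
gives 31.4%, 31.9% and 14.5% on Intel and 10.5%, 9.9% and 7.0% on AMD, so
the figures next to each result are rounded approximations.

```c
#include <assert.h>
#include <stdio.h>

/* Relative speed-up in percent between two throughput figures (M/s). */
static double speedup_pct(double before, double after)
{
	assert(before > 0.0);
	return (after / before - 1.0) * 100.0;
}

struct bench_row {
	const char *name;
	double before, after;	/* M/s, copied from the results above */
};

static const struct bench_row intel[] = {
	{ "uretprobe-nop",  1.498, 1.969 },
	{ "uretprobe-push", 1.448, 1.910 },
	{ "uretprobe-ret",  0.816, 0.934 },
};

static const struct bench_row amd[] = {
	{ "uretprobe-nop",  0.778, 0.860 },
	{ "uretprobe-push", 0.744, 0.818 },
	{ "uretprobe-ret",  0.540, 0.578 },
};

/* Print one line per benchmark with the speed-up to one decimal. */
static void print_rows(const char *cpu, const struct bench_row *rows, size_t n)
{
	for (size_t i = 0; i < n; i++)
		printf("%s %-15s %5.1f%%\n", cpu, rows[i].name,
		       speedup_pct(rows[i].before, rows[i].after));
}
```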

The performance test spawns a thread that runs a loop which triggers
the uprobe with an attached bpf program that increments the counter
that gets printed in the results above.

The uprobe (and uretprobe) kind is determined by which instruction
is being patched with the breakpoint instruction. That's also important
for uretprobes, because a uprobe is installed for each uretprobe.

The performance test is part of bpf selftests:
  tools/testing/selftests/bpf/run_bench_uprobes.sh

Note that at the moment the uretprobe syscall is supported only for native
64-bit processes; compat processes still use the standard breakpoint.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/uprobes.c | 115 ++++++++++++++++++++++++++++++++++++++
 include/linux/uprobes.h   |   3 +
 kernel/events/uprobes.c   |  24 +++++---
 3 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 6c07f6daaa22..81e6ee95784d 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -12,6 +12,7 @@
 #include <linux/ptrace.h>
 #include <linux/uprobes.h>
 #include <linux/uaccess.h>
+#include <linux/syscalls.h>
 
 #include <linux/kdebug.h>
 #include <asm/processor.h>
@@ -308,6 +309,120 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
 }
 
 #ifdef CONFIG_X86_64
+
+asm (
+	".pushsection .rodata\n"
+	".global uretprobe_syscall_entry\n"
+	"uretprobe_syscall_entry:\n"
+	"pushq %rax\n"
+	"pushq %rcx\n"
+	"pushq %r11\n"
+	"movq $" __stringify(__NR_uretprobe) ", %rax\n"
+	"syscall\n"
+	".global uretprobe_syscall_check\n"
+	"uretprobe_syscall_check:\n"
+	"popq %r11\n"
+	"popq %rcx\n"
+
+	/* The uretprobe syscall replaces stored %rax value with final
+	 * return address, so we don't restore %rax in here and just
+	 * call ret.
+	 */
+	"retq\n"
+	".global uretprobe_syscall_end\n"
+	"uretprobe_syscall_end:\n"
+	".popsection\n"
+);
+
+extern u8 uretprobe_syscall_entry[];
+extern u8 uretprobe_syscall_check[];
+extern u8 uretprobe_syscall_end[];
+
+void *arch_uprobe_trampoline(unsigned long *psize)
+{
+	static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+	struct pt_regs *regs = task_pt_regs(current);
+
+	/*
+	 * At the moment the uretprobe syscall trampoline is supported
+	 * only for native 64-bit process, the compat process still uses
+	 * standard breakpoint.
+	 */
+	if (user_64bit_mode(regs)) {
+		*psize = uretprobe_syscall_end - uretprobe_syscall_entry;
+		return uretprobe_syscall_entry;
+	}
+
+	*psize = UPROBE_SWBP_INSN_SIZE;
+	return &insn;
+}
+
+static unsigned long trampoline_check_ip(void)
+{
+	unsigned long tramp = uprobe_get_trampoline_vaddr();
+
+	return tramp + (uretprobe_syscall_check - uretprobe_syscall_entry);
+}
+
+SYSCALL_DEFINE0(uretprobe)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	unsigned long err, ip, sp, r11_cx_ax[3];
+
+	if (regs->ip != trampoline_check_ip())
+		goto sigill;
+
+	err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax));
+	if (err)
+		goto sigill;
+
+	/* expose the "right" values of r11/cx/ax/sp to uprobe_consumer/s */
+	regs->r11 = r11_cx_ax[0];
+	regs->cx  = r11_cx_ax[1];
+	regs->ax  = r11_cx_ax[2];
+	regs->sp += sizeof(r11_cx_ax);
+	regs->orig_ax = -1;
+
+	ip = regs->ip;
+	sp = regs->sp;
+
+	uprobe_handle_trampoline(regs);
+
+	/*
+	 * uprobe_consumer has changed sp, we can do nothing,
+	 * just return via iret
+	 */
+	if (regs->sp != sp)
+		return regs->ax;
+	regs->sp -= sizeof(r11_cx_ax);
+
+	/* for the case uprobe_consumer has changed r11/cx */
+	r11_cx_ax[0] = regs->r11;
+	r11_cx_ax[1] = regs->cx;
+
+	/*
+	 * ax register is passed through as return value, so we can use
+	 * its space on stack for ip value and jump to it through the
+	 * trampoline's ret instruction
+	 */
+	r11_cx_ax[2] = regs->ip;
+	regs->ip = ip;
+
+	err = copy_to_user((void __user *)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
+	if (err)
+		goto sigill;
+
+	/* ensure sysret, see do_syscall_64() */
+	regs->r11 = regs->flags;
+	regs->cx  = regs->ip;
+
+	return regs->ax;
+
+sigill:
+	force_sig(SIGILL);
+	return -1;
+}
+
 /*
  * If arch_uprobe->insn doesn't use rip-relative addressing, return
  * immediately.  Otherwise, rewrite the instruction so that it accesses
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..b503fafb7fb3 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -138,6 +138,9 @@ extern bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check c
 extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 					 void *src, unsigned long len);
+extern void uprobe_handle_trampoline(struct pt_regs *regs);
+extern void *arch_uprobe_trampoline(unsigned long *psize);
+extern unsigned long uprobe_get_trampoline_vaddr(void);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e4834d23e1d1..c550449d66be 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	return ret;
 }
 
+void * __weak arch_uprobe_trampoline(unsigned long *psize)
+{
+	static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+
+	*psize = UPROBE_SWBP_INSN_SIZE;
+	return &insn;
+}
+
 static struct xol_area *__create_xol_area(unsigned long vaddr)
 {
 	struct mm_struct *mm = current->mm;
-	uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+	unsigned long insns_size;
 	struct xol_area *area;
+	void *insns;
 
 	area = kmalloc(sizeof(*area), GFP_KERNEL);
 	if (unlikely(!area))
@@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
 	/* Reserve the 1st slot for get_trampoline_vaddr() */
 	set_bit(0, area->bitmap);
 	atomic_set(&area->slot_count, 1);
-	arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
+	insns = arch_uprobe_trampoline(&insns_size);
+	arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
 
 	if (!xol_add_vma(mm, area))
 		return area;
@@ -1827,7 +1837,7 @@ void uprobe_copy_process(struct task_struct *t, unsigned long flags)
  *
  * Returns -1 in case the xol_area is not allocated.
  */
-static unsigned long get_trampoline_vaddr(void)
+unsigned long uprobe_get_trampoline_vaddr(void)
 {
 	struct xol_area *area;
 	unsigned long trampoline_vaddr = -1;
@@ -1878,7 +1888,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 	if (!ri)
 		return;
 
-	trampoline_vaddr = get_trampoline_vaddr();
+	trampoline_vaddr = uprobe_get_trampoline_vaddr();
 	orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
 	if (orig_ret_vaddr == -1)
 		goto fail;
@@ -2123,7 +2133,7 @@ static struct return_instance *find_next_ret_chain(struct return_instance *ri)
 	return ri;
 }
 
-static void handle_trampoline(struct pt_regs *regs)
+void uprobe_handle_trampoline(struct pt_regs *regs)
 {
 	struct uprobe_task *utask;
 	struct return_instance *ri, *next;
@@ -2187,8 +2197,8 @@ static void handle_swbp(struct pt_regs *regs)
 	int is_swbp;
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
-	if (bp_vaddr == get_trampoline_vaddr())
-		return handle_trampoline(regs);
+	if (bp_vaddr == uprobe_get_trampoline_vaddr())
+		return uprobe_handle_trampoline(regs);
 
 	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
 	if (!uprobe) {
-- 
2.44.0


* [PATCHv4 bpf-next 3/7] selftests/bpf: Add uretprobe syscall test for regs integrity
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
  2024-05-02 12:23 ` [PATCHv4 bpf-next 1/7] uprobe: Wire up uretprobe system call Jiri Olsa
  2024-05-02 12:23 ` [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe Jiri Olsa
@ 2024-05-02 12:23 ` Jiri Olsa
  2024-05-02 12:23 ` [PATCHv4 bpf-next 4/7] selftests/bpf: Add uretprobe syscall test for regs changes Jiri Olsa
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

Add a uretprobe syscall test that compares register values before
and after the uretprobe is hit. It also compares the register
values seen from the attached bpf program.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/include/linux/compiler.h                |   4 +
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 163 ++++++++++++++++++
 .../selftests/bpf/progs/uprobe_syscall.c      |  15 ++
 3 files changed, 182 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c

diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index 8a63a9913495..6f7f22ac9da5 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -62,6 +62,10 @@
 #define __nocf_check __attribute__((nocf_check))
 #endif
 
+#ifndef __naked
+#define __naked __attribute__((__naked__))
+#endif
+
 /* Are two types/vars the same type (ignoring qualifiers)? */
 #ifndef __same_type
 # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
new file mode 100644
index 000000000000..311ac19d8992
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#ifdef __x86_64__
+
+#include <unistd.h>
+#include <asm/ptrace.h>
+#include <linux/compiler.h>
+#include "uprobe_syscall.skel.h"
+
+__naked unsigned long uretprobe_regs_trigger(void)
+{
+	asm volatile (
+		"movq $0xdeadbeef, %rax\n"
+		"ret\n"
+	);
+}
+
+__naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
+{
+	asm volatile (
+		"movq %r15,   0(%rdi)\n"
+		"movq %r14,   8(%rdi)\n"
+		"movq %r13,  16(%rdi)\n"
+		"movq %r12,  24(%rdi)\n"
+		"movq %rbp,  32(%rdi)\n"
+		"movq %rbx,  40(%rdi)\n"
+		"movq %r11,  48(%rdi)\n"
+		"movq %r10,  56(%rdi)\n"
+		"movq  %r9,  64(%rdi)\n"
+		"movq  %r8,  72(%rdi)\n"
+		"movq %rax,  80(%rdi)\n"
+		"movq %rcx,  88(%rdi)\n"
+		"movq %rdx,  96(%rdi)\n"
+		"movq %rsi, 104(%rdi)\n"
+		"movq %rdi, 112(%rdi)\n"
+		"movq   $0, 120(%rdi)\n" /* orig_rax */
+		"movq   $0, 128(%rdi)\n" /* rip      */
+		"movq   $0, 136(%rdi)\n" /* cs       */
+		"pushf\n"
+		"pop %rax\n"
+		"movq %rax, 144(%rdi)\n" /* eflags   */
+		"movq %rsp, 152(%rdi)\n" /* rsp      */
+		"movq   $0, 160(%rdi)\n" /* ss       */
+
+		/* save 2nd argument */
+		"pushq %rsi\n"
+		"call uretprobe_regs_trigger\n"
+
+		/* save  return value and load 2nd argument pointer to rax */
+		"pushq %rax\n"
+		"movq 8(%rsp), %rax\n"
+
+		"movq %r15,   0(%rax)\n"
+		"movq %r14,   8(%rax)\n"
+		"movq %r13,  16(%rax)\n"
+		"movq %r12,  24(%rax)\n"
+		"movq %rbp,  32(%rax)\n"
+		"movq %rbx,  40(%rax)\n"
+		"movq %r11,  48(%rax)\n"
+		"movq %r10,  56(%rax)\n"
+		"movq  %r9,  64(%rax)\n"
+		"movq  %r8,  72(%rax)\n"
+		"movq %rcx,  88(%rax)\n"
+		"movq %rdx,  96(%rax)\n"
+		"movq %rsi, 104(%rax)\n"
+		"movq %rdi, 112(%rax)\n"
+		"movq   $0, 120(%rax)\n" /* orig_rax */
+		"movq   $0, 128(%rax)\n" /* rip      */
+		"movq   $0, 136(%rax)\n" /* cs       */
+
+		/* restore return value and 2nd argument */
+		"pop %rax\n"
+		"pop %rsi\n"
+
+		"movq %rax,  80(%rsi)\n"
+
+		"pushf\n"
+		"pop %rax\n"
+
+		"movq %rax, 144(%rsi)\n" /* eflags   */
+		"movq %rsp, 152(%rsi)\n" /* rsp      */
+		"movq   $0, 160(%rsi)\n" /* ss       */
+		"ret\n"
+);
+}
+
+static void test_uretprobe_regs_equal(void)
+{
+	struct uprobe_syscall *skel = NULL;
+	struct pt_regs before = {}, after = {};
+	unsigned long *pb = (unsigned long *) &before;
+	unsigned long *pa = (unsigned long *) &after;
+	unsigned long *pp;
+	unsigned int i, cnt;
+	int err;
+
+	skel = uprobe_syscall__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall__open_and_load"))
+		goto cleanup;
+
+	err = uprobe_syscall__attach(skel);
+	if (!ASSERT_OK(err, "uprobe_syscall__attach"))
+		goto cleanup;
+
+	uretprobe_regs(&before, &after);
+
+	pp = (unsigned long *) &skel->bss->regs;
+	cnt = sizeof(before)/sizeof(*pb);
+
+	for (i = 0; i < cnt; i++) {
+		unsigned int offset = i * sizeof(unsigned long);
+
+		/*
+		 * Check register before and after uretprobe_regs_trigger call
+		 * that triggers the uretprobe.
+		 */
+		switch (offset) {
+		case offsetof(struct pt_regs, rax):
+			ASSERT_EQ(pa[i], 0xdeadbeef, "return value");
+			break;
+		default:
+			if (!ASSERT_EQ(pb[i], pa[i], "register before-after value check"))
+				fprintf(stdout, "failed register offset %u\n", offset);
+		}
+
+		/*
+		 * Check register seen from bpf program and register after
+		 * uretprobe_regs_trigger call
+		 */
+		switch (offset) {
+		/*
+		 * These values will be different (not set in uretprobe_regs),
+		 * we don't care.
+		 */
+		case offsetof(struct pt_regs, orig_rax):
+		case offsetof(struct pt_regs, rip):
+		case offsetof(struct pt_regs, cs):
+		case offsetof(struct pt_regs, rsp):
+		case offsetof(struct pt_regs, ss):
+			break;
+		default:
+			if (!ASSERT_EQ(pp[i], pa[i], "register prog-after value check"))
+				fprintf(stdout, "failed register offset %u\n", offset);
+		}
+	}
+
+cleanup:
+	uprobe_syscall__destroy(skel);
+}
+#else
+static void test_uretprobe_regs_equal(void)
+{
+	test__skip();
+}
+#endif
+
+void test_uprobe_syscall(void)
+{
+	if (test__start_subtest("uretprobe_regs_equal"))
+		test_uretprobe_regs_equal();
+}
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
new file mode 100644
index 000000000000..8a4fa6c7ef59
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <string.h>
+
+struct pt_regs regs;
+
+char _license[] SEC("license") = "GPL";
+
+SEC("uretprobe//proc/self/exe:uretprobe_regs_trigger")
+int uretprobe(struct pt_regs *ctx)
+{
+	__builtin_memcpy(&regs, ctx, sizeof(regs));
+	return 0;
+}
-- 
2.44.0


* [PATCHv4 bpf-next 4/7] selftests/bpf: Add uretprobe syscall test for regs changes
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
                   ` (2 preceding siblings ...)
  2024-05-02 12:23 ` [PATCHv4 bpf-next 3/7] selftests/bpf: Add uretprobe syscall test for regs integrity Jiri Olsa
@ 2024-05-02 12:23 ` Jiri Olsa
  2024-05-02 12:23 ` [PATCHv4 bpf-next 5/7] selftests/bpf: Add uretprobe syscall call from user space test Jiri Olsa
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

Add a test that creates a uprobe consumer on a uretprobe which changes
some of the registers, making sure the changed registers are propagated
to user space when the uretprobe syscall trampoline is used on x86_64.

To be able to do this, add support to bpf_testmod to create a uprobe via
a new attribute file:
  /sys/kernel/bpf_testmod_uprobe

This file expects a file offset and creates the related uprobe on the
current process's exe file, or removes the existing uprobe if the offset
is 0. There can be only a single uprobe at any time.

The uprobe has a specific consumer that changes the registers used in the
uretprobe syscall trampoline, which are later checked in the test.
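
The attribute's write protocol can be sketched as a small user-space
state machine. This is a hypothetical model: model_write(),
model_register() and model_unregister() are illustrative names, a pthread
mutex stands in for testmod_uprobe_mutex, and storing the offset stands
in for uprobe_register_refctr()/uprobe_unregister(). strtoul() with base
0 only approximates kstrtoul(buf, 0, ...), including its tolerance of a
single trailing newline.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long model_offset;	/* 0 means "no uprobe registered" */

static int model_register(unsigned long offset)
{
	int err = 0;

	pthread_mutex_lock(&model_mutex);
	if (model_offset)
		err = -EBUSY;		/* only a single uprobe at a time */
	else
		model_offset = offset;	/* stands in for uprobe_register_refctr() */
	pthread_mutex_unlock(&model_mutex);
	return err;
}

static void model_unregister(void)
{
	pthread_mutex_lock(&model_mutex);
	model_offset = 0;		/* stands in for uprobe_unregister() */
	pthread_mutex_unlock(&model_mutex);
}

/* Parse the written offset; non-zero registers, 0 unregisters. */
static int model_write(const char *buf)
{
	unsigned long offset;
	char *end;

	assert(buf != NULL);
	errno = 0;
	offset = strtoul(buf, &end, 0);	/* base 0 accepts "4096" and "0x1000" */
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')
		end++;			/* e.g. from "echo 0x1000 > file" */
	if (*end != '\0')
		return -EINVAL;

	if (offset)
		return model_register(offset);
	model_unregister();		/* harmless no-op when nothing is registered */
	return 0;
}
```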

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   | 123 +++++++++++++++++-
 .../selftests/bpf/prog_tests/uprobe_syscall.c |  67 ++++++++++
 2 files changed, 189 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index eb2b78552ca2..27a12d125b9f 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -10,6 +10,7 @@
 #include <linux/percpu-defs.h>
 #include <linux/sysfs.h>
 #include <linux/tracepoint.h>
+#include <linux/namei.h>
 #include "bpf_testmod.h"
 #include "bpf_testmod_kfunc.h"
 
@@ -343,6 +344,119 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
 	.write = bpf_testmod_test_write,
 };
 
+/* bpf_testmod_uprobe sysfs attribute is so far enabled for x86_64 only,
+ * please see test_uretprobe_regs_change test
+ */
+#ifdef __x86_64__
+
+static int
+uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
+		   struct pt_regs *regs)
+
+{
+	regs->ax  = 0x12345678deadbeef;
+	regs->cx  = 0x87654321feebdaed;
+	regs->r11 = (u64) -1;
+	return true;
+}
+
+struct testmod_uprobe {
+	struct path path;
+	loff_t offset;
+	struct uprobe_consumer consumer;
+};
+
+static DEFINE_MUTEX(testmod_uprobe_mutex);
+
+static struct testmod_uprobe uprobe = {
+	.consumer.ret_handler = uprobe_ret_handler,
+};
+
+static int testmod_register_uprobe(loff_t offset)
+{
+	int err = -EBUSY;
+
+	if (uprobe.offset)
+		return -EBUSY;
+
+	mutex_lock(&testmod_uprobe_mutex);
+
+	if (uprobe.offset)
+		goto out;
+
+	err = kern_path("/proc/self/exe", LOOKUP_FOLLOW, &uprobe.path);
+	if (err)
+		goto out;
+
+	err = uprobe_register_refctr(d_real_inode(uprobe.path.dentry),
+				     offset, 0, &uprobe.consumer);
+	if (err)
+		path_put(&uprobe.path);
+	else
+		uprobe.offset = offset;
+
+out:
+	mutex_unlock(&testmod_uprobe_mutex);
+	return err;
+}
+
+static void testmod_unregister_uprobe(void)
+{
+	mutex_lock(&testmod_uprobe_mutex);
+
+	if (uprobe.offset) {
+		uprobe_unregister(d_real_inode(uprobe.path.dentry),
+				  uprobe.offset, &uprobe.consumer);
+		uprobe.offset = 0;
+	}
+
+	mutex_unlock(&testmod_uprobe_mutex);
+}
+
+static ssize_t
+bpf_testmod_uprobe_write(struct file *file, struct kobject *kobj,
+			 struct bin_attribute *bin_attr,
+			 char *buf, loff_t off, size_t len)
+{
+	unsigned long offset;
+	int err = 0;
+
+	if (kstrtoul(buf, 0, &offset))
+		return -EINVAL;
+
+	if (offset)
+		err = testmod_register_uprobe(offset);
+	else
+		testmod_unregister_uprobe();
+
+	return err ?: strlen(buf);
+}
+
+static struct bin_attribute bin_attr_bpf_testmod_uprobe_file __ro_after_init = {
+	.attr = { .name = "bpf_testmod_uprobe", .mode = 0666, },
+	.write = bpf_testmod_uprobe_write,
+};
+
+static int register_bpf_testmod_uprobe(void)
+{
+	return sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_uprobe_file);
+}
+
+static void unregister_bpf_testmod_uprobe(void)
+{
+	testmod_unregister_uprobe();
+	sysfs_remove_bin_file(kernel_kobj, &bin_attr_bpf_testmod_uprobe_file);
+}
+
+#else
+static int register_bpf_testmod_uprobe(void)
+{
+	return 0;
+}
+
+static void unregister_bpf_testmod_uprobe(void) { }
+#endif
+
 BTF_KFUNCS_START(bpf_testmod_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_iter_testmod_seq_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_testmod_seq_next, KF_ITER_NEXT | KF_RET_NULL)
@@ -655,7 +769,13 @@ static int bpf_testmod_init(void)
 		return ret;
 	if (bpf_fentry_test1(0) < 0)
 		return -EINVAL;
-	return sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
+	ret = sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
+	if (ret < 0)
+		return ret;
+	ret = register_bpf_testmod_uprobe();
+	if (ret < 0)
+		return ret;
+	return 0;
 }
 
 static void bpf_testmod_exit(void)
@@ -669,6 +789,7 @@ static void bpf_testmod_exit(void)
 		msleep(20);
 
 	sysfs_remove_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
+	unregister_bpf_testmod_uprobe();
 }
 
 module_init(bpf_testmod_init);
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 311ac19d8992..1a50cd35205d 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -149,15 +149,82 @@ static void test_uretprobe_regs_equal(void)
 cleanup:
 	uprobe_syscall__destroy(skel);
 }
+
+#define BPF_TESTMOD_UPROBE_TEST_FILE "/sys/kernel/bpf_testmod_uprobe"
+
+static int write_bpf_testmod_uprobe(unsigned long offset)
+{
+	size_t n, ret;
+	char buf[30];
+	int fd;
+
+	n = sprintf(buf, "%lu", offset);
+
+	fd = open(BPF_TESTMOD_UPROBE_TEST_FILE, O_WRONLY);
+	if (fd < 0)
+		return -errno;
+
+	ret = write(fd, buf, n);
+	close(fd);
+	return ret != n ? (int) ret : 0;
+}
+
+static void test_uretprobe_regs_change(void)
+{
+	struct pt_regs before = {}, after = {};
+	unsigned long *pb = (unsigned long *) &before;
+	unsigned long *pa = (unsigned long *) &after;
+	unsigned long cnt = sizeof(before)/sizeof(*pb);
+	unsigned int i, err, offset;
+
+	offset = get_uprobe_offset(uretprobe_regs_trigger);
+
+	err = write_bpf_testmod_uprobe(offset);
+	if (!ASSERT_OK(err, "register_uprobe"))
+		return;
+
+	uretprobe_regs(&before, &after);
+
+	err = write_bpf_testmod_uprobe(0);
+	if (!ASSERT_OK(err, "unregister_uprobe"))
+		return;
+
+	for (i = 0; i < cnt; i++) {
+		unsigned int offset = i * sizeof(unsigned long);
+
+		switch (offset) {
+		case offsetof(struct pt_regs, rax):
+			ASSERT_EQ(pa[i], 0x12345678deadbeef, "rax");
+			break;
+		case offsetof(struct pt_regs, rcx):
+			ASSERT_EQ(pa[i], 0x87654321feebdaed, "rcx");
+			break;
+		case offsetof(struct pt_regs, r11):
+			ASSERT_EQ(pa[i], (__u64) -1, "r11");
+			break;
+		default:
+			if (!ASSERT_EQ(pa[i], pb[i], "register before-after value check"))
+				fprintf(stdout, "failed register offset %u\n", offset);
+		}
+	}
+}
+
 #else
 static void test_uretprobe_regs_equal(void)
 {
 	test__skip();
 }
+
+static void test_uretprobe_regs_change(void)
+{
+	test__skip();
+}
 #endif
 
 void test_uprobe_syscall(void)
 {
 	if (test__start_subtest("uretprobe_regs_equal"))
 		test_uretprobe_regs_equal();
+	if (test__start_subtest("uretprobe_regs_change"))
+		test_uretprobe_regs_change();
 }
-- 
2.44.0

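The offset-driven comparison in test_uretprobe_regs_change() can be tried outside the selftest harness. The block below is a minimal sketch, not code from the patch. struct demo_regs and check_regs_change() are hypothetical names, and the struct stands in for struct pt_regs. It assumes an LP64 target such as x86-64, where unsigned long is 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct pt_regs: a few registers, all
 * unsigned long, so the struct has no padding. Assumes LP64 (x86-64).
 */
struct demo_regs {
	unsigned long r11;
	unsigned long rcx;
	unsigned long rax;
	unsigned long rdx;
	unsigned long rsp;
};

/*
 * Walk both snapshots one word at a time, as test_uretprobe_regs_change()
 * does: rax, rcx and r11 must hold the values the probe wrote, and every
 * other register must be unchanged. Returns the number of mismatches.
 */
static int check_regs_change(const struct demo_regs *before,
			     const struct demo_regs *after)
{
	size_t cnt = sizeof(*before) / sizeof(unsigned long);
	int errors = 0;

	for (size_t i = 0; i < cnt; i++) {
		size_t offset = i * sizeof(unsigned long);
		unsigned long b, a, expect;

		/* memcpy keeps the word-by-word walk well defined */
		memcpy(&b, (const char *) before + offset, sizeof(b));
		memcpy(&a, (const char *) after + offset, sizeof(a));

		switch (offset) {
		case offsetof(struct demo_regs, rax):
			expect = 0x12345678deadbeefUL;
			break;
		case offsetof(struct demo_regs, rcx):
			expect = 0x87654321feebdaedUL;
			break;
		case offsetof(struct demo_regs, r11):
			expect = (unsigned long) -1;
			break;
		default:
			expect = b; /* must be preserved across the probe */
			break;
		}

		if (a != expect) {
			fprintf(stderr, "failed register offset %zu\n", offset);
			errors++;
		}
	}
	return errors;
}
```

Switching on offsetof() values rather than on array indexes keeps the expectations tied to register names. If the struct layout changes, the checks still apply to the right fields.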


* [PATCHv4 bpf-next 5/7] selftests/bpf: Add uretprobe syscall call from user space test
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
                   ` (3 preceding siblings ...)
  2024-05-02 12:23 ` [PATCHv4 bpf-next 4/7] selftests/bpf: Add uretprobe syscall test for regs changes Jiri Olsa
@ 2024-05-02 12:23 ` Jiri Olsa
  2024-05-02 16:33   ` Andrii Nakryiko
  2024-05-02 12:23 ` [PATCHv4 bpf-next 6/7] selftests/bpf: Add uretprobe compat test Jiri Olsa
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

Add a test to verify that, when the uretprobe syscall is called from
outside of the trampoline provided by the kernel, the calling process
receives a SIGILL signal and the attached bpf program is not executed.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 95 +++++++++++++++++++
 .../bpf/progs/uprobe_syscall_executed.c       | 17 ++++
 2 files changed, 112 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 1a50cd35205d..c6fdb8c59ea3 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -7,7 +7,10 @@
 #include <unistd.h>
 #include <asm/ptrace.h>
 #include <linux/compiler.h>
+#include <linux/stringify.h>
+#include <sys/wait.h>
 #include "uprobe_syscall.skel.h"
+#include "uprobe_syscall_executed.skel.h"
 
 __naked unsigned long uretprobe_regs_trigger(void)
 {
@@ -209,6 +212,91 @@ static void test_uretprobe_regs_change(void)
 	}
 }
 
+#ifndef __NR_uretprobe
+#define __NR_uretprobe 462
+#endif
+
+__naked unsigned long uretprobe_syscall_call_1(void)
+{
+	/*
+	 * Pretend we are uretprobe trampoline to trigger the return
+	 * probe invocation in order to verify we get SIGILL.
+	 */
+	asm volatile (
+		"pushq %rax\n"
+		"pushq %rcx\n"
+		"pushq %r11\n"
+		"movq $" __stringify(__NR_uretprobe) ", %rax\n"
+		"syscall\n"
+		"popq %r11\n"
+		"popq %rcx\n"
+		"retq\n"
+	);
+}
+
+__naked unsigned long uretprobe_syscall_call(void)
+{
+	asm volatile (
+		"call uretprobe_syscall_call_1\n"
+		"retq\n"
+	);
+}
+
+static void test_uretprobe_syscall_call(void)
+{
+	LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
+		.retprobe = true,
+	);
+	struct uprobe_syscall_executed *skel;
+	int pid, status, err, go[2], c;
+
+	if (!ASSERT_OK(pipe(go), "pipe"))
+		return;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+		goto cleanup;
+
+	pid = fork();
+	if (!ASSERT_GE(pid, 0, "fork"))
+		goto cleanup;
+
+	/* child */
+	if (pid == 0) {
+		close(go[1]);
+
+		/* wait for parent's kick */
+		err = read(go[0], &c, 1);
+		if (err != 1)
+			exit(-1);
+
+		uretprobe_syscall_call();
+		_exit(0);
+	}
+
+	skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
+							    "/proc/self/exe",
+							    "uretprobe_syscall_call", &opts);
+	if (!ASSERT_OK_PTR(skel->links.test, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	/* kick the child */
+	write(go[1], &c, 1);
+	err = waitpid(pid, &status, 0);
+	ASSERT_EQ(err, pid, "waitpid");
+
+	/* verify the child got killed with SIGILL */
+	ASSERT_EQ(WIFSIGNALED(status), 1, "WIFSIGNALED");
+	ASSERT_EQ(WTERMSIG(status), SIGILL, "WTERMSIG");
+
+	/* verify the uretprobe program wasn't called */
+	ASSERT_EQ(skel->bss->executed, 0, "executed");
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+	close(go[1]);
+	close(go[0]);
+}
 #else
 static void test_uretprobe_regs_equal(void)
 {
@@ -219,6 +307,11 @@ static void test_uretprobe_regs_change(void)
 {
 	test__skip();
 }
+
+static void test_uretprobe_syscall_call(void)
+{
+	test__skip();
+}
 #endif
 
 void test_uprobe_syscall(void)
@@ -227,4 +320,6 @@ void test_uprobe_syscall(void)
 		test_uretprobe_regs_equal();
 	if (test__start_subtest("uretprobe_regs_change"))
 		test_uretprobe_regs_change();
+	if (test__start_subtest("uretprobe_syscall_call"))
+		test_uretprobe_syscall_call();
 }
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
new file mode 100644
index 000000000000..0d7f1a7db2e2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <string.h>
+
+struct pt_regs regs;
+
+char _license[] SEC("license") = "GPL";
+
+int executed = 0;
+
+SEC("uretprobe.multi")
+int test(struct pt_regs *regs)
+{
+	executed = 1;
+	return 0;
+}
-- 
2.44.0

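The parent/child handshake in test_uretprobe_syscall_call() works like this: fork, block the child on a pipe until the parent has attached the probe, then check how the child died. The block below sketches that handshake as a standalone function. It is a hypothetical illustration, not code from the patch, and kick_child_and_get_termsig() is an invented name. raise(SIGILL) stands in for calling the uretprobe syscall outside the kernel trampoline, so the sketch runs on any Linux system.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the signal that killed the child, 0 if the child exited
 * normally, or -1 on setup failure.
 */
static int kick_child_and_get_termsig(void)
{
	int go[2], status;
	char c = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(go))
		return -1;

	pid = fork();
	if (pid < 0) {
		close(go[0]);
		close(go[1]);
		return -1;
	}

	if (pid == 0) {
		/* child: block until the parent kicks us */
		close(go[1]);
		if (read(go[0], &c, 1) != 1)
			_exit(1);
		/* stand-in for calling uretprobe outside the trampoline */
		raise(SIGILL);
		_exit(0); /* only reached if SIGILL did not kill us */
	}

	/* parent: the real test attaches the uretprobe.multi link here */
	close(go[0]);
	n = write(go[1], &c, 1);
	/* closing the write end also unblocks the child if the write failed */
	close(go[1]);

	if (waitpid(pid, &status, 0) != pid || n != 1)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The caller compares the result with SIGILL, mirroring the WIFSIGNALED()/WTERMSIG() asserts in the patch. Blocking the child on the pipe ensures the probe is attached before the probed code runs.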


* [PATCHv4 bpf-next 6/7] selftests/bpf: Add uretprobe compat test
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
                   ` (4 preceding siblings ...)
  2024-05-02 12:23 ` [PATCHv4 bpf-next 5/7] selftests/bpf: Add uretprobe syscall call from user space test Jiri Olsa
@ 2024-05-02 12:23 ` Jiri Olsa
  2024-05-02 16:35   ` Andrii Nakryiko
  2024-05-02 12:23 ` [PATCHv4 7/7] man2: Add uretprobe syscall page Jiri Olsa
  2024-05-02 16:43 ` [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Andrii Nakryiko
  7 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

Add a test that installs a return uprobe inside a 32-bit task and
verifies that the return uprobe and the attached bpf programs are
executed properly.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/.gitignore        |  1 +
 tools/testing/selftests/bpf/Makefile          |  7 ++-
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 60 +++++++++++++++++++
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index f1aebabfb017..69d71223c0dd 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -45,6 +45,7 @@ test_cpp
 /veristat
 /sign-file
 /uprobe_multi
+/uprobe_compat
 *.ko
 *.tmp
 xskxceiver
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 82247aeef857..a94352162290 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -133,7 +133,7 @@ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
 	xdp_features bpf_test_no_cfi.ko
 
-TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi
+TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi uprobe_compat
 
 ifneq ($(V),1)
 submake_extras := feature_display=0
@@ -631,6 +631,7 @@ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
 		       $(OUTPUT)/xdp_synproxy				\
 		       $(OUTPUT)/sign-file				\
 		       $(OUTPUT)/uprobe_multi				\
+		       $(OUTPUT)/uprobe_compat				\
 		       ima_setup.sh 					\
 		       verify_sig_setup.sh				\
 		       $(wildcard progs/btf_dump_test_case_*.c)		\
@@ -752,6 +753,10 @@ $(OUTPUT)/uprobe_multi: uprobe_multi.c
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) -O0 $(LDFLAGS) $^ $(LDLIBS) -o $@
 
+$(OUTPUT)/uprobe_compat:
+	$(call msg,BINARY,,$@)
+	$(Q)echo "int main() { return 0; }" | $(CC) $(CFLAGS) -xc -m32 -O0 - -o $@
+
 EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR)			\
 	prog_tests/tests.h map_tests/tests.h verifier/tests.h		\
 	feature bpftool							\
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index c6fdb8c59ea3..bfea9a0368a4 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -5,6 +5,7 @@
 #ifdef __x86_64__
 
 #include <unistd.h>
+#include <stdlib.h>
 #include <asm/ptrace.h>
 #include <linux/compiler.h>
 #include <linux/stringify.h>
@@ -297,6 +298,58 @@ static void test_uretprobe_syscall_call(void)
 	close(go[1]);
 	close(go[0]);
 }
+
+static void test_uretprobe_compat(void)
+{
+	LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
+		.retprobe = true,
+	);
+	struct uprobe_syscall_executed *skel;
+	int err, go[2], pid, c, status;
+
+	if (!ASSERT_OK(pipe(go), "pipe"))
+		return;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+		goto cleanup;
+
+	pid = fork();
+	if (!ASSERT_GE(pid, 0, "fork"))
+		goto cleanup;
+
+	/* child */
+	if (pid == 0) {
+		close(go[1]);
+
+		/* wait for parent's kick */
+		err = read(go[0], &c, 1);
+		if (err != 1)
+			exit(-1);
+		execl("./uprobe_compat", "./uprobe_compat", NULL);
+		exit(-1);
+	}
+
+	skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
+							    "./uprobe_compat", "main", &opts);
+	if (!ASSERT_OK_PTR(skel->links.test, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	/* kick the child */
+	write(go[1], &c, 1);
+	err = waitpid(pid, &status, 0);
+	ASSERT_EQ(err, pid, "waitpid");
+
+	/* verify the child exited normally and the bpf program got executed */
+	ASSERT_EQ(WIFEXITED(status), 1, "WIFEXITED");
+	ASSERT_EQ(WEXITSTATUS(status), 0, "WEXITSTATUS");
+	ASSERT_EQ(skel->bss->executed, 1, "executed");
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+	close(go[0]);
+	close(go[1]);
+}
 #else
 static void test_uretprobe_regs_equal(void)
 {
@@ -312,6 +365,11 @@ static void test_uretprobe_syscall_call(void)
 {
 	test__skip();
 }
+
+static void test_uretprobe_compat(void)
+{
+	test__skip();
+}
 #endif
 
 void test_uprobe_syscall(void)
@@ -322,4 +380,6 @@ void test_uprobe_syscall(void)
 		test_uretprobe_regs_change();
 	if (test__start_subtest("uretprobe_syscall_call"))
 		test_uretprobe_syscall_call();
+	if (test__start_subtest("uretprobe_compat"))
+		test_uretprobe_compat();
 }
-- 
2.44.0

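The compat test assumes that the -m32 rule in the Makefile actually produced a 32-bit ./uprobe_compat binary. One option is to check the ELF class byte first and skip the test if the binary is not 32-bit. The block below is a hedged sketch of that check only. elf_is_32bit() is a hypothetical helper, not part of the patch or of libbpf, and it reads only the e_ident bytes defined in <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if 'path' is a 32-bit ELF file, 0 if it
 * is an ELF file of another class, or a negative errno value if it cannot
 * be read or is not ELF at all.
 */
static int elf_is_32bit(const char *path)
{
	unsigned char ident[EI_NIDENT];
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = read(fd, ident, sizeof(ident));
	close(fd);
	if (n != (ssize_t) sizeof(ident))
		return -EINVAL;

	/* check the "\177ELF" magic before trusting the class byte */
	if (memcmp(ident, ELFMAG, SELFMAG) != 0)
		return -EINVAL;

	return ident[EI_CLASS] == ELFCLASS32;
}
```

A test could call this before fork() and call test__skip() when the result is not 1. That would make the skip explicit when the toolchain cannot build 32-bit binaries.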


* [PATCHv4 7/7] man2: Add uretprobe syscall page
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
                   ` (5 preceding siblings ...)
  2024-05-02 12:23 ` [PATCHv4 bpf-next 6/7] selftests/bpf: Add uretprobe compat test Jiri Olsa
@ 2024-05-02 12:23 ` Jiri Olsa
  2024-05-02 13:43   ` Alejandro Colomar
  2024-05-02 16:43 ` [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Andrii Nakryiko
  7 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 12:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

Add a man page for the new uretprobe syscall.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 man2/uretprobe.2 | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 man2/uretprobe.2

diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
new file mode 100644
index 000000000000..08fe6a670430
--- /dev/null
+++ b/man2/uretprobe.2
@@ -0,0 +1,45 @@
+.\" Copyright (C) 2024, Jiri Olsa <jolsa@kernel.org>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+uretprobe \- execute pending return uprobes
+.SH SYNOPSIS
+.nf
+.B int uretprobe(void)
+.fi
+.SH DESCRIPTION
+The kernel uses the
+.BR uretprobe ()
+syscall to trigger uprobe return probe consumers instead of using the
+standard breakpoint instruction.
+.P
+The uretprobe syscall is not supposed to be called directly by the user;
+it may be invoked only through the user-space trampoline
+provided by the kernel.
+When it is called from outside of this trampoline, the calling process receives
+.BR SIGILL .
+.SH RETURN VALUE
+The return value of
+.BR uretprobe ()
+is specific to the given architecture.
+.SH VERSIONS
+This syscall is not specified in POSIX,
+and details of its behavior vary across systems.
+.SH STANDARDS
+None.
+.SH NOTES
+.BR uretprobe ()
+was initially introduced on the x86-64 architecture,
+because executing a syscall there is faster than taking a breakpoint trap.
+It might be extended to other architectures.
+.P
+.BR uretprobe ()
+exists only to allow the invocation of return uprobe consumers.
+It should
+.B never
+be called directly.
+Details of the arguments (if any) passed to
+.BR uretprobe ()
+and the return value are specific to the given architecture.
-- 
2.44.0



* Re: [PATCHv4 7/7] man2: Add uretprobe syscall page
  2024-05-02 12:23 ` [PATCHv4 7/7] man2: Add uretprobe syscall page Jiri Olsa
@ 2024-05-02 13:43   ` Alejandro Colomar
  2024-05-02 20:13     ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Alejandro Colomar @ 2024-05-02 13:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 2728 bytes --]

Hi Jiri,

On Thu, May 02, 2024 at 02:23:13PM +0200, Jiri Olsa wrote:
> Adding man page for new uretprobe syscall.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  man2/uretprobe.2 | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 man2/uretprobe.2
> 
> diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
> new file mode 100644
> index 000000000000..08fe6a670430
> --- /dev/null
> +++ b/man2/uretprobe.2
> @@ -0,0 +1,45 @@
> +.\" Copyright (C) 2024, Jiri Olsa <jolsa@kernel.org>
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi
> +.SH DESCRIPTION
> +Kernel is using
> +.BR uretprobe()
> +syscall to trigger uprobe return probe consumers instead of using
> +standard breakpoint instruction.
> +

Please use .P instead of a blank.  See man-pages(7):

   Formatting conventions (general)
     Paragraphs should be separated by suitable markers (usually either
     .P or .IP).  Do not separate paragraphs using blank lines, as this
     results in poor rendering in some output formats  (such  as  Post‐
     Script and PDF).

> +The uretprobe syscall is not supposed to be called directly by user, it's allowed

s/by user/by the user/

> +to be invoked only through user space trampoline provided by kernel.

s/user space/user-space/

Missing a few 'the' too, here and in the rest of the page.

> +When called from outside of this trampoline, the calling process will receive
> +.BR SIGILL .
> +
> +.SH RETURN VALUE
> +.BR uretprobe()

You're missing a space here:

.BR uretprobe ()

> +return value is specific for given architecture.
> +
> +.SH VERSIONS
> +This syscall is not specified in POSIX,
> +and details of its behavior vary across systems.
> +.SH STANDARDS
> +None.

You could add a HISTORY section.

Have a lovely day!
Alex

> +.SH NOTES
> +.BR uretprobe()
> +syscall is initially introduced on x86-64 architecture, because doing syscall
> +is faster than doing breakpoint trap on it. It might be extended to other
> +architectures.
> +
> +.BR uretprobe()
> +syscall exists only to allow the invocation of return uprobe consumers.
> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are specific for given architecture.
> -- 
> 2.44.0
> 
> 

-- 
<https://www.alejandro-colomar.es/>
A client is hiring kernel driver, mm, and/or crypto developers;
contact me if interested.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCHv4 bpf-next 5/7] selftests/bpf: Add uretprobe syscall call from user space test
  2024-05-02 12:23 ` [PATCHv4 bpf-next 5/7] selftests/bpf: Add uretprobe syscall call from user space test Jiri Olsa
@ 2024-05-02 16:33   ` Andrii Nakryiko
  0 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2024-05-02 16:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Thu, May 2, 2024 at 5:24 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding test to verify that when called from outside of the
> trampoline provided by kernel, the uretprobe syscall will cause
> calling process to receive SIGILL signal and the attached bpf
> program is not executed.
>
> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 95 +++++++++++++++++++
>  .../bpf/progs/uprobe_syscall_executed.c       | 17 ++++
>  2 files changed, 112 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 1a50cd35205d..c6fdb8c59ea3 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -7,7 +7,10 @@
>  #include <unistd.h>
>  #include <asm/ptrace.h>
>  #include <linux/compiler.h>
> +#include <linux/stringify.h>
> +#include <sys/wait.h>
>  #include "uprobe_syscall.skel.h"
> +#include "uprobe_syscall_executed.skel.h"
>
>  __naked unsigned long uretprobe_regs_trigger(void)
>  {
> @@ -209,6 +212,91 @@ static void test_uretprobe_regs_change(void)
>         }
>  }
>
> +#ifndef __NR_uretprobe
> +#define __NR_uretprobe 462
> +#endif
> +
> +__naked unsigned long uretprobe_syscall_call_1(void)
> +{
> +       /*
> +        * Pretend we are uretprobe trampoline to trigger the return
> +        * probe invocation in order to verify we get SIGILL.
> +        */
> +       asm volatile (
> +               "pushq %rax\n"
> +               "pushq %rcx\n"
> +               "pushq %r11\n"
> +               "movq $" __stringify(__NR_uretprobe) ", %rax\n"
> +               "syscall\n"
> +               "popq %r11\n"
> +               "popq %rcx\n"
> +               "retq\n"
> +       );
> +}
> +
> +__naked unsigned long uretprobe_syscall_call(void)
> +{
> +       asm volatile (
> +               "call uretprobe_syscall_call_1\n"
> +               "retq\n"
> +       );
> +}
> +
> +static void test_uretprobe_syscall_call(void)
> +{
> +       LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
> +               .retprobe = true,
> +       );
> +       struct uprobe_syscall_executed *skel;
> +       int pid, status, err, go[2], c;
> +
> +       if (pipe(go))
> +               return;

very unlikely to fail, but still, ASSERT_OK() would be in order here

But regardless:

Acked-by: Andrii Nakryiko <andrii@kernel.org>

[...]


* Re: [PATCHv4 bpf-next 6/7] selftests/bpf: Add uretprobe compat test
  2024-05-02 12:23 ` [PATCHv4 bpf-next 6/7] selftests/bpf: Add uretprobe compat test Jiri Olsa
@ 2024-05-02 16:35   ` Andrii Nakryiko
  0 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2024-05-02 16:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Thu, May 2, 2024 at 5:24 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding test that adds return uprobe inside 32-bit task
> and verify the return uprobe and attached bpf programs
> get properly executed.
>
> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/testing/selftests/bpf/.gitignore        |  1 +
>  tools/testing/selftests/bpf/Makefile          |  7 ++-
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 60 +++++++++++++++++++
>  3 files changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
> index f1aebabfb017..69d71223c0dd 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -45,6 +45,7 @@ test_cpp
>  /veristat
>  /sign-file
>  /uprobe_multi
> +/uprobe_compat
>  *.ko
>  *.tmp
>  xskxceiver
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 82247aeef857..a94352162290 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -133,7 +133,7 @@ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
>         xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
>         xdp_features bpf_test_no_cfi.ko
>
> -TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi
> +TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi uprobe_compat
>
>  ifneq ($(V),1)
>  submake_extras := feature_display=0
> @@ -631,6 +631,7 @@ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko      \
>                        $(OUTPUT)/xdp_synproxy                           \
>                        $(OUTPUT)/sign-file                              \
>                        $(OUTPUT)/uprobe_multi                           \
> +                      $(OUTPUT)/uprobe_compat                          \
>                        ima_setup.sh                                     \
>                        verify_sig_setup.sh                              \
>                        $(wildcard progs/btf_dump_test_case_*.c)         \
> @@ -752,6 +753,10 @@ $(OUTPUT)/uprobe_multi: uprobe_multi.c
>         $(call msg,BINARY,,$@)
>         $(Q)$(CC) $(CFLAGS) -O0 $(LDFLAGS) $^ $(LDLIBS) -o $@
>
> +$(OUTPUT)/uprobe_compat:
> +       $(call msg,BINARY,,$@)
> +       $(Q)echo "int main() { return 0; }" | $(CC) $(CFLAGS) -xc -m32 -O0 - -o $@
> +
>  EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR)                      \
>         prog_tests/tests.h map_tests/tests.h verifier/tests.h           \
>         feature bpftool                                                 \
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index c6fdb8c59ea3..bfea9a0368a4 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -5,6 +5,7 @@
>  #ifdef __x86_64__
>
>  #include <unistd.h>
> +#include <stdlib.h>
>  #include <asm/ptrace.h>
>  #include <linux/compiler.h>
>  #include <linux/stringify.h>
> @@ -297,6 +298,58 @@ static void test_uretprobe_syscall_call(void)
>         close(go[1]);
>         close(go[0]);
>  }
> +
> +static void test_uretprobe_compat(void)
> +{
> +       LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
> +               .retprobe = true,
> +       );
> +       struct uprobe_syscall_executed *skel;
> +       int err, go[2], pid, c, status;
> +
> +       if (pipe(go))
> +               return;

ASSERT_OK() missing, like in the previous patch

Thanks for switching to pipe() + global variable instead of using trace_pipe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> +
> +       skel = uprobe_syscall_executed__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
> +               goto cleanup;
> +

[...]


* Re: [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up
  2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
                   ` (6 preceding siblings ...)
  2024-05-02 12:23 ` [PATCHv4 7/7] man2: Add uretprobe syscall page Jiri Olsa
@ 2024-05-02 16:43 ` Andrii Nakryiko
  2024-05-02 20:04   ` Jiri Olsa
  7 siblings, 1 reply; 27+ messages in thread
From: Andrii Nakryiko @ 2024-05-02 16:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Thu, May 2, 2024 at 5:23 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> hi,
> as part of the effort on speeding up the uprobes [0] coming with
> return uprobe optimization by using syscall instead of the trap
> on the uretprobe trampoline.
>
> The speed up depends on instruction type that uprobe is installed
> and depends on specific HW type, please check patch 1 for details.
>
> Patches 1-6 are based on bpf-next/master, but path 1 and 2 are
> apply-able on linux-trace.git tree probes/for-next branch.
> Patch 7 is based on man-pages master.
>
> v4 changes:
>   - added acks [Oleg,Andrii,Masami]
>   - reworded the man page and adding more info to NOTE section [Masami]
>   - rewrote bpf tests not to use trace_pipe [Andrii]
>   - cc-ed linux-man list
>
> Also available at:
>   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   uretprobe_syscall
>

It looks great to me, thanks! Unfortunately, the BPF CI build is broken,
probably due to some of the Makefile additions. Please investigate and
fix it (or we'll need to fix something on the BPF CI side); either way,
it looks like you'll need another revision.

pw-bot: cr

  [0] https://github.com/kernel-patches/bpf/actions/runs/8923849088/job/24509002194



But while we are at it.

Masami, Oleg,

What should be the logistics of landing this? Can/should we route this
through the bpf-next tree, given there are lots of BPF-based
selftests? Or you want to take this through
linux-trace/probes/for-next? In the latter case, it's probably better
to apply only the first two patches to probes/for-next and the rest
should still go through the bpf-next tree (otherwise we are running
into conflicts in BPF selftests). Previously we were handling such
cross-tree dependencies by creating a named branch or tag, and merging
it into bpf-next (so that all SHAs are preserved). It's a bunch of
extra work for everyone involved, so the simplest way would be to just
land through bpf-next, of course. But let me know your preferences.

Thanks!

> thanks,
> jirka
>
>
> Notes to check list items in Documentation/process/adding-syscalls.rst:
>
> - System Call Alternatives
>   New syscall seems like the best way in here, becase we need

typo (thanks, Gmail): because

>   just to quickly enter kernel with no extra arguments processing,
>   which we'd need to do if we decided to use another syscall.
>
> - Designing the API: Planning for Extension
>   The uretprobe syscall is very specific and most likely won't be
>   extended in the future.
>
>   At the moment it does not take any arguments and even if it does
>   in future, it's allowed to be called only from trampoline prepared
>   by kernel, so there'll be no broken user.
>
> - Designing the API: Other Considerations
>   N/A because uretprobe syscall does not return reference to kernel
>   object.
>
> - Proposing the API
>   Wiring up of the uretprobe system call si in separate change,

typo: is

>   selftests and man page changes are part of the patchset.
>
> - Generic System Call Implementation
>   There's no CONFIG option for the new functionality because it
>   keeps the same behaviour from the user POV.
>
> - x86 System Call Implementation
>   It's 64-bit syscall only.
>
> - Compatibility System Calls (Generic)
>   N/A uretprobe syscall has no arguments and is not supported
>   for compat processes.
>
> - Compatibility System Calls (x86)
>   N/A uretprobe syscall is not supported for compat processes.
>
> - System Calls Returning Elsewhere
>   N/A.
>
> - Other Details
>   N/A.
>
> - Testing
>   Adding new bpf selftests and ran ltp on top of this change.
>
> - Man Page
>   Attached.
>
> - Do not call System Calls in the Kernel
>   N/A.
>
>
> [0] https://lore.kernel.org/bpf/ZeCXHKJ--iYYbmLj@krava/
> ---
> Jiri Olsa (6):
>       uprobe: Wire up uretprobe system call
>       uprobe: Add uretprobe syscall to speed up return probe
>       selftests/bpf: Add uretprobe syscall test for regs integrity
>       selftests/bpf: Add uretprobe syscall test for regs changes
>       selftests/bpf: Add uretprobe syscall call from user space test
>       selftests/bpf: Add uretprobe compat test
>
>  arch/x86/entry/syscalls/syscall_64.tbl                      |   1 +
>  arch/x86/kernel/uprobes.c                                   | 115 ++++++++++++++++++++++++++++
>  include/linux/syscalls.h                                    |   2 +
>  include/linux/uprobes.h                                     |   3 +
>  include/uapi/asm-generic/unistd.h                           |   5 +-
>  kernel/events/uprobes.c                                     |  24 ++++--
>  kernel/sys_ni.c                                             |   2 +
>  tools/include/linux/compiler.h                              |   4 +
>  tools/testing/selftests/bpf/.gitignore                      |   1 +
>  tools/testing/selftests/bpf/Makefile                        |   7 +-
>  tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c       | 123 ++++++++++++++++++++++++++++-
>  tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c     | 382 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tools/testing/selftests/bpf/progs/uprobe_syscall.c          |  15 ++++
>  tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c |  17 +++++
>  14 files changed, 691 insertions(+), 10 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
>  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c
>  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
>
> Jiri Olsa (1):
>       man2: Add uretprobe syscall page
>
>  man2/uretprobe.2 | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 man2/uretprobe.2


* Re: [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up
  2024-05-02 16:43 ` [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Andrii Nakryiko
@ 2024-05-02 20:04   ` Jiri Olsa
  2024-05-03 18:03     ` Andrii Nakryiko
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 20:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Thu, May 02, 2024 at 09:43:02AM -0700, Andrii Nakryiko wrote:
> On Thu, May 2, 2024 at 5:23 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > hi,
> > as part of the effort on speeding up the uprobes [0] coming with
> > return uprobe optimization by using syscall instead of the trap
> > on the uretprobe trampoline.
> >
> > The speed up depends on instruction type that uprobe is installed
> > and depends on specific HW type, please check patch 1 for details.
> >
> > Patches 1-6 are based on bpf-next/master, but patches 1 and 2 are
> > apply-able on linux-trace.git tree probes/for-next branch.
> > Patch 7 is based on man-pages master.
> >
> > v4 changes:
> >   - added acks [Oleg,Andrii,Masami]
> >   - reworded the man page and adding more info to NOTE section [Masami]
> >   - rewrote bpf tests not to use trace_pipe [Andrii]
> >   - cc-ed linux-man list
> >
> > Also available at:
> >   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> >   uretprobe_syscall
> >
> 
> It looks great to me, thanks! Unfortunately BPF CI build is broken,
> probably due to some of the Makefile additions, please investigate and
> fix (or we'll need to fix something on BPF CI side), but it looks like
> you'll need another revision, unfortunately.
> 
> pw-bot: cr
> 
>   [0] https://github.com/kernel-patches/bpf/actions/runs/8923849088/job/24509002194

yes, I think it's missing the 32-bit libc for uprobe_compat binary,
probably it needs to be added to github.com:libbpf/ci.git setup-build-env/action.yml ?
hm but I'm not sure how to test it, need to check

> 
> 
> 
> But while we are at it.
> 
> Masami, Oleg,
> 
> What should be the logistics of landing this? Can/should we route this
> through the bpf-next tree, given there are lots of BPF-based
> selftests? Or you want to take this through
> linux-trace/probes/for-next? In the latter case, it's probably better
> to apply only the first two patches to probes/for-next and the rest
> should still go through the bpf-next tree (otherwise we are running

I think this was the plan, previously mentioned in here:
https://lore.kernel.org/bpf/20240423000943.478ccf1e735a63c6c1b4c66b@kernel.org/

> into conflicts in BPF selftests). Previously we were handling such
> cross-tree dependencies by creating a named branch or tag, and merging
> it into bpf-next (so that all SHAs are preserved). It's a bunch of
> extra work for everyone involved, so the simplest way would be to just
> land through bpf-next, of course. But let me know your preferences.
> 
> Thanks!
> 
> > thanks,
> > jirka
> >
> >
> > Notes to check list items in Documentation/process/adding-syscalls.rst:
> >
> > - System Call Alternatives
> >   New syscall seems like the best way in here, becase we need
> 
> typo (thanks, Gmail): because

ok

> 
> >   just to quickly enter kernel with no extra arguments processing,
> >   which we'd need to do if we decided to use another syscall.
> >
> > - Designing the API: Planning for Extension
> >   The uretprobe syscall is very specific and most likely won't be
> >   extended in the future.
> >
> >   At the moment it does not take any arguments and even if it does
> >   in future, it's allowed to be called only from trampoline prepared
> >   by kernel, so there'll be no broken user.
> >
> > - Designing the API: Other Considerations
> >   N/A because uretprobe syscall does not return reference to kernel
> >   object.
> >
> > - Proposing the API
> >   Wiring up of the uretprobe system call si in separate change,
> 
> typo: is

ok, thanks

jirka


* Re: [PATCHv4 7/7] man2: Add uretprobe syscall page
  2024-05-02 13:43   ` Alejandro Colomar
@ 2024-05-02 20:13     ` Jiri Olsa
  2024-05-02 22:06       ` Alejandro Colomar
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-02 20:13 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Thu, May 02, 2024 at 03:43:27PM +0200, Alejandro Colomar wrote:
> Hi Jiri,
> 
> On Thu, May 02, 2024 at 02:23:13PM +0200, Jiri Olsa wrote:
> > Adding man page for new uretprobe syscall.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  man2/uretprobe.2 | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 45 insertions(+)
> >  create mode 100644 man2/uretprobe.2
> > 
> > diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
> > new file mode 100644
> > index 000000000000..08fe6a670430
> > --- /dev/null
> > +++ b/man2/uretprobe.2
> > @@ -0,0 +1,45 @@
> > +.\" Copyright (C) 2024, Jiri Olsa <jolsa@kernel.org>
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +uretprobe \- execute pending return uprobes
> > +.SH SYNOPSIS
> > +.nf
> > +.B int uretprobe(void)
> > +.fi
> > +.SH DESCRIPTION
> > +Kernel is using
> > +.BR uretprobe()
> > +syscall to trigger uprobe return probe consumers instead of using
> > +standard breakpoint instruction.
> > +
> 
> Please use .P instead of a blank.  See man-pages(7):
> 
>    Formatting conventions (general)
>      Paragraphs should be separated by suitable markers (usually either
>      .P or .IP).  Do not separate paragraphs using blank lines, as this
>      results in poor rendering in some output formats  (such  as  Post‐
>      Script and PDF).

ok, will do

> 
> > +The uretprobe syscall is not supposed to be called directly by user, it's allowed
> 
> s/by user/by the user/

ok

> 
> > +to be invoked only through user space trampoline provided by kernel.
> 
> s/user space/user-space/

ok

> 
> Missing a few 'the' too, here and in the rest of the page.

ok, will check

> 
> > +When called from outside of this trampoline, the calling process will receive
> > +.BR SIGILL .
> > +
> > +.SH RETURN VALUE
> > +.BR uretprobe()
> 
> You're missing a space here:
> 
> .BR uretprobe ()

ok

> 
> > +return value is specific for given architecture.
> > +
> > +.SH VERSIONS
> > +This syscall is not specified in POSIX,
> > +and details of its behavior vary across systems.
> > +.SH STANDARDS
> > +None.
> 
> You could add a HISTORY section.

ok, IIUC for this syscall it should contain just kernel version where
it got merged, right?

> 
> Have a lovely day!

thanks for review,
jirka


* Re: [PATCHv4 7/7] man2: Add uretprobe syscall page
  2024-05-02 20:13     ` Jiri Olsa
@ 2024-05-02 22:06       ` Alejandro Colomar
  0 siblings, 0 replies; 27+ messages in thread
From: Alejandro Colomar @ 2024-05-02 22:06 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski


Hi Jiri,

On Thu, May 02, 2024 at 10:13:12PM +0200, Jiri Olsa wrote:
> > You could add a HISTORY section.
> 
> ok, IIUC for this syscall it should contain just kernel version where
> it got merged, right?

Yep.

> 
> > 
> > Have a lovely day!
> 
> thanks for review,
> jirka

Thanks for the page.

Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>
A client is hiring kernel driver, mm, and/or crypto developers;
contact me if interested.



* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-02 12:23 ` [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe Jiri Olsa
@ 2024-05-03 11:34   ` Peter Zijlstra
  2024-05-03 13:04     ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2024-05-03 11:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Thomas Gleixner,
	Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski, rick.p.edgecombe

On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote:
> Adding uretprobe syscall instead of trap to speed up return probe.
> 
> At the moment the uretprobe setup/path is:
> 
>   - install entry uprobe
> 
>   - when the uprobe is hit, it overwrites probed function's return address
>     on stack with address of the trampoline that contains breakpoint
>     instruction
> 
>   - the breakpoint trap code handles the uretprobe consumers execution and
>     jumps back to original return address
> 
> This patch replaces the above trampoline's breakpoint instruction with new
> uretprobe syscall call. This syscall does exactly the same job as the trap
> with some more extra work:
> 
>   - syscall trampoline must save original value for rax/r11/rcx registers
>     on stack - rax is set to syscall number and r11/rcx are changed and
>     used by syscall instruction
> 
>   - the syscall code reads the original values of those registers and
>     restore those values in task's pt_regs area
> 
>   - only caller from trampoline exposed in '[uprobes]' is allowed,
>     the process will receive SIGILL signal otherwise
> 

Did you consider shadow stacks? IIRC we currently have userspace shadow
stack support available, and that will utterly break all of this.

It would be really nice if the new scheme would consider shadow stacks.
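For readers new to the return-probe path quoted above (install the entry uprobe, rewrite the return address, run the consumers from the trampoline, resume at the original address), here is a minimal user-space sketch of the bookkeeping. It is a toy model, not the kernel implementation: `struct task`, `uprobe_entry()`, `handle_trampoline()`, `do_ret()` and the `TRAMP` address are hypothetical, the stack holds only return addresses, and corner cases the kernel handles are ignored. The saved addresses are kept LIFO so that nested or recursive calls of a probed function each resume at the right place.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: all names below are hypothetical, not kernel code. */
#define DEPTH 16
#define TRAMP 0x7fff0000u  /* hypothetical trampoline address */

struct task {
	uint64_t stack[DEPTH]; /* user stack; holds return addresses only */
	int sp;
	uint64_t ri[DEPTH];    /* saved original return addresses (LIFO) */
	int nri;
	int consumers_run;
};

/* CALL: push the return address. */
static void call(struct task *t, uint64_t ret_addr)
{
	assert(t->sp < DEPTH);
	t->stack[t->sp++] = ret_addr;
}

/* Entry uprobe hit: remember the real return address and redirect it. */
static void uprobe_entry(struct task *t)
{
	assert(t->sp > 0 && t->nri < DEPTH);
	t->ri[t->nri++] = t->stack[t->sp - 1];
	t->stack[t->sp - 1] = TRAMP;
}

/* Trampoline handler (breakpoint trap today, uretprobe syscall in this
 * series): run the consumers, then resume at the newest saved address. */
static uint64_t handle_trampoline(struct task *t)
{
	assert(t->nri > 0);
	t->consumers_run++;
	return t->ri[--t->nri];
}

/* RET from the probed function: a hijacked return lands in the trampoline. */
static uint64_t do_ret(struct task *t)
{
	assert(t->sp > 0);
	uint64_t ip = t->stack[--t->sp];
	return ip == TRAMP ? handle_trampoline(t) : ip;
}
```

With two nested probed calls, the first RET resumes at the inner return address and the second at the outer one, with the consumers run once per return.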


* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 11:34   ` Peter Zijlstra
@ 2024-05-03 13:04     ` Jiri Olsa
  2024-05-03 15:53       ` Edgecombe, Rick P
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-03 13:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Thomas Gleixner,
	Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski, rick.p.edgecombe

On Fri, May 03, 2024 at 01:34:53PM +0200, Peter Zijlstra wrote:
> On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote:
> > Adding uretprobe syscall instead of trap to speed up return probe.
> > 
> > At the moment the uretprobe setup/path is:
> > 
> >   - install entry uprobe
> > 
> >   - when the uprobe is hit, it overwrites probed function's return address
> >     on stack with address of the trampoline that contains breakpoint
> >     instruction
> > 
> >   - the breakpoint trap code handles the uretprobe consumers execution and
> >     jumps back to original return address
> > 
> > This patch replaces the above trampoline's breakpoint instruction with new
> > uretprobe syscall call. This syscall does exactly the same job as the trap
> > with some more extra work:
> > 
> >   - syscall trampoline must save original value for rax/r11/rcx registers
> >     on stack - rax is set to syscall number and r11/rcx are changed and
> >     used by syscall instruction
> > 
> >   - the syscall code reads the original values of those registers and
> >     restore those values in task's pt_regs area
> > 
> >   - only caller from trampoline exposed in '[uprobes]' is allowed,
> >     the process will receive SIGILL signal otherwise
> > 
> 
> Did you consider shadow stacks? IIRC we currently have userspace shadow
> stack support available, and that will utterly break all of this.

nope.. I guess it's the extra ret instruction in the trampoline that would
make it crash?

> 
> It would be really nice if the new scheme would consider shadow stacks.

I seem to have the hw with support for user_shstk, let me test that

thanks,
jirka


* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 13:04     ` Jiri Olsa
@ 2024-05-03 15:53       ` Edgecombe, Rick P
  2024-05-03 19:18         ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Edgecombe, Rick P @ 2024-05-03 15:53 UTC (permalink / raw)
  To: olsajiri, peterz
  Cc: songliubraving, luto, mhiramat, andrii, linux-api,
	john.fastabend, linux-kernel, mingo, rostedt, ast, tglx, yhs,
	linux-man, oleg, daniel, linux-trace-kernel, bpf, bp, x86

On Fri, 2024-05-03 at 15:04 +0200, Jiri Olsa wrote:
> On Fri, May 03, 2024 at 01:34:53PM +0200, Peter Zijlstra wrote:
> > On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote:
> > > Adding uretprobe syscall instead of trap to speed up return probe.
> > > 
> > > At the moment the uretprobe setup/path is:
> > > 
> > >    - install entry uprobe
> > > 
> > >    - when the uprobe is hit, it overwrites probed function's return address
> > >      on stack with address of the trampoline that contains breakpoint
> > >      instruction
> > > 
> > >    - the breakpoint trap code handles the uretprobe consumers execution and
> > >      jumps back to original return address

Hi,

I worked on the x86 shadow stack support.

I didn't know uprobes did anything like this. In hindsight I should have looked
more closely. The current upstream behavior is to overwrite the return address
on the stack?

Stupid uprobes question - what is actually overwriting the return address on the
stack? Is it the kernel? If so perhaps the kernel could just update the shadow
stack at the same time.

> > > 
> > > This patch replaces the above trampoline's breakpoint instruction with new
> > > uretprobe syscall call. This syscall does exactly the same job as the trap
> > > with some more extra work:
> > > 
> > >    - syscall trampoline must save original value for rax/r11/rcx registers
> > >      on stack - rax is set to syscall number and r11/rcx are changed and
> > >      used by syscall instruction
> > > 
> > >    - the syscall code reads the original values of those registers and
> > >      restore those values in task's pt_regs area
> > > 
> > >    - only caller from trampoline exposed in '[uprobes]' is allowed,
> > >      the process will receive SIGILL signal otherwise
> > > 
> > 
> > Did you consider shadow stacks? IIRC we currently have userspace shadow
> > stack support available, and that will utterly break all of this.
> 
> nope.. I guess it's the extra ret instruction in the trampoline that would
> make it crash?

The original behavior seems problematic for shadow stack IIUC. I'm not sure of
the additional breakage with the new behavior.

Roughly, how shadow stack works is there is an additional protected stack for
the app thread. The HW pushes to the shadow stack with CALL, and pops from
it with RET. But it also continues to push and pop from the normal stack. On
pop, if the values don't match between the two stacks, an exception is
generated. The whole point is to prevent the app from overwriting its stack
return address to return to random places.

Userspace cannot (normally) write to the shadow stack, but the kernel can do
this or adjust the SSP (shadow stack pointer). So in the kernel (for things like
sigreturn) there is an ability to do what is needed. Ptracers also can do things
like this.
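To make the CALL/RET check above concrete, here is a toy C model (hypothetical names, not kernel or hardware interfaces). It shows why rewriting only the normal-stack return address, as uprobes does today, trips the shadow stack check, and why the kernel also rewriting the matching shadow stack entry keeps the two stacks in sync. On real x86 hardware the mismatch raises a control-protection (#CP) exception; here it only sets a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one thread; names and sizes are hypothetical. */
#define DEPTH 16

struct cpu {
	uint64_t stack[DEPTH];   /* normal stack, writable by the app */
	uint64_t shstk[DEPTH];   /* shadow stack, writable only by "kernel" */
	int sp, ssp;             /* number of entries on each stack */
	bool fault;              /* set when RET sees a mismatch */
};

/* CALL: push the return address on both stacks. */
static void call(struct cpu *c, uint64_t ret_addr)
{
	assert(c->sp < DEPTH && c->ssp < DEPTH);
	c->stack[c->sp++] = ret_addr;
	c->shstk[c->ssp++] = ret_addr;
}

/* RET: pop both stacks and compare; returns the address jumped to. */
static uint64_t ret(struct cpu *c)
{
	assert(c->sp > 0 && c->ssp > 0);
	uint64_t a = c->stack[--c->sp];
	uint64_t b = c->shstk[--c->ssp];
	if (a != b)
		c->fault = true;
	return a;
}

/* Current uprobe behavior: only the normal stack is rewritten. */
static void hijack_normal_only(struct cpu *c, uint64_t tramp)
{
	c->stack[c->sp - 1] = tramp;
}

/* Direction discussed here: the kernel rewrites both stacks. */
static void hijack_both(struct cpu *c, uint64_t tramp)
{
	c->stack[c->sp - 1] = tramp;
	c->shstk[c->ssp - 1] = tramp;
}
```

Both variants still return to the trampoline address; only the one that leaves the shadow stack untouched is flagged as a fault.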


* Re: [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up
  2024-05-02 20:04   ` Jiri Olsa
@ 2024-05-03 18:03     ` Andrii Nakryiko
  2024-05-03 20:39       ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Andrii Nakryiko @ 2024-05-03 18:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Thu, May 2, 2024 at 1:04 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, May 02, 2024 at 09:43:02AM -0700, Andrii Nakryiko wrote:
> > On Thu, May 2, 2024 at 5:23 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > hi,
> > > as part of the effort on speeding up the uprobes [0] coming with
> > > return uprobe optimization by using syscall instead of the trap
> > > on the uretprobe trampoline.
> > >
> > > The speed up depends on instruction type that uprobe is installed
> > > and depends on specific HW type, please check patch 1 for details.
> > >
> > > Patches 1-6 are based on bpf-next/master, but patches 1 and 2 are
> > > apply-able on linux-trace.git tree probes/for-next branch.
> > > Patch 7 is based on man-pages master.
> > >
> > > v4 changes:
> > >   - added acks [Oleg,Andrii,Masami]
> > >   - reworded the man page and adding more info to NOTE section [Masami]
> > >   - rewrote bpf tests not to use trace_pipe [Andrii]
> > >   - cc-ed linux-man list
> > >
> > > Also available at:
> > >   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > >   uretprobe_syscall
> > >
> >
> > It looks great to me, thanks! Unfortunately BPF CI build is broken,
> > probably due to some of the Makefile additions, please investigate and
> > fix (or we'll need to fix something on BPF CI side), but it looks like
> > you'll need another revision, unfortunately.
> >
> > pw-bot: cr
> >
> >   [0] https://github.com/kernel-patches/bpf/actions/runs/8923849088/job/24509002194
>
> yes, I think it's missing the 32-bit libc for uprobe_compat binary,
> probably it needs to be added to github.com:libbpf/ci.git setup-build-env/action.yml ?
> hm but I'm not sure how to test it, need to check

You can create a custom PR directly against Github repo
(kernel-patches/bpf) and BPF CI will run all the tests on your custom
code. This way you can iterate without spamming the mailing list.

But I'm just wondering if it's worth complicating setup just for
testing this x32 compat mode. So maybe just dropping one of those
patches would be better?

>
> >
> >
> >
> > But while we are at it.
> >
> > Masami, Oleg,
> >
> > What should be the logistics of landing this? Can/should we route this
> > through the bpf-next tree, given there are lots of BPF-based
> > selftests? Or you want to take this through
> > linux-trace/probes/for-next? In the latter case, it's probably better
> > to apply only the first two patches to probes/for-next and the rest
> > should still go through the bpf-next tree (otherwise we are running
>
> I think this was the plan, previously mentioned in here:
> https://lore.kernel.org/bpf/20240423000943.478ccf1e735a63c6c1b4c66b@kernel.org/
>

Ok, then we'll have to land this patch set as two separate ones. It's
fine, let's figure out if you need to do anything for shadow stacks
and try to land it soon.

> > into conflicts in BPF selftests). Previously we were handling such
> > cross-tree dependencies by creating a named branch or tag, and merging
> > it into bpf-next (so that all SHAs are preserved). It's a bunch of
> > extra work for everyone involved, so the simplest way would be to just
> > land through bpf-next, of course. But let me know your preferences.
> >
> > Thanks!
> >
> > > thanks,
> > > jirka
> > >
> > >
> > > Notes to check list items in Documentation/process/adding-syscalls.rst:
> > >
> > > - System Call Alternatives
> > >   New syscall seems like the best way in here, becase we need
> >
> > typo (thanks, Gmail): because
>
> ok
>
> >
> > >   just to quickly enter kernel with no extra arguments processing,
> > >   which we'd need to do if we decided to use another syscall.
> > >
> > > - Designing the API: Planning for Extension
> > >   The uretprobe syscall is very specific and most likely won't be
> > >   extended in the future.
> > >
> > >   At the moment it does not take any arguments and even if it does
> > >   in future, it's allowed to be called only from trampoline prepared
> > >   by kernel, so there'll be no broken user.
> > >
> > > - Designing the API: Other Considerations
> > >   N/A because uretprobe syscall does not return reference to kernel
> > >   object.
> > >
> > > - Proposing the API
> > >   Wiring up of the uretprobe system call si in separate change,
> >
> > typo: is
>
> ok, thanks
>
> jirka


* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 15:53       ` Edgecombe, Rick P
@ 2024-05-03 19:18         ` Jiri Olsa
  2024-05-03 19:38           ` Edgecombe, Rick P
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-03 19:18 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: olsajiri, peterz, songliubraving, luto, mhiramat, andrii,
	linux-api, john.fastabend, linux-kernel, mingo, rostedt, ast,
	tglx, yhs, linux-man, oleg, daniel, linux-trace-kernel, bpf, bp,
	x86

On Fri, May 03, 2024 at 03:53:15PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2024-05-03 at 15:04 +0200, Jiri Olsa wrote:
> > On Fri, May 03, 2024 at 01:34:53PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote:
> > > > Adding uretprobe syscall instead of trap to speed up return probe.
> > > > 
> > > > At the moment the uretprobe setup/path is:
> > > > 
> > > >    - install entry uprobe
> > > > 
> > > >    - when the uprobe is hit, it overwrites probed function's return address
> > > >      on stack with address of the trampoline that contains breakpoint
> > > >      instruction
> > > > 
> > > >    - the breakpoint trap code handles the uretprobe consumers execution and
> > > >      jumps back to original return address
> 
> Hi,
> 
> I worked on the x86 shadow stack support.
> 
> I didn't know uprobes did anything like this. In hindsight I should have looked
> more closely. The current upstream behavior is to overwrite the return address
> on the stack?
> 
> Stupid uprobes question - what is actually overwriting the return address on the
> stack? Is it the kernel? If so perhaps the kernel could just update the shadow
> stack at the same time.

yes, it's in kernel - arch_uretprobe_hijack_return_addr .. so I guess
we need to update the shadow stack with the new return value as well

> 
> > > > 
> > > > This patch replaces the above trampoline's breakpoint instruction with new
> > > > uretprobe syscall call. This syscall does exactly the same job as the trap
> > > > with some more extra work:
> > > > 
> > > >    - syscall trampoline must save original value for rax/r11/rcx registers
> > > >      on stack - rax is set to syscall number and r11/rcx are changed and
> > > >      used by syscall instruction
> > > > 
> > > >    - the syscall code reads the original values of those registers and
> > > >      restore those values in task's pt_regs area
> > > > 
> > > >    - only caller from trampoline exposed in '[uprobes]' is allowed,
> > > >      the process will receive SIGILL signal otherwise
> > > > 
> > > 
> > > Did you consider shadow stacks? IIRC we currently have userspace shadow
> > > stack support available, and that will utterly break all of this.
> > 
> > nope.. I guess it's the extra ret instruction in the trampoline that would
> > make it crash?
> 
> The original behavior seems problematic for shadow stack IIUC. I'm not sure of
> the additional breakage with the new behavior.

I can see it's broken also for current uprobes

> 
> Roughly, how shadow stack works is there is an additional protected stack for
> the app thread. The HW pushes to the shadow stack with CALL, and pops from
> it with RET. But it also continues to push and pop from the normal stack. On
> pop, if the values don't match between the two stacks, an exception is
> generated. The whole point is to prevent the app from overwriting its stack
> return address to return to random places.
> 
> Userspace cannot (normally) write to the shadow stack, but the kernel can do
> this or adjust the SSP (shadow stack pointer). So in the kernel (for things like
> sigreturn) there is an ability to do what is needed. Ptracers also can do things
> like this.

hack below seems to fix it for the current uprobe setup,
we need similar fix for the uretprobe syscall trampoline setup

jirka


---
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..99a0948a3b79 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -21,6 +21,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
+void uprobe_change_stack(unsigned long addr);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 59e15dd8d0f8..d2c4dbe5843c 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -577,3 +577,11 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 		return wrss_control(true);
 	return -EINVAL;
 }
+
+void uprobe_change_stack(unsigned long addr)
+{
+	unsigned long ssp;
+
+	ssp = get_user_shstk_addr();
+	write_user_shstk_64((u64 __user *)ssp, (u64)addr);
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 81e6ee95784d..88afbeaacb8f 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -348,7 +348,7 @@ void *arch_uprobe_trampoline(unsigned long *psize)
 	 * only for native 64-bit process, the compat process still uses
 	 * standard breakpoint.
 	 */
-	if (user_64bit_mode(regs)) {
+	if (0 && user_64bit_mode(regs)) {
 		*psize = uretprobe_syscall_end - uretprobe_syscall_entry;
 		return uretprobe_syscall_entry;
 	}
@@ -1191,8 +1191,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
 		return orig_ret_vaddr;
 
 	nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
-	if (likely(!nleft))
+	if (likely(!nleft)) {
+		uprobe_change_stack(trampoline_vaddr);
 		return orig_ret_vaddr;
+	}
 
 	if (nleft != rasize) {
 		pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",


* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 19:18         ` Jiri Olsa
@ 2024-05-03 19:38           ` Edgecombe, Rick P
  2024-05-03 20:17             ` Jiri Olsa
  2024-05-03 23:01             ` Deepak Gupta
  0 siblings, 2 replies; 27+ messages in thread
From: Edgecombe, Rick P @ 2024-05-03 19:38 UTC (permalink / raw)
  To: olsajiri
  Cc: songliubraving, luto, mhiramat, andrii, linux-api,
	john.fastabend, debug, linux-kernel, mingo, rostedt, ast, tglx,
	yhs, oleg, linux-man, daniel, peterz, linux-trace-kernel, bp,
	bpf, x86, broonie

+Some more shadow stack folks from other archs. We are discussing how uretprobes
work with shadow stack.

Context:
https://lore.kernel.org/lkml/ZjU4ganRF1Cbiug6@krava/

On Fri, 2024-05-03 at 21:18 +0200, Jiri Olsa wrote:
> 
> hack below seems to fix it for the current uprobe setup,
> we need similar fix for the uretprobe syscall trampoline setup

It seems like a reasonable direction.

Security-wise, applications cannot do this on themselves, or it is an otherwise
privileged thing right?




* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 19:38           ` Edgecombe, Rick P
@ 2024-05-03 20:17             ` Jiri Olsa
  2024-05-03 20:35               ` Edgecombe, Rick P
  2024-05-03 23:01             ` Deepak Gupta
  1 sibling, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-03 20:17 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: olsajiri, songliubraving, luto, mhiramat, andrii, linux-api,
	john.fastabend, debug, linux-kernel, mingo, rostedt, ast, tglx,
	yhs, oleg, linux-man, daniel, peterz, linux-trace-kernel, bp,
	bpf, x86, broonie

On Fri, May 03, 2024 at 07:38:18PM +0000, Edgecombe, Rick P wrote:
> +Some more shadow stack folks from other archs. We are discussing how uretprobes
> work with shadow stack.
> 
> Context:
> https://lore.kernel.org/lkml/ZjU4ganRF1Cbiug6@krava/
> 
> On Fri, 2024-05-03 at 21:18 +0200, Jiri Olsa wrote:
> > 
> > hack below seems to fix it for the current uprobe setup,
> > we need similar fix for the uretprobe syscall trampoline setup
> 
> It seems like a reasonable direction.
> 
> Security-wise, applications cannot do this on themselves, or it is an otherwise
> privileged thing right?

when uretprobe is created, kernel overwrites the return address on user
stack to point to user space trampoline, so the setup is in kernel hands

with the hack below on top of this patchset I'm no longer seeing shadow
stack app crash on uretprobe.. I'll try to polish it and send out next
week, any suggestions are welcome ;-)

thanks,
jirka


---
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..d374305a6851 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -21,6 +21,8 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
+void uprobe_change_stack(unsigned long addr);
+void uprobe_push_stack(unsigned long addr);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 59e15dd8d0f8..804c446231d9 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -577,3 +577,24 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 		return wrss_control(true);
 	return -EINVAL;
 }
+
+void uprobe_change_stack(unsigned long addr)
+{
+	unsigned long ssp;
+
+	ssp = get_user_shstk_addr();
+	write_user_shstk_64((u64 __user *)ssp, (u64)addr);
+}
+
+void uprobe_push_stack(unsigned long addr)
+{
+	unsigned long ssp;
+
+	ssp = get_user_shstk_addr();
+	ssp -= SS_FRAME_SIZE;
+	write_user_shstk_64((u64 __user *)ssp, (u64)addr);
+
+	fpregs_lock_and_load();
+	wrmsrl(MSR_IA32_PL3_SSP, ssp);
+	fpregs_unlock();
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 81e6ee95784d..259457838020 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -416,6 +416,7 @@ SYSCALL_DEFINE0(uretprobe)
 	regs->r11 = regs->flags;
 	regs->cx  = regs->ip;
 
+	uprobe_push_stack(r11_cx_ax[2]);
 	return regs->ax;
 
 sigill:
@@ -1191,8 +1192,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
 		return orig_ret_vaddr;
 
 	nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
-	if (likely(!nleft))
+	if (likely(!nleft)) {
+		uprobe_change_stack(trampoline_vaddr);
 		return orig_ret_vaddr;
+	}
 
 	if (nleft != rasize) {
 		pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",
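A toy C model (not kernel code) of why the hack above is needed, assuming the usual x86 shadow stack behaviour: a call pushes the return address to both the normal stack and the shadow stack, and ret faults when the two popped values differ. The uretprobe hijack rewrites only the normal stack copy, so the shadow stack copy has to be rewritten too, which is what uprobe_change_stack() does in the hack. struct model, do_call(), do_ret() and hijack() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. */
#define DEPTH 16

struct model {
	unsigned long stack[DEPTH];	/* normal stack: writable via copy_to_user() */
	unsigned long shstk[DEPTH];	/* shadow stack: needs dedicated writes */
	size_t sp, ssp;			/* number of entries on each stack */
};

/* call: the CPU pushes the return address to both stacks */
static void do_call(struct model *m, unsigned long ret_addr)
{
	assert(m->sp < DEPTH && m->ssp < DEPTH);
	m->stack[m->sp++] = ret_addr;
	m->shstk[m->ssp++] = ret_addr;
}

/* ret: pop both stacks; return the target, or 0 to signal a #CP fault */
static unsigned long do_ret(struct model *m)
{
	unsigned long a, b;

	assert(m->sp > 0 && m->ssp > 0);
	a = m->stack[--m->sp];
	b = m->shstk[--m->ssp];
	return a == b ? a : 0;
}

/*
 * uretprobe hijack: replace the top return address with the trampoline
 * address; with fix_shstk set, mirror the change on the shadow stack
 * (the role of uprobe_change_stack() in the hack). Returns the
 * original return address.
 */
static unsigned long hijack(struct model *m, unsigned long tramp, bool fix_shstk)
{
	unsigned long orig;

	assert(m->sp > 0 && m->ssp > 0);
	orig = m->stack[m->sp - 1];
	m->stack[m->sp - 1] = tramp;
	if (fix_shstk)
		m->shstk[m->ssp - 1] = tramp;
	return orig;
}
```

Without the shadow stack update, do_ret() reports a mismatch when the probed function returns; with it, execution lands in the trampoline, which still has to get back to the original address (the hack handles that part in the uretprobe syscall via uprobe_push_stack()).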

* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 20:17             ` Jiri Olsa
@ 2024-05-03 20:35               ` Edgecombe, Rick P
  2024-05-06 10:56                 ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Edgecombe, Rick P @ 2024-05-03 20:35 UTC (permalink / raw)
  To: olsajiri
  Cc: songliubraving, luto, mhiramat, andrii, debug, john.fastabend,
	linux-api, linux-kernel, mingo, rostedt, ast, tglx, linux-man,
	oleg, yhs, daniel, peterz, linux-trace-kernel, bp, bpf, x86,
	broonie

On Fri, 2024-05-03 at 22:17 +0200, Jiri Olsa wrote:
> when a uretprobe is created, the kernel overwrites the return address on the user
> stack to point to the user space trampoline, so the setup is in the kernel's hands

I mean for uprobes in general. I didn't have any specific ideas in mind, but
in general when we give the kernel more abilities around shadow stack we have to
think if attackers could use it to work around shadow stack protections.

> 
> with the hack below on top of this patchset I'm no longer seeing the shadow
> stack app crash on uretprobe. I'll try to polish it and send it out next
> week, any suggestions are welcome ;-)

Thanks. Some comments below.

> 
> thanks,
> jirka
> 
> 
> ---
> diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
> index 42fee8959df7..d374305a6851 100644
> --- a/arch/x86/include/asm/shstk.h
> +++ b/arch/x86/include/asm/shstk.h
> @@ -21,6 +21,8 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
>  void shstk_free(struct task_struct *p);
>  int setup_signal_shadow_stack(struct ksignal *ksig);
>  int restore_signal_shadow_stack(void);
> +void uprobe_change_stack(unsigned long addr);
> +void uprobe_push_stack(unsigned long addr);

Maybe name them:
shstk_update_last_frame();
shstk_push_frame();


>  #else
>  static inline long shstk_prctl(struct task_struct *task, int option,
>                                unsigned long arg2) { return -EINVAL; }
> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index 59e15dd8d0f8..804c446231d9 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -577,3 +577,24 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
>                 return wrss_control(true);
>         return -EINVAL;
>  }
> +
> +void uprobe_change_stack(unsigned long addr)
> +{
> +       unsigned long ssp;

Probably want something like:

	if (!features_enabled(ARCH_SHSTK_SHSTK))
		return;

So this doesn't attempt the shadow stack write below if shadow stack is disabled.

> +
> +       ssp = get_user_shstk_addr();
> +       write_user_shstk_64((u64 __user *)ssp, (u64)addr);
> +}

Can we know that there was a valid return address just before this point on the
stack? Or could it be a sigframe or something?

> +
> +void uprobe_push_stack(unsigned long addr)
> +{
> +       unsigned long ssp;

	if (!features_enabled(ARCH_SHSTK_SHSTK))
		return;

> +
> +       ssp = get_user_shstk_addr();
> +       ssp -= SS_FRAME_SIZE;
> +       write_user_shstk_64((u64 __user *)ssp, (u64)addr);
> +
> +       fpregs_lock_and_load();
> +       wrmsrl(MSR_IA32_PL3_SSP, ssp);
> +       fpregs_unlock();
> +}
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 81e6ee95784d..259457838020 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -416,6 +416,7 @@ SYSCALL_DEFINE0(uretprobe)
>         regs->r11 = regs->flags;
>         regs->cx  = regs->ip;
>  
> +       uprobe_push_stack(r11_cx_ax[2]);

I'm concerned this could be used to push arbitrary frames to the shadow stack.
Couldn't an attacker do a jump to the point that calls this syscall? Maybe this
is what peterz was raising.

>         return regs->ax;
>  
>  sigill:
> @@ -1191,8 +1192,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
>                 return orig_ret_vaddr;
>  
>         nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
> -       if (likely(!nleft))
> +       if (likely(!nleft)) {
> +               uprobe_change_stack(trampoline_vaddr);
>                 return orig_ret_vaddr;
> +       }
>  
>         if (nleft != rasize) {
>                 pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",


* Re: [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up
  2024-05-03 18:03     ` Andrii Nakryiko
@ 2024-05-03 20:39       ` Jiri Olsa
  2024-05-07  7:47         ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2024-05-03 20:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Fri, May 03, 2024 at 11:03:24AM -0700, Andrii Nakryiko wrote:
> On Thu, May 2, 2024 at 1:04 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, May 02, 2024 at 09:43:02AM -0700, Andrii Nakryiko wrote:
> > > On Thu, May 2, 2024 at 5:23 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > >
> > > > hi,
> > > > as part of the effort on speeding up uprobes [0], this patchset brings
> > > > a return uprobe optimization that uses a syscall instead of the trap
> > > > on the uretprobe trampoline.
> > > >
> > > > The speed-up depends on the type of instruction the uprobe is installed
> > > > on and on the specific HW type; please check patch 1 for details.
> > > >
> > > > Patches 1-6 are based on bpf-next/master, but patches 1 and 2 can also
> > > > be applied on the linux-trace.git tree probes/for-next branch.
> > > > Patch 7 is based on man-pages master.
> > > >
> > > > v4 changes:
> > > >   - added acks [Oleg,Andrii,Masami]
> > > >   - reworded the man page and added more info to the NOTE section [Masami]
> > > >   - rewrote bpf tests not to use trace_pipe [Andrii]
> > > >   - cc-ed linux-man list
> > > >
> > > > Also available at:
> > > >   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > > >   uretprobe_syscall
> > > >
> > >
> > > It looks great to me, thanks! Unfortunately BPF CI build is broken,
> > > probably due to some of the Makefile additions, please investigate and
> > > fix (or we'll need to fix something on BPF CI side), but it looks like
> > > you'll need another revision, unfortunately.
> > >
> > > pw-bot: cr
> > >
> > >   [0] https://github.com/kernel-patches/bpf/actions/runs/8923849088/job/24509002194
> >
> > yes, I think it's missing the 32-bit libc for the uprobe_compat binary,
> > it probably needs to be added to github.com:libbpf/ci.git setup-build-env/action.yml?
> > hm, but I'm not sure how to test it, need to check
> 
> You can create a custom PR directly against Github repo
> (kernel-patches/bpf) and BPF CI will run all the tests on your custom
> code. This way you can iterate without spamming the mailing list.

I'm running CI tests like that, but I think I need to change the action,
which is in another repo (github.com:libbpf/ci.git)

> 
> But I'm just wondering if it's worth complicating setup just for
> testing this x32 compat mode. So maybe just dropping one of those
> patches would be better?

well, we had a compat process crashing on uretprobe because of this change,
so I'd rather keep the test, or it can go in later once the CI stuff is
figured out. I got busy with the shadow stack issue today, will check on
the CI PR next week

jirka

* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 19:38           ` Edgecombe, Rick P
  2024-05-03 20:17             ` Jiri Olsa
@ 2024-05-03 23:01             ` Deepak Gupta
  1 sibling, 0 replies; 27+ messages in thread
From: Deepak Gupta @ 2024-05-03 23:01 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: olsajiri, songliubraving, luto, mhiramat, andrii, linux-api,
	john.fastabend, linux-kernel, mingo, rostedt, ast, tglx, yhs,
	oleg, linux-man, daniel, peterz, linux-trace-kernel, bp, bpf,
	x86, broonie

On Fri, May 03, 2024 at 07:38:18PM +0000, Edgecombe, Rick P wrote:
>+Some more shadow stack folks from other archs. We are discussing how uretprobes
>work with shadow stack.
>
>Context:
>https://lore.kernel.org/lkml/ZjU4ganRF1Cbiug6@krava/

Thanks Rick.

Yeah, I didn't give enough attention to uprobes either.
Although, now that I think about it, for RISC-V shadow stack it shouldn't be an issue.
On RISC-V, return addresses don't get pushed as part of the call instruction.
There is a distinct "shadow stack push of return address" instruction in the
prologue. Similarly, in the epilogue there is a distinct "shadow stack pop and check
with link register" instruction.

On RISC-V, uretprobe would install a uprobe at the function start, and when it's hit,
it'll set pt_regs->ra = trampoline_handler. As the function resumes, the trampoline
address will get pushed and popped. However, trampoline_handler would have to be
enlightened to eventually return to the original return site.
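A toy C model of the RISC-V scheme described above, assuming (as stated) that the uprobe fires at function entry, before the prologue's shadow stack push. It is not RISC-V or kernel code; struct rv_model, prologue(), epilogue() and probe_at_entry() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not RISC-V or kernel code. */
#define DEPTH 16

struct rv_model {
	unsigned long ra;		/* link register */
	unsigned long shstk[DEPTH];	/* shadow stack entries */
	size_t ssp;			/* number of entries */
};

/* Function prologue: "shadow stack push of return address" */
static void prologue(struct rv_model *m)
{
	assert(m->ssp < DEPTH);
	m->shstk[m->ssp++] = m->ra;
}

/*
 * Function epilogue: "shadow stack pop and check with link register".
 * Returns where the function returns to, or 0 on a mismatch.
 */
static unsigned long epilogue(struct rv_model *m)
{
	unsigned long saved;

	assert(m->ssp > 0);
	saved = m->shstk[--m->ssp];
	return saved == m->ra ? m->ra : 0;
}

/*
 * uprobe hit at function start, before the prologue runs: point ra at
 * the trampoline and hand back the original return site.
 */
static unsigned long probe_at_entry(struct rv_model *m, unsigned long tramp)
{
	unsigned long orig = m->ra;

	m->ra = tramp;
	return orig;
}
```

Because the push happens after ra has been rewritten, the epilogue check passes without any kernel write to the shadow stack; what remains, as noted above, is for the trampoline to get back to the original return site.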

>
>On Fri, 2024-05-03 at 21:18 +0200, Jiri Olsa wrote:
>>
>> hack below seems to fix it for the current uprobe setup,
>> we need similar fix for the uretprobe syscall trampoline setup
>
>It seems like a reasonable direction.
>
>Security-wise, applications cannot do this to themselves, or it is otherwise a
>privileged operation, right?
>
>

* Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe
  2024-05-03 20:35               ` Edgecombe, Rick P
@ 2024-05-06 10:56                 ` Jiri Olsa
  0 siblings, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-05-06 10:56 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: olsajiri, songliubraving, luto, mhiramat, andrii, debug,
	john.fastabend, linux-api, linux-kernel, mingo, rostedt, ast,
	tglx, linux-man, oleg, yhs, daniel, peterz, linux-trace-kernel,
	bp, bpf, x86, broonie

On Fri, May 03, 2024 at 08:35:24PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2024-05-03 at 22:17 +0200, Jiri Olsa wrote:
> > when a uretprobe is created, the kernel overwrites the return address on the user
> > stack to point to the user space trampoline, so the setup is in the kernel's hands
> 
> I mean for uprobes in general. I didn't have any specific ideas in mind, but
> in general when we give the kernel more abilities around shadow stack we have to
> think if attackers could use it to work around shadow stack protections.
> 
> > 
> > with the hack below on top of this patchset I'm no longer seeing the shadow
> > stack app crash on uretprobe. I'll try to polish it and send it out next
> > week, any suggestions are welcome ;-)
> 
> Thanks. Some comments below.
> 
> > 
> > thanks,
> > jirka
> > 
> > 
> > ---
> > diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
> > index 42fee8959df7..d374305a6851 100644
> > --- a/arch/x86/include/asm/shstk.h
> > +++ b/arch/x86/include/asm/shstk.h
> > @@ -21,6 +21,8 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
> >  void shstk_free(struct task_struct *p);
> >  int setup_signal_shadow_stack(struct ksignal *ksig);
> >  int restore_signal_shadow_stack(void);
> > +void uprobe_change_stack(unsigned long addr);
> > +void uprobe_push_stack(unsigned long addr);
> 
> Maybe name them:
> shstk_update_last_frame();
> shstk_push_frame();

ok

> 
> 
> >  #else
> >  static inline long shstk_prctl(struct task_struct *task, int option,
> >                                unsigned long arg2) { return -EINVAL; }
> > diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> > index 59e15dd8d0f8..804c446231d9 100644
> > --- a/arch/x86/kernel/shstk.c
> > +++ b/arch/x86/kernel/shstk.c
> > @@ -577,3 +577,24 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
> >                 return wrss_control(true);
> >         return -EINVAL;
> >  }
> > +
> > +void uprobe_change_stack(unsigned long addr)
> > +{
> > +       unsigned long ssp;
> 
> Probably want something like:
> 
> 	if (!features_enabled(ARCH_SHSTK_SHSTK))
> 		return;

ok

> 
> So this doesn't attempt the shadow stack write below if shadow stack is disabled.
> 
> > +
> > +       ssp = get_user_shstk_addr();
> > +       write_user_shstk_64((u64 __user *)ssp, (u64)addr);
> > +}
> 
> Can we know that there was a valid return address just before this point on the
> stack? Or could it be a sigframe or something?

when uprobe hijacks the return address, it assumes it's on the top of the stack,
so it's saved and replaced with the address of the user space trampoline

> 
> > +
> > +void uprobe_push_stack(unsigned long addr)
> > +{
> > +       unsigned long ssp;
> 
> 	if (!features_enabled(ARCH_SHSTK_SHSTK))
> 		return;
> 
> > +
> > +       ssp = get_user_shstk_addr();
> > +       ssp -= SS_FRAME_SIZE;
> > +       write_user_shstk_64((u64 __user *)ssp, (u64)addr);
> > +
> > +       fpregs_lock_and_load();
> > +       wrmsrl(MSR_IA32_PL3_SSP, ssp);
> > +       fpregs_unlock();
> > +}
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 81e6ee95784d..259457838020 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -416,6 +416,7 @@ SYSCALL_DEFINE0(uretprobe)
> >         regs->r11 = regs->flags;
> >         regs->cx  = regs->ip;
> >  
> > +       uprobe_push_stack(r11_cx_ax[2]);
> 
> I'm concerned this could be used to push arbitrary frames to the shadow stack.
> Couldn't an attacker do a jump to the point that calls this syscall? Maybe this
> is what peterz was raising.

of course never say never, but here's my reasoning why I think it's ok

the page with the syscall trampoline is mapped in user space and can be
found in the procfs maps file under the '[uprobes]' name

the syscall can be called only from this trampoline; if it's called from
anywhere else, the calling process receives SIGILL

now if you run the uretprobe syscall without any pending uretprobe for
the task, it will receive SIGILL before it gets to the point of pushing
the address on the shadow stack

and to configure the uretprobe you need to have CAP_PERFMON or CAP_SYS_ADMIN

if you actually managed to get a pending uretprobe instance, the shadow
stack entry is going to be used (popped) right away in the trampoline by
the ret instruction

and as I mentioned above it's ensured that the syscall is returning to the
trampoline and it can't be called from any other place
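To see the first point on a live system, a minimal userspace C sketch that looks for the '[uprobes]' entry in /proc/<pid>/maps and reports its address range. It assumes the usual "start-end perms offset dev inode path" line layout, and matching on the '[uprobes]' substring is a simplification; parse_uprobes_line() and find_uprobes_area() are hypothetical helpers. The entry may only show up once a uprobe has actually been hit in that process.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/<pid>/maps line. If it names the "[uprobes]" mapping,
 * store its address range in *start and *end and return 1, else return 0.
 * Matching on the substring is a simplification: a file path containing
 * "[uprobes]" would also match.
 */
static int parse_uprobes_line(const char *line, unsigned long *start,
			      unsigned long *end)
{
	assert(line != NULL && start != NULL && end != NULL);
	if (strstr(line, "[uprobes]") == NULL)
		return 0;
	return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Scan an open maps file; return 1 and fill in the range if found. */
static int find_uprobes_area(FILE *maps, unsigned long *start,
			     unsigned long *end)
{
	char line[512];

	while (fgets(line, sizeof(line), maps) != NULL)
		if (parse_uprobes_line(line, start, end))
			return 1;
	return 0;
}
```

For a task with an active uretprobe, the returned range is where the trampoline lives; per the reasoning above, issuing the uretprobe syscall from anywhere outside it ends in SIGILL.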

> 
> >         return regs->ax;
> >  
> >  sigill:
> > @@ -1191,8 +1192,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
> >                 return orig_ret_vaddr;
> >  
> >         nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
> > -       if (likely(!nleft))
> > +       if (likely(!nleft)) {
> > +               uprobe_change_stack(trampoline_vaddr);
> >                 return orig_ret_vaddr;
> > +       }
> >  
> >         if (nleft != rasize) {
> >                 pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",
> 

I'll try to add a uprobe test under tools/testing/selftests/x86/test_shadow_stack.c
and send that and the change below as part of the new version

thanks for the comments,
jirka


---
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..2e1ddcf98242 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -21,6 +21,8 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
+int shstk_update_last_frame(unsigned long val);
+int shstk_push_frame(unsigned long val);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
@@ -31,6 +33,8 @@ static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
 static inline void shstk_free(struct task_struct *p) {}
 static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
 static inline int restore_signal_shadow_stack(void) { return 0; }
+static inline int shstk_update_last_frame(unsigned long val) { return 0; }
+static inline int shstk_push_frame(unsigned long val) { return 0; }
 #endif /* CONFIG_X86_USER_SHADOW_STACK */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 59e15dd8d0f8..66434dfde52e 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -577,3 +577,32 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 		return wrss_control(true);
 	return -EINVAL;
 }
+
+int shstk_update_last_frame(unsigned long val)
+{
+	unsigned long ssp;
+
+	if (!features_enabled(ARCH_SHSTK_SHSTK))
+		return 0;
+
+	ssp = get_user_shstk_addr();
+	return write_user_shstk_64((u64 __user *)ssp, (u64)val);
+}
+
+int shstk_push_frame(unsigned long val)
+{
+	unsigned long ssp;
+
+	if (!features_enabled(ARCH_SHSTK_SHSTK))
+		return 0;
+
+	ssp = get_user_shstk_addr();
+	ssp -= SS_FRAME_SIZE;
+	if (write_user_shstk_64((u64 __user *)ssp, (u64)val))
+		return -EFAULT;
+
+	fpregs_lock_and_load();
+	wrmsrl(MSR_IA32_PL3_SSP, ssp);
+	fpregs_unlock();
+	return 0;
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 81e6ee95784d..ae6c3458a675 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -406,6 +406,11 @@ SYSCALL_DEFINE0(uretprobe)
 	 * trampoline's ret instruction
 	 */
 	r11_cx_ax[2] = regs->ip;
+
+	/* make the shadow stack follow that */
+	if (shstk_push_frame(regs->ip))
+		goto sigill;
+
 	regs->ip = ip;
 
 	err = copy_to_user((void __user *)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
@@ -1191,8 +1196,13 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
 		return orig_ret_vaddr;
 
 	nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
-	if (likely(!nleft))
+	if (likely(!nleft)) {
+		if (shstk_update_last_frame(trampoline_vaddr)) {
+			force_sig(SIGSEGV);
+			return -1;
+		}
 		return orig_ret_vaddr;
+	}
 
 	if (nleft != rasize) {
 		pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",
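A hypothetical userspace model of the two helpers in the patch above and of the sequence they keep consistent: call, hijack, return into the trampoline, uretprobe syscall, trampoline ret. It is a sketch under stated assumptions (SS_FRAME_SIZE taken as 8 bytes, a plain byte array instead of the user shadow stack, -1 instead of -EFAULT), not the kernel implementation; the model_* names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy model of shstk_update_last_frame() and
 * shstk_push_frame(). SS_FRAME_SIZE is assumed to be 8 (one 64-bit
 * return address); the stack grows down and 'ssp' is the offset of the
 * most recently pushed entry (SHSTK_BYTES means empty).
 */
#define SS_FRAME_SIZE 8
#define SHSTK_BYTES   64

struct shstk_model {
	uint8_t mem[SHSTK_BYTES];
	unsigned long ssp;
	int enabled;	/* stands in for features_enabled(ARCH_SHSTK_SHSTK) */
};

/* Stand-in for write_user_shstk_64(): fails on out-of-range offsets. */
static int model_write(struct shstk_model *s, unsigned long off, uint64_t val)
{
	if (off > SHSTK_BYTES - sizeof(val))
		return -1;
	memcpy(&s->mem[off], &val, sizeof(val));
	return 0;
}

/* Like shstk_update_last_frame(): overwrite the top entry in place. */
static int model_update_last_frame(struct shstk_model *s, uint64_t val)
{
	if (!s->enabled)
		return 0;
	return model_write(s, s->ssp, val);
}

/*
 * Like shstk_push_frame(): write one frame below the current top and
 * only then publish the new SSP (MSR_IA32_PL3_SSP in the patch).
 */
static int model_push_frame(struct shstk_model *s, uint64_t val)
{
	unsigned long ssp;

	if (!s->enabled)
		return 0;
	if (s->ssp < SS_FRAME_SIZE)
		return -1;
	ssp = s->ssp - SS_FRAME_SIZE;
	if (model_write(s, ssp, val))
		return -1;
	s->ssp = ssp;
	return 0;
}

/* What a 'ret' does on the shadow stack side: pop the top entry. */
static uint64_t model_pop(struct shstk_model *s)
{
	uint64_t val;

	assert(s->ssp + SS_FRAME_SIZE <= SHSTK_BYTES);
	memcpy(&val, &s->mem[s->ssp], sizeof(val));
	s->ssp += SS_FRAME_SIZE;
	return val;
}
```

The ordering mirrors the patch: the new entry is written before SSP is moved, so a failed write leaves the shadow stack pointer untouched, and both helpers are no-ops when shadow stack is disabled.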

* Re: [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up
  2024-05-03 20:39       ` Jiri Olsa
@ 2024-05-07  7:47         ` Jiri Olsa
  0 siblings, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2024-05-07  7:47 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Steven Rostedt, Masami Hiramatsu, Oleg Nesterov,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	linux-kernel, linux-trace-kernel, linux-api, linux-man, x86, bpf,
	Song Liu, Yonghong Song, John Fastabend, Peter Zijlstra,
	Thomas Gleixner, Borislav Petkov (AMD),
	Ingo Molnar, Andy Lutomirski

On Fri, May 03, 2024 at 10:39:21PM +0200, Jiri Olsa wrote:
> On Fri, May 03, 2024 at 11:03:24AM -0700, Andrii Nakryiko wrote:
> > On Thu, May 2, 2024 at 1:04 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Thu, May 02, 2024 at 09:43:02AM -0700, Andrii Nakryiko wrote:
> > > > On Thu, May 2, 2024 at 5:23 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > > >
> > > > > hi,
> > > > > as part of the effort on speeding up uprobes [0], this patchset brings
> > > > > a return uprobe optimization that uses a syscall instead of the trap
> > > > > on the uretprobe trampoline.
> > > > >
> > > > > The speed-up depends on the type of instruction the uprobe is installed
> > > > > on and on the specific HW type; please check patch 1 for details.
> > > > >
> > > > > Patches 1-6 are based on bpf-next/master, but patches 1 and 2 can also
> > > > > be applied on the linux-trace.git tree probes/for-next branch.
> > > > > Patch 7 is based on man-pages master.
> > > > >
> > > > > v4 changes:
> > > > >   - added acks [Oleg,Andrii,Masami]
> > > > >   - reworded the man page and added more info to the NOTE section [Masami]
> > > > >   - rewrote bpf tests not to use trace_pipe [Andrii]
> > > > >   - cc-ed linux-man list
> > > > >
> > > > > Also available at:
> > > > >   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > > > >   uretprobe_syscall
> > > > >
> > > >
> > > > It looks great to me, thanks! Unfortunately BPF CI build is broken,
> > > > probably due to some of the Makefile additions, please investigate and
> > > > fix (or we'll need to fix something on BPF CI side), but it looks like
> > > > you'll need another revision, unfortunately.
> > > >
> > > > pw-bot: cr
> > > >
> > > >   [0] https://github.com/kernel-patches/bpf/actions/runs/8923849088/job/24509002194
> > >
> > > yes, I think it's missing the 32-bit libc for the uprobe_compat binary,
> > > it probably needs to be added to github.com:libbpf/ci.git setup-build-env/action.yml?
> > > hm, but I'm not sure how to test it, need to check
> > 
> > You can create a custom PR directly against Github repo
> > (kernel-patches/bpf) and BPF CI will run all the tests on your custom
> > code. This way you can iterate without spamming the mailing list.
> 
> I'm running CI tests like that, but I think I need to change the action,
> which is in another repo (github.com:libbpf/ci.git)
> 
> > 
> > But I'm just wondering if it's worth complicating setup just for
> > testing this x32 compat mode. So maybe just dropping one of those
> > patches would be better?
> 
> well, we had a compat process crashing on uretprobe because of this change,
> so I'd rather keep the test, or it can go in later once the CI stuff is
> figured out. I got busy with the shadow stack issue today, will check on
> the CI PR next week

ok, it's not as easy as just adding the package. I don't want to delay
this because of my missing GitHub skills, so I'll skip the test in the next
version and submit it separately when the GitHub CI is ready for that

jirka

end of thread, other threads:[~2024-05-07  7:47 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
2024-05-02 12:23 [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Jiri Olsa
2024-05-02 12:23 ` [PATCHv4 bpf-next 1/7] uprobe: Wire up uretprobe system call Jiri Olsa
2024-05-02 12:23 ` [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe Jiri Olsa
2024-05-03 11:34   ` Peter Zijlstra
2024-05-03 13:04     ` Jiri Olsa
2024-05-03 15:53       ` Edgecombe, Rick P
2024-05-03 19:18         ` Jiri Olsa
2024-05-03 19:38           ` Edgecombe, Rick P
2024-05-03 20:17             ` Jiri Olsa
2024-05-03 20:35               ` Edgecombe, Rick P
2024-05-06 10:56                 ` Jiri Olsa
2024-05-03 23:01             ` Deepak Gupta
2024-05-02 12:23 ` [PATCHv4 bpf-next 3/7] selftests/bpf: Add uretprobe syscall test for regs integrity Jiri Olsa
2024-05-02 12:23 ` [PATCHv4 bpf-next 4/7] selftests/bpf: Add uretprobe syscall test for regs changes Jiri Olsa
2024-05-02 12:23 ` [PATCHv4 bpf-next 5/7] selftests/bpf: Add uretprobe syscall call from user space test Jiri Olsa
2024-05-02 16:33   ` Andrii Nakryiko
2024-05-02 12:23 ` [PATCHv4 bpf-next 6/7] selftests/bpf: Add uretprobe compat test Jiri Olsa
2024-05-02 16:35   ` Andrii Nakryiko
2024-05-02 12:23 ` [PATCHv4 7/7] man2: Add uretprobe syscall page Jiri Olsa
2024-05-02 13:43   ` Alejandro Colomar
2024-05-02 20:13     ` Jiri Olsa
2024-05-02 22:06       ` Alejandro Colomar
2024-05-02 16:43 ` [PATCHv4 bpf-next 0/7] uprobe: uretprobe speed up Andrii Nakryiko
2024-05-02 20:04   ` Jiri Olsa
2024-05-03 18:03     ` Andrii Nakryiko
2024-05-03 20:39       ` Jiri Olsa
2024-05-07  7:47         ` Jiri Olsa
