author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-06-30 11:35:41 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-06-30 11:35:41 -0700 |
commit | b30d7a77c53ec04a6d94683d7680ec406b7f3ac8 (patch) | |
tree | 5c8d99d15eb1a9b28810a5358b098ac18daefa71 /tools/perf/util/scripting-engines/trace-event-python.c | |
parent | d2a6fd45c5c4a5c5fdfe6c57f74f630e61d8d9a0 (diff) | |
parent | 4d60e83dfcee794213878155463d8f7353a80864 (diff) | |
Merge tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull perf tools updates from Namhyung Kim:
"Internal cleanup:
- Refactor PMU data management to handle hybrid systems in a generic
way.
Do more work in the lexer so that legacy event types parse more
easily. A side-effect of this is that if a PMU is specified,
scanning sysfs is avoided, improving start-up time.
- Fix hybrid metrics, for example, the TopdownL1 works for both
performance and efficiency cores on Intel machines. To support
this, sort and regroup events after parsing.
- Add reference count checking for the 'thread' data structure.
- Lots of fixes for memory leaks in various places thanks to the ASAN
and Ian's refcount checker.
- Reduce the binary size by replacing static variables with local or
dynamically allocated memory.
- Introduce sharded_mutex for annotate data to reduce memory
footprint.
- Make filesystem access library functions more thread safe.
Test:
- Organize cpu_map tests into a single suite.
- Add metric value validation test to check if the values are within
correct value ranges.
- Add perf stat stdio output test to check if event and metric names
match.
- Add perf data converter JSON output test.
- Fix a lot of issues reported by shellcheck(1). This is a
preparation to enable shellcheck by default.
- Make the large x86 new instructions test optional at build time
using EXTRA_TESTS=1.
- Add a test for libpfm4 events.
perf script:
- Add 'dsoff' output field to display offset from the DSO.
$ perf script -F comm,pid,event,ip,dsoff
ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms])
ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms])
ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms])
ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms])
- Adjust width for large PID/TID values.
perf report:
- Robustify reading addr2line output for srcline by checking sentinel
output before the actual data and by using timeout of 1 second.
- Allow config terms (like 'name=ABC') with breakpoint events.
$ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
perf annotate:
- Handle x86 instruction suffix like 'l' in 'movl' generally.
- Parse instruction operands properly even with a whitespace. This is
needed for llvm-objdump output.
- Support RISC-V binutils lookup using the triplet prefixes.
- Add '<' and '>' key to navigate to prev/next symbols in TUI.
- Fix instruction association and parsing for LoongArch.
perf stat:
- Add --per-cache aggregation option, optionally specify a cache
level like `--per-cache=L2`.
$ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
taskset -c 0-15,64-79,128-143,192-207\
perf bench sched messaging -p -t -l 100000 -g 8
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 8 groups == 320 threads run
Total time: 7.648 [sec]
Performance counter stats for 'system wide':
S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote
7.779171484 seconds time elapsed
- Change default (no event/metric) formatting for default metrics so
that events are hidden and the metric and group appear.
Performance counter stats for 'ls /':
1.85 msec task-clock # 0.594 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
97 page-faults # 52.517 K/sec
2,187,173 cycles # 1.184 GHz
2,474,459 instructions # 1.13 insn per cycle
531,584 branches # 287.805 M/sec
13,626 branch-misses # 2.56% of all branches
TopdownL1 # 23.5 % tma_backend_bound
# 11.5 % tma_bad_speculation
# 39.1 % tma_frontend_bound
# 25.9 % tma_retiring
- Allow --cputype option to have any PMU name (not just hybrid).
- Fix output values so that they are not added up when it runs
multiple times with the -r option.
perf list:
- Show metricgroup description from JSON file called
metricgroups.json.
- Allow 'pfm' argument to list only libpfm4 events and check each
event is supported before showing it.
JSON vendor events:
- Avoid event grouping using "NO_GROUP_EVENTS" constraints. The
topdown events are correctly grouped even if no group exists.
- Add "Default" metric group to print it in the default output. And
use "DefaultMetricgroupName" to indicate the real metric group
name.
- Add AmpereOne core PMU events.
Misc:
- Define man page date correctly.
- Track exception level properly on ARM CoreSight ETM.
- Allow anonymous struct, union or enum when retrieving type names
from DWARF.
- Fix incorrect filename when calling `perf inject --jit`.
- Handle PLT size correctly on LoongArch"
* tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits)
perf test: Skip metrics w/o event name in stat STD output linter
perf test: Reorder event name checks in stat STD output linter
perf pmu: Remove a hard coded cpu PMU assumption
perf pmus: Add notion of default PMU for JSON events
perf unwind: Fix map reference counts
perf test: Set PERF_EXEC_PATH for script execution
perf script: Initialize buffer for regs_map()
perf tests: Fix test_arm_callgraph_fp variable expansion
perf symbol: Add LoongArch case in get_plt_sizes()
perf test: Remove x permission from lib/stat_output.sh
perf test: Rerun failed metrics with longer workload
perf test: Add skip list for metrics known would fail
perf test: Add metric value validation test
perf jit: Fix incorrect file name in DWARF line table
perf annotate: Fix instruction association and parsing for LoongArch
perf annotation: Switch lock from a mutex to a sharded_mutex
perf sharded_mutex: Introduce sharded_mutex
tools: Fix incorrect calculation of object size by sizeof
perf subcmd: Fix missing check for return value of malloc() in add_cmdname()
perf parse-events: Remove unneeded semicolon
...
Diffstat (limited to 'tools/perf/util/scripting-engines/trace-event-python.c')
-rw-r--r-- | tools/perf/util/scripting-engines/trace-event-python.c | 49 |
1 files changed, 31 insertions, 18 deletions
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 41d4f9e6a8b7..94312741443a 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -93,8 +93,6 @@ PyMODINIT_FUNC PyInit_perf_trace_context(void);
 #define TRACE_EVENT_TYPE_MAX \
 	((1 << (sizeof(unsigned short) * 8)) - 1)
 
-static DECLARE_BITMAP(events_defined, TRACE_EVENT_TYPE_MAX);
-
 #define N_COMMON_FIELDS 7
 
 static char *cur_field_name;
@@ -419,6 +417,7 @@ static PyObject *python_process_callchain(struct perf_sample *sample,
 					 struct addr_location *al)
 {
 	PyObject *pylist;
+	struct callchain_cursor *cursor;
 
 	pylist = PyList_New(0);
 	if (!pylist)
@@ -427,19 +426,20 @@ static PyObject *python_process_callchain(struct perf_sample *sample,
 	if (!symbol_conf.use_callchain || !sample->callchain)
 		goto exit;
 
-	if (thread__resolve_callchain(al->thread, &callchain_cursor, evsel,
+	cursor = get_tls_callchain_cursor();
+	if (thread__resolve_callchain(al->thread, cursor, evsel,
 				      sample, NULL, NULL,
 				      scripting_max_stack) != 0) {
 		pr_err("Failed to resolve callchain. Skipping\n");
 		goto exit;
 	}
-	callchain_cursor_commit(&callchain_cursor);
+	callchain_cursor_commit(cursor);
 
 
 	while (1) {
 		PyObject *pyelem;
 		struct callchain_cursor_node *node;
-		node = callchain_cursor_current(&callchain_cursor);
+		node = callchain_cursor_current(cursor);
 		if (!node)
 			break;
 
@@ -471,9 +471,11 @@ static PyObject *python_process_callchain(struct perf_sample *sample,
 				struct addr_location node_al;
 				unsigned long offset;
 
+				addr_location__init(&node_al);
 				node_al.addr = map__map_ip(map, node->ip);
-				node_al.map = map;
+				node_al.map = map__get(map);
 				offset = get_offset(node->ms.sym, &node_al);
+				addr_location__exit(&node_al);
 
 				pydict_set_item_string_decref(
 					pyelem, "sym_off",
@@ -493,7 +495,7 @@ static PyObject *python_process_callchain(struct perf_sample *sample,
 					_PyUnicode_FromString(dsoname));
 		}
 
-		callchain_cursor_advance(&callchain_cursor);
+		callchain_cursor_advance(cursor);
 		PyList_Append(pylist, pyelem);
 		Py_DECREF(pyelem);
 	}
@@ -541,6 +543,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 		pydict_set_item_string_decref(pyelem, "cycles",
 		    PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
 
+		addr_location__init(&al);
 		thread__find_map_fb(thread, sample->cpumode,
 				    entries[i].from, &al);
 		dsoname = get_dsoname(al.map);
@@ -553,6 +556,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 		pydict_set_item_string_decref(pyelem, "to_dsoname",
 			_PyUnicode_FromString(dsoname));
 
+		addr_location__exit(&al);
 		PyList_Append(pylist, pyelem);
 		Py_DECREF(pyelem);
 	}
@@ -596,7 +600,6 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 	PyObject *pylist;
 	u64 i;
 	char bf[512];
-	struct addr_location al;
 
 	pylist = PyList_New(0);
 	if (!pylist)
@@ -607,7 +610,9 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 
 	for (i = 0; i < br->nr; i++) {
 		PyObject *pyelem;
+		struct addr_location al;
 
+		addr_location__init(&al);
 		pyelem = PyDict_New();
 		if (!pyelem)
 			Py_FatalError("couldn't create Python dictionary");
@@ -646,6 +651,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 
 		PyList_Append(pylist, pyelem);
 		Py_DECREF(pyelem);
+		addr_location__exit(&al);
 	}
 
 exit:
@@ -733,6 +739,9 @@ static void regs_map(struct regs_dump *regs, uint64_t mask, const char *arch, ch
 
 	bf[0] = 0;
 
+	if (size <= 0)
+		return;
+
 	if (!regs || !regs->regs)
 		return;
 
@@ -760,17 +769,18 @@ static void set_regs_in_dict(PyObject *dict,
 	 * 10 chars is for register name.
 	 */
 	int size = __sw_hweight64(attr->sample_regs_intr) * 28;
-	char bf[size];
+	char *bf = malloc(size);
 
-	regs_map(&sample->intr_regs, attr->sample_regs_intr, arch, bf, sizeof(bf));
+	regs_map(&sample->intr_regs, attr->sample_regs_intr, arch, bf, size);
 
 	pydict_set_item_string_decref(dict, "iregs",
 			_PyUnicode_FromString(bf));
 
-	regs_map(&sample->user_regs, attr->sample_regs_user, arch, bf, sizeof(bf));
+	regs_map(&sample->user_regs, attr->sample_regs_user, arch, bf, size);
 
 	pydict_set_item_string_decref(dict, "uregs",
 			_PyUnicode_FromString(bf));
+	free(bf);
 }
 
 static void set_sym_in_dict(PyObject *dict, struct addr_location *al,
@@ -934,6 +944,9 @@ static void python_process_tracepoint(struct perf_sample *sample,
 	unsigned long long nsecs = sample->time;
 	const char *comm = thread__comm_str(al->thread);
 	const char *default_handler_name = "trace_unhandled";
+	DECLARE_BITMAP(events_defined, TRACE_EVENT_TYPE_MAX);
+
+	bitmap_zero(events_defined, TRACE_EVENT_TYPE_MAX);
 
 	if (!event) {
 		snprintf(handler_name, sizeof(handler_name),
@@ -1162,11 +1175,11 @@ static int python_export_thread(struct db_export *dbe, struct thread *thread,
 
 	t = tuple_new(5);
 
-	tuple_set_d64(t, 0, thread->db_id);
+	tuple_set_d64(t, 0, thread__db_id(thread));
 	tuple_set_d64(t, 1, machine->db_id);
 	tuple_set_d64(t, 2, main_thread_db_id);
-	tuple_set_s32(t, 3, thread->pid_);
-	tuple_set_s32(t, 4, thread->tid);
+	tuple_set_s32(t, 3, thread__pid(thread));
+	tuple_set_s32(t, 4, thread__tid(thread));
 
 	call_object(tables->thread_handler, t, "thread_table");
 
@@ -1185,7 +1198,7 @@ static int python_export_comm(struct db_export *dbe, struct comm *comm,
 
 	tuple_set_d64(t, 0, comm->db_id);
 	tuple_set_string(t, 1, comm__str(comm));
-	tuple_set_d64(t, 2, thread->db_id);
+	tuple_set_d64(t, 2, thread__db_id(thread));
 	tuple_set_d64(t, 3, comm->start);
 	tuple_set_s32(t, 4, comm->exec);
 
@@ -1206,7 +1219,7 @@ static int python_export_comm_thread(struct db_export *dbe, u64 db_id,
 
 	tuple_set_d64(t, 0, db_id);
 	tuple_set_d64(t, 1, comm->db_id);
-	tuple_set_d64(t, 2, thread->db_id);
+	tuple_set_d64(t, 2, thread__db_id(thread));
 
 	call_object(tables->comm_thread_handler, t, "comm_thread_table");
 
@@ -1291,7 +1304,7 @@ static void python_export_sample_table(struct db_export *dbe,
 	tuple_set_d64(t, 0, es->db_id);
 	tuple_set_d64(t, 1, es->evsel->db_id);
 	tuple_set_d64(t, 2, maps__machine(es->al->maps)->db_id);
-	tuple_set_d64(t, 3, es->al->thread->db_id);
+	tuple_set_d64(t, 3, thread__db_id(es->al->thread));
 	tuple_set_d64(t, 4, es->comm_db_id);
 	tuple_set_d64(t, 5, es->dso_db_id);
 	tuple_set_d64(t, 6, es->sym_db_id);
@@ -1381,7 +1394,7 @@ static int python_export_call_return(struct db_export *dbe,
 	t = tuple_new(14);
 
 	tuple_set_d64(t, 0, cr->db_id);
-	tuple_set_d64(t, 1, cr->thread->db_id);
+	tuple_set_d64(t, 1, thread__db_id(cr->thread));
 	tuple_set_d64(t, 2, comm_db_id);
 	tuple_set_d64(t, 3, cr->cp->db_id);
 	tuple_set_d64(t, 4, cr->call_time);
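
The set_regs_in_dict() hunk above replaces a variable-length stack array with a heap buffer sized from the population count of the register mask, and regs_map() gains an early return when the size is not positive. The following is a minimal standalone sketch of that pattern, not the perf code itself: popcount64(), format_regs() and regs_to_string() are hypothetical names invented for illustration, the "rN:0x..." output format is an assumption, and the NULL check on malloc() is an addition that the patch does not contain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __sw_hweight64(): count set bits. */
static int popcount64(uint64_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Hypothetical formatter in the spirit of regs_map(): appends
 * "r<bit>:0x<value> " for every set bit in mask. The size guard comes
 * first so that an empty buffer is never written to.
 */
static void format_regs(const uint64_t *vals, uint64_t mask, char *bf, int size)
{
	int pos = 0, idx = 0;

	if (size <= 0)
		return;

	assert(vals && bf);
	bf[0] = '\0';

	for (int bit = 0; bit < 64; bit++) {
		int n;

		if (!(mask & (1ULL << bit)))
			continue;

		n = snprintf(bf + pos, size - pos, "r%d:0x%llx ",
			     bit, (unsigned long long)vals[idx++]);
		if (n < 0 || n >= size - pos)
			break;	/* output error or truncation: stop appending */
		pos += n;
	}
}

/*
 * Heap-allocate the scratch buffer instead of using a VLA.
 * 28 bytes per register is the per-register budget used in the diff.
 * Returns a string the caller must free(), or NULL if the mask is empty
 * or the allocation fails.
 */
static char *regs_to_string(const uint64_t *vals, uint64_t mask)
{
	int size = popcount64(mask) * 28;
	char *bf;

	if (size <= 0)
		return NULL;

	bf = malloc(size);
	if (!bf)
		return NULL;

	format_regs(vals, mask, bf, size);
	return bf;
}
```

A caller would pass the sampled register values in mask order, use the returned string, and then free() it. Passing the computed size explicitly matters here because sizeof() on a pointer yields the pointer size, not the buffer length, which is why the diff switches from sizeof(bf) to size.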