Linux gets double-quick double-update to fix kernel Oops!

by Paul Ducklin

Linux has never suffered from the infamous BSoD, short for blue screen of death, the name given to the dreaded “something went terribly wrong” message associated with a Windows system crash.

Microsoft has tried many things over the years to shake that nickname “BSoD”, including changing the background colour used when crash messages appear, adding a super-sized sad-face emoticon to make the message feel more compassionate, displaying QR codes that you can snap with your phone to help you diagnose the problem, and not filling the screen with a technobabble list of kernel code objects that just happened to be loaded at the time.

(Those crash dump lists often led to anti-virus and threat-prevention software being blamed for every system crash, simply because their names tended to show up at or near the top of the list of loaded modules – not because they had anything to do with the crash, but because they generally loaded early on, which made them a convenient scapegoat.)

Even better, “BSoD” is no longer the everyday, throwaway pejorative term that it used to be, because Windows crashes a lot less often than it used to.

We’re not suggesting that Windows never crashes, or implying that it is now magically bug-free; merely noting that you generally don’t need the word BSoD as often as you used to.

Linux crash notifications

Of course, Linux has never had BSoDs, not even back when Windows seemed to have them all the time, but that’s not because Linux never crashes, or is magically bug-free.

It’s simply that Linux doesn’t BSoD (yes, the term can be used as an intransitive verb, as in “my laptop BSoDded halfway through an email”), because – in a delightful understatement – it suffers an oops, or, if the oops is severe enough that the system can’t reliably stay up even with degraded performance, it panics.

(It’s also possible to configure a Linux kernel so that an oops always gets “promoted” to a panic, for environments where security considerations make it better to have a system that shuts down abruptly, albeit with some data not getting saved in time, than a system that ends up in an uncertain state that could lead to data leakage or data corruption.)

An oops typically produces console output something like this (we’ve provided source code below if you want to explore oopses and panics for yourself):


[12710.153112] oops init (level = 1)
[12710.153115] triggering oops via BUG()
[12710.153127] ------------[ cut here ]------------
[12710.153128] kernel BUG at /home/duck/Articles/linuxoops/oops.c:17!
[12710.153132] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[12710.153748] CPU: 0 PID: 5531 Comm: insmod . . . 
[12710.154322] Hardware name: XXXX
[12710.154940] RIP: 0010:oopsinit+0x3a/0xfc0 [oops]
[12710.155548] Code: . . . . .
[12710.156191] RSP: . . .  EFLAGS: . . .
[12710.156849] RAX: . . .  RBX: . . .  RCX: . . .
[12710.157513] RDX: . . .  RSI: . . .  RDI: . . .
[12710.158171] RBP: . . .  R08: . . .  R09: . . .
[12710.158826] R10: . . .  R11: . . .  R12: . . .
[12710.159483] R13: . . .  R14: . . .  R15: . . .
[12710.160143] FS:  . . .  GS: . . .  knlGS: . . . 
. . . . .
[12710.163474] Call Trace:
[12710.164129]  
[12710.164779]  do_one_initcall+0x56/0x230
[12710.165424]  do_init_module+0x4a/0x210
[12710.166050]  __do_sys_finit_module+0x9e/0xf0
[12710.166711]  do_syscall_64+0x37/0x90
[12710.167320]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[12710.167958] RIP: 0033:0x7f6c28b15e39
[12710.168578] Code: . . . . .
[. . . . .
[12710.173349]  
[12710.174032] Modules linked in: . . . . .
[12710.180294] ---[ end trace 0000000000000000 ]---

Unfortunately, when kernel version 6.2.3 came out at the end of last week, two tiny changes quickly proved to be problematic, with users reporting kernel oopses when managing disk storage.

Kernel 6.1.16 was apparently subject to the same changes, and thus prone to the same oopsiness.

For example, plugging in a removable drive and mounting it worked fine, but unmounting the drive when you’d finished with it could cause an oops.

Although an oops doesn’t immediately freeze the whole computer, kernel-level code crashes when unmounting disk storage are worrisome enough that a well-informed user would probably want to shut down as soon as possible, in case of ongoing trouble leading to data corruption…

…but some users reported that the oops prevented what’s known in the jargon as an orderly shutdown, forcing them to power-cycle the computer by holding down the power button for several seconds, or by temporarily cutting the mains supply to a server.

The good news is that kernels 6.2.4 and 6.1.17 were released straight away, over the weekend, to roll back the offending changes.

Given the velocity of Linux kernel releases, those updates have already been followed by 6.2.5 and 6.1.18, which were themselves updated (today, 2023-03-13) by 6.2.6 and 6.1.19.

What to do?

If you are using a 6.x-version Linux kernel and you aren’t already bang up-to-date, make sure you don’t install 6.2.3 or 6.1.16 along the way.

If you’ve already got one of those versions (we had 6.2.3 for a couple of days and were unable to provoke a driver crash, presumably because our kernel configuration shielded us inadvertently from triggering the bug), consider updating as soon as you can…

…because even if you haven’t suffered any disk-volume-based trouble so far, you may merely be immune by good fortune, whereas by upgrading your kernel again you will become immune by design.

EXPLORING OOPS AND PANIC EVENTS ON YOUR OWN

You will need a test computer running a kernel that you built from source code yourself, so that the matching kernel source tree is available to build the module against.

Create a directory, let’s call it /test/oops, and save this source code as oops.c:


#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");

static int level = 0;   // set at load time, e.g. insmod oops.ko level=1
module_param(level, int, 0660);

static int __init oopsinit(void) {
   printk(KERN_INFO "oops init (level = %d)\n", level);
   // level: 0->just load; 1->oops; 2->panic
   switch (level) {
      case 1:
         printk(KERN_INFO "triggering oops via BUG()\n");
         BUG();                  // does not return: produces the "kernel BUG at ..." report
         break;
      case 2:
         printk(KERN_INFO "forcing a full-on panic()\n");
         panic("oops module");   // does not return: stops the whole system
         break;
   }
   return 0;
}

static void __exit oopsexit(void) {
   printk(KERN_INFO "oops exit\n");
}

module_init(oopsinit);
module_exit(oopsexit);

Create a file in the same directory called Kbuild to control the build parameters, like this:


ccflags-y := -Wall -g
obj-m     := oops.o

Then build the module as shown below.

The -C option tells make where to start looking for Makefiles, thus pointing the build process at the right kernel source code tree, and the M= setting tells make where to find the actual module code to build on this occasion.

You must provide the full, absolute path for M=, so don’t try to save typing by using ./ (the current directory moves around during the build process):


/test/oops$ make -C /where/you/built/the/kernel M=/test/oops
CC [M]  /home/duck/Articles/linuxoops/oops.o
MODPOST /home/duck/Articles/linuxoops/Module.symvers
CC [M]  /home/duck/Articles/linuxoops/oops.mod.o
LD [M]  /home/duck/Articles/linuxoops/oops.ko

You can load and unload the new oops.ko kernel module with the parameter level=0 just to check that it works.

Look in dmesg for a log of the init and exit calls:


/test/oops# insmod oops.ko level=0
/test/oops# rmmod oops
/test/oops# dmesg
. . .
[12690.998373] oops: loading out-of-tree module taints kernel.
[12690.999113] oops init (level = 0)
[12704.198814] oops exit

To provoke an oops (recoverable) or a panic (will hang your computer), use level=1 or level=2 respectively.

Don’t forget to save all your work before triggering either condition (you will need to reboot afterwards), and don’t do this on someone else’s computer without formal permission.

About Author

AndyC

Andy Curtis is an award-winning security consultant, researcher and public speaker. He has been working in the computer security industry since the early 1990s, having been employed by state and federal government, leading healthcare and banking providers across three continents. He has given talks about computer security for some of the world’s largest companies, worked with law enforcement agencies on investigations into hacking groups, and is a regular voice on TV and radio explaining IT security threats.
