BU40N MT1959 firmware format: packed region at 0x158000 (compression method known?)

ibizara
Posts: 1
Joined: Mon Jun 22, 2026 12:33 pm

BU40N MT1959 firmware format: packed region at 0x158000 (compression method known?)

Post by ibizara »

I'm comparing BU40N 1.00 and 1.03 firmware internals while investigating OmniDrive behaviour and noticed a large packed-looking region that differs almost completely between firmware versions despite having a very similar header structure.

Firmware images examined:

Code: Select all

BU40N_1.00_stock.bin
MD5: edb28fcd7a239281ace26a468d382a9c

BU40N_1.03_MK.bin
MD5: 74ebaf627d2aac5f899191d6caceb54c

The 1.00 image is the original LG firmware. The 1.03_MK image is the MakeMKV/LibreDrive patched firmware based on LG 1.03.
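
For anyone wanting to confirm that a local dump is the same image as one of the two listed above, a minimal sketch using Python's `hashlib` follows. The helper name `md5_of` is made up for illustration, and the file name in the usage note is simply the one quoted in this post.

```python
import hashlib
from pathlib import Path


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would look like `md5_of("BU40N_1.00_stock.bin") == "edb28fcd7a239281ace26a468d382a9c"`, which should be true only for a byte-identical copy of the 1.00 image.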

Looking at the raw firmware images, there appears to be a module beginning at offset 0x158000.

BU40N 1.00

Code: Select all

offset:       0x158000
field 1:      0x00072464 (468,068)
field 2:      0x0004E98F (321,935)
region end:   0x1A698F

BU40N 1.03_MK

Code: Select all

offset:       0x158000
field 1:      0x00071C60 (466,016)
field 2:      0x0004E3B1 (320,433)
region end:   0x1A63B1
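
A quick arithmetic check on the header values listed above, as a sketch: in both images the stated region end equals the block offset plus field 2. That would be consistent with field 2 describing the length stored in the image, but that reading is only an inference from these numbers and is not confirmed anywhere in this thread. The names `HEADERS` and `region_end_matches` are made up for illustration.

```python
OFFSET = 0x158000

# (field 1, field 2, region end) exactly as listed for each image.
HEADERS = {
    "1.00": (0x00072464, 0x0004E98F, 0x1A698F),
    "1.03_MK": (0x00071C60, 0x0004E3B1, 0x1A63B1),
}


def region_end_matches(field2: int, region_end: int, offset: int = OFFSET) -> bool:
    """True when offset + field 2 lands exactly on the listed region end."""
    return offset + field2 == region_end


for name, (field1, field2, region_end) in HEADERS.items():
    print(
        f"{name}: offset + field2 = {OFFSET + field2:#x}, "
        f"matches region end: {region_end_matches(field2, region_end)}, "
        f"field2/field1 = {field2 / field1:.3f}"
    )
```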

The start of the region looks like:

Code: Select all

0x158000:
00072464
0004E98F
07060605
07070706
08070706
08080808
...

The corresponding structure in 1.03_MK is almost identical.

After the first 8 bytes there are approximately 0x140 (320) bytes of highly structured, low-valued data, followed by a high-entropy stream that differs almost completely between firmware versions.

The interesting part is that the 0x140-byte structure does not appear random. Treating it as code-length data produced the following result:

First 288 entries → Kraft sum = 1.0
Last 32 entries → Kraft sum = 1.0
Entire 320-byte table → Kraft sum = 2.0
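
For reference, the Kraft sum of a code-length table is the sum of 2^-length over all nonzero lengths, and a value of exactly 1 means the lengths describe a complete prefix code. A minimal sketch of that check follows. The two arrays are stand-in data (DEFLATE's fixed literal/length lengths and a flat 5-bit table), not the BU40N tables, and `kraft_sum` is a made-up helper name.

```python
from fractions import Fraction


def kraft_sum(lengths) -> Fraction:
    """Sum 2**-length over all nonzero code lengths, computed exactly."""
    return sum((Fraction(1, 1 << n) for n in lengths if n), Fraction(0))


# Stand-in data: DEFLATE's fixed literal/length code lengths (288 entries).
fixed_litlen = bytes([8] * 144 + [9] * 112 + [7] * 24 + [8] * 8)
# Stand-in data: a flat 32-entry table of 5-bit codes.
flat_dist = bytes([5] * 32)

print(kraft_sum(fixed_litlen))              # 1
print(kraft_sum(flat_dist))                 # 1
print(kraft_sum(fixed_litlen + flat_dist))  # 2
```

Using `Fraction` avoids floating-point rounding, so "exactly 1" can be tested with `==`. A 320-byte span that sums to 2 while splitting into 288 + 32 entries that each sum to 1 is the pattern described above.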

This suggests the region contains two complete prefix-code (Huffman-like) tables, arranged as:

Code: Select all

0x158000  size field
0x158004  size field
0x158008  288-byte code-length table
0x158128  32-byte code-length table
0x158148  compressed bitstream

The table structure is nearly identical between 1.00 and 1.03_MK, while the compressed stream contents are almost entirely different.
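
The layout above can be sketched as a small parser. This assumes the two size fields are little-endian 32-bit values, which is an assumption at this point in the thread, and the names `PackedBlock` and `split_block` are made up for illustration.

```python
import struct
from typing import NamedTuple


class PackedBlock(NamedTuple):
    field1: int           # first 32-bit size field
    field2: int           # second 32-bit size field
    lit_lengths: bytes    # 288-byte code-length table
    dist_lengths: bytes   # 32-byte code-length table
    stream_offset: int    # file offset where the bitstream starts


def split_block(firmware: bytes, offset: int = 0x158000) -> PackedBlock:
    """Slice the header fields and both code-length tables out of an image."""
    lit_off = offset + 8
    dist_off = lit_off + 288
    stream_off = dist_off + 32
    if stream_off > len(firmware):
        raise ValueError("image too short for the assumed block layout")
    # Assumption: both fields are little-endian unsigned 32-bit integers.
    field1, field2 = struct.unpack_from("<II", firmware, offset)
    return PackedBlock(
        field1,
        field2,
        firmware[lit_off:dist_off],
        firmware[dist_off:stream_off],
        stream_off,
    )
```

With the default offset the stream offset works out to 0x158000 + 8 + 288 + 32 = 0x158148, matching the table above.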

I extracted the stream and tested the obvious formats:

Code: Select all

zlib
raw deflate
gzip
bzip2
lzma/xz

All failed.
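
For anyone repeating that test, a minimal sketch using only the Python standard library is below. It tries each listed format on a blob and reports which decoders accept it. The function name `probe_formats` is made up, and this is not necessarily how the original test was run.

```python
import bz2
import lzma
import zlib


def probe_formats(blob: bytes) -> list[str]:
    """Return the names of the standard decoders that accept the whole blob."""
    attempts = {
        "zlib": lambda b: zlib.decompress(b, 15),          # zlib header + Adler-32
        "raw deflate": lambda b: zlib.decompress(b, -15),  # no header or trailer
        "gzip": lambda b: zlib.decompress(b, 31),          # gzip header + CRC-32
        "bzip2": bz2.decompress,
        "lzma/xz": lzma.decompress,                        # auto-detects .xz / .lzma
    }
    hits = []
    for name, decode in attempts.items():
        try:
            decode(blob)
        except Exception:
            # Each library raises its own error type; any failure means "not this format".
            continue
        hits.append(name)
    return hits


sample = zlib.compress(b"hello world" * 10)
print(probe_formats(sample))
```

A stream that starts mid-container, or that carries a custom header like the one here, will be rejected by all five, so an empty result only rules out the stock containers.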

I also looked at MediaTek's documented ALICE firmware compression (used in some MTK products). While there are some conceptual similarities (table-driven compressed instruction streams), this BU40N format does not appear to be a standard ALICE container.

Questions:
  1. Has anyone identified the compression/packing method used for this MT1959 firmware block?
  2. Does the decompressor reside inside the firmware itself, or in MT1959 boot ROM / mask ROM?
  3. Does this region contain executable ARM code, servo/DSP microcode, or some other firmware component?
  4. Has this area ever been reverse engineered by the LibreDrive / MakeMKV developers or anyone working on MTK optical drive firmware?

I'm mainly trying to understand the firmware format and whether this packed region has ever been decoded or modified successfully.
RibShark
Posts: 17
Joined: Mon Apr 29, 2019 6:27 pm

Re: BU40N MT1959 firmware format: packed region at 0x158000 (compression method known?)

Post by RibShark »

ibizara wrote: Mon Jun 22, 2026 12:40 pm
Questions:
  1. Has anyone identified the compression/packing method used for this MT1959 firmware block?
  2. Does the decompressor reside inside the firmware itself, or in MT1959 boot ROM / mask ROM?
  3. Does this region contain executable ARM code, servo/DSP microcode, or some other firmware component?
  4. Has this area ever been reverse engineered by the LibreDrive / MakeMKV developers or anyone working on MTK optical drive firmware?

  1. I failed to work out what it was when I tried.
  2. The decompression code is in the firmware it seems, as ARM code.
  3. It contains THUMB code; various areas in the firmware jump to this code via a thunk. It's always decompressed to the same place so the addresses are static.
  4. Not sure about the MakeMKV dev but I haven't tried much to reverse the compression algorithm yet.

Lemme know if you are able to work this out, would be super helpful (I'm the OmniDrive dev). Right now I'm relying on RAM dumps from the drive which have this part decompressed, but being able to do this and recompress back into the firmware could open up some doors.
ibizara
Posts: 1
Joined: Mon Jun 22, 2026 12:33 pm

Re: BU40N MT1959 firmware format: packed region at 0x158000 (compression method known?)

Post by ibizara »

Small update / WIP.

I made some progress on the 0x158000 packed block in BU40N 1.00.

The original table split still looks correct:

Code: Select all

0x158008  288-byte literal/length code-length table
0x158128   32-byte distance code-length table
0x158148  compressed bitstream

I now have an experimental Python decoder that expands the BU40N 1.00 block from:

Code: Select all

compressed:   0x72464
decompressed: 0x4e98f

The format appears to be a custom canonical-Huffman + LZ77-style scheme, but not standard DEFLATE.

Current working assumptions:

Code: Select all

bitstream:       MSB-first
Huffman:         canonical codes, bit-reversed for lookup
symbol 256:      literal zero, not EOF
symbols 257-287: length symbols
distance:        raw distance symbol + 1

This produces an output file of the advertised decompressed size. As a sanity check, the decoded output contains:

0x27b76: CAETDVD_59110933

So it is definitely producing structured data from the packed block.
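
To make the "canonical codes, bit-reversed for lookup" assumption concrete, here is a toy sketch of DEFLATE-style canonical code assignment on a four-symbol table, printing each code alongside its bit-reversed form. The lengths `[2, 1, 3, 3]` are invented for illustration and `canonical_codes` is a made-up helper name. It mirrors the general technique, not the exact table builder in the script below.

```python
def canonical_codes(lengths: list[int]) -> dict[int, str]:
    """Map symbol -> canonical code as a bit string (DEFLATE-style assignment)."""
    max_len = max(lengths, default=0)

    # Count how many symbols use each nonzero code length.
    counts = [0] * (max_len + 1)
    for n in lengths:
        if n:
            counts[n] += 1

    # First code of each length: shorter codes come first numerically.
    code = 0
    next_code = [0] * (max_len + 1)
    for bits in range(1, max_len + 1):
        code = (code + counts[bits - 1]) << 1
        next_code[bits] = code

    # Within one length, codes are handed out in symbol order.
    codes = {}
    for symbol, n in enumerate(lengths):
        if n:
            codes[symbol] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes


toy = canonical_codes([2, 1, 3, 3])
for symbol, bits in toy.items():
    print(symbol, bits, "reversed:", bits[::-1])
```

For this toy table the codes come out as 10, 0, 110 and 111 for symbols 0 to 3. Reversing matters only when the reader accumulates incoming bits starting from the least significant position, as the script's `decode_symbol` does.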

Important caveat: I do not think this is 100% solved yet. The output contains plausible Thumb-looking code and strings, but it does not currently decompile cleanly as one linear ARM/Thumb image. There may still be a small semantic difference in the decoder, a relocation/fixup step, a second transform, or simply mixed code/data/microcode in the decompressed payload.

Here is the current Python script:

Code: Select all

#!/usr/bin/env python3
import argparse
import struct
from pathlib import Path

LBASE = [
    3, 4, 5, 6, 7, 8, 9, 10,
    11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115,
    131, 163, 195, 227, 258, 258, 258,
]

LEXT = [
    0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4,
    5, 5, 5, 5, 0, 0, 0,
]


class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.bitpos = 0

    def read(self, n: int) -> int:
        value = 0

        for i in range(n):
            if self.bitpos >= len(self.data) * 8:
                raise EOFError("ran out of compressed input")

            byte = self.data[self.bitpos >> 3]
            bit = (byte >> (7 - (self.bitpos & 7))) & 1
            value |= bit << i
            self.bitpos += 1

        return value


def reverse_bits(value: int, width: int) -> int:
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def build_canonical_table(lengths: bytes) -> dict[tuple[int, int], int]:
    counts: dict[int, int] = {}

    for length in lengths:
        if length:
            counts[length] = counts.get(length, 0) + 1

    code = 0
    next_code: dict[int, int] = {}

    for bits in range(1, max(counts.keys(), default=0) + 1):
        code = (code + counts.get(bits - 1, 0)) << 1
        next_code[bits] = code

    table: dict[tuple[int, int], int] = {}

    for symbol, length in enumerate(lengths):
        if not length:
            continue

        canonical = next_code[length]
        next_code[length] += 1

        # Required for this stream.
        stored_code = reverse_bits(canonical, length)
        table[(stored_code, length)] = symbol

    return table


def decode_symbol(br: BitReader, table: dict[tuple[int, int], int]) -> int:
    code = 0

    for length in range(1, 32):
        code |= br.read(1) << (length - 1)

        symbol = table.get((code, length))
        if symbol is not None:
            return symbol

    raise ValueError(f"bad Huffman code at bit {br.bitpos}")


def decompress_partition(firmware: bytes, offset: int = 0x158000) -> tuple[bytes, int, int, int]:
    compressed_size, output_size = struct.unpack_from("<II", firmware, offset)

    lit_table_off = offset + 8
    dist_table_off = lit_table_off + 288
    stream_off = dist_table_off + 32

    lit_lengths = firmware[lit_table_off:lit_table_off + 288]
    dist_lengths = firmware[dist_table_off:dist_table_off + 32]
    stream = firmware[stream_off:stream_off + compressed_size]

    lit_tree = build_canonical_table(lit_lengths)
    dist_tree = build_canonical_table(dist_lengths)

    br = BitReader(stream)
    out = bytearray()

    while len(out) < output_size:
        symbol = decode_symbol(br, lit_tree)

        if symbol < 256:
            out.append(symbol)
            continue

        # In this format, symbol 256 behaves as literal zero.
        if symbol == 256:
            out.append(0)
            continue

        length_index = symbol - 257

        if length_index < 0 or length_index >= len(LBASE):
            raise ValueError(
                f"bad length symbol {symbol} at output={len(out):#x}, bit={br.bitpos}"
            )

        length = LBASE[length_index]
        extra_bits = LEXT[length_index]

        if extra_bits:
            length += br.read(extra_bits)

        distance_symbol = decode_symbol(br, dist_tree)

        # Unlike DEFLATE, this currently appears to use raw distance symbols.
        distance = distance_symbol + 1

        if distance <= 0 or distance > len(out):
            raise ValueError(
                f"invalid distance {distance} at output={len(out):#x}, bit={br.bitpos}"
            )

        for _ in range(length):
            out.append(out[-distance])

            if len(out) >= output_size:
                break

    return bytes(out), compressed_size, output_size, br.bitpos


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Experimental BU40N 1.00 0x158000 partition decoder"
    )
    parser.add_argument("firmware", help="input BU40N firmware .bin")
    parser.add_argument("-o", "--output", default="decoded_158000.bin")
    parser.add_argument("--offset", default="0x158000")

    args = parser.parse_args()

    firmware = Path(args.firmware).read_bytes()
    offset = int(args.offset, 0)

    decoded, compressed_size, output_size, bits_used = decompress_partition(
        firmware, offset
    )

    Path(args.output).write_bytes(decoded)

    print(f"partition offset:   {offset:#x}")
    print(f"compressed size:    {compressed_size:#x}")
    print(f"decompressed size:  {len(decoded):#x}/{output_size:#x}")
    print(f"bits consumed:      {bits_used}")
    print(f"wrote:              {args.output}")


if __name__ == "__main__":
    main()

Run with:

Code: Select all

python3 decode_158000.py BU40N_1.00_stock.bin

partition offset:   0x158000
compressed size:    0x72464
decompressed size:  0x4e98f/0x4e98f
bits consumed:      1632522
wrote:              decoded_158000.bin

strings -a -tx decoded_158000.bin | grep CAETDVD

Expected string:

Code: Select all

27b76 CAETDVD_59110933

If anyone has a RAM dump of this region after the drive has decompressed it, comparing that against this output would probably show exactly what is still missing.
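
A minimal sketch of such a comparison is below. It assumes the RAM dump has already been trimmed so that its first byte corresponds to the first byte of the decoded output, which would need to be established separately, and `diff_runs` is a made-up helper name.

```python
def diff_runs(decoded: bytes, ram: bytes) -> list[tuple[int, int]]:
    """Return (start, length) for each run of differing bytes in the common prefix."""
    runs = []
    start = None
    common = min(len(decoded), len(ram))

    for i in range(common):
        if decoded[i] != ram[i]:
            if start is None:
                start = i                      # a new mismatching run begins
        elif start is not None:
            runs.append((start, i - start))    # the run ended at the previous byte
            start = None

    if start is not None:
        runs.append((start, common - start))   # run extends to the end of the overlap
    return runs


# Toy inputs standing in for decoded_158000.bin and a RAM dump.
print(diff_runs(b"abcdef", b"abXXeZ"))  # [(2, 2), (5, 1)]
```

The offset of the first run and the spacing between runs are probably the most informative parts: one early divergence would point at a decoder rule, while short, regularly spaced runs would look more like fixups applied after decompression.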
Last edited by ibizara on Sun Jun 28, 2026 2:35 pm, edited 1 time in total.