Hi, I am trying to unpack files from Super Smash Bros. Ultimate. Someone on GBAtemp (https://gbatemp.net/threads/smash-ultim ... st-8398077) already wrote an unpacker, but it only gets out about 2 GB of files from the .arc format; there are lots of files still in there. The header is strange: strings seem to be encrypted or compressed, the file header is unknown, and the file type is unknown. But I don't think the archive is encrypted, since that unpacker doesn't decrypt anything. Here is a screenshot of the hex of the start of the file.
Can anyone provide a guide or links on how we can start to unpack this?
Thanks
Edit 1: I have tried QuickBMS with the arc2 script, but it creates lots of DAT files and doesn't extract the music, so arc2 must be the wrong format for this archive.
If those 64-bit fields in the image are correct, it looks like data.arc is 14 GB in size. I see a NUS3 header at 0x38; that one is a file. I think the 2 GB you are talking about is just the 64-bit value at offset 0x10, which is indeed that size.
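To sanity-check fields like these, it helps to pull them out programmatically instead of eyeballing the hex. Below is a minimal sketch; the offsets (a 64-bit value at 0x10, a "NUS3" magic at 0x38) and little-endian byte order are assumptions taken from the discussion and the screenshot, not a confirmed spec, and the synthetic buffer just stands in for the real start of data.arc:

```python
import struct

def parse_arc_header(buf: bytes) -> dict:
    """Pull out the fields discussed above. Offsets (0x10 for the 64-bit
    size field, 0x38 for the first embedded file's magic) are guesses
    based on the hex screenshot, not a documented format."""
    size_field = struct.unpack_from("<Q", buf, 0x10)[0]
    return {
        "magic": buf[0:8],
        "u64_at_0x10": size_field,
        "first_file_magic": buf[0x38:0x3C],
    }

# Synthetic 0x40-byte header standing in for the real start of data.arc:
fake = bytearray(0x40)
struct.pack_into("<Q", fake, 0x10, 2 * 1024**3)  # the ~2 GB value
fake[0x38:0x3C] = b"NUS3"

info = parse_arc_header(bytes(fake))
print(hex(info["u64_at_0x10"]), info["first_file_magic"])
```

Running the same parse over the real file (read the first 0x40 bytes) would confirm or refute the 2 GB reading at 0x10.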
Yes, the fields are correct, it is 14 GB, well spotted! So do you think this 14 GB file only has 2 GB worth of data? Aluigi, if you have a guide on how I can figure this out, it would be helpful. I would like to learn so I can write unpackers for games in the future!
Edit: Some research online suggested it could be using the LZ10 format, but according to what I found the data would then look different, and the header of this file starts at the normal position, so I am not sure LZ10 compression is possible here.
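One quick way to rule LZ10 in or out: Nintendo's LZ10 (the LZ77 "type 0x10" scheme used on GBA/DS-era formats) streams start with the byte 0x10 followed by the 24-bit little-endian decompressed size. A tiny check like the sketch below can be run against the start of data.arc; the sample inputs here are made up for illustration:

```python
def looks_like_lz10(data: bytes) -> bool:
    # Nintendo's LZ10 streams begin with the type byte 0x10, then the
    # 24-bit little-endian decompressed size.
    if len(data) < 4 or data[0] != 0x10:
        return False
    decomp_size = int.from_bytes(data[1:4], "little")
    return decomp_size > 0

print(looks_like_lz10(b"\x10\x40\x00\x00" + b"\x00" * 16))  # plausible LZ10
print(looks_like_lz10(b"\x00\x00\x00\xa0"))                 # not LZ10
```

Since the data.arc header in the screenshot does not start with 0x10, this argues against the whole file being LZ10-compressed.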
Usually with the analysis of big archives it's enough to collect the first and last N megabytes (usually 2 or 10 MB), but in this case we already know that the first file is over 2 GB, so it's quite a problem to download such a huge file just to try to figure out the format.
Anyway, there is an alternative: for example, you can upload just a part of the arc file. The following script for QuickBMS (quickbms_4gb_files.exe) creates two files that you can upload for analysis:
It's even possible that the header is just those 0x38 bytes at the beginning and so there are only 4 files in that arc
Oh no, you are mistaken, the first file is not 2 GB. I mean the program extracts about 2-3 GB of files: 97 WebM files (861 MB) and 1,358 LOPUS files (1.61 GB).
Here are the files extracted with QuickBMS.
Edit: Looks like there's 00 00 00 A0 at the start/end of new files?
I think 2 MB won't be enough since some files are large, so here are the first 20 MB and last 20 MB of the file via the script, and also the first 100 MB and last 100 MB. Uploaded to Mega.
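Aluigi's actual QuickBMS split script isn't reproduced in this dump, but the idea (dump the first and last chunk of a huge archive into two small files for upload) can be sketched in Python; file names `upload.dat`/`upload2.dat` follow the ones mentioned in the thread, everything else is an assumption:

```python
import os

def dump_head_tail(path: str, chunk_mb: int = 20) -> None:
    """Write the first and last chunk_mb megabytes of a big archive to
    upload.dat / upload2.dat for analysis. Rough stand-in for the QuickBMS
    script referenced in the thread, whose exact behaviour isn't shown."""
    chunk = chunk_mb * 1024 * 1024
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        with open("upload.dat", "wb") as out:
            out.write(f.read(min(chunk, size)))
        if size > chunk:
            f.seek(size - chunk)
            with open("upload2.dat", "wb") as out:
                out.write(f.read())
```

For a 14 GB archive this keeps the upload down to a few tens of megabytes while still exposing the header and any trailing index.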
The 2 GB of data extracted by that tool aren't the archived files; they are the data included in the first file (NUS3 type).
Can you tell me what's the exact total size of data.arc? Is it 14'435'753'256 bytes?
upload.dat is of no help, unfortunately; I expected some structures or useful things but I was wrong. There is something that looks like an index at the end of upload2, but it's probably a false positive, something unrelated.
Yes, I think so. Someone had an arc extractor here as well: https://github.com/shinyquagsire23/arcshark. This one works and extracts the files, so maybe it will help with the QuickBMS version.
Some more info according to https://twitter.com/ShinyQuagsire: "From what I can tell, it looks like SSBU has no filenames for files in data.arc. Or rather, it has filenames, but they're all hashed with an inline function before it reaches data.arc. It's interesting though because it looks like they also hash each file/folder name in some tables as well, so there's a level of flexibility beyond just hashing entire paths."
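The hashing scheme the community later settled on for data.arc path lookups is often called "hash40": the CRC32 of the path in the low 32 bits, with the string length in the byte above it. Treat the sketch below as an assumption for illustration; at the time of this thread the exact function wasn't confirmed, and the example path is made up:

```python
import zlib

def hash40(name: str) -> int:
    """Assumed data.arc name hash ("hash40"): length in bits 32-39,
    CRC32 of the ASCII path in bits 0-31."""
    data = name.encode("ascii")
    return (len(data) << 32) | zlib.crc32(data)

print(hex(hash40("fighter/mario/model/body/c00")))  # hypothetical path
```

This also explains why brute-forcing filenames works at all: with a known hash, candidate paths can be checked cheaply against the tables.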
Why don't you use that extractor? Is it no longer compatible with the format?
Anyway, I made a skeleton of a script on the fly based on that main.cpp, but it can't work without testing and tuning. I leave it here just to avoid doing the job from scratch again in case someone wants to return to it:
for i = 0 < bgm_unk_movie_entries
    callfunction entry_triplet 1
next i
for i = 0 < entries
    callfunction entry_pair 1
next i
for i = 0 < entries
    get off4_nums long
next i
for i = 0 < entries_2
    callfunction file_pair 1
next i
for i = 0 < num_files
    callfunction entry_triplet 1
next i
for i = 0 < 0xE
    callfunction big_hash_entry 1
next i
for i = 0 < entries_big
    callfunction big_file_entry 1
    math offset + offset_2
    log "" OFFSET comp_size
next i
for i = 0 < 0x248f73
    callfunction entry_pair 1
next i
for i = 0 < 0x89b11
    callfunction quad_entries 1
next i
for i = 0 < entries_big
    callfunction entry_pair 1
next i
for i = 0 < entries_3
    callfunction entry_pair 1
next i

goto offset_5
callfunction offset5_header 1

for i = 0 < entries
    callfunction entry_pair 1
next i
for i = 0 < 0x247a1
    callfunction entry_pair 1
next i
for i = 0 < entries
    callfunction entry_pair 1
next i
for i = 0 < 0x71a94
    callfunction entry_pair 1
next i
/*
for i = 0 < entries_2
    get entires_5 long
next i
*/
for i = 0 < entries_2
    callfunction entry_pair 1
next i