fishhook Source Code Notes

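There are already plenty of articles that walk through the fishhook source and explain how it works. This post is simply my own summary of what I learned from studying it. I read the fishhook source a while ago, but I keep finding that if I don't write things down, I forget some of the details after not looking at them for a while, so here is a blog post for the record.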

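To build a complete program, its functionality is split into modules. Each module is developed and compiled on its own into a static artifact (for example, a static library .a on iOS). Once all modules are done, the static artifacts are linked together and packed into the final executable, which can then be run. This is static linking.

Static linking solves problems such as code reuse and collaborative development. However, because every static artifact is packed into the final executable at link time, the executable becomes fairly large. And if a module has a bug or changes, that module must be recompiled and the whole program relinked, redownloaded, and reinstalled before the change takes effect. Dynamic linking emerged to address these problems. System libraries such as UIKit and Foundation are dynamic libraries so that all apps can share a single copy of the code, save storage space, and pick up updates right away. They are loaded while the program runs, and the dynamic library symbols the program uses are resolved and bound at run time.

fishhook works by changing the address a symbol is bound to while the program is running, which is how it hooks functions. Take the dynamic library function NSLog: at run time the system loads the library, looks up the address of the NSLog symbol, and binds it, say to 0x00000001; the function at that address prints its message to the console. fishhook finds the pointer bound to the NSLog symbol and rebinds it to the address of a function we provide, say 0x00000002; that function appends custom data before printing to the console. The implementation is covered in detail later.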

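Dynamic Library Symbol Lookup and Binding

Two sections in the __DATA segment of a Mach-O file are involved in dynamic symbol binding:

Non-lazy symbol pointer table (Non-Lazy Symbol Pointers, __got): stores non-lazily bound symbols, which are bound when the Mach-O image is loaded.

Lazy symbol pointer table (Lazy Symbol Pointers, __la_symbol_ptr): stores lazily bound functions, which are bound by dyld_stub_binder the first time they are called.

The first time the NSLog symbol is used, the call goes to its stub, and the stub jumps through the entry in __la_symbol_ptr. At that point __la_symbol_ptr does not yet hold the real address of NSLog and the symbol has to be bound dynamically, so the code looks up the address of dyld_stub_binder in __got and uses it to find the real NSLog address. Once found, NSLog is called and its address is saved into __la_symbol_ptr. The next call to NSLog gets the real address straight from __la_symbol_ptr and jumps to it.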
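The lazy-binding flow described above can be modeled in a few lines of portable C. This is only a sketch: lazy_slot stands in for one __la_symbol_ptr entry, bind_on_first_call for dyld_stub_binder, and call_log for the stub. All of these names are made up for illustration, and none of this is dyld's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these names are illustrative, not dyld's. */
typedef int (*log_fn)(const char *msg);

int binder_calls = 0; /* how many times the "binder" ran */

/* Stand-in for the real library function (think NSLog). */
int real_log(const char *msg) {
    return msg != NULL ? 1 : 0;
}

int bind_on_first_call(const char *msg);

/* Stand-in for one __la_symbol_ptr slot: starts out pointing at the binder. */
log_fn lazy_slot = bind_on_first_call;

/* Stand-in for dyld_stub_binder: resolve, patch the slot, then call through. */
int bind_on_first_call(const char *msg) {
    binder_calls++;
    lazy_slot = real_log; /* later calls skip the binder */
    return lazy_slot(msg);
}

/* Stand-in for the stub: always jumps through the slot. */
int call_log(const char *msg) {
    assert(lazy_slot != NULL);
    return lazy_slot(msg);
}
```

Calling call_log twice runs the binder only on the first call; after that the slot holds real_log. That patched slot is exactly the kind of pointer fishhook later overwrites.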

Both tables can be inspected with MachOView:


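To get a more concrete picture of how dynamic library symbols are looked up and bound, we next set breakpoints and step through the execution of the NSLog symbol, mainly to verify how a lazy symbol is resolved.

Create an empty project, add two NSLog calls to the view controller's viewDidLoad method, and put a breakpoint on each. Make sure the first NSLog line is the very first call to NSLog after the app launches.

After launch, the program stops at the first NSLog line. Disassembling the current function with lldb's dis command gives the following:

bl is the branch-with-link instruction, so the program is about to jump to address 0x10452e0dc. As shown in the screenshot above, set a breakpoint on that address and continue to the next step:

The ldr instruction loads x16 using the offset 0x5f98 (the loaded value is 0x000000010452e52c), and br then jumps to the address held in x16. Note that for now this is just a bare function address; once the symbol has been lazily resolved and bound, running this code again goes straight to the target function. As shown in the screenshot above, set a breakpoint on that address and continue to the next step:

The b instruction jumps unconditionally to the target address 0x10452e508. As shown in the screenshot above, set a breakpoint on that address and continue to the next step:

The ldr instruction loads x16 using the offset 0x5b30 (the loaded value is 0x0000000184ac9474; as annotated in the screenshot, the function at this address is dyld_stub_binder, which resolves and binds dynamic library symbols), and br jumps to the address held in x16. As shown in the screenshot above, set a breakpoint on that address and continue to the next step:

We are now inside dyld_stub_binder, which in turn jumps to address 0x184acb034, annotated as _dyld_fast_stub_entry(void*, long). In short, a chain of stub routines runs to resolve the symbol and bind it.

Continuing from this breakpoint brings us back to our own code, at the second NSLog line. Disassembling the current function with lldb's dis command again gives the following:

As shown in the screenshot above, the function again uses bl to jump to 0x10452e0dc. As before, set a breakpoint on that address and continue to the next step:

The ldr instruction loads x16 using the offset 0x5f98 (the loaded value is now 0x00000001861eaba8), and br jumps to the address held in x16. Note that the screenshot shows the address of NSLog is already known here, so the next step runs NSLog directly. As shown in the screenshot above, set a breakpoint on that address and continue to the next step:

This time execution enters NSLog directly, without going through the dyld_stub_binder lookup.

The breakpoint walkthrough above confirms how dynamic library symbols are lazily bound.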

Using fishhook and Verifying the Result

// Function pointer that holds the original function
static void (*sys_nslog)(NSString *format, ...);

// Replacement function (note: variadic arguments are not handled)
void myLog(NSString * _Nonnull format, ...) {
    NSString *message = [format stringByAppendingString:@"---->🍺🍺🍺"];
    (*sys_nslog)(message);
}

- (void)fishhook_nslog {
    NSLog(@"fishhook before"); // fishhook before

    struct rebinding rebindingLog;
    // Name of the function to hook
    rebindingLog.name = "NSLog";
    // Replacement function
    rebindingLog.replacement = myLog;
    // Where to store the original function pointer
    rebindingLog.replaced = (void **)&sys_nslog;

    struct rebinding rebindings[] = {rebindingLog};

    rebind_symbols(rebindings, 1);

    NSLog(@"fishhook after"); // fishhook after---->🍺🍺🍺
}
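The replacement function above drops the variadic arguments. One common way to keep them is to forward a va_list to a v-style variant of the target function. The sketch below shows that pattern in plain C with vsnprintf so it can be tried anywhere; my_format and the "---->hooked" suffix are hypothetical. In the Objective-C case the analogous v-style function would be NSLogv, but adapting myLog that way is an assumption, not something the demo project does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook for a printf-like function: formats into buf and appends a
 * marker. Returns the final length, or -1 on error or if the buffer is too small. */
int my_format(char *buf, size_t cap, const char *format, ...) {
    assert(buf != NULL && format != NULL);

    va_list args;
    va_start(args, format);
    int n = vsnprintf(buf, cap, format, args); /* forward the argument list */
    va_end(args);
    if (n < 0 || (size_t)n >= cap) {
        return -1; /* formatting failed or the output was truncated */
    }

    const char *suffix = "---->hooked";
    if ((size_t)n + strlen(suffix) >= cap) {
        return -1; /* no room for the suffix and the terminating NUL */
    }
    strcat(buf, suffix);
    return n + (int)strlen(suffix);
}
```

With a 64-byte buffer, my_format(buf, sizeof buf, "%s=%d", "x", 42) leaves "x=42---->hooked" in buf, so the arguments survive the hook.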

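We verify the result of the hook using the breakpoint approach from the previous section. There we saw that once the first call to NSLog has resolved and bound the symbol through dyld_stub_binder, later calls already know the address of NSLog. Here we set a breakpoint on the NSLog call that follows the hook code and see what happens:


As we can see, the NSLog call now jumps to (void *)0x0000000100a07ca8: myLog at /Users/gonghonglou/Desktop/HookDemo/Example/HookDemo/HKViewController.m:156. Set a breakpoint on address 0x0000000100a07ca8 and continue, and the program lands in our own myLog function, at HKViewController.m:156.

Verifying with MachOView

Opening the project's Mach-O file in MachOView shows the offset of the NSLog symbol pointer. Adding the image's load address gives the address of that pointer, and disassembling the address stored there with the dis command likewise confirms that the binding has changed.

The screenshot above shows that the offset for NSLog is 0x3c078.
We have already verified that the first call to NSLog goes through the stub routines to look up and bind the symbol, so we skip that step here and set the breakpoint on the second NSLog call:


The image list command prints the load addresses of all images; the main executable starts at 0x0000000100a18000. Use the x command to dump the memory at that address plus the offset. iOS is little-endian, so reading the stored bytes from last to first gives 0x01861eaba8, and disassembling that address with the dis command shows that the function is NSLog.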
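The two calculations in this step, load address plus offset and reading the dumped bytes in little-endian order, can be sketched in C as follows. The helper names slot_address and read_le64 are made up for illustration, and the sketch simply follows the article's own arithmetic with the values quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of a pointer slot: image load address plus the offset shown by MachOView. */
uint64_t slot_address(uint64_t image_base, uint64_t offset) {
    return image_base + offset;
}

/* Assemble a 64-bit value from 8 bytes stored lowest byte first (little-endian). */
uint64_t read_le64(const uint8_t *bytes) {
    assert(bytes != NULL);
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++) {
        value |= (uint64_t)bytes[i] << (8 * i);
    }
    return value;
}
```

For example, slot_address(0x100a18000, 0x3c078) is 0x100a54078, and the bytes a8 ab 1e 86 01 00 00 00 read back as 0x00000001861eaba8.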

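Continue past this breakpoint. After fishhook rebinds the symbol, execution stops at the location shown below:

Using the x command again on the main executable's load address plus the offset now gives 0x0100a2bcb4, and disassembling that address with the dis command shows that the function has changed to myLog.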

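The Mach-O File Format

To understand exactly how fishhook works, the first piece of background we need is the Mach-O file format.

Mach-O (Mach Object File Format) describes the format of executable files on macOS. A typical Mach-O file is laid out as shown below:

Or, more concretely, as viewed in MachOView:

The Mach-O structures can be found in mach-o/loader.h, and we take a quick look at some of them next. (The structs in this file come in 32-bit and 64-bit variants; to save space, only the 64-bit ones used on arm64 are shown here.)

As the Mach-O layout diagram shows, a Mach-O file consists of three main parts:

1. Header: records information such as the CPU architecture and the file type.
2. Load commands: contains many load commands, including the segment commands, which describe the layout of the Mach-O file.
3. Data: contains many segments, each of which contains many sections holding the actual code and data.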

mach_header_64

/*
* The 64-bit mach header appears at the very beginning of object files for
* 64-bit architectures.
*/
struct mach_header_64 {
    uint32_t magic; /* mach magic number identifier */
    cpu_type_t cputype; /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t filetype; /* type of file */
    uint32_t ncmds; /* number of load commands */
    uint32_t sizeofcmds; /* the size of all the load commands */
    uint32_t flags; /* flags */
    uint32_t reserved; /* reserved */
};

The screenshot above happens to show the contents of the Mach64 Header.
magic: the magic number; possible values:

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
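As a small illustration of how the two magic values are used, the sketch below classifies the first four bytes of a file. The function name classify_magic and the DEMO_ constants are hypothetical; the constants only mirror the values of MH_MAGIC_64 and MH_CIGAM_64 so the code builds without <mach-o/loader.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC_64 0xfeedfacfu /* same value as MH_MAGIC_64 */
#define DEMO_CIGAM_64 0xcffaedfeu /* same value as MH_CIGAM_64 */

/* Returns 1 if the file is 64-bit Mach-O in host byte order, -1 if it is
 * 64-bit Mach-O in the opposite byte order, and 0 for anything else. */
int classify_magic(const uint8_t *file_start) {
    assert(file_start != NULL);
    uint32_t magic;
    memcpy(&magic, file_start, sizeof magic); /* read the field in host byte order */
    if (magic == DEMO_MAGIC_64) {
        return 1;
    }
    if (magic == DEMO_CIGAM_64) {
        return -1;
    }
    return 0;
}
```

Seeing the byte-swapped value means every other header field has to be swapped before use as well.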

cputype: the CPU type, such as CPU_TYPE_ARM or CPU_TYPE_ARM64.
filetype: the file type; possible values:

#define	MH_OBJECT	0x1		/* relocatable object file */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library */
#define MH_DYLINKER 0x7 /* dynamic link editor */
#define MH_DSYM 0xa /* companion file with only debug
sections */
......

ncmds: the number of load commands.
sizeofcmds: the total size of all load commands.
flags: file flags; possible values:

/* Constants for the flags field of the mach_header */
#define MH_NOUNDEFS 0x1 /* the object file has no undefined
references */
#define MH_INCRLINK 0x2 /* the object file is the output of an
incremental link against a base file
and can't be link edited again */
#define MH_DYLDLINK 0x4 /* the object file is input for the
dynamic linker and can't be staticly
link edited again */
......

load_command

struct load_command {
    uint32_t cmd; /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};

cmd: the type of the load command; possible values:

/* Constants for the cmd field of all load commands, the type */
#define LC_SEGMENT 0x1 /* segment of this file to be mapped */
#define LC_SYMTAB 0x2 /* link-edit stab symbol table info */
#define LC_SYMSEG 0x3 /* link-edit gdb symbol table info (obsolete) */
#define LC_THREAD 0x4 /* thread */
#define LC_UNIXTHREAD 0x5 /* unix thread (includes a stack) */
#define LC_LOADFVMLIB 0x6 /* load a specified fixed VM shared library */
......

cmdsize: the total size of this load command in bytes.
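Because every load command begins with cmd and cmdsize, a reader can step from one command to the next by adding cmdsize, without knowing each command's full layout. The sketch below walks a synthetic buffer this way. demo_load_command is a local mirror of struct load_command, and count_commands is a hypothetical helper, not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct load_command so the sketch builds without <mach-o/loader.h>. */
struct demo_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Counts the commands of type `wanted` among `ncmds` commands packed back to
 * back in `cmds`. Stops early if the table is truncated or a size is malformed. */
uint32_t count_commands(const uint8_t *cmds, size_t total_size,
                        uint32_t ncmds, uint32_t wanted) {
    assert(cmds != NULL);
    uint32_t found = 0;
    size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct demo_load_command lc;
        if (offset + sizeof lc > total_size) {
            break; /* truncated table */
        }
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > total_size - offset) {
            break; /* malformed size */
        }
        if (lc.cmd == wanted) {
            found++;
        }
        offset += lc.cmdsize; /* cmdsize covers the whole command */
    }
    return found;
}
```

fishhook's own loops over the load commands, shown later in these notes, advance in the same way with cur += cur_seg_cmd->cmdsize.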

segment_command_64

/*
* The 64-bit segment load command indicates that a part of this file is to be
* mapped into a 64-bit task's address space. If the 64-bit segment has
* sections then section_64 structures directly follow the 64-bit segment
* command and their size is reflected in cmdsize.
*/
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t cmd; /* LC_SEGMENT_64 */
    uint32_t cmdsize; /* includes sizeof section_64 structs */
    char segname[16]; /* segment name */
    uint64_t vmaddr; /* memory address of this segment */
    uint64_t vmsize; /* memory size of this segment */
    uint64_t fileoff; /* file offset of this segment */
    uint64_t filesize; /* amount to map from the file */
    vm_prot_t maxprot; /* maximum VM protection */
    vm_prot_t initprot; /* initial VM protection */
    uint32_t nsects; /* number of sections in segment */
    uint32_t flags; /* flags */
};

section_64

/*
* A segment is made up of zero or more sections. Non-MH_OBJECT files have
* all of their segments with the proper sections in each, and padded to the
* specified segment alignment when produced by the link editor. The first
* segment of a MH_EXECUTE and MH_FVMLIB format file contains the mach_header
* and load commands of the object file before its first section. The zero
* fill sections are always last in their segment (in all formats). This
* allows the zeroed segment padding to be mapped into memory where zero fill
* sections might be. The gigabyte zero fill sections, those with the section
* type S_GB_ZEROFILL, can only be in a segment with sections of this type.
* These segments are then placed after all other segments.
*
* The MH_OBJECT format has all of its sections in one segment for
* compactness. There is no padding to a specified segment boundary and the
* mach_header and load commands are not part of the segment.
*
* Sections with the same section name, sectname, going into the same segment,
* segname, are combined by the link editor. The resulting section is aligned
* to the maximum alignment of the combined sections and is the new section's
* alignment. The combined sections are aligned to their original alignment in
* the combined section. Any padded bytes to get the specified alignment are
* zeroed.
*
* The format of the relocation entries referenced by the reloff and nreloc
* fields of the section structure for mach object files is described in the
* header file <reloc.h>.
*/
struct section_64 { /* for 64-bit architectures */
    char sectname[16]; /* name of this section */
    char segname[16]; /* segment this section goes in */
    uint64_t addr; /* memory address of this section */
    uint64_t size; /* size in bytes of this section */
    uint32_t offset; /* file offset of this section */
    uint32_t align; /* section alignment (power of 2) */
    uint32_t reloff; /* file offset of relocation entries */
    uint32_t nreloc; /* number of relocation entries */
    uint32_t flags; /* flags (section type and attributes)*/
    uint32_t reserved1; /* reserved (for offset or index) */
    uint32_t reserved2; /* reserved (for count or sizeof) */
    uint32_t reserved3; /* reserved */
};

symtab_command

/*
* The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
* "stab" style symbol table information as described in the header files
* <nlist.h> and <stab.h>.
*/
struct symtab_command {
    uint32_t cmd; /* LC_SYMTAB */
    uint32_t cmdsize; /* sizeof(struct symtab_command) */
    uint32_t symoff; /* symbol table offset */
    uint32_t nsyms; /* number of symbol table entries */
    uint32_t stroff; /* string table offset */
    uint32_t strsize; /* string table size in bytes */
};

Provides the file offsets of the symbol table and the string table within the Mach-O file on disk.

dysymtab_command

struct dysymtab_command {
    uint32_t cmd; /* LC_DYSYMTAB */
    uint32_t cmdsize; /* sizeof(struct dysymtab_command) */

    ......

    /*
     * The sections that contain "symbol pointers" and "routine stubs" have
     * indexes and (implied counts based on the size of the section and fixed
     * size of the entry) into the "indirect symbol" table for each pointer
     * and stub. For every section of these two types the index into the
     * indirect symbol table is stored in the section header in the field
     * reserved1. An indirect symbol table entry is simply a 32bit index into
     * the symbol table to the symbol that the pointer or stub is referring to.
     * The indirect symbol table is ordered to match the entries in the section.
     */
    uint32_t indirectsymoff; /* file offset to the indirect symbol table */
    uint32_t nindirectsyms; /* number of indirect symbol table entries */

    ......
};

Provides the file offset of the indirect symbol table within the Mach-O file on disk.

nlist_64

/*
* This is the symbol table entry structure for 64-bit architectures.
*/
struct nlist_64 {
    union {
        uint32_t n_strx; /* index into the string table */
    } n_un;
    uint8_t n_type; /* type flag, see below */
    uint8_t n_sect; /* section number or NO_SECT */
    uint16_t n_desc; /* see <mach-o/stab.h> */
    uint64_t n_value; /* value of this symbol (or stab offset) */
};

This struct is defined in mach-o/nlist.h. It is the entry type of the symbol table (Symbol Table).

The Official fishhook Flow Diagram

How it works
dyld binds lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of a Mach-O binary. fishhook re-binds these symbols by determining the locations to update for each of the symbol names passed to rebind_symbols and then writing out the corresponding replacements.
For a given image, the __DATA segment may contain two sections that are relevant for dynamic symbol bindings: __nl_symbol_ptr and __la_symbol_ptr. __nl_symbol_ptr is an array of pointers to non-lazily bound data (these are bound at the time a library is loaded) and __la_symbol_ptr is an array of pointers to imported functions that is generally filled by a routine called dyld_stub_binder during the first call to that symbol (it’s also possible to tell dyld to bind these at launch). In order to find the name of the symbol that corresponds to a particular location in one of these sections, we have to jump through several layers of indirection. For the two relevant sections, the section headers (struct sections from <mach-o/loader.h>) provide an offset (in the reserved1 field) into what is known as the indirect symbol table. The indirect symbol table, which is located in the __LINKEDIT segment of the binary, is just an array of indexes into the symbol table (also in __LINKEDIT) whose order is identical to that of the pointers in the non-lazy and lazy symbol sections. So, given struct section nl_symbol_ptr, the corresponding index in the symbol table of the first address in that section is indirect_symbol_table[nl_symbol_ptr->reserved1]. The symbol table itself is an array of struct nlists (see <mach-o/nlist.h>), and each nlist contains an index into the string table in __LINKEDIT which where the actual symbol names are stored. So, for each pointer __nl_symbol_ptr and __la_symbol_ptr, we are able to find the corresponding symbol and then the corresponding string to compare against the requested symbol names, and if there is a match, we replace the pointer in the section with the replacement.
The process of looking up the name of a given entry in the lazy or non-lazy pointer tables looks like this:

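Let's walk through the fishhook flow diagram with MachOView and verify each step.

1. In the Lazy Symbol Pointers table, find NSLog; its index is 5.

2. In the Indirect Symbols table, the entry at index 5 is NSLog, with Data 0xA11.

3. Convert the Data value found for NSLog in the Indirect Symbols table to decimal (0xA11 = 2577). In Symbol Table -> Symbols, the entry at index 2577 is NSLog, with Data 0xBE4.

4. The String Table starts at 0x50BB8. Adding the Data value 0xBE4 found for NSLog in the Symbols table gives 0x50BB8 + 0xBE4 = 0x5179C. At 0x5179C in the String Table we do indeed find the string _NSLog.


Once the target symbol has been matched, we can modify the address bound at the corresponding slot in the lazy symbol pointer table. Changing it to the address of our own myLog function completes the hook.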
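The four lookup steps above amount to three array accesses. The following sketch replays them on tiny made-up tables; demo_nlist is a simplified stand-in for struct nlist_64 that models only n_strx, and name_for_slot is a hypothetical helper rather than fishhook code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified symbol-table entry: only the string-table offset is modeled. */
struct demo_nlist {
    uint32_t n_strx;
};

/* pointer slot -> indirect symbol table -> symbol table -> string table */
const char *name_for_slot(uint32_t slot,
                          uint32_t reserved1,
                          const uint32_t *indirect_symtab,
                          const struct demo_nlist *symtab,
                          const char *strtab) {
    assert(indirect_symtab != NULL && symtab != NULL && strtab != NULL);
    /* reserved1 is where this section's entries start in the indirect table */
    uint32_t symtab_index = indirect_symtab[reserved1 + slot];
    /* the symbol-table entry stores an offset into the string table */
    uint32_t strtab_offset = symtab[symtab_index].n_strx;
    return strtab + strtab_offset;
}
```

With a string table "\0_printf\0_NSLog\0_free", a symbol table {{16}, {1}, {9}}, an indirect table {0, 1, 2, 0}, and reserved1 = 1, slot 1 resolves to the string "_NSLog" at offset 9.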

Reading the fishhook Source

struct rebinding

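struct rebinding {
  const char *name;  // name of the function to hook (the old function's name)
  void *replacement; // address of the replacement function (the new function)
  void **replaced;   // where to store the hooked function's original address
};

The rebinding struct, which stores the hook information, is declared in fishhook.h.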

struct rebindings_entry

struct rebindings_entry {
  struct rebinding *rebindings;  // array of rebindings for the functions to hook
  size_t rebindings_nel;         // length of the array
  struct rebindings_entry *next; // next entry, forming a linked list
};

static struct rebindings_entry *_rebindings_head; // head of the rebindings_entry list

_rebindings_head is a static variable that holds the head of the rebindings_entry list.
The rebindings_entry structs are traversed through next.
Within each rebindings_entry, the rebindings array of functions to hook is traversed using rebindings_nel.

The exact traversal appears later in perform_rebinding_with_section.

rebind_symbols_image

int rebind_symbols_image(void *header,
                         intptr_t slide,
                         struct rebinding rebindings[],
                         size_t rebindings_nel) {
  struct rebindings_entry *rebindings_head = NULL;
  int retval = prepend_rebindings(&rebindings_head, rebindings, rebindings_nel);
  rebind_symbols_for_image(rebindings_head, header, slide);
  free(rebindings_head);
  return retval;
}

Hooks within one specified image. The logic is the same as in rebind_symbols; see the analysis of rebind_symbols below.

rebind_symbols

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  // Store the rebindings array that was passed in on the global linked list
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  // Return immediately on failure
  if (retval < 0) {
    return retval;
  }
  // On the first call, register the image-load callback
  // If this was the first call, register callback for image additions (which is also invoked for
  // existing images, otherwise, just run on existing images
  if (!_rebindings_head->next) {
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    // Not the first call: iterate over all loaded images and hook each one
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}

prepend_rebindings

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  // Allocate a new rebindings_entry
  struct rebindings_entry *new_entry = malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  // Allocate space for the rebindings, sized to the array that was passed in
  new_entry->rebindings = malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  // Copy the rebindings that were passed in into new_entry->rebindings
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  // Record the number of rebindings in new_entry->rebindings_nel
  new_entry->rebindings_nel = nel;
  // Link the new entry in front of the current head and make it the new head
  new_entry->next = *rebindings_head;
  *rebindings_head = new_entry;
  return 0;
}
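The prepend logic is easier to see on a reduced list. In the sketch below each node carries a single name instead of a rebinding array; demo_entry, demo_prepend, and demo_free are made up for illustration. It shows two consequences of prepending: the newest entry is always visited first, and after the very first registration head->next is NULL, which is the condition rebind_symbols uses to detect the first call.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified node: one name per entry instead of an array of rebindings. */
struct demo_entry {
    const char *name;
    struct demo_entry *next;
};

/* Pushes a new entry at the front of the list.
 * Returns 0 on success and -1 if allocation fails. */
int demo_prepend(struct demo_entry **head, const char *name) {
    assert(head != NULL);
    struct demo_entry *entry = malloc(sizeof *entry);
    if (entry == NULL) {
        return -1;
    }
    entry->name = name;
    entry->next = *head; /* old head becomes the second node */
    *head = entry;       /* new entry becomes the head */
    return 0;
}

/* Frees every node in the list. */
void demo_free(struct demo_entry *head) {
    while (head != NULL) {
        struct demo_entry *next = head->next;
        free(head);
        head = next;
    }
}
```

Registering "first" and then "second" leaves "second" at the head, so when the same symbol is registered twice, the later registration is matched first during traversal.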

_rebind_symbols_for_image

static void _rebind_symbols_for_image(const struct mach_header *header,
                                      intptr_t slide) {
  rebind_symbols_for_image(_rebindings_head, header, slide);
}

This function is invoked either as the callback registered with _dyld_register_func_for_add_image or directly while iterating over all images, and in both cases receives header and slide.

extern void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide))    __OSX_AVAILABLE_STARTING(__MAC_10_1, __IPHONE_2_0);

rebind_symbols_for_image

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  // Use dladdr to get the symbol information for the header address
  if (dladdr(header, &info) == 0) {
    return;
  }

  segment_command_t *cur_seg_cmd;
  segment_command_t *linkedit_segment = NULL;
  struct symtab_command* symtab_cmd = NULL;
  struct dysymtab_command* dysymtab_cmd = NULL;

  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  // Walk the load commands looking for the linkedit segment, symtab command, and dysymtab command
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        // cmd == LC_SEGMENT_64 && segname == SEG_LINKEDIT identifies the linkedit segment
        // Note:
        // #define LC_SEGMENT_64 0x19 /* 64-bit segment of this file to be mapped */
        // #define SEG_LINKEDIT "__LINKEDIT" /* the segment containing all structs */
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
      // cmd == LC_SYMTAB identifies the symtab command
      // Note:
      // #define LC_SYMTAB 0x2 /* link-edit stab symbol table info */
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
      // cmd == LC_DYSYMTAB identifies the dysymtab command
      // Note:
      // #define LC_DYSYMTAB 0xb /* dynamic link-edit symbol table info */
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }

  // Required information not found: return immediately
  if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment ||
      !dysymtab_cmd->nindirectsyms) {
    return;
  }

  // Compute the addresses of the symbol table and the string table
  // Find base symbol/string table addresses
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

  // Get the indirect symbol table
  // Get indirect symbol table (array of uint32_t indices into symbol table)
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

  cur = (uintptr_t)header + sizeof(mach_header_t);
  // Walk the load commands again, find the lazy (S_LAZY_SYMBOL_POINTERS) and non-lazy
  // (S_NON_LAZY_SYMBOL_POINTERS) symbol pointer sections, and rewrite the symbol addresses
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          // Note:
          // #define S_LAZY_SYMBOL_POINTERS 0x7 /* section with only lazy symbol pointers */
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          // Note:
          // #define S_NON_LAZY_SYMBOL_POINTERS 0x6 /* section with only non-lazy symbol pointers */
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}
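The linkedit_base line is the least obvious part of this function. symoff, stroff, and indirectsymoff are file offsets, while the code needs addresses in memory. Since __LINKEDIT is mapped at slide + vmaddr and starts at fileoff in the file, slide + vmaddr - fileoff is the memory address that file offset 0 would have if it were part of that mapping. The sketch below checks this with made-up numbers; linkedit_table_address is a hypothetical helper, not part of fishhook.

```c
#include <assert.h>
#include <stdint.h>

/* Converts the file offset of a table that lives inside __LINKEDIT into its
 * address in memory, using the same arithmetic as fishhook's linkedit_base. */
uint64_t linkedit_table_address(uint64_t slide,
                                uint64_t vmaddr,
                                uint64_t fileoff,
                                uint64_t filesize,
                                uint32_t table_off) {
    /* the table must lie inside the part of the file that __LINKEDIT maps */
    assert(table_off >= fileoff && table_off - fileoff < filesize);
    uint64_t linkedit_base = slide + vmaddr - fileoff;
    return linkedit_base + table_off;
}
```

With slide = 0x4000, vmaddr = 0x100010000, fileoff = 0xC000, and a table at file offset 0xC100, the result is 0x100014100. That is the start of the mapped segment (0x100014000) plus the table's 0x100 offset inside the segment.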

perform_rebinding_with_section

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  // Use the section's reserved1 field to locate its entries in the indirect symbol table
  // Note:
  // uint32_t reserved1; /* reserved (for offset or index) */
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  // slide (the random load offset) plus the section address gives the symbol pointer table
  // whose bindings will be modified
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  // Walk the section looking for the target symbols
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    // The slot index in the section selects the matching entry in the indirect symbol table
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    // The indirect symbol table entry indexes the symbol table, whose entry holds the
    // offset into the string table
    // Note:
    // union {
    //   uint32_t n_strx; /* index into the string table */
    // } n_un;
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    // String table base plus the offset gives the symbol name
    char *symbol_name = strtab + strtab_offset;
    struct rebindings_entry *cur = rebindings;
    // Walk the rebindings_entry list, match the target symbol, and rebind its address
    while (cur) {
      // Walk the rebindings array of each rebindings_entry
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        if (strlen(symbol_name) > 1 &&
            strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            // Save the matched symbol's current address into replaced
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          // Target symbol matched: rebind its address to replacement
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          goto symbol_loop;
        }
      }
      cur = cur->next;
    }
  symbol_loop:;
  }
}
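Two details of the inner loop are worth isolating. The comparison uses &symbol_name[1] because names in the symbol table carry a leading underscore (_NSLog) while callers pass the plain name (NSLog). The guard on replaced also keeps a repeated pass from overwriting the saved original with the replacement itself. The following sketch reproduces both on plain pointers; symbol_matches and rebind_slot are hypothetical helpers, not fishhook functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when a symbol-table name such as "_NSLog" matches a requested
 * name such as "NSLog", and 0 otherwise. */
int symbol_matches(const char *symbol_name, const char *requested) {
    assert(symbol_name != NULL && requested != NULL);
    return strlen(symbol_name) > 1 && strcmp(&symbol_name[1], requested) == 0;
}

/* Swaps one pointer slot: remembers the old target if requested, then
 * installs the replacement. */
void rebind_slot(void **slot, void *replacement, void **replaced) {
    assert(slot != NULL);
    if (replaced != NULL && *slot != replacement) {
        *replaced = *slot; /* only save a target that is not already the hook */
    }
    *slot = replacement;
}
```

Calling rebind_slot twice with the same replacement leaves the saved original untouched on the second call. This matters because the same image can be processed again on a later rebind_symbols call, and a hook whose saved "original" pointed back at itself would recurse forever.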

That concludes these notes on the fishhook source.

Reference