Serg Iakovlev

Introduction
This article shows how to use inline assembly to access hardware I/O from C. AT&T syntax is used.

The Basics
An example of basic I/O port access:

// --------------------------------------------------------------------------
// From NOPE-OS
// --------------------------------------------------------------------------
// arminb@aundb-online.de
// --------------------------------------------------------------------------

/* Input a byte from a port */
inline unsigned char inportb(unsigned int port)
{
unsigned char ret;

asm volatile ("inb %%dx,%%al":"=a" (ret):"d" (port));
return ret;
}

/* Output a byte to a port */
/* July 6, 2001 added space between :: to make code compatible with gpp */
inline void outportb(unsigned int port,unsigned char value)
{
asm volatile ("outb %%al,%%dx": :"d" (port), "a" (value));
}

Notes
The code reads and writes bytes. If you need words or doublewords, change the register size (al, ax, eax) along with the argument and return types. The device should be disabled while these functions are in use.
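
As a sketch of the change described above, 16-bit variants could look like the following. The names inportw and outportw are illustrative and not part of the original code, and the operand-based form with the "Nd" constraint is one common GCC idiom rather than the only way to write it. Like the byte versions, these need I/O privilege to actually run.

```c
/* Hypothetical 16-bit counterparts of inportb/outportb (names assumed).
   __asm__ is the spelling of asm that also works in strict ISO modes. */

/* Input a word from a port: inw moves 16 bits from the port into AX. */
static inline unsigned short inportw(unsigned short port)
{
    unsigned short ret;

    /* "=a" binds ret to AX; "Nd" lets GCC use either an immediate
       port number (0..255) or the DX register. */
    __asm__ volatile ("inw %1, %0" : "=a" (ret) : "Nd" (port));
    return ret;
}

/* Output a word to a port: outw moves 16 bits from AX to the port. */
static inline void outportw(unsigned short port, unsigned short value)
{
    __asm__ volatile ("outw %0, %1" : : "a" (value), "Nd" (port));
}
```

A doubleword version would follow the same pattern with inl/outl, EAX and a 32-bit value type.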

Example
The following example shows how to read the extended memory count from the CMOS. It runs under DOS.

#include <stdio.h>

/* Input a byte from a port */
inline unsigned char inportb(unsigned int port)
{
unsigned char ret;

asm volatile ("inb %%dx,%%al":"=a" (ret):"d" (port));
return ret;
}

/* Output a byte to a port */
/* July 6, 2001 added space between :: to make code compatible with gpp */
inline void outportb(unsigned int port,unsigned char value)
{
asm volatile ("outb %%al,%%dx": :"d" (port), "a" (value));
}

/* Stop Interrupts */
inline void stopints()
{
asm ("cli");
}

unsigned char highmem, lowmem;
unsigned int mem;

int main()
{
/* need to stop ints before accessing the CMOS chip */
stopints();

/* write to port 0x70 with the CMOS register we want to read */
/* 0x30 is the CMOS reg that hold the low byte of the mem count */
outportb(0x70,0x30);

/* read CMOS values from port 0x71 */
lowmem = inportb(0x71);

/* write to port 0x70 with the CMOS register we want to read */
/* 0x31 is the CMOS reg that hold the high byte of the mem count */
outportb(0x70,0x31);

/* read CMOS values from port 0x71 */
highmem = inportb(0x71);

/* re-enable interrupts now that the CMOS access is finished */
asm ("sti");

/* combine the low and high bytes into one value */
mem = highmem;
mem = mem<<8;
mem += lowmem;

printf("\nOld style CMOS extended memory count is %uk.\n", mem);
return 0;
}
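
The byte-combining step at the end of the example can be isolated into a small pure function, which makes it easy to check without touching any hardware. This is only a sketch; the name cmos_combine is an assumption and does not appear in the original program.

```c
/* Hypothetical helper: combine the high and low CMOS bytes into the
   16-bit extended memory count (in kilobytes). */
static unsigned int cmos_combine(unsigned char high, unsigned char low)
{
    /* Shift the high byte into bits 8..15, then merge in the low byte. */
    return ((unsigned int)high << 8) | low;
}
```

In the program above this would replace the three assignments with mem = cmos_combine(highmem, lowmem);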

Links

*	DJGPP QuickAsm Programming Guide

*	Inline Assembler in DJGPP

Mixing Assembly and C
by Chase

About This Guide
This guide explains how to use C and assembly together when writing operating systems for x86.

System Requirements

*	Intel compatible 386 or greater cpu

*	4 megabytes of memory

*	DOS compatible boot disk

*	color vga adapter

GCC/DJGPP
To begin, create a simple text file:

int main(void)
{
repeat:
goto repeat;
}

Save it as "kernel32.c". Compile it:

gcc -ffreestanding -c -o kernel32.o kernel32.c

Then link it:

ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000".
gcc compiles the source into an object file. ld links the object file into a flat binary that is meant to be loaded at address 0x100000 (the 1-megabyte mark, where the kernel will be placed). Note that we could have used a single gcc command without ld, and the compiler would have invoked the linker implicitly. We run the steps separately because we are going to link several object files. The gcc options are:

*	-ffreestanding -- generate code that does not depend on any operating system (i.e. kernel code)

*	-c -- compile but do not link (create an object file)

*	-o -- specify the name of the object file

The ld options are:

*	--oformat -- specify the output format; we use binary

*	-Ttext -- specify the address at which the code is loaded

*	-o -- specify the name of the file that is created

For complete descriptions of all command options you can look at the online manuals for gcc and ld.

NASM
NASM, which stands for Netwide Assembler, is "a free portable assembler for the Intel 80x86 microprocessor". You can find out where to download NASM at the NASM website. If you're running a type of Unix, look in the package collection of your distribution for a copy of NASM. NASM, just like DJGPP, works via the command line. This means that you write your code in any program that can produce text files and then assemble it from a prompt. If you downloaded the Windows version you may want to rename "nasmw.exe" to "nasm.exe". The first thing we'll do is make an assembly code kernel that does the same thing as our C kernel: nothing.
Create a text file that contains the following code:

[BITS 32]
repeat: jmp repeat

Save as "kernel32.asm" . From the same location as the file enter the command

nasm -f coff -o kernel32.o kernel32.asm

If you received no errors then enter the command

ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000" .
Copy the new kernel32.bin over to the same location as the old kernel32.bin . Try loading your new kernel just like the old one. When you run the loader your system should hang just like the C kernel did.
Testing our kernels will be a lot easier if we can just return to DOS after running them. After all, having to hit reset after testing each kernel gets old very fast. To accomplish this you need to know a little about how the loader works. The loader reads "kernel32.bin" into memory and places it at the first megabyte of memory. Then the loader sets up all selectors to access the first four megabytes of memory and executes a far call to the first instruction at 0x100000. So to return to the loader from the kernel all we have to do is execute a far return. The loader then re-enables interrupts, frees any memory it used, and returns to DOS.
Create a text file that contains the following code:

[BITS 32]
retf

Save as "kernel32.asm" . From the same location as the file enter the command

nasm -f coff -o kernel32.o kernel32.asm

If you received no errors then enter the command

ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000" .
Try loading your new kernel just like the old one. When you run the loader you should be returned to a DOS prompt. Be sure not to mess up the stack in your kernels, otherwise the far return won't work and anything could happen. Here is what each of the nasm parameters we'll be using means:

*	-f -- specify the output format, we'll be using coff (coff is a type of object file)

*	-o -- specify the name of the file that is created

To display a "Hi" message just like the sample kernel, make a kernel that contains the following code:

[BITS 32]
mov byte [es:0xb8f9c],'H'
mov byte [es:0xb8f9e],'i'
retf

Since es points to a selector whose base address is zero and the color text area starts at 0xb8000, the letters H and i are displayed near the end of a standard 80x25 text display. We'll discuss display adapters in a video article (hopefully); for now all you need to know is that to write a character to the display you copy its ASCII value to 0xb8000 to get it to show up in the upper left corner. To display a character in any other location, add 2 to 0xb8000 for every place to the right. Text wraps down to the start of the next row when you reach the end of a row.
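
The addressing rule just described (two bytes per character cell, 80 cells per row) can be written out in C. The sketch below is illustrative only: text_cell_offset and put_char are made-up names, and the base pointer is passed in as a parameter so the arithmetic can be tried against an ordinary buffer instead of real video memory at 0xb8000.

```c
#define TEXT_COLS 80   /* columns in the standard 80x25 text mode */
#define TEXT_ROWS 25   /* rows in the standard 80x25 text mode */

/* Byte offset of a character cell from the start of text memory.
   Each cell takes 2 bytes: the character, then its attribute. */
static unsigned int text_cell_offset(unsigned int row, unsigned int col)
{
    return (row * TEXT_COLS + col) * 2;
}

/* Store a character at (row, col). In a kernel, text_base would
   point at 0xb8000; here it can be any sufficiently large buffer. */
static void put_char(volatile unsigned char *text_base,
                     unsigned int row, unsigned int col, char ch)
{
    text_base[text_cell_offset(row, col)] = (unsigned char)ch;
}
```

Row 24, column 78 gives offset 0xf9c, which matches the address 0xb8f9c used for the letter H in the kernel above.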

Mixing C and Assembly
Most of the following text is taken directly from the NASM documentation.

External Symbol Names
Most 32-bit C compilers share the convention used by 16-bit compilers, that the names of all global symbols (functions or data) they define are formed by prefixing an underscore to the name as it appears in the C program. However, not all of them do: the ELF specification states that C symbols do not have a leading underscore on their assembly-language names.
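
One way to find out which convention a particular toolchain follows is to ask the compiler itself. The sketch below relies on the predefined macro __USER_LABEL_PREFIX__, which GCC and Clang provide; other compilers may not define it, so treat this as an assumption about the toolchain. The function name c_symbol_prefix is made up for the example.

```c
/* Two-step stringification so the macro is expanded before quoting. */
#define STRINGIFY_EXPANDED(x) #x
#define STRINGIFY(x) STRINGIFY_EXPANDED(x)

/* Returns "_" on targets that prefix C symbols with an underscore
   and "" on targets (such as ELF) that do not. */
static const char *c_symbol_prefix(void)
{
    return STRINGIFY(__USER_LABEL_PREFIX__);
}
```

If the result is an empty string, the assembly side should use the plain C names without the leading underscore.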

Function Definitions and Function Calls
The C calling convention in 32-bit programs is as follows. In the following description, the words caller and callee are used to denote the function doing the calling and the function which gets called.

*	The caller pushes the function's parameters on the stack, one after another, in reverse order (right to left, so that the first argument specified to the function is pushed last).

*	The caller then executes a near CALL instruction to pass control to the callee.

The callee receives control, and typically (although this is not actually necessary, in functions which do not need to access their parameters) starts by saving the value of ESP in EBP so as to be able to use EBP as a base pointer to find its parameters on the stack. However, the caller was probably doing this too, so part of the calling convention states that EBP must be preserved by any C function. Hence the callee, if it is going to set up EBP as a frame pointer, must push the previous value first.

*	Update (1-21-2001): GCC-based compilers also expect EBX, EDI and ESI to be preserved by any function.

The callee may then access its parameters relative to EBP . The doubleword at [EBP] holds the previous value of EBP as it was pushed; the next doubleword, at [EBP+4] , holds the return address, pushed implicitly by CALL . The parameters start after that, at [EBP+8] . The leftmost parameter of the function, since it was pushed last, is accessible at this offset from EBP ; the others follow, at successively greater offsets. Thus, in a function such as printf() which takes a variable number of parameters, the pushing of the parameters in reverse order means that the function knows where to find its first parameter, which tells it the number and type of the remaining ones.

*	The callee may also wish to decrease ESP further, so as to allocate space on the stack for local variables, which will then be accessible at negative offsets from EBP .

*	The callee, if it wishes to return a value to the caller, should leave the value in AL , AX or EAX depending on the size of the value. Floating-point results are typically returned in ST0 .

*	Once the callee has finished processing, it restores ESP from EBP if it had allocated local stack space, then pops the previous value of EBP , and returns via RET (equivalently, RETN ).

When the caller regains control from the callee, the function parameters are still on the stack, so it typically adds an immediate constant to ESP to remove them (instead of executing a number of slow POP instructions). Thus, if a function is accidentally called with the wrong number of parameters due to a prototype mismatch, the stack will still be returned to a sensible state since the caller, which knows how many parameters it pushed, does the removing.

Thus, you would define a function in C style in the following way:

global _myfunc
_myfunc: push ebp
mov ebp,esp
sub esp,0x40 ; 64 bytes of local stack space
mov eax,[ebp+8] ; first parameter to function (eax need not be preserved)
; some more code
leave ; mov esp,ebp / pop ebp
ret

At the other end of the process, to call a C function from your assembly code, you would do something like this:

extern _printf
; and then, further down...
push dword [myint] ; one of my integer variables
push dword mystring ; pointer into my data segment
call _printf
add esp,byte 8 ; `byte' saves space
; then those data items...
segment _DATA
myint dd 1234
mystring db 'This number -> %d <- should be 1234',10,0

This piece of code is the assembly equivalent of the C code

int myint = 1234;
printf("This number -> %d <- should be 1234\n", myint);
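
The stack layout described in this section can be checked with a toy model. The sketch below does not execute any real calls: it simulates a 32-bit stack in an array, with the caller pushing arguments right to left, CALL pushing a return address, and the callee running push ebp / mov ebp,esp. All names (toy_stack, toy_call and so on) are invented for the illustration.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* A toy 32-bit stack that grows downward; esp and ebp are byte offsets. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
    uint32_t ebp;
};

static void toy_init(struct toy_stack *s)
{
    s->esp = STACK_WORDS * 4;   /* empty stack: esp just past the top */
    s->ebp = 0;                 /* pretend the caller's EBP was 0 */
}

static void toy_push(struct toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

static uint32_t toy_load(const struct toy_stack *s, uint32_t addr)
{
    return s->mem[addr / 4];
}

/* Caller side plus callee prologue. */
static void toy_call(struct toy_stack *s, const uint32_t *args, int nargs,
                     uint32_t return_address)
{
    for (int i = nargs - 1; i >= 0; i--)
        toy_push(s, args[i]);       /* parameters, right to left */
    toy_push(s, return_address);    /* pushed implicitly by CALL */
    toy_push(s, s->ebp);            /* push ebp */
    s->ebp = s->esp;                /* mov ebp,esp */
}

/* Parameter number `index` (0 = leftmost) lives at [EBP + 8 + 4*index]. */
static uint32_t toy_param(const struct toy_stack *s, int index)
{
    return toy_load(s, s->ebp + 8 + 4 * (uint32_t)index);
}
```

After toy_call, the saved EBP sits at [EBP], the return address at [EBP+4], and the leftmost parameter at [EBP+8], as in the description above.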

Accessing Data Items
To get at the contents of C variables, or to declare variables which C can access, you need only declare the names as GLOBAL or EXTERN . (Again, the names require leading underscores.) Thus, a C variable declared as int i can be accessed from assembler as

extern _i
mov eax,[_i]

And to declare your own integer variable which C programs can access as extern int j , you do this (making sure you are assembling in the _DATA segment, if necessary):

global _j
_j dd 0

To access a C array, you need to know the size of the components of the array. For example, int variables are four bytes long, so if a C program declares an array as int a[10], you can access a[3] by coding mov eax,[_a+12]. (The byte offset 12 is obtained by multiplying the desired array index, 3, by the size of the array element, 4.) The sizes of the C base types in 32-bit compilers are: 1 for char, 2 for short, 4 for int, long and float, and 8 for double. Pointers, being 32-bit addresses, are also 4 bytes long.
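
The offset arithmetic can be cross-checked from the C side. In this sketch the helper names are invented, and int32_t is used in place of int so that the 4-byte element size holds regardless of the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of elem_size-byte elements. */
static size_t element_byte_offset(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* Compare the formula with the layout the compiler actually uses. */
static int offset_matches_layout(void)
{
    int32_t a[10];
    size_t actual = (size_t)((char *)&a[3] - (char *)a);

    return actual == element_byte_offset(3, sizeof a[0]);
}
```
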
To access a C data structure, you need to know the offset from the base of the structure to the field you are interested in. You can either do this by converting the C structure definition into a NASM structure definition (using STRUC ), or by calculating the one offset and using just that.
To do either of these, you should read your C compiler's manual to find out how it organises data structures. NASM gives no special alignment to structure members in its own STRUC macro, so you have to specify alignment yourself if the C compiler generates it. Typically, you might find that a structure like

struct {
char c;
int i;
} foo;

might be eight bytes long rather than five, since the int field would be aligned to a four-byte boundary. However, this sort of feature is sometimes a configurable option in the C compiler, either using command-line options or #pragma lines, so you have to find out how your own compiler does it.
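
Rather than guessing, the offsets can be read from the compiler with offsetof. The sketch below compares the default layout with a packed one; __attribute__((packed)) is a GCC/Clang extension, so this assumes one of those compilers, and the struct and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding after c. */
struct foo_natural {
    char c;
    int32_t i;
};

/* Packed layout: no padding, so i follows c immediately. */
struct foo_packed {
    char c;
    int32_t i;
} __attribute__((packed));

static size_t natural_i_offset(void)
{
    return offsetof(struct foo_natural, i);
}

static size_t packed_i_offset(void)
{
    return offsetof(struct foo_packed, i);
}
```

With GCC on x86 the natural offset is typically 4 (total size 8), while the packed offset is 1 (total size 5). The value the assembly code must use is whichever one the C compiler reports for the struct actually shared with it.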

Helper Macros for the 32-bit C Interface
If you find the underscores inconvenient, you can define macros to replace the GLOBAL and EXTERN directives as follows:

%macro cglobal 1
global _%1
%define %1 _%1
%endmacro

%macro cextern 1
extern _%1
%define %1 _%1
%endmacro

(These forms of the macros only take one argument at a time; a %rep construct could solve this.)
If you then declare an external like this:

cextern printf

then the macro will expand it as

extern _printf
%define printf _printf

Thereafter, you can reference printf as if it was a symbol, and the preprocessor will put the leading underscore on where necessary.
The cglobal macro works similarly. You must use cglobal before defining the symbol in question, but you would have had to do that anyway if you used GLOBAL .
Included in the NASM archives, in the misc directory, is a file c32.mac of macros. It defines three macros: proc , arg and endproc . These are intended to be used for C-style procedure definitions, and they automate a lot of the work involved in keeping track of the calling convention.
An example of an assembly function using the macro set is given here:

proc _proc32
%$i arg
%$j arg
mov eax,[ebp + %$i]
mov ecx,[ebp + %$j] ; ecx rather than ebx, which GCC expects to be preserved
add eax,[ecx]
endproc

This defines _proc32 to be a procedure taking two arguments, the first (i) an integer and the second (j) a pointer to an integer. It returns i + *j.
Note that the arg macro has an EQU as the first line of its expansion, and since the label before the macro call gets prepended to the first line of the expanded macro, the EQU works, defining %$i to be an offset from EBP. A context-local variable is used, local to the context pushed by the proc macro and popped by the endproc macro, so that the same argument name can be used in later procedures. Of course, you don't have to do that.
arg can take an optional parameter, giving the size of the argument. If no size is given, 4 is assumed, since it is likely that many function parameters will be of type int or pointers.
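
For comparison, the same procedure written in C would look roughly like this. The name proc32_in_c is hypothetical and chosen so that it does not clash with the assembly symbol.

```c
/* C equivalent of the _proc32 example: takes an integer i and a
   pointer to an integer j, and returns i + *j. */
int proc32_in_c(int i, int *j)
{
    return i + *j;
}
```
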

Our first mixed kernel
Create a text file that contains the following code:

extern void sayhi(void);
extern void quit(void);
int main(void)
{
sayhi();
quit();
}

Save as "mix_c.c" .
Create another text file that contains the following code:

[BITS 32]
GLOBAL _sayhi
GLOBAL _quit
SECTION .text
_sayhi: mov byte [es:0xb8f9c],'H'
mov byte [es:0xb8f9e],'i'
ret
_quit: mov esp,ebp
pop ebp
retf

Save as "mix_asm.asm" .
From the same location as the files enter the commands

gcc -ffreestanding -c -o mix_c.o mix_c.c
nasm -f coff -o mix_asm.o mix_asm.asm

If you received no errors then enter the command

ld -Ttext 0x100000 --oformat binary -o kernel32.bin mix_c.o mix_asm.o

You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000" . Copy the new kernel32.bin over to the same location as the old kernel32.bin . Try loading your new kernel just like the old one. When you run the loader your system should display "Hi" in the bottom right corner of your screen and you should be returned to the prompt.

Additional Information
When linking your object files, your code will appear inside the output file in the order of the input files. Also, when you use constants in your C code, such as myfunc("Hello");, gcc-based compilers will put the constants in the code segment before the beginning of the function in which they are declared. This is a problem when jumping to or calling binary-output C code, because the first bytes of the file may then be data rather than the first instruction. You have three options to avoid it. You can create a function at the beginning of your C code, without constants, that calls or jumps to the next function. You can link another file (assembly or C) before your C code that is just there to call your C code. Your last option is to use the gcc option -fwritable-strings to move your constants into the data segment (this option exists only in older gcc releases; it was removed in GCC 4.0).
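
A minimal sketch of the first option might look like the following. It assumes, as the text does, that the compiler emits functions in source order, which gcc does not guarantee at every optimization level. The names kernel_entry, kernel_main and fake_screen are invented; fake_screen stands in for video memory so the idea can be tried in an ordinary program.

```c
/* Forward declaration, so the entry function can come first in the file. */
static void kernel_main(void);

/* Stand-in for video memory (assumption, for illustration only). */
static char fake_screen[8];

/* First function in the file: it uses no string constants, so
   nothing should be placed in front of its first instruction. */
void kernel_entry(void)
{
    kernel_main();
}

/* The real work, including any string constants, goes after it. */
static void kernel_main(void)
{
    const char *msg = "Hello";
    unsigned int i;

    for (i = 0; msg[i] != '\0'; i++)
        fake_screen[i] = msg[i];
    fake_screen[i] = '\0';
}
```
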
There is a problem with ld on Linux: the ld that comes with Linux distros lists support for the coff object format, but apparently you have to rebuild binutils from http://www.gnu.org to get it working. I found two possible solutions. Either recompile ld, or edit your assembly files and remove all the leading underscores, then assemble with nasm using the -f aout option instead of coff. I've tested the second method briefly and it works.

About The Loader
The loader in this lesson makes a small GDT with selectors for the first 4 megabytes of memory and puts them in the segment registers before calling the kernel. It also leaves all interrupts disabled while the kernel runs. Don't try to enable interrupts in your kernel with this loader, because a protected mode IDT is never set up. Different lessons will be using different loaders, so don't assume that you don't need to download the loader for whatever lesson you're on. If you want to take a look, the source for the loader is here.
