'22-1학기' 카테고리의 글 목록

22-1학기

System SW

2022. 4. 21. 19:01

1. Introduction

- System SW 소개

System SW vs Application, History of OS

2. Unix overview

- Unix and Linux : 특징과 장단점, 구성

- Kernel & User mode, logging in, Shell, File system, Program & Process, Error handling, User identification, Signals, Time values

3. Files and directories

- File system = file data + file attribute

- 파일 관련 함수

4. Time and date

- 시간 관련 함수

5. Process control

- Process란, process 상태, identifiers, process termination

- fork, vfork, exit, wait, waitpid, exec

3. Files and directories

overview

< File system = file data + file attribute >

file attribute : Type, permission, size, time, user/group ID

파일 정보

관련 함수 : stat() & fstat() & lstat()

- 파일 정보 get -> return 0 : OK / -1 : error

#include <sys/stat.h>
- int stat(const char *pathname, struct stat *buf) : file에 대한 information
- int fstat(int filedes, struct stat *buf) : 열려 있는 file information
- int lstat(const char *pathname, struct stat *buf) : symbolic link에 대한 자체적 정보, pathname이 sym아니면 stat과 똑같음

관련 구조체

struct stat {
	mode_t st_mode; 	/* file type & mode (permissions) */
	inode
	device num
 	links number
  	uid_t st_uid;
  	gid_t st_gid;
  	st_size
  	time_t st_atime;	/* time of last access */
  	time_t st_mtime;	/* time of last modification */
  	time_t st_ctime;	/* time of last file status change */
}

관련 변수

st_mode

4bit / 3bit / 9bit

file type : socket, symbolic link, regular, block special, directory, character special, FIFO
- Regular file : 가장 흔함, data 포함, UNIX에서는 data가 text인지 binary인지 구분 x (applications에서 interpret)
- Directory : 다른 파일의 이름과 해당 파일의 정보에 대한 포인터 포함
- Block special file : 고정된 크기의 buffer로 I/O acess (E.g. disk)
- Character special file : 가변 크기 + unbuffered I/O access (E.g. keyboard, mouse)
- FIFO : process간 통신에 사용 ( = pipe )
- Socket : process간 network 통신에 사용
- Symbolic link : 다른 file을 가리킴

special bits

permission bits

+ file type macro

ex) S_ISREG(buf.st_mode) -> 참이면 1 아니면 0

파일 권한

관련 정보 :

real ID, effective ID
- real user(group) ID : 실제 사용자
- effective user(group) ID : file access permission 결정
- 보통 이 둘은 같지만 setuid(setgid) bit가 set이 되면 달라질 수 있음 -> rws
- 새로운 파일의 user ID는 process의 effective ID로 설정
File access permissions
- 10bit : 1bit -> -, d, b, c ... , 3bit -> user, 3bit -> group, 3bit -> others

관련 함수 : access(), umask(), chmod()&fchmod()

- 권한 확인 ( return 0 : OK / -1 : error )

#include <unistd.h>
- int access(const char *pathname, int mode) : mode (R_OK, W_OK, X_OF, F_OK)

- 권한 mask ( return previous file mode creation mask )

#include <sys/stat.h>
- mode_t umask(mode_t cmask) : 파일 또는 디렉토리가 생성될 때에 불필요하게 많은 권한을 갖지 않도록 통제하는 함수, umask(022) 이런 식으로 씀

- 권한 변경 ( return 0 : OK / -1 : error )

#include <sys/stat.h>
- int chmod(const char *pathname, mode_t mode) : 특정 파일 권한 변경
- int fchmod(int filedes, mode_t mode) : 열려있는 파일 권한 변경
- process effective user ID == file owner ID 또는 superuser여야 권한 변경이 가능
- &~ 는 이 권한 제외, | 는 이 권한 포함. 원래 있던 권한에 추가하고 싶으면 statbuf.st_mode &~ S_IWGRP

- user ID / group ID 변경 ( return 0 : OK / -1 : error )

#include <unistd.h>
- int chown(const char *pathname, uid_t owner, gid_t group) : 특정 파일의 user/group ID 변경
- int fchown(int filedes, uid_t owner, gid_t group) : 열린 파일의 user/group ID 변경
- int lchown(const char *pathname, uid_t owner, gid_t group) : symbolic link 자체의 정보 변경
- superuser 만이 file의 owner를 바꿀 수 있다.

파일 링킹 / 복사 / 링크 제거 ..

관련 함수 : link(), unlink(), symlink(), readlink(), remove()

- 파일 링크 생성 / 제거 (hard link / symlink) ( return 0 : OK / -1 : error )

#include <unistd.h>
- int link(const char *existingpath, const char *newpath)
  - 존재하는 파일에 hard link 생성
  - creates a new directory entry, newpath that references the existing file existingpath
  - link count 증가
  - 같은 file system 내부에서만 링크 형성 가능
- int unlink(const char *pathname)
  - Removes the directory entry
  - Decrements the link count -> count가 0이 되면 file content is deleted
  - 다른 프로세스가 열고 있으면 content는 제거 안됨
- int symlink(const char *actualpath, const char *sympath)
  - create a new directory entry, sympath that points to actualpath
  - file/directory 실제로 없어도 생성 가능
  - 다른 file system이어도 생성 가능
  - Dangling link : 없는 파일에 생성 / 원본 파일 없어져도 point
- int remove(const char *pathname)
  - file은 unlink, directory는 rmdir

- symbolic link 읽기 ( return number of bytes read : OK / -1 : error )

#include <unistd.h>
- ssize_t readlink(const char *pathname, char *buf, size_t bufsize)
  - symbolic link의 origianl 파일명을 buf에 저장 (buf size가 충분해야 함)
  - 읽어들인 buffer 수 return, error 시 -1 return ( symbolic link 아니면 error )

관련 정보

- cp vs ln vs ln -s

hard link : inode를 가리킴, 같은 file system 내에 있어야 함, superuser만 directory에 hardlink 만들 수 있음
symbolic link : 다른 file system이어도 OK, 누구든 directory에 symbolic link 만들 수 있음, dangling link

파일 이름 / 위치 변경

관련 함수 : rename()

- 파일 이름/위치 변경 ( return 0 : OK / -1 : error )

#include <stdio.h>
- int rename(const char *oldname, const char *newname)
  - command : cp -> move 시 ( mv filename path ) , rename 시 ( mv existfilename newfilename )
  - oldname과 newname이 같은 file system에 존재해야 함

파일 시간 변경

관련 정보 :

time 종류
- st_atime : last-access time of file data
- st_mtime : last-modification time of file data
- st_ctime : last-change time of i-node status

struct utimbuf

struct utimbuf{
	time_t actime;
	time_t modtime;
}

관련 함수 : utime()

- 파일 시간 변경 ( return 0 : OK / -1 : error )

#include <utime.h>
- int utime(const char *pathname, const struct utimbuf *times)
  - times == NULL 이면 current time으로 설정
  - st_ctime은 utime이 호출되면 자동적으로 update (현재 시각으로)

directory 생성 / 제거

관련 함수 : mkdir(), rmdir()

- directory 생성 ( return 0 : OK / -1 : error )

#include <sys/stat.h>
- int mkdir(const char *pathname, mode_t mode)
  - create a new empty directory
  - dot & dot-dot은 자동 생성

- directory 제거 ( return 0 : OK / -1 : error )

#include <unistd.h>
- int rmdir(const char *pathname)
  - Delete an empty directory

directory read

관련 정보

Write permission bits for a directory
- Means that we can create/remove files in the directory
- Does not mean that we can write to the directory itself.
- We need some APIs that can deal with directory itself.
DIR
- represents a directory stream
- <dirent.h>
struct dirent

struct dirent{
	ino_t d_ino;			/* i-node number */
   	char d_name[NAME_MAX+1];	/* null-terminated filename */
}

관련 함수 : opendir(), closedir(), readdir(), rewinddir()

- directory 열/닫기 ( return pointer : OK / NULL : error ) ( return 0 : OK / -1 : error )

#include <dirent.h>
- DIR *opendir(const char *pathname)
- int closedir(DIR *dp)

- directory entry 읽기 ( return pointer : OK / NULL : end of directory or error)

#include <dirent.h>
- struct dirent *readdir(DIR *dp)
  - directory entry를 읽어와서 dirent struct 반환
  - 순서
    1. 한 번 call하면 첫번째 directory entry를 읽음
    2. 1번 과정이 끝난 후 directory pointer가 다음 entry로 넘어감
    3. directory의 끝이라면 NULL return
- void rewinddir(DIR *dp)
  - directory pointer를 처음으로 돌림

working directory

관련 함수 : chdir()&fchdir(), getcwd()

- 현재 working directory 변경 ( return 0 : OK / -1 : error )

#include <unistd.h>
- int chdir(const char *pathname)
- int fchdir(int filedes)
- .c 파일로 실행시키면 shell의 current working directory는 변경 x
  - shell은 기본적으로 각각의 program을 서로 독립적인 process로 실행하기 때문
  - shell의 current working process를 변경하고 싶다면 built-in command인 "cd" 사용

- 현재 working directory 확인 ( return buf : OK / NULL : error )

#include <unistd.h>
- char *getcwd(char *buf, size_t size)
  - size 길이의 buf에 경로 저장 ( size가 경로 + NULL을 수용 가능한 크기여야 함 )
  - 절대 경로 반환

4. Time and date

summary

time(), gettimeofday()

time_t time(time_t *calptr) -> return value of time : OK, -1 : error
- 1970.1.1 00:00 기준으로 seconds return
int gettimeofday(struct timeval *tp, void *tzp) -> return 0 always
- time()보다 높은 resolution (microsecond)
- 두번째 인자인 tzp는 timezone set인데 현재는 안쓰이므로 NULL로 설정
- struct timeval { time_t tv_sec; long tv_usec; }

gmtime(), localtime()

tm structure
- tm_sec은 윤초 때문에 59가 아님 (윤초 : 지구 자전 때문에 시간 보정을 위해 사용)
- tm_mon은 0이 1월, tm_wday도 0이 일요일
- tm_year은 현재 - 1900 ex) 2022 = 122 로 출력됨

struct tm *gmtime(const time_t *calptr) -> return pointer to broken-down time
- UTC 기준 calender time
struct tm *localtime(const time_t *calptr) -> return pointer to broken-down time
- calender time 을 local time으로 바꿔줌

mktime()

time_t mktime(struct tm *tmptr) -> return calendar time : OK / -1 : error
- broken-down time -> time_t

asctime(), ctime()

char *asctime(const struct tm *tmptr) -> return pointer to null-terminated string
char *ctime(const time_t *calptr) -> return pointer to null-terminated string
두 함수 모두26-byte string으로 표현 (data(1) command와 비슷함)

strftime()

size_t strftime(char *buf, size_t maxsize, const char *format, const struct tm *tmptr) -> return number of characters stored in array if room, 0 otherwise

%A, %B, %c %C

5. Process control

process

process
- a program in execution
- program image(text, data, stack, heap) + environment(kernel data structure, address space, open files...)
  - text : 작성된 코드 (기계어 포함)
  - data : 전역 변수, 정적 변수, 배열, 구조체 .. ( 초기화 O )
  - bss : 초기화 x 데이터
  - ------------ in disk -----------------
  - heap : 동적 할당
  - stack : 지역 변수, 매개 변수, 복귀 번지 ..
  - PCB

process state

process identifiers ( PID )
- every process has a unique process ID -> process 종료 시 재사용 가능
- process 0 (scheduler process or swapper) : part of kernel, system process
- process 1 (init process)
  - invoked at the end of the bootstrap procedure
  - initialize the UNIX system with /etc/rc* or /etc/inittab -> 읽고 시스템을 다중 사용자 형태로
  - 죽지 않음
  - speruser 권한을 가지고 실행됨

process 생성 - fork(), vfork()

pid_t fork(void)
- Create a child process
- 한 번 call하는데 두 번 return
- parent와 child는 fork()를 call한 지점에서 실행 지속
- child copy : parent's data, heap, stack -> 처음엔 공유하다가 변경 시 분리 | share : text, open files

fork 실패 이유 : system 내부에 이미 너무 많은 process, real user ID에 할당된 process 개수 초과
fork 사용
- A process wants to duplicate itself : 동시에 다른 section 수행, network server에 자주 쓰임
- A process wants to execute a different program : child는 fork()에서 return 하자마자 exec(), shell에서 자주 씀

pid_t vfork(void)
- fork()의 2번째 사용 목적에 알맞음
- 부모 process의 address space를 공유함 -> race condition이 발생하지 않도록 부모는 block
- child runs first 를 보장함

process 종료 - exit()

void exit(int status), void _Exit(int status), void _exit(int status)
- normal program termination & value of status is returned to the parent
- exit vs _Exit, _ exit
  - _ 붙은 함수는 즉시 kernel로 return
  - exit()는 cleanup processing(open streams are flushed and closed)을 거친 후 kernel로 return
process termination (둘 다 kernel에서는 같은 동작 수행)
1. normal termination
  - return from main, calling exit & _Exit & _exit
  - termination status : exit status & return value가 termination status로 바뀜
2. abnormal termination
  - calling abort, receipt of a signal
  - termination status : kernel이 비정상 종료의 이유를 나타내는 termination status를 만듦
3. termination status 존재 이유는 부모에게 어떻게 종료되었는지를 알리기 위해서
4. zombie state
  - kernel이 정보를 유지하고 있는 상태. 단, PID & termination status, CPU usage time 만 유지
  - parent가 먼저 종료하면 init process가 parent가 돼서 wait()후 termination status를 처리해줌

process 대기 - wait(), waitpid(), MACRO

wait() & waitpid()를 호출하면 children이 running 중일 때 block, child 없으면 error
pid_t wait(int *statloc) -> return child process ID : OK / -1 : error
- 이미 종료해서 좀비 상태인 child가 있으면 child의 상태와 함께 return
- 아직 종료 안됐으면 block
- 여러 child가 있으면 아무거나 하나 기다림
- staloc에다 child의 termination status를 저장함
pid_t waitpid(pid_t pid, int *statloc, int options) -> return child process ID : OK / 0 or -1 : error
- pid argument
  - pid == -1 : 아무나
  - pid > 0 : pid 일치하는 process
  - pid == 0 : 같은 group process
  - pid < 0 : wait child whose process group ID equals the absolute value of pid
- option argument
  - WNOHANG : 원하는 pid가 종료하지 않았다면 not block
  - WUNTRACES : 원하는 pid가 정지되어 있으면 not block
MACRO

program 실행 - exec() ( return -1 : error / no return : success )

program 실행
- process가 call하면 process는 new program으로 완전히 대체
  - (text, data) 만 바꿈
  - pid는 변하지 않음
- new program main() 실행
- 인자 : pathname/filename, list/vector, environ/envp[]
  - pathname(-) / filename(p)
    - filename은 PATH가 미리 설정되어 있어야함
  - arg list(l) / arg vector(v)
    - 둘 다 맨 뒤에 NULL pointer 필수
  - environ(-) / envp[](e)
    - envp[] 맨 뒤에 NULL pointer((char*) 0 or NULL) 필수
- execve만 system call

Proxy server 1

Network programming : involves writing programs that communicate with other programs across a computer network
Unix/Linux 제공 low-level network services : TCP/IP,UDP
API : Socket, XTI
client&server exaples : Web client(browser) & server, FTP client & server, telnet client & server
Proxy server
- 인터넷의 캐시 프로그램으로 최근에 가져온 자료들을 보관하는 저장장소를 의미
- 같은 자료를 다시 요구할 경우 Proxy 내의 캐시에 있는 자료를 제공함으로써 자료가 빠르게 전송
- 사용자는 Proxy를 사용해서 자료를 받아 오겠다는 사실을 호스트에 알려주어야 proxy 사용 가능
- 더 느려질 수도 있음

1. Cache implement

Cache의 기능 : Original server의 데이터를 저장함으로써 network delay 감소, HIT/MISS 판별
Hash function
- 임의의 길이의 데이터를 고정된 길이의 데이터로 매핑하는 함수
- 종류 : MD5, SHA, CRC32
Multiple processing
- on-demand manner (client가 요청하는 경우만 server 생성)
- main server는 child와 client를 이어주는 역할
- child가 종료되면 parent는 신호를 받음 -> signal handling

2. Proxy server implementation

단계
1. Simple server/client implementation -> socket 통신
2. HTTP request/response handling
  - protocol : 통신을 원하는 개체간에 어떻게 통신할 것인가를 약속한 규약
  - protocol stack : 특정 machine에서 사용하는 protocol list
  - HTTP request format -> ppt
  - HTTP response format -> ppt
  - Proxy
3. Signal handling
4. Proxy server implementation

Chapter 1

2022. 4. 12. 22:27

chapter 1

주요 주제

인터넷이란?

네트워크 엣지

네트워크 코어

delay, loss, throughput in networks

protocol layers, service models

보안

역사

1.1 What is the Internet?

인터넷이란?

구성 요소 3가지 : computing devices, communication links, packet switches

computing devices :

host = end system
running network apps

communication links

(유선) fiber, copper,/ (무선) radio, satellite
- satellite와 radio의 차이 : satelite는 단말과 바로 통신하면 단말의 파워가 아주 세야함, 가로막는 게 없으므로 고주파 (직진성 강한 주파수) 사용 가능
transmission rate : bandwidth

packet switches : routers and switches

프로토콜이란?

프로토콜은 컴퓨터 내부에서, 또는 컴퓨터 사이에서 데이터의 교환 방식을 정의하는 규칙 체계입니다.

1.2 network edge : end systems, access networks, links

network 구성 요소 : network edge, access networks & physical media, network core

Access network

end system이 edge router에 연결하기 위해선? 개인/기관/기지국 망 이용 -> bandwidth 및 share 여부 고려해야함

1. digital subscrber line (DSL) : 기존의 전화망 사용

컴퓨터 -- DSL modem -- 기존의 전화망 (집)///(야외) ------------- DSLAM ---- 기존 전화 / 인터넷 트래픽 분배
- DSLAM 부분에서 다른 집들의 전화선과 합쳐짐
특징 : downstream transmission rate가 더 높음

2. cable network

컴퓨터 -- cable modem (집)//-------(다른 집들과 공유)---------- CMTS(cable modem termination system) ---- ISP
frequency division multimplexing : 대역폭이 넓어서 선 하나를 나눠씀

3. Home network

집 내부에 switch가 있어서 다른 end system들과 연결됨.
switch --- cable or DSL modem // --- to/from headend or central office

Ethernet (Enterprise access networks)

회사나 대학 등에서 자주 쓰임

Wireless access networks

shared wireless access network connects end system to router
- via base station aka "access point"
종류
- wireless LANs : 집 wifi 개념
- wide-area wireless access : 공공 wifi 개념

Host : sends packets of data

packet transmission delay = 패킷의 길이 / 링크의 속도(도로의 폭) = L/R (sec)
- 패킷이 나오는데 걸리는 시간

Physical media

용어 : bit(운반 데이터), physical link(운반해주는 선), guided media(정해진 경로->유선),

unguided media(퍼져나가는 형태->무선), twisted pair(TP) (유선)

coax, fiber
- coax cable : 광섬유 사용 전 많이 씀, R이 큼
- fiber optic cable
  - R값이 매우 크고 error가 거의 없음
  - 가격 저렴하지만 광신호 -> 전기 신호로 바꾸는 비용이 비쌈
  - 전반사가 일어나는 범위 내에서 여러 파장을 이용함
radio
- 무선
- type : terrestrial microwave ( LAN -> 공유기 필요, wide-area -> 기지국 필요), satellite(-> propagation delay)
- 다양한 대역폭 존재
  - 주파수가 너무 높으면(적외선~자외선) : 몸에 해로움 & 직진성이 높아서 장애물이 있으면 무용지물
  - 주파수가 너무 낮으면 : 파장이 길어서 안테나도 길어져야 함 => 불편
    - 안테나의 길이 : $ d = {\lambda \over 2} $

1.3 network core : packet switching, circuit switching, network structure

Packet-switching

데이터를 패킷 단위로 쪼개서 전송하는 방식

: store-and-forward

다음 링크로 전송하기 전에 저장을 한 뒤 전달하는 방식
end-end delay : $ N x {L \over R} $

: queueing delay, loss

queueing delay : store-forward에서 store 했다가 뽑아내는 시간
loss : 대기 패킷 버퍼가 꽉 차서 store 불가 => buffer overflow 나서 정보 유실

Two key network-core functions : routing & forwarding

routing (머리쓰기) : 목적지까지 어떤 루트를 사용할지 -> 경로를 찾고 만드는 부분
forwarding (몸쓰기) : 찾은 경로를 따라 가는 것
routing algorithm : routing에서 table을 생성하고 forwarding에서 table을 읽고 움직임
- 다음 지점은 다음 지점에 가봐야지 확인 가능 ( 해당 지점마다 header value에 따른 output link 있음 )

Alternative core : circuit switching ( -> packet switching의 대안 방법 )

하나의 회선을 할당받아 데이터를 주고받는 방식
독점적이기 때문에 성능이 좋지만 비용이 비싸고 여러 명이 쓰기 힘듦
방식 : FDM, TDM
- FDM (frequency division multitasking) : 할당된 대역폭을 나누어 사용하는 방식
- TDM (time division multitasking) : 할당된 대역폭을 시간단위로 나누어 사용하는 방식

Packet switching VS Circuit switching

http://gaia.cs.umass.edu/kurose_ross/interactive/ 문제 풀어보기

packet switching은 bursty data에 좋음

Internet structure : network of networks

End system들은 access ISPs 를 통해서 인터넷에 연결됨.

-> access ISP 들은 상호연결되어야 소통 가능 => $ {n(n+1) \over 2} $ 는 너무 많음 ( $O(n^2) $ connections )

-> solution : 중간에 switch를 놓으면 됨 ( $ O(n) $ connections ) ( global ISP )

solution ISP의 변화 >

ISP 여러 구역으로 쪼개짐
ISP 그룹 간 연결(IXP) 생성
network가 점점 부분부분 생김

1.4 delay, loss, throughput in networks

delay : transmission delay ( $ L \over R $ ), packets queuing delay

loss : packet arrival rate > packet capacity

Four sources of packet delay => 홉(node) 개수만큼 반복

>> $ d_{nodal} = d_{proc} + d_{queue} + d_{trans} + d_{prop} $

$ d_{proc} $ : nodal processing, node에 들어온 뒤 처리 시간
$ d_{queue} $ : queueing delay, queue에서 대기하는 시간
$ d_{trans} $ : transmission delay, $ L \over R $
$ d_{prop} $ : propagation delay, $ d \over s $ ( d=length of physical link, s=propagation speed )

Caravan analogy

propagation delay << transmission delay (propagation 거리가 가깝거나 link 속도(R)이 저속일 때)
- 맨 뒤의 bit는 transmission 중인데 첫번째 bit는 이미 도착해있는 현상 발생

"Real" Internet delays and routes

traceroute program : 패킷이 목적지까지 갈 때 거치는 라우터와 시간을 알아볼 수 있음
- TTL 값을 1로 설정하고 전송 -> 0 되면 돌아옴 -> 하나씩 늘리면서 반복함
- 라우터의 받는 IP와 보내는 IP가 다를 수 있으므로 실질적으론 보내는 IP를 파악하는 것
- 맨 마지막에는 라우터가 아니므로 UDP 포트를 아주 크게 설정해서 고의 에러로 파악

Packet loss

queue에서 발생 -> 도착하는 패킷 > 나가는 패킷 일 때 buffer overflow가 발생해서 패킷이 유실됨

Throughput

처리율
단위는 bits/time unit
instantaneous(순간)와 average(평균)
end-end througput은 작은 R에 따라 결정됨 (병목현상)

1.5 protocol layers, service models

layers

네트워크는 다음과 같은 복잡한 요소들로 이루어져 있음
- hosts, routers, links of various media, applications, protocols, hardware&software
복잡한 시스템을 다루기 위해서 분할을 함 ( but layer가 너무 많으면 오버헤드 )
Internet protocol stack : application(FTP, SMTP, HTTP), transport(TCP, UDP), network(IP, routing protocols), link(Ethernet, WiFi, PPP), physical(bits "on the wire")
- ISO/OSI 표준에는 presentation(encoding, 암호화), session(시간적인 개념)도 있지만 실제로는 안씀

Encapsulation

layer를 지날 때마다 header가 붙으면서 packaging 됨 => 오버헤드
- 실제로는 message가 길어지면 segmentation 해서 자름
목적지에 도착하면 decapsulation

1.6 networks under attack : security

network security

attack & defend
attack
- malware : virus, worm, spyware malware, 랜섬웨어, DDoS botnet
- DoS : hosts에게 바이러스를 심어두고 target에게 동시에 packet을 전송함->target(server) 고장
  - solution : 앞 단에 url 적은 가벼운 페이지로 방패막이
- packet "sniffing" : 패킷 염탐
- IP spoofing : 특정 필드의 값을 바꿈 -> 악용시에는 주로 보내는 사람 주소를 바꿔서 자기가 안 보낸 척 함

5. 검색 트리

2022. 4. 6. 15:05

이진 탐색 ( Binary Search )

이진 탐색이란?

정렬되어있는 배열에서 사용

처음과 끝 인덱스의 중간 지점보다 찾고자하는 값이

작으면 : 처음 = 중간, 끝은 그대로
크면 : 처음은 그대로, 끝 = 중간
같으면 : 반환

이진 탐색 vs 이진 탐색(검색) 트리

임의의 데이터들 -> 순서대로 정렬한 후 이진 탐색
데이터를 입력할 때부터 정렬 -> 이진 탐색(검색) 트리

코드

python 코드

####### Binary Search ######
import random
from timeit import default_timer as timer

def binary_search(x, v):
    start = 0
    end = len(x) - 1
    while start<=end:
        mid = (start+end) // 2
        if x[mid] == v : return mid
        elif x[mid] < v : start = mid + 1
        else: end = mid - 1
    return -1

x = random.sample(range(5000), 1000)
x.sort()
value = x[800]

start = timer()
index = binary_search (x, value)
print(timer() - start)

print('value ', value, 'found ', index)
print(True if index >=0 and x[index] == value else False)

c++ 코드

#include <iostream>
#include <algorithm>
using namespace std;

int A[10];

int binary_search(int target)
{
	int start = 0, end = 9;
	int mid;
	while (start <= end)
	{
		mid = (start + end) / 2;
		
		if (A[mid] == target) return mid;
		else if (A[mid] > target) end = mid - 1;
		else start = mid + 1;
	}
	return -1;
}

int main()
{
	srand(time(NULL));

	for (int i = 0; i < 10; i++)
	{
		A[i] = rand() % 100;
	}
	
	sort(A, A + 10);
	int index = binary_search(A[3]);

	cout << "A[3] = " << A[3] << endl;
	if (index > -1)
		cout << "index = " << index << ' ' << "A[index] = " << A[index] << endl;
	else
		cout << "error";

}

이진 검색 트리 ( Binary Search Tree )

이진 검색 트리란?

왼쪽 자식은 부모보다 작고 오른쪽 자식은 부모보다 큼

각 노드는 하나의 키 값을 가진다.
각 노드의 키 값은 모두 달라야 한다.
각 노드는 최대 두 개의 자식 노드를 갖는다.
각 노드의 키 값은 왼쪽의 모든 노드의 키 값보다 크고 오른쪽의 모든 노드의 키 값보다 작다.

Insert, Search, Delete

Insert

작으면 왼쪽 크면 오른쪽으로 넣음

Search

작으면 왼쪽 크면 오른쪽 찾음

Delete

1. 자식 노드가 0개인 경우 : 그냥 지움

2. 자식 노드가 1개인 경우 : 부모와 자식 이어줌

3. 자식 노드가 2개인 경우 :

오른쪽 자식에서 제일 작은 노드를 찾는다. ( 왼쪽 자식보다 큰 노드 중에서 가장 작은 노드 )
delete하고 위에서 찾은 노드를 위로 올린다.
빈 노드는 자식이 최대 하나이므로 위에서 설명한 것처럼 잇는다.

이진 검색 트리의 단점

데이터 입력 순서에 따라 성능이 달라진다.

python 코드

import random
from timeit import default_timer as timer

class Node(object):
    def __init__(self, key, parent):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent
        
def insert(node, key, parent):
    if node is None : 
        node = Node(key, parent)
    elif key < node.key : 
        node.left = insert(node.left, key, node)
    else : node.right = insert(node.right, key, node)
    return node

def search(node, key):
    if node is None or node.key == key: return node
    if key < node.key : return search(node.left, key)
    return search(node.right, key)

def delete_node(node):
    if node.left is None and node.right is None : 
        return None
    elif node.left is not None and node.right is None :
        return node.left
    elif node.left is None and node.right is not Node :
        return node.right
    else:
        s = node.right
        while s.left is not None :
            sparent = s
            s = s.left
        node.key = s.key
        if s == node.right : 
            s.right.parent = node
            node.right = s.right
        else : 
            s.right.parent = sparent
            sparent.left = s.right
    return node
            


def delete(node) : 
    if node.parent is None : node = None
    elif node is node.parent.left : node.parent.left = delete_node(node)
    else : node.parent.right = delete_node(node) 
        

x = [10,20,30]
value = 30

root = None
for i in x:
    root = insert(root, i, None)
    
start = timer()
found = search(root, value)
print(timer()-start)

if found is not None:
    print('value', value, 'found', found.key)
    print(True if found.key == value else False)
    
delete(search(root, value))
found = search(root, value)

AVL-트리

AVL-트리란?

가장 초기에 나온 균형 잡힌 이진 검색 트리
균형 이진 트리 : 각 노드의 왼쪽/오른쪽 서브 트리의 높이 차가 1 이하
- 높이 차가 2 이상이 될 경우 단일 회전 / 이중 회전

회전

- 단일 회전

자식 노드가 존재하는 부모 노드의 방향이 동일한 경우

- 이중 회전

자식 노드가 존재하는 부모 노드의 방향이 동일하지 않은 경우
- 회전을 통해 방향을 일치 시킨 후 단일 회전

스플레이 트리

스플레이 트리란?

탐색, 삽입, 삭제 시 스플레이를 하는 Tree
- Splay는 Splay tree의 기본이 되는 연산으로, 쿼리에 의해 접근한 노드를 트리 구조에 따라 특정 방식으로 rotate해서 루트까지 끌어올리는 연산 (https://blog.chodaeho.com/posts/2021/splay-tree-1/)
zig 회전 / zig-zig 회전 / zig-zag 회전

zig 회전

zig-zig 회전

zig-zag 회전

레드 블랙 트리 (Red Black Tree)

레드 블랙 트리란?

root는 블랙
완전 이진 트리 ( NIL 사용 )
모든 leaf는 블랙
레드의 자식은 블랙 ( 레드 연속 2개 불가 )
- 최소 높이는 모두 블랙인 경우
- 최대 높이는 레드-블랙 교대인 경우
root - leaf까지의 모든 경로는 같은 개수의 블랙
- 최악의 경우에도 최소 높이의 2배 이하
왼쪽 회전 / 오른쪽 회전

노드 삽입 (N이 새 노드 & 레드 노드)

case 1: 빈 트리

N을 root & 블랙으로

case 2: P가 블랙

그냥 삽입

case 3: P가 레드

case 3-1 : U가 레드

(만약 G가 root라면 3단계까지만 진행해도 됨)

case 3-2 : U가 블랙
- case 3-2-1 : P가 오른쪽 자식, N이 오른쪽 자식
- case 3-2-2 : P가 오른쪽 자식, N이 왼쪽 자식
- case 3-2-3 : P가 왼쪽 자식, N이 왼쪽 자식
- case 3-2-4 : P가 오른쪽 자식, N이 오른쪽 자식

'22-1학기 > 알고리즘' 카테고리의 다른 글

4. 선택 알고리즘 (0)	2022.04.06
3. 정렬 (0)	2022.03.22
2. 점화식과 알고리즘 복잡도 분석 (0)	2022.03.14
1. 알고리즘 표기법 (0)	2022.03.14

4. 선택 알고리즘

2022. 4. 6. 14:37

선택 알고리즘이란?

n개 중 i번째로 작은 원소를 찾는 알고리즘

방법 1

1. 퀵 정렬을 사용해서 pivot 기준으로 작으면 왼쪽, 크면 오른쪽으로 나눈다.

pivot의 위치가 i보다 크면 왼쪽 집단에서 다시 퀵 정렬을 수행
pivot의 위치가 i보다 작으면 오른쪽 집단에서 다시 퀵 정렬을 수행
- 이때, 오른쪽의 맨 앞의 원소를 0번째라고 하면 i-k째 원소를 찾아야함
pivot의 위치가 i와 같으면 pivot을 반환

이 방법의 경우 최악의 경우(pivot이 집단의 제일 작거나 큰 값으로 선택됐을 경우)에는 $ \Theta (n^2) $ 의 시간복잡도를 가진다.

방법2

1. 원소를 5개씩 그룹으로 만들고 각 그룹의 중간값을 찾는다. => $ \Theta (n) $

2. 각 그룹의 중간값의 중간값 M을 찾는다. => $ T ( \lceil{ n \over 5 }) $

3. M보다 큰 경우는 최소 $ {3 \over 5} x {n \over 2} - 2 = {3n \over 10} - 2 $

나머지 경우는 $ n - ({3n \over 10} - 2) = {7n \over 10} + 2 $

따라서 M보다 큰 값인지 작은 값인지 알 수 없는 경우는 최대 $ {7n \over 10} + 2 $ => $ T({7n \over 10} + 2)

==> 시간 복잡도

'22-1학기 > 알고리즘' 카테고리의 다른 글

5. 검색 트리 (0)	2022.04.06
3. 정렬 (0)	2022.03.22
2. 점화식과 알고리즘 복잡도 분석 (0)	2022.03.14
1. 알고리즘 표기법 (0)	2022.03.14

3. 정렬

2022. 3. 22. 18:00

정렬은 비교, 중복, 탐색을 위해서 필요함.

선택 정렬

시간 복잡도

$O(n^2)$

$ (n-1) + (n-2) + ... + 1 + cn = {n(n-1) \over 2} + cn = O(n^2) $

선택정렬이란?

가장 큰 수를 선택한 후

1. 1~n까지 가장 큰 수를 찾는다.

2. 가장 큰 수와 마지막 수의 위치를 교환한다.

3. 정렬된 수를 제외하고 정렬될 때까지 1~2를 반복한다.

c++ 코드

void selection_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	int largest_index;

	for (int i = SIZE - 1; i > -1; i--)
	{
		largest_index = i;
		for (int j = i; j > -1; j--)
		{
			if (arr[largest_index] < arr[j])
				largest_index = j;
		}
		int temp = arr[largest_index];
		arr[largest_index] = arr[i];
		arr[i] = temp;
	}
}

Python 코드

import random
from timeit import default_timer as timer

def selection_sort(x):
    for last in range(len(x)-1, 0, -1):
        largest = 0
        for i in range(1, last+1):
            if x[i] > x[largest]:
                largest = i
        x[largest], x[last] = x[last], x[largest]

def test(x):
    for i in range(1, len(x)):
        if x[i-1] > x[i]:
            return False
    return True


data = [1, 3, 5, 6, 8, 2, 4, 10, 7, 9] 
start = timer()
selection_sort(data)
print(timer() - start)
print(data)
print(test(data))

버블 정렬

시간 복잡도

Worst : $O(n^2)$

Best : $O(n)$

$ (n-1) + (n-2) + ... + 1 + cn = {n(n-1) \over 2} + cn = O(n^2) $

버블정렬이란?

1. 두 장씩 비교하면서 큰 것을 오른쪽으로

2. 맨 오른쪽이 제일 큼

3. 맨 오른쪽 빼고 정렬될 때까지 1~2 반복

c++ 코드

void bubble_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	bool flag = false; // stop option

	for (int i = SIZE - 1; i > -1; i--)
	{
		for (int j = 0; j < i; j++)
		{
			if (arr[j] > arr[j + 1])
			{
				flag = true;
				int temp = arr[j];
				arr[j] = arr[j + 1];
				arr[j + 1] = temp;
			}
		}
		// stop option
		if (!flag) break;
	}
}

python 코드

import random
from timeit import default_timer as timer


def bubble_sort(A):
    sorted = True # Termination conditions
    for last in range(len(A) - 1, 1, -1):
        for i in range(last):
            if A[i] > A[i + 1]:
                A[i], A[i + 1] = A[i + 1], A[i]
                sorted = False
        if sorted:
            break


def test(A):
    for i in range(1, len(A)):
        if A[i - 1] > A[i]:
            return False
    return True

x = random.sample(range(10000), 100)
start = timer()
bubble_sort(x)
print(timer() - start)
print(x)
print(test(x))

삽입 정렬

시간 복잡도

Worst : $O(n^2)$

Best : $O(n)$

$ 1 + 2 + ... + (n-1) = {n(n-1) \over 2} = O(n^2) $

삽입 정렬이란?

현재 위치 앞의 수들을 보고 현재 수가 들어갈 수 있는 곳을 찾아서 삽입

c++ 코드

void insertion_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	for (int i = 1; i < SIZE; i++)
	{
		int loc = i - 1; //0
		int now = arr[i]; //3
		while (loc >= 0 && now < arr[loc])
		{
			arr[loc + 1] = arr[loc];
			loc--;
		}
		arr[loc + 1] = now;

	}

	print(3, arr);
}

Python 코드

def insertion_sort(A):
    for i in range(1, len(A)):
        loc = i - 1
        new_item = A[i]
        while new_item < A[loc] and loc > -1:
            A[loc+1] = A[loc] # A[loc+1], A[loc] = A[loc], A[loc+1]
            loc-=1
        A[loc+1] = new_item

data = [1, 3, 5, 6, 8, 2, 4, 10, 7, 9] 
start = timer()
insertion_sort(data)
print(timer() - start)
print(data)
print(test(data))

병합 정렬

시간 복잡도

$O(nlogn)$

마스터 정리 유형 3으로 시간 복잡도를 계산할 수 있다.

* 마스터 정리 유형 3 근사 버전:

$ T(n) = aT({n \over b}) + f(n) $

$ h(n) = n^{log_ba}$ 라고 할 때, ${f(n) \over h(n)} = \Theta (1)$ 이면 $T(n) = \Theta (h(n)logn) $

따라서,

$ T(n) = 2T({n \over 2}) + c_1n+c_2 , a = 2, b = 2, f(n) = c_1n+c_2 \\ h(n) = n^{log_2 2} $

$ {c_1n+c_2 \over n^{log_2 2}} = \Theta (1) $ 이므로 $T(n) = \Theta (n^{log_2 2} logn) = \Theta (nlogn)$

병합 정렬이란?

https://gmlwjd9405.github.io/2018/05/08/algorithm-merge-sort.html

아래 코드에서 Divide 부분이 merge_sort_fuc이고

Combine 부분이 merge라고 생각하면 된다.

c++ 코드

void merge(int list[], int left, int mid, int right)
{
	int merge_arr[SIZE];
	int i, j, k;
	i = left;
	j = mid + 1;
	k = left;

	while (i <= mid && j <= right)
	{
		if (list[i] < list[j])
			merge_arr[k++] = list[i++];
		else
			merge_arr[k++] = list[j++];
	}

	while (i <= mid)
		merge_arr[k++] = list[i++];

	while (j <= right)
		merge_arr[k++] = list[j++];

	for (int i = left; i <= right; i++)
		list[i] = merge_arr[i];
}

void merge_sort_func(int list[], int left, int right)
{
	int mid;

	if (left < right)
	{
		mid = (left + right) / 2;
		merge_sort_func(list, left, mid);
		merge_sort_func(list, mid + 1, right);
		merge(list, left, mid, right);
	}
}

void merge_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	merge_sort_func(arr, 0, SIZE - 1);

	print(4, arr);
}

python 코드

import random
from timeit import default_timer as timer

def merge_sort(A, p, r):
    if p<r:
        q = (p+r)// 2
        merge_sort(A, p, q)
        merge_sort(A, q+1, r)
        merge(A, p, q, r)

def merge(A, p, q, r):
    i = p
    j = q+1
    t = 0
    tmp = A[:]
    while i<=q and j<=r:
        if A[i] <= A[j]:
            tmp[t] = A[i]
            i+=1
        else :
            tmp[t] = A[j]
            j+=1
        t+=1

    while i<=q:
        tmp[t] = A[i]
        t+=1
        i+=1

    while j<=r:
        tmp[t] = A[j]
        t+=1
        j+=1

    i = p
    t = 0

    while i<=r:
        A[i] = tmp[t]
        i+=1
        t+=1

def test(A):
    for i in range(1, len(A)):
        if A[i - 1] > A[i]:
            return False
    return True

x = random.sample(range(10000), 100)
start = timer()
merge_sort(x, 0, len(x)-1)
print(timer() - start)
print(x)
print(test(x))

퀵 정렬

시간 복잡도

$ O(nlogn) $

Worst

worst case는 한 개씩 쪼개지는 경우이다.

퀵 정렬의 경우 n개의 데이터가 있으면 피벗을 제외하고 n-1개의 데이터들과 비교하게 되므로

$ O(n^2) $

$ T(n) = T(n-1) + c_1n+c_2 $

Best

best case는 반씩 쪼개지는 경우이다.

$ T(n) = 2T({n \over 2}) + c_1n + c_2 $

마스터 정리 유형 3으로 시간 복잡도를 계산할 수 있다.

* 마스터 정리 유형 3 근사 버전:

$ T(n) = aT({n \over b}) + f(n) $

$ h(n) = n^{log_ba}$ 라고 할 때, ${f(n) \over h(n)} = \Theta (1)$ 이면 $T(n) = \Theta (h(n)logn) $

따라서,

$ T(n) = 2T({n \over 2}) + c_1n+c_2 , a = 2, b = 2, f(n) = c_1n+c_2 \\ h(n) = n^{log_2 2} $

$ {c_1n+c_2 \over n^{log_2 2}} = \Theta (1) $ 이므로 $T(n) = \Theta (n^{log_2 2} logn) = \Theta (nlogn)$

Average

$ T(n) = T(i-1) + T(n-i) + c_1n + c_2 $

$ T(n) = {1 \over n} \sum_{i=1}^n (T(i-1) + T(n-i)) + c_1n+ c_2 $

$ T(n) = {2 \over n} \sum_{k=0}^{n-1} T(k) + c_1n +c_2 $

$ T(n) = \Theta (nlogn) $

퀵 정렬이란?

pivot을 기준으로 pivot보다 작으면 왼쪽, 크면 오른쪽으로 정렬.

왼쪽, 오른쪽 배열에 대해서도 똑같이 진행

https://gmlwjd9405.github.io/2018/05/10/algorithm-quick-sort.html

c++ <algorithm>의 sort() 함수의 기반이 되는 알고리즘

c++ 코드

void quick_sort_func(int list[], int left, int right)
{
	if (left >= right) return;

	int pivot = left;
	int i = left + 1, j = right;

	while (i <= j)
	{
		while (i <= right && list[pivot] > list[i]) i++;
		while (j >= left && list[pivot] < list[j]) j--;

		if (i > j) {
			int temp = list[j];
			list[j] = list[pivot];
			list[pivot] = temp;
		}
		else {
			int temp = list[i];
			list[i] = list[j];
			list[j] = temp;
		}
	}


	quick_sort_func(list, left, j - 1);
	quick_sort_func(list, j + 1, right);

}

void quick_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	quick_sort_func(arr, 0, SIZE - 1);

	print(5, arr);
}

python 코드

import random
from timeit import default_timer as timer

def partition(A, p, r):
    x = A[r]
    i = p
    for j in range(p, r):
        if A[j] <= x:
            A[i], A[j] = A[j], A[i]
            i+=1

    A[i], A[r] = A[r], A[i]
    return i


def qsort(A, p, r):
    if p<r:
        q = partition(A,p,r)
        qsort(A, p, q-1)
        qsort(A, q+1, r)

def quick_sort(A):
    qsort(A, 0, len(A)-1)

def test(A):
    for i in range(1, len(A)):
        if A[i-1] > A[i]:
            return False
    return True

x = random.sample(range(10000), 100)
print(x)
start = timer()
quick_sort(x)
print(timer() - start)
print(x)
print(test(x))

힙 정렬

시간 복잡도

$ O(nlogn) $

힙 정렬이란?

최대 힙 / 최소 힙을 사용해서 정렬하는 것.

아래 코드에서는 최대 힙을 사용하였다.

최대 힙은 맨 위가 트리에서 가장 큰 원소이므로 최대 힙이 만들어지면 맨 위의 원소를 제외하고 또다시 최대 힙을 만드는 것을 반복하며 정렬한다.

c++ 코드

/*
* parent = (i-1)/2
* left = i*2 + 1
* right = i*2 + 2
*/

void make_heap(int n, int arr[])
{
	for (int i = (n - 2) / 2; i > -1; i--)
	{
		int now = i;
		int left = i * 2 + 1, right = i * 2 + 2;
		int child = left;
		if (arr[left] < arr[right] && right < n) child = right;

		while (arr[child] > arr[now] && child < n)
		{
			int temp = arr[now];
			arr[now] = arr[child];
			arr[child] = temp;

			now = child;
			left = now * 2 + 1, right = now * 2 + 2;
			if (arr[left] < arr[right] && right < n) child = right;

		}
	}
}

void heap_sort(int n)
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	make_heap(n, arr);
	
	for (int i = n - 1; i > -1; i--) {
		int temp = arr[i];
		arr[i] = arr[0];
		arr[0] = temp;
		make_heap(i, arr);
	}
	
	cout << "<heap_sort>" << endl;
	for (int i = 0; i < n; i++)
		cout << arr[i] << ' ';
}

기수 정렬 ( Radix sort )

시간 복잡도

$ O(n) $

n개의 원소를 k자릿수만큼 비교 => $ O(n) $을 $ k $번 반복 = $ O(n) $

기수 정렬이란?

입력이 모두 K 자릿수 이하의 자연수인 경우 자릿수 별로 안정성을 유지하며 정렬한다.

이때, 값이 같은 원소는 정렬 전후 순서가 바뀌지 않는다.

c++ 코드

int A[15] = { 123,2154,222,4,283,1560,1061,2150,3154,33,2234,345,677,8911,1 };

void rsort(int m, int n)
{
	queue<int> bucket[10];

	for (int i = 0; i < n; i++)
	{
		int index = A[i] / pow(10, m);
		index %= 10;
		bucket[index].push(A[i]);
	}

	int index = 0;

	for (int i = 0; i < 10; i++)
	{
		while (!bucket[i].empty())
		{
			A[index++] = bucket[i].front();
			bucket[i].pop();
		}
	}
}

void radix_sort()
{
	
	int k = 4;

	for (int i = 0; i < 4; i++)
		rsort(i, 15);

	cout << "<radix_sort>" << endl;
	for (int i = 0; i < 15; i++)
		cout << A[i] << ' ';
	cout << endl;
}

python 코드

import random
from timeit import default_timer as timer

def rsort(A, m):
    buckets = [[] for _ in range(10)]
    for v in A:
        index = v // (10 ** m)
        index %= 10
        buckets[index].append(v)

    res = []
    for bucket in buckets:
        res.extend(bucket)
    return res

def radix_sort(A, k):
    for i in range(0, k):
        A = rsort(A, i)
    return A

def test(A):
    for i in range(1, len(A)):
        if A[i-1] > A[i]:
            return False
    return True

x = random.sample(range(10000),100)
start = timer()
x = radix_sort(x, 4)
print(timer() - start)
print(x)
print(test(x))

계수 정렬 ( Counting Sort )

시간 복잡도

$ O(n) $

$ \Theta (k) + \Theta (n) + \Theta (k) + \Theta (n) $

$ k \le O(n) $ => $ O(n) $

계수 정렬이란?

입력 값이 모두 k 이하인 경우
$ k \le O(n) $ 인 경우에만 의미가 있음
같은 값이 몇 개인지 세어서 위치 결정

해당 수 앞에 몇 개가 존재하는지를 세어서 정렬

c++ 코드

void counting_sort()
{
	int n = 10;
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };
	int count[11] = {};
	int sortarr[10] = {};

	for (int i = 0; i < n; i++) 
		count[arr[i]]++;

	for (int i = 1; i < 11; i++) 
		count[i] += count[i - 1];

	for (int i = 0; i < n; i++)
	{
		count[arr[i]]--;
		sortarr[count[arr[i]]] = arr[i];
	}

	cout << "<counting_sort>" << endl;
	for (int i = 0; i < n; i++)
		cout << sortarr[i] << ' ';
	cout << endl;
}

python 코드

import random
from timeit import default_timer as timer

def counting_sort(A, k):
    B = [0] * len(A)
    C = [0] * (k+1)
    for v in A:
        C[v] += 1
    for i in range(1, k+1):
        C[i] += C[i-1]
    for i in range(len(A)-1, -1, -1):
        v = A[i]
        C[v] -= 1
        B[C[v]] = v
    return B

def test(A):
    for i in range(1, len(A)):
        if A[i-1] > A[i]:
            return False
    return True

x = random.choices(range(50), k=100)
start = timer()
print(x)
x = counting_sort(x, 49)
print(timer() - start)
print(x)
print(test(x))

팀 정렬

팀 정렬이란?

2002년 Tim Peters 가 개발
파이썬 버전 2.3부터 표준 정렬 알고리즘으로 사용
데이터가 정렬된 정도에 따라 삽입 정렬과 병합 정렬 사이를 전환하는 적응형 알고리즘

시간복잡도 비교

전체 코드

#include <iostream>
#include <algorithm>
#include <queue>
#include <cmath>

using namespace std;

#define SIZE 10

void print(int option, int list[])
{
	switch (option) {
	case 1:
		cout << "<selection sort>" << "\n";
		break;
	case 2:
		cout << "<bubble sort>" << "\n";
		break;
	case 3:
		cout << "<insertion sort>" << "\n";
		break;
	case 4:
		cout << "<merge sort>" << "\n";
		break;
	case 5:
		cout << "<quick sort>" << "\n";
		break;
	}

	for (int i = 0; i < SIZE; i++)
		cout << list[i] << ' ';

	cout << "\n";
}

void selection_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	int largest_index;

	for (int i = SIZE - 1; i > -1; i--)
	{
		largest_index = i;
		for (int j = i; j > -1; j--)
		{
			if (arr[largest_index] < arr[j])
				largest_index = j;
		}
		int temp = arr[largest_index];
		arr[largest_index] = arr[i];
		arr[i] = temp;
	}

	print(1, arr);
}

void bubble_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	bool flag = false; // stop option

	for (int i = SIZE - 1; i > -1; i--)
	{
		for (int j = 0; j < i; j++)
		{
			if (arr[j] > arr[j + 1])
			{
				flag = true;
				int temp = arr[j];
				arr[j] = arr[j + 1];
				arr[j + 1] = temp;
			}
		}
		// stop option
		if (!flag) break;
	}

	print(2, arr);
}

void insertion_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	for (int i = 1; i < SIZE; i++)
	{
		int loc = i - 1; //0
		int now = arr[i]; //3
		while (loc >= 0 && now < arr[loc])
		{
			arr[loc + 1] = arr[loc];
			loc--;
		}
		arr[loc + 1] = now;

	}

	print(3, arr);
}


////////////////////////////////////////////////////////////////
void merge(int list[], int left, int mid, int right)
{
	int merge_arr[SIZE];
	int i, j, k;
	i = left;
	j = mid + 1;
	k = left;

	while (i <= mid && j <= right)
	{
		if (list[i] < list[j])
			merge_arr[k++] = list[i++];
		else
			merge_arr[k++] = list[j++];
	}

	while (i <= mid)
		merge_arr[k++] = list[i++];

	while (j <= right)
		merge_arr[k++] = list[j++];

	for (int i = left; i <= right; i++)
		list[i] = merge_arr[i];
}

void merge_sort_func(int list[], int left, int right)
{
	int mid;

	if (left < right)
	{
		mid = (left + right) / 2;
		merge_sort_func(list, left, mid);
		merge_sort_func(list, mid + 1, right);
		merge(list, left, mid, right);
	}
}

void merge_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	merge_sort_func(arr, 0, SIZE - 1);

	print(4, arr);
}

//////////////////////////////////////////////////////////////
void quick_sort_func(int list[], int left, int right)
{
	if (left >= right) return;

	int pivot = left;
	int i = left + 1, j = right;

	while (i <= j)
	{
		while (i <= right && list[pivot] > list[i]) i++;
		while (j >= left && list[pivot] < list[j]) j--;

		if (i > j) {
			int temp = list[j];
			list[j] = list[pivot];
			list[pivot] = temp;
		}
		else {
			int temp = list[i];
			list[i] = list[j];
			list[j] = temp;
		}
	}


	quick_sort_func(list, left, j - 1);
	quick_sort_func(list, j + 1, right);

}

void quick_sort()
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	quick_sort_func(arr, 0, SIZE - 1);

	print(5, arr);
}
//////////////////////////////////////////////////////////////

/*
* parent = (i-1)/2
* left = i*2 + 1
* right = i*2 + 2
*/

void make_heap(int n, int arr[])
{
	for (int i = (n - 2) / 2; i > -1; i--)
	{
		int now = i;
		int left = i * 2 + 1, right = i * 2 + 2;
		int child = left;
		if (arr[left] < arr[right] && right < n) child = right;

		while (arr[child] > arr[now] && child < n)
		{
			int temp = arr[now];
			arr[now] = arr[child];
			arr[child] = temp;

			now = child;
			left = now * 2 + 1, right = now * 2 + 2;
			if (arr[left] < arr[right] && right < n) child = right;

		}
	}
}

void heap_sort(int n)
{
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };

	make_heap(n, arr);
	
	for (int i = n - 1; i > -1; i--) {
		int temp = arr[i];
		arr[i] = arr[0];
		arr[0] = temp;
		make_heap(i, arr);
	}
	
	cout << "<heap_sort>" << endl;
	for (int i = 0; i < n; i++)
		cout << arr[i] << ' ';
	cout << endl;
}

//////////////////////////////////////////////////////////
int A[15] = { 123,2154,222,4,283,1560,1061,2150,3154,33,2234,345,677,8911,1 };

void rsort(int m, int n)
{
	queue<int> bucket[10];

	for (int i = 0; i < n; i++)
	{
		int index = A[i] / pow(10, m);
		index %= 10;
		bucket[index].push(A[i]);
	}

	int index = 0;

	for (int i = 0; i < 10; i++)
	{
		while (!bucket[i].empty())
		{
			A[index++] = bucket[i].front();
			bucket[i].pop();
		}
	}
}

void radix_sort()
{
	int k = 4;

	for (int i = 0; i < 4; i++)
		rsort(i, 15);

	cout << "<radix_sort>" << endl;
	for (int i = 0; i < 15; i++)
		cout << A[i] << ' ';
	cout << endl;
}

/////////////////////////////////////////////////////////////
void counting_sort()
{
	int n = 10;
	int arr[10] = { 1,3,5,6,8,2,4,10,7,9 };
	int count[11] = {};
	int sortarr[10] = {};

	for (int i = 0; i < n; i++) 
		count[arr[i]]++;

	for (int i = 1; i < 11; i++) 
		count[i] += count[i - 1];

	for (int i = 0; i < n; i++)
	{
		count[arr[i]]--;
		sortarr[count[arr[i]]] = arr[i];
	}

	cout << "<counting_sort>" << endl;
	for (int i = 0; i < n; i++)
		cout << sortarr[i] << ' ';
	cout << endl;
}

int main()
{
	selection_sort();
	bubble_sort();
	insertion_sort();
	merge_sort();
	quick_sort();
	heap_sort(10);
	radix_sort();
	counting_sort();

	return 0;
}

'22-1학기 > 알고리즘' 카테고리의 다른 글

5. 검색 트리 (0)	2022.04.06
4. 선택 알고리즘 (0)	2022.04.06
2. 점화식과 알고리즘 복잡도 분석 (0)	2022.03.14
1. 알고리즘 표기법 (0)	2022.03.14

2. 점화식과 알고리즘 복잡도 분석

2022. 3. 14. 15:53

점화식

점화식이란?

어떤 함수를 자신과 똑같은 함수를 이용해 나타내는 것

점화식의 점근적 분석 방법

반복 대치 : 반복해서 집어 넣는다.

추정 후 증명 : 추정 후 경계 조건 + 귀납적 가정과 전개를 이용해 증명한다. 이때 잘못 증명하지 않도록 주의해야 한다.
- 예시
  - 추정 : $T(n) \le 2T({n \over 2}) + n$ 의 점근적 복잡도는 $T(n) = O(nlogn)$이다. 즉, 충분히 큰 $n$에 대하여 $T(n) \le\ cnlogn$인 양의 상수 $c$가 존재한다.
  - 증명 :
    - 경계 조건 : $T(2) \le c2log2$인 $c$가 존재 <------필수!
    - 귀납적 가정과 전개 : $ n \over 2 $에 대해 추정이 맞으면, 즉 $T({n \over 2}) \le c{n \over 2}log{n \over 2}$ 이라면 $T(n) \le 2T({n \over 2}) + n \\ \le 2c({n \over 2})log{n \over 2} + n = cnlogn - cnlog2 + n \\ = cnlogn + n(1-clog2) \le cnlogn $ $( c \ge {1 \over log2} )$