Rsync file synchronization keeps failing? A step-by-step troubleshooting walkthrough
Problem
rsync client: exits abnormally with an error
The output of the failed deployment is shown below:
# rsync -avz --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
receiving incremental file list
...
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(759) [receiver=3.0.6]
rsync: connection unexpectedly closed (99 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6]
rsync client: process hangs
# rsync -avzP --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
Password:
receiving incremental file list
./
<rsync-test-pkg>-SNAPSHOT.jar
^C
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(551) [generator=3.0.9]
rsync error: received SIGUSR1 (code 19) at main.c(1298) [receiver=3.0.9]
Only on the second, or even the third, attempt does it succeed:
# rsync -avzP --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
Password:
receiving incremental file list
./
<rsync-test-pkg>-SNAPSHOT.jar
    60606801 100%   16.13MB/s    0:00:03 (xfer#1, to-check=0/3)
sent 21035 bytes  received 43167123 bytes  5758421.07 bytes/sec
total size is 60607326  speedup is 1.40
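A web search showed that other people have already run into a similar rsync problem. Quoting from one such blog post:
Even though you may have set the --timeout option for the rsyncd daemon (that is, in rsyncd.conf), under certain circumstances this option may have no effect at all. Some extremely unstable networks cause large numbers of TCP connections to time out, which in turn makes rsync processes fail. Although the broken TCP connection is gone, the rsync process may, for various reasons (for example, being stuck in an uninterruptible state while waiting for I/O), be left behind on the system and eventually become a zombie process.
According to the rsync man page, the rsync command's own timeout defaults to 0, meaning no timeout at all, so a running rsync process will wait forever for the remote end to respond. Setting the timeout option in rsyncd.conf for the rsyncd daemon and also passing the timeout option on the rsync client command line has proven in practice to eliminate this problem.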
After adding the --timeout parameter and running the command again, the client does indeed exit with an error instead of hanging:
# rsync -avzP --timeout=60 --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
Password:
receiving incremental file list
./
<rsync-test-pkg>-SNAPSHOT.jar
    20114253  33%   19.18MB/s    0:00:02
[receiver] io timeout after 60 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [receiver=3.0.9]
rsync: connection unexpectedly closed (115 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
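Since the transfer tends to succeed on a second or third attempt, and --timeout turns a hang into a clean failure, the two can be combined in a small retry wrapper. The following is only a sketch under assumptions: the helper names describe_rsync_exit and retry are hypothetical (they are not part of rsync), the exit-code descriptions are copied from the log messages shown above, and authentication for unattended runs is left out.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; neither is part of rsync.

# Map the rsync exit codes seen in the logs above to their messages.
describe_rsync_exit() {
    case "$1" in
        0)  echo "success" ;;
        12) echo "error in rsync protocol data stream" ;;
        19) echo "received SIGUSR1" ;;
        20) echo "received SIGINT, SIGTERM, or SIGHUP" ;;
        30) echo "timeout in data send/receive" ;;
        *)  echo "other error (see the rsync man page)" ;;
    esac
}

# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_TRIES attempts have failed.
# Returns 0 on success, otherwise the exit code of the last attempt.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        rc=$?
        echo "attempt $attempt/$max_tries failed: exit code $rc ($(describe_rsync_exit "$rc"))" >&2
        if [ "$attempt" -ge "$max_tries" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example call, using the same placeholders as the commands above:
# retry 3 5 rsync -avz --timeout=60 --delete --exclude='.git' --exclude='.svn' \
#     rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
```

Retrying only hides the symptom. The underlying network problem still needs to be investigated, as the following sections do.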
rsync server: error log
2018/10/26 14:40:30 [4228] name lookup failed for <rsync-client>: Name or service not known
2018/10/26 14:40:30 [4228] connect from UNKNOWN (<rsync-client>)
2018/10/26 14:40:30 [4228] rsync on path/to/folder from UNKNOWN (<rsync-client>)
2018/10/26 14:40:30 [4228] building file list
2018/10/26 14:40:35 [4228] rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Connection timed out (110)
2018/10/26 14:40:35 [4228] rsync error: error in rsync protocol data stream (code 12) at io.c(1525) [sender=3.0.6]
rsync client: investigating with strace
lstat("<rsync-test-pkg>-SNAPSHOT.jar", 0x7fff7d6e0000) = -1 ENOENT (No such file or directory)
select(5, [4], [3], [3], {30, 0}) = 2 (in [4], out [3], left {29, 999998})
select(5, [4], [], NULL, {30, 0}) = 1 (in [4], left {29, 999999})
read(4, "\0\0\0\34", 8184) = 4
write(3, "\26\0\0\7\1\10\0\3\0\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 26) = 26
select(5, [4], [], NULL, {30, 0}./<rsync-test-pkg>-SNAPSHOT.jar
) = 0 (Timeout)
8.79MB/s    0:00:02
select(5, [4], [], NULL, {30, 0}) = 0 (Timeout)
select(5, [4], [], NULL, {30, 0}[receiver] io timeout after 60 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [receiver=3.0.9]
) = 1 (in [4], left {28, 559609})
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15649, si_status=30, si_utime=20, si_stime=5} ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 30}], WNOHANG, NULL) = 15649
wait4(-1, 0x7fff7d6e11e4, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn() = 1
read(4, "", 8184) = 0
write(2, "rsync: connection unexpectedly c"..., 77rsync: connection unexpectedly closed (115 bytes received so far) [generator]) = 77
write(2, "\n", 1) = 1
rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7f1e73cca670}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7f1e73cca670}, NULL, 8) = 0
getpid() = 15648
kill(15649, SIGUSR1) = -1 ESRCH (No such process)
write(2, "rsync error: error in rsync prot"..., 89rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]) = 89
write(2, "\n", 1) = 1
exit_group(12) = ?
+++ exited with 12 +++
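In the trace above, the generator waits in select() with a 30-second timeout, and two consecutive timeouts come just before the receiver reports "io timeout after 60 seconds". That appears consistent with --timeout=60, though it is an inference from this one trace. To find such stalls in a longer trace, the timed-out select() calls can be counted in a saved strace log. The sketch below makes several assumptions: the article does not show how strace was invoked, so the capture command in the comment is only one plausible way to do it, and count_select_timeouts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper for illustration.
#
# One plausible way to save a trace to a file (assumed, not from the article):
#   strace -f -o /tmp/rsync-client.strace rsync -avzP --timeout=60 ...

# count_select_timeouts FILE
# Print how many select() calls in a saved strace log returned 0 (Timeout).
count_select_timeouts() {
    if [ ! -r "$1" ]; then
        echo "cannot read strace log: $1" >&2
        return 1
    fi
    # In a basic regular expression the parentheses match literally.
    # grep -c exits non-zero when the count is 0, so that status is ignored.
    grep -c 'select(.*) = 0 (Timeout)' "$1" || true
}

# Example call:
# count_select_timeouts /tmp/rsync-client.strace
```

A non-zero count only shows that the process sat idle waiting for data. It does not say why the data stopped arriving.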
Root cause: network quality
From a host in the same data-center network as the deployment system, this is what we captured:
TCP protocol
# wget http://<sitename>/<rsync-test-pkg>-SNAPSHOT.jar
--2018-10-26 10:57:45--  http://<sitename>/<rsync-test-pkg>-SNAPSHOT.jar
Resolving <sitename> (<sitename>)... 10.20.51.127
Connecting to <sitename> (<sitename>)|10.20.51.127|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60606801 (58M) [application/java-archive]
Saving to: ‘<rsync-test-pkg>-SNAPSHOT.jar’
100%[==================================================================================================================================================================>] 60,606,801  9.81MB/s   in 5.2s
2018-10-26 10:57:50 (11.1 MB/s) - ‘<rsync-test-pkg>-SNAPSHOT.jar’ saved [60606801/60606801]
However, a packet capture of this traffic showed that packets were still being lost: