XNERV SURVEYS

Memory Barriers Are Like Source Control Operations（转载）

发表于 2012-07-10 分类于 OS

If you use source control, you’re on your way towards understanding memory ordering, an important consideration when writing lock-free code in C, C++ and other languages.

In my last post, I wrote about memory ordering at compile time, which forms one half of the memory ordering puzzle. This post is about the other half: memory ordering at runtime, on the processor itself. Like compiler reordering, processor reordering is invisible to a single-threaded program. It only becomes apparent when lock-free techniques are used – that is, when shared memory is manipulated without any mutual exclusion between threads. However, unlike compiler reordering, the effects of processor reordering are only visible in multicore and multiprocessor systems.

阅读全文 »

RSYNC的核心算法（转载）

发表于 2012-05-17 分类于 Linux

rsync是unix/linux下同步文件的一个高效算法，它能同步更新两处计算机的文件与目录，并适当利用查找文件中的不同块以减少数据传输。rsync中一项与其他大部分类似程序或协定中所未见的重要特性是镜像是只对有变更的部分进行传送。rsync可拷贝／显示目录属性，以及拷贝文件，并可选择性的压缩以及递归拷贝。rsync利用由Andrew Tridgell发明的算法。这里不介绍其使用方法，只介绍其核心算法。我们可以看到，Unix下的东西，一个命令，一个工具都有很多很精妙的东西，怎么学也学不完，这就是Unix的文化啊。

本来不想写这篇文章的，因为原先发现有很多中文blog都说了这个算法，但是看了一下，发现这些中文blog要么翻译国外文章翻译地非常烂，要么就是介绍这个算法介绍得很乱让人看不懂，还有错误，误人不浅，所以让我觉得有必要写篇rsync算法介绍的文章。（当然，我成文比较仓促，可能会有一些错误，请指正）

阅读全文 »

谷歌技术"三宝"之BigTable（转载）

发表于 2012-05-04 分类于分布式及存储

2006年的OSDI有两篇google的论文，分别是BigTable和Chubby。Chubby是一个分布式锁服务，基于Paxos算法；BigTable是一个用于管理结构化数据的分布式存储系统，构建在GFS、Chubby、SSTable等google技术之上。相当多的google应用使用了BigTable，比如Google Earth和Google Analytics，因此它和GFS、MapReduce并称为谷歌技术"三宝"。

与GFS和MapReduce的论文相比，我觉得BigTable的论文难懂一些。一方面是因为自己对数据库不太了解，另一方面又是因为对数据库的理解局限于关系型数据库。尝试用关系型数据模型去理解BigTable就容易"走火入魔"。在这里推荐一篇文章（需要翻墙）：Understanding HBase and BigTable，相信这篇文章对理解BigTable/HBase的数据模型有很大帮助。

阅读全文 »

谷歌技术"三宝"之MapReduce（转载）

发表于 2012-04-26 分类于分布式及存储

江湖传说永流传：谷歌技术有"三宝"，GFS、MapReduce和大表（BigTable）！

谷歌在03到06年间连续发表了三篇很有影响力的文章，分别是03年SOSP的GFS，04年OSDI的MapReduce，和06年OSDI的BigTable。SOSP和OSDI都是操作系统领域的顶级会议，在计算机学会推荐会议里属于A类。SOSP在单数年举办，而OSDI在双数年举办。

那么这篇博客就来介绍一下MapReduce。

阅读全文 »

谷歌技术"三宝"之谷歌文件系统（转载）

发表于 2012-04-21 分类于分布式及存储

题记：初学分布式文件系统，写篇博客加深点印象。GFS的特点是使用一堆廉价的商用计算机支撑大规模数据处理。
虽然"The Google File System " 是03年发表的老文章了，但现在仍被广泛讨论，其对后来的分布式文件系统设计具有指导意义。然而，作者在设计GFS时，是基于过去很多实验观察的，并提出了很多假设作为前提，这等于给出了一个GFS的应用场景。所以我们自己在设计分布式系统时，一定要注意自己的应用场景是否和GFS相似，不能盲从GFS。

GFS的主要假设如下：

GFS的服务器都是普通的商用计算机，并不那么可靠，集群出现结点故障是常态。因此必须时刻监控系统的结点状态，当结点失效时，必须能检测到，并恢复之。
系统存储适当数量的大文件。理想的负载是几百万个文件，文件一般都超过100MB，GB级别以上的文件是很常见的，必须进行有效管理。支持小文件，但不对其进行优化。
负载通常包含两种读：大型的流式读（顺序读），和小型的随机读。前者通常一次读数百KB以上，后者通常在随机位置读几个KB。
负载还包括很多连续的写操作，往文件追加数据（append）。文件很少会被修改，支持随机写操作，但不必进行优化。
系统必须实现良好定义的语义，用于多客户端并发写同一个文件。同步的开销必须保证最小。
高带宽比低延迟更重要，GFS的应用大多需要快速处理大量的数据，很少会严格要求单一操作的响应时间。

从这些假设基本可以看出GFS期望的应用场景应该是大文件，连续读，不修改，高并发。国内的淘宝文件系统（TFS）就不一样，专门为处理小文件进行了优化。

阅读全文 »

(Stack Overflow) SAVE TRANSACTION vs BEGIN TRANSACTION (SQL Server) how to nest transactions nicely

发表于 2012-03-15 分类于 DBMS

Question

I have a stored procedure that needs to set a save point so that it can, under certain circumstances, undo everything it did and return an error code to the caller, or accept/commit it and return success to the caller. But I need it to work whether the caller has already started a transaction or not. The doc is extremely confusing on this subject. Here is what I think will work, but I’m not certain of all the ramifications.

The thing is - this Stored Procedure (SP) is called by others. So I don’t know if they’ve started a transaction or not… Even if I require users to start a transaction to use my SP, I still have questions about the proper use of Save Points …

My SP will test if a transaction is in progress, and if not, start one with BEGIN TRANSACTION. If a transaction is already in progress, it will instead create a save point with SAVE TRANSACTION MySavePointName, and save the fact this is what I did.

阅读全文 »

多版本并发控制(MVCC)在分布式系统中的应用（转载）

发表于 2012-03-13 分类于分布式及存储

问题

最近项目中遇到了一个分布式系统的并发控制问题。该问题可以抽象为：某分布式系统由一个数据中心D和若干业务处理中心L1，L2 … Ln组成；D本质上是一个key-value存储，它对外提供基于HTTP协议的CRUD操作接口。L的业务逻辑可以抽象为下面3个步骤：

read: 根据keySet {k1, … kn}从D获取keyValueSet {k1:v1, … kn:vn}
do: 根据keyValueSet进行业务处理，得到需要更新的数据集keyValueSet’ {k1′:v1′, … km’:vm’} (注：读取的keySet和更新的keySet’可能不同)
update: 把keyValueSet’更新到D （注：D保证在一次调用更新多个key的原子性）

在没有事务支持的情况下，多个L进行并发处理可能会导致数据一致性问题。比如，考虑L1和L2的如下执行顺序：

L1从D读取key:123对应的值100
L2从D读取key:123对应的100
L1将key:123更新为100 + 1
L2将key:123更新为100 + 2

如果L1和L2串行执行，key:123对应的值将为103，但上面并发执行中L1的执行效果完全被L2所覆盖，实际key:123所对应的值变成了102。

阅读全文 »

How does a mutex work? What does it cost?（转载）

发表于 2011-12-16 分类于 OS

Concurrent programming requires synchronization. We can’t have more than one thread accessing data at the same time; otherwise, we end up with a data race. The most common solution is to wrap the critical data access in a mutex. Mutexes are, of course, not free. A mutex can have a significant impact on the cost of the code we are writing. When used correctly we’ll barely notice the overhead. When misused it can cause a program to run worse in threaded mode than it would have single threaded!

阅读全文 »

(Stack Overflow) Why are hard links to directories not allowed in UNIX/Linux?

发表于 2011-10-11 分类于 Linux

Question

I read in text books that Unix/Linux doesn’t allow hard links to directories but does allow soft links. Is it because, when we have cycles and if we create hard links, and after some time we delete the original file, it will point to some garbage value?

If cycles were the sole reason behind not allowing hard links, then why are soft links to directories allowed?

阅读全文 »

(Stack Overflow) Execute vs Read bit. How do directory permissions in Linux work?

发表于 2011-09-22 分类于 Linux

Question

In my CMS, I noticed that directories need the executable bit (+x) set for the user to open them. Why is the execute permission required to read a directory, and how do directory permissions in Linux work?

Answer by Chris Down

When applying permissions to directories on Linux, the permission bits have different meanings than on regular files.

The write bit allows the affected user to create, rename, or delete files within the directory, and modify the directory’s attributes
The read bit allows the affected user to list the files within the directory
The execute bit allows the affected user to enter the directory, and access files and directories inside
The sticky bit states that files and directories within that directory may only be deleted or renamed by their owner (or root)

Article link: http://xnerv.wang/execute-vs-read-bit-how-do-directory-permissions-in-linux-work/
Reprinted from: (StackOverflow) Execute vs Read bit. How do directory permissions in Linux work?