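[Repost] Gallery of Processor Cache Effects

Posted on 2019-07-25 | Updated on 2019-07-26 | Category: OS

Word count: 16k | Reading time ≈ 39 min

Reposted from: http://igoro.com/archive/gallery-of-processor-cache-effects/
Original author: Igor Ostrovsky

This is one of the few articles that explains caches this thoroughly; the full text is reposted below.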

Most of my readers will understand that cache is a fast but small type of memory that stores recently accessed memory locations. This description is reasonably accurate, but the “boring” details of how processor caches work can help a lot when trying to understand program performance.

In this blog post, I will use code samples to illustrate various aspects of how caches work and how they affect the performance of real-world programs.

The examples are in C#, but the language choice has little impact on the performance scores and the conclusions they lead to.

Example 1: Memory accesses and performance

How much faster do you expect Loop 2 to run, compared to Loop 1?

Read more »

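NVM Programming Model

Posted on 2019-07-23 | Category: nvm

Word count: 1.1k | Reading time ≈ 3 min

For decades, the traditional storage model has barely changed. As shown in the figure below, the operating system is responsible for interacting with the storage media and exposes basic APIs such as open/close and read/write to user programs.

Besides this kind of direct access, both Windows and Linux support memory mapping. It may not be widely used in the old programming model, but it is the core of the NVM programming model.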
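To make the contrast with read/write concrete, here is a minimal sketch of memory-mapped file access using Python's standard `mmap` module. It uses an ordinary temporary file as a stand-in for a storage-backed region (an assumption for illustration; real NVM programming typically relies on dedicated libraries), and the helper name `update_in_place` is hypothetical.

```python
import mmap
import os
import tempfile


def update_in_place(path: str, offset: int, data: bytes) -> None:
    """Modify a file through a memory mapping instead of read()/write()."""
    with open(path, "r+b") as f:
        # Map the whole file into this process's address space.
        with mmap.mmap(f.fileno(), 0) as mapped:
            # A plain slice assignment (a memory store) changes the file contents.
            mapped[offset:offset + len(data)] = data
            # Ask the OS to write the modified pages back to the backing file.
            mapped.flush()


# Create a small scratch file to stand in for a storage-backed region.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello storage")
os.close(fd)

update_in_place(path, 0, b"HELLO")

with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)
```

The point of the sketch is that, once the mapping exists, the program updates storage with ordinary memory operations rather than explicit read/write calls.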

Read more »

Computer Architecture (CA) Glossary

Posted on 2019-06-17 | Category: Fundamentals

Word count: 2.7k | Reading time ≈ 7 min

Abbreviation	Full name	Explanation
U/RU	Rack Unit	1.75 inches, 4.445 cm
IDC	Internet Data Center
HPC	High Performance Computing
ICT	Information and Communication Technology	One of the three pillars of a data center
TCO	Total Cost of Ownership
μOps/uOp	micro-operations	One instruction may be carried out by multiple micro-operations
RS	Reservation Station	Extended registers in Tomasulo's algorithm (register renaming)
CDB	Common Data Bus	The bus in Tomasulo's algorithm that broadcasts all results
BHT	Branch History Table	History-based branch prediction
BTB	Branch Target Buffer	Records branch target addresses
ROB	ReOrder Buffer	Data structure required by Tomasulo's algorithm to support branch misprediction recovery; FIFO
VLIW	Very Long Instruction Word
EPIC	Explicitly Parallel Instruction Computing
POE	Plan Of Execution	EPIC provides this layer of abstraction at the instruction level
MLP	Memory Level Parallelism
DIMM	Dual In-line Memory Module
SDRAM	Synchronous DRAM
ATA	Advanced Technology Attachment	A disk interface; comes in PATA (Parallel) and SATA (Serial) variants
SCSI	Small Computer System Interface	More advanced than ATA; likewise comes in parallel and serial variants
SAS	Serial Attached SCSI
FC	Fibre Channel
JBOD	Just a Bunch Of Disks	A set of disks with no relationship between them
ECC	Error Correcting Code
DAS	Direct Attached Storage
NAS	Network Attached Storage
SAN	Storage Area Network
WA	Write Amplification	A property of SSDs
SLC	Single Level Cell	A type of SSD
ASI	Architecture Starting Image	Used in sampled simulation
PCA	Principal Component Analysis
SMP	Symmetric MultiProcessor	Also called CSM (centralized shared memory)
DSM	Distributed Shared Memory
MSI	Modified, Shared, Invalid	A snooping protocol; write-back (WB), invalidation-based
SMT	Simultaneous MultiThreading
CMP	Chip MultiProcessor
MVL	Maximum Vector Length
VLR	Vector Length Register	VLR <= MVL
VMR	Vector Mask Register
SPMD	Single Program Multiple Data
SM	Streaming Multiprocessor
TPC	Texture/Processor Cluster	An NVIDIA GPU concept; one TPC contains several SMs
SP	Streaming Processor	One SM contains many SPs
flit	flow control unit	The smallest unit the network can transfer
phit	physical unit	The amount of data transferred over one link per cycle
MIN	Multistage Interconnection Network	Note: the Omega network is blocking!
PUE	Power Usage Effectiveness	Similar metrics: WUE (Water) and CUE (Carbon)
ATS	Automatic Transfer Switch
STS	Static Transfer Switch	Static switch that switches over to the backup power source
UPS	Uninterruptible Power Supply	Backup battery bank
PDU	Power Distribution Unit	Feeds the racks downstream
CRAC	Computer Room Air Conditioning	Air-conditioning system
COP	Coefficient Of Performance	About 1.0-1.5
TOR	Top Of Rack	A way of connecting ICT equipment to switches; the alternative is End Of Row (more expensive)
MDA	Main Distribution Area
HDA	Horizontal Distribution Area
EDA	Equipment Distribution Area
WSC	Warehouse-Scale Computers
SLA	Service Level Agreement	How the service is delivered (between service provider and user)
MDC	Modular Data Center	Similar to a Containerized Data Center
ACPI	Advanced Configuration and Power Interface	A power management standard proposed by Intel
TDP	Thermal Design Power	Processor power at maximum load
MTTF	Mean Time To Failure
MTTR	Mean Time To Repair
MTBF	Mean Time Between Failures
ACE	Architecturally Correct Execution	An ACE bit is a bit that is critical to the correctness of program execution
AVF	Architectural Vulnerability Factor
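Several of the entries above are tied together by simple formulas. The sketch below uses the textbook definitions (MTBF = MTTF + MTTR, availability = MTTF / (MTTF + MTTR), PUE = total facility power / IT equipment power); the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures: time to fail plus time to repair."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: 1.0 would mean no overhead at all."""
    return total_facility_kw / it_equipment_kw


# Hypothetical numbers, not measurements.
print(mtbf(999.0, 1.0))          # 999 h up, 1 h to repair
print(availability(999.0, 1.0))  # fraction of time in service
print(pue(1500.0, 1000.0))       # 1500 kW drawn for 1000 kW of IT load
```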

Generating Control Flow Graphs with Soot (Windows)

Posted on 2019-04-26 | Category: Tools

Word count: 722 | Reading time ≈ 2 min

Environment

OS: Windows 10
JDK: 1.8.0_191
soot: 3.0.1

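Steps

Download Soot

The Soot version has to be exactly right (3.0.1). A version that is too old does not support JDK 1.8, while a newer one reports that certain libraries cannot be found.

For the jar, it is best to download ***-jar-with-dependencies.jar. Reference download address: link.

Compile the Java source files manually

Soot can read both *.java files and compiled *.class files. I chose to compile the Java sources into class files by hand first, because this accurately exposes any errors in the code. The manual compilation uses the command:

Read more »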

Cyber RT vs ROS

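Posted on 2019-04-14 | Category: apollo

Word count: 7.9k | Reading time ≈ 20 min

In Apollo 3.5, Baidu introduced its own runtime computing framework, Cyber RT, to replace ROS. The official FAQ states that its performance, latency, and throughput are all better than those of ROS. This post records how I checked whether performance really improved, following Issue #7220.

What we test is the latency of sending and receiving messages. Cyber RT and ROS each ship with a demo that does this, so we only need to make the two demos send the same data and print the latency when a message is received.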
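The measurement idea can be sketched independently of either framework: the sender stamps each message with its send time, and the receiver subtracts that stamp from its own clock. The sketch below is only an in-process illustration using Python threads and a queue; it does not use the Cyber RT or ROS APIs, and the `talker`/`listener` functions here are hypothetical stand-ins for the demos.

```python
import queue
import threading
import time


def talker(channel: queue.Queue, count: int, payload: bytes) -> None:
    """Send `count` messages, each stamped with its send time."""
    for seq in range(count):
        channel.put((seq, time.perf_counter(), payload))
    channel.put(None)  # sentinel: no more messages


def listener(channel: queue.Queue, latencies: list) -> None:
    """Receive messages and record receive time minus send time."""
    while True:
        msg = channel.get()
        if msg is None:
            break
        _seq, sent_at, _payload = msg
        latencies.append(time.perf_counter() - sent_at)


channel = queue.Queue()
latencies = []
receiver = threading.Thread(target=listener, args=(channel, latencies))
receiver.start()
# Both sides must exchange the same payload for a fair comparison.
talker(channel, count=100, payload=b"x" * 1024)
receiver.join()

print(f"messages: {len(latencies)}")
print(f"mean latency: {sum(latencies) / len(latencies) * 1e6:.1f} us")
```

When sender and receiver run in separate processes or on separate machines, the two clocks must also be comparable, which this single-process sketch sidesteps.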

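Testing Cyber RT

Clone the latest Apollo code

git clone https://github.com/ApolloAuto/apollo.git -b master

Start the Docker environment

./docker/scripts/dev_start.sh
./docker/scripts/dev_into.sh

Apollo provides a Docker environment with all the required dependencies preinstalled, and the official documentation also recommends running inside Docker.

Build the project

bash apollo.sh build

After the build, the executables are in the bazel_bin/ directory.

The two executables we want to test are talker and listener, located under bazel_bin/cyber/examples/. At this point you can run both of them to check that they work correctly.

Read more »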

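Linux Command-Line Tips

Posted on 2019-04-06 | Category: Fundamentals

Word count: 779 | Reading time ≈ 2 min

Some keyboard shortcuts

Moving the cursor

Shortcut	Function
Ctrl-a	Move the cursor to the beginning of the line
Ctrl-e	Move the cursor to the end of the line
Ctrl-f	Move right (forward) one character; same as the Right arrow key
Ctrl-b	Move left (backward) one character; same as the Left arrow key
Alt-f	Move forward one word
Alt-b	Move backward one word
Ctrl-l	Clear the screen and move the cursor to the top-left corner

Editing text

Shortcut	Function
Ctrl-d	Delete the character at the cursor; same as the Delete key
Ctrl-t	Swap the character at the cursor with the one before it
Alt-t	Swap the word at the cursor with the one before it
Alt-l	Convert the letters from the cursor to the end of the word to lowercase
Alt-u	Convert the letters from the cursor to the end of the word to uppercase

Cutting and pasting

Shortcut	Function
Ctrl-k	Cut the text from the cursor to the end of the line
Ctrl-u	Cut the text from the cursor to the beginning of the line (I wish I had known this one sooner)
Ctrl-y	Paste the text at the current position

Read more »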

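The @ Symbol in Python

Posted on 2019-04-04 | Category: Fundamentals

Word count: 2k | Reading time ≈ 5 min

In Python, @ is the decorator syntax for functions. It wraps a function and gives finer-grained control over extending its behavior. See the official documentation for details.

This post only explores how @ behaves from a usage point of view.

Example 1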

def wrapper(fn):
    print("This is a wrapper!")

@wrapper
def func():
    print("func")
    
if __name__ == "__main__":
    func()

Output:

This is a wrapper!
Traceback (most recent call last):
  File "***.py", line 9, in <module>
    func()
TypeError: 'NoneType' object is not callable

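Before line 9 runs, nothing here has explicitly called a function, yet wrapper has already been executed. This is because when the interpreter reaches @wrapper, the equivalent code is: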
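As a minimal sketch of that desugaring, based on standard Python decorator semantics rather than on the author's own snippet: `@decorator` above `def func` rebinds the name to `decorator(func)`. The names `broken_wrapper` and `working_wrapper` below are hypothetical; the first mirrors Example 1 and returns None, the second returns a callable so the decorated name stays usable.

```python
import functools


def broken_wrapper(fn):
    # Runs at decoration time and implicitly returns None, as in Example 1.
    print("This is a wrapper!")


def working_wrapper(fn):
    @functools.wraps(fn)  # keep the wrapped function's name and docstring
    def inner(*args, **kwargs):
        print("before", fn.__name__)
        return fn(*args, **kwargs)
    return inner  # the decorated name is rebound to this callable


def func():
    return "func"


# Writing `@broken_wrapper` above `def func` is equivalent to this assignment.
broken = broken_wrapper(func)
# The same desugaring with a wrapper that returns a function.
working = working_wrapper(func)

print(broken)     # calling this would raise the TypeError seen above
print(working())
```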

Read more »

Bazel Notes

Posted on 2019-04-01 | Category: Fundamentals

Word count: 2.8k | Reading time ≈ 7 min

Bazel is an open-source build and test tool similar to Make, Maven, and Gradle. It uses a human-readable, high-level build language. Bazel supports projects in multiple languages and builds outputs for multiple platforms. Bazel supports large codebases across multiple repositories, and large numbers of users.

Installing on Ubuntu

It’s easy and recommended to install Bazel via the binary installer, which can be downloaded from Bazel’s GitHub releases page.

Run the installer directly and it should complete without problems. Note that some libraries must be installed for Bazel to work; install them with:

sudo apt install pkg-config zip g++ zlib1g-dev unzip python

Official examples

Bazel can build many kinds of projects, but this post focuses on C++ only. The code can be found here.

The directory structure looks like this:

root-dir
|-------main
        |-------BUILD
        |-------hello-world.cc
        |-------[OTHER FILES]
        |
|-------WORKSPACE
|

The WORKSPACE file is necessary for Bazel to work, although it can remain empty for now.
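The layout above can be reproduced with a short script. This is a sketch under assumptions: the BUILD contents (a single `cc_binary` target named `hello-world`) and the C++ source are illustrative guesses at a minimal setup, not the contents of the official example, and the files are created in a temporary directory.

```python
import tempfile
from pathlib import Path

# Assumed BUILD contents: one C++ binary target built from one source file.
BUILD_TEXT = '''cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
)
'''

# Assumed minimal C++ source for the target above.
HELLO_CC = '''#include <iostream>

int main() {
  std::cout << "Hello world" << std::endl;
  return 0;
}
'''


def create_workspace(root: Path) -> Path:
    """Create root-dir with an empty WORKSPACE and a main/ package."""
    main_dir = root / "main"
    main_dir.mkdir(parents=True, exist_ok=True)
    (root / "WORKSPACE").touch()  # required, but may stay empty
    (main_dir / "BUILD").write_text(BUILD_TEXT)
    (main_dir / "hello-world.cc").write_text(HELLO_CC)
    return root


root = create_workspace(Path(tempfile.mkdtemp()) / "root-dir")
created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
print(created)
```

With Bazel installed, a target defined this way would normally be built from root-dir with `bazel build //main:hello-world`.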

Read more »