http://dev.mysql.com/tech-resources/articles/primer_on_falcon_tablespaces.html
A Primer on Falcon Tablespaces
As any experienced MySQL user knows, the MySQL server uses different storage engines that live underneath an upper layer of services, utilities, and SQL interfaces, to manage the actual I/O and data storage tasks. Each storage engine has its own set of characteristics, some of which are shared with other storage engines, but some of which are unique to a particular engine. One area that a storage engine may differ from other engines is in the way it manages data on disk. Some engines (like MyISAM, Archive, and InnoDB using the file-per-table option) map each table and index to separate and distinct files that can be targeted to most any accessible file system/directory. Other engines (like InnoDB and SolidDB) use a quasi-concept of tablespaces, with different files being specified in the MySQL configuration file and used for all tables and indexes in a MySQL instance.
The Falcon transactional storage engine uses the concept of user-defined tablespaces that are more in line with what one would find in Oracle (tablespaces) or SQL Server (filegroups). Let’s take a quick look at how tablespaces are used within Falcon and what you can and can’t do with them.
Getting Started with Tablespaces
Unlike some storage engines that use a fixed page size unless you do some work at the compiler level (MyISAM is 2K; InnoDB is 16K), Falcon lets you assign a custom page size that will be used by all subsequently created tablespaces in a MySQL instance. Currently, you can specify a page size anywhere from 2-32K. The only catch? You only get to specify it once in the MySQL configuration file during the initial MySQL instance startup/creation. Note that if you want to change the current Falcon page size, you can backup any falcon objects, drop all storage containers and log files, reinitialize the MySQL instance with a new page size, and restore all your objects back to your databases.
With respect to internals, Falcon creates an initial datafile for a
tablespace at four times the configured page size and auto-increments
it in that amount each time more space is needed. You can change the
amount of the initial allocation for all new tablespaces by specifying
a number (representing MB) for the falcon_initial_allocation config parameter.
The first four pages of each tablespace datafile are not used for user
data, but for management purposes. The first page is the datafile
header, which identifies the file as a Falcon tablespace and specifies
useful things like the page size, the on-disk structure version, the
creation time, and more. The second page is a free space page, which
contains bits that indicate whether the pages that follow are free or
in use. The third page is the top of a lookup structure used to locate
the lookup structures for table data, sequences, etc., and the fourth
page is the top of a lookup structure that locates the top page of
indexes.
Just like Oracle and SQL Server, Falcon creates a system tablespace for its own use (falcon_master.fts), a temporary tablespace for workarea handling (falcon_temporary.fts), and a default tablespace for you to use for your own tables, indexes, and BLOB data (falcon_user.fts). These tablespaces are created in the default MySQL data directory. Note that a tablespace can be used for multiple databases within the MySQL instance, so Falcon tablespaces are more like Oracle tablespaces (with schemas) than SQL Server filegroups in this respect.
The syntax used to create a Falcon tablespace is very simple (below example done on the Windows platform):
mysql> create tablespace gimdata
-> add datafile 'c:/dev/falcon/gimdata.fts'
-> engine=falcon;
Query OK, 0 rows affected (0.03 sec)
To create a tablespace, you provide a unique name for the tablespace, a single datafile name (and path if desired), and use the ENGINE=FALCON
clause (done because the NDB/Cluster storage engine also uses
tablespaces). Notice that you do not provide a size for the initial
size of the tablespace’s datafile. As previously stated, Falcon
initially creates a datafile the size equal to 4X the Falcon page size,
so for the default Falcon page size (4K), this would equate to a 16K
initial datafile size. Also note that you only provide one datafile
name. Currently, each Falcon tablespace only supports one datafile with
the limit per tablespace being 110 Terabytes. Expect tablespaces to
support multiple datafiles in future Falcon versions. The final thing
to note about tablespace datafiles is that they extend automatically to
accommodate incoming data and they have no DDL defined/preassigned
limit with respect to how large they can grow.
One good thing about Falcon tablespaces is that they are dynamic in nature; you can create and drop as many of them as you would like without impacting the running of the MySQL instance. This differs from InnoDB, which requires that you manually add additional datafiles to the MySQL configuration file, stop and restart the server to add more storage space (unless you are using the InnoDB file-per-table option).
Assigning Objects to Tablespaces
Creating tables and indexes in Falcon tablespaces is easily done via the CREATE DDL syntax:
mysql> create table t1 (c1 int) engine=falcon tablespace gimdata;
Query OK, 0 rows affected (0.09 sec)
Once created, you populate and use the table as you would any other table and really give no second thoughts to the tablespace storage container itself. If, however, you find that you want to relocate existing Falcon objects to a new/different tablespace, you simply use the ALTER TABLE command and specify the new target tablespace:
mysql> alter table t1 tablespace gimdata2;
Query OK, 0 rows affected (0.14 sec)
One important thing to understand in Falcon is that you cannot separate Falcon tables and indexes into separate tablespaces. Most database systems separate indexes from data because their index access strategy requires reading the index and data alternately, which leads to a bouncing of the disk head between the data and index. The idea is that putting them in separate files, preferably on different devices, reduces contention. Falcon reads the index first, then retrieves data in storage order, which eliminates that problem so it’s better for Falcon to keep its tables and indexes stored together.
Tablespaces that house Falcon objects cannot be dropped; only empty tablespaces can be removed with the DROP TABLESPACE < tablespace name > ENGINE=FALCON
command. Note that the actual Falcon datafiles are also removed from
the server when a DROP TABLESPACE command is issued, which is nice from
a cleanup standpoint.
Monitoring Falcon Tablespaces
If you want to know what objects are assigned to what tablespaces, currently you simply query the INFORMATION_SCHEMA object FALCON_TABLES:
mysql> select * from information_schema.falcon_tables;
+-------------+--------------------+-------------+
| SCHEMA_NAME | TABLE_NAME | TABLESPACE |
+-------------+--------------------+-------------+
| GIMF | BROKER | FALCON_USER |
| GIMF | CLIENT | FALCON_USER |
| GIMF | CLIENT_TRANSACTION | FALCON_USER |
| GIMF | INVESTMENT | FALCON_USER |
| GIMF | INVESTMENT_TYPE | FALCON_USER |
| GIMF | OFFICE_LOCATION | FALCON_USER |
| GIMF | FTEST | GIMFDATA |
| GIM | T1 | GIMDATA2 |
+-------------+--------------------+-------------+
In the current Falcon Alpha, there is no way via the INFORMATION_SCHEMA
to see storage diagnostic info like total, used, and free space for
tablespaces. Before the GA release of Falcon, there will either be a
tablespace object available for this, or the space for Falcon tables
and indexes will be reported in the INFORMATION_SCHEMA.TABLES object, which can then be joined to the FALCON_TABLES object to produce such reports.
One worry that DBA’s have always had is space fragmentation and the inevitable resulting reorganizations that must be done to reclaim wasted space and more efficiently organize individual object storage within the tablespace. Falcon completely reuses any empty pages that have resulted from deleted data, so the need to perform tablespace reorganizations is very unlikely from a “plug the holes” standpoint. Performing a reorg because of object fragmentation in the tablespace is another matter and might be needed depending on the shape the object is in and the activity it receives. No diagnostics are available yet to determine such a thing, but they are on the roadmap.
Conclusion
MySQL DBA’s wishing more freedom in storage management will be pleased with what they find with Falcon tablespaces. Although not perfect, Falcon tablespaces provide an easy-to-use way for dynamically allocating storage and assigning objects where desired. More feature enhancements are on the horizon for Falcon tablespaces, so stay tuned for coming additions.
For more information on Falcon, you can download the updated Falcon whitepaper, check out other articles on Falcon that are currently on the dev zone, review the Falcon documentation, and visit the Falcon online forum.
Of course, you can also download the latest MySQL 6.0 Alpha (of which Falcon is only the first new feature) to kick the tires of Falcon tablespaces yourself. Give them a try today and shoot me what you both like and hate at rschumacher@mysql.com. The latest Alpha build can be found at: http://dev.mysql.com/downloads/mysql/6.0.html.
Thanks again for your support of MySQL!
Falcon storage engine in depth
MySQL/Sun released a new storage engine "Falcon" in January, 2007. Falcon is a high performance transactional (fully compliant with ACID) storage engine, which is beta at this time (June 2008). In this article, I describe Falcon's features and its architecture in detail.
Falcon overview
Falcon was originally created by Jim Starkey who was inventor of MVCC, Blob's, Interbase, major Firebird contributor and his wife Ann Harrison.
The most basic feature covered by Falcon is transactions. Falcon supports commit/rollback, row-level concurrency through MVCC, non-blocking read and crash recovery. These are fundamental features for most enterprise applications.
In MySQL 5.0 and 5.1, the InnoDB and NDB (MySQL Cluster) engines support transactions. NDB is used for applications that want to easily achieve both high availability (99.999%: 5minutes downtime/year) and high performance (over 10 thousands TPS) easily. InnoDB is mainly adopted by single MySQL server or master/slave replication environment, like MyISAM. Falcon primarily targets the same use cases as InnoDB. That's why Falcon is expected to be an alternative to InnoDB. By using Heartbeat (or other clustering software) and DRBD/SAN, InnoDB or Falcon can achieve 99.99% level availability. If you want to achieve 99.999%, we usually recommend NDB (Note that NDB is not a drop-in replacement for InnoDB/Falcon. Use cases are different).
Falcon's internal architecture is quite different from InnoDB and some features are covered only with InnoDB. So please do not regard Falcon as a drop-in replacement for InnoDB (however, many of features do overlap). For example, the following features are covered with InnoDB, but not covered with Falcon.
Falcon doesn't use Clustered Index
Falcon doesn't use clustered index architecture. Clustered index has both pros and cons against general B+Tree index, mainly from a performance perspective.
[부연설명]
- .클러스터드 인덱스
- 데이터를 특정 기준에 의해 물리적으로 정렬하는 것입니다.
그리고 이것은 위의 사람들 줄서기처럼 단 한개만 존재할 수 있습니다.
전화번호부를 생각하셔도 됩니다.
인명 전화번호부는 이름순으로 정렬되어 기록되어 있습니다.
인명전화부를 상호순으로 동시에 정렬 할 수 없기에 상호 전화부가 별도로 만들어진 것이기 때문입니다.
그리고 위 사람들 줄서기 처럼 범위를 주어서 검색 할 때 탁월한 기능을 발휘합니다.
한 테이블에 단 한개만 존재할 수 있습니다.
범위(Range)를 주어 검색 할 때 탁원한 기능을 발휘한다
.넌클러스터드 인덱스
실제 데이터를 가지고 있지는 않지만
어디가면 그 정보가 있다는 정보만을 갖고 있는 것이 넌클러스터드 인덱스입니다.
인덱스 페이지에 인덱스 키값과 어느 위치에 이 데이터가 실제로 존재하는지의
정보를 갖고 있게 됩니다.
한 테이블에 여러개 존재할 수 있습니다.
특정한 값으로 찾아갈때(Seek) 탁월한 성능을 발휘합니다.
Falcon does not support Read Uncommitted isolation level
Read Uncommitted, sometimes referred to as "dirty" reads, is not needed for most situations.
[부연설명]
- READ UNCOMMITTED
- 다른 트랜잭션에 의해 수정되었지만 아직 커밋되지 않은 행을 문이 읽을 수 있도록 지정합니다.
- READ UNCOMMITTED 수준에서 실행 중인 트랜잭션은 현재 트랜잭션에서 읽은 데이터를 다른 트랜잭션에서 수정하지 못하도록 하는 공유 잠금을 실행하지 않습니다. 또한 READ UNCOMMITTED 트랜잭션은 다른 트랜잭션에 의해 수정되었지만 아직 커밋되지 않은 행을 현재 트랜잭션이 읽지 못하도록 하는 배타적 잠금에 의해 차단되지도 않습니다. 이 옵션을 설정하면 커밋되지 않은 수정 내용을 읽을 수 있습니다. 이런 경우를 커밋되지 않은 읽기라고 합니다. 트랜잭션이 종료되기 전에 데이터의 값을 변경하고 데이터 집합에 행을 표시하거나 표시하지 않을 수 있습니다. 이 옵션은 트랜잭션에서 모든 SELECT 문의 모든 테이블에 NOLOCK을 설정하는 것과 같습니다. 또한 격리 수준 중에서 제한이 가장 적습니다.
Falcon does not support Statement based replication
Starting from MySQL 5.1, both statement based and row based replication are supported for binary logging. Falcon supports only row based replication. This is to guarantee consistent result between binary log and data files.
Online backup will be supported by Falcon GA, but server-enforced referential integrity will not be exposed until MySQL 6.1. Falcon GA is planned to be released in MySQL 6.0.
[부연설명]
row
based replication 를 지원함에 따라 자료의 일관성 유지 기능에 탁월한 성능을 보여준다는 얘기
Using Falcon
You can download MySQL binaries including Falcon from MySQL AB website. You can download Falcon source code, too.
http://dev.mysql.com/downloads/mysql/6.0.html
The procedure of installation is the same as previous versions (5.1/5.0/4.1 or less). You can install by rpm , extracting tar.gz, or building from source.
To check that Falcon is installed, run the "SHOW ENGINES" statement. If there is an item of "Engine: Falcon" and "Support: YES", Falcon can be used.
mysql> SHOW ENGINES \G
(snip)
*************************** 3. row ***************************
Engine: MyISAM
Support: YES
Comment: Default engine as of MySQL 3.23 with great performance
Transactions: NO
XA: NO
Savepoints: NO
*************************** 4. row ***************************
Engine: Falcon
Support: DEFAULT
Comment: Falcon storage engine
Transactions: YES
XA: NO
Savepoints: YES
*************************** 5. row ***************************
Engine: MRG_MYISAM
Support: YES
Comment: Collection of identical MyISAM tables
Transactions: NO
XA: NO
Savepoints: NO
*************************** 6. row ***************************
Engine: InnoDB
Support: YES
Comment: Supports transactions, row-level locking, and foreign keys
Transactions: YES
XA: YES
Savepoints: YES
(snip)
To create a Falcon table, just add "ENGINE=Falcon" in CREATE TABLE or ALTER TABLE statement. This is the same with creating other types of tables.
mysql> CREATE TABLE tbl1 (id INTEGER AUTO_INCREMENT PRIMARY KEY, value VARCHAR(30)) ENGINE=Falcon;
mysql> SHOW CREATE TABLE tbl1\G
*************************** 1. row ***************************
Table: tbl1
Create Table: CREATE TABLE 'tbl1' (
'id' int(11) NOT NULL AUTO_INCREMENT,
'value' varchar(30) DEFAULT NULL,
PRIMARY KEY ('id')
) ENGINE=Falcon DEFAULT CHARSET=latin1
When the ENGINE clause is omitted, it automatically becomes a Falcon table if "default-storage-engine=Falcon" is specified in the database configuration file my.cnf.
Transactions are also supported:
mysql> START TRANSACTION;
mysql> INSERT INTO tbl1 VALUES(null,'abc');
mysql> ROLLBACK;
mysql> SELECT * FROM tbl1;
Empty set (0.00 sec)
The result set is empty because the INSERT is rollbacked.
As you see, because Falcon is just one of the storage engines in MySQL, the SQL syntax is not different from other storage engines. You are likely much more interested in Falcon's internal architecture than in SQL syntax, so I will now delve into the following technical topics:
- Process/Thread and Memory model
- File structure
- Index and record retrieval
- Data type and space management
- Transaction
Process/Thread and Memory model
MySQL is a single-process (mysqld), multi-threaded RDBMS. Per each connection, mysqld allocates one dedicated thread. There are also some utility threads such as accepting connections (listener thread), and one accepting signals. Some storage engines have their own threads. Falcon has its own threads that are used for tasks such as running checkpoints and doing garbage collections. These threads run in the background. InnoDB also has similar threads.
To improve thread concurrency, Falcon does not use mutexes directly, but uses mutex-wrapper objects called "SyncObjects". SyncObjects enables read/write locking. So many threads can read shareable structures at the same time. This is a very important feature for multi-cpu core environment.
Falcon's memory structure is shown as Fig.1. Falcon has four major dedicated memory areas called Record Cache, Page Cache, Log Cache and System Cache.
Fig1. Falcon Memory architecture
The Record cache exists to cache individual records with almost no overhead. Further, the Record cache does not hold TEXT/BLOB (except TINYTEXT/TINYBLOB) column values. So, the Record cache enables the caching of many more records than a Page-type cache. Also, with the Record cache, selecting records is faster. The record cache holds more *useful* record data than a page cache, such as newer and older record versions needed for MVCC.
The Page cache is similar to the InnoDB buffer pool where information from the data files are cached. Falcon's page size is configurable (2KB,4KB,8KB,16KB,32KB is currently supported) though the InnoDB page size is fixed to 16KB. Page size tuning sometimes improves performance greatly.
The Log Buffer is used to keep uncommitted transactional records, which is similar to InnoDB's Log Buffer.
The System cache is used to cache metadata objects.
File structure
Falcon's storage management consists of two kinds of files: tablespace files and log files.
A tablespace file contains all record data, indexes, database metadata and other information. This is similar to a InnoDB data file. Falcon also supports "CREATE TABLESPACE" syntax. Please refer to Robin Schumacher's article for details.
A log file in Falcon is called the Serial log file, and it contains transactional (redo) information. All committed records are synchronously written to the Serial log file. This is quite similar to InnoDB. However, the size of Serial log files is not constant while the size is fixed in the InnoDB log file.
To guarantee durability, RDBMS needs to synchronously write to a transaction log file when transaction commits. This often causes performance problems. Falcon reduces the number of disk flushes by its "Group Commit" feature. When two or more transactions commit at the same time, Falcon sums up them and synchronously writes at one time. Falcon automatically recovers from a crash by using a Falcon tablespace file and a Serial log file.
Many RDBMS support "Checkpoint" technology, which writes both committed and uncommitted data into both data files and redo log files. Falcon writes uncommitted data only to its Serial log file. After committing transactions, committed records are transferred to the data page in the Page Cache by the gopher thread. If the transaction is rolled back, uncommitted records are not transferred, just truncated. So the cost of rollback is very cheap for Falcon.
Fig2. How Falcon Files are read/written
Index and record retrieval
The Falcon index structure is completely different from InnoDB. InnoDB uses a clustered index architecture. Clustered index contains the whole column values referenced by a primary key. When selecting records from secondary index, InnoDB has to lookup a primary key by a secondary index in the first step, then lookup the whole record by its clustered index (primary key) in the second step. By using a Clustered Index, primary key search is very fast. But secondary key search is slower than a general B-Tree index.
Falcon doesn't use clustered index. A Falcon index is an improved B-Tree index, and contains an index key value in leaf, pointing to an internal unique record number (Fig.3).
Fig3. Falcon Index Structure
A Record number is an internal unique identifier to the target record. Falcon looks up records from the Record Cache by record number (Fig.4). If the record is not cached, Falcon looks up data pages in tablespace files.
Fig4. How Falcon looks up records
Some DDL statements such as ALTER TABLE, CREATE INDEX are also faster. MySQL AB is now working on performance improvements for Falcon, so some benchmarks will be shown in the near future.
Index search by storage order
When selecting records by using an index, Falcon first lists up record numbers of all targeted records, storing them into internal bitmap, then scans the index by record number order. Since record number order is equivalent to storage order, it can reduce disk seek time. This is especially effective for range scan such as BETWEEN, >=, <= .
Index compression
Falcon indexes also have prefix and suffix compression to reduce I/O load. Suffix compression for numbers truncates trailing zeros. Suffix compression for strings truncates trailing spaces.
Prefix compression is an interesting feature of Falcon. The first index in each page of index entries has no prefix compression. The next index starts with one or two bytes indicating the number of leading bytes that are exactly the same as the leading bytes of the previous index (Fig5).
Multi-column index compression logic is more complex, but reduces index sizes similarly.
Fig5. Falcon's index compression
Index accelerator
Falcon implements an "Index accelerator" to speed up inserting/updating/deleting many records within transactions.
To reduce the cost of adding lots of index entries, Falcon implements in-memory index accelerators that buffer index changes in memory until a commit. Because all the index changes made by a transaction are applied in sorted order, Falcon avoids making scattered changes throughout the index. Adding a single large block to the serial log is also faster than adding numerous small blocks. Index accelerators improve the performance of bulk loads and mass updates or inserts on existing tables.
Data types and space management
Falcon supports all MySQL data types including GIS type.
Falcon handles all data types as variable length. For example, INTEGER is always 4 bytes for MyISAM. But for Falcon, it depends on the actual value. If value is 100, Falcon consumes only 1 byte for actual value (plus 1 byte to handle data type and 1 byte to handle length, these are common to handle to variable length columns) even though the type may be BIGINT.
TEXT/BLOB Optimization
Falcon can also handle TEXT/BLOB types efficiently. TEXT/BLOB values are stored in a different area (page) from other type values. This can avoid fragmentation. In general, BLOB values tend to be much larger than other types. Frequent update/delete of large columns often causes fragmentation of data files. Since Falcon manages TEXT/BLOB values at different pages from other values, such fragmentations don't happen.
Another strong point is that Falcon doesn't retrieve TEXT/BLOB values unless it is needed. Let me show a simple example at Fig.6. The diary table has two BIGINT, DATETIME, VARCHAR and TEXT columns. The diary_text column is much larger (over thousands bytes) than other columns. Suppose most of queries retrieve only user_id, date_added and title, which don't need diary_text. In such cases, Falcon can use memory much more efficiently because Record Cache doesn't have TEXT/BLOB column values. TEXT/BLOB column values are stored in separated data pages. So each record size cached in Record Cache is small, that's why Record Cache can cache many records regardless of BLOB/TEXT values.
Fig6. Falcon TEXT/BLOB optimization
InnoDB is different from Falcon as shown at Fig.7. All columns are cached into the InnoDB Buffer Pool even though diary_text column is not needed. So in this case, Falcon Record Cache can contain much more records in cache than InnoDB Buffer Pool, which can reduce frequent page-in/out activity significantly. Frequent page in/out often causes serious performance impact. This is one of the biggest advantages of Falcon.
Fig7. InnoDB BLOB/TEXT
Moreover, records are not physically reallocated unless really needed. Reallocating records result in a lot of disk I/Os and Falcon takes care of reducing such I/Os.
AUTO_INCREMENT
AUTO_INCREMENT(sequence) column behavior is different between Falcon and InnoDB. After rebooting mysqld, the next AUTO_INCREMENT value is the following:
- InnoDB: SELECT MAX(auto_increment_column) + 1
- Falcon: Maximum value +1 that has been inserted up to now (same as MyISAM)
This behavior can be demonstrated in the following example:
mysql> CREATE TABLE tbl1(id INT AUTO_INCREMENT PRIMARY KEY);
mysql> INSERT INTO tbl1(id) VALUES(null);
mysql> INSERT INTO tbl1(id) VALUES(null);
mysql> INSERT INTO tbl1(id) VALUES(null);
mysql> INSERT INTO tbl1(id) VALUES(100);
mysql> DELETE FROM tbl1 WHERE id=100;
mysql> SELECT id FROM tbl1;
| id |
| 1 |
| 2 |
| 3 |
Then reboot mysqld, running the following query:
mysql> INSERT INTO tbl1(id) VALUES(null);
The result is different between Falcon and InnoDB.
Falcon, MyISAM:
mysql> SELECT id FROM tbl1;
| id |
| 1 |
| 2 |
| 3 |
| 101 |
InnoDB:
mysql> SELECT id FROM tbl1;
| id |
| 1 |
| 2 |
| 3 |
| 4 |
For almost all cases, Falcon's behavior will be widely accepted.
Transaction Control (MVCC and exclusive control)
As said before, Falcon supports transactions. Not only COMMIT/ROLLBACK but also the SAVEPOINT statement is supported. When a SQL statement error occurred during transaction, only the executing statement since the last SAVEPOINT is rolled back. The transaction is still running. This is the same as InnoDB.
Let's dig down a little more about other transaction functionalities.
Falcon supports Multi Version Concurrency Control(MVCC)
MVCC enables the selection of committed rows without locking. So a select doesn't conflict with an update or other DML. This is extremely important functionality from a performance perspective.
Locking reads (SELECT FOR UPDATE) is also supported.
Transaction isolation level
Read Committed and Repeatable Read are supported. Serializable will be supported after GA. Read Uncommitted is not planned to be supported because it is not needed for most cases.
The ANSI/SQL specification specifies that Repeatable Read causes phantom read, but Falcon's Repeatable Read can avoid it. This is the same as InnoDB.
Falcon avoids Lock Escalation
When the number of records locked increases, some RDBMS escalates the lock level from row level to page or table level to save row management overhead. This is called "Lock Escalation". Lock Escalation is a problem because it decreases concurrent performance significantly. Falcon never causes lock escalation. InnoDB also never causes lock escalation, either.
Deadlock detection
Falcon automatically detects deadlocks. InnoDB also supports deadlock detection.
You might think that Falcon's transaction features are almost the same as InnoDB, but there are some differences.
Automatic detection of Lost Update (optional)
When multiple transactions update the same record at the same time, the last transaction updates all previous transactions' result. This is called "Lost Update". See the following example.
mysql> CREATE TABLE tbl2 (id INTEGER AUTO_INCREMENT PRIMARY KEY, value INTEGER) ENGINE=Falcon;
mysql> INSERT INTO tbl2 VALUES(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
| Transaction 1 | Transaction 2 | |
| 1) | START TRANSACTION; | START TRANSACTION; |
| 2) | SELECT value FROM tbl2 WHERE id=1; # returns 1 | |
| 3) | UPDATE tbl2 SET value=100 WHERE id=1; | |
| 4) | SELECT value FROM tbl2 WHERE id=1; # returns 1 | |
| 5) | UPDATE tbl2 SET value=10 WHERE id=1; (Innodb and Falcon running with "Innodb compatible" transactions make transaction 2 wait for transaction 1 to commit. Falcon's natural Repeatable Read Isolation mode will return an update conflict at 6). | |
| 6) | COMMIT; | |
| 7) | COMMIT; |
The final value is 10 if InnoDB is being used. This is because transaction 1 is implicitly updated by transaction 2. This is a typical situation of "Lost Update".
Falcon's natural Repeatable Read isolation does not allow Lost Update situation. Falcon's natural Repeatable Read returns error at 6) as below. The final value is 100. If multiple transactions update the same record at the same time, Falcon commits the first record, then does a rollback on all conflicting transactions at the time the first transaction is committed.
mysql> UPDATE tbl2 SET value=value+10 WHERE id=1; # 6)
ERROR 1020 (HY000): Record has changed since last read in table 'tbl2'
Falcon's natural Repeatable Read checks the row version to check if the record is changed by other transactions after starting the transaction. If the row version is updated, the latter transaction must be rolled back when the error is received. This guarantees former-win model. This is good, because Lost Update is not acceptable for many applications. SELECT FOR UPDATE is a well known workaround. SELECT FOR UPDATE blocks transaction 2 at 4) until transaction 1 is committed.
Falcon also supports "InnoDB Compatibility Mode" currently (beta release) in order to improve the throughput of concurrent transactions. For beta, Falcon will only provide repeatable read transaction in "InnoDB compatibility mode". For GA, there will be the ability to choose between the two.
If an application transaction is divided into multiple transactions (e.g Web application: one transaction to select employee info, another transaction to update it ), the Falcon former-win model doesn't work. In this case, adding a version control column is the best for many cases. This is also a well-known solution.
Avoiding next-key locking
InnoDB locks not only updated records, but also locks next to records. This is called "next-key locking". Next-key locking is needed to guarantee consistency of statement-based binary logging in InnoDB. Falcon locks only updated records. So lock extent is decreased. But please note that Falcon doesn't support statement based binary logging, only row based binary logging. In MySQL 5.1, if row-based binary logging is enabled and the READ COMMITTED isolation level is used, there is no InnoDB gap locking except in foreign key constraint checking.
Conclusion
In this article, I explained the Falcon architecture, mainly comparing it with InnoDB. The falcon team is now mainly working on bug fixing and performance improvements for the next releases of the Falcon. Performance benchmarking results will be posted in the near future.
Understanding the Falcon Transaction Storage Engine – Part 3
In Part 1 of this article series, we looked at how the Falcon transactional storage engine was architected and how it compared to some of the other MySQL storage engines. Part 2 covered how Falcon handles transaction management and concurrency. In this final article in the series, I’ll look at how Falcon manages tables and indexes, and also cover backup/recovery along with migration topics.
Table Support in Falcon
In terms of storage, both tables and indexes are stored within the user datafiles that are automatically created by the Falcon engine when the first Falcon table is created in a database.
Falcon supports standard heap tables along with all datatypes available within MySQL. Tables can house up to four billion rows in the alpha release (this limit will be removed in the GA version). To support the auto-increment feature of tables, Falcon implements a subset of the standard SQL sequence feature. A sequence is a mechanism for generating unique ascending values that are not transactional. When a sequence is incremented, it remains incremented even if the action that causes the increment fails or the transaction rolls back. Falcon generates new sequences for concurrent inserts without requiring them to wait for each other. For users of Oracle and DB2, please note: Falcon sequences are not separate objects that can be addressed and manipulated as sequences in those RDBMS’s.
Foreign keys are not implemented in the alpha release of Falcon, but are planned for the GA release.
Falcon Indexes
With respect to indexes, Falcon uses an advanced form of B-tree indexing that provides a number of the benefits of clustered indexes without the drawbacks that come from the use of such structures. Traditional database index implementations traverse indexes by bouncing between index pages and database pages, which can oftentimes lead to inefficient or costly disk access.
Clustered indexes (or index-organized tables) are structured so that the physical ordering of records corresponds to the index order, with the actual leaf pages being the data pages. While some applications benefit from this organization, the physical implementation of clustered indexes can lead to space management problems, such as page splitting and more. Furthermore, a table can be clustered on only one index, reducing the efficiency of secondary indexes.
Falcon B-tree indexes work differently in that the index is scanned first, with bits being set in a sparse bit vector to indicate selected records. Records and data pages are then processed in bit order, which is also physical order on disk.
Falcon’s indexing scheme results in a number of benefits. First, all
indexes behave like well-tuned clustered indexes. However, note that
the data is not stored in index order. So, for example, doing a SELECT without an ORDER BY
clause on a table with a Falcon index will not return data back in
ascending index order (something you would see with an actual clustered
index).
Second, index pages are locked, read, and released, with no intervening data accesses that can cause locking conflicts.
In addition to specialized B-tree indexes, Falcon also employs what are currently called “index accelerators”. To reduce the cost of adding lots of index entries, Falcon implements in-memory index regions that buffer index changes in memory until a commit. Because all the index changes made by a transaction are applied in sorted order, Falcon avoids making scattered changes throughout the index. Adding a single large block to the serial log is also faster than adding numerous small blocks. Index accelerators improve the performance of bulk loads and mass updates or inserts on existing tables.
When a Falcon transaction makes a change that affects an indexed field, it creates an entry in a private in-memory index section for that index. Each transaction that inserts or updates indexed fields has a private in-memory section for each index it changes. The segments are linked in memory to the primary index definition and to the transaction that created them. The in-memory index has the same structure as the permanent B-tree. When a transaction commits, all changes it made to each index are written to the log as a single entry. The log entry contains a sorted list of the new entries for that index, effectively the bottom level of the in-memory index. When one of the worker threads applies the changes to the index, having them in sorted order means that all changes to a bottom level page of the index are together. Therefore, splits are propagated to one upper level page at a time. When an index segment has been copied from the log to the permanent index, the in-memory copy of the segment is released.
When doing an index lookup, a transaction looks in the permanent index, and in its own in-memory index section for that index, and in the private index segments of any transaction that has committed but remains in memory. As a result, a transaction may find two entries for the same value of the same row, one from the permanent index and one from the in-memory segment of a committed transaction. Since the two resolve to the same record, and since Falcon uses a two-phase index lookup with an intermediate bit map, the additional cost is minimal.
For inserts and updates of unique and primary key indexes, and for referential constraint evaluation, Falcon must verify that there are no conflicts with committed or uncommitted data. Those checks are handled by referencing the permanent index and all in-memory index segments for that index.
One last thing regarding indexes: online add/drop index capability is planned for the first GA release of Falcon.
Falcon Backup and Recovery
Most MySQL users use the bundled mysqldump utility to backup their databases, although some rely on open source alternatives or third party backup tools. Using mysqldump is typically very easy and efficient for most databases; however it should be understood that the utility can be impractical for extremely large databases or MySQL installations that utilize multiple storage engines and require no business interruption.
There are two basic ways to backup a Falcon database with mysqldump:
- Run
mysqldumpusing the command line. - Use the graphical MySQL Administrator tool that uses mysqldump behind the scenes.
To use mysqldump from the command line to backup a Falcon database called gimf, you could enter something similar to this:
mysqldump -uroot --port=3309 --extended-insert --quick gimf > gimf.dmp
The above command specifies the ‘root’ user along with the port the MySQL instance is running on. It also specifies that the extended insert option be used, which uses MySQL’s multi-insert feature to more quickly restore a database if needed. It also uses the –quick option, which assists with large tables, and also specifies the database being backed up (gimf). Finally, it pipes the output to the actual backup file.
The same operation can be performed with the MySQL Administrator tool (which can be downloaded in the MySQL management tools bundle found at http://dev.mysql.com/downloads/gui-tools/5.0.html). Using MySQL Administrator – which now supports Falcon – makes knowing the various mysqldump options unnecessary as everything is handled in point and click fashion. Simply log into the database server with MySQL Administrator, select the Backup option in the Explorer pane, choose the option to create a new backup project, select the database(s) containing the Falcon tables you want to backup, and press the Execute button.
Falcon will support the --single-transaction option of
mysqldump in the GA release so online backups can be performed. And
when the new online backup capability of MySQL is ready to ship, Falcon
will fully support that.
Restoring a Falcon database is no different than restoring any other storage engine. A mysqldump file can be piped in via the mysql command line tool, or you can use the MySQL Administrator to do point-and-click restores of Falcon databases.
Migrating to Falcon
One of MySQL’s strengths is being able to easily migrate databases
between the different storage engines that make up the pluggable
storage engine architecture of the server. This means that you can
start with one particular engine and then switch to another (and
another) as your needs change. Current MySQL tables can easily be
migrated to Falcon via the ALTER TABLE … ENGINE=FALCON command. Of course, you need to keep in mind a few migration considerations such as:
- True clustered indexes on InnoDB tables would not be migrated and function on Falcon as they do on InnoDB.
- Applications requiring the use of the Read Uncommitted lock isolation level cannot use Falcon, as it does not support Read Uncommitted.
- Minor locking differences in InnoDB may cause some applications to behave differently with Falcon. However, most InnoDB applications will experience no problems.
- GIS indexes on MyISAM tables cannot be migrated to Falcon (Falcon supports GIS datatypes but not GIS indexes).
- Full-text indexes on MyISAM tables cannot be migrated to Falcon.
- Custom data directories for datafiles used for the MyISAM and Archive engines cannot be maintained for objects migrated to Falcon.
- Row-level replication (instead of statement-based replication) is required when replicating Falcon objects.
Migrating most any non-MySQL database to a MySQL server is easily handled through the use of the MySQL Migration toolkit. Migrating a database that is on Oracle, SQL Server, Sybase, Microsoft Access, or pretty much any RDBMS that can be accessed via a JDBC connection can be graphically performed in the Migration toolkit.
Migrating a non-MySQL database to Falcon using the Migration Toolkit can be done by following these steps:
- Download the Migration Toolkit, which is contained in the MySQL management tools bundle found at http://dev.mysql.com/downloads/gui-tools/5.0.html. Installation should take only a minute or two.
- Connect to the Source database.
- Connect to your target MySQL server that contains the Falcon engine.
- Select the schemas/tables from the source database you want to migrate to Falcon.
- When you reach the part of the migration wizard where you are asked to specify the migration of the MySQL tables (the Object Mapping screen), you will need to click on the “Set Parameter” button in the “Migration of Type … Table” section. Once you do this, click on the “User defined” radio button and then change the default ENGINE= parameter value to “falcon”.
- Continue with the rest of the migration wizard until you have successfully migrated your database over to Falcon.
Note that the Migration Toolkit can also be used to migrate MySQL databases located on one physical server to another MySQL database located on a different (or same) server.
Conclusion
We’ve reached the end in this quick introductory series of articles on the Falcon transactional storage engine. I hope that you will download and try out Falcon, and of course, let me know what you think (both good and bad). You can download a binary for Linux and Windows at http://dev.mysql.com/downloads/mysql/6.0.html and also compile from source if you’d like. And don’t forget to visit the Falcon forum to post questions and such.
Thanks, as always, for supporting MySQL!
http://dev.mysql.com/tech-resources/articles/falcon-transactional-engine-part2.html
Understanding the Falcon Transaction Storage Engine – Part 2
In Part 1 of this article series, we looked at how the Falcon transactional storage engine was architected and how it compared to some of the other MySQL storage engines. In this article, I’ll focus on how Falcon performs transaction management, including some special points on where Falcon differs from some of MySQL’s other transactional engines.
Overview of Falcon Transaction Management
The first thing to understand about transactions and Falcon is that the engine takes a multi-generational approach to managing both transactions and concurrency. This means that the engine keeps multiple iterations/generations of rows available in memory to ensure the highest possible levels of uninterrupted data access.
Falcon supports ACID-level transactional operations, which are handled in memory via the record cache. Once in-process transactions are committed, the operations are flushed to the Falcon Log for asynchronous application to the database files. The only exceptions to this rule are new BLOBs, which are immediately applied to the database files.
Falcon differs from proprietary databases such as Oracle as well as other open source RDBMS’s like Firebird in that all transactional operations are kept in memory inside the record cache. Falcon maintains a 4-byte transaction ID in the record header of the rows involved in the transaction. When a transaction commits, its transaction id is stored in the log at the front of the block containing the committed version of every record it changed, and the block containing its index changes. To provide each transaction a stable snapshot of the data that existed the moment the transaction was submitted, the Falcon record cache may contain multiple versions of data.
As was mentioned, each row in the record cache contains a transaction ID of the transaction that created it. Rows with older versions of data include a pointer to the older version of the data, and deleted rows are represented in memory by a row header with no data and contain a flag that is set to indicate a deleted row. The log entry for a deleted record is a record number with a flag indicating that the record was deleted by a committed transaction.
Falcon offers auto/non-auto commit (which is accomplished above the storage engine layer in MySQL), and provides savepoints and group commit for intelligent transaction control. Distributed transactions/two-phase commit are not in the alpha version of Falcon, but will be supported before the GA release.
Concurrency Control in Falcon
Falcon is a multi-generational transaction engine that uses MVCC (multi-version concurrency control) as its primary concurrency control mechanism. This means that readers never block writers and vice versa, with the end result being access to data whenever it is needed.
The Falcon engine defaults to the repeatable read isolation mode level with no phantoms being possible. When a transaction attempts to modify or delete a row that another in-process transaction has already obtained, the previous transaction will wait until the other transaction either commits or rolls back. If the first transaction commits, then the other transaction receives an error and must try again. However, if the first transaction rolls back, then the other transaction will succeed. It’s important to note that all transactions will always read and operate on consistent data, which mirrors how many proprietary databases like Oracle operate.
For example, suppose one MySQL session performs the following actions:
mysql> create table t (c1 int) engine=falcon;
Query OK, 0 rows affected (0.05 sec)
mysql> insert into t values (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> update t set c1=4 where c1=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Note that the session above has not committed its update. Now suppose another session logs on and attempts to update the same row as session one:
mysql> update t set c1=5 where c1=1;
The second session will wait until the first session either commits or rolls back its change. Suppose the first session commits its change; the following will occur in the second session:
mysql> update t set c1=5 where c1=1;
ERROR 1020 (HY000): Record has changed since last read in table 't'
As shown, an error is returned to the second session because the first session altered the same data that was the second session’s target. Had the first session rolled back its change, Falcon would have allowed the second session’s update to go through. Note that this behavior does differ from the InnoDB storage engine in that InnoDB would not have returned an error back to session two – the update would have gone through (if a timeout had not occurred) with zero rows having been affected.
InnoDB implements MVCC with a combination of multiple versions of
record and write locks on key ranges and records. In Repeatable Read
transaction isolation mode, InnoDB has update anomalies that require
the use of the non-standard SELECT … FOR UPDATE syntax. The result of a simple SELECT and a SELECT … FOR UPDATE may differ within the same repeatable read transaction.
Falcon uses versions of records and transaction identifiers to provide repeatable reads and to recognize and prevent inconsistent updates by concurrent transactions. Repeatable read is handled by providing each transaction with the version of each record that was committed when the transaction started.
Falcon manages concurrent updates by recognizing that the most current version of a record is not visible to the transaction that wants to modify or delete the record. In that situation, Falcon causes the second transaction to wait for the first to finish. If the first transaction commits, the second transaction gets an error indicating that its update or delete will violate transaction isolation. If the first transaction rolls back, the second transaction’s update or delete succeeds.
What this means is that under some cases where InnoDB allows an update or delete to wait and succeed, Falcon will cause it to wait and fail. On the other hand, Falcon’s read is actually repeatable with no special clauses needed to allow it to return different values for a record it had previously read. Nor is it possible in Falcon, as it is in InnoDB, to overwrite changes made by a concurrent transaction – changes which are not visible to the overwriting transaction.
Besides repeatable read, the most common lock isolation level is read committed (which Falcon also supports in alpha; the only planned isolation level for GA that is not in the Alpha version is serializable).
With respect to locking conflicts, the MySQL Falcon engine uses a sophisticated graphing model for deadlock detection that uses a lock table to resolve lock dependencies and deadlock issues. This form of deadlock resolution is superior to the standard timeout and similar models that other databases systems use.
Rollback Performance in Falcon
Because Falcon is a memory-based MVCC database (yes, it can page transactions to disk if necessary), the engine offers very fast rollback capabilities should the need to undo data changes arise. Being an Oracle DBA for years, I can’t tell you the number of times I suffered through agonizingly slow rollback times for major data changes – rollbacks that sometimes took double the time of the actual transaction itself.
Such is not the case with Falcon. To prove this point, let’s work through a quick exercise. Let’s pump a little over one million rows into an InnoDB table, perform a rollback, and then do the same thing with Falcon:
mysql> show create table big_table\G
*************************** 1. row ***************************
Table: big_table
Create Table: CREATE TABLE `big_table` (
`client_transaction_id` int(11) NOT NULL DEFAULT '0',
`client_id` int(11) NOT NULL DEFAULT '0',
`investment_id` int(11) NOT NULL DEFAULT '0',
`action` varchar(10) NOT NULL DEFAULT '',
`price` decimal(12,2) NOT NULL DEFAULT '0.00',
`number_of_units` int(11) NOT NULL DEFAULT '0',
`transaction_status` varchar(10) NOT NULL DEFAULT '',
`transaction_sub_timestamp` datetime NOT NULL,
`transaction_comp_timestamp` datetime NOT NULL,
`description` varchar(200) DEFAULT NULL,
`broker_id` bigint(10) DEFAULT NULL,
`broker_commission` decimal(10,2) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> select count(*) from big_table;
+----------+
| count(*) |
+----------+
| 1120500 |
+----------+
1 row in set (1.36 sec)
mysql> create table test_rollback like big_table;
Query OK, 0 rows affected (0.02 sec)
mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into test_rollback select * from big_table;
Query OK, 1120500 rows affected (23.70 sec)
Records: 1120500 Duplicates: 0 Warnings: 0
mysql> rollback;
Query OK, 0 rows affected (22.16 sec)
mysql> alter table test_rollback engine=falcon;
Query OK, 0 rows affected (0.55 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> insert into test_rollback select * from big_table;
Query OK, 1120500 rows affected (17.00 sec)
Records: 1120500 Duplicates: 0 Warnings: 0
mysql> rollback;
Query OK, 0 rows affected (1.66 sec)
mysql> select count(*) from test_rollback;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
mysql> show global variables like '%fal%';
+--------------------------+-----------+
| Variable_name | Value |
+--------------------------+-----------+
| falcon_log_dir | |
| falcon_max_record_memory | 209715200 |
| falcon_min_record_memory | 104857600 |
| falcon_page_cache_size | 104857600 |
| falcon_log_mask | 0 |
| falcon_debug_server | OFF |
| falcon_page_size | 4096 |
| have_falcon | YES |
+--------------------------+-----------+
8 rows in set (0.00 sec)
Falcon does a little better than InnoDB on the 1.1 million row insert, but the rollback operation is where the big difference is seen – a 92% reduction in rollback time. Of course, well put together applications should do all they can to minimize rollbacks, but since they do happen, it’s nice to know that Falcon is optimized to get them over with as quickly as possible.
Conclusion
Those needing ACID transaction support, smart concurrency control, and crash recovery will find that Falcon should fit the bill for the type of online applications being developed today. The next article in this series will look at how Falcon manages tables and indexes as well as a few other things, so be sure to look for that soon. In the meantime, please download the latest Falcon alpha release and let us know what you think. You can download a binary for Linux and Windows at http://dev.mysql.com/downloads/mysql/6.0.html and also compile from source if you’d like. And don’t forget to visit the Falcon forum to post questions and such.
Thanks, as always, for supporting MySQL!
Understanding the Falcon Transaction Storage Engine - Part 1
If you've been using MySQL for any time at all, you know that a unique advantage afforded MySQL users is the ability to use multiple storage engines to manage the underlying data stored on the server. This efficient and modular architecture provides large performance and manageability benefits for those wishing to specifically target a particular application need - such as data warehousing, transaction processing, high availability situations, etc. - while enjoying the advantage of utilizing a set of interfaces and services that are independent of any one storage engine. The MySQL server supplies engines that are internally developed, provided by third-party software vendors, and also allows a user to build their own storage engine if they desire.
Some of MySQL's engines support transaction processing and some do not. The current production engines in MySQL 5.0 that support transaction management are NDB/Cluster, InnoDB (supplied by Oracle), and the solidDB engine (supplied by Solid Information Technology). MySQL has also just introduced a new transactional storage engine - code named Falcon - that is currently available in an alpha release. Falcon is a next-generation transactional management engine designed to run on all popular hardware/architecture platforms, but it is especially designed to take advantage of the large memory caches available in 64-bit environments. Falcon is a natural fit for modern businesses and always online applications that need lightening-fast transaction performance and rock-solid data reliability.
In this first in a series of short articles on the Falcon engine, let's take a look at how Falcon has been architected and see how it differs from some of the other popular MySQL storage engines.
Falcon Architecture
The MySQL Falcon architecture is both an advanced and simplified design that makes for a high performing transactional database that requires little maintenance or troubleshooting on the part of database administration staff. The architecture of the Falcon engine is depicted in the figure below:
The architecture consists of seven basic components:
- User tablespaces - contains the actual user data and indexes stored in a Falcon database.
- Falcon serial log - contains recently committed data changes, index changes, and transactional information. The log also handles crash recovery activities for a database.
- Page cache - memory region that holds database pages being read or written.
- System metadata - Falcon's system tablespace that holds internal information.
- Record cache - memory region that holds copies of active and uncommitted records.
- System cache - memory region that contains transaction context information, index accelerators, and system metadata.
- Worker threads - moves data from the Falcon Log into the database page cache, from the page cache to disk, and performs BLOB management.
Each of these components is covered in more detail in the following sections.
Falcon Storage
Falcon offers a number of storage features that include self-balancing indexes, auto-growth of both data and log files, automatic reclamation of space released by deleted records both on data pages and in indexes, and page reorganization to consolidate free space.
Storage for User Data
In the alpha version of Falcon, the engine creates a system tablespace to hold internal engine information and a default user tablespace for user-defined objects. Currently, Falcon user tablespaces contain all Falcon tables, BLOBs, and indexes defined for databases that hold Falcon objects. DBAs and developers can create user tablespaces in any directory or filesystem they would like and place tables into them via DDL.
The underlying datafiles of Falcon tablespaces offer automatic storage extension when needed and automatic space reclamation, which makes reorganizing tables and indexes mostly unnecessary.
Page sizes in Falcon are assigned at MySQL instance initialization
time and cannot be altered. Page sizes can range anywhere from 2K up to
32K, with 4K being the default. The parameter that controls this is falcon_page_size.
Rows are stored densely on pages, with updates that increase row length being automatically handled by Falcon so DBAs need not bother with creating periodic defragmentation jobs, which can block access to data when the job is running.
The maximum storage size for a single tablespace current stands at over 100TB (around 116TB), which should be sufficient for the vast majority of application needs.
The Falcon Log
Besides primary data storage, Falcon uses another storage structure called the Falcon Serial Log to manage write-ahead logging and crash recovery. Two physical files actually make up the Falcon Log. Log entries are variable length blocks, which are read and written in one megabyte extensions. Each block has a header that includes a unique 64-bit identifier, the length of the block, the creation time of the database to which it belongs, and the block number of the oldest unapplied block in the log file. A block in a log is "applied" if all committed data in the block has been transferred to the database. Blocks are written to the first Falcon Serial Log file until it fills its first extension, after which a second file is created. New log entries are then applied to the second file, which will automatically extend, one megabyte at a time, to accommodate transaction demand.
As transactions commit, the blocks that hold their data are applied from the files to the database. Once all the blocks in the first file have been applied, the next extension of the log reopens the first file and starts overwriting it. The second Falcon Log file is closed for entries and its data is applied to the database until it is completely applied. Each time a log file is extended, Falcon checks to see whether the other file is completely applied. If so, it switches files. This process continues in this round-robin fashion.
One distinct aspect of the Falcon Log is that data created by uncommitted transactions never appears in the log, so the log does not serve as an "undo" or rollback mechanism for data changes. Simply put, only data that is intended to be found in the database ever makes it to the Falcon Log. What this practically equates to are near-instantaneous rollbacks and very fast crash recovery times.
Crash recovery in the Falcon engine is handled by the Falcon Log, with data from committed transactions that have not been applied to the user datafile (prior to a system crash) being written to the database upon restart of the system. The serial log also contains internal structural information about page allocations and releases, index page splits, record number and blob allocations and releases, etc. used in recovery.
The Falcon Log's physical location is defaulted to the data directory of the database, but can be changed by the DBA to be somewhere else on the server, which helps reduce I/O contention at the disk level. Note that custom-defined Falcon log locations are not available in the alpha release, but will be supported in the GA version.
Falcon Memory Caches
Falcon was designed to perform best on systems with generous amounts of memory. Although the memory caches utilized by Falcon are similar in some respects with other RDBMS's and MySQL engines, the cache structures offer a number of improvements over traditional memory caching strategies. The mechanisms used by Falcon with respect to memory caching include:
- Log Cache - log information is kept in memory and
flushed to the Falcon Log when transactions commit. Falcon keeps eight
windows into the log file for reading and writing, and each window is
1MB. The
falcon_log_file_memory_useparameter in the my.cnf file controls the amount of memory devoted to the Falcon log cache. This parameter is not available for alpha but will available in the GA release. - System and Index Cache - data needed by Falcon (table and field definitions, transaction state, etc.) are also maintained in memory for quick reference. In addition, local index accelerators represent index segments created by a running transaction are also stored in the system memory. When a transaction changes indexed fields, it builds an index accelerator section in system memory, representing its changes. On commit, all index changes for the transaction are written to the Serial Log in sorted order and later merged with the permanent index by the worker thread.
- Page Cache - database pages read from disk for a particular database. The page cache size is controlled by the
falcon_page_cache_sizeparameter, the default of which is 4MB, and is set in the my.cnf file. Although record and index changes go to the serial log before being written to database pages, blob data is written directly into the page cache. This avoids logging large data items that are rarely referenced or changed by the transaction that creates them. - Record Cache
- the record cache is a memory region devoted to holding rows that have
been requested by end-user queries for a particular database or created
by active transactions. Note that this cache differs from traditional
data caches in that only specific rows needed by applications reside in
the cache as opposed to entire data pages (which may contain only
subsets of needed information). The record cache can hold several
versions of records that have been modified or deleted. This technique
guarantees that active data needed to satisfy user requests is in
memory, shortens row access time, and reduces cache bloat by not
including un-requested information. The record cache also assists in
supporting the multi-version concurrency control (MVCC) mechanisms of
the Falcon engine. The record cache is controlled by two parameters.
The
falcon_min_record_memoryparameter (default 10MB) determines the minimum amount of RAM supplied to the record cache, and thefalcon_max_record_memory(default 20MB) limits the total amount of memory available to the cache.
Because of the support the record cache supplies to transactions, a
scavenge thread is used to ensure only "hot" data resides in the cache.
When the falcon_max_record_memory limit is reached,
Falcon surveys the demographics of the generational data in the cache,
and removes the oldest generations. This process is more complicated
than the standard LRU algorithm used by many database systems, but it
is faster and more efficient.
Falcon Worker Threads
Falcon has four worker threads. The first is called the "gopher" thread. Its sole function is to move committed data changes from the log to data pages and to merge logged index changes with the permanent version of the index stored in the database. The second thread handles the periodic flushing of the page cache and scavenges the record cache.
The third worker thread is called the page writer thread and it is used to write out blobs before page commit to the user datafiles. The fourth thread is the scheduler thread and, as its name implies, it schedules a variety of things like to occur like record scavenging, page flushing, and checkpoint operations.
Of course there are other server threads that exist above the storage engine layer that communicate with Falcon, but these are independent of the actual Falcon architecture.
Conclusion
Falcon's architecture is one of advanced design, but also one that's easy to understand and an implementation that pretty much takes care of itself so you don't have to. The next article in this series will focus on how Falcon manages tables and indexes, so be sure to look for that soon. In the meantime, please download the latest Falcon alpha release and let us know what you think. You can download a binary for Linux and Windows at http://dev.mysql.com/downloads/mysql/6.0.html and also compile from source if you'd like.
Thanks, as always, for supporting MySQL!
- MySQL Korea 런칭 세미나
- 보도자료: Sun & Zmanda이 백업&복구 솔루션으로 MySQL Enterprise를 제공하다
- 화이트 페이퍼 : Enterprise 2.0에 대한 MySQL 가이드
- 사례연구 : Supply Dynamics가 Business-Critical SaaS 솔루션을 제공하다
- MySQL 채용공고
신제품 출시 :
- MySQL Enterprise Monitor (2008 봄)
- MySQL Community Server 6.0 (Alpha)
- MySQL Workbench 5.0.16 (출시 예정)
- MySQL Connector/J 5.1.6 (GA)
- MySQL Connnector/ODBC 5.1.3 (출시 예정)
Hints & Tips :
- 기사 : Information Schema plug-ins에 대한 MySQL 내부 보고
- 화이트 페이퍼 : ZRM - MySQL 백업 & 복구 가이드
이벤트 :
- Live Webinar: XQuery와 XML을 사용하여 MySQL데이타를 검색하고, 집계하고, 변환하기
- Sun-MySQL World Tour 따라잡기
------------------------------
하이라이트
==============================
MySQL Korea Launching Seminar
>> 일시 : 2008년 4월 29일(화) 13:30 ~ 18:00
>> 장소 : 그랜드 인터컨티넨탈 호텔 2층 오키드룸
>> 행사등록 : http://theseminar.co.kr/sun
==============================
2008년 새봄을 맞이하며 한국 썬 마이크로시스템즈에서는 고객 여러분을 모시고,
4월 29일, 삼성동 그랜드 인터컨티넨탈 호텔에서 MySQL Korea Launching Seminar를
개최합니다.
이번 MySQL Korea Launching Seminar는 썬과 MySQL이 함께 여는 오픈데이타베이스
의 진보를 함께 하실 수 있는 자리로써 여러분께 세계 최상급의 데이타베이스 서비스로
진정한 혁신을 선사하게 될 것입니다.
이번 행사에서 MySQL을 통한 비즈니스 비용 절감 및 상호운용성 증대의 효과를 직접
경험해 보시기 바랍니다.
♣ 행사 일정
Time Contents (Speaker)
13:30 - 14:00 Registration
14:00 - 14:10 Greeting ( Wonsik Yoo (GEM VP for Korea) )
14:10 - 14:30 Sun / MySQL ( Jerry Ashford(Software Practice Director for APAC) )
14:30 - 15:00 Maximizing Scalability for the Web Economy (Larry Stefonic (MySQL VP for APAC))
· Introduce MySQL
· Dual License Policy / Community Edition & Enterprise Edition
· Introduce Products Briefly (MySQL Enterprise, Cluster, Embedded)
· References
15:00 - 15:30 MySQL Performance Tuning and Benchmarking (Colin Charles)
15:30 - 16:00 MySQL Roadmap Update (Daniel Saito)
16:00 - 16:10 Coffee Break
16:15 - 16:35 Just Try AMD! (김보규 차장 (AMD Korea))
16:35 - 17:05 오픈소스 비즈니스 모델 및 적용사례 ( 양승도 차장 (한국 레드햇) )
17:05 - 18:00 Cocktail Reception
♣ 행사 등록 및 문의
·좌석이 한정되어 있으니, 반드시 웹사이트를 통해 사전에 등록하여 주시기 바랍니다.
·등록사이트 : www.theseminar.co.kr/sun
·등록기간 : 2008년 4월 7일(월) ~ 4월 28일(월)
·행사문의 : 썬 행사 사무국 Tel : 02-3446-3880, E-mail : sun@citocomm.com
♣ 행사 안내
·본 행사는 무료입니다.
·세미나 종료 후에는 칵테일 리셉션이 진행됩니다.
·행사 후 설문지를 작성하여 제출하시는 모든 분께 티셔츠를 선물로 드립니다.
·무료 주차권이 제공됩니다.
♣ 2008년 공개S/W의 교육적 활용에 관한 국제 컨퍼런스 안내
http://kr.sun.com/korea/event
---------------------------
보도자료 : Sun & Zmanda이 백업&복구 솔루션으로 MySQL Enterprise를 제공하다
오늘 Sun Microsystems 은 백업 및 복구 소프트웨어 부문의 오픈소스 리더인 Zmanda와 함께 MySQL Enterprise 서브스크립션 사용자들에게 광범위하고 폭 넓은 데이타 백업 및 복구 솔루션을 제공하기로 협력할 것을 발표했습니다.
보도자료 읽기 :
http://www.mysql.com/news-and
화이트 페이퍼 : Enterprise 2.0에 대한 MySQL 가이드
Enterprise 2.0은 조직이 그들의 최종고객들에게 온라인 데이터 지향의 응용프로그램을 제공하고자, 웹 2.0 테크놀로지와 응용프로그램, 아키텍처, 그리고 사업 모델에 영향을 주는 유기적인 구조들을 묘사하기 위해 최근에 사용되기 시작한 용어입니다. 이 장에서 우리는 웹 2.0을 만드는 기술적이고 비즈니스적인 요소들과 그것이 현대 기업에 미치는 영향에 대해 알아볼 것입니다.
화이트 페이퍼 다운로드 :
http://www.mysql.com/why-mysql
사례연구: Supply Dynamics가 Business-Critical SaaS 솔루션을 제공하다
Supply Dynamics社는 "Material Demand Aggregation"으로 알려진, 자료 통합 솔루션에 주력하고 있는 업계 선두의 공급 체인 솔루션 제공업체 입니다. 그들이 제공하는 multi-enterprise platform은 General Electric, Honeywell 와 같은 포츈지 선정 100대 그룹을 비롯하여 여러 기업들이 원자재 비용을 15% 줄이고, 그들의 전체 공급 체인을 연결시켜줌으로써 위험요인을 더 낮추고 상품이 더 빠르게 배송 될 수 있도록 도와줍니다.
사례연구 다운로드 :
http://www.mysql.com/why-mysql
MySQL 채용 공고
MySQL은 전세계에서 가장 인기 있는 오픈소스 데이터베이스 소프트웨어입니다.
MySQL은 Sun Microsystems의 소프트웨어 그룹의 일부로서 25개 나라에서 400명 이상의 직원들이 함께하는 글로벌 조직입니다. 우리는 우리 제품의 강력한 추진력을 제공하며, 우리의 성공에 기여할 엔지니어링, 세일즈, 마케팅, 프로페셔널 서비스, 그리고 서포트 분야에서 뛰어난 인재를 찾고 있습니다! 다음과 같은 포지션에 지원하실 수 있습니다:
- Database Systems Engineer (Australia)
데이터베이스 시스템 엔지니어의 주된 업무은 영업팀을 지원하는 MySQL 세일즈 엔지니어로서 MySQL 기술 스태프와 고객들 사이에서 접촉점이 되는 것입니다. 세일즈 엔지니어는 판매 전/후 기술적인 지원을 고객들과 세일즈 직원들에게 Enterprise 와 Web 계정을 통해 제공합니다. 이상적인 지원자는 지역 세일즈 팀의 일원, 전세계적인 세일즈 엔지니어의 일원 그리고 MySQL 제품의 훌륭한 기술적인 전파자의 일원이 될 수 있을 것입니다.
지원자는 탄탄한 기술력과 기술적 개념들을 다른 사람들에게 설명할 수 있는 능력을 겸비해야 하는 동시에 문제해결과 새로운 기술을 익히고자 하는 열정이 있어야 합니다.
- Senior Software Engineer, Online Backup (Worldwide)
- QA Engineers (Worldwide)
- 그 외 기타
지금 지원하십시오:
http://www.mysql.com/jobs/
------------------------------
신제품 출시
MySQL Enterprise Monitor (2008 봄)
2008년 봄에 출시되는 Enterprise Monitor는 이제 전체적으로 사용이 가능합니다(GA 버전). 이번 업데이트는 서브스크립션 사용자들이 가장 안전하고 최신버전의 MySQL Enterprise Server로 표준화 할 수 있도록 도와주기 위해 고안된 새로운 기능들을 담고 있습니다. 다음은 새로 추가된 기능입니다:
- 새롭게 업그레이드된 Advisor - 업그레이드 된 advisor는 임의의 환경에서 동작중인 MySQL Enterprise Server 버전에 따른 잠재적으로 영향을 줄 수 있는 특정 버그를 감독하고

이올린에 북마크하기