Hadoop in Taiwan 2012: Speaker Submission Status


PS. Interestingly, everyone chose English titles this year. How international!

Developer
Speaker: 楊詠成 (Gibson Yang) / Yahoo! Taiwan
Title: Oozie introduction
Abstract: Oozie introduction and experience sharing.
Speaker: Chia-Hung Lin
Title: Bulk Synchronous Parallel
Abstract: Hadoop MapReduce [1] is a popular open source framework inspired by functional programming's map and reduce functions. It saves developers a lot of work by handling many complicated underlying tasks. However, not every task fits the MapReduce model; graph-related computation (e.g. social network analysis) is one such example. Google therefore developed an in-house product, Pregel [2], for large-scale graph processing. Pregel is based on Bulk Synchronous Parallel [3], a bridging model well suited to iterative algorithms.

1. What is Bulk Synchronous Parallel?
2. Apache Hama
3. Comparison between Hadoop MapReduce and Apache Hama

[1] http://hadoop.apache.org/mapreduce/
[2] http://dl.acm.org/citation.cfm?id=1582723
[3] http://dl.acm.org/citation.cfm?id=79173.79181
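As a rough illustration of the Bulk Synchronous Parallel idea mentioned in this abstract, the sketch below simulates supersteps in a single Python process: each vertex computes on the messages it received in the previous superstep, queues outgoing messages, and a barrier separates one superstep from the next. It propagates the maximum value through a small graph. The function name `bsp_max_value` and its inputs are hypothetical; this is not the Apache Hama or Pregel API, only a minimal sketch of the model, assuming every vertex appears in both dictionaries.

```python
from collections import defaultdict


def bsp_max_value(graph, values):
    """Propagate the maximum value through a graph using BSP-style supersteps.

    graph:  dict mapping vertex -> list of neighbour vertices (illustrative input)
    values: dict mapping vertex -> initial numeric value
    Returns the final value held by each vertex.
    """
    values = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(values[vertex])

    # Keep running supersteps until no messages are in flight.
    while inbox:
        outbox = defaultdict(list)
        # Local computation: a vertex sees only messages from the previous superstep.
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                # Communication: messages are queued for the next superstep.
                for neighbour in graph[vertex]:
                    outbox[neighbour].append(best)
        # Barrier: buffers are swapped only after every vertex has finished.
        inbox = outbox

    return values
```

For example, `bsp_max_value({"a": ["b"], "b": ["a", "c"], "c": ["b"]}, {"a": 1, "b": 5, "c": 2})` leaves every vertex holding 5. In a real BSP system the per-vertex work inside one superstep would run on different machines; here it is a plain loop.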
Speaker: Laurence Liew / Revolution Analytics, Asia Pacific, General Manager
Title: Big Data Analytics - Trends and Best Practices
Abstract: Using case studies ranging from consumer behavior analytics to text mining and sentiment analysis, this session introduces big data analytics and the field of data science. It gives an overview of data science and the data scientist's toolkit, followed by a discussion of using R with Hadoop.
Application
Speaker / Organization | Talk Title | Abstract
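Speaker: 辜文元 / Feng Chia University GIS Center
Title: Hadoop in Geographic Information Systems: Sharing Application Cases
Abstract: In recent years, rapid advances in remote sensing technology have sharply increased the resolution of individual images, so each file needs more storage space. Video capture is also increasingly used for environmental observation and recording, and data routinely grows by gigabytes or terabytes, so the need to store and manage remote sensing data keeps rising. Data at this scale often leaves traditional servers short of storage space. Disks can be added to a traditional server to increase capacity, but vertical scaling has its limits, so meeting the growing demand for image storage is an important issue.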

Speaker: Chun-Han Chen / OgilvyOne
Title: Mohohan: An on-line video transcoding service via Hadoop
Abstract: Hadoop, a well-known cloud computing file system and development framework, is mainly designed for managing massive textual data: counting, sorting, indexing, pattern finding, and so on. Multimedia-oriented services built on Hadoop, however, are rare. Mohohan is an on-line multimedia transcoding system for video resources, implemented with Amazon Web Services (AWS) EC2, AWS S3, AWS EMR, Hadoop, and ffmpeg. Its goal is to reduce overall execution time by transcoding in parallel on a Hadoop cluster. The concept of Mohohan is simple: 1) divide the video into several chunks of frames, 2) transcode the chunks in parallel on multiple nodes (i.e., task trackers) of the Hadoop cluster, and 3) merge the transcoded results into the output. For comparison with similar SaaS offerings, a test report from CloudHarmony, an impartial third-party organization, was chosen. The experimental results show that Mohohan performs better than the other on-line video transcoding services covered in that report, such as Encoding, Zencoder, Sorenson, and Panda.
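The three steps in this abstract (divide, transcode in parallel, merge) can be sketched as follows. This is only an illustration of the pattern, not Mohohan's implementation: a local thread pool stands in for the Hadoop task trackers, `transcode_chunk` is a placeholder for a real transcoder such as ffmpeg, and all function names and the string "frames" are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(frames, chunk_size):
    """Step 1: divide the frame sequence into consecutive chunks."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def transcode_chunk(chunk):
    """Step 2 (placeholder): stands in for a real transcoder such as ffmpeg."""
    return [frame.upper() for frame in chunk]


def parallel_transcode(frames, chunk_size=3, workers=4):
    """Split, transcode the chunks concurrently, then merge in original order."""
    chunks = split_into_chunks(frames, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so chunk order is preserved.
        transcoded = list(pool.map(transcode_chunk, chunks))
    # Step 3: merge the transcoded chunks back into a single output sequence.
    return [frame for chunk in transcoded for frame in chunk]
```

Preserving chunk order in the merge step matters: the chunks may finish in any order, but the output video must follow the original frame sequence.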
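Speaker: Vincent Chen / TCloud / Business Development Director
Title: Applications in Precision Marketing - Hadoop in Mobile Device Web Browsing Behavior Analysis
Abstract: Applications in precision marketing - Hadoop in mobile device web browsing behavior analysis: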
Administrator
Speaker: Jason Shih / Etu, SYSTEX Corp.
Title: Hadoop Security Overview - From Security Infrastructure Deployment to High-Level Services
Abstract: Organizations are increasingly adopting the open-source Hadoop framework for fast data processing and analytics on huge data volumes. This has drawn attention to enterprise-wide security concerns: fine-grained control of sensitive information, and isolation between different levels or groups of access on shared storage or computing facilities. Prior to Hadoop 0.20, Unix-like file permissions were introduced, along with a simple cluster-wide authentication mechanism, but access control per job queue, job submission, and other operations was lacking. With Hadoop's new security features and their integration with Kerberos, it is now possible to apply strong authentication and authorization, ensuring rigorous access control to data and resources as well as isolation between running tasks. In this presentation, we cover the deployment details of Hadoop security in a cluster environment and its implementation for high-level services based on a Kerberized security infrastructure. We also introduce the Etu Appliance, which provides fast deployment, system automation, and a built-in cross-realm trust mechanism that enables interoperation with an existing Active Directory domain or external LDAP realm and helps reduce integration and operational overhead for administrators.
Speaker: Kenneth Ho
Title: Hadoop hardware and network best practices