Wednesday, April 15, 2020

Data Provenance in E-Learning Essay Example

1. Introduction

We live in an information age, where the volume of data processed by humans and organizations grows exponentially, driven by grid middleware and the availability of huge storage capacity. Data management comprises all the disciplines related to managing data as a valuable resource. The openness of the Web and the ease of combining linked data from different sources create new challenges: systems that consume linked data must evaluate the quality and trustworthiness of that data. A common approach to data quality assessment is the analysis of provenance information. [1]

Data provenance, one kind of metadata, describes the transformational workflows of a data product (files, tables, and virtual collections) starting from its original sources. Metadata refers to "data about data". Workflows can generate huge amounts of data with rich metadata, which helps users understand and reuse the data. Data provenance techniques are used in e-science projects, e-learning environments, and similar settings.

E-learning can be difficult to define because different authors use the term differently. E-learning is an educational approach that uses Internet technology to deliver digital content and provide a learner-oriented environment for teachers and students. This definition extends the learning environment onto the Internet: the Internet provides a learning environment for students and teachers. Because this environment is learner-oriented, we can move away from the traditional teacher-centred instruction of the classroom.

2. E-Learning in Detail

2.1 The 'E' side of E-Learning

As it apparently seems, the word can be thought of as having two parts: the 'e' side and the 'learning' side. The 'e' side has the greater impact on the idea of e-learning. At first glance it might simply mean 'electronic', but the idea is broader, covering many aspects of electronic technologies. In an e-learning environment, information is stored, accessed, and used seamlessly. Supporting this requires a range of technologies and products, including operating systems (Windows, Mac OS, etc.), standalone applications (word processors, spreadsheets, etc.), and web applications. These products and technologies collaboratively build the virtual learning environment. So, rather than thinking of the 'e' as merely electronics, all of these factors should come to mind when connecting the 'e' side with the learning side.

The science of e-learning investigates how people learn in e-learning environments. It rests on three elements: 1) evidence, 2) theory, and 3) applications. The question of what e-learning is can be approached by considering the what, how, and why of e-learning. A definition derived this way may raise doubt about whether e-learning can match conventional learning, but as long as the same instructional methods are used to convey the content, the medium is not a concern. When presenting multimedia materials in e-learning environments, the question is how the material should be presented, since there are many methods: words, pictures, narrations, and more.
The article goes through nine effects: the modality, contiguity, multimedia, personalization, coherence, redundancy, pretraining, signaling, and pacing effects, each of which describes an efficient way of presenting material within its context. The theory behind the science of e-learning holds that meaningful learning from multimedia involves five cognitive processes: selecting words, selecting images, organizing words, organizing images, and integrating. Efficient learning happens when e-learning environments adapt to and enable these processes. Finally, applications for the science of e-learning are built by combining evidence and theory in a practical manner. Even as e-learning moves beyond multimedia content, these three elements remain necessary. [3]

E-learning can be divided into four categories based on learning theories, the demands of the learner, the form of the technology, and the content to be delivered. Andragogy is the term used for the learning of adults. Adults are usually self-directed and take responsibility for their learning, so the theory of andragogy instructs us in the best way to assist adults in e-learning. Problem-based learning teaches adults by providing the information and tools they need to solve problems, delivered in the proper sequence, level of depth, and format to maximize their usefulness.

Traditional teaching can only take place in the same place at the same time, but e-learning spans four basic situations: same time and place (the traditional classroom), same time and different place, same place and different time, and different time and different place.

The environment of e-learning can be divided into several component environments. The information development meta-environment deals with making plans for informing clients. The delivery environment is concerned with the available ICT as well as the packaging of the information into the optimal sequence and media for the target learner. The information-using environment contains the experts on the topics being taught and the students who wish to learn this information. These three environments are distinct from each other.

Associating and storing metadata about a learning object together with the object makes it possible for a course designer to search for and locate existing learning objects. For this to work, those objects must be stored in an accessible location and form. Such locations are called learning object repositories. There are two types of repositories: the first contains only the metadata of the learning objects, with the actual objects saved in various locations; the second stores both the metadata and the actual learning objects in the same place. A Learning Content Management System (LCMS) is a system that is more capable than a simple learning object repository; the term is commonly used for a system that combines authoring support with a learning object repository, tools for delivering the objects to students, and administrative tools. A learning space contains multiple databases and multiple participants, with each participant restricted to certain databases according to their responsibility in the system. [4]

2.2 Different modes of E-learning

Research paper [5] describes the following modes of e-learning. E-learning can be used in an educational system in many ways.
Given the importance of incorporating e-learning strategies into formal educational systems, many approaches and techniques have been recommended over time. The following are a few that are particularly relevant in an educational setting.

2.2.1 Blended learning

Blended learning is a concept of quite recent origin. It is an amalgam of the formal teaching/learning mode with distance and e-learning strategies, designed to better serve the target learner. Benefits of blended learning include students feeling better supported and, consequently, improved student retention rates.

2.2.2 Self-learning

The idea of self-learning has received due attention recently. Teachers or guides and students usually interact via mail. Classroom teaching has become somewhat passive in comparison, though its importance can never be dismissed, for many genuine reasons. Communication technologies are generally employed, and these can be categorized as supporting asynchronous or synchronous activities. Asynchronous activities use technologies such as blogs, wikis, and discussion boards; the idea is that participants may exchange ideas or information without depending on other participants being involved at the same time. Synchronous activities involve the exchange of ideas and information with one or more participants during the same period of time. A face-to-face discussion is an example of synchronous communication; online, synchronous activities occur with all participants joining in at once, as in a chat session, a virtual classroom, or a meeting.

2.2.3 Personalized learning

Personalized learning is a unique learning mode that reflects differences among learners. It has been a burning research issue in the area of e-learning throughout the recent past. In e-learning, the emphasized issues include individual differences such as capacities, learning background, learning styles, and learning objectives, as well as the changing state of each individual's knowledge during the learning process. E-learning in this tradition attempts to provide personalized learning, which includes personalized material, personalized objectives, and a personalized process.

3. Data Provenance in Detail

Provenance information about a data item is information about the history of the item, starting from its creation and including information about its origins. Provenance can be distinguished into two granularities: workflow (or coarse-grained) provenance and data (or fine-grained) provenance. Workflow provenance represents "the entire history of the derivation of the final output of a workflow". Data provenance, in contrast, provides a more detailed view of the derivation of single pieces of data. There is a provenance model that captures both information about Web-based data access and information about the creation of data. [1]

A digital object's provenance (also referred to as its audit trail or lineage) contains information about both the process and the data used to derive the object. Provenance also provides documentation that is vital to preserving data, determining the data's quality and authorship, and reproducing as well as validating results. [20] In the context of workflows, provenance, both for the data they derive and for their specifications, is an essential component for result reproducibility, sharing, and knowledge re-use in the scientific community.
[6] Scientists and engineers need to expend substantial effort managing data and recording provenance information. Workflow systems support the automation of repetitive tasks, but they can also capture complex analysis processes at various levels of detail and systematically record provenance information for the derived data products. This provides documentation that is key to preserving the data, determining its quality and authorship, and reproducing and validating the results.

Workflows and workflow-based systems have emerged as an alternative to ad hoc approaches for constructing computational scientific experiments. Workflow systems help scientists conceptualize and manage the analysis process, support the creation and reuse of analysis tasks, aid the discovery process by managing the data used and generated at each step, and systematically record provenance information for later use. Compared to programs and scripts, workflow systems have a number of advantages for constructing and managing computational tasks. They provide a simple programming model whereby a sequence of tasks is composed by connecting the outputs of one task to the inputs of another. Furthermore, workflow systems often provide intuitive visual programming interfaces, which make them more suitable for users without substantial programming expertise. Workflows also have an explicit structure: they can be viewed as graphs, where nodes represent processes (or modules) and edges capture the flow of data between the processes. The benefits of structure are well known when it comes to exploring data.

There are two basic views of provenance: source provenance and transformation provenance. Provenance recording can be classified as lazy (inversion) or eager (annotation). Some systems address provenance in the context of services and workflow management: the myGrid system [SRG03] provides middleware for biological experiments represented as workflows, and Chimera offers a Virtual Data Catalog for provenance information. The topic of provenance for relational databases was first discussed in the context of visualization. Trio is a database system for handling uncertain data and provenance. Provenance is also related to data annotation: annotation systems like DBNotes and MONDRIAN enable a user to annotate a data item with an arbitrary number of notes. [10] The three main categories of a provenance scheme are the provenance model, the query and manipulation functionality, and the storage model and recording strategy. [2]

In query provenance, query inversion is one method for identifying provenance: the original query is inverted. Provenance can also be characterized in relation to view maintenance and truth maintenance. View maintenance asks how, when the source database changes, we can recompute the view without recomputing the whole query. Truth maintenance is about what is in the database. The query inversion method is problematic for updates and for capturing other languages and data models. [7]

The provenance of a data item can be divided into two parts: transformation provenance and source provenance. Source provenance can be classified into original source, contributing source, and input source.
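To make these distinctions concrete, here is a minimal Python sketch of eager (annotation-based) recording of source provenance, in which every derived value carries its sources forward through a transformation. The class and function names are illustrative, not taken from any of the systems cited above.

from dataclasses import dataclass, field

@dataclass
class Annotated:
    """A value bundled with its source provenance (eager recording)."""
    value: float
    sources: set = field(default_factory=set)

def add(a: Annotated, b: Annotated) -> Annotated:
    # The transformation propagates annotations: the result's source
    # provenance is the union of the inputs' source provenance.
    return Annotated(a.value + b.value, a.sources | b.sources)

# Two base data items, each annotated with its original source.
x = Annotated(3.0, {"sensor-A"})
y = Annotated(4.0, {"sensor-B"})

z = add(x, y)
print(z.value)    # 7.0
print(z.sources)  # {'sensor-A', 'sensor-B'} -- no inversion needed

With lazy (inversion-based) recording, no annotations would be stored; the provenance of z would instead be reconstructed on demand by inverting the transformation, which is exactly what the text above notes is problematic for updates and for richer data models.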
An important part of the provenance model is the world model, which can be either closed or open. In a closed world model, the provenance management system controls the transformations and data items; in an open world model, by contrast, it has no or only limited control over the executed transformations and data items. [6]

In research paper [10] the authors distinguish two forms of provenance: prospective and retrospective. Prospective provenance captures the specification of a computational task; it corresponds to the steps that need to be followed to generate a data product or class of data products. Retrospective provenance captures the steps that were executed, as well as information about the execution environment used to derive a specific data product: a detailed log of the execution of a computational task. If a provenance management system handles transformations at various levels of detail, it should provide mechanisms for merging multiple transformations into one and for splitting a complex transformation into a sequence or graph of simpler transformations.

The storage strategy describes the relationship between the provenance data and the data that is the target of provenance recording. There are three principal storage strategies: the no-coupling, tight-coupling, and loose-coupling recording strategies. [10]

Provenance systems can support a number of uses, such as data quality, audit trails, replication recipes, attribution, and an informational perspective. Provenance information can be collected about different resources in a data processing system (data-oriented or process-oriented) and at different granularities (fine-grained or coarse-grained). The cost of collecting and storing provenance can be inversely proportional to its granularity. There are many techniques for representing provenance information, especially annotation and inversion. With annotation, provenance is pre-computed and readily usable as metadata. The inversion method uses the property that some derivations can be inverted to find the input data supplied to them to derive the output data. There is no metadata standard for lineage representation across disciplines, owing to their diverse needs, so many current provenance systems use syntactic, semantic, and contextual information for representation. When it comes to storage, the manner in which the provenance metadata is stored is important to its scalability. Managing provenance incurs costs for its collection and storage; less frequently used information can be archived to reduce the storage overhead, or a demand-supply model based on usefulness can retain provenance only for frequently used items. The most common way of disseminating provenance data is through a derivation graph that users can browse and inspect. Well-known surveyed data provenance systems include Chimera, myGrid, CMCS, ESSW, and Trio, which can be compared along characteristics such as applied domain, workflow type, use of provenance, subject, granularity, representation scheme, semantic information, storage repository, user overhead, scalability, and dissemination. [2]

Teaching is one of the killer applications of provenance-enabled workflow systems, in particular for courses with a strong data exploration component, such as data mining and visualization. By using a provenance-enabled tool in class, an instructor can keep a detailed record of all the steps she tried while responding to students' questions, and after the class all these results and their provenance can be made available to the students.
For assignments, students can turn in the detailed provenance of their work, showing all the steps they followed to solve a problem.

Regarding the provenance of electronic data: in a practical situation, e-science end users should be able to reproduce their results by replaying a previous computation, understand why two seemingly identical runs with the same inputs produce different results, and determine which data sets and algorithms were involved in a derivation when analysing a deviation. The same applies to the provenance of electronic data in general. Process documentation is to electronic data what a record of ownership is to a work of art, and for many applications process documentation cannot be produced by a single service. Application performance depends on how this documentation is produced. The authors identify various kinds of p-assertions, which are simple pieces of documentation produced by services autonomously. The next important concern after process documentation is querying: provenance queries are user-tailored queries over process documentation aimed at obtaining the provenance of electronic data. The last part of the article illustrates the Organ Transplant Management (OTM) system in health care. OTM consists of a composite process involving the surgery itself, along with such activities as data collection and patient organ analysis, which must comply with a set of regulatory rules. OTM is supported by an IT infrastructure for data maintenance. [8] By making OTM provenance-aware, powerful queries can be answered that would not be possible otherwise; many complex decisions, for instance whether or not to accept an organ for donation, rely on data provenance. The article also discusses existing systems for data provenance: the Virtual Data System and myGrid are scientific workflow systems that provide support for provenance, and the Provenance Aware Storage System developed at Harvard University is designed to automatically produce documentation of execution by capturing file system events in the operating system. [8]

3.1 Functional Requirements for Information Resource Provenance on the Web

Before considering a data provenance model, let us consider how the Web architecture represents information resource provenance. The HTTP protocol plays an important role in the Web, and an HTTP transaction can be interpreted in many ways. At a low level, a physical stream of bits is transmitted between client and server; at a higher level, that bit stream is interpreted as a message with a specific bit pattern. Moreover, the architecture of the World Wide Web defines the relation between URLs (e.g., http://weather.example.com/oaxaca), resources (e.g., the Oaxaca weather report), and representations (e.g., HTML, XML, RDF, and JSON formats). Different requests for the same resource can return a variety of representation formats, such as HTML or XML, which leads to the definition of standards for every format; W3C recommendations specify how URIs, XML entities, and RDF resources are related. From this it follows that one resource may be returned for a URL and that the exact nature of this resource can be unpredictable.

3.1.1 The Semiotics of HTTP URLs

The dereferencing of a URL can be mapped to a semiotic interpretation. Ogden and Richards' semiotic triangle model explains how real-world objects are related to symbols and how people think about those objects from a linguistic perspective.

(Figure: the semiotic triangle.)

Symbol: the URL http://www.weather.gov/xml/current_obs/KBOS.xml
Referent: the resource, i.e., the document '/xml/current_obs/KBOS.xml'
Representation: the XML format
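Content negotiation is what lets one referent have many representations. As a minimal sketch, assuming the hypothetical URL http://weather.example.com/oaxaca from above and the third-party requests library, a client can ask for the same resource in different formats via the HTTP Accept header:

import requests  # third-party HTTP client library

url = "http://weather.example.com/oaxaca"  # hypothetical resource from the text

# Ask for the same resource in two different representations.
for mime in ("text/html", "application/rdf+xml"):
    response = requests.get(url, headers={"Accept": mime})
    # The server picks the best representation it can offer; the
    # Content-Type header reports what was actually sent back.
    print(mime, "->", response.headers.get("Content-Type"))

Both responses would represent the same resource even though their bytes differ entirely; this is exactly the distinction that the FRBR/FRIR levels below make precise.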
3.1.2 FRBR and FRIR

FRBR is a conceptual model from the library science community that relates the user tasks of retrieval and access in library catalogues and bibliographic databases from a user's perspective; it does not prescribe new cataloguing rules or standards. FRBR distinguishes four aspects of an author's literary work, ranging from the purely concrete to the completely abstract. For example, FRBR can describe how different copies of the same book, or different editions of the book, relate to each other. The most concrete aspect is the Item: the physical book that exists in the world.

FRIR (Functional Requirements for Information Resources) extends the four FRBR levels to electronic resources:

1. frbr:Work, a distinct intellectual or artistic creation, corresponding to the Resource or Referent in the semiotic framework;
2. frbr:Expression;
3. frbr:Manifestation;
4. frbr:Item.

FRIR also integrates FRBR with the W3C Provenance Ontology (PROV-O), and it provides cryptographically computable identities at two levels: content and message.

3.1.3 HTTP with FRBR, FRIR, and PROV-O

If a client asks an HTTP server for a MIME type at a URL, the server can respond with many different possible file formats; for example, if the client asks for plain text, the server will try to find the best available representation of the content. The FRBR levels map onto HTTP as follows:

1. A URL denotes a single frbr:Work.
2. The same content, regardless of format, is a frbr:Expression (content digest).
3. The bit sequence of a file aligns with a frbr:Manifestation (message digest).
4. Files on disk, or data as streamed over a network connection, are frbr:Items (transaction digest).
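The digest levels in this list can be made concrete with ordinary cryptographic hashes. The sketch below is an assumption-laden illustration, not FRIR's actual algorithm: it treats the message digest as a hash of the exact bytes received, and the content digest as a hash of a canonicalized form (here, trivially, whitespace-normalized text), so that two byte-wise different files with the same content share a content digest.

import hashlib

def message_digest(raw: bytes) -> str:
    # Hash of the exact byte sequence: identifies the frbr:Manifestation.
    return hashlib.sha256(raw).hexdigest()

def content_digest(raw: bytes) -> str:
    # Hash of a canonical form: identifies the frbr:Expression.
    # Real canonicalization is format-specific (e.g., RDF graph
    # canonicalization); normalizing whitespace merely illustrates the idea.
    canonical = b" ".join(raw.split())
    return hashlib.sha256(canonical).hexdigest()

a = b"Oaxaca: 28 C, clear skies\n"
b = b"Oaxaca:   28 C,  clear skies"

print(message_digest(a) == message_digest(b))  # False: different bytes
print(content_digest(a) == content_digest(b))  # True: same content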
3.2 Different views of Provenance

Data provenance may be collected and reconstructed from different orchestration and execution frameworks. The provenance collection mechanism provides a natural "grouping" structure for representing provenance. However, it presents provenance from the perspective of the "composer" of the workflow rather than the "consumer" of the provenance. The view of the provenance should instead be based on the current task and the user's interest in that task, and different users are interested in different views. Take business managers and engineers, for example: business managers may only be interested in a high-level view of the data, whereas engineers are usually interested in the detailed steps of the provenance. [17]

3.2.1 Example for Different Provenance Consumers

The authors use an example scenario, power consumption forecast workflows, to illustrate different kinds of provenance-consuming users. In this scenario there are three kinds of consumers: the software architect, the data analyst, and the campus facility operator, and each needs a different provenance model. The term "quality impact", which indicates how the quality of a process affects the output quality, is then used to guide users on which processes and data objects need more quality control.

3.2.2 An Apropos Presentation View

Two classes of approaches are generally used to determine a suitable presentation view of provenance: the decomposition approach and the clustering approach. The decomposition method is well suited when granularities are clearly defined in the provenance model: for each individual activity in the workflow, we identify the most appropriate presentation granularity to satisfy the usage requirement and the user's interest. When granular levels are not clearly specified, the clustering approach is used: it incrementally clusters the initial fine-grained provenance information so that groups of low-level provenance nodes are combined and replaced by new higher-level nodes. This strategy needs to identify which fine-grained provenance can be composed into a module.

3.3 Models for Data Provenance

Sixteen teams took part in the First Provenance Challenge, which set up a forum for the community to understand the capabilities of different provenance systems and to show how each represents provenance. A Functional Magnetic Resonance Imaging (fMRI) workflow was defined, which participants had to either simulate or run in order to produce a provenance representation, from which a set of identified queries had to be implemented and executed. [9] The article discusses fMRI in technical depth; in simple terms, the fMRI workflow takes brain images captured at different points as input and unifies them to produce a single reference image. In addition to the fMRI workflow, the challenge specified an initial set of provenance-related queries. The analysis of the sixteen teams' contributions introduced a classification of the different approaches of provenance systems:

Characteristics of provenance systems:
- Execution environment
- Representation technology
- Query language
- Research emphasis
- Challenge implementation

Properties of provenance representation:
- Inclusion of a workflow representation
- Data derivation vs. causal flow of events
- Annotations
- Abstraction mechanisms

An approach that models provenance at a more detailed level is the Open Provenance Model. [1] The Open Provenance Model represents provenance by graphs: the nodes represent artifacts, processes, and agents, and the edges are directed, with predefined semantics depending on the types of the adjacent nodes.
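A minimal Python sketch of such a graph follows. The node kinds (artifact, process, agent) and edge labels (used, wasGeneratedBy, wasControlledBy, wasDerivedFrom) follow the Open Provenance Model's published vocabulary; the node names and the traversal helper are invented for illustration.

# A tiny Open Provenance Model-style graph. Edges point from effect
# to cause, as in OPM: artifact --wasGeneratedBy--> process, etc.
nodes = {
    "raw.csv":   "artifact",
    "clean.csv": "artifact",
    "cleanse":   "process",
    "alice":     "agent",
}
edges = [
    ("cleanse",   "used",            "raw.csv"),
    ("clean.csv", "wasGeneratedBy",  "cleanse"),
    ("cleanse",   "wasControlledBy", "alice"),
    ("clean.csv", "wasDerivedFrom",  "raw.csv"),
]

def lineage(node, edges):
    """Transitively collect everything a node depends on (its causes)."""
    out = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for src, _label, dst in edges:
            if src == current and dst not in out:
                out.add(dst)
                frontier.append(dst)
    return out

print(lineage("clean.csv", edges))
# {'cleanse', 'raw.csv', 'alice'} (set order may vary)

The typed edges are what give the graph its semantics: the same traversal can be restricted to wasDerivedFrom edges to answer pure data-lineage queries, or to wasControlledBy edges to answer attribution queries.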
Provenance research in the context of databases or in the context of workflows usually focuses on the creation of data items. To represent the provenance of data from the Web, we need an additional dimension: provenance information for Web data must also cover the publishing and accessing of data on the Web. In research paper [21] the authors propose a quality assessment methodology that measures information quality in four quadrants: soundness, dependability, usefulness, and usable information. Each quadrant comprises several information quality criteria; for example, security and timeliness would be used to measure the dependability of a piece of information. Provenance information is used for various purposes, such as the estimation of data quality, the tracing of audit trails of data, the repetition of data derivations, the determination of liabilities, and the discovery of data.

In the provenance model described in [12] and [1], three types of provenance elements are broadly distinguished. Provenance elements represent pieces of provenance information; such an element can be, for instance, the creator of a specific data item, in which case the element is an instance of the data creator type. The three types of provenance elements used in the model are actors, executions, and artifacts. An actor generally performs the execution of an action or a process, which in most cases yields an artifact such as a specific dataset. An execution may include the use of artifacts which, in turn, might be the results of other executions.

The central element type in data creation is the data creation execution. Data creations represent the execution of actions or processes that create new data items; thus, in the provenance graph of a specific data item, actual data creations are represented by provenance elements of the data creation type. All data creations have a creation time and use a method. Data creators, the created data item, source data, and creation guidelines are the provenance elements that take part in a data creation. Data creators are actors that perform the data creation, and the model distinguishes human from non-human data creators. Human data creators, called data creating entities, are persons, groups, organizations, and so on. Non-human creators are data creating devices, such as sensors, and data creating services, such as software agents, reasoners, query engines, or workflow engines. Source data is often used by a data creator to create new data; examples of source data are the content of a document used for machine learning, the entries in a database used to answer a query, and the statements in a knowledge base used to entail a new statement. Other artifacts that may be used in a data creation are creation guidelines, which guide the execution of the data creation; examples are mapping definitions, transformation rules, database queries, and entailment rules.

Data access centers on data access executions. Data accessors perform data access executions to retrieve data items contained in documents from a provider on the Web. To enable a detailed representation of providers, the model described in paper [12] distinguishes data providing services that process data access requests and send the documents over the Web, data publishers who use data providing services to publish their data, and service providers who operate data providing services. Furthermore, the model represents the execution of integrity verifications of artifacts and their results.

A system that uses Web data must access this data from a provider on the Web. Information about this process and about the providers is important for a representation of provenance that aims to support the assessment of data quality. Data published on the Web is embedded in a host artifact, usually a document; following the terminology of the W3C Technical Architecture Group, we call this artifact an information resource. Each information resource has a type, e.g., RDF document or HTML document. The data accessor retrieves information resources from a provider, and the provenance model allows a detailed representation of providers by distinguishing data providing services, data publishers, and service providers. [1]

In paper [12] a provenance graph is represented as a tuple (PE, R, type, attr) where:
- PE denotes the set of provenance elements in the graph;
- R ⊆ PE × PE × RN denotes the labeled edges in the graph, where RN is the set of relationship names introduced by the provenance model;
- type : PE → P(T) is a mapping that associates each provenance element with its types, where T is the set of element types introduced by the provenance model;
- attr : PE → P(A × V) is a mapping that associates each provenance element with additional properties represented as attribute-value pairs, where A is a set of available attributes and V is a set of values.

The sets A and V are not specified any further, because the available attributes and values, and their meaning, depend on the use case. The authors also introduce an abbreviated notation to refer to the target of an edge in a provenance graph: if (pe1, pe2, rn) ∈ R, they write pe1 −rn→ pe2.
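A direct, if naive, Python encoding makes this tuple concrete; the element names and the relationship name used here are invented for illustration.

# Provenance graph as the tuple (PE, R, type_, attr) defined above.
PE = {"dataset1", "creation1", "alice"}
R = {("dataset1", "creation1", "createdBy")}   # subset of PE x PE x RN
type_ = {                                      # PE -> P(T)
    "dataset1": {"artifact"},
    "creation1": {"execution", "dataCreation"},
    "alice": {"actor", "dataCreatingEntity"},
}
attr = {                                       # PE -> P(A x V)
    "creation1": {("creationTime", "2020-04-15T10:00:00Z")},
}

def targets(pe, rn):
    """All pe2 with pe -rn-> pe2, i.e. (pe, pe2, rn) in R."""
    return {p2 for (p1, p2, name) in R if p1 == pe and name == rn}

print(targets("dataset1", "createdBy"))  # {'creation1'}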
3.4 Using Data Provenance for Quality Assessment

To assess the quality of data, we need to identify the types of information that can be used for the evaluation and a methodology for calculating the quality attributes. The research paper introduces a provenance model tailored to the needs of tracing and tracking provenance information about Web data: it describes the creation of a data item and the provenance of how the data was made accessible on the Web. Most existing approaches to information quality assessment are based on information provided by users. The quantitative approach described in research paper [12] follows three steps:

- collecting the quality attributes for which provenance information is needed;
- deciding on the influence of these attributes on the assessment;
- applying a function to compute the quality.

The author describes information quality as a combined value of multiple quality attributes, such as accuracy, completeness, believability, and timeliness. The assessment method described in paper [12] follows three steps:

1. Generate a provenance graph for the data item.
2. Annotate the provenance graph with impact values.
3. Calculate an IQ-score for the data item.
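A minimal sketch of those three steps follows, under the invented assumptions that each provenance element carries a quality score in [0, 1] and that impact values weight a simple weighted average; the actual aggregation function is a design choice, so this is only one possibility.

# Step 1: provenance elements for one data item, each with a quality
# score. All scores and impacts below are invented example numbers.
elements = {
    "creator": 0.9,   # believability of the data creator
    "source":  0.7,   # accuracy of the source data
    "service": 0.8,   # dependability of the data providing service
}

# Step 2: annotate each element with an impact value (how strongly it
# influences the quality of the final data item).
impact = {"creator": 0.5, "source": 0.3, "service": 0.2}

# Step 3: compute an IQ-score, here as an impact-weighted average.
iq_score = sum(elements[e] * impact[e] for e in elements) / sum(impact.values())
print(round(iq_score, 3))  # 0.82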
